When making Venn diagrams to look at the overlap of sets, I often wonder how significant a given amount of overlap is. What is the likelihood of seeing a given amount of overlap between two sets simply by chance?

One way to assess this is to use the hypergeometric distribution. The R language has a nice function for calculating the p-value, but the explanation of how to use it involves an urn of black and white balls.

phyper(q,m,n,k,lower.tail=FALSE)

q = the number of white balls drawn from the urn (without replacement); with lower.tail=FALSE, phyper returns the probability of drawing more than q white balls

m = the number of white balls in the urn

n = the number of black balls in the urn

k = the number of balls drawn from the urn (sample size)
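
In terms of these four parameters, the upper tail computed by the call above can be written out explicitly. This is the standard textbook form of the hypergeometric tail, given here as a reading aid; X denotes the number of white balls among the k balls drawn.

```latex
\[
P(X > q) \;=\; \sum_{i=q+1}^{\min(m,k)}
  \frac{\binom{m}{i}\,\binom{n}{k-i}}{\binom{m+n}{k}}
\]
```

Each term is the probability of drawing exactly i white balls and k - i black balls from an urn holding m + n balls in total.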

Example comparing gene sets

Let's say you want to compare sets of genes identified in two independent experiments. For instance, in experiment one, you identify 1000 genes up-regulated under a given condition. In experiment two, you identify 2872 genes with promoters bound by a transcription factor. Now you want to compare the two experiments to see if the up-regulated genes are also those bound by the transcription factor. A Venn diagram of the two experiments indicates that the two sets (1000 up-regulated genes and 2872 TF-bound genes) have an intersection of 448 genes. Is this significant? The total number of genes in the experiment is 14,800. In urn terms, the 1000 up-regulated genes are the white balls, the remaining 13,800 genes are the black balls, and the 2872 TF-bound genes are the sample drawn from the urn.

q = 448

m = 1000

n = 14800 - 1000

k = 2872

phyper(448,1000,13800,2872,lower.tail=F)
[1] 1.906314e-81
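
For repeated comparisons, the calculation can be wrapped in a small helper. This is only a minimal sketch: the function name `overlap_pvalue` and its argument names are hypothetical and not part of any existing utility. Because `phyper(q, ..., lower.tail = FALSE)` returns P(X > q), the sketch passes `overlap - 1` so that the observed overlap itself is included, which makes the result slightly larger than the value printed above. It also computes the overlap expected by chance, which gives a feel for how far 448 is from what random sets would produce.

```r
# Hypothetical helper (name and arguments are illustrative assumptions):
# probability of an overlap of at least `overlap` genes between two gene
# sets of sizes `size_a` and `size_b`, drawn from `total` genes, by chance.
overlap_pvalue <- function(overlap, size_a, size_b, total) {
  # phyper(q, m, n, k, lower.tail = FALSE) gives P(X > q),
  # so use overlap - 1 to get P(X >= overlap).
  phyper(overlap - 1, size_a, total - size_a, size_b, lower.tail = FALSE)
}

# Overlap expected by chance alone: k * m / (m + n)
expected_overlap <- 2872 * 1000 / 14800   # about 194 genes

# p-value for the example: 448 shared genes out of 14,800
p <- overlap_pvalue(448, 1000, 2872, 14800)
```

An observed overlap of 448 against roughly 194 expected by chance is consistent with the very small p-value.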

Making Venn Diagrams

* I wrote a utility for making Venn diagrams: venn diagrams

* But someone else wrote a better one recently: venny

VennSignificance (last edited 2011-08-29 17:54:13 by ChrisSeidel)