Last week I started discussing the exciting news about the six ENCODE papers published in the September 6 issue of Nature. If you haven't already, I highly recommend reading the review "ENCODE explained", which has a nice summary of the papers and an excellent perspective on what these results mean.
One paragraph in particular is worth quoting:
"The authors report that the space between genes is filled with enhancers (regulatory DNA elements), promoters (the sites at which DNA’s transcription into RNA is initiated) and numerous previously overlooked regions that encode RNA transcripts that are not translated into proteins but might have regulatory roles. Of note, these results show that many DNA variants previously correlated with certain diseases lie within or very near non-coding functional DNA elements, providing new leads for linking genetic variation and disease."
To review what promoters and enhancers are, you can take a look at this older post.
I can't stress enough how relevant these findings are. Previously, genes were thought to be the minimal "coding" unit, so much so that the rest of the genome had been dubbed "junk DNA" (and by now you should know how much I hate that unfortunate expression!). Djebali et al. report that
"about 75% of the genome is transcribed at some point in some cells, and that genes are highly interlaced with overlapping transcripts that are synthesized from both DNA strands."
"The consequent reduction in the length of ‘intergenic regions’ leads to a significant overlapping of neighbouring gene regions and prompts a redefinition of a gene."
Djebali et al. looked at RNA isolates in the whole cell, nucleus and cytosol of 15 different cell lines. They found novel exons, novel splice junctions and sites, and novel transcripts. Many of these elements are in intergenic regions, and many are antisense. They also investigated which of these newly found elements show evidence of protein expression.
When they looked at expression patterns specific to cell lines, they found that gene expression levels were similar across cell lines. The majority of protein-coding genes were expressed across all cell lines, and only a minority (~7%) were specific to certain cell lines. On the other hand, the researchers found many long non-coding RNAs that were largely cell-line specific, with only 10% expressed across all cell lines. I found this bit quite intriguing, as it seems to suggest that RNAs have a large role in controlling gene expression across cell lines.
Overall, their findings yield an increased overlap in what they call "genic regions". What were previously thought to be "deserts" between genes aren't so deserted after all. Rather, they are populated by lots and lots of regulatory elements. In their final discussion, Djebali et al. conclude
"The likely continued reduction in the lengths of intergenic regions will steadily lead to the overlap of most genes previously assumed to be distinct genetic loci. This supports and is consistent with earlier observations of a highly interleaved transcribed genome, but more importantly, prompts the reconsideration of the definition of a gene."
Joseph R. Ecker, Wendy A. Bickmore, Inês Barroso, Jonathan K. Pritchard, Yoav Gilad, & Eran Segal (2012). Genomics: ENCODE explained. Nature. DOI: 10.1038/489052a
Sarah Djebali, Carrie A. Davis, Angelika Merkel, Alex Dobin, Timo Lassmann, Ali Mortazavi, Andrea Tanzer, Julien Lagarde, Wei Lin, Felix Schlesinger, et al. (2012). Landscape of transcription in human cells. Nature. DOI: 10.1038/nature11233