(In case you missed it, this originally appeared last Thursday as a guest blog on the Writer's Forensics Blog.)
There are roughly three billion pairs of nucleotides in human DNA, and the vast majority is identical across individuals. When we talk about “genetic fingerprinting,” we really mean “looking for a needle in a haystack.” Luckily, for the most part, we all differ at the same loci. Over the years, the techniques used for DNA typing have improved greatly, diminishing both costs and the likelihood of errors. These days, most forensic laboratories use commercial kits to type specific regions of the DNA that are known to vary across the population. Here in the US, the standard for DNA fingerprinting is to type 13 loci called short tandem repeats (STRs), regions in which a short motif, 4 or 5 nucleotides long, is repeated a variable number of times. The likelihood of two individuals having all 13 loci identical is so low that we can deem it virtually impossible (with the exception of identical twins, of course).
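To get a feel for why a full match is so unlikely, here is a minimal Python sketch of the usual back-of-the-envelope calculation: estimate how common each genotype is at each locus, then multiply across loci. It assumes Hardy-Weinberg proportions and independent loci, which is a simplification. The locus names, alleles, and allele frequencies below are made up for illustration and are not real population data.

```python
from math import prod

def genotype_frequency(allele_a, allele_b, freqs):
    """Expected genotype frequency under Hardy-Weinberg proportions."""
    p, q = freqs[allele_a], freqs[allele_b]
    # Homozygote: p^2. Heterozygote: 2pq.
    return p * p if allele_a == allele_b else 2 * p * q

def random_match_probability(profile, freq_table):
    """Multiply genotype frequencies across loci (assumes independent loci)."""
    return prod(
        genotype_frequency(a, b, freq_table[locus])
        for locus, (a, b) in profile.items()
    )

# Hypothetical allele frequencies, NOT real population data.
freq_table = {
    "LOCUS_1": {"12": 0.1, "14": 0.2},
    "LOCUS_2": {"9": 0.3},
    "LOCUS_3": {"20": 0.1, "23": 0.05},
}

# Hypothetical three-locus profile: two alleles per locus.
profile = {
    "LOCUS_1": ("12", "14"),
    "LOCUS_2": ("9", "9"),
    "LOCUS_3": ("20", "23"),
}

rmp = random_match_probability(profile, freq_table)
print(f"{rmp:.1e}")  # about 3.6e-05 with just three loci
```

Even with only three invented loci the number is already small. Each additional locus multiplies it by another small fraction, which is why 13 loci are enough to make a coincidental match so improbable.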
Using PCR-based technology (which makes many copies of a target sequence from a small sample), the commercial kits can rapidly determine the 13 STR alleles even from old, partly degraded DNA. The resulting profile is then searched against CODIS, the DNA database maintained by the FBI, and if the same genetic profile is already in the system, a match can be determined.
However, there’s a catch, and it’s called a microvariant. From time to time, an individual will have a mutation that is so uncommon it has never been observed before. The commercial kits are made to recognize specific variants that have already been documented, so when DNA carrying the rare mutation is analyzed, the kit will not be able to recognize it. This can lead to mislabeling.
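Here is a minimal sketch of how an undocumented variant can slip through. It assumes, for illustration, that a kit assigns alleles by comparing a measured fragment length against a reference list of known allele lengths, within a small tolerance. The function name, the reference lengths, and the tolerance are all hypothetical and do not come from any real kit.

```python
def call_allele(size_bp, ladder, tolerance=0.5):
    """Return the known allele whose reference length is within
    `tolerance` base pairs of the measured length, or None if
    nothing in the reference list fits."""
    for allele, ladder_size in ladder.items():
        if abs(size_bp - ladder_size) <= tolerance:
            return allele
    return None

# Hypothetical reference lengths (in base pairs) for one locus with a
# 4-nucleotide repeat: each extra repeat adds 4 bp.
hypothetical_ladder = {"8": 180.0, "9": 184.0, "10": 188.0}

print(call_allele(184.2, hypothetical_ladder))  # matches allele "9"
print(call_allele(187.0, hypothetical_ladder))  # None: fits no documented allele
```

The second fragment is 3 bp longer than allele 9 and 1 bp shorter than allele 10, so it falls between the documented alleles. A lookup like this can only report that something unexpected is there. It cannot say what the variant actually is.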
How can we build a reliable library of STR alleles that faithfully represents the whole population? Until a few years ago, the two sequencing methods available, the chain-termination method (Sanger et al., 1977) and pyrosequencing (Ronaghi et al., 1996), yielded only tens of sequences at a time. The breakthrough came in 2005, when 454 Life Sciences, a biotechnology company based in Branford, CT, invented a new fiber-optic chip that allowed the typing of tens of thousands of DNA sequences in a single run [1]. The new method is called 454 sequencing or ultra-deep sequencing.
For those of us working in HIV research, this changed everything. Since we had already shown that only a handful of viruses are transferred during sexual transmission, deep sequencing allowed us to type the genomes of those transmitted viruses, shedding new light on vaccine design.
But what about forensic analyses?
Researchers from Denmark used deep sequencing to analyze five STR loci and found rare base mutations and repeat variations that would not have been found using conventional methods [2]. As mentioned before, in order to reduce typing errors, it’s important to find these variants and incorporate them into the commercially available typing kits. Here in the US, a similar analysis is ongoing at the Forensic Science Program of Western Carolina University. The goal of the study, led by Professor Mark Wilson, is to understand how deep sequencing can uncover minor variants and hence minimize the rate of inconclusive results from genetic fingerprinting analyses.
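A toy example shows why reading the actual sequence can reveal variants that a length measurement alone would miss. The sketch below tallies a handful of invented reads for a single locus, first by length and then by exact sequence. The repeat motif and the reads are made up for illustration, and this is not the analysis pipeline used in the studies above.

```python
from collections import Counter

def summarize_reads(reads):
    """Tally reads by length and by exact sequence."""
    by_length = Counter(len(read) for read in reads)
    by_sequence = Counter(reads)
    return by_length, by_sequence

# Invented reads for one locus with a 4-nucleotide motif.
five_repeats = "TCTA" * 5                            # 20 nt
five_with_substitution = "TCTA" * 2 + "TCTG" + "TCTA" * 2  # also 20 nt
six_repeats = "TCTA" * 6                             # 24 nt

reads = [five_repeats] * 3 + [five_with_substitution] * 2 + [six_repeats]

by_length, by_sequence = summarize_reads(reads)
print(dict(by_length))   # two distinct lengths: 20 and 24
print(len(by_sequence))  # three distinct sequences
```

Judged by length alone, the sample appears to contain two alleles. Judged by sequence, there are three, because two of the 20-nucleotide variants differ by a single base inside the repeat.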
In conclusion, just as its name implies, deep sequencing can add new depth to DNA sequencing, unveiling previously unknown alleles in the population.
[1] Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, et al. (2005). Genome sequencing in microfabricated high-density picolitre reactors. Nature, 437 (7057), 376-80 PMID: 16056220
[2] Fordyce SL, Ávila-Arcos MC, Rockenbauer E, Børsting C, Frank-Hansen R, Petersen FT, Willerslev E, Hansen AJ, Morling N, & Gilbert MT (2011). High-throughput sequencing of core STR loci for forensic genetic investigations using the Roche Genome Sequencer FLX platform. BioTechniques, 51 (2), 127-33 PMID: 21806557