"Every genome is the result of a mostly shared, but partly unique, 3.8-billion-year evolutionary journey from the origin of life. Diversity is created mostly by copy errors during replication."The above is taken from a review in the latest issue of Science  that summarizes the progress made in the field of computational genomics since the first sequences obtained back in the mid-seventies. I highly recommend reading the review. Here, I'd like to highlight a few relevant points.
Zerbino, Paten, and Haussler nicely summarize the different types of DNA edits that, over those 3.8 billion years, have brought us the genetic diversity we observe today. Replication copy errors give rise to single-base changes that can become fixed in the entire population (substitutions) or remain present in only part of the population (single-nucleotide polymorphisms). Runs of consecutive bases can be inserted (for example, by duplication) or deleted, in which case we talk about indels. Rearrangements can also occur, leading to changes in gene copy numbers or even chromosome numbers.
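To make the vocabulary concrete, here is a minimal Python sketch that counts substitutions and indel events between two sequences. It assumes the sequences have already been aligned, with `-` marking gaps. The function name `classify_differences` and the toy sequences are made up for illustration and do not come from the review.

```python
def classify_differences(ref, alt):
    """Count substitutions and indel events between two pre-aligned sequences.

    Both strings must have the same length and use '-' for gap positions.
    A run of consecutive gaps in the same sequence counts as one indel event.
    """
    if len(ref) != len(alt):
        raise ValueError("aligned sequences must have equal length")

    substitutions = 0
    indels = 0
    previous_gap = None  # which sequence had a gap in the previous column

    for r, a in zip(ref, alt):
        if r == "-" and a == "-":
            continue  # a column gapped in both sequences carries no information
        gap = "ref" if r == "-" else "alt" if a == "-" else None
        if gap is not None:
            if gap != previous_gap:
                indels += 1  # a new gap run starts here
        elif r != a:
            substitutions += 1
        previous_gap = gap

    return substitutions, indels


# Toy example (invented sequences): one mismatch and two gap runs.
subs, gaps = classify_differences("ACGTTGCA--T", "ACCT--CAGGT")
print(f"substitutions={subs}, indel events={gaps}")
```

Whether a single-base difference is a substitution or a polymorphism depends on its frequency in the population, which a two-sequence comparison like this cannot tell you.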
There's so much more to a DNA sequence than just a string of four letters. Genes are not fully understood until you look at their history, both throughout evolution and over an individual's lifetime, as well as their regulatory mechanisms, their interactions with other genes (epistasis), their epigenetic pathways, their function, etc. With this in mind, computational genomics has the arduous task of not only efficiently storing and retrieving enormous amounts of data, but also building models that encompass epigenetic mechanisms, metabolic pathways, and gene regulatory networks.
"Combining evolutionary, mechanistic, and functional models, computational genomics interprets genomic data along three dimensions. A gene is simultaneously a DNA sequence evolving in time (history), a piece of chromatin that interacts with other molecules (mechanism), and, as a gene product, an actor in pathways of activity within the cell that affect the organism (function). [. . .] Beyond the basics of storing, indexing, and searching the world's genomes, the three fundamental, interrelated challenges of computational genomics are to explain genome evolution, model molecular phenotypes as a consequence of genotype, and predict organismal phenotype."Genomic evolution is studied using phylogenetic analyses. This presents its challenges, starting from finding optimal ways to align the sequences: in order to compare different sequences, one has to make sure that there is a one-to-one correspondence between each base in each sequence, as shown in the figure below.
"When applied to more than two species or to multiple gene copies within a species, phylogenetic methods provide an explicit order of gene descent through shared ancestry. [. . .] Finding the optimal phylogeny under probabilistic or parsimony models of substitutions (and also of indels) is NP-hard, and considerable effort has been devoted to obtaining efficient and accurate heuristic solutions."Right now algorithms that compute phylogenetic trees are computationally intensive and take a long time to run. As the sequencing technology advances and it's possible to sequence more data, larger regions, and in a more efficient way, the challenge is in making also the phylogenetic analyses more computationally efficient.
The next big challenge computational genomics embraces is predicting causal variants. Whole-genome studies have to take into account population stratification, which arises because we are a relatively young species and, as such, all related. New databases are emerging to provide epigenetic context and data, RNA expression, and protein levels. All of this needs to be integrated in order to make causal predictions from genotype to phenotype.
The coming together of all this information will benefit medical research on multiple levels. Since nearly all cancers are caused by genetic modifications, computational genomics will help us understand cancer therapeutics and tumorigenesis. Stem cell research will also benefit from progress made in computational genomics as it involves the full understanding of variants and their effects not just on the genome, but also on the epigenome and gene expression.
"To face the challenges of obtaining the maximum information from every sequencing experiment, we must borrow advances from a spectrum of different research fields and tie them together into foundational mathematical models implemented with numerical methods. There is a tension between the comprehensiveness of models and their computational efficiency. [. . .] As a common language develops, shaped by our increasing knowledge of biology, we anticipate that computational genomics will provide enhanced ability to explore and exploit the genome structures and processes that lie at the heart of life." Zerbino, D., Paten, B., & Haussler, D. (2012). Integrating Genomes Science, 336 (6078), 179-182 DOI: 10.1126/science.1216830