Debunking myths on genetics and DNA

Thursday, November 10, 2011

What shall we play today? How about a protein folding game?


So, after hearing all the talk about it, I finally did it: I checked out Foldit, the online scientific discovery game. I'm sure you've all heard of it, and many of you may have even played it -- if so, please share your experience in the comments because I'd love to hear about it! Developed by researchers at the University of Washington, the beta version of Foldit was released in 2008. Players compete to solve protein structures that would otherwise require an enormous amount of computing time and resources.

THE BIOLOGY PROBLEM. Proteins are formed through the translation of RNA into a sequence of amino acids. As such, a protein is nothing more than a string written in an alphabet of 20 letters (each letter represents one amino acid), just like DNA is a string written in an alphabet of 4 letters. However, once formed, proteins coil and fold up into a 3-D structure that not only varies from protein to protein, but also determines the protein's ability to bind (or not) to other molecules, thus affecting the way it performs its function. Solving the 3-D structure of a protein is a very complicated business, mostly because it's a problem with an enormous number of variables: the folding structure is determined by the way the side chains of the amino acids interact with one another, by their electric charges and energy potential, and by other factors such as temperature, pH, the presence of molecules able to "aid" the folding process, the formation of hydrogen bonds, etc.* Some proteins are essentially rigid; others undergo "conformational" changes as they perform their functions.
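To make the "string of letters" picture concrete, here is a minimal Python sketch of translation using the standard genetic code. The names `CODON_TABLE` and `translate` are my own, and the short RNA string is made up for illustration; a real cell does a lot more than this (start-site selection, splicing, and so on).

```python
from itertools import product

# Standard genetic code, with codons ordered by base U, C, A, G
# (first base varies slowest, third base fastest). "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Read an RNA string three letters at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# A made-up 12-letter RNA: AUG GCC UUU UAA
print(translate("AUGGCCUUUUAA"))  # MAF
```

The output is just a string over the 20-letter alphabet. Nothing in it says how the chain folds, and that is exactly the hard part.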

Because much of a protein's functionality depends on its shape, we can't understand how it works unless we understand its folding structure. For example, the crystal structure of gp120, the protein HIV uses for viral entry into target cells, was solved in 1998 [1], and this allowed us to understand how the virus docks with T-cells and how it dodges immune responses. This protein has regions that are highly variable (mutations in the viral genome arise constantly), enabling the virus to "hide" from the host's immune system. However, the bit of the protein that binds to the CD4 receptor (to initiate viral entry) is highly conserved and well hidden between the coils and folds of the protein itself. When the virus nears the target cell, gp120 undergoes a "conformational change": the docking bit "comes out of its hiding spot" and binds to the receptor. In other words, it changes its shape so it can "fit" into the receptor and initiate the infection.

THE COMPUTATIONAL PROBLEM. Mathematically, in order to solve a 3D structure, you have to find a set of spatial coordinates (x,y,z) for every atom -- the "unknowns." The number of unknowns in the system is the "degrees of freedom," and in order to solve the system, you need one equation per unknown. A complete protein structure is encoded in a file like this one. Jargon aside, notice the lines that start with "ATOM": what you have there are the (x,y,z) coordinates (in Angstroms) of each atom. Now, human proteins are quite big: hundreds, sometimes thousands, of amino acids -- go figure how many atoms! On top of that, you have to consider all the additional variables I mentioned above -- pH, temperature factor, chemistry, energy potential, etc. You can see how the number of degrees of freedom and additional constraints can easily escalate to astronomical levels. Even the fastest computer will take a very long time to generate all the information needed to uniquely determine the structure of the protein.
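For the curious, here is a small Python sketch of how those "ATOM" lines can be read. It assumes the fixed-column layout of the PDB file format, where x, y and z sit in columns 31-38, 39-46 and 47-54. The three sample lines are illustrative values I typed in by hand, not a real deposited structure, and `parse_atom_coordinates` is a name I made up.

```python
# Three illustrative ATOM records in PDB fixed-column layout (made-up values).
SAMPLE_PDB = """\
HEADER    ILLUSTRATIVE EXAMPLE
ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67
ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38
ATOM      3  C   MET A   1      26.913  26.639   3.531  1.00  9.62
END
"""


def parse_atom_coordinates(pdb_text):
    """Return a list of (x, y, z) tuples, one per ATOM record, in Angstroms."""
    coordinates = []
    for line in pdb_text.splitlines():
        if line[:6].strip() != "ATOM":
            continue  # skip HEADER, END and every other record type
        x = float(line[30:38])  # columns 31-38
        y = float(line[38:46])  # columns 39-46
        z = float(line[46:54])  # columns 47-54
        coordinates.append((x, y, z))
    return coordinates


atoms = parse_atom_coordinates(SAMPLE_PDB)
unknowns = 3 * len(atoms)  # one x, one y and one z per atom
print(len(atoms), "atoms,", unknowns, "coordinates to determine")
```

Three atoms already mean nine numbers. A protein with thousands of atoms has thousands of (x,y,z) triplets, before you even count the physical constraints.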

To cut the run time of computationally intensive programs, the problem can be broken down into several parallel jobs that run across different machines, a process called distributed computing. That's the principle behind Rosetta@home, a program that anybody can download and that uses the idle time on the volunteer's computer to run portions of the huge protein folding computation.
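The "break it into parallel jobs" idea can be sketched in a few lines of Python. This is only a toy under loud assumptions: `toy_energy` is a made-up one-angle landscape, not a real force field, worker threads stand in for volunteers' machines, and nothing here reflects how Rosetta@home actually schedules its work.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def toy_energy(angle_deg):
    """Hypothetical energy of a single bond angle (illustration only)."""
    angle = math.radians(angle_deg)
    return math.cos(3 * angle) + 0.5 * math.cos(angle - 1.0)


def best_in_work_unit(angles):
    """What one 'volunteer' does: return the (energy, angle) minimum of its share."""
    return min((toy_energy(a), a) for a in angles)


def split_into_work_units(candidates, n_units):
    """Deal the candidates out round-robin into n_units independent jobs."""
    return [candidates[i::n_units] for i in range(n_units)]


def distributed_search(candidates, n_units=4):
    """Run every work unit in parallel, then combine the partial answers."""
    units = split_into_work_units(candidates, n_units)
    with ThreadPoolExecutor(max_workers=n_units) as pool:
        partial_results = list(pool.map(best_in_work_unit, units))
    return min(partial_results)


candidates = list(range(360))  # every whole-degree angle
energy, angle = distributed_search(candidates)
print(f"lowest toy energy {energy:.3f} at {angle} degrees")
```

The key property is that the work units don't need to talk to each other. Each one can run on a different machine, and only the small partial results have to travel back to be combined.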

But what about the game? As you can imagine, folding a protein computationally requires a lot of machine learning and CPU time, whereas human brains have a natural knack for complicated visual tasks like spotting patterns. As I checked out the basic rules of Foldit, I realized that they sound simple enough for a human brain to understand, yet are extremely complicated to program into a machine: (a) you need to make sure that your protein is tightly packed, with no empty spaces inside; (b) proteins have a hydrophobic part, which stays away from water, and a hydrophilic part, which instead can touch water molecules. Since proteins move in water all the time, you need to make sure that the hydrophobic part is packed inside the protein and that the hydrophilic part surrounds it completely; (c) respect the space constraints: in other words, no two atoms can be in the same position at the same time.
Things like energy minimization are built-in tools.
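Rules (b) and (c) above can be written down as toy checks in Python. This is my own sketch under stated assumptions: the 3.0 Angstrom clash threshold is arbitrary, each residue is reduced to a single point, "buried" just means "closer to the center of the blob", and none of it is Foldit's real scoring function.

```python
import math
from itertools import combinations


def count_clashes(coords, min_distance=3.0):
    """Rule (c): count pairs of points closer than min_distance (assumed threshold)."""
    return sum(
        1 for a, b in combinations(coords, 2) if math.dist(a, b) < min_distance
    )


def centroid(coords):
    """Geometric center of a list of (x, y, z) points."""
    n = len(coords)
    return tuple(sum(point[i] for point in coords) / n for i in range(3))


def burial_gap(coords, hydrophobic):
    """Rule (b): mean distance from the center of hydrophilic points minus that
    of hydrophobic points. Positive means the hydrophobic ones sit deeper inside."""
    center = centroid(coords)
    inside = [math.dist(c, center) for c, h in zip(coords, hydrophobic) if h]
    outside = [math.dist(c, center) for c, h in zip(coords, hydrophobic) if not h]
    return sum(outside) / len(outside) - sum(inside) / len(inside)


# One hydrophobic point at the center, four hydrophilic points around it.
coords = [(0, 0, 0), (4, 0, 0), (-4, 0, 0), (0, 4, 0), (0, -4, 0)]
hydrophobic = [True, False, False, False, False]

print("clashes:", count_clashes(coords))
print("burial gap:", burial_gap(coords, hydrophobic))
```

Writing the checks is the easy bit. The hard bit, and where human players come in, is searching through the astronomically many ways to move the chain so that all the rules are satisfied at once.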

THE GOAL. For now the main goal of Foldit is to see whether humans are better than machines at "guessing" protein structures. If this turns out to be the case, the next goal will be to have gamers predict unknown protein structures and also create new synthetic proteins for drug design. In June 2009 Foldit introduced a new feature called "recipes," which basically are algorithms and strategies that players implement themselves and, if they choose to, they can save them and share them. In a recent PNAS paper [2], Khatib et al. analyzed the recipes uploaded by Foldit players, their use, and their success. While no single recipe allowed for the achievement of a structure without human intervention, players strategized the use of many different ones in different parts of the game to solve specific within-game tasks.

This introduces a new aspect of the game, which I found quite intriguing -- social interactions: apparently the use of recipes across the population of players spreads mostly by word of mouth, and successful recipes are implemented, copied, and varied constantly by multiple players as their use increases. (The social evolution of these algorithms could probably constitute a quite interesting research study on its own.) One of the most popular recipes, Blue Fuse, was found to be strikingly similar to the Fast Relax algorithm developed by Foldit scientists, indicating that the social evolution of these algorithms can lead to independent discovery of optimal strategies. As the authors conclude,
"Benchmark calculations show that the new algorithm independently discovered by scientists and by Foldit players outperforms previously published methods. Thus, online scientific game frameworks have the potential not only to solve hard scientific problems, but also to discover and formalize effective new strategies and algorithms."
Does anybody need further evidence to prove that cooperation paves the way to success?

*There are also post-translational modifications such as glycosylation and phosphorylation, enzymatic processes that attach additional molecules to the protein.

Edited to include a wonderful comment from Antisocialbutterflie, who works on protein crystallography:
Since you like photography I'll sum it up in that metaphor. Crystallography basically acts in the same way as a camera. In a camera the light reflects off an object and the image is scattered (or in some diagrams shown flipped upside down) by the first lens. The wavelength of visible light is too long to see things on an atomic scale (100s of nm versus seeing things on the angstrom level). We replace the light with x-rays and the crystal acts as both the object we are seeing and the first lens.

In photography there is a second lens that recollects the scattered light to reconstitute the image and makes it look bigger depending on the focal length. We don't have one of those for X-rays so that's where the math comes in. Each point of scatter indicates a slice of space in 3D and how intense it is tells us how many electrons there are in that spot. The math gets a little crazy and there is a whole aspect called the phase problem (which is one of the things that the program addresses) that we'll skip over, but for the purposes of this explanation we've solved it and what you get isn't positions of atoms per se but a cloud of electron density. We have to take that cloud and build where we think the atoms are, taking into account what we understand about chemistry. The higher the resolution the less ambiguous the solution is, but most high impact papers are low resolution so the "structure" is really a best guess model from the information we have. Wow that got really long.

FoldIt is a best case scenario where the algorithm is the human brain and all of its spatial reasoning power. The current means of predicting protein structure is sketchy at best. The FoldIt guys have one of the best ones right now but applying the stuff they learned from the study is like trying to teach a robot how to describe the color blue. It's still a long way off. There are methods in existence to try and detect how a protein would change in the face of a particular mutation but frankly the very nature of protein folding is one of the big existential questions that isn't going to be answered any time soon.

I'll admit that the game was pretty cool though I think it would be cooler for someone who isn't staring at the exact same thing on a daily basis.

[1] Kwong PD, Wyatt R, Robinson J, Sweet RW, Sodroski J, & Hendrickson WA (1998). Structure of an HIV gp120 envelope glycoprotein in complex with the CD4 receptor and a neutralizing human antibody. Nature, 393 (6686), 648-59 PMID: 9641677

[2] Khatib F, Cooper S, Tyka M, Xu K, Makedon I, Popovic Z, Baker D, & Foldit Players (2011). Algorithm discovery by protein folding game players. Proceedings of the National Academy of Sciences DOI: 10.1073/pnas.1115898108


6 comments:

  1. Thanks! I had actually not heard of FoldIt. I'll have to check it out.

  2. Hey, if you do start playing, don't forget to come let me know! :)

  3. I started playing FoldIt but then I realized I was doing my job in my free time (crystallographer here). FYI we rarely solve structures from primary amino acid sequence for all of the reasons you describe. This is why we don't go that route. Crystallography is more like photography with computers and math as the second focal lens.

  4. Hi there! You can tell I'm a theoretician, right?? :) So, correct me if I'm wrong: when you do crystallography, you do a sort of X-ray of the protein and that basically gives you the (x,y,z)'s for every atom, right? But then the Foldit game is really a way of getting the mathematical model, at least the way I understand it. My understanding is that the game so far is a way to theoretically solve structures that have already been obtained through crystallography, but if this turns out to be successful, then in the future we will be able to use these algorithms to mathematically predict the structure of the protein... Correct?

    Also, from what I see happening with HIV, I'm guessing the utility will also be to predict how certain mutations in the DNA affect the folding of the protein.

    How did you like the game?

    Thanks for your input!

  5. I will have to go with sort of right. Since you like photography I'll sum it up in that metaphor. Crystallography basically acts in the same way as a camera. In a camera the light reflects off an object and the image is scattered (or in some diagrams shown flipped upside down) by the first lens. The wavelength of visible light is too long to see things on an atomic scale (100s of nm versus seeing things on the angstrom level). We replace the light with x-rays and the crystal acts as both the object we are seeing and the first lens.

    In photography there is a second lens that recollects the scattered light to reconstitute the image and makes it look bigger depending on the focal length. We don't have one of those for X-rays so that's where the math comes in. Each point of scatter indicates a slice of space in 3D and how intense it is tells us how many electrons there are in that spot. The math gets a little crazy and there is a whole aspect called the phase problem (which is one of the things that the program addresses) that we'll skip over, but for the purposes of this explanation we've solved it and what you get isn't positions of atoms per se but a cloud of electron density. We have to take that cloud and build where we think the atoms are, taking into account what we understand about chemistry. The higher the resolution the less ambiguous the solution is, but most high impact papers are low resolution so the "structure" is really a best guess model from the information we have. Wow that got really long.

    FoldIt is a best case scenario where the algorithm is the human brain and all of its spatial reasoning power. The current means of predicting protein structure is sketchy at best. The FoldIt guys have one of the best ones right now but applying the stuff they learned from the study is like trying to teach a robot how to describe the color blue. It's still a long way off. There are methods in existence to try and detect how a protein would change in the face of a particular mutation but frankly the very nature of protein folding is one of the big existential questions that isn't going to be answered any time soon.

    I'll admit that the game was pretty cool though I think it would be cooler for someone who isn't staring at the exact same thing on a daily basis.

  6. That's really great, thanks for clarifying the process!

