Sunday, August 26, 2007

Optimization, RNAi, and Lessons Learned from the Summer

I had met Milan Chheda at an outreach event a few weeks earlier, and my final day at the Broad started with a tour of his workspace. Milan, a neurologist, works in William Hahn's Lab, which focuses on how human cells transform into cancer cells. Milan's work is to optimize a technique that is currently applied to study the development of glioblastomas, a particularly virulent type of brain tumor.

The technique in question uses RNA interference to knock down the expression of certain genes. The researcher uses a lentivirus to introduce a hairpin RNA into candidate nerve cells, reducing the expression of a specific gene in vitro. After culturing the nerve cells that have received this structure, the researcher typically uses a microscope to check for glioblastomas or other types of cells that may form.

What does it mean to optimize this technique? Milan is part of an effort to create methods that would enable RNAi experiments to scale up the way the Broad's sequencing center has industrialized gene sequencing. In one example, members of his group are working to use software that automatically checks microscope images for glioblastomas and other features. By identifying trouble spots in the experimental pipeline, the hope is that these experiments can be conducted at a larger scale, resulting in reliable and larger quantities of data to analyze.

I left the Broad with a greater appreciation of the interplay between data gathering and data analysis. Writing about my discussions with other researchers helped me develop this appreciation. With this in mind, expect further updates to this blog once I return to Berkeley.

Wednesday, August 8, 2007

Science on Wednesday

During junior high, I attended a weekly lecture series at the Princeton Plasma Physics Laboratory designed to introduce students to science research. The program was called Science on Saturday, and it featured scientists in fields as diverse as cosmology and forensics talking about their work to a largely non-technical audience that consisted of students and their parents.

The Broad Institute had a similar program this summer called Midsummer Nights' Science, an apt name given this year's Shakespeare on the Common production. For the four Wednesdays following Independence Day, scientists at the Broad would describe their work to the greater Boston community. Each Wednesday featured a different researcher describing his or her work. While the projects and interests described each week were quite different, all of them implicitly promoted the idea that large collections of data can enable new kinds of research.

The first talk featured David Reich, who discussed how he and his colleagues conducted a comparative analysis of the DNA of humans, chimpanzees, and gorillas, which led them to a new model for the evolution of these species from a common ancestor. The way I first learned about evolution was that it starts when two groups of the same species are physically isolated from one another. Then, under appropriate environmental conditions, the two groups would eventually evolve into different species, after which any hybrids between the two species would be less fertile and die out. This is called allopatric speciation. If this is true, then one can model the DNA sequences as following a branching process, so the evolution of species would look like a tree, where each fork in the tree indicates one species dividing into two.

Reich and his colleagues discovered that this model does not describe the evolution of humans, chimps, and gorillas well. If one constructs a phylogenetic tree for these three species using DNA from one section of the genome, a tree emerges indicating that the most recent split was between humans and chimps. But if the same analysis is performed using a sequence from another section (such sections comprise between a fifth and a third of the genome), a different tree emerges, indicating that the most recent split was between humans and gorillas. The group proposed an alternate hypothesis: hybridization among these species took place. By a careful analysis of their sequence data, they were able to confirm that this was a better model by which to describe the speciation of these three species. Indeed, the study probably would not have been possible without all the DNA sequence information available for these three species.
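To make the discordance concrete, here is a toy sketch of how one might tally which tree topology the informative sites in a genomic window support. The function name and the four-letter "sequences" are my own invention for illustration, not Reich's actual method.

```python
from collections import Counter

def topology_votes(human, chimp, gorilla):
    """Tally the informative sites in an aligned window.

    A site where exactly two species share a base "votes" for the tree
    grouping those two species most closely; sites where all three bases
    agree (or all three differ) are uninformative.
    """
    votes = Counter()
    for h, c, g in zip(human, chimp, gorilla):
        if h == c != g:
            votes["human-chimp"] += 1
        elif h == g != c:
            votes["human-gorilla"] += 1
        elif c == g != h:
            votes["chimp-gorilla"] += 1
    return votes

# A made-up five-site window: two sites group human with chimp, one
# groups human with gorilla, one groups chimp with gorilla, and one
# (where all three bases match) is uninformative.
votes = topology_votes("ACGTA", "ACTTG", "GTGTG")
print(votes.most_common(1)[0][0])  # → human-chimp
```

Run window by window across a whole-genome alignment, a tally like this is what reveals that different stretches of the genome "vote" for different trees.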

During the following week, Pardis Sabeti explained how the HapMap project, another data-gathering effort, is enabling researchers to determine the role natural selection has played in humans and pathogens. The HapMap project collects DNA samples from different populations around the world; the samples account for 90% of the genetic variation among humans. These samples are divided into haplotypes, sections of DNA that are inherited as a group. Because recombination breaks up long stretches of DNA over the generations, if there is no selective pressure on an organism, one would expect the prevalence of a particular haplotype to decay as its length grows. By the same reasoning, if a long haplotype is highly prevalent in a population, that is evidence that the corresponding section of DNA is under selective pressure. Sabeti explained how this has allowed researchers to track lactose tolerance in European populations, who domesticated cattle relatively early, and to link the sickle cell trait to malaria resistance. Once again, the availability of this data enabled such an analysis.
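A back-of-the-envelope calculation shows why haplotype length matters. If crossovers arrive at roughly one per morgan per meiosis, the chance that a haplotype is transmitted unbroken decays exponentially in both its genetic length and its age. This is my own simplification; the statistics Sabeti and her colleagues actually use are more involved.

```python
import math

def intact_prob(length_cm, generations):
    """Probability that a haplotype of the given genetic length (in
    centimorgans) is passed down with no recombination for the given
    number of generations, assuming one crossover per morgan per
    meiosis (so the per-generation survival rate is exp(-length))."""
    return math.exp(-(length_cm / 100.0) * generations)

# A 1 cM haplotype has about a 37% chance of surviving 100 generations
# intact, but less than a 1% chance of surviving 500 generations.
print(round(intact_prob(1.0, 100), 2))  # → 0.37
print(intact_prob(1.0, 500) < 0.01)     # → True
```

So when a haplotype a centimorgan or more long is found at high frequency, it is either very young or was pushed to high frequency faster than neutral drift allows, which is the signature of selection.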

The third talk, given by Todd Golub, was about cancer research in the era of genomics. He started the talk by describing two patients, both the same age, both diagnosed with the same type of leukemia at a similar stage of progression, and both given similar doses of chemotherapy. However, Patient A lived and Patient B did not. Golub then explained that the mutations that had occurred at the genomic level in these patients were actually quite different, and if one looks at patient survival after separating these two mutations, the group with the same mutation as Patient A had a survival rate much closer to 1, while those with the same mutation as Patient B had a survival rate close to 0 within a few years of diagnosis. Golub then went on to describe a treatment that had been customized to target the mutation in groups with Patient B's mutation. The result was Gleevec, a drug now available to patients with this version of the disease. Since its introduction, patients diagnosed with this specific mutation have had a 100% survival rate with minimal side effects from the medication. Golub appeared optimistic that similar treatments could be developed for most of the mutations that result in cancer.
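The stratification Golub described is simple to express in code. Here is a minimal sketch with entirely fabricated patient records; the mutation labels and survival numbers are illustrative only, not data from the talk.

```python
def survival_by_mutation(records):
    """Given (mutation, survived) pairs, return the fraction of
    patients surviving within each mutation group."""
    totals, survivors = {}, {}
    for mutation, survived in records:
        totals[mutation] = totals.get(mutation, 0) + 1
        survivors[mutation] = survivors.get(mutation, 0) + int(survived)
    return {m: survivors[m] / totals[m] for m in totals}

# Fabricated cohort: one diagnosis, very different outcomes once the
# patients are split by their underlying mutation.
cohort = [
    ("A", True), ("A", True), ("A", True), ("A", False),
    ("B", False), ("B", False), ("B", False), ("B", True),
]
print(survival_by_mutation(cohort))  # → {'A': 0.75, 'B': 0.25}
```

Pooled together, this cohort has a 50% survival rate, which hides exactly the difference Golub's two-patient story illustrated.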

Unlike the preceding talks, the final speaker barely mentioned genomics in his talk. Vamsi Mootha described mitochondria and his group's research efforts on understanding them. Mitochondria are found inside the cell and produce much of the energy a cell uses. Unlike most other organelles, mitochondria contain their own DNA. However, the proteins found in mitochondria are encoded by a mix of genes, some in the mitochondrial DNA and some in the cell's nuclear DNA. It turns out that metabolic diseases are closely related to problems with mitochondrial function, which are apparent in changes to the protein composition of mitochondria. Mootha's group is building an atlas of the protein content of mitochondria in different parts of the body. Their hope is that this data will enable researchers to characterize the specific problems associated with certain metabolic diseases. In this instance, the hope of a future payoff inspired his group to procure a large data set.

Midsummer Nights' Science showcased how biological and medical research have benefited and can continue to do so when certain kinds of data are available in significant quantities. Hopefully this message reached the students who attended and will capture the imagination of those who decide to pursue research in the future.

Wednesday, August 1, 2007

Cultural Learnings

When I told a friend I would be interning this summer, he was surprised.

"Why are you doing an internship?" he asked.

"The idea," I responded, "is to get introduced to a new environment, so I return to grad school with a broader perspective."

"Sounds like Borat."

Like a foreign correspondent reporting to his home country, I gave an informal talk to the Stochastic Systems Group about my summer project. The resulting feedback helped me improve my results this summer. Once I described the problem, though, it turned out to have a lot in common with problems familiar to the group. It was hardly Borat.

That said, there are practices at the Broad outside of my work that I would be surprised to see in my own research community. Perhaps the most surprising thing I have discovered is how willing people are to share their ongoing research with people at the Broad. Weekly seminars feature researchers from outside the Broad discussing their as yet unpublished work. Broadies see data that has yet to be made public. I was particularly surprised by this given the lingering controversy over whether Watson and Crick's paper on the structure of DNA used unpublished data from Rosalind Franklin.

There is a catch. Attendees of the seminar must agree not to work on anything they pick up during the course of the presentation. This understanding and the honor system are what make people comfortable enough to discuss work they might otherwise keep private.

The presentations may also be a way to start collaborations. In a field driven by data, if someone provides the data for a figure in a paper, that person frequently becomes an author, even if the idea for the paper came from others. Thus, advertising results before they are published might allow other researchers to avoid running the same experiments.

A consequence of this practice is that one rarely finds single-authored papers and often finds papers with four or more authors. How does one delineate the contributions of each author? Author ordering may only give a coarse indication of an individual's contribution. An existing solution in some journals is to include an author contributions section. This section typically follows the acknowledgments and may read something like the following:
S.B.C. conceived and designed the experiments. B.S. conducted the experiments. S.B.C. and B.S. performed the analysis. S.B.C. and B.S. wrote the manuscript.
What happens if the work is primarily by two authors? The practice described to me for these instances is called co-first authorship. To do this, one simply places an asterisk next to each author's name with a footnote that reads: "These authors contributed equally to the work."

While some biologists I spoke to joked about some of these practices (one described how an author contributions section might read if each individual's contribution were described honestly), almost all of them were comfortable with the idea that providing data is a legitimate way to become an author on a paper. The same might not be true for my community, but I wonder if any of these practices would transfer well.