Saturday, June 28, 2008

Publications

S. Jagadisan was an English professor at Madras Presidency College. When he called me up several years ago to locate a book he was trying to find, I was more than happy to oblige. After all, he was my grandfather. A King's Story was the first of many books he asked me to locate, and on trips to India, Professor Jagadisan would turn my sister or me into his dutiful assistant, typing up articles he had written.

During my cousin's wedding a few years ago, I had a chance to talk to my grandfather about being an academic in India. In what ways did his experiences compare to mine? Was there a research component to his work?

There were attempts to start a few peer-reviewed journals, my grandfather answered, but for the most part the emphasis was on teaching.

The answer failed to mention Professor Jagadisan's publications in other types of media. Some of these came in the form of books, and many more could be found as articles for The Hindu, India's national newspaper.

On a whim and with the help of Google, I set out this morning to find as many of these articles as I could. The collection I've amassed runs the gamut from literature to religion. The articles about English literature range from Shakespeare to the Berenstain Bears. There is an article about Pudukottai, his hometown, and another about James Russell Macphail, most likely his former professor. One muses about religion, and another discusses religious tolerance.

An anything-but-comprehensive list is included below:

Saturday, March 22, 2008

What's Your Problem?

The first time I remember asking about another graduate student's research was during a meeting with Bobak on June 25, 2004. It was Bobak's birthday, and my job was to distract him while others were setting up his surprise party. While I still have a copy of Bobak's LaTeXed writeup from that day, it didn't seem at the time that talking to other students about their research would become a common occurrence.

Things changed once we moved into the Wong Center. In fact, during our first year there, Paolo and I developed a habit of sharing interesting technical problems that were coming out of our research on an almost daily basis. While I like to believe this frequent interaction led us to solutions faster, the real reason I did it was simple. Another person's problems offered me a welcome break from my routine, especially on days when I wasn't gaining any traction on my own problems.

As student interaction became more common, conversations started to shift away from research into areas that ranged from politics to musical preferences. While sharing our problems has declined (at least of the research variety), brain teasers have been on the rise. Many of these have been initiated by Prasad, so I thought I'd share a personal favorite from his collection.

Alice and Bob each roll a six-sided die. Each of them can see only the outcome of the other's roll, not their own. Without communicating with one another, Alice and Bob will each win a dollar if both of them correctly guess the outcome of their own rolls. If either Alice or Bob guesses incorrectly, neither wins anything. Is there a way for them to win with a probability of at least 1/6? The answer is yes: each of them guesses the other person's die roll as their own. Both guesses are then correct exactly when the two dice happen to match, a single event with probability 1/6. Note that if each of them simply guessed at random without looking at the other person's die, they would only win with probability 1/36.
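
A quick simulation bears this out. The sketch below is my own illustration in Python (not anything from Prasad's collection; any language would do):

    import random

    # Estimate the winning probability of the "guess the other player's
    # roll as your own" strategy by simulation.
    trials = 1_000_000
    wins = 0
    for _ in range(trials):
        alice, bob = random.randint(1, 6), random.randint(1, 6)
        # Alice sees only Bob's die and announces it as her own guess;
        # Bob does the same with Alice's die.
        alice_guess, bob_guess = bob, alice
        if alice_guess == alice and bob_guess == bob:
            wins += 1

    print(wins / trials)  # hovers around 1/6, roughly 0.1667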

Now suppose 1000 people each roll a six-sided die and observe the outcome of everyone else's roll. Without communicating with one another, they each win a dollar if all of them correctly guess the outcome of their own rolls. If any of them guesses incorrectly, none of them wins anything. Is there a way for them to win with a probability of at least 1/6?

Sunday, December 16, 2007

Closed Form Expressions

This summer, I had a conversation with Baris in which I claimed ignorance about the meaning of a closed form solution. One may say an expression is in closed form if one can intuit its behavior simply by looking at it; however, this is both imprecise and subjective. Baris suggested an alternate definition that went along the following lines:
closed form solution. n. an equation that can be evaluated by a scientific calculator.
I was happier with this definition, but it was still a little unsatisfying, mainly because the capabilities of scientific calculators have improved significantly over the years. What if a scientific calculator can solve differential equations or a linear program? I can't intuit these solutions. Even if we accept such a definition, we would likely reject a paper titled "Closed Form Solution for [insert problem here] via Improved Scientific Calculator" for publication... or would we?

While some may disagree, computers have proved useful in helping information theorists develop intuition. For instance, Permuter et al.'s "Capacity of the Trapdoor Channel via Feedback" proves the capacity after using a computer to conjecture the solution. At a seminar this summer, Stephen P. Boyd advocated for more information theorists to adopt this pragmatic approach to problem solving.

Recently, Michael suggested using MATLAB to help me gain insight into a research problem. My own intuitions were a little jumbled, and a few plots in MATLAB seemed like the best path to clarity.

Not everyone is a fan of MATLAB. Prasad told me he prefers a combination of C and gnuplot. Others are purer still: someone scoffed at the idea that, as an information theorist, I would need to resort to a computer for help.

Published in 1945, Vannevar Bush's "As We May Think" predicts a future device called the memex (memory extension), in many ways reminiscent of the modern computer, that enables the easy storage and retrieval of one's books, communications, and other records. If I can use a memex to organize my records, then I have no problem using an intuitex to organize my thoughts.

Monday, September 24, 2007

Reading the Classics

When "Reading the Classics" was first offered two years ago, the course announcement started by mentioning the Iliad. This year the facetious introduction was absent:
There are several papers in the history of science that seem to have been written with the express purpose of changing (or creating) a field. Their authors were often young, and the writing is almost always self-conscious --- and usually a joy to read.

In this seminar we shall read several such papers from computer science and abutting fields.
The seminar has been fundamentally different from other courses or reading groups in which I've participated. Although we've been reading papers by Turing, Feynman, Nash, and Shannon, we have already encountered many of the results, or their consequences, in some previous form. Indeed, when Christos Papadimitriou introduced the class, he mentioned that one definition of a classic is something one can only reread, because by the time one reads it, it has already influenced his or her thinking in fundamental ways.

How does one reread a classic? In addition to understanding the technical aspects of the paper, its historical context is important. The context can be divided into two parts: research prior to the work and research following it. A classic sometimes alters how research is conducted, so it helps to gain an appreciation of how the research was conducted before its publication. The impact of the work complements this by giving people a sense of how research was affected by it.

In addition to the intellectual context, there are also the authors' biographies. Who were these authors? The answers have led to interesting class discussions. For instance, in last week's discussion about Nash's "Non-Cooperative Games" paper, Professor Addison, who attended graduate school with Nash, described students' impressions of Nash and also his experiences with Albert W. Tucker, their adviser. Christos, in addition to describing how his own research has been influenced by Nash's work, talked about his graduate days at Princeton, a time when Nash was known as The Phantom.

This week, we'll be discussing Claude Shannon's "A Mathematical Theory of Communication." I've already discovered a few gems on this reread and look forward to tomorrow's discussion.

Sunday, August 26, 2007

Optimization, RNAi, and Lessons Learned from the Summer

I had met Milan Chheda at an outreach event a few weeks earlier, and my final day at the Broad started with a tour of his workspace. Milan, a neurologist, works in William Hahn's lab, which focuses on how human cells transform into cancer cells. Milan's project is to optimize a technique currently used to study the development of glioblastomas, a particularly aggressive type of brain tumor.

The technique in question uses RNA interference (RNAi) to knock down the expression of certain genes. Using a lentivirus, the researcher introduces a hairpin structure into candidate nerve cells to reduce the expression of a specific gene in vitro. After culturing the nerve cells that have received this structure, the researcher typically uses a microscope to check for glioblastomas or other types of cells that may form.

What does it mean to optimize this technique? Milan is part of an effort to create methods that would enable RNAi experiments to scale up the way the Broad's sequencing center has industrialized gene sequencing. In one example, members of his group are developing software that automatically checks microscope images for glioblastomas and other features. By identifying trouble spots in the experimental pipeline, the hope is that these experiments can be conducted at a larger scale, yielding larger and more reliable quantities of data to analyze.

I left the Broad with a greater appreciation of the interplay between data gathering and data analysis. Writing about my discussions with other researchers greatly enhanced my ability to gain this appreciation. With this in mind, expect further updates to this blog once I return to Berkeley.

Wednesday, August 8, 2007

Science on Wednesday

During junior high, I attended a weekly lecture series at the Princeton Plasma Physics Laboratory designed to introduce students to science research. The program was called Science on Saturday, and it featured scientists in fields as diverse as cosmology and forensics talking about their work to a largely non-technical audience that consisted of students and their parents.

The Broad Institute had a similar program this summer called Midsummer Nights' Science, an apt name given this year's Shakespeare on the Common production. For the four Wednesdays following Independence Day, a different researcher at the Broad described his or her work to the greater Boston community. While the projects and interests described each week were quite different, all of them implicitly promoted the idea that large collections of data can enable new kinds of research.

The first talk featured David Reich, who discussed how he and his colleagues conducted a comparative analysis of the DNA of humans, chimpanzees, and gorillas, which led them to a new model for the evolution of these species from a common ancestor. The way I first learned about evolution was that it starts when two groups of the same species are physically isolated from one another. Then, under appropriate environmental conditions, the two groups eventually evolve into different species, after which any hybrids between these two species are less fertile and die out. This is called allopatric speciation. If this is true, then one can model the DNA sequences as following a branching process, so the evolution of species would look like a tree, where each fork indicates one species dividing into two.

Reich and his colleagues discovered that this model does not describe the evolution of humans, chimps, and gorillas very well. If one constructs a phylogenetic tree for these three species using DNA from one section of the genome, a tree emerges indicating that the most recent split was between humans and chimps; but if the same analysis is performed using a sequence from another section, which comprises between a fifth and a third of the genome, a different tree emerges, indicating that the most recent split was between humans and gorillas. The group proposed an alternate hypothesis: hybridization among these species took place. A careful analysis of their sequence data confirmed that this was a better model for the speciation of these three species. Indeed, the study probably would not have been possible without all the DNA sequence information available for these three species.
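
To make the tree-building step concrete, here is a toy sketch of my own in Python (this is not Reich's method, and the sequences are made up): a simple distance-based approach joins the pair of species with the fewest differences first, so two genomic regions can support two different trees.

    def hamming(a, b):
        """Count positions at which two equal-length sequences differ."""
        return sum(x != y for x, y in zip(a, b))

    def closest_pair(seqs):
        """Return the pair of species with the fewest differences, i.e.,
        the pair a distance-based tree method would join first."""
        names = list(seqs)
        candidates = [(hamming(seqs[p], seqs[q]), p, q)
                      for i, p in enumerate(names) for q in names[i + 1:]]
        distance, p, q = min(candidates)
        return (p, q), distance

    # Region 1 supports the tree ((human, chimp), gorilla).
    region1 = {"human": "ACGTACGTAC", "chimp": "ACGTACGTTC", "gorilla": "ACGAACCTTA"}

    # Region 2 supports the tree ((human, gorilla), chimp).
    region2 = {"human": "GGCTAGCTAG", "chimp": "GACTTGCAAG", "gorilla": "GGCTAGCAAG"}

    for name, region in [("region 1", region1), ("region 2", region2)]:
        pair, distance = closest_pair(region)
        print(f"{name}: closest pair {pair} with {distance} differences")

With real data the regions are long and the methods far more careful, but the conflict between the two trees is the same phenomenon Reich described.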

During the following week, Pardis Sabeti explained how the HapMap project, another data-gathering effort, is enabling researchers to determine the role natural selection has played in humans and pathogens. The HapMap project collects DNA samples from different populations around the world; the samples account for 90% of the genetic variation among humans. These samples are divided into haplotypes, which represent sections of DNA that are inherited as a group. Because recombination breaks long stretches of DNA apart over the generations, if there is no selective pressure on an organism, one would expect the prevalence of a particular haplotype to decay as its size gets larger. Conversely, if a large haplotype is highly prevalent in a population, that is evidence that the corresponding section of DNA is under selective pressure: it rose to high frequency faster than recombination could break it up. Sabeti explained how this has allowed researchers to track lactose tolerance in European populations, who domesticated cattle relatively early, and to link the sickle cell trait to malaria resistance. Once again, the availability of this data enabled such an analysis.
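
A back-of-the-envelope calculation of my own (not the HapMap methodology; the recombination rate below is an assumed, illustrative value) shows why intact long haplotypes should become rare in the absence of selection:

    # Crude neutral model: a haplotype stays intact across a generation only
    # if no recombination event lands inside it, so longer haplotypes decay
    # faster. The rate r is made up for illustration.
    r = 0.01            # assumed probability of recombination per unit length
    generations = 50
    for length in (1, 5, 10, 20):
        p_intact = (1 - r * length) ** generations
        print(f"length {length:2d}: P(still intact after {generations} "
              f"generations) = {p_intact:.4f}")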

The third talk, given by Todd Golub, was about cancer research in the era of genomics. He started the talk by describing two patients, both the same age, both diagnosed with the same type of leukemia in a similar stage of progression, and both given similar doses of chemotherapy. However, Patient A lived and Patient B did not. Golub then explained that the mutations that had occurred at the genomic level in these patients were actually quite different: if one groups patients by these two mutations, those with the same mutation as Patient A had a survival rate much closer to 1, while those with the same mutation as Patient B had a survival rate close to 0 within a few years of diagnosis.

Golub went on to describe a treatment that had been customized to target Patient B's mutation. The result was Gleevec, a drug now available to patients with this version of the disease. Since its introduction, patients diagnosed with this specific mutation have had a 100% survival rate with minimal side effects from the medication. Golub appeared optimistic that similar treatments could be developed for most of the mutations that result in cancer.

Unlike the preceding talks, the final one barely mentioned genomics. Vamsi Mootha described mitochondria and his group's research efforts on understanding them. Mitochondria are found inside the cell and produce much of the energy a cell uses. Unlike most other organelles, mitochondria contain their own DNA; however, the proteins found in mitochondria are encoded by a mix of genes, some residing in the mitochondrial DNA and others in the cell's nuclear DNA. It turns out that metabolic diseases are closely related to problems with mitochondrial function, which are apparent in changes to the organelle's protein composition. Mootha's group is building an atlas of the protein content of mitochondria in different parts of the body. Their hope is that this data will enable researchers to characterize the specific problems associated with certain metabolic diseases. In this instance, the hope of a future payoff inspired his group to procure a large data set.

Midsummer Nights' Science showcased how biological and medical research have benefited, and can continue to benefit, when certain kinds of data are available in significant quantities. Hopefully this message reached the students who attended and will capture the imagination of those who decide to pursue research in the future.

Wednesday, August 1, 2007

Cultural Learnings

When I told a friend I would be interning this summer, he was surprised.

"Why are you doing an internship?" he asked.

"The idea," I responded, "is to get introduced to a new environment, so I return to grad school with a broader perspective."

"Sounds like Borat."

Like a foreign correspondent reporting to his home country, I gave an informal talk to the Stochastic Systems Group about my summer project. The resulting feedback helped me improve my results. However, once the problem was described, it turned out to have a lot of similarities with problems familiar to the group. It was hardly Borat.

That said, there are practices at the Broad outside of my work that I would be surprised to see in my own research community. Perhaps the most surprising thing I have discovered is how willing researchers are to share their ongoing work with people at the Broad. Weekly seminars feature researchers from outside the Broad discussing their as-yet-unpublished work. Broadies see data that has yet to be made public. I was particularly surprised by this given the lingering controversy over whether Watson and Crick's paper on the structure of DNA relied on unpublished data from Rosalind Franklin.

There is a catch. Attendees of the seminar must agree not to work on anything they pick up during the course of the presentation. This understanding and the honor system are what make people comfortable enough to discuss work they might otherwise keep private.

The presentations may also be a way to start collaborations. In a field driven by data, if someone provides the data for a figure in a paper, that person frequently becomes an author, even if the idea for the paper came from others. Thus, advertising results before they are published might allow other researchers to avoid running the same experiments.

A consequence of this practice is that one rarely finds single-authored papers and often finds papers with four or more authors. How does one delineate the contributions of each author? Author ordering gives only a coarse indication of an individual's contribution. An existing solution in some journals is to include an author contributions section. This section typically follows the acknowledgments and may read something like the following:
S.B.C. conceived and designed the experiments. B.S. conducted the experiments. S.B.C. and B.S. performed the analysis. S.B.C. and B.S. wrote the manuscript.
What happens if the work is primarily by two authors? The practice described to me for these instances is called co-first authorship. To do this, one simply places an asterisk next to each author's name with a footnote that reads: "These authors contributed equally to the work."

While some biologists I spoke to joked about some of these practices (one described how an author contributions section might read if each individual's contribution were described honestly), almost all of them were comfortable with the idea that providing data is a legitimate way to become an author on a paper. The same might not be true for my community, but I wonder if any of these practices would transfer well.