Friday, July 27, 2007

Genome Factory

About ten years ago, I spent a summer with other high school students in a program at the Waksman Institute of Microbiology. The program's goal was to introduce us to protocols to extract plant DNA and isolate regions of interest for sequencing. We learned how to use restriction enzymes to cut the DNA into smaller fragments, bacterial transformations to make copies of the DNA within E. coli, PCR to make copies of DNA without the help of E. coli, and gel electrophoresis to separate the DNA fragments by size and isolate the one(s) we wanted. Finally, the DNA had to be sequenced, and for this, we were introduced to the Sanger method, published in 1977 by Frederick Sanger and his colleagues.

The Sanger method involves adding modified nucleotides called dideoxynucleotides, which can only form bonds at one end. Think of a Lego piece with a flat top. A DNA chain that incorporates such a nucleotide therefore terminates immediately. If these nucleotides are mixed in with regular nucleotides during a process like PCR, the result is a set of fragments of the DNA sequence with the same starting point and varying endpoints. If only a particular type of dideoxynucleotide such as dideoxyadenosine triphosphate (ddATP) is used, then all the resulting fragments terminate with an 'A'. If these fragments are then separated by gel electrophoresis, one can get a rough idea of the positions where 'A' shows up in the DNA sequence of interest. If lanes for 'C', 'G', and 'T' are run next to the one for 'A', one can just read off the DNA sequence from the gel. This is the basic principle of the Sanger method.
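To make the read-off step concrete, here is a toy simulation I put together. It is only a sketch under simplifying assumptions: the string passed in stands for the strand being synthesized, each reaction contains exactly one kind of dideoxynucleotide, and the termination probability `p_terminate`, the number of `trials`, and all function names are made up for illustration.

```python
import random

BASES = "ACGT"

def fragment_lengths(sequence, base, trials, p_terminate, rng):
    """Simulate one reaction that contains the dideoxy version of `base`.

    Each trial grows one chain along `sequence`. Whenever the next
    nucleotide is `base`, the chain picks up the dideoxy version with
    probability `p_terminate` and stops. Returns the set of observed
    fragment lengths, i.e. the bands in that lane of the gel.
    """
    lengths = set()
    for _ in range(trials):
        for position, nucleotide in enumerate(sequence, start=1):
            if nucleotide == base and rng.random() < p_terminate:
                lengths.add(position)
                break
    return lengths

def run_reactions(sequence, trials=2000, p_terminate=0.2, seed=0):
    """Run four separate reactions, one per base, and return the four lanes."""
    rng = random.Random(seed)
    return {base: fragment_lengths(sequence, base, trials, p_terminate, rng)
            for base in BASES}

def read_gel(lanes):
    """Read the sequence off the gel, from the shortest fragment to the longest."""
    longest = max(max(bands) for bands in lanes.values() if bands)
    calls = []
    for length in range(1, longest + 1):
        hits = [base for base, bands in lanes.items() if length in bands]
        # Exactly one lane should have a band at each length; otherwise mark it unknown.
        calls.append(hits[0] if len(hits) == 1 else "N")
    return "".join(calls)

lanes = run_reactions("GATTACAGATTC")
print(read_gel(lanes))
```

A real gel cannot resolve arbitrarily long fragments, which is one reason a single read only covers a limited stretch of sequence; the toy version ignores that.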

By the time school started again, we had become familiar with the techniques and protocols. We continued to return to the Waksman Institute periodically and apply these techniques. We would eventually use the sequence data from these visits to construct a phylogenetic tree of the Allium (i.e. onion) genus. Unfortunately, the data collection process could often be slow and annoying. There were many stages in which something could go wrong, and I would have to return to the beginning. All of this work produced just a tiny fraction of sequence information from these genomes.

A lot can happen in ten years. Thanks to my friends in the Broad's Outreach Program, I had a chance to visit 320 Charles St., the location of the Broad Institute's DNA sequencing facility. It is sometimes called a high-throughput production facility because of the rate at which they manage to sequence DNA. The facility was responsible for many of the sequences that were part of the Human Genome Project, and I was about to find out how they did it.

We entered 320 Charles St. and sat down for a presentation. Before we could start our tour of the facility, one of the scientists wanted to describe the process. To my surprise, she described the Sanger method. How could this be the process of a high-throughput production facility? Once the tour started, it became clear how: they had industrialized the process. We had entered a factory, complete with conveyor belts, robotic arms, and computers. A group of technicians made sure that the work on this genome assembly line went smoothly. Others, including the scientist leading the tour, were working on ways to industrialize new and improved sequencing methods developed by Solexa and 454.

It was interesting to learn that part of the rate increase has come from engineering solutions to scale up production. The amount of sequence data now available is enabling some researchers to ask questions that may previously have been too time-consuming to answer. I have talked to biologists this summer who have told me how challenging data collection can be, and I am starting to realize how those difficulties play a role in the questions they ask. How might these questions change if other protocols for data gathering were similarly industrialized?

Wednesday, July 25, 2007

Sergio Servetto

It was a few weeks into the start of the semester, and my schedule was set. As I went to the lab printer to pick up a problem set, I accidentally picked up one for ECE 445. Some problems required tools from signals and systems or probability to answer questions about quantization. The final problem was to design a primitive image compression algorithm. Although I would have to switch my schedule, I wanted to take the class. An e-mail to the professor was met with an enthusiastic response, so I made the switch.

Sergio's class was one of my favorites at Cornell. The course mixed theory and programming and made me appreciate the important role theoretical questions have in the design of practical systems. Sergio's teaching style was also one that encouraged questions. He would often pause before answering as if the question being asked were important. Even if I later realized I had said something incorrect or the answer to my question was self-evident, Servetto never sounded dismissive when he answered.

Part of the reason Sergio was able to relate to students was how comfortable he was around them. The first time I walked into his office was just after someone had brought him a freshly baked chocolate chip cookie. Without a second thought, he immediately split the cookie and handed me half. I still remember seeing one of the melted chips stretch between the two halves of the cookie and thinking what a generous thing to do.

Given my experiences in Sergio's class and others, I wanted to pursue information theory and communications after starting graduate school. Sergio and I would periodically meet at conferences. As we were catching up during ITA 2006, he mentioned that he had looked over my Master's thesis. It was a great feeling to know that one of my former professors was still interested in my progress.

Most recently, I saw Sergio at ISIT 2007. He had recently agreed to oversee the Information Theory Society Student Committee, and we talked a bit at one of their events. I last saw him among the audience at my talk.

I found out about the plane crash this evening. It is much easier to reminisce about the past than to describe how I am currently feeling. I've been fortunate to have professors like Sergio Servetto who have encouraged my interests.

I remember we tested our primitive image compressors from that first problem set on a photo of one of Sergio's sons. My thoughts are with the family.

Saturday, July 21, 2007

Variations on a Theme

Information theory, statistical decision theory, and game theory have developed methods to analyze what some may consider adversarial situations. Lessons in these fields have certainly influenced how I model problems involving adversaries. Perhaps it should come as no surprise then that such models were in my thoughts as I attempted to read about host-pathogen interactions. The work in question was a review paper from Hidde Ploegh's lab. Ploegh's lab studies mechanisms by which seemingly simple bacteria have been able to infiltrate our complex immune systems.

How does the lab conduct this research? Renuka Sastry, a researcher at the Whitehead Institute and one of Ploegh's graduate students, gave me a tour of the lab. Our first stop was at what appeared to be a cylindrical dark room.

"It's used for western blots," Renuka said. As she explained, a western blot is a technique to test for a specific protein in a tissue sample. The results are represented as lines on a plastic page, where a line indicates the presence of said protein.

Surprisingly, it takes some effort to extract this bit of information. Part of the process was unfolding on Renuka's workbench. A gel electrophoresis was running, but there were a couple of differences from ones I had seen for DNA. First, the gel was positioned vertically instead of horizontally. Second, the gel looked significantly thinner than an agarose gel. Renuka's electrophoresis was one stage in an experiment to test for a particular protein. She was hoping the result would validate an observation she had made earlier.

Noticeably absent from the workbench was a computer. While there was a computer in that room, our next destination was filled with them. The mass spectrometry room is used to identify proteins, and computers are used to crunch numbers and consult databases for protein matches. Of course, the proteins come from living cultures, and in the final part of our tour, I saw one under a microscope.

What struck me during the visit was that these methods and techniques could also be applied to problems that did not involve hosts and pathogens. What drew Renuka to the work?

"I wanted to do biochemistry research," she responded. I left with a better appreciation for this research. Whether this appreciation might inspire new ways to model aspects of these host-pathogen interactions is an open question.

Wednesday, July 18, 2007

Cat's Cradle in a Hard-Boiled Wonderland and the End of the Brave New World

During the ITA workshop in January, Desmond Lun and I had the following exchange.
Me: So what are you doing these days? Are you a post doc?
Desmond: Actually, I'm at the Broad Institute.
Me: What's the Broad Institute?
By the time I started looking for internships, I knew what the Broad Institute was and sent Desmond my resume. I have been at the Broad now for two months, and Desmond and I work in the same group. It helps working with someone here who hails from the same research community, and our conversations span topics that include information theory and biology research.

I had a taste of the future of biology research during lunch when Desmond described his project with George Church's lab. The goal of the project is to study ways to use biology to produce renewable fuel sources. One fuel source is ethanol, and there is a well-known biological recipe to produce it. Add yeast to a sugar solution. Mix. Let it ferment.

The approach Desmond described was a little different. It turns out one can modify the E. coli genome and use the modified E. coli to produce ethanol. Driven by this success, there is an effort to see if alkanes or other fuels can be created by hacking the genome. Indeed, some start-ups are trying to capitalize on this idea.

The technology that enables such genome hacking falls into the field of synthetic biology. What is synthetic biology? The answer can vary depending on who answers, but to my understanding, synthetic biology is the study of how to design and fabricate living systems that do not exist in nature. In addition to adding and removing genes from a genome, Desmond said there exist techniques that allow one to increase the mutation rate of certain organisms. Once enough mutations accrue over the population, a researcher can then create conditions that select the mutations most suited to a task of interest. This may be the only truly parallel implementation of a genetic algorithm.
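For readers who have not met a genetic algorithm, the mutate-then-select loop can be sketched in a few lines. This is my own toy illustration and not anything from Desmond's project: genomes are bit strings, "fitness" is just the number of ones, and the names `mutate`, `fitness`, and `evolve` and all parameter values are invented for the example.

```python
import random

def mutate(genome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in genome]

def fitness(genome):
    """Toy stand-in for 'suited to the task of interest': count the ones."""
    return sum(genome)

def evolve(population, generations=30, rate=0.05, seed=0):
    """Alternate mutation and selection, keeping the population size fixed.

    Returns the final population and the best fitness seen after each step.
    """
    rng = random.Random(seed)
    size = len(population)
    history = [max(fitness(g) for g in population)]
    for _ in range(generations):
        # Every individual produces one mutated copy of itself.
        offspring = [mutate(g, rate, rng) for g in population]
        # Selection: parents and offspring compete, and only the fittest survive.
        population = sorted(population + offspring, key=fitness, reverse=True)[:size]
        history.append(fitness(population[0]))
    return population, history

start = [[0] * 20 for _ in range(10)]
final, history = evolve(start)
print(history[0], history[-1])
```

On a computer the mutated copies are produced one after another inside a loop. In a flask, every cell mutates and competes at the same time, which is the sense in which the biological version is parallel.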

Of course, such technology also generates concern. The ETC Group is a public watchdog for synthetic biology. They have been vocal in challenging Craig Venter's attempt to patent synthetic life, and they oppose the idea of scientists creating synthetic life without regulations. "Playing God in the Galapagos," the title of one of their publications, reflects this position.

These concerns are also in the public consciousness. Desmond mentioned a recent online poll asking about such technologies. The response choices ranged from complete opposition to regulations to complete opposition to the research. How do scientists feel? It turns out Church's lab took a similar poll. Surprisingly, the group was in favor of more regulations.

Sunday, July 15, 2007

Neuroscience and Engineering

It was Friday morning at ISIT, and some people had already left. Like the previous days, the morning started with a plenary talk. Unlike the previous days, Friday's speaker was from outside the information theory community.

The speaker was Emery Brown, a professor in the Department of Brain and Cognitive Sciences at MIT and in the Department of Anesthesia and Critical Care at Harvard Medical School. He discussed his group's work to build signal processing algorithms in neuroscience, and he showed videos that showcased the performance of those algorithms. One video in particular stuck out in my mind. It featured an animated rat moving around an enclosure and an estimate of its position. The animated rat and its estimated position corresponded to an experiment his group conducted on a live rat using signals from roughly thirty neurons to track the rat's position.

The experiment was designed to test memory formation. The rat was introduced to the enclosure in question a few days before his group tracked its position. Furthermore, the neurons used were from the hippocampus, which is thought to play a role in memory. Indeed, Brown placed this work in the context of a series of experiments attempting to understand how the brain handles memory. However, there was something compelling about the experiment itself.

While those who study this experiment might not claim to understand exactly how memory works in the brain, they may still claim that what limited understanding of memory they have enables them to track a rat's position under the conditions of the experiment. This concrete way to describe the utility of the experiment appealed to my sense of research aesthetics.

Research aesthetics was just one of several topics I discussed with Ram Srinivasan, a postdoctoral researcher in Neurosurgery at Massachusetts General Hospital and student at Harvard Medical School. Ram earned his PhD in EECS from MIT, where he worked in part with researchers from Emery Brown's lab. He was also coadvised by Sanjoy Mitter, and his dissertation touched on the interplay between electrical engineering and neuroscience. It became clear from our discussions that his interest in this interplay dated back to his undergraduate days at Caltech.

The issue of aesthetics came up early in our conversation. We discussed the challenge of posing a concrete question. For instance, the context in which one asks another person if he is stressed can affect the answer. To contrast this with a concrete question, Ram mentioned a behavioral sciences paper that refuted a stereotype to show that women are no more talkative than men. How did the authors show this? The researchers had individuals carry around voice recorders over a period of days and found that both men and women spoke about the same number of words per day. In this case, word count in natural conversations gave a concrete way to measure talkativeness.

We discussed issues with applying standard ideas from control theory to the brain. For instance, one might engineer a control system such that sensing and actuation are distinct components. If one were to describe the brain as a control system, the delineation of sensing and actuation is an artificial choice of modeling. Such a choice may be informed by the specific application of the model. A separate but related issue is whether a distinction between motor and sensory regions in an organism actually exists. While standard dogma describes such a separation, an evolving perspective is that action and sensation are intertwined at multiple scales, from cell to organism.

In his own research, Ram is further exploring the brain-machine interface at the basic science and algorithmic levels. Having worked largely with data from collaborators and simulations during his PhD, his postdoctoral lab affords him the opportunity to design his own experiments to record single-neuron activity from awake humans. This work employs wet lab experiments to understand how movements are initiated. Additionally, he is beginning to reexamine the premise that the brain-machine interface is an estimation problem. He says his current position has given him a new perspective on the challenges of experimental design and the ways in which science and engineering research proposals can synergize to develop neurotechnology. Hopefully this new perspective will allow him to build on his earlier successes.

Thursday, July 12, 2007

Outreach

I had been at the Broad for a little over a month, but I had yet to meet the co-worker standing next to me in the elevator. To avoid my tendency to shift between staring awkwardly at the elevator doors and the lighted floor number, I introduced myself. "I'm Megan," she responded, and we started a conversation.

Megan Rokop is Director of the Broad Institute's Educational Outreach Program. In addition to the research that goes on at the Broad, the Institute also sponsors a series of programs to engage with students, teachers, and the general public in the Boston area. A main feature of the program is the opportunity for high school classes to visit the Broad, where students get to conduct experiments using Broad facilities.

Megan wasn't always interested in biology. She started college at Brown as a foreign languages major, but a scheduling error placed her in a biology class. Unlike her previous experiences with the subject, which primarily involved memorizing a list of facts, the professor for this class presented the material in a way that inspired Megan's interest in the subject. "I wanted to be like him," she said of the professor.

Sure enough, Megan switched majors and eventually received her PhD in biology from MIT. After teaching at MIT for a few years, a fellow biology instructor told her about an opening for the Outreach position at the Broad Institute. Although she enjoyed teaching undergraduates, Megan recognized that not everyone benefits from a scheduling error, and saw the position as an opportunity to reach students while they were still exploring interests. When I told her I was interested in learning more biology, she was more than happy to oblige.

Our first lab involved identifying and mating different strains of Caenorhabditis elegans, a worm that serves as a model organism for investigators with interests ranging from genetics to neuroscience. C. elegans are only a millimeter long, so we needed a microscope to observe them. Once under the microscope, the distinguishing characteristics of mutant strains and sexes were clearly visible.

C. elegans are divided into two sexes: male and hermaphrodite. Mating two of the mutant strains requires the transfer of a male and hermaphrodite onto the same dish. The offspring can later be counted to determine whether their traits were dominant or recessive. After a few false starts, I was able to use a special hook to transfer a wild-type (WT) male onto the same dish as an uncoordinated (UNC) hermaphrodite. While we couldn't see the worms without a microscope, we could see the tracks the wild-type was making as he searched for his uncoordinated partner.

The second lab involved running a gel electrophoresis with an application to paternity testing. Not all DNA codes for proteins, and in the non-coding regions, certain strings repeat. The number of times these strings repeat can be used to distinguish individuals and determine heredity.

One way to distinguish the number of repeats is via gel electrophoresis. The idea is to load the DNA into different wells on one side of a gel and run a current through the gel. Since DNA is negatively charged, this current causes the strands to move across the gel towards the positively charged end. Since longer sequences migrate through the gel more slowly, sequences with more repeats don't travel as far away from the negative end.
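The logic of the paternity test can be sketched in a few lines. This is a simplified illustration and not the protocol we followed: it looks at a single repeat region, assumes each person carries two copies of it (one from each parent), and the repeat unit length, flanking length, and function names are all hypothetical.

```python
def fragment_length(repeats, unit_len=4, flank_len=100):
    """Length of the DNA fragment for a given number of repeats.

    `unit_len` (bases per repeat) and `flank_len` (bases on either side
    of the repeats) are made-up values for illustration.
    """
    return flank_len + repeats * unit_len

def bands_by_travel(samples):
    """For each sample, list band lengths from farthest-travelled to nearest the wells.

    Shorter fragments travel farther, so this is ascending order of length.
    """
    return {name: sorted(fragment_length(r) for r in repeats)
            for name, repeats in samples.items()}

def consistent_with_paternity(child, mother, candidate):
    """Check whether the candidate could have supplied the child's non-maternal copy.

    Each argument is a pair of repeat counts. One of the child's copies
    must match the mother, and the other must match the candidate father.
    """
    first, second = child
    return ((first in mother and second in candidate)
            or (second in mother and first in candidate))

samples = {"child": (7, 10), "mother": (7, 12), "man A": (10, 11), "man B": (8, 9)}
print(bands_by_travel(samples))
print(consistent_with_paternity(samples["child"], samples["mother"], samples["man A"]))
print(consistent_with_paternity(samples["child"], samples["mother"], samples["man B"]))
```

A match at one region only fails to rule a candidate out. Comparing several repeat regions is what makes the result convincing.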

Unlike the first lab, I worked on the second lab with a group of high school students. They were visiting from the National Youth Leadership Forum on Medicine, a summer program for aspiring doctors. After the lab, I had a chance to talk to some of the students, who were curious what a non-biologist was doing at the Broad. In turn, it was interesting to hear from the students, some of whom weren't completely set on a career in medicine. While I wasn't sure whether their experiences that week would increase their interest in medicine, mine certainly increased my curiosity about biology.

Saturday, July 7, 2007

Concept

In August of 2005, I volunteered to be the Faculty Interview Coordinator for the EEGSA at UC Berkeley. While it is not standard practice in all departments, the EECS department brings graduate students into the faculty interview process. Student involvement consists of a time slot during which graduate students may interview each faculty candidate. One of my friends, who had co-organized the student interviews the previous year, was leaving Berkeley, so I signed up for the vacant position. Interviews started in the spring, and I would have help from the other co-organizer.

That was the plan. Near the end of January, I received an e-mail that took me by surprise. My co-organizer, who had been involved in the process the previous year, would not be actively involved with the interview process that spring. I quickly recruited a friend to help out with the interviews, but neither of us had any experience. To handle this problem, I arranged a meeting with the previous co-organizer to walk me through the process. While most of the issues we discussed at that meeting were logistical, I had a concern. How should I handle a faculty candidate whose expertise was in an area I knew nothing about?

His answer was in some of the advice he gave me. "My favorite question to ask is what they consider important research questions over the next ten years. The answers are usually pretty interesting. Plus, it works on any candidate, regardless of how much you know about their work."

Armed with this advice, I began interviewing prospective faculty. The list of interviewees ranged from graduate students wrapping up their dissertations to senior faculty at other universities, one of whom was considered a contender for the Nobel Prize. While there were some logistical headaches, the interview experience itself was a positive one. It gave me a window into research outside my direct area of interest and gave me perspective on larger questions in electrical engineering.

Why stop at electrical engineering? The intent of this blog is to summarize conversations with graduate students, faculty, and others about their fields of interest. Since faculty candidate interviews are confidential, I will have to look elsewhere for content. Hopefully my summer at the Broad Institute, where the focus is on biomedical research, will prove to be a useful starting point.