Lior Pachter (BS '94) is Caltech's new Bren Professor of Computational Biology. Recently, he was elected a fellow of the International Society for Computational Biology, one of the highest honors in the field. We sat down with him to discuss the emerging field of applying computational methods to biology problems, the transition from mathematics to biology, and his return to Pasadena.
What is computational biology?
Computational biology is the art of developing and applying computational methods to answer questions in biology, such as studying how proteins fold, identifying genes that are associated with diseases, or inferring human population histories from genetic data. I have interests in both the development of computational methods and in answering specific biology questions, primarily related to the function of RNA, a molecule central to the function of cells. RNA molecules transmit information through their roles as products of DNA transcription and as the precursors to translation to protein; they also act as enzymes catalyzing biochemical reactions. I am interested in understanding these functions of RNA through tools that combine computational methods with sequencing methods, which together allow for high-resolution probing of RNA activity and structure in cells.
How did you get interested in this field?
During my PhD studies at MIT, I took a course in computational biology. In the course of working on a final project for the class, I got connected to the Human Genome Project—a large-scale endeavor to identify the full DNA sequence of a human genome—and I found the biology and associated math questions very interesting. This led me to change my intended direction of research from algebraic combinatorics to computational biology, and my interests expanded from math to statistics, computer science, and genomics.
Is it common for mathematicians to become biologists?
It's not very common. However, many prominent genomics biologists have backgrounds in mathematics, computer science, or statistics. For example, one of my mentors in graduate school was Eric Lander, the director of the Broad Institute of MIT and Harvard, who received a PhD in mathematics and then transitioned to working in biology. His transition, like mine years later, was sparked by the possibilities and challenges of utilizing genome sequencing to understand biology.
While genome sequencing has obviously been useful in revealing the sequences that are involved in coding various aspects of the molecular biology of the cell, it has had a secondary impact that is less obvious at first glance. The low cost and high throughput (the ability to process large volumes of material) of genome sequencing allowed for a more "big-data" approach to biology, so that experiments that previously could only be applied to individual genes could suddenly be applied in parallel to all of the genes in the genome. The design and analysis of such experiments demand much more sophisticated mathematics and statistics than had previously been needed in biology.
A result of the scale of these new experiments is the emergence of very large data sets in biology whose interpretation demands the application of state-of-the-art computer science methods. The problems require interdisciplinary dexterity and involve not only management of large data sets but also the development of novel abstract frameworks for understanding their structure. For example, there's a new technique called RNA-seq, developed by biologists including Barbara Wold [Caltech's Bren Professor of Molecular Biology], which involves measuring transcription—the process of copying segments of DNA into RNA—in cells. The RNA-seq technique consists of transforming RNA molecules into DNA sequences that allow the researchers to identify and count the original RNA molecules. The development of this technique required not only novel biochemistry and molecular biology, but also new definitions and ideas for how to think about transcriptomes, which are the sets of all the RNA molecules in a cell. I work on improvements to the assay, as well as the development of the associated statistics, computer science, and mathematics.
What did you do before becoming a professor at Caltech?
I was born in Israel and moved to South Africa when I was two. I lived there until moving to Palo Alto, California, at 15. After high school, I studied mathematics at Caltech and pursued my PhD in applied mathematics at MIT. I spent time at Berkeley as a postdoc before becoming a professor of mathematics, molecular and cell biology, and computer science there, where I held the Raymond and Beverly Sackler Chair in Computational Biology. I joined the Caltech faculty in early 2017.
What is it like to be back here?
It's a great pleasure. As an undergrad, I made very strong connections with very special people who just had a pure love of science. I've always missed the unique culture and atmosphere at Caltech and, returning now as a professor, I can feel the spirit of the Institute—an intense love of science emanating from individuals that is unlike anywhere else. It's a homecoming of sorts.