Understanding an airplane—or, understanding how hundreds of thousands of pounds of mechanical parts work together to ferry people through the sky—is not an easy task. Understanding biological systems is just as complicated. For example, the human body contains trillions of specialized cells, each able to express different genes, send signals, and serve different functions.
David Van Valen (PhD '11) has joined the faculty as an assistant professor of biology and biological engineering with the goal of developing tools to handle this kind of so-called high-dimensional biological data. These techniques hold promise for a holistic understanding of biological processes, from viral infections to cancer.
What makes biology particularly challenging?
In my view, there are three main challenges that make biology difficult.
First: The "parts list" is large. If you work in the approximation of one gene being one part, then a simple organism like the bacterium Escherichia coli has around three or four thousand parts. In mammalian species, you're looking at something on the order of tens of thousands of parts. It's hard to understand how all of these different pieces work together.
Second: Living things, by definition of being alive, are heterogeneous. They change and evolve over time and space.
Third: One of our best tools for understanding complex, noisy things is to average all the data together. The unfortunate reality is that for living systems, the fundamental unit that you care about is a cell, and cells are different. They have different types and shapes and these differences are very important. In a lung, there are immune cells, blood cells, and so on; how these different cell types work in a coordinated fashion is what makes a lung a lung and not just a lump of matter. So, you can't just take a lung and throw it in a blender, and then throw the blended mess into a sequencer and say, "Hey, I understand a lung." This is the third problem: averaging doesn't work.
What makes biology so exciting right now is that for the first time, we have solutions for all three of these problems: Genomics techniques enable us to study the different genes in individual cells, imaging technology allows us to record cellular movement and behavior, and machine-learning algorithms combined with genomics and imaging will enable unprecedented analysis of biological data.
What does your lab work on in particular?
We've developed an open-source library called DeepCell, which is a collection of deep-learning methods for analyzing single cells in images. This library is being used in labs across Caltech and the larger scientific community. For example, you could use deep-learning techniques to look at pathology images of breast cancer tumors and measure the interactions between immune cells and cancer cells.
Right now, we're also looking at how viruses make decisions. When a virus infects a bacterial cell, it can decide to either make more viruses or to be dormant and ingrain itself into the host's genome. It's often this dormancy that can make viruses hard to treat. We're interested in figuring out how the interaction between the host cell and the virus contributes to this decision. What kind of conversation happens between the host and the virus, and to what extent is it some sort of mutually agreed-upon decision? Experimentally, there are new technologies that allow us to explore these questions in a very systematic fashion. We can perturb every single host gene and see how that influences infection outcome, image what happens during the infection, and engineer different strains of viruses.
What led you to become a scientist?
My mom was a Caltech grad and my dad was an MIT grad, so it was always a given that I would be in some kind of STEM field. I thought I would be a mathematician, but during my undergrad at MIT, I was exposed to a lot of really interesting biology problems. I did two SURFs [Summer Undergraduate Research Fellowships] here at Caltech, one with Zhen-Gang Wang [Dick and Barbara Dickinson Professor of Chemical Engineering; executive officer for chemistry and chemical engineering] and one with Rob Phillips [Fred and Nancy Morris Professor of Biophysics, Biology, and Physics].
I came to Caltech for my PhD, which was actually an MD/PhD program jointly with UCLA. I spent six years doing biophysics with Rob and then went to med school. After that, I did a year of residency at Stanford, but then I decided that I wanted to do science. So I got a postdoctoral position at Stanford for four years, and then I came back to Caltech to join the faculty. I was super thrilled that they wanted me to come back. I'm actually currently in the same office I was in as a graduate student.
There is a core group of people here at Caltech who like applying quantitative and physical methods to the problems in biology. I'm of that ilk. We had more training in physics, so we tend to think of living systems in this way. There is also a pretty vibrant machine learning and artificial intelligence community here. It's nice to have colleagues who are thinking about similar problems, because you can bounce ideas off each other; their work enriches yours and yours enriches theirs.
What do you like to do when you're not in the lab?
My biggest hobby is Brazilian jiu-jitsu. I started as a graduate student, and back in those days I used to compete. And one of the nice things about Los Angeles is that I have a large network of family and friends here, so it feels like I'm back home.