In recognition of the fact that scientific advances often rely on the development of specialized software, Caltech has launched the Schmidt Academy for Software Engineering to train the next generation of science-savvy software engineers and set new standards in scientific software.
In an era of big data and complex computer modeling, researchers often face difficult computing challenges. The new program will train recently graduated students to recognize potential computing solutions to problems faced by researchers, and to develop software to fill those needs.
Currently, scientific labs rely on staff with widely varying degrees of computer science expertise who cobble together custom software to tackle these challenges. The result is often unwieldy, Frankenstein-like code that is temperamental, needs regular revisions, and is understood only by the grad student who wrote it. The key to producing cleaner, more robust, and more effective software, say the minds behind the Schmidt Academy, is to embed computer scientists who have significant training and growing experience in disciplined software engineering: individuals who understand both the science that the lab is doing and the engineering required to achieve its goals efficiently.
"Scientific progress and software development are intimately linked but academia has fallen behind industry in exploiting best practices in software engineering," says Mike Gurnis, the John E. and Hazel S. Smits Professor of Geophysics and the Schmidt Academy's inaugural director. "Caltech has been given a unique opportunity to explore new ways to bring software engineering into research groups campuswide. We'll be accelerating scientific progress while planting the seeds for a cultural shift in how software is developed at the Institute."
The three-year pilot program, which is supported by Eric and Wendy Schmidt, by recommendation of Schmidt Futures, is funding four Caltech Schmidt Scholars this year to work with research groups across campus; the cohort will grow to 12 scholars next year. For one to two years, the scholars will receive industry-competitive salaries while being mentored by a senior software engineer at Caltech and embedded in a research group.
"During this time of extraordinary scientific and technological growth, we must continue investing in the next generation of leaders and encourage inter-disciplinary collaboration," says Eric Schmidt. "This program is a unique opportunity for our Scholars to hone their software engineering skills while collaborating to address some of the world's biggest scientific challenges that might otherwise remain unsolved."
"The Software Academy is an experiment on how careful software engineering can enhance scientific research through producing stable, adaptable, and trusted platforms that can be used for years, not just a tool sufficient for one or two experiments," says Stu Feldman, Chief Scientist of Schmidt Futures. "Caltech with its intense scientific culture and great people is ideal to try this."
"This is a recognition that computing, software, and machine learning are going to play a very big role in science. Because Caltech is small and collaborative, we have the opportunity to really make a push in that direction," says Kaushik Bhattacharya, the Howell N. Tyson, Sr., Professor of Mechanics and Materials Science and vice provost.
Students and postdocs often already develop custom software to aid their research. Tapio Schneider, Theodore Y. Wu Professor of Environmental Science and Engineering, notes that atmospheric scientists have been generating computational experiments to learn about the climate system for decades, and they develop and maintain their own code.
"While this software engineering and management model has had success, it creates obvious challenges. Most scientists are not trained as software engineers; software engineering practices that are common in industry are less common in scientific computing," says Schneider, who is also a research scientist at JPL, which Caltech manages for NASA. Schneider is currently leading a Schmidt-funded effort, the Climate Modeling Alliance (CliMA), that is leveraging recent advances in the computational and data sciences to develop a wholly new climate model.
Bhattacharya says the Schmidt Academy has three key goals: to provide a unique training opportunity for undergraduates who have a strong interest in software engineering so that they may help advance scientific discovery; to transform software engineering practice within research groups; and to enable research groups to pursue new scientific and technological advances that would not be feasible otherwise.
"The Schmidt Software Academy is poised to set new standards in the quality of scientific software," Schneider says.
Both becoming a Caltech Schmidt Scholar and securing a scholar's help for a project are competitive processes: graduating seniors apply to participate in the program, and scientists on campus apply to have their projects taken up by a Caltech Schmidt Scholar.
"The goal is to ensure that top candidates are tackling the most crucial challenges on campus," Gurnis says. The first round of four Caltech Schmidt Scholars was chosen from Caltech's Class of 2019. Each computer scientist collaborates with a different lab. Currently, the scholars are working on projects that range from automating the data-processing workflow for an imaging technique called cryo-electron microscopy to improving tools used to analyze imaging spectroscopy data sets.
As they begin to work on their projects, scholars will attend a software "bootcamp" of relevant classes led by Schmidt Software Academy instructor Donnie Pinkston (BS '98), a familiar face to Caltech computer science students. Pinkston has been a lecturer in Computing and Mathematical Sciences since 2005, teaching several of the core courses in that department. While the Caltech Schmidt Scholars typically have training in computer science, the bootcamp helps them hone their software engineering skills, Gurnis says.
Umesh Padia (BS '19), who is among the inaugural class of Caltech Schmidt Scholars, deferred enrollment as a graduate student at MIT to jump on what he describes as a way to build skills while simultaneously creating a tool that helps scientists who are at the forefront of their fields.
"This is a way to work on a scientific project as a leader, which is uncommon at this stage of your career," Padia says. "If you go to work for a tech company, you're going to work on someone else's project and be a cog in a machine. Here, you're the project manager, you're the tester, you're the software engineer—you get to see everything. It makes you a better software engineer."
Padia is embedded in the lab of Viviana Gradinaru (BS '05), who is professor of neuroscience and biological engineering, director of the Center for Molecular and Cellular Neuroscience of the Tianqiao and Chrissy Chen Institute for Neuroscience at Caltech, and a Heritage Medical Research Institute investigator. He is collaborating with researchers in the Gradinaru lab, including graduate students David Brown and Xiaozhe Ding, to build a cloud-based platform for analyzing and designing viral gene-delivery vehicles.
In her neuroscience lab, Gradinaru develops biological tools to access and fix the brain. "We took our lessons from nature and from Frances Arnold and have been using directed evolution to engineer viruses that can act as gene-delivery vectors," Gradinaru says. Arnold—Caltech's Linus Pauling Professor of Chemical Engineering, Bioengineering and Biochemistry and director of the Donna and Benjamin M. Rosen Bioengineering Center—developed "directed evolution," a method for creating new and better proteins in the laboratory using the principles of evolution, work that earned her the 2018 Nobel Prize in Chemistry.
For example, a disease might result from defects in the brain's motor centers—that is, the cells responsible for motion might be missing a certain gene needed to manufacture a specific protein, resulting in motor issues. Gradinaru's team is developing methods that could deliver the needed genes into the relevant brain cells to jump-start the creation of the necessary proteins. The challenge is in finding ways to safely and effectively deliver the genes. To do so, they use viruses.
"We're capitalizing on a system that nature has already perfected. Viruses can enter into cells and make missing proteins," Gradinaru says. "The problem is that natively, viruses don't know how to target the specific cell types and tissues we need for research or therapy."
That's where directed evolution comes in. Gradinaru's team has developed promising viruses and built huge libraries of different variants. However, combing through these libraries when searching for viruses that could perform specific tasks means managing a deluge of data, a task that no commercially available system can accomplish. Padia, who as an undergrad worked in the lab of David Baltimore, president emeritus and Robert Andrews Millikan Professor of Biology, was immediately intrigued by the challenge. "I thought that was a great opportunity to work on a problem that can really accelerate science using strong software engineering principles," he says.
Together, Padia and Gradinaru's team are developing a streamlined way for biologists to access that library using software that helps them to catalogue known viruses while also using machine learning to identify promising mutations to try.
"This won't just be robust, faster software. It'll be software that helps us make better decisions on the best vehicles for gene therapy," Gradinaru says. Once completed, the software will be made broadly available to the academic community, she says.
Meanwhile, Caltech Schmidt Scholar Sunash Sharma (BS '19) is working with Steven Low, Frank J. Gilloon Professor of Computing and Mathematical Sciences and Electrical Engineering, on a project to expand the utility of an electric vehicle (EV) charging network in the underground parking structure off California Boulevard, on the southern edge of campus.
Low's project, which improves the capacity of large EV charging stations by creating an adaptive charging network (ACN) that optimizes how the vehicles are charged, has already been operational for more than three years. In a nutshell, the system addresses this question: if you simultaneously plug 60 electric cars into EV charging stations, what is the best way to charge them, given that each car's battery requires at least two hours to get a full charge, and that the cars will be left at the charging station for six hours and depart at different times? A dumb system would simply charge all of the cars at their peak rates at once. But a smart system like an ACN knows that charging the cars at different rates based on their energy requests, their departure times, and the capacity limits of the electric infrastructure requires far less infrastructure, giving the ACN a much smaller impact on the energy grid.
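To make the contrast concrete, here is a minimal Python sketch of the 60-car scenario. It is not Low's ACN algorithm, which the article does not detail. It simply compares charging every car at its peak rate with a naive alternative that spreads each car's request evenly over its stay. The `Car` class, both function names, the 6 kW peak rate, and the 12 kWh request (two hours at peak) are illustrative assumptions, and for simplicity every car here arrives together and stays the full six hours.

```python
from dataclasses import dataclass


@dataclass
class Car:
    energy_kwh: float   # energy requested by the driver (assumed value)
    max_rate_kw: float  # fastest rate the car can accept (assumed value)
    stay_hours: int     # whole hours until departure; must be at least 1


def uncontrolled_load(cars, horizon):
    """Hourly site load (kW) when every car charges at its peak rate until full."""
    load = [0.0] * horizon
    for car in cars:
        remaining = car.energy_kwh
        for hour in range(min(car.stay_hours, horizon)):
            if remaining <= 0:
                break
            # Slots are one hour long, so kW drawn equals kWh delivered per slot.
            rate = min(car.max_rate_kw, remaining)
            load[hour] += rate
            remaining -= rate
    return load


def spread_load(cars, horizon):
    """Hourly site load (kW) when each car charges evenly across its whole stay."""
    load = [0.0] * horizon
    for car in cars:
        hours = min(car.stay_hours, horizon)
        # Never exceed what the car can accept, even if that leaves it short.
        rate = min(car.max_rate_kw, car.energy_kwh / hours)
        for hour in range(hours):
            load[hour] += rate
    return load


# Hypothetical fleet: 60 cars, each needing two hours at a 6 kW peak rate,
# all plugged in at hour 0 and parked for six hours.
cars = [Car(energy_kwh=12.0, max_rate_kw=6.0, stay_hours=6) for _ in range(60)]
naive = uncontrolled_load(cars, horizon=6)
smart = spread_load(cars, horizon=6)
print(max(naive), max(smart))  # prints "360.0 120.0"
```

Under these assumptions both schedules deliver the same 720 kWh, but the even spread needs only a third of the peak capacity. A real system would also have to handle staggered arrivals and departures and hard limits on the electrical infrastructure, which this sketch ignores.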
Low's ACN uses an algorithm that calculates the optimal way to charge multiple cars based on driver behavior; for the past year he has been working on software that automatically collects and anonymizes data about how the EV charging grid is used, creating a data set for researchers to use. With Sharma's help, he is building on that work to create a platform that will allow researchers to test out new algorithms on a real charging grid.
For the project, Sharma is collaborating directly with Zach Lee (MS '18), a graduate student in Low's lab who first started working on the software two years ago. Sharma and Lee share office space where they check in with each other daily, working through problems. "We divide the workload based on how our skillsets align," Lee says. "I come in with knowledge of what researchers in the field might want, and we're able to discuss how that fits within the architecture of the software. It's an iterative process where we bounce ideas off each other, allowing us to create higher quality software."
Sharma, Lee, and Low hope to produce realistic models that will allow Low and researchers at other institutions around the world to create ever better and more efficient algorithms. Eventually, they would like the ACN to be used both as a commercial product and as a research tool, and they hope their successes will lure other researchers to work on building the smart grid of the future. "I'm hoping we can convince the broader scientific community that this is a useful tool for smart grid research as well as for data science and machine learning," Low says.
For his part, Sharma says that he feels a responsibility as one of the first class of Caltech Schmidt Scholars to do an exemplary job of building clean and useful software. "This is an experiment, like everything else. We have to show just how valuable this program can be," Sharma says.
Applications for new Caltech Schmidt Scholars and projects will be accepted starting in early October. The selections will be announced early in the new year. More information is available at SASE.caltech.edu.
Top image: Viviana Gradinaru collaborates with Caltech Schmidt Scholar Umesh Padia, who deferred enrollment at MIT to develop new software with her.