Caltech researchers, in collaboration with the start-up Alexandria, have built the world's first blockchain-powered platform for sharing scientific data. The Electron Tomography Database (ETDB), which is free and open to the public, uses blockchain to securely distribute and track ownership of data without relying on a central authority or moderation. The database is simultaneously a powerful new tool for basic research and a proof of concept for a new model of scientific data sharing.
A paper describing the new database appeared April 15 in the journal PLOS One.
Seeing inside nature's tiniest creatures, like viruses and bacteria, is a big challenge. The molecular structures that make up these organisms are so small that they slip between the crests of light waves, making them invisible to even the most powerful optical microscopes.
An imaging technique called electron tomography (ET) gets around this problem by taking images without light. Because electrons have a much shorter wavelength than the photons that make up visible light, ET can be used to "see" beyond the fundamental limits of sight.
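To put rough numbers on that comparison, here is a minimal sketch that computes an electron's wavelength from the standard relativistic de Broglie relation. The 300-kilovolt accelerating voltage and the 550-nanometer reference for green light are illustrative assumptions, not figures reported for the Jensen Lab's microscopes.

```python
import math

# Physical constants in SI units
PLANCK = 6.62607015e-34             # J*s
ELECTRON_MASS = 9.1093837015e-31    # kg
ELEMENTARY_CHARGE = 1.602176634e-19 # C
LIGHT_SPEED = 299792458.0           # m/s


def electron_wavelength(voltage):
    """Relativistic de Broglie wavelength (m) of an electron accelerated through `voltage` volts."""
    energy = ELEMENTARY_CHARGE * voltage            # kinetic energy gained, in joules
    rest_energy = ELECTRON_MASS * LIGHT_SPEED ** 2  # electron rest energy, in joules
    momentum = math.sqrt(2 * ELECTRON_MASS * energy * (1 + energy / (2 * rest_energy)))
    return PLANCK / momentum


ASSUMED_VOLTAGE = 300e3  # volts; an assumed, illustrative value
GREEN_LIGHT = 550e-9     # meters; an assumed reference wavelength for visible light

wavelength = electron_wavelength(ASSUMED_VOLTAGE)
ratio = GREEN_LIGHT / wavelength

print(f"electron wavelength: {wavelength * 1e12:.2f} pm")
print(f"green light is about {ratio:,.0f} times longer")
```

Under these assumptions the electron wavelength comes out at roughly 2 picometers, several hundred thousand times shorter than visible light. Wavelength sets only the theoretical limit; the resolution actually achieved depends on the instrument and the sample.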
ET microscopes build up three-dimensional images by passing a beam of electrons through different two-dimensional planes within a sample. This creates a series of 2D images that can be used to reconstruct what the 3D object must have looked like. This is usually accomplished by rotating the sample slowly under the microscope while it is being imaged. Using ET, scientists can see the internal structure of cells, spy on viruses, and even observe individual molecules.
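The reconstruction idea can be sketched in a few lines. The example below uses simple unfiltered back-projection on a single slice perpendicular to the tilt axis; it is a toy illustration, not the Jensen Lab's processing pipeline, and production software generally relies on weighted or iterative methods. The slice size, the tilt range of -60 to +60 degrees, and all function names are assumptions made for illustration.

```python
import numpy as np


def detector_bins(size, angle):
    """Map every pixel of a size x size slice to a detector bin for one tilt angle (radians)."""
    coords = np.arange(size) - (size - 1) / 2.0
    y, x = np.meshgrid(coords, coords, indexing="ij")
    t = x * np.cos(angle) + y * np.sin(angle)       # position along the detector
    n_bins = int(np.ceil(size * np.sqrt(2))) + 1    # wide enough for the slice diagonal
    bins = np.rint(t + (n_bins - 1) / 2.0).astype(int)
    return bins, n_bins


def project(slice_2d, angles):
    """Simulate the tilt series: one 1D projection of the square slice per tilt angle."""
    size = slice_2d.shape[0]
    rows = []
    for angle in angles:
        bins, n_bins = detector_bins(size, angle)
        rows.append(np.bincount(bins.ravel(), weights=slice_2d.ravel(), minlength=n_bins))
    return np.array(rows)


def back_project(sinogram, angles, size):
    """Smear each projection back across the slice and sum over all tilt angles."""
    recon = np.zeros((size, size))
    for row, angle in zip(sinogram, angles):
        bins, _ = detector_bins(size, angle)
        recon += row[bins]
    return recon


SIZE = 64
phantom = np.zeros((SIZE, SIZE))
phantom[20, 40] = 1.0                            # a single dense point as the test object
tilt_angles = np.deg2rad(np.arange(-60, 61, 2))  # assumed tilt range, in 2-degree steps

sinogram = project(phantom, tilt_angles)
reconstruction = back_project(sinogram, tilt_angles, SIZE)
peak = np.unravel_index(np.argmax(reconstruction), reconstruction.shape)
print("brightest pixel in reconstruction:", peak)
```

Every projection agrees only at the true location of the point, so the reconstruction is brightest there. The streaks that remain around it show why a limited tilt range leaves some blurring in the result.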
The 3D images on the ETDB are available for free online to both researchers and the public. By browsing through the ETDB website's featured tomograms, anyone can explore the hidden cosmos of the microscale through data captured by the laboratory of Grant Jensen, professor of biophysics and biology and Howard Hughes Medical Institute Investigator. "For the general public," says Davi Ortega, first author on the paper and a postdoctoral scholar in the Jensen Lab, "I think it is like looking at the stars. There might not be a practical reason why someone would like to load up the NASA files and look at telescope data, but they do, right? In addition to looking at the stars, now we can look at these things that are incredibly small."
ET may be powerful, but it is out of reach for many researchers because electron microscopes are so expensive.
"The Jensen Lab has been running these microscopes 24/7 for 15 years, but nobody had access to most of that data," Ortega says. "So, one day Professor Jensen walked into my office and said, 'Hey! How can we distribute this data in an interesting way?' He wanted me to design a database that nobody would have to take care of but to which everybody can contribute."
Ortega's solution turns the existing scientific database model on its head.
Instead of storing data in a central, moderated repository like the databases currently maintained by governmental funding agencies, the ETDB is "distributed." This means that it is made up of many separate servers, or nodes, each hosting only a subset of the data. The authorship, contents, and other metadata describing ETDB images are securely tracked by the FLO blockchain, which acts as a kind of permanent record managed by groups of computers rather than a single server. Users can filter the database based on content, authorship, and other categories, curating their experiences individually, without the need for database moderators.
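As a rough illustration of how a tamper-evident metadata record and user-side filtering might fit together, consider the toy sketch below. It is not the FLO blockchain or the ETDB's actual record format: the field names, the sample entries, and every function here are hypothetical, and a real blockchain adds networking and consensus among many computers that this sketch leaves out.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def record_hash(body):
    """SHA-256 digest of an entry's contents, serialized deterministically."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_record(chain, metadata):
    """Add a metadata record that is linked to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    body = {"prev_hash": prev_hash, "metadata": metadata}
    chain.append({**body, "hash": record_hash(body)})


def verify(chain):
    """Return True only if no entry has been altered, removed, or reordered."""
    prev_hash = GENESIS
    for entry in chain:
        body = {"prev_hash": entry["prev_hash"], "metadata": entry["metadata"]}
        if entry["prev_hash"] != prev_hash or entry["hash"] != record_hash(body):
            return False
        prev_hash = entry["hash"]
    return True


def filter_records(chain, **criteria):
    """User-side curation: keep only records whose metadata matches every criterion."""
    return [
        entry["metadata"]
        for entry in chain
        if all(entry["metadata"].get(key) == value for key, value in criteria.items())
    ]


# Hypothetical sample entries; real ETDB records carry different fields.
ledger = []
append_record(ledger, {"lab": "Lab A", "species": "Example bacterium 1", "node": "node-a"})
append_record(ledger, {"lab": "Lab B", "species": "Example bacterium 2", "node": "node-b"})
append_record(ledger, {"lab": "Lab A", "species": "Example virus 1", "node": "node-a"})

print("ledger intact:", verify(ledger))
print("records from Lab A:", filter_records(ledger, lab="Lab A"))
```

Because each entry's hash covers the previous entry's hash, changing any stored authorship or content field makes `verify` fail for the whole chain. Filtering happens on the reader's side, which is how users can curate what they see without a moderator.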
Currently, the only node in the ETDB is the Jensen Lab's server. But the database is designed to grow, and this summer Ortega will travel to Leiden, Netherlands, to establish a second node there and add new data to the ETDB. In the future, the ETDB could consist of thousands of nodes all around the world, each associated with a different research group.
Adding more nodes will allow data to be served to users directly from whichever lab hosts those data, and from any other node hosting a copy. This peer-to-peer architecture is both fast and inexpensive: everyone who contributes to the ETDB is responsible for hosting their own data, so the cost of adding more nodes is distributed among research groups rather than accumulating on a single bill. Labs already keep track of their own data, so the added cost is small, especially compared to that of a central database that would need to be specially built and maintained just to keep copies of data that are already stored elsewhere.
Intended to complement existing databases of published electron tomograms, the ETDB's capacity for accommodating unpublished data promises to move biological research forward. "A tomogram has thousands and thousands of pixels, actually a lot more than that, and we maybe use 10 percent of those pixels for our research. Everything else is untouched data," says Ortega. Those untouched pixels contain information that scientists without their own electron microscopes could use to make new discoveries.
Going forward, the ETDB team hopes to develop tools to help other researchers develop distributed blockchain-powered databases of their own.
"The ETDB was the first step. It was a proof of concept," says Ortega. "Nobody knew how many OIP records we could publish. Nobody knew how much it would cost. Nobody knew how well the FLO blockchain would actually work. But then suddenly we publish 11,000 tomograms in one day, and everything worked straight off the bat without high cost or problems. Now, the next step is to generalize the ETDB so that one day there can be complete accountability for all kinds of scientific data, for the states of instruments, for authorship—for everything."
The paper is titled "ETDB-Caltech: A blockchain-based distributed public database for electron tomography." In addition to Ortega and Jensen, co-authors are Caltech senior writer Catherine M. Oikonomou; H. Jane Ding of the Howard Hughes Medical Institute; contractor Prudence Rees-Lee; and a team at Alexandria, a start-up that uses blockchain to independently publish and distribute digital content. Funding was provided by the National Institutes of Health and the John Templeton Foundation as part of its Boundaries of Life Initiative.