Computational Biology


Metabolite Atlas. The Metabolite Atlas project is a long-standing effort to make untargeted metabolomics research easier, faster, and more accurate. Untargeted metabolomics involves a great deal of tedious and unnecessary reinvention of the wheel, and a bookkeeping strategy such as an atlas is an ideal solution. In an untargeted study, metabolites are separated from each other using liquid chromatography and detected using a mass spectrometer. This technique, LC-MS or LC-MS/MS, allows solvent extracts to be analyzed so that the relative abundance of hundreds of molecules can be measured. With soft ionization (e.g., electrospray ionization), the detected pattern is a series of peaks, each at a given mass-to-charge ratio, with intensities spanning about five orders of magnitude. This large dynamic range is useful for quantifying large fold changes and for detecting both very low-abundance and high-abundance compounds in the same experiment. It also leads to the detection of tens of thousands of unwanted signals. These signals arise from a variety of factors, but are generally adducts (Na+, H+, K+, etc.), isotopes (M0, M1, M2, etc.), clusters (2M0+Na+, etc.), and degradation products (M0 – COOH + H+). There are also artifacts associated with elevated baselines and overlapping peaks. On top of this comes the fundamental nature of an untargeted experiment: you are not specifying a priori what to detect. Using the Metabolite Atlas framework, we are now overcoming much of the degeneracy of mass spectrometry data and zeroing in on what is important.

Bowen, B. P., & Northen, T. R. (2010). Dealing with the unknown: metabolomics and metabolite atlases. Journal of the American Society for Mass Spectrometry, 21(9), 1471–1476.
Yao, Y., Bowen, B. P., Baron, D., & Poznanski, D. (2015a). SciDB for High-Performance Array-Structured Science Data at NERSC. Computing in Science & Engineering, 17(3), 44–52.
Yao, Y., Sun, T., Wang, T., Ruebel, O., Northen, T., & Bowen, B. P. (2015b). Analysis of Metabolomics Datasets with High-Performance Computing and Metabolite Atlases. Metabolites, 5(3), 431–442.
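
As a rough illustration of the adducts, isotopes, and clusters mentioned in the Metabolite Atlas description above, the sketch below lists a few of the singly charged signals that one compound can produce in positive mode. It is not part of the Metabolite Atlas software. The function name `expected_signals` is hypothetical, and the choice of glucose and of these four ion types is an assumption made for illustration.

```python
# Monoisotopic masses in daltons. The ion values already account for
# the missing electron of the positive charge.
PROTON = 1.007276
SODIUM_ION = 22.989218
POTASSIUM_ION = 38.963158
C13_MINUS_C12 = 1.003355  # approximate spacing between the M0 and M1 isotope peaks


def expected_signals(neutral_mass):
    """Return {label: m/z} for a few singly charged signals of one compound.

    Hypothetical helper for illustration only; real data contain many
    more adducts, clusters, and degradation products than listed here.
    """
    signals = {
        "[M+H]+": neutral_mass + PROTON,
        "[M+Na]+": neutral_mass + SODIUM_ION,
        "[M+K]+": neutral_mass + POTASSIUM_ION,
        "[2M+Na]+": 2 * neutral_mass + SODIUM_ION,
    }
    # First carbon isotope peak of the protonated ion.
    signals["[M+H]+ M1"] = signals["[M+H]+"] + C13_MINUS_C12
    return signals


# Example compound chosen for illustration: glucose, C6H12O6.
GLUCOSE_MASS = 180.063388

for label, mz in expected_signals(GLUCOSE_MASS).items():
    print(f"{label:10s} {mz:.4f}")
```

Even this short list gives five peaks for one molecule, which suggests how a sample with hundreds of metabolites can yield tens of thousands of signals.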


OpenMSI. OpenMSI is a cloud-based platform that allows users to view, analyze, and manipulate mass spectrometry imaging (MSI) data directly in a web browser. Users can therefore access their data from anywhere in the world with an Internet connection. By leveraging the supercomputing and storage infrastructure at the Department of Energy’s National Energy Research Scientific Computing Center (NERSC), the tool also reduces the time it takes to analyze MSI datasets from days to minutes. For more customized analysis, scientists can interact directly with their experimental data through the REST application programming interface (API), which allows users with basic development skills to retrieve and work on their datasets programmatically. Once an analysis is done, researchers can share their data and results with collaborators or journal editors simply by sending them a URL. Because OpenMSI’s highly optimized data standard is built on HDF5, its files are portable across platforms (Windows, Linux, Mac, etc.), self-contained, highly efficient, and extensible. HDF5 bindings are also available for Matlab, R, C/C++, Python, Fortran, and many other languages, so many researchers can access and view these files using languages they already know. Currently, OpenMSI primarily serves the mass spectrometry imaging community, but it has also been shown to support infrared-absorbance spectral images and neurological recording images, and it can be adapted to a wide range of multimodal data with information at multiple scales for each modality.

Yang, J., Rübel, O., Mahoney, M. W., & Bowen, B. P. (2015). Identifying important ions and positions in mass spectrometry imaging data using CUR matrix decompositions. Analytical Chemistry, 87(9), 4658–4666.
Fischer, C. R., Ruebel, O., & Bowen, B. P. (2016). An accessible, scalable ecosystem for enabling and sharing diverse mass spectrometry imaging analyses. Archives of Biochemistry and Biophysics, 589, 18–26.

MIDAS & Pactolus. The Metabolite Atlas and OpenMSI projects at NERSC have begun to use high-performance computing to accelerate compound identification for mass spectrometry. In our approach with MIDAS and Pactolus, unknowns are identified by comparing measured spectra to the theoretically possible fragmentation paths of known molecular structures. Defining these reference fragmentation paths is a major computational challenge. To date, we have computed and stored complete fragmentation trees to a depth of five consecutive bond dissociations for more than 11,000 compounds. Metabolite Atlas and OpenMSI users are now searching their raw spectra against these trees and getting results in minutes. Without supercomputing, these tasks would take months or not be performed at all. In addition, the real-time queue lets users make better use of their time by avoiding the highly variable wait times previously experienced on the normal queue.

Wang, Y., Kora, G., Bowen, B. P., & Pan, C. (2014). MIDAS: a database-searching algorithm for metabolite identification in metabolomics. Analytical Chemistry, 86(19), 9496–9503.
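
The idea of searching a measured spectrum against precomputed fragments, described in the MIDAS & Pactolus paragraph above, can be sketched in a few lines. This is a deliberately simplified stand-in and not the scoring used by MIDAS or Pactolus. It assumes each candidate compound has already been reduced to a flat list of theoretical fragment m/z values, and it scores a candidate by the fraction of measured intensity that those fragments explain. The function names, the 0.01 tolerance, and all numbers in the example are hypothetical.

```python
from bisect import bisect_left


def match_score(measured_peaks, fragment_mzs, tol=0.01):
    """Fraction of measured intensity explained by theoretical fragments.

    measured_peaks: list of (mz, intensity) tuples
    fragment_mzs:   iterable of theoretical fragment m/z values
    tol:            absolute m/z tolerance (assumed value)
    """
    frags = sorted(fragment_mzs)
    total = sum(intensity for _, intensity in measured_peaks)
    if not frags or total == 0:
        return 0.0
    matched = 0.0
    for mz, intensity in measured_peaks:
        # Only the fragments on either side of the insertion point
        # can be the nearest neighbours of this peak.
        k = bisect_left(frags, mz)
        neighbours = frags[max(k - 1, 0):k + 1]
        if any(abs(mz - f) <= tol for f in neighbours):
            matched += intensity
    return matched / total


def rank_candidates(measured_peaks, candidate_fragments, tol=0.01):
    """Return (name, score) pairs sorted from best to worst score."""
    scores = {
        name: match_score(measured_peaks, frags, tol)
        for name, frags in candidate_fragments.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data, invented for illustration only.
measured = [(85.03, 100.0), (127.04, 50.0), (163.06, 50.0)]
candidates = {
    "compound_A": [85.028, 127.039, 145.050],
    "compound_B": [99.044, 163.061],
}

for name, score in rank_candidates(measured, candidates):
    print(name, round(score, 2))
```

Scoring one spectrum against one candidate is cheap. The cost described above comes from building the fragmentation trees for thousands of compounds and from repeating the comparison across many spectra.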