December 9, 2020 PAP-Q4-20-CL-019
For scientists, the traditional approach to storing, analyzing, and computing with data is inefficient and limiting. Scientific data is fundamentally different from business data, so it is not typically possible to manipulate multidimensional and diverse scientific data effectively using basic tables, files, or data lakes. Valuable scientific data must be cleaned and curated on an ongoing basis so that researchers can keep augmenting it while also sharing it, reusing it, and collaborating on it. Additionally, scientists want to focus on their research without having to learn complicated computer science methods to access and compute with large data sets.
Paradigm4 is leveraging technology developed by Turing laureate and MIT professor Mike Stonebraker for the life sciences vertical, addressing data manipulation challenges with a suite of end-to-end solutions. We synthesize knowledge at the cutting edge of computer science with an understanding of pharma and biotech research requirements. Experts in bioinformatics, biomechanical engineering, bioimaging, and statistical genetics collaborate with experts in machine learning and other emerging areas of computer science.
One of our focus areas is single-cell omics analysis. Single-cell sequencing within tumors can help oncologists understand the distribution of mutations and their co-occurrence within individual cells, potentially guiding precision medicine.
Single-cell studies involve not only RNA sequencing but also metabolomics and proteomics. These methods examine the consequences of genetic changes for individual cells, such as changes in morphology or protein expression, adding layers of understanding essential for effective target identification and drug development.
We facilitate this work by increasing the rate at which scientists can ask and answer questions to test hypotheses, through appropriate data organization and elastic computing.
The vision for Paradigm4 was to store diverse types of data, along with metadata, in a unified, science-ready repository. SciDB is our next-generation analytics platform, which enables scientific data modeling, storage, and large-scale computation. This all-in-one, enterprise-ready storage and elastic computing platform is a massively parallel, transaction-safe, array-oriented analytics solution.
Data is organized into arrays that can easily be queried with scientific languages such as R and Python. The old way of working, opening many files and assembling the data into a matrix, is no longer necessary because the data is stored ready for extraction, evaluation, and transformation. It is also easily parsable: specific data can be selected from the arrays without opening files. For companies that have tens of thousands of data sets, aggregating that data in a usable format is tremendously empowering.
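To make the contrast concrete, here is a minimal, hypothetical sketch in Python. It uses NumPy, not SciDB's own query interface (which is not shown here), and the per-sample file layout, array shapes, and names are illustrative assumptions. It compares assembling a cells-by-genes matrix by opening one file per sample with selecting a subset from data already held as a single array indexed by sample, cell, and gene.

```python
import os
import tempfile

import numpy as np

# Hypothetical layout: one small expression file per sample (cells x genes).
N_SAMPLES, N_CELLS, N_GENES = 4, 3, 5
rng = np.random.default_rng(0)
samples = [rng.poisson(2.0, size=(N_CELLS, N_GENES)) for _ in range(N_SAMPLES)]


def old_way(directory):
    """Open every per-sample file and stack the contents into one matrix."""
    parts = []
    for i in range(N_SAMPLES):
        parts.append(np.load(os.path.join(directory, f"sample_{i}.npy")))
    return np.vstack(parts)  # shape (N_SAMPLES * N_CELLS, N_GENES)


with tempfile.TemporaryDirectory() as d:
    for i, m in enumerate(samples):
        np.save(os.path.join(d, f"sample_{i}.npy"), m)
    assembled = old_way(d)

# Array-oriented layout: the same data held as one 3-D array indexed by
# (sample, cell, gene), so a subset is a selection rather than a file scan.
store = np.stack(samples)          # shape (N_SAMPLES, N_CELLS, N_GENES)
gene_subset = store[:, :, [1, 3]]  # genes 1 and 3 across all samples and cells
```

In an array database the stored array would live on disk and be distributed across many machines; this in-memory sketch only mimics the idea that a query selects coordinates from one organized structure instead of re-reading and re-joining many files.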
The elastic computing capability makes it possible for individual scientists to run their own algorithms at any scale without the help of an IT specialist. They do their normal work, and the software automatically scales compute resources to match the workload. Any researcher can access the power of hundreds of computers from a laptop.
Paradigm4’s suite of REVEAL apps enables bioinformaticians and scientists to access a large quantity and diversity of public and proprietary multi-omics, behavioral, clinical, health outcomes, and environmental data to accelerate integrative, multimodal, longitudinal, population-scale data science exploration and discovery.
The REVEAL™ apps, or sets of use cases, are layered over the array data and compute engines, making it possible for scientists to answer questions in their own vernacular.
For example, the REVEAL: Single Cell™ app enables users to build a multidimensional understanding of disease biology; scale to handle more samples from patients with more cells, more features, and broader coverage; and readily assess key biological hypotheses for target evaluation, disease progression, and precision medicine. Our goal is to allow researchers to be as productive as possible by giving them an interface that is easy to use, intuitive, and understandable.
Our other apps include the REVEAL: Biobank™ app, which brings together multiple data types, such as multi-omics data; practitioner and hospital records, diagnostic codes, and prescription history; and biometric and imaging data to support scientists in population-scale translational medicine and healthcare research. It leverages 70 terabytes of genomic data held within the UK Biobank, which contains data on 500,000 people, including imaging data for 100,000 participants. Our apps also organize analytical data from HPLC, UV, LC, mass spectrometry, solubility testing, and other methods, which can likewise amount to terabytes of data, and make it available to machine learning programs to improve processes.
The multi-omics REVEAL™ applications span 15 different data types, from copy number variation through proteomics. We also have experience working with wearables data through a collaboration with Pfizer on the Blue Sky Project, which led to our wearables app.
SciDB, combined with REVEAL™ apps, enables scientists to think differently by removing computational constraints, accelerating their ability to formulate and test hypotheses. Instead of spending most of their time organizing and accessing data, they can now spend time answering the real questions, including many that could never have been asked before. Analysis of more comprehensive data sets also yields more accurate, less approximate results.
Because we leverage spot instances in the cloud, we can provide this computational power cost-effectively, enabling nascent companies with small budgets to develop breakthrough science.
Precision medicine will affect the treatments themselves, as well as dosing and the timing of administration. Underlying that will be a revolution in our understanding of physiology. We are just beginning to see cellular interactions at different scales.
New instruments will make it possible to see and measure things that weren’t previously accessible, driving pharma to a new scientific plateau. However, new software tools and methods are needed to make use of the data those instruments generate.
To answer detailed questions about similarities between distal tissues, exosomal communications, the fragments of genetic material and proteins that play a role in knitting organisms together, and the role of microbiome metabolites, all of that data must be well organized in appropriate matrices so that it is ready for analysis with machine learning and other types of algorithms. Together, new analytical technologies and data management capabilities like those offered by Paradigm4 will revolutionize drug development.
Marilyn Matz is CEO of Paradigm4, which she co-founded with Turing laureate Michael Stonebraker. The scientific analytics solutions company enables scientists and data scientists to transform their research with an integrative analytics platform that powers massively scalable analytics and machine learning. Before Paradigm4, after completing an MS at the MIT AI Lab, Marilyn was one of three co-founders of Cognex Corporation, now a publicly traded, global industrial machine vision company. Marilyn received the sixth annual Women Entrepreneurs in Science and Technology (WEST) Leadership Award, was a co-recipient of the SEMI industry award for outstanding technical contributions to the semiconductor industry, and was named to the 2020 NACD Directorship 100.