FAIR data are data that meet the principles of findability, accessibility, interoperability, and reusability (FAIR). A consortium of scientists and organizations defined the acronym and principles in 2016 to improve and expand scientific study through better data management. Since then, FAIR principles for scientific data have received strong support from global organizations like the G7; national governments; science funding agencies, including the European Commission and the National Institutes of Health; and pharmaceutical leaders, including Novartis, Pfizer, and GSK. Government legislation requiring data accessibility has also passed in both the United States (the Open, Public, Electronic and Necessary Government Data Act) and Europe (the EU Data Governance Act). But how pervasive has FAIR become across the life sciences, and how is it being leveraged today?
FAIR Data in Academic Research
Academia has long driven innovation through collaboration, data sharing, and iterative research, so its critical role in establishing and evangelizing FAIR data-sharing principles is no surprise. FAIR has also become more important to academics as government agencies increasingly make data openness and accessibility a condition of funding eligibility.
FAIR Data in Biology Research
Many in the biology research community were early supporters of FAIR data management. For example, the Pistoia Alliance (a not-for-profit collaboration of life sciences companies, pharmaceutical leaders, vendors, publishers, and academic groups) publicized its support in a 2019 Drug Discovery Today feature article. The authors argued that if the life sciences adopt FAIR for R&D, “the plethora of new and powerful analytical tools such as artificial intelligence and machine learning will be able, automatically and at scale, to access the data from which they learn, and on which they thrive. FAIR is a fundamental enabler for digital transformation.”
Much of the bioinformatics and crystallography data used in biology research is shared widely in open repositories. Researchers often encounter FAIR data when using public databases like the Protein Data Bank (PDB), the Universal Protein Resource (UniProt), or GenBank. But beyond these standardized, open data types, many life science organizations are also outfitting their labs with research tools that support FAIR data principles from the earliest stages of data collection through analysis and reporting, a capability that grant funders increasingly require.
Researchers may use tools like an electronic lab notebook (ELN) to help ensure that lab data are properly collected and managed, even as data types and research workflows evolve. An effective ELN should let researchers push raw lab data directly into analytics software, avoiding the wasted time and risk of error that come with preparing and transferring data by hand.
For example, a researcher might want to pass assay data and all associated metadata to a curve-fitting calculation and then link the results back to the original ELN record. The results should become part of a federated master data source so that colleagues with appropriate access permissions can easily search for and reuse them in the future.
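To make that curve-fit step more concrete, here is a minimal Python sketch under stated assumptions. It estimates an IC50 by log-linear interpolation between the two doses that bracket 50% activity (a simplified stand-in for the four-parameter logistic fit that analytics software would typically use), then packages the result together with the assay metadata and the originating ELN record ID as a JSON record. The record ID format, field names, and functions here are hypothetical and do not represent any particular ELN's or vendor's API.

```python
import json
import math
from dataclasses import asdict, dataclass


@dataclass
class AssayResult:
    eln_record_id: str    # hypothetical ID linking the result back to its ELN entry
    assay_metadata: dict  # protocol, units, instrument, etc. carried with the result
    ic50: float           # estimated concentration giving 50% activity
    method: str = "log-linear interpolation"  # records how the value was derived


def estimate_ic50(concs, responses):
    """Estimate IC50 by interpolating on log10(concentration).

    Assumes responses are percent activity that fall as concentration rises.
    This is a simplified stand-in for a full four-parameter logistic fit.
    """
    points = sorted(zip(concs, responses))
    for (c1, r1), (c2, r2) in zip(points, points[1:]):
        # Find the first pair of adjacent doses whose responses bracket 50%.
        if r1 >= 50 >= r2:
            if r1 == r2:
                return c1
            frac = (r1 - 50) / (r1 - r2)
            log_c = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
            return 10 ** log_c
    raise ValueError("responses never cross 50% activity")


def build_record(eln_record_id, assay_metadata, concs, responses):
    """Bundle the estimated value with its metadata and ELN link as JSON."""
    result = AssayResult(eln_record_id, assay_metadata, estimate_ic50(concs, responses))
    return json.dumps(asdict(result), sort_keys=True)


if __name__ == "__main__":
    record = build_record(
        "ELN-0001",  # hypothetical record ID
        {"assay": "kinase inhibition", "conc_units": "uM"},
        concs=[0.1, 1, 10, 100],
        responses=[95, 80, 20, 5],
    )
    print(record)
```

Keeping the ELN record ID and metadata inside the result record is what makes the value findable and reusable later: a colleague querying the master data source can trace the IC50 back to the experiment and conditions that produced it.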
FAIR Data in Chemistry Research
Although chemistry research has not traditionally had a FAIR culture, efforts to change that have been under way for some time. In 2019, the Chemistry Implementation Network (ChIN) published a manifesto calling for the chemistry community to “Go FAIR.”
Other leading chemistry organizations, including the Research Data Alliance's Chemistry Research Data Interest Group (CRDIG) and the International Union of Pure and Applied Chemistry (IUPAC), have joined the cause. They are calling for chemistry standards (e.g., naming conventions, structural representations, and characterization and reaction data), as well as the widespread adoption of R&D tools and infrastructure that aid in FAIR data collection, sharing, and analysis.
Industry-wide support is growing. For example, there has been a call to make it easier for researchers to share chemical structure information in journal submissions. Awards have been established to recognize the best FAIR chemistry data sets published each year. And companies like Dotmatics are creating solutions that make it easier for chemists to annotate, track, and manage data throughout their chemistry workflows.
While change will be gradual, most experts agree that the chemistry community needs to build a FAIR culture supported by standards and infrastructure that promote the machine readability of chemical data and other digital resources.
FAIR Data in Chemicals and Materials R&D
Calls to “Go FAIR” have also been increasing in the chemicals and materials industry, which has traditionally focused on experimental exploration and computational modeling rather than data-driven approaches. A data-driven approach to chemicals and materials R&D has often been deemed too difficult to achieve, because the industry's complex workflows and data types are thought to make process documentation and data exchange uniquely challenging.
That mindset is changing as companies work to create unified platforms for chemicals and materials R&D. In an April 2022 Nature perspective, leading materials experts argue that a fundamental paradigm shift toward data-driven materials R&D is necessary for the industry to thrive. They propose that this change is essential to extracting value from a “gold mine” of available research data that has remained largely untapped, despite its potential for use in advanced analytics and AI.
These experts support the adoption of FAIR data principles for materials R&D and explain that there is a great need for supportive data infrastructures and research tools, such as ELNs and laboratory information management systems (LIMS), that will help facilitate the shift toward data-driven materials R&D. In the coming years, the changes brought about by a FAIR data infrastructure are not expected to replace scientists, but scientists who use such an infrastructure will very likely replace those who don’t.
Christian Olsen is a Business Segment Lead at Dotmatics. Previously, he was a Solutions Architect for the Geneious Biologics Antibody Discovery Platform. His background includes infectious disease and public health, bioinformatics, drug target selection, and drug resistance surveillance.