Embracing FAIR Data on the Path to AI-Readiness

The FAIR (findable, accessible, interoperable, reusable) guiding principles were first published in the journal Scientific Data in 2016 by diverse stakeholders from academia, biopharma, funding, and publishing. They establish data-management standards that aim to let humans and computers alike easily find and (re)use data, no matter the volume, complexity, or source of the data.

Key principles of data management achieved via FAIR implementation can have positive effects on many aspects of R&D, from workflow efficiency to collaboration to costs. While machines offer incredible benefits in terms of the speed and scale at which they can process data, they lack the judgment, intuition, and contextual understanding that humans bring to the table, so that context has to be recorded explicitly alongside the data. Making data FAIR is, in essence, making them more accessible to machines and humans alike.

FAIR data requirements help ensure that data assets include all supplemental details (metadata) needed for machines to identify, qualify, and use the data, even data they have never encountered before. The metadata collected can vary and are generally informed by an organization's business rules. Oftentimes, organizations build protocols that require the collection of specific metadata, such as author, date, process and methodology details, statistical data, license requirements, and authorization and access specifications. As such, it’s important to choose a software solution that accommodates this process.
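As a rough illustration of such a protocol, the sketch below checks a data record for required metadata before it is accepted. The field names and the missing_metadata helper are hypothetical, chosen to mirror the examples above; a real organization would derive its own list from its business rules.

```python
from typing import Any, Dict, List

# Hypothetical required fields, modeled on the examples named in the text.
# These are assumptions for illustration, not part of the FAIR principles.
REQUIRED_FIELDS = ("author", "date", "methodology", "license", "access_level")


def missing_metadata(record: Dict[str, Any]) -> List[str]:
    """Return the required fields that are absent or blank in a record."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        # Treat a missing key, None, or whitespace-only text as "not provided".
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Example record with a blank license and no access level.
record = {
    "author": "J. Doe",
    "date": "2024-01-15",
    "methodology": "ELISA, plate reader protocol v2",
    "license": "",
}
print(missing_metadata(record))
```

For this example record, the check reports the license and access_level fields as missing, which is the kind of gap a data-capture tool could flag at entry time rather than months later.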

Process Efficiency and Collaboration Benefits of FAIR Data Management

Implementing the FAIR guiding principles for scientific data management can help improve research efficiency and collaboration, potentially helping bring solutions to market faster. For example, implementing FAIR can:

  • Reduce the need for time-consuming and error-prone manual data handling (e.g., manual collection, pre-processing, cleaning, mapping, and transferring) by establishing more reliable and reproducible automated processes that move data through the various software tools and applications used in the R&D process.
  • Lessen the chances of redundant research efforts by improving data sharing and insight into the work of colleagues, academics, CROs, and collaborators.
  • Help ensure data and research records persist even after staff turnover.

Innovation Benefits of FAIR Data (including AI and ML)

The ultimate goal of FAIR is to support innovation, not just better processes. FAIR can help spur innovation by making it easier for researchers to build on prior knowledge. Additionally, data that are “clean”, labeled, and machine-ready are best suited for advanced analytics like artificial intelligence (AI) and machine learning (ML).

The original FAIR publication clarified this point, saying, “Good data management is not a goal in itself, but rather is the key conduit leading to knowledge discovery and innovation, and to subsequent data and knowledge integration and reuse by the community after the data publication process.” 

The streamlined processes, better collaboration, and faster R&D iteration realized through FAIR data implementation can ultimately lead to cost savings. An analysis by PricewaterhouseCoopers on behalf of the European Commission estimated that a lack of FAIR research data costs the European economy at least €10.2 billion annually. Factors influencing the loss include time inefficiencies, storage and licensing costs, research duplication and retractions, and impeded innovation.

Many companies implementing FAIR principles also implement data-quality assessment metrics at the same time, a combined approach sometimes referred to as FAIR + Q. This unified strategy helps position companies for rigorous regulatory review, where data provenance and quality matter as much as data management and machine-readiness.

In the pharmaceutical industry, the gold standard for data integrity is widely considered to be ALCOA+, which stands for Attributable, Legible and intelligible, Contemporaneous, Original, and Accurate, plus Complete, Consistent, Enduring, and Available.

Organizations looking to “go FAIR” can help prepare for success by educating themselves about barriers they may face, including:

User Adoption and Change Management Challenges

A shift from a “my data” to an “our data” mindset is essential for FAIR to work. Researchers need to be willing to get training and adjust their workflows to accommodate the FAIR data principles. Leadership must be willing to invest in the change and show how the initial extra effort will be worth it in the long run. Choosing technology solutions that are intuitive and flexible will help increase user-level adoption.

Data Diversity and Volume

The variety and volume of both structured and unstructured data types used in R&D mean companies must choose infrastructure and software that can accommodate diverse data types from the past, present, and future. In some cases, work must also be done to retroactively make existing data sets FAIR.

Ontology and Naming Issues

When industry-standard naming conventions don’t exist or are insufficient, the onus to establish consistency may fall upon research teams. A lack of consistency reduces the benefits of FAIR implementation, because searches and analytics return incomplete or muddied results. Finding a solution provider with a flexible solution and expertise in ontologies is key.
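To make the naming problem concrete, the sketch below maps free-text labels onto a single preferred term so that a search for one spelling finds records entered under another. The synonym table and the normalize_term helper are hypothetical and purely illustrative; in practice the mapping would come from a curated ontology or controlled vocabulary.

```python
from typing import Optional

# Hypothetical synonym table: each known variant maps to one preferred term.
PREFERRED_TERMS = {
    "hek293": "HEK293",
    "hek-293": "HEK293",
    "hek 293": "HEK293",
    "cho": "CHO",
    "cho cells": "CHO",
}


def normalize_term(label: str) -> Optional[str]:
    """Return the preferred term for a label, or None if it is unrecognized."""
    # Lowercase and collapse runs of whitespace before the lookup.
    key = " ".join(label.lower().split())
    return PREFERRED_TERMS.get(key)


for raw in ["  HEK-293 ", "CHO   Cells", "HeLa"]:
    print(repr(raw), "->", normalize_term(raw))
```

Returning None for an unrecognized label, rather than guessing, leaves the decision to a curator, who can then add the new term to the vocabulary.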

Financial and Time Commitment

The upfront costs of implementing FAIR may be considerable in both time and money. From technology adoption to process adjustments to training, the change can seem daunting. But FAIR principles can be rolled out strategically and gradually, helping both demonstrate ROI and gain buy-in along the way. A solution provider with industry expertise can help identify specific cases where FAIR implementation will reap the most reward.

Christian Olsen

Christian Olsen is a Business Segment Lead at Dotmatics. Previously, he was a Solutions Architect for the Geneious Biologics Antibody Discovery Platform. His background includes infectious disease and public health, bioinformatics, drug target selection, and drug resistance surveillance.