Real-World Data and Artificial Intelligence are Transforming Clinical Development

Jun 21, 2022 | PAO-06-022-CL-09

Among the many advances in drug development and healthcare that had been proceeding slowly before being significantly accelerated by the novel challenges of the COVID-19 pandemic is the use of real-world data (RWD) and artificial intelligence (AI) solutions in clinical development. In this extended Q&A, Jeff Elton, Ph.D., Chief Executive Officer of ConcertAI, and Pharma’s Almanac Editor in Chief David Alvaro, Ph.D., discuss the promise and rapid evolution of this disruptive technology, key data sources, ethical considerations, and the role ConcertAI is playing as an accelerator of evidence generation.

David Alvaro (DA): How would you characterize historic attitudes toward the use of RWD in clinical trials, and how have things evolved over the last couple of years?

Jeff Elton (JE): The last couple of years have definitely been transformational for this space. If we looked way back at the earliest intersections between RWD and clinical trials, we would find some RWD that was used in consultative conversations with the U.S. FDA to illustrate the standard of care or to explain the study design in light of that standard and the lack of response in the population. In certain cases, for a particularly rare, devastating disease, the FDA had allowed for an external control arm that used retrospective data instead of a standard-of-care control, because it’s not ethical to put vulnerable patients on a standard of care that is known to have little if any benefit.

But, like so many things, the COVID-19 pandemic drove a convergence of multiple factors that led people to view all of this through a new lens. For one, analyses have shown that 60% or more of the clinical trials that were open at the time stopped recruiting new patients, and some patients in those trials actually elected to go back to the standard of care, because the trials became very difficult to run and they still needed to get care. At the same time, organizations that had been planning new studies held off on starting them.

This threat to clinical trials spurred the U.S. FDA into action, particularly the Oncology Center of Excellence (OCE), which emphasized the critical need to get these trials running again, given its mission to reduce cancer deaths and the recognition that such reductions have largely been attributable to new medical entities and protocols. With this public health issue layered on top of the pandemic itself, the determination was made that COVID-19 therapeutics and vaccines, along with oncology studies, took priority over other clinical studies, because they held the most lives in the balance. During this period, the FDA invited sponsors to redesign trials and protocols in whatever way they felt was most responsible and indicated the agency’s openness to alternative ways of reviewing those studies, in particular by viewing RWD in a new light. Until that point, RWD had mostly been used by medical affairs and post-approval teams rather than clinical development organizations, but those organizations quickly became some of its largest users, building up data science teams to enhance their ability to glean insights from RWD.

Today, RWD has begun to profoundly inform study design, helping clinical researchers better understand the initial population, how to expand trials from a single targeted indication to others, how to select a setting for a trial, how small the footprint of a trial can be while still generating the critical data and the statistical power it needs, how best to select and articulate endpoints, and many other factors — all of these different ways to accelerate study outcomes and advance the medicine onto the next phase or toward approval.

Contemporaneously, we saw the development and implementation of new methodologies, tools, and solutions, including our focus on using AI and machine learning (ML) to derive insights from these RWD and inform trial design. Among other things, AI/ML tools can help predict the likelihood that a particular site will be able to access the appropriate patients and effectively run a study. Many of these tools were already in play to some extent, but the pandemic was really the catalyst that drove wider adoption and accelerated further development.

DA: The term RWD encompasses a wide range of data sources, from electronic medical records (EMRs) to wearables and other tools that capture data in real time. For the incorporation of RWD into clinical trials thus far and what seems likely in the near future, what are the most important sources of data, and can they be integrated?

JE: Going back far enough, the form of RWD that was used most often was probably medical claims data, if only because they’re accessible and machine readable, since they have to be processed so that people get paid and reimbursed for care. However, for many diseases, these data were pretty limited, lacking the depth of clinical information needed for a deeper understanding, although they did provide a landscape contour of what’s going on in healthcare.

Then, beginning maybe seven or eight years ago — which feels like an eternity given the pace of evolution over the last couple of years — EMR-derived data began to emerge from “chart reviews,” which involved pulling a chart or reading a screen and then manually filling out a form to create a little database. At first, this was very awkward and disorganized, but then interoperability standards began to be established, which made a huge difference in how these data could be organized and integrated.

Taking things another step further, molecular diagnostic data — including whole exomes and transcriptomes for different diseases — have become more accessible, as have data from radiological images. Rather than reading notes of a physician’s interpretation, it is now possible to review the actual image data. Of course, these are really huge data sets, which created a new set of challenges.

In the last few years, particularly during the pandemic, decentralized trials became much more widespread, and these required the ability to acquire data in a remote setting, including using wearables and related technologies. This virtual data collection is supported by a cloud environment that allows the integration of data from multiple devices. Remote data capture creates some totally novel challenges, including determining how to authenticate whether the data is truly coming from a particular study subject, but a range of technologies, from geolocation to biometrics, has provided effective solutions. This accessibility and velocity of data and the ability to collect data contemporaneously or even retrospectively create new opportunities for prospective design and new ways of configuring studies. Some of that was beginning to be explored 3–5 years back, but it didn’t really gain traction until the pandemic.

If you remember the old concept of disruptive technologies, they typically start off as inferior solutions in a well-established field, but they get better over time and improve in cost–performance along a very different curve from the legacy solution. That’s what’s happening now with a lot of these digital trial solutions — they started off merely okay and in need of optimization, but they got better really fast and are continuing to accelerate. At this point, most of the large pharma and biopharma players don’t want to return to those legacy approaches. They are working on maturing some of these new solutions that have robust cost performance and can be executed with a leaner footprint and in a shorter period of time, which will have real benefits for patients and will move innovations forward rapidly.

DA: Clearly, getting the most out of these big data sets, like omics and image data, goes beyond human capabilities. Is this where technologies like AI, ML, and natural language processing (NLP) become critical?

JE: Historically, our classical and even neoclassical statistical methods have begun with developing a hypothesis, after which you determine the structure of the problem and collect data appropriate to that hypothesis. In contrast, AI and ML are designed to let the data determine features, patterns, and relationships — agnostic to the existing knowledge in the literature — and to do so with a speed, fidelity, and ability to attribute causality far beyond what has ever been possible with model-based approaches. However, none of this is really at odds with those legacy approaches: rather than replacing old models, it’s more appropriate to think of these tools as powerful expansions of the repertoire.

As an example, at ASCO this year, ConcertAI will present work that we have done exploring severe cardiac adverse events in a particular treated oncology population. The patients exhibiting these events present as one subgroup, but on closer examination, there are actually two discrete groups with similar incidence levels, one associated with comorbidities and the other with immunological factors. In the past, following classical analytic approaches, the two groups would have been treated the same, but now they can be selected for treatment and monitored in different ways.

We can use the same tools to design studies to improve representation, especially for diseases like prostate cancer or multiple myeloma that disproportionately affect Black Americans. We can leverage these tools not only to ensure that designs don’t unwittingly exclude those populations but also to determine targeted enrollment numbers that provide statistical power for different ethnic, racial, and economic subpopulations, so that the results are meaningful to them as well.

The utility of incorporating these tools into clinical development is very clear, and they will soon become integral to clinical development processes. I would estimate that, within the next three years, 75% or more of patients who participate in a clinical trial will actually have been matched to that trial through AI- and ML-based tools. This matching will involve reading records, examining features, exploring structured and unstructured confederated data, and returning a probabilized and selected view of why each patient may be a good match. In the end, these results will still likely be reviewed by a human before a final decision is made, typically in consultation with the patient.
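To make that concrete, below is a minimal sketch of what one AI/ML-assisted matching step could look like; the patient features, eligibility criteria, and scoring logic are hypothetical illustrations, not a description of ConcertAI's actual system.

```python
# Hypothetical, simplified patient-trial matching step (illustration only).
from dataclasses import dataclass

@dataclass
class Patient:
    patient_id: str
    age: int
    diagnosis_codes: set[str]   # structured data, e.g., ICD-10 codes from the EMR
    note_concepts: set[str]     # concepts extracted from unstructured notes via NLP

@dataclass
class TrialCriteria:
    required_diagnoses: set[str]
    min_age: int
    max_age: int
    exclusion_concepts: set[str]

def match_score(patient: Patient, criteria: TrialCriteria) -> float:
    """Return a 0-1 score indicating how well a patient fits the trial criteria."""
    if not (criteria.min_age <= patient.age <= criteria.max_age):
        return 0.0
    if patient.note_concepts & criteria.exclusion_concepts:
        return 0.0
    overlap = len(patient.diagnosis_codes & criteria.required_diagnoses)
    return overlap / max(len(criteria.required_diagnoses), 1)

def rank_candidates(patients, criteria, threshold=0.5):
    """Rank patients by score; a clinician would still review before any decision."""
    scored = [(p.patient_id, match_score(p, criteria)) for p in patients]
    return sorted((s for s in scored if s[1] >= threshold), key=lambda s: -s[1])
```

In a real system the score would come from trained models over far richer, confederated data, but the workflow is the same: surface ranked, explainable candidates for human review rather than enrolling anyone automatically.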

DA: Do you foresee roles for RWD and these AI and ML tools across the entire cycle of clinical trials: from recruitment and enrollment all the way through to post-launch?

JE: Potentially, although we won’t see the same movement across all these activities. Things like patient matching and upfront screening are usually performed as a specialized activity in the workflow of a healthcare provider, and although there are some things that need to be worked out, like obtaining patient consent to release data to a third party under HIPAA, I expect that to be an area of very high innovation with these tools. Once a trial is underway, there will be some touchpoints where AI can assist in processing data. In recent guidance documents, the FDA has expressed an expectation that the use of AI and machine learning will increase.

However, there remain some key traditional approaches that assure randomization and ensure that the nature of the controlled trial and the way in which patients are selected do not introduce biases, and I think it’ll be some time before patient selection algorithms are entirely AI- and ML-based. That will come, but more slowly, particularly with the more Bayesian-style trial designs, where AI- and ML-based decision tools can support adaptive trial structures and make them more robust; for example, assessing whether a patient is responding and determining the next steps or things to monitor for both responding and non-responding patients.
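As a flavor of that Bayesian-style, adaptive idea, here is a generic sketch of Thompson sampling over beta-binomial response rates, a textbook adaptive-randomization technique rather than any specific sponsor's design; the arm names and response counts are invented.

```python
# Generic Thompson-sampling sketch for adaptive allocation (illustration only).
import numpy as np

rng = np.random.default_rng(1)

# Observed responses so far: (responders, enrolled) per arm -- synthetic numbers.
arms = {"control": (8, 40), "experimental": (14, 38)}

def allocation_weights(arms, prior=(1.0, 1.0), draws=10_000):
    """Estimate the probability each arm is currently best; use it to weight allocation."""
    samples = {}
    for name, (responders, enrolled) in arms.items():
        a = prior[0] + responders
        b = prior[1] + (enrolled - responders)
        samples[name] = rng.beta(a, b, size=draws)
    stacked = np.column_stack(list(samples.values()))
    best_counts = np.bincount(stacked.argmax(axis=1), minlength=len(arms))
    return dict(zip(samples.keys(), best_counts / draws))

# Probability each arm is best given the data so far, used to skew new enrollment.
print(allocation_weights(arms))
```

A design like this shifts enrollment toward arms that appear to be performing better as data accumulate, which is the kind of adaptive structure that decision-support tools can help monitor.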

Additionally, the data packages that come out of clinical trials will be able to feed into provider workflows to determine whether a patient is likely to respond to that medicine, essentially establishing digital AI biomarkers that identify patients who should be considered for that treatment. There have already been many advances in AI and ML that enable using radiological imaging data for both diagnosis and trial eligibility — in many cases performing better than human experts or even panels of experts, although you’d still want an expert to review the results of the model and perform the final sign-off. Some of the biggest opportunities and the biggest challenges arise from integrating multimodal data, such as diagnostic imaging data and genomic data.

DA: Equity in clinical trials has long been an issue of interest at Pharma’s Almanac. Can you discuss the Engaging Research to Achieve Cancer Care Equity (ERACE) initiative, why this issue is important to ConcertAI, and how that intersects with the tools we have been discussing?

JE: We began this work by asking some foundational questions: Where do we get data from? How do we know that our base data is representative of the people that will ultimately be considered in clinical trials or access these medicines? Is it even possible to start any study without some kind of bias?

How we conceptualize these ideas is important. Equity is a principle that needs to be an integral and mindful aspect of everything you do rather than something you add on at some point. We foreground these principles in the design of every trial — Who is disproportionately negatively affected? Are diagnostic activities equitable across populations? Will data and evidence aid in treatment decisions? — and integrate them with our technologies, data sets, and everything else.

Another important consideration is that, although achieving equity will involve solving some technical or quantitative problems, trust is also a big piece. Historically, there has been a lack of trust among Black Americans regarding clinical trial participation, unfortunately based on some very valid justifications. Finding investigators who are in the communities, are good stewards of their patients’ interests, and are able to establish a strong rapport is also critical to broadening access. We’re trying to be mindful of that very human side of trust by extending networks of investigators to help ensure that trials are both equitable in their design and accessible in practical terms, including establishing trust and selecting settings close to patients’ families and communities, where they can stay engaged with the trial and maintain a high comfort level without disruption to their lives.

ConcertAI is active in all these areas, and my colleagues internally have a great deal of passion about these issues. We feel that, if you interact with data and are associated with healthcare, there’s a moral compass that you have to be guided by. It’s a systemic problem, but we’ve found that, if you’re willing to take a leadership role, others who may not be willing to lead are willing to support and be active followers, and that’s how we will solve these problems.

DA: Going back to what you said about the life cycle of disruptive technologies — from the initial development of the tech through a cascade of acceptance by different stakeholders — where do you think we are in that process, and who else needs to buy in to realize the full potential?

JE: You can visualize this as a barbell, with biomedical innovators on one side and innovative healthcare providers on the other, both pushing for these innovations, but with less activity in the middle section that needs to link the two. That middle comprises a variety of players: contract research organizations, regulatory agencies, payer communities, and so on. But ultimately the entire system needs to align to move things forward.

Some of the key building blocks are already in place. You can now find the term “data science” in almost all large pharma and biopharma companies, but the notion of data science as a core capability of a life sciences company is pretty new. Some companies are even moving biostatistics into the data science team and viewing data science as the overarching lens, with AI and ML as a key data-centric approach that improves very rapidly over time. This is a very positive development: surrounding every biomedical innovation is a whole new suite of insights that have come through a range of new methodologies, which are getting validated as they’re going through these studies. The literature has really begun to reflect all the ways that AI and ML are aiding drug development and clinical research.

The FDA has built up a strong data science team with AI and ML expertise, and they are issuing guidance documents to suggest sensible approaches. The FDA really depends on biopharma, clinical trial sponsors, and providers to bring new applications to them to evaluate, but they now have the expertise to interact with them. At the same time, the provider community — a good example would be the Geisinger Health System — is increasingly creating high-level positions centered on bringing AI- and ML-based approaches into their clinical workflows. For many groups in radiology, oncology, and pathology, this evolution can’t happen fast enough, given the growing complexity of their work and the shrinking number of clinicians.

Another important development is that all of these stakeholders — innovators, regulators, providers — are pivoting to cloud infrastructure, which makes it easier to deploy and access data and introduce the newest tools and will accelerate innovation even further. Over the next 5–10 years, I think that technical innovation and modeling are going to converge to produce a quantum increase in what is possible.  

DA: Is the foundational AI/ML technology already in place, such that it’s mainly a matter of curating and introducing data and refining models, or is there still work to be done on the core underlying technology?

JE: Let me foreground a really crisp answer upfront, and I’ll give you the amplification afterward: tools are only as good as the manner in which they’re deployed against the classes of problems you present to them. You may have heard the term “unsupervised learning,” which means eliminating constraints on how the AI/ML models are deployed and just letting them find all the relationships that they can over millions or billions of iterations. It’s very cool, but it often identifies relationships that have no meaning in disease or healthy organ system biology, so you need to root out lots of spurious findings.

As a result, for certain classes of problems, you add a semi-supervised component, which means that you restrict the process to an area — say, pathway biology — so that, while the domain is still huge, the findings need to have a meaning in biology. These open, generic tools need to be refined using human expertise that is specifically relevant to the subject being analyzed. So, rather than a single trillion-dollar AI entity, I see the field ending up like a pointillist painting, with very defined classes of problems and a gradual effort to fill in the space around each class with very precise, purposeful, and accurate ranges of solutions. That reflects how science and all human learning really work, with very discrete, purposeful learnings gradually coalescing into a unified framework — in this case, a confederation of approaches.
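As a loose illustration of that semi-supervised restriction, the sketch below clusters synthetic patients using only a hypothetical curated pathway gene set rather than every available feature; the gene names and data are placeholders, not real findings.

```python
# Illustration only: constraining an unsupervised analysis to a curated pathway.
import numpy as np
from sklearn.cluster import KMeans

# Expression matrix: rows = patients, columns = genes (synthetic data).
rng = np.random.default_rng(0)
all_genes = [f"GENE_{i}" for i in range(200)]
expression = rng.normal(size=(100, len(all_genes)))

# Semi-supervised step: keep only genes in a (hypothetical) curated pathway,
# rather than letting the model search all relationships unconstrained.
pathway_genes = {"GENE_3", "GENE_17", "GENE_42", "GENE_88"}
keep = [i for i, gene in enumerate(all_genes) if gene in pathway_genes]
constrained = expression[:, keep]

# Cluster patients within that constrained, biologically meaningful feature space.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(constrained)
print("Patients per cluster:", np.bincount(labels))
```

The constraint narrows what the model can find, but whatever it does find can be interpreted in terms of the chosen pathway, which is the trade-off against fully unsupervised searching described above.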

DA: As a relatively young company in a young field that is evolving so quickly, has ConcertAI seen rapid shifts in your vision or mission or the scope of the problems that the company wants to solve?

JE: ConcertAI was formed in the latter part of 2017, and we continued to scale through 2018 and 2019, developing most of our data as a service (DaaS) at scale and beginning to introduce our software as a service (SaaS) and solutions that provided AI insights into different cancers. Then, of course, the pandemic hit in March 2020, which actually had a big impact on our progression and allowed us to scale considerably over the last two years. We always planned to work on clinical development to help advance new medical innovations more rapidly, but we found a rising need, with both the provider community and our biopharma partners asking us to really accelerate everything on our roadmap.

During that period, we started developing the concept of becoming an evidence-generation system and an accelerator of evidence generation. I mentioned earlier a visualization where you have biopharma innovators on one side and the provider community that receives those innovations on the other, and we see ourselves as the accelerator connecting the two sides. A part of our role is taking novel innovations and making sure that they get to the right patient to allow us to understand whether they can offer benefits, and then continuing to accelerate that process. Then, once those innovations are commercially available, we assess who else they can benefit, accelerating that evidence generation, and constantly increasing the confidence around it.

Just this moment as we are speaking, I was notified that we advanced our first patient on our first prospective digital trial solution; we’re now running prospective trials in addition to doing retrospective work. So, we’re not only taking technologies, building them into the workflow of healthcare providers, identifying patients, and matching them to trials for eligibility, but also building the file for running the actual study itself.

We are guided by a number of principles, including that data should only be entered once; that we should decrease the burden on healthcare providers and patients; and that clinical research should be no more complicated than receiving the standard of care itself — essentially making the patient and the provider indifferent as to whether care is being delivered through a clinical trial or with an approved, standard-of-care medicine. Ever mindful of the end-to-end ecosystem of components and infrastructure that is needed, we’ve sought strategic partnerships throughout. It is a cooperative world, and we’re driven by a commitment to collaborating and cooperating with anybody who can help advance things and resolve a patient need. I believe that we have some of the highest-quality RWD in oncology, spanning research, molecular data, imaging, EMRs, and medical claims data.

We have what may be the most sophisticated solutions for designing and running clinical trials and ensuring that we can find those patients through clinical diagnostics activities. At the end of the day, we’re a data science organization, and our role is to assemble the data and technology and put them in front of the right experts to move the innovations and the whole field forward and to treat patients more precisely and more effectively. So, that’s where we are in a nutshell. It's been an amazing period of time.

During the pandemic, the need to be 100% digital was a huge catalyst. As we see with nonlinearities and “big bang” events that shock systems, only extremely rarely do such systems end up returning to their original state. Pieces of this transformation had been in place for a long time (I think eSourcing in clinical development goes back 17 years), but it took a big shock to get organizations to abandon the old ways in favor of the new. But once that happens and they discover that the new ways are much more effective, everyone is able to embrace the disruptions and make them the new standard going forward.

Another important thing to remember in all of this is that, when a clinical development organization can get trials done twice as quickly, they don’t use that as a reason to cut their budget — they use it to advance more trials. All of this work is ultimately about opening up the aperture to allow more innovations to advance, to determine more rapidly which are meaningful, and then to deploy them more rapidly. It’s not just about speed and efficiency. The scale and the scope of the innovations that we can put through that system increase substantially. That’s an enormous opportunity and something everybody should rally around and get super excited about.