Integrating Experimental Data and Artificial Intelligence to Accelerate Drug Discovery

Leading preclinical contract research organization Charles River Laboratories and human data–centric artificial intelligence (AI) technology provider Valo Health recently announced the launch of Logica™, a collaborative AI-powered drug discovery solution that leverages the expertise and experience of both partners to rapidly deliver optimized preclinical assets, at both the advanceable lead and candidate stages, to pharma clients. Charles River’s Executive Director of Business Development, Early Discovery, Ronald Dorenbos, Ph.D., and Valo Health’s Vice President of Integrated Research, Guido Lanza, discussed Logica, the inefficiencies in drug development it seeks to overcome, the underlying business model, and what the future holds for the partnership with Pharma’s Almanac Editor in Chief David Alvaro, Ph.D.

David Alvaro (DA): To start things off, can you tell me about the inception of the partnership between Charles River Laboratories and Valo Health and why both organizations felt that there was a potentially productive synergy between them?

Guido Lanza (GL): I don’t believe that I’ve ever been a part of a partnership where the vision and the framework for achieving it came together as quickly as what occurred between Valo and Charles River. I think that was possible because the idea had already been incubating for a very long time, essentially independently at each company. Charles River had a vision of undergoing a very deep digital transformation that would enable the company to combine, unite, and extract more value from the data that they generate across their operations, which would unlock some significant new opportunities.

There was a complementary vision on the Valo side. While Valo is a relatively young company, the relevant digital platform for computational drug design was built in part through acquisition of another company called Numerate, of which I had been CEO. We felt that if we could figure out a way to partner with a data-generation powerhouse like Charles River, we could overcome some of those bottlenecks.

Ultimately, setting up our first meeting to discuss what our combined capabilities could offer was the trickiest part. After that, it was a smooth journey to establish the actual model, what the offering would look like, and the benefits to the customer.

Ronald Dorenbos (RD): Over the last 10–15 years, we’ve seen hundreds of millions of dollars poured into the industry, with lots of AI companies trying to perform drug discovery and development using AI alone, which has not been particularly successful. While AI unlocks all kinds of new possibilities, it’s clear that it’s not sufficient on its own for productive drug discovery. This collaboration was designed to advance to the next logical step: Valo brings extensive AI expertise from the chemistry perspective (because a lot of people there are also chemists themselves), which combines with the powerful data generation and experimental engine that Charles River has. Charles River provides an enormous arsenal of capabilities that are unmatched by any other company in the world. By combining the considerable traditional drug discovery capabilities of Charles River with Valo’s AI expertise and technology, we could create a very effective platform that could really advance drug discovery, which we have named Logica.

DA: Can you expand a bit about the conventional drug discovery and development process and where you see the most critical bottlenecks or inefficiencies that inspired the platform and why this combination of traditional discovery and AI is the most sensible way to overcome them?

GL: I’ve been working in the AI space for over 20 years. If you look at the history of the deployment of AI or machine learning (ML) and where it has had an impact, most has occurred within the traditional siloes of the pharma industry: image analytics as a screening platform, virtual molecule design, and so on. The data and the algorithms unlocked a lot of new possibilities, but they operated within the traditional chevrons. As a result, we saw a great opportunity to rethink the whole paradigm by removing the traditional chevrons and focusing on the real moments of value generation.

I would argue that there are three key moments of value. The first is performing some magic biology — omics or the like — and finding a target. The next is a chemistry that allows you to test a hypothesis that is advanceable and patentable. And the third is the moment in which you have a candidate that is ready to enter IND-enabling studies and beyond on the road to the clinic. At every point in between, you can’t really be certain how close you are to that value. So, we wanted to take a step back to focus on defining those value-generation points and assessing where we are underutilizing data that could increase our chances of reaching those points.

AI essentially provides a means of cheating and looking into the future, or at least a good simulation of it. For example, if we have a good model for tox studies, we can simulate the results of those studies much earlier, which reduces uncertainty or improves the odds of success downstream. If you break down siloes, you can use the data about future success and failure to inform decisions today. AI lets you melt away those chevrons and think about data as something more fluid that supports the reduction of uncertainty, which can allow you to apply totally unrelated data from a different project to guide your decisions.

I can’t imagine a greater data generation platform than Charles River, which supports more than 1,300 IND programs every year. We just needed to figure out how to unlock the value in those data for future programs to increase the chance of success, or at least to help programs to fail fast and early rather than later and at greater cost.

RD: We see Logica as version 3.0 of applying AI to drug discovery. Version 1.0 had a very narrow problem scope, a siloed approach, an inability to extend the analysis beyond the initial problems, and no intentional, large-scale data generation. Version 2.0 added a limited amount of data generation, as well as expansion into broader problem categories and some wet lab access. With Logica, we are breaking down those siloes across early drug development, integrating wet lab work with the AI capability, and focusing on cycle numbers and data intentionality to, as Guido was saying, predict a likely future as early as possible.

DA: With Logica, were you looking to tackle all relevant pain points in drug discovery or begin with some low-hanging fruit and then build up to more complicated challenges?

RD: Before we start a project with a pharma or academic partner, we take a good look at the target, because targets come in all kinds of different varieties, from ones that are easier to approach, like kinases, down to RNA, epigenetic, and more exotic targets. Across more than 25 projects, we have had a success rate of more than 90%. Logica has processes and methods that enable it to work on various types of targets of varying difficulty levels.

While the technology is essentially target agnostic, we always perform a feasibility study at the start of a project to determine which targets we feel comfortable pursuing, because we don’t want to get involved with a project that our platform and our experience indicate has a very low chance of success. To that end, it helps that Charles River is such a large organization, with around 18,000 people, including many with 15–20 years of experience working at the major pharmaceutical companies, like Pfizer, Novartis, Merck, AstraZeneca, and GSK. That experience helps with the feasibility studies but also in navigating certain challenges and bottlenecks.

GL: Over time, the offering will get better and better: the output quality will improve, and the time will be reduced. At a high level, this all helps better align the CRO model with what the customer wants: they want the best product as fast as possible, we do better when we can make that happen, and doing so improves our platform so that the results are even better in the future.

In traditional drug discovery, you typically start off by running a screen, and then build your set within the universe of compounds that comes from the screen, analogs of those compounds, and so on. What we do is a little bit different; we see three parts to the process. The first is the generation of data to train the model. It’s great if you can understand the chemotypes and get to some starting point, but what you really want is information as the very first path. That whole universe is flattened in our mind because it’s a data-generation universe.

The second step is to unleash that on very large spaces of chemistry that are bespoke for your problem — if the first space is tens of billions of compounds, you want to go even larger on your second space, evaluating hundreds of billions or trillions of compounds specifically designed for your problem. Then, third, you want to pick the series that are most advanceable, because you’ve made millions of virtual analogs of those and simulated your future against models of all the things that can go wrong. Some series are of course intrinsically going to be better than others, so the ability to measure that a priori sets you up for success later. This provides a significant quality advantage because you’ve looked at so much more information about that series than people typically would.

RD: It’s critical to start with the highest quality and value of compounds. Obviously, clinical trials are coming further down the road, and a lot of molecules will eventually fail in trials, but if you can increase the chances of success even by only a tiny amount, that will have tremendous benefits. So, it’s not just a case of better molecules but of an increased success rate further down the road to help get these molecules to the market and the patient.

GL: We work with clients ranging from early seed companies all the way to big pharma, and they have very different drivers: the pipeline, the timing, or the cost. We offer a model that is very transparent and very straightforward: six to nine months for the first phase to get to the advanceable lead — which we call Logica-AL — and then another 12 to 18 months to get to the IND-enabling candidate that is ready to go into GLP tox and safety — which is Logica-C.

The whole trajectory of going from scratch to an IND-enabling molecule traditionally takes at least 36 months. Logica can get there within 18 months; if we run into some challenges or need to set up special assays, that may extend to 27 months, which is still significantly faster than the traditional method. And being able to reach one critical conclusion in six to nine months and the second in another 12 to 18 months is very attractive.

DA: As you mentioned before, the more data that you put into an algorithm like this, the more refined and accurate it becomes. To that end, are you able to leverage data from customer projects to feed back into the platform, or do you run your own internal experiments to generate data?

GL: There’s a continuum. Some customer data is pre-competitive, and some is not, so there are some kinds of data that customers are quite willing to share and others that they generally are not. In some cases, we have to generate our own data or import published data.

The questions “Can I use the data to learn from?” and “Can I see how my model did?” are very different. Both Charles River and Valo have a lot of experience handling confidential customer data and building the appropriate firewalls, which helps customers be confident that we will only use their data in approved ways. Of course, many customers see the value of more people sharing data and how that benefits their projects and are very happy to share what isn’t hypersensitive.

DA: Can you explain the business model underlying the Logica platform?

RD: We use a risk-sharing model where the cost is tied to success and the creation of value, which aligns incentives with the customer. Rather than charging on the basis of the number of experiments run or the hours needed, most of the payment is tied to those moments when the customer receives real value. We typically divide things into the two phases we discussed, but everything that is needed to reach the advanceable lead series is included in the milestone payment for that phase. Sometimes people ask us how many FTE hours they get for the price, but that’s not really a relevant question, because it’s as important for us as for the client to reach the milestone — that’s how we get paid and how we advance to the IND-enabling phase. If the client then wants us to pursue optimization, there is a continuation payment, which is typically higher than the payment for the first phase, because this second phase requires more lab work, chemistry, and animal experiments. Then, after we spend another 12–18 months to get to an IND-enabling candidate that is consistent with the target product profile and the specifications that were agreed on at the beginning of the project, there is another milestone payment. Finally, clinical milestone payments and royalties will come into play when the candidate moves through the clinical phases and goes to market. The client’s success is our success, and we keep everything very straightforward and transparent.

DA: What response have you seen from the market? Has it been relatively easy to convince potential customers of the value of this approach?

RD: There is a great hunger for a value-based offering in the small molecule discovery space. At the BIO International Convention in June, I spoke with many people who were really excited about this new approach to drug discovery, and we are in further conversations with many of them. We are in discussions with big pharma companies, venture capital firms, small biotech companies, and seed companies from universities, and, across the board, people are enthusiastic and see Logica as a great model that could fit into their strategy.

I have not really heard anything negative, but we are relatively new and still need to build our track record and our history together to match the very strong individual track records of the two companies.

DA: Assuming that Logica leads to the optimal outcomes you envision and ends up in widespread use, how transformative do you think it could be for the industry as a whole and the ways that drug discovery is conducted?

GL: At the moment, there’s an interesting economic argument to be made for Logica as a small molecule generation engine for various types of things. If we can consistently fix the uncertainty in small molecule discovery, you’re re-empowering the people that are doing the earliest work — if we can level the playing field on chemistry, then biology will become the dominant piece.

Those who can best define human disease and translate those definitions into preclinical models will be the winners of the future, not those who have the biggest libraries. For example, Parkinson’s disease is currently defined by the FDA and the ICD-9 code as a single disease, but in reality it’s probably 50–100 different diseases. Those that can better define targeted subpopulations and develop specific compounds against them will benefit. But before you begin, you can focus on your translational path and your translational journey and establish a patient ID for a compound even before running a screen. If we can define that journey in a frictionless way, we totally change the economics by dramatically increasing the probability of success (POS) and quality of the compound, which empowers the people that are defining the disease as the value-generation hub.

That’s where I think AI is going to make the biggest impact after Logica, because that’s where you have complex, high-volume data reflecting all the omics on a per-patient basis. To me, that’s the really exciting development a little way down the line.

RD: Everything boils down to getting better medications to patients faster. What Logica can unlock is the ability to make the whole process more efficient and more economically attractive and to operate as a well-oiled machine, where you also can consider targets that you normally would not consider because of cost concerns. As Guido often puts it, Logica is “democratizing” the process of drug discovery and the AI capabilities for a much wider audience, and the whole world will benefit from that.

DA: Before we wrap up, is there anything you can share about what might come next for Logica or for the partnership more broadly? Is it possible to build on this success and tackle large molecules?

GL: We are looking at other modalities, although we can’t disclose anything right now. Beyond that, we want to advance the concept to predict ever-more complex phenomena. That intersects with the need to avoid or minimize failures and how they become costlier the later in discovery that they occur. We continue working on determining the best sources to inform our design decisions. Where we really want to push the envelope is in making sure that we’re not modeling intermediate steps that are poor proxies for the ultimate goal but instead finding the best proxies and focusing there.

RD: The expansion of our platform will be tied into what’s happening in the rest of the field. Lots of groups are applying AI and machine learning to particularly complex biological systems: neuroscience, gastrointestinal disease, oncology. These technologies can read all relevant published manuscripts and become much more effective at natural language processing, which will also lead to new insights and new targets, which can be combined with our efforts to use these targets to develop new drugs. I think we will probably see that happening more in the future.

Another potential impact suggested by the insights we’ve gained from the AI models that interact closely with the wet lab work at Charles River is that Logica can also help reduce the wet lab work that needs to happen. We can scale down the numbers of the animals that need to be used for these kinds of studies and some of the assays, even getting rid of some assays altogether, because the AI can predict the result without needing to perform even one experiment. I think that any of these adjacent areas where a lot of development is occurring will become very important to how Logica develops into the future.