Simplifying Target Discovery with an Artificial Intelligence Engine and Natural Language Chat

Across the pharmaceutical industry, companies are finding new ways to leverage artificial intelligence (AI) to accelerate and revolutionize drug discovery and development, clinical trials, and other critical activities. Simultaneously, generative AI platforms, most notably ChatGPT, are becoming popular with the public and are being applied in an ever-growing number of contexts. In this Q&A, Petrina Kamya, Ph.D., Head of AI Platforms and President of Insilico Medicine Canada, discusses Insilico’s AI-based software platforms and how adding a natural language–based chat functionality to their PandaOmics target discovery engine enables researchers to probe large, heterogeneous data sets through conversations with the platform.

David Alvaro (DA): To begin, can you introduce us to Insilico Medicine in general and the company’s overall mission?

Petrina Kamya (PK): Insilico Medicine is an AI (artificial intelligence) drug discovery company. We have two discrete but overlapping business models: we develop generative AI–driven software, and we use that software internally to develop our own assets. We license both the software and the assets that we create.

Insilico was established in 2014, having emerged from Johns Hopkins University in Maryland. Since then, we have grown to be a global company with headquarters in Hong Kong and New York and offices in Abu Dhabi and Montreal, Canada. Our R&D center is in Shanghai, and we have a robotics lab in Suzhou, China. We are now located literally all over the world; I believe that we cover all time zones.

In terms of our mission, we are focused on pursuing diseases for which there is high unmet need and accelerating the discovery of new targets and new therapeutics to get them to patients faster, primarily by using AI.

DA: Was that dual business model planned from the beginning, or did it emerge along the way?

PK: Initially, we began developing deep learning algorithms to address many of the challenges associated with drug discovery and development. We put these algorithms together and built three specific platforms: PandaOmics, which focuses on target discovery; Chemistry42, which is involved in small molecule discovery and development optimization; and InClinico, the platform we launched most recently, which we developed for clinical trials planning and outcome predictions, to predict the probability that a program will transition from phase II to phase III.

That was the founding goal of the company. However, we quickly realized that we needed to validate the software platforms in order to show that they truly worked as we intended. To do so, we started to build out our own programs. Beyond successfully validating our platforms, we saw clear value in developing these programs to the point where they can be licensed out, since that becomes another revenue-generating business model.

DA: Across the areas that your platforms support, where are some of the most pressing bottlenecks, data burdens, or shortcomings that are best served through AI approaches?

PK: First and foremost, for target discovery and chemistry more broadly, there is a real need to overcome human bias in terms of identifying novel targets and candidate molecules that could become first-in-class therapeutics. In target discovery, AI will take multimodal data and tease out patterns that can help identify novel targets. Beyond the targets themselves, it can help us understand the pathways and the genes that are implicated in the disease, as well as other diseases that are linked to that disease. AI has a very strong ability to uncover patterns in multimodal data that are otherwise quite difficult for us to decipher on our own.

In chemistry, AI is very good at imagining things without the inherent human bias that we have. Many generative AI technologies have emerged that can be leveraged to discover novel chemical molecules. In addition, we are using other techniques to reinforce active learning to improve those molecules and to optimize them, so that they satisfy certain properties that are necessary for drugs.

In the realm of clinical trial outcomes predictions, the AI models underlying the InClinico platform allow us to again tease out features that affect the probability of the success of clinical programs during the critical transition from phase II to phase III that you would otherwise not be able to identify. Essentially, the core of all of these platforms is this ability to find patterns in multimodal data that are out of reach of human researchers.

DA: For the sake of clarity for those of us who may be a little behind on the AI field, can you explain what is meant by “generative AI?”

PK: In the simplest terms, you can think of “generative AI” as any use of AI to create something new based on the data on which it was trained; essentially, any time AI is asked to generate something. You might be familiar with some of the popular applications that generate text (like ChatGPT), voices, or images; in the same way, you can use AI to generate molecules. It uses neural nets, deep learning algorithms, and so forth, but the aim is to generate something new.

DA: What do you think differentiates PandaOmics from other computational target discovery methods that have been developed?

PK: We have quite a few strong differentiators. For example, we have our time machine approach and our iPanda algorithm. Both of those are used to identify the relationships between a gene and a disease in a manner that is unique to PandaOmics. In addition, we recently added a transformer-based knowledge graph, which is a feature that takes available information related to a disease and maps out all of the connections that are found in the literature. A user can then use this knowledge graph to better understand the relationships among genes, diseases, and medications that are used, the pathways that connect them all, and other diseases as well.

Most recently, we have connected that knowledge graph to a chat functionality based on large language models. I believe that we’re one of the only companies, if not the only company, that has done this with a target discovery engine. This chat functionality, which we call ChatPandaGPT, allows the user to query that knowledge graph and identify what these relationships are, based on exactly what the user would like to know. ChatPandaGPT makes the knowledge graph more accessible and more user friendly, and it makes the information more understandable as well.


DA: This really seems like an important development, since no matter how good AI is at accomplishing things beyond the reach of humans, the results ultimately need to be translated in a way that a human operator can understand. Before ChatPandaGPT was developed, what was the user interface and experience with PandaOmics like?

PK: Things were definitely somewhat disconnected. The knowledge map would be a beautiful image centered on the disease. There would be edges connecting different nodes, which would be a map of other little circles representing genes that are implicated, all of which would be connected. But you’d have to access this information in a piecemeal manner. You’d go: “Oh, this gene is interesting,” and you could click on that and find out more information. You’d be taken to a gene page and find out more information about that gene. And then, depending on what’s written on the edges that connect with the different nodes, the gene might either be upregulated or downregulated. So, the burden would to some extent be on the user to aggregate this piecemeal information and assemble it meaningfully at the end.

In contrast, with ChatPandaGPT, you can just type in as a prompt: “Show me the genes that are implicated in this disease and any other diseases and list the other diseases that are implicated.” And all the information that you are looking for will be listed out for you. Whatever sort of relationship you’re looking to find out more about, you can just type it into the prompt. And the ChatGPT functionality will talk to the knowledge graph, which is very specialized information, and then transform that into a form that is more informative to you and more comprehensive.
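
As a purely illustrative aside, the kind of aggregation described above can be sketched in a few lines of Python. This is a toy example under stated assumptions, not Insilico's implementation: the gene and disease names, the relation labels, and the `build_index` and `summarize` helpers are all hypothetical, and a real system would place a large language model in front of a far richer graph. The sketch only shows how labeled gene-disease edges can be gathered into one readable answer instead of being clicked through node by node.

```python
from collections import defaultdict

# Hypothetical toy knowledge graph: (gene, relation, disease) edges.
# All names and relation labels are invented for illustration.
TRIPLES = [
    ("GENE_A", "upregulated_in", "Disease X"),
    ("GENE_B", "downregulated_in", "Disease X"),
    ("GENE_A", "upregulated_in", "Disease Y"),
    ("GENE_C", "downregulated_in", "Disease Z"),
]


def build_index(triples):
    """Index the edges by disease and by gene for quick lookups."""
    genes_by_disease = defaultdict(dict)
    diseases_by_gene = defaultdict(set)
    for gene, relation, disease in triples:
        genes_by_disease[disease][gene] = relation
        diseases_by_gene[gene].add(disease)
    return genes_by_disease, diseases_by_gene


def summarize(disease, triples):
    """List genes linked to a disease, plus other diseases sharing each gene."""
    genes_by_disease, diseases_by_gene = build_index(triples)
    lines = []
    for gene, relation in sorted(genes_by_disease.get(disease, {}).items()):
        others = sorted(diseases_by_gene[gene] - {disease})
        also = ", ".join(others) if others else "none"
        lines.append(
            f"{gene}: {relation.replace('_', ' ')} {disease}; also linked to: {also}"
        )
    return lines


if __name__ == "__main__":
    for line in summarize("Disease X", TRIPLES):
        print(line)
```

Here a single call to `summarize` stands in for the prompt "show me the genes implicated in this disease and the other diseases they are linked to"; the natural-language layer that turns a typed question into such a lookup is left out entirely.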

DA: In my limited experience with ChatGPT and things of that nature, I’m constantly discovering new features and benefits beyond what I was originally seeking. In the development of ChatPandaGPT, did you begin with certain goals in mind but found that unexpected benefits emerged along the way?

PK: In my personal experience using ChatPandaGPT, I was surprised at how useful it actually is. Initially, I thought that it would essentially ingest information and spit it back out in different ways that would be useful. But I found the interface to be so much more informative, and it simplified the whole process a lot more than I thought it would. In addition to that, we’re now looking into how we can incorporate this technology into our other platforms as well. We’ve found that it’s surprisingly useful, and we’ll see what additional uses and benefits evolve as we go.

DA: Will that process of applying the chat interface to your other platforms, like Chemistry42 and InClinico, be fairly straightforward?

PK: It should be relatively straightforward. It took our team about a week to do it for PandaOmics. That’s very fast, although I won’t say it was easy, and they were able to integrate this very, very cool technology into the platform, where it has proven to be incredibly useful.

DA: Does the chat functionality shift who the potential user can be and make it more widely accessible to a less specialized person?

PK: Absolutely. The work that goes behind creating the disease page and analyzing the data is not for everyone. But once the disease page has been analyzed and created, the ChatPandaGPT functionality definitely increases the usability and the accessibility of this information.

DA: This clearly represents a significant step forward in probing these multimodal data sets. Do you think that there is still a great deal more that can be unlocked here, perhaps as natural language processing itself continues to evolve?

PK: I definitely think so. There is a lot of information out there already, and biology is still not very well understood. We are always trying to investigate diseases in a much more in-depth way, and the etiology, pathology, and epidemiology are still very much unknown for a lot of diseases. The heterogeneity of disease adds yet another layer of complexity.

All of that is data, and everything can be processed and hopefully be used to train a large language model that will help us better understand the biology of diseases. Ultimately, I think we are just at the very beginning of all of this.

DA: I’d love to briefly touch on the second business model at Insilico Medicine. Can you tell us about the therapeutic areas where the company is focusing and to what extent you believe your platforms have enabled discoveries that may not have been possible using other approaches?

PK: We primarily focus on a few therapeutic areas: fibrosis, oncology, CNS diseases, and immunology. Our CEO is particularly passionate about aging, and so a lot of the diseases and targets that we pursue are implicated in aging, such as fibrosis, inflammation, and some of the key pathways associated with aging. There is great synergy in that many of the diseases that we’re investigating, whether they are chronic diseases or diseases that people are suffering from now, involve targets that are also implicated in aging. What would be really cool is to see whether these drugs that we’re developing have that dual effect on patients: on the disease itself but also on people’s lives and their quality of life. I think there is the potential to unlock a lot of interesting outcomes from our pipeline.

DA: It seems like the study of aging aligns very well with what your platforms can achieve, since it is so inherently heterogeneous and has eluded more traditional, conventional approaches to target discovery.  

PK: That’s exactly right. Aging is not classified as a disease, but there are a lot of diseases that develop as the result of aging. In essence, if you’re investigating a disease, you’re looking into an underlying pathway that is probably linked to aging anyway, even though aging itself is not a disease. To date, as a pharma company, you still can’t really just say you’re targeting aging.

DA: You mentioned that you license both your software platforms and your pipeline assets. Particularly with regard to the software, how do those relationships typically work?

PK: We are very flexible and adaptable, and we work with different companies in different ways, depending on their needs. Every pharma company works in a unique way, and many are looking for a partner who can enable them to develop their own pipeline of therapeutics in their own way rather than take over their drug discovery programs. Those companies can license our software.      

Other companies are more interested in bolstering their internal pipelines with additional programs without taking time away from their focus with their internal resources, so they outsource the entire process. We can nominate an initial target and develop everything up to a stage where they are ready to in-license it as a partner.

DA: Since, as we’ve said, we are just at the very beginning of unlocking the potential of AI in drug discovery, clinical trials, and beyond, can you share a bit about Insilico’s vision of the full potential of AI and how you see it revolutionizing all of these areas in the coming years?

PK: Right now, I think that exactly what we set out to do is going to continue happening for some time. There are many, many stages of drug discovery and development, going all the way to commercialization. At the moment, we are just at the very beginning. At every single stage, there are definitely bottlenecks and challenges.

I think that what you’re going to see happening is that more and more of these challenges will be addressed using AI techniques. It’s just inevitable. In most processes there are certain things being done that are redundant, repetitive, or lacking in imagination (not through anyone’s fault; that’s just the way it is). For all of those, you can adapt AI algorithms to help alleviate those bottlenecks, address challenges associated with insufficient imagination, and improve and streamline the process. I believe that’s what’s going to happen in our industry, piece by piece.

DA: In the near term, is Insilico Medicine more focused on further elaborating and tightening up these existing platforms or expanding and applying a similar approach to these different aspects of drug development?

PK: Both! We have thought leaders in the company who are really, really passionate about the products that we’ve created and about further elaborating them. We also have innovators who are always thinking about the next thing and pushing the envelope. I anticipate many developments on both fronts.


Petrina Kamya, Ph.D.

Petrina Kamya, Ph.D., is the Head of AI Platforms and President of Insilico Medicine Canada, overseeing Insilico's end-to-end generative AI-driven drug discovery platform, Pharma.AI, which includes target discovery (PandaOmics), small molecule generation (Chemistry42), and clinical trial outcomes prediction (inClinico). Prior to joining Insilico, Dr. Kamya was at Chemical Computing Group, where she led sales and business development of molecular modeling software for pharma and biotech companies, and then Certara, where she consulted for pharma companies. She holds a BS in biochemistry and a Ph.D. in chemistry.