Scientists at TCS have used AI to design novel chemical compounds that can inhibit the 3CL protease of SARS-CoV-2, which is responsible for viral replication. Ananth Krishnan, Chief Technology Officer, TCS, discusses the research here.
COVID-19 has put the whole world into lockdown, impacting our health, our economy and our day-to-day lives. The pandemic can be traced to one virus, SARS-CoV-2. Around the world, scientists are coming together, putting aside rivalries to design therapeutics that target the virus, and sharing intel, research and formulae online as part of a global effort.
In TCS’ Innovation Lab in Hyderabad, India, a team of TCS scientists has identified 31 molecular compounds that show promise in the search for a cure for COVID-19. The team used artificial intelligence (AI) to focus on one of the well-studied protein targets for coronavirus, the chymotrypsin-like (3CL) protease, which is responsible for the virus’ survival and replication in humans.
In this study, the TCS team used its in-house deep neural network-based generative and predictive models to design novel drug-like small molecules. These models have been validated on a multitude of drug design tasks in which compounds are tailored to a specific protein of interest. The pre-trained, state-of-the-art models were then used to generate small molecules capable of inhibiting the 3CL protease of SARS-CoV-2.
AI: The future of drug discovery
Typically, finding a new drug takes a decade or more with a very low success rate. However, advances in data curation and management have fuelled the emergence of an AI-driven revolution in drug discovery.
AI-based methods are emerging as promising tools to explore chemistry. AI models can learn feature representations from existing drugs and then use them to explore chemical space in search of more drug-like molecules. This has given the drug design community an opportunity to overcome many challenges, including the one we find ourselves in now. Most importantly, an AI-based approach can reduce the initial phase of the drug-discovery process from years to a few days, in part thanks to its ability to process data rapidly.
Recent studies have demonstrated the effectiveness of AI techniques in understanding the known chemical space and generating novel small molecules. These molecules have to satisfy several physicochemical properties to qualify as potential drug molecules. With the advent of AI-based methods, it is possible to design small molecules with the desired drug-like properties.
Using AI to solve a health crisis
TCS created an AI model which was initially trained on a dataset of 1.6 million drug-like molecules from the ChEMBL database, a public database which maintains the most comprehensive collection of drug-like small molecules.
The fundamental strength of AI is that it can rapidly evaluate multiple scenarios with a multitude of parameters while problem-solving. The molecules were represented in Simplified Molecular Input Line Entry System (SMILES) format which enabled the model to learn the necessary features to design drug-like small molecules. But any AI model must first be trained to learn the grammar of the subject language – in this case, medicinal chemistry – before it can start suggesting possible scenarios that could build towards a solution.
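To make the idea of learning a chemical "grammar" more concrete, here is a minimal sketch of how SMILES strings might be split into tokens and mapped to integer IDs before being fed to a sequence model. The regular expression, the function names and the two example strings are illustrative assumptions, not the tokenizer TCS used; a production tokenizer would cover more of the SMILES specification.

```python
import re

# Assumed, simplified token pattern: bracket atoms such as [C@H], the
# two-letter halogens Cl and Br, two-digit ring closures such as %12,
# and otherwise any single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize_smiles(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(smiles_list):
    """Map every distinct token in the data set to an integer ID."""
    tokens = sorted({tok for s in smiles_list for tok in tokenize_smiles(s)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def encode(smiles, vocab):
    """Turn a SMILES string into the ID sequence a model would consume."""
    return [vocab[tok] for tok in tokenize_smiles(smiles)]


# Two small example strings (chosen for illustration only).
examples = ["ClCCBr", "C[C@H](N)C(=O)O"]
vocab = build_vocab(examples)
encoded = [encode(s, vocab) for s in examples]
```

Treating "Cl" or "[C@H]" as a single token, rather than as separate characters, keeps each atom intact, which is one reason tokenization is usually done before training.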
The team used the basic SMILES dataset to train a generative model. The model takes into account the chemical and synthetic feasibility of the drug-like molecules. This general model was then adapted, using transfer learning, to generate small molecules specific to a target of interest, so as to focus solely on the 3CL protease of SARS-CoV-2. Further tweaking of the model allowed the team to produce molecules with optimised physicochemical properties.
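The two-stage idea, general training followed by target-specific adaptation, can be illustrated with a deliberately tiny stand-in. The sketch below swaps the deep neural network for a count-based character bigram model, and every name, string and weight in it is an assumption made for illustration. It shows only the principle of transfer learning: the parameters learned on a broad data set are kept and then updated further on a small, focused one.

```python
from collections import Counter, defaultdict

START, END = "^", "$"  # assumed markers for sequence start and end


class BigramModel:
    """Toy next-character model; a stand-in for a neural generative model."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, smiles_list, weight=1):
        """Add weighted transition counts; calling this again fine-tunes."""
        for smiles in smiles_list:
            seq = [START] + list(smiles) + [END]
            for prev, nxt in zip(seq, seq[1:]):
                self.counts[prev][nxt] += weight

    def prob(self, prev, nxt):
        """Estimated probability that `nxt` follows `prev`."""
        total = sum(self.counts[prev].values())
        return self.counts[prev][nxt] / total if total else 0.0


# Stage 1: "pre-train" on a broad (here, tiny and made-up) general set.
general_set = ["CCO", "CCC", "CCCO", "OCC"]
model = BigramModel()
model.train(general_set)
p_before = model.prob("C", "N")

# Stage 2: "fine-tune" on a small target-focused set. The higher weight is
# an assumed way of letting the few target examples shift the model.
target_set = ["CCN", "NCCN"]
model.train(target_set, weight=5)
p_after = model.prob("C", "N")
```

After the second stage the model assigns more probability to patterns seen in the focused set while still retaining what it learned from the general one, which is the behaviour transfer learning aims for at a much larger scale.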
The trained generative model was used to sample 50,000 small molecules. After duplicates were removed, the molecules were filtered based on their chemical properties, drug likeness, water content and other factors. This resulted in a set of 3,960 molecules. These molecules were further filtered based on their affinity towards the SARS-CoV-2 3CL protease. After virtual screening, a total of 1,333 small molecules remained that could act as potential inhibitors.
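The funnel described above (sample, remove duplicates, filter on properties, then screen on affinity) can be sketched as a chain of simple filters. Everything specific in this sketch is an assumption: the field names, the thresholds (loosely modelled on common drug-likeness rules of thumb), the docking-score cutoff and the placeholder records are invented for illustration and are not the criteria or data used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    smiles: str           # assumed to be canonicalised already
    mol_weight: float     # g/mol, assumed precomputed
    logp: float           # lipophilicity estimate, assumed precomputed
    docking_score: float  # assumed: more negative means stronger binding


def deduplicate(candidates):
    """Keep the first occurrence of each SMILES string."""
    seen, unique = set(), []
    for cand in candidates:
        if cand.smiles not in seen:
            seen.add(cand.smiles)
            unique.append(cand)
    return unique


def is_drug_like(cand, max_mw=500.0, max_logp=5.0):
    """Hypothetical property filter with assumed thresholds."""
    return cand.mol_weight <= max_mw and cand.logp <= max_logp


def binds_well(cand, cutoff=-7.0):
    """Hypothetical affinity filter with an assumed score cutoff."""
    return cand.docking_score <= cutoff


def run_funnel(candidates):
    """Apply the three stages in order and return each intermediate set."""
    unique = deduplicate(candidates)
    drug_like = [c for c in unique if is_drug_like(c)]
    hits = [c for c in drug_like if binds_well(c)]
    return unique, drug_like, hits


# Placeholder records with made-up values, for illustration only.
sampled = [
    Candidate("<smiles-1>", 320.0, 2.1, -8.2),
    Candidate("<smiles-1>", 320.0, 2.1, -8.2),  # duplicate
    Candidate("<smiles-2>", 610.0, 3.0, -9.0),  # fails the weight filter
    Candidate("<smiles-3>", 410.0, 4.2, -5.5),  # fails the affinity filter
    Candidate("<smiles-4>", 450.0, 1.8, -7.4),
]
unique, drug_like, hits = run_funnel(sampled)
```

Running the cheap property filters before the more expensive affinity screen is a common design choice, since it shrinks the set that has to go through virtual screening.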
The team also noticed that the generative model could produce small molecules that are similar to HIV-protease inhibitors, but with better binding to the SARS-CoV-2 3CL protease. Further selection narrowed the dataset down to only 31 molecular compounds. The complete set of promising small molecules can be found here so that anyone can test them against SARS-CoV-2 to help with their research.
Now that the preprint of the research has been made public, the TCS team is working closely with India’s Council for Scientific and Industrial Research (CSIR), which has agreed to provide its labs for the synthesis and testing of these 31 molecular compounds.
Much remains to be done before the process can move from drug design to drug discovery and finally, drug development at scale, but we are incredibly proud of the TCS team and its contribution to such important research.