Researchers from Nanyang Technological University (NTU) are applying variational graph encoders as an effective generalist algorithm in computer-aided drug design (CADD).

Selecting ideal drug candidates from a chemical space of over 10⁶³ molecules is extremely challenging. High failure rates in clinical research add to the difficulty and make drug development daunting in both time and financial cost. Although CADD has significantly reduced these costs, many issues in practical research remain unresolved.

Currently, many models specialize in predicting a single property of a chemical compound, such as solubility. Existing CADD techniques focus on discovering and optimizing small-molecule drugs, but predicting and optimizing their pharmacological and toxicological properties remains unsatisfactory.

NTU researchers therefore set out to find a single model capable of predicting multiple properties simultaneously, with the goal of speeding up the drug discovery process.


The Research

Molecules used in everyday drugs vary in shape and size. An estimated 10⁶³ unique compounds could potentially be used as pharmaceutical drugs. However, molecules are discrete structures of atoms and bonds, so they cannot easily be represented as continuous numerical quantities.

In this research, the team used a variational graph encoder to map each molecule, treated as a graph of atoms and bonds, into a continuous numerical (latent) space. As a result, every molecule can be represented by just 64 numbers.
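The article does not describe the encoder's architecture. As a rough illustration of what "a variational graph encoder that outputs 64 numbers" means, the NumPy sketch below runs one graph-convolution layer over a molecule's atoms, mean-pools them into a single vector, predicts the mean and log-variance of a 64-dimensional Gaussian, and samples from it with the reparameterization trick. The layer sizes, the atom featurization, and the name `encode_molecule` are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

LATENT_DIM = 64  # the article reports 64 numbers per molecule


def encode_molecule(adjacency, atom_features, weights, rng):
    """Encode one molecular graph into a 64-dimensional latent vector.

    adjacency: (n_atoms, n_atoms) 0/1 bond matrix.
    atom_features: (n_atoms, n_features) per-atom descriptors (hypothetical featurization).
    weights: dict of hypothetical, untrained layer weights.
    Returns (z, mu, log_var); mu is the usual deterministic embedding.
    """
    # Add self-loops and apply symmetric degree normalization (basic graph convolution).
    a_hat = adjacency + np.eye(adjacency.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # One round of message passing between bonded atoms, followed by ReLU.
    h = np.maximum(0.0, a_norm @ atom_features @ weights["gcn"])

    # Mean-pool atom embeddings into a single graph-level vector.
    g = h.mean(axis=0)

    # Variational head: mean and log-variance of a diagonal Gaussian over the latent space.
    mu = g @ weights["mu"]
    log_var = g @ weights["log_var"]

    # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I).
    eps = rng.standard_normal(LATENT_DIM)
    z = mu + np.exp(0.5 * log_var) * eps
    return z, mu, log_var


# Usage on a toy three-atom chain with random atom features.
rng = np.random.default_rng(0)
n_features, hidden = 8, 32
weights = {
    "gcn": rng.normal(scale=0.1, size=(n_features, hidden)),
    "mu": rng.normal(scale=0.1, size=(hidden, LATENT_DIM)),
    "log_var": rng.normal(scale=0.1, size=(hidden, LATENT_DIM)),
}
adjacency = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
atom_features = rng.normal(size=(3, n_features))
z, mu, log_var = encode_molecule(adjacency, atom_features, weights, rng)
print(z.shape)  # (64,)
```

Downstream models would typically consume `mu` (or a sample `z`) as the molecule's fixed-length representation.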

Using these 64 numbers, the researchers showed that they could predict a range of compound properties, including absorption, distribution, metabolism, excretion, and toxicity (ADMET) profiles.
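Because every molecule becomes a fixed-length vector, an ADMET-style endpoint can be learned by a very small model on top of it. The sketch below assumes a binary endpoint (for example, toxic versus non-toxic) and fits plain logistic regression to 64-dimensional vectors with gradient descent. The data are synthetic stand-ins, not results from the study, and the article does not say which small models the researchers used.

```python
import numpy as np


def train_logistic(X, y, lr=0.1, epochs=500, l2=1e-3):
    """Fit logistic regression on latent vectors X (n, 64) with labels y in {0, 1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability of class 1
        w -= lr * (X.T @ (p - y) / n + l2 * w)  # gradient of mean log-loss + L2 penalty
        b -= lr * np.mean(p - y)
    return w, b


def predict_proba(X, w, b):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))


# Synthetic stand-in: 1,000 "molecules" as 64-number vectors, with a made-up
# binary label that depends on a hidden direction in latent space.
rng = np.random.default_rng(42)
X = rng.standard_normal((1000, 64))
y = (X @ rng.standard_normal(64) > 0).astype(float)

w, b = train_logistic(X[:800], y[:800])
held_out_acc = np.mean((predict_proba(X[800:], w, b) > 0.5) == y[800:])
print(f"held-out accuracy: {held_out_acc:.2f}")
```

The same pattern applies to continuous endpoints such as solubility by swapping the classifier for a regressor.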

In particular, the research showed that small models built on these 64 numbers performed on par with state-of-the-art approaches, which is why the encoder was described as a “generalist” model. The researchers also found that the 64-number outputs can be explored directly with other AI methods: users specify the desired drug properties, and AI automatically searches for molecules that meet those criteria.
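One simple way to explore the latent space is sketched below, assuming a surrogate score that rewards a desired property. A greedy random-perturbation search moves a latent vector toward higher predicted scores, with a penalty that keeps it near the standard-normal prior a variational encoder is trained toward; the nearest molecules in a library of precomputed latent vectors are then returned as candidates. The search strategy, penalty weight, function names, and linear scoring function are illustrative assumptions; the article does not specify which AI methods the researchers used.

```python
import numpy as np


def latent_hill_climb(score_fn, z0, steps=200, step_size=0.1, prior_weight=0.05, rng=None):
    """Greedy random-perturbation search that maximizes
    score_fn(z) - prior_weight * ||z||^2 in latent space."""
    if rng is None:
        rng = np.random.default_rng()

    def objective(z):
        # The squared-norm penalty keeps z close to the N(0, I) prior region.
        return score_fn(z) - prior_weight * float(z @ z)

    z, best = z0.copy(), objective(z0)
    for _ in range(steps):
        candidate = z + step_size * rng.standard_normal(z.shape)
        value = objective(candidate)
        if value > best:  # accept only improvements
            z, best = candidate, value
    return z, best


def nearest_library_molecules(z, library, k=5):
    """Indices of the k library latent vectors closest to z (Euclidean distance)."""
    distances = np.linalg.norm(library - z, axis=1)
    return np.argsort(distances)[:k]


# Usage with stand-in data: a random "library" of latent vectors and a
# hypothetical linear predictor of the desired property.
rng = np.random.default_rng(7)
library = rng.standard_normal((10_000, 64))
property_weights = rng.standard_normal(64)


def predicted_property(z):
    return float(z @ property_weights)


z_opt, best = latent_hill_climb(predicted_property, np.zeros(64), rng=rng)
candidates = nearest_library_molecules(z_opt, library, k=5)
print(candidates)
```

Retrieving nearest library neighbors avoids needing a decoder from latent vectors back to molecules; a decoder-based pipeline is a common alternative.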


The Technology

Molecule clustering from the ZINC database: Approximately 690 million molecules from the ZINC database were split according to their ZINC tranches (subsets grouped by properties such as molecular weight and logP) and used to train the variational graph encoder.

Latent space surrogate model: A machine learning model was trained on the latent representations to predict the binding affinity of small molecules to a specific target protein.

HPC resources: The project was allocated 500,000 CPU hours and 500,000 GPU hours from NSCC Singapore. These resources supported large-scale virtual screening, other computational tasks, and scoring of the test targets.
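The latent space surrogate model can be sketched as a regression from 64-number vectors to affinity scores. The article's surrogate is a support vector machine; to keep the example dependency-free, the sketch below swaps in kernel ridge regression with an RBF kernel, a closely related kernel method with a closed-form fit. The class name, hyperparameters, and synthetic "affinity" target are assumptions for illustration only.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))  # clip tiny negatives from round-off


class KernelRidgeSurrogate:
    """Kernel ridge regression on latent vectors (a stand-in for an SVM surrogate)."""

    def __init__(self, gamma=1.0 / 64, alpha=1e-2):
        self.gamma = gamma
        self.alpha = alpha

    def fit(self, Z, y):
        self.Z_train = Z
        K = rbf_kernel(Z, Z, self.gamma)
        # Dual coefficients solve (K + alpha * I) c = y.
        self.coef = np.linalg.solve(K + self.alpha * np.eye(len(Z)), y)
        return self

    def predict(self, Z):
        return rbf_kernel(Z, self.Z_train, self.gamma) @ self.coef


# Synthetic stand-in: latent vectors with a made-up nonlinear "affinity" target.
# In a surrogate workflow, y would typically be scores from the expensive
# method computed on a training subset of molecules.
rng = np.random.default_rng(1)
Z = rng.standard_normal((600, 64))
y = np.tanh(Z[:, :4].sum(axis=1)) + 0.05 * rng.standard_normal(600)

surrogate = KernelRidgeSurrogate().fit(Z[:500], y[:500])
predicted = surrogate.predict(Z[500:])  # cheap predictions for unscored molecules
```

Once fitted, each new prediction is a kernel evaluation against the training set, which is far cheaper than rerunning a physics-based scoring function.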


The Impact

Increased Speed: The surrogate models that the researchers built on latent-space information not only excel at predicting properties such as ADMET but also markedly accelerate virtual screening in ligand-based drug design.

Increased Accuracy: Using the surrogate support vector machine (SVM) model alongside five scoring functions, the researchers predicted compound affinity for five target proteins. The surrogate ranked compounds with accuracy comparable to the specialized scoring functions while reducing computational time by one to two orders of magnitude (roughly 10 to 100 times faster). This substantially reduces the computational resources needed for these evaluations.

Prospects: The NTU team plans to apply this methodology to virtual screening across large datasets within the molecule library.
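The ranking comparison and the speed-up claim can be made concrete with a few standard measures: Spearman rank correlation between a reference scoring function and the surrogate, the share of the reference's top-k compounds that the surrogate also places in its top k, and log10 of the runtime ratio, where 1 means one order of magnitude (10 times faster) and 2 means two orders (100 times faster). The article does not say which ranking metrics the study used; these helper functions, their names, and the synthetic scores are assumptions for illustration.

```python
import numpy as np


def ranks(values):
    """Rank positions 0..n-1 in ascending order (assumes no ties)."""
    r = np.empty(len(values))
    r[np.argsort(values)] = np.arange(len(values))
    return r


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(ranks(a), ranks(b))[0, 1])


def top_k_overlap(reference, surrogate, k):
    """Fraction of the reference's top-k compounds also in the surrogate's top k.
    Assumes higher score = stronger predicted binder; negate docking energies first."""
    ref_top = set(np.argsort(reference)[::-1][:k].tolist())
    sur_top = set(np.argsort(surrogate)[::-1][:k].tolist())
    return len(ref_top & sur_top) / k


def speedup_orders_of_magnitude(reference_seconds, surrogate_seconds):
    """log10 of the runtime ratio: 1.0 means 10x faster, 2.0 means 100x faster."""
    return float(np.log10(reference_seconds / surrogate_seconds))


# Usage with synthetic scores (not data from the study).
rng = np.random.default_rng(3)
reference = rng.standard_normal(1000)
surrogate = reference + 0.3 * rng.standard_normal(1000)
print(spearman(reference, surrogate), top_k_overlap(reference, surrogate, k=50))
print(speedup_orders_of_magnitude(3600.0, 36.0))  # 2.0
```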

To find out more about how NSCC’s HPC resources can help you, please contact [email protected].

NSCC NewsBytes March 2024
