Final Chemistry Project Report 2024


MEDICAPS UNIVERSITY INDORE

Title: Prediction of Molecular Properties using Neural Networks

Subject: Engineering Chemistry
Class: C11(B)
Session: 2024

Team members:
1. Pragati Patil
2. Gopal Sharma
3. Advait Lole
4. Kavyansh Yadav
5. Krishna Yadav
6. Prachi Pandey
7. Rishi Malviya
8. Kanchi Jamindar
9. Chandrabhan Kushwah
10. Kushal Pathak
Prediction of Molecular Properties using Neural Networks

1. Introduction

Machine learning (ML), and particularly deep learning (DL), has become an essential
tool in modern chemistry, offering transformative capabilities for predicting molecular
properties and aiding in the design of new materials and molecules. In this project, we
explore the application of deep learning techniques to predict various chemical
properties, ranging from physicochemical to pharmacological properties of molecules.
The goal is to develop a robust framework capable of learning complex molecular
features and accurately predicting chemical properties.

Various molecular representations, such as graphs, strings, and feature vectors, have
been utilized in conjunction with neural networks (NNs), including graph convolutional
neural networks (GCNNs), transformers, and feedforward neural networks (FFNs).
These models enable the extraction of meaningful patterns from molecular structures
and predict properties related to chemical analysis, such as IR spectra, UV/vis
absorption, or quantum mechanical calculations.

The field has evolved from using handcrafted features or simple fingerprinting methods
to more sophisticated end-to-end trainable models that can automatically learn the most
relevant features from molecular data. This report discusses a deep learning framework
that uses graph-based neural networks for molecular property prediction.

2. Model Overview

The core architecture of the model includes four key modules:

1. Local Features Encoding
2. Message Passing Neural Network (MPNN)
3. Aggregation of Atomic Embeddings
4. Feedforward Neural Network (FFN) for Property Prediction

2.1 Module 1: Local Features Encoding

The first step in the model involves transforming the molecular input (usually
represented in SMILES format) into a graph representation using cheminformatics
tools like RDKit. This graph represents atoms as nodes and bonds as edges. Each
atom and bond is assigned a set of initial features based on chemical properties.
These features include:
● Atom Features: atomic number, number of bonds, formal charge, chirality,
hybridization, aromaticity, and atomic mass.
● Bond Features: bond type (single, double, etc.), conjugation, ring membership,
and stereochemical information (e.g., cis/trans bonds).

These features are used to construct initial representations of the nodes (atoms) and
edges (bonds) in the molecular graph.
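
The encoding step can be sketched in Python. The graph below is hand-built for ethanol rather than parsed with RDKit, and the small atom/bond vocabularies are simplified assumptions for illustration, not the project's actual feature set:

```python
# Minimal sketch of local feature encoding for a molecular graph.
# The element and bond-type vocabularies here are assumed, cut-down
# stand-ins for the richer features RDKit would provide.
ATOM_TYPES = ["C", "N", "O", "F"]
BOND_TYPES = ["single", "double", "triple", "aromatic"]

def one_hot(value, choices):
    return [1.0 if value == c else 0.0 for c in choices]

def atom_features(symbol, degree, formal_charge, is_aromatic):
    # one-hot element identity + scalar descriptors, as in Module 1
    return one_hot(symbol, ATOM_TYPES) + [float(degree), float(formal_charge), float(is_aromatic)]

def bond_features(bond_type, in_ring):
    return one_hot(bond_type, BOND_TYPES) + [float(in_ring)]

# Ethanol (SMILES "CCO"): three heavy atoms joined by two single bonds
atoms = [("C", 1, 0, False), ("C", 2, 0, False), ("O", 1, 0, False)]
bonds = [(0, 1, "single", False), (1, 2, "single", False)]

node_feats = [atom_features(*a) for a in atoms]        # one vector per atom
edge_feats = [bond_features(t, r) for (_, _, t, r) in bonds]  # one per bond
```

In the real pipeline, RDKit parses the SMILES string and supplies these attributes (hybridization, chirality, atomic mass, and so on) directly from the parsed molecule.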

2.2 Module 2: Message Passing Neural Network (MPNN)

In this module, the model employs a message-passing scheme to propagate
information across the molecular graph. The message passing neural network
(MPNN) iteratively updates the features of atoms (nodes) and bonds (edges) by
passing messages between connected nodes in the graph. Message passing is
performed over multiple iterations, allowing the model to learn complex
relationships between atoms in the molecule.

At each step, the features of neighboring nodes are aggregated, and the message
passing operation updates the feature vectors for each atom and bond. The number of
message passing steps can be customized, and the process continues until the final
hidden representations are obtained.
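
A single update can be sketched as follows. The fixed `mix` factor is an illustrative stand-in for the learned weight matrices a real MPNN would apply:

```python
# One message-passing step: every atom sums its neighbours' feature
# vectors, then blends the result into its own state.
def mp_step(node_feats, adjacency, mix=0.5):
    updated = []
    for i, h in enumerate(node_feats):
        msg = [0.0] * len(h)
        for j in adjacency[i]:                      # aggregate neighbour features
            for k, v in enumerate(node_feats[j]):
                msg[k] += v
        updated.append([(1 - mix) * a + mix * b for a, b in zip(h, msg)])
    return updated

# Chain of three atoms: 0 - 1 - 2, with a "signal" on atom 0 only.
feats = [[1.0], [0.0], [0.0]]
adj = {0: [1], 1: [0, 2], 2: [1]}
step1 = mp_step(feats, adj)   # atom 2 has not yet heard from atom 0
step2 = mp_step(step1, adj)   # after two steps the signal reaches atom 2
```

This also makes the hyperparameter's role concrete: the number of message-passing steps bounds how far information can travel, so atom 2 only receives atom 0's signal after two iterations.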

2.3 Module 3: Aggregation of Atomic Embeddings

Once the message passing steps are completed, the model aggregates the learned
atomic representations into a single molecular embedding. This molecular embedding
represents the entire molecule as a vector that captures the relationships and properties
of all the atoms within the molecule. The aggregation can be performed using different
strategies:

● Summation: The embeddings of all atoms are summed to form the molecular
embedding.
● Average: The atomic embeddings are averaged.
● Weighted Sum: A weighted sum is used, with user-defined weights assigned to
each atom.

This molecular embedding serves as the input to the final property prediction step.
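
The three strategies can be written out directly; the 2-D atomic embeddings below are illustrative values only:

```python
# Aggregating per-atom embeddings into one molecular embedding,
# using the three strategies from Module 3.
def aggregate(embeddings, mode="sum", weights=None):
    dim = len(embeddings[0])
    if mode == "sum":
        return [sum(e[k] for e in embeddings) for k in range(dim)]
    if mode == "average":
        return [sum(e[k] for e in embeddings) / len(embeddings) for k in range(dim)]
    if mode == "weighted_sum":
        return [sum(w * e[k] for w, e in zip(weights, embeddings)) for k in range(dim)]
    raise ValueError(f"unknown mode: {mode}")

atoms = [[1.0, 2.0], [3.0, 4.0]]
mol_sum = aggregate(atoms, "sum")                         # [4.0, 6.0]
mol_avg = aggregate(atoms, "average")                     # [2.0, 3.0]
mol_wtd = aggregate(atoms, "weighted_sum", [0.25, 0.75])  # [2.5, 3.5]
```

Note that summation makes the embedding grow with molecule size, while averaging does not; which behaviour is preferable depends on the target property.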

2.4 Module 4: Feedforward Neural Network (FFN) for Property Prediction

In this module, the molecular embedding is passed through a feedforward neural
network (FFN) to predict the target properties. The FFN consists of fully
connected layers, with an activation function applied between layers. The
output of the FFN can be used for different types of prediction tasks:

● Regression: For continuous property prediction (e.g., boiling point, solubility).
● Classification: For categorical prediction (e.g., predicting toxicity or
drug-likeness).

For binary classification tasks, a sigmoid function is used to constrain the output to the
range (0, 1). For multi-class classification, a softmax function is applied to ensure that
the output probabilities sum to 1 across all classes.
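
A minimal sketch of this prediction head, with placeholder (untrained) weights rather than the project's actual parameters:

```python
import math

def sigmoid(x):
    # squashes a logit into (0, 1) for binary classification
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    m = max(xs)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]  # probabilities summing to 1

def ffn(embedding, w1, w2):
    # two fully connected layers with a ReLU activation in between
    hidden = [max(0.0, sum(w * x for w, x in zip(row, embedding))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

emb = [0.2, -0.1, 0.4]                       # toy molecular embedding
w1 = [[0.5, -0.3, 0.1], [0.2, 0.8, -0.4]]    # placeholder weights
w2 = [[1.0, -1.0]]
logit = ffn(emb, w1, w2)[0]
prob_toxic = sigmoid(logit)                  # binary output in (0, 1)
probs = softmax([logit, 0.3, -0.2])          # multi-class output, sums to 1
```

For regression tasks the final layer's raw output is used directly, with no squashing function.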

3. Model Training and Hyperparameter Tuning

The model is end-to-end trainable, meaning all parameters across the different
modules are updated simultaneously during the training process. Training involves the
use of gradient-based optimization, typically with the Adam optimizer, to minimize a
loss function that measures the difference between the predicted and true values of the
properties.

Several hyperparameters can be tuned during training:

● Number of message passing steps: The number of iterations for message
passing, which determines how far information can propagate across the graph.
● Hidden layer size: The number of neurons in the hidden layers of the MPNN
and FFN.
● Learning rate: The step size used by the optimizer during training.
● Batch size: The number of training samples used in each optimization step.
● Regularization: Techniques like dropout or early stopping are used to avoid
overfitting, particularly when training on smaller datasets.

Cross-validation and ensemble learning (combining multiple models) can be
employed to improve the model's performance and generalizability. Training
typically involves monitoring the learning curve to ensure convergence and
adjusting the number of epochs as needed.
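
The optimization loop can be illustrated with a self-contained Adam update on a toy one-parameter least-squares problem; the hyperparameter values are arbitrary illustrative choices, not the project's settings:

```python
# Adam-style gradient descent minimising loss(w) = (w - 3)^2.
# m and v are running estimates of the gradient's first and second
# moments; the bias-correction terms compensate for their zero start.
def adam_minimize(grad, w=0.0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=400):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g          # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
        m_hat = m / (1 - beta1 ** t)             # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (v_hat ** 0.5 + eps)   # parameter update
    return w

# gradient of (w - 3)^2 is 2(w - 3); the true minimum is at w = 3
w_star = adam_minimize(lambda w: 2.0 * (w - 3.0))
```

In the real model the same update is applied simultaneously to every weight in the MPNN and FFN, which is what makes the architecture end-to-end trainable.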

4. Applications and Performance

This deep learning model has shown promise in various applications, including:

● Physicochemical Property Prediction: Predicting properties like boiling point,
melting point, solubility, and molecular weight.
● Pharmacological Property Prediction: Assessing molecular toxicity,
drug-likeness, and bioactivity.
● Material Design: Designing molecules with specific properties, such as polymers
with desired mechanical strength or conductivity.
● Quantum Mechanical Property Prediction: Predicting molecular orbital
energies or reaction barriers for complex chemical systems.

Benchmark results have demonstrated that the model performs well on standard
datasets for both simple and complex prediction tasks. The model can effectively
capture both local and global molecular features and learn complex interactions
between atoms, making it a powerful tool for researchers in cheminformatics,
materials science, and drug discovery.

5. Benchmarking and Model Performance

5.1 Datasets and Evaluation Metrics

We evaluated the model on various benchmark datasets for predicting molecular
properties, including physicochemical properties (e.g., boiling point, solubility),
pharmacological properties (e.g., toxicity), and quantum mechanical properties
(e.g., molecular orbital energies).

Performance was assessed using:

● RMSE for continuous predictions (boiling point, solubility).
● Accuracy, F1-score, and AUC-ROC for classification tasks (toxicity).
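
For clarity, the two simplest of these metrics can be written out directly (the example numbers below are illustrative, not drawn from the benchmark results):

```python
import math

def rmse(y_true, y_pred):
    # root-mean-square error for continuous predictions
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def accuracy(y_true, y_pred):
    # fraction of classification labels predicted exactly
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)

r = rmse([100.0, 80.0], [97.0, 84.0])       # sqrt((9 + 16) / 2)
a = accuracy([1, 0, 1, 1], [1, 0, 0, 1])    # 3 of 4 correct
```

F1-score and AUC-ROC follow the standard definitions (harmonic mean of precision and recall, and area under the ROC curve, respectively) and are typically computed with a library such as scikit-learn.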
5.2 Results

● Physicochemical Properties: The model achieved an RMSE of 0.45–0.65 for
physicochemical properties like boiling point and solubility.
Figure 1 shows the RMSE for different property tasks.
● Pharmacological Properties (Toxicity): For binary classification tasks like toxicity,
the model reached 87% accuracy, 0.85 F1-score, and 0.92 AUC-ROC,
demonstrating strong discriminative power.
Figure 2 shows the ROC curve for toxicity prediction.
● Quantum Mechanical Properties: For tasks like predicting molecular orbital
energies, the model achieved RMSE = 0.35 eV, showcasing effective handling of
complex properties.
Figure 3 shows predicted vs. actual molecular orbital energies.
Original molecular weight vs. predicted molecular weight:

5.3 Comparison with Classical Methods

The model outperformed traditional methods like kernel regression and random forests.
For toxicity prediction, the deep learning model achieved higher accuracy and
AUC-ROC scores than these methods.

Original properties vs predicted properties:

6. Conclusion

In this project, we have demonstrated how deep learning, specifically
graph-based neural networks, can be used for the prediction of molecular
properties. The model
architecture, which includes modules for feature encoding, message passing,
aggregation, and property prediction, has proven effective in capturing the complex
relationships within molecular structures. By incorporating customization options and
regularization techniques, the model can be adapted to a variety of chemical prediction
tasks.

The model's ability to learn from raw molecular data and provide accurate predictions
makes it a valuable tool for accelerating research in chemistry and related fields.
Furthermore, the flexibility of the architecture allows it to be extended to new
applications, including molecular design, drug discovery, and materials engineering.
