Final Chemistry Project Report 2024
Subject: Engineering Chemistry
Class: C11(B)
Session: 2024
Team members:
1. Pragati Patil
2. Gopal Sharma
3. Advait Lole
4. Kavyansh Yadav
5. Krishna Yadav
6. Prachi Pandey
7. Rishi Malviya
8. Kanchi Jamindar
9. Chandrabhan Kushwah
10. Kushal Pathak
Prediction of Molecular Properties using Neural Networks
1. Introduction
Machine learning (ML), and particularly deep learning (DL), has become an essential
tool in modern chemistry, offering transformative capabilities for predicting molecular
properties and aiding in the design of new materials and molecules. In this project, we
explore the application of deep learning techniques to predict chemical properties of
molecules, ranging from physicochemical to pharmacological.
The goal is to develop a robust framework capable of learning complex molecular
features and accurately predicting these properties.
Various molecular representations, such as graphs, strings, and feature vectors, have
been utilized in conjunction with neural networks (NNs), including graph convolutional
neural networks (GCNNs), transformers, and feedforward neural networks (FFNs).
These models extract meaningful patterns from molecular structures and predict
properties relevant to chemical analysis, such as IR spectra, UV/vis absorption, or
quantities obtained from quantum mechanical calculations.
The field has evolved from using handcrafted features or simple fingerprinting methods
to more sophisticated end-to-end trainable models that can automatically learn the most
relevant features from molecular data. This report discusses a deep learning framework
that uses graph-based neural networks for molecular property prediction.
2. Model Overview
The first step in the model involves transforming the molecular input (usually
represented in SMILES format) into a graph representation using cheminformatics
tools like RDKit. This graph represents atoms as nodes and bonds as edges. Each
atom and bond is assigned a set of initial features based on chemical properties.
These features include:
● Atom Features: atomic number, number of bonds, formal charge, chirality,
hybridization, aromaticity, and atomic mass.
● Bond Features: bond type (single, double, etc.), conjugation, ring membership,
and stereochemical information (e.g., cis/trans bonds).
These features are used to construct initial representations of the nodes (atoms) and
edges (bonds) in the molecular graph.
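The featurization described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual code: the graph for ethanol (SMILES "CCO") is hard-coded, and the names `ATOMS`, `BONDS`, `ELEMENTS`, `BOND_TYPES`, `one_hot`, `atom_features` and `bond_features` are hypothetical. In a real pipeline RDKit would supply these values (for example via `Chem.MolFromSmiles` and per-atom accessors such as `GetAtomicNum`). Only a subset of the listed features is encoded.

```python
# Toy molecular graph for ethanol (SMILES "CCO"), heavy atoms only.
# All values are hard-coded for illustration; a cheminformatics toolkit
# would normally supply them.
ATOMS = [
    {"atomic_num": 6, "degree": 1, "formal_charge": 0, "aromatic": False},
    {"atomic_num": 6, "degree": 2, "formal_charge": 0, "aromatic": False},
    {"atomic_num": 8, "degree": 1, "formal_charge": 0, "aromatic": False},
]
BONDS = [
    {"begin": 0, "end": 1, "type": "single", "conjugated": False, "in_ring": False},
    {"begin": 1, "end": 2, "type": "single", "conjugated": False, "in_ring": False},
]

ELEMENTS = [1, 6, 7, 8, 9]  # hypothetical vocabulary of atomic numbers
BOND_TYPES = ["single", "double", "triple", "aromatic"]


def one_hot(value, choices):
    """Return a one-hot list with an extra final slot for unknown values."""
    vec = [0.0] * (len(choices) + 1)
    idx = choices.index(value) if value in choices else len(choices)
    vec[idx] = 1.0
    return vec


def atom_features(atom):
    """Concatenate one-hot and scalar features into one node vector."""
    return (one_hot(atom["atomic_num"], ELEMENTS)
            + one_hot(atom["degree"], [0, 1, 2, 3, 4])
            + [float(atom["formal_charge"]), float(atom["aromatic"])])


def bond_features(bond):
    """Concatenate one-hot bond type with conjugation and ring flags."""
    return (one_hot(bond["type"], BOND_TYPES)
            + [float(bond["conjugated"]), float(bond["in_ring"])])


node_features = [atom_features(a) for a in ATOMS]   # one vector per atom
edge_features = [bond_features(b) for b in BONDS]   # one vector per bond
edge_index = [(b["begin"], b["end"]) for b in BONDS]
```

The extra "unknown" slot in `one_hot` is a design choice that lets the encoder handle elements or bond types outside the chosen vocabulary without failing.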
The molecular graph is then processed by a series of message passing steps. At each
step, the features of neighboring nodes are aggregated, and the message
passing operation updates the feature vectors for each atom and bond. The number of
message passing steps can be customized, and the process continues until the final
hidden representations are obtained.
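A simplified version of one such scheme is sketched below. It is an assumption-laden illustration: only atom vectors are updated (bond updates are omitted), messages are plain sums over bonded neighbors, and the matrix `weight` is fixed by hand, whereas in a trained model it would be learned. The function names are hypothetical.

```python
def relu(vec):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in vec]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def message_passing(h, edges, weight, steps):
    """Run `steps` rounds of sum-aggregation message passing.

    h      : list of per-atom hidden vectors
    edges  : undirected bonds as (i, j) index pairs
    weight : square matrix applied to the aggregated messages
    """
    neighbors = {i: [] for i in range(len(h))}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    for _ in range(steps):
        new_h = []
        for v, h_v in enumerate(h):
            # Aggregate: sum the hidden vectors of all bonded neighbors.
            message = [sum(h[u][k] for u in neighbors[v])
                       for k in range(len(h_v))]
            # Update: add the transformed message to the atom's own state.
            transformed = matvec(weight, message)
            new_h.append(relu([a + b for a, b in zip(h_v, transformed)]))
        h = new_h
    return h


# Three atoms in a chain (0-1-2) with two-dimensional hidden states.
h0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bonds = [(0, 1), (1, 2)]
W = [[0.5, 0.0], [0.0, 0.5]]
h1 = message_passing(h0, bonds, W, steps=1)
```

After `steps` rounds, each atom's vector carries information from atoms up to `steps` bonds away, which is why the number of steps controls how much of the molecule each atom "sees".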
Once the message passing steps are completed, the model aggregates the learned
atomic representations into a single molecular embedding. This molecular embedding
represents the entire molecule as a vector that captures the relationships and properties
of all the atoms within the molecule. The aggregation can be performed using different
strategies:
● Summation: The embeddings of all atoms are summed to form the molecular
embedding.
● Average: The atomic embeddings are averaged.
● Weighted Sum: A weighted sum is used, with user-defined weights assigned to
each atom.
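The three aggregation strategies listed above can be written as one small function. This is a sketch under the assumption that atomic embeddings are plain lists of floats; the name `readout` and its `mode` argument are hypothetical.

```python
def readout(atom_embeddings, mode="sum", weights=None):
    """Pool per-atom embeddings into a single molecular embedding."""
    n_atoms = len(atom_embeddings)
    dim = len(atom_embeddings[0])
    if mode in ("sum", "mean"):
        coeffs = [1.0] * n_atoms
    elif mode == "weighted":
        if weights is None or len(weights) != n_atoms:
            raise ValueError("weighted readout needs one weight per atom")
        coeffs = list(weights)
    else:
        raise ValueError("unknown readout mode: " + str(mode))

    # Weighted sum over atoms, computed separately for each dimension.
    pooled = [sum(c * emb[k] for c, emb in zip(coeffs, atom_embeddings))
              for k in range(dim)]
    if mode == "mean":
        pooled = [x / n_atoms for x in pooled]
    return pooled


atoms = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
mol_sum = readout(atoms, "sum")
mol_mean = readout(atoms, "mean")
mol_weighted = readout(atoms, "weighted", weights=[0.5, 0.25, 0.25])
```

Summation keeps information about molecule size, while averaging removes it, so the better choice is likely to depend on whether the target property scales with the number of atoms.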
This molecular embedding serves as the input to the final property prediction step.
For binary classification tasks, a sigmoid function is used to constrain the output to the
range (0, 1). For multi-class classification, a softmax function is applied to ensure that
the output probabilities sum to 1 across all classes.
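The two output functions can be illustrated as follows. This is a minimal sketch using only the standard library; the numerically stable formulations (branching on the sign for the sigmoid, subtracting the maximum logit for the softmax) are common practice rather than something stated in this report.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability for binary classification."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative inputs, this form avoids overflow in exp().
    e = math.exp(z)
    return e / (1.0 + e)


def softmax(logits):
    """Map a list of scores to class probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max leaves the result unchanged
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


p_binary = sigmoid(0.0)
p_classes = softmax([2.0, 1.0, 0.1])
```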
The model is end-to-end trainable, meaning all parameters across the different
modules are updated simultaneously during the training process. Training involves the
use of gradient-based optimization, typically with the Adam optimizer, to minimize a
loss function that measures the difference between the predicted and true values of the
properties.
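To make the optimization step concrete, the sketch below applies the Adam update rule to a deliberately tiny stand-in: a linear model with two parameters trained on toy data with a mean squared error loss. The data, the function name `train_linear_adam` and the hyperparameter values are assumptions for illustration; the actual network has many more parameters, but each one is updated in the same way.

```python
import math


def train_linear_adam(xs, ys, lr=0.05, beta1=0.9, beta2=0.999,
                      eps=1e-8, epochs=500):
    """Fit y = w * x + b by minimizing mean squared error with Adam."""
    params = [0.0, 0.0]  # [w, b]
    m = [0.0, 0.0]       # running mean of gradients
    v = [0.0, 0.0]       # running mean of squared gradients
    n = len(xs)
    losses = []
    for t in range(1, epochs + 1):
        w, b = params
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        losses.append(sum(e * e for e in errors) / n)
        # Gradients of the mean squared error with respect to w and b.
        grads = [2.0 * sum(e * x for e, x in zip(errors, xs)) / n,
                 2.0 * sum(errors) / n]
        for i, g in enumerate(grads):
            m[i] = beta1 * m[i] + (1 - beta1) * g
            v[i] = beta2 * v[i] + (1 - beta2) * g * g
            m_hat = m[i] / (1 - beta1 ** t)  # bias correction
            v_hat = v[i] / (1 - beta2 ** t)
            params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params, losses


# Toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
params, losses = train_linear_adam(xs, ys)
```

The recorded `losses` list should fall over the epochs, which is the behavior the training loop of the full model relies on.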
This deep learning model has shown promise in various applications.
Benchmark results have demonstrated that the model performs well on standard
datasets for both simple and complex prediction tasks. The model can effectively
capture both local and global molecular features and learn complex interactions
between atoms, making it a powerful tool for researchers in cheminformatics,
materials science, and drug discovery.
The model outperformed traditional methods like kernel regression and random forests.
For toxicity prediction, the deep learning model achieved higher accuracy and
AUC-ROC scores than these methods.
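For reference, the two metrics mentioned above can be computed as sketched below. The labels and scores are made-up toy values, not results from this project, and the function names are hypothetical. AUC-ROC is computed here from its pairwise definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """AUC-ROC as the fraction of correctly ordered positive/negative pairs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples classified correctly at a given threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy example: 1 = toxic, 0 = non-toxic (hypothetical values).
labels = [1, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.2]
auc = roc_auc(labels, scores)
acc = accuracy(labels, scores)
```

Unlike accuracy, AUC-ROC does not depend on a decision threshold, which is one reason the two are usually reported together.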
6. Conclusion
The model's ability to learn from raw molecular data and provide accurate predictions
makes it a valuable tool for accelerating research in chemistry and related fields.
Furthermore, the flexibility of the architecture allows it to be extended to new
applications, including molecular design, drug discovery, and materials engineering.