Search Results (164)

Search Parameters:
Keywords = QSPR

12 pages, 4152 KiB  
Article
Exploring Molecular Heteroencoders with Latent Space Arithmetic: Atomic Descriptors and Molecular Operators
by Xinyue Gao, Natalia Baimacheva and Joao Aires-de-Sousa
Molecules 2024, 29(16), 3969; https://doi.org/10.3390/molecules29163969 - 22 Aug 2024
Viewed by 375
Abstract
A variational heteroencoder based on recurrent neural networks, trained with SMILES linear notations of molecular structures, was used to derive the following atomic descriptors: delta latent space vectors (DLSVs) obtained from the original SMILES of the whole molecule and the SMILES of the same molecule with the target atom replaced. Different replacements were explored, namely, changing the atomic element, replacement with a character of the model vocabulary not used in the training set, or the removal of the target atom from the SMILES. Unsupervised mapping of the DLSV descriptors with t-distributed stochastic neighbor embedding (t-SNE) revealed a remarkable clustering according to the atomic element, hybridization, atomic type, and aromaticity. Atomic DLSV descriptors were used to train machine learning (ML) models to predict 19F NMR chemical shifts. An R2 of up to 0.89 and mean absolute errors of up to 5.5 ppm were obtained for an independent test set of 1046 molecules with random forests or a gradient-boosting regressor. Intermediate representations from a Transformer model yielded comparable results. Furthermore, DLSVs were applied as molecular operators in the latent space: the DLSV of a halogenation (H→F substitution) was summed to the LSVs of 4135 new molecules with no fluorine atom and decoded into SMILES, yielding 99% of valid SMILES, with 75% of the SMILES incorporating fluorine and 56% of the structures incorporating fluorine with no other structural change. Full article
(This article belongs to the Special Issue QSAR and QSPR: Recent Developments and Applications, 4th Edition)

15 pages, 2477 KiB  
Article
A Framework for Developing Tools to Predict PFAS Physical–Chemical Properties and Mass-Partitioning Parameters
by Mark L. Brusseau
Environments 2024, 11(8), 164; https://doi.org/10.3390/environments11080164 - 2 Aug 2024
Viewed by 633
Abstract
A framework for developing predictive models for PFAS physical–chemical properties and mass-partitioning parameters is presented. The framework is based on the objective of developing tools that are of sufficient simplicity to be used rapidly and routinely for initial site investigations and risk assessments. This is accomplished by the use of bespoke PFAS-specific QSPR models. The development of these models entails aggregation and curation of measured data sets for a target property or parameter, supplemented by estimates produced with quantum–chemical ab initio predictions. The application of bespoke QSPR models for PFAS is illustrated with several examples, including partitioning to different interfaces, uptake by several fish species, and partitioning to four different biological materials. Reasonable correlations to molar volume were observed for all systems. One notable observation is that the slopes of all of the regression functions are similar. This suggests that the partitioning processes in all of these systems are to some degree mediated by the same mechanism, namely hydrophobic interaction. Special factors and elements requiring consideration in the development of predictive models are discussed, including differences in bulk-phase versus interface partitioning processes. Full article

39 pages, 1308 KiB  
Review
Improving the Accuracy of Permeability Data to Gain Predictive Power: Assessing Sources of Variability in Assays Using Cell Monolayers
by Cristiana L. Pires and Maria João Moreno
Membranes 2024, 14(7), 157; https://doi.org/10.3390/membranes14070157 - 14 Jul 2024
Viewed by 907
Abstract
The ability to predict the rate of permeation of new compounds across biological membranes is of high importance for their success as drugs, as it determines their efficacy, pharmacokinetics, and safety profile. In vitro permeability assays using Caco-2 monolayers are commonly employed to assess permeability across the intestinal epithelium, with an extensive number of apparent permeability coefficient (Papp) values available in the literature and a significant fraction collected in databases. The compilation of these Papp values for large datasets allows for the application of artificial intelligence tools for establishing quantitative structure–permeability relationships (QSPRs) to predict the permeability of new compounds from their structural properties. One of the main challenges that hinders the development of accurate predictions is the existence of multiple Papp values for the same compound, mostly caused by differences in the experimental protocols employed. This review addresses the magnitude of the variability within and between laboratories to interpret its impact on QSPR modelling, systematically and quantitatively assessing the most common sources of variability. This review emphasizes the importance of compiling consistent Papp data and suggests strategies that may be used to obtain such data, contributing to the establishment of robust QSPRs with enhanced predictive power. Full article
(This article belongs to the Section Biological Membrane Functions)

30 pages, 1610 KiB  
Review
A Review of Machine Learning and QSAR/QSPR Predictions for Complexes of Organic Molecules with Cyclodextrins
by Dariusz Boczar and Katarzyna Michalska
Molecules 2024, 29(13), 3159; https://doi.org/10.3390/molecules29133159 - 2 Jul 2024
Viewed by 810
Abstract
Cyclodextrins are macrocyclic rings composed of glucose residues. Due to their remarkable structural properties, they can form host–guest inclusion complexes, which is why they are frequently used in the pharmaceutical, cosmetic, and food industries, as well as in environmental and analytical chemistry. This review presents the reports from 2011 to 2023 on the quantitative structure–activity/property relationship (QSAR/QSPR) approach, which is primarily employed to predict the thermodynamic stability of inclusion complexes. This article extensively discusses the significant developments related to the size of available experimental data, the available sets of descriptors, and the machine learning (ML) algorithms used, such as support vector machines, random forests, artificial neural networks, and gradient boosting. As QSAR/QSPR analysis only requires molecular structures of guests and experimental values of stability constants, this approach may be particularly useful for predicting these values for complexes with randomly substituted cyclodextrins, as well as for estimating their dependence on pH. This work proposes solutions on how to effectively use this knowledge, which is especially important for researchers who will deal with this topic in the future. This review also presents other applications of ML in relation to CD complexes, including the prediction of physicochemical properties of CD complexes, the development of analytical methods based on complexation with CDs, and the optimisation of experimental conditions for the preparation of the complexes. Full article

11 pages, 1814 KiB  
Article
Quantitative Structure–Property Relationship Analysis in Molecular Graphs of Some Anticancer Drugs with Temperature Indices Approach
by Xiaolong Shi, Ruiqi Cai, Jaber Ramezani Tousi and Ali Asghar Talebi
Mathematics 2024, 12(13), 1953; https://doi.org/10.3390/math12131953 - 24 Jun 2024
Viewed by 472
Abstract
The most important application of anticancer drugs in various forms (alkylating agents, hormones agents, and antimetabolites) is the treatment of malignant diseases. Topological indices are widely used in the field of chemical and medical sciences, especially in studying the chemical, biological, clinical, and therapeutic aspects of drugs. In this article, the temperature indices in anticancer drugs molecular graphs such as Carmustine, Convolutamine F, Raloxifene, Tambjamine K, and Pterocellin B were calculated and then analyzed based on physical and chemical properties. The analysis was performed by identifying the best regression models based on temperature indices for six physical and chemical features of anticancer drugs. The results indicated that temperature indices were essential topological indices that predict the properties of anticancer drugs, such as boiling point, flash point, enthalpy, molar refractivity, molar volume, and polarizability. It was also observed that the r value of the regression model was more than 0.6, and the p value was less than 0.05. Full article

19 pages, 2312 KiB  
Article
Structure–Activity Relationship (SAR) Modeling of Mosquito Repellents: Deciphering the Importance of the 1-Octanol/Water Partition Coefficient on the Prediction Results
by James Devillers and Hugo Devillers
Appl. Sci. 2024, 14(13), 5366; https://doi.org/10.3390/app14135366 - 21 Jun 2024
Viewed by 465
Abstract
Repellents play a fundamental role in vector control and prevention to keep mosquitoes away from humans. Available in limited numbers, it is absolutely necessary to find new repellents for preventing problems of resistance. QSAR (Quantitative Structure–Activity Relationship) methods are particularly suited for designing molecules with potential repellent activity. These models require that the molecules be described by physicochemical properties, topological indices, and/or structural indicators. In the former situation, QSPR (Quantitative Structure–Property Relationship) models are used for calculating physicochemical descriptors. Use of different QSPR models for the same property can lead to different values for the same molecule. In this context, the influence of the 1-octanol/water partition coefficient (log P) calculated according to two different methodologies was statistically evaluated in the modeling of 2171 molecules for which their skin repellent activity against Aedes aegypti was available. The two series of supervised artificial neural networks differed only by their input neuron coding for log P. Although both categories of classification models led to overall good statistics, we clearly showed that differences in log P values calculated for a molecule could result in very different prediction results. This was especially true for repellents. The practical implication of these differences was discussed. Full article
(This article belongs to the Section Chemical and Molecular Sciences)

17 pages, 3484 KiB  
Article
Prediction of Lubrication Performances of Vegetable Oils by Genetic Functional Approximation Algorithm
by Jianfang Liu, Yaoyun Zhang, Sicheng Yang, Chenglingzi Yi, Ting Liu, Rongrong Zhang, Dan Jia, Shuai Peng and Qing Yang
Lubricants 2024, 12(6), 226; https://doi.org/10.3390/lubricants12060226 - 18 Jun 2024
Viewed by 599
Abstract
Vegetable oils, which are considered potential lubricants, are composed of different types and proportions of fatty acids. Because of their diverse types and varying compositions, they exhibit different lubrication performances. The genetic function approximation algorithm was used to model the quantitative structure–property relationship between fatty acid structure and the wear scar diameter and friction coefficients measured by four-ball friction and wear tests. Based on the models with adjusted R2 greater than 0.9 and fatty acid compositions of vegetable oils, the wear scar diameter and friction coefficients of Xanthoceras sorbifolia bunge oil and Soybean oil as validation oil samples were predicted. The difference between the predicted and experimental values was small, indicating that the models could accurately predict the lubrication performances of vegetable oils. The lubrication performances of 14 kinds of vegetable oils were predicted by GFA-QSPR models, and the primary factors influencing their lubrication properties were studied by cluster analysis. The results show that the content of C18:1 has a positive effect on the lubrication performances of vegetable oils, while the content of C18:3 has a negative effect, and the length of the carbon chain of fatty acids significantly affects their lubrication properties. Full article

16 pages, 4079 KiB  
Article
Machine Learning Approach for the Estimation of Henry’s Law Constant Based on Molecular Descriptors
by Atta Ullah, Muhammad Shaheryar and Ho-Jin Lim
Atmosphere 2024, 15(6), 706; https://doi.org/10.3390/atmos15060706 - 13 Jun 2024
Viewed by 576
Abstract
In atmospheric chemistry, the Henry’s law constant (HLC) is crucial for understanding the distribution of organic compounds across gas, particle, and aqueous phases. Quantitative structure–property relationship (QSPR) models described in scientific research are generally tailored to specific groups or categories of substances and are often developed using a limited set of experimental data. This study developed a machine learning model using an extensive dataset of experimental HLCs for approximately 1100 organic compounds. Molecular descriptors calculated using alvaDesc software (v 2.0) were used to train the models. A hybrid approach was adopted for feature selection, ensuring alignment with the domain knowledge. Based on the root mean squared error (RMSE) of the training and test data after cross-validation, Gradient Boosting (GB) was selected as a model for predicting HLC. The hyperparameters of the selected model were optimized using the automated hyperparameter optimization framework Optuna. The impact of features on the target variable was assessed using the SHapley Additive exPlanations (SHAP). The optimized model demonstrated strong performance across the training, evaluation, and test datasets, achieving coefficients of determination (R2) of 0.96, 0.78, and 0.74, respectively. The developed model was used to estimate the HLC of compounds associated with carbon capture and storage (CCS) emissions and secondary organic aerosols. Full article
(This article belongs to the Section Atmospheric Techniques, Instruments, and Modeling)

19 pages, 4472 KiB  
Article
A Machine Learning Approach for Predicting Caco-2 Cell Permeability in Natural Products from the Biodiversity in Peru
by Victor Acuña-Guzman, María E. Montoya-Alfaro, Luisa P. Negrón-Ballarte and Christian Solis-Calero
Pharmaceuticals 2024, 17(6), 750; https://doi.org/10.3390/ph17060750 - 7 Jun 2024
Viewed by 836
Abstract
Background: Peru is one of the most biodiverse countries in the world, which is reflected in its wealth of knowledge about medicinal plants. However, there is a lack of information regarding intestinal absorption and the permeability of natural products. The human colon adenocarcinoma cell line (Caco-2) is an in vitro assay used to measure apparent permeability. This study aims to develop a quantitative structure–property relationship (QSPR) model using machine learning algorithms to predict the apparent permeability of the Caco-2 cell in natural products from Peru. Methods: A dataset of 1817 compounds, including experimental log Papp values and molecular descriptors, was utilized. Six QSPR models were constructed: a multiple linear regression (MLR) model, a partial least squares regression (PLS) model, a support vector machine regression (SVM) model, a random forest (RF) model, a gradient boosting machine (GBM) model, and an SVM–RF–GBM model. Results: An evaluation of the testing set revealed that the MLR and PLS models exhibited an RMSE = 0.47 and R2 = 0.63. In contrast, the SVM, RF, and GBM models showcased an RMSE = 0.39–0.40 and R2 = 0.73–0.74. Notably, the SVM–RF–GBM model demonstrated superior performance, with an RMSE = 0.38 and R2 = 0.76. The model predicted log Papp values for 502 natural products falling within the applicability domain, with 68.9% (n = 346) showing high permeability, suggesting the potential for intestinal absorption. Additionally, we categorized the natural products into six metabolic pathways and assessed their drug-likeness. Conclusions: Our results provide insights into the potential intestinal absorption of natural products in Peru, thus facilitating drug development and pharmaceutical discovery efforts. Full article
(This article belongs to the Section Natural Products)
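The RMSE and R2 values quoted in the abstract above are standard regression metrics. The following is a minimal sketch of how they are computed, together with one possible way to combine several models' predictions. It is not the authors' code: the simple averaging rule in `average_ensemble` is an assumption, since the abstract does not state how the SVM–RF–GBM model combines its members, and all function names are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def average_ensemble(*predictions):
    """Assumed combination rule: unweighted mean of each model's prediction."""
    return [sum(vals) / len(vals) for vals in zip(*predictions)]
```

A lower RMSE and a higher R2 on a held-out test set, as reported for the combined model, indicate predictions closer to the measured log Papp values.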

15 pages, 2573 KiB  
Article
Machine-Learning-Based Prediction of Plant Cuticle–Air Partition Coefficients for Organic Pollutants: Revealing Mechanisms from a Molecular Structure Perspective
by Tianyun Tao, Cuicui Tao and Tengyi Zhu
Molecules 2024, 29(6), 1381; https://doi.org/10.3390/molecules29061381 - 20 Mar 2024
Viewed by 921
Abstract
Accurately predicting plant cuticle–air partition coefficients (Kca) is essential for assessing the ecological risk of organic pollutants and elucidating their partitioning mechanisms. The current work collected 255 measured Kca values from 25 plant species and 106 compounds (dataset (I)) and averaged them to establish a dataset (dataset (II)) containing Kca values for 106 compounds. Machine-learning algorithms (multiple linear regression (MLR), multi-layer perceptron (MLP), k-nearest neighbors (KNN), and gradient-boosting decision tree (GBDT)) were applied to develop eight QSPR models for predicting Kca. The results showed that the developed models had a high goodness of fit, as well as good robustness and predictive performance. The GBDT-2 model ($R^2_{\mathrm{adj}} = 0.925$, $Q^2_{\mathrm{LOO}} = 0.756$, $Q^2_{\mathrm{BOOT}} = 0.864$, $R^2_{\mathrm{ext}} = 0.837$, $Q^2_{\mathrm{ext}} = 0.811$, and CCC = 0.891) is recommended as the best model for predicting Kca due to its superior performance. Moreover, interpreting the GBDT-1 and GBDT-2 models based on the Shapley additive explanations (SHAP) method elucidated how molecular properties, such as molecular size, polarizability, and molecular complexity, affected the capacity of plant cuticles to adsorb organic pollutants in the air. The satisfactory performance of the developed models suggests that they have the potential for extensive applications in guiding the environmental fate of organic pollutants and promoting the progress of eco-friendly and sustainable chemical engineering. Full article

16 pages, 671 KiB  
Article
On the Entire Harmonic Index and Entire Harmonic Polynomial of Graphs
by Anwar Saleh and Samirah H. Alsulami
Symmetry 2024, 16(2), 208; https://doi.org/10.3390/sym16020208 - 9 Feb 2024
Viewed by 940
Abstract
A topological descriptor is a numerical parameter that describes a chemical structure using the related molecular graph. Topological descriptors have significance in mathematical chemistry, particularly for studying QSPR and QSAR. In addition, if a topological descriptor has a reciprocal link with a molecular attribute, it is referred to as a topological index. The use of topological indices can help to examine the physicochemical features of chemical compounds because they encode certain attributes of a molecule. The Randić index is a molecular structure descriptor that has several applications in chemistry and medicine. In this paper, we introduce a new version of the Randić index to the inclusion of the intermolecular forces between bonds with atoms, referred to as an entire Harmonic index (EHI), and we present the entire Harmonic polynomial (EHP) of a graph. Specific formulas have been obtained for certain graph classes, and graph operations have been obtained. Bounds and some important results have been found. Furthermore, we demonstrate that the correlation coefficients for the new index lie between 0.909 and 1. In the context of enthalpy of formation and π-electronic energy, the acquired values are significantly higher than those observed for the Harmonic index and the Randić index. Full article
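For orientation, the two classical indices that the abstract uses as baselines can be computed directly from a molecular graph. The sketch below uses the standard definitions, summing 1/sqrt(d_u d_v) over edges for the Randić index and 2/(d_u + d_v) for the Harmonic index, where d is the vertex degree. It does not implement the entire Harmonic index introduced in the paper, whose definition is not given in the abstract; the function names and the butane example are illustrative assumptions.

```python
import math
from collections import Counter

def degrees(edges):
    """Vertex degrees of an undirected graph given as a list of (u, v) edges."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def randic_index(edges):
    """Classical Randić index: sum over edges of 1 / sqrt(d_u * d_v)."""
    deg = degrees(edges)
    return sum(1.0 / math.sqrt(deg[u] * deg[v]) for u, v in edges)

def harmonic_index(edges):
    """Classical Harmonic index: sum over edges of 2 / (d_u + d_v)."""
    deg = degrees(edges)
    return sum(2.0 / (deg[u] + deg[v]) for u, v in edges)

# Hydrogen-suppressed graph of n-butane: a path on four carbon atoms.
butane = [(0, 1), (1, 2), (2, 3)]
print(randic_index(butane), harmonic_index(butane))
```

Both indices depend only on vertex degrees, which is why they can be evaluated for a whole series of compounds and then correlated with a measured property.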

23 pages, 11646 KiB  
Article
Multivariate Approaches in Quantitative Structure–Property Relationships Study for the Photostability Assessment of 1,4-Dihydropyridine Derivatives
by Martina Chieffallo, Michele De Luca, Fedora Grande, Maria Antonietta Occhiuzzi, Miyase Gözde Gündüz, Antonio Garofalo and Giuseppina Ioele
Pharmaceutics 2024, 16(2), 206; https://doi.org/10.3390/pharmaceutics16020206 - 31 Jan 2024
Cited by 1 | Viewed by 950
Abstract
1,4-dihydropyridines (1,4-DHPs) are widely recognized as highly effective L-type calcium channel blockers with significant therapeutic benefits in the treatment of cardiovascular disorders. 1,4-DHPs can also target T-type calcium channels, making them promising drug candidates for neurological conditions. When exposed to light, all 1,4-DHPs tend to easily degrade, leading to an oxidation product derived from the aromatization of the dihydropyridine ring. Herein, the elaboration of a quantitative structure–property relationships (QSPR) model was carried out by correlating the light sensitivity of structurally different 1,4-DHPs with theoretical molecular descriptors. Photodegradation experiments were performed by exposing the drugs to a Xenon lamp following the ICH rules. The degradation was monitored by spectrophotometry, and experimental data were elaborated by Multivariate Curve Resolution (MCR) methodologies to assess the kinetic rates. The results were confirmed by the HPLC-DAD method. PaDEL-Descriptor software was used to calculate molecular descriptors and fingerprints related to the chemical structures. Seventeen of the 1875 molecular descriptors were selected and correlated to the photodegradation rate by means of the Ordinary Least Squares (OLS) algorithm. The chemometric model is useful to predict the photosensitivity of other 1,4-DHP derivatives with a very low relative error percentage of 5.03% and represents an effective tool to design new analogs characterized by higher photostability. Full article
(This article belongs to the Section Drug Targeting and Design)

46 pages, 18097 KiB  
Article
Favipiravir Analogues as Inhibitors of SARS-CoV-2 RNA-Dependent RNA Polymerase, Combined Quantum Chemical Modeling, Quantitative Structure–Property Relationship, and Molecular Docking Study
by Magdalena Latosińska and Jolanta Natalia Latosińska
Molecules 2024, 29(2), 441; https://doi.org/10.3390/molecules29020441 - 16 Jan 2024
Cited by 3 | Viewed by 1625
Abstract
Our study was motivated by the urgent need to develop or improve antivirals for effective therapy targeting RNA viruses. We hypothesized that analogues of favipiravir (FVP), an inhibitor of RNA-dependent RNA polymerase (RdRp), could provide more effective nucleic acid recognition and binding processes while reducing side effects such as cardiotoxicity, hepatotoxicity, teratogenicity, and embryotoxicity. We proposed a set of FVP analogues together with their forms of triphosphate as new SARS-CoV-2 RdRp inhibitors. The main aim of our study was to investigate changes in the mechanism and binding capacity resulting from these modifications. Using three different approaches, QTAIM, QSPR, and MD, the differences in the reactivity, toxicity, binding efficiency, and ability to be incorporated by RdRp were assessed. Two new quantum chemical reactivity descriptors, the relative electro-donating and electro-accepting power, were defined and successfully applied. Moreover, a new quantitative method for comparing binding modes was developed based on mathematical metrics and an atypical radar plot. These methods provide deep insight into the set of desirable properties responsible for inhibiting RdRp, allowing ligands to be conveniently screened. The proposed modification of the FVP structure seems to improve its binding ability and enhance the productive mode of binding. In particular, two of the FVP analogues (the trifluoro- and cyano-) bind very strongly to the RNA template, RNA primer, cofactors, and RdRp, and thus may constitute a very good alternative to FVP. Full article

16 pages, 6291 KiB  
Article
Evaluation of Antioxidant Properties and Molecular Design of Lubricant Antioxidants Based on QSPR Model
by Jianfang Liu, Yaoyun Zhang, Chenglingzi Yi, Rongrong Zhang, Sicheng Yang, Ting Liu, Dan Jia, Qing Yang and Shuai Peng
Lubricants 2024, 12(1), 3; https://doi.org/10.3390/lubricants12010003 - 22 Dec 2023
Cited by 2 | Viewed by 1640
Abstract
Two quantitative structure–property relationship (QSPR) models of hindered phenolic antioxidants in lubricating oils were established to help guide the molecular structure design of antioxidants. Firstly, stepwise regression (SWR) was used to filter out essential molecular descriptors without autocorrelation, including electronic, topological, spatial, and structural descriptors, and multiple linear regression (MLR) was used to construct QSPR models based on the screened variables. The two models are statistically sound, with R2 values of 0.942 and 0.941, respectively. The models’ reliability was verified by the frontier molecular orbital energy gaps of the antioxidants. A hindered phenolic additive was designed based on the models. Its antioxidant property is calculated to be 20.9% and 11.0% higher than that of typical commercial antioxidants methyl 3-(3,5-di-tert-butyl-4-hydroxyphenyl) propionate and 2,2′-methylenebis(6-tert-butyl-4-methylphenol), respectively. The structure–property relationship of hindered phenolic antioxidants in lubricating oil obtained by computer-assisted analysis can not only predict the antioxidant properties of existing hindered phenolic additives but also provide theoretical basis and data support for the design or modification of lubricating oil additives with higher antioxidant properties. Full article

46 pages, 21402 KiB  
Article
On the Development of Descriptor-Based Machine Learning Models for Thermodynamic Properties: Part 2—Applicability Domain and Outliers
by Cindy Trinh, Silvia Lasala, Olivier Herbinet and Dimitrios Meimaroglou
Algorithms 2023, 16(12), 573; https://doi.org/10.3390/a16120573 - 18 Dec 2023
Viewed by 2088
Abstract
This article investigates the applicability domain (AD) of machine learning (ML) models trained on high-dimensional data, for the prediction of the ideal gas enthalpy of formation and entropy of molecules via descriptors. The AD is crucial as it describes the space of chemical characteristics in which the model can make predictions with a given reliability. This work studies the AD definition of a ML model throughout its development procedure: during data preprocessing, model construction and model deployment. Three AD definition methods, commonly used for outlier detection in high-dimensional problems, are compared: isolation forest (iForest), random forest prediction confidence (RF confidence) and k-nearest neighbors in the 2D projection of descriptor space obtained via t-distributed stochastic neighbor embedding (tSNE2D/kNN). These methods compute an anomaly score that can be used instead of the distance metrics of classical low-dimension AD definition methods, the latter being generally unsuitable for high-dimensional problems. Typically, in low- (high-) dimensional problems, a molecule is considered to lie within the AD if its distance from the training domain (anomaly score) is below a given threshold. During data preprocessing, the three AD definition methods are used to identify outlier molecules and the effect of their removal is investigated. A more significant improvement of model performance is observed when outliers identified with RF confidence are removed (e.g., for a removal of 30% of outliers, the MAE (Mean Absolute Error) of the test dataset is divided by 2.5, 1.6 and 1.1 for RF confidence, iForest and tSNE2D/kNN, respectively). While these three methods identify X-outliers, the effect of other types of outliers, namely Model-outliers and y-outliers, is also investigated. In particular, the elimination of X-outliers followed by that of Model-outliers enables us to divide MAE and RMSE (Root Mean Square Error) by 2 and 3, respectively, while reducing overfitting. The elimination of y-outliers does not display a significant effect on the model performance. During model construction and deployment, the AD serves to verify the position of the test data and of different categories of molecules with respect to the training data and associate this position with their prediction accuracy. For the data that are found to be close to the training data, according to RF confidence, and display high prediction errors, tSNE 2D representations are deployed to identify the possible sources of these errors (e.g., representation of the chemical information in the training data). Full article
(This article belongs to the Special Issue Nature-Inspired Algorithms in Machine Learning (2nd Edition))
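The threshold rule described in the abstract above, where a molecule lies within the AD if its anomaly score is below a given threshold, can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's procedure: the anomaly score here is the mean Euclidean distance to the k nearest training points in whatever coordinates are supplied (the paper applies kNN to a t-SNE 2D projection), and the names `knn_score`, `in_domain`, and the threshold value are hypothetical.

```python
import math

def knn_score(x, training, k=3):
    """Anomaly score: mean Euclidean distance from x to its k nearest training points."""
    dists = sorted(math.dist(x, t) for t in training)
    return sum(dists[:k]) / k

def in_domain(x, training, threshold, k=3):
    """A point is inside the applicability domain if its score is below the threshold."""
    return knn_score(x, training, k) < threshold

# Toy 2D descriptor space: four training points on the unit square.
training = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(in_domain((0.5, 0.5), training, threshold=1.0, k=2))    # close to the training data
print(in_domain((10.0, 10.0), training, threshold=1.0, k=2))  # far from the training data
```

In practice the threshold would be chosen from the distribution of scores on the training set, which is a design decision the sketch leaves open.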