F8 - 2021 - Fingerprint Machine Learning QSAR Prediction of Ionic Liquid Properties
Article info

Article history:
Received 21 August 2020
Received in revised form 6 November 2020
Accepted 24 December 2020
Available online 28 December 2020

Keywords:
Ionic liquid
QSARs
Machine learning
Refractive index
Viscosity

Abstract

Ionic liquids (ILs) have many applications in, for example, organic synthesis, batteries and drug delivery. In this study, molecular fingerprints (MFs) were used to represent ILs and were combined with machine learning (ML) to develop quantitative structure-activity relationship (QSAR) models for predicting the refractive index and viscosity of ILs. To demonstrate the effectiveness of this approach, four datasets of different sizes containing refractive indices and viscosities of ILs, which were previously used to develop QSAR models by the molecular descriptor (MD)-based method and the group contribution method (GCM), were employed to develop QSAR models by the MF-ML method. The results showed that the models developed by MF-ML had predictive performance comparable to that of the MD-based method and GCM for these four datasets, but MF-ML can obtain the representations of ILs more quickly, within milliseconds. Moreover, the MF-ML models were interpreted by the recently developed Shapley additive explanations (SHAP) method. The results showed that the models made predictions based on a reasonable understanding of how different features affect the related properties of ILs, thus building the trustworthiness of MF-ML models. This study offered a new approach with theoretical support for rapidly developing trustworthy QSAR models to predict the properties of ILs.
© 2021 Elsevier B.V. All rights reserved.
https://doi.org/10.1016/j.molliq.2020.115212
0167-7322/© 2021 Elsevier B.V. All rights reserved.
Y. Ding, M. Chen, C. Guo et al. Journal of Molecular Liquids 326 (2021) 115212
dataset. MD is often used as input in methods such as the conductor-like screening model for real solvents (COSMO-RS) [18,19] or artificial neural networks (ANN) [20,21].

Molecular fingerprint (MF) encodes the chemical structural features of compounds into binary vectors containing only 0s and 1s [22], and is commonly used in tasks such as virtual screening [23], similarity searching [24], and clustering [25]. In the binary vectors, 0 means that a certain chemical structure (e.g., -OH) is absent from the compound while 1 means that it is present. Different chemical structural features (e.g., -OH and -Br) occupy different positions in the binary vectors. Compared with MD, MF is much easier to obtain and understand. For instance, the MFs of thousands of compounds can be obtained within 1 s, which is impossible for their MDs. Recently, MFs have been combined with machine learning (ML) to successfully develop QSAR models to predict ligand biological activity [23], toxicity [26,27], and the rate constants of OH radicals toward organic contaminants [28,29]. Given this, in this study, we transferred this method to develop QSAR models to predict the properties of ILs.

To demonstrate the efficiency of this method, QSAR models for predicting two properties of ILs, i.e., refractive index and viscosity, were developed by using MFs of ILs as inputs into a traditional ML algorithm, extreme gradient boosting (XGBoost) [30]. XGBoost is a tree-based machine learning algorithm with a gradient boosting design. Gradient boosting develops trees in a step-wise way, in which each later tree tries to minimize the error of the former trees. Details of XGBoost can be found in Chen and Guestrin's paper [31]. The refractive index and viscosity are two important properties of ILs: several related properties can be estimated once the refractive index of the material is known [17], while viscosity has a great influence on the transfer performance of IL-containing systems [16]. This study used four datasets of different sizes, each of which was previously used to develop QSAR models by the traditional MD-based method or GCM. We then built QSAR models on these four datasets with the MF-XGBoost method and compared their predictive performance. Then, we used the recently developed Shapley additive explanations (SHAP) method [32] to interpret the developed models, i.e., to show how features such as temperature, pressure, or atom groups affect the predictions, which is important for trusting our model when “black box” ML methods are used.

2. Methods and materials

2.1. Datasets

Four individual datasets containing refractive indices and viscosities of various ILs were employed in this study. Two datasets of refractive index were compiled from the studies of Venkatraman et al. (labeled as Refractive index-1) and Sattari et al. (labeled as Refractive index-2) [11,17], respectively, in which Venkatraman et al. used the MD-based method while Sattari et al. used the GCM to develop QSAR models to predict the refractive indices of various ILs. The dataset of Venkatraman et al. contained a total of 3147 experimental data points of the refractive indices of 467 ILs at different temperatures [17]. The ILs are composed of 240 cations with major classes such as imidazolium, ammonium, pyrrolidinium, and pyridinium, and 86 anions that are dominated by carboxylates, halides, and sulfates. The dataset of Sattari et al. was relatively small and contained 931 experimental data points of 97 unique ILs constituted from 50 different cations and 33 anions [11].

Two datasets of viscosity were compiled from Zhao et al. (labeled as Viscosity-1) and Chen et al. (labeled as Viscosity-2) [16,33], of which all the experimental data points were collected from ILThermo and the IUPAC database. In Zhao et al.'s dataset, a total of 1502 experimental viscosity data points of 89 ILs were investigated [16]. The collected viscosity data points (8.28–142,000 cP) cover a wide range of pressures (1–3000 bar) and temperatures (253.15–395.32 K). Chen et al.'s viscosity dataset is relatively small [33]; it contains 304 experimental data points covering a range of temperatures (258.15–433.15 K) and viscosities (3–2300 cP) at constant pressure (1.01 bar). Likewise, Zhao et al. used the MD-based method while Chen et al. used a GCM to develop QSARs to predict the viscosity of different numbers of ILs [16,33].

We chose these four datasets because (1) two different popular approaches, i.e., the MD-based method and GCM, were used to develop models on them, which can be compared with our MF-based method; and (2) both large and small data volumes are involved in these four datasets, which can be used to test the applicability of our MF-based method. A summary of these four datasets is presented in Table 1. More details about the datasets, such as the cation and anion species and the number of data points for each type of IL, can be found in the original papers [11,16,17,33].

2.2. Machine learning model development

The SMILES of all the cations and anions that constitute the ILs were first obtained with ChemDraw and then converted to MFs by the RDKit package in Python. The MF of an IL is thus composed of the MFs of its corresponding cation and anion. An example is illustrated in Fig. 1, in which the MFs of the 1-butyl-3-methylimidazolium cation and the hexafluorophosphate anion are combined as the MF of the IL. Conditions such as the temperature and/or pressure were combined with the MFs of the IL as the final inputs to XGBoost, as shown in Fig. 1.

Two types of molecular fingerprints were used in this study: Morgan fingerprint and atom-pair fingerprint, both of which encode the chemical structural information of ILs into binary vectors that are filled with only 0s and 1s. A 1 indicates that a certain structural feature exists in the IL, and its position in the vector identifies which structural feature it is. The difference between Morgan fingerprint and atom-pair fingerprint is the way the chemical structural features are represented. Morgan fingerprint first localizes a center atom and then includes its neighbor atoms within a certain radius, while atom-pair fingerprint uses substructures composed of two non-hydrogen atoms and an interatomic separation. Detailed explanations of how to produce Morgan fingerprint and atom-pair fingerprint are given in the papers of Rogers et al. and Carhart et al. [22,34].

Every dataset was split into a training dataset and a test dataset, in which the training dataset was used to train the ML model while the test dataset was used to test the generalizability of the obtained model. The test dataset was not used in the development process of the model, which guaranteed that it had never been exposed to the model. We directly used the same training and test datasets as the studies of Venkatraman et al., Sattari et al., Zhao et al. and Chen et al. to make a fair comparison with their results [11,16,17,33]. However, to control the overfitting problem, we further split the training dataset into a sub-training dataset and a validation dataset, i.e., cross-validation. Overfitting means that the model shows high predictive performance on the training dataset but a poor one on the test dataset. In other words, the ML model only memorizes the data in the training dataset rather than learning the underlying relationships. The sub-training dataset was used to train the model while the validation dataset was used to choose the hyperparameters of XGBoost, which control the complexity of the model and thus the overfitting. The validation dataset was not involved in training the model. To fully use the training dataset, we did a 5-fold cross-validation on the training dataset, in which the training dataset was split 5 times to form 5 sub-training datasets and 5 validation datasets. The optimum hyperparameters were the ones that minimized the average prediction error on these 5 validation datasets. After obtaining the optimum hyperparameters, XGBoost was retrained on the whole training dataset to obtain the final model. This final model was then tested on the test dataset as the final evaluation of its predictive performance.

Hyperparameters are the parameters that are pre-set before the training process. The hyperparameters in this study included the
hyperparameters of both XGBoost and MF (i.e., radius and length), which can take any values. It is thus impossible to enumerate every value to find the optimum one. Given this, we used the Bayesian optimization algorithm to optimize the hyperparameters, which chooses the next hyperparameter candidate based on the results obtained from the previous ones [35,36]. Hence, the chance of reaching the optimum hyperparameter values was increased. Table 2 lists the parameters of MF obtained by the Bayesian optimization algorithm, i.e., the length and radius, and the positions of cation, anion, temperature, and pressure in the vectors for these four datasets.

2.3. Model interpretation

working mechanism of SHAP is that the effect of a feature is calculated by checking what the prediction would be if that feature were absent. However, this alone may lose the interaction information between features because different features may depend on each other. To avoid this, we should observe how the predictions change for each possible subset of features with and without a certain feature and then combine these changes to form a unique contribution for each feature, which includes the interaction effect between features. The SHAP method can also show how the effect on the final prediction changes with the feature values. The above explanations enable us to evaluate whether our model makes predictions based on knowledge of how different features affect the predictions, even though this model is obtained by a “black box” ML algorithm.
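The subset-based attribution described above can be illustrated with a minimal sketch that computes exact Shapley values by enumerating every feature subset. It is not the SHAP package itself: the toy model, its coefficients, the baseline vector, and the convention that an "absent" feature takes its baseline value are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Assumption of this sketch: a feature that is "absent" from a subset
    is replaced by its baseline value.
    """
    n = len(x)

    def value(subset):
        # Features in `subset` keep their real value, the rest use the baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight of a subset of this size that excludes feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = value(set(subset) | {i})
                without_i = value(set(subset))
                phi[i] += weight * (with_i - without_i)
    return phi


def toy_model(features):
    # Hypothetical model: bit_a raises the output, bit_a and bit_b interact,
    # and temperature lowers the output.
    bit_a, bit_b, temperature = features
    return 1.5 + 0.02 * bit_a + 0.01 * bit_a * bit_b - 0.0004 * temperature


x = [1, 1, 320.0]          # two fingerprint bits set, T = 320 K
baseline = [0, 0, 298.0]   # both bits absent, T = 298 K
phi = shapley_values(toy_model, x, baseline)
```

In this toy case the interaction term is shared equally between the two bits, and the contributions add up to the difference between the prediction for `x` and the prediction for `baseline`. Full enumeration scales exponentially with the number of features, so it is only practical for a handful of features.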
Table 1
Summary of the experimental data of refractive index and viscosity.
Table 2
The optimum parameters of MF obtained by the Bayesian optimization algorithm and the positions of cation, anion, and conditions (i.e., temperature and pressure) in the binary vector for all four datasets.
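How the cation bits, anion bits, and conditions can share one input vector, and how the position of each block follows from the fingerprint length, can be sketched as below. The function name `build_il_input`, the 8-bit toy fingerprints, and the ordering (cation, anion, pressure, temperature) are assumptions for illustration; real fingerprints would come from a cheminformatics package such as RDKit.

```python
def build_il_input(cation_bits, anion_bits, pressure=None, temperature=None):
    """Concatenate cation bits, anion bits and conditions into one vector.

    Returns the vector and a dict mapping each block name to its
    (start, end) index range, with `end` exclusive.
    """
    vector = list(cation_bits) + list(anion_bits)
    positions = {
        "cation": (0, len(cation_bits)),
        "anion": (len(cation_bits), len(vector)),
    }
    # Conditions are appended after the fingerprint bits (assumed ordering).
    for name, value in (("pressure", pressure), ("temperature", temperature)):
        if value is not None:
            positions[name] = (len(vector), len(vector) + 1)
            vector.append(float(value))
    return vector, positions


# Hypothetical 8-bit fingerprints; real ones are far longer.
cation = [0, 1, 0, 0, 1, 0, 0, 1]
anion = [1, 0, 0, 0, 0, 0, 1, 0]
vector, positions = build_il_input(cation, anion, pressure=1.01, temperature=298.15)
```

With this layout, a feature index reported by a model interpretation can be mapped back to the cation block, the anion block, or a condition by looking it up in `positions`.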
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n}\left(V_i^{\mathrm{exp}} - V_i^{\mathrm{pred}}\right)^2}{n}} \qquad (1)$$

$$R^2 = \frac{\sum_{i}\left(V_i^{\mathrm{pred}} - \bar{V}^{\mathrm{exp}}\right)^2}{\sum_{i}\left(V_i^{\mathrm{exp}} - \bar{V}^{\mathrm{exp}}\right)^2} \qquad (2)$$

where $V^{\mathrm{pred}}$, $V^{\mathrm{exp}}$, $\bar{V}^{\mathrm{exp}}$ and $n$ are the predicted values, the experimental values, the average of the experimental values, and the number of data points, respectively.

3. Results and discussion

3.1. The comparison of the predictive performance

Table 3 lists the comparison of predictive performance on the test dataset for these four datasets when different types of molecular fingerprint were used to develop QSAR models. Compared with the atom-pair fingerprint, using the Morgan fingerprint for the representation of ILs is better in terms of RMSE and R2. Except for the dataset of Refractive index-1, QSAR models developed with the Morgan fingerprint showed lower RMSE and higher R2 than those developed with the atom-pair fingerprint for the other three datasets (Table 3). Hence, the Morgan fingerprint was chosen as the representation of ILs in the following study.

Table 3
The comparison of predictive performance on the test dataset between Morgan fingerprint-based and atom-pair fingerprint-based XGBoost QSAR models.

Fingerprint             Dataset              RMSE     R2
Morgan fingerprint      Refractive index-1   0.017    0.782
                        Refractive index-2   0.013    0.853
                        Viscosity-1          0.0091   0.97
                        Viscosity-2          0.053    0.989
Atom-pair fingerprint   Refractive index-1   0.016    0.836
                        Refractive index-2   0.022    0.568
                        Viscosity-1          0.162    0.918
                        Viscosity-2          0.065    0.984

We then compared the MD-based method with the MF-XGBoost method on the datasets of refractive index and viscosity. For the refractive index dataset (Fig. 2A), Venkatraman et al. used the MD-based method that correlated the MDs of ILs with their experimental refractive indices by several regression methods [17], in which the top-2
Fig. 2. The plots of experimental refractive index (A) and viscosity (B) versus their corresponding predicted values in the test dataset by different models. (CUBIST: the name of a package for rule- and instance-based regression modeling; RF: random forest; SVM: support vector machine; MLR: multiple linear regression).
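The agreement between experimental and predicted values shown in such plots is summarized by RMSE and R2. A minimal sketch of both metrics follows. RMSE follows Eq. (1); R2 is computed here in the widely used residual form, 1 - SS_res/SS_tot, which is an assumption of this sketch and need not coincide with Eq. (2) as printed. The sample values are invented for illustration.

```python
from math import sqrt


def rmse(v_exp, v_pred):
    """Root mean square error between experimental and predicted values."""
    n = len(v_exp)
    return sqrt(sum((e - p) ** 2 for e, p in zip(v_exp, v_pred)) / n)


def r_squared(v_exp, v_pred):
    """Coefficient of determination in the residual form (assumed here)."""
    mean_exp = sum(v_exp) / len(v_exp)
    ss_res = sum((e - p) ** 2 for e, p in zip(v_exp, v_pred))
    ss_tot = sum((e - mean_exp) ** 2 for e in v_exp)
    return 1.0 - ss_res / ss_tot


# Invented refractive-index-like values, for illustration only.
v_exp = [1.40, 1.45, 1.50, 1.55]
v_pred = [1.41, 1.44, 1.52, 1.54]
test_rmse = rmse(v_exp, v_pred)
test_r2 = r_squared(v_exp, v_pred)
```

Both metrics should be evaluated on data that the model has not seen during training or hyperparameter selection.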
Fig. 3. The SHAP plots of the four models obtained on these four datasets, in which (A) and (C) are for Refractive index-1 and Refractive index-2 while (B) and (D) are for Viscosity-1 and Viscosity-2.
best ones were CUBIST and RF, as shown in Fig. 2A. We thus chose these two models to compare with the MF-XGBoost model. The predictive performance, i.e., R2 and RMSE, was calculated for the test dataset, which had never been “seen” by the model. The MF-XGBoost model showed similar R2 (0.782) and RMSE (0.017) to CUBIST (R2 = 0.83, RMSE = 0.016) and RF (R2 = 0.797, RMSE = 0.018), indicating that the MF-XGBoost model has predictive performance comparable to that of the MD-based models in predicting the refractive indices of ILs. For the viscosity dataset (Fig. 2B), however, the MF-XGBoost model showed a higher predictive performance (R2 = 0.97, MSE = 0.0091) than that of MLR (R2 = 0.8, MSE = 0.187) and SVM (R2 = 0.93, MSE = 0.025), which were combined with the Sσ-profile of ILs in Zhao et al.'s study [16].

For the GCM, Sattari et al. achieved R2 = 0.906 and RMSE = 0.011 on the test dataset for predicting the refractive indices [11]. The MF-XGBoost method showed a slightly lower but still satisfactory predictive performance (R2 = 0.853, RMSE = 0.013). For the viscosity dataset, Chen et al. only reported the predictive performance on the training dataset (R2 = 0.988, RMSE = 0.055) [33]. We have listed the predictive performance on both the training dataset (R2 = 0.995, RMSE = 0.020) and the test dataset (R2 = 0.989, RMSE = 0.053). It should be noted that the test dataset had never been exposed to the model before. All of these results showed that MF-XGBoost had predictive performance comparable to that of both the MD-based method and GCM, indicating the effectiveness of the MF-based method as a reliable approach to developing QSAR models with satisfactory predictive performance for predicting the properties of ILs.

3.2. Model interpretation by SHAP method

We then interpreted these models by the SHAP method because the “black box” XGBoost method was used to develop the models. Fig. 3 shows the SHAP plots for these four models.
Table 4
The top-5 features in the SHAP plots of Fig. 3 and their represented atom groups and effects on the predictions.
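Ranking features by their SHAP values summed over all data points, the criterion used to select the top features, can be sketched as follows. Ranking by the magnitude of the sum (so that strongly decreasing features also rank high) is an assumption of this sketch, and the feature names and SHAP values are invented placeholders.

```python
def top_features(shap_rows, feature_names, k=5):
    """Return the k features with the largest summed SHAP magnitude.

    shap_rows holds one list of per-feature SHAP values per data point.
    """
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        for j, value in enumerate(row):
            totals[j] += value
    ranked = sorted(
        zip(feature_names, totals),
        key=lambda item: abs(item[1]),  # magnitude of the summed effect
        reverse=True,
    )
    return ranked[:k]


# Invented SHAP values for three data points and three atom groups.
names = ["group_A", "group_B", "group_C"]
shap_rows = [
    [0.02, -0.05, 0.00],
    [0.01, -0.02, 0.03],   # here group_C outweighs group_B locally
    [0.03, -0.06, 0.00],
]
ranking = top_features(shap_rows, names, k=2)
```

In the second row `group_C` has the largest individual SHAP value although `group_B` ranks first overall, which mirrors the distinction between the summed effect over all ILs and the effect for one specific IL.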
Taking Fig. 3A as an example to illustrate how to read the SHAP plot, the X-axis shows the Shapley values and the Y-axis shows the feature names. For example, Feature 8132 in Fig. 3A represents the feature in position 8132, which is the temperature (Table 2). For every feature, the patterns in the figure are composed of all the data points, while the feature values are represented by color, which gradually changes from blue to red as the feature values increase. For MF, red represents 1 while blue represents 0 because 0 and 1 are the only possible values for MF. By comparison, the values of temperature and pressure are continuous, corresponding to a continuous change of color. A feature with a positive or negative Shapley value increases or decreases the prediction, respectively. For example, Feature 8132 represents the temperature, and with increasing temperature (the color changes from blue to red), its Shapley value gradually moves from the positive to the negative direction, indicating that increasing temperature decreases the prediction, i.e., the refractive index. This is consistent with the fact that high temperature leads to a lower refractive index [9]. In summary, the SHAP plot unveils how the model made predictions on the target, which we can use to conclude whether the model is trustworthy or not.

3.2.1. The effect of temperature and pressure on the refractive index and viscosity

Based on the model interpretation, we can unveil what effects of temperature and pressure on the refractive index and viscosity the model found. It should be noted that we did not train the model to identify these effects but only to correlate MFs with the corresponding refractive index and viscosity. The model “learned” such effects automatically, and it makes predictions on the properties based on them.

For the refractive index datasets, i.e., Fig. 3A and C, the effect of temperature (Feature 8132 for Fig. 3A and Feature 4110 for Fig. 3C) on the refractive index is the same, that is, increasing temperature decreased the refractive index. This is consistent with the experimental fact [9]: the SHAP values gradually changed from negative values to positive values when the temperature was gradually decreased (i.e., the color of Feature 8132 in Fig. 3A and Feature 4110 in Fig. 3C changed from red to blue). For the viscosity datasets, i.e., Fig. 3B and Fig. 3D, the effect of temperature on the viscosity (Feature 9265 for Fig. 3B and Feature 7504 for Fig. 3D) is also the same, that is, increasing the temperature decreased the viscosity, which is also consistent with the experimental fact [37]. The relationship between temperature and refractive index or viscosity was easily learned by the model while developing the QSAR models. We can also find the transition temperature that starts to decrease the refractive index or viscosity by checking at what temperature the SHAP value is close to 0. For example, 316 K was found to be the transition temperature for Refractive index-1 because temperatures below this value had positive SHAP values while those above it had negative ones. Likewise, 313 K was the transition temperature for Viscosity-1.

The pressure was involved only in the dataset of Viscosity-1 (Feature 9264). As shown in Fig. 3B, the model found that decreasing pressure (the color changes from red to blue) is beneficial to decreasing the viscosity (the SHAP values change from positive to negative), which is also consistent with the experimental findings [37]. We can also determine a threshold for pressure, below or above which the pressure decreases or increases the viscosity, by checking at what pressure the SHAP value is close to 0. This threshold value was determined as 10,000 kPa, which means that any pressure below 10,000 kPa decreases the viscosity.

These results indicated that the models correctly “learned” the effect of temperature and/or pressure on the related properties of ILs. Such a relationship is knowledge automatically learned by the model while developing the QSAR models, and it is unveiled by the SHAP method. When the model made predictions, it used this knowledge. Because this knowledge is consistent with the experimental facts, we can conclude that our model is trustworthy.

3.2.2. The effect of atom groups on the refractive index and viscosity

Next, we interpreted how the atom groups of ILs affect the refractive index and viscosity. When the atom groups are absent, i.e., blue points in the patterns, the SHAP values are close to 0 for most of the features, indicating that they contributed negligibly to the predictions. This is reasonable because atom groups should contribute 0 if they are absent from the ILs. Table 4 lists the top-5 atom groups that had the largest effect on the predictions in these four models. Here, the largest effect means that the sum of SHAP values for a certain atom group over all the ILs is the largest. However, its SHAP value does not have to be the largest for a specific IL. For example, Cl− had the largest decreasing effect on the viscosity for the dataset of Viscosity-1 because the sum of SHAP values for Cl− over all the ILs was the largest. But for an individual IL, Cl− did not have to have the largest SHAP value when compared with other atom groups.

The F atom in the anions of ILs (Feature 7990) was found to decrease the refractive index if it is present in the ILs. This is consistent with the experimental fact that anions containing the F atom often show a low refractive index [9]. Likewise, B−, which is often combined with F atoms to form BF4−, also decreases the refractive index if it is present in the anions of ILs, which is also consistent with the experimental fact [9]. By comparison, the model “thought” that the presence of aromatic carbons in the anions and cations of ILs (Features 3643 and 5700) can increase the refractive index, as indicated by their positive Shapley values. This is also consistent with the experimental fact that ILs containing aromatic groups often show higher refractive indices [9]. The viscosities of ILs generally increase with the length of the alkyl side chain, which is ascribed to the fact that increasing the length of the alkyl side chain increases the van der Waals interaction and thus the viscosity [38,39]. The model also “thought” that the presence of an alkyl chain (Feature 212) can increase the viscosity, which is consistent with the experimental observations [37]. The F atom in the anions can decrease the viscosity, as indicated by the negative Shapley value. This is also consistent with the experimental observation that anions containing the F atom often show low viscosity [37]. One interesting observation is that the model thought that the F-P atom group (Feature 4032) can increase the viscosity although it contains the F atom, which is different from other anions containing the F atom. This is also consistent with the experimental fact [37]. The above interpretation indicated that our models made predictions based on a reasonable understanding of how the features affect the properties of ILs, which made our models trustworthy.

4. Conclusion

This study demonstrated that MF-XGBoost is an effective way to develop QSAR models to predict the refractive index and viscosity of ILs, and it should find wide application in predicting other properties of ILs, such as CO2 adsorption, conductivity, and density. Compared with the MD-based method and GCM, the MF-based method can obtain the representations of ILs more quickly and can easily be combined with conditions, i.e., temperature and pressure. Although a “black-box” ML algorithm (here, XGBoost) was used, we interpreted our models by the SHAP method and demonstrated that the models made predictions based on a reasonable understanding of how the features affect the related properties of ILs. For example, the temperature effect on the refractive index and viscosity was correctly “learned” by our model. This study offered a new way to more quickly develop trustworthy QSAR models to predict the properties of ILs.

Authorship statement

All persons who meet authorship criteria are listed as authors, and all authors certify that they have participated sufficiently in the work
to take public responsibility for the content, including participation in the concept, design, analysis, writing, or revision of the manuscript. Furthermore, each author certifies that this material or similar material has not been and will not be submitted to or published in any other publication before its appearance in the Hong Kong Journal of Occupational Therapy.

Authorship contributions

Conception and design of study: Yi Ding, Jingwen Wang.
Acquisition of data: Yi Ding, Chao Guo, Peng Zhang.
Analysis and/or interpretation of data: Yi Ding, Chao Guo, Peng Zhang, Jingwen Wang.
Drafting the manuscript: Yi Ding, Chao Guo, Peng Zhang, Jingwen Wang.
Revising the manuscript critically for important intellectual content: Yi Ding, Chao Guo, Peng Zhang, Jingwen Wang.
Approval of the version of the manuscript to be published (the names of all authors must be listed): Yi Ding, Minchun Chen, Chao Guo, Peng Zhang, Jingwen Wang.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

All persons who have made substantial contributions to the work reported in the manuscript (e.g., technical help, writing and editing assistance, general support), but who do not meet the criteria for authorship, are named in the Acknowledgements and have given us their written permission to be named. If we have not included an Acknowledgements, then that indicates that we have not received substantial contributions from non-authors.

References

[1] T. Welton, Room-temperature ionic liquids. Solvents for synthesis and catalysis, Chem. Rev. 99 (8) (1999) 2071–2084.
[2] M. Freemantle, An Introduction to Ionic Liquids, Royal Society of Chemistry, 2010.
[3] L.C. Branco, J.G. Crespo, C.A. Afonso, Studies on the selective transport of organic compounds by using ionic liquids as novel supported liquid membranes, Chem. Eur. J. 8 (17) (2002) 3865–3871.
[4] M. Galiński, A. Lewandowski, I. Stępniak, Ionic liquids as electrolytes, Electrochim. Acta 51 (26) (2006) 5567–5580.
[5] D. Zhao, M. Wu, Y. Kou, E. Min, Ionic liquids: applications in catalysis, Catal. Today 74 (1–2) (2002) 157–189.
[6] I. Marrucho, L. Branco, L. Rebelo, Ionic liquids in pharmaceutical applications, Annual Review of Chemical and Biomolecular Engineering 5 (2014) 527–546.
[7] M. Hasib-ur-Rahman, M. Siaj, F. Larachi, Ionic liquids for CO2 capture: development and progress, Chem. Eng. Process. Process Intensif. 49 (4) (2010) 313–322.
[8] D.S. Firaha, O. Hollóczki, B. Kirchner, Computer-aided design of ionic liquids as CO2 absorbents, Angew. Chem. Int. Ed. 54 (27) (2015) 7805–7809.
[9] S. Seki, S. Tsuzuki, K. Hayamizu, Y. Umebayashi, N. Serizawa, K. Takei, H. Miyashiro, Comprehensive refractive index property for room-temperature ionic liquids, J. Chem. Eng. Data 57 (8) (2012) 2211–2216.
[10] R.L. Gardas, J.A. Coutinho, Group contribution methods for the prediction of thermophysical and transport properties of ionic liquids, AICHE J. 55 (5) (2009) 1274–1290.
[11] M. Sattari, A. Kamari, A.H. Mohammadi, D. Ramjugernath, A group contribution method for estimating the refractive indices of ionic liquids, J. Mol. Liq. 200 (2014) 410–415.
[12] X. Wang, X. Lu, Q. Zhou, Y. Zhao, X. Li, S. Zhang, Database and new models based on a group contribution method to predict the refractive index of ionic liquids, Phys. Chem. Chem. Phys. 19 (30) (2017) 19967–19974.
[13] A. Varnek, N. Kireeva, I.V. Tetko, I.I. Baskin, V.P. Solov’ev, Exhaustive QSPR studies of a large diverse set of ionic liquids: how accurately can we predict melting points? J. Chem. Inf. Model. 47 (3) (2007) 1111–1122.
[14] V. Venkatraman, B.K. Alsberg, Quantitative structure-property relationship modelling of thermal decomposition temperatures of ionic liquids, J. Mol. Liq. 223 (2016) 60–67.
[15] V. Venkatraman, B.K. Alsberg, Predicting CO2 capture of ionic liquids using machine learning, Journal of CO2 Utilization 21 (2017) 162–168.
[16] Y. Zhao, Y. Huang, X. Zhang, S. Zhang, A quantitative prediction of the viscosity of ionic liquids using Sσ-profile molecular descriptors, Phys. Chem. Chem. Phys. 17 (5) (2015) 3761–3767.
[17] V. Venkatraman, J.J. Raj, S. Evjen, K.C. Lethesh, A. Fiksdahl, In silico prediction and experimental verification of ionic liquid refractive indices, J. Mol. Liq. 264 (2018) 563–570.
[18] A. Klamt, F. Eckert, COSMO-RS: a novel and efficient method for the a priori prediction of thermophysical data of liquids, Fluid Phase Equilib. 172 (1) (2000) 43–72.
[19] F. Eckert, A. Klamt, Fast solvent screening via quantum chemistry: COSMO-RS approach, AICHE J. 48 (2) (2002) 369–385.
[20] P. Díaz-Rodríguez, J.C. Cancilla, N.V. Plechkova, G. Matute, K.R. Seddon, J.S. Torrecilla, Estimation of the refractive indices of imidazolium-based ionic liquids using their polarisability values, Phys. Chem. Chem. Phys. 16 (1) (2014) 128–134.
[21] M. Lashkarbolooki, A.Z. Hezave, S. Ayatollahi, Artificial neural network as an applicable tool to predict the binary heat capacity of mixtures containing ionic liquids, Fluid Phase Equilib. 324 (2012) 102–107.
[22] D. Rogers, M. Hahn, Extended-connectivity fingerprints, J. Chem. Inf. Model. 50 (5) (2010) 742–754.
[23] K.-Z. Myint, L. Wang, Q. Tong, X.-Q. Xie, Molecular fingerprint-based artificial neural networks QSAR for ligand biological activity predictions, Mol. Pharm. 9 (10) (2012) 2912–2923.
[24] G. Klopman, Concepts and applications of molecular similarity, by Mark A. Johnson and Gerald M. Maggiora, eds., John Wiley & Sons, New York, 1990, 393 pp. Price: $65.00, J. Comput. Chem. 13 (4) (1992) 539–540.
[25] M.J. McGregor, P.V. Pallai, Clustering of large databases of compounds: using the MDL “keys” as structural descriptors, J. Chem. Inf. Comput. Sci. 37 (3) (1997) 443–448.
[26] K. Mansouri, A. Abdelaziz, A. Rybacka, A. Roncaglioni, A. Tropsha, A. Varnek, A. Zakharov, A. Worth, A.M. Richard, C.M. Grulke, D. Trisciuzzi, D. Fourches, D. Horvath, E. Benfenati, E. Muratov, E. Wedebye, F. Grisoni, G.F. Mangiatordi, G.M. Incisivo, H. Hong, H.W. Ng, I.V. Tetko, I. Balabin, J. Kancherla, J. Shen, J. Burton, M. Nicklaus, M. Cassotti, N.G. Nikolov, O. Nicolotti, P.L. Andersson, Q. Zang, R. Politi, R.D. Beger, R. Todeschini, R. Huang, S. Farag, S.A. Rosenberg, S. Slavov, X. Hu, R.S. Judson, CERAPP: Collaborative Estrogen Receptor Activity Prediction Project, Environ. Health Perspect. 124 (7) (2016) 1023–1033.
[27] Y. Wu, G. Wang, Machine Learning Based Toxicity Prediction: From Chemical Structural Description to Transcriptome Analysis, Int. J. Mol. Sci. 19 (8) (2018).
[28] S. Zhong, J. Hu, X. Fan, X. Yu, H. Zhang, A deep neural network combined with molecular fingerprints (DNN-MF) to develop predictive models for hydroxyl radical rate constants of water contaminants, J. Hazard. Mater. 383 (2020) 121141.
[29] S. Zhong, K. Zhang, D. Wang, H. Zhang, Shedding light on “black box” machine learning models for predicting the reactivity of HO• radicals toward organic compounds, Chem. Eng. J. 126627 (2020).
[30] T. Chen, C. Guestrin, XGBoost: A Scalable Tree Boosting System, 2016 785–794.
[31] T. Chen, C. Guestrin, XGBoost: A Scalable Tree Boosting System, arXiv (2016) 785–794.
[32] S.M. Lundberg, S.-I. Lee, A unified approach to interpreting model predictions, Adv. Neural Inf. Proces. Syst. 2017 (2017) 4765–4774.
[33] B.-K. Chen, M.-J. Liang, T.-Y. Wu, H.P. Wang, A high correlate and simplified QSPR for viscosity of imidazolium-based ionic liquids, Fluid Phase Equilib. 350 (2013) 37–42.
[34] R.E. Carhart, D.H. Smith, R. Venkataraghavan, Atom pairs as molecular features in structure-activity studies: definition and applications, J. Chem. Inf. Comput. Sci. 25 (2) (1985) 64–73.
[35] J. Snoek, H. Larochelle, in neural information, A.-R. P., Practical bayesian optimization of machine learning algorithms. Advances in neural information Processing Systems 25 (NIPS 2012), 2012.
[36] I. Dewancker, M. McCourt, S. Clark, Bayesian Optimization for Machine Learning: A Practical Guidebook, arXiv:1612.04858 2016.
[37] G. Yu, D. Zhao, L. Wen, S. Yang, X. Chen, Viscosity of ionic liquids: database, observation, and quantitative structure-property relationship analysis, AICHE J. 58 (9) (2012) 2885–2899.
[38] R. Hagiwara, Y. Ito, Room temperature ionic liquids of alkylimidazolium cations and fluoroanions, J. Fluorine Chem. 105 (2) (2000) 221–227.
[39] Z.B. Zhou, H. Matsumoto, K. Tatsumi, Low-melting, low-viscous, hydrophobic ionic liquids: 1-alkyl (alkyl ether)-3-methylimidazolium perfluoroalkyltrifluoroborate, Chem. Eur. J. 10 (24) (2004) 6581–6591.