SVOH: Rigorous Selection Approach For Optimal Hyperparameter Values
Abstract:- The problem we address in this paper is a model selection problem. We consider the k-fold cross-validation (KCV) technique, applied to the Gaussian support vector machine (SVM) classification algorithm. In the cross-validation process, the value of k, the number of subsets, is generally chosen and set a priori (without any experiment). However, the value of k affects the choice of the best compromise between the estimation error and the approximation error of the model. In this way, the value of k can severely influence the optimal values of the SVM classifier's hyperparameters and consequently affect the performance of the selected model and its ability to generalize.

In this work, we propose a rigorous approach, called SVOH (Selection of Optimal Hyperparameter Values), for finding the values of the hyperparameters of the Gaussian SVM in the context of protein-protein interaction (PPI) prediction, where the pairs of proteins that interact must be separated from the pairs that do not. The proposed approach treats the value of k, the number of subsets, as an influential parameter of the model and therefore learns an optimal value of k.

Keywords:- Machine Learning, Model Selection, Cross-Validation, Prediction of Protein-Protein Interactions

I. INTRODUCTION

The support vector machine (SVM) is one of the most widely used algorithms for classification tasks, particularly for the classification of protein-protein interactions [1]–[3]. SVM belongs to the field of artificial neural networks (ANN) [4] but is characterised by the solid foundations of statistical learning theory. SVMs are learned by searching for a set of parameters obtained by solving a constrained convex quadratic programming problem (CCQP), for which a number of efficient techniques have been developed. The search for optimal parameters does not, however, complete the learning process, because there is a set of additional variables, the hyperparameters, which must be set to achieve optimal classification performance; for ANNs, for example, a hyperparameter is the number of hidden nodes. In the Gaussian SVM framework, these are the regularisation parameter C and the kernel parameter γ. This setting is not trivial and is an open research problem [5]–[8]. The process of finding the best hyperparameters is generally referred to in the machine learning literature as the model selection phase [9] and is strictly linked to the evaluation of the SVM's generalisation capacity or, in other words, the error rate that the SVM can achieve on new (unknown) data. In fact, it is common practice to select the optimal SVM (i.e. the optimal hyperparameters) by choosing the one with the lowest generalisation error. The methods for carrying out the model selection phase can be divided into two categories according to [9]: theoretical methods [10] and methods based on resampling techniques [11].

Theoretical methods provide in-depth information about classification algorithms but are often too difficult to apply or to compute to be of any practical use. On the other hand, as mentioned by [5], practitioners have adopted procedures based on resampling techniques, which work well in practice but offer no theoretical guarantee on the generalisation error. One of the most popular resampling techniques is the k-fold cross-validation (KCV) procedure [8], which is simple, effective and reliable. The KCV technique consists of dividing a data set into k independent subsets. All but one of these subsets are used to train a classifier, while the remaining subset is used to evaluate the generalisation error. After training, it is possible to calculate an upper bound on the generalisation error for each of the trained classifiers.

In the literature, the value of k is usually fixed at 5 or 10. Choosing a fixed number of subsets for cross-validation can produce a model with high bias and variance [9]. Cross-validation takes the average of several estimates of the hold-out risk corresponding to different splits of the data. In [5] it can be verified that the value of k influences the stability of the mean error. Still according to [9], model selection performance with cross-validation is generally optimal when the variance is as low as possible. This variance generally decreases as the number k of subsets increases, for a fixed training sample size n. When k is fixed, the variance of the cross-validation estimator also depends on n; in fact, in [8], we can see that this variance depends strongly on the training set used. The choice of k therefore influences the variance of the cross-validation estimator and, according to [6], [12], can have a significant impact on the search for the optimal values of the hyperparameters.
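As a minimal illustration of this sensitivity (not part of the original study), the sketch below estimates the cross-validated accuracy of a Gaussian SVM for every k from 3 to 10; scikit-learn, the synthetic dataset and the toy hyperparameter values are assumptions made only for the example.

```python
# Minimal sketch (not from the paper): how the k-fold cross-validation estimate
# of a Gaussian SVM's accuracy varies with the number of subsets k.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for a real training set.
X, y = make_classification(n_samples=600, n_features=20, random_state=0)

for k in range(3, 11):  # k = 3, ..., 10
    scores = cross_val_score(SVC(kernel="rbf", C=10, gamma=0.01), X, y, cv=k)
    print(f"k={k:2d}  mean accuracy={scores.mean():.3f}  std={scores.std():.3f}")
```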
In the following section, we present a new approach called SVOH for selecting the optimal values of the hyperparameters of the Gaussian SVM. We first present the problem to be solved and then show how the SVOH algorithm works.

II. PROPOSED APPROACH
A. Problem to be Solved
In order not to fix the value of k when searching for the optimal values of the Gaussian SVM hyperparameters with k-fold cross-validation, we propose to consider several possible numbers of subsets into which the original training set can be divided. The aim is to choose a better cross-validation estimation procedure, one with the lowest bias and variance, allowing a good combination of hyperparameters (C, γ) to be identified so that the SVM classifier has a low generalisation error and predicts unknown data with a higher accuracy rate.

For the proposed approach, we consider the number k as a hyperparameter, as in [6], which can take any value in the set {3, ..., 10}. The smallest value of k is set to 3 because, for each subset, the training data must represent more than 60% of the training set, as shown by [13]; with k = 3 the training portion is (k − 1)/k ≈ 67% of the data, whereas k = 2 would leave only 50%. The highest value of k is set to 10 to remain within the range used by empirical practice. This limited range of test values for k (up to 10) also means that, even when the training set is large, the technique is not very computationally intensive. Assuming that there are q parameters and that each of them has m distinct values, the computational complexity of the grid search increases exponentially, at a rate of O(m^q), as shown by [14], [15]. In addition, in [6], we can see that more than 10 different databases have produced an optimal value of k smaller than 10. The set of parameters to optimise in our case therefore becomes the triplet (C, γ, k), given that our decision function f uses a Gaussian kernel which itself operates with the parameter pair (C, γ).

B. Functioning of the SVOH Algorithm
Let {C} and {γ} correspond respectively to the set of values for parameter C and the set of values for parameter γ. Let DZ be our training set of Z observations and f our SVM model obtained with the hyperparameters (C, γ); let DZE denote the Z(k−1)/k observations used for training after subdividing DZ into k subsets, and DZS the remaining Z/k observations reserved for testing. The algorithm takes as input DZ, {C} and {γ}. For each number of subdivisions k ∈ {3, ..., 10} of the training set DZ, the algorithm trains a classifier f using the values of {C} and {γ} on DZE, then evaluates the correctness rate of f on DZS. Finally, the algorithm selects the triplet (C, γ, k) that gives the highest correctness rate. A pseudo-code of the SVOH algorithm is shown below:
SVOH Algorithm
Input:  DZ: learning set
        {C}: set of values for C
        {γ}: set of values for γ
Output: {k*, C*, γ*}
1: f = ∅
2: for C ∈ {C}, γ ∈ {γ}, k ∈ {3, ..., 10} do
3:     DZE, DZS = subdivision(DZ, k)
4:     fE = SVM(DZE, C, γ)
5:     Er = evaluate the accuracy rate(fE, DZS)
6:     f = f ∪ {Er}
7: end for
8: {k*, C*, γ*} = the combination with the best accuracy rate in f
9: Return {k*, C*, γ*}
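A possible Python rendering of this pseudo-code is sketched below; it is not the authors' implementation. It assumes scikit-learn, reads the mean accuracy over the k folds as the score Er of a combination (consistent with the average correctness rates reported in Table 2), and uses illustrative names such as svoh.

```python
# Sketch of the SVOH search (assumptions: scikit-learn, mean accuracy over the
# k folds as the score of a (C, gamma, k) combination).
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

def svoh(X, y, C_grid, gamma_grid, k_values=range(3, 11)):
    best = (None, None, None, -np.inf)  # (k*, C*, gamma*, best accuracy)
    for k in k_values:
        folds = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
        for C in C_grid:
            for gamma in gamma_grid:
                rates = []
                for train_idx, test_idx in folds.split(X, y):
                    f_E = SVC(kernel="rbf", C=C, gamma=gamma)   # train on DZE
                    f_E.fit(X[train_idx], y[train_idx])
                    rates.append(accuracy_score(                 # evaluate on DZS
                        y[test_idx], f_E.predict(X[test_idx])))
                Er = float(np.mean(rates))                       # score of (C, gamma, k)
                if Er > best[3]:                                 # keep the best triplet
                    best = (k, C, gamma, Er)
    return best                                                  # (k*, C*, gamma*, accuracy)
```

Called with the grids used later in Table 1, this sketch returns the selected triplet together with its average accuracy rate.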
III. MATERIALS AND METHODS

A. Learning Data
In this work, the data used for the experiments come from the work of Kopoin et al. [3], [16]. Kopoin et al. used the BP (Bigram Physicochemical) feature extraction technique to produce numerical data from three protein-protein interaction (PPI) reference datasets [17]. PPI refers to whether two proteins interact: in the case of interaction we speak of a positive PPI, and in the opposite case of a negative PPI. The first dataset is the HPRD PPI data [1], consisting of 10,000 samples divided into 5,000 positive PPI pairs and 5,000 negative PPI pairs, as in [2], [18], [19]. The S. Cerevisiae PPI dataset [20], [1], consisting of 11,188 samples (5,594 positive pairs and 5,594 negative pairs), and the H. Pylori PPI dataset [21], consisting of 2,916 samples (1,458 positive pairs and 1,458 negative pairs), were also used. Four other PPI datasets, also used for interaction prediction, were used to test the SVOH approach. The first is the Homo sapiens (H. sapiens) dataset, collected from the HPRD database as described in [22]; it contains 8,161 protein pairs, including 3,899 positive PPI pairs and 4,262 negative PPI pairs. The second is the Escherichia coli (E. coli) dataset, consisting solely of 6,594 positive pairs [23]. The third is the C. elegans dataset [24], which contains 4,013 positive pairs. Finally, the fourth is the M. musculus dataset, which contains 313 positive pairs [25].

B. SVM Algorithm
The PPI prediction phase starts from an optimal SVM, obtained by selecting the optimal hyperparameters, i.e. those that give the SVM the lowest generalisation error.

Consider a learning set $\mathcal{Z} = \{(x_i, y_i),\ i \in [1, n]\}$, where each vector $x_i \in \mathbb{R}^p$ is assigned a label $y_i \in \{-1, +1\}$. The relationship between x and y is encapsulated in an unknown distribution p(x, y), which is the source of the data. The aim of learning is to find a function $f\colon \mathbb{R}^p \to Y_f \subset \mathbb{R}$ which approximates this relationship. The SVM algorithm [26] can be used for this purpose, where the classifier is identified during the hyperparameter search phase by solving the following convex quadratic problem:

$\max_{\alpha}\ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j\, y_i y_j\, h(x_i, x_j)$

$\text{subject to } 0 \le \alpha_i \le C,\ i = 1, \dots, n, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0$

where the α_i are the Lagrange multipliers, C is one of the hyperparameters and controls the trade-off between the margin and the misclassification error, and h(x_i, x_j) is the kernel function. The kernel considered here is the Gaussian kernel. The Gaussian kernel is derived from the RBF (Radial Basis Function) and depends on the Euclidean distance between the vectors in the input space. It is defined as follows:

$h(x_i, x_j) = \exp\!\left(-\frac{\lVert x_i - x_j \rVert^2}{2\gamma^2}\right)$

with γ an additional hyperparameter which determines the extent of the influence of a single training example [18]. Solving the convex quadratic problem yields a classifier defined as follows:

$f(x) = \sum_{i=1}^{n} \alpha_i y_i\, h(x_i, x) + b$

where b represents the bias.

The two hyperparameters C and γ are therefore the influential parameters of the SVM classifier, allowing it to estimate the generalization error.
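For concreteness, a small sketch of this kernel and of the resulting decision function is given below (an illustration, not the authors' code); note that scikit-learn's gamma parameter corresponds to 1/(2γ²) under the parameterisation used above.

```python
# Sketch (numpy only) of the Gaussian kernel and of the SVM decision function,
# written with the paper's parameterisation h(x_i, x_j) = exp(-||x_i - x_j||^2 / (2*gamma^2)).
import numpy as np

def gaussian_kernel(x_i, x_j, gamma):
    return np.exp(-np.linalg.norm(x_i - x_j) ** 2 / (2.0 * gamma ** 2))

def decision_function(x, support_vectors, alpha_times_y, b, gamma):
    # f(x) = sum_i alpha_i * y_i * h(x_i, x) + b, summed over the support vectors
    return sum(a * gaussian_kernel(x_i, x, gamma)
               for x_i, a in zip(support_vectors, alpha_times_y)) + b
```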
C. Bigram Physicochemical Method
The Bigram Physicochemical (BP) method is a feature extraction method based on protein sequences. The BP method calculates the bigram of two amino acids (the frequency of two amino acids) using the values of a distance function obtained from the hydrophobic and hydrophilic properties of the amino acids [27], stored in a matrix called the physicochemical matrix (MSP).

Consider a protein P composed of L amino acid residues:

$R_1 R_2 R_3 \dots R_{L-1} R_L$

The value of the bigram between amino acids i and j, represented by the frequency of occurrence of the transition from the amino acid at position i to the amino acid at position j, is calculated as follows:

$BP_{i,j} = \sum_{k=1}^{L-1} C_{k,i} \times C_{k+1,j}, \qquad 1 \le i \le 20,\ 1 \le j \le 20$

where C_{k,i} is the value of the MSP in row k and column i and C_{k+1,j} the value of the MSP in row k + 1 and column j, calculated as follows:

$C_{k,i} = \frac{1}{j}\, f(R_i, R_j)$

with

$f(R_i, R_j) = H_1^*(R_i) \times H_1^*(R_j) + H_2^*(R_i) \times H_2^*(R_j), \qquad 1 \le i \le L,\ 1 \le j \le 20$

where H_1^*(R_i) and H_2^*(R_i) are respectively the normalised hydrophobicity and hydrophilicity functions of amino acid i, obtained as follows:

$H_1^*(R_i) = \dfrac{H_1^0(R_i) - \varphi_1}{\sqrt{\sum_{i=1}^{20} \left[H_1^0(R_i) - \varphi_1\right]^2 / 20}}, \qquad H_2^*(R_i) = \dfrac{H_2^0(R_i) - \varphi_2}{\sqrt{\sum_{i=1}^{20} \left[H_2^0(R_i) - \varphi_2\right]^2 / 20}}$

with H_1^0(R_i) the hydrophobicity value of amino acid i, H_2^0(R_i) the hydrophilicity value of amino acid i, and φ_1 and φ_2 respectively the averages of the hydrophobicity and hydrophilicity values of the 20 amino acids.

The BP method applied to a protein sequence generates a 400-D vector as follows:

$V_{BP} = [\Phi_1, \Phi_2, \Phi_3, \dots, \Phi_\mu, \dots, \Phi_\Psi]^T$

where Ψ = r × s = 400 (r = s = 20) is the dimensionality of the characteristic vector V_BP. To represent a pair of proteins, the vectors of the two proteins are concatenated, resulting in a final 800-D vector.
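The following sketch is one possible reading of the BP equations above and is not the authors' code: the hydrophobicity and hydrophilicity tables are placeholders to be filled with real property values, the normalising factor in the definition of C_{k,i} is omitted, and the names bp_vector and pair_vector are illustrative.

```python
# One possible reading of the BP equations (a sketch, not the authors' code).
# HYDROPHOBICITY and HYDROPHILICITY are placeholder dictionaries mapping the
# 20 standard amino-acid letters to property values; real tables must be supplied.
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBICITY = {aa: 0.0 for aa in AMINO_ACIDS}   # placeholder H1^0 values
HYDROPHILICITY = {aa: 0.0 for aa in AMINO_ACIDS}   # placeholder H2^0 values

def normalise(table):
    # H*(R_i) = (H^0(R_i) - mean) / sqrt(sum((H^0 - mean)^2) / 20)
    values = np.array([table[aa] for aa in AMINO_ACIDS])
    centred = values - values.mean()
    scale = np.sqrt((centred ** 2).sum() / 20.0) or 1.0
    return dict(zip(AMINO_ACIDS, centred / scale))

def bp_vector(sequence):
    """400-D bigram physicochemical vector for one protein sequence
    (the sequence is assumed to use the 20 standard amino-acid letters)."""
    h1, h2 = normalise(HYDROPHOBICITY), normalise(HYDROPHILICITY)
    # MSP matrix: row k = residue position, column = amino-acid type,
    # C[k, j] = H1*(R_k) * H1*(A_j) + H2*(R_k) * H2*(A_j)
    C = np.array([[h1[r] * h1[a] + h2[r] * h2[a] for a in AMINO_ACIDS]
                  for r in sequence])
    # BP[i, j] = sum_k C[k, i] * C[k+1, j]
    bp = C[:-1].T @ C[1:]
    return bp.flatten()                              # Psi = 20 x 20 = 400

def pair_vector(seq_a, seq_b):
    # A protein pair is represented by concatenation, giving an 800-D vector.
    return np.concatenate([bp_vector(seq_a), bp_vector(seq_b)])
```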
IV. RESULTS

The HPRD PPI dataset was used as training data, while the other two datasets, S. Cerevisiae and H. Pylori, were used as test data.

A. Evaluation Metrics Used
To evaluate the robustness of the proposed approach, we used the metrics generally employed to measure the performance of a classifier: Accuracy (Acc), Precision (Pre), Sensitivity (Sen) and AUC. Some of these measures are defined as follows:

$Acc = \dfrac{TP + TN}{TP + TN + FP + FN}, \qquad Pre = \dfrac{TP}{TP + FP}, \qquad Sen = \dfrac{TP}{TP + FN}$

TP (true positives) is the number of PPIs predicted positive that really interact, FP (false positives) is the number of PPIs predicted positive that are really negative, TN (true negatives) is the number of PPIs predicted negative that are really negative, and FN (false negatives) is the number of PPIs predicted negative that are really positive. The ROC curve and the AUC value graphically illustrate the performance of a binary classification system.
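A short sketch of these measures, computed from a confusion matrix, is given below (an illustration only; scikit-learn's confusion_matrix and the −1/+1 label coding of Section III.B are assumed).

```python
# Sketch: Acc, Pre and Sen computed from the confusion matrix of a binary
# classifier whose labels are coded -1 (negative PPI) and +1 (positive PPI).
from sklearn.metrics import confusion_matrix

def acc_pre_sen(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[-1, 1]).ravel()
    acc = (tp + tn) / (tp + tn + fp + fn)
    pre = tp / (tp + fp)
    sen = tp / (tp + fn)
    return acc, pre, sen
```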
B. Train Results
The training was conducted on the HPRD data and consisted in searching, with the SVOH algorithm, for optimal values of k, C and γ among the grid of potential values given in Table 1. We used the accuracy rate as the performance metric to find the optimal hyperparameter values. The generalisability of the trained model is assessed on the S. Cerevisiae and H. Pylori datasets.

Table 1: Range of Hyperparameter Values
Hyperparameter    Grid values
C                 {1; 3; 10; 32; 50; 100}
γ                 {10^-4; 10^-3; 10^-2; 10^-1; 1}
k                 {3; 4; 5; 6; 7; 8; 9; 10}

Application of the SVOH algorithm yielded the following optimal hyperparameter values: (C*, γ*, k*) = (32; 0.01; 7). Table 2 shows the best values of the average correctness rate for different combinations of the (C, γ, k) triplet.

Table 2: Results of the Accuracy Rate after Application of the SVOH Approach
k     (C, γ)           Acc (%)
3     (10; 0.1)        91.92
4     (50; 0.01)       92.36
5     (100; 0.001)     92.70
6     (10; 0.001)      91.13
7     (32; 0.01)       93.69
8     (32; 0.001)      92.36
9     (100; 0.1)       92.21
10    (100; 0.01)      92.49

The results in Table 2 show that for values of k ∈ {3; 4; 5; 6}, the accuracy rates lie between 91% and 93%, while from k = 7 onwards they lie between 92% and 94%. On the whole, the accuracy rates are roughly equal; however, among k = {5; 7; 10}, where 5 and 10 are the a priori values, the model formed with k = 7 subsets obtains the best accuracy score, 93.69%, against 92.70% for k = 5 and 92.49% for k = 10. These first results show that the best performance of the SVM model is obtained for the triplet (k, C, γ) = (7; 32; 0.01).

In Table 3, we compare the scores obtained for the subdivision value k = 7 determined by the SVOH approach with those obtained for the values k = {5; 10}, which are the values generally applied, on the other metrics: precision, sensitivity and AUC.

Table 3: Results for Other Metrics
k     Pre (%)    Sen (%)    AUC (%)
5     92.90      92.15      96.36
7     94.09      93.16      97.88
10    92.87      92.67      95.58

The scores obtained for the a priori values of the number of subsets are approximately the same on all metrics. For a subdivision of the training set into k = 5 subsets, the hyperparameter values obtained are (C, γ) = (100; 0.001); the scores on the precision, sensitivity and AUC metrics are 92.90%, 92.15% and 96.36% respectively. For a subdivision k = 10, the hyperparameter values obtained are (C, γ) = (100; 0.01); the scores on the same metrics in Table 3 are 92.87%, 92.67% and 95.58% respectively. However, the rates obtained for the subdivision k = 7, with hyperparameter values (C, γ) = (32; 0.01), are 94.09%, 93.16% and 97.88% respectively. In addition, although the difference between the rates is not very large, we note that subdividing the training set into 7 subsets improves the accuracy rate by around 1% compared with the rates obtained with the a priori subdivision values (see Table 2). We also observe better scores on the precision and sensitivity metrics, with an average improvement of more than 0.7% over those obtained with the a priori values (Table 3). The results show that the best rates are obtained with the k = 7 subdivision, i.e. the one determined by the SVOH approach.

C. Other Results

Results with Other PPI Datasets
Table 4 shows the results obtained on datasets other than the training data.

Table 4: Results on Different PPI Datasets
PPI Data        (k*, C*, γ*)      Acc (%)
H. sapiens      (4; 32; 0.01)     90.92
E. coli         (5; 10; 0.001)    90.36
C. elegans      (7; 50; 0.001)    88.49
M. musculus     (6; 10; 0.1)      74.43

The results in Table 4 indicate that the hyperparameter triplets (k*, C*, γ*) that achieve the best performance on the H. sapiens, E. coli, C. elegans and M. musculus datasets are (4; 32; 0.01), (5; 10; 0.001), (7; 50; 0.001) and (6; 10; 0.1), respectively. We can see that, apart from the E. coli data, where the best performance is obtained with an a priori value for the subdivision of the training set (k = 5), the other datasets reach their best performance for subdivision values that differ from the usual ones. These results show that the number of subdivisions of the training set matters for finding the optimal values of the SVM classifier's hyperparameters.

Results Obtained with the ANN Algorithm
The architecture of an artificial neural network (ANN) [28] is a multi-layer stack of simple modules. The input layer receives the data, then the information in the data is transformed in a non-linear way through several hidden layers. The average gradient [29] is calculated and the weights are adjusted accordingly, before the final outputs are computed in the output layer. For example, consider learning an artificial neural network with λ hidden layers, where each layer computes H^α, α ∈ [1, ..., λ]. The first layer receives the network inputs, while the last layer returns the outputs H^λ as a posteriori probabilities. Let {N^1, ..., N^α, ..., N^λ} be the numbers of neurons of the layers. The intermediate layers return H^α = {h_i^α}, where h_i^α represents the output of the i-th neuron of H^α. This output is determined according to the following expression:

$h_i^{\alpha} = f^{\lambda}\!\left(\sum_{j=1}^{N^{\alpha-1}} \omega_{i,j}^{\alpha}\, h_j^{\alpha-1} + b^{\alpha-1}\right), \qquad \forall i \in \{1, \dots, N^{\alpha}\},\ \forall j \in \{1, \dots, N^{\alpha-1}\}$

where the ω_{i,j}^α are the weights, b^{α−1} is the bias (one per layer) and f^λ is a non-linear function applied to the weighted sum.
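As an illustration of this layer computation (a sketch under the assumption of a simple fully connected network, not the authors' architecture):

```python
# Sketch (numpy) of the forward pass h^alpha = f(W^alpha @ h^(alpha-1) + b^(alpha-1)).
import numpy as np

def forward(x, weights, biases, f=np.tanh):
    """weights[a] has shape (N^a, N^(a-1)); biases[a] has shape (N^a,)."""
    h = x
    for W, b in zip(weights, biases):
        h = f(W @ h + b)   # output of all neurons of the current layer
    return h               # H^lambda; a sigmoid/softmax would turn it into probabilities
```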
REFERENCES

[5] D. Anguita, A. Ghio, S. Ridella, and D. Sterpi, "K-Fold Cross Validation for Error Rate Estimate in Support Vector Machines," in DMIN, 2009, pp. 291-297.
[6] D. Anguita, L. Ghelardoni, A. Ghio, L. Oneto, and S. Ridella, "The 'K' in K-fold Cross Validation," in ESANN, 2012, pp. 441-446.
[7] J. D. Rodriguez, A. Perez, and J. A. Lozano, "Sensitivity Analysis of k-Fold Cross Validation in Prediction Error Estimation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 3, pp. 569-575, Mar. 2010, doi: 10.1109/TPAMI.2009.187.
[8] Y. Bengio and Y. Grandvalet, "No unbiased estimator of the variance of k-fold cross-validation," J. Mach. Learn. Res., vol. 5, pp. 1089-1105, 2004.
[9] S. Arlot and A. Celisse, "A survey of cross-validation procedures for model selection," Stat. Surv., vol. 4, pp. 40-79, Jan. 2010, doi: 10.1214/09-SS054.
[10] V. Vapnik, The Nature of Statistical Learning Theory. Springer Science & Business Media, 2013.
[11] J. L. Rodgers, "The bootstrap, the jackknife, and the randomization test: A sampling taxonomy," Multivar. Behav. Res., vol. 34, no. 4, pp. 441-456, 1999.
[12] C.-W. Hsu, C.-C. Chang, and C.-J. Lin, A practical guide to support vector classification. Taipei, 2003.
[13] G. A. Y. Laura, "Algorithme de descente du gradient stochastique," 2015.
[14] J. A. A. Brito, F. E. McNeill, C. E. Webber, and D. R. Chettle, "Grid search: an innovative method for the estimation of the rates of lead exchange between body compartments," J. Environ. Monit., vol. 7, no. 3, pp. 241-247, Feb. 2005, doi: 10.1039/B416054A.
[15] L. Yang and A. Shami, "On hyperparameter optimization of machine learning algorithms: Theory and practice," Neurocomputing, vol. 415, pp. 295-316, Nov. 2020, doi: 10.1016/j.neucom.2020.07.061.
[16] C. N. Kopoin, A. K. Atiampo, B. G. N'Guessan, and M. Babri, "Prediction of Protein-Protein Interactions from Sequences using a Correlation Matrix of the Physicochemical Properties of Amino Acids," Int. J. Comput. Sci. Netw. Secur., vol. 21, no. 3, pp. 41-47, Mar. 2021, doi: 10.22937/IJCSNS.2021.21.3.6.
[17] J. R. Bock and D. A. Gough, "Predicting protein–protein interactions from primary structure," Bioinformatics, vol. 17, no. 5, pp. 455-460, May 2001, doi: 10.1093/bioinformatics/17.5.455.
[18] Y. E. Göktepe and H. Kodaz, "Prediction of Protein-Protein Interactions Using An Effective Sequence Based Combined Method," Neurocomputing, vol. 303, pp. 68-74, Aug. 2018, doi: 10.1016/j.neucom.2018.03.062.
[19] Z.-H. You, Y.-K. Lei, L. Zhu, J. Xia, and B. Wang, "Prediction of protein-protein interactions from amino acid sequences with ensemble extreme learning machines and principal component analysis," BMC Bioinformatics, vol. 14, no. S8, p. S10, May 2013, doi: 10.1186/1471-2105-14-S8-S10.
[20] Z.-H. You, S. Li, X. Gao, X. Luo, and Z. Ji, "Large-Scale Protein-Protein Interactions Detection by Integrating Big Biosensing Data with Computational Model," BioMed Research International. Accessed: Jan. 5, 2019. [Online]. Available: https://www.hindawi.com/journals/bmri/2014/598129/abs/
[21] J. Martin, "Prédiction de la structure locale des protéines par des modèles de chaînes de Markov cachées," PhD Thesis, Citeseer, 2005.
[22] C.-H. Huang, H.-S. Peng, and K.-L. Ng, "Prediction of Cancer Proteins by Integrating Protein Interaction, Domain Frequency, and Domain Interaction Data Using Machine Learning Algorithms," BioMed Research International. Accessed: May 28, 2018. [Online]. Available: https://www.hindawi.com/journals/bmri/2015/312047/abs/
[23] M. Riley, "Functions of the Gene Products of Escherichia coli," Microbiol. Rev., vol. 57, p. 91, 1993.
[24] X.-T. Huang, Y. Zhu, L. L. H. Chan, Z. Zhao, and H. Yan, "An integrative C. elegans protein–protein interaction network with reliability assessment based on a probabilistic graphical model," Mol. Biosyst., vol. 12, no. 1, pp. 85-92, 2016.
[25] Y. Z. Zhou, Y. Gao, and Y. Y. Zheng, "Prediction of Protein-Protein Interactions Using Local Description of Amino Acid Sequence," in Advances in Computer Science and Education Applications, vol. 202, M. Zhou and H. Tan, Eds., Berlin, Heidelberg: Springer Berlin Heidelberg, 2011, pp. 254-262, doi: 10.1007/978-3-642-22456-0_37.
[26] V. N. Vapnik, "An overview of statistical learning theory," IEEE Trans. Neural Netw., vol. 10, no. 5, pp. 988-999, 1999.
[27] C. J. van Oss, "Hydrophobicity and hydrophilicity of biosurfaces," Curr. Opin. Colloid Interface Sci., vol. 2, no. 5, pp. 503-512, 1997.
[28] P. Wira, "Réseaux de neurones artificiels : architectures et applications," Cours en ligne, Univ. Haute-Alsace, 2009.
[29] L. Bottou, "Stochastic gradient descent tricks," in Neural Networks: Tricks of the Trade, Springer, 2012, pp. 421-436.
[30] X. Cao, W. Zhang, and Y. Yu, "A Bootstrapping Framework With Interactive Information Modeling for Network Alignment," IEEE Access, vol. 6, pp. 13685-13696, 2018, doi: 10.1109/ACCESS.2018.2811721.