How Can Chemometrics Support The Development of Point of Need
devices and also decrease the amount of experiments (thus, costs) at the same time. Readers will be
provided a concise overview regarding the most employed chemometric tools used for target
identification, design of experiments, data analysis, and digitalization of results applied to the
development of diverse portable analytical platforms. This Feature provides a tutorial perspective
regarding all the major methods and applications that have been currently developed. In particular,
the presence of a concise and informative table assists analytical chemists in utilizing the right
chemometrics-based tool depending on the architectures and transduction.
Sara Tortorella and Stefano Cinti*
Published: January 26, 2021
■ DECENTRALIZED ANALYTICAL CHEMISTRY
The role of analytical chemistry in providing facile and affordable solutions for the major fields of action, namely, clinical, pharmaceutical, environmental, and agri-food, is continuously facing challenges.1,2 The research and the economic efforts, along with the development of innovative and breakthrough point-of-need tools (including point-of-care and lab-on-chip devices), represent a hot topic in the analytical sciences. Although few of these efforts have been capable of generating marketable solutions, the glucose biosensor for diabetic patients and the lateral flow immunoassay strip for pregnancy tests still roughly encompass the total market. Public authorities, nonprofit foundations, and private companies, e.g., EU Commission, NIH, Bill and Melinda Gates, Wellcome Trust, AIRC, Roche, Samsung, Google, etc., are committed to funding researchers who aim to enable nonspecialized customers and citizens, worldwide, to be actively involved in devices for the use of nonspecialists, as already established by the WHO with the ASSURED criteria.6,7 In addition to this, the obvious limitations existing in developing/remote countries, in the frame of the 2030 Agenda for Sustainable Development (UN), represent a clear objective to be extended within a multidisciplinary vision.8 The research around the various architectures and principles of analytical methods, e.g., electrochemistry, colorimetry, fluorimetry, spectrophotometry, spectrometry, chromatography, has been merged with other disciplines such as biology, biochemistry, organic chemistry, material science, engineering, and microfabrication, with the
aim to reduce tasks for the end-user, to improve the reliability and (possibly) to reduce costs (e.g., the use of eco-friendly materials like paper-based substrates,9−11 the synthesis of biomimetic nanomaterials,12,13 the rational design of recognition probes (aptamers), the multiplexing approach, microfluidics for lowering sample treatment/chemicals use/waste management).14−16 Among all the aforementioned objectives, a common feature is recognized: making analytical processes more convenient in terms of (i) fabrication (use of synthetic materials instead of animal sources, e.g., oligonucleotide aptamers vs antibodies), (ii) application (in situ use, e.g., point-of-care vs laboratory-bound), (iii) environment (reduction of waste production, e.g., paper vs plastic), (iv) social impact (improving citizen participation, e.g., nonspecialists vs skilled personnel), and (v) economics (limiting the use of raw materials/maintenance, e.g., microfluidics vs bulky/expensive approaches). However, the path from conceptualization to market, through design and data analysis, still appears dependent on a univariate paradigm.17 The choice of a target, optimization of a sensor, analysis of a signal, noise discrimination, and evaluation of experimental parameters, e.g., stability, reproducibility, shelf life, are mainly derived from linear correlation. For instance, the amount of an enzyme to develop a biosensor is optimized by assuming other experimental parameters are irrelevant and/or the presence of interfering species in complex matrixes might limit real application. Although this approach works in many cases, the adoption of statistical methods to understand the additive effect of multiple species, to evaluate the correlation of experimental variables on signal output, and to discriminate/classify multitargets simultaneously represents a useful opportunity for moving toward a multivariate perspective.18 Plenty of information can be extracted from data if more than one variable is considered at the same time: the understanding of how multiple inputs correlate with each other and affect the output can potentially improve both the analytical performances and the cost of portable devices.19 The aim of this Feature is to highlight the use of chemometrics for the development of portable analytical devices, with a holistic description. Although other content on chemometrics applied to portable devices has been reported in the literature,19−21 some novel aspects are included in this Feature: (1) the perspective is extended to aspects that are often not addressed, such as target identification and digitalization, and (2) the reported examples are focused on diverse disciplines around the world of portable devices, microfluidics, selective biosensors, nonspecific array, optical readouts, etc. All the steps from bench to market, namely, target identification, device optimization, signal treatment and data digitalization, can benefit from the adoption of chemometrics. Readers working in the field of point-of-need devices would be able to consider the use of novel routes for enhancing their research. As reported in the preface of the book “Chemometrics in Electroanalysis”,21 Prof. Scholtz used the following words: “Still, only a few electrochemists and electroanalysts make use of it, probably because their attention is completely absorbed by the purely electrochemical problems, leaving not much time to study chemometrics.” This Feature is intended for those operating within the field of point-of-need devices who still do not consider the use of chemometrics to improve their outputs. Light theoretical descriptions are combined with practical evidence to support the realization of novel analytical tools.
■ CHEMOMETRICS AT A GLANCE
Chemometrics was defined for the first time by a young assistant professor writing a grant application as “the art of extracting chemically relevant information from data produced in chemical experiments”.22 It was 1972, and the professor was Svante Wold (professor of organic chemistry at Umeå University, Sweden), today frequently remembered as the “father of chemometrics”. A few years later, together with his colleague Bruce Kowalski (professor of analytical chemistry, University of Washington, Seattle, WA), he founded the International Chemometrics Society. As clearly explained by Wold himself on different occasions, the main goal of this new discipline was to get chemically relevant information out of measured chemical data (e.g., design of experiment, DoE, multivariate analysis),23−25 and to represent and display this information. These complementary tasks clearly demanded knowledge of statistics and applied mathematics, but the approach has always been fit-for-use: “We must remain chemists and adapt statistics to chemistry instead of vice versa. And chemometrics must continue to be motivated by chemical problem solving, not by method development.”22 From the 70s up to the 90s, the development of computerized instruments (especially in analytical and physical-organic chemistry) made data acquisition easier and cheaper. In those years, chemometrics became established and a big effort was put into designing novel algorithms for data information extraction and optimization, analogous to what biologists, psychologists, and economists have done with biometrics, psychometrics, and econometrics, respectively.26−28 In the late 90s, the application of computer and informatics technology in chemistry led to the coining of the counterpart of bioinformatics in chemistry: chemoinformatics.29,30 Chemoinformatics is the use of informatics methods to solve chemical problems,30,31 thus including chemical database systems and structures, computer-assisted structural elucidation, computer-assisted drug and chemical synthesis design, and molecular modeling.29,32,33 Mainly born as a tool to support analytical and physical-chemical data analysis, today chemometrics is applied in both academia and industry to broader areas of analytical chemistry, process optimization, drug design, biomarker discovery, material design, food science, digital and signal processing, image analysis, and omics sciences with the potential to revolutionize the very intellectual roots of problem solving.33−35 The first big challenge for chemometrics is the reduction of the amount of experiments. Reasons for reducing the number of experiments are obvious: experiments are expensive, time-consuming, and sometimes pose ethical issues (e.g., animal experimentation). Minimizing the number of experiments, without compromising the information content therein, is arguably the main aim of a scientist. To this aim, we can use experimental design, also known as Design of Experiment (DoE): a mathematical framework for planning experiments by changing all involved variables simultaneously, thus extracting the maximum amount of information in the fewest number of experiments. Different mathematical strategies exist and have been extensively described elsewhere.36−40 However, rather than the difficulty stemming from mathematical aspects, the main stumbling block is the mental attitude required to switch from the one-variable-at-a-time strategy (OVAT) to DoE, which is still underrepresented in the scientific community.24 When optimizing a biosensor, many variables must be taken into account: the pH, the concentration of the target analyte,
2714 https://dx.doi.org/10.1021/acs.analchem.0c04151
Anal. Chem. 2021, 93, 2713−2722
Analytical Chemistry pubs.acs.org/ac Feature
temperature, time of reaction, and other parameters depending on the working principle. DoE, by means of, e.g., factorial designs40 or D-optimal design methodologies,41 allows one to identify the minimum essential experiments needed to span all sources of the variation and suggests that resulting experimental data will identify the optimal conditions, the variables that most influence the results, and those that do not.
In addition to DoE, chemometrics tools and, more generally, statistical tools of data-mining, are commonly used to extract hidden information and reveal relationships among data. In the simplest case, data come or can be summarized in a block of data X, an m × n matrix in which for each of the m samples, n experimental variables or molecular descriptors are reported. Because the n variables frequently outnumber the m samples, are likely to be correlated, and can be missing in some samples, multivariate analysis25 (MVA) is the method of choice to extract information through data analysis.42 We can distinguish two approaches for the application of MVA: unsupervised and supervised. In the unsupervised setting, the goal is to explore the variance in a single block of data X. For that, a matrix factorization can be performed without any a priori knowledge (e.g., no information about the class label of data, the number of classes, etc.), so that natural patterns can be elucidated. This approach is ideal to explore data in an unbiased fashion, especially in an early phase of the investigation, when no or little information on variables or analytes involved in the process is available. Among unsupervised multivariate analysis tools, principal component analysis (PCA) is the workhorse in chemometrics. PCA is used to reduce the complexity and the dimensionality of a set of data contained in a matrix by rationalizing the variance and providing an overview of all observations or samples in the data table.43 The idea around PCA is to reduce the dimensionality of a data set consisting of a large number of variables, by obtaining novel variables (principal components) that are combinations of the former variables: it allows one to retain as much as possible the variability present in the data set and to reduce noise and redundancy. By inspecting a PCA model, groupings, trends, and outliers can also be found. However, in some cases we do have a priori additional knowledge of the samples, for example, concentration, dose, age, gender. In this scenario, we can use supervised models to explore the variance in a block of data X that allows the prediction of a response block Y, that is, our additional knowledge. The latter may contain quantitative data, which puts one in the regression domain, or categorical data (e.g., healthy versus disease samples), which puts one in the classification domain. This method helps shift the question from “What is in there (X)?” to either “What is its relation to Y (quantitative)?” or “What is the difference between the classes in Y (categorical)?”.43 As a result, provided that no overfitting44 is occurring (i.e., the model we are building to rationalize relationships among observations is too closely or purely fitting the training data with poor predictive ability on novel data), supervised methods can point to the variables that lead to the desired quantification or classification. Popular supervised multivariate analysis tools are partial least squares regression45 (PLS, e.g., age, dose concentration) and its extension to classification problems, known as PLS discriminant analysis46 (PLS-DA, e.g., control vs treated, healthy vs disease). Sparse and other variants of those techniques also exist.47−49 When there is a high imbalance between the number of training samples in each class and/or where only one category is of interest (e.g., traceability problems, food authentication, and in the context of personalized medicine) class modeling tools are used instead.50 Instead of looking for differences among samples belonging to different categories, class modeling techniques focus on the dis/similarities between the samples belonging to a particular category.50,51 In the last decades, novel data mining techniques have been developed to identify relationships and trends in large, multidimensional big data sets as might be obtained from modern high-throughput screening (HTS) or untargeted omics analyses, namely, substructural analysis,52 discriminant analysis,53 neural networks,54−56 decision trees,57,58 support vector machine,59 and kernel algorithms.60 However, it is almost impossible to define the best modeling technique a priori: an initial benchmark study is necessary to determine the most appropriate one. Despite the mathematical complexity added to new algorithms, the final goal is still the same: to provide interpretable, thus useful, models with adequate predictive ability.
■ TARGET IDENTIFICATION
Point-of-need devices are built to sensitively, accurately, and selectively detect an analyte (or group of analytes) of interest. A question arises: how was that particular analyte (or group of analytes) identified as most related to a specific pathophysiological condition we aim to monitor? The process of target analyte identification or, more popularly nowadays, of biomarker discovery, deserves a dedicated research effort which can be sped up by chemometrics. We can distinguish two possible approaches for target identification: hypothesis-based and discovery-based, Figure 1.
Figure 1. (A) Hypothesis-based approach focused on cystic fibrosis (CF). People affected by CF have CFTR (membrane protein) malfunctioning (ii), leading to chloride ions accumulation within cells. Sweat chloride detection is used to diagnose CF. (B) Discovery-based approach with the use of PCA: (i) score plot that displays variability among samples, (ii) loading plot that displays contribution of original variables (X, Y, Z, three possible biomarkers), and (iii) biplot that visualizes the correlation among samples and original variables.
The hypothesis-based approach is grounded on the mechanistic understanding of biochemical processes behind the pathophysiological condition of interest: understanding that diabetes mellitus increases blood glucose levels led to the identification of glycosylated hemoglobin as an ideal biomarker for diagnosis of diabetes.61,62 The same happens for pregnancy tests.63 Discovery-based
approaches, on the other hand, aim to identify statistically significant changes in molecular species associated with the pathophysiological state of interest. For instance, the breast and ovarian cancer-associated gene BRCA1 was identified by positional cloning of a region on chromosome 17 that is frequently deleted in breast cancer.64
Today, in the high-throughput omics era, where generating data is arguably easier and faster than interpreting results, the discovery-based approach is predominant. Especially in the early stages of an investigation, appropriately sized data sets undergo statistical analysis for data reduction and classification in a purely data-driven fashion. In such a multivariable domain, multivariate analysis is the natural choice to identify a panel of complementary target analytes that can effectively discriminate the samples under investigation better than a single one (i.e., univariate approach).65 Taking into account the correlation structure of the data and the synergies and antagonisms plausibly existing among the potential analytes, the multivariate approach outperforms the univariate one in sensitivity, specificity, and reliability and has been successfully used for diagnostic and prognostic biomarker discovery.66 As an example, one of the few Food and Drug Administration (FDA) approved biomarkers62 is the one for ovarian cancer (Ova1), discovered by artificial neural network (ANN) modeling of the plasma proteome of women with ovarian cancer compared to women with benign gynecological diseases.67 As a result, a panel of five biomarkers was found to outperform the previously known ovarian cancer biomarker, CA125,68 in the ability to discriminate between invasive ovarian cancer and benign lesions.62,69 When the dimension of the cohorts is limited or the interest is focused on the phenomenological characterization of the disease,51 multivariate biomarker discovery is achieved through the building of exploratory models. In this case, PCA and hierarchical cluster analysis (HCA) are predominantly used to reduce the dimensionality of the data and elucidate a natural pattern. Recently, a Sparse Mean approach was proposed as most sensitive and best able to identify the specifically perturbed variables in PCA-based methods.70 When multiple sources of variability are present, PCA may suffer from an interpretational problem and other strategies can be used, e.g., ANOVA simultaneous component analysis (ASCA),71 and ANOVA principal component analysis (ANOVA-PCA).72 However, such unsupervised, exploratory methods may not be the most straightforward choice for target identification, as they are not designed to specifically find differences among groups, while target analytes and biomarkers are supposed to be a unique molecular signature of a certain group.73 Among supervised classification methods, PLS-DA, support-vector machine, random forest, and artificial neural networks are used to force the method to provide the desired classification as well as to predict the classification of new samples. When a statistically significant discrimination among classes (e.g., disease and healthy) is found, it means that a mathematical relationship between the data and the categorical variable y (e.g., the class) was established and therefore it can be used to predict the class of novel samples. As anticipated, supervised methods suffer from the risk of overfitting, which arises when the training data are fitted so closely that both the predictive features of the data and noise are incorporated into the model, which will imply poor model performance in the prediction stage. In order to verify that the model holds a true biochemical meaning and avoid overfitting, we can focus on variable selection74 and validation to verify that the predictive features identified are not noise and to test the predictive ability of the model beyond the training data, respectively.44,75 All in all, even if supervised methods are preferred in the target identification step, it is good practice to start the analysis with unsupervised methods: if the desired classification is already visible in a PCA scores plot, for instance, the supervised algorithms can be applied with less probability of overfitting. A discovery-based approach can then evolve into a hypothesis-based approach, since it is not sufficient to prove that an analyte can discriminate two or more groups of samples: a biochemical mechanistic explanation is needed to support the discovery. Indeed, one of the bottlenecks of a discovery-based, data-driven target identification is to validate the robustness and to prove clinical applicability of the proposed markers, thus to prove interpretability of the model. In this context, it is crucial to combine the data-driven approach with expert knowledge throughout the entire process of target identification: from sample sizing to collection, from data modeling to result interpretation and validation.
■ DESIGN AND OPTIMIZATION OF THE DEVICE
As written above, a common way to develop point-of-need devices is represented by a single variable optimization. This approach appears inconvenient from two perspectives: number of experiments and reliability of optimization.76 To optimize a device composed of N variables, L levels, and with a number of R repetitions, N × L × R experiments are required: for instance, 75 experiments are needed to optimize (variable-by-variable) 5 variables with 5 levels, repeated 3 times. Even if the OVAT approach might work in some cases, the number of experiments increases quickly, along with time and cost.77 In addition, the presence of interaction among variables is not taken into account at all. The lack of information related to variable correlation might lead to a “falsely” optimized final device, thus negatively weighing on the performance.
Generally, if the interactions among variables are high, then a great difference is observed between the optimizations obtained by univariate and multivariate approaches. To overcome this limitation, DoE allows one to observe all the variables simultaneously, by adopting statistical multivariate methods that have the goal of lowering resources and improving outcomes.78,79 However, even if the multivariate vision contains the above-discussed advantages, a clarification should be given: if N variables are considered, and each of them is investigated at L levels, all the possible combinations are L^N, e.g., 2 variables with 10 values lead to 10^2 = 100 experiments, namely, full factorial. In this case, the multivariate approach to define the correlation of just two variables produces a high number of experiments. The adoption of DoE might help analytical chemists understand the effect of variables on response. Designs are obtained by combining the variables through well-defined rules. If the aim is to evaluate the effect of variables on the response, especially when a process is unknown, a Plackett−Burman (P−B) design can be used for screening experiments:80 it is known as a screening design, and it is intended as linear combinations of two levels of each variable, i.e., the upper level is denoted as “+” and the lower level is denoted as “−”. This is a very economic approach for screening the contribution of a high number of variables (N) using a number of experiments equal to N + 1 that is a multiple of 4, e.g., 11 variables can be screened using 12 experiments (Figure 2A). However, it is very important to highlight that this approach (1) is useful to individuate the
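The run counts quoted above for the one-variable-at-a-time and full factorial strategies can be checked with a few lines of Python. This is only an illustrative sketch: the function names and the pH/temperature grids are hypothetical and do not come from the Feature; only the formulas N × L × R and L^N and the worked numbers (75 and 100) are taken from the text.

```python
from itertools import product


def ovat_runs(n_variables, n_levels, n_repetitions):
    """Number of runs for a one-variable-at-a-time study: N x L x R."""
    return n_variables * n_levels * n_repetitions


def full_factorial(levels_per_variable):
    """List every combination of levels (L^N runs for N variables at L levels each)."""
    return list(product(*levels_per_variable))


# Worked number from the text: 5 variables, 5 levels, 3 repetitions
print(ovat_runs(5, 5, 3))  # 75

# Hypothetical two-variable study (illustrative grids, 10 levels each)
ph_levels = [4.0 + 0.5 * i for i in range(10)]
temperature_levels = [20 + 2 * i for i in range(10)]
design = full_factorial([ph_levels, temperature_levels])
print(len(design))  # 100, i.e., 10^2
```

Each tuple in `design` is one experiment (here, one pH/temperature pair), which makes it easy to see how quickly a full factorial grows as variables or levels are added.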
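As an illustration of the 12-run screening design mentioned for 11 variables, the following sketch builds a Plackett−Burman matrix by cyclically shifting a generator row and appending a final row with every variable at its lower level. The generator row is the one commonly tabulated for 12 runs; it is not given in the Feature and is stated here as an assumption, as are the function names.

```python
# Generator row commonly tabulated for the 12-run design (assumption, not from the Feature)
GENERATOR_12 = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]


def plackett_burman_12():
    """Return a 12 x 11 list of +1/-1 levels (rows = experiments, columns = variables)."""
    n = len(GENERATOR_12)
    # Rows 1 to 11: cyclic shifts of the generator row
    rows = [GENERATOR_12[-shift:] + GENERATOR_12[:-shift] for shift in range(n)]
    # Row 12: all variables at the lower level
    rows.append([-1] * n)
    return rows


def column_dot(design, a, b):
    """Dot product of two columns; zero means the two variables are orthogonal."""
    return sum(row[a] * row[b] for row in design)


pb_design = plackett_burman_12()
for row in pb_design:
    print(" ".join("+" if level > 0 else "-" for level in row))
```

In this matrix every column contains six “+” and six “−” entries and any two columns are orthogonal, so the main effects of the 11 variables can be estimated independently of one another from 12 experiments. Interactions between variables are not resolved by such a screening design.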
metals, e.g., Zn, Cd, Pb, and Cu, and simultaneously detect them in raw propolis samples through the use of a pencil-based electrochemical sensor.82 However, users should be aware of the previously discussed overfitting issue that affects those techniques: that is the reason why validation with a relevant number of objects is highly required. In Table 1, simple cases for users interested in approaching chemometrics tools are reported.
■ DIGITALIZATION
When thinking about the “next 20 years” scenario, we can foresee with confidence that advances in technologies, computerization, and miniaturization are likely to increase. The advent of the internet of things (IoT) and of innovative strategies based on information and communication technologies (ICTs), the development of technologies like wearables, digital biosensors, smart houses, and smart cities will make it possible to monitor everyone in real time.120 Specifically referring to science, all areas of research will become data-intensive, emphasis will shift from data generation to data analysis, and knowledge of data-mining techniques will be essential to carry out research, thus bringing new challenges to researchers.121,122 The combination of biosensors with IoT and ICTs strategies is also essential to generate population health data that can be used, for instance, to predict the outbreak of infectious diseases.123 Gartner summarizes these concepts in its definition of big data as “high-volume, high-velocity, and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation”.124 In this scenario, to quote Gasteiger, “the application of chemoinformatics is only limited by your own imagination!”30 Big data techniques, machine learning, signal theory, and hierarchical architecture for the detection of security incidents in a security information system are the basis for the entire workflow. In the past 2 decades, the combination of multivariate techniques with the LASSO (least absolute shrinkage and selection operator) gave rise to the so-called sparse models, now popular in the chemometric community.47,125 These methods adapt multivariate techniques to the huge data dimensionality, generating simple and easier to interpret models. Indeed, while the common tendency of data mining tools is to make the analysis as simple as possible for the end-user, leading in many cases to “black boxes” in which advanced data interpretation is very limited, chemometrics approaches like the ones mentioned above offer the unique opportunity of clearly interpreting and visualizing statistical analysis outcomes, and evaluating their robustness. This is what makes chemometrics “sexy” for the years to come: the feature of being an already up-to-date tool for solving real complex problems in an effective and still interpretable, thus user-friendly, way. In addition, chemometrics offers the possibility of integrating and interpreting the complex multidimensional information provided by different sensors/devices through data fusion techniques.126−128 Data can be fused by simple concatenation of different sensor data (namely, low-level data fusion); by fusing the features extracted from the original data, e.g., via MVA or features selection strategies (midlevel); or by merging the different model responses only, after each data set has been modeled independently (high-level).129,130 When using data fusion approaches, the trade-off to be found is in enhancing the quantity and the quality of the information content which can be extracted without including higher amounts of noise, i.e., nonpredictive information. For instance, from information provided by wearable sensors, level data fusion techniques and inference methods are used for activity recognition for Parkinson’s disease monitoring,131 for fall detection and prediction, and for physiological monitoring for early risk detection and intervention.132,133 Clearly, new issues also arise: ethical and regulatory issues concerning the requirements and specifications of data analysis components, the user, and in e-Health applications, patient consent, data, and privacy protection.113 Digital sensors and biometric monitoring should clearly empower citizens and hold the promise of huge potential benefits, but in order to fully exploit these devices, a cultural effort has to be made to inform and gain users’ compliance and co-operation. In the digital and IoT era, chemometrics will be even closer to the end users in their everyday life, not only providing a means of personalized, user-centered data analysis but also preserving privacy through edge analytics and understanding why and how their data are analyzed.
■ CONCLUSIONS
The use of chemometrics tools represents a great resource for developing novel devices for decentralized analysis: depending on the analytical necessities and experimental settings, a variety of statistical-based approaches might extract plenty of useful information from data (sometimes not taken into account). The adoption of the mathematical tools behind chemometrics is reported to be essential for all the steps around portable devices, starting from conceptualization to final application: in fact, the research around analytical devices implementation could benefit from chemometrics through identifying relevant targets, optimizing architectures, analyzing data, and collecting information. Chemometrics can make important advances in developing analytical portable devices. It might facilitate (i) the discovery of novel targets, e.g., hypothesis-based and discovery-based approaches, (ii) the optimization of experiments, e.g., design of experiments, and (iii) the classification of data and their processing. In addition, in the digital era, point-of-need devices will be increasingly connected and personalized: chemometrics is and will be essential to mine the collected big data, derive interpretable models for early risk detection and intervention, and grant privacy and data protection. The aim of the paper is to offer nonchemometricians starting approaches for developing portable analytical devices. Although it is mainly focused on basic chemometric tools, data fusion approaches and approaches able to deal with highly multivariate data coming from designed experiments, such as ASCA, APCA, ANOVA-TP, and rMANOVA, should be taken into account for outcomes from different devices. Chemometrics represents a multidisciplinary pursuit, incorporating chemical, mathematical, and computational sciences. The adoption of models to describe, process, and differentiate data should never be independent from the chemical perspective, and focus must be put on chemical interpretability and predictive ability. The years to come bring a new challenge for chemists: to bridge the gaps among different disciplines and to find solutions by embracing different cultures to the same scientific question. This Feature is a first step in this path.
■ AUTHOR INFORMATION
Corresponding Author
Stefano Cinti − Department of Pharmacy, University of Naples “Federico II”, 80131 Naples, Italy; BAT Center−Interuniversity Center for Studies on Bioinspired Agro-Environmental Technology, University of Napoli “Federico II”, 80055 Naples, Italy; orcid.org/0000-0002-8274-7452; Email: [email protected]
Author
Sara Tortorella − Molecular Horizon srl, 06084 Bettona, Perugia, Italy; orcid.org/0000-0001-9691-8323
Complete contact information is available at: https://pubs.acs.org/10.1021/acs.analchem.0c04151
Author Contributions
The manuscript was written through contributions of all authors.
2719 https://dx.doi.org/10.1021/acs.analchem.0c04151
Anal. Chem. 2021, 93, 2713−2722
Analytical Chemistry pubs.acs.org/ac Feature
■ ACKNOWLEDGMENTS
S.C. acknowledges the MIUR Grant “Dipartimento di Eccellenza 2018-2022” to the Department of Pharmacy of University of Naples “Federico II”. Authors acknowledge Julian Ramirez for proofreading the manuscript.

■ REFERENCES
(1) Wang, S.; Lifson, M. A.; Inci, F.; Liang, L.-G.; Sheng, Y.-F.; Demirci, U. Expert Rev. Mol. Diagn. 2016, 16, 449−459.
(2) Chin, C. D.; Linder, V.; Sia, S. K. Lab Chip 2012, 12, 2118−2134.
(3) Nayak, S.; Blumenfeld, N. R.; Laksanasopin, T.; Sia, S. K. Anal. Chem. 2017, 89, 102−123.
(4) Gubala, V.; Harris, L. F.; Ricco, A. J.; Tan, M. X.; Williams, D. E. Anal. Chem. 2012, 84, 487−515.
(5) Dai, Y.; Liu, C. C. Angew. Chem. 2019, 131, 12483−12496.
(6) Urdea, M.; Penny, L. A.; Olmsted, S. S.; Giovanni, M. Y.; Kaspar, P.; Shepherd, A.; Wilson, P.; Dahl, C. A.; Buchsbaum, S.; Moeller, G.; Hay Burgess, D. C. Nature 2006, 444, 73−79.
(7) Weiss, C.; Carriere, M.; Fusco, L.; Capua, I.; Regla-Nava, J. A.; Pasquali, M.; Scott, J. A.; Vitale, F.; Unal, M. A.; Mattevi, C.; et al. ACS Nano 2020, 14, 6383−6406.
(8) https://www.un.org/sustainabledevelopment/development-agenda/
(9) Cinti, S.; Moscone, D.; Arduini, F. Nat. Protoc. 2019, 14, 2437−2451.
(10) Parolo, C.; Sena-Torralba, A.; Bergua, J. F.; Calucho, E.; Fuentes-Chust, C.; Hu, L.; Rivas, L.; Álvarez-Diduk, R.; Nguyen, E. P.; Cinti, S.; et al. Nat. Protoc. 2020, 15, 3788−3816.
(33) Applied Chemoinformatics; Engel, T., Gasteiger, J., Eds.; Wiley-VCH Verlag GmbH & Co. KGaA: Weinheim, Germany, 2018.
(34) Lavine, B. K.; Workman, J. Chemometrics: Past, Present, and Future. In Chemometrics and Chemoinformatics; ACS Symposium Series, Vol. 894; American Chemical Society, 2005; pp 1−13.
(35) Lavine, B. K.; Workman, J. Anal. Chem. 2013, 85, 705−714.
(36) Box, J. F. Am. Stat. 1980, 34, 1−7.
(37) Trygg, J.; Wold, S. Introduction to Statistical Experimental Design - What is it? Why and Where is it Useful?, 1996; https://www.win.tue.nl/~adibucch/6BV04/tutorialTryggWold.pdf.
(38) L. Eriksson, E.; Johansson; Wold, N. K.; Wikstrom, C.; Wold, S. Design of Experiments, Principles and Applications; Carlson, R., Ed.; Umetrics AB, Umea Learnways AB: Stockholm, Sweden, 2001; Vol. 15.
(39) Lanati, A.; Poli, C.; Imberti, M.; Menegon, A.; Grohovaz, F. A design of experiment approach to optimize an image analysis protocol for drug screening. In Mathematical Models in Biology; Springer: Cham, Switzerland, 2015; pp 65−84.
(40) Box, G. E. P.; Hunter, J. S.; Hunter, W. G. Statistics for Experimenters: Design, Innovation, and Discovery, 1st ed.; Wiley, 1978.
(41) de Aguiar, P. F.; Bourguignon, B.; Khots, M. S.; Massart, D. L.; Phan-Than-Luu, R. Chemom. Intell. Lab. Syst. 1995, 30, 199−210.
(42) Smilde, A.; Bro, R.; Geladi, P. Multi-Way Analysis with Applications in the Chemical Sciences; Wiley: Chichester, U.K., 2004.
(43) Trygg, J.; Holmes, E.; Lundstedt, T. J. Proteome Res. 2007, 6, 469−479.
(44) Faber, N. M.; Rajkó, R. Anal. Chim. Acta 2007, 595, 98−106.
(45) Wold, S.; Sjöström, M.; Eriksson, L. Chemom. Intell. Lab. Syst. 2001, 58, 109−130.
(46) Barker, M.; Rayens, W. J. Chemom. 2003, 17, 166−173.
(47) Camacho, J.; Saccenti, E. J. Chemom. 2018, 32, e2964.
(48) Lê Cao, K.-A.; Rossouw, D.; Robert-Granié, C.; Besse, P. Stat. Appl. Genet. Mol. Biol. 2008, 7, 35.
(49) Trygg, J.; Wold, S. J. Chemom. 2002, 16, 119−128.
(50) De Luca, S.; Bucci, R.; Magrì, A. D.; Marini, F. In Encyclopedia of Analytical Chemistry: Applications, Theory and Instrumentation; Wiley, 2006; pp 1−24.
(51) Calvani, R.; Marini, F.; Cesari, M.; Tosato, M.; Anker, S. D.; von Haehling, S.; Miller, R. R.; Bernabei, R.; Landi, F.; Marzetti, E. J. Cachexia Sarcopenia Muscle 2015, 6, 278−286.
(52) Cramer, R. D.; Redl, G.; Berkoff, C. E. J. Med. Chem. 1974, 17, 533−535.
(53) McFarland, J. W.; Gains, D. J. In Comprehensive Medicinal Chemistry; Ramsden, C. A., Ed.; Pergamon Press: New York, 1990; pp 667−689.
(54) Rumelhart, D. E.; Hinton, G. E.; Williams, R. J. Nature 1986, 323, 533−536.
(55) Schneider, G.; Wrede, P. Prog. Biophys. Mol. Biol. 1998, 70, 175−222.
(56) Zupan, J.; Gasteiger, J. Neural Networks in Chemistry and Drug Design, 2nd ed.; John Wiley & Sons, Inc.: New York, 1999.
(57) Salzberg, S. L. Mach. Learn. 1994, 16, 235−240.
(58) Berk, R. A. In Statistical Learning from a Regression Perspective; Springer New York: New York, 2008; pp 1−65.
(59) Saeh, J. C.; Lyne, P. D.; Takasaki, B. K.; Cosgrove, D. A. J. Chem. Inf. Model. 2005, 45, 1122−1133.
(60) Hann, M. M.; Leach, A. R.; Harper, G. J. Chem. Inf. Comput. Sci. 2001, 41, 856−864.
(61) Diabetes Care 2010, 33 Suppl 1, S4−S10.
(62) McDermott, J. E.; Wang, J.; Mitchell, H.; Webb-Robertson, B.-J.; Hafen, R.; Ramey, J.; Rodland, K. D. Expert Opin. Med. Diagn. 2013, 7, 37−51.
(63) Canfield, R. E.; O’Connor, J. F.; Birken, S.; Krichevsky, A.; Wilcox, A. J. Environ. Health Perspect. 1987, 74, 57−66.
(64) Friedman, L. S.; Ostermeyer, E. A.; Lynch, E. D.; Szabo, C. I.; Anderson, L. A.; Dowd, P.; Lee, M. K.; Rowell, S. E.; Boyd, J.; King, M. C. Cancer Res. 1994, 54, 6374−6382.
(65) Zhang, Z.; Yu, Y.; Xu, F.; Berchuck, A.; van Haaften-Day, C.; Havrilesky, L. J.; de Bruijn, H. W. A.; van der Zee, A. G. J.; Woolas, R. P.; Jacobs, I. J.; et al. Gynecol. Oncol. 2007, 107, 526−531.
(66) Robotti, E.; Manfredi, M.; Marengo, E. J. Proteomics Bioinform. 2014, S3, 003.
(67) Zhang, Z.; Barnhill, S. D.; Zhang, H.; Xu, F.; Yu, Y.; Jacobs, I.; Woolas, R. P.; Berchuck, A.; Madyastha, K. R.; Bast, R. C., Jr Gynecol. Oncol. 1999, 73, 56−61.
(68) Buas, M. F.; Gu, H.; Djukovic, D.; Zhu, J.; Drescher, C. W.; Urban, N.; Raftery, D.; Li, C. I. Gynecol. Oncol. 2016, 140, 138−144.
(69) Zhang, Z.; Chan, D. W. Cancer Epidemiol., Biomarkers Prev. 2010, 19, 2995−2999.
(70) Koeman, M.; Engel, J.; Jansen, J.; Buydens, L. Sci. Rep. 2019, 9, 1123.
(71) Smilde, A. K.; Jansen, J. J.; Hoefsloot, H. C.; Lamers, R. J. A.; Van Der Greef, J.; Timmerman, M. E. Bioinformatics 2005, 21, 3043−3048.
(72) Harrington, P. D. B.; Vieira, N. E.; Espinoza, J.; Nien, J. K.; Romero, R.; Yergey, A. L. Anal. Chim. Acta 2005, 544, 118−127.
(73) Hendriks, M. M. W. B.; Eeuwijk, F. A. van; Jellema, R. H.; Westerhuis, J. A.; Reijmers, T. H.; Hoefsloot, H. C. J.; Smilde, A. K. TrAC, Trends Anal. Chem. 2011, 30, 1685−1698.
(74) Kvalheim, O. M.; Arneberg, R.; Bleie, O.; Rajalahti, T.; Smilde, A. K.; Westerhuis, J. A. J. Chemom. 2014, 28, 615−622.
(75) Szymańska, E.; Saccenti, E.; Smilde, A. K.; Westerhuis, J. A. Metabolomics 2012, 8, 3−16.
(76) Leardi, R. Anal. Chim. Acta 2009, 652, 161−172.
(77) Ferreira, S. L.; Lemos, V. A.; de Carvalho, V. S.; da Silva, E. G.; Queiroz, A. F.; Felix, C. S.; da Silva, D. L. F.; Dourado, G. B.; Oliveira, R. V. Microchem. J. 2018, 140, 176−182.
(78) Bezerra, M. A.; Santelli, R. E.; Oliveira, E. P.; Villar, L. S.; Escaleira, L. A. Talanta 2008, 76, 965−977.
(79) Vera Candioti, L.; De Zan, M. M.; Camara, M. S.; Goicoechea, H. C. Talanta 2014, 124, 123−138.
(80) Vander Heyden, Y.; Nijhuis, A.; Smeyers-Verbeke, J.; Vandeginste, B. G. M.; Massart, D. L. J. Pharm. Biomed. Anal. 2001, 24, 723−753.
(81) Avoundjian, A.; Jalali-Heravi, M.; Gomez, F. A. Anal. Bioanal. Chem. 2017, 409, 2697−2703.
(82) Pierini, G. D.; Pistonesi, M. F.; Di Nezio, M. S.; Centurión, M. E. Microchem. J. 2016, 125, 266−272.
(83) Hamedpour, V.; Leardi, R.; Suzuki, K.; Citterio, D. Analyst 2018, 143, 2102−2108.
(84) NIST/SEMATECH e-Handbook of Statistical Methods, http://www.itl.nist.gov/div898/handbook/ (Accessed Dec 30, 2020).
(85) Ferreira, S. L. C.; Bruns, R. E.; Ferreira, H. S.; Matos, G. D.; David, J. M.; Brandao, G. C.; da Silva, E. G. P.; Portugal, L. A.; dos Reis, P. S.; Souza, A. S.; dos Santos, W. N. L. Anal. Chim. Acta 2007, 597, 179−186.
(86) Hamedpour, V.; Postma, G. J.; van Den Heuvel, E.; Jansen, J. J.; Suzuki, K.; Citterio, D. Anal. Bioanal. Chem. 2018, 410, 2305−2313.
(87) Lundstedt, T.; Seifert, E.; Abramo, L.; Thelin, B.; Nyström, Å.; Pettersen, J.; Bergman, R. Chemom. Intell. Lab. Syst. 1998, 42, 3−40.
(88) Ni, Y.; Kokot, S. Anal. Chim. Acta 2008, 626, 130−146.
(89) Jayawardane, B. M.; Wei, S.; McKelvie, I. D.; Kolev, S. D. Anal. Chem. 2014, 86, 7274−7279.
(90) Forina, M.; Oliveri, P.; Casale, M.; Lanteri, S. Anal. Chim. Acta 2008, 622, 85−93.
(91) Biancolillo, A.; Marini, F. Front. Chem. 2018, 6, 576.
(92) Di Natale, C.; Paolesse, R.; Macagnano, A.; Mantini, A.; D’Amico, A.; Legin, A.; Lvova, L.; Rudnitskaya, A.; Vlasov, Y. Sens. Actuators, B 2000, 64, 15−21.
(93) Wasilewski, T.; Migoń, D.; Gebicki, J.; Kamysz, W. Anal. Chim. Acta 2019, 1077, 14−29.
(94) Christodoulides, N.; Floriano, P. N.; Miller, C. S.; Ebersole, J. L.; Mohanty, S.; Dharshan, P.; Griffin, M.; Lennart, A.; Ballard, K. L. M.; King, C. P., Jr; et al. Ann. N. Y. Acad. Sci. 2007, 1098, 411−428.
(95) Helfer, G. A.; Tischer, B.; Filoda, P. F.; Parckert, A. B.; dos Santos, R. B.; Vinciguerra, L. L.; Ferrão, M. F.; Barin, J. S.; da Costa, A. B. Food Anal. Methods 2018, 11, 2022−2028.
(96) Mishra, R. K.; Alonso, G. A.; Istamboulie, G.; Bhand, S.; Marty, J. L. Sens. Actuators, B 2015, 208, 228−237.
(97) Scampicchio, M.; Mannino, S.; Zima, J.; Wang, J. Electroanalysis 2005, 17, 1215−1221.
(98) Bevilacqua, M.; Bro, R.; Marini, F.; Rinnan, Å.; Rasmussen, M. A.; Skov, T. TrAC, Trends Anal. Chem. 2017, 96, 42−51.
(99) Camacho, J.; Picó, J.; Ferrer, A. J. Chemom. 2008, 22, 299−308.
(100) Shintu, L.; Baudoin, R.; Navratil, V.; Prot, J. M.; Pontoizeau, C.; Defernez, M.; Blaise, B. J.; Domange, C.; Péry, A. R.; Toulhoat, P.; et al. Anal. Chem. 2012, 84, 1840−1848.
(101) Rojas, J.; Fontana Tachon, A.; Chevalier, D.; Noguer, T.; Marty, J. L.; Ghommidh, Ch. Sens. Actuators, B 2004, 102, 284−290.
(102) Jiménez-Carvelo, A. M.; Salloum-Llergo, K. D.; Cuadros-Rodríguez, L.; Capitán-Vallvey, L. F.; Fernández-Ramos, M. D. Microchem. J. 2020, 157, 104930.
(103) Abbasitabar, F.; Zare-Shahabadi, V.; Shamsipur, M.; Akhond, M. Sens. Actuators, B 2011, 156, 181−186.
(104) Marini, F.; de Beer, D.; Joubert, E.; Walczak, B. J. Chromatogr. A 2015, 1405, 94−102.
(105) van Der Leeden, R. Qual. Quant. 1998, 32, 15−29.
(106) Engel, J.; Blanchet, L.; Engelke, U. F.; Wevers, R. A.; Buydens, L. M. PLoS One 2014, 9, e92452.
(107) Alonso-Lomillo, M. A.; Dominguez-Renedo, O.; Ferreira-Goncalves, L.; Arcos-Martinez, M. J. Biosens. Bioelectron. 2010, 25, 1333−1337.
(108) Hamedpour, V.; Leardi, R.; Suzuki, K.; Citterio, D. Analyst 2018, 143, 2102−2108.
(109) Asadollahi-Baboli, M.; Mani-Varnosfaderani, A. Measurement 2014, 47, 145−149.
(110) Risoluti, R.; Gregori, A.; Schiavone, S.; Materazzi, S. Anal. Chem. 2018, 90, 4288−4292.