Agronomy 12 00748 v2
Agronomy 12 00748 v2
Review
Challenges to Use Machine Learning in Agricultural Big Data:
A Systematic Literature Review
Ania Cravero 1, * , Sebastian Pardo 1 , Samuel Sepúlveda 1 and Lilia Muñoz 2
1 Department of Computer Science and Informatics, Center for Software Engineering Studies,
Universidad de La Frontera, Temuco 4780000, Chile; [email protected] (S.P.);
[email protected] (S.S.)
2 Faculty of Computer Systems Engineering, Universidad Tecnológica de Panamá, Panama City 32401, Panama;
[email protected]
* Correspondence: [email protected]
Abstract: Agricultural Big Data is a set of technologies that allows responding to the challenges of the
new data era. In conjunction with machine learning, farmers can use data to address problems such
as farmers’ decision making, water management, soil management, crop management, and livestock
management. Crop management includes yield prediction, disease detection, weed detection, crop
quality, and species recognition. On the other hand, livestock management considers animal welfare
and livestock production. The purpose of this paper is to synthesize the evidence regarding the
challenges involved in implementing machine learning in agricultural Big Data. We conducted a
systematic literature review applying the PRISMA protocol. This review includes 30 papers published
from 2015 to 2020. We develop a framework that summarizes the main challenges encountered,
machine learning techniques, and the leading technologies used. A significant challenge is the design
of agricultural Big Data architectures due to the need to modify the set of technologies adapting the
machine learning techniques as the volume of data increases.
Citation: Cravero, A.; Pardo, S.;
Sepúlveda, S.; Muñoz, L. Challenges Keywords: Big Data; machine learning; agriculture; challenges; systematic literature review
to Use Machine Learning in
Agricultural Big Data: A Systematic
Literature Review. Agronomy 2022, 12,
748. https://doi.org/10.3390/ 1. Introduction
agronomy12030748
In order to meet the global food demand by 2050, increased food production from
Academic Editors: Gniewko Niedbała, 25% to 70% is required [1]. Considering this increase, food production per hectare needs
Sebastian Kujawa and Paul Kwan to double when the world population stabilizes around 2100 (United Nations 2019). Food
Received: 2 February 2022
security is a fundamental global need, which is threatened by several factors such as
Accepted: 17 March 2022
population growth, shrinking arable land, climate change, food waste, and consumer
Published: 21 March 2022
preference for animal protein [2]. Increasing agriculture or food production rapidly to meet
the growing demand for food supply is not an easy task. Several factors contribute to
Publisher’s Note: MDPI stays neutral
this problem, such as current agricultural practices, poor storage, markets, and changing
with regard to jurisdictional claims in
scenarios [3].
published maps and institutional affil-
For offering sustainable agricultural production, it is necessary to use cutting-edge tech-
iations.
nologies such as blockchain, IoT, Big Data, and machine learning (ML), among others [3,4].
Data-driven agriculture among these technologies is the most promising approach to solve
current and future issues. If it were possible to generate a large amount of data from
Copyright: © 2022 by the authors.
farms and use it to drive some agricultural decisions, most global food problems could be
Licensee MDPI, Basel, Switzerland. solved [3]. For example, if farmers could build data sets or maps for diverse environmental
This article is an open access article factors around the farm, they could implement techniques such as smart farming, precision
distributed under the terms and farming, vertical farming, and others. It has been proven that data-driven agriculture
conditions of the Creative Commons improves crop yields, reduces costs, and guarantees sustainability [5].
Attribution (CC BY) license (https:// Li et al. explained that agricultural Big Data is a part of cutting-edge technology. It
creativecommons.org/licenses/by/ comprises concepts, technology, specific measures, and an entire gamut of agricultural
4.0/). activities such as farming and planting. The same authors stated that by incorporating
papers for the study. Section 4 contains the results derived from the analysis of the selected
papers. Section 5 discusses the main challenges of applying ML in agricultural Big Data.
Finally, Section 6 presents the conclusions.
Because of the large number of abbreviations used in the relative scientific works,
Tables 1 and 2 list the abbreviations that appear in this work, categorized into ML techniques
and learning algorithms, and general abbreviations, respectively.
Abbreviation Meaning
DL Deep learning
ANN Artificial neural networks
SVM Support vector machines
DT Decision trees
NN Neural networks
RF Random forest
CNN Convolutional neural networks
RNN Recurrent neural networks
RBN Restricted Boltzmann machine
DBN Deep belief network
SNIC Simple non-iterative clustering
SLIC Simple linear iterative clustering
KC K-means clustering
BC Bagged clustering
RPT Recursive partition trees
BDT Booster decision trees
BCT Bootstrap classification trees
SB Stochastic boosting
LR Logistic regression
AR Autoregression
ARIMA Autoregressive integrated moving average
VAR Vector autoregression
KNN K-nearest neighbors
GLM Generalized linear model
GBM Gradient-boosting machine
Abbreviation Meaning
PRISMA Preferred Reporting Items for Systematic Reviews and Meta-Analyses
IoT Internet of things
ML Machine learning
SLR Systematic literature review
DL Deep learning
AI Artificial intelligence
UAV Unmanned aerial vehicle
ICT Information and communications technology
NDVI Normalized difference vegetation index
ACM Association for Computing Machinery
IEEE Institute of Electrical and Electronics Engineers
MDPI Multidisciplinary Digital Publishing Institute
WoS Web of Science
BDM Big Data application machine learning-based smart farm system
AAFC Agriculture and Agri-Food Canada
ET0 Reference evapotranspiration
MLC Multi-label classification
RBF Radial basis function
AUC Area under curve
GEE Google Earth Engine
GPS Global Positioning System
Agronomy 2022, 12, 748 4 of 34
Table 2. Cont.
Abbreviation Meaning
HDFS Hadoop Distributed File System
TCP Transmission Control Protocol
NASA National Aeronautics and Space Administration
ESA European Space Agency
CSV Comma-separated values
CPU Central processing unit
GIS Geographic information systems
2. Background
In this section, we explain the basic concepts of ML and Big Data. On the one hand,
we explain the use of ML in agriculture, and on the other hand, the use of Big Data and its
development in agriculture. Finally, we mention the main challenges in agricultural Big
Data, described in the literature.
Briefly, supervised learning and unsupervised learning mainly focus on data analysis,
while reinforced learning is preferred for decision-making problems.
In general, the goal of ML algorithms is to optimize the performance of a task by
exploiting examples or past experiences. By exploiting examples or past experiences, ML
can generate efficient relationships for data inputs and reconstruct a knowledge schema to
analyze large data volumes [16].
sis, while reinforced learning is preferred for decision-making problems.
In general, the goal of ML algorithms is to optimize the performanc
exploiting examples or past experiences. By exploiting examples or past ex
can generate efficient relationships for data inputs and reconstruct a know
Agronomy 2022, 12, 748 5 of 34
to analyze large data volumes [16].
On the contrary, deep learning (DL) is a branch of ML that tries to mod
withOn atheseries ofdeep
contrary, algorithms
learning (DL)byisusing
a branchaofdeep
ML thatlayer with
tries to multiple
model abstractionsprocessi
with a series
which of great
is of algorithms by using
interest in thea deep layer with
artificial multiple processing
intelligence layers.
field, has comeDL, to the
which is of great interest in the artificial intelligence field, has come to the fore in natural
language processing and image classification [17]. Figure 1 shows the re
language processing and image classification [17]. Figure 1 shows the relationship between
tween
AI, AI,DL.
ML, and ML, and DL.
Figure
Figure 1. Relationship
1. Relationship betweenbetween AI,
AI, ML, and DL.ML, and DL.
DL has algorithms such as convolutional neural networks (CNN), recurrent neural
DL(RNN),
networks has algorithms such as
restricted Boltzmann convolutional
machine (RBM), and deepneural
beliefnetworks
network (DBN).(CNN), re
networks (RNN), restricted Boltzmann machine (RBM), and deep belief n
Furthermore, DL has the advantages of processing unstructured data at maximum capacity,
producing high-quality results, and avoiding unnecessary costs.
Furthermore, DL has the advantages of processing unstructured data at ma
ML has been used to solve different agricultural problems in crop management,
ity, producing
including high-quality
yield prediction, results, weed
disease detection, and avoiding unnecessary
detection, crop costs.
quality, and species
ML in
recognition; has been management,
livestock used to solve different
including animalagricultural problems
welfare and livestock in crop ma
production;
in water management; and in soil management [9,16,17]. Figure 2 shows a summary of
cluding yield prediction, disease detection, weed detection, crop qualit
agricultural problems solved using ML. The figure was modified from the reference [16].
recognition;
An example in livestock
of this management,
is that many including
producers say that weeds areanimal
the mostwelfare and
severe threat to livesto
in water management; and in soil management [9,16,17]. Figure 2 shows
crop production. Accurate weed detection is essential for sustainable agriculture because
weeds are difficult to detect and distinguish from crops. ML algorithms, along with sensors,
agricultural problems solved using ML. The figure was modified from the
now allow accurate detection and identification of weeds without causing environmental
Anorexample
problems secondary of this ML
effects. is that many
for weed producers
detection has ledsay that weeds
to developing are
tools andthe mos
to crop
robots production.
to destroy Accuratethe
weeds, minimizing weedneed detection
for herbicidesis [9].
essential foraccurate
Therefore, sustainable
detection and classification of the characteristics of crop quality have increased product
cause weeds are difficult to detect and distinguish from crops. ML algorith
values and reduced waste.
sensors,
Figure now allow
3 presents accurate
a graph showingdetection andML
the different identification
techniques used oftoweeds
improvewithou
ronmental problems or secondary effects. ML for weed detection
agriculture. Liakos et al. explained that eight ML models had been implemented; fivehas led
of these models were implemented for crop management, and the most popular models
were artificial neural networks (ANN). Four ML models were implemented for livestock
management, with the most popular models being support vector machines (SVM). For
water management, particularly evapotranspiration estimation, two ML models were
implemented, and the most frequently implemented were ANN. Finally, four ML models
were implemented in the soil management category, with the most popular one being the
ANN model. In summary, ML models have been applied in multiple applications for crop
management (61%), mostly yield prediction (20%), and disease detection (22%) [9].
tools and robots to destroy weeds, minimizing the need for herbicides [9]. There
Agronomy 2022, 12, 748 curate detection and classification of the characteristics of crop quality
6 of 34 have i
Figure
Figure 2. Use
2. Use of in
of ML ML in agriculture,
agriculture, modifiedmodified from the
from the reference reference
[16]. [16].
On the other hand, Liakos et al. point out that when data recordings are involved,
Figure 3 presents a graph showing the different ML techniques used to imp
occasionally at the level of Big Data, the implementations of ML are less in number, mainly
riculture.
because of theLiakos et efforts
increased al. explained that
required for eightanalysis
the data ML models
task, not had been
for the implemented
ML models
these
per models
se. This were implemented
fact partially for crop
explains the almost equal management,
distribution of ML and the mostinpopula
applications
Agronomy 2022, 12, x FOR PEER REVIEW
were artificial neural networks (ANN). Four ML models were implemented
livestock management (19%), water management (10%), and soil management (10%) [9]. for
7 of
Benos et al. updated the information presented in Figure 3 through
management, with the most popular models being support vector machines (SV a systematic
study of ML use in agriculture during the years 2018 to 2020 [16]. Figure 4 presents the
water management, particularly evapotranspiration estimation, two ML models
results obtained.
plemented, and the most frequently implemented were ANN. Finally, four ML
were implemented in the soil management category, with the most popular one b
ANN model. In summary, ML models have been applied in multiple applications
management (61%), mostly yield prediction (20%), and disease detection (22%) [
On the other hand, Liakos et al. point out that when data recordings are i
occasionally at the level of Big Data, the implementations of ML are less in numbe
because of the increased efforts required for the data analysis task, not for the ML
per se. This fact partially explains the almost equal distribution of ML application
stock management (19%), water management (10%), and soil management (10%
Benos et al. updated the information presented in Figure 3 through a sy
study of ML use in agriculture during the years 2018 to 2020 [16]. Figure 4 pre
results obtained.
Figure4.4.ML
Figure MLmodels
models giving
giving the the
bestbest output
output [16]. [16].
It is observed that the ML ANN algorithm is still the preferred algorithm for data
It is observed that the ML ANN algorithm is still the preferred algorithm for data
analysis. On the other hand, Ensemble learning has been gaining ground and outperforms
analysis. On the other hand, Ensemble learning has been gaining ground and outperform
other algorithms such as SVM and decision trees (DS). According to Benos et al., the most
other algorithms
commonly used data such
comeasfrom
SVMmeteorology,
and decision trees
soil, (DS).
water andAccording
crop quality,toremote
Benossensing,
et al., the mos
satellite imagery, UAVs and UAVs, and in situ and laboratory measurements [16]. remote
commonly used data come from meteorology, soil, water and crop quality, The sens
ing, frequent
most satellite MLimagery,
model UAVs
providingandtheUAVs, and in
best output situ
was, byand laboratory
far, ANN, which measurements
appeared in [16]
The most
almost frequent
half of ML model
the reviewed studiesproviding
(51.8%). RNN the followed,
best output was, by approximately
representing far, ANN, which ap
10% of ANNs,
peared with long
in almost halfshort-term memory studies
of the reviewed standing(51.8%).
out, as it RNN
can optimize
followed,it. The second
representing ap
most accurate ML model was ensemble learning (EL), contributing
proximately 10% of ANNs, with long short-term memory standing out, as it can optimizto the ML models
used in agricultural
it. The second most systems withML
accurate approximately
model was22.2%, and regression
ensemble learning models came next to th
(EL), contributing
with an equal percentage, namely 4.7%. Both of these ML models were presented in all
ML models used in agricultural systems with approximately 22.2%, and regression mod
generic categories.
els came
Benosnext with
et al. an equal
conclude percentage,
that namely
the increasing 4.7%.
interest in Both of thesein
ML analyses ML models were pre
agricultural
sented in all generic categories.
applications is captured. When comparing the number of relevant studies, between 2018
Benos
and 2019, et al.
there wasconclude
an increasethat
of the
26%.increasing
For 2020, theinterest in ML analyses
corresponding increase in agricultural
jumped to ap
plications
109% againstis2019
captured.
findings;When
thus,comparing theoverall
resulting in an number 164% of rise
relevant studies,
compared withbetween
2018. The2018 and
accelerating
2019, there rate
wasofanthe researchofinterest
increase 26%. Forin ML in agriculture
2020, is a consequence
the corresponding increaseofjumped
various to 109%
factors, following the considerable advancements of ICT systems in agriculture [16].
The increased interest in ML research in agriculture is a consequence of several factors:
the considerable advances in ICT systems in agriculture; the vital need to increase the
efficiency of agricultural practices while reducing the environmental burden; and the need
for reliable measurements with the handling of large volumes of data [16,17].
The fourth dimension is veracity, which refers to the ability to validate the quality of the
data used in the analyses.
Apart from the “4 Vs”, another dimension of Big Data must also be considered: its
value. The value is obtained by analyzing data to extract hidden patterns, trends, and
knowledge models through algorithms and smart data analysis techniques. Data science
methods increase the value of data by better understanding their phenomena and be-
haviors, optimizing processes, and improving the discoveries of machines, businesses,
and scientists [20]. Therefore, we cannot consider the science of Big Data without includ-
ing data analysis and ML as critical steps for numbering value among Big Data science
strategies [21].
In practice, Big Data analysis tools enable data scientists to discover correlations and
patterns by analyzing massive quantities of data from different sources. In recent years,
the science of Big Data has become an essential modern discipline for data analysis [21].
It is considered an amalgam of classic disciplines such as statistics, artificial intelligence,
mathematics, and informatics with its sub-disciplines, including database systems, ML,
and distributed systems [22].
The Big Data ecosystem handles the evolution of data, models, and support infras-
tructure throughout its life cycle; it is a whole set of components, or architecture, storing,
processing, and visualizing data and delivering results to guide applications [23,24]. The
framework architecture of Big Data in Figure 5 includes data storage, information manage-
ment, data processing, data analysis, and interface and visualization components.
As shown in Figure 5, the Big Data process starts with the identification of the sources
from which useful data are extracted [18]. Next, the data are stored in one of the designed
data models depending on whether the data are structured or not. In the following step,
the data are classified and filtered according to the type of analysis required. Then,
Agronomy 2022, 12, x FOR PEER REVIEW it is
9 of 34
defined whether the processing will be by batch, stream, or memory storage [25]. The
classified data are analyzed using appropriate tools such as DL [26], ad hoc analysis [25],
and data science in general [27]. The data obtained must be presented through some kind
retailers to monitor consumer purchases by improving product traceability throughout
of visualization tool. Finally, the data are analyzed by the decision makers [24].
the supply chain [30].
Big Data in agriculture refers to all the modern technology available combined with
data analysis as a foundation for making decisions only based on data [28]. Figure 6 shows
the agricultural Big Data evolution. The figure was modified from the reference [28].
Big Data has been used to improve various aspects of agriculture, such as knowledge
about weather and climate change, land, animal research, crops, soil, weeds, food availabil-
ity and security, biodiversity, farmers’ decision making, farmers’ insurance and finance,
and remote sensing [29]. It is also used to create platforms that allow the supply chain
actors to have access to high-quality products and processes, tools to improve yields and
predict demand, and advice and guidance to farmers based on the response capacity of
their crops to fertilizers leading to better fertilizer use. Furthermore, Big Data has led to
the introduction of plant-scanning equipment used to follow up on deliveries and allow
retailers to monitor consumer purchases by improving product traceability throughout the
supply
Figure chain [30]. architecture of Big Data.
5. General
Figure 6. Agricultural Big Data evolution, modified from the reference [28].
Big Data does not function in isolation. It has been used with other technologies
such as ML, cloud-based platforms, image processing, modeling and simulation, statistical
analysis, NDVI vegetation indices, and geographic information systems (GIS) [29]. ML
tools have been used in prediction, grouping, and classification problems, while image
processing has been used when the data are extracted from images (i.e., cameras and remote
sensing) [29].
Agronomy 2022, 12, 748 10 of 34
techniques, and more effective and efficient data analytics to understand and support the
functions of several agricultural verticals. On the other hand, Weersink et al. explained
that the data must be collected consistently and fulfill the protocols that can group them
into centralized servers. These servers must be protected from cyberattacks while masking
the identity of the operation managers [31].
Coble et al. analyze the challenges and opportunities of Big Data in agriculture
and conclude that these technologies will lead to relevant analytics at every stage of
the agricultural value chain. The authors believe that there are relevant policy, farm
management, supply chain, consumer demand, and sustainability issues. A significant
challenge mentioned by the authors is the management of data repositories due to the
volume and variety [32]. According to Coble et al., data service providers struggle to
attract a critical mass of farmers to submit farm data to repositories. This concern is partly
because the value of an agricultural data community ultimately depends on the number
of farms and acres in the system, i.e., the size of the network. Concerning data variability,
different levels of data quality are available, e.g., some farmers are known for not correctly
labeling on-farm production data or for not considering all sowing data. These aspects are
of utmost importance for the system to deliver a proper analysis and the farmer to make a
correct decision. The authors point out that progress must be made in creating public data
repositories, engaging both large and small farmers in real collaboration.
On the other hand, Misra et al. present an overview of Big Data, AI, and IoT and their
disruptive role in shaping the future of agri-food systems [33]. The authors discuss these
technologies in greenhouse monitoring, smart farm machines, drone-based crop imaging,
supply chain modernization, social media (for open innovation and sentiment analysis) in
the food industry, food quality assessment (using spectral methods and sensor fusion), and
food safety. They indicate an economic impact from the point of view of productivity, lower
cost of production, and improved quality. Therefore, adopting technological innovations
and taking advantage of them is essential for modern agriculture and the food industry.
Our paper analyzes the main challenges in agricultural Big Data when incorporating
ML for data analysis. The challenges are classified based on the intrinsic characteristics
of Big Data, the four Vs, and the ML data analysis itself. From the data found, we pro-
pose a framework that summarizes the challenges, the ML techniques, and the leading
technologies used to provide information for future research.
3. Methodology
The research method used for this paper was SLR. For the selection of articles, the
PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analysis) method
was applied, which contains four stages: identification, screening, eligibility, and inclu-
sion [12]. To define the objectives and research questions (RQ), we used the methodology
proposed by Kitchenham and Charters [34]. The analysis stems from a series of RQ such as
(1) what kind of problems are solved using agricultural Big Data and ML, (2) what is the
agricultural line of business in which the problems are attempted to be solved, (3) what
are the main ML techniques used to analyze the data, (4) what technological tools are used
to implement agricultural Big Data, and (5) what are the challenges to implement ML in
agricultural Big Data.
The search string was constructed as follows: (a) from the RQs we obtained keywords;
(b) we applied the population–intervention–comparison–results–context criterion to frame
the RQs (PICOC [35]) criterion. According to Kitchenham and Charters [34], the population
corresponds to an application area, namely, agricultural Big Data. On the other hand, the
intervention deals with the challenges in applying ML. In our case, it is not appropriate to
apply the comparison. The results obtained are the identified problems, the corresponding
solutions, and challenges when applying various ML techniques.
Moreover, we accessed Scopus, Springer, ACM, IEEE, MDPI, and Web of Science. The
search strings used in all data sources were “Big Data”, “machine learning” and “agriculture
or farm”. We looked for these strings in the articles’ titles, keywords, and abstracts. In
propriate to apply the comparison. The results obtained are the identified problems, the
corresponding solutions, and challenges when applying various ML techniques.
Moreover, we accessed Scopus, Springer, ACM, IEEE, MDPI, and Web of Science
The search strings used in all data sources were “Big Data”, “machine learning” and “ag
Agronomy 2022, 12, 748 12 of 34
riculture or farm”. We looked for these strings in the articles’ titles, keywords, and ab
stracts. In addition, we considered only scientific articles and conferences in English pub
lished from 2015 to 2020.
addition, we considered only scientific articles and conferences in English published from
In descending order, we identified 580 potential articles in the Scopus database, 567
2015 to 2020.
in theInWeb of Science
descending database,
order, 486 in
we identified 580Springer,
potential 356 in IEEE,
articles in the309 in ACM,
Scopus and567
database, 270inin MDPI
the Web of Science database, 486 in Springer, 356 in IEEE, 309 in ACM, and 270 in MDPI.selection
We excluded books, book chapters, working papers, and press articles. This final
resulted
We in 30
excluded relevant
books, book articles.
chapters, working papers, and press articles. This final selection
In the
resulted in 30identification phase, 2568 articles were examined. This examination was fol
relevant articles.
lowed In the
by aidentification phase,the
screening where 2568 articles were
duplication examined.
criterion was This examination
applied. was fol-resulted
This screening
lowed by a screening where the duplication criterion was applied. This screening
in 1489 identified articles. The abstracts of these articles were then reviewed, and we resulted
in 1489 identified
checked whetherarticles. The contained
or not they abstracts ofthethese
nexusarticles
betweenwere agriculture,
then reviewed, Bigand weand ML
Data,
checked whether or not they contained the nexus between agriculture, Big Data, and ML.
After eliminating the non-relevant articles, 54 articles remained for detailed analysis. This
After eliminating the non-relevant articles, 54 articles remained for detailed analysis. This
analysisconsisted
analysis consistedininreviewing
reviewing thethe entire
entire article
article to ensure
to ensure that that it included
it included a description o
a description
thetheBig
of BigData
Data process andthe
process and theML ML techniques.
techniques. Most Most
of the ofexcluded
the excluded
articlesarticles had purely
had purely
theoretical, technological, or experimental issues. Some excluded articles
theoretical, technological, or experimental issues. Some excluded articles poorly discussed poorly discussed
thenexus
the nexusbetween
between agriculture,
agriculture, Big Data,
Big Data, and ML.and ML. Finally,
Finally, the based
the SLR was SLR wason 30based on 30 rele
relevant
articles. FigureFigure
vant articles. 7 summarizes the stepsthe
7 summarizes for relevant
steps forarticle
relevantselection.
article selection.
Figure7.7.Flowchart
Figure Flowchartof of
thethe
literature selection
literature process.
selection process.
The composition of the selected articles is very varied. Of the 30 articles, 12 were
The composition of the selected articles is very varied. Of the 30 articles, 12 were
published in conferences and 18 in scientific journals. The journals that include more than
published
three in conferences
publications and 18 in scientific
are the International Journal of journals.
Emerging The journals
Trends that include
in Engineering more than
Research
and Computers and Electronics in Agriculture. Figure 8 shows the yearly distribution of the Research
three publications are the International Journal of Emerging Trends in Engineering
and Computers
included papers.and Electronics in Agriculture. Figure 8 shows the yearly distribution of the
included papers. with their reading for each of the 30 selected papers to extract rel-
We proceeded
evant data to answer the RQs. The extracted data for each paper and the assessment
strategy were as follows: (i) title, authors, year; (ii) reason why the paper was initially
included; (iii) ML techniques used; (iv) challenges and solutions when applying ML; and
(v) architectures, technologies, and tools used.
Then, we classified the data according to the intrinsic characteristics of Big Data, such
as volume, variety, velocity, and veracity. We included a different data processing and
analysis category to identify problems applying ML.
were as follows: (i) title, authors, year; (ii) reason why the paper was initially included;
(iii) ML techniques used; (iv) challenges and solutions when applying ML; and (v) archi-
tectures, technologies, and tools used.
Then, we classified the data according to the intrinsic characteristics of Big Data, such
Agronomy 2022, 12, 748
as volume, variety, velocity, and veracity. We included a different data processing 13 of 34
and
analysis category to identify problems applying ML.
understanding of the soil, harvest optimization based on climate conditions, and data
from the farmers’ undocumented experiences. They analyzed the data to develop low-cost
planting [36]. They used a Big Data architecture to process the available metadata separately
from the data analysis. The processing was performed using data mining and text mining
techniques, and ML techniques applied to sensor data. The uniqueness of the architecture
in this context lay in the selection of the ML technique based on the real-time application
and the farmers’ expert domain knowledge. In this way, it was possible to target the ML
processes to extract the desired feature space, increasing the efficiency and accuracy of
the system.
Doshi et al. developed a Big Data system they call AgroConsultant, which is designed
to help Indian farmers make an informed decision about which crop to grow considering
the planting season, the geographic location of their operation, the characteristics of the soil,
and environmental factors such as temperature and precipitation [37]. For this purpose, the
authors use a Big Data architecture with two subsystems. The first one predicts crop suit-
ability through recommendations. They use a training dataset to ensure the ML techniques’
correct analysis. This dataset includes historical records of soil and weather parameters
for 20 crop types accumulated over thirty years. The second subsystem is concerned with
processing the heterogeneous data and class labels for subsequent ML analysis.
The second problem has to do with climate prediction. Rehman et al. used real-
time applications on sensors to capture climate changes in the soil and the atmosphere to
establish planting dates [38]. They used a Big Data architecture with IoT to collect data in
a real time and seamless way. The system makes recommendations by analyzing a set of
agricultural and climate change rules.
Finally, the third problem refers to increasing production. For this, Tarik and Mo-
hammed use methodological data to predict the cereal production rate in an area character-
ized by an unstable climate [39]. They use a system that captures water data from a weather
station. The information is processed to obtain temperature, dew point, humidity, pressure,
visibility, wind direction, wind speed, gust speed, events, rainfall, weather conditions, etc.
The system pre-processed the data because it is often incomplete and inconsistent. The data
from 60 years of industry history data were processed to carry out the comparative study.
The authors were able to analyze the data to improve the understanding of the agricultural
ecosystem through a scorecard.
4.1.2. Crops
In the case of crops, the most significant concern is to increase production after re-
ducing production costs, increasing quality, and finally managing diseases. Balducci et al.
analyze environmental data such as climate, humidity, and wind along with production
and structural data such as soil type and land extension and, from these, propose easy-to-
develop tasks based on the use of ML techniques [40]. In the same concern, Kedarmal et al.
use an ontology of smart farming that contains agriculture-related concepts and properties
to link the stored data with a knowledge graph. This graph allows farmers to take timely
action to improve production and predict crop yields [41]. Ramaraj et al. address the issue
of crop yield prediction by analyzing the most commonly used rice cultivation methods,
yield parameters, and morphological characteristics of the crop, thus improving the process
of predicting yield expectations [42]. Priya et al. present a precision farming model to
suggest what crops must be planted in terms of field conditions. This suggestion allows
farmers to maximize the production and quality of their crops [43,44]. Yoki et al. imple-
ment the BMS system (Big Data Application Machine Learning-based Smart Farm System),
emphasizing crop productivity and the importance of increasing the farmers’ income. The
author concludes that the information and the processable knowledge must be improved
at the farm level to increase production and improve the quality of the harvest by getting a
reasonable price [45]. For crop identification and classification, Shelestov et al. present a
workflow that uses satellite imagery and produces a crop classification map of the analyzed
area [46]. Yahata et al. address the same issue, using image detection methods on images
Agronomy 2022, 12, 748 15 of 34
obtained from a cyber–physical system to collect data on the stage of crop growth and
environmental information [47]. They developed valuable rules for a good crop, adapted
through detecting flowers and seedpods. Crop disease detection is addressed by Fenu
and Malloci, who conclude that the ML models tested are able to predict a risk index for
late blight in potatoes using weather data from meteorological stations in the region [48].
Tombe presents a development that integrates computer vision for smart agriculture. The
proposed technique analyzes crop image features and can determine the health status of
the crop, identifying diseases or weeds [49].
4.1.4. Land
Under the topic of land, we found only two papers that approached Big Data and
ML to solve their problems. Agriculture and Agri-Food Canada (AAFC) is the federal
department responsible for agriculture that releases the Annual Crop Inventory in Space
maps to improve agricultural production. These maps are valuable operational space-based
remote sensing products covering agricultural land use and non-agricultural land found
within Canada’s agricultural acreage [52]. Similar work was done by Amani et al. to
develop high spatial resolution (30 m) reference maps of the cropland extent of South
Asia. The authors explained a need to improve productivity due to the food insecurity
experienced in the area [52].
Figure10.
Figure Problems totobebesolved
10.Problems through
solved agricultural
through Big Data
agricultural Bigand ML.and ML.
Data
From Figure 11, it is possible to observe that the primary concern is the improvement
From Figure11, it is possible to observe that the primary concern is the improveme
of crops to increase production. This concern is due to the problem of food security in
ofseveral
crops regions
to increase
of theproduction.
world. On the This concern
other is due
hand, there to the
is little problem
concern of food
for crop moni-security
several regions of the world. On the other hand, there is little concern
toring. An example of this concern is observed in Sitokonstantinou et al., which shows for crop monitorin
Antheexample
monitoring of allconcern
of this crops in Korea to avoidinoverproduction
is observed Sitokonstantinou of rice
etand
al.,increase that of the mo
which shows
other cereals, achieving a nutritional balance [57]. So far, all agricultural
itoring of all crops in Korea to avoid overproduction of rice and increase that of othBig Data systems
work independently, thinking about solving the specific problems of each agricultural area.
cereals, achieving a nutritional balance [57]. So far, all agricultural Big Data systems wo
However, solving the problems shown in Figure 11 is a global phenomenon. In the future,
independently,
Agronomy 2022, 12, x FOR PEER REVIEW the systems will thinking about solving
be interconnected to monitor thethespecific problems
entire world of each
production, agricultural
verifying 17 of 34 ar
the
However, solving
state of sowing the problems
to make shown in Figure 11 is a global phenomenon. In the futu
timely decisions.
the systems will be interconnected to monitor the entire world production, verifying t
state of sowing to make timely decisions.
Figure
Figure 11.11. Main
Main problemsto
problems tobe
be solved
solved vs.
vs. agriculture.
agriculture.
In terms of the architecture of NN, they are structures with a single hidden layer [39,58],
two hidden layers [40,48,52] and three hidden layers [55]. The activation functions used
include sigmoid [48,58], tangent sigmoid [52], and rectified linear unit [55].
In Balducci et al., they compared the performance of different ML techniques for the
reconstruction of ambiguous or corrupt data captured by IoT sensors with DT resulting in
being the technique with the best outcomes in most of the experiments [40]. The authors
concluded that the performance of this algorithm drops if few data are used to train
the model, and that this is affected differently by the different attributes coming from
the sensors.
Doshi et al. compared the DT with other techniques, due to its support incorporated
for the MLC, for automatic crop recommendation [37]. According to Priya et al., this
technique is not advisable for work with large scattered data sets, since its performance
decreases as the data volume increases, and that it does not ideally detect anomalies and
manage missing data [44].
Figure 13 presents a map summarizing the number of papers selected by ML tech-
niques vs. problems to be solved. To simplify the figure, we have not considered techniques
that were used only once.
Figure 13 shows NN’s most commonly used technique to solve various agricultural
problems such as increasing production, reducing costs, increasing quality, and predicting
the weather. NNs are an excellent choice when working with Big Data sets because they
are easy to adapt. This adaptation reduces errors produced by adjusting the weights and
biases of each neuron based on the data used for training [44].
On the other hand, in addition to the problems solved using NN, RF was also used to
monitor the crop through satellite images. RF is ideal for working with massive data sets
Agronomy 2022, 12, x FOR PEER REVIEW 20 of 34
since it needs less time to preprocess the data, is competent in global time complexity, and
works well with scattered data sets [44].
13. ML
Figure 13.
Figure ML Main
Mainproblems
problemstotobebesolved vs.vs.
solved ML techniques.
ML techniques.
SVM was used to solve plant disease control problems to increase yield quality and
quantity. SVM is suitable for managing small data sets that do not contain too many out-
liers, and its performance is increased when the dimensional space of the data is ample,
and the attributes are lower [44].
Agronomy 2022, 12, 748 20 of 34
SVM was used to solve plant disease control problems to increase yield quality and
quantity. SVM is suitable for managing small data sets that do not contain too many outliers,
and its performance is increased when the dimensional space of the data is ample, and the
attributes are lower [44].
DT has been used to solve the same problems as NN. Several authors emphasize that
DT requires little computational time and generates a high level of accuracy [37,40,50]. The
DT is efficient in terms of computation and scalability. Moreover, its yield is increased
when the same data do not correlate with each other [44].
Although these ML techniques were the most widely used, the trend is for agricultural
Big Data to combine several ML techniques. Shelestov et al. compared RF, SVM, and
NN techniques [46]. Nobrega et al. used RF, DS, XGBoost, KNN, SVM, and Naïve Bayes
techniques [50]. Ferreira et al. used K-means, BC, RPT, bagging of DT, SB, SVM, NN, and
LR [58]. Saggi et al. used NN, GLM, RF, and GBM techniques [55]. Priya et al. used a
combination of techniques such as DT, RF, BN, ANN, SVM, and Markov [44]. Besides
combining these techniques, some of these must be adapted using Big Data systems due to
volume, variety, veracity, and velocity characteristics.
Figure
Figure 14.14. Maintechnologies
Main technologies used
used in
inagricultural
agriculturalBig Data
Big andand
Data ML.ML.
The most common technologies were sensors and satellite imagery for collecting
The most common technologies were sensors and satellite imagery for collecting ag-
agricultural data. The former was used to take data at the location of interest for crops,
ricultural
animals,data. The or
weather, former was used [44,45,50].
soil properties to take data at the images
Satellite locationwere
of interest
mainly for
usedcrops,
for ani-
mals,
monitoring land and crops in large areas [52,57,59]. These were obtained through satellitesmoni-
weather, or soil properties [44,45,50]. Satellite images were mainly used for
toring land and
or services fromcrops in providers
external large areas [52,57,59].
such as Google These were obtained
Earth Engine through
(GEE), a global satellites or
positioning
services from external providers such as Google Earth Engine
system (GPS), Sentinel satellites, Landsat satellites, or Google Maps.(GEE), a global positioning
system (GPS), Sentinel satellites, Landsat satellites, or Google Maps.
In implementing Big Data systems, the most used file system was Hadoop Distrib-
uted File System (HDFS) because it allows separating data sets, storing them in a distrib-
uted way in several nodes of a cluster, and parallelizing operations on them [57]. Most of
Agronomy 2022, 12, 748 21 of 34
In implementing Big Data systems, the most used file system was Hadoop Distributed
File System (HDFS) because it allows separating data sets, storing them in a distributed
way in several nodes of a cluster, and parallelizing operations on them [57]. Most of
the implemented clusters were configured with the various programs provided through
the Apache Hadoop framework. Among these, the following stood out Apache Hive
and Apache Kafka. Apache Hive was used to configure data warehouses that streamline
working with large data sets stored in distributed units [62,63]. Apache Kafka was used
for the transmission of information or messages to different nodes of the designed Big
Data architecture [45,63]. The most widely used technology, Apache Spark framework, was
mainly used for processing the collected data [38,41,50,57,62,63].
The technology most often identified in the implementation of ML models is the
Python programming language [40,48,59]. Among the articles that use Python to implement
the models, most used libraries that facilitate working with the datasets or that implement
the models of the ML techniques used in the research. The libraries that were repeated the
most are Scikit-Learn [48,53], Pandas [53], and NumPy [40].
Regarding the tools for data visualization, Web technologies such as PHP [64] or Java
Scripts stood out. The latter stands out above all together with libraries such as D3 for data
visualization [53], Leaflet.js for displaying maps, [37] and React for building interactive
user interfaces [53].
In Wang et al., a Big Data system for agriculture is proposed. It is a system with a
design based on the collection, storage, analysis, and application of pear tree data [62].
For the collection of tree growth data (air temperature, soil moisture, light intensity, etc.),
a high-precision wireless sensor network is used whose collected data are sent via TCP
protocol to traditional databases (MySQL, MongoDB, etc.). These databases temporarily
store the data and serve as data sources for the overall Big Data system. For this purpose,
data synchronization software such as NiFi, Sqoop, or Flume is used, and the data sources
are synchronized with the HDFS cluster that is responsible for storing all the data together.
SparkSQL is used to read, filter, and store the data from the HDFS cluster to Apache Hive
and Apache Hbase. The former is used for data analysis, and the latter is used for data
monitoring and visualization of data statistics. Apache Dubbo is used for running farmer
management services in a distributed manner. The article did not detail the technologies
used to implement ML models.
4.4.1. Volume
In Yang et al., a data detection device with sensors and videos is used that commu-
nicates with the platform in the cloud with the TCP Socket protocol, loading the data
into the cloud in real time [60]. The device transmits data to the platform in the cloud
at 3 s intervals. The same authors created a physical division of the table into discrete
Agronomy 2022, 12, 748 22 of 34
tables for each day that data are stored to solve the problem of storing a large amount of
data in the system. On the other hand, the authors create an analysis service based on
Hadoop and a file service implemented using a platform in the cloud. The data kept in
the MySQL database are transferred to a specific format to be filed in the HDFS system
every day. This should be suitable to execute the data analysis service [60]. In Amani et al.
and in Shelestov et al., they use a large amount of satellite data for downloading [46,52].
By implementing the workflow on the platform in the cloud, Shelestov et al. overcome the
challenges of downloading and processing Big Data. On the other hand, Gumma et al. also
use satellite images in the Big Data system [59]. He notes that due to the classification of
large areas, the size of the Landsat data is considerable (reaching petabyte when dealing
with a time series), which is why it is challenging to process the data in regular systems [64].
To solve this challenge, they use the computation platform in the GEE cloud for image
processing because it has the entire Landsat file and many raster data sets openly available
from NASA, the European Space Agency (ESA), and other images that can convey the code
to the data. Thus, the complex multi-temporal data on a continental scale can be analyzed
using JavaScript or Python and shared and replicated by other researchers, reducing the
barriers to using supercomputers to perform geospatial analyses [59].
On the other hand, Sitokonstantinou et al. used satellite images with a resolution of
the 10 m time series from which 167 features were extracted [57]. The authors reported
that the automated and prompt acquisition of Sentinel images from the available centers
becomes a challenge. Sentinel data offers hubs different specifications such as their constant
archiving policy, data availability, geographic cover, and acquisition latency. The authors
have developed an application to connect to multiple Sentinel hubs and automatically
seek pertinent data. This intermediary of Sentinel data recovers the required products
from the most efficient center that is chosen in terms of download speed and product
availability [57].
In Ferreira et al., a large amount of image data was used to analyze the coincidence
of the farms. The authors explained that coincidence is a very challenging task because,
generally, the documentation of the attributes of the linked entity (i.e., farm) is highly
inconsistent in all the databases because of spelling errors, errors, or missing information,
and the infeasibility of manual data mining due to the size of the data set [58]. On the
other hand, Ochoa and Guo used input images processed through hierarchical convolution
modules to reduce the size and gain many more channels [65]. Three to five convolutional
layers were used with each image, followed by a normalization layer in batches. At the
end of each module, a grouping layer and an activation layer were obtained. Then, the
compressed images were fed into a set of ascending hierarchical sampling modules.
In Abbona et al., a large volume of data was analyzed to predict whether the calves
survive 60 days after birth [51]. The authors looked for a possible solution to highlight the
young calves’ strengths and find alternatives through variables. The authors indicated that
identifying such variables is a complex task, but it has a solution since the amount of data
recorded among the cattle is enormous nowadays and manageable through ML techniques.
Amaechi et al. improved climate prediction through a model that uses rules based on
the knowledge of experts [54]. The rules have been included in enormous data sets in data
pre-processing.
4.4.2. Variety
Dutta et al. created a database of terrain characteristics based on expert knowledge
in the domain, which is generally undocumented information [36]. The authors used the
knowledge of the domain to offer direction to ML. Domain-guided extraction using ML
is called semantic extraction, which is why it produces a base of semantic features. The
base of meta-features and the base of semantic features were integrated to form a space of
enriched features which was a more significant representation of the heterogeneous data.
Tombe created a crop image characterization scheme that is applied to determine the
health status of the crop [49]. Conversely, Rehman et al. used heterogeneous data such as
Agronomy 2022, 12, 748 23 of 34
text, web data, and CSV, among others, and from these extracted the information needed to
construct a set of agriculture rules to provide recommendations [38]. Sitokonstantinou et al.
also used a set of rules based on the farmers’ expert knowledge [57].
Fenu and Malloci implemented a module to load data automatically for each crop, a
module to load data manually for each crop (used by the farmers during the monitoring
survey), and a module to integrate prognosis models [48]. Ochoa and Guo manually classi-
fied data into six classes of land cover: impermeable surfaces, buildings, low vegetation,
trees, automobiles, and background [65]. Then, the authors divided the images into square
patches of 256 × 256 × 3 with no overlapping, collecting data from 20,102 images. In
Vasumathi et al., they collected data to fit the quality of the satellite images to analyze plant
growth [61]. They achieved contrast stretching or normalization to perform a noise filtering
process then. In this process, the pixels in the image showed different intensity values
instead of real values obtained from the image.
Sathiaraj et al. used data from a data set for climatic extreme indices and data from a
National Evaluation of the Climate document published in 2014. The collected data are
compared daily with a table of climatic thresholds defined to consider the annual frequency
of days that exceeded or fell below a certain threshold. This was carried out for each climate
measurement site. This threshold-based data set represents the climatic extremes at each
site and is a resource of practical application to group sites that experience similar trends
and climatic extremes [53].
Ryan et al. used data from the herbicide resistance testing service at Charles Sturt Uni-
versity, New South Wales, Australia, from 2001 to 2015, and data from agricultural surveys
applied in several counties registered in the Australian Bureau of Statistics (Australian
Bureau of Statistics, 2015) [56]. The first data set consisted of annual samples of ryegrass
received for herbicide resistance tests on farms all over southern Australia. The postal
codes of the regions determined the locations of the samples.
In Priya et al., the authors indicated data from such factors as climate conditions, soil
type and quality, the variety of the crop, and its quality, as well as some of the dangerous
conditions of plagues, weeds, and crop diseases [44]. The authors indicate that it is
possible to think of agriculture as a significant biological ecosystem, where crops interact
with several bodies in the ecosystem. In addition, none of these entities are fixed and
vary dynamically; then, for understanding the agricultural ecosystem, it is necessary
to understand the various entities [44]. On the other hand, the author noted that the
information compiled is accurate and verified by experts; therefore, when the data are used
to develop ML algorithms, the algorithm is more reliable and provides transparency for the
development. Once again, the hardware for data collection differs from one organization to
another, so the collected data can be in different formats, and the results can vary [44]. This
is a constant challenge.
4.4.3. Velocity
Of the papers selected, few described challenges due to the speed with which data
must be collected or processed and then visualized. MapReduce is a type of technology
used in Big Data to gain processing speed with large volumes of data [43]. Vasumathi et al.
use the MapReduce algorithm to subdivide the data so that the request is only inspected
in the explicit partition, which increases the query’s efficiency and recovery time [61]. In
Saggi et al., they used MapReduce in the H2 O system to obtain more data fragments than
the CPU cores [55].
4.4.4. Veracity
In Tarik and Mohammed, they had to perform a pre-processing of the data because the
real data were often incomplete (lost values, simplified data), noisy (errors and exceptions),
and inconsistent (names, coding) [39]. These issues caused problems in the implementation
of the Big Data system. Conversely, Ferreira et al. reduced the land space as an alternative
to reduce the number of comparisons to be assessed among the attributes of pairs of farms.
Agronomy 2022, 12, 748 24 of 34
To verify the data, they had to compare several approaches based on the edition, the
Levenshtein and Jaro–Winkler metrics, and determinist, stochastic, and ML approaches
to classify the pairs of farms as coincident or noncoincident [58]. The authors point out
that they store pairs of similar farms, verifying the values of each approach. In Donzia and
Kim, the performance of the ML module was improved by using an automatic controller
for data stored continuously in HDFS [45].
Ochoa and Guo prepared the training data for the first CNN model, dividing the
original aerial images into square patches with a predefined resolution [65]. This division
led them to increase the sample of training data. They used techniques to increase data
that had to be verified to improve data quality. On the contrary, Vasumathi et al. adjusted
the quality of the satellite images to analyze plant growth. The adjustment was due to
insufficient quality [61]. They carried out a contrast stretching or normalization process and
another process of noise filtering of the pixels with different intensity values to obtain real
pixels obtained from the image. In Yahata (2017), they use data from photos of flowering
plants to count them and examine their growth [47]. The problem is verifying the data when
the flower is very close to another one since the incorrect coincidences must be reduced.
The authors use a robot to improve the accuracy of the flower pairing. Sitokonstantinou
et al. developed a framework for updating images of the plots to validate the land cover
data through ML algorithms. With the validated data, detailed maps of land cover at plot
level were created to distinguish between rice crops and other types of crops [57].
Shelestov et al. needed to process data and configure computer resources to use
state-of-the-art classification approaches. In order to solve these problems, they developed
an automated crop classification workflow based on ML techniques [46].
Wang et al. used the decision tree algorithm to analyze pear tree demand. The authors
had to predetermine the standard demand for the growth of the pear tree. Each layer of the
algorithm is a data comparison process in real time with this standard [62]. The goal is to
combine ML with a Big Data platform to quickly extract data features and data values. The
results of this layer are presented to the farm administrator via the data application layer.
In Ferreira et al., “farm pairing” was done in a scenario with a large amount of
livestock data. They compared the performance of twelve automated pairing methods
differently [58]. They used two unsupervised ML approaches, including k-means clus-
tering (KC) and bagged clustering (BC). On the other hand, they used seven supervised
ML approaches, including recursive partition trees (RPT), boosted decision trees (BDT),
bootstrap classification trees based on BCT, stochastic boosting (SB), support vector ma-
chines (SVM), neural networks (NN), and logistic regression (LR). The probabilistic and
ML approaches considered the Levenshtein metric and the Jaro–Winkler similarity criteria.
The authors concluded that SVM combined with the Levenshtein metric produced the best
results of all the approaches, with almost perfect precision, sensitivity, specificity, and very
high accuracy.
Saggi et al. implemented a multilayer deep learning model considering multiple
hidden layers and a rectified linear active function [55]. The model was trained with
stochastic gradient descent using back propagation. On the other hand, Pandya et al. used
series prediction methods such as autoregression (AR), autoregressive integrated moving
average (ARIMA), and vector autoregression (VAR). These techniques allowed to face the
time interval concept in the transmission data, i.e., the transmission properties can change
over time [63]. Another challenge was the efficiency of the system to update the ML models
based on these algorithms to address the time interval of the concept. The authors proposed
a novel framework to address both challenges.
Abbona et al. used a more frequent set of variables that were encapsulated in the
models constructed by genetic programming to investigate their zoological meaning in
calf production, evaluating the performance of the prediction models [51]. The authors
note that the method worked well, implying that the ML horizon must be investigated
further and that comparisons with other techniques must be made, even in larger data sets
containing more features. Evolutionary algorithms can be applied to zootechnical data,
obtaining performance models to learn the available data.
Amaechi and Pham employed a model designed by the authors to improve the
convolutional network approach they propose. Compared with the alternative methods
tested, the authors achieve a high weather forecast accuracy [54]. Table 4 presents the
selected papers and the characteristics of Big Data where they describe challenges. We
added a column where they present challenges to carry out the analysis with ML.
Table 4. Cont.
Cui and Gao combined agricultural Big Data and DL to accelerate the intelligent
transformation of the agricultural production process and promote the transformation of
the digital agricultural architecture. The authors developed standard architecture and the
sharing of data formats, achieving standardized processing through designing a standard
architecture of agricultural Big Data [66].
Despite the adaptations made to solve the challenges due to the data’s volume, variety,
velocity, and veracity, there are still no automated technologies for this procedure. In the
future, Big Data architectures will be much more flexible, and agricultural data will be
available in general repositories containing sufficient metadata for the agricultural Big Data
itself to decide which ML technique to use and which adaptation to include. An example
of this is the processing of satellite images stored directly in cloud computing repositories,
improving the speed and veracity of the data.
5. Discussion
As we have mentioned in this paper, there are several challenges for the proper use
of ML in agricultural Big Data. These challenges are due to the intrinsic characteristics of
Big Data, which are volume, variety, velocity, and veracity. To provide a visual map of the
ML techniques and challenges faced, we propose a framework composed of three main
sections considering these aspects in the main components of agricultural Big Data. The
first section of the framework presents the main challenges. The second section shows the
main ML techniques used and the context or problem to be solved. The last section shows
the most used technologies. See details in Figure 15.
Big Data, which are volume, variety, velocity, and veracity. To provide a visual map of
the ML techniques and challenges faced, we propose a framework composed of three
main sections considering these aspects in the main components of agricultural Big Data.
The first section of the framework presents the main challenges. The second section shows
Agronomy 2022, 12, 748 27 of 34
the main ML techniques used and the context or problem to be solved. The last section
shows the most used technologies. See details in Figure 15.
Figure 17. Relationship of the challenges between ML, the agriculture industry, and Big Data.
MapReduce has been used to increase data processing speed and adapt ML algorithms
differently from the traditional one [72]. Besides, it is possible to split the original data set
into subsets and then combine the partial solutions. However, data distribution operations
can adversely affect unbalanced data sets. Among the effects is the performance when
classifying unbalanced data sets, which may even face the problem of a small sample
size that can be amplified as the original data are distributed on different machines [68].
According to the same author, another effect is the change in the data set that occurs when
the partitions of the training and test set are very different between them. Then, it is
necessary to design new techniques that can generate synthetic data that best represent
minority class instances when using a MapReduce framework [68].
A common assumption of ML is that algorithms can learn better with more data and,
consequently, provide more accurate results. However, massive data sets impose several
challenges because traditional algorithms were not designed to meet such requirements.
For example, several ML algorithms were designed for smaller data sets assuming that
the entire data set can fit in memory [67]. Another assumption is that the whole data set is
available for processing at training time. Big Data breaks these assumptions by rendering
Agronomy 2022, 12, 748 30 of 34
unusable traditional algorithms or algorithms that primarily affect their performance [67].
The same author mentions the existence of ML adaptations, such as deep and online
learning, to overcome the challenges of ML with Big Data [10]. However, we have identified
a budding use of such algorithms in the selected papers.
Regarding the technologies used in agricultural Big Data and ML, the most used were
Hadoop, HDFS, Apache Spark, and cloud computing. Hadoop is mainly used to batch
process the data and, therefore, the data must be stored in a large repository [73]. Cloud
computing is a good solution for storage and processing because satellite data are already
available [52]. Apache Spark is used when streaming data processing is required, and
therefore data reading is constant [63].
In agricultural Big Data, a combination of technologies is required since data from
experts, videos, and satellite images will be batch processed. On the other hand, data from
social networks and sensors will be processed by streaming. For the case of cloud-based
technologies, there are several tools for the use of ML: Microsoft Azure Machine Learning,
which is now part of Cortana Intelligence Suite; Google Cloud Machine Learning Platform;
Amazon Machine Learning; and IBM Watson Analytics [74]. These services are offered by
established providers, offering scalability and integration with other services and platforms.
From our point of view, the most relevant challenge to consider is the design of
Big Data architecture since they must be flexible and highly scalable, considering that
architecture design is a complex task [25]. Other challenges are understanding the statistical
characteristics of the data before applying algorithms and the ability to work with larger
data sets [75]. In addition, specific knowledge is required for certain problems in agriculture,
such as increased production, quality improvement, and climate change, among others.
However, it is essential to note that none of the selected papers includes this last aspect as a
problem to be faced. According to L’heureux et al., as data size increases, the performance
of algorithms becomes even more dependent on the architecture used [67]. Then, data size
increasing makes it necessary to rethink the concept of architecture used to implement and
develop algorithms that manage data.
6. Conclusions
This paper presented an SLR using the PRISMA protocol, selecting thirty articles
that explain the use of ML in agricultural Big Data. We analyzed these articles from
three different points of view. First, we recorded the solutions and challenges to different
agricultural problems. Second, we reviewed the main ML techniques used in agricultural
Big Data and its main difficulties and challenges. Finally, we recorded the used technologies.
We found 36 ML techniques, considering a total of 80 implementations. Each paper
implemented more than one ML technique. NN, RF, SVM, and DT were the most frequently
implemented techniques. Despite the positive results described in the papers, the imple-
mentation of these algorithms remains a challenge, as there are constant problems due to
the increase in data size. These challenges imply adjustments in data classification, the
difficult task of calculating the number of training samples, difficulties in the classification
of unbalanced datasets, and the efficient application of MapReduce due to the increase in
data volume, among other aspects to consider.
The most widely used technologies in agricultural Big Data and ML were Hadoop,
HDFS, Apache Spark, and cloud computing. Although these technologies can process a
large volume of heterogeneous data with reasonable speed, it is still a challenge to use
ML algorithms. According to the nature of the data, some adaptations will be necessary
to improve the quality of the analyses. A big challenge is the design of agricultural Big
Data architectures since the set of technologies will have to be modified as the volume of
data increases.
The future of agricultural Big Data development and the use of ML is promising.
This future will increase the development of flexible architectures that consider various
alternative ML techniques according to the conditions of the generated data. This increase
is possible thanks to the technologies that have been developed and are constantly evolving.
Agronomy 2022, 12, 748 31 of 34
On the other hand, cloud computing will increase due to the training of new professionals
and the improvement of network speed. Cloud computing and other tools will include
more alternative ML techniques, facilitating flexibility.
In terms of ML techniques, the use of DL and other techniques mentioned in Figure 12
that were adapted to specific contexts due to problems with data volume, processing speed,
variability, and veracity, will increase. However, these problems can be solved by sorting
the data storage through the data lake.
Author Contributions: A.C. contributed to the planning, organization, and direction of the SRL;
S.P. contributed to paper writing, formatting and creating figures and tables; S.S. contributed to the
methodological support and expert judgement; L.M. contributed to the data analysis and discussion.
All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by Universidad de La Frontera, Vicerrectoría de Investigación y
Postgrado and Dr. Samuel Sepúlveda thanks to research project DIUFRO DI20-0060.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Hunter, M.C.; Smith, R.G.; Schipanski, M.E.; Atwood, L.W.; Mortensen, D.A. Agriculture in 2050: Recalibrating Targets for
Sustainable Intensification. Bioscience 2017, 67, 386–391. [CrossRef]
2. White, E.L.; Thomasson, J.A.; Auvermann, B.; Kitchen, N.R.; Pierson, L.S.; Porter, D.; Baillie, C.; Hamann, H.; Hoogenboom, G.;
Janzen, T.; et al. Report from the conference, ‘identifying obstacles to applying Big Data in agriculture’. Precis. Agric. 2021, 22,
306–315. [CrossRef]
3. Bhat, S.A.; Huang, N.F. Big Data and AI Revolution in Precision Agriculture: Survey and Challenges. IEEE Access 2021, 9,
110209–110222. [CrossRef]
4. Maya-Gopal, P.S.; Chintala, B.R. Others Big Data challenges and opportunities in agriculture. Int. J. Agric. Environ. Inf. Syst. 2020,
11, 48–66. [CrossRef]
5. Torky, M.; Hassanein, A.E. Integrating blockchain and the internet of things in precision agriculture: Analysis, opportunities, and
challenges. Comput. Electron. Agric. 2020, 178, 105476. [CrossRef]
6. Hongyan, L.; Ziyi, C.; Haitong, W. Research of Agricultural Big Data. E3S Web. Conf. 2020, 214, 1011.
7. Lassoued, R.; Macall, D.M.; Smyth, S.J.; Phillips, P.W.B.; Hesseln, H. Expert Insights on the Impacts of, and Potential for,
Agricultural Big Data. Sustainability 2021, 13, 2521. [CrossRef]
8. Tibbetts, J.H. The Frontiers of Artificial Intelligence. Bioscience 2018, 68, 5–10. [CrossRef]
9. Liakos, K.; Busato, P.; Moshou, D.; Pearson, S.; Bochtis, D. Machine Learning in Agriculture: A Review. Sensors 2018, 18, 2674.
[CrossRef]
10. Cravero, A.; Sepúlveda, S. Use and Adaptations of Machine Learning in Big Data—Applications in Real Cases in Agriculture.
Electronics 2021, 10, 552. [CrossRef]
11. Bilali, H.E.; Allahyari, M.S. Transition towards sustainability in agriculture and food systems: Role of information and communi-
cation technologies. Inf. Process. Agric. 2018, 5, 456–464. [CrossRef]
12. Moher, D.; Liberati, A.; Tetzlaff, J.; Altman, D.G.; Group, P. Preferred reporting items for systematic reviews and meta-analyses:
The PRISMA statement. PLoS Med. 2009, 6, e1000097. [CrossRef] [PubMed]
13. Cherkassky, V.; Mulier, F. Learning from Data: Concepts, Theory, and Methods; John Wiley & Sons: Hoboken, NJ, USA, 2007.
14. Rudin, C.; Wagstaff, K. Machine learning for science and society. Mach. Learn. 2014, 95, 1–9. [CrossRef]
15. Qiu, J.; Wu, Q.; Ding, G.; Xu, Y.; Feng, S. A survey of machine learning for Big Data processing. EURASIP J. Adv. Signal Process.
2016, 1, 1–16.
16. Benos, L.; Tagarakis, A.C.; Dolias, G.; Berruto, R.; Kateris, D.; Bochtis, D. Machine Learning in Agriculture: A Comprehensive
Updated Review. Sensors 2021, 21, 3758. [CrossRef]
17. Fatih, B.A.L.; Kayaalp, F. Review of machine learning and deep learning models in agriculture. Int. Adv. Res. Eng. J. 2021, 5,
309–323.
18. Santos, M.; e Sá, J.; Costa, C.; Galváo, J.; Andrade, C.; Martinho, B.; Lima, F.; Costa, E. A Big Data analytics architecture for
industry 4.0. In World Conference on Information Systems and Technologies; Springer: Berlin/Heidelberg, Germany, 2017; pp. 175–184.
19. Sassi, I.; Ouaftouh, S.; Anter, S. Adaptation of Classical Machine Learning Algorithms to Big Data Context: Problems and
Challenges. In Proceedings of the 2019 1st International Conference on Smart Systems and Data Science (ICSSD), Rabat, Morocco,
3–4 October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–7.
20. Gupta, D.; Rani, R. A study of Big Data evolution and research challenges. J. Inf. Sci. 2019, 45, 322–340. [CrossRef]
Agronomy 2022, 12, 748 32 of 34
21. Elshawi, R.; Sakr, S.; Talia, D.; Trunfio, P. Big Data Systems Meet Machine Learning Challenges: Towards Big Data Science as a
Service. Big Data Res. 2018, 14, 1–11. [CrossRef]
22. Haig, B.D. Big Data Science: A Philosophy of Science Perspective; American Psychological Association: Washington, DC, USA, 2020.
23. De Mauro, A.; Greco, M.; Grimaldi, M. A formal definition of Big Data based on its essential features. Libr. Rev. 2016, 65, 122–135.
[CrossRef]
24. Demchenko, Y.; De-Laat, C.; Membrey, P. Defining architecture components of the Big Data Ecosystem. Collab. Technol. Syst. Int. Conf.
2014, 104–112.
25. Salma, C.A.; Tekinerdogan, B.; Athanasiadis, I.N. Chapter 4—Domain-Driven Design of Big Data Systems Based on a Reference
Architecture. In Software Architecture for Big Data and the Cloud; Morgan Kaufmann: Burlington, MA, USA, 2017; pp. 49–68.
26. Sowmya, R.; Suneetha, K. Data mining with Big Data. IEEE Trans. Knowl. Data Eng. 2017, 26, 246–250.
27. Song, I.-Y.; Zhu, Y. Big Data and data science: What should we teach? Expert Syst. Wiley Online Libr. 2016, 33, 364–373. [CrossRef]
28. Sarker, M.N.I.; Islam, M.S.; Ali, M.A.; Islam, M.S.; Salam, M.A.; Mahmud, S.H. Promoting digital agriculture through Big Data for
sustainable farm management. Int. J. Innov. Appl. Stud. 2019, 25, 1235–1240.
29. Kamilaris, A.; Kartakoullis, A.; Prenafeta-Boldú, F. A review on the practice of Big Data analysis in agriculture. Comput. Electron. Agric.
2017, 143, 23–37. [CrossRef]
30. Wolfert, S.; Ge, L.; Verdouw, C.; Bogaardt, M.-J. Big Data in smart farming—A review. Agric. Syst. 2017, 153, 69–80. [CrossRef]
31. Weersink, A.; Fraser, E.; Pannell, D.; Duncan, E.; Rotz, S. Opportunities and Challenges for Big Data in Agricultural and
Environmental Analysis. Annu. Rev. Resour. Econ. 2018, 10, 19–37. [CrossRef]
32. Coble, K.H.; Mishra, A.K.; Ferrell, S.; Griffin, T. Big Data in agriculture: A challenge for the future. Appl. Econ. Perspect. Policy
2018, 40, 79–96. [CrossRef]
33. Misra, N.N.; Dixit, Y.; Al-Mallahi, A.; Bhullar, M.S.; Upadhyay, R.; Martynenko, A. IoT, Big Data and artificial intelligence in
agriculture and food industry. IEEE Internet Things J. 2020, 1, 99. [CrossRef]
34. Kitchenham, B.; Charters, S. Guidelines for Performing Systematic Literature Reviews in Software Engineering. Rep. EBSE´07. 2007.
Available online: https://www.elsevier.com/__data/promis_misc/525444systematicreviewsguide.pdf (accessed on 1 February 2022).
35. Petticrew, M.; Roberts, H. Systematic Reviews in the Social Sciences: A Practical Guide; John Wiley & Sons: Hoboken, NJ, USA, 2008.
36. Dutta, R.; Li, C.; Smith, D.; Das, A.; Aryal, J. Big Data Architecture for Environmental Analytics. In International Symposium on
Environmental Software Systems; Springer: Berlin/Heidelberg, Germany, 2015; pp. 578–588.
37. Doshi, Z.; Nadkarni, S.; Agrawal, R.; Shah, N. AgroConsultant: Intelligent Crop Recommendation System Using Machine
Learning Algorithms. In Proceedings of the 2018 Fourth International Conference on Computing Communication Control and
Automation (ICCUBEA), Pune, India, 16–18 August 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–6.
38. Rehman, A.; Liu, J.; Keqiu, L.; Mateen, A.; Yasin, M.Q. Machine learning prediction analysis using IoT for smart farming.
Int. J. Emerg. Trends Eng. Res. 2020, 8, 6482–6487.
39. Hajji, O.J.T. Mohammed Big Data Analytics and Artificial Intelligence Serving Agriculture. In Advanced Intelligent Systems for
Sustainable Development (AI2SD’2019); Ezziyyani, M., Ed.; Springer International Publishing: Berlin/Heidelberg, Germany, 2020;
pp. 57–65.
40. Balducci, F.; Impedovo, D.; Pirlo, G. Machine learning applications on agricultural datasets for smart farm enhancement. Machines
2018, 6, 38. [CrossRef]
41. Choudhary, N.K.; Chukkapalli, S.S.L.; Mittal, S.; Gupta, M.; Abdelsalam, M.; Joshi, A. YieldPredict: A Crop Yield Prediction
Framework for Smart Farms. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA,
10–13 December 2020; pp. 2340–2349.
42. Gnanasankaran, N.; Ramaraj, E. The effective yield of paddy crop in Sivaganga district—An initiative for smart farming.
Int. J. Sci. Technol. Res. 2020, 9, 6452–6455.
43. Priya, R.; Ramesh, D.; Khosla, E. Crop Prediction on the Region Belts of India: A Naïve Bayes MapReduce Precision Agricultural
Model. In Proceedings of the 2018 International Conference on Advances in Computing, Communications and Informatics
(ICACCI), Bangalore, India, 19–22 September 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 99–104.
44. Priya, R.; Ramesh, D. ML based sustainable precision agriculture: A future generation perspective. Sustain. Comput. Inform. Syst.
2020, 28, 100439. [CrossRef]
45. Donzia, S.K.Y.; Kim, H. Architecture Design of a Smart Farm System Based on Big Data Appliance Machine Learning. In
Proceedings of the 2020 20th International Conference on Computational Science and Its Applications (ICCSA), Cagliari, Italy,
1–4 July 2020; pp. 45–52.
46. Shelestov, A.; Lavreniuk, M.; Vasiliev, V.; Shumilo, L.; Kolotii, A.; Yailymov, B.; Kussul, N.; Yailymova, H. Cloud Approach to
Automated Crop Classification Using Sentinel-1 Imagery. IEEE Trans. Big Data 2020, 6, 572–582. [CrossRef]
47. Yahata, S.; Onishi, T.; Yamaguchi, K.; Ozawa, S.; Kitazono, J.; Ohkawa, T.; Yoshida, T.; Murakami, N.; Tsuji, H. A hybrid machine
learning approach to automatic plant phenotyping for smart agriculture. In Proceedings of the 2017 International Joint Conference
on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; pp. 1787–1793.
48. Fenu, G.; Malloci, F.M. An Application of Machine Learning Technique in Forecasting Crop Disease. In Proceedings of the 2019
the 3rd International Conference on Big Data Research, Paris, France, 20–22 November 2019.
49. Tombe, R. Computer Vision for Smart Farming and Sustainable Agriculture. In Proceedings of the 2020 IST-Africa Conference
(IST-Africa), Kampala, Uganda, 18–22 May 2020; IEEE: Piscataway, NJ, USA, 2020.
Agronomy 2022, 12, 748 33 of 34
50. Nóbrega, L.; Tavares, A.; Cardoso, A.; Gonzalves, P. Animal monitoring based on IoT technologies. In Proceedings of the 2018 IoT
Vertical and Topical Summit on Agriculture—Tuscany (IOT Tuscany), Tuscany, Italy, 8–9 May 2018; IEEE: Piscataway, NJ, USA,
2018; pp. 1–5.
51. Abbona, F.; Vanneschi, L.; Bona, M.; Giacobini, M. Towards modelling beef cattle management with Genetic Programming. Livest. Sci.
2020, 241, 104205. [CrossRef]
52. Amani, M.; Kakooei, M.; Moghimi, A.; Ghorbanian, A.; Ranjgar, B.; Mahdavi, S.; Davidson, A.; Fisette, T.; Rollin, P.; Brisco, B.; et al.
Application of Google Earth Engine Cloud Computing Platform, Sentinel Imagery, and Neural Networks for Crop Mapping in
Canada. Remote Sens. 2020, 12, 3561. [CrossRef]
53. Sathiaraj, D.; Huang, X.; Chen, J. Predicting climate types for the Continental United States using unsupervised clustering
techniques. Environmetrics 2019, 30, e2524. [CrossRef]
54. Amaechi, E.S.; Pham, H. Van Enhancement of Convolutional Neural Networks Classifier Performance in the Classification of IoT
Big Data. In Proceedings of the 4th International Conference on Machine Learning and Soft Computing, Haiphong, Vietnam,
17–19 January 2020; Association for Computing Machinery: Haiphong, Vietnam, 2020; pp. 25–29.
55. Saggi, M.K.; Jain, S. Reference evapotranspiration estimation and modeling of the Punjab Northern India using deep learning.
Comput. Electron. Agric. 2019, 156, 387–398. [CrossRef]
56. Ip, R.H.; Ang, L.M.; Seng, K.P.; Broster, J.C.; Pratley, J.E. Big Data and machine learning for crop protection. Comput. Electron. Agric.
2018, 151, 376–383. [CrossRef]
57. Sitokonstantinou, V.; Drivas, T.; Koukos, A.; Papoutsis, I.; Kontoes, C. Scalable Distributed Random Forest Classification for
Paddy Rice Mapping. 2020. Available online: https://zenodo.org/record/3662151 (accessed on 1 February 2022).
58. Aiken, V.C.F.; Dórea, J.R.R.; Acedo, J.S.; de Sousa, F.G.; Dias, F.G.; de Magalhães Rosa, G.J. Record linkage for farm-level data
analytics: Comparison of deterministic, stochastic and machine learning methods. Comput. Electron. Agric. 2019, 163, 104857.
[CrossRef]
59. Gumma, M.K.; Thenkabail, P.; Teluguntla, P.; Oliphant, A.; Xiong, J.; Giri, C.; Pyla, V.; Dixit, S.; Whitbread, A. Agricultural
cropland extent and areas of South Asia derived using Landsat satellite 30-m time-series big-data using random forest machine
learning algorithms on the Google Earth Engine cloud. GISci. Remote Sens. 2020, 57, 302–322. [CrossRef]
60. Yang, J.; Liu, M.; Lu, J.; Miao, Y.; Hossain, M.A.; Alhamid, M.F. Botanical internet of things: Toward smart indoor farming by
connecting people, plant, data and clouds. Mob. Netw. Appl. 2018, 23, 188–202. [CrossRef]
61. Vasumathi, M.T.; Kamarasan, M. Fruit disease prediction using machine learning over Big Data. Int. J. Recent Technol. Eng. 2019,
7, 556–559.
62. Wang, X.; Yang, K.; Liu, T. The Implementation of a Practical Agricultural Big Data System. In Proceedings of the 2019 IEEE 5th
International Conference on Computer and Communications (ICCC), Chengdu, China, 6–9 December 2019; pp. 1955–1959.
63. Pandya, A.; Odunsi, O.; Liu, C.; Cuzzocrea, A.; Wang, J. Adaptive and Efficient Streaming Time Series Forecasting with Lambda
Architecture and Spark. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA,
10–13 December 2020; pp. 5182–5190.
64. Zhou, L.; Pan, S.; Wang, J.; Vasilakos, A. Machine learning on Big Data: Opportunities and challenges. Neurocomputing 2017, 237,
350–361. [CrossRef]
65. Ochoa, K.S.; Guo, Z. A framework for the management of agricultural resources with automated aerial imagery detection.
Comput. Electron. Agric. 2019, 162, 53–69. [CrossRef]
66. Cui, X.; Gao, Z. A Standard Architecture of Agricultural Big Data for Deep Learning. In Proceedings of the 2020 IEEE International
Conference on Advances in Electrical Engineering and Computer Applications (AEECA), Dalian, China, 25–27 August 2020;
pp. 908–911.
67. L’heureux, A.; Grolinger, K.; Elyamany, H.; Capretz, M. Machine learning with Big Data: Challenges and approaches. IEEE Access
2017, 5, 7776–7797. [CrossRef]
68. del Río, S.; López, V.; Benítez, J.M.; Herrera, F. On the use of MapReduce for imbalanced Big Data using Random Forest. Inf. Sci.
2014, 285, 112–137. [CrossRef]
69. Wibowo, M.; Sulaiman, S.; Shamsuddin, S.M. Machine Learning in Data Lake for Combining Data Silos. In Proceedings of the
International Conference on Data Mining and Big Data, Fukuoka, Japan, 27 July–1 August 2017; Springer: Berlin/Heidelberg,
Germany, 2017; pp. 294–306.
70. LaPlante, A.; Sharma, B. Architecting Data Lakes Data Management Architectures for Advanced Business Use Cases; O’Reilly Media Inc.:
Sevastopol, CA, USA, 2016.
71. Khine, P.P.; Wang, Z.S. Data lake: A new ideology in Big Data era. In Proceedings of the InITM Web of Conferences 2018, Wuhan,
China, 15–17 December 2017; EDP Science: Ulys, France, 2018; Volume 17, p. 03025. [CrossRef]
72. Grolinger, K.; Hayes, M.; Higashino, W.A.; L’Heureux, A.; Allison, D.; Capretz, M. Challenges for MapReduce in Big Data. In
Proceedings of the 2014 IEEE World Congress on Services, Anchorage, AK, USA, 27 June–2 July 2014; pp. 182–189.
73. Loaiza, J.; Carmona, M.; Giuliani, G.; Fiameni, G. Big-Data in Climate Change Models—A Novel Approach with Hadoop
MapReduce. In Proceedings of the 2017 International Conference on High Performance Computing & Simulation (HPCS), Genoa,
Italy, 17–21 July 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 45–50.
Agronomy 2022, 12, 748 34 of 34
74. Yang, C.; Huang, Q.; Li, Z.; Liu, K.; Hu, F. Big Data and cloud computing: Innovation opportunities and challenges. Int. J. Digit. Earth
2017, 10, 13–53. [CrossRef]
75. Sukumar, S.R. Machine Learning in the Big Data Era: Are We There Yet? In Proceedings of the ACM Knowledge Discovery and
Data Mining: Workshop on Data Science for Social Good, New York, NY, USA, 24–17 August 2014.