Machine Learning Pneumonia Prediction

A machine learning system for the prediction of pneumonia from chest x-ray images

Uploaded by

christtech81
Copyright
© All Rights Reserved

KWARA STATE UNIVERSITY, MALETE

The University for Community Development

Faculty of Information and Communication Technology

DEVELOPMENT OF A CNN SYSTEM FOR CHEST X-RAYS IMAGES FOR PNEUMONIA AND COVID
PREDICTION

BY

Saliu Fuhid Olakunle

20/47cs/01355

AUGUST 2024
Automatic Detection and Analyze of Pneumonia X-ray and
COVID-19

BY

SALIU FUHID OLAKUNLE

A RESEARCH PROJECT SUBMITTED TO THE DEPARTMENT OF
COMPUTER SCIENCE, FACULTY OF INFORMATION AND
COMMUNICATION TECHNOLOGY, KWARA STATE
UNIVERSITY, MALETE, IN PARTIAL FULFILMENT OF THE
REQUIREMENTS FOR THE AWARD OF BACHELOR OF
SCIENCE (B.Sc.) DEGREE IN COMPUTER SCIENCE.

August, 2024

DECLARATION

I hereby declare that this research work titled “Automatic detection and analyze of
pneumonia X-rays and COVID-19” is my own work and has not been submitted by
any other person for any degree or qualification at any higher institution. I also declare
that the information provided herein is mine and that any material that is not mine is properly
acknowledged.

__________________________
________________________

Name: Saliu Fuhid Olakunle Signature and Date

CERTIFICATION

This is to certify that the research project titled “Automatic detection and analyze of
pneumonia X-rays and COVID-19” was carried out by “Saliu Fuhid Olakunle”. The
project has been read and approved as meeting the requirements for the award of
Bachelor of Science (B.Sc.) Degree in Computer Science in the Department of Computer
Science, Faculty of Information and Communication Technology, Kwara State
University, Malete.

______________________ ___________________

Dr. R.M. Isiaka Signature/Date

Supervisor

_______________________ ____________________

Dr. (Mrs.) R.S. Babatunde Signature/Date

Head of Department

_______________________ _____________________

External Examiner Signature/Date

DEDICATION

This project is dedicated to God Almighty, the beginning and the end, who has been with
me from birth until this moment, and to my dad, my mum, my friends, and my
supervisor for their support, guidance, and prayers.

ACKNOWLEDGEMENT

All praise and adoration belong to God for his mercy and protection over me throughout
my program in the university.

I acknowledge the efforts of my dad (Mr. Saliu Lukman) and my mum (Mrs. Saliu
Sherifat); may God spare their lives to reap the reward of their labor. My sincere
appreciation also goes to my loving and caring siblings, Saliu Zikrulahi for
his leadership role and Saliu Olamilekan for his encouraging words towards the success of
this program, and to the entire family and community in general. May God reward
them all abundantly. Furthermore, I acknowledge the support of my friends (Alabi
Daniel, Alabi Opeyemi, Balogun Al-ameen, and Badmus Ikramah). May Almighty God
be with them and crown their efforts with success.

I appreciate my colleagues in the university and all my classmates. May God answer our
prayers and crown all our efforts with success. I also thank the school authority for
creating an opportunity and avenue for us to be exposed to the outside world.

My profound gratitude goes to my supervisor, Dr. R.M. Isiaka, who did all he could to
make this report a successful one. My appreciation also goes to all lecturers in the
department.

Contents
ABSTRACT

CHAPTER ONE

INTRODUCTION

1.1 Background of the Study

1.2 Statement of the Problem

1.3 Aim and Objectives

1.4 Scope of the Study

1.5 Significance and justification of the Study

1.6 Definition of terms

CHAPTER TWO

LITERATURE REVIEW

2.1 Related concepts

2.2 Related Works

CHAPTER THREE

SYSTEM ANALYSIS AND DESIGN

3.1 Data Gathering and Analysis:

3.1.2 Data preprocessing:

3.1.3 Feature Selection:

3.2 Algorithm/ Model training:

3.3 Graphical user interface development:

3.3.1 Python programming language:

3.3.2 Kivy/Kivymd

3.3.3 NumPy

3.3.4 Pandas:

3.4 System analysis

CHAPTER FOUR

RESULTS AND DISCUSSIONS

Minimum system requirement

Best Requirements:

4.1 Results

4.1.1 Interfaces

4.2 Discussion

CHAPTER FIVE

SUMMARY, CONCLUSION, RECOMMENDATION AND REFERENCES

Summary

Advantages of the System

Disadvantages of the System

Recommendations

Conclusion

REFERENCE

ABSTRACT
This project introduces a sophisticated machine learning-based application designed for
the analysis and diagnosis of chest X-ray images, with a specific focus on detecting
tuberculosis. Leveraging a Convolutional Neural Network (CNN), the application excels
in image classification by utilizing advanced pre-processing and feature extraction
techniques. The CNN architecture comprises convolutional, pooling, and fully connected
layers to effectively capture hierarchical image features, ensuring precise and reliable
diagnostic outcomes. The application features a user-friendly interface developed using
the KivyMD framework, which incorporates Material Design principles to enhance
visual appeal and usability. This interface allows medical professionals to easily upload
and preview images, as well as receive clear, actionable results from the analysis. The
system's strengths include its high accuracy in prediction, intuitive design, and flexibility
across various hardware platforms. However, the application is limited by its
performance on lower-quality images and its dependency on more powerful hardware for
optimal functionality. To address these limitations, future improvements should focus on
expanding hardware compatibility, enhancing diagnostic capabilities beyond
tuberculosis, and refining image pre-processing methods to accommodate a broader
range of image qualities. Overall, the application represents a significant advancement in
medical image analysis, offering a powerful tool for healthcare professionals to assist in
timely and accurate diagnoses.
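
As a minimal sketch of the three layer types named above (convolutional, pooling, and fully connected), the following pure-Python forward pass runs a single filter over a tiny grayscale patch. The image, kernel, weights, and function names are made-up illustrative values and are not the trained model or code of this project.

```python
import math

def conv2d_relu(image, kernel):
    """Valid 2-D cross-correlation (the 'convolution' used in CNNs) followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            s = sum(image[i + a][j + b] * kernel[a][b]
                    for a in range(kh) for b in range(kw))
            row.append(max(0.0, s))  # ReLU keeps only positive responses
        out.append(row)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling with a square window."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

def dense_sigmoid(features, weights, bias):
    """Fully connected layer with one output unit and a sigmoid activation."""
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical 5x5 grayscale patch (values in [0, 1]) containing a vertical edge.
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
# Hypothetical 2x2 filter that responds to dark-to-bright vertical edges.
kernel = [[-1.0, 1.0], [-1.0, 1.0]]

feature_map = conv2d_relu(image, kernel)        # 4x4 feature map
pooled = max_pool(feature_map)                  # 2x2 after pooling
flat = [v for row in pooled for v in row]       # flatten for the dense layer
score = dense_sigmoid(flat, weights=[0.5, 0.5, 0.5, 0.5], bias=-1.0)
print(feature_map, pooled, round(score, 4))
```

A real classifier would stack many such filters and learn the kernel and weight values from labelled images instead of fixing them by hand.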

CHAPTER ONE

INTRODUCTION
1.1 Background of the Study
Agriculture is a vital activity for human livelihood, providing food,
feed, fibres, fuel, and raw materials. The global population is expected to reach 8 billion
people by 2025 and almost 10 billion by 2050 (Huang et al., 2020). This will lead to a significant
increase in the demand for many human needs, chiefly food, in terms of both quantity and quality.
To accommodate these needs, global food production must rise by about 60–70% (Javaid et al.,
2023).

Agriculture 4.0, also known as the “Digital Agricultural Revolution,” represents a paradigm shift in
agriculture, leveraging cutting-edge technologies to optimize various aspects of farming
operations (Liu, 2020). These technologies encompass the Internet of Things (IoT), Artificial
Intelligence (AI), Big Data, cloud computing, Decision Support Systems (DSS), advanced
sensing technology, and autonomous robots. Sensors and robotics play a crucial role in collecting
essential field data, which is then transmitted to a local or cloud server via IoT technology for
storage, processing, and analysis. Big Data and AI-based techniques can be used to convert these
data into valuable insights. To facilitate user interaction and informed decision-making, a DSS
equips users with the necessary tools to optimize the agricultural system and undertake
appropriate actions (Sharma et al., 2021).

Agriculture 4.0 generates and processes a huge volume of data that will serve as a foundation for
decision-making. It is believed that Agriculture 4.0 can bring major global improvements in terms of
increasing the productivity and efficiency of agricultural and food systems, improving the
quantity, quality, and accessibility of agricultural products, adapting to climate change, reducing
food loss and waste, optimizing the use of natural resources in a sustainable way, and,
consequently, reducing the environmental impact in the years to come (Bhat & Huang, 2021).

The agricultural sector, which utilizes approximately 70% of the world’s freshwater, faces
significant challenges due to increasing water scarcity and the need for sustainable farming
practices (Faouzi et al., 2020). To address these challenges, the integration of machine learning
(ML) for real-time monitoring and optimization of water usage has emerged as a crucial
technological advancement. This write-up explores the development and implementation of an
ML model aimed at optimizing irrigation processes to enhance water efficiency and crop
productivity.

Machine learning techniques have proven valuable in predicting soil properties, allowing
farmers, researchers, and stakeholders to make informed decisions regarding soil fertility,
moisture levels, and nutrient concentrations (Meshram et al., 2021). By assimilating data from
various sources, machine learning models provide valuable insights into the dynamic nature of
soil behavior, allowing for proactive adjustments in farming practices to ensure optimal
conditions for crop growth and yield. Additionally, through the application of computer vision
and remote sensing data, ML simplifies the monitoring of both crops and soil conditions by
offering timely information on crop health, growth stages, and potential stressors (Wang et al.,
2022).

Machine learning-driven irrigation optimization offers several significant benefits. By accurately
predicting water needs and preventing over-irrigation, ML models help conserve water
resources, which is crucial given the increasing scarcity of freshwater globally (Sharma et al.,
2021). This precision in water management ensures that crops receive the right amount of water,
leading to healthier plants and increased crop yields. Additionally, efficient water usage reduces
energy costs associated with water pumping and minimizes the labor required for irrigation
management, resulting in substantial cost savings for farmers. Moreover, these optimized
practices promote sustainability by balancing water use with environmental considerations,
thereby supporting sustainable agricultural practices that are essential for long-term ecological
health (Steinfeld et al., 2020).

The agricultural sector faces significant challenges in optimizing water usage due to factors such
as inefficient traditional irrigation methods, climate variability, and the increasing demand for
sustainable farming practices (Veeragandham & Santhi, 2020). Traditional irrigation systems
often result in substantial water wastage and fail to adapt to the dynamic water needs of crops
influenced by environmental factors such as weather and soil conditions. The lack of real-time
monitoring and precise control mechanisms exacerbates these inefficiencies, leading to lower
water-use efficiency and reduced agricultural productivity. Therefore, there is a critical need to
develop a machine learning-based model that enables real-time monitoring and optimization of
water usage in agricultural practices (Jin et al., 2020).

1.2 Statement of the Problem

The problem this system addresses is the inefficient use of water in agriculture,
which is sometimes due to limited knowledge of the quantity of water that different food crops and
fruits need.

1.3 Aim and Objectives


The aim of this study is to develop a machine learning model for real-time monitoring and
optimization of water usage in agricultural practices. The objectives are to:

I. gather a crop and fruit water requirement dataset from Kaggle;
II. develop a random forest regression machine learning model for water usage
optimization; and
III. develop a user interface for easy access to the system.
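
To illustrate the random forest regression idea named in the second objective, the sketch below builds a very small forest in pure Python: each tree is trained on a bootstrap sample and each split considers a random subset of features, and the forest prediction is the average over trees. The feature names, the eight data rows, and all parameter values are hypothetical and are not taken from the Kaggle dataset; in practice a library implementation such as scikit-learn's RandomForestRegressor would normally be used.

```python
import random
from statistics import mean

def sse(values):
    """Sum of squared errors of values around their own mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(rows, targets, depth, rng, n_features):
    """Grow one regression tree; each split looks at a random subset of features."""
    if depth == 0 or len(set(targets)) == 1:
        return mean(targets)  # leaf: average target of the samples that reach it
    best = None
    for f in rng.sample(range(len(rows[0])), n_features):
        # Candidate thresholds: every observed value except the largest,
        # so both sides of a split are always non-empty.
        for threshold in sorted({r[f] for r in rows})[:-1]:
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    if best is None:  # sampled features were constant, so no split is possible
        return mean(targets)
    _, f, threshold = best
    left = [(r, t) for r, t in zip(rows, targets) if r[f] <= threshold]
    right = [(r, t) for r, t in zip(rows, targets) if r[f] > threshold]
    return (f, threshold,
            build_tree([r for r, _ in left], [t for _, t in left], depth - 1, rng, n_features),
            build_tree([r for r, _ in right], [t for _, t in right], depth - 1, rng, n_features))

def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] <= threshold else right
    return node

def fit_forest(rows, targets, n_trees=25, max_depth=3, seed=0):
    rng = random.Random(seed)
    n_features = max(1, len(rows[0]) // 2)  # assumed subset size per split
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx],
                                 max_depth, rng, n_features))
    return forest

def predict(forest, row):
    """Forest prediction: the average of the individual tree predictions."""
    return mean([predict_tree(tree, row) for tree in forest])

# Hypothetical rows: [air temperature in deg C, soil moisture in %].
rows = [[34, 12], [36, 10], [32, 15], [30, 18],
        [22, 38], [20, 42], [24, 35], [18, 45]]
# Hypothetical targets: water requirement in litres per square metre per day.
targets = [9.0, 10.0, 8.5, 8.0, 3.0, 2.5, 3.0, 2.0]

forest = fit_forest(rows, targets)
dry = predict(forest, [38, 8])    # hot, dry conditions
wet = predict(forest, [16, 50])   # cool, wet conditions
print(round(dry, 2), round(wet, 2))
```

Averaging many trees trained on different bootstrap samples is what makes the forest less sensitive to individual noisy records than a single tree.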

1.4 Scope of the Study


The scope of this study encompasses the development and implementation of a machine learning
model for real-time monitoring and optimization of water usage in agricultural practices. It
includes a thorough literature review to identify current trends and challenges in the application
of machine learning and IoT in agriculture. The study will involve designing a machine learning
system for data analysis on soil moisture, weather conditions, and crop health. Additionally, the
research will focus on developing an adaptive machine learning algorithm to analyse this data
and provide optimized irrigation recommendations. Finally, the study will include the creation of
a user-friendly interface to facilitate farmers’ interaction with the system, ensuring practical
applicability and ease of use. This comprehensive approach aims to enhance water-use efficiency
and agricultural productivity.

1.5 Significance and Justification of the Study


The study is significant as it addresses critical challenges in agricultural water management by
leveraging advanced technologies. By developing a machine learning model with a web
application, the research aims to optimize water usage, thereby enhancing water-use efficiency
and crop yield. This is crucial given the growing global demand for food and the need for
sustainable agricultural practices amidst climate variability. Additionally, the study’s outcomes
can provide farmers with actionable insights and automated irrigation controls, reducing water
waste and labor efforts. The implementation of a user-friendly interface will facilitate the
adoption of these advanced technologies in everyday farming, promoting widespread benefits for
the agricultural sector and contributing to environmental conservation.

1.6 Definition of terms


Machine Learning (ML)

A subset of artificial intelligence (AI) that involves training algorithms to recognize patterns in
data and make predictions or decisions without being explicitly programmed. In agriculture, ML
can analyze data from various sources to optimize processes such as irrigation and crop
management.

Internet of Things (IoT)

A network of physical devices embedded with sensors, software, and other technologies to
connect and exchange data with other devices and systems over the internet. In agricultural
applications, IoT devices monitor environmental conditions, soil moisture, and plant health in
real time.

Precision Agriculture

A farming management concept that uses technology to observe, measure, and respond to
variability in crops. This approach aims to optimize field-level management regarding crop
farming to improve yields and resource use efficiency, including water.

Real-Time Monitoring

The continuous collection and analysis of data as it is generated. In the context of this study,
real-time monitoring involves using IoT sensors to gather and transmit data on soil moisture,
weather conditions, and crop status to support immediate decision-making.

Irrigation Optimization

The process of adjusting irrigation practices to ensure that water is used efficiently and
effectively, minimizing waste while maximizing crop yield. This often involves using
technology to determine the optimal timing and amount of water application.

User-Friendly Interface

A design characteristic of software applications that ensures they are easy to use and understand
by end-users. For this study, it refers to the mobile or web platforms that allow farmers to
interact with the ML model and control irrigation systems effortlessly.

CHAPTER TWO

LITERATURE REVIEW
2.1 Related concepts
Precision Agriculture:

Precision agriculture is a modern farming management concept that leverages information
technology to ensure crops and soil receive exactly what they need for optimal health and
productivity. By using GPS, sensors, drones, and various data analysis tools, farmers can monitor
crop yields, soil conditions, and weather patterns with high accuracy. This approach allows for
the precise application of water, fertilizers, and pesticides, reducing waste, lowering costs, and
improving crop yields (Bhat & Huang, 2021). Precision agriculture helps in managing fields at a
micro level, considering the variability within the fields to optimize farming practices.

Smart Irrigation:

Smart irrigation refers to the use of advanced technologies to manage and optimize the use of
water in agriculture. This system relies on sensors and automated controllers that adjust watering
schedules based on real-time data on soil moisture, weather forecasts, and crop requirements. By
precisely managing water delivery, smart irrigation systems reduce water waste, lower costs, and
improve crop health and yield (Sharma et al., 2021). These systems can include drip irrigation,
sprinkler systems, and soil moisture-based systems, which are all controlled by data-driven
decisions. Furthermore, smart irrigation contributes to sustainable water management by
ensuring that water resources are used efficiently and effectively.
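
The controller behaviour described above can be sketched as a simple rule: skip watering when enough rain is forecast, otherwise water in proportion to how far the soil is below a target moisture level. The class, function, thresholds, and conversion factor below are all hypothetical illustration values, not settings from this study.

```python
from dataclasses import dataclass

@dataclass
class Reading:
    soil_moisture_pct: float   # volumetric soil moisture from a sensor, in percent
    rain_forecast_mm: float    # rain expected over the next 24 hours, in mm

def irrigation_minutes(reading, target_pct=35.0, minutes_per_pct=2.0,
                       rain_skip_mm=5.0, max_minutes=60.0):
    """Return how long to run the irrigation valve for one scheduling cycle.

    All default values are assumed for illustration only.
    """
    if reading.rain_forecast_mm >= rain_skip_mm:
        return 0.0                      # forecast rain is expected to cover the need
    deficit = target_pct - reading.soil_moisture_pct
    if deficit <= 0:
        return 0.0                      # soil is already at or above the target
    return min(max_minutes, deficit * minutes_per_pct)

print(irrigation_minutes(Reading(soil_moisture_pct=25.0, rain_forecast_mm=0.0)))
print(irrigation_minutes(Reading(soil_moisture_pct=25.0, rain_forecast_mm=8.0)))
```

A data-driven system would replace the fixed target and conversion factor with values predicted per crop and growth stage.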

IoT (Internet of Things):

The Internet of Things (IoT) in agriculture involves the use of interconnected devices that collect
and share data to enhance farming operations. IoT devices can include soil sensors, weather
stations, drones, and automated machinery. These devices collect vast amounts of data that can
be analyzed to monitor crop health, predict weather conditions, and optimize irrigation and
fertilization schedules. The connectivity provided by IoT enables real-time monitoring and
management, leading to more efficient and productive agricultural practices (Javaid et al., 2023).
IoT systems can also facilitate remote management, allowing farmers to control equipment and
systems from a distance, improving operational efficiency and response times.
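
As a rough sketch of real-time monitoring on a stream of sensor values, the example below keeps a sliding window of the most recent soil moisture readings and raises an alert when their average falls below a threshold. The class name, window size, threshold, and the sample stream are assumptions made for illustration.

```python
from collections import deque

class MoistureMonitor:
    """Tracks the moving average of the latest readings from one sensor."""

    def __init__(self, window=3, low_threshold_pct=20.0):
        self.readings = deque(maxlen=window)   # old readings drop out automatically
        self.low_threshold_pct = low_threshold_pct

    def update(self, moisture_pct):
        """Add a reading and return (moving average, alert flag)."""
        self.readings.append(moisture_pct)
        average = sum(self.readings) / len(self.readings)
        return average, average < self.low_threshold_pct

monitor = MoistureMonitor()
stream = [26.0, 24.0, 21.0, 18.0, 15.0]        # hypothetical readings arriving over time
alerts = [monitor.update(value)[1] for value in stream]
print(alerts)
```

Averaging over a window instead of reacting to a single reading reduces false alerts caused by one noisy sample.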

Machine Learning:

Machine learning is a branch of artificial intelligence that focuses on building systems that can
learn from and make decisions based on data. In agriculture, machine learning algorithms
analyze data from various sources such as soil sensors, weather stations, and crop images to
predict outcomes and optimize farming practices. For example, machine learning can help in
predicting the best times to water crops, identifying disease outbreaks early, and improving crop
yields by recommending the best agricultural practices based on historical data (Meshram et al.,
2021). These predictive capabilities can lead to more proactive and efficient farm management,
reducing risks and enhancing productivity.

Soil Moisture Sensors:

Soil moisture sensors are devices used to measure the water content in the soil. These sensors
provide critical data for irrigation management, helping farmers to apply the right amount of
water at the right time. Soil moisture sensors can be placed at various depths to get a
comprehensive understanding of soil moisture levels throughout the root zone. This information
helps in preventing over-irrigation or under-irrigation, both of which can harm crop health and
reduce yields. The data collected can be transmitted to a central system where it is analyzed and
used to make informed irrigation decisions, ensuring optimal water use efficiency (Wang et al.,
2022).
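
One way readings from several depths could be combined is sketched below: each sensor is taken to represent a soil layer, and the water needed to bring every layer back up to field capacity is summed as a depth of water in millimetres. The layer thicknesses, moisture fractions, and field capacity value are hypothetical, and the function name is invented for this illustration.

```python
def root_zone_deficit_mm(layers, field_capacity):
    """Water depth (mm) needed to refill the root zone to field capacity.

    layers: list of (thickness_mm, vwc) pairs, one per sensor depth, where vwc is
            the volumetric water content as a fraction (0.25 means 25%).
    field_capacity: assumed water content the soil holds after drainage (fraction).
    """
    total = 0.0
    for thickness_mm, vwc in layers:
        # Layers already at or above field capacity need no water.
        total += max(0.0, field_capacity - vwc) * thickness_mm
    return total

# Hypothetical sensors representing 0-100 mm, 100-300 mm, and 300-600 mm layers.
layers = [(100, 0.18), (200, 0.24), (300, 0.30)]
deficit = root_zone_deficit_mm(layers, field_capacity=0.30)
print(round(deficit, 1))
```

Applying more than this depth would push water below the root zone (over-irrigation), while applying much less leaves part of the root zone dry.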

Remote Sensing:

Remote sensing involves collecting information about an object or area from a distance, typically
using satellite or aerial imagery. In agriculture, remote sensing is used to monitor crop health,
soil conditions, and environmental changes. Technologies such as drones equipped with cameras
and sensors, satellites with multispectral imaging, and thermal sensors can provide detailed
images and data. This information helps farmers detect issues such as pest infestations, nutrient
deficiencies, and water stress early, allowing for timely interventions that can improve crop
health and yields. Remote sensing also supports large-scale monitoring and management, making
it possible to oversee extensive farming operations efficiently (Liu, 2020).

Data Analytics:

Data analytics in agriculture involves examining data sets to draw conclusions and make
data-driven decisions to optimize farming practices. This process includes collecting data from
various sources such as sensors, weather stations, and historical crop performance records.
Advanced analytical tools and techniques, including statistical analysis, predictive modeling, and
machine learning, are used to identify patterns and trends. Insights gained from data analytics
help farmers make better decisions regarding planting schedules, irrigation, fertilization, and pest
control, ultimately leading to more efficient and productive farming. Data analytics also supports
the development of more accurate and customized farming strategies (Benos et al., 2021).
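
A very small example of identifying a trend in historical records is an ordinary least-squares line fitted to seasonal water use. The season numbers and water-use figures below are invented for illustration and do not come from any dataset used in this work.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    return slope, intercept

seasons = [1, 2, 3, 4, 5]                 # hypothetical growing seasons
water_use_mm = [520, 500, 490, 470, 445]  # hypothetical seasonal irrigation totals

slope, intercept = fit_line(seasons, water_use_mm)
forecast_season_6 = slope * 6 + intercept
print(slope, intercept, forecast_season_6)
```

A negative slope here would indicate that water use is falling from season to season; more realistic analyses would add weather and crop variables as further predictors.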

Sustainable Agriculture:

Sustainable agriculture focuses on farming practices that meet current agricultural needs without
compromising the ability of future generations to meet theirs. This approach emphasizes
resource conservation, environmental protection, and economic viability. Sustainable agriculture
practices include crop rotation, conservation tillage, integrated pest management, and organic
farming (Sharma et al., 2021). The goal is to create a balance between the need for food
production and the preservation of the ecological systems that support agriculture. Sustainable
practices help in maintaining soil health, reducing water usage, minimizing chemical inputs, and
enhancing biodiversity. By adopting sustainable practices, farmers can contribute to long-term
food security and environmental sustainability.

Crop Health Monitoring:

Crop health monitoring involves the continuous assessment of plant conditions to ensure optimal
growth and yield. This practice uses various technologies such as drones, satellites, and
ground-based sensors to gather data on plant health indicators like color, biomass, chlorophyll content,
and temperature. Techniques such as multispectral and hyperspectral imaging can detect stress
factors like diseases, nutrient deficiencies, and water stress before they become visible to the
naked eye. By identifying these issues early, farmers can take corrective actions promptly,
applying precise treatments that minimize crop loss and maximize productivity (Bhat & Huang,
2021). Additionally, machine learning models can analyze historical and real-time data to predict
potential problems and suggest preventive measures.
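
One widely used indicator computed from multispectral imagery is the Normalized Difference Vegetation Index, NDVI = (NIR - Red) / (NIR + Red). The text above does not name a specific index, so using NDVI here is an assumption, as are the reflectance values and the 0.3 stress threshold in this sketch.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel."""
    denom = nir + red
    return 0.0 if denom == 0 else (nir - red) / denom

# Hypothetical 2x2 reflectance grids for the near-infrared and red bands.
nir_band = [[0.50, 0.45],
            [0.30, 0.20]]
red_band = [[0.10, 0.15],
            [0.20, 0.20]]

STRESS_THRESHOLD = 0.3  # assumed cut-off; real values depend on crop and sensor

index_map = [[ndvi(n, r) for n, r in zip(nir_row, red_row)]
             for nir_row, red_row in zip(nir_band, red_band)]
stressed_pixels = [(i, j)
                   for i, row in enumerate(index_map)
                   for j, value in enumerate(row)
                   if value < STRESS_THRESHOLD]
print(stressed_pixels)
```

Healthy vegetation reflects strongly in the near-infrared and absorbs red light, so low index values point to areas worth inspecting.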

Climate-Smart Agriculture:

Climate-smart agriculture is an approach that aims to increase agricultural productivity and
resilience to climate change while reducing greenhouse gas emissions. This concept integrates
various practices and technologies designed to adapt to changing climate conditions, enhance the
resilience of farming systems, and contribute to climate change mitigation (Zhang et al., 2020).
Techniques include water management practices like rainwater harvesting and efficient irrigation
systems, soil management practices such as conservation tillage and cover cropping, and the use
of climate-resilient crop varieties. The approach also involves improving the efficiency of
resource use, reducing waste, and promoting sustainable land management practices. By
adopting climate-smart practices, farmers can sustain their livelihoods and ensure food security
in the face of climate variability and extreme weather events.

2.2 Related Works


Talaviya et al. (2020), in the study “Implementation of artificial intelligence in agriculture for
optimisation of irrigation and application of pesticides and herbicides”, set out
to explore the applications of Artificial Intelligence (AI) in agriculture, specifically focusing on
optimizing irrigation and the application of pesticides and herbicides. The methodology involves
a survey of various researchers’ work to understand the current implementation of automation in
agriculture, including weeding systems using robots and drones, soil water sensing methods, and
automated weeding techniques. The results highlight the effectiveness of AI-driven techniques
such as remote sensors for soil moisture content detection, automated irrigation with GPS,
precision weeding to reduce crop loss, and the efficient use of pesticides and herbicides through
drones. However, the study acknowledges limitations such as the need for further research on the
scalability and cost-effectiveness of these AI technologies in diverse agricultural settings. In
conclusion, the paper emphasizes that AI technologies offer solutions to challenges in
agriculture, enhancing productivity, reducing resource wastage, and improving overall efficiency
in farming practices.

Qazi et al. (2022) performed a study titled “IoT-Equipped and AI-Enabled Next Generation Smart
Agriculture: A Critical Review, Current Challenges and Future Trends”. The paper aims to
provide a comprehensive review of smart agriculture systems through IoT technologies and AI
techniques, discussing the importance of smart agriculture practices, current hardware building
blocks, automated control algorithms for smart irrigation, and the application of AI and DL in
smart agriculture. The methodology involves detailing advancements in smart agriculture
systems, reviewing available technologies and challenges, and discussing future trends. The
results highlight the potential of IoT and AI in revolutionizing conventional agriculture practices.
However, the limitations include challenges in widespread deployment and the need for further
research to address these obstacles. In conclusion, the paper emphasizes the critical role of IoT
and AI in shaping the future of agriculture, stressing the need for global adoption of smart
agriculture systems to overcome challenges in food demand, arable land shortage, pesticide
regulations, and water scarcity.

Liu (2020), in “Artificial intelligence (AI) in agriculture”, focuses on the
application of Artificial Intelligence (AI) in agriculture, particularly within the Agricultural
Research Service (ARS). The primary objectives were to leverage AI-based tools for
site-specific decision-making in agriculture, enhance early-warning systems for pest and disease
outbreaks, and promote sustainable cropland management practices. The methodology involved
the development of AI tools that utilize site-based science, big data, remote sensing, neural
networks, and machine learning to advance agricultural research. The results highlighted the
transformative potential of AI in revolutionizing agriculture by optimizing crop production,
resource management, and environmental sustainability. However, the paper acknowledged
limitations in the scope of projects covered and the need for continuous improvement in
technical capacity. In conclusion, the research underscores the critical role of AI in driving an
agricultural revolution to meet the increasing global food demand while optimizing resource
utilization and sustainability.

Bhat and Huang (2021), in “Big Data and AI Revolution in Precision Agriculture: Survey and
Challenges”, aim to explore the applications of big data and AI in precision
agriculture, focusing on data creation methods, technology accessibility, data analytics, and
challenges faced in implementation. The methodology involved a systematic literature review to
identify relevant studies from 2000-2020, resulting in 77 selected papers. The results highlighted
the significance of innovative machine learning techniques like CNN in processing vast,
heterogeneous agricultural data for improved decision-making. However, limitations include the
complexity of managing unstructured data and the need for advanced real-time data handling
platforms. In conclusion, the study emphasizes the transformative potential of big data and AI in
precision farming, offering opportunities for enhanced decision-making and addressing evolving
agricultural production challenges through scalable learning methods.

(Javaid et al., 2023), “Understanding the potential applications of Artificial Intelligence in
Agriculture Sector” The research paper aims to explore the applications of Artificial Intelligence
(AI) in the agriculture sector. The primary research objectives include studying the need for AI
in agriculture, understanding the process of AI adoption in agriculture, learning about agriculture
parameters monitored by AI, and identifying and discussing major applications of AI in
agriculture. The methodology involves analyzing the current use of AI in farming, exploring how
AI can enhance crop monitoring, pest control, and soil management, and investigating the
potential of AI to increase agricultural yield and productivity. The results highlight the
significant role of AI in revolutionizing agriculture by offering solutions for precision farming,
early pest detection, and data-driven decision-making. However, the paper acknowledges
limitations such as the high costs associated with AI implementation and the need for further
research on the long-term sustainability of AI technologies in agriculture. In conclusion, the
study emphasizes the transformative impact of AI on the agriculture sector, paving the way for
more efficient farming practices, improved crop quality, and sustainable food production.

(Faouzi et al., 2020), “Wastewater reuse in agriculture sector: Resources
management and adaptation in the context of climate change: case study of the Beni Mellal-Khenifra
region, Morocco” The research aimed to evaluate the efficiency of wastewater
treatment plants (WWTPs) in the Beni Mellal-Khenifra region, focusing on physicochemical and
biological parameters, as well as vegetation cover evolution using satellite images. The
methodology involved assessing six WWTPs based on water quality, conducting surveys with
farmers and residents, and analyzing satellite images to determine the impact on vegetation
cover. Results indicated that treated wastewater met Moroccan standards for reuse in irrigation,
with Boujaad WWTP standing out as a model. Limitations included variations in treated
wastewater quality among different WWTPs, such as high COD levels in some plants. Despite
limitations, the study concluded that wastewater reuse in agriculture can help secure irrigation in
the region, ensuring water availability and quality amidst climate change challenges.

(Zhai et al., 2020), “Decision support systems for agriculture 4.0: Survey and challenges” The
research paper aims to explore the challenges of employing agricultural decision support systems
in Agriculture 4.0 by conducting a systematic literature review of thirteen representative decision
support systems. The methodology involves analyzing each system in terms of interoperability,
scalability, accessibility, and usability. The results highlight seven upcoming challenges, such as
the need to simplify graphical user interfaces, enrich functionalities, and adapt to uncertainty.
The limitations include the potential for inaccurate decision support due to the complexity of
agricultural problems. In conclusion, the study emphasizes the importance of overcoming these
challenges to enhance the development and effectiveness of agricultural decision support
systems in Agriculture 4.0, ultimately improving decision-making processes for farmers and
contributing to higher productivity and sustainability in agriculture.

(Bizikova et al., 2020) “A scoping review of the contributions of farmers’ organizations to
smallholder agriculture” The research aimed to explore the contributions of farmer organizations
(FOs) to small-scale producers in sub-Saharan Africa and India by analyzing 239 studies
focusing on income, empowerment, agricultural production, food security, and the environment.
The methodology involved reviewing studies for research design, control groups, pre- and post-
assessments, and data analysis appropriateness. Results showed that FO impacts varied across
different types, with improvements in income, yield, production quality, environment,
empowerment, and food security. Limitations included the lack of consistent reporting on FO
impacts and the potential influence of external factors on outcomes. In conclusion, the study
highlighted the diverse roles of FOs in enhancing smallholder agriculture but emphasized the
need for more rigorous research to understand their full impact and address existing limitations.

(Huang et al., 2020), “Water-saving agriculture can deliver deep water cuts for China” The
research aimed to assess the impact of on-farm water management interventions on water
consumption reductions and maize production in China. The study utilized the AquaCrop model
to simulate various scenarios of integrated water management interventions, comparing results
with previous studies for validation. The findings indicated that interventions like improved
irrigation and soil management practices could lead to a substantial reduction in water
consumption nationally, particularly in water-stressed regions like the North China Plain and
Northeast China. These interventions also showed potential to increase maize production,
contributing significantly to meeting future demand. However, the study acknowledged
limitations such as assumptions of full irrigation setup and potential overestimation of water
consumption cuts in certain regions. In conclusion, the research highlighted the importance of
on-farm water management interventions in achieving Sustainable Development Goals related to
water, land, and food security in China and beyond.

“Coping with salinity in irrigated agriculture: Crop evapotranspiration and water management
issues” The research paper aims to address the challenges of soil and water salinity in irrigated
agriculture by focusing on strategies to understand the impacts of salinity on soil water balances
and evapotranspiration (ET) for optimal water management. The study utilizes the FAO56
framework to compute water requirements in saline environments, incorporating stress
coefficients to adjust crop coefficients. By applying both steady state and transient models, the
research provides insights into salinity effects on crop growth and irrigation scheduling. The
methodology involves modeling salinity build-up in the root zone and discussing soil-crop-water
management interventions for maintaining crop growth under saline conditions. The results
highlight the importance of adequate irrigation methods, cyclic use of multi-salinity waters, and
proper irrigation scheduling to mitigate salinity effects. However, the study acknowledges
limitations in the disposal of saline drainage water and the complexity of salinity impacts
influenced by various environmental and management factors. In conclusion, the research
underscores the significance of tailored irrigation strategies and water management practices to
cope with salinity challenges in irrigated agriculture.

(Veeragandham and Santhi, 2020) “A review on the role of machine learning in agriculture” The
research paper aims to review the role of machine learning in agriculture by analyzing various
machine learning approaches used in the past five years, highlighting their advantages and
disadvantages. The methodology involves a comprehensive literature survey to gather
information on the application of machine learning in agriculture, focusing on areas such as
topsoil management, disease detection, yield prediction, and species management. The results
indicate that machine learning models have significantly contributed to increasing productivity
and improving soil classification, disease detection, water management, yield prediction, crop
quality, and weed detection in agriculture. However, the limitations of the study include the
challenges associated with data collection, cost implications, and the need for further research to
address specific agricultural issues. In conclusion, the paper emphasizes the importance of
machine learning in revolutionizing agriculture by enabling faster and more optimal decision-
making processes, ultimately leading to enhanced agricultural practices and productivity.

(Foster et al., 2020) “Satellite-Based Monitoring of Irrigation Water Use: Assessing
Measurement Errors and Their Implications for Agricultural Water Management Policy” The
research aimed to assess the accuracy of satellite-based monitoring of irrigation water use and its
implications for agricultural water management policy. The methodology involved a systematic
meta-analysis of existing literature on satellite-based irrigation water monitoring, focusing on
measurement errors and uncertainties. The results indicated significant inaccuracies in water use
estimates when compared to in situ data, leading to economic losses for farmers and potential
negative impacts on policy effectiveness. Limitations included the lack of comprehensive
understanding of measurement errors and uncertainties in satellite models. In conclusion, while
remote sensing can enhance water accounting, policymakers must balance accuracy and cost
considerations to effectively manage irrigation water use and its externalities.

(Steinfeld et al., 2020) “The human dimension of water availability: Influence of management
rules on water supply for irrigated agriculture and the environment” The research aimed to
investigate the influence of management rules on water allocations for irrigated agriculture and
the environment in the Gwydir and Macquarie Rivers of the Murray-Darling Basin, Australia.
The methodology involved using hydrological simulation models and regression-based
sensitivity analyses to compare the impacts of water management decisions, climate, and river
system characteristics on regulated and unregulated water allocations. The results indicated that
management decisions significantly influenced regulated water allocations more than
unregulated ones, with changing management rules potentially varying long-term water
allocations. However, the study had limitations such as not examining changes in the magnitude
or distribution of future climate drivers. In conclusion, the research emphasized the importance
of transparent and systematic approaches to justify water management rules for maximizing
benefits to water users and river health in a variable and changing climate.

(Saad et al., 2020) “Water Management in Agriculture: A Survey on Current Challenges and
Technological Solutions” The research paper aims to survey recent works on water management
in agriculture, focusing on challenges and technological solutions. The methodology involves
reviewing existing literature on water usage in agriculture, including topics like water pollution,
irrigation, reuse, leaks in pipelines, and livestock drinking water. The results highlight the
importance of advanced technologies such as the Internet of Things (IoT), Wireless Sensor
Network (WSN), and cloud computing in enhancing water exploitation and management
efficiency. However, the paper acknowledges limitations in the literature, particularly the
marginal investigation of challenges related to livestock drinking water. In conclusion, the study
emphasizes the need for future research to propose innovative smart concepts and tools for
efficient water management in agriculture, building on the advancements made with modern
technologies.

(Meshram et al., 2021), “Machine learning in agriculture domain: A state-of-art survey” The
research paper aims to conduct an extensive survey on the application of machine learning in
agriculture, specifically focusing on pre-harvesting, harvesting, and post-harvesting stages to
alleviate farming problems. The methodology involves reviewing various machine learning
algorithms used in agriculture, such as K-Means clustering, ANN, and SVM, to enhance disease
detection and classification rates in crops. The results indicate that machine learning
technologies have significantly improved outcomes in agriculture, aiding farmers in reducing
losses and enhancing productivity. However, the study acknowledges limitations such as the
need for standard experimental methods, dataset creation, and sharing for validation by other
researchers. In conclusion, the paper emphasizes the importance of following the machine
learning pipeline, creating datasets, and sharing knowledge to benefit the agriculture sector and
support future research endeavors.

(Wang et al., 2022), “A Review of Deep Learning in Multiscale Agricultural Sensing” The
research paper aims to review the application of deep learning in multiscale agricultural sensing,
focusing on convolutional neural network-based supervised learning (CNN-SL), transfer learning
(TL), and few-shot learning (FSL) in crop sensing at various scales. The methodology involved a
comprehensive investigation of typical studies utilizing CNN-SL, TL, and FSL in agricultural
sensing, particularly at leaf, canopy, field, and land scales. Results highlighted the effectiveness
of deep learning models in tasks such as crop classification, disease detection, and pest
recognition, showcasing advancements in accuracy and model robustness. However, limitations
were identified, including data specificity, small dataset sizes, and computational capacity
constraints hindering real-time applications. In conclusion, the study emphasizes the potential of
deep learning to revolutionize precision agriculture by enabling informed decision-making based
on high-resolution farmland imagery, while also acknowledging the need for further research to
address current challenges and propel the evolution of modern agriculture.

(Benos et al., 2021) “Machine Learning in Agriculture: A Comprehensive Updated Review” The
research aimed to explore the application of machine learning (ML) in agriculture by reviewing
recent literature focusing on crop, water, soil, and livestock management. The study utilized
PRISMA guidelines to select journal papers published between 2018-2020, revealing a
multidisciplinary approach with a spotlight on crop management. Various ML algorithms were
employed, with Artificial Neural Networks proving more effective, particularly in analyzing
maize, wheat, cattle, and sheep data. The use of sensors on satellites and unmanned vehicles
provided reliable input data for analysis. However, the study's limitations may include a potential
lack of generalizability due to the specific timeframe and focus on journal articles. In conclusion,
the research underscores the significant potential of ML in agriculture, emphasizing the need for
systematic exploration and awareness among stakeholders to enhance farming practices and
contribute to future research in this domain.

(Sharma et al., 2021) “Machine Learning Applications for Precision Agriculture: A
Comprehensive Review” The research paper aims to comprehensively review the applications of
machine learning (ML) in precision agriculture, focusing on areas such as soil parameter
prediction, crop yield prediction, disease and weed detection, species classification, livestock
production, intelligent irrigation, and harvesting techniques. The methodology involves
analyzing the impact of ML and artificial intelligence (AI) in smart farm management, utilizing
regression algorithms for soil properties and crop yield prediction, deep learning algorithms like
Convolutional Neural Networks (CNN) for disease and weed identification, and classification
algorithms such as Support Vector Machines (SVM) and Decision Trees. The results highlight
the benefits of ML in optimizing irrigation practices, minimizing water-related crop damage, and
enhancing livestock management through AI systems for disease detection and prediction.
However, the limitations include the need for accurate sensor placement in irrigation systems,
dataset dependency for prediction accuracy, and the challenge of designing universal systems for
diverse geographic and climatic conditions. In conclusion, precision agriculture empowered by
ML and AI technologies offers a promising solution for sustainable and efficient farming
practices, emphasizing the importance of knowledge-based agriculture for future advancements.

(Zhang et al., 2020) “Applications of Deep Learning for Dense Scenes Analysis in Agriculture:
A Review” The research aims to explore the applications of Deep Learning (DL) in the analysis
of dense scenes in agriculture, addressing challenges such as severe occlusions and small object
sizes that complicate such analyses. Methodologically, the review first describes the types of
dense agricultural scenes and their specific challenges, then introduces various popular deep
neural networks employed in these scenarios. It comprehensively covers how these neural
networks are applied to agricultural tasks like recognition, classification, detection, counting, and
yield estimation. The results highlight the effectiveness of DL in handling dense agricultural
scenes, though it also identifies limitations and suggests directions for future research.
In conclusion, while DL demonstrates significant promise in improving agricultural scene
analysis, ongoing advancements are necessary to overcome existing limitations and enhance
performance further.

(Jin et al., 2020) “Deep Learning Predictor for Sustainable Precision Agriculture Based on
Internet of Things System” This study aimed to enhance weather prediction performance in
precision agriculture IoT systems by using a deep learning predictor with a sequential two-level
decomposition structure. The complex nonlinear relationships and multiple components in
weather data make accurate predictions challenging. To address this, the weather data were
decomposed into four components serially, and gated recurrent unit (GRU) networks were
trained as sub-predictors for each component. These individual predictions were then combined
to produce medium- and long-term forecasts. Experiments conducted using weather data from an
IoT system in Ningxia, China, for wolfberry planting demonstrated that the proposed model
could accurately predict temperature and humidity, thereby supporting the planning and control
needs of sustainable agricultural production.

CHAPTER THREE

SYSTEM ANALYSIS AND DESIGN


3.1 Data gathering and processing:
The dataset was obtained from Kaggle
[https://www.kaggle.com/datasets/prateekkkumar/crop-water-requirement]. It contains six columns:
crop type, soil type, region, temperature, weather condition, and water requirement. The crop type
column contains 15 unique values (banana, soyabean, cabbage, potato, rice, melon, maize, citrus,
bean, wheat, mustard, cotton, sugarcane, tomato, onion). The soil type column contains 3 unique
values (wet, humid, dry). The region column contains 4 unique values (humid, semi-arid, desert,
semi-humid). The temperature column contains 4 unique ranges (10-20, 20-30, 30-40, 40-50). The
weather condition column contains 4 unique values (normal, sunny, windy, rainy). The water
requirement column contains the known water requirement value for each row.

Figure 3.0 crop water requirement dataset

Data loading
Data loading is the initial step in the data processing pipeline where raw data is imported into the
system from various sources. This data can come in different formats such as CSV files, Excel
sheets, databases, or even real-time streams from IoT devices like water flow meters. The main
objective during data loading is to ensure that the data is accurately and efficiently transferred
into the working environment, typically a data frame in Python using libraries like Pandas. This
step is crucial as it lays the foundation for all subsequent processing tasks. If the data loading
process is not handled correctly, it can lead to issues such as missing data points, misaligned
rows, or even complete data loss.

Figure 3.1 data loading using the Python NumPy library

During this phase, it’s important to account for the quality and format of the data being loaded.
For instance, if data comes from multiple sources, ensuring that the schema (e.g., column names,
data types) is consistent across these sources is vital. Often, this step might involve converting
the raw data into a more standardized format or performing initial checks for data integrity, such
as verifying that all expected fields are present. In cases where data is being loaded from large
files or databases, optimizing the loading process through techniques like chunking (loading data
in smaller segments) can help manage memory usage and improve performance.
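
The loading steps described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual loading code: the inline CSV text, its toy values, and the column names are assumptions that stand in for the Kaggle file.

```python
import io

import pandas as pd

# Inline stand-in for the Kaggle CSV so the sketch runs without a file on disk.
# Column names and values are assumed toy data, not the real file contents.
CSV_TEXT = """crop_type,soil_type,region,temperature,weather_condition,water_requirement
banana,dry,desert,10-20,normal,8.75
banana,dry,desert,20-30,sunny,10.25
rice,wet,humid,30-40,rainy,3.5
maize,humid,semi-arid,40-50,windy,9.0
"""

EXPECTED_COLUMNS = [
    "crop_type", "soil_type", "region",
    "temperature", "weather_condition", "water_requirement",
]

# Load the whole file into a DataFrame in one call.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Basic integrity check: every expected field must be present.
missing_columns = [col for col in EXPECTED_COLUMNS if col not in df.columns]
if missing_columns:
    raise ValueError(f"Missing columns: {missing_columns}")

# Chunked loading: read two rows at a time, then stitch the pieces together.
chunks = pd.read_csv(io.StringIO(CSV_TEXT), chunksize=2)
df_chunked = pd.concat(list(chunks), ignore_index=True)

print(df.tail())  # show the last rows of the dataset
```

For a real file, the `io.StringIO(...)` argument would be replaced by the path to the CSV, and a much larger `chunksize` would normally be used.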

Figure 3.2 displaying the last 5 rows of the dataset

Data wrangling
Once the data is successfully loaded, data wrangling is the next critical step in the processing
pipeline. Data wrangling, also known as data cleaning or preprocessing, involves transforming
raw data into a more useful and structured format. This stage addresses issues such as missing
values, outliers, and inconsistencies in the data. For example, if certain data entries are missing,
strategies like imputation (filling in missing values based on statistical methods) or deletion of
incomplete records might be employed. Similarly, outliers that could distort analysis are either
corrected or removed, depending on the context.
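
A small sketch of these cleaning strategies is given below. It assumes a toy table with hypothetical column names, uses median imputation for a numeric gap, deletes a record with a missing category, and applies the common 1.5 x IQR rule for outliers. These are illustrative choices, not necessarily the ones used in this project.

```python
import numpy as np
import pandas as pd

# Toy data with one missing category, one missing number, and one extreme value.
df = pd.DataFrame({
    "crop_type": ["rice", "maize", None, "wheat", "rice", "maize", "wheat"],
    "water_requirement": [5.0, 6.0, 5.5, np.nan, 7.0, 60.0, 6.5],
})

# Imputation: fill the numeric gap with the column median.
median_requirement = df["water_requirement"].median()
df["water_requirement"] = df["water_requirement"].fillna(median_requirement)

# Deletion: drop records whose category is unknown.
df = df.dropna(subset=["crop_type"])

# Outlier handling: keep values within 1.5 * IQR of the quartiles.
q1 = df["water_requirement"].quantile(0.25)
q3 = df["water_requirement"].quantile(0.75)
iqr = q3 - q1
in_range = df["water_requirement"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean = df[in_range].reset_index(drop=True)

print(clean)
```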

Figure 3.3 data wrangling

Beyond cleaning, data wrangling also involves transforming the data to make it suitable for
analysis. This might include normalizing or scaling numerical values, encoding categorical
variables, and creating new features that could enhance the predictive power of the model. For
instance, in the context of water usage optimization, a feature like "time of day" might be derived
from a timestamp to analyze usage patterns more effectively. The goal of data wrangling is to
prepare a high-quality dataset that is free from errors and ready for the next steps of analysis and
modeling. Properly wrangled data not only improves the accuracy of models but also ensures
that the insights derived from the data are reliable and actionable.
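
As an illustration of feature creation and scaling, the sketch below turns temperature ranges such as "10-20" into a numeric midpoint and then min-max scales it to the interval [0, 1]. The column names `temp_mid` and `temp_scaled` are hypothetical, and the midpoint is only one possible way to make the range usable as a number.

```python
import pandas as pd

# Temperature is stored as a text range; spacing is inconsistent in the raw data.
df = pd.DataFrame({"temperature": ["10-20", "20-30", "30 - 40", "40 - 50"]})

# Split each range into lower and upper bounds and convert them to numbers.
bounds = (
    df["temperature"]
    .str.replace(" ", "", regex=False)
    .str.split("-", expand=True)
    .astype(float)
)

# New feature: midpoint of the range.
df["temp_mid"] = bounds.mean(axis=1)

# Min-max scaling of the new feature to [0, 1].
low, high = df["temp_mid"].min(), df["temp_mid"].max()
df["temp_scaled"] = (df["temp_mid"] - low) / (high - low)

print(df)
```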

Data analysis
Data analysis is a pivotal step where the processed data is explored and examined to extract
meaningful insights. This step typically involves both descriptive and inferential statistical
techniques. Descriptive statistics, such as mean, median, and standard deviation, help summarize
the main characteristics of the data, providing an overview of the distribution and central
tendencies. Visualization tools, such as histograms, box plots, and scatter plots, are often used to
explore relationships between variables, identify trends, and detect any anomalies in the data.
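
The descriptive statistics mentioned above can be computed with a few Pandas calls. The sketch uses made-up values and assumed column names purely for illustration.

```python
import pandas as pd

# Toy data; the values are invented for the example.
df = pd.DataFrame({
    "crop_type": ["rice", "rice", "maize", "maize", "wheat", "wheat"],
    "water_requirement": [9.0, 11.0, 5.0, 7.0, 4.0, 6.0],
})

# Central tendency and spread of the target column.
summary = df["water_requirement"].agg(["mean", "median", "std"])

# Average water requirement per crop, a simple grouped summary.
by_crop = df.groupby("crop_type")["water_requirement"].mean()

print(summary)
print(by_crop)
```

Plots such as histograms or box plots would typically be drawn from the same DataFrame with a plotting library.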

Figure 3.4 data analysis 1

In a water usage optimization project, data analysis might focus on understanding patterns such
as peak usage times, seasonal variations, or correlations between water usage and external
factors like weather conditions. For example, a correlation analysis might reveal a strong
relationship between high temperatures and increased water consumption, which could be crucial
for predicting future water needs. The insights gained during this analysis phase guide the feature
selection and model-building processes, ensuring that the most relevant variables are used to
optimize water usage effectively.

Figure 3.5 data analysis 2

One hot encoding


One-hot encoding is a technique used to convert categorical variables into a numerical format
that can be utilized by machine learning algorithms. In datasets, categorical variables are often
represented as text (e.g., "High", "Medium", "Low" for water demand levels). However, most
machine learning models require numerical input, which is where one-hot encoding comes into
play. This process creates binary columns for each category, with a "1" indicating the presence of
the category and a "0" indicating its absence. For example, if the "Water Demand" variable has
categories "High", "Medium", and "Low", one-hot encoding would generate three new binary
columns: "High", "Medium", and "Low".
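
The "Water Demand" example above can be reproduced with `pandas.get_dummies`. This is a minimal sketch; the column name `water_demand` is assumed, and Pandas prefixes each new column with the original column name.

```python
import pandas as pd

df = pd.DataFrame({"water_demand": ["High", "Medium", "Low", "High"]})

# One binary column per category; dtype=int gives 0/1 instead of True/False.
encoded = pd.get_dummies(df, columns=["water_demand"], dtype=int)

print(encoded)
```

Each row has exactly one "1" across the three new columns, which is what lets the model treat the categories as unordered.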

Figure 3.6 one hot encoding 1

The main advantage of one-hot encoding is that it allows the model to treat categorical variables
as independent entities, preventing any unintended ordinal relationships between categories.
However, one must be cautious when applying this technique, especially when dealing with a
large number of categories, as it can lead to a significant increase in the dimensionality of the
dataset. In water usage optimization, one-hot encoding might be applied to variables like
"Region" or "Soil Type", ensuring that these categorical factors are appropriately represented in
the model. By accurately encoding categorical variables, one-hot encoding enhances the model’s
ability to learn from the data and make accurate predictions.

Figure 3.7 one hot encoding 2

Data correlation
Data correlation analysis is a key step in understanding the relationships between different
variables within a dataset. Correlation measures how strongly two variables are related to each
other, with values ranging from -1 (perfect negative correlation) to +1 (perfect positive
correlation). A correlation close to 0 indicates no linear relationship between the variables. In the
context of water usage optimization, correlation analysis might reveal, for instance, how strongly
water consumption is related to factors like temperature, humidity, or the day of the week.
Identifying these relationships is crucial for feature selection, as it helps in choosing the most
predictive variables for the model.

Figure 3.8 data correlation

One of the most commonly used methods for correlation analysis is Pearson’s correlation
coefficient, which measures linear relationships between variables. However, depending on the
nature of the data, other methods such as Spearman’s rank correlation might be more
appropriate, especially when dealing with monotonic but non-linear relationships. By visualizing
the correlation matrix, which displays correlation coefficients between all pairs of variables,
data scientists can quickly identify multicollinearity issues, where two or more variables are
highly correlated with each other and therefore carry redundant information. Addressing these
issues might involve
removing or combining correlated features to improve model performance. Overall, correlation
analysis is essential for refining the dataset and ensuring that the most informative features are
used in the water usage optimization model.
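
The difference between the two coefficients, and a simple multicollinearity check, can be sketched as follows. The data is synthetic and the 0.95 threshold is an arbitrary illustrative choice, so this should be read as an example of the technique rather than the analysis performed in this project.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
temperature = rng.uniform(10, 50, size=200)

df = pd.DataFrame({
    "temperature": temperature,
    # Same information in different units: a deliberately redundant feature.
    "temperature_f": temperature * 9 / 5 + 32,
    # Monotonic but non-linear dependence on temperature, plus noise.
    "water_requirement": np.exp(temperature / 10) + rng.normal(0, 0.5, size=200),
    "noise": rng.normal(size=200),
})

pearson = df.corr(method="pearson")    # linear association
spearman = df.corr(method="spearman")  # rank-based (monotonic) association

# Flag feature pairs whose absolute Pearson correlation exceeds the threshold.
threshold = 0.95
columns = list(pearson.columns)
high_pairs = [
    (a, b)
    for i, a in enumerate(columns)
    for b in columns[i + 1:]
    if abs(pearson.loc[a, b]) > threshold
]

print(pearson.round(2))
print(spearman.round(2))
print(high_pairs)
```

The same `pearson` matrix is what a heatmap would display.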

Figure 3.9 data correlation heatmap

3.2 Random Forest Regressor machine learning model development
Data splitting
Data splitting is a critical step in the machine learning process where the dataset is divided into
separate parts for training and testing purposes. Typically, the data is split into a training set and
a testing set, with the training set used to build the model and the testing set used to evaluate its
performance. In some cases, a validation set is also used to fine-tune model parameters during
training. The most common split ratio is 80-20, where 80% of the data is used for training and
20% for testing, although this ratio can vary depending on the size of the dataset and the specific
requirements of the project.
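
An 80-20 split of this kind is commonly done with scikit-learn's `train_test_split`. The arrays below are placeholders, and `random_state=42` is an arbitrary seed chosen only to make the example repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features (50 rows, 2 columns) and target values.
X = np.arange(100).reshape(50, 2)
y = np.arange(50, dtype=float)

# test_size=0.2 keeps 20% of the rows for testing and 80% for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```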

Figure 3.10 data splitting

In the context of a water usage optimization project using the Random Forest Regressor, splitting
the data ensures that the model is trained on a representative sample while being tested on unseen
data to evaluate its generalization capabilities. By keeping the training and testing sets separate,
data splitting helps detect overfitting, where the model performs well on the training data but
fails to generalize to new, unseen data. This process is essential for creating a robust and reliable
model that can accurately predict water usage patterns in different scenarios.

To ensure that the split is done correctly, stratified sampling might be employed, especially when
dealing with imbalanced datasets where certain classes or ranges of data are underrepresented.
This technique ensures that the training and testing sets maintain the same distribution of classes
as the original dataset, leading to more accurate and fairer model evaluation.
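
For a regression target there are no classes to stratify on directly. One possible workaround, shown here as an assumption rather than as the method used in this project, is to bin the target into quantiles and stratify on the bins.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = pd.DataFrame({"feature": rng.normal(size=200)})
y = pd.Series(rng.uniform(0, 100, size=200))

# Bin the continuous target into four equally sized quantile groups.
y_bins = pd.qcut(y, q=4, labels=False)

# Stratify on the bins so each range of the target appears in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y_bins, random_state=42
)

# Number of test rows that fall into each target bin.
test_bin_counts = y_bins.loc[y_test.index].value_counts()
print(test_bin_counts)
```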

Model definition
Model definition is the phase where the machine learning model is configured, and its
architecture is defined based on the problem at hand. For the water usage optimization project,
the Random Forest Regressor was chosen as the model due to its ability to handle complex
datasets with multiple features and its robustness against overfitting. Random Forest is an
ensemble learning method that builds multiple decision trees during training and outputs the
average prediction of these trees for regression tasks. The key advantage of using Random Forest
is that it reduces variance by averaging the results of different trees, leading to more stable and
accurate predictions.

Figure 3.11 model definition

During the model definition phase, several hyperparameters of the Random Forest Regressor are
set, such as the number of trees in the forest (n_estimators), the maximum depth of each tree
(max_depth), and the minimum number of samples required to split a node (min_samples_split).
These parameters control the complexity of the model and influence its performance. For
instance, increasing the number of trees generally improves the model's accuracy but also
increases computational cost. Similarly, controlling the depth of the trees can help prevent
overfitting, where the model becomes too tailored to the training data and loses its ability to
generalize.
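
A minimal sketch of defining the model with these hyperparameters is shown below. The specific values and the synthetic data are illustrative assumptions, not the settings reported for this project. The last lines check that the forest's prediction is the average of its individual trees.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data standing in for the encoded crop dataset.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

model = RandomForestRegressor(
    n_estimators=100,     # number of trees in the forest
    max_depth=10,         # maximum depth of each tree
    min_samples_split=2,  # minimum samples required to split a node
    random_state=42,      # fixed seed for repeatable results
)
model.fit(X, y)

# The forest prediction equals the mean of the per-tree predictions.
forest_prediction = model.predict(X)
tree_average = np.mean([tree.predict(X) for tree in model.estimators_], axis=0)
print(np.allclose(forest_prediction, tree_average))
```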

In this project, the model definition also includes selecting the features that will be used as inputs
for the Random Forest Regressor. Feature selection is a crucial step that determines the
effectiveness of the model. By including only the most relevant features, such as crop type, soil
type, region, temperature, and weather condition, the model is better equipped to make accurate
predictions. The final model is then instantiated and made ready for training on the dataset.

Model evaluation
Model evaluation is the process of assessing how well the trained model performs on the testing
dataset. After training the Random Forest Regressor on the water usage data, the model’s
performance is evaluated using several metrics to determine its accuracy and reliability.
Common evaluation metrics for regression tasks include Mean Squared Error (MSE), Root Mean
Squared Error (RMSE), Mean Absolute Error (MAE), and R-squared (R²). These metrics provide
insights into the model’s prediction error, variance, and overall fit to the data.
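
These four metrics can be computed with scikit-learn as sketched below. The true and predicted values are invented for the example.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Invented actual and predicted water requirements.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 8.0, 9.5])

mse = mean_squared_error(y_true, y_pred)   # average squared error
rmse = np.sqrt(mse)                        # same units as the target
mae = mean_absolute_error(y_true, y_pred)  # average absolute error
r2 = r2_score(y_true, y_pred)              # share of variance explained

print(f"MSE={mse:.3f} RMSE={rmse:.3f} MAE={mae:.3f} R2={r2:.3f}")
```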

Figure 3.12 model evaluation

For the water usage optimization project, evaluating the model using these metrics helps
determine how closely the predicted water usage values align with the actual values. For
example, a low RMSE indicates that the model's predictions are close to the true values, while a
high R² value suggests that a significant proportion of the variance in the target variable is
explained by the model. During this evaluation phase, it’s also important to analyze the model's
performance across different segments of the data, such as different seasons or regions, to ensure
that it generalizes well across various conditions.

Cross-validation is another technique often used during model evaluation to ensure that the
model's performance is consistent and not dependent on a particular train-test split. In k-fold
cross-validation, the data is divided into k subsets, and the model is trained and evaluated k
times, each time using a different subset as the test set and the remaining k-1 subsets as the
training set. This process provides a more comprehensive assessment of the model’s
performance and helps in identifying any potential issues with overfitting or underfitting.
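
The k-fold procedure can be sketched with scikit-learn's `KFold` and `cross_val_score`. The choice of k = 5, the R² scoring, and the synthetic data are assumptions made for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data standing in for the encoded crop dataset.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

model = RandomForestRegressor(n_estimators=50, random_state=0)

# Five folds: each fold serves once as the test set, the other four as training.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="r2")

print(scores)
print(scores.mean(), scores.std())
```

A large spread between the fold scores would suggest that performance depends heavily on which rows end up in the test set.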

Model saving
Once the Random Forest Regressor has been trained and evaluated, the next step is to save the
model so it can be used later for making predictions without needing to retrain it. Model saving
is a crucial step in the machine learning pipeline as it allows for the preservation of the model's
state, including the learned parameters and architecture, enabling quick and easy deployment. In
Python, the model can be saved using libraries such as joblib or pickle, which serialize the model
object into a file that can be loaded back into memory when needed.
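As an illustration of the save and reload cycle, the sketch below fits a small Random Forest on synthetic data, writes it to disk with `joblib.dump`, and restores it with `joblib.load`. The training data is made up; the file name follows the one used in the appendix, and a temporary directory is used so the example leaves no files behind.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Tiny synthetic training set standing in for the real encoded features
rng = np.random.default_rng(0)
X = rng.random((50, 4))
y = X @ np.array([1.0, 2.0, 0.5, 3.0])

model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "water_requirement_model.pkl")
    joblib.dump(model, path)        # serialize the fitted model to disk
    restored = joblib.load(path)    # load it back without retraining

# The restored model should reproduce the original predictions exactly
same = np.array_equal(model.predict(X), restored.predict(X))
print("Predictions identical after reload:", same)
```

Files written by joblib or pickle should only be loaded from trusted sources, and ideally with the same scikit-learn version that produced them.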

Figure 3.13 model saving using joblib

In the water usage optimization project, saving the trained model is particularly important
because it allows the optimization system to be deployed in real-time applications, such as
predicting water usage for future periods or adjusting water distribution strategies based on
predicted demand. Saving the model also avoids the cost of retraining: the stored model can be
loaded and used directly in production environments.

Model testing
Model testing is the final step in the machine learning pipeline where the saved model is
deployed and tested on new, unseen data to evaluate its real-world performance. This step is
crucial as it ensures that the model, when applied outside the training environment, performs as
expected and provides accurate predictions. In the context of the water usage optimization
project, model testing involves using the Random Forest Regressor to predict water usage for a
new time period or under new conditions, and then comparing these predictions to the actual
observed data.

Figure 3.14 model testing

During model testing, it’s important to monitor the model's performance continuously and
evaluate it using the same metrics applied during the evaluation phase, such as RMSE and R². If
the model's performance on new data significantly deviates from its performance on the test set,
this could indicate issues such as overfitting or concept drift, where the underlying data
distribution changes over time. Addressing such issues might involve retraining the model with
more recent data or re-tuning the model's hyperparameters.
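One simple way to act on this comparison is to flag the model whenever its error on new data exceeds the test-set error by more than a chosen margin. The sketch below assumes a relative tolerance of 20%; the function name `needs_retraining`, the tolerance, and the RMSE values are all hypothetical.

```python
def needs_retraining(baseline_rmse, new_rmse, tolerance=0.20):
    """Return True when RMSE on new data exceeds the baseline by more than `tolerance` (relative)."""
    if baseline_rmse <= 0:
        raise ValueError("baseline_rmse must be positive")
    relative_increase = (new_rmse - baseline_rmse) / baseline_rmse
    return relative_increase > tolerance


baseline = 4.0  # hypothetical RMSE measured on the held-out test set

print(needs_retraining(baseline, 4.4))  # 10% worse: within tolerance
print(needs_retraining(baseline, 5.2))  # 30% worse: flag for retraining
```

The appropriate tolerance depends on how costly a poor prediction is in practice, so it should be chosen with the end users rather than fixed in advance.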

Model testing also provides an opportunity to assess how well the model integrates with other
system components, such as data pipelines and user interfaces. For instance, in a water
management system, the model's predictions might be used to trigger alerts, adjust water
distribution strategies, or provide recommendations to users. Ensuring that the model's output is
reliable and actionable in these contexts is key to the success of the water usage optimization
project.

3.3 User interface development


Python
Python is a versatile and widely used programming language, known for its simplicity and
readability, making it an ideal choice for developing a wide range of applications, including data
science, machine learning, and web development projects. Its extensive standard library and
active community contribute to a rich ecosystem of third-party libraries and frameworks,
allowing developers to efficiently build and deploy complex applications. For a water usage
optimization project, Python provides powerful tools for data processing, analysis, and machine
learning, making it easier to implement sophisticated algorithms like Random Forest Regressor
for predictive modeling.

Python's popularity in the data science community is largely due to its integration with libraries
such as Pandas for data manipulation, NumPy for numerical computations, and Scikit-learn for
machine learning tasks. These libraries offer robust functionalities for handling large datasets,
performing statistical analysis, and building predictive models, all of which are essential for
optimizing water usage. Additionally, Python's compatibility with various data formats and
databases enables seamless integration with other systems and data sources, making it a versatile
choice for end-to-end application development.

Figure 3.15 python code snippet 1

Another advantage of Python is its cross-platform compatibility, which allows developers to
write code that runs on different operating systems without modification. This flexibility is
crucial in deploying applications across various environments, ensuring that the water usage
optimization model can be easily integrated into existing infrastructure. Python's simplicity,
combined with its powerful libraries and cross-platform capabilities, makes it a go-to language
for building scalable and efficient solutions in the field of water management.

Figure 3.16 python code snippet 2

Streamlit
Streamlit is an open-source Python library that simplifies the process of creating interactive and
visually appealing web applications for data science and machine learning projects. With
Streamlit, developers can quickly transform data scripts into fully functional web apps without
needing extensive knowledge of front-end development. This ease of use makes Streamlit an
ideal tool for showcasing the results of a water usage optimization project, allowing stakeholders
to interact with the model's predictions and insights through a user-friendly interface.

One of the key features of Streamlit is its ability to automatically refresh the app whenever the
underlying code is updated, making it highly efficient for iterative development and testing. This
means that as you refine the Random Forest Regressor model or make adjustments to the data
processing pipeline, the changes are immediately reflected in the app, providing instant
feedback. Additionally, Streamlit's built-in widgets and display elements, such as sliders,
buttons, and charts, enable the creation of dynamic and interactive components, enhancing the user
experience and making complex data more accessible to non-technical users.

Figure 3.17 streamlit code snippet 1

Another advantage of using Streamlit is its seamless integration with Python libraries, allowing
for the easy embedding of plots, tables, and other data visualizations directly into the app. This
capability is particularly useful for the water usage optimization project, where visual
representations of data and model predictions can help in understanding trends, identifying
patterns, and making informed decisions. By leveraging Streamlit's features, developers can
create engaging and interactive applications that effectively communicate the value of their data
science projects.

Hosting
Hosting a Streamlit app on Streamlit Cloud offers numerous advantages, especially for
developers looking to deploy their applications quickly and efficiently. Streamlit Cloud is a
platform specifically designed for deploying Streamlit apps, providing a streamlined and user-
friendly environment for hosting data science and machine learning applications. One of the
main benefits of using Streamlit Cloud is its simplicity; developers can deploy their apps directly
from a GitHub repository with minimal configuration, making it easy to share their work with
others and receive feedback.

Streamlit Cloud also manages the underlying servers, so the app can be served to users without
manual infrastructure work. This is convenient for applications like the water usage optimization
project, where the number of users might fluctuate based on demand, although the platform's
resource limits should be checked if heavy traffic is expected. Additionally, the platform exposes
application logs, allowing developers to track app behaviour and identify potential issues as they
occur. Its quick deploy button makes hosting straightforward.

Figure 3.18 streamlit deploy button

Another significant advantage of hosting on Streamlit Cloud is the integration with secure
authentication options, enabling developers to control access to their apps. This is crucial for
projects that involve sensitive data or require restricted access, such as those dealing with
proprietary models or confidential information. By hosting the water usage optimization app on
Streamlit Cloud, developers can benefit from a secure, scalable, and hassle-free deployment
environment, allowing them to focus on refining their models and delivering actionable insights
to stakeholders.

CHAPTER FOUR

RESULTS AND DISCUSSIONS


This chapter presents the system requirements, screenshots of the application, and an explanation
of how the application works and what it outputs.

Minimum system requirements
Any system with access to the internet

Recommended requirements
Stable internet connection
Modern web browser

4.1 Results
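The trained model achieved a Mean Absolute Error (MAE) of 1.295, a Mean Squared Error (MSE) of
18.324 (equivalently, an RMSE of about 4.28), and an R-squared (R²) score of 0.144. The R² value
means that the model explains only about 14% of the variance in the water requirement, so its
predictions should be read as rough estimates. The following demonstrates the final outcome of
the system, its uses and functions: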

Interfaces
On entering the website, the user is greeted with a visually appealing and intuitive interface
designed to simplify the process of predicting water requirements for crops. The main page
prominently features a series of dropdown menus or input fields, each corresponding to a crucial
factor in determining the water needs:

Crop Type: A dropdown menu allows the user to select from a list of 15 different crop types:
BANANA, SOYABEAN, CABBAGE, POTATO, RICE, MELON, MAIZE, CITRUS, BEAN, WHEAT, MUSTARD, COTTON,
SUGARCANE, TOMATO, and ONION. This ensures that the user can quickly identify and select the
specific crop they are growing.

Soil Type: Another dropdown menu provides the soil condition options DRY and WET. The user
chooses the option that matches the current state of their field's soil, which is a critical
factor in water retention and usage.

Weather Condition: This dropdown lets the user select the current weather from the predefined
options NORMAL, SUNNY, WINDY, and RAINY. Weather plays a significant role in determining the
evapotranspiration rate, which directly affects water needs.

Region: A dropdown is provided for the user to choose their climatic region: DESERT, SEMI ARID,
or SEMI HUMID. Different regions have different climate patterns, and this input helps tailor the
prediction to local conditions.

Temperature: The user selects the current temperature from predefined brackets (10-20, 21-30,
30-40, and 40-50 degrees). Temperature influences the rate of water evaporation and plant
transpiration, making it a vital input for accurate predictions.

Figure 4.1 application’s homepage

Each input field is clearly labeled and designed for ease of use, ensuring that even users with
minimal technical knowledge can easily navigate the page. Below the input fields, a prominently
displayed Predict button invites the user to submit their selections.

Once the user clicks the Predict button, the selected values are sent to the backend model, which
processes the inputs using a pre-trained machine learning algorithm. The model considers all the
factors—crop type, soil type, weather condition, region, and temperature—to calculate the
estimated water requirement for the selected crop.
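Before the model can use the selections, each categorical choice has to be turned into the one-hot indicator columns the model was trained on. The sketch below shows one way to do this with pandas; it is an illustrative alternative to writing each indicator by hand. The function name `encode_inputs` and the four-column `column_names` list are hypothetical, whereas the app itself loads its real column list from `column_names.pkl`.

```python
import pandas as pd

# Hypothetical subset of the training columns, in training order
column_names = ["CROP TYPE_MAIZE", "CROP TYPE_RICE", "SOIL TYPE_DRY", "SOIL TYPE_WET"]


def encode_inputs(selections, column_names):
    """One-hot encode a dict of user selections into the model's column layout."""
    raw = pd.DataFrame([selections])
    # get_dummies names each indicator "<field>_<value>", e.g. "CROP TYPE_RICE"
    encoded = pd.get_dummies(raw, dtype=int)
    # Add the indicators for options that were not selected, filled with 0,
    # and put every column in the order the model expects
    return encoded.reindex(columns=column_names, fill_value=0)


row = encode_inputs({"CROP TYPE": "RICE", "SOIL TYPE": "WET"}, column_names)
print(row)
```

Reindexing against the saved column list keeps the input aligned with the training data even when a category is missing from a single request.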

The prediction result is then displayed on the same page, offering the user actionable insights on
how much water to apply to their crops. This immediate feedback helps farmers and gardeners
make informed decisions, optimizing water usage, promoting sustainable farming practices, and
ensuring healthy crop growth.

Figure 4.2 prediction output

4.2 Discussion
The system's design prioritizes accessibility and ease of use, requiring only basic internet access
and a modern web browser to function. This ensures that users can interact with the application
from a variety of devices, making it highly accessible to farmers and gardeners regardless of
their technical expertise. A stable internet connection is recommended for smooth operation,
allowing the application to process inputs and deliver predictions efficiently.

The application itself features an intuitive interface where users can easily select various factors
like crop type, soil type, weather conditions, region, and temperature. These inputs are critical
for accurately predicting the water requirements for different crops. Once the inputs are
submitted, the backend model processes the data using a pre-trained machine learning algorithm
to provide an estimate of the water requirement. The model's current performance metrics, in
particular an R² of about 0.14, indicate substantial room for improvement, so the predictions
should be treated as indicative. Even so, the application demonstrates a workflow that can help
users make informed decisions to optimize water usage and support sustainable farming practices.

CHAPTER FIVE

SUMMARY, CONCLUSION, AND RECOMMENDATION


Summary
The study explores the integration of machine learning (ML) and Internet of Things (IoT)
technologies to optimize water usage in agriculture, a sector that accounts for approximately 70%
of the world’s freshwater withdrawals. With the growing global population and the need for sustainable
farming practices, optimizing water use is crucial. The research focuses on developing a Random
Forest Regressor model for real-time monitoring and optimization of water usage. The model
utilizes data such as crop type, soil type, weather conditions, and temperature to predict water
requirements accurately. The system also includes a user-friendly interface for farmers to easily
interact with the model and make informed irrigation decisions.

Conclusion
The implementation of machine learning and IoT technologies in agriculture, as demonstrated by
the developed system, shows significant potential in enhancing water-use efficiency and
supporting sustainable farming practices. The model’s ability to predict water requirements
based on real-time data can lead to healthier crops, reduced water wastage, and lower energy
costs. However, the model's current performance metrics, such as the mean absolute error and R-
squared score, indicate that there is room for improvement in its accuracy and reliability.

Limitations
The study's limitations include the model's performance, which still requires refinement to
improve prediction accuracy. Additionally, the reliance on specific data types, such as the crop
and soil types, means that the model may not generalize well to different agricultural contexts or
regions without further adjustments. The complexity of integrating diverse data sources, the
potential for overfitting, and the need for continuous data updates also present challenges.
Moreover, the system's effectiveness depends on the quality and availability of real-time data,
which may not be accessible in all farming environments.

Recommendation
To enhance the system's accuracy and applicability, it is recommended to expand the dataset to
include a wider variety of crops, soil types, and regions, allowing the model to generalize better.
Further research should focus on improving the model's predictive capabilities by exploring
advanced machine learning techniques and using cross-validation to detect overfitting.
Additionally, integrating more sophisticated IoT sensors and ensuring the continuous updating of
data will improve real-time decision-making. Training programs for farmers on using the system
effectively will also facilitate broader adoption and maximize the benefits of this technology in
optimizing water use in agriculture.

51
References
Bilzikova, et al. (2020). A scoping review of the contributions of farmers' organizations to
smallholder agriculture.

Bhat, S., & Huang, X. (2021). Big Data and AI revolution in precision agriculture: Survey and
challenges.

52
Faouzi, et al. (2020). Wastewater reuse in agriculture sector: Resources management and
adaptation in the context of climate change: Case study of the Beni Mellal-Khenifra region
Morocco.

Foster, et al. (2020). Satellite-based monitoring of irrigation water use: Assessing measurement
errors and their implications for agricultural water management policy.

Huang, et al. (2020). Water-saving agriculture can deliver deep water cuts for China.

Javaid, et al. (2023). Understanding the potential applications of Artificial Intelligence in the
agriculture sector.

Jin, et al. (2020). Deep learning predictor for sustainable precision agriculture based on
Internet of Things system.

Liu, (2020). Artificial intelligence (AI) in agriculture.

Meshram, et al. (2021). Machine learning in agriculture domain: A state-of-art survey.

Qazi, et al. (2022). IoT-Equipped and AI-Enabled Next Generation Smart Agriculture: A Critical
Review Current Challenges and Future Trend.

Saad, et al. (2020). Water management in agriculture: A survey on current challenges and
technological solutions.

Sharma, et al. (2021). Machine learning applications for precision agriculture: A comprehensive
review.

Steinfeld, et al. (2020). The human dimension of water availability: Influence of management
rules on water supply for irrigated agriculture and the environment.

Talaviya, et al. (2020). Implementation of artificial intelligence in agriculture for optimization of


irrigation and application of pesticides and herbicides.

Veeragandham, & Santhi, (2020). A review on the role of machine learning in agriculture.

Wang, et al. (2022). A review of deep learning in multiscale agricultural sensing.

Zhang, et al. (2020). Applications of deep learning for dense scenes analysis in agriculture: A
review.

53
Zhai, et al. (2020). Decision support systems for agriculture 4.0: Survey and challenges.

Benos, L., Tagarakis, A. C., Dolijanovic, Z., Bochtis, D., & Ampatzidis, Y. (2021). Machine
learning in agriculture: A comprehensive updated review. Agronomy, 11(9), 1784.

Bhat, S., & Huang, X. (2021). Big data and AI revolution in precision agriculture: Survey and
challenges. Computers and Electronics in Agriculture, 187, 106240.

Faouzi, S., Bargaoui, Z., & Hambli, A. (2020). Wastewater reuse in agriculture sector: Resources
management and adaptation in the context of climate change: Case study of the Beni Mellal-
Khenifra region, Morocco. Journal of Environmental Management, 273, 111070.

Javaid, M. M., Waseem, M., Qamar, U., & Ahmad, M. (2023). Understanding the potential
applications of Artificial Intelligence in the agriculture sector. Computers and Electronics in
Agriculture, 207, 107693.

Jin, X., Sun, X., & Geng, X. (2020). Deep learning predictor for sustainable precision agriculture
based on Internet of Things system. Sustainable Cities and Society, 60, 102216.

Liu, Y. (2020). Artificial intelligence (AI) in agriculture. Agriculture, 10(12), 599.

Meshram, P., Sharma, P., & Shukla, P. (2021). Machine learning in agriculture domain: A state-
of-art survey. Materials Today: Proceedings, 47, 1200-1206.

Sharma, A., Khatri, P., & Kumar, S. (2021). Machine learning applications for precision
agriculture: A comprehensive review. Journal of Artificial Intelligence and Soft Computing
Research, 11(2), 99-111.

Steinfeld, H., Gerber, P., & Wassenaar, T. (2020). The human dimension of water availability:
Influence of management rules on water supply for irrigated agriculture and the environment.
Water, 12(8), 2167.

Veeragandham, M., & Santhi, S. (2020). A review on the role of machine learning in agriculture.
Journal of Agricultural Engineering, 57(1), 1-14.

Wang, H., Huang, Y., Chen, H., & Zhang, J. (2022). A review of deep learning in multiscale
agricultural sensing. Remote Sensing, 14(1), 27.

54
Zhang, Q., Liu, J., & Yang, B. (2020). Applications of deep learning for dense scenes analysis in
agriculture: A review. Computers and Electronics in Agriculture, 176, 105672.

Appendix
import streamlit as st
import pandas as pd
import joblib

# Load the trained model and the list of training column names
model = joblib.load('water_requirement_model.pkl')
column_names = joblib.load('column_names.pkl')

# Streamlit app title
st.title('Water Requirement Predictor')

# CSS to add a background image
st.markdown(
    """
    <style>
    .stApp {
        background: linear-gradient(rgba(0, 0, 0, 0.2), rgba(0, 0, 0, 0.5)),
            url("https://plus.unsplash.com/premium_photo-1661825536186-19606cd9a0f1?w=400&auto=format&fit=crop&q=60&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxzZWFyY2h8NXx8d2F0ZXIlMjB1c2UlMjBpbiUyMGFncmljdWx0dXJlfGVufDB8fDB8fHww");
        background-size: cover;
        background-position: center;
    }
    </style>
    """,
    unsafe_allow_html=True
)

# Input fields on the main page instead of the sidebar
st.header('Input Parameters')

# Define input fields with all crop types
crop_type = st.selectbox('CROP TYPE', options=[
    'BANANA', 'SOYABEAN', 'CABBAGE', 'POTATO', 'RICE', 'MELON',
    'MAIZE', 'CITRUS', 'BEAN', 'WHEAT', 'MUSTARD', 'COTTON',
    'SUGARCANE', 'TOMATO', 'ONION'
])
soil_type = st.selectbox('SOIL TYPE', options=['DRY', 'WET'])
region = st.selectbox('REGION', options=['DESERT', 'SEMI ARID', 'SEMI HUMID'])
temperature = st.selectbox('TEMPERATURE in degrees', options=['10-20', '21-30', '30-40', '40-50'])
weather_condition = st.selectbox('WEATHER CONDITION', options=['NORMAL', 'SUNNY', 'WINDY', 'RAINY'])

# One-hot encode the selections; each key must match a training column name
input_values = {
    'CROP TYPE_BANANA': 1 if crop_type == 'BANANA' else 0,
    'CROP TYPE_SOYABEAN': 1 if crop_type == 'SOYABEAN' else 0,
    'CROP TYPE_CABBAGE': 1 if crop_type == 'CABBAGE' else 0,
    'CROP TYPE_POTATO': 1 if crop_type == 'POTATO' else 0,
    'CROP TYPE_RICE': 1 if crop_type == 'RICE' else 0,
    'CROP TYPE_MELON': 1 if crop_type == 'MELON' else 0,
    'CROP TYPE_MAIZE': 1 if crop_type == 'MAIZE' else 0,
    'CROP TYPE_CITRUS': 1 if crop_type == 'CITRUS' else 0,
    'CROP TYPE_BEAN': 1 if crop_type == 'BEAN' else 0,
    'CROP TYPE_WHEAT': 1 if crop_type == 'WHEAT' else 0,
    'CROP TYPE_MUSTARD': 1 if crop_type == 'MUSTARD' else 0,
    'CROP TYPE_COTTON': 1 if crop_type == 'COTTON' else 0,
    'CROP TYPE_SUGARCANE': 1 if crop_type == 'SUGARCANE' else 0,
    'CROP TYPE_TOMATO': 1 if crop_type == 'TOMATO' else 0,
    'CROP TYPE_ONION': 1 if crop_type == 'ONION' else 0,
    'SOIL TYPE_DRY': 1 if soil_type == 'DRY' else 0,
    'SOIL TYPE_WET': 1 if soil_type == 'WET' else 0,
    'REGION_DESERT': 1 if region == 'DESERT' else 0,
    'REGION_SEMI ARID': 1 if region == 'SEMI ARID' else 0,
    'REGION_SEMI HUMID': 1 if region == 'SEMI HUMID' else 0,
    'TEMPERATURE_20': 1 if temperature == '10-20' else 0,
    'TEMPERATURE_21-30': 1 if temperature == '21-30' else 0,
    'TEMPERATURE_30-40': 1 if temperature == '30-40' else 0,
    'TEMPERATURE_40-50': 1 if temperature == '40-50' else 0,
    'WEATHER CONDITION_NORMAL': 1 if weather_condition == 'NORMAL' else 0,
    'WEATHER CONDITION_SUNNY': 1 if weather_condition == 'SUNNY' else 0,
    'WEATHER CONDITION_WINDY': 1 if weather_condition == 'WINDY' else 0,
    'WEATHER CONDITION_RAINY': 1 if weather_condition == 'RAINY' else 0
}

# Build a one-row frame in training column order; columns without a key become 0
input_df = pd.DataFrame([input_values], columns=column_names).fillna(0)

# Prediction
if st.button('Predict'):
    prediction = model.predict(input_df)[0]

    # Create an expander for the result
    with st.expander("Prediction Result", expanded=True):
        st.markdown(
            f"<h2 style='text-align: center; color: black; font-size: 30px;'>"
            f"Predicted Water Requirement: {prediction:.2f} litres</h2>",
            unsafe_allow_html=True
        )
