Milk Quality Prediction Using Machine Learning
1,2, 4
Department of Computer Engineering, School of Technology, Pandit Deendayal Energy University, Gandhinagar, India
3
Information Technology Department, Vardhaman College of Engineering, Hyderabad, India
Abstract
Milk is the main dietary supply for every individual. High-quality milk shouldn't contain any adulterants. Dairy products are
sold everywhere in society. Yet, the local milk vendors use a wide range of adulterants in their products, permanently altering
the evaporated. Using milk that has gone bad can have serious health consequences. On October 18 of this year, the Food
Safety and Standards Authority of India (FSSAI), the nation's top food safety authority, released the final result of the
National Milk Safety and Quality Survey (NMSQS) and declared the milk readily available in India to be "mostly safe."
According to an FSSAI survey, 68.4% of the milk in India is tainted. The quality of milk cannot be checked by any equipment
or special system. Milk that has not been pasteurized has not been treated to get rid of harmful bacteria. Infected raw milk
may contain Salmonella, Campylobacter, Cryptosporidium, E. coli, Listeria, Brucella, and other dangerous pathogens. These
microorganisms pose a major risk to your family's health. Determining milk quality by manually analyzing its
various constituents can be very challenging, and machine learning can assist with this analysis. Here, a
machine learning-based milk quality prediction system is developed. The proposed technology has shown 99.99%
classification accuracy.
Keywords: Machine Learning, Milk Quality Prediction, Random Forest, Support Vector Machine, Label Encoding
Copyright © 2023 D. Bhavsar et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA
4.0, which permits copying, redistributing, remixing, transformation, and building upon the material in any medium so long as the
original work is properly cited.
doi: 10.4108/eetiot.4501
*
Corresponding author. Email: [email protected]
broadband infrared light source to simultaneously capture multi-wavelength feature data [4]. Therefore, to
assure the milk's quality, it must be examined for the presence of required ingredients and any potential
adulterants [5]. In this instance, sensors are utilized to measure several parameters, including pH,
turbidity, and color. In order to combat illegal products like low-quality milk, the milk industry should
be able to submit ongoing data on milk quality to the administration during the production of milk
packages. Hence, there is a pressing need to assess the quality of milk quickly and with high accuracy.
Machine learning (ML) can be applied as a useful tool for this task. Machine learning is a subset of
artificial intelligence that mainly deals with enabling a computer to recognize complex patterns using
historical data. Here, two machine learning models are used to examine the grade of milk. For the training
and validation of the models, the milk quality dataset available in the Kaggle repository is used.

2. Literature Review

Sheng et al. [6] proposed a multiwavelength gradient-boosted regression tree (GBRT) for the analysis of
protein and fat in milk. Their multiwavelength spectral sensor system measures the milk's wavelength
intensity using a multichannel spectral sensor. The coefficient of determination, mean square error, mean
absolute error, and explained variance regression score were used to evaluate the effectiveness of the
GBRT regression model. Brudzewski et al. [7] proposed an SVM-based model for classification. The system
for recognizing and categorizing objects was implemented using SVM neural networks with linear and radial
kernels. They used a system based on oxide-based gas sensors for the purpose of classifying milk. Wasudeo
Moharkar et al. [8] proposed laser-induced instrumentation for the detection and quantification of milk
adulteration. Using laser-induced spectrometry, a few common milk adulterants can be detected. A Raspberry
Pi is used for embedded data collection. Kumar et al. [9] employed support vector machines (SVM) and
residual neural networks (ResNets) in an ensemble machine learning technique. SVM, a supervised learning
technique, is employed for classification and regression tasks, whereas ResNets, a class of deep neural
network architecture, are frequently employed for image recognition applications. Shobana et al. [10]
aimed to develop a fruit intake recommendation system for blind people. The researchers employed deep
learning methods for feature extraction, such as Visual Geometry Group 16 and a Convolutional Neural
Network. Additionally, they employed machine learning methods like Logistic Regression, Light Gradient
Boosting, and Random Forest (RF) for prediction. Ruifang et al. [11] used Extreme Learning Machines and
kernel-based Extreme Learning Machines. They compared these models' performance to that of the widely used
BP neural network and SVM network.

Ghosh et al. (2023) embarked on a comprehensive study to assess water quality through predictive machine
learning. Their research underscored the potential of machine learning models in effectively assessing and
classifying water quality. The dataset used for this purpose included parameters like pH, dissolved
oxygen, BOD, and TDS. Among the various models they employed, the Random Forest model emerged as the most
accurate, achieving an accuracy of 78.96%. In contrast, the SVM model lagged behind, registering the
lowest accuracy of 68.29% [18].

Alenezi et al. (2021) developed a novel Convolutional Neural Network (CNN) integrated with a block-greedy
algorithm to enhance underwater image dehazing. The method addresses color channel attenuation and
optimizes local and global pixel values. By employing a unique Markov random field, the approach refines
image edges. Performance evaluations, using metrics like UCIQE and UIQM, demonstrated the superiority of
this method over existing techniques, resulting in sharper, clearer, and more colorful underwater
images [19].

Sharma et al. (2020) presented a comprehensive study on the impact of COVID-19 on global financial
indicators, emphasizing its swift and significant disruption. The research highlighted the massive
economic downturn, with global markets losing over US $6 trillion in a week in February 2020. Their
multivariate analysis provided insights into the influence of containment policies on various financial
metrics. The study underscores the profound effects of the pandemic on economic activities and the
potential of using advanced algorithms for detection and analysis [20].

3. Proposed Work & Methodology

The different phases of the proposed system are shown in the following block diagram.

Fig. 1. System Architecture

3.1 Data Collection

The dataset for the proposed system was collected from the Kaggle repository [12]. It consists of 7
independent features, as shown in Table 1. These parameters are used to predict the grade of the milk.
The Grade (target) of the milk is categorical data,
where Low (Bad), Medium (Moderate), and High are the three classes. The dataset contains 1059 rows and 8
columns. Of these columns, 7 are numeric and 1 (Grade) is categorical.

Table 1. Categorical and Numerical data

Categorical    Numerical
Grade          pH
               Odor
               Temperature
               Taste
               Fat
               Color
               Turbidity

3.2 Data Pre-Processing

In the first pre-processing stage, the data were checked for missing values; none of the features has a
missing value. In the next step, label encoding is performed. Categorical attribute values cannot be
interpreted directly by a computer, so they are transformed into integer category codes using label
encoding. Label encoding is used for the 'Grade' feature in this dataset. Finally, the feature values are
scaled using Min-Max scaling, in which the maximum and minimum values of each feature are used for
scaling, as shown in equation 1.

\[ x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{1} \]

where x is the original feature value and x_min and x_max are the minimum and maximum values of that
feature. Feature-wise normalization such as Min-Max [13] is typically applied before model fitting to
address the potential issue of features with differing ranges [14].

3.3 Models

The dataset is split into 80% training data and 20% testing data, and the models are fitted on the
training data. RF and SVM are the two models trained and evaluated.

3.3.1 Random Forest

Random Forest is an ensemble learning method that creates many decision trees during training and, on
unseen data, outputs the class chosen by most trees (for classification tasks) or the mean prediction of
the individual trees (for regression tasks). The "forest" it builds is a collection of decision trees,
typically trained using a "bagging" approach. Random forests can manage data sets with many features and
determine the importance of each feature in predicting milk quality. In a real-world scenario, some milk
samples may be missing certain measurements. Random forests can handle missing values and still produce
accurate predictions. One of the challenges with decision trees is that they tend to overfit. However,
random forests can achieve better generalization by using multiple trees and averaging their results [11].

Random forests, a supervised machine learning method, are frequently employed in regression and
classification problems and, most of the time, offer excellent results even without hyperparameter
tuning. When each decision tree is constructed, the Gini and entropy split criteria can be considered.
"Gini" stands for the Gini impurity, while "entropy" stands for information gain. The Gini index is
calculated as shown in equation 2.

\[ \mathrm{Gini} = 1 - \sum_{i=1}^{c} (p_i)^2 \tag{2} \]

where p_i is the proportion of the data belonging to class i and c is the number of classes.
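As a rough illustration of the label encoding, Min-Max scaling, and Gini impurity steps described in Sections 3.2 and 3.3.1, the sketch below implements each one in plain Python. It is not the authors' implementation: the function names (`label_encode`, `min_max_scale`, `gini_impurity`) are hypothetical, the sample values are made up for the example rather than taken from the Kaggle dataset, and Min-Max scaling is assumed to map each feature to the range [0, 1].

```python
from collections import Counter


def label_encode(values):
    """Map each distinct category to an integer code (alphabetical order)."""
    classes = sorted(set(values))
    mapping = {c: i for i, c in enumerate(classes)}
    return [mapping[v] for v in values], mapping


def min_max_scale(column):
    """Rescale a numeric column to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no spread; map every value to 0.0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def gini_impurity(labels):
    """Gini = 1 - sum_i p_i^2, where p_i is the share of class i."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((k / n) ** 2 for k in counts.values())


# Made-up sample values, for illustration only.
grades = ["high", "low", "medium", "low"]
ph = [6.6, 3.0, 9.5, 6.8]

codes, mapping = label_encode(grades)   # 'high' -> 0, 'low' -> 1, 'medium' -> 2
scaled_ph = min_max_scale(ph)           # smallest value -> 0.0, largest -> 1.0
impurity = gini_impurity(codes)         # 1 - (0.25^2 + 0.5^2 + 0.25^2) = 0.625

print(codes, scaled_ph, impurity)
```

A node containing samples of a single grade has a Gini impurity of 0, which is why tree splits that lower this value are preferred.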
4. Performance Analysis

In this proposed work, RF and SVM are used for milk quality prediction. The prediction effectiveness of
the classifiers is determined using different performance metrics, shown as the Performance Score in
Table 2.

Table 2. Performance Score

       Accuracy   Precision   Recall   F1 Score
RF     0.92       0.85        1.00     0.92
SVM    0.57       0.51        0.92     0.66

The accuracy of a classifier is the number of correct predictions divided by the total number of
predictions [15]. Precision indicates the proportion of predicted positives that are true positives [16].
Recall is the proportion of actual positives that the classifier correctly predicts [17]. The F1 score
measures the efficiency of the model using both precision and recall. The formulas for these performance
parameters are shown in equations 3, 4, 5, and 6.

\[ \mathrm{Accuracy} = \frac{J + K}{J + K + L + M} \tag{3} \]

\[ \mathrm{Recall} = \frac{J}{J + M} \tag{4} \]

\[ \mathrm{Precision} = \frac{J}{J + L} \tag{5} \]

\[ \mathrm{F1\ Score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{6} \]

Where
J = True positive
K = True negative
L = False positive
M = False negative

RF achieves higher accuracy than SVM. The RF algorithm is ensemble-based: it combines many classifiers
(decision trees), whereas SVM is a single classifier, and this leads to better decision making. RF splits
are based on information gain, which indicates how pure the resulting classification is. This helps RF to
boost accuracy, precision, recall, and F1 score.

5. Conclusion

The technique for identifying milk in the paper is based on the application of SVM and RF. The Grade has
been measured using a semiconductor gas sensor array set inside a measuring test chamber. The outcomes of
numerical studies identifying different milk production methods and fat contents have demonstrated the
excellent efficacy of the suggested approach. Even within the family of milk products made by a specific
dairy, it can identify the milk's fat content. The proposed method, which is based on the RF application,
has excellent generalization properties for relatively small training data sets. Using the machine
learning-based model, milk quality prediction can be done with higher accuracy than with a
machinery-based system. For a better assessment of milk quality in a real-time environment, the proposed
system can be integrated with any device that can acquire the values of the different milk quality
parameters. Its advantages include better accuracy with a relatively small calibration data set, lower
calibration costs, and ease of adaptation to various working environments. It might be used in the food
business for checking the milk's production parameters.

References
1. Anderson, Melisa, et al. "The microbial content of unexpired pasteurized milk from selected
   supermarkets in a developing country." Asian Pacific Journal of Tropical Biomedicine, Volume 1,
   Issue 3, 2011, Pages 205-211, ISSN 2221-1691, doi:10.1016/S2221-1691(11)60028-2.
2. Dhanashekar R, Akkinepalli S, Nellutla A. "Milk-borne infections. An analysis of their potential
   effect on the milk industry". Germs. 2012 Sep 1;2(3):101-9. doi: 10.11599/germs.2012.1020.
   PMID: 24432270; PMCID: PMC3882853.
3. Wenchuan Guo, Xinhua Zhu, Hui Liu, Rong Yue, Shaojin Wang, "Effects of milk concentration and
   freshness on microwave dielectric properties", Journal of Food Engineering, Volume 99, Issue 3, 2010,
   Pages 344-350, ISSN 0260-8774, doi:10.1016/j.jfoodeng.2010.03.015.
4. J. N. V. R. Swarup Kumar, D. N. V. S. L. S. Indira, K. Srinivas and M. N. Satish Kumar, "Quality
   Assessment and Grading of Milk using Sensors and Neural Networks," 2022 International Conference on
   Electronics and Renewable Systems (ICEARS), Tuticorin, India, 2022, pp. 1772-1776,
   doi: 10.1109/ICEARS53579.2022.9752269.
5. L. W. Moharkar and S. Patnaik, "Detection and Quantification of Milk Adulteration by Laser Induced
   Instumentation," 2019 IEEE 5th International Conference for Convergence in Technology (I2CT), Bombay,
   India, 2019, pp. 1-5, doi: 10.1109/I2CT45611.2019.9033883.
6. T. Sheng, S. Shi, Y. Zhu, D. Chen and S. Liu, "Analysis of Protein and Fat in Milk Using
   Multiwavelength Gradient-Boosted Regression Tree," in IEEE Transactions on Instrumentation and
   Measurement, vol. 71, pp. 1-10, 2022, Art no. 2507810, doi: 10.1109/TIM.2022.3165298.
7. K. Brudzewski, S. Osowski, T. Markiewicz, "Classification of milk by means of an electronic nose and
   SVM neural network". Received 30 June 2003, Revised 13 October 2003, Accepted 21 October 2003,
   Available online 30 December 2003.
8. L. W. Moharkar and S. Patnaik, "Detection and Quantification of Milk Adulteration by Laser Induced
   Instumentation," 2019 IEEE 5th International Conference for Convergence in Technology (I2CT), Bombay,
   India, 2019, pp. 1-5, doi: 10.1109/I2CT45611.2019.9033883.
9. A. K. S, H. M. L, S. V. G. V., U. M.S, L. Kannagi and P. S. Bharathi, "A Novel and Effective Ensemble
   Machine Learning Model for Identifying Healthy and Rotten Fruits," 2023 International Conference on
   Artificial Intelligence and Knowledge Discovery in Concurrent Engineering (ICECONF), Chennai, India,
   2023, pp. 1-7, doi: 10.1109/ICECONF57129.2023.10083721.
19. Alenezi, F.; Armghan, A.; Mohanty, S.N.; Jhaveri, R.H.; Tiwari, P. Block-Greedy and CNN Based
    Underwater Image Dehazing for Novel Depth Estimation and Optimal Ambient Light. Water 2021, 13, 3470.
    https://doi.org/10.3390/w13233470
20. G. P. Rout and S. N. Mohanty, "A Hybrid Approach for Network Intrusion Detection," 2015 Fifth
    International Conference on Communication Systems and Network Technologies, Gwalior, India, 2015,
    pp. 614-617, doi: 10.1109/CSNT.2015.76.