Internship Report
BACHELOR OF TECHNOLOGY
In
by
GULIPILLI SRAVANI
21ME1A5419
Assistant Professor
Approved by AICTE, Permanently Affiliated to JNTUK, Recognized by UGC 2(f) & 12(B)
CERTIFICATE
External Examiner
DECLARATION
G. Sravani 21ME1A5419
ACKNOWLEDGEMENT
I would like to take the opportunity to express my deep gratitude to all the people who have extended
their cooperation in various ways during my internship. It is my pleasure and responsibility to
acknowledge the help of all those individuals.
I am very grateful to Mr. K. Naresh Babu, Assistant Professor of the Department of Artificial Intelligence & Data Science, for his guidance and encouragement in all respects throughout my internship.
I am very grateful to Dr. K. Venkatesh, Professor and Head of the Department of Artificial Intelligence and Data Science, for his guidance and encouragement in all respects throughout my internship.
I would like to express my sincere gratitude to Dr. M. Muralidhara Rao, Principal, Ramachandra College of Engineering, Eluru, for his valuable suggestions during the preparation of this document.
I sincerely thank all the faculty members and staff of the Department of AI & DS for their valuable advice, suggestions, and constant encouragement, which played a vital role in carrying out my internship.
Finally, I thank one and all who directly or indirectly helped me to complete my internship
successfully.
G. Sravani
21ME1A5419
ABSTRACT
This study employs a predictive modelling technique to assess the quality of red wine based on
physicochemical properties. A comprehensive dataset is gathered, containing various attributes such
as acidity levels, residual sugar, pH, and alcohol content, which serves as the foundation for training
our predictive model. The data is then pre-processed by handling missing values, normalizing
features, and performing feature engineering to extract more meaningful information. Feature
selection techniques are also applied to identify the most relevant attributes for predicting wine
quality. Once the data is prepared, we train a machine learning algorithm, such as Logistic Regression, on the dataset to predict wine quality.
The evaluation of the model involves metrics such as accuracy, precision, recall, and F1-score,
assessing its performance across different quality categories. We also use a confusion matrix to
visualize the model's performance by displaying true positives, true negatives, false positives, and
false negatives. Ultimately, the model can be deployed in the wine industry to enable producers
to quickly and accurately assess the quality of their red wines based on objective chemical
characteristics. This predictive tool can inform decisions on production processes, blending strategies,
and quality control measures, leading to improved consistency and customer satisfaction.
Learning Objectives/ Internship Objectives
Internships are generally thought to be reserved for college students looking to gain
experience in a particular field. However, a wide array of people can benefit from training
internships in order to receive real-world experience and develop their skills.
An objective for this position should emphasize the skills you already possess in the area and
your interest in learning more.
Some internships are used to allow individuals to perform scientific research, while others are
specifically designed to allow people to gain first-hand work experience.
Utilizing internships is a great way to build your resume and develop skills that can be
emphasized in your resume for future jobs. When you are applying for a Training Internship,
make sure to highlight any special skills or talents that can make you stand apart from the rest
of the applicants so that you have an improved chance of landing the position.
WEEKLY OVERVIEW OF INTERNSHIP ACTIVITIES
Week II: Python Basics, Python Packages, Data Visualization
Week VI: Introduction to ANN, ANN Architecture, Working of ANN, Types of ANN
Introduction to Artificial Intelligence:
Artificial intelligence is the science of making machines that can think like humans. The goal
of AI is to be able to do things such as recognize patterns, make decisions, and judge like
humans.
Uses of AI:
1. Digital Assistants.
2. Music and media streaming services.
3. Security.
4. Social Media.
Types of AI:
There are a lot of ongoing AI discoveries and developments, most of which are divided into different
types. AI is broadly divided into two categories.
1. Based on capabilities:
a. Reactive Machines: AI capable of responding to external stimuli in real time; unable to build
memory or store information for future use.
b. Limited memory machines: AI that can store knowledge and use it to learn and train for
future tasks.
c. Theory of Mind: AI that can sense and respond to human emotions, plus perform the tasks
of limited memory machines.
d. Self-awareness: AI that can recognize others' emotions, plus has a sense of self and human-
level intelligence; the final stage of AI.
Introduction to Machine Learning:
Machine learning (ML) is a branch of artificial intelligence (AI) that enables computers to “self-learn”
from training data and improve over time, without being explicitly programmed. Machine learning
algorithms are able to detect patterns in data and learn from them, in order to make their own
predictions.
Uses of ML:
1. E-commerce product recommendations: ML detects items that match the customer's needs
based on the customer's previous actions and makes them appear in an interesting way.
2. Self-Driving Cars: Machine learning allows a car to collect data on its surroundings from
cameras and other sensors, interpret it, and decide what actions to take.
3. Email automation and spam filtering: ML identifies and classifies incoming spam emails based
on their resemblance to stored training examples of spam emails.
4. Chatbots: Chatbots are widely used in customer interaction, marketing on social network sites,
and instant messaging with clients.
Types of ML:
Based on the methods and way of learning, machine learning is divided into
mainly four types, which are:
1. Supervised machine learning: Supervised learning is a category of machine learning
that uses labelled datasets to train algorithms to predict outcomes and recognize
patterns.
2. Unsupervised machine learning: It is a type of learning where models are given
unlabelled data and allowed to discover patterns and insights without any explicit
guidance or instruction.
3. Semi-Supervised learning: Semi-supervised learning is a branch of machine learning
that combines supervised and unsupervised learning by using both labelled and
unlabelled data: labelled data grounds the predictions, while unlabelled data helps
the model learn the shape of the larger data distribution.
4. Reinforcement machine learning: Reinforcement learning (RL) is a machine
learning (ML) technique that trains software to make decisions to achieve optimal
results. It mimics the trial-and-error learning process that humans use to achieve
their goals.
Introduction to deep Learning:
Deep learning is a branch of machine learning which is based on artificial neural networks. It is
capable of learning complex patterns and relationships within data. In deep learning, we don’t need
to explicitly program everything. It has become increasingly popular in recent years due to the
advances in processing power and the availability of large datasets. It is based on artificial
neural networks (ANNs), also known as deep neural networks (DNNs). These neural networks are
inspired by the structure and function of the human brain’s biological neurons, and they are
designed to learn from large amounts of data.
Python Basics, Python Packages, Data Visualization, Data, Matplotlib & Seaborn,
Data Wrangling techniques.
Python Basics:
Data science is an interdisciplinary field that involves the use of statistical and computational methods
to extract insightful information and knowledge from data. Python, a popular and versatile
programming language, has become a favoured choice among data scientists for its ease of use,
extensive libraries, and flexibility. Python provides an efficient and streamlined approach to handling
complex data structures and extracting insights.
Python Packages:
Python is a computer programming language often used to build websites and software, automate
tasks, and analyze data. A Python library is a collection of related modules. It contains bundles of code
that can be used repeatedly in different programs, making Python programming simpler and more
convenient for the programmer.
1. Pandas: Pandas is a Python library used for working with data sets. It has functions for
analysing, cleaning, exploring, and manipulating data.
2. Matplotlib: Matplotlib is a cross-platform data visualization and graphical plotting library
(histograms, scatter plots, bar charts, etc.) for Python and its numerical extension NumPy.
As such, it offers a viable open-source alternative to MATLAB.
3. NumPy: NumPy is a very popular python library for large multi-dimensional array and
matrix processing, with the help of a large collection of high-level mathematical functions.
4. TensorFlow: TensorFlow is a very popular open-source library for high-performance
numerical computation, developed by the Google Brain team at Google.
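As a quick illustration, here is a minimal sketch (with made-up numbers) showing NumPy, Pandas, and Matplotlib working together; TensorFlow is omitted for brevity:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# NumPy: numerical arrays and vectorized math
values = np.array([7.4, 7.8, 11.2, 7.4, 7.9])

# Pandas: tabular data handling and quick summaries
df = pd.DataFrame({"fixed_acidity": values})
print(df.describe())

# Matplotlib: a quick histogram of the column
df["fixed_acidity"].plot(kind="hist", title="Fixed acidity")
plt.show()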
Data Visualization:
Data visualization is the practice of translating information into a visual context, such as a map or
graph, to make data easier for the human brain to understand and pull insights from. The main goal of
data visualization is to make it easier to identify patterns, trends
and outliers in large data sets.
Techniques:
It involves using various visual elements such as charts, graphs, maps, and infographics to
communicate insights effectively.
1. Bar Charts: A bar chart or bar graph is a chart that represents categorical data with
rectangular bars with heights proportional to the values that they represent.
2. Line Charts: A line chart is generally constructed by marking points that correspond to each
data value, ordered according to a criterion, and then joining these points by lines.
3. Pie Charts: It displays the pictorial representation of data that makes it possible to visualize
the relationships between the parts and the whole of a variable.
4. Scatter Plots: Scatter plots are the graphs that present the relationship between two variables
in a data-set.
5. Heatmaps: Represent data values in a matrix format using colors to indicate the magnitude
of the values.
Matplotlib & Seaborn:
[Figures: example Matplotlib and Seaborn plots appeared here]
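As an illustration, a minimal sketch of one chart from each library, using a small made-up dataset:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Made-up values, for illustration only
df = pd.DataFrame({"alcohol": [9.4, 9.8, 10.5, 11.2, 12.0],
                   "quality": [5, 5, 6, 6, 7]})

# Matplotlib: scatter plot of alcohol vs quality
plt.scatter(df["alcohol"], df["quality"])
plt.xlabel("alcohol")
plt.ylabel("quality")
plt.show()

# Seaborn: heatmap of the correlation matrix
sns.heatmap(df.corr(), annot=True)
plt.show()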
Data Wrangling Steps:
Each data project needs a different strategy to guarantee that the final dataset is reliable and
accessible. The necessary stages are frequently referred to as data wrangling steps or activities.
Step 1: Discovery of Data
Steps in Pre-processing: There are six steps of data pre-processing in machine learning; two of the most important are described below.
Handling Missing Data: Missing values are data points that are absent for a specific variable in a
dataset. They can be represented in various ways, such as blank cells, null values, or special symbols
like “NA” or “unknown.” These missing data points pose a significant challenge in data analysis and
can lead to inaccurate or biased results.
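A minimal sketch of the usual pandas options for handling missing values, on a made-up DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({"pH": [3.51, np.nan, 3.26],
                   "alcohol": [9.4, 9.8, np.nan]})

print(df.isnull().sum())           # count missing values per column
df_dropped = df.dropna()           # option 1: drop rows with missing values
df_filled = df.fillna(df.mean())   # option 2: impute with the column mean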
Handling Outliers Data: Outliers can be caused by measurement uncertainty or by experimental
error. Outliers in data can be observed using a number of techniques. To find outliers, we can simply
plot a box plot; outliers are the points that fall outside its minimum and maximum whiskers. Outlier
detection and handling are crucial aspects of building reliable and robust machine learning models.
Understanding the impact of outliers, choosing the appropriate technique for your specific data and
task, and leveraging domain knowledge and data visualization can ensure that your models perform
well on unseen data and provide accurate and trustworthy predictions.
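A minimal sketch of the box-plot (interquartile range) rule described above, with made-up values:

import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series([7.4, 7.8, 7.9, 8.1, 8.3, 15.9])   # 15.9 is a likely outlier

q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr    # box-plot whisker bounds
print(s[(s < lower) | (s > upper)])              # points flagged as outliers

s.plot(kind="box")                               # visual check
plt.show()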
Train & Test Set: Train and test datasets are the two key concepts of machine learning, where the
training dataset is used to fit the model, and the test dataset is used to evaluate it. With this split, we
can easily evaluate the performance of our model: if it performs well on the training data but does not
perform well on the test dataset, the model is likely overfitted. For splitting
the dataset, we can use the train_test_split function of scikit-learn.
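A minimal sketch of this function on toy data (an 80/20 split):

import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # 10 samples, 2 features (toy data)
y = np.array([0, 1] * 5)           # toy labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)   # (8, 2) (2, 2)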
Feature selection: Feature selection is one of the important concepts of machine learning, which
highly impacts the performance of the model. As machine learning works on the concept of
"Garbage In Garbage Out", so we always need to input the most appropriate and relevant dataset to the
model in order to get a better result.
It helps in avoiding the curse of dimensionality.
It helps in the simplification of the model so that it can be easily interpreted by the researchers.
It reduces the training time.
It reduces overfitting and hence enhances generalization.
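For illustration, a minimal sketch of one common feature selection technique, scikit-learn's SelectKBest, applied to the built-in wine dataset (this is just one possible approach, not necessarily the one used in the internship):

from sklearn.datasets import load_wine
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_wine(return_X_y=True)

# Keep the 5 features with the strongest ANOVA F-score against the label
selector = SelectKBest(score_func=f_classif, k=5)
X_selected = selector.fit_transform(X, y)
print(X_selected.shape)   # (178, 5)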
Introduction to Regression, Types of Regression, Logistic Regression
and its types.
1. Simple Linear Regression: It is a type of regression algorithm that models the relationship
between a dependent variable and a single independent variable. The relationship shown by a
Simple Linear Regression model is linear (a sloped straight line), hence it is called Simple
Linear Regression.
3. Logistic Regression: Logistic regression is used for binary classification; it uses the
sigmoid function, which takes the independent variables as input and produces a probability value
between 0 and 1. Logistic regression predicts the output of a categorical dependent variable;
therefore, the outcome must be a categorical or discrete value.
On the basis of the categories, Logistic Regression can be classified into three types:
1. Binomial: In binomial Logistic regression, there can be only two possible types of the
dependent variables, such as 0 or 1, Pass or Fail, etc.
2. Multinomial: In multinomial Logistic regression, there can be 3 or more possible unordered
types of the dependent variable, such as “cat”, “dogs”, or “sheep”
3. Ordinal: In ordinal Logistic regression, there can be 3 or more possible ordered types of
dependent variables, such as “low”, “Medium”, or “High”.
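A minimal sketch of logistic regression with scikit-learn on the built-in wine dataset; since that dataset has three classes, this is effectively the multinomial case:

from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=5000)   # extra iterations help convergence
model.fit(X_train, y_train)
print(model.score(X_test, y_test))          # mean accuracy on the test set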
Decision Trees, Random Forest & KNN algorithm.
Decision Trees: In machine learning, a decision tree is a predictive model that splits the
data into subsets based on feature values, with each split maximizing the homogeneity of the target
variable within the subsets. Decision trees are used for classification and regression tasks, offering
interpretability and versatility in handling various types of data.
The key terms used to describe a decision tree are:
Root Node
Decision Nodes
Leaf Nodes
Sub-Tree
Pruning
Parent and Child Node
The tree is built by repeating the splitting process recursively on each subset. Advantages of decision trees include:
1. Easy navigation
2. Step-by-step approach
3. Time-saving
Random forest: Random forest is an ensemble learning method that builds multiple decision trees on
different subsets of the dataset and combines their predictions to improve accuracy. It works as follows:
Step-1: Select random K data points from the training set.
Step-2: Build the decision trees associated with the selected data points (subsets).
Step-3: Choose the number N of decision trees that you want to build.
Step-4: Repeat Step-1 and Step-2 for each of the N trees.
Step-5: For new data points, find the predictions of each decision tree, and assign the new data points
to the category that wins the majority vote.
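A minimal sketch of these steps with scikit-learn's RandomForestClassifier on the built-in wine dataset:

from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# n_estimators is the number N of trees; each tree sees a bootstrap subset
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))   # prediction is a majority vote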
K-Nearest Neighbor (KNN) Algorithm: The KNN algorithm stores all the available data and
classifies a new data point based on similarity. This means that when new data appears, it can be
easily classified into a well-suited category by using the KNN algorithm. It is also called a lazy learner
algorithm because it does not learn from the training set immediately; instead, it stores the dataset and,
at the time of classification, performs an action on the dataset.
The KNN working can be explained on the basis of the below algorithm:
Step-1: Select the number K of neighbors.
Step-2: Calculate the Euclidean distance from the new data point to the stored data points.
Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
Step-4: Among these k neighbors, count the number of the data points in each category.
Step-5: Assign the new data point to the category for which the number of neighbors is
maximum.
Step-6: Our model is ready.
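A minimal sketch of these steps with scikit-learn's KNeighborsClassifier on the built-in wine dataset:

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

knn = KNeighborsClassifier(n_neighbors=5)   # K = 5 nearest neighbours
knn.fit(X_train, y_train)                   # "lazy": it just stores the data
print(knn.predict(X_test[:3]))              # majority vote among neighbours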
Bayes Theorem: Bayes' theorem is one of the most popular machine learning concepts; it
helps to calculate the probability of one event occurring, under uncertain knowledge, when another
event has already occurred.
Bayes' theorem can be derived using the product rule and the conditional probability of event X with known
event Y.
The theorem can be mathematically expressed as:
P(X|Y) = P(Y|X) * P(X) / P(Y)
where P(X|Y) is the posterior probability of X given Y, P(Y|X) is the likelihood of Y given X,
P(X) is the prior probability of X, and P(Y) is the marginal probability of Y.
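A quick worked example with made-up probabilities, just to illustrate the formula:

# Made-up values: P(X) = 0.01, P(Y|X) = 0.9, P(Y) = 0.05
p_x, p_y_given_x, p_y = 0.01, 0.9, 0.05
p_x_given_y = p_y_given_x * p_x / p_y   # Bayes' theorem
print(p_x_given_y)                      # approximately 0.18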
Classification: Classification is a supervised machine learning method where the model tries to
predict the correct label of a given input data.
The Classification algorithm is a Supervised Learning technique that is used to identify the category
of new observations on the basis of training data. In Classification, a program learns from the given
dataset or observations and then classifies new observations into a number of classes or groups,
such as Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc. Classes can be called
targets/labels or categories.
Unlike regression, the output variable of Classification is a category, not a value, such as "Green or
Blue", "fruit or animal", etc. Since the Classification algorithm is a Supervised learning technique,
hence it takes labeled input data, which means it contains input with the corresponding output.
Types of Classification: Classification algorithms can be mainly divided into
two categories:
1. Linear Models
a) Logistic Regression
b) Support Vector Machines
2. Non-linear Models
a) K-Nearest Neighbours
b) Kernel SVM
c) Decision Tree Classification
Ensemble Sampling & Techniques, Support Vector Machine, Hyperparameter Tuning.
Ensemble Sampling: Ensemble learning is an approach in which two or more models are
fitted to the same data, and the predictions of each model are combined. Ensemble learning aims to
achieve better performance with the ensemble of models than with any individual model.
There are several methods of ensembling techniques in machine learning, including:
1. Voting: Different models make predictions, and the final prediction is determined by a
majority vote or averaging their predictions.
2. Bagging (Bootstrap Aggregating): Multiple models (often of the same type) are trained on
different subsets of the training data, typically with replacement. The final prediction is an
average or voting of these models.
3. Boosting: Models are trained sequentially, where each subsequent model corrects the errors
of the previous one. Examples include AdaBoost, Gradient Boosting Machines (GBM), and
XGBoost.
4. Stacking: Predictions from multiple models are used as input features to a meta-model,
which then produces the final prediction. It combines the strengths of different models.
5. Random Forest: An ensemble method based on decision trees, where multiple decision trees
are trained on different subsets of the data, and the final prediction is determined by averaging
or voting. These methods help improve the overall performance and robustness of machine
learning models by leveraging the diversity of multiple models
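As an illustration of the voting technique from this list, a minimal sketch with scikit-learn's VotingClassifier on the built-in wine dataset:

from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Hard voting: the class chosen by the majority of the three models wins
ensemble = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=5000)),
    ("dt", DecisionTreeClassifier(random_state=42)),
    ("rf", RandomForestClassifier(random_state=42)),
], voting="hard")
ensemble.fit(X_train, y_train)
print(ensemble.score(X_test, y_test))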
Support Vector Machine: Support Vector Machine, abbreviated as SVM can be used for both
regression and classification tasks, but generally, they work best in classification problems. The main
objective of the SVM algorithm is to find the optimal hyperplane in an N-dimensional space that can
separate the data points of different classes in the feature space. The hyperplane is chosen so that the
margin between the closest points of different classes is as large as possible.
Types of Support Vector machine:
Support Vector Machines (SVM) can be divided into two main parts:
1. Linear SVM: Typically used for linear regression and classification problems. Linear SVM
is used for linearly separable data, which means that if a dataset can be classified into two
classes by using a single straight line, then such data is termed linearly separable data,
and the classifier used is called a Linear SVM classifier.
2. Non-Linear SVM: Non-Linear SVM is used for non-linearly separable data, which means
that if a dataset cannot be classified by using a straight line, then such data is termed
non-linear data, and the classifier used is called a Non-linear SVM classifier.
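A minimal sketch contrasting the two variants with scikit-learn's SVC on the built-in wine dataset (feature scaling is added because SVMs are sensitive to it):

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# kernel="linear" gives a Linear SVM; kernel="rbf" gives a Non-linear SVM
for kernel in ("linear", "rbf"):
    svm = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    svm.fit(X_train, y_train)
    print(kernel, svm.score(X_test, y_test))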
Hyperparameter Tuning: Hyperparameters directly control model structure, function, and
performance. Hyperparameter tuning allows data scientists to tweak model performance for optimal
results. This process is an essential part of machine learning, and choosing appropriate hyperparameter
values is crucial for success. The goal of hyperparameter tuning is to find the values that lead to the
best performance on a given task.
Commonly tuned hyperparameters:
1. Number of hidden layers: It’s a trade-off between keeping our neural network as simple as
possible (fast and generalized) and classifying our input data correctly.
2. Number of nodes/neurons per layer: More isn't always better when determining how many
neurons to use per layer. Increasing neuron count can help, up to a point.
3. Learning rate: Model parameters are adjusted iteratively and the learning rate controls the
size of the adjustment at each step.
4. Momentum: Momentum helps us avoid falling into local minima by resisting rapid changes
to parameter values.
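A minimal sketch of grid search, one common tuning method, using scikit-learn's GridSearchCV (random search works similarly via RandomizedSearchCV):

from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

# Try every combination of these candidate values with 5-fold cross-validation
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)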
Deploying ML Models on Flask:
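A minimal sketch of how a trained model might be served with Flask, assuming the model was previously saved to a hypothetical model.pkl file:

import pickle
from flask import Flask, jsonify, request

app = Flask(__name__)
model = pickle.load(open("model.pkl", "rb"))    # hypothetical saved model

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]   # e.g. a list of 13 numbers
    prediction = model.predict([features])
    return jsonify({"quality": int(prediction[0])})

if __name__ == "__main__":
    app.run(debug=True)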
Introduction to ANN, ANN Architecture, Activation function,
Working of ANN, Types of ANN
Introduction to ANN: The development of Artificial Neural Networks (ANNs) is closely associated
with researchers such as Geoffrey Hinton. An ANN is similar to the human brain's neural structure.
It consists of interconnected nodes (neurons)
organized into layers. Information flows through these nodes, and the network adjusts the connection
strengths (weights) during training to learn from data, enabling it to recognize patterns, make
predictions, and solve various tasks in machine learning and artificial intelligence.
Artificial Neural Networks Architecture: The term "Artificial Neural Network" is derived from
Biological neural networks that develop the structure of a human brain. Similar to the human brain
that has neurons interconnected to one another, artificial neural networks also have neurons that
are interconnected to one another in various layers of the networks. These neurons are known as
nodes.
ANN contains mainly three layers:
1. Input Layer: - Accepts all inputs that are given by the user.
2. Hidden Layer: - Performs all the calculations to find hidden features and patterns.
3. Output Layer: - Predicts the output based on the input and the weighted calculations at the hidden neurons.
ANN Activation Function: Each neuron first computes a weighted sum of its inputs,
Y = W1X1 + W2X2 + b, and this summed value is then passed through an activation function. The output
from this neuron is multiplied by the weight W3 and supplied as input to the output layer.
Types of ANN:
1) Feed Forward Neural Network: A feedforward neural network is a linear network, which consists
of a single layer of output nodes; the inputs are fed directly to the outputs via a series of weights. The
sum of the products of the weights and the inputs is calculated in each node.
2) Backward Propagation: Backward propagation is an algorithm designed to test for
errors by working back from the output nodes to the input nodes. If there are any errors, then by using this
algorithm we can reduce the error rate.
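A minimal sketch of a feedforward ANN with these three layers in Keras (the layer sizes are illustrative, assuming 13 input features and 3 output classes):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(13,)),                     # input layer: 13 features
    tf.keras.layers.Dense(16, activation="relu"),    # hidden layer
    tf.keras.layers.Dense(3, activation="softmax"),  # output layer: 3 classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()   # training with model.fit() would use backpropagation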
Introduction to CNN, CNN Architecture, Introduction to Transfer
Learning, Types of Transfer Learning, Use Cases of Machine
Learning.
Introduction to CNN:
Convolutional neural network (CNN or ConvNet) is a network architecture for deep learning that
learns directly from data. CNNs are particularly useful for finding patterns in images to recognize
objects, classes, and categories. A CNN is a powerful tool but can require millions of labelled data
points for training.
Applications of CNN:
1. Image classification: CNNs are the state-of-the-art models for image classification. They can
be used to classify images into different categories, such as cats and dogs.
2. Object detection: CNNs can detect and locate objects in images or videos, such as people, cars, and
buildings. They can also be used to localize objects in images, which means that they can
identify the location of an object in an image.
3. Image segmentation: CNNs can be used to segment images, which means that they can identify
and label different objects in an image. This is useful for applications such as medical imaging.
CNN Architecture:
The CNN's job is to compress the images into a format that is easier to process while preserving
elements that are important for obtaining a decent prediction. A convolutional neural network,
ConvNet in short, has three layers which are its building blocks:
1. Convolutional Layer
2. Pooling Layer
3. Fully Connected Layer
Types of layers:
Input Layers: This is the layer in which we give input to our model. In a CNN, the input will generally be
an image or a sequence of images. This layer holds the raw input of the image, for example with width 32,
height 32, and depth 3.
Convolutional Layers: This is the layer used to extract features from the input dataset.
It applies a set of learnable filters, known as kernels, to the input images. The filters/kernels are
smaller matrices, usually of 2×2, 3×3, or 5×5 shape. Each kernel slides over the input image data and computes the
dot product between the kernel weights and the corresponding input image patch. The output of this layer
is referred to as a feature map.
Pooling layer: This layer is periodically inserted in the ConvNet; its main function is to reduce the
size of the volume, which makes the computation fast, reduces memory use, and also prevents overfitting. Two
common types of pooling layers are max pooling and average pooling.
Fully Connected Layer: Fully Connected Layer is also known as Dense Layer. The fully connected
layer plays a critical role in the final stages of a CNN, where it is responsible for classifying images
based on the features extracted in the previous layers. The term fully connected means that each
neuron in one layer is connected to each neuron in the subsequent layer.
Activation Function: The activation function is typically applied to the output of each neuron in the
network. It takes in the weighted sum of the inputs and produces an output that is then passed on to the
next layer.
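A minimal sketch of these building blocks in Keras, assuming 32×32×3 input images and 10 output classes:

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),              # width 32, height 32, depth 3
    layers.Conv2D(16, (3, 3), activation="relu"),   # convolutional layer
    layers.MaxPooling2D((2, 2)),                    # pooling layer
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),         # fully connected layer
])
model.summary()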
Introduction to Transfer Learning:
Transfer learning (TL) is a technique in machine learning (ML) in which knowledge learned from a
task is re-used in order to boost performance on a related task. For example, for image classification,
knowledge gained while learning to recognize cars could be applied when trying to recognize trucks.
There are two ways to make use of knowledge from the pre-trained model. The first is to freeze some
layers of the pre-trained model and then train the remaining layers on our new dataset. The second way is to
create a new model but also remove some features from a layer in the pre-trained model.
Types of transfer learning:
Deep transfer learning is a technique that utilizes pre-trained deep neural networks as the starting point
for training on a new task. There are several different types of algorithms used in deep transfer
learning, including:
1. Fine-tuning: This involves taking a pre-trained network and training it further on a new task
by adjusting the weights of the final layers.
2. Multi-task Learning: Training a single model to perform multiple tasks simultaneously,
where the knowledge learned from one task can benefit the performance on other related tasks.
3. Feature extraction: This method uses the features learned by a pre-trained network as input
to a new classifier, which is trained from scratch.
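A minimal sketch of the freeze-and-retrain approach with a pre-trained VGG16 in Keras (the two-class head, e.g. cars vs trucks, is illustrative):

import tensorflow as tf

# Pre-trained VGG16 as a frozen feature extractor (ImageNet weights)
base = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                   input_shape=(224, 224, 3))
base.trainable = False   # freeze the pre-trained layers

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),   # new task-specific head
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")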
Use Cases of Machine Learning:
Image Recognition: Identifying objects, people, places, and activities in images and videos, used in
applications like facial recognition, medical imaging, and autonomous vehicles.
Recommendation Systems: Suggesting products, services, or content based on user preferences and
behavior, seen in e-commerce platforms, streaming services, and social media platforms.
Speech Recognition: Converting spoken language into text or commands, utilized in virtual assistants,
voice-controlled devices, and speech-to-text applications.
Healthcare: Diagnosing diseases, predicting patient outcomes, and personalizing treatment plans
based on medical data and patient history.
Executive Summary
Objective
This model aims to provide a reliable and objective assessment of wine quality, thereby aiding winemakers,
distributors, and consumers in making informed decisions. The specific goals of the project are as follows:
Gather a comprehensive dataset of red wine samples, including various chemical properties
and quality ratings.
Clean and pre-process the data to handle missing values, normalize features, and prepare it for
analysis.
Feature Selection:
Identify the most significant chemical properties that influence wine quality.
Use statistical and machine learning techniques to select the most relevant features for the
predictive model.
Develop and train various machine learning models (e.g., Linear Regression, Decision Trees,
Random Forest) to predict wine quality based on the selected features.
Evaluate the performance of these models using appropriate metrics such as accuracy,
precision, recall, and F1-score.
Validate the model on a separate test dataset to ensure its generalizability and robustness.
Deploy the model for use by winemakers, distributors, and consumers to assess the quality of
red wine samples efficiently and objectively.
INTERNSHIP PART
During the Red Wine Quality Detection internship at Blackbucks, we engaged in a variety of activities and
responsibilities aimed at gaining practical experience in data pre-processing, feature engineering, and
machine learning model development using Python.
Performed exploratory data analysis (EDA) to understand the distribution of features and their
relationship with wine quality.
Feature Engineering:
Experimented with various machine learning algorithms, including Decision Trees, Random Forest,
Support Vector Machines, and Neural Networks.
Model Evaluation:
Evaluated model performance using metrics such as accuracy, precision, recall, F1-score, and
confusion matrix.
Hyperparameter Tuning:
Optimized model performance by tuning hyperparameters using techniques like grid search and
random search.
Model Deployment:
Developed a user-friendly web application or API to deploy the model for real-world use cases
Equipment Used
Software:
Python: The primary programming language for data analysis, machine learning, and model
development.
Jupyter Notebook: An interactive environment for data exploration, visualization, and code
execution.
Google Colab: A cloud-based platform for running Python code, including machine
learning experiments.
Hardware:
Personal Computer or Laptop: A reliable system with sufficient processing power and
memory to run the required software.
Cloud Computing Resources: Google Colab or other cloud-based platforms for accessing
powerful computing resources.
Tasks Performed
Technical Skills:
o Proficient in using Python libraries like Pandas, NumPy, Matplotlib, and Seaborn for
data exploration, cleaning, and visualization.
Machine Learning:
Model Deployment:
Problem-Solving:
o Ability to identify and address challenges in data analysis, model development, and
deployment.
Critical Thinking:
Soft Skills:
Time Management: Efficiently managed time and prioritized tasks to meet deadlines.
Red Wine Quality Detection using Machine Learning
Libraries:
To implement the project, we need to import the necessary packages: the data visualization package
matplotlib.pyplot, the data exploration packages NumPy and pandas, the machine learning model class
LogisticRegression, and the train_test_split function to split the dataset into training and testing sets.
Dataset:
To load the dataset, we use the load_wine() function, which returns an object with attributes such as data
and target. The data attribute has 13 features, and the target attribute is a label with three classes: low,
medium, and high.
To represent the data attribute, we store data in the X variable.
To represent the target attribute, we store target in the Y variable.
After loading data and target into X and Y, we create a dataset from them by using a pandas
DataFrame.
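A minimal sketch of the loading steps just described:

import pandas as pd
from sklearn.datasets import load_wine

wine = load_wine()
X = wine.data     # the 13 features
Y = wine.target   # the class labels

# Build a DataFrame with named feature columns plus the label column
df = pd.DataFrame(X, columns=wine.feature_names)
df["target"] = Y
print(df.head())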
After performing the above activities, the dataset is assembled. [Figure: preview of the resulting DataFrame appeared here]
I use the Logistic Regression model to predict the quality of wine, because Logistic Regression
is used to classify distinct values such as low, medium, and high. Logistic Regression uses the sigmoid
function, a probabilistic function that maps each sample into the range 0 to 1, which lets the model
predict two or more distinct classes.
Training:
We train the Logistic Regression model, a supervised machine learning model that takes two
parameters, i.e., the features and their labels.
Testing:
Testing means predicting the labels for the testing dataset samples; these predictions are known as the
predicted output. This predicted output is evaluated against the actual output stored in the y_test
variable. We use the F1-score to measure accuracy.
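A minimal sketch of the training and testing steps described above, using the built-in wine dataset:

from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)        # training on features and labels
y_pred = model.predict(X_test)     # predicted output

# precision, recall, F1-score, and support for each class
print(classification_report(y_test, y_pred))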
Visualizing:
To evaluate, we use a graphical representation, which is easy to understand at a glance. The
predicted output is represented in blue and the actual output in yellow; if yellow overlaps with blue,
the prediction was accurate, otherwise it was not.
Model Evaluation:
To evaluate the model, we consider the following properties: precision, recall, F1-score, and support.
These properties measure model performance and are helpful for improving the model.
Accuracy:
Accuracy tells us how well the model works and whether it can make good predictions; the accuracy score gives answers to these questions.
35