
INTERNSHIP REPORT

An internship report submitted in partial fulfilment of the requirements of IV - II Semester of

BACHELOR OF TECHNOLOGY

In

ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

by

GULIPILLI SRAVANI

21ME1A5419

Under the supervision of

Mr. K. Naresh Babu

Assistant Professor

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

RAMACHANDRA COLLEGE OF ENGINEERING (AUTONOMOUS)

Approved by AICTE, Permanently Affiliated to JNTUK, Recognized by UGC 2(f) & 12(B)

Accredited by NAAC A+, Accredited by NBA, ISO 9001:2015 Certified

NH-16, BYPASS ROAD, VATLURU(V), ELURU- 534007, E. Dt, A.P


Ramachandra College of Engineering (Autonomous)

Approved by AICTE, Permanently Affiliated to JNTUK, Recognized by UGC 2(f) & 12(B)

Accredited by NAAC A+, Accredited by NBA, ISO 9001:2015 Certified

NH-16, BYPASS ROAD, VATLURU(V), ELURU- 534007, E. Dt, A.P

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

CERTIFICATE

This is to certify that the “Internship Report” submitted by GULIPILLI SRAVANI

(Regd. No.: 21ME1A5419) is work done by her and submitted during the 2023-2024 academic year in
partial fulfilment of the requirements for the IV – II Semester of Bachelor of Technology in Artificial
Intelligence and Data Science, for the internship on AI-ML-DS at IIDT Purple Technologies in association
with the Andhra Pradesh State Council of Higher Education (APSCHE).

Mr. K. Naresh Babu                                        Dr. K. Venkatesh
Assistant Professor                                       Professor
Dept. of AI&DS                                            Head of Dept., AI&DS

External Examiner
DECLARATION

I hereby declare that the internship report on “Artificial Intelligence” is submitted by me to Jawaharlal

Nehru Technological University Kakinada in partial fulfilment of the requirements of the IV-II semester
of Bachelor of Technology in Artificial Intelligence and Data Science. This internship work was carried
out by me under the supervision of Dr. K. Venkatesh, Professor and Head of the Department of AI&DS.

G. Sravani 21ME1A5419
ACKNOWLEDGEMENT

I would like to take this opportunity to express my deep gratitude to all the people who have extended
their cooperation in various ways during my internship. It is my pleasure and responsibility to
acknowledge the help of all those individuals.

I am very grateful to Mr. K. Naresh Babu, Assistant Professor, Department of Artificial
Intelligence & Data Science, for his guidance and encouragement in all respects throughout my
internship.

I am very grateful to Dr. K. Venkatesh, Professor and Head of the Department of Artificial
Intelligence and Data Science, for his guidance and encouragement in all respects throughout my
internship.

I would like to express my sincere gratitude to Dr. M. Muralidhara Rao, Principal, Ramachandra
College of Engineering, Eluru, for his valuable suggestions during the preparation of this
document.

I express my heartfelt gratitude to the Management of Ramachandra College of Engineering,

Eluru for their support and encouragement in completing my internship and for providing the necessary
facilities.

I sincerely thank all the faculty members and staff of the Department of AI & DS for their valuable
advice, suggestions and constant encouragement, which played a vital role in carrying out my
internship.

Finally, I thank one and all who directly or indirectly helped me to complete my internship
successfully.

G. Sravani

21ME1A5419
ABSTRACT

This study employs a predictive modelling technique to assess the quality of red wine based on
physicochemical properties. A comprehensive dataset is gathered, containing various attributes such
as acidity levels, residual sugar, pH, and alcohol content, which serves as the foundation for training
our predictive model. The data is then pre-processed by handling missing values, normalizing
features, and performing feature engineering to extract more meaningful information. Feature
selection techniques are also applied to identify the most relevant attributes for predicting wine
quality. Once the data is prepared, we use a machine learning algorithm, such as Logistic Regression,
and train it on the dataset to predict wine quality.

The evaluation of the model involves metrics such as accuracy, precision, recall, and F1-score,
assessing its performance across different quality categories. We also use a confusion matrix to
visualize the model's performance by displaying true positives, true negatives, false positives, and
false negatives. This matrix helps evaluate the model’s effectiveness in terms of accuracy, precision,
recall, and F1-score. Ultimately, the model can be deployed in the wine industry to enable producers
to quickly and accurately assess the quality of their red wines based on objective chemical
characteristics. This predictive tool can inform decisions on production processes, blending strategies,
and quality control measures, leading to improved consistency and customer satisfaction.
INDEX

S. No   Contents
1       Internship Certificate
2       Introduction to AIML & DS
3       Fundamentals of AIML & DS
4       Introduction to Data Visualization & Libraries, Pre-Processing the Data
5       Introduction to Machine Learning Algorithms
6       Ensemble Learning and Its Techniques
7       ANN Architecture and ANN Types
8       CNN Architecture and CNN Types
9       Project Explanation about Abstract, Algorithm & Dataset, Source Code & Project Documentation
INTERNSHIP CERTIFICATE

Learning Objectives / Internship Objectives

• Internships are generally thought of as being reserved for college students looking to gain
experience in a particular field. However, a wide array of people can benefit from training
internships in order to receive real-world experience and develop their skills.

• An objective for this position should emphasize the skills you already possess in the area and
your interest in learning more.

• Internships are utilized in a number of different career fields, including architecture,
engineering, healthcare, economics, advertising and many more.

• Some internships are used to allow individuals to perform scientific research, while others are
specifically designed to allow people to gain first-hand work experience.

• Utilizing internships is a great way to build your resume and develop skills that can be
emphasized in your resume for future jobs. When you are applying for a training internship,
make sure to highlight any special skills or talents that can make you stand apart from the rest
of the applicants so that you have an improved chance of landing the position.

WEEKLY OVERVIEW OF INTERNSHIP ACTIVITIES

Week I
• Introduction to Artificial Intelligence & Types of AI
• Introduction to Machine Learning & Types of ML
• Introduction to Deep Learning

Week II
• Fundamentals of AIML & DS
• Python Basics
• Python Packages
• Data Wrangling Techniques

Week III
• Data Visualization
• Matplotlib and Seaborn
• Data Preparation for Real-time Applications
• Pre-processing the Data

Week IV
• Introduction to Regression, Types of Regression
• K-Nearest Neighbour, Random Forest, Decision Tree Algorithm
• Introduction to Classification, Types of Classification
• Naïve Bayes Algorithm

Week V
• Ensemble Sampling & Techniques
• Support Vector Machine
• Types of Support Vector Machine
• Hyperparameter Tuning

Week VI
• Introduction to ANN
• ANN Architecture
• Working of ANN
• Types of ANN

Week VII
• Introduction to CNN, CNN Architecture
• Introduction to Transfer Learning, Types of Transfer Learning
• Use Cases of Machine Learning

Week VIII
• Project Explanation about Abstract, Algorithm & Dataset
• Project Explanation about Source Code and Project Documentation
• Real Time Project
Introduction to Artificial Intelligence:

Artificial intelligence is the science of making machines that can think like humans. The goal
of AI is to be able to do things such as recognize patterns, make decisions, and judge like
humans.
Uses of AI:

AI is used in many applications in our daily lives:

1. Digital Assistants.
2. Music and media streaming services.
3. Security.
4. Social Media.

Types of AI:

There are many ongoing AI discoveries and developments, most of which fall into different
types. AI is broadly divided into two categories.
1. Based on capabilities:

a. Weak AI or Narrow AI: AI designed to complete very specific actions; unable to learn
independently.
b. General AI: AI designed to learn, think and perform at similar levels to humans.
c. Super AI: AI able to surpass the knowledge and capabilities of humans.
2. Based on functionality:

a. Reactive Machines: AI capable of responding to external stimuli in real time; unable to build
memory or store information for future use.
b. Limited Memory Machines: AI that can store knowledge and use it to learn and train for
future tasks.
c. Theory of Mind: AI that can sense and respond to human emotions, plus perform the tasks
of limited memory machines.
d. Self-Awareness: AI that can recognize others' emotions, plus has a sense of self and human-
level intelligence; the final stage of AI.
Introduction to Machine Learning:

Machine learning (ML) is a branch of artificial intelligence (AI) that enables computers to “self-learn”
from training data and improve over time, without being explicitly programmed. Machine learning
algorithms are able to detect patterns in data and learn from them, in order to make their own
predictions.
Uses of ML:

ML is widely used in the following applications:

1. E-commerce product recommendations: ML detects items that match the customer's needs
based on the customer's previous actions and presents them in an engaging way.
2. Self-driving cars: Machine learning allows a car to collect data on its surroundings from
cameras and other sensors, interpret it, and decide what actions to take.
3. Email automation and spam filtering: ML identifies and classifies incoming spam emails based
on their resemblance to stored training examples of spam.
4. Chatbots: Chatbots are widely used for customer interaction, marketing on social network sites,
and instant messaging with clients.

Types of ML:

Based on the methods and way of learning, machine learning is mainly divided into
four types, which are:
1. Supervised machine learning: Supervised learning is a category of machine learning
that uses labelled datasets to train algorithms to predict outcomes and recognize
patterns.
2. Unsupervised machine learning: A type of learning where models are given unlabelled
data and allowed to discover patterns and insights without any explicit guidance or
instruction.
3. Semi-supervised learning: Semi-supervised learning is a branch of machine learning
that combines supervised and unsupervised learning by using both labelled and
unlabelled data. It uses the labelled data to ground predictions and the unlabelled data
to learn the shape of the larger data distribution.
4. Reinforcement machine learning: Reinforcement learning (RL) is a machine
learning (ML) technique that trains software to make decisions that achieve the best
results. It mimics the trial-and-error learning process that humans use to achieve
their goals.
Introduction to Deep Learning:

Deep learning is a branch of machine learning based on artificial neural networks (ANNs), also
known as deep neural networks (DNNs). It is capable of learning complex patterns and relationships
within data, and we do not need to explicitly program everything. Deep learning has become
increasingly popular in recent years due to advances in processing power and the availability of
large datasets. Its neural networks are inspired by the structure and function of the human brain's
biological neurons, and they are designed to learn from large amounts of data.

Python Basics, Python Packages, Data Visualization, Matplotlib & Seaborn,
Data Wrangling Techniques.

Python Basics:

Data science is an interconnected field that involves the use of statistical and computational methods
to extract insightful information and knowledge from data. Python, a popular and versatile
programming language, has become a common choice among data scientists for its ease of use,
extensive libraries, and flexibility. Python provides an efficient and streamlined approach to handling
complex data structures and extracting insights.
Python Packages:

Python is a computer programming language often used to build websites and software, automate
tasks, and analyze data. A Python library is a collection of related modules. It contains bundles of code
that can be reused across different programs, which makes Python programming simpler and more
convenient for the programmer.
1. Pandas: Pandas is a Python library used for working with data sets. It has functions for
analysing, cleaning, exploring, and manipulating data.
2. Matplotlib: Matplotlib is a cross-platform data visualization and graphical plotting library
(histograms, scatter plots, bar charts, etc.) for Python and its numerical extension NumPy.
As such, it offers a viable open-source alternative to MATLAB.
3. NumPy: NumPy is a very popular Python library for large multi-dimensional array and
matrix processing, with the help of a large collection of high-level mathematical functions.
4. TensorFlow: TensorFlow is a very popular open-source library for high-performance
numerical computation developed by the Google Brain team at Google. A short sketch using
some of these packages follows.
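
The snippet below is a minimal illustrative sketch of these packages working together; the values are hypothetical, and TensorFlow is omitted for brevity.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    # Build a small table of hypothetical wine measurements with NumPy and Pandas
    data = np.array([[7.4, 0.70, 5], [7.8, 0.88, 5], [11.2, 0.28, 6]])
    df = pd.DataFrame(data, columns=["fixed_acidity", "volatile_acidity", "quality"])
    print(df.describe())  # quick statistical summary

    # Plot one feature against the quality label with Matplotlib
    df.plot(kind="scatter", x="fixed_acidity", y="quality")
    plt.show()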

Data Visualization:
Data visualization is the practice of translating information into a visual context, such as a map or
graph, to make data easier for the human brain to understand and pull insights from. The main goal of
data visualization is to make it easier to identify patterns, trends
and outliers in large data sets.

Techniques:

It involves using various visual elements such as charts, graphs, maps, and infographics to
communicate insights effectively.

1. Bar Charts: A bar chart or bar graph is a chart that represents categorical data with
rectangular bars with heights proportional to the values that they represent.

2. Line Charts: A line chart is generally constructed by marking points that correspond to each
data value, ordered according to a criterion, and then joining these points by lines.

3. Pie Charts: A pie chart displays a pictorial representation of data that makes it possible to
visualize the relationships between the parts and the whole of a variable.

4. Scatter Plots: Scatter plots are graphs that present the relationship between two variables
in a dataset.

5. Heatmaps: Represent data values in a matrix format using colors to indicate the magnitude
of the values.
Matplotlib:

• It is fast and efficient, as it is based on NumPy, and charts are easy to build.
• It has undergone a lot of improvement from the open-source community since its inception,
and hence is a mature library with advanced features as well.
• Well-maintained visualization output with high-quality graphics draws a lot of users to it.
• Basic as well as advanced charts can be built very easily.

Seaborn:

• Built-in themes aid better visualization.
• Statistical functions aid better data insights.
• Better aesthetics and built-in plots. (A combined sketch follows.)
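
The following is a minimal sketch combining both libraries on a small, hypothetical dataset: a Matplotlib bar chart next to a Seaborn scatter plot.

    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns

    # Hypothetical sample data
    df = pd.DataFrame({"alcohol": [9.4, 9.8, 10.5, 11.2, 12.0],
                       "quality": [5, 5, 6, 6, 7]})

    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    axes[0].bar(df["quality"].astype(str), df["alcohol"])           # bar chart (Matplotlib)
    axes[0].set_title("Alcohol by quality")
    sns.scatterplot(data=df, x="alcohol", y="quality", ax=axes[1])  # scatter plot (Seaborn)
    axes[1].set_title("Alcohol vs quality")
    plt.tight_layout()
    plt.show()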

Data Wrangling Steps:

Each data project needs a different strategy to guarantee that the final dataset is trustworthy and
available. The following are frequently referred to as the necessary data wrangling steps or activities.

Step 1: Data Discovery

Step 2: Data Structuring

Step 3: Data Cleaning

Step 4: Data Enriching

Step 5: Data Validating

Step 6: Data Publishing

Introduction to Preprocessing, Handling Missing Data & Outliers,

Train & Test Data, Feature Selection.

Introduction to Preprocessing: Data preprocessing is the process of preparing raw data for machine
learning models. It is the first step in creating a machine learning model and is among the most complex
and time-consuming aspects of data science.
Data preprocessing is required in machine learning to reduce the complexity of the data before modelling.

Steps in Pre-processing: There are six steps of data pre-processing in machine learning, sketched in
code below.

1. Import the Libraries.
2. Import the Loaded Data.
3. Check for Missing Values.
4. Arrange the Data.
5. Do Scaling.
6. Distribute Data into Training, Evaluation and Validation Sets.
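
A minimal sketch of these six steps is shown below; the file name "winequality-red.csv" and its "quality" column are assumptions for illustration.

    import pandas as pd                                    # Step 1: import the libraries
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("winequality-red.csv")                # Step 2: import the loaded data
    print(df.isnull().sum())                               # Step 3: check for missing values
    df = df.dropna()                                       # Step 4: arrange the data

    X, y = df.drop("quality", axis=1), df["quality"]
    X_scaled = StandardScaler().fit_transform(X)           # Step 5: do scaling

    # Step 6: distribute the data into training and evaluation sets
    X_train, X_test, y_train, y_test = train_test_split(
        X_scaled, y, test_size=0.2, random_state=42)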

Handling Missing Data: Missing values are data points that are absent for a specific variable in a
dataset. They can be represented in various ways, such as blank cells, null values, or special symbols
like “NA” or “unknown.” These missing data points pose a significant challenge in data analysis and
can lead to inaccurate or biased results.
Handling Outlier Data: Outliers can be caused by measurement uncertainty or by experimental
error. Outliers in data can be observed using a number of techniques; a simple one is to plot a box
plot, where outliers are the points that fall outside the minimum and maximum whiskers. Outlier
detection and handling are crucial aspects of building reliable and robust machine learning models.
Understanding the impact of outliers, choosing the appropriate technique for your specific data and
task, and leveraging domain knowledge and data visualization help ensure that your models perform
well on unseen data and provide accurate and trustworthy predictions.
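
As a small illustration of both ideas, the sketch below imputes a missing value with the median and then drops rows outside the box-plot whiskers using the interquartile range (IQR) rule; the numbers are hypothetical.

    import pandas as pd

    df = pd.DataFrame({"pH": [3.2, 3.4, None, 3.3, 9.9]})  # one gap, one extreme value

    df["pH"] = df["pH"].fillna(df["pH"].median())   # impute the missing value

    q1, q3 = df["pH"].quantile([0.25, 0.75])        # the IQR rule behind a box plot
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    df = df[(df["pH"] >= lower) & (df["pH"] <= upper)]  # keep only points inside the whiskers
    print(df)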

Train & Test Set: Train and test datasets are two key concepts of machine learning: the training
dataset is used to fit the model, and the test dataset is used to evaluate the model. Using these, we can
easily evaluate the performance of our model: if it performs well on the training data but does not
perform well on the test dataset, the model is likely overfitted. For splitting the dataset, we can use
the train_test_split function of scikit-learn.
Feature selection: Feature selection is one of the important concepts of machine learning, and it
highly impacts the performance of the model. As machine learning works on the principle of
"Garbage In, Garbage Out", we always need to feed the most appropriate and relevant data to the
model in order to get a better result. A small sketch follows this list.
• It helps in avoiding the curse of dimensionality.
• It helps in the simplification of the model so that it can be easily interpreted by researchers.
• It reduces the training time.
• It reduces overfitting and hence enhances generalization.
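
One simple way to do this in scikit-learn is univariate selection with SelectKBest; the sketch below (using the load_wine dataset for illustration) keeps the five most relevant features.

    from sklearn.datasets import load_wine
    from sklearn.feature_selection import SelectKBest, f_classif

    X, y = load_wine(return_X_y=True)
    selector = SelectKBest(score_func=f_classif, k=5)  # keep the 5 most relevant features
    X_selected = selector.fit_transform(X, y)
    print(X.shape, "->", X_selected.shape)             # (178, 13) -> (178, 5)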

Introduction to Regression, Types of Regression, Logistic Regression
and its types.

Introduction to Regression: Regression is a supervised machine learning technique used to predict
continuous values. The ultimate goal of a regression algorithm is to fit a best-fit line or curve through
the data.

Types of Regression:

The main types of regression are:

1. Simple Linear Regression: A type of regression algorithm that models the relationship
between a dependent variable and a single independent variable. The relationship shown by a
Simple Linear Regression model is linear (a sloped straight line), hence the name Simple
Linear Regression.

2. Multiple Linear Regression: Multiple Linear Regression is an extension of Simple Linear

Regression, as it takes more than one predictor variable to predict the response variable.

3. Logistic Regression: Logistic regression is used for binary classification. It uses the sigmoid
function, which takes the independent variables as input and produces a probability value
between 0 and 1. Logistic regression predicts the output of a categorical dependent variable;
therefore, the outcome must be a categorical or discrete value.

Types of Logistic Regression:

On the basis of the categories, Logistic Regression can be classified into three types:

1. Binomial: In binomial Logistic regression, there can be only two possible types of the
dependent variables, such as 0 or 1, Pass or Fail, etc.
2. Multinomial: In multinomial Logistic regression, there can be 3 or more possible unordered
types of the dependent variable, such as “cat”, “dogs”, or “sheep”
3. Ordinal: In ordinal Logistic regression, there can be 3 or more possible ordered types of
dependent variables, such as “low”, “Medium”, or “High”.
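
At the heart of logistic regression is the sigmoid function mentioned above; the sketch below shows how it squashes any real-valued input into a probability between 0 and 1.

    import numpy as np

    def sigmoid(z):
        # maps any real number into the (0, 1) range
        return 1.0 / (1.0 + np.exp(-z))

    z = np.array([-4.0, 0.0, 4.0])
    print(sigmoid(z))  # approximately [0.018, 0.5, 0.982]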

Decision Trees, Random Forest & KNN Algorithm.
Decision Trees: In machine learning, a decision tree is a predictive model that repeatedly splits the
data into subsets based on feature values, with each split maximizing the homogeneity of the target
variable within the subsets. Decision trees are used for classification and regression tasks, offering
interpretability and versatility in handling various types of data.

Terminologies in Decision Tree:

• Root Node
• Decision Nodes
• Leaf Nodes
• Sub-Tree
• Pruning
• Parent and Child Node

Steps in Decision tree:

1. Starting at the Root


2. Asking the Best Questions
3. Branching Out

4. Repeating the Process

Benefits of Decision tree:

1. Easy Navigation
2. Step-by-step approach
3. Time-saving
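
A minimal scikit-learn sketch of a decision tree classifier is given below; the load_wine dataset and the max_depth value are illustrative choices.

    from sklearn.datasets import load_wine
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_wine(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    tree = DecisionTreeClassifier(max_depth=3, random_state=42)  # shallow tree, easy to interpret
    tree.fit(X_train, y_train)                                   # asks the "best question" at each split
    print("Test accuracy:", tree.score(X_test, y_test))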

Random Forest: Random forest is an ensemble learning algorithm that builds multiple decision trees
on random subsets of the training data and combines their predictions, by majority vote for
classification or by averaging for regression.

Advantages of Random Forest:

1. It takes less training time as compared to other algorithms.
2. It predicts output with high accuracy; even for a large dataset it runs efficiently.
3. It can also maintain accuracy when a large proportion of the data is missing.

The Working process can be explained in the below steps:

Step-1: Select random K data points from the training set.

Step-2: Build the decision trees associated with the selected data points (Subsets).

Step-3: Choose the number N for decision trees that you want to build.

Step-4: Repeat Steps 1 & 2.

Step-5: For new data points, find the prediction of each decision tree, and assign the new data points
to the category that wins the majority of votes. (A code sketch follows.)
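
The steps above correspond to scikit-learn's RandomForestClassifier; a minimal sketch, with the dataset and the number of trees chosen for illustration, is:

    from sklearn.datasets import load_wine
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_wine(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    forest = RandomForestClassifier(n_estimators=100, random_state=42)  # N = 100 trees
    forest.fit(X_train, y_train)                 # each tree is built on a random subset
    print("Test accuracy:", forest.score(X_test, y_test))  # majority vote across the trees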
K-Nearest Neighbor (KNN) Algorithm: The KNN algorithm stores all the available data and
classifies a new data point based on similarity. This means that when new data appears, it can be
easily classified into a well-suited category using the KNN algorithm. It is also called a lazy learner
algorithm because it does not learn from the training set immediately; instead it stores the dataset
and, at the time of classification, performs an action on the dataset.
The KNN working can be explained on the basis of the below algorithm:

Step-1: Select the number K of the neighbors

Step-2: Calculate the Euclidean distance of K number of neighbors

Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.

Step-4: Among these k neighbors, count the number of the data points in each category.

Step-5: Assign the new data points to that category for which the number of the neighbor is
maximum.
Step-6: Our model is ready
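
The same steps map directly onto scikit-learn's KNeighborsClassifier; a minimal sketch (dataset and K chosen for illustration) is shown below.

    from sklearn.datasets import load_wine
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_wine(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")  # Steps 1-2: K and the distance
    knn.fit(X_train, y_train)       # "lazy" learning: the training set is simply stored
    print(knn.predict(X_test[:3]))  # Steps 3-5: vote among the 5 nearest neighbours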

Bayes Theorem, Classification, Types of classification, Linear & Non-Linear

Bayes Theorem: Bayes' theorem is one of the most popular machine learning concepts. It helps to
calculate the probability of one event occurring, with uncertain knowledge, given that another event
has already occurred.

Bayes' theorem can be derived using the product rule and the conditional probability of event A
given a known event B.
The theorem can be mathematically expressed as:

P(A|B) = P(B|A) * P(A) / P(B)

Where,

• P(A|B) is the posterior probability of event A given event B.
• P(B|A) is the likelihood of event B given event A.
• P(A) is the prior probability of event A.
• P(B) is the total probability of event B.
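
As a quick worked example with purely illustrative numbers: suppose 2% of wine batches are faulty, so P(A) = 0.02; a screening test flags 90% of faulty batches, so P(B|A) = 0.9; and it also flags 10% of good batches, so P(B) = 0.9 * 0.02 + 0.1 * 0.98 = 0.116. Then P(A|B) = (0.9 * 0.02) / 0.116 ≈ 0.155, i.e. a flagged batch is actually faulty only about 15.5% of the time.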

Classification: Classification is a supervised machine learning method in which the model tries to
predict the correct label for given input data.
The classification algorithm is a supervised learning technique used to identify the category
of new observations on the basis of training data. In classification, a program learns from the given
dataset or observations and then classifies new observations into a number of classes or groups,
such as Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc. Classes can be called targets, labels
or categories.

Unlike regression, the output variable of classification is a category, not a value, such as "Green or
Blue" or "fruit or animal". Since the classification algorithm is a supervised learning technique,
it takes labelled input data, which means it contains inputs with the corresponding outputs.
Types of Classification: Classification algorithms can be divided into two main
categories:

Linear Models

a) Logistic Regression
b) Support Vector Machines

Non-Linear Models

a) K-Nearest Neighbours
b) Kernel SVM
c) Decision Tree Classification

Ensemble Sampling & Techniques, Support Vector Machine,
Hyperparameter Tuning.

Ensemble Learning: Ensemble learning is an approach in which two or more models are
fitted to the same data, and the predictions of each model are combined. Ensemble learning aims to
achieve better performance with the ensemble of models than with any individual model.
There are several ensembling techniques in machine learning, including:

1. Voting: Different models make predictions, and the final prediction is determined by a
majority vote or averaging their predictions.
2. Bagging (Bootstrap Aggregating): Multiple models (often of the same type) are trained on
different subsets of the training data, typically with replacement. The final prediction is an
average or voting of these models.
3. Boosting: Models are trained sequentially, where each subsequent model corrects the errors
of the previous one. Examples include AdaBoost, Gradient Boosting Machines (GBM), and
XGBoost.
4. Stacking: Predictions from multiple models are used as input features to a meta-model,
which then produces the final prediction. It combines the strengths of different models.
5. Random Forest: An ensemble method based on decision trees, where multiple decision trees
are trained on different subsets of the data, and the final prediction is determined by averaging
or voting.

These methods improve the overall performance and robustness of machine learning models by
leveraging the diversity of multiple models, as in the voting sketch below.
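
The voting method, for instance, is available in scikit-learn as VotingClassifier; below is a minimal sketch with three illustrative base models.

    from sklearn.datasets import load_wine
    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_wine(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # Hard voting: the final class is the majority vote of the three base models
    ensemble = VotingClassifier(estimators=[
        ("lr", LogisticRegression(max_iter=5000)),
        ("rf", RandomForestClassifier(random_state=42)),
        ("knn", KNeighborsClassifier()),
    ], voting="hard")
    ensemble.fit(X_train, y_train)
    print("Ensemble accuracy:", ensemble.score(X_test, y_test))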

Support Vector Machine: Support Vector Machine, abbreviated SVM, can be used for both
regression and classification tasks, but generally it works best on classification problems. The main
objective of the SVM algorithm is to find the optimal hyperplane in an N-dimensional space that
separates the data points of different classes in the feature space. The hyperplane is chosen so that
the margin between the closest points of the different classes is as large as possible.

Types of Support Vector machine:

Support Vector Machines (SVM) can be divided into two main parts:

1. Linear SVM: Used for linearly separable data: if a dataset can be classified into two
classes by using a single straight line, then such data is termed linearly separable data,
and the classifier used is called a Linear SVM classifier.

2. Non-Linear SVM: Used for non-linearly separable data: if a dataset cannot be classified
by using a straight line, then such data is termed non-linear data, and the classifier used
is called a Non-linear SVM classifier. (A sketch comparing the two follows.)
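
A minimal sketch comparing the two types is below; scaling the features first (a common practice for SVMs) and the choice of dataset are illustrative.

    from sklearn.datasets import load_wine
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_wine(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    linear_svm = make_pipeline(StandardScaler(), SVC(kernel="linear"))  # for linearly separable data
    rbf_svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))        # for non-linear data

    for name, model in [("Linear SVM", linear_svm), ("Non-linear (RBF) SVM", rbf_svm)]:
        model.fit(X_train, y_train)
        print(name, "accuracy:", model.score(X_test, y_test))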

Hyperparameter Tuning: Hyperparameters directly control model structure, function, and
performance. Hyperparameter tuning allows data scientists to tweak model performance for optimal
results. This process is an essential part of machine learning, and choosing appropriate hyperparameter
values is crucial for success. The goal of hyperparameter tuning is to find the values that lead to the
best performance on a given task.
Commonly tuned hyperparameters include:

1. Number of hidden layers: It’s a trade-off between keeping our neural network as simple as
possible (fast and generalized) and classifying our input data correctly.
2. Number of nodes/neurons per layer: More isn't always better when determining how many
neurons to use per layer. Increasing neuron count can help, up to a point.
3. Learning rate: Model parameters are adjusted iteratively and the learning rate controls the
size of the adjustment at each step.
4. Momentum: Momentum helps us avoid falling into local minima by resisting rapid changes
to parameter values. (A grid-search sketch follows.)
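
A common way to search hyperparameter values is scikit-learn's GridSearchCV; the sketch below (the SVM model and the candidate values are chosen for illustration) tries every combination with 5-fold cross-validation.

    from sklearn.datasets import load_wine
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = load_wine(return_X_y=True)

    param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.001]}  # candidate values
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)         # 5-fold cross-validation
    search.fit(X, y)
    print("Best hyperparameters:", search.best_params_)
    print("Best CV score:", search.best_score_)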

Introduction to Flask, Flask Framework Process, Flask Templating with

HTML Files, Flask Deployment & Functionalities of Flask.
Introduction To Flask:

1. Flask is a micro web framework written in Python.

2. Flask is based on WSGI and the Jinja2 template engine.

Flask Framework Process

Flask Templating in HTML Files and Deployment:

Steps:

1. Set up the environment for creating a Flask application.
2. Create an HTML file for rendering.
3. Create an app.py file and place it outside the folder containing the HTML file.
4. Run the app.py file.

Deploying ML Models on Flask:

1. Train and Save the Model


2. Create a Flask App
3. Set Up Flask App Structure
4. Load the Model
5. Create Flask Routes
6. Create HTML Templates
7. Run the Flask App Locally
8. Deploy to a Web Server
Functionalities of Flask:

1. Routing and URL mapping: Flask allows developers to map different URLs to specific
functions, making it easy to handle different types of requests and build API endpoints for data
retrieval.
2. Templating: Flask's templating engine enables dynamic HTML generation, allowing data
scientists to integrate their analyses and visualizations seamlessly into web pages.
3. Integration with data science libraries: Flask seamlessly integrates with popular data science
libraries like NumPy, Pandas, and Matplotlib, enabling data scientists to leverage their powerful
functionality in web applications.
4. Form handling and user inputs: Flask simplifies the process of handling form submissions,
making it straightforward to collect user input for further analysis or processing.
5. Database integration: Flask can easily connect to databases through extensions like Flask-
SQLAlchemy, facilitating the storage and retrieval of data within web applications. (A deployment
sketch follows.)
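
Putting the pieces together, here is a minimal sketch of an app.py that serves a trained model; the file names "model.pkl" and "index.html" are hypothetical placeholders.

    import pickle
    from flask import Flask, render_template, request

    app = Flask(__name__)
    model = pickle.load(open("model.pkl", "rb"))   # load the trained model once at start-up

    @app.route("/")
    def home():
        return render_template("index.html")       # render the input form

    @app.route("/predict", methods=["POST"])
    def predict():
        # collect the form fields, run the model, and show the result
        features = [[float(v) for v in request.form.values()]]
        prediction = model.predict(features)[0]
        return render_template("index.html", prediction=prediction)

    if __name__ == "__main__":
        app.run(debug=True)                        # run the app locally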

Introduction to ANN, ANN Architecture, Activation function,
Working of ANN, Types of ANN
Introduction to ANN: Artificial Neural Networks (ANNs), whose modern development is closely
associated with researchers such as Geoffrey Hinton, are modelled on the human brain's neural
structure. An ANN consists of interconnected nodes (neurons) organized into layers. Information
flows through these nodes, and the network adjusts the connection strengths (weights) during
training to learn from data, enabling it to recognize patterns, make predictions, and solve various
tasks in machine learning and artificial intelligence.
Artificial Neural Networks Architecture: The term "Artificial Neural Network" is derived from
Biological neural networks that develop the structure of a human brain. Similar to the human brain
that has neurons interconnected to one another, artificial neural networks also have neurons that
are interconnected to one another in various layers of the networks. These neurons are known as
nodes.
ANN contains mainly three layers:

1. Input Layer: Accepts all inputs given by the user.
2. Hidden Layer: Performs all the calculations to find hidden features and patterns.
3. Output Layer: Predicts the output based on the input and the weighted calculations at the hidden neurons.

ANN Activation Function: Each neuron first computes a weighted sum of its inputs, for example
Y = W1*X1 + W2*X2 + b. This sum is then passed through an activation function, and the resulting
output is multiplied by the next weight (e.g. W3) and supplied as input to the next layer. (A
one-neuron sketch follows.)
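
The sketch below traces one neuron through this computation with hypothetical inputs and weights: a weighted sum followed by a sigmoid activation.

    import numpy as np

    def neuron(x, w, b):
        z = np.dot(w, x) + b             # weighted sum: Y = W1*X1 + W2*X2 + b
        return 1.0 / (1.0 + np.exp(-z))  # sigmoid activation

    x = np.array([0.5, 0.8])    # hypothetical inputs X1, X2
    w = np.array([0.4, -0.6])   # hypothetical weights W1, W2
    print(neuron(x, w, b=0.1))  # output passed on to the next layer (about 0.455)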

Types of ANN:

1) Feed-Forward Neural Network: A feedforward neural network is a linear network consisting of a
single layer of output nodes; the inputs are fed directly to the outputs via a series of weights. The
sum of the products of the weights and the inputs is calculated at each node.

2) Backpropagation: Backpropagation is an algorithm designed to test for errors, working back from
the output nodes to the input nodes. If there are errors, this algorithm helps reduce the error rate.

Introduction to CNN, CNN Architecture, Introduction to Transfer
Learning, Types of Transfer Learning, Use Cases of Machine
Learning.
Introduction to CNN:

Convolutional neural network (CNN or ConvNet) is a network architecture for deep learning that
learns directly from data. CNNs are particularly useful for finding patterns in images to recognize
objects, classes, and categories. A CNN is a powerful tool but requires millions of labelled data
points for training.
Applications of CNN:

1. Image classification: CNNs are the state-of-the-art models for image classification. They can
be used to classify images into different categories, such as cats and dogs.
2. Object detection: CNN can detect and locate objects in images or videos like people, cars, and
buildings. They can also be used to localize objects in images, which means that they can
identify the location of an object in an image.
3. Image segmentation: CNNs can be used to segment images, which means that they can identify
and label different objects in an image. This is useful for applications such as medical imaging.

CNN Architecture:

The CNN's job is to compress images into a format that is easier to process while preserving
elements that are important for obtaining a decent prediction. A convolutional neural network
(ConvNet for short) has three layers which are its building blocks:
1. Convolutional Layer
2. Pooling Layer
3. Fully Connected Layer

Types of layers:

Input Layer: This is the layer through which we give input to our model. In a CNN, the input will
generally be an image or a sequence of images; for example, the layer might hold the raw input of an
image with width 32, height 32, and depth 3.
Convolutional Layer: This is the layer used to extract features from the input dataset. It applies a
set of learnable filters, known as kernels, to the input images. The filters/kernels are small matrices,
usually of 2×2, 3×3, or 5×5 shape. Each kernel slides over the input image data and computes the
dot product between the kernel weights and the corresponding input image patch. The outputs of
this layer are referred to as feature maps.
Pooling Layer: This layer is periodically inserted in the ConvNet, and its main function is to reduce
the size of the volume, which makes computation fast, reduces memory use, and also prevents
overfitting. Two common types of pooling layers are max pooling and average pooling.
Fully Connected Layer: The fully connected layer, also known as the dense layer, plays a critical
role in the final stages of a CNN, where it is responsible for classifying images based on the features
extracted in the previous layers. The term "fully connected" means that each neuron in one layer is
connected to each neuron in the subsequent layer.
Activation Function: The activation function is typically applied to the output of each neuron in the
network. It takes in the weighted sum of the inputs and produces an output that is then passed on to
the next layer. A minimal Keras sketch of this architecture follows.
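
Below is a minimal Keras sketch of these building blocks (the layer sizes and the 10-class output are illustrative, and TensorFlow is assumed to be installed).

    import tensorflow as tf
    from tensorflow.keras import layers

    model = tf.keras.Sequential([
        layers.Input(shape=(32, 32, 3)),               # input: a 32x32 RGB image, depth 3
        layers.Conv2D(32, (3, 3), activation="relu"),  # convolutional layer with 3x3 kernels
        layers.MaxPooling2D((2, 2)),                   # pooling layer: shrink the feature maps
        layers.Flatten(),
        layers.Dense(64, activation="relu"),           # fully connected (dense) layer
        layers.Dense(10, activation="softmax"),        # one output per class
    ])
    model.summary()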

Introduction to Transfer Learning:

Transfer learning (TL) is a technique in machine learning (ML) in which knowledge learned from a
task is re-used in order to boost performance on a related task. For example, for image classification,
knowledge gained while learning to recognize cars could be applied when trying to recognize trucks.
There are two ways to make use of knowledge from a pre-trained model. The first is to freeze some
layers of the pre-trained model and train the remaining layers on our new dataset. The second is to
create a new model that reuses the pre-trained model but removes or replaces some of its layers.
Types of transfer learning:

Deep transfer learning is a technique that utilizes pre-trained deep neural networks as the starting point
for training on a new task. There are several different types of algorithms used in deep transfer
learning, including:
1. Fine-tuning: This involves taking a pre-trained network and training it further on a new task
by adjusting the weights of the final layers.
2. Multi-task Learning: Training a single model to perform multiple tasks simultaneously,
where the knowledge learned from one task can benefit the performance on other related tasks.
3. Feature extraction: This method uses the features learned by a pre-trained network as input
to a new classifier, which is trained from scratch. (A fine-tuning sketch follows.)
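
A minimal fine-tuning sketch of the first approach (freezing a pre-trained base and training a new head) is below; MobileNetV2 and the binary task are illustrative choices.

    import tensorflow as tf
    from tensorflow.keras import layers

    base = tf.keras.applications.MobileNetV2(
        input_shape=(160, 160, 3), include_top=False, weights="imagenet")
    base.trainable = False                      # freeze the pre-trained layers

    model = tf.keras.Sequential([
        base,
        layers.GlobalAveragePooling2D(),        # use the frozen base as a feature extractor
        layers.Dense(1, activation="sigmoid"),  # new head for the new binary task
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])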
Use Cases of Machine Learning:

Image Recognition: Identifying objects, people, places, and activities in images and videos, used in
applications like facial recognition, medical imaging, and autonomous vehicles.
Recommendation Systems: Suggesting products, services, or content based on user preferences and
behavior, seen in e-commerce platforms, streaming services, and social media platforms.
Speech Recognition: Converting spoken language into text or commands, utilized in virtual assistants,
voice-controlled devices, and speech-to-text applications.
Healthcare: Diagnosing diseases, predicting patient outcomes, and personalizing treatment plans
based on medical data and patient history.

Executive Summary
Objective

This model aims to provide a reliable and objective assessment of wine quality, thereby aiding winemakers,
distributors, and consumers in making informed decisions. The specific goals of the project are as follows:

Data Collection and Pre-processing:

• Gather a comprehensive dataset of red wine samples, including various chemical properties
and quality ratings.

• Clean and pre-process the data to handle missing values, normalize features, and prepare it for
analysis.

Feature Selection:

• Identify the most significant chemical properties that influence wine quality.

• Use statistical and machine learning techniques to select the most relevant features for the
predictive model.

Model Development and Training:

• Develop and train various machine learning models (e.g., Linear Regression, Decision Trees,
Random Forest) to predict wine quality based on the selected features.

• Evaluate the performance of these models using appropriate metrics such as accuracy,
precision, recall, and F1-score.

Model Optimization and Validation:

• Optimize the best-performing model through hyperparameter tuning and cross-validation.

• Validate the model on a separate test dataset to ensure its generalizability and robustness.

Implementation and Deployment:

• Implement the final predictive model in a user-friendly application or dashboard.

• Deploy the model for use by winemakers, distributors, and consumers to assess the quality of
red wine samples efficiently and objectively.

INTERNSHIP PART

During the Red Wine Quality Detection internship at Blackbucks, we engage in a variety of activities and
responsibilities aimed at providing practical experience in data pre-processing, feature engineering, and
machine learning model development using Python.

Activities and Responsibilities

Data Exploration and Pre-processing:

• Loaded and cleaned the red wine quality dataset.

• Handled missing values and outliers using appropriate techniques.

• Performed exploratory data analysis (EDA) to understand the distribution of features and their
relationship with wine quality.

Feature Engineering:

• Created new features or transformed existing ones to improve model performance.

• Implemented feature scaling and normalization techniques to enhance model accuracy.

Model Selection and Training:

• Experimented with various machine learning algorithms, including Decision Trees, Random Forest,
Support Vector Machines, and Neural Networks.

• Trained and fine-tuned the selected models on the prepared dataset.

Model Evaluation:

• Evaluated model performance using metrics such as accuracy, precision, recall, F1-score, and
the confusion matrix.

• Performed cross-validation to assess model generalization and robustness.

Hyperparameter Tuning:

• Optimized model performance by tuning hyperparameters using techniques like grid search and
random search.

Model Deployment:

• Developed a user-friendly web application or API to deploy the model for real-world use cases.

Equipment Used

Software:

• Python: The primary programming language for data analysis, machine learning, and model
development.

• Jupyter Notebook: An interactive environment for data exploration, visualization, and code
execution.

• Google Colab: A cloud-based platform for running Python code, including machine
learning experiments.

• Popular Python Libraries:

  o NumPy: For numerical computations and array manipulation.

  o Pandas: For data analysis and manipulation.

  o Matplotlib and Seaborn: For data visualization.

  o Scikit-learn: For machine learning algorithms and model evaluation.

Hardware:

• Personal Computer or Laptop: A reliable system with sufficient processing power and
memory to run the required software.

• Cloud Computing Resources: Google Colab or other cloud-based platforms for accessing
powerful computing resources.

Tasks Performed

• Data cleaning and pre-processing to ensure high-quality datasets for analysis.
• Creating visualizations to represent data insights effectively and intuitively.
• Developing and training machine learning models to predict outcomes and provide actionable
insights.
• Conducting exploratory data analysis to identify trends and patterns in the data.
• Collaborating with team members and mentors to discuss findings and refine analysis
techniques.
Skills Acquired

Technical Skills:

• Data Analysis and Visualization:

  o Proficient in using Python libraries like Pandas, NumPy, Matplotlib, and Seaborn for
data exploration, cleaning, and visualization.

• Machine Learning:

  o Hands-on experience with various machine learning algorithms, including Decision
Trees, Random Forest, Support Vector Machines, and Neural Networks.

  o Expertise in model training, evaluation, and hyperparameter tuning.

• Model Deployment:

  o Knowledge of deploying machine learning models as web applications or APIs using
frameworks like Flask or Django.

• Problem-Solving:

  o Ability to identify and address challenges in data analysis, model development, and
deployment.

• Critical Thinking:

  o Skill in analysing complex problems and devising effective solutions.

Soft Skills:

• Teamwork: Collaborated effectively with team members to achieve project goals.

• Communication: Clearly communicated technical concepts and project findings to both
technical and non-technical audiences.

• Time Management: Efficiently managed time and prioritized tasks to meet deadlines.

• Adaptability: Quickly adapted to changing requirements and technological advancements.
Red Wine Quality Detection using Machine Learning
Libraries:
To implement the project, we need to import the necessary packages: the data visualization package
matplotlib.pyplot; the data exploration packages NumPy and pandas; the machine learning model
class LogisticRegression; and, to split the dataset into training and testing sets, the
train_test_split function.

Dataset:
To load the dataset, we use the load_wine() function, which returns an object with attributes such as
data and target. The data attribute has 13 features, and the target attribute is a label with three
classes (low, medium, and high).
To represent the data attribute, we store data in the X variable.
To represent the target attribute, we store target in the Y variable.
After loading data and target into X and Y, we create a dataset from them using a pandas
DataFrame.
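
A minimal sketch of this loading step, following the description above, is:

    import pandas as pd
    from sklearn.datasets import load_wine

    wine = load_wine()
    X = wine.data    # the 13 features
    Y = wine.target  # the class label for each sample

    dataset = pd.DataFrame(X, columns=wine.feature_names)
    dataset["target"] = Y
    print(dataset.head())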

After performing the above activities, the dataset is a DataFrame containing the 13 feature columns together with the target column.

Splitting the dataset into Training dataset and Testing dataset:

The training dataset is used to train the machine learning model, whereas the testing dataset is used
to evaluate the machine learning model. To split the dataset, we use the train_test_split function.
This function splits the dataset samples into 80% training samples and 20% testing samples at
random. It also separates the features and labels (X and Y) into x_train, x_test and y_train,
y_test.
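
Continuing the sketch, the split looks like this:

    from sklearn.model_selection import train_test_split

    # 80% of the samples for training, 20% for testing, chosen at random
    x_train, x_test, y_train, y_test = train_test_split(
        X, Y, test_size=0.2, random_state=42)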

Creating the machine learning model:

I use a Logistic Regression model to predict the quality of wine, because Logistic Regression
classifies distinct values such as low, medium and high. Logistic Regression uses the sigmoid
function, a probabilistic function that maps each sample into the range from 0 to 1, which allows
the model to predict two or more distinct values.

Training:
Training the Logistic Regression model, which is a supervised machine learning model, takes two
parameters: the features and their labels.

Testing:
Testing means predicting the labels for the testing dataset samples; these predictions are known as
the predicted output. The predicted output is evaluated against the actual output stored in the y_test
variable, and we use the F1-score to measure accuracy.
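
Continuing the sketch, training and testing look like this (max_iter is raised as a precaution so the solver converges on the raw features):

    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score

    model = LogisticRegression(max_iter=5000)
    model.fit(x_train, y_train)     # training: features and their labels

    y_pred = model.predict(x_test)  # testing: predicted output for unseen samples
    print("Weighted F1-score:", f1_score(y_test, y_pred, average="weighted"))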

Visualizing:
For evaluation we also use a graphical representation, which is easy to understand at a glance. The
predicted output is shown in blue and the actual output in yellow; where the yellow points overlap
the blue ones, the prediction is accurate, otherwise it is not.

Model Evaluation:
To evaluate the model, we consider the following metrics: precision, recall, F1-score and support.
These metrics quantify model performance and are helpful in improving the model.
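
Continuing the sketch, scikit-learn reports all of these metrics in one call:

    from sklearn.metrics import classification_report, confusion_matrix

    print(confusion_matrix(y_test, y_pred))       # true/false positives and negatives per class
    print(classification_report(y_test, y_pred))  # precision, recall, f1-score and support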

Accuracy:
Accuracy answers questions such as: how well does the model work, and can the model make good
predictions? It is the proportion of correct predictions among all predictions made.
