Industrial Report
Submitted to:
Mr. S. S. Shekhawat
Head of Dept.
https://trainings.internshala.com/verify_certificate
SESSION 2020-2021
DATA SCIENCE REPORT
Acknowledgement
I take this opportunity to express my deep sense of gratitude to my coordinator Mr. Loveleen
Kumar, Assistant Professor, Department of Computer Science and Engineering, Global Institute
of Technology, Jaipur, for his valuable guidance and cooperation throughout the Practical Training
work. He provided constant encouragement and unceasing enthusiasm at every stage of the
Practical Training work.
I am grateful to our respected Dr. I. C. Sharma, Principal, GIT, for guiding me during the Practical
Training period.
I express my indebtedness to Mr. S. S. Shekhawat, Head of the Department of Computer Science
and Engineering, Global Institute of Technology, Jaipur, for providing ample support during my
Practical Training period.
Without their support and timely guidance, the completion of my Practical Training would have
seemed a far-fetched dream. In this respect, I find myself lucky to have mentors of such great
potential.
Amit Khandelwal
18EGJCS010
B.Tech. V Semester, III Year, CS
Abstract
In this project, we were asked to experiment with a real-world dataset and to explore how machine
learning can be used to find patterns in data. We were expected to gain experience using a
common data-mining and machine learning library, and to submit a report about the dataset and
the algorithms used. After performing the required tasks on a dataset of my choice, herein lies
my final report.
Table of Contents
Certificate
Abstract
1. History of Python
1.3.1 List
1.3.2 Dictionary
1.3.3 Tuple
1.3.4 Sets
1.4 File Handling
1.5 NumPy
2.7 ML Algorithms
3. Introduction to Statistics
3.1 Terminologies
3.2 Types of Analysis
3.3 Categories
3.3.1 Descriptive
3.3.2 Inferential
4. Conclusion
5. References and Bibliography
List of Figures
Fig 1.1 Program of 'Hello World'
Fig 1.2 Example of file handling code in Python
Fig 1.3 Example of dictionary
Fig 1.4 Output of file handling
Fig 1.5 NumPy Example 1
Fig 1.6 NumPy Example 2
Fig 1.7 Pandas example
Fig 2.1 Process of ML
Fig 2.2 Optimization
Fig 2.3 Relation to statistics
Fig 2.4 ML & AI Relation
Fig 2.5 ML vs Traditional programming
Fig 2.6 Traditional Approach
Fig 2.7 ML Approach
Fig 2.8 ML technique
Fig 2.9 ML Application
Fig 2.10 Supervised learning
Fig 2.11 Unsupervised learning
Fig 2.12 Types of Unsupervised learning
Fig 2.13 Labeled and Unlabeled
Fig 2.14 Reinforcement learning
Fig 2.15 Linear regression graph
Fig 2.17 Multiple linear regression
Fig 2.18 Decision tree
Fig 2.19 A single decision tree vs a bagging ensemble of 500 trees
Fig 2.20 Logistic regression
Chapter 1.
History of Python
Python was developed in the late 1980s by Guido van Rossum at the National Research
Institute for Mathematics and Computer Science in the Netherlands as a successor to the
ABC language, capable of exception handling and interfacing. Python features a
dynamic type system and automatic memory management. It supports multiple
programming paradigms, including object-oriented, imperative, functional and
procedural, and has a large and comprehensive standard library.
Van Rossum picked the name Python for the new language from a TV show, Monty
Python's Flying Circus.
In December 1989, the creator developed the first Python interpreter as a hobby, and then
on 16 October 2000, Python 2.0 was released with many new features.
...In December 1989, I was looking for a "hobby" programming project that would
keep me occupied during the week around Christmas. My office ... would be closed,
but I had a home computer, and not much else on my hands. I decided to write an
interpreter for the new scripting language I had been thinking about lately: a
descendant of ABC that would appeal to Unix/C hackers. I chose Python as a working
title for the project, being in a slightly irreverent mood (and a big fan of Monty
Python's Flying Circus).
The language's core philosophy is summarized in the document The Zen of Python
(PEP 20), which includes aphorisms such as…
Readability counts.
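These aphorisms are built into the interpreter itself; a one-line sketch to display them:

```python
# Importing the built-in "this" module prints the Zen of Python (PEP 20)
import this
```

Running it prints all of the published aphorisms, including "Readability counts."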
1.3.1 Lists-
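The original example is a screenshot and is not reproduced here; a minimal equivalent sketch of list construction and common operations:

```python
# Construct a list; lists are ordered, mutable, and allow mixed types
my_list = [1, 2, 3, "four"]

my_list.append(5)        # add an element at the end
my_list[0] = 100         # lists are mutable
print(my_list[:2])       # slicing works the same as for strings
print(len(my_list))
```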
1.3.2 Dictionary-
Constructing a dictionary.
Nesting Dictionaries.
Basic Syntax
• d = {} generates an empty dictionary; keys and values can then be assigned to it, like d['animal'] = 'Dog'
• d = {'K1': 'V1', 'K2': 'V2'}
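A runnable sketch of the constructions above, including nesting:

```python
# Empty dictionary; assign keys and values afterwards
d = {}
d["animal"] = "Dog"

# Literal construction
d2 = {"K1": "V1", "K2": "V2"}

# Nesting: dictionaries can hold other dictionaries
nested = {"outer": {"inner": 42}}
print(nested["outer"]["inner"])   # prints 42
```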
1.3.3 Tuples-
No type restriction.
Indexing and slicing work the same as for strings and lists.
Constructing tuples.
Immutability.
We can use tuples to represent things that shouldn't change, such as days of the
week, or dates on a calendar.
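A short sketch illustrating tuple construction, indexing, slicing, and immutability:

```python
# Tuples are constructed with parentheses (or just commas)
days = ("Mon", "Tue", "Wed")

print(days[0])       # indexing, same as lists and strings
print(days[1:])      # slicing too

# Immutability: item assignment raises a TypeError
try:
    days[0] = "Sun"
except TypeError as e:
    print("tuples are immutable:", e)
```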
1.3.4 Sets-
A set contains unique, unordered elements, and we can construct one with the
set() function.
l = [1, 2, 3, 4, 1, 1, 2, 3, 6, 7]
k = set(l)
k becomes {1, 2, 3, 4, 6, 7}
Basic Syntax-
x = set()
x.add(1)
x is now {1}
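Putting the set snippets above together as runnable code:

```python
# Duplicates are discarded when building a set from a list
l = [1, 2, 3, 4, 1, 1, 2, 3, 6, 7]
k = set(l)
print(k)            # {1, 2, 3, 4, 6, 7}

x = set()
x.add(1)
x.add(1)            # adding an existing element has no effect
print(x)            # {1}
```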
1.4 File Handling
Python too supports file handling, allowing users to read and write files along with many
other file operations. The concept of file handling exists in various other languages, but the
implementation is often complicated or lengthy; like other concepts in Python, it is easy and
short here. Python treats files as either text or binary, and this distinction is important. A text
file is a sequence of lines, each made up of characters and terminated by a special character
called the EOL (End of Line) character, such as the newline character. The EOL ends the
current line and tells the interpreter a new one has begun. Let's start with reading and
writing files.
We use the open() function in Python to open a file in read or write mode. As explained
above, open() returns a file object. It takes two arguments: the file name and the mode,
i.e., whether to read or write. So the syntax is open(filename, mode). Python provides
three main modes in which files can be opened:
• "r", for reading.
• "w", for writing.
• "a", for appending.
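A sketch of the three modes in action; the file name example.txt is illustrative:

```python
# "w" creates or overwrites, "a" appends, "r" reads
with open("example.txt", "w") as f:
    f.write("first line\n")

with open("example.txt", "a") as f:
    f.write("second line\n")

with open("example.txt", "r") as f:
    print(f.read())
```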
The program reads the words from the 101.txt file, prints all the words present in the
file, and reports how many times each word occurs.
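The original word-count program appears only as a screenshot in the report; the sketch below behaves as described. Since the actual 101.txt is not available, the example first creates a small stand-in file:

```python
from collections import Counter

# The report's 101.txt is not available, so create a small stand-in file first
with open("101.txt", "w") as f:
    f.write("the quick brown fox jumps over the lazy dog the end\n")

# Read the words from 101.txt and count how many times each occurs
with open("101.txt") as f:
    counts = Counter(f.read().split())

for word, n in counts.items():
    print(word, "occurs", n, "times")
```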
1.5 NumPy
Numeric, the ancestor of NumPy, was developed by Jim Hugunin. Another package,
Numarray, was also developed, with some additional functionalities. In 2005, Travis
Oliphant created the NumPy package by incorporating the features of Numarray into
Numeric. There are many contributors to this open-source project.
NumPy supports operations related to linear algebra; it has built-in functions for linear
algebra and random number generation.
First of all, we import the NumPy package; then we pass a list as input to a NumPy
function to create a matrix. Many more operations can be performed, such as taking
the sine of given values or creating a zero matrix; we can also load an image in the
form of an array.
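A minimal sketch of these operations, assuming NumPy is installed:

```python
import numpy as np

# Create a matrix (2-D array) from a Python list
a = np.array([[1, 2], [3, 4]])
print(a.shape)          # (2, 2)

# Element-wise sine of the given values
print(np.sin(np.array([0.0, np.pi / 2])))

# A zero matrix
print(np.zeros((2, 3)))
```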
1.6 Pandas
• Fast and efficient Data Frame object with default and customized indexing.
• Tools for loading data into in-memory data objects from different file
formats.
• Series
• DataFrame
• Panel
These data structures are built on top of NumPy arrays, which means they are fast.
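A small sketch of a Series and a DataFrame, assuming pandas is installed (note that Panel has been removed from recent pandas releases):

```python
import pandas as pd

# Series: one-dimensional labeled array with customized indexing
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(s["b"])                      # 20

# DataFrame: two-dimensional labeled table
df = pd.DataFrame({"name": ["x", "y"], "score": [1, 2]}, index=["r1", "r2"])
print(df.loc["r1", "score"])       # 1
```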
Chapter 2.
Machine Learning
Arthur Samuel, an American pioneer in the field of computer gaming and artificial
intelligence, coined the term "Machine Learning" in 1959.
Over the past two decades Machine Learning has become one of the mainstays of
information technology.
With the ever-increasing amounts of data becoming available there is good reason
to believe that smart data analysis will become even more pervasive as a necessary
ingredient for technological progress.
Data mining uses many machine learning methods, but with different goals;
on the other hand, machine learning also employs data mining methods as
"unsupervised learning" or as a preprocessing step to improve learner
accuracy.
Machine learning also has intimate ties to optimization: many learning problems
are formulated as minimization of some loss function on a training set of examples.
Loss functions express the discrepancy between the predictions of the model being
trained and the actual problem instances.
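As a concrete illustration, a mean-squared-error loss that expresses this discrepancy (a generic sketch, not tied to any particular model):

```python
# Mean squared error: average squared discrepancy between predictions and targets
def mse(predictions, actuals):
    return sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / len(actuals)

print(mse([2.5, 0.0, 2.0], [3.0, -0.5, 2.0]))
```

Training then amounts to choosing model parameters that minimize this quantity over the training set.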
Michael I. Jordan suggested the term data science as a placeholder to call the overall
field. Leo Breiman distinguished two statistical modelling paradigms: the data model
and the algorithmic model, wherein "algorithmic model" means more or less the
machine learning algorithms like Random Forest.
The Machine Learning revolution will stay with us for a long time, and so will the
future of Machine Learning.
Machine learning is an application of artificial intelligence (AI) that provides systems the
ability to automatically learn and improve from experience without being explicitly
programmed. Machine learning focuses on the development of computer programs that
can access data and use it to learn for themselves.
As more data gets added, Machine Learning training can be automated for learning
new data patterns and adapting its algorithm.
Fig 2.8 ML technique
Supervised learning is the machine learning task of learning a function that maps an input to
an output based on example input-output pairs. It infers a function from labeled training data
consisting of a set of training examples.
Fig 2.10 Supervised learning
• Voice Assistants
• Gmail Filters
• Weather Apps
Fig 2.11 Unsupervised learning
Here the task of the machine is to group unsorted information according to similarities,
patterns, and differences, without any prior training on the data.
Clustering
The most common unsupervised learning method is cluster analysis. It is used to find
data clusters so that each cluster has the most closely matched data.
Visualization Algorithms
Anomaly Detection
Semi-supervised learning is a class of machine learning tasks and techniques that also make
use of unlabeled data for training – typically a small amount of labeled data with a large amount
of unlabeled data.
Semi-supervised learning falls between unsupervised learning (without any labeled training
data) and supervised learning (with completely labeled training data).
Reinforcement Learning is a type of Machine Learning that allows the learning system to
observe the environment and learn the ideal behavior based on trying to maximize some notion
of cumulative reward. It differs from supervised learning in that labelled input/output pairs
need not be presented, and sub-optimal actions need not be explicitly corrected. Instead,
the focus is on finding a balance between exploration (of uncharted territory) and
exploitation (of current knowledge).
The learning system (agent) observes the environment, selects and takes certain
actions, and gets rewards in return (or penalties in certain cases).
The agent learns the strategy or policy (choice of actions) that maximizes its
rewards over time.
2.10.1 Regression
It includes many techniques for modeling and analyzing several variables, when the
focus is on the relationship between a dependent variable and one or more independent
variables (or 'predictors').
More specifically, regression analysis helps one understand how the typical value of
the dependent variable (or 'criterion variable') changes when any one of the independent
variables is varied, while the other independent variables are held fixed.
Linear regression is a linear approach for modeling the relationship between a scalar
dependent variable y and an independent variable x.
ŷ = wᵀx
where x is the vector of input features, w is the vector of weight parameters, and ŷ is the
predicted output.
y = wx + b
where b is the bias, i.e., the value of the output for zero input.
The graph shows dependent variable y plotted against two independent variables x1 and x2. It is
shown in 3D. More independent variables (if involved) will increase the dimensions further.
It represents a linear fit between multiple inputs and one output, typically:
Y = w1·x1 + w2·x2 + b
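A sketch of fitting Y = w1·x1 + w2·x2 + b by least squares with NumPy; the data here is synthetic, generated from known weights:

```python
import numpy as np

# Synthetic data generated from known weights w1=2, w2=-1 and bias b=3
rng = np.random.default_rng(0)
X = rng.random((100, 2))
y = 2 * X[:, 0] - 1 * X[:, 1] + 3

# Append a column of ones so the bias b is learned as an extra weight
A = np.hstack([X, np.ones((100, 1))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
print(w)   # approximately [2, -1, 3]
```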
Decision Trees are non-parametric models, which means that the number of parameters
is not determined prior to training. Such models will normally overfit the data.
Ensemble Learning uses the same algorithm multiple times or a group of different algorithms
together to improve the prediction of a model.
2.10.2 Classification
Logistic regression is widely used for binary classification problems. It can also be extended to
multi-class classification problems.
A binary dependent variable can have only two values, like 0 or 1, win or lose, pass or fail,
healthy or sick, etc.
The probability in logistic regression is often represented by the Sigmoid function (also
called the logistic function or the S-curve), represented as:
S(t) = 1 / (1 + e^(−t))
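A quick sketch of the sigmoid, which squashes any real input into the interval (0, 1):

```python
import math

def sigmoid(t):
    # S(t) = 1 / (1 + e^(-t))
    return 1.0 / (1.0 + math.exp(-t))

print(sigmoid(0))     # 0.5
print(sigmoid(6))     # close to 1
print(sigmoid(-6))    # close to 0
```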
SVMs are very versatile and are also capable of performing linear or
nonlinear classification, regression, and outlier detection.
2.11.1 Clustering
Clustering means grouping data points so that points within a cluster are more similar to
each other than to points in other clusters.
Fig 2.24 Iterative process to get data points in the best clusters possible
Step 5: Move points across clusters and re-calculate the distance from the centroid.
Step 6: Keep moving the points across clusters until the Euclidean distance is minimized.
One could plot the distortion against the number of clusters K. Intuitively, if K
increases, distortion should decrease, because the samples will be closer to their
assigned centroids. This plot is used in the Elbow method.
It indicates the optimum number of clusters at the position of the elbow, the point
beyond which the distortion decreases only marginally.
K-means is based on finding points close to cluster centroids. The distance between two points
x and y can be measured by the squared Euclidean distance between them in an m-dimensional
space.
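The distance computation at the heart of K-means can be sketched as follows; the centroids below are illustrative:

```python
# Squared Euclidean distance between two points in m-dimensional space
def sq_dist(x, y):
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

# Assign a point to the index of the nearest centroid
def nearest_centroid(point, centroids):
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))

centroids = [(0.0, 0.0), (10.0, 10.0)]
print(nearest_centroid((1.0, 2.0), centroids))   # 0, closer to the first centroid
```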
Classifying high-risk and low-risk patients from a patient pool
Chapter 3
Introduction to Statistics
Statistics is used to process complex problems in the real world so that Data Scientists and
Analysts can look for meaningful trends and changes in Data. In simple words, Statistics can be
used to derive meaningful insights from data by performing mathematical computations on it.
Several Statistical functions, principles, and algorithms are implemented to analyse raw data, build
a Statistical Model and infer or predict the result.
The field of Statistics has an influence over all domains of life: the stock market, life sciences,
weather, retail, insurance, and education, to name but a few.
The population is the set of sources from which data has to be collected.
A Sample is a subset of the Population
A Variable is any characteristic, number, or quantity that can be measured or counted.
A variable may also be called a data item.
A statistical parameter or population parameter is a quantity that indexes a family of
probability distributions, for example, the mean or median of a population.
Before we move any further and discuss the categories of Statistics, let’s look at the types of
analysis.
Fig 3.2 Types of Analysis – Math And Statistics For Data Science
For example, if I want to purchase a coffee from Starbucks, it is available in Short, Tall and Grande.
This is an example of Qualitative Analysis. But if a store sells 70 regular coffees a week, it is
Quantitative Analysis because we have a number representing the coffees sold per week.
Although the purpose of both these analyses is to provide results, Quantitative analysis provides a
clearer picture hence making it crucial in analytics.
1. Descriptive Statistics
2. Inferential Statistics
Descriptive Statistics uses the data to provide descriptions of the population, either through
numerical calculations or graphs or tables.
Descriptive Statistics helps organize data and focuses on the main characteristics of the data,
summarizing them through parameters.
Fig 3.3 Descriptive Statistics – Math and Statistics for Data Science
Suppose you want to study the average height of students in a classroom, in descriptive statistics
you would record the heights of all students in the class and then you would find out the maximum,
minimum and average height of the class.
Fig 3.4 Descriptive Statistics Example – Math and Statistics for Data Science
Inferential Statistics makes inferences and predictions about a population based on a sample of
data taken from the population in question.
Inferential statistics generalizes a large data set and applies probability to arrive at a conclusion. It
allows you to infer parameters of the population based on sample stats and build models on it.
Fig 3.5 Inferential Statistics – Math and Statistics for Data Science
So, if we consider the same example of finding the average height of students in a class, in
Inferential Statistics you take a sample set of the class, which is basically a few people from
the entire class. You group the class into tall, average, and short; in this method, you basically
build a statistical model and extend it to the entire population of the class.
Now let’s focus our attention on Descriptive Statistics and see how it can be used to solve
analytical problems.
When we try to represent data in the form of graphs, like histograms, line plots, etc., the data is
represented based on some kind of central tendency. Measures of central tendency, like the mean
and the median, and measures of the spread are used for statistical analysis. To better understand
Statistics, let's discuss the different measures with the help of an example.
Fig 3.6 Cars Data Set – Math and Statistics for Data Science
1. Cars
2. Mileage per Gallon (mpg)
3. Cylinder Type (cyl)
4. Displacement (disp)
5. Horse Power (hp)
6. Rear Axle Ratio (drat).
Before we move any further, let’s define the main Measures of the Centre or Measures of Central
tendency.
3.5 Measures of The Centre
Using descriptive Analysis, you can analyse each of the variables in the sample data set for mean,
standard deviation, minimum and maximum.
If we want to find out the mean or average horsepower of the cars among the population of
cars, we will check and calculate the average of all values. In this case, we’ll take the sum
of the Horse Power of each car, divided by the total number of cars:
If we want to find out the center value of mpg among the population of cars, we will
arrange the mpg values in ascending or descending order and choose the middle value. In
this case, we have 8 values, an even count, so we must take the average of the two
middle values.
Median = (22.8 + 23)/2 = 22.9
If we want to find out the most common type of cylinder among the population of cars, we
will check the value that is repeated the most. Here the cylinders come in two values, 4
and 6, and looking at the data set, the most recurring value is 6. Hence 6 is our Mode.
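All three measures can be computed with Python's standard statistics module. The values below are illustrative, not the report's actual data set:

```python
import statistics

# Hypothetical sample values standing in for the cars data set
mpg = [21.0, 22.8, 23.0, 18.7, 23.4, 24.4, 19.2, 25.0]
cyl = [6, 6, 4, 6, 4, 6, 4, 6]

print(statistics.mean(mpg))      # average of all values
print(statistics.median(mpg))    # 22.9: average of the two middle values (even count)
print(statistics.mode(cyl))      # 6: the most frequent cylinder count
```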
Just like the measures of the centre, we also have measures of the spread, which comprise
the following measures:
1. Range: It is a measure of how spread apart the values in a data set are.
2. Inter Quartile Range (IQR): It is the measure of variability, based on dividing a data set into
quartiles.
3. Variance: It describes how much a random variable differs from its expected value. It
entails computing squares of deviations.
1. Deviation is the difference of each element from the mean.
2. Population Variance is the average of the squared deviations.
3. Sample Variance is the sum of squared deviations from the mean divided by n − 1.
4. Standard Deviation: It is the measure of the dispersion of a set of data from its mean.
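A sketch computing these spread measures with the standard library, again on illustrative data:

```python
import statistics

data = [4, 8, 15, 16, 23, 42]    # illustrative sample

data_range = max(data) - min(data)            # 1. Range
q1, _, q3 = statistics.quantiles(data, n=4)   # quartiles
iqr = q3 - q1                                 # 2. Inter Quartile Range
pvar = statistics.pvariance(data)             # 3. Population variance (divide by n)
svar = statistics.variance(data)              #    Sample variance (divide by n - 1)
sd = statistics.stdev(data)                   # 4. Standard deviation (sample)

print(data_range, iqr, pvar, svar, sd)
```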
Now that we’ve seen the stats and math behind Descriptive analysis, let’s try to work it out in R.
To understand the characteristics of a general population, we take a random sample and analyze the
properties of the sample. We test whether or not the identified conclusion represents the population
accurately, and finally we interpret the results. Whether or not to accept the hypothesis depends
upon the percentage value that we get from the test.
Consider four boys, Nick, John, Bob and Harry who were caught bunking a class. They were asked
to stay back at school and clean their classroom as a punishment.
Fig 3.7 Inferential Analysis – Math and Statistics For Data Science
So, John decided that the four of them would take turns to clean their classroom. He came up with
a plan of writing each of their names on chits and putting them in a bowl. Every day they had to
pick up a name from the bowl and that person must clean the class.
Now it has been three days and everybody’s name has come up, except John’s! Assuming that this
event is completely random and free of bias, what is the probability of John not cheating?
Let’s begin by calculating the probability of John not being picked for a day:
The probability here is 75%, which is fairly high. Now, if John is not picked for three days in a
row, the probability drops down to 42%
P(John not picked for 3 days) = 3/4 × 3/4 × 3/4 ≈ 0.42
Now, let’s consider a situation where John is not picked for 12 days in a row! The probability
drops down to 3.2%. Thus, the probability of John cheating becomes fairly high.
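The arithmetic can be checked directly:

```python
# Four names in the bowl, one drawn per day: John avoids being picked with probability 3/4
p_day = 3 / 4

print(p_day ** 3)    # three days in a row: 0.421875, roughly 42%
print(p_day ** 12)   # twelve days in a row: roughly 0.032, i.e. 3.2%
```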
In order for statisticians to come to a conclusion, they define what is known as a threshold value.
Considering the above situation, if the threshold value is set to 5%, it would indicate that, if the
probability lies below 5%, then John is cheating his way out of detention. But if the probability is
above the threshold value, then John is just lucky, and his name isn’t getting picked.
The probability and hypothesis testing give rise to two important concepts, namely the null
hypothesis and the alternative hypothesis.
Therefore, in our example, if the probability of an event occurring is less than 5%, then it is a
biased event, which supports the alternative hypothesis.
Chapter 4.
Your client is a retail banking institution. Term deposits are a major source of income for a bank. A
term deposit is a cash investment held at a financial institution. Your money is invested for an
agreed rate of interest over a fixed amount of time, or term.
The bank has various outreach plans to sell term deposits to their customers such as email
marketing, advertisements, telephonic marketing and digital marketing. Telephonic marketing
campaigns still remain one of the most effective ways to reach out to people. However, they require
huge investment, as large call centres are hired to actually execute these campaigns. Hence, it is
crucial to identify the customers most likely to convert beforehand so that they can be specifically
targeted via call.
You are provided with the client data such as: age of the client, their job type, their marital status,
etc. Along with the client data, you are also provided with the information of the call such as the
duration of the call, day and month of the call, etc. Given this information, your task is to predict
whether the client will subscribe to a term deposit.
1. train.csv: Use this dataset to train the model. This file contains all the client and call details as
well as the target variable “subscribed”. You have to train your model using this file.
2. test.csv: Use the trained model to predict whether a new set of clients will subscribe to the
term deposit.
Variable Definition
ID Unique client ID
Age Age of the client
The following were the steps that I performed to solve this problem statement:
Importing all the necessary Machine learning libraries and loading the data.
So, 3,715 users out of a total of 31,647 have subscribed, which is around 12%. Let's now explore
the variables to get a better understanding of the dataset. We will first explore the variables
individually using univariate analysis, then we will look at the relation between the various
independent variables and the target variable. We will look at the correlation plot to see which
variables affect the target variable most.
Now we will check whether there are any null values in the dataset.
Next, we will start to build our predictive model to predict whether a client will subscribe to a term
deposit or not.
As sklearn models take only numerical input, we will convert the categorical variables into
numerical values using dummies. We will remove the ID variable, as its values are unique, and
then apply dummies. We will also remove the target variable and keep it in a separate variable.
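A sketch of this preprocessing step with pandas, on a hypothetical mini data set (the column names are illustrative):

```python
import pandas as pd

# Hypothetical stand-in for train.csv
train = pd.DataFrame({
    "ID": [101, 102, 103],               # unique identifier: dropped before modeling
    "job": ["admin", "technician", "admin"],
    "age": [30, 41, 35],
    "subscribed": ["yes", "no", "yes"],  # target variable, kept separately
})

target = train["subscribed"]
X = train.drop(columns=["ID", "subscribed"])

# One-hot encode the categorical variables so sklearn-style models accept them
X = pd.get_dummies(X)
print(X.columns.tolist())
```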
Fig 4.11 Using the decision tree function from the sklearn library
Since the target variable is yes or no, we will convert 1 and 0 in the predictions to yes and no
respectively.
Fig 4.13 Converting the target value from “yes” or “no” to “1” or “0”
Chapter 5.
Conclusion
Finally, when it comes to developing machine learning models of your own, you looked
at the choices of development languages, IDEs and platforms. The next thing you need
to do is start learning and practicing each machine learning technique. The subject is
vast in breadth, but each individual topic can be learned in a few hours, and the topics
are largely independent of one another. Take one topic at a time: learn it, practice it, and
implement its algorithm(s) in a language of your choice. This is the best way to start
studying Machine Learning. Practicing one topic at a time, you will soon acquire the
breadth that is eventually required of a Machine Learning expert.
Chapter 6.
References and Bibliography
1. https://www.simplilearn.com/
2. https://www.scribd.com/document/434622438/AN-INDUSTRIAL-TRAINING-
REPORT-pdf
3. https://www.wikipedia.org/
4. https://www.analyticsvidhya.com/blog/2015/08/comprehensive-guide-regression/
5. https://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/
6. https://data-flair.training/blogs/svm-support-vector-machine-tutorial/
7. https://towardsdatascience.com/
8. https://towardsdatascience.com/machine-learning-algorithms-in-laymans-terms-part-
1-d0368d769a7b
9. https://www.expertsystem.com
10. https://analyticsindiamag.com/7-types-classification-algorithms/
11. https://www.edureka.co/
12. https://medium.com/
13. https://data-flair.training/blogs/machine-learning-tutorial/
1 https://www.simplilearn.com/
2 https://www.lbef.org/machine-learning-projects-is-it-really-hard-to-manage/
3 https://www.wordstream.com/blog/ws/2017/07/28/machine-learning-applications
4 https://www.wikipedia.org/
5 https://www.youtube.com/
6 https://data-flair.training/blogs/svm-support-vector-machine-tutorial/