DS&ML 1

Part 1

Uploaded by

anoopkumar.m

DATA SCIENCE & MACHINE LEARNING (part - 1)

techworldthink • February 02, 2022

1. What is data science?

Data science is the field of study that combines domain expertise, programming
skills, and knowledge of mathematics and statistics to extract meaningful insights
from data. Data science practitioners apply machine learning algorithms to numbers,
text, images, video, audio, and more to produce artificial intelligence (AI) systems to
perform tasks that ordinarily require human intelligence. In turn, these systems
generate insights which analysts and business users can translate into tangible
business value.

Data Science can be defined as the study of data, where it comes from, what it
represents, and the ways by which it can be transformed into valuable inputs and
resources to create business and IT strategies.

Data science is the in-depth study of large amounts of data. It involves extracting
meaningful insights from raw, structured, and unstructured data, processed using the
scientific method, different technologies, and algorithms.

It is a multidisciplinary field that uses tools and techniques to manipulate the data so
that you can find something new and meaningful.

Data science relies on powerful hardware, programming systems, and efficient
algorithms to solve data-related problems. It is widely regarded as central to the
future of artificial intelligence.

Data Science is about data gathering, analysis and decision-making.

Data Science is about finding patterns in data through analysis and using them to
make predictions about the future.

By using Data Science, companies are able to make:

• Better decisions (should we choose A or B?)

• Predictive analysis (what will happen next?)

• Pattern discoveries (finding patterns or hidden information in the data)

Data Science is used in many industries in the world today, e.g. banking, consultancy,
healthcare, and manufacturing.

A Data Scientist requires expertise in several areas:

• Machine Learning

• Statistics

• Programming (Python or R)

• Mathematics

• Databases

2. Explain the different types of data.

Almost anything can be turned into data. Building a deep understanding of the
different data types is a crucial prerequisite for doing Exploratory Data Analysis
(EDA) and Feature Engineering for Machine Learning models. You may also need to
convert the data types of some variables in order to make appropriate choices for
visual encodings in data visualization and storytelling.

Most data can be categorized into 4 basic types from a Machine Learning perspective:

• Numerical data

• Categorical data

• Time-series data

• Text data

1. Numerical Data

Numerical data is any data whose data points are numbers. Statisticians also call
numerical data quantitative data. This data has meaning as a measurement, such as
house prices, or as a count, such as the number of residential properties in Los
Angeles or the number of houses sold in the past year.

Numerical data can be continuous or discrete. Continuous data can assume any value
within a range, whereas discrete data takes only distinct, separate values.

2. Categorical Data

Categorical data represents characteristics, such as a hockey player’s position, team,
or hometown. Categorical data can take numerical values. For example, we might use
1 for the colour red and 2 for blue. But these numbers don’t have a mathematical
meaning. That is, we can’t add them together or take the average.

In the context of supervised classification, categorical data would be the class label.
Examples include whether a person is a man or a woman, or whether a property is
residential or commercial.
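
Because category codes such as 1 and 2 carry no arithmetic meaning, a common way to feed categorical data to a model is one-hot encoding. The notes do not name this technique; the sketch below is an illustration only, and the function name one_hot_encode and the sample colours are made up for the example.

```python
def one_hot_encode(values):
    """Map each category to a 0/1 indicator vector, one position per category."""
    categories = sorted(set(values))  # fixed, reproducible column order
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the column that belongs to this category
        encoded.append(row)
    return categories, encoded


colours = ["red", "blue", "red", "green"]
categories, encoded = one_hot_encode(colours)
print(categories)  # ['blue', 'green', 'red']
print(encoded[0])  # [0, 0, 1]
```

Each colour gets its own column, so no colour is treated as "larger" than another.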

3. Time Series Data

Time series data is a sequence of numbers collected at regular intervals over some
period of time. It is especially important in fields such as finance. Each data point
has a temporal value attached to it, such as a date or a timestamp, so you can look
for trends over time.
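
One simple way to look for a trend in a time series is to smooth it with a moving average. This is a minimal sketch under that assumption; the notes do not prescribe a method, and the name moving_average and the sample prices are hypothetical.

```python
def moving_average(series, window):
    """Average each run of `window` consecutive values to smooth out noise."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Hypothetical daily prices, one value per day in time order.
daily_prices = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0]
print(moving_average(daily_prices, 3))  # [11.0, 12.0, 13.0, 14.0]
```

The smoothed values rise steadily, which makes the upward trend easier to see than in the raw series.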

4. Text Data

Text data is basically just words. Often the first thing you do with text is turn it
into numbers, for example with the bag-of-words representation, which counts how
often each word occurs.
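
The bag-of-words idea can be sketched in a few lines. The version below assumes the simplest possible tokenisation (lower-casing and splitting on whitespace); the function name bag_of_words and the two sample sentences are illustrative only.

```python
from collections import Counter


def bag_of_words(documents):
    """Turn each document into a vector of word counts over a shared vocabulary."""
    tokenised = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenised for word in tokens})
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)  # words that are absent count as 0
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


docs = ["the cat sat", "the dog sat on the mat"]
vocabulary, vectors = bag_of_words(docs)
print(vocabulary)  # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(vectors[1])  # [0, 1, 1, 1, 1, 2]
```

Word order is discarded; only the counts survive, which is why the representation is called a "bag".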

3. Differentiate between supervised and unsupervised learning algorithms.

1. Unsupervised Learning

As the name suggests, unsupervised learning is a machine learning technique in
which models are not supervised using a labelled training dataset. Instead, the models
themselves find hidden patterns and insights in the given data. It can be compared to
the learning that takes place in the human brain while learning new things. It can be
defined as:

Unsupervised learning is a type of machine learning in which models are trained using
an unlabeled dataset and are allowed to act on that data without any supervision.

Unsupervised learning cannot be directly applied to a regression or classification
problem because, unlike supervised learning, we have the input data but no
corresponding output data. The goal of unsupervised learning is to find the
underlying structure of a dataset, group the data according to similarities, and
represent the dataset in a compressed format.

Why use Unsupervised Learning?

Below are some of the main reasons why unsupervised learning is important:

• Unsupervised learning is helpful for finding useful insights from the data.

• Unsupervised learning resembles the way humans learn to think from their own
experiences, which is often seen as bringing it closer to real AI.

• Unsupervised learning works on unlabeled and uncategorized data, which makes it
widely applicable.

• In the real world, we do not always have input data with the corresponding output,
so we need unsupervised learning to handle such cases.

Types of Unsupervised Learning Algorithm:

• Clustering: Clustering is a method of grouping objects into clusters such that
objects with the most similarities remain in one group and have few or no
similarities with the objects of another group. Cluster analysis finds the
commonalities between the data objects and categorizes them according to the
presence and absence of those commonalities.

• Association: Association rule learning is an unsupervised learning method used for
finding relationships between variables in a large database. It determines the sets
of items that occur together in the dataset. Association rules can make a marketing
strategy more effective. For example, people who buy item X (say, bread) also tend
to purchase item Y (butter or jam). A typical application of association rules is
Market Basket Analysis.
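
The bread-and-butter example can be made concrete with the two standard measures used in association rule mining: support (how often the items occur together) and confidence (how often Y appears in baskets that contain X). The baskets and function names below are invented for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for basket in transactions if items <= set(basket))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base else 0.0


# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "jam"},
    {"bread", "butter", "milk"},
    {"milk"},
]
print(support(baskets, {"bread", "butter"}))       # 0.5
print(confidence(baskets, {"bread"}, {"butter"}))  # about 0.667
```

Here the rule "bread implies butter" holds in two of the three baskets that contain bread.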

Unsupervised Learning algorithms:

Below is the list of some popular unsupervised learning algorithms:

• K-means clustering

• Hierarchical clustering

• Anomaly detection

• Neural Networks

• Principal Component Analysis

• Independent Component Analysis

• Apriori algorithm

• Singular value decomposition
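
As an illustration of the first algorithm in the list, here is a minimal K-means sketch. It assumes a deliberately naive initialisation (the first k points become the starting centroids) and a fixed number of iterations; practical implementations usually pick the starting centroids more carefully. The function name and sample points are hypothetical.

```python
import math


def kmeans(points, k, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    centroids = [tuple(p) for p in points[:k]]  # naive init: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its closest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # leave the centroid in place if its cluster is empty
                centroids[c] = tuple(
                    sum(coords) / len(members) for coords in zip(*members)
                )
    return centroids, labels


points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (8.0, 9.0), (2.0, 1.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

No labels are supplied; the two groups emerge purely from the distances between points.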

Advantages of Unsupervised Learning

• Unsupervised learning is used for more complex tasks than supervised learning
because, in unsupervised learning, we don't have labeled input data.

• Unsupervised learning is preferable when labels are scarce, as unlabeled data is
easier to obtain than labeled data.

Disadvantages of Unsupervised Learning

• Unsupervised learning is intrinsically more difficult than supervised learning as it
does not have corresponding output.

• The result of an unsupervised learning algorithm might be less accurate because the
input data is not labeled and the algorithm does not know the exact output in advance.

2. Supervised Machine Learning

Supervised learning is a type of machine learning in which machines are trained
using well "labelled" training data and, on the basis of that data, predict the
output. Labelled data means that the input data is already tagged with the correct
output.

In supervised learning, the training data provided to the machine works as the
supervisor that teaches the machine to predict the output correctly. It applies the
same concept as a student learning under the supervision of a teacher.

Supervised learning is a process of providing input data as well as correct output data
to the machine learning model. The aim of a supervised learning algorithm is to find
a mapping function that maps the input variable (x) to the output variable (y).

In the real world, supervised learning can be used for risk assessment, image
classification, fraud detection, spam filtering, etc.

How Does Supervised Learning Work?

In supervised learning, models are trained using a labelled dataset, where the model
learns about each type of data. Once the training process is completed, the model is
tested on test data (a held-out subset of the labelled data that was not used for
training), and then it predicts the output.

Steps Involved in Supervised Learning:

• First, determine the type of training dataset.

• Collect the labelled training data.

• Split the labelled dataset into a training set, a validation set, and a test set.

• Determine the input features of the training dataset, which should carry enough
information for the model to predict the output accurately.

• Choose a suitable algorithm for the model, such as a support vector machine or a
decision tree.

• Execute the algorithm on the training set. Sometimes a validation set, held out
from the training data, is needed to tune control parameters (hyperparameters).

• Evaluate the accuracy of the model on the test set. If the model predicts the
correct outputs for most test examples, the model is accurate.
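
The splitting and evaluation steps above can be sketched as follows. The 60/20/20 proportions, the fixed random seed, and the function names are assumptions chosen for the example, not values given in the notes.

```python
import random


def split_dataset(examples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the examples, then cut them into train, validation, and test parts."""
    shuffled = examples[:]  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains is the test set
    return train, val, test


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Hypothetical labelled data: (input, label) pairs.
examples = [(x, x >= 5) for x in range(10)]
train, val, test = split_dataset(examples)
print(len(train), len(val), len(test))         # 6 2 2
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))    # 0.75
```

Keeping the three parts disjoint is what lets the test accuracy say something about unseen data.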

Types of supervised Machine learning Algorithms:

Supervised learning can be further divided into two types of problems:

1. Regression

Regression algorithms are used when there is a relationship between the input variable
and a continuous output variable. They are used to predict continuous quantities, for
example in weather forecasting or market trend analysis. Below are some popular
regression algorithms which come under supervised learning:

• Linear Regression

• Regression Trees

• Non-Linear Regression

• Bayesian Linear Regression

• Polynomial Regression
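
To illustrate the first entry in the list, the sketch below fits a straight line y = slope * x + intercept by ordinary least squares, using the closed-form solution for a single input variable. The function name fit_line and the sample data are made up for the example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both without the 1/n factor).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)        # 2.0 1.0
print(slope * 5.0 + intercept)  # 11.0, the prediction for x = 5
```

The fitted line is the mapping from input to output learned from the labelled pairs.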

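2. Classification

Classification algorithms are used when the output variable is categorical, which
means it takes one of a fixed set of classes, such as Yes-No, Male-Female, or
True-False (problems can also have more than two classes). Spam filtering is a
typical example. Below are some popular classification algorithms which come under
supervised learning: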

• Random Forest

• Decision Trees

• Logistic Regression

• Support Vector Machines

Advantages of Supervised learning:

• With the help of supervised learning, the model can predict the output on the
basis of prior experience.

• In supervised learning, we can have an exact idea about the classes of objects.

• Supervised learning models help us to solve various real-world problems such
as fraud detection and spam filtering.

Disadvantages of supervised learning:

• Supervised learning models may struggle with very complex tasks.

• Supervised learning may fail to predict the correct output if the test data differs
substantially from the training dataset.

• Training can require a lot of computation time.

• In supervised learning, we need enough knowledge about the classes of objects.

Supervised machine learning algorithms

1. Decision Trees

2. Naive Bayes Classification

3. KNN (k-nearest neighbours)
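
The notes only name KNN, so the following is a minimal sketch of the usual idea: classify a new point by a majority vote among its k closest labelled neighbours. It assumes Euclidean distance and k = 3; the function name and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the most common label among the k training points nearest to `query`."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


train_points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
train_labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_points, train_labels, (2, 2)))      # A
print(knn_predict(train_points, train_labels, (8.5, 8.5)))  # B
```

There is no separate training phase: the labelled examples themselves act as the model.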

