Datascience Notes

Data science deals with extracting knowledge and insights from large amounts of data using techniques like data preparation, statistics, predictive modeling, and machine learning. Its findings can be applied across sectors like healthcare, education, and travel. Machine learning is a subset of artificial intelligence that uses algorithms to analyze patterns in data and make data-driven decisions without human interaction. It allows companies to unlock value from corporate and customer data to gain insights and stay ahead of competitors. Common mistakes in data preparation include not having complete or unbiased historical data, building models with insufficient data, and not properly cleaning data by removing outliers and duplicates.

Uploaded by

PGNSeetha

Why Data Science?

Data science is an interdisciplinary field concerned with the processes and systems used to
extract knowledge or insights from large amounts of data.
By combining data preparation, statistics, predictive modeling and machine learning, data science
tries to resolve issues within individual sectors and in the economy at large.
One common aim is to understand customers at an individual, personalized level.
Its findings can be applied in almost any sector, including travel, healthcare and education.

Why machine learning?

Machine learning (ML) is a subset of artificial intelligence (AI). It is the area of computer
science that focuses on analyzing and interpreting patterns and structures in data to enable
learning, reasoning, and decision making without human intervention. Simply put, machine
learning lets the user feed an algorithm a very large amount of data and have the computer
analyze it and make recommendations and decisions based only on that input data. When
corrections are identified, the algorithm can incorporate them to improve its future
decisions.
Data is the lifeblood of business. Data-driven decisions increasingly make the difference
between keeping up with the competition and falling behind. Machine learning can be the
key to unlocking the value of corporate and customer data and to making decisions that keep a
company ahead of its competitors.

• The heavily hyped, self-driving Google car? The essence of machine learning.

• Online recommendation offers such as those from Amazon and Netflix? Machine learning applications for everyday life.

• Knowing what customers are saying about you on Twitter? Machine learning combined with linguistic rule creation.

• Fraud detection? One of the more obvious, important uses in our world today.

Data Preparation Process

Step 1: Select Data

• What is the extent of the data you have available?
• What data is not available that you wish you had?
• What data don't you need to address the problem?

Step 2: Preprocess Data

• Formatting
• Cleaning
• Sampling

Step 3: Transform Data

• Scaling
• Decomposition
• Aggregation
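
The preprocessing and transformation steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the records in `RAW`, the field layout, and every function name are hypothetical and do not come from the notes, and min-max scaling is just one of several possible scaling choices.

```python
import random
from collections import defaultdict
from datetime import date

# Hypothetical raw records (customer, purchase date, amount as text).
RAW = [
    ("ann", "2021-03-15", "10.0"),
    ("bob", "2021-03-16", "30.0"),
    ("ann", "2021-04-01", ""),      # missing amount
    ("bob", "2021-04-02", "20.0"),
    ("cat", "2021-04-03", "40.0"),
]

def preprocess(rows):
    """Formatting and cleaning: parse amounts to float, drop rows with no amount."""
    return [(name, day, float(text)) for name, day, text in rows if text.strip()]

def sample(rows, k, seed=0):
    """Sampling: draw k rows without replacement, seeded so runs are repeatable."""
    return random.Random(seed).sample(rows, k)

def min_max_scale(values):
    """Scaling: map values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def decompose(day):
    """Decomposition: split an ISO date string into separate parts."""
    d = date.fromisoformat(day)
    return {"year": d.year, "month": d.month, "day": d.day}

def aggregate(rows):
    """Aggregation: total amount per customer."""
    totals = defaultdict(float)
    for name, _, amount in rows:
        totals[name] += amount
    return dict(totals)

clean = preprocess(RAW)
subset = sample(clean, 2)
scaled = min_max_scale([amount for _, _, amount in clean])
parts = decompose(clean[0][1])
totals = aggregate(clean)
```

Each function covers one bullet, so the steps can be reordered or swapped out independently. In practice a data-frame library would usually replace these helpers.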

Common mistakes in the data cleaning process

• Historical data is not accurately available. This is a common system constraint in organizations that have no data warehouse in place, or whose base systems overwrite data and thereby erase historical information.
• Data is collected only for positive outcomes
• No unbiased data set is available
• Including data from a period that is no longer valid
• Using variables that can shift when customer behavior changes
• Building a model on thin data
• Not removing outliers
• Not removing duplicates
• Not treating zero, null and special values carefully
• Adding an ID as a variable
• Not being hypothesis-driven when creating calculated or transformed variables
• Not spending enough time thinking about transformations
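
Several of the mistakes above (keeping an ID as a variable, mishandling special values, leaving duplicates and outliers in) can be checked mechanically. The sketch below is a minimal illustration, not a prescribed procedure: the rows, the sentinel values in `MISSING`, the function names, and the 1.5 × IQR outlier rule are all assumptions made for the example.

```python
import statistics

# Hypothetical sentinel values that stand for "unknown".
# Zero is deliberately NOT included: a zero can be a genuine measurement.
MISSING = {None, "", "NA", -999}

# Hypothetical customer rows, not taken from the notes.
ROWS = [
    {"id": 1, "age": 34, "spend": 120.0},
    {"id": 2, "age": 41, "spend": 95.0},
    {"id": 3, "age": 41, "spend": 95.0},     # same as id 2 apart from the id
    {"id": 4, "age": -999, "spend": 110.0},  # sentinel for unknown age
    {"id": 5, "age": 29, "spend": 105.0},
    {"id": 6, "age": 38, "spend": 5000.0},   # suspiciously large
    {"id": 7, "age": 52, "spend": 100.0},
]

def drop_id(rows, key="id"):
    """Remove the identifier so it cannot be used as a model variable."""
    return [{k: v for k, v in r.items() if k != key} for r in rows]

def mark_missing(rows):
    """Replace sentinel values with None so they are not read as real numbers."""
    return [{k: (None if v in MISSING else v) for k, v in r.items()} for r in rows]

def drop_duplicates(rows):
    """Keep the first occurrence of each distinct row."""
    seen, out = set(), []
    for r in rows:
        fingerprint = tuple(sorted(r.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            out.append(r)
    return out

def iqr_outliers(values, k=1.5):
    """Flag values more than k * IQR outside the quartiles (one common rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    return [v for v in values if v < q1 - k * spread or v > q3 + k * spread]

features = drop_duplicates(mark_missing(drop_id(ROWS)))
outliers = iqr_outliers([r["spend"] for r in features])
```

The ID is dropped before de-duplication so that rows differing only by ID are recognized as duplicates. Whether a flagged value should actually be removed remains a judgment call that depends on the problem.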
