FDS Unit1 Part1

Foundations of Data Science
Course Code - 23DS3PCFDS
PREPARED BY - LAKSHMI SHREE K
AI & DS
Overview of the Course

Semester: III
Course Title: Foundations of Data Science
Course Code: 23DS3PCFDS
Total Contact Hours: 40 hours
L-T-P: 3-0-0
Total Credits: 3

DEPT OF AI & DS
UNIT I
• Introduction to Data Science: Describing Data science, The data science
Venn diagram, Python for Data Science, Data science case studies
• Types of Data: structured versus unstructured data, quantitative versus
qualitative data, the four levels of data: nominal, ordinal, interval and ratio
• Total information awareness, Bonferroni’s Principle, Rhine’s paradox.
• The Data Science Process: Overview, defining research goals, retrieving
data, Cleansing, integrating and transforming data, exploratory data analysis,
Build the models, Presenting findings. Data Analytics Lifecycle.
UNIT II
• Statistics & Probability: Statistics, Obtaining data, Sampling Data,
Statistical measures, empirical rule. Points estimates, Sampling distributions,
Confidence intervals, Hypothesis Tests: Conducting a hypothesis test, one
sample t-tests, Type I and type II errors, Hypothesis testing for categorical
variables

• Information Gain & Entropy, Probability Theory, Probability Types,
Probability Distribution Functions, Bayes’ Theorem, Inferential Statistics

UNIT III
• Correlation Analysis: Types of correlation, correlation coefficient.
• Regression Analysis: Linear Regression: Simple Linear Regression,
Multilinear Regression, p-values, Logistic Regression, Multinomial logistic
regression, Time-Series Model, Receiver Operating Characteristic

UNIT IV
• Dealing with missing data: single and multiple data imputation, Entropy based techniques, Monte Carlo and
MCMC simulations;
• Correcting inconsistent data: Deduplication, Entity resolution, Pairwise Matching; Fellegi-Sunter Model

• Dimensionality Reduction: Eigenvalues and Eigenvectors of Symmetric Matrices: Definitions, Computing
Eigenvalues and Eigenvectors, Finding Eigen pairs by Power Iteration, Eigenvector matrix
• Principal-Component Analysis: Example, Using Eigenvectors for Dimensionality Reduction, The matrix of
distances
• Singular-Value Decomposition: Definition, interpretation, Dimensionality Reduction Using SVD, Why
Zeroing Low Singular Values Works, Querying Using Concepts, Computing the SVD of a Matrix

UNIT V
• Data Analytics on Text: Major Text Mining Areas – Information Retrieval
– Data Mining – Natural Language Processing (NLP) – Text analytics tasks:
Cleaning and Parsing, Searching, Retrieval, Text Mining, Part-of-Speech
Tagging, Stemming, Text Analytics Pipeline. NLP: Major components of
NLP, stages of NLP, and NLP applications.

Prescribed Text Book

Sl. No. | Book Title | Authors | Edition | Publisher | Year
1. | Principles of Data Science | Sinan Ozdemir, Sunil Kakade & Marco Tibaldeschi | Second Edition | Packt | 2018
2. | Fundamentals of Data Science | Sanjeev Wagh, Manisha Bhende, Anuradha Thakare | First Edition | CRC Press | 2022
3. | Introducing Data Science: Big Data, Machine Learning, and More | Davy Cielen, Arno D.B. Meysman, Mohamed Ali | - | Manning | 2016
Reference Text Book

Sl. No. | Book Title | Authors | Edition | Publisher | Year
1. | Doing Data Science | Rachel Schutt, Cathy O’Neil | - | O’Reilly | 2014
2. | Mining Massive Datasets | Jure Leskovec, Anand Rajaraman, Jeffrey D Ullman | 2nd | Dreamtech Press | 2016
E-Book
Sl. No. | Book Title | Authors | Edition | Publisher | Year | URL
1. | Data Science & Machine Learning | Dirk P. Kroese, Zdravko I. Botev, Thomas Taimre, Radislav Vaisman | - | University of Queensland | 2023 | https://people.smp.uq.edu.au/DirkKroese/DSML/DSML.pdf
2. | Becoming a Data Head | Alex J. Gutman, Jordan Goldmeier | - | Wiley | 2021 | https://32net.id/bukaheula/share/QP2cf2JLdeOPn00y3Nyu8aXHp1Slq1bc6P4YcuI4.pdf

MOOC Course
Sl. No. | Course name | Course Offered By | Year | URL
1. | IBM Data Science | Coursera | 2023 | https://www.coursera.org/professional-certificates/ibm-data-science
2. | Foundations of Data Science | SWAYAM | 2023 | https://onlinecourses.swayam2.ac.in/imb23_mg64/preview
Program Outcomes
PO1: Science and Engineering Knowledge
PO2: Problem Analysis
PO3: Design & Development
PO4: Investigations of Complex Problems
PO5: Modern Tool Usage
PO6: Engineer & Society
PO7: Environment and Society
PO8: Ethics
PO9: Individual & Team Work
PO10: Communication
PO11: Project Mgmt. & Finance
PO12: Lifelong Learning
Course Outcomes
At the end of the course the student will be able to

CO1: Gain fundamental knowledge of data science.
CO2: Analyze and visualize data for knowledge representation.
CO3: Demonstrate proficiency in data analysis.
CO4: Conduct experiments to demonstrate the use of various data science tools.

Overview of UNIT 1
• Introduction to Data Science:
• Describing Data science
• The data science Venn diagram
• Python for Data Science
• Data science case studies

• Types of Data:
• structured versus unstructured data
• quantitative versus qualitative data
• the four levels of data: nominal, ordinal, interval and ratio
• Total information awareness, Bonferroni’s Principle, Rhine’s paradox.

• The Data Science Process:
• Overview
• Defining research goals
• Retrieving data
• Cleansing, integrating and transforming data
• Exploratory data analysis
• Build the models
• Presenting findings
• Data Analytics Lifecycle.
Overview of UNIT 1
PART 1
• Introduction to Data Science
• Describing Data science
• The data science Venn diagram
• Python for Data Science
• Data science case studies

Data
• Industrial Age to Information Age
• Estimates put the amount of data at around 64 zettabytes
• Data is created whenever you send a message, tweet, like, share, create an MS Word document,
and so on.
• There is a huge amount of data in every industry
• Data leaks
• Making sense of the data: the Data Age
• Create insights and sources of knowledge that every human can benefit from.
History of Data Storage
How much data is created?
BIG DATA
• Data generated on the internet per minute
History of Data Science
• The art of uncovering insights and trends in data has been around since
ancient times.
• The ancient Egyptians used census data to increase efficiency in tax
collection and accurately predicted the Nile River's flooding every year.
• People have continued to use data to derive insights and predict outcomes.

Types of Digital Data
Makings of a skilled Data Scientist
• A data scientist needs a very curious mind and must be willing to spend significant time and effort exploring
hunches.
• Curious, argumentative, judgmental.
• Mathematical Sciences (Linear Algebra, Probability, Statistics, Calculus)
• Subject area knowledge
• Experience programming and analysing data
• Storyteller
• Adept at selecting suitable tools
• Apply expertise to problem-solving
• Diverse Background
Drew Conway’s Data Science Venn Diagram

Data Scientist's role in an organization
[Figure labels: Data, Stories, Insights, Clarify the problem, Recognition, Data Collection, Storytelling, Analysis, Visualization]
Data Science: The Sexiest Job of the 21st Century
• Because the digital revolution has touched every aspect of our lives, the
opportunity to benefit from learning about our behaviors is greater now
than ever before.
• Given the right data, marketers can take sneak peeks into our habit
formation.
• Research in neurology and psychology is revealing how habits and
preferences are formed and retailers like Target are out to profit from it.

Introduction to Data Science
• Data is a collection of information
• Organized data:
• Data is sorted into a row/column structure, where every row represents a
single observation and the columns represent the characteristics of that
observation.
• Unorganized data:
• Data is in free form, usually text, raw audio/video signals
Introduction to Data Science
• Data Science is all about how we take data, use it to acquire knowledge, and then use that
knowledge to do the following:
• Make decisions
• Predict the future
• Understand the past/present
• Create new industries/products

• Data Science is using data in order to gain new insights that you would otherwise have
missed.
Why Data Science?
• Parsing the huge volume of data in a reasonable time frame is difficult with
previous forms of analysis
• Data can be missing, incomplete or wrong
• Data may be on different scales, making it tough to compare
• Decisions based on analytics of the generated data are preferred over go-with-your-gut decisions


The Data Science Venn Diagram

Python Practices

Basic Logical Operators
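The slide's original example is not preserved in this text. As a minimal sketch, Python's comparison operators and the logical operators and, or and not could be illustrated as follows; the variable names and values are illustrative assumptions.

```python
# Illustrative values (not taken from the slides).
x = 5
y = 10

# Comparison operators evaluate to True or False.
is_equal = (x == y)
is_smaller = (x < y)

# Logical operators combine Boolean values.
both = (x > 0) and (y > 0)    # True only if both sides are True
either = (x > 7) or (y > 7)   # True if at least one side is True
negated = not (x > 7)         # flips a Boolean value

print(is_equal, is_smaller, both, either, negated)
```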

Example for Basic Python

Create a list and access the items
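The code shown on the slide is not preserved here. A minimal sketch of creating a list and accessing its items, using hypothetical city names, might look like this:

```python
# A list holds an ordered, mutable collection of items.
cities = ["Bengaluru", "Mysuru", "Hubballi"]  # hypothetical values

first = cities[0]        # indexing starts at 0
last = cities[-1]        # negative indices count from the end
first_two = cities[:2]   # slicing returns a new list

cities.append("Mangaluru")  # add an item to the end of the list

# enumerate() yields each position together with its item.
for position, city in enumerate(cities):
    print(position, city)
```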

Example for Basic Python
Parsing of a Tweet
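The slide's tweet-parsing code is not preserved in this text. As one possible sketch, a tweet can be split into words and scanned for stock tickers (words starting with "$") and mentions (words starting with "@"). The tweet text and variable names below are hypothetical.

```python
# Hypothetical tweet text, used only for illustration.
tweet = "RT @example_user: $TWTR is moving today, also watching $AAPL #stocks"

tickers = []
mentions = []
for word in tweet.split():
    token = word.strip(",.:;!?")  # drop surrounding punctuation
    if token.startswith("$") and len(token) > 1:
        tickers.append(token)
    elif token.startswith("@") and len(token) > 1:
        mentions.append(token)

print("Tickers:", tickers)
print("Mentions:", mentions)
```

Splitting on whitespace is the simplest approach; real tweets usually need more careful tokenization.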

Overview of UNIT 1
PART 2
• Types of Data
• Structured versus unstructured data
• Quantitative versus qualitative data
• The four levels of data:
• Nominal
• Ordinal
• Interval and
• Ratio
• Total information awareness, Bonferroni’s Principle, Rhine’s paradox.
Types of Digital Data
Structured Data
• Supports data insert, delete, update and append operations.
• Indexing to enable faster data retrieval.
• Scalability, which enables increasing or decreasing capacities and data processing
operations such as storing, processing and analytics.
• Transaction processing that follows the ACID rules (Atomicity, Consistency,
Isolation and Durability).
• Encryption and decryption for data security.
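Several of the features listed above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumed names: the table, columns and values are hypothetical, and an in-memory database stands in for a real one.

```python
import sqlite3

# In-memory database; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shops (id INTEGER PRIMARY KEY, name TEXT, revenue REAL)")

# An index speeds up lookups on the indexed column.
conn.execute("CREATE INDEX idx_shops_name ON shops (name)")

# Insert and update inside a transaction: the 'with' block commits on
# success and rolls back if an exception is raised (atomicity).
with conn:
    conn.execute("INSERT INTO shops (name, revenue) VALUES (?, ?)", ("Bean There", 1200.0))
    conn.execute("INSERT INTO shops (name, revenue) VALUES (?, ?)", ("Daily Grind", 950.0))
    conn.execute("UPDATE shops SET revenue = ? WHERE name = ?", (1000.0, "Daily Grind"))

# Delete a row in a second transaction.
with conn:
    conn.execute("DELETE FROM shops WHERE name = ?", ("Bean There",))

rows = conn.execute("SELECT name, revenue FROM shops ORDER BY id").fetchall()
print(rows)
conn.close()
```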
Semi-Structured Data
• Semi-structured data contains tags and other markers.
• The data does not conform to a formal data model such as a relational table.
• Examples of semi-structured data are XML and JSON documents.
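As a small sketch of what this means in practice, the JSON document below (a hypothetical example) labels every value with a key, yet the two records do not share the same set of fields. Python's standard json module parses it into lists and dictionaries.

```python
import json

# Hypothetical JSON document: keys act as tags, but the records
# do not follow one fixed schema.
raw = """
[
  {"name": "Asha", "email": "asha@example.com"},
  {"name": "Ravi", "phones": ["080-1111", "080-2222"], "city": "Bengaluru"}
]
"""
records = json.loads(raw)

for record in records:
    # .get() copes with fields that appear in only some records.
    city = record.get("city", "unknown")
    phone_count = len(record.get("phones", []))
    print(record["name"], city, phone_count)
```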
Unstructured Data
• Unstructured data is information that is not organized in a pre-defined manner or does not have a
pre-defined data model.
• It is in absolute raw form.
• It does not possess data features such as a table or a database.
• Some examples of unstructured data
• Mobile data: Text messages, chat messages, tweets, blogs and comments.
• Website content data: YouTube videos, browsing data, e-payments,
user-generated maps.
• Social media data: Images and videos from Instagram, Facebook, LinkedIn,
Flickr (upload, access, organize, edit and share photos from any device from
anywhere in the world).
• Satellite images, atmospheric data, surveillance, traffic videos.
Structured versus unstructured data

• Structured (organized) data – Observations and characteristics
organized in a table (rows and columns)
• Scientific observations
• Unstructured (unorganized) data – Data that exists as a free entity and does not follow
any standard organization hierarchy.
• Text data from server logs and Facebook posts
• Genetic sequences of chemical nucleotides (ACGTATTGCA)
Quantitative versus qualitative data

Quantitative versus qualitative data

• Name of the coffee shop – Qualitative data
• Revenue – Quantitative data
• Zip Code – Qualitative data
• Average Monthly customers – Quantitative data
• Country of coffee origin – Qualitative data
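The distinction in the coffee shop example can be sketched in Python: arithmetic is meaningful for quantitative fields such as revenue, while qualitative fields such as zip code are better counted than averaged, even though they look numeric. The records below are hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical coffee shop records.
shops = [
    {"name": "Bean There", "revenue": 1200.0, "zip_code": "560001", "origin": "Ethiopia"},
    {"name": "Daily Grind", "revenue": 950.0, "zip_code": "560034", "origin": "Colombia"},
    {"name": "Mocha Spot", "revenue": 1450.0, "zip_code": "560001", "origin": "Ethiopia"},
]

# Quantitative: arithmetic such as averaging is meaningful.
average_revenue = mean(shop["revenue"] for shop in shops)

# Qualitative: count the categories instead of averaging them.
zip_counts = Counter(shop["zip_code"] for shop in shops)

print(average_revenue)
print(zip_counts.most_common(1))
```

Storing the zip code as a string is a deliberate choice here, since it prevents it from being averaged by accident.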

Quantitative versus qualitative data

Four Levels of Data
• Nominal Level
• Ordinal Level
• Interval Level
• Ratio Level

Four Levels of Data

Four Levels of Data – ASSESSMENT -1

Four Levels of Data – ASSESSMENT -2

Levels of Data - Nominal Level
• This type of data is described purely by name or category
• Example: gender, nationality
• Mathematical operations are not allowed, except equality and set membership
• Membership is one-directional: every tech entrepreneur is in the tech industry,
but not everyone in the tech industry is a tech entrepreneur
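A minimal sketch of the operations that make sense at the nominal level, using hypothetical data. The use of the mode as the measure of center is an addition here (it is not stated on the slide), but it is the standard choice for categories.

```python
from statistics import mode

# Hypothetical nominal data.
nationalities = ["Indian", "Japanese", "Indian", "Brazilian", "Indian"]

# Equality and membership are valid at the nominal level.
same = nationalities[0] == nationalities[2]
has_japanese = "Japanese" in nationalities

# Set membership is one-directional (a subset relation).
tech_entrepreneurs = {"Asha", "Ravi"}
tech_industry = {"Asha", "Ravi", "Meera"}
entrepreneurs_in_industry = tech_entrepreneurs <= tech_industry
industry_all_entrepreneurs = tech_industry <= tech_entrepreneurs

# Assumption: the mode (most frequent category) as measure of center.
most_common = mode(nationalities)

print(same, has_japanese, entrepreneurs_in_industry,
      industry_all_entrepreneurs, most_common)
```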

Levels of Data - Ordinal Level
• Data provides a rank order
• Example: the Likert scale, an ordinal-level scale
• A survey asks users to rate a restaurant on a scale from 1 to 10
• Mathematical operations allowed
• Ordering – Ex: Spectrum of visible light
• Comparison
Levels of Data - Ordinal Level
• Mathematical operations not allowed
• Subtraction
• Addition
• The appropriate measure of center is the median

Levels of Data - Ordinal Level
• Measures of center
• Median
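As a small sketch, the median of a set of survey ratings can be computed with Python's statistics module. The ratings are hypothetical values on the 1 to 10 restaurant scale mentioned earlier.

```python
from statistics import median

# Hypothetical restaurant ratings on a 1-10 scale (ordinal data).
ratings = [7, 9, 3, 8, 8, 10, 6]

ordered = sorted(ratings)                       # ordering is meaningful
best_beats_worst = max(ratings) > min(ratings)  # so is comparison
center = median(ratings)                        # middle value after sorting

print(ordered, center)
```

The median depends only on the order of the values, which is why it suits ordinal data where differences between ratings are not meaningful.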

Levels of Data - Ordinal Level

Levels of Data - Interval Level
• Allows more complicated mathematical operations
• The ordinal level does not support subtraction, but the interval level does
• Median and mode can still describe the center of the data, but the arithmetic mean is usually the most accurate measure

Levels of Data - Interval Level

Levels of Data - Interval Level

Levels of Data - Interval Level
Example – calculate the variation
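The slide's worked example is not preserved in this text. One common way to measure variation at the interval level is the standard deviation; the sketch below uses hypothetical temperature readings and the population form of the formula, which is an assumption about what the slide intended.

```python
from statistics import mean, pstdev

# Hypothetical daily temperatures in degrees Celsius (interval data).
temperatures = [28, 30, 32]

center = mean(temperatures)

# Population standard deviation: the square root of the mean squared
# deviation from the mean.
spread = pstdev(temperatures)

print(center, round(spread, 2))
```

Subtraction is what makes this possible: each deviation from the mean is a difference, which is defined at the interval level but not at the ordinal level.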

Levels of Data - Ratio Level
• Mathematical operations supported are
• Addition and subtraction
• Multiplication and division
• The center can also be calculated using the geometric mean
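A minimal sketch of ratio-level operations, assuming hypothetical bank balances where zero means "no money". It uses statistics.geometric_mean, which is available from Python 3.8 onward.

```python
from statistics import geometric_mean, mean

# Hypothetical bank balances (ratio data with a true zero).
balances = [50.0, 100.0, 200.0]

# Division is meaningful: 100 is twice as much as 50.
ratio = balances[1] / balances[0]

arithmetic = mean(balances)

# Geometric mean: the n-th root of the product of the n values.
geometric = geometric_mean(balances)

print(ratio, round(arithmetic, 2), round(geometric, 2))
```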

Levels of Data - Ratio Level
