DA106 Week1 Material

The document provides an overview of data science, highlighting its multidisciplinary nature and the relationship with machine learning. It covers the data science workflow, applications in business and social good, and the importance of data categorization and generation. Additionally, it outlines the roles within data science and the distinctions between structured and unstructured data, as well as quantitative and qualitative data.

Uploaded by

siddarthmodi

Learning Objectives

o Introduction to data science
o Relationship between data science and artificial intelligence
o Comprehend the process of data collection and generation
o Learn about various data categorization

What is Data Science?

Data science is the field of study that combines
o Domain expertise
o Programming skills
o Knowledge of math and statistics
to extract meaningful insights from data.

What is Data Science?

Multidisciplinary field!

Data science practitioners apply machine learning algorithms to data such as numbers, text, images, video, and audio to develop systems to perform tasks which ordinarily require human intelligence.

Data Science and Machine Learning

The amount of data is growing exponentially due to the collection and storage of digital data.

“Machine learning”
● Computers automatically detect patterns and make predictions or decisions from data.
● Learn from data without relying on a predetermined mathematical model.
● Is a subset of Artificial Intelligence (AI).

Data Science and Machine Learning

Machine learning systems generate insights that analysts and business users translate into tangible business value.

Machine learning must solve useful problems; otherwise it is useless.

Data Science and Machine Learning

The actual data science workflow can be complex!

Applications of Data Science

BUSINESS
● Help businesses increase the business value of their available data for competitive advantage over their competitors.
● Understand customers better
● Make better decisions

Applications of Data Science

SOCIAL GOOD
Applications in:
○ Agriculture
○ Education
○ Disaster management
○ Environment
○ Transportation etc.

Example Applications

Credit Card Fraud Detection
● Formulate a supervised model to categorize each transaction as either fraud or no fraud.
● Ideally, you would have a good quantity of examples.

Features
● monetary amount
● frequency
● place
● period
● transaction information
● transaction class

Labels
● fraud
● no fraud
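
The supervised setup described for credit card fraud detection can be sketched in a few lines. This is only a minimal illustration, not the method used in the course: it assumes two of the listed features (monetary amount and frequency), uses made-up training examples, and swaps in a simple nearest-centroid classifier because it needs no external library. The names `TRAINING`, `fit_centroids`, and `predict` are hypothetical.

```python
from statistics import mean

# Hypothetical labelled transactions:
# (monetary amount, transactions in the last hour) -> label.
# The numbers are invented purely for illustration.
TRAINING = [
    ((12.0, 1), "no fraud"),
    ((30.0, 2), "no fraud"),
    ((25.0, 1), "no fraud"),
    ((950.0, 9), "fraud"),
    ((1200.0, 12), "fraud"),
    ((800.0, 8), "fraud"),
]


def fit_centroids(examples):
    """Compute the mean feature vector (centroid) of each label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: distance(centroids[label]))


centroids = fit_centroids(TRAINING)
print(predict(centroids, (20.0, 1)))      # a small, infrequent transaction
print(predict(centroids, (1000.0, 10)))   # a large, frequent transaction
```

The labels are what make this supervised: the model learns from examples already marked as fraud or no fraud. In practice the features would be scaled first, since the monetary amount otherwise dominates the distance.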

Example Applications

Customer Segmentation
● Use of an unsupervised model
● Find patterns among people who buy specific products.
● Build a targeted marketing campaign for these consumers.

Features
● products purchased
● location
● spending rate
● product manufacturers
● education
● income
● age

Labels
● None

Roles in Data Science

“Data Scientist is a person who is better at statistics than any programmer and better at programming than any statistician.”
-Josh Wills, Head of Data Engineering at Slack
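
Returning to the customer segmentation example above: because there are no labels, an unsupervised method has to find the groups on its own. The sketch below uses plain k-means as one possible choice (the material does not name a specific algorithm). The customer records, the two chosen features (age and monthly spending), and the deterministic starting centroids are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical customers: (age, monthly spending). No labels are available.
CUSTOMERS = [
    (22, 40.0), (25, 55.0), (24, 50.0),
    (58, 300.0), (61, 320.0), (55, 280.0),
]


def squared_distance(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid
    and recomputing each centroid as the mean of its members."""
    assignment = []
    for _ in range(iterations):
        assignment = [
            min(range(len(centroids)),
                key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == i]
            # Keep the old centroid if a cluster ends up empty.
            new_centroids.append(
                tuple(mean(c) for c in zip(*members)) if members else old
            )
        centroids = new_centroids
    return assignment, centroids


# Assumption: start from the first and last customer so the run is deterministic.
segments, centers = k_means(CUSTOMERS, [CUSTOMERS[0], CUSTOMERS[-1]])
print(segments)
print(centers)
```

Each resulting segment can then be targeted with its own marketing campaign. The segment numbers themselves carry no meaning; interpreting what each group represents is left to the analyst.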

Recap: What is Data Science?

[Block diagram: Problem Formulation, Data Collection & Processing, Data Analysis & Modeling, Insight/Prediction, Presentation]
An overly simplified block diagram

Skills:
● Domain expertise
● Programming skills
● Mathematical foundation

Tools:
● Machine learning
● Statistical tools
● R/Python libraries

Recap: What is Data Science?

Red flags!
o Taking shortcuts: not spending enough time with the data and the problem, but jumping to modeling
o Mindless use of ML tools
o Ethical breach
o New learners are the most susceptible…


Recap: What is Data Science?

Mindset:
o Insights that are significant to the problem
o Figuring out the non-obvious!
o Data-driven scientific mindset
o Spend a lot of time exploring data

Data Generation

Source of data
○ Need to capture information of physical/digital activities
○ E.g., sales, customer feedback, social media posts, speech, temperature, body movement etc.
○ As you can see, “everything under the sun” and “flowing through the internet”

Data Gathering

○ Collected using sensors, manual labour, or mining data from the web
○ Digital data collected by sensors
○ Manual annotation and data entry to computer from physical documents
○ Extract data from the internet using scripts

Data Generation

The resulting data is raw data
○ With specific formats (.raw/.mp4 for video, .wav for speech, .csv for tabular data etc.)
○ Not clean and not yet suitable for analysis
○ Need to clean: an important step in the data science workflow

Data Categories

● Structured versus unstructured data
● Quantitative versus qualitative data
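
To make the "need to clean" step for raw data concrete, here is a small sketch of what cleaning a raw .csv export might involve. The data, the column names (`city`, `temperature_c`), and the cleaning rules are assumptions chosen for illustration; real cleaning depends on the data set at hand.

```python
import csv
import io

# Hypothetical raw export: stray whitespace, inconsistent case,
# a missing value, and a duplicated record.
RAW_CSV = """city,temperature_c
 Pune ,31.5
MUMBAI,
delhi,38.0
Pune,31.5
"""


def clean_rows(raw_text):
    """Trim whitespace, normalise case, and drop incomplete or duplicate rows."""
    cleaned, seen = [], set()
    for row in csv.DictReader(io.StringIO(raw_text)):
        city = (row["city"] or "").strip().title()
        temperature = (row["temperature_c"] or "").strip()
        if not city or not temperature:
            continue  # skip records with a missing value
        record = (city, float(temperature))
        if record not in seen:  # skip exact duplicates
            seen.add(record)
            cleaned.append(record)
    return cleaned


print(clean_rows(RAW_CSV))
```

Only after a pass like this do averages or counts computed on the data become trustworthy.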

Structured and Unstructured Data

● Structured (organized) data:
○ This is data that can be thought of as observations and characteristics. It is usually organized using a table method (rows and columns).
● Unstructured (unorganized) data:
○ This data exists as a free entity and does not follow any standard organization hierarchy.

E.g.,
● Most data that exists in text form, including server logs and Facebook posts, is unstructured
● Scientific observations, as recorded by careful scientists, are kept in a very neat and organized (structured) format

Structured and Unstructured Data

● Data scientists likely prefer structured data, but they must also be able to deal with the world's massive amounts of unstructured data.
● An estimated 80-90% of the world's data is unstructured
● Data pre-processing is used to apply transformations to convert unstructured data into a structured counterpart
Source: lawtomated.com
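
As a rough illustration of such a pre-processing transformation, the sketch below turns a piece of free text into one structured row with fixed columns. The example post and the chosen columns (`char_count`, `word_count`, `has_special_char`, `ends_with_question`) are assumptions; many other characteristics could be extracted instead.

```python
def text_to_record(text):
    """Turn a free-text message into a fixed set of columns (one structured row)."""
    words = text.split()
    return {
        "char_count": len(text),
        "word_count": len(words),
        "has_special_char": any(ch in "&@#$%" for ch in text),
        "ends_with_question": text.rstrip().endswith("?"),
    }


# Hypothetical unstructured input, e.g. a social media post.
POST = "Server is back online & all checks pass. Anyone still seeing errors?"
print(text_to_record(POST))
```

Applying the same function to many posts yields a table with one row per post, which is the structured form that most analysis tools expect.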

Structured and Unstructured Data

Data pre-processing is used to apply transformations to convert unstructured data into a structured counterpart.

Key characteristics:
“This Wednesday morn, are you early to rise? Then look East. The Crescent Moon joins Venus & Saturn. Afloat in the dawn skies.”

Quantitative and Qualitative

● Quantitative data: This data can be described using numbers, and basic mathematical procedures, including addition, are possible on the set.
● Qualitative data: This data cannot be described using numbers and basic mathematics. This data is generally thought of as being described using natural categories and language.

Quantitative and Qualitative

● Name of a coffee shop: Qualitative
● Revenue: Quantitative
● Zip code: Qualitative
● Average monthly customers: Quantitative
● Country of coffee origin: Qualitative

Quantitative and Qualitative

For a quantitative column, you may ask questions such as the following:
o What is the average value?
o Does this quantity increase or decrease over time (if time is a factor)?
o Is there a threshold where if this number became too high or too low, it would signal trouble for the company?

For a qualitative column, none of the preceding questions can be answered. However, the following questions only apply to qualitative values:
o Which value occurs the most and the least?
o How many unique values are there?
o What are these unique values?

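
The two sets of questions map onto different operations. The sketch below uses made-up coffee shop records (the shop names, revenues, zip codes, and origins are all hypothetical) to show an average for a quantitative column and counts of unique values for a qualitative one. Zip codes are stored as strings on purpose, since averaging them would be meaningless.

```python
from collections import Counter
from statistics import mean

# Hypothetical coffee shop records; every value is invented.
SHOPS = [
    {"name": "Bean There", "revenue": 12000.0, "zip_code": "560001", "origin": "Ethiopia"},
    {"name": "Daily Grind", "revenue": 8000.0, "zip_code": "560001", "origin": "Brazil"},
    {"name": "Mocha Stop", "revenue": 10000.0, "zip_code": "560034", "origin": "Ethiopia"},
]

# Quantitative column: "What is the average value?"
average_revenue = mean(shop["revenue"] for shop in SHOPS)

# Qualitative column: count occurrences instead of averaging.
origin_counts = Counter(shop["origin"] for shop in SHOPS)
most_common_origin = origin_counts.most_common(1)[0][0]   # occurs the most
least_common_origin = origin_counts.most_common()[-1][0]  # occurs the least
unique_origins = sorted(origin_counts)                    # the unique values

print(average_revenue)
print(most_common_origin, least_common_origin)
print(len(unique_origins), unique_origins)
```
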
Summary of the module
o What is data science?
o Data science vs. machine learning
o Example applications
o Roles in data science
o Data generation
o Categories of data
