
Module 1

The document outlines a Data Science course, detailing course outcomes, syllabus modules, and key concepts in data science, including data collection, management, and analysis. It emphasizes the importance of data science in various industries and its applications, such as personalized healthcare and optimizing food delivery. Additionally, it distinguishes between data science and machine learning, explaining their respective roles and components.

Uploaded by

shindetrupti1507
Copyright
© All Rights Reserved

OE-Data Science

6OE371
COs
Course Outcomes (CO) with Bloom’s Taxonomy Level

CO1: Become acquainted with core concepts and technologies in Data Science. (Understanding)
CO2: Demonstrate data collection and management using different technologies. (Applying)
CO3: Study the key concepts in data science, including their real-world applications and toolkits used by data scientists. (Applying)
CO4: Analyse and interpret large data sets in the context of real-world problems. (Analysing)
Syllabus

Module I: Introduction to core concepts and technologies (4 hours)
Introduction, Terminology, data science process, data science toolkit, Types of data, Example applications.

Module II: Data Collection and Management (7 hours)
Introduction, Sources of data, Data collection, Exploring and fixing data, Data storage and management, Using multiple data sources.

Module III: Data Pre-processing (8 hours)
Data Cleaning, Data Integration, Data Reduction, Data Transformation and Data Discretization.

Module IV: Data Visualization (6 hours)
Introduction, Types of data visualization, Data for visualization: Data types, Data encodings, Retinal variables, Mapping variables to encodings, visual encodings.

Module V: Data Analysis (8 hours)
Introduction, Terminology and concepts, Introduction to statistics, Central tendencies and distributions, Variance, Distribution properties and arithmetic, Samples/CLT, Correlation, Linear Regression, Least Squares, Residuals, Regression Inference, classification, classifiers.

Module VI: Recent trends (6 hours)
Recent trends in various data collection and analysis techniques, various visualization techniques, Case Study, application development methods used in data science.
Introduction
• What is Data Science?
• Why Data Science?
• Components of DS
• Difference between ML and Data science
• Applications of Data Science.
What is Data Science?

• Data science is the in-depth study of massive amounts of data. It involves
extracting meaningful insights from raw, structured, and unstructured data
that is processed using the scientific method, different technologies, and
algorithms.
• Data science combines math and statistics, specialized programming,
advanced analytics, artificial intelligence (AI), and machine learning
with specific subject matter expertise to uncover actionable insights
hidden in an organization’s data.
What is Data Science?

• Data science is the study of data to extract meaningful insights for
business.
• It works on vast volumes of data, using modern tools and techniques to
find unseen patterns, derive meaningful information, and make business
decisions.
Why Data Science?
• The purpose of data science is to find patterns.
• Data science enables businesses to interact with their customers.
• Products and businesses can better connect with their customers
when they use data science.
• Industries can quickly examine their problems and successfully
address them using data science.
• How data is used can determine whether a product succeeds or fails.
• It gives management and officials the ability to foster new ideas.
• It leads to an improved user experience.
Components of DS
Difference between DS and ML
• Data Science: A field of computer science that extracts useful information from structured, unstructured, and semi-structured data.
  Machine Learning: A subset of Artificial Intelligence that makes computers capable of predicting outcomes based on training from past data/experience.

• Data Science: It primarily deals with data.
  Machine Learning: It uses data to learn from it and predict insights or results.

• Data Science: Its data may or may not have come from a machine or mechanical process.
  Machine Learning: It includes various techniques such as supervised, unsupervised, semi-supervised and reinforcement learning, regression, clustering, etc.

• Data Science: It is broadly used as a multidisciplinary term.
  Machine Learning: It is used in data science.

• Data Science: It includes various data operations such as cleaning, collection, manipulation, etc.
  Machine Learning: It includes operations such as data preparation, data analysis, training the model, etc.

• Data Science: It requires knowledge of various analytical functions and a basic understanding of machine learning and Artificial Intelligence.
  Machine Learning: It needs advanced knowledge of Data Modelling.

• Data Science: It requires strong knowledge of Python, R, SAS, Scala, as well as hands-on knowledge of SQL databases.
  Machine Learning: It requires knowledge of programming languages like Java, Python, R as well as in-depth knowledge of mathematical concepts such as probability and statistics.
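
The machine learning side of the comparison ("predicting outcomes based on training from old data") can be illustrated with a minimal sketch. It fits a straight line to past observations by least squares and uses it to predict a new value. The data (hours studied versus test score) and the names `fit_line` and `predict` are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept to past data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Use the fitted line to predict an unseen value."""
    return slope * x + intercept


# Hypothetical "old data": hours studied and the score obtained.
past_hours = [1, 2, 3, 4]
past_scores = [52, 54, 56, 58]

slope, intercept = fit_line(past_hours, past_scores)   # training step
predicted_score = predict(5, slope, intercept)         # prediction step

print(slope, intercept, predicted_score)  # 2.0 50.0 60.0
```

The training step learns two numbers from past data; the prediction step applies them to an input that was never observed.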
Applications of Data Science
• Image recognition and speech recognition
• Gaming world
• Internet search
• Healthcare
• Recommendation systems
• Risk detection
Data Science Life Cycle / process
Data science toolkit
Types of Data
Qualitative data/Categorical data
• Qualitative or Categorical Data describes the object under consideration
using a finite set of discrete classes.
• It means that this type of data can’t be counted or measured easily
using numbers and is therefore divided into categories.
• The gender of a person (male, female, or others) is a good example of
this data type.
• These are usually extracted from audio, image, or text media.
• Another example can be of a smartphone brand that provides
information about the current rating, the color of the phone, category
of the phone, and so on.
• All this information can be categorized as Qualitative data.
Nominal

• These are the set of values that don’t possess a natural ordering.
• e.g., the color of a smartphone, as we can’t rank one color against
another.
• It is not possible to state that ‘Red’ is greater than ‘Blue’.
• The gender of a person is another example: there is no ordering among
male, female, or others.
• Nominal data types in statistics are not quantifiable and cannot be
measured through numerical units.
• Nominal types of statistical data are valuable while conducting
qualitative research as it extends freedom of opinion to subjects.
Ordinal
• These types of values have a natural ordering while maintaining their class of values.
• e.g., clothing sizes can easily be sorted according to their name tag in
the order small < medium < large.
• The grading system used to mark candidates in a test can also be considered an
ordinal data type, where A+ is definitely better than a B grade.
• These categories help us decide which encoding strategy can be applied to which
type of data.
• Data encoding for Qualitative data is important because machine learning models
can’t handle these values directly; they need to be converted to numerical types, as
the models are mathematical in nature.
• For the nominal data type, where there is no comparison among the categories, one-hot
encoding can be applied. It is similar to binary coding and works well when the
number of categories is small. For the ordinal data type, label encoding can be
applied, which is a form of integer encoding.
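
The two encoding strategies can be sketched in plain Python, without any library. The sample values and the function names `one_hot_encode` and `ordinal_encode` are hypothetical. The integer encoding here takes an explicit category order, which is an assumption of this sketch and not something the slides prescribe.

```python
# Hypothetical example data; the category names are illustrative only.
colors = ["Red", "Blue", "Green", "Blue"]        # nominal: no natural order
sizes = ["small", "large", "medium", "small"]    # ordinal: small < medium < large


def one_hot_encode(values):
    """Map each value to a 0/1 vector with one position per distinct category."""
    categories = sorted(set(values))  # fixed column order; the order carries no meaning
    vectors = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, vectors


def ordinal_encode(values, ordered_categories):
    """Map each value to its rank in an explicitly supplied order."""
    rank = {category: i for i, category in enumerate(ordered_categories)}
    return [rank[v] for v in values]


color_columns, color_vectors = one_hot_encode(colors)
size_codes = ordinal_encode(sizes, ["small", "medium", "large"])

print(color_columns)  # ['Blue', 'Green', 'Red']
print(color_vectors)  # [[0, 0, 1], [1, 0, 0], [0, 1, 0], [1, 0, 0]]
print(size_codes)     # [0, 2, 1, 0]
```

One-hot vectors keep the colors unordered, while the integer codes preserve small < medium < large. Applying integer codes to nominal data would suggest an order that does not exist.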
Quantitative Data Type

• This data type quantifies things using numerical values, which makes
them countable or measurable.
• The price of a smartphone, the discount offered, the number of ratings on a
product, the processor frequency of a smartphone, and the RAM of that
particular phone all fall under the category of
Quantitative data types.
• The key thing is that there can be an infinite number of values a
feature can take.
• For instance, the price of a smartphone can vary from x amount to any
value and it can be further broken down based on fractional values.
• Interval-scaled attributes
• A type of numerical attribute where the difference between two values
is meaningful.
• The term "interval scale" refers to an ordered series of numbers
where the difference between the values is consistent, but the zero
point is not truly meaningful.
• Interval variables can be added and subtracted, providing meaningful
results.
• For example, temperature, as measured in degrees Celsius or
Fahrenheit, is an interval variable.
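• If it is 20 degrees today and 30 degrees tomorrow, it is correct to say
that tomorrow is 10 degrees hotter than today. It is not correct to say that
tomorrow is 1.5 times as hot, because 0 degrees does not mean "no temperature".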
• Ratio-scaled attribute
• Ratio variables are a type of numerical attribute where the difference
between two values is meaningful and there is a true "zero" point,
which denotes the absence of the quantity.
• This zero point allows for the comparison of values through
multiplication or division, unlike interval-scaled attributes.
• Examples of ratio variables include age, salary, and height.
• In these examples, a value of 0 signifies the absence of the quantity: 0
years old means no age or not born yet, a salary of $0 means no
income, and a height of 0 cm signifies no height.
• If person A is 20 years old and person B is 40 years old, it's correct to
say that person B is twice as old as person A.
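
A short sketch of why ratios are meaningful only on a ratio scale: changing units leaves the ratio of two ages unchanged, but changes the ratio of two Celsius temperatures. The temperatures and ages are the ones used in the examples above; the helper names are hypothetical.

```python
def celsius_to_fahrenheit(c):
    """Unit change for an interval scale: it shifts the zero point."""
    return c * 9 / 5 + 32


def years_to_months(y):
    """Unit change for a ratio scale: zero stays zero."""
    return y * 12


# Interval scale: 20 degrees today, 30 degrees tomorrow (Celsius).
today_c, tomorrow_c = 20.0, 30.0
ratio_celsius = tomorrow_c / today_c
ratio_fahrenheit = celsius_to_fahrenheit(tomorrow_c) / celsius_to_fahrenheit(today_c)

# Ratio scale: person A is 20 years old, person B is 40 years old.
age_a, age_b = 20, 40
ratio_years = age_b / age_a
ratio_months = years_to_months(age_b) / years_to_months(age_a)

print(ratio_celsius)               # 1.5
print(round(ratio_fahrenheit, 3))  # 1.265 (86 / 68): the "ratio" depends on the unit
print(ratio_years, ratio_months)   # 2.0 2.0: "twice as old" holds in any unit
```

The temperature ratio is an artifact of where each scale puts its zero, whereas the age ratio survives the unit change.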
Discrete
• Discrete data is a type of numerical data that only takes specific or
'discrete' values and cannot be meaningfully subdivided into smaller
increments. This often corresponds to items or events that are countable.
• Examples of discrete data include:
• The number of pets a person has. You can have 2 dogs or 3 dogs, but it
doesn't make sense to have 2.7 dogs.
• The number of cars in a parking lot. You can have 10, 20, or 30 cars, but
not 22.5 cars.
• The number of students in a class. You can't have a fraction of a student.
Continuous
• Continuous data is a type of numerical data that can take on any value within
a certain range.
• Continuous data can be meaningfully subdivided into finer and finer
increments, depending on the precision of the measurement system.
• Examples of continuous data include:
• The height of people. You can be 170.18 cm or 170.19 cm tall, or any height in
between.
• The time it takes to run a marathon. It could be 3 hours, 45 minutes, 30.2
seconds, or 3 hours, 45 minutes, 30.3 seconds, or any time in between.
• The weight of a bag of apples. It could be 1.5 kg, 1.51 kg, 1.515 kg, and so on,
depending on how precise your scale is.
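
The difference between discrete and continuous data can be sketched as counting versus measuring. The list of pets and the assumed "true" weight below are hypothetical values chosen to mirror the examples above.

```python
# Discrete: a count can only be a whole number.
pets = ["dog", "dog", "cat"]
pet_count = len(pets)

# Continuous: the same quantity reported by scales of increasing precision.
true_weight_kg = 1.5149  # assumed "true" weight of a bag of apples
readings = [round(true_weight_kg, digits) for digits in (1, 2, 3)]

print(pet_count)  # 3
print(readings)   # [1.5, 1.51, 1.515]
```

A more precise scale refines the continuous reading further, while the count of pets has nothing between 3 and 4 to refine.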
Exercise
Select the measurement scale Nominal, Ordinal, Interval or Ratio for
each scenario.
• A person’s age.
• A person’s race.
• Age groupings (baby, toddler, adolescent, teenager, adult, elderly).
• Clothing brand.
• A person’s IQ score.
• Temperature in degrees Celsius.
• The amount of mercury in a tuna fish.
Exercise
• Select the measurement scale Nominal, Ordinal, Interval or Ratio for each
scenario.
• Temperature in kelvin.
• Eye color.
• Year in school (freshman, sophomore, junior, senior).
• The weight of a hummingbird.
• The height of a building.
• The amount of iron in a person’s blood.
• A person’s gender.
• A person’s race.
Exercise
• State which type of variable each is, qualitative or quantitative?
• A person’s age.
• A person’s gender.
• The amount of mercury in a tuna fish.
• The weight of an elephant.
• Temperature in degrees Fahrenheit.
• State which type of variable each is, qualitative or quantitative?
• The height of a giraffe.
• A person’s race.
• Hair color.
• A person’s ethnicity.
• Year in school (freshman, sophomore, junior, senior).
Exercise
• State whether the variable is discrete or continuous.
• A person’s weight.
• The height of a building.
• A person’s age.
• The number of floors of a skyscraper.
• The number of clothing items available for purchase.
• State whether the variable is discrete or continuous.
• Temperature in degrees Celsius.
• The number of cars for sale at a car dealership.
• The time it takes to run a marathon.
• The amount of mercury in a tuna fish.
• The weight of a hummingbird.
Real life applications of data science
• PERSONALIZING TREATMENT PLANS
• Oncora’s software uses machine learning to create personalized
recommendations for current cancer patients based on data from
past ones. Healthcare facilities using the company’s platform include
UT Health San Antonio and Scripps Health. Their radiology team
collaborated with Oncora data scientists to mine 15 years’ worth of
data on diagnoses, treatment plans, outcomes and side effects from
more than 50,000 cancer records. Based on this data, Oncora’s
algorithm learned to suggest personalized chemotherapy and
radiation regimens.
Real life applications of data science
• OPTIMIZING FOOD DELIVERY
• The data scientists at UberEats have a fairly simple goal:
getting hot food delivered quickly. Making that happen across the
country, though, takes machine learning, advanced statistical
modeling and staff meteorologists. In order to optimize the full
delivery process, the team has to predict how every possible variable
— from storms to holiday rushes — will impact traffic and cooking
time.
Real life applications of data science
• TRACKING PHYSICAL DATA FOR ATHLETES
• WHOOP makes wearable devices that track athletes’ physical data like
resting heart rate, sleep cycle and respiratory rate. The goal is to help
athletes understand when to push their training and when to rest —
and to make sure they’re taking the necessary steps to get the most
out of their body. Professional athletes like Olympic sprinter Gabby
Thomas, Olympic golfer Nelly Korda and PGA golfer Nick Watney are
among WHOOP’s users, according to the company’s website.
Real life applications of data science
• SUGGESTING FRIENDS ON FACEBOOK
• Meta’s Facebook platform, of course, uses data science in various
ways, but one of its buzzier data-driven features is the “People You
May Know” sidebar, which appears on the social network’s home
screen. Often creepily prescient, it’s based on a user’s friend list, the
people they’ve been tagged with in photos and where they’ve worked
and gone to school. It’s also based on “really good math,” according
to the Washington Post — specifically, a type of data science known
as network science, which essentially forecasts the growth of a user’s
social network based on the growth of similar users’ networks.
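
As a rough illustration of suggesting friends from a friend list, the sketch below ranks people a user is not yet connected to by the number of friends they share. This is a common textbook heuristic, not Facebook's actual method, and the friendship graph and the function name `suggest_friends` are hypothetical.

```python
from collections import Counter

# Hypothetical friendship graph: each user maps to the set of their friends.
friends = {
    "ana": {"ben", "cara"},
    "ben": {"ana", "cara", "dev"},
    "cara": {"ana", "ben", "dev", "eli"},
    "dev": {"ben", "cara"},
    "eli": {"cara"},
}


def suggest_friends(user, graph):
    """Rank non-friends by how many friends they share with `user`."""
    mutual = Counter()
    for friend in graph[user]:
        for candidate in graph[friend]:
            # Skip the user and anyone they are already friends with.
            if candidate != user and candidate not in graph[user]:
                mutual[candidate] += 1
    # Most mutual friends first; ties broken alphabetically.
    return sorted(mutual.items(), key=lambda item: (-item[1], item[0]))


suggestions = suggest_friends("ana", friends)
print(suggestions)  # [('dev', 2), ('eli', 1)]
```

Here "dev" shares two friends with "ana" and so ranks above "eli", who shares one.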
