Module 1
6OE371
COs
Course Outcomes (CO) with Bloom’s Taxonomy Level
CO4: Analyse and interpret large data sets in the context of real-world problems. (Bloom’s Taxonomy Level: Analysing)
Syllabus
Module I: Introduction to core concepts and technologies (4 hours)
Introduction, terminology, data science process, data science toolkit, types of data, example applications.
Module II: Data Collection and Management (7 hours)
Introduction, sources of data, data collection, exploring and fixing data, data storage and management, using multiple data sources.
Data Science vs. Machine Learning
Data Science
• It primarily deals with data.
• The data may or may not have originated from a machine or mechanical process.
• It includes various data operations such as collection, cleaning, manipulation, etc.
• It requires knowledge of various analytical functions and a basic understanding of machine learning and artificial intelligence.
• It requires strong knowledge of Python, R, SAS, and Scala, as well as hands-on knowledge of SQL databases.
Machine Learning
• It uses data to learn and to predict insights or results.
• It includes various techniques such as supervised, unsupervised, semi-supervised, and reinforcement learning, regression, clustering, etc.
• It includes operations such as data preparation, data analysis, training the model, etc.
• It needs advanced knowledge of data modelling.
• It requires knowledge of programming languages such as Java, Python, and R, as well as in-depth knowledge of mathematical concepts such as probability and statistics.
Applications of Data Science
• Image recognition and speech recognition
• Gaming world
• Internet search
• Healthcare
• Recommendation systems
• Risk detection
Data Science Life Cycle / process
Data science toolkit
Types of Data
Qualitative / Categorical Data
• Qualitative or Categorical Data describes the object under consideration
using a finite set of discrete classes.
• This type of data cannot easily be counted or measured using numbers
and is therefore divided into categories.
• The gender of a person (male, female, or others) is a good example of
this data type.
• Such data is usually extracted from audio, images, or text.
• Another example is the information a smartphone brand provides about
a phone, such as its current rating, its color, its category, and so
on.
• All of this information can be classed as Qualitative data.
Nominal
• These are values that have no natural ordering.
• e.g., the color of a smartphone: one color cannot be ranked against
another.
• It is not possible to state that ‘Red’ is greater than ‘Blue’.
• The gender of a person is another example: male, female, and others
cannot be placed in any order.
• Nominal data types in statistics are not quantifiable and cannot be
measured through numerical units.
• Nominal data is valuable in qualitative research because it gives
subjects the freedom to express their opinions.
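The points above imply that the useful summaries of nominal data are counts and the most frequent category, not sums or rankings. A minimal sketch using only the Python standard library follows; the `colors` sample and all variable names are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical sample: colors of phones in a small inventory (nominal data).
colors = ["Red", "Blue", "Red", "Black", "Blue", "Red"]

# Counting how often each category occurs is meaningful for nominal data.
color_counts = Counter(colors)

# The mode (most frequent category) is a valid summary of nominal data.
most_common_color, frequency = color_counts.most_common(1)[0]

# Equality checks are meaningful; ranking is not. Python would compare the
# strings alphabetically, but that says nothing about the colors themselves.
same_color = colors[0] == colors[2]

print(most_common_color, frequency)  # Red 3
print(same_color)                    # True
```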
Ordinal
• These values have a natural ordering while still belonging to discrete classes.
• e.g., the sizes offered by a clothing brand can easily be sorted by their name tag
in the order small < medium < large.
• The grading system used to mark candidates in a test can also be considered an
ordinal data type, where an A+ is definitely better than a B grade.
• These categories help us decide which encoding strategy can be applied to which
type of data.
• Data encoding for Qualitative data is important because machine learning models
can’t handle these values directly; the values need to be converted to numerical
types because the models are mathematical in nature.
• For the nominal data type, where there is no comparison among the categories,
one-hot encoding can be applied. It is similar to binary coding and works well when
the number of categories is small.
• For the ordinal data type, label encoding can be applied, which is a form of
integer encoding.
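The two encoding strategies described above can be sketched in plain Python. This is an illustrative sketch, not a reference implementation: the helper names `one_hot_encode` and `integer_encode` and the sample data are hypothetical, and in practice a library such as pandas or scikit-learn would normally do this work. The ordinal encoder assumes the category order is supplied explicitly so that the integers respect small < medium < large.

```python
# Hypothetical sample data for illustration.
colors = ["Red", "Blue", "Green", "Blue"]       # nominal
sizes = ["medium", "small", "large", "small"]   # ordinal


def one_hot_encode(values):
    """Return (categories, rows); each row holds a single 1 marking its category."""
    # Sorting alphabetically only fixes the column order; it implies no ranking.
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


def integer_encode(values, ordered_categories):
    """Map each value to its position in an explicitly ordered category list."""
    codes = {category: index for index, category in enumerate(ordered_categories)}
    return [codes[value] for value in values]


color_columns, color_matrix = one_hot_encode(colors)

size_order = ["small", "medium", "large"]        # assumed natural order
size_codes = integer_encode(sizes, size_order)

# The same explicit order also lets ordinal values be sorted meaningfully.
sorted_sizes = sorted(sizes, key=size_order.index)

print(color_columns)  # ['Blue', 'Green', 'Red']
print(color_matrix)   # [[0, 0, 1], [1, 0, 0], [0, 1, 0], [1, 0, 0]]
print(size_codes)     # [1, 0, 2, 0]
print(sorted_sizes)   # ['small', 'small', 'medium', 'large']
```

One-hot encoding adds one column per category, which is why it suits nominal data with few categories. Integer encoding keeps a single column, but it only makes sense when the integers reflect a real ordering.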
Quantitative Data Type