4.0 Introduction To Data

The document provides an overview of data, defining it as raw facts and figures that can be analyzed for various purposes. It categorizes data based on nature (qualitative vs. quantitative), measurement scale, structure, source, and usage in machine learning, as well as discussing big data and its characteristics. Additionally, it touches on granularity, sources of data, streaming data, and spatial data.

Uploaded by

roqia.nasimzada12


Introduction to Data

Overview of nature and types of data

What is data
• Data in general refers to raw facts, figures, or statistics that are collected,
stored, and analyzed for various purposes. It is essentially any piece of information
that can be processed or used to draw conclusions, make decisions, or develop
insights. Data can take many forms, ranging from simple numbers and text to
more complex types such as images, audio, or video.

• `In common usage, data (/ˈdeɪtə/, also US: /ˈdætə/) is a collection of discrete or
continuous values that convey information, describing the quantity, quality, fact,
statistics, other basic units of meaning, or simply sequences of symbols that may
be further interpreted formally. A datum is an individual value in a collection of
data. Data are usually organized into structures such as tables that provide
additional context and meaning, and may themselves be used as data in larger
structures` -- Wikipedia
Types of Data in general
• Data can be classified in several ways depending on its nature, structure, and
how it’s processed. Let’s break it down into several key categories:

• Based on Nature (Qualitative vs. Quantitative Data)

• Based on Measurement Scale (Nominal, Ordinal, Interval, Ratio)
• Based on Data Structure (Structured vs. Unstructured Data)
• Based on Data Source (Primary vs. Secondary Data)
• Based on Usage in Machine Learning (Labeled vs. Unlabeled Data)
• Big Data
Based on Nature (Qualitative vs. Quantitative Data)
Qualitative Data (Categorical Data): Qualitative data describes qualities or
characteristics. It is descriptive, usually textual, and does not involve
measurements or numbers. Examples:
• Color of a car (red, blue, green)
• Type of product (clothing, electronics)
• Survey responses (satisfied, neutral, unsatisfied)
Types of Qualitative Data:
• Nominal Data: Data that represents categories with no inherent order (e.g.,
gender, nationality).
• Ordinal Data: Data that represents categories with a specific order or ranking, but
the difference between ranks is not measurable (e.g., rating scales like "good,
better, best").
Based on Nature (Qualitative vs. Quantitative Data)
Quantitative Data (Numerical Data): Quantitative data represents numerical
values obtained by counting or measuring quantities, and can be used in
mathematical operations. Examples:
• Height of a person (in centimeters)
• Sales amount ($500, $1000)
• Age (25 years, 30 years)
Types of Quantitative Data:
• Discrete Data: Data that consists of distinct, countable values (e.g., the number
of children in a family, the number of cars sold).
• Continuous Data: Data that can take any value within a range and is typically
measured (e.g., height, temperature, time).
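The split into qualitative, discrete, and continuous data can be sketched in a few lines of Python. This is only a rule of thumb, assuming that strings are qualitative, integers are discrete, and floats are continuous; the function name `describe_value` and the sample record are hypothetical. Real datasets need domain knowledge, since a numeric postal code, for example, is still categorical.

```python
def describe_value(value):
    """Rule-of-thumb classification of one value (an assumption, not a standard)."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, (bool, str)):
        return "qualitative"
    if isinstance(value, int):
        return "quantitative (discrete)"
    if isinstance(value, float):
        return "quantitative (continuous)"
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Hypothetical record mixing the kinds of data described above.
record = {"car_color": "red", "children": 2, "height_cm": 172.5}
kinds = {name: describe_value(value) for name, value in record.items()}

for name, kind in kinds.items():
    print(f"{name}: {kind}")
```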
Based on Measurement Scale
• Nominal Data: Data categorized without a specific order or ranking. It is purely
for labeling purposes, e.g., eye color (blue, brown, green) or types of animals (cat,
dog, bird).
• Ordinal Data: Data that has a meaningful order or ranking but no measurable
distance between ranks, e.g., customer satisfaction (poor, average, excellent) or
education level (high school, bachelor’s, master’s).
• Interval Data: Data with meaningful, consistent intervals between values but no
true zero point, so differences make sense but ratios do not, e.g., temperature in
Celsius or Fahrenheit, IQ scores.
• Ratio Data: Similar to interval data but with a true zero point, allowing for
meaningful ratios and comparisons, e.g., weight, height, age, or temperature in
Kelvin (where 0 K is absolute zero).
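Which operations are meaningful depends on the scale. The sketch below is a small illustration, assuming a hypothetical three-level satisfaction scale and two made-up temperatures: ordinal values can be ranked and sorted, interval values can be subtracted, and only ratio values (here Kelvin) support meaningful ratios.

```python
# Ordinal: categories have an order, so ranks can be compared,
# but the gaps between ranks are not measurable.
SATISFACTION_ORDER = ["poor", "average", "excellent"]  # hypothetical scale


def rank(level):
    """Position of a level on the ordinal scale (0 is lowest)."""
    return SATISFACTION_ORDER.index(level)


responses = ["average", "excellent", "poor", "average"]
sorted_responses = sorted(responses, key=rank)
best = max(responses, key=rank)


# Interval vs. ratio: differences are meaningful on both scales,
# ratios only on a scale with a true zero.
def celsius_to_kelvin(celsius):
    return celsius + 273.15


c_low, c_high = 10.0, 20.0
difference = c_high - c_low    # a 10-degree gap on either scale
naive_ratio = c_high / c_low   # 2.0, yet 20 °C is not "twice as hot" as 10 °C
kelvin_ratio = celsius_to_kelvin(c_high) / celsius_to_kelvin(c_low)

print(sorted_responses, best)
print(difference, naive_ratio, round(kelvin_ratio, 3))
```

The Kelvin ratio comes out only slightly above 1, which is the physically meaningful comparison; the Celsius ratio of 2.0 is an artifact of where the scale places its zero.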
Based on Data Structure
a. Structured Data:
• Definition: Data that is highly organized and follows a specific format or schema. It is easy to store, search, and analyze using traditional database systems (like
relational databases).
• Examples:
• Data in tables or databases (e.g., an Excel spreadsheet with rows and columns)
• Customer records (name, age, address, order history)
b. Unstructured Data:
• Definition: Data that has no predefined structure and doesn’t fit neatly into traditional database formats.
• Examples:
• Text documents, emails, social media posts
• Images, videos, audio files
• Webpages, PDF files
c. Semi-Structured Data:
• Definition: Data that doesn’t have a rigid structure but contains tags or markers to separate elements, making it easier to organize than unstructured data.
• Examples:
• JSON and XML files
• Emails with metadata (like sender, recipient, timestamp)
• Log files from servers
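The three structure types can be compared side by side with Python's standard library. This is a minimal sketch; the sample CSV, JSON, and text contents are made up for illustration.

```python
import csv
import io
import json

# Structured: fixed columns, every row follows the same schema.
csv_text = "name,age\nAda,36\nLinus,54\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
ages = [int(row["age"]) for row in rows]

# Semi-structured: keys mark up the elements, but records may differ in shape.
json_text = '[{"name": "Ada", "tags": ["math"]}, {"name": "Linus"}]'
records = json.loads(json_text)
tags = [record.get("tags", []) for record in records]

# Unstructured: free text with no schema; any structure has to be inferred.
note = "Ada ordered two books on Monday and asked about delivery."
word_count = len(note.split())

print(ages, tags, word_count)
```

Note how the structured rows can be queried by column name directly, the semi-structured records need a default for the missing `tags` key, and the unstructured note only yields crude measures such as a word count without further processing.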
Based on Data Source
• Primary Data: Data that is collected first-hand for a specific purpose
directly from the source.
• Survey responses
• Interviews
• Experiment results
• Secondary Data: Data that has been collected previously by someone
else for a different purpose.
• Data from research papers
• Government reports
• Existing databases (e.g., census data, financial reports)
Based on Usage in Machine Learning
a. Labeled Data:
• Definition: Data that has input features as well as associated labels or targets. It is primarily used in supervised learning tasks.
• Examples:
• A dataset of images where each image is labeled with the object it contains (e.g., cat, dog).
• A medical dataset with patient attributes and the diagnosis (cancer, no cancer).
b. Unlabeled Data:
• Definition: Data that has input features but no labels or targets. It is used in unsupervised learning tasks where the goal is to find patterns or structure in
the data.
• Examples:
• A dataset of customer purchasing behaviors with no predefined groups or outcomes.
• A set of sensor data without any predefined classifications.
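In code, the difference between labeled and unlabeled data is simply whether each example carries a target. The sketch below uses made-up (height, weight) features and, as one possible illustration of supervised learning, a 1-nearest-neighbor rule; the function names and values are hypothetical and not taken from the slides.

```python
# Labeled data: each example pairs input features with a target label.
# Features are (height_cm, weight_kg); the values are made up.
labeled = [
    ((30.0, 4.0), "cat"),
    ((28.0, 3.5), "cat"),
    ((60.0, 25.0), "dog"),
    ((55.0, 20.0), "dog"),
]

# Unlabeled data: the same kind of features, but no targets.
unlabeled = [(29.0, 4.2), (58.0, 23.0)]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_1nn(features):
    """Supervised use of the labels: copy the label of the closest example."""
    _, label = min(labeled, key=lambda pair: squared_distance(pair[0], features))
    return label


predictions = [predict_1nn(features) for features in unlabeled]
print(predictions)
```

Without the labels, the same feature tuples could only be grouped by similarity, which is the unsupervised setting described above.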
Big Data
Data that is extremely large in volume, high in velocity, and diverse in variety. It
cannot be easily processed or analysed using traditional data processing
techniques.

Characteristics: Often requires distributed computing systems like Hadoop or
Spark to manage.

Examples:
• Social media data (Twitter, Facebook)
• Sensor data from IoT devices
• Streaming data from real-time systems like financial markets or website logs
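The idea behind distributed processing can be sketched without any cluster: split the data into chunks, process each chunk independently, then merge the partial results. The example below is plain Python, not the Hadoop or Spark API; the log chunks and function names are hypothetical.

```python
from collections import Counter
from functools import reduce

# Pretend each chunk lives on a different machine (here: just list items).
chunks = [
    "error timeout error",
    "ok ok error",
    "timeout ok",
]


def map_chunk(text):
    """Map step: count words within one chunk, independently of the others."""
    return Counter(text.split())


def merge(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


partial_counts = [map_chunk(chunk) for chunk in chunks]  # could run in parallel
total = reduce(merge, partial_counts, Counter())
print(total.most_common())
```

Because each chunk is counted on its own, no single step needs the whole dataset in memory, which is the property that lets such systems scale to large volumes.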
Nature of data according to Data Science
In data science, the nature of data refers to the characteristics, types, and structures
of the data that are used for analysis, modeling, and decision-making. Data can vary
in its form, source, granularity, and structure, influencing how it is processed
and analyzed.

The data types described above are all used in machine learning; however, data can
also be characterized in further ways, for example:
Granularity of Data
Granularity refers to the level of detail in the data.
• Fine-grained data: Highly detailed data, such as individual transactions, which
provides more insight but can be difficult to aggregate and process, e.g., sensor
data collected from a smart device every second.

• Coarse-grained data: Summarized or aggregated data, which is easier to
process but may lose some of the finer insights, e.g., daily sales totals or average
monthly temperatures.
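Moving from fine-grained to coarse-grained data is an aggregation step. The sketch below, using made-up per-second temperature readings, collapses them into one average per day.

```python
from collections import defaultdict
from statistics import mean

# Fine-grained: one (timestamp, temperature) pair per measurement.
readings = [
    ("2024-01-01T00:00:00", 20.0),
    ("2024-01-01T00:00:01", 22.0),
    ("2024-01-02T00:00:00", 18.0),
    ("2024-01-02T00:00:01", 19.0),
    ("2024-01-02T00:00:02", 20.0),
]

# Coarse-grained: one average per day. The per-second detail is lost.
by_day = defaultdict(list)
for timestamp, value in readings:
    day = timestamp[:10]  # "YYYY-MM-DD" prefix of the ISO timestamp
    by_day[day].append(value)

daily_average = {day: mean(values) for day, values in by_day.items()}
print(daily_average)
```

The aggregation cannot be undone: from the daily averages alone, the individual readings are not recoverable, which is the loss of insight mentioned above.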
Sources of Data
• Generated Data
Sensors: Data from IoT devices, environmental sensors (e.g., weather stations), fitness
trackers.
Transactions: Data generated from financial transactions (e.g., payment records, online
purchases).
• Social and Web Data
Social Media: Posts, comments, likes, and shares from platforms like Twitter, Facebook,
and Instagram.
Web Analytics: Data from website interactions, such as clicks, page views, and time spent
on a site.
• Public and Open Data
Government datasets: Census data, weather reports, and other publicly available
information.
Open-source repositories: Datasets from sources like Kaggle, UCI Machine Learning
Repository.
• Survey Data: Collected from responses to questionnaires or polls, often used in
social science research, e.g., customer satisfaction surveys or demographic surveys.
Streaming Data
Data that is continuously generated and processed in real time. For example:
• Social media feeds: Real-time data from Twitter or Facebook.
• Sensor data: Data from IoT devices that continuously send signals, like smart
thermostats or fitness trackers.
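Streaming data is typically processed one item at a time, without waiting for the full dataset. A minimal sketch, assuming a hypothetical `sensor_stream` generator as a stand-in for a live feed, keeps a running mean that is updated with each new reading.

```python
def sensor_stream():
    """Stand-in for a live feed; a real source would not be a fixed list."""
    for reading in [21.0, 21.5, 23.0, 22.5]:
        yield reading


def running_mean(stream):
    """Yield the mean of all readings seen so far, one value per reading."""
    count = 0
    total = 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count


means = list(running_mean(sensor_stream()))
print(means)
```

Only the count and the running total are stored, so memory use stays constant no matter how long the stream runs.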
Spatial Data
Data that includes information about locations, such as latitude and longitude
coordinates.

Examples:
• GIS (Geographic Information System) Data: Maps, satellite images, and
data about locations (e.g., elevation, land use).
• GPS Data: Location tracking data from smartphones or vehicles.
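A common operation on latitude and longitude data is computing the distance between two points. The sketch below uses the haversine formula, which assumes a spherical Earth with a mean radius of 6371 km; the coordinates are approximate values for Paris and London, chosen only for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = radians(lat2 - lat1)
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Two GPS fixes as (latitude, longitude) in decimal degrees.
paris = (48.8566, 2.3522)
london = (51.5074, -0.1278)
distance = haversine_km(*paris, *london)
print(round(distance), "km")
```

Plain Euclidean distance on raw degrees would give misleading results here, because a degree of longitude covers less ground the farther a point is from the equator.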
