
Lecture_2_Basics of Data Science (1)

The document provides an overview of data science, defining data and its various types, including numeric, alphabetic, alphanumeric, graphic, audio, video, and mixed data. It distinguishes between qualitative and quantitative data, explaining their characteristics and methods of collection, both primary and secondary. Additionally, it discusses structured, semi-structured, and unstructured data, along with the importance of data collection techniques for analysis and decision-making.

Uploaded by sravane1608
Copyright © All Rights Reserved

Program: B.Sc
Course Code: DS101
Course Name: Basics of Data Science

Introduction to Data Science


What is Data?
• In general, data is any set of characters that is gathered and translated for some purpose, usually analysis. If data is not put into context, it does nothing for a human or a computer.
• There are multiple types of data. Some of the more common types include the following:
  Single character
  Boolean (true or false)
  Text (string)
  Number (integer or floating-point)
  Picture
  Sound
  Video
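The short Python sketch below pairs each common data type in the list with an example value. It is an illustration only: Python has no separate character type, so a single character is a string of length 1. The picture, sound and video entries are hypothetical placeholder bytes, because real media would normally be read from files.

```python
# Pair each common data type listed above with an example Python value.
# Picture/sound/video are hypothetical placeholder bytes, not real media files.
examples = {
    "single character": "A",          # a str of length 1 (Python has no char type)
    "boolean": True,
    "text (string)": "Basics of Data Science",
    "number (integer)": 42,
    "number (floating-point)": 90.5,
    "picture": b"placeholder-image-bytes",
    "sound": b"placeholder-audio-bytes",
    "video": b"placeholder-video-bytes",
}

for kind, value in examples.items():
    # type(...).__name__ gives the Python type name, e.g. 'bool' or 'float'
    print(f"{kind:<25} -> {type(value).__name__}")
```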
What is Data?
• In a computer's storage, data is a series of bits (binary digits), each with the value one or zero. Data is processed by the CPU, which uses logical operations to produce new data (output) from source data (input).
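This Python sketch shows how a character and a number are each stored as a pattern of bits. A logical operation then turns those input bits into new output bits. It assumes the standard ASCII/Unicode code point for "A", which is 65. Bitwise OR was chosen only for illustration; a CPU supports many other logical operations.

```python
# How a character and an integer look as bits (ones and zeros).
char = "A"
code = ord(char)                      # ASCII/Unicode code point: 65
char_bits = format(code, "08b")       # '01000001'

number = 10
number_bits = format(number, "08b")   # '00001010'

# A logical operation combines input bits into new output bits.
# Bitwise OR is used here purely as an illustration.
result = code | number                # 01000001 OR 00001010 = 01001011
print(char_bits, number_bits, format(result, "08b"), result)  # ... 75
```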
Examples of computer data
Examples of data
1) Student Data on Admission Forms
When students are admitted to a college, they fill in an admission form. This form contains raw facts (student data) such as the student's name, father's name, address, marks obtained, photograph, etc.
2) Data of Citizens
During a census, data about all citizens is collected. Staff go from house to house and collect data such as the number of persons living in a home, whether they are literate or illiterate, the number of children, data about each child, caste, religion, Computerized National Identity Card number, address, the number of rooms and other facilities in the house, etc.
Examples of data
3) Survey Data
Companies collect data through surveys to learn people's opinions about their products. A company's survey staff go from house to house and interview people about whether they use, like or dislike its products. They also collect data about competitor companies in a particular area.
4) Students' Examination Data
In the examination system of a school or college, data about the marks obtained by all students in different subjects is collected. Answer books are collected from students and marked by teachers, who then provide mark sheets / award lists containing the marks obtained by all students.
Different Types Of Data
1. NUMERIC DATA
Numeric data consists of the digits 0 to 9, the + and – signs, the decimal point, etc.
For example, 10, 78, 90.50, -56 etc.

2. ALPHABETIC DATA
It consists of all alphabetic letters A to Z and a to z, blank spaces, etc.
Different Types Of Data
3. ALPHANUMERIC DATA
It consists of alphabetic letters, digits and special characters such as #, $, %, etc.
For example, House Number 10-A, 14-August-1947, F-16 etc.

4. GRAPHIC DATA
Graphic data or image data consists of charts, graphs, images, etc. For example, a collection of maps of countries or a collection of family pictures.
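The rules for numeric, alphabetic and alphanumeric data can be approximated in code. This Python sketch uses simplified rules that are an assumption for illustration, not a formal standard. "Data Science" is a hypothetical alphabetic example; the other samples come from the slides.

```python
import re

# Simplified rules (an assumption for illustration, not a formal standard):
# numeric      = digits with an optional +/- sign and an optional decimal part
# alphabetic   = letters (as judged by str.isalpha) and blank spaces only
# alphanumeric = any other mix of letters, digits and special characters
NUMERIC = re.compile(r"[+-]?\d+(\.\d+)?")


def classify(value: str) -> str:
    """Return 'numeric', 'alphabetic' or 'alphanumeric' for a raw value."""
    if NUMERIC.fullmatch(value):
        return "numeric"
    if value and all(ch.isalpha() or ch == " " for ch in value):
        return "alphabetic"
    return "alphanumeric"


for sample in ["10", "78", "90.50", "-56", "Data Science", "10-A", "14-August-1947", "F-16"]:
    print(f"{sample!r:>18} -> {classify(sample)}")
```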
Different Types Of Data
5. AUDIO DATA
Audio data consists of sounds and voices. For example, radio programs, radio news, audio songs, etc.

6. VIDEO DATA
Video data consists of moving pictures. For example, movies, TV dramas, TV news, etc.
Different Types Of Data
7. MIXED DATA
Mixed data is a combination of two or more types of data.
For example, a TV drama consists of audio as well as video data. Another example of mixed data is the students' admission form, because students provide different types of data on it, such as numeric, alphabetic, alphanumeric and graphic/image data, as explained below:
Numeric data: marks obtained by the student
Alphabetic data: name, father's name, etc.
Alphanumeric data: address
Graphic data: picture of the student
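An admission form can be modelled as a single record whose fields hold different types of data. This is a minimal Python sketch: the class name `AdmissionForm` and all field values are hypothetical, and the photo is placeholder bytes rather than a real image.

```python
from dataclasses import dataclass, fields


@dataclass
class AdmissionForm:
    """One student's admission form: a single record mixing several data types."""
    name: str             # alphabetic data
    father_name: str      # alphabetic data
    address: str          # alphanumeric data
    marks_obtained: int   # numeric data
    photo: bytes          # graphic/image data (placeholder bytes here)


form = AdmissionForm(
    name="Ayesha Khan",            # hypothetical student
    father_name="Imran Khan",
    address="House 10-A, Street 5",
    marks_obtained=870,
    photo=b"placeholder-image-bytes",
)

for f in fields(form):
    print(f"{f.name:<15} {type(getattr(form, f.name)).__name__}")
```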
• Quantitative data are measures of values or counts and are expressed as numbers.
• Quantitative data are data about numeric variables (e.g. how many; how much; or how often).
• Qualitative data are measures of 'types' and may be represented by a name, symbol, or a number code.
• Qualitative data are data about categorical variables (e.g. what type).
• Quantitative = Quantity and Qualitative = Quality
• To provide a complete picture of a population, quantitative and qualitative data are frequently combined because they produce distinct results.
• For instance, if annual income statistics are gathered (quantitative), occupation data could also be gathered (qualitative) to learn more about the average annual income for each type of occupation.
• Quantitative and qualitative data can be gathered from the same data unit, depending on whether the variable of interest is numerical or categorical.
Example
Data unit | Numeric variable | Quantitative data | Categorical variable | Qualitative data
A person | "How many children do you have?" | 4 children | "In which country were your children born?" | Australia
A person | "How much do you earn?" | $60,000 p.a. | "What is your occupation?" | Photographer
A person | "How many hours do you work?" | 38 hours per week | "Do you work full-time or part-time?" | Full-time
A house | "How many square metres is the house?" | 200 square metres | "In which city or town is the house located?" | Brisbane
A business | "How many workers are currently employed?" | 264 employees | "What is the industry of the business?" | Retail
A farm | "How many milk cows are located on the farm?" | 36 cows | "What is the main activity of the farm?" | Dairy
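Combining a quantitative variable (income) with a qualitative one (occupation) makes it possible to compute the average income per occupation. The Python sketch below does this with a group-and-average step. The records are hypothetical and are not taken from the table.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical survey records: each person (data unit) has a quantitative
# variable (annual income) and a qualitative variable (occupation).
people = [
    {"occupation": "Photographer", "income": 60_000},
    {"occupation": "Teacher", "income": 70_000},
    {"occupation": "Photographer", "income": 50_000},
    {"occupation": "Teacher", "income": 74_000},
    {"occupation": "Retail worker", "income": 45_000},
]

# Group the numbers by category, then summarise each group.
incomes_by_occupation = defaultdict(list)
for person in people:
    incomes_by_occupation[person["occupation"]].append(person["income"])

average_income = {job: mean(values) for job, values in incomes_by_occupation.items()}
for job, avg in average_income.items():
    print(f"{job:<14} ${avg:,.0f}")
```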
Qualitative vs Quantitative
Data can be qualitative or quantitative.
• Qualitative data is descriptive information (it describes something).
• Quantitative data is numerical information (numbers).
Qualitative vs Quantitative
• Quantitative data can be Discrete or Continuous:
  • Discrete data can only take certain values (like whole numbers).
  • Continuous data can take any value (within a range).
• Put simply: Discrete data is counted, Continuous data is measured.
Qualitative vs Quantitative
Example: What do we know about Arrow the Dog?

Qualitative:
He is brown and black
He has long hair
He has lots of energy
Qualitative vs Quantitative
Example: What do we know about Arrow the Dog?

Quantitative:
Discrete:
He has 4 legs
He has 2 brothers
Continuous:
He weighs 25.5 kg
He is 565 mm tall
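The facts about Arrow can be stored in a Python dict. This sketch assumes a simple convention: discrete counts are stored as `int` and continuous measurements as `float`. The height is therefore written as 565.0 because it is measured. This is a simplification, since a data type alone cannot prove whether a variable is counted or measured.

```python
# Facts about Arrow the Dog from the slides, stored as a Python dict.
# Convention assumed here: discrete counts as int, continuous measurements as float.
arrow = {
    "colour": "brown and black",   # qualitative
    "hair": "long",                # qualitative
    "energy": "lots",              # qualitative
    "legs": 4,                     # quantitative, discrete (counted)
    "brothers": 2,                 # quantitative, discrete (counted)
    "weight_kg": 25.5,             # quantitative, continuous (measured)
    "height_mm": 565.0,            # quantitative, continuous (measured)
}


def kind_of(value):
    """Label a value using the int/float convention above (a simplification)."""
    if isinstance(value, bool):      # bool is a subclass of int in Python
        return "qualitative"
    if isinstance(value, int):
        return "quantitative (discrete)"
    if isinstance(value, float):
        return "quantitative (continuous)"
    return "qualitative"


for fact, value in arrow.items():
    print(f"{fact:<10} {value!s:<16} {kind_of(value)}")
```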
Qualitative vs Quantitative
More Examples:
Qualitative:
Your friends' favourite holiday destination
The most common given names in your town
How people describe the smell of a new perfume
Quantitative:
Height (Continuous)
Weight (Continuous)
Petals on a flower (Discrete)
Customers in a shop (Discrete)
How can you use quantitative and qualitative data?

• It is important to identify whether the data are quantitative or qualitative, as this affects the statistics that can be produced.
Frequency counts:
• The number of times an observation occurs (frequency) for a data item (variable) can be shown for both quantitative and qualitative data.
• The graphs below arrange the quantitative and qualitative data to show the frequency distribution of the data.
[Figure: frequency distributions, panels "Quantitative Data" and "Qualitative Data"]
Because absolute frequencies can be calculated for both quantitative and qualitative data, relative frequencies such as percentages, proportions, rates and ratios can also be produced. For example, the graphs above show that 4 people (20%) worked less than 30 hours per week, and 6 people (30%) are teachers.
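Absolute and relative frequencies can be computed the same way for both kinds of data. The graphs' underlying data is not available, so this Python sketch uses hypothetical data for 20 people. The values were chosen only so that the counts reproduce the 20% and 30% figures quoted above.

```python
from collections import Counter

# Hypothetical data for 20 people, chosen only so that the totals match the
# percentages quoted in the text (the original graphs' data is not available).
occupations = ["Teacher"] * 6 + ["Nurse"] * 5 + ["Engineer"] * 4 + ["Clerk"] * 5
hours_per_week = [15, 20, 25, 28] + [38] * 10 + [40] * 6

# Absolute frequencies work for qualitative data...
occupation_counts = Counter(occupations)
# ...and for quantitative data (here: how many people worked < 30 hours).
under_30 = sum(1 for h in hours_per_week if h < 30)

# Relative frequencies (proportions) divide each count by the total.
teacher_share = occupation_counts["Teacher"] / len(occupations)
under_30_share = under_30 / len(hours_per_week)

print(f"Teachers: {occupation_counts['Teacher']} ({teacher_share:.0%})")  # Teachers: 6 (30%)
print(f"Worked < 30 hours: {under_30} ({under_30_share:.0%})")            # Worked < 30 hours: 4 (20%)
```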
Descriptive (summary) statistics:
• Statistics that describe or summarise can be produced for quantitative data and, to a lesser extent, for qualitative data.
• As quantitative data are always numeric, they can be ordered and added together, and the frequency of an observation can be counted. Therefore, all descriptive statistics can be calculated using quantitative data.
• As qualitative data represent individual (mutually exclusive) categories, the descriptive statistics that can be calculated are limited. Many of these techniques require numeric values that can be logically ordered from lowest to highest and that express a count.
• The mode can be calculated, as it is the most frequently observed value.
• The median and measures of shape require the values to be ordered from lowest to highest, and variance and standard deviation require the mean to be calculated. None of these is appropriate for categorical variables, as they have no numerical value.
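Python's `statistics` module illustrates the difference. The mode works for categories, while the mean, median and standard deviation need numbers, and trying to average category labels raises a `TypeError`. All data in this sketch is hypothetical.

```python
from statistics import mean, median, mode, stdev

# Hypothetical data for six people.
occupations = ["Teacher", "Nurse", "Teacher", "Clerk", "Teacher", "Nurse"]  # qualitative
hours = [38, 40, 25, 38, 45, 30]                                            # quantitative

# The mode works for both kinds of data.
most_common_job = mode(occupations)

# Mean, median and standard deviation need numbers.
hours_mean = mean(hours)
hours_median = median(hours)
hours_sd = stdev(hours)

# Trying to average categories fails, because they have no numerical value.
try:
    mean(occupations)
    mean_of_categories_failed = False
except TypeError:
    mean_of_categories_failed = True

print(most_common_job, hours_mean, hours_median, round(hours_sd, 2))
```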
Inferential statistics:
• By making inferences about quantitative data from a sample, estimates or projections for the total population can be produced.
• Quantitative data can be used to inform broader understandings of a population, or to consider how that population may change or progress in the future.
For example, a simple income projection for an employee in 2015 may be inferred from the rate of change in data collected in 2000, 2005, and 2010.

As shown in the graph below, the data collected over time indicate a 5% increase every five years. Therefore, if the rate of increase continues to follow the same pattern, the employee's annual income in 2015 can be projected as $46,305, which is the 2010 wage of $44,100 increased by a further 5% ($44,100 × 1.05 = $46,305).
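The projection can be reproduced in a few lines of Python. Only the 2010 value ($44,100) and the 5% rate come from the text. The 2000 and 2005 values are a hypothetical reconstruction, back-calculated by assuming exactly 5% growth per five-year period. The sketch averages the observed period growth rates and applies the result once more.

```python
# Income values: 2010 is from the text; 2000 and 2005 are back-calculated
# assuming exactly 5% growth per five-year period (hypothetical reconstruction).
incomes = {2000: 40_000, 2005: 42_000, 2010: 44_100}

years = sorted(incomes)
# Growth rate for each five-year period, e.g. 42,000 / 40,000 - 1 = 0.05
rates = [incomes[later] / incomes[earlier] - 1 for earlier, later in zip(years, years[1:])]
average_rate = sum(rates) / len(rates)

# Project one more period ahead, assuming the same rate continues.
projection_2015 = round(incomes[2010] * (1 + average_rate), 2)
print(f"Projected 2015 income: ${projection_2015:,.2f}")  # $46,305.00
```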

• Qualitative data are not compatible with inferential statistics, as all of these techniques are based on numeric values.
Structured, Semi-Structured and Unstructured Data
Big Data collected from different sources can be structured, unstructured, or semi-structured.
Structured Data: Data that is to the point, factual, and highly organized is referred to as structured data. It is quantitative in nature, i.e., it contains measurable numerical values such as numbers, dates, and times.

• Structured data is data with a high degree of organization, usually stored in some sort of spreadsheet.
Structured data doesn't require much pre-processing and can be used directly with computing resources. Structured data is simple to use for data analytics, machine learning, and data visualisation.
• Examples: CSV, Excel, RDBMS tables


• Structured data is simple to search and evaluate.
• Structured data has a predetermined format.
• One of the best examples of structured data is a relational database, which consists of tables with rows and columns.
• Tables like those found in Google Sheets spreadsheets and Excel files typically contain structured data.
• Structured data is typically managed using SQL (Structured Query Language).
• Machines can easily read, interpret and organize structured data.
• Common applications of relational databases with structured data include inventory management, airline reservation systems, and sales transactions.
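Python's built-in `sqlite3` module provides a small relational database for illustration. In this sketch, the `sales` table and its rows are hypothetical sales transactions. Because every row has the same columns, one SQL query can filter and aggregate the data directly.

```python
import sqlite3

# An in-memory relational table of hypothetical sales transactions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, product TEXT, quantity INTEGER, price REAL)"
)
conn.executemany(
    "INSERT INTO sales (product, quantity, price) VALUES (?, ?, ?)",
    [("Pen", 10, 1.5), ("Notebook", 3, 4.0), ("Pen", 5, 1.5)],
)

# Because every row has the same columns, SQL can filter and aggregate directly.
rows = conn.execute(
    "SELECT product, SUM(quantity) FROM sales GROUP BY product ORDER BY product"
).fetchall()
print(rows)  # [('Notebook', 3), ('Pen', 15)]
conn.close()
```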
Figure shows customer data of Your Model Car, using a spreadsheet as an example of structured data. The tabular form and inherent structure make this type of data analysis-ready; e.g., we could use a computer to filter the table for customers living in the USA (the data is machine-readable).
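The filter described in the figure can be sketched with Python's `csv` module. The figure's actual columns are not available, so the customer table and its column names (`customer_id`, `name`, `country`) are hypothetical.

```python
import csv
import io

# Hypothetical customer table; the real "Your Model Car" columns are not known.
CUSTOMERS_CSV = """customer_id,name,country
1,Alice,USA
2,Bruno,Brazil
3,Chen,USA
4,Dana,Germany
"""

# Every row has the same named columns, so filtering is a one-line condition.
reader = csv.DictReader(io.StringIO(CUSTOMERS_CSV))
usa_customers = [row["name"] for row in reader if row["country"] == "USA"]
print(usa_customers)  # ['Alice', 'Chen']
```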
Unstructured Data:
• Unstructured data is data that lacks any predefined model or format.
• Files such as log files, audio files, and image files are all included in unstructured data.
• Some organizations have a great deal of data available but do not know how to derive value from it, since the data is raw.
• It requires a lot of storage space, and it is hard to keep it secure.
• It cannot be represented in a data model or schema.
• Hence, managing, analyzing, or searching unstructured data is hard.
• It comes in many different formats, such as text, image, audio and video files.
• It is qualitative in nature and is sometimes stored in a non-relational (NoSQL) database.
The amount of unstructured data is much greater than that of structured or semi-structured data.
Examples of human-generated unstructured data are text files, email, social media, media, mobile data, business applications, and others.
Machine-generated unstructured data includes satellite images, scientific data, sensor data, digital surveillance, and much more.
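One common way to derive value from raw, unstructured text such as a log file is to extract whatever structured fields can be recognised. This Python sketch uses a regular expression for that. The log lines and the date/time/level layout the pattern expects are hypothetical; real logs vary widely. Lines that do not match the pattern are simply skipped.

```python
import re

# Hypothetical raw log text: some lines follow a pattern, others are free text.
RAW_LOG = """2024-01-15 10:02:11 ERROR Disk full on /dev/sda1
User alice opened the mobile app
2024-01-15 10:05:42 WARNING CPU temperature high
just some free text with no pattern
"""

LINE_PATTERN = re.compile(
    r"(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (ERROR|WARNING|INFO) (.+)"
)

records = []
for line in RAW_LOG.splitlines():
    match = LINE_PATTERN.fullmatch(line)
    if match:  # keep only lines we can turn into structured fields
        date, time, level, message = match.groups()
        records.append({"date": date, "time": time, "level": level, "message": message})

print(records)
```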
Semi-Structured Data
• Semi-structured data is data that has some degree of organization. It is not as rigorously structured as structured data, but also not as messy as unstructured data.
• Example: HTML, XML, JSON files
In HTML, text and other data are organized with tags. These tags loosely organize the file and help the browser render it and make sense of it. However, a different webpage might use a completely different number and type of tags.
JSON
An employee data JSON file is shown in the diagram. JSON files naturally contain a tree-like structure that provides some organisation, though it is weaker than a table's. As a result, the data can partly be analysed with simple filters, although this is more difficult than with structured data.
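Python's `json` module can load such a file into nested dicts and lists. The diagram's actual file is not available, so the employee data below is hypothetical. A simple filter works because the records have named keys. However, some keys are missing from some records, so the code needs defaults via `.get()`, which is the extra effort that structured tables avoid.

```python
import json

# Hypothetical employee data: a tree of objects and lists, not a fixed table.
EMPLOYEES_JSON = """
{
  "employees": [
    {"name": "Asha", "department": "Sales", "skills": ["Excel", "SQL"]},
    {"name": "Ben", "department": "IT", "skills": ["Python"], "manager": "Asha"},
    {"name": "Carla", "department": "Sales"}
  ]
}
"""

data = json.loads(EMPLOYEES_JSON)
employees = data["employees"]

# A simple filter works because each record has named keys...
sales_staff = [e["name"] for e in employees if e.get("department") == "Sales"]

# ...but keys may be missing or nested, so we need defaults (.get) and extra logic.
skill_counts = {e["name"]: len(e.get("skills", [])) for e in employees}

print(sales_staff)   # ['Asha', 'Carla']
print(skill_counts)  # {'Asha': 2, 'Ben': 1, 'Carla': 0}
```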
What is Data Collection?
Data collection is the process of collecting, measuring and analyzing different types of information using a set of standard, validated techniques.
• The basic goal of data collection is to obtain trustworthy, information-rich data that can be used for analysis and important business decisions. To make the data genuinely usable for organisations, it must undergo a thorough process of data cleansing and processing after it is collected.
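Data cleansing usually combines several small steps. This Python sketch shows three of them on hypothetical "name, age" survey responses: trimming and standardising values, dropping incomplete or invalid records, and removing duplicates. The function name `clean` and the rules it applies are illustrative assumptions, not a standard procedure.

```python
# Hypothetical raw survey responses in "name, age" form.
raw_responses = [" Alice , 23 ", "Bob,", "alice,23", "Chen, 31", "  ,40"]


def clean(rows):
    """Trim, standardise, validate and de-duplicate 'name, age' records."""
    seen = set()
    cleaned = []
    for row in rows:
        name, _, age = row.partition(",")
        name, age = name.strip().title(), age.strip()   # trim and standardise case
        if not name or not age.isdigit():
            continue                                     # drop incomplete/invalid rows
        record = (name, int(age))
        if record in seen:
            continue                                     # drop duplicates
        seen.add(record)
        cleaned.append({"name": name, "age": int(age)})
    return cleaned


print(clean(raw_responses))  # [{'name': 'Alice', 'age': 23}, {'name': 'Chen', 'age': 31}]
```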
Methods of Data Collection
• Primary Data Collection
• Secondary Data Collection


Primary Data Collection Methods
Data gathered through first-hand experience and taken straight from the original source is referred to as primary data.

It refers to information that has never been used before. Data obtained using primary data collection techniques is typically considered the best type of data for research.

• The techniques for gathering primary data can be further categorized into quantitative data collection methods (which deal with factors that can be counted) and qualitative data collection methods (which deal with factors that are not always numerical in nature).
I. Interviews
• Interviews are the direct approach to gathering data. An interview is simply a process in which the interviewee answers questions posed by the interviewer. It offers a high level of flexibility, because questions can be modified as necessary depending on the circumstances.

II. Observations
Researchers use this technique to observe their surroundings and document their results. It can be used to assess how various people behave in scenarios that are controlled (everyone is aware that they are being watched) and uncontrolled (no one is aware that they are being watched).
• Because it is simple and independent of other participants, this strategy is quite effective.
III. Surveys and Questionnaires
• Surveys and questionnaires offer a comprehensive viewpoint from sizable populations.
• They can be carried out in person, by mail, or even posted online to collect responses from people all over the world.
• Yes/no, true/false, multiple-choice, and even open-ended questions can be used.
• However, surveys and questionnaires have the disadvantages of delayed responses and the potential for confusing responses.
IV. Focus Groups
• Similar to an interview, a focus group is conducted with a group of people who all share a common interest.
• As with in-person interviews, the data gathered provides greater insight into why a certain set of people believe what they do.
• However, this approach has certain limitations, including a lack of privacy and the discussion being dominated by one or two people.
• Focus groups can be time-consuming and difficult to run, but they can reveal some of the best information in difficult circumstances.
V. Oral Histories
• Similar to interviews and focus groups, oral histories also involve questioning participants.
• However, they are more narrowly defined, and the information gathered is connected to a single phenomenon.
• They involve compiling the viewpoints and first-hand accounts of those who participated in a specific event.
• For instance, they can be useful in researching the impact of a new product on a certain community.
Secondary Data Collection Methods
• Data that has already been gathered by another party is referred to as secondary data.
• Compared to primary data, it is significantly more accessible and less expensive to obtain.
• Although primary data collection yields more accurate and original data, secondary data collection frequently offers businesses considerable benefits.
Methods
I. Internet
• The Internet has become one of the most frequently used sources for secondary data collection in recent years.
• The Internet offers a significant selection of both free and paid research resources.
• Although this method is quick and simple, you should only use reliable websites for gathering data.
II. Government Archives
• We can use much of the data available in government archives. The biggest benefit is that the information in official archives can be verified and is authentic.
• The problem, though, is that the data isn't always easily accessible, for a variety of reasons.
• Criminal records, for instance, may fall under the category of classified information and are difficult for anybody to access.
Example:
• https://data.gov.in/ (Open Govt. Data (OGD) of India)
• Database - Eurostat (europa.eu)
III. Libraries
• We can take data from libraries, which act as repositories of information.
Collecting
Data can be collected in many ways. The simplest way is direct observation.

Example: Counting Cars
• You want to find how many cars pass by a certain point on a road in a 10-minute interval.
So: stand near that road, and count the cars that pass by in 10 minutes.
You might want to count many 10-minute intervals at different times during the day, and on different days too!
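After counting several 10-minute intervals, the tallies can be summarised. This Python sketch uses hypothetical counts for a few intervals. It reports the average number of cars per interval and the busiest interval.

```python
from statistics import mean

# Hypothetical tallies: cars counted in 10-minute intervals at different times.
counts = {
    "Mon 08:00": 42,
    "Mon 12:00": 25,
    "Mon 17:30": 51,
    "Tue 08:00": 38,
}

average_per_interval = mean(counts.values())
busiest = max(counts, key=counts.get)   # interval with the highest count
print(f"Average cars per 10 minutes: {average_per_interval}")
print(f"Busiest interval: {busiest} ({counts[busiest]} cars)")
```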
Census or Sample
• A Census is when we collect data for every member of the group (the whole "population").
• A Sample is when we collect data just for selected members of the group.
Example: 120 people in your local football club
You can ask everyone (all 120) what their age is. That is a census.
Or you could just choose the people who are there this afternoon. That is a sample.

• A census is accurate, but hard to do. A sample is not as accurate, but may be good enough, and is a lot easier.
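A census and a sample can be compared in a quick simulation. All 120 ages in this Python sketch are hypothetical and randomly generated. Note that the slide's "people who are there this afternoon" is a convenience sample, whereas this sketch swaps in a simple random sample (`random.sample`). The sample mean will usually be close to, but not exactly equal to, the census mean.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical ages for all 120 club members (the whole population).
ages = [random.randint(8, 60) for _ in range(120)]

census_mean = mean(ages)            # census: every member
sample = random.sample(ages, 20)    # sample: 20 members chosen at random
sample_mean = mean(sample)

print(f"Census mean age: {census_mean:.1f}")
print(f"Sample mean age: {sample_mean:.1f}")
```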
How to Show Data
Bar Graphs
A Bar Graph (also called a Bar Chart) is a graphical display of data using bars of different heights.
Example: Imagine you just did a survey of your friends to find which kind of movie they liked best:

Table: Favourite Type of Movie
Comedy | Action | Romance | Drama | SciFi
4 | 5 | 6 | 1 | 4
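To show the idea without a plotting library, this Python sketch draws the favourite-movie survey as a horizontal text bar chart, with one "#" per friend. The helper `bar_chart` is a hypothetical name used only for illustration. A plotting library such as matplotlib (`plt.bar`) would draw the same data graphically.

```python
movies = {"Comedy": 4, "Action": 5, "Romance": 6, "Drama": 1, "SciFi": 4}


def bar_chart(data, symbol="#"):
    """Return a horizontal text bar chart: one symbol per friend who voted."""
    width = max(len(label) for label in data)      # align the bars
    return "\n".join(
        f"{label:<{width}} | {symbol * count} {count}" for label, count in data.items()
    )


chart = bar_chart(movies)
print(chart)
```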
How to Show Data
Pie Chart
Pie Chart: a special chart that uses "pie slices" to show relative sizes of data.
Example: Imagine you just did a survey of your friends to find which kind of movie they liked best:

Table: Favourite Type of Movie
Comedy | Action | Romance | Drama | SciFi
4 | 5 | 6 | 1 | 4
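Each pie slice's size is its count divided by the total, expressed either as a percentage or as an angle out of 360 degrees. This Python sketch computes both for the favourite-movie survey. For example, Romance is 6 out of 20 friends, which gives 30% and 108 degrees. A library such as matplotlib (`plt.pie`) can then draw the slices.

```python
movies = {"Comedy": 4, "Action": 5, "Romance": 6, "Drama": 1, "SciFi": 4}
total = sum(movies.values())  # 20 friends

# Each slice's share of the pie: percentage of 100% and angle out of 360 degrees.
# Multiplying before dividing keeps these particular results exact.
slices = {
    label: (count * 100 / total, count * 360 / total)
    for label, count in movies.items()
}

for label, (percent, degrees) in slices.items():
    print(f"{label:<8} {percent:5.1f}%  {degrees:6.1f} deg")
```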
How to Show Data
Dot Plots
A Dot Plot is a graphical display of data using dots.
Example: Minutes To Eat Breakfast
A survey of "How long does it take you to eat breakfast?" has these results:

Minutes: 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
People:  6 | 2 | 3 | 5 | 2 | 5 | 0 | 0 | 2 | 3 | 7 | 4 | 1
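A dot plot stacks one dot per response above each value. This Python sketch builds a text version from the breakfast survey table. The helper `dot_plot` is a hypothetical name used only for illustration.

```python
minutes = list(range(13))                         # 0 to 12 minutes
people = [6, 2, 3, 5, 2, 5, 0, 0, 2, 3, 7, 4, 1]  # from the survey table


def dot_plot(values, counts, dot="o"):
    """Build a text dot plot: one dot per person, stacked above each value."""
    rows = []
    for level in range(max(counts), 0, -1):        # draw from the top row down
        rows.append(" ".join(f"{dot:>2}" if c >= level else "  " for c in counts).rstrip())
    rows.append(" ".join(f"{v:>2}" for v in values))  # axis labels
    return "\n".join(rows)


plot = dot_plot(minutes, people)
print(plot)
```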
How to Show Data
Line Graphs
Line Graph: a graph that shows information that is connected in some way (such as change over time).
Example: You are learning facts about dogs, and each day you do a short test to see how good you are. These are the results:
Table: Facts I got Correct
Day 1 | Day 2 | Day 3 | Day 4
3 | 4 | 12 | 15
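A line graph highlights the change from one point to the next. This Python sketch computes those day-to-day changes for the dog-facts test results and finds the biggest improvement. Plotting the same lists with a library such as matplotlib (`plt.plot(days, facts_correct, marker="o")`) would draw the connected line itself.

```python
days = [1, 2, 3, 4]
facts_correct = [3, 4, 12, 15]

# Change from one day to the next: the rise of each line segment.
changes = [later - earlier for earlier, later in zip(facts_correct, facts_correct[1:])]
for day, change in zip(days[1:], changes):
    print(f"Day {day - 1} -> Day {day}: {change:+d}")

biggest_jump = max(range(len(changes)), key=lambda i: changes[i])
print(f"Biggest improvement: Day {days[biggest_jump]} -> Day {days[biggest_jump + 1]}")
```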
https://www.excel-easy.com/functions/statistical-functions.html

https://www.excel-easy.com/examples/box-whisker-plot.html
