DS Module2 L1 L11

Uploaded by

rishipaul221

Data Science(BTCS-616-18)

Module -2
Lecture-1
Presented By
Dr. Rini Saxena
Professor (Computer Science & Engineering)
CEC Jhanjeri Mohali
Preview of last Lecture
Intro to Python
Data types
List
Various list Operations
UNIT 2: Data Collection and Data Pre-Processing
Data Collection Strategies, Data Pre-Processing
Overview, Data Cleaning, Data Integration and
Transformation, Data Reduction, Data Discretization.
Today’s Content
Data Collection and Data Pre-Processing
What is data collection
Methods of data collection
Qualitative data collection
Quantitative data collection
Data Collection
DATA COLLECTION
Data acquisition or data collection is the 2nd step. Data
is the starting point of the problem.
Data is a combination of information and noise.
The point of interest is to work on the information
while negating the noise. Basically, there are two types
of data:
Primary Data: Raw, first-hand data, usually obtained
through surveys or questionnaires, that we can make
use of directly.
Secondary Data: Data that has already been collected and
published but is still unprepared.
Data Collection
From a purely business perspective, the general rule of
thumb is: when in doubt, collect the data.

It is not uncommon for 90% of a company’s data to go
unused, but here comes the tricky part: you never know
when you will have to use a tiny part of that 90% in some
of your analytics projects.

Although data can be valuable, too much information
becomes disorganized, and the wrong data is useless.

The right data collection method can mean the
difference between useful insights and time-wasting
misdirection.
Data Collection
For e-commerce businesses and digital applications, we
can literally track everything.

Every website page view, every user interaction,
even every mouse movement. The sky's the limit.

There are certain limitations that you should consider
though:
ethical and legal limitations - These are the two
most important limitations. Also review any default
settings associated with your data collection services.
Data Collection
implementation time - Tracking should be
implemented carefully, so it will take substantial
developer time.
The more things you track, the more engineering time
you will have to allocate to it.

site speed (websites) - Data collection can be
implemented very efficiently in terms of site speed, but
complex tracking scripts can still slow down your
website page load time.
We are talking about milliseconds here, but this delay
can easily add up.
Data Collection
How to decide what to collect?
Before your developers implement your tracking
scripts, you should come up with a data collection
specification where you'll list all the things you want
to track.

There are no general rules or best practices for what
you should or shouldn't collect. It always
depends on the given business and on the given use
case.
Data Collection
List all the features of your product!
When you create this list, you will learn more about
the typical user workflows, the relationships between
the different features and also what's important and
what's not.
Work Backwards
List all the stakeholders who have concerns with the
functionality of your app or website. These certainly
include product, engineering, and marketing, but customer
service, business development, and sales may also need to
understand how certain features or functionalities are
working. Find out each stakeholder’s KPIs that relate
to the application or website.
Data Collection
Run a workshop!
When you are fully aware of every aspect of your
product, bring developers, product people, marketers,
managers, etc… into one room and run a data
collection workshop.
Brainstorming: everyone can add their own ideas about
what we should collect, even if they seem impossible.
Removing irrelevant things: It's time to rationalize.
We can remove ideas that are not important or that
would be technically really hard (or impossible) to
track.
Organizing: we try to structure the different
proposed data-points and build a schema from it.
Data Collection
Data Collection Specification (Tracking Plan)
A data collection specification, or tracking plan, works
like Google Translate for business requirements: it
translates them into metrics by listing each event
that should be tracked, how it should be collected,
how it should be displayed in the analytics tool being
used, and the associated data dictionary of possible
values.

A good data collection spec should include tests that
the engineers can check against to ensure that the data
is being collected in the right format, with the
right values, and at the right frequency.
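As a rough illustration of the idea above, a tracking plan can be written down as a small data dictionary and checked automatically. This is only a sketch: the event names, properties, and allowed values below are hypothetical, and the check covers format and values but not collection frequency.

```python
# Hypothetical tracking plan: event names, properties, and allowed values
# are invented for illustration only.
TRACKING_PLAN = {
    "page_view": {
        "page": {"type": str},
        "load_time_ms": {"type": int},
    },
    "button_click": {
        "button_id": {"type": str},
        "plan": {"type": str, "allowed": {"free", "pro"}},
    },
}


def validate_event(name, properties, plan=TRACKING_PLAN):
    """Return a list of problems; an empty list means the event matches the plan."""
    if name not in plan:
        return [f"unknown event: {name}"]
    problems = []
    spec = plan[name]
    # Every property listed in the plan must be present with the right type/value.
    for prop, rule in spec.items():
        if prop not in properties:
            problems.append(f"missing property: {prop}")
            continue
        value = properties[prop]
        if not isinstance(value, rule["type"]):
            problems.append(f"wrong type for {prop}")
        elif "allowed" in rule and value not in rule["allowed"]:
            problems.append(f"unexpected value for {prop}: {value}")
    # Anything sent but not listed in the plan is flagged as well.
    for prop in properties:
        if prop not in spec:
            problems.append(f"untracked property: {prop}")
    return problems
```

Engineers could run a check like `validate_event("page_view", {"page": "/home", "load_time_ms": 120})` against sample events before a release; an empty list would mean the event matches the plan.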
Data Collection Methods
Qualitative vs quantitative data collection methods
Some of the methods are quantitative, dealing with
something that can be counted.

Others are qualitative, meaning that they consider
factors other than numerical values.

In general, questionnaires, surveys, and documents
and records are quantitative, while interviews, focus
groups, observations, and oral histories are qualitative.
There can also be crossover between the two methods.
Data Collection Methods
Quantitative methods, such as surveys, large-scale
benchmarks, and prioritization, answer the question
“How much?”

But these methods can leave the question “Why?”
unanswered.

This is where qualitative data collection methods
come into play.
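The difference between "How much?" and "Why?" can be sketched with a tiny survey dataset. The responses below are hypothetical and only serve to show that the numbers can be counted by code, while the comments still need a human reader to interpret.

```python
from collections import Counter

# Hypothetical survey responses: a numeric rating plus a free-text comment.
responses = [
    {"rating": 4, "comment": "Checkout was quick"},
    {"rating": 2, "comment": "Could not find the search bar"},
    {"rating": 2, "comment": "Search results were confusing"},
    {"rating": 5, "comment": ""},
]

# Quantitative view ("How much?"): counts per rating and an average.
rating_counts = Counter(r["rating"] for r in responses)
average_rating = sum(r["rating"] for r in responses) / len(responses)

# Qualitative view ("Why?"): the comments behind the low ratings.
low_rating_comments = [
    r["comment"] for r in responses if r["rating"] <= 2 and r["comment"]
]
```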
Qualitative data collection Methods
Qualitative data collection looks at several factors to
provide a depth of understanding to raw data.

While qualitative methods involve the collection,
analysis, and management of data, instead of counting
responses or recording numeric data, this method
aims to assess factors like the thoughts and
feelings of research participants.

Qualitative data collection methods go
beyond recording events to create context.
Qualitative data collection Methods
With this enhanced view, researchers can:
Describe the environment. Understanding where
observations take place can add meaning to recorded
numbers.

Identify the people involved in the study. If research is
limited to a particular group of people, whether
intentionally or as a function of demographics or other
factors, this information can inform the results.

Describe the content of the study. Sometimes, the
specific activities involved in research and how messages
about the study were delivered and received may
illuminate facts about the study.
Qualitative data collection Methods
Interact with study participants. Interactions
between respondents and research staff can provide
valuable information about the results.

Be aware of external factors. Unanticipated events
can affect research outcomes.

Qualitative data collection methods allow researchers
to identify these events and weave them into their
results narrative, which is nearly impossible to do with
just a quantitative approach.
Qualitative data collection Methods
There are three commonly used qualitative data
collection methods:

Ethnographic,
Grounded theory, and
Phenomenological.
Qualitative data collection Methods

Ethnography comes from anthropology, the study of
human societies and cultures. Ethnography seeks to
understand how people live their lives.

This approach is intended to reveal behaviors from a
subject’s perspective rather than from the view of the
researchers.

Ethnography helps fill in the blanks when a
participant may not be able to articulate their
desires or the reasons for their decisions or
behaviours.
Qualitative data collection Methods

Grounded theory arose when sociological
researchers sought to provide a level of legitimacy to
qualitative research: to ground it in reality rather
than assumptions.

Before this method, qualitative data analysis was
actually done before any quantitative data was
collected, so it was disconnected from the collection
and analysis process.
Qualitative data collection Methods
Grounded theory uses the following methods:
Participant observation. Researchers immerse
themselves in the daily lives of subjects. Another term
for this is “fieldwork.”
Interviews. These can vary in formality from informal
chats to structured interviews.
Document and artifact collection. Grounded
theory often is about more than observation and
interviews. Researchers can learn about a group of
people from looking at materials the group used.
For example, a local community’s laws may shed
light on opinions and provide a clearer picture of
residents’ sentiments.
Qualitative data collection Methods
Phenomenology describes how people experience
certain events or unique encounters.
This method measures reactions to occurrences that
are outside of the norm, so it’s essential to understand
the whole picture, not just facts and figures.
An example of phenomenology is studying the
experiences of individuals involved in a natural
disaster.
To analyze data from such an event, the researcher
must become familiar with the data; focus the
analysis on the subject matter, time period, or
other factors; and categorize the data.
Qualitative data collection Methods
Completing these tasks gives the researcher a
framework for understanding how the natural
disaster impacts people.

Together, the understanding, focus, and organization
help researchers identify patterns, make
connections, interpret data, and explain findings.

Each of these qualitative data collection methods
sheds light on factors that can be hidden in simple
data analysis.
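The focus and categorize tasks from the natural disaster example can be sketched in code. The notes and category keywords below are hypothetical, and simple keyword matching is only a crude stand-in for the careful reading a researcher would actually do.

```python
# Hypothetical interview notes from people affected by a natural disaster.
notes = [
    {"month": 1, "text": "We lost power and had no water for days"},
    {"month": 1, "text": "Neighbors shared food and shelter"},
    {"month": 6, "text": "Still waiting for the insurance payment"},
]

# Hypothetical category keywords; real studies derive these from the data.
CATEGORIES = {
    "basic needs": ["water", "power", "food"],
    "community": ["neighbors", "shared"],
    "recovery": ["insurance", "rebuild"],
}


def categorize(text, categories=CATEGORIES):
    """Return the sorted names of all categories whose keywords appear in the text."""
    lowered = text.lower()
    return sorted(
        name
        for name, keywords in categories.items()
        if any(word in lowered for word in keywords)
    )


# Focus: keep only notes from the first month after the event.
focused = [n for n in notes if n["month"] == 1]
# Categorize: tag each focused note with its matching categories.
tagged = [(n["text"], categorize(n["text"])) for n in focused]
```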
Qualitative data collection Methods

Qualitative data is one way to add context and
reality to raw numbers. Often, researchers find value
in a hybrid approach, where qualitative data collection
methods are used alongside quantitative ones.

Initially, the collected data is unstructured. Various
facts and figures may or may not have context. A
researcher’s job is to make sense of this data, and
the choice of data collection method often helps.
Quantitative Data Collection Methods

One of the most widely used methods of collecting
information for research purposes is
quantitative data collection.

Quantitative analysis relates to evaluating a numerical
result. A classic example is a survey, which asks
questions to collect responses that shed light on
trends, preferences, actions, opinions, and any
other element that can be counted.

Quantitative data collection methods are popular
because they are relatively straightforward.
Quantitative Data Collection Methods

Using these methods, researchers ask questions to
collect sets of facts and figures.

Quantitative data is measurable and expressed in
numerical form.

While this seems like a fairly simple concept, like
many aspects of research, there are various approaches
to quantitative data collection that depend on the
particular research being conducted.
Quantitative Data Collection Methods

Researchers use four different
primary quantitative research designs:
Descriptive,
Correlational,
Experimental, and
Quasi-experimental.
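The first two designs can be illustrated with a few lines of code. The numbers below are hypothetical: a descriptive summary reports one variable as it is, while a correlational measure shows how two variables move together, which does not by itself show causation. Experimental and quasi-experimental designs depend on how the study is run, so they are not shown here.

```python
from math import sqrt
from statistics import mean

# Hypothetical data: weekly study hours and exam scores for five students.
hours = [2, 4, 6, 8, 10]
scores = [50, 60, 65, 80, 95]

# Descriptive: summarize one variable as it is.
average_score = mean(scores)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Correlational: measure how two variables move together.
r = pearson(hours, scores)
```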
