Data Collection Lecture
During data collection, researchers must identify the data types, the
sources of data, and the methods being used. We will soon see that there
are many different
The concept of data collection isn't new, and it has helped to shape the
world into what it is today. With the aid of technology, data collection
has become much easier, and individuals can access a wide range of data at
any point in time.
Now that we've explained the various techniques, let's narrow our focus
even further by looking at some specific tools.
Word Association: The researcher gives the respondent a set of
words and asks them what comes to mind when they hear each
word.
Quality Assurance
Quality assurance in data collection refers to a systematic process of
ensuring that the data gathered is accurate, complete, reliable, and
consistent by implementing procedures to identify and address potential
errors throughout the data collection process, ultimately guaranteeing the
quality of the information collected for analysis and decision-making.
Quality Control
Quality control in data collection refers to the process of implementing
methods to identify, prevent, and correct errors in the data being gathered,
ensuring its accuracy, consistency, completeness, and reliability before
analysis. Essentially, it is a set of procedures that guarantees the data
collected is of high enough quality for its intended purpose and that it
was gathered in accordance with the manual's defined methods. Additionally,
quality control determines the appropriate solutions, or "actions," to fix
flawed data gathering procedures and reduce recurrences.
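As a rough illustration of what such a procedure can look like in practice,
the sketch below checks collected records for missing and out-of-range
values before analysis. The field names (respondent_id, age, country) and
the 0 to 120 age range are hypothetical, chosen only for the example; a real
study would take its rules from its own data collection manual.

```python
from typing import Any

# Hypothetical schema: fields every collected record is expected to contain.
REQUIRED_FIELDS = ("respondent_id", "age", "country")


def check_record(record: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one collected record."""
    problems = []
    # Completeness check: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or value == "":
            problems.append(f"missing {field}")
    # Accuracy check: an age that is present must fall in a plausible range
    # (the 0-120 bounds are an assumption made for this example).
    age = record.get("age")
    if isinstance(age, int) and not 0 <= age <= 120:
        problems.append("age out of range")
    return problems


records = [
    {"respondent_id": 1, "age": 34, "country": "NG"},
    {"respondent_id": 2, "age": 250, "country": "GH"},
    {"respondent_id": 3, "age": None, "country": ""},
]

# Map each respondent to the problems found, so flawed records can be
# corrected or sent back for re-collection.
report = {r["respondent_id"]: check_record(r) for r in records}
for respondent_id, problems in report.items():
    print(respondent_id, problems or "ok")
```

Records with an empty problem list can move on to analysis; the others
point to the "actions" needed, such as re-contacting a respondent or
correcting the entry form.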
1. Process and Analyze Your Data: At this stage, you’ll use various
methods to explore your data more thoroughly. This can involve
statistical methods to uncover patterns or qualitative techniques to
understand the broader context. The goal is to turn raw data into
actionable insights that can guide decisions and strategies moving
forward.
Inconsistent Data
When working with various data sources, it is likely that the same
information will show discrepancies between sources. The differences could
be in formats, units, or occasionally spellings. Inconsistent data might
also be introduced during company mergers or data migrations.
Inconsistencies in data tend to accumulate and reduce the value of data if
they are not continually resolved. Organizations that focus heavily on data
consistency do so because they only want reliable data to support their
analytics.
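One common way to resolve format and unit discrepancies is to convert every
incoming value to a single agreed representation. The sketch below is a
minimal example under stated assumptions: the accepted date formats, the
choice of kilograms as the target unit, and the function names are all
illustrative, not part of any particular tool.

```python
from datetime import datetime

# Assumed set of date formats that appear across the data sources.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y")

# Conversion factors to the assumed target unit, kilograms.
TO_KG = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}


def normalize_date(text: str) -> str:
    """Convert a date written in any accepted format to ISO 8601."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    raise ValueError(f"unrecognized date format: {text!r}")


def normalize_weight(value: float, unit: str) -> float:
    """Convert a weight to kilograms, rounded to three decimal places."""
    return round(value * TO_KG[unit.strip().lower()], 3)


print(normalize_date("2024-02-03"))   # already ISO
print(normalize_date("03/02/2024"))   # day/month/year from another source
print(normalize_weight(2500, "g"))
print(normalize_weight(10, " LB "))
```

Values that match none of the accepted formats raise an error instead of
being guessed at, so new inconsistencies surface early and do not
accumulate.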
Data Downtime
Data is the driving force behind the decisions and operations of data-driven
businesses. However, there may be brief periods when their data is
unreliable or not ready for use. Customer complaints and poor analytical
outcomes are two of the ways this data unavailability makes itself felt.
Ambiguous Data
Even with thorough oversight, some errors can still occur in massive
databases or data lakes. The issue becomes more overwhelming when
data streams in at high speed. Spelling mistakes can go unnoticed,
formatting problems can occur, and column headings can be misleading.
This ambiguous data can cause several problems for reporting and analytics.
Duplicate Data
Data sources are likely to duplicate and overlap each other quite a bit. For
instance, duplicate contact information has a substantial impact on
customer experience. Marketing campaigns suffer if certain prospects are
ignored while others are engaged repeatedly. The likelihood of biased
analytical outcomes increases when duplicate data are present.
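A simple way to detect duplicate contacts is to compare records on a
normalized key. The sketch below assumes, purely for illustration, that two
contacts are the same person when their email addresses match after
trimming whitespace and ignoring case; real datasets often need fuzzier
matching rules.

```python
def contact_key(contact: dict[str, str]) -> str:
    """Build a comparison key: the email, trimmed and lowercased."""
    return contact["email"].strip().lower()


def deduplicate(contacts: list[dict[str, str]]) -> list[dict[str, str]]:
    """Keep the first record seen for each key, preserving input order."""
    seen: dict[str, dict[str, str]] = {}
    for contact in contacts:
        # setdefault stores the record only if the key is not present yet.
        seen.setdefault(contact_key(contact), contact)
    return list(seen.values())


contacts = [
    {"name": "Ada Obi", "email": "ada.obi@example.com"},
    {"name": "A. Obi", "email": " Ada.Obi@Example.com "},
    {"name": "Kofi Mensah", "email": "kofi@example.com"},
]

unique_contacts = deduplicate(contacts)
print(len(contacts), "records in,", len(unique_contacts), "unique")
```

Keeping the first occurrence is only one possible policy; merging the
fields of duplicate records is another.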
Abundance of Data
A data quality problem may also occur when there is too much data. There is
a risk of getting lost in abundant data when searching for the information
pertinent to your study. By some estimates, data scientists, data analysts,
and business users devote 60% of their work to finding and organizing the
appropriate data. With increased data volume, other data quality problems
become more serious, particularly when dealing with streaming data and
large files or databases.
Inaccurate Data
Data accuracy is crucial for highly regulated businesses like healthcare.
Recent experience, with the COVID-19 era as a typical example, has made
improving data quality more important than ever.
Data inaccuracies can be attributed to several things, including data
decay, human error, and data drift. Data decay worldwide is often estimated
at about 3% per month, which is quite concerning. Data integrity can also
be compromised when data is transferred between systems, and data quality
can deteriorate over time.
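To get a feel for what a decay rate of about 3% per month implies, the
arithmetic can be sketched as follows. This assumes the rate is constant and
compounds month over month, which is a simplification made for the example
and not a claim from the figure's source.

```python
MONTHLY_DECAY = 0.03  # the roughly 3% per month figure quoted in the text


def fraction_still_valid(months: int, monthly_decay: float = MONTHLY_DECAY) -> float:
    """Fraction of records still accurate after the given number of months,
    assuming a constant, compounding monthly decay rate."""
    return (1.0 - monthly_decay) ** months


remaining = fraction_still_valid(12)
decayed = 1.0 - remaining
print(f"{remaining:.1%} still valid, {decayed:.1%} decayed after 12 months")
```

Under these assumptions, close to a third of a dataset would be out of date
after one year if nothing were done to refresh it.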
Hidden Data
One of the major constraints on data collection is hidden data. This occurs
when researchers withhold useful data from the general public for the sake
of confidentiality.