Module 4
DATA ANALYSIS:
2. Data Preparation
Data Cleaning: Removing duplicates, fixing errors, and handling missing values.
Data Transformation: Converting data into a suitable format or structure for analysis (e.g.,
normalization, encoding categories).
Data Integration: Combining data from different sources to create a cohesive dataset.
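The three preparation steps above can be sketched in plain Python on a toy dataset (the record fields "name", "age", "city", and the salary table are invented for illustration; Pandas offers the same operations at scale):

```python
# Toy raw records: one duplicate, one missing value.
raw = [
    {"name": "Ana", "age": "34", "city": "Lisbon"},
    {"name": "Ana", "age": "34", "city": "Lisbon"},   # exact duplicate
    {"name": "Bo",  "age": None, "city": "Oslo"},     # missing age
]

# Cleaning: drop exact duplicates, then fill missing ages with a default.
seen, cleaned = set(), []
for rec in raw:
    key = (rec["name"], rec["age"], rec["city"])
    if key not in seen:
        seen.add(key)
        cleaned.append(dict(rec))
for rec in cleaned:
    if rec["age"] is None:
        rec["age"] = "0"          # simplistic imputation, for illustration only

# Transformation: convert age to int and one-hot encode the city category.
cities = sorted({r["city"] for r in cleaned})
for rec in cleaned:
    rec["age"] = int(rec["age"])
    for c in cities:
        rec[f"city_{c}"] = int(rec["city"] == c)

# Integration: merge in a second (invented) source keyed on name.
salaries = {"Ana": 50_000, "Bo": 42_000}
for rec in cleaned:
    rec["salary"] = salaries.get(rec["name"])

print(len(cleaned), "records after preparation")  # 2
```

In practice the same pipeline would be written with `drop_duplicates`, `fillna`, `get_dummies`, and `merge` on a Pandas DataFrame.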
3. Data Analysis
Applying statistical or computational methods to explore and analyze the data, looking for
trends, correlations, or insights.
Techniques such as descriptive statistics, exploratory data analysis, and machine learning
algorithms may be used.
4. Data Storage
Storing processed data in databases or data warehouses for easy access, retrieval,
and analysis.
Choosing the right storage solutions based on the type of data (structured vs.
unstructured), size, and access frequency.
5. Data Visualization
Creating visual representations of data (charts, graphs, dashboards) to help
communicate findings and insights effectively.
6. Data Interpretation
Drawing conclusions from the analyzed data and providing actionable insights
based on the findings.
This may involve using reports or presentations to share the results with
stakeholders.
7. Data Security and Compliance
Ensuring that data is handled in accordance with legal, regulatory, and
organizational policies regarding privacy and security.
8. Data Maintenance
Regularly updating and maintaining data to ensure its accuracy, relevance, and
reliability over time.
Common Tasks in Data Editing:
Removing duplicates
Filling in missing values
Correcting typos and inconsistencies
Data Formatting: Ensuring that data adheres to a specified format.
Data Validation: Checking data against defined rules to ensure its reliability.
Tools and Technologies
Data processing often involves using various tools and frameworks, such as:
Database Management Systems: SQL databases allow for more complex editing and querying of large data sets.
Data Cleaning Tools: Software like OpenRefine, Talend, or Trifacta specifically focuses on cleaning and transforming data.
Programming Languages: Languages like Python and R have libraries (e.g., Pandas, dplyr) designed for data manipulation and cleaning.
Best Practices for Data Editing:
Backup Original Data: Always keep a copy of the original data before making
changes.
Document Changes: Maintain a log of edits made to the data for transparency
and reproducibility.
Automate When Possible: Use scripts and tools to automate repetitive tasks,
reducing the likelihood of human error.
Test Changes: After editing, validate the changes to ensure that they have had
the desired effect without introducing new errors.
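The four best practices above can be sketched in a few lines of Python (the email records are invented; the "fix" applied is a simple whitespace/case normalization):

```python
import copy

# Invented records with messy email values.
data = [{"id": 1, "email": "ANA@EXAMPLE.COM"},
        {"id": 2, "email": " bo@example.com "}]

# Backup original data: keep a deep copy before any edit.
backup = copy.deepcopy(data)

# Automate + document changes: apply the fix in a loop and log each edit.
change_log = []
for rec in data:
    fixed = rec["email"].strip().lower()
    if fixed != rec["email"]:
        change_log.append({"id": rec["id"], "field": "email",
                           "old": rec["email"], "new": fixed})
        rec["email"] = fixed

# Test changes: validate that every email is now normalized.
assert all(r["email"] == r["email"].strip().lower() for r in data)
print(len(change_log), "edits logged")
```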
PURPOSES AND OBJECTIVES OF
DATA EDITING
◾ The basic purpose served by data editing is that it improves the
quality, accuracy, and adequacy of the collected data, thereby making
it more suitable for the purpose for which the data was collected. The
following can therefore be identified as the main objectives of the
data editing process:
◾ Detection of errors in the data that would otherwise affect the validity of
outputs.
◾ Validation of the data for the purposes for which it was collected.
◾ Provision of information that would help assess the overall level of
accuracy of the data.
◾ Detection and identification of any inconsistencies in the data.
TYPES OF DATA
EDITING
◾ Validity and completeness of data: refers to the correctness and
completeness of obtained responses. This helps ensure that there
are no missing values or empty fields in the database.
◾ Range: verifies that data within a field fall between the
boundaries specified for the particular field.
◾ Duplicate data entry: this helps ensure that there is no
repetition or duplication of data and that each unit in the database or
register was entered only once.
◾ Logical consistency: this type of editing takes into account the
connections between data fields or variables.
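The four types of edit can be sketched as checks over a toy survey file (the fields and the consistency rule "age ≈ survey year − birth year" are invented for illustration):

```python
# Invented survey records exercising each edit type.
records = [
    {"id": 1, "age": 34,   "birth_year": 1990, "survey_year": 2024},
    {"id": 2, "age": 150,  "birth_year": 2000, "survey_year": 2024},  # out of range
    {"id": 2, "age": 24,   "birth_year": 2000, "survey_year": 2024},  # duplicate id
    {"id": 3, "age": None, "birth_year": 1985, "survey_year": 2024},  # missing value
]

errors = []
seen_ids = set()
for r in records:
    if r["age"] is None:                                   # completeness edit
        errors.append((r["id"], "missing age"))
    elif not 0 <= r["age"] <= 120:                         # range edit
        errors.append((r["id"], "age out of range"))
    elif abs((r["survey_year"] - r["birth_year"]) - r["age"]) > 1:
        errors.append((r["id"], "age inconsistent"))       # logical consistency edit
    if r["id"] in seen_ids:                                # duplicate-entry edit
        errors.append((r["id"], "duplicate id"))
    seen_ids.add(r["id"])

print(errors)
```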
STAGES OF DATA
EDITING
◾ The manual desk editing stage is a traditional method
carried out by a specialized editing team. The
data, if on paper, is checked after it has been
collected and before it is fed into the database. If,
however, electronic means have been used to collect the
data, the forms entered into the database are reviewed
individually.
◾ The automated data editing method makes use of
computer programs and systems for checking the data
all at once after it has been entered electronically.
LIMITATIONS TO DATA
EDITING
◾ Data editing can be influenced by the amount of time available, the budget, the presence
or absence of other resources, and the group of people involved in the editing process.
◾ The available computer software programs can also limit the process.
◾ Follow-up with the respondents is of critical importance in the data editing process
because they are often the best source of information. However, respondents
might find this stressful and burdensome, which places limitations
on the data editing process.
◾ Some types of data do not require extensive editing, so it is better to keep
in mind the intended uses of the data and make sure that the more important part of the data is
kept free from all errors. In this way, the intended use of the data does play an important
role in influencing the data editing process.
◾ You need to establish, in the survey plan and right at the start of the project, the methods
and procedures that must be followed while correcting or handling
data errors; otherwise the process will be of little or no use.
◾ Also remember that if you plan to edit your data manually, you must develop and
document the editing rules in advance.
GENERAL GUIDELINES FOR
DATA EDITING
◾ Who should set the editing rules? Such rules should be made by
professionals who are experts in data collection, questionnaire design,
and analysis.
◾ The editing rules need to be consistent and free from any contradictions.
◾ When setting the editing rules, it must be established whether the variable is
qualitative or quantitative, because the editing rules differ between the two.
◾ Give enough time to each of the various stages of the process, that is, data
collection, entry, and analysis, and at the end of each make a quick check to see
that all the necessary edits have been made and that there are no empty places
within the questionnaire form.
◾ The questionnaire must be edited in full during the early stages of editing. If,
however, it is found that some errors remain, a sample of forms should be
subjected to re-editing, with the size of the sample determined by the extent of the errors found.
CODING
When a researcher has completed collecting information or data, this information is
ready to be processed and analyzed. Quantitative data is information that is measurable
and focuses on numerical values, unlike qualitative data which is more descriptive.
During the data processing step, the collected data is transformed into a form that is
appropriate to manipulate and analyze.
Data coding is a process used in various fields, including research, statistics, and data
analysis, to convert qualitative and quantitative information into a format that can be
easily analyzed and interpreted. This process is crucial for organizing data, ensuring
consistency, and facilitating statistical analysis. Here are some key aspects of data
coding:
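A minimal sketch of data coding: qualitative survey responses are mapped to numeric codes through a codebook (the Likert-style categories and code values below are invented for illustration):

```python
# Codebook: each category gets a fixed numeric code.
codebook = {"strongly disagree": 1, "disagree": 2, "neutral": 3,
            "agree": 4, "strongly agree": 5}

# Raw responses with inconsistent casing and spacing.
responses = ["Agree", "neutral", "STRONGLY AGREE", "disagree"]

# Normalize before lookup so variant spellings map to the same code,
# which keeps the coding consistent across the dataset.
coded = [codebook[r.strip().lower()] for r in responses]

print(coded)  # [4, 3, 5, 2]
```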
TYPES OF DATA CODING
◾ Complex Tabulation
WHAT ARE THE OBJECTIVES OF
TABULATION?
◾ For Simplification of Complex Data
◾ To Highlight Important Information
◾ To Enable Easy Comparison
◾ To Help in Statistical Analysis
VALIDATION
◾ What is Data Validation?
◾ Data validation refers to the process of ensuring the accuracy and
quality of data. It is implemented by building several checks into a
system or report to ensure the logical consistency of input and
stored data.
◾ In automated systems, data is entered with minimal or no human
supervision. Therefore, it is necessary to ensure that the data that
enters the system is correct and meets the desired quality
standards. The data will be of little use if it is not entered properly
and can create bigger downstream reporting issues. Unstructured
data, even if entered correctly, will incur related costs for cleaning,
transforming, and storage.
TYPES OF DATA
VALIDATION
1. Data Type Check
2. Code Check
3. Range Check
4. Format Check
5. Consistency Check
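The five checks can be sketched as one validation function (the fields "age", "country", "joined", the allowed country codes, and the rules are all invented for illustration):

```python
import re
from datetime import date

VALID_CODES = {"US", "DE", "IN"}  # code check: value must be in an allowed set

def validate(record):
    errors = []
    if not isinstance(record.get("age"), int):                 # 1. data type check
        errors.append("age must be an integer")
    if record.get("country") not in VALID_CODES:               # 2. code check
        errors.append("unknown country code")
    if isinstance(record.get("age"), int) and not 0 <= record["age"] <= 120:
        errors.append("age out of range")                      # 3. range check
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", record.get("joined", "")):
        errors.append("joined must be YYYY-MM-DD")             # 4. format check
    if record.get("joined", "") > date.today().isoformat():    # 5. consistency check
        errors.append("joined date is in the future")
    return errors

print(validate({"age": 30, "country": "DE", "joined": "2020-05-01"}))  # []
```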
ANALYSIS AND
INTERPRETATION
Data analysis is defined as a process of cleaning, transforming, and
modeling data to discover useful information for business decision-
making. The purpose of data analysis is to extract useful information
from data and to take decisions based on it.
A simple example of data analysis: whenever we take any decision in
our day-to-day life, we think about what happened last time or
what will happen if we choose a particular option. This is
nothing but analyzing our past or future and making decisions
based on it. For that, we gather memories of our past or dreams of
our future, and that too is data analysis. An analyst does the same
thing for business purposes, and that is called Data Analysis.
DATA ANALYSIS
TOOLS
TYPES OF DATA ANALYSIS:
TECHNIQUES AND
METHODS
◾Text Analysis
◾Statistical Analysis
◾Diagnostic Analysis
◾Predictive Analysis
◾Prescriptive Analysis
DATA ANALYSIS
PROCESS
◾ The data analysis process is gathering information with a
proper application or tool that allows you to explore the data and find patterns
in it. Based on that information and data, you can make decisions or draw
final conclusions.
◾ Data Analysis consists of the following phases:
◾ Data Requirement Gathering
◾ Data Collection
◾ Data Cleaning
◾ Data Analysis
◾ Data Interpretation
◾ Data Visualization
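The phases above can be compressed into a toy end-to-end run (the "monthly revenue" figures are invented; requirement gathering and collection are represented by the hard-coded input):

```python
import statistics

# Collection: in practice this would be queried from a source system.
raw = [("Jan", 100), ("Feb", 120), ("Mar", None), ("Apr", 140), ("Apr", 140)]

# Cleaning: drop duplicates and rows with missing values.
cleaned = []
for row in raw:
    if row[1] is not None and row not in cleaned:
        cleaned.append(row)

# Analysis: a summary statistic and a crude first-to-last trend.
values = [v for _, v in cleaned]
avg = statistics.mean(values)
trend = "rising" if values[-1] > values[0] else "flat or falling"

# Interpretation: report the finding in words for the decision-maker.
print(f"average revenue {avg:.0f}, trend is {trend}")
```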
DATA
INTERPRETATION
◾ After analyzing your data, it is finally time to interpret your results. You can choose
how to express or communicate your data analysis: simply in words,
or with a table or chart. Then use the results of your data analysis process to decide
your best course of action.
◾ Data Visualization
◾ Data visualization is very common in day-to-day life; it often appears in the form of
charts and graphs. In other words, data is shown graphically so that it is easier for the
human brain to understand and process. Data visualization is often used to discover
unknown facts and trends; by observing relationships and comparing datasets, you can
uncover meaningful information.
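As a minimal illustration, a bar chart can even be rendered as text without any plotting library (the regional counts are invented; real dashboards would use a library such as matplotlib):

```python
# Invented category counts to visualize.
counts = {"North": 12, "South": 7, "East": 9}

# One text bar per category: label, bar of '#' marks, numeric value.
lines = [f"{region:<6} {'#' * n} {n}" for region, n in counts.items()]
print("\n".join(lines))
```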
IMPORTANCE OF DATA
INTERPRETATION
◾ Make better decisions
◾ Find trends and take action
◾ Better resource allocation
WHAT ARE THE STEPS IN
INTERPRETING DATA?
◾ Gather the data
◾ Develop your discoveries
◾ Draw conclusions
◾ Give recommendations