Module 4

The document outlines the process of data analysis, including data processing, editing, coding, and visualization, emphasizing the importance of transforming raw data into a usable format for decision-making. It details various techniques and tools for data analysis, such as statistical methods and software applications, while also highlighting best practices for data editing and validation. Additionally, it discusses the significance of data interpretation and the steps involved in analyzing and presenting data effectively.


MODULE-4

DATA ANALYSIS

Data processing – editing, coding, pictorial and graphical presentation of data; parametric and non-parametric hypothesis testing using statistical tools such as descriptive statistics, chi-square, t-test, ANOVA, correlation and regression (application through Excel sheets and Jamovi only).
DATA ANALYSIS

◾ The data collected from various primary and secondary sources is RAW in nature, which means it is likely to contain errors and inconsistencies.
◾ Since collected data is of pivotal importance to policy and decision makers everywhere, be it governmental departments, business organizations, health or educational institutions, it is best to have a team of experts at hand who know how to scrutinize, review and edit this data before it is finally fed into the processing system.

Data Processing
Data processing refers to the manipulation, management, and
transformation of raw data into a usable format for analysis,
reporting, or decision-making. This process may involve several
steps, depending on the type of data and its intended use. Here are
the main steps typically involved in data processing:
1. Data Collection
Gathering raw data from various sources such as surveys, sensors, transactions, and databases.

2. Data Preparation
Data Cleaning: Removing duplicates, fixing errors, and handling missing values.
Data Transformation: Converting data into a suitable format or structure for analysis (e.g.,
normalization, encoding categories).
Data Integration: Combining data from different sources to create a cohesive dataset.

3. Data Analysis
Applying statistical or computational methods to explore and analyze the data, looking for
trends, correlations, or insights.
Techniques such as descriptive statistics, exploratory data analysis, and machine learning
algorithms may be used.
4. Data Storage
Storing processed data in databases or data warehouses for easy access, retrieval,
and analysis.
Choosing the right storage solutions based on the type of data (structured vs.
unstructured), size, and access frequency.

5. Data Visualization
Creating visual representations of data (charts, graphs, dashboards) to help
communicate findings and insights effectively.

6. Data Interpretation
Drawing conclusions from the analyzed data and providing actionable insights
based on the findings.
This may involve using reports or presentations to share the results with
stakeholders.
7. Data Security and Compliance
Ensuring that data is handled in accordance with legal, regulatory, and
organizational policies regarding privacy and security.

8. Data Maintenance
Regularly updating and maintaining data to ensure its accuracy, relevance, and
reliability over time.
Tools and Technologies
Data processing often involves using various tools and frameworks, such as:

Programming Languages: Python, R, SQL
Data Processing Frameworks: Apache Spark, Hadoop
Database Management Systems: MySQL, PostgreSQL, MongoDB
Visualization Tools: Tableau, Power BI, Matplotlib, Seaborn
Common Tasks in Data Editing:
 Data Cleaning: Identifying and correcting errors in the data set. This includes:
 Removing duplicates
 Filling in missing values
 Correcting typos and inconsistencies

 Data Formatting: Ensuring that data adheres to a specified format, such as:
 Standardizing date formats (e.g., MM/DD/YYYY vs. DD/MM/YYYY)
 Ensuring consistent case (e.g., all uppercase or lowercase)

 Data Validation: Checking data against defined rules to ensure its reliability, such as:
 Ensuring numerical values are within a certain range
 Validating email addresses or URLs
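A minimal sketch of these editing tasks in plain Python. The records, field names, age limits, and the deliberately simple email pattern are illustrative assumptions, not a production validator:

```python
import re

# Hypothetical raw records with formatting and validity problems
records = [
    {"name": "ALICE", "age": 34, "email": "alice@example.com"},
    {"name": "bob",   "age": 250, "email": "not-an-email"},
]

# A deliberately simple email pattern for illustration only
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

cleaned, rejected = [], []
for rec in records:
    rec = dict(rec)
    # Data Formatting: consistent title case for names
    rec["name"] = rec["name"].title()
    # Data Validation: age within a plausible range, email matches the pattern
    if 0 <= rec["age"] <= 120 and EMAIL_RE.match(rec["email"]):
        cleaned.append(rec)
    else:
        rejected.append(rec)
```

The first record passes both checks; the second is rejected for its out-of-range age and malformed email.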
Tools for Data Editing:
Spreadsheet Software: Tools such as Microsoft Excel or Google Sheets are widely used for
manual data editing and analysis.

Database Management Systems: SQL databases allow for more complex editing and
querying of large data sets.

Data Cleaning Tools: Software like OpenRefine, Talend, or Trifacta specifically focuses on
cleaning and transforming data.

Programming Languages: Languages like Python and R have libraries (e.g., Pandas, dplyr)
designed for data manipulation and cleaning.
Best Practices for Data Editing:
Backup Original Data: Always keep a copy of the original data before making
changes.
Document Changes: Maintain a log of edits made to the data for transparency
and reproducibility.
Automate When Possible: Use scripts and tools to automate repetitive tasks,
reducing the likelihood of human error.
Test Changes: After editing, validate the changes to ensure that they have had
the desired effect without introducing new errors.
PURPOSES AND OBJECTIVES OF DATA EDITING
◾ The basic purpose served by data editing is that it improves the quality, accuracy and adequacy of the collected data, thereby making it more suitable for the purpose for which it was collected. The following can therefore be identified as the main objectives of the data editing process:
◾ Detection of errors in the data that would otherwise affect the validity of outputs.
◾ Validation of the data for the purposes for which it was collected.
◾ Provision of information that helps assess the overall level of accuracy of the data.
◾ Detection and identification of any inconsistencies in the data.
TYPES OF DATA EDITING
◾ Validity and completeness of data: refers to the correctness and completeness of obtained responses. This helps ensure that there are no missing values or empty fields in the databases.
◾ Range: verifies that data within a field fall between the boundaries specified for that particular field.
◾ Duplicate data entry: ensures that there is no repetition or duplication of data and that each unit in the database or register was entered only once.
◾ Logical consistency: through this type of editing, connections between data fields or variables are taken into account.
STAGES OF DATA EDITING
◾ The manual desk editing stage is a traditional method carried out by a specialized editing team. Data collected on paper is checked after collection and before it is fed into the databases. If, however, electronic means have been used to collect the data, the forms entered into the database are revised individually.
◾ The automated data editing method makes use of computer programs and systems to check the data all at once after it has been entered electronically.
LIMITATIONS TO DATA EDITING
◾ Data editing can be influenced by the amount of time available, the budget, the presence or absence of other resources, and also by the group of people involved in the editing process.
◾ The available computer software programs.
◾ Follow-up with the respondents is of critical importance in the data editing process because they are often the best source of information. However, respondents might find this stressful and burdensome, which limits the data editing process.
◾ Some types of data do not require extensive editing, so it is better to keep in mind the intended uses of the data and make sure that the more important part of the data is kept free from all errors. In this way, the intended use of the data plays an important role in influencing the data editing process.
◾ The methods and procedures to be followed while correcting or handling data errors must be established in the survey plan, right at the start of the project; otherwise the process will be of little or no use.
◾ Also remember that if you plan to edit your data manually, you must develop and document the editing instructions in advance.
GENERAL GUIDELINES FOR DATA EDITING
◾ Who should make or set the editing rules? Such rules should be made by professionals who are experts in data collection, questionnaire design and analysis.
◾ The editing rules need to be consistent and free from any contradictions.
◾ When setting the editing rules, it must be established whether the variable is qualitative or quantitative, because the rules for editing each differ.
◾ Give enough time to each of the various stages of the process, that is, data collection, entry and analysis, and at the end of each make a quick check to see that all the necessary edits have been made and that there are no empty places within the questionnaire form.
◾ The questionnaire must be edited in full during the early stages of editing. If, however, it is found that some errors remain, a sample of forms should be subjected to re-editing, with the size of the sample determined according to the rate of errors found.
CODING
When a researcher has completed collecting information or data, this information is
ready to be processed and analyzed. Quantitative data is information that is measurable
and focuses on numerical values, unlike qualitative data which is more descriptive.
During the data processing step, the collected data is transformed into a form that is
appropriate to manipulate and analyze.

Data coding is a process used in various fields, including research, statistics, and data
analysis, to convert qualitative and quantitative information into a format that can be
easily analyzed and interpreted. This process is crucial for organizing data, ensuring
consistency, and facilitating statistical analysis. Here are some key aspects of data
coding:
TYPES OF DATA CODING

• Qualitative Coding: Involves categorizing qualitative data (e.g., interviews, open-ended survey responses) into themes or codes. Techniques include open coding, axial coding, and selective coding.
• Quantitative Coding: Involves assigning numerical values to responses (e.g., coding "Yes" as 1 and "No" as 0) to allow for statistical analysis.
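Quantitative coding of the yes/no kind described above can be sketched in plain Python; the responses are made-up sample data:

```python
# Hypothetical yes/no survey responses, coded "Yes" -> 1 and "No" -> 0
responses = ["Yes", "No", "Yes", "Yes", "No"]

codes = [1 if r == "Yes" else 0 for r in responses]

# Once coded, the proportion of "Yes" answers is simply the mean of the codes
yes_rate = sum(codes) / len(codes)
```

This is the whole point of quantitative coding: a categorical answer becomes a number, so ordinary statistics (means, counts, cross-tabulations) apply directly.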
STEPS OF DATA CODING
 Defining Categories: Determine the variables and categories relevant to the research or analysis.
 Creating a Codebook: Develop a codebook that lists all the codes and their descriptions. This is essential for maintaining consistency.
 Applying Codes: Go through the data and assign codes based on the defined categories.
 Data Entry: Enter the coded data into statistical software or a database for analysis.
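The codebook-and-apply steps can be sketched as follows; the variable name and code assignments are hypothetical examples:

```python
# A small hypothetical codebook: each variable lists its codes and meanings
codebook = {
    "marital_status": {"Single": 1, "Married": 2, "Divorced": 3},
}

# Applying codes: translate raw labels into their numeric codes
raw = ["Married", "Single", "Married"]
coded = [codebook["marital_status"][v] for v in raw]
```

Keeping the codebook as a single shared structure (rather than hard-coding numbers inline) is what maintains consistency when several people code the same data.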
IMPORTANT POINTS TO KEEP IN MIND DURING CODING OF DATA:
◾ Identification variables
◾ Code categories
◾ Preserving original information
◾ Closed-ended questions
◾ Open-ended questions
DEALING WITH MISSING DATA
◾ Refusal to answer or no response
◾ Don’t know responses
◾ Processing error
◾ Not applicable
◾ No match
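Two common ways of dealing with missing values, sketched with pandas on a hypothetical income column. Which strategy is appropriate depends on why the data are missing (refusal, not applicable, processing error, and so on):

```python
import pandas as pd

# Hypothetical income data; None marks a missing (e.g., refused) response
df = pd.DataFrame({
    "income": [45000, None, 52000, None, 61000],
})

n_missing = df["income"].isna().sum()

# Strategy 1: drop incomplete cases entirely
dropped = df.dropna(subset=["income"])

# Strategy 2: impute the missing values with the median of observed values
imputed = df.fillna({"income": df["income"].median()})
```

Dropping is simple but shrinks the sample; imputation keeps every case but narrows the apparent spread of the variable, so the choice should be documented.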

CLASSIFICATION
◾ Meaning of Classification of Data
◾ It is the process of arranging data into homogeneous (similar) groups according to their common characteristics.
◾ Raw data cannot be easily understood, and it is not fit for further analysis and interpretation. Arrangement of data helps users in comparison and analysis.
◾ For example, the population of a town can be grouped according to sex, age, marital status, etc.
◾ Definition of classification given by Professor Secrist.
OBJECTIVES OF DATA CLASSIFICATION
◾ To consolidate the volume of data in such a way that similarities and differences can be quickly understood. Figures can consequently be ordered in sections with common traits.
◾ To aid comparison.
◾ To point out the important characteristics of the data at a glance.
◾ To give importance to the prominent data collected while separating the optional elements.
TABULATION
Tabulation is the systematic and logical
representation of figures in rows and columns to
ease comparison and statistical analysis. It eases
comparison by bringing related information closer to
each other and helps further in statistical research
and interpretation. In other words, tabulation is a
method of arranging or organizing data in a tabular
form. The tabulation process may be simple or
complex depending upon the type of categorization.
WHAT ARE THE ESSENTIAL PARTS OF A TABLE?
◾ Table Number
◾ Title of the Table
◾ Headnote
◾ Column Headings or Captions
◾ Row Headings or Stubs
◾ Body of a Table
◾ Footnote
◾ Source Note
TYPES OF TABULATION
◾ Simple Tabulation or One-way Tabulation
◾ Double Tabulation or Two-way Tabulation
◾ Complex Tabulation
WHAT ARE THE OBJECTIVES OF TABULATION?
◾ For Simplification of Complex Data
◾ To Highlight Important Information
◾ To Enable Easy Comparison
◾ To Help in Statistical Analysis
VALIDATION
◾ What is Data Validation?
◾ Data validation refers to the process of ensuring the accuracy and
quality of data. It is implemented by building several checks into a
system or report to ensure the logical consistency of input and
stored data.
◾ In automated systems, data is entered with minimal or no human
supervision. Therefore, it is necessary to ensure that the data that
enters the system is correct and meets the desired quality
standards. The data will be of little use if it is not entered properly
and can create bigger downstream reporting issues. Unstructured
data, even if entered correctly, will incur related costs for cleaning,
transforming, and storage.
TYPES OF DATA VALIDATION
1. Data Type Check
2. Code Check
3. Range Check
4. Format Check
5. Consistency Check
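The five checks can be sketched in plain Python. The record, the gender code list, the age limits, and the retirement rule are illustrative assumptions chosen so each check has something to test:

```python
from datetime import datetime

# Hypothetical record passing through the five validation checks
record = {"age": "42", "gender": "F", "dob": "1983-05-14", "retired": "No"}

errors = []

# 1. Data type check: age must parse as an integer
try:
    age = int(record["age"])
except ValueError:
    errors.append("age: not an integer")
    age = None

# 2. Code check: gender must come from the agreed code list
if record["gender"] not in {"M", "F", "O"}:
    errors.append("gender: unknown code")

# 3. Range check: age must lie within plausible bounds
if age is not None and not (0 <= age <= 120):
    errors.append("age: out of range")

# 4. Format check: date of birth must be YYYY-MM-DD
try:
    datetime.strptime(record["dob"], "%Y-%m-%d")
except ValueError:
    errors.append("dob: bad format")

# 5. Consistency check: flag records marked retired at an implausibly young age
if age is not None and age < 50 and record["retired"] == "Yes":
    errors.append("retired: inconsistent with age")
```

An empty `errors` list means the record cleared every check; in an automated system, records with a non-empty list would be routed back for editing.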
ANALYSIS AND INTERPRETATION
Data analysis is defined as a process of cleaning, transforming, and modeling data to discover useful information for business decision-making. The purpose of data analysis is to extract useful information from data and to take decisions based upon that analysis.
A simple example of data analysis: whenever we take any decision in our day-to-day life, we think about what happened last time or what will happen if we choose that particular option. This is nothing but analyzing our past or future and making decisions based on it. For that, we gather memories of our past or dreams of our future. An analyst does the same thing for business purposes, and that is called data analysis.
DATA ANALYSIS TOOLS

TYPES OF DATA ANALYSIS: TECHNIQUES AND METHODS

◾Text Analysis
◾Statistical Analysis
◾Diagnostic Analysis
◾Predictive Analysis
◾Prescriptive Analysis
DATA ANALYSIS PROCESS
◾ The data analysis process is nothing but gathering information by using a proper application or tool which allows you to explore the data and find patterns in it. Based on that information and data, you can make decisions or reach final conclusions.
◾ Data Analysis consists of the following phases:
◾ Data Requirement Gathering
◾ Data Collection
◾ Data Cleaning
◾ Data Analysis
◾ Data Interpretation
◾ Data Visualization
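As a small illustration of the analysis phase, Python's standard statistics module can compute the descriptive statistics the module syllabus mentions. The scores are made-up sample data; Excel or Jamovi would produce the same summaries:

```python
import statistics

# Hypothetical sample of test scores to summarize
scores = [62, 75, 75, 80, 88, 90]

summary = {
    "n": len(scores),
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "stdev": round(statistics.stdev(scores), 2),  # sample standard deviation
}
```

These few numbers (center, spread, most frequent value) are usually the first output of the data analysis phase and the input to interpretation.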
DATA INTERPRETATION
◾ After analyzing your data, it’s finally time to interpret your results. You can choose how to express or communicate your data analysis: simply in words, or with a table or chart. Then use the results of your data analysis process to decide your best course of action.
◾ Data Visualization
◾ Data visualization is very common in day-to-day life; it often appears in the form of charts and graphs. In other words, data is shown graphically so that it is easier for the human brain to understand and process. Data visualization is often used to discover unknown facts and trends. By observing relationships and comparing datasets, you can uncover meaningful information.
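A minimal bar-chart sketch with Matplotlib (listed earlier among the visualization tools). The categories and figures are invented, and the Agg backend is used so the chart renders straight to a file without needing a display:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering, no display required
import matplotlib.pyplot as plt

# Hypothetical sales figures to visualize by region
categories = ["North", "South", "East", "West"]
sales = [120, 95, 60, 150]

fig, ax = plt.subplots()
ax.bar(categories, sales)
ax.set_title("Sales by region")
ax.set_ylabel("Units sold")
fig.savefig("sales_by_region.png")
```

Even in this tiny example the chart makes the comparison (West highest, East lowest) visible at a glance, which is exactly the purpose of visualization described above.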
IMPORTANCE OF DATA INTERPRETATION
◾ Make better decisions
◾ Find trends and take action
◾ Better resource allocation
WHAT ARE THE STEPS IN INTERPRETING DATA?
◾ Gather the data
◾ Develop your discoveries
◾ Draw conclusions
◾ Give recommendations
