
Data Analysis UNIT-III

Data analysis is the process of studying data to answer questions about past events like sales or growth. It involves filtering relevant data from larger datasets collected to become the target of analysis. Data analysis defines patterns and reports that can then be used for data analytics. Data analytics takes the results of analysis to help with decision making, like predicting future events based on past data. Common types of data analysis include descriptive analysis of past performance, diagnostic analysis of why events occurred, predictive analysis of what may happen in the future, and prescriptive analysis which helps determine the best options.

Data Analysis:

“Analysis” is the process of answering “How?” and “Why?”. For example, how
did XYZ Company grow in the last quarter? Or why did the sales of XYZ Company
drop last summer? To answer such questions, we start with the data that we
already have and filter out what we need. This filtered data is the final
dataset, drawn from the larger body of data we have already collected, and it
becomes the target of data analysis.
Data Analysis Definition:
Data analysis is the process of studying data to find out how and why things
happened in the past. Usually, the result of data analysis is a final dataset,
i.e. a pattern or a detailed report that you can further use for Data Analytics.

Defining Data Analysis by Differentiating with Data Analytics


So what does Data Analytics mean? When you are done with data analysis, you
have all your results, reports, and data sets in hand. Now, what next? Next,
you take a step towards decision making, and that step is known as
“Data Analytics”.
In data analytics, you read the data sets or outcomes of the data analysis and
process them to predict events that are likely to occur in the future.
Example:
• Let’s say you own a business and sell dairy products.
• You buy products from the supplier and sell them to the customer.
• Let’s assume the biggest challenge for your business is to hold the right
amount of stock at any given time. You can’t overstock dairy products, as
they are perishable, and if they go bad you can’t sell them, resulting in
a direct loss for you. At the same time, you cannot understock, as it may
result in the loss of potential customers.
• Data analytics can help you predict the number of customers at a given
time.
• Using that result, you can stock your supplies sufficiently and, in turn,
minimize the loss.
• With data analysis, you can find out the time of the year when your store
has the least or the most customers. Using this information, you can stock
your supplies accordingly.
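
The last point can be sketched in a few lines of Python. This is only an illustration: the monthly customer counts and the helper name `busiest_and_quietest` are hypothetical and do not come from any real store.

```python
# Hypothetical monthly customer counts for one year (illustrative values only).
monthly_customers = {
    "Jan": 420, "Feb": 390, "Mar": 450, "Apr": 510,
    "May": 640, "Jun": 720, "Jul": 700, "Aug": 610,
    "Sep": 480, "Oct": 430, "Nov": 400, "Dec": 460,
}


def busiest_and_quietest(counts):
    """Return the (month, count) pairs with the most and the fewest customers."""
    busiest = max(counts.items(), key=lambda item: item[1])
    quietest = min(counts.items(), key=lambda item: item[1])
    return busiest, quietest


busiest, quietest = busiest_and_quietest(monthly_customers)
print("Busiest month:", busiest)
print("Quietest month:", quietest)
```

A store owner could stock more ahead of the busiest month and less ahead of the quietest one.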

Types of Data Analysis Methods


1. Descriptive Analysis
2. Diagnostic Analysis
3. Predictive Analysis
4. Prescriptive Analysis
5. Statistical Analysis

1. Descriptive Analysis
Descriptive Analysis looks at data and analyzes past events for insight into
how to approach future events. It examines past performance by mining
historical data to understand the causes of past success or failure. Almost
all management reporting, such as sales, marketing, operations, and finance,
uses this type of analysis.
Example: Take DMart. We can look at its sales history to find out which
products have sold the most or are in high demand by looking at sales trends.
Based on that analysis, we can decide to stock those items in larger
quantities for the coming year.
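
As a rough sketch of this kind of descriptive summary, the snippet below totals units sold per product and lists the top sellers. The sales records and the function name `units_by_product` are made up for illustration; they are not DMart data.

```python
from collections import Counter

# Hypothetical sales records as (product, units sold); values are illustrative.
sales = [
    ("rice", 120), ("soap", 45), ("rice", 80),
    ("oil", 60), ("soap", 30), ("oil", 95), ("rice", 40),
]


def units_by_product(records):
    """Add up the units sold for each product."""
    totals = Counter()
    for product, units in records:
        totals[product] += units
    return totals


totals = units_by_product(sales)
top_sellers = totals.most_common(2)  # the two products with the highest totals
print(top_sellers)
```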
2. Diagnostic Analysis
Diagnostic Analysis works hand in hand with Descriptive Analysis. Where
Descriptive Analysis finds out what happened in the past, Diagnostic Analysis
finds out why it happened, what measures were taken at that time, or how
frequently it has happened. It gives a detailed explanation of a particular
scenario by understanding behavior patterns.
Example: Take DMart again. Suppose we want to find out why a particular
product is in high demand: is it because of its brand, or because of its
quality? Diagnostic Analysis can help identify this.
3. Predictive Analysis
Predictive Analysis uses the information obtained from descriptive and
diagnostic analysis to predict future data. It finds out what is likely to
happen in the future. Predicting future data doesn’t mean we have become
fortune-tellers; by looking at past trends and behavioral patterns, we
forecast what might happen in the future.
Example: The best-known examples are the Amazon and Netflix recommender
systems. You might have noticed that whenever you buy a product from Amazon,
the payment page shows a recommendation saying that customers who purchased
this item also purchased another product. That recommendation is based on
past customer purchase behavior. By looking at past purchase behavior,
analysts create associations between products, and that is why a
recommendation appears when you buy a product.
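
The idea of associating products through past purchases can be illustrated with a simple co-purchase count. This is a toy sketch under stated assumptions, not how Amazon or Netflix actually build their recommenders; the orders and the helper names `co_purchase_counts` and `recommend` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical past orders; each set holds the items bought together.
orders = [
    {"phone", "case", "charger"},
    {"phone", "case"},
    {"laptop", "mouse"},
    {"phone", "charger"},
    {"laptop", "mouse", "bag"},
]


def co_purchase_counts(baskets):
    """Count how often each pair of items appears in the same order."""
    pairs = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            pairs[(a, b)] += 1
    return pairs


def recommend(item, pairs, top_n=2):
    """Return the items most often bought together with `item`."""
    scores = Counter()
    for (a, b), count in pairs.items():
        if a == item:
            scores[b] += count
        elif b == item:
            scores[a] += count
    return [other for other, _ in scores.most_common(top_n)]


pairs = co_purchase_counts(orders)
print(recommend("laptop", pairs))
```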
4. Prescriptive Analysis
This is an advanced form of Predictive Analysis. When you predict something,
you usually end up with many options, and it can be unclear which option will
actually work. Prescriptive Analysis helps find the best option. Where
Predictive Analysis forecasts future data, Prescriptive Analysis helps make
the forecast outcome happen. Prescriptive Analysis is the highest level of
analysis; it is used to choose the best solution by looking at descriptive,
diagnostic, and predictive data.
Example: A good example is Google’s self-driving car. By looking at past
trends and forecasted data, it identifies when to turn or when to slow down,
much as a human driver does.

5. Statistical Analysis
Statistical Analysis is an approach for analyzing data sets in order to
summarize their main characteristics, generally with the help of visual aids.
This approach can be used to gather knowledge about the following aspects of
the data:
1. Main characteristics or features of the data.
2. The variables and their relationships.
3. The important variables that can be used in our problem.

Data Analysis Process


• Data analysis can transform raw data into meaningful insights for your
business and your decision-making.
• While there are several different ways of collecting and interpreting this
data, most data-analysis processes follow these steps:
1. Specify Data Requirements
2. Collect Data
3. Clean and Process the Data
4. Analyse the Data
5. Interpretation
6. Report

1. Specify Data Requirements

The data analysis process starts by defining what you want to answer through
data. This typically stems from a business problem or question, such as:
• How can we reduce production costs without sacrificing quality?
• How do customers view our brand?
• How can we increase sales opportunities using our current resources?

2. Collect Data
• Find Your Source: Determine what information can be collected from
existing sources, and what you need to find elsewhere.
• Standardize Collection: Create a file storage and naming system ahead of
time.
• Keep Track: Keep data organized in a log with dates, and add any source
notes as you go.

Where is data collected?

Internal Sources          External Sources
Customer service data     Social media APIs
Marketing analytics       Google public data
Sales statistics          Public government data
Human resource data       Global finance data
                          Google trends
                          Official research statistics


3. Clean and Process the Data

Ensure your data is correct and usable by identifying and removing any errors
or corruption.
• Monitor Errors: Keep a record of errors and look for trends in where most
of them come from.
• Validate Accuracy: Research and invest in data tools that allow you to
clean your data in real time.
• Scrub for Duplicate Data: Identify and remove duplicates so you save
time during analysis.
• Delete all Formatting: Standardise the look of your data by removing any
formatting styles.
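
A minimal sketch of these cleaning steps, assuming records arrive as dictionaries of strings: it standardises formatting, logs rows with bad values, and drops duplicates. The field names, sample rows, and the function `clean_rows` are hypothetical.

```python
# Hypothetical raw records with inconsistent formatting, a duplicate, and an error.
raw_rows = [
    {"customer": "  Alice ", "city": "PUNE", "amount": "120"},
    {"customer": "alice", "city": "Pune", "amount": "120"},
    {"customer": "Bob", "city": "Delhi", "amount": "n/a"},
    {"customer": "Carol", "city": " mumbai", "amount": "75"},
]


def clean_rows(rows):
    """Return (cleaned rows, error log) after standardising and de-duplicating."""
    cleaned, errors, seen = [], [], set()
    for index, row in enumerate(rows):
        # Standardise formatting: trim whitespace and use one capitalisation style.
        customer = row["customer"].strip().title()
        city = row["city"].strip().title()
        try:
            amount = float(row["amount"])
        except ValueError:
            # Monitor errors: record where the bad value came from.
            errors.append((index, "amount is not a number"))
            continue
        key = (customer, city, amount)
        if key in seen:
            continue  # scrub duplicate records
        seen.add(key)
        cleaned.append({"customer": customer, "city": city, "amount": amount})
    return cleaned, errors


cleaned, errors = clean_rows(raw_rows)
print(cleaned)
print(errors)
```

Keeping the error log separate from the cleaned rows makes it possible to see over time where most errors come from.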
4. Analyse the Data
Different data analysis techniques allow you to understand, interpret, and
derive conclusions based on your business question or problem.

Descriptive Analysis
Analysis of data that helps show variables in a meaningful way and find
patterns.
• Measure of Tendency: The central position of a frequency distribution for a
group of data.
• Measure of Spread: Summarising a group of data by describing how spread out
the scores are.

Inferential Analysis
Exploring the relationship between multiple variables to make predictions.
• Correlation: Describes the relationship between two variables.
• Regression: Shows or predicts the relationship between two variables.
• Analysis of Variance: Tests the extent to which two or more groups differ.
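
Correlation and regression can be sketched with the standard formulas for the Pearson coefficient and a least-squares line. The advertising and sales figures below are invented for illustration, and `pearson_r` and `fit_line` are hypothetical helper names.

```python
from math import sqrt

# Hypothetical paired observations (illustrative values only).
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 44, 68, 83, 105]


def pearson_r(xs, ys):
    """Pearson correlation: strength of the linear relationship, from -1 to 1."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


r = pearson_r(ad_spend, sales)
slope, intercept = fit_line(ad_spend, sales)
predicted_sales = slope * 60 + intercept  # prediction for a new spend of 60
print(r, slope, intercept, predicted_sales)
```

Correlation only describes how closely the two variables move together; the fitted line is what allows a prediction for a new value.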
5. Interpretation
As you interpret the result of your data, ask yourself these key questions:
• Does the data answer your question? How?
• Does the data help you defend against any objections? How?
• Are there any limitations or angles you haven’t considered?
6. Report
The results of data analysis may be reported to different people:
• A primary collaborator or client
• Executive and business leaders
• A technical supervisor

• Keep it Succinct: Organize data in a way that makes it easy for different
audiences to skim through it and find the information most relevant to them.
• Make it Visual: Use data visualization techniques, such as tables and
charts, to communicate the message clearly.
• Include an Executive Summary: This lets readers see your findings upfront
and use your most important points to inform their decisions.

Top Data Analysis Tools


Data analysis tools make it easier for users to process and manipulate data,
analyze the relationships and correlations between data sets, and identify
patterns and trends for interpretation. Below is a list of some popular
tools, each explained briefly:
• SAS: SAS is a programming language developed by the SAS Institute for
performing advanced analytics, multivariate analyses, business intelligence,
data management, and predictive analytics. SAS was developed for very
specific uses, and new tools are not added to its already extensive
collection every day, which makes it less scalable for certain
applications.
• Microsoft Excel: It is a widely used spreadsheet application that is
useful for recording expenses, charting data, performing simple
manipulation and lookups, and generating pivot tables that summarize the
significant findings in large datasets. It is developed by Microsoft, and
Excel 2016 is one of its major releases.
• R: It is one of the leading programming languages for statistical
computing and graphics. It is a free and open-source language that runs on
various UNIX platforms, Windows, and macOS. It also has a command-line
interface that is easy to use. However, it can be tough to learn, especially
for people with no prior programming knowledge.
• Python: It is a powerful, high-level, general-purpose programming
language. Python supports both structured and functional programming
methods. Its extensive collection of libraries makes it very useful in data
analysis. Knowledge of TensorFlow, Theano, Keras, Matplotlib, and
Scikit-learn can get you a lot closer to your dream of becoming a machine
learning engineer.
• Tableau Public: Tableau Public is free software developed by Tableau
Software (now part of Salesforce) that allows users to connect to a
spreadsheet or file and create interactive data visualizations. It can also
be used to create maps and dashboards with real-time updates for easy
presentation on the web. The results can be shared through social media
sites or directly with the client, making it very convenient to use.
• RapidMiner: RapidMiner is an extremely versatile data science platform
developed by “RapidMiner Inc”. The software emphasizes lightning-fast
data science capabilities and provides an integrated environment for data
preparation and the application of machine learning, deep learning, text
mining, and predictive analytics techniques. It can also work with many
data source types, including Access, SQL, Excel, Teradata, Sybase,
Oracle, MySQL, and dBase.
• KNIME: KNIME, the Konstanz Information Miner, is free and open-source
data analytics software. It is also used as a reporting and integration
platform. It integrates various components for machine learning and data
mining through modular data pipelining. It is written in Java and developed
by KNIME.com AG. It runs on various operating systems, such as Linux, macOS,
and Windows.

Introduction of Statistics and its Types:

In everyday use, statistics simply means numerical data. As a discipline, it
is the field of mathematics that deals with the collection, tabulation, and
interpretation of numerical data. It is a form of mathematical analysis that
uses quantitative models to describe a set of experimental data or real-life
studies. It is an area of applied mathematics concerned with data collection,
analysis, interpretation, and presentation. Statistics deals with how data
can be used to solve complex problems. Some people consider statistics to be
a distinct mathematical science rather than a branch of mathematics.
Statistics makes work simpler and provides a clear picture of the work you do
on a regular basis.
Basic terminology of Statistics:
• Population –
It is the collection of individuals, objects, or events whose properties are
to be analyzed.
• Sample –
It is a subset of the population.
Types of Statistics :

1. Descriptive Statistics:
Descriptive statistics describes a population through numerical calculations,
graphs, or tables. It provides a summary of the data and is used simply to
summarize what was observed. It has two categories, as follows.
(a) Measure of central tendency –
A measure of central tendency is a summary statistic that represents the
center point or a typical value of a data set or sample. There are three
common measures of central tendency:
(i) Mean:
It is the average of all values in a sample set.
For example, the mean of 2, 4, and 9 is (2 + 4 + 9) ÷ 3 = 5.

(ii) Median:
It is the central value of a sample set. The data set is ordered from lowest
to highest value, and the exact middle value is taken.
For example, the median of 1, 3, 7, 8, 10 is 7.

(iii) Mode:
It is the value that occurs most frequently in a sample set.
For example, the mode of 2, 4, 4, 5, 4, 9 is 4.
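
The three measures of central tendency can also be computed with Python's standard `statistics` module. The list `scores` is a made-up sample used only for illustration.

```python
from statistics import mean, median, mode

# Hypothetical sample values (illustrative only).
scores = [2, 4, 4, 5, 7, 8, 12]

summary = {
    "mean": mean(scores),      # sum of the values divided by how many there are
    "median": median(scores),  # middle value once the data is sorted
    "mode": mode(scores),      # value that occurs most often
}
print(summary)
```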

(b) Measure of Variability –

A measure of variability, also known as a measure of dispersion, describes
the variability in a sample or population. There are three common measures of
variability:
(i) Range:
It measures how far apart the values in a sample set or data set are spread.
Range = Maximum value - Minimum value
(ii) Variance:
It describes how much a random variable differs from its expected value. It
is computed as the average of the squared deviations from the mean.
$$S^2 = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{n}$$
In this formula, $n$ represents the total number of data points, $\bar{x}$
the mean of the data points, and $x_i$ an individual data point.
(iii) Standard Deviation:
It measures the dispersion of a set of data from its mean and is the square
root of the variance.
$$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2}$$
Here $\mu$ denotes the mean of the data points.
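
The measures of variability can be checked numerically. The sketch below implements the formulas as written here, dividing by n, and compares them with `pvariance` and `pstdev` from Python's `statistics` module; the sample `data` and the helper names are hypothetical. When the data is a sample rather than a whole population, it is common to divide by n - 1 instead, which is what `statistics.variance` and `statistics.stdev` do.

```python
from math import sqrt
from statistics import pstdev, pvariance

# Hypothetical data set (illustrative values only).
data = [2, 4, 4, 4, 5, 5, 7, 9]


def value_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)


def population_variance(values):
    """Average of the squared deviations from the mean (divides by n)."""
    n = len(values)
    mean_value = sum(values) / n
    return sum((x - mean_value) ** 2 for x in values) / n


def population_std(values):
    """Standard deviation: the square root of the variance."""
    return sqrt(population_variance(values))


print(value_range(data), population_variance(data), population_std(data))
print(pvariance(data), pstdev(data))  # library equivalents of the two formulas
```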
2. Inferential Statistics:
Inferential statistics makes inferences and predictions about a population
based on a sample of data taken from that population. It generalizes from the
sample to the larger population and applies probabilities to draw
conclusions. It is used to explain the meaning of descriptive statistics: to
analyze and interpret results and draw conclusions. Inferential statistics is
mainly associated with hypothesis testing, in which the aim is to decide
whether the null hypothesis can be rejected.
Hypothesis testing is a type of inferential procedure that uses sample data
to evaluate the credibility of a hypothesis about a population.
Inferential statistics is generally used to determine how strong a
relationship is within a sample. In practice, however, it is often very
difficult to obtain a complete population list and draw a random sample from
it.
Inferential statistics can be carried out with the following steps:
1. Start with a theory.
2. Generate a research hypothesis.
3. Operationalize the variables.
4. Identify the population to which the study applies.
5. Form a null hypothesis for this population.
6. Collect a sample from the population and run the study.
7. Perform statistical tests to check whether the characteristics observed in
the sample are sufficiently different from what would be expected under the
null hypothesis to reject it.
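
The final step can be illustrated with a one-sample t-test, sketched here with the standard library only. Everything in it is an assumption made for the example: the sample values, the null-hypothesis mean of 50, and the helper `one_sample_t`. The critical value 2.262 is the usual t-table entry for a two-sided test at the 5% level with 9 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of 10 observations (illustrative values only).
sample = [52, 55, 49, 58, 60, 53, 57, 51, 56, 59]
NULL_MEAN = 50      # assumed null hypothesis: the population mean is 50
CRITICAL_T = 2.262  # two-sided 5% critical value for 9 degrees of freedom


def one_sample_t(values, null_mean):
    """t = (sample mean - null mean) / (sample std / sqrt(n))."""
    n = len(values)
    standard_error = stdev(values) / sqrt(n)  # stdev divides by n - 1
    return (mean(values) - null_mean) / standard_error


t_stat = one_sample_t(sample, NULL_MEAN)
reject_null = abs(t_stat) > CRITICAL_T
print(t_stat, reject_null)
```

If the absolute t statistic exceeds the critical value, the sample is considered sufficiently different from what the null hypothesis predicts, and the null hypothesis is rejected at that significance level.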
Types of inferential statistics –
Various types of inferential statistics are widely used and are relatively
easy to interpret. These are given below:
• One sample test of difference/One sample hypothesis test
• Confidence Interval
• Contingency Tables and Chi-Square Statistic
• T-test or ANOVA
• Pearson Correlation
• Bi-variate Regression
• Multi-variate Regression
