Statistical Data Analysis

The document outlines two primary types of data analytics: descriptive and predictive. Descriptive analytics simplifies past data for easy understanding without making predictions, while predictive analytics uses statistical measures to forecast future events and improve performance. It also discusses decision trees as a tool for predictive analytics, detailing their structure and methods for minimizing cost functions in classification and regression tasks.

Uploaded by

gs559225
Statistical Data Analysis

(DA204C)

Descriptive analytics
► Descriptive analysis is data simplification, where past data is collected, organised and then
presented in a way that is easily understood.

► A good descriptive analysis is meant to answer relevant research questions. However, unlike other
methods of analysis, it is not used to draw inferences or predictions from its findings.

► As the simplest form of data analytics, it can stand on its own as a research product.

► Descriptive analysis uses data aggregation and data mining as two key methods for analytics.
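A minimal sketch of data aggregation for descriptive analysis, using only the Python standard library. The records, the field meanings (region, monthly sales) and the function name `summarise` are hypothetical and chosen purely for illustration; the sketch groups past observations and reports summary statistics without drawing any inference from them.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical past records: (region, monthly_sales). Invented for illustration.
records = [
    ("north", 120.0),
    ("north", 80.0),
    ("south", 200.0),
    ("south", 100.0),
    ("south", 150.0),
]


def summarise(rows):
    """Group values by key and report count, mean, median, min and max."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {
        key: {
            "count": len(values),
            "mean": mean(values),
            "median": median(values),
            "min": min(values),
            "max": max(values),
        }
        for key, values in groups.items()
    }


summary = summarise(records)
for region, stats in sorted(summary.items()):
    print(region, stats)
```

The output only describes what was observed in each group; nothing in it predicts future values.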

Table: Descriptive data analysis – https://colab.research.google.com/drive/1ImOzFu11jATJOs1Jt0Vro0qWOYGbb__l?usp=sharing
Predictive analytics

– The term refers to the use of statistical measures and modeling techniques to make
predictions about future events.

– It is a causal analysis tool to improve performance and minimize risks.

– Predictive analytic models help to make predictions in a variety of domains, such as
weather forecasting or marketing trend analysis.

– Predictive models use regression methods, decision trees, and neural networks for
analytical purposes.

– Following predictive analysis, prescriptive analysis uses the knowledge gained in the
previous two steps to determine the future course of action.

– Prescriptive analytics anticipates what, when and, importantly, why something
might happen.
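As one illustration of the regression methods mentioned above, here is a minimal sketch that fits a straight line to past observations by ordinary least squares and extrapolates one step ahead. The data, the variable names and the helper `fit_line` are hypothetical; a real forecasting task would need validation and usually a richer model.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: period index and the value observed in that period.
periods = [1, 2, 3, 4]
values = [10.0, 12.0, 14.0, 16.0]

slope, intercept = fit_line(periods, values)

# Predict the value for the next, not yet observed, period.
forecast = slope * 5 + intercept
print(slope, intercept, forecast)
```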
Predictive analytics tools – Decision trees
Decision trees are used to visually and explicitly represent
decisions and decision making.

• It is a common analytic tool for classification and regression tasks.

• The trees are drawn upside-down, with the root node at the top.

• Decision tree algorithms are often referred to as CART
(classification and regression trees).

– Decisions are generally learnt on the basis of recursive binary splitting.

– Each feature of the dataset acts as a candidate to determine the cost of splitting.

– The candidate feature with the least cost of splitting is chosen, following a greedy
algorithm.
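The recursive binary splitting described above can be sketched as follows. This is a simplified illustration, not a full CART implementation: it assumes numeric features, uses the size-weighted mean squared error of the two branches as the splitting cost, and stops at a hypothetical `max_depth`. The function names and the toy data are assumptions made for the example.

```python
def mse(values):
    """Mean squared deviation of values from their own mean (0.0 if empty)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def best_split(rows, targets):
    """Greedy step: try every feature and threshold, keep the cheapest split.

    Returns (cost, feature_index, threshold), or None if nothing separates the data.
    """
    best = None
    n = len(rows)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [t for row, t in zip(rows, targets) if row[feature] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[feature] > threshold]
            if not left or not right:
                continue
            # Size-weighted cost of the two candidate branches.
            cost = (len(left) * mse(left) + len(right) * mse(right)) / n
            if best is None or cost < best[0]:
                best = (cost, feature, threshold)
    return best


def build_tree(rows, targets, depth=0, max_depth=2):
    """Recursive step: split until the node is pure, too deep, or unsplittable."""
    if depth >= max_depth or mse(targets) == 0.0:
        return {"predict": sum(targets) / len(targets)}
    split = best_split(rows, targets)
    if split is None:
        return {"predict": sum(targets) / len(targets)}
    _, feature, threshold = split
    left = [(r, t) for r, t in zip(rows, targets) if r[feature] <= threshold]
    right = [(r, t) for r, t in zip(rows, targets) if r[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [t for _, t in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [t for _, t in right], depth + 1, max_depth),
    }


# Hypothetical toy data: two numeric features per row, one numeric target.
rows = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0]]
targets = [10.0, 10.0, 20.0, 20.0]

split = best_split(rows, targets)
tree = build_tree(rows, targets)
print(split)
print(tree)
```

The choice is greedy because each node picks the locally cheapest split without checking whether a costlier split now would give a better tree further down.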
Predictive analytics tools – Decision trees (Contd.)
– Minimizing the cost function in decision trees is a process of finding the most homogeneous branches.

– For regression tasks, the cost function is usually the mean of the squared distance between
the predicted values (ŷ) and the actual values (y).

\[
L = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
\]

This cost is calculated for all candidate splits, and the candidate with minimum cost is
chosen.

– For classification, entropy or cross–entropy functions are used for determining cost.

– The Gini index is also a good measure. It is given by

\[
G = \frac{1}{K} \sum_{k=1}^{K} p_k (1 - p_k)
\]

where p_k is the proportion of the same class in the k-th group.
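A minimal sketch of the two classification costs named above, computed for a single group of labels. It assumes that K is the number of distinct classes present in the group and that p_k is the share of the group belonging to class k; both readings are assumptions, as are the function names. With `average=True` the Gini value is divided by K as in the formula above; `average=False` gives the sum of p_k (1 − p_k) without that factor, the form commonly used in CART.

```python
from collections import Counter
from math import log2


def class_proportions(labels):
    """Share of each distinct class among the labels of one group."""
    counts = Counter(labels)
    total = len(labels)
    return [count / total for count in counts.values()]


def gini(labels, average=True):
    """Sum of p_k * (1 - p_k) over the classes present in the group.

    average=True divides by the number of classes K (assumed reading of the formula).
    """
    props = class_proportions(labels)
    total = sum(p * (1 - p) for p in props)
    return total / len(props) if average else total


def entropy(labels):
    """Shannon entropy in bits: sum of p_k * log2(1 / p_k)."""
    return sum(p * log2(1 / p) for p in class_proportions(labels))


mixed = ["a", "a", "b", "b"]   # hypothetical group, evenly mixed
pure = ["a", "a", "a", "a"]    # hypothetical group, one class only

print(gini(mixed), gini(mixed, average=False), entropy(mixed))
print(gini(pure), entropy(pure))
```

Both measures are zero for a pure group and grow as the classes become more evenly mixed, which is why minimizing them favours homogeneous branches.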
