Statistical Data Analysis
(DA204C)
Descriptive analytics
► Descriptive analysis is data simplification, where past data is collected, organised and then
presented in a way that is easily understood.
► A good descriptive analysis is meant to answer relevant research questions. However, unlike other
methods of analysis, it is not used to draw inferences or predictions from its findings.
► Although it is the simplest form of data analytics, it can stand on its own as a research product.
► Descriptive analysis uses data aggregation and data mining as two key methods for analytics.
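As an illustration of data aggregation, here is a minimal sketch that groups past records and summarises each group. The records, the field layout and the function name `aggregate` are hypothetical and chosen only for the example.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical past records (illustrative values only): (region, sales amount)
records = [
    ("north", 120.0),
    ("south", 80.0),
    ("north", 100.0),
    ("south", 90.0),
    ("south", 70.0),
]

def aggregate(rows):
    """Group values by key and compute simple summary statistics per group."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {
        key: {
            "count": len(values),
            "mean": mean(values),
            "median": median(values),
            "min": min(values),
            "max": max(values),
        }
        for key, values in groups.items()
    }

summary = aggregate(records)
for region, stats in summary.items():
    print(region, stats)
```

The summary only describes the collected data; nothing here infers or predicts values outside it.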
Predictive analytics
– Predictive analytics refers to the use of statistical measures and modeling techniques to make
predictions about future events.
– Predictive analytic models help make predictions in a variety of domains, such as
weather forecasting and marketing trend analysis.
– Predictive models use techniques such as regression methods, decision trees, and neural
networks.
– In a decision tree, each feature of the dataset acts as a candidate split, and the cost of
splitting on it is computed.
– The candidate feature with the lowest splitting cost is chosen by a greedy algorithm, that
is, only the immediate split is considered.
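The greedy choice among candidate features can be sketched as follows. This is a minimal illustration, not a full tree builder: the function name `greedy_split`, the feature names and the cost values are all hypothetical, and the cost of each split is assumed to be supplied by some cost function.

```python
from typing import Callable, Iterable, Tuple

def greedy_split(
    features: Iterable[str],
    split_cost: Callable[[str], float],
) -> Tuple[str, float]:
    """Return the (feature, cost) pair with the lowest splitting cost.

    Greedy: only the cost of the immediate split is compared;
    the effect on later splits is not considered.
    """
    return min(((f, split_cost(f)) for f in features), key=lambda pair: pair[1])

# Hypothetical splitting costs for three made-up features (illustrative only).
example_costs = {"age": 0.42, "income": 0.17, "region": 0.31}

best_feature, best_cost = greedy_split(example_costs, lambda f: example_costs[f])
print(best_feature, best_cost)
```

When a tree is grown, the same selection would be repeated on each resulting branch.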
Predictive analytics tools – Decision trees (Contd.)
– Minimizing the cost function in decision trees is a way of finding the most homogeneous branches.
– For regression tasks, the cost function is usually the mean squared difference between the
predicted values (ŷ) and the actual values (y).
\[ L = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 \]
This cost is calculated for all candidate splits, and the candidate with minimum cost is
chosen.
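A minimal sketch of this cost for a single numeric feature is given below. It assumes that the prediction ŷ for each sample is the mean of y in the branch the sample falls into (a common choice for regression trees, but an assumption here). The function names and the toy data are hypothetical.

```python
def mse_after_split(xs, ys, threshold):
    """L = (1/N) * sum((y_i - y_hat_i)**2) for the split x <= threshold.

    Assumption: y_hat_i is the mean of y in the branch containing sample i.
    """
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    total = 0.0
    for branch in (left, right):
        if branch:
            y_hat = sum(branch) / len(branch)
            total += sum((y - y_hat) ** 2 for y in branch)
    return total / len(ys)

def best_threshold(xs, ys):
    """Evaluate every candidate threshold and return the one with minimum cost."""
    # Drop the largest value: splitting there would leave the right branch empty.
    candidates = sorted(set(xs))[:-1]
    if not candidates:
        return None  # no valid split when all x values are equal
    return min(candidates, key=lambda t: mse_after_split(xs, ys, t))

# Toy data (illustrative only): two clearly separated groups.
xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]

threshold = best_threshold(xs, ys)
cost = mse_after_split(xs, ys, threshold)
print(threshold, cost)
```

On this toy data the selected threshold separates the low-valued group from the high-valued one, which is the most homogeneous pair of branches available.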
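– For classification, impurity measures such as entropy (cross-entropy) or a Gini-type index
are used to determine the cost. With K classes and p_k the proportion of samples in class k,
the Gini-type cost G is: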
\[ G = \frac{1}{K} \sum_{k=1}^{K} p_k (1 - p_k) \]
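The cost G, the average over the K classes of p_k(1 − p_k), can be sketched as follows. The function name `gini_cost` and the toy labels are hypothetical. Note that the usual Gini index is defined without the 1/K factor; the sketch keeps the factor to follow the formula as given here.

```python
from collections import Counter

def gini_cost(labels, num_classes):
    """G = (1/K) * sum_k p_k * (1 - p_k), with K = num_classes.

    p_k is the proportion of samples in `labels` that belong to class k.
    Classes absent from `labels` have p_k = 0 and contribute nothing.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    total = sum((c / n) * (1 - c / n) for c in counts.values())
    return total / num_classes

# Toy branches (illustrative only) for a two-class problem.
mixed = gini_cost(["a", "a", "b", "b"], num_classes=2)  # evenly mixed branch
pure = gini_cost(["a", "a", "a", "a"], num_classes=2)   # homogeneous branch
print(mixed, pure)
```

A homogeneous branch gives a cost of zero, so minimizing G favours splits that produce purer branches.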