
What are the techniques of statistical data mining?


The main techniques of statistical data mining are as follows −

Regression − These methods are used to predict the value of a response (dependent) variable from one or more predictor (independent) variables, where the variables are numeric. There are several forms of regression, including linear, multiple, weighted, polynomial, nonparametric, and robust regression (robust methods are useful when the errors fail to satisfy normality conditions or when the data contain significant outliers).
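
As a minimal sketch of the simplest case, the following fits a straight line to one numeric predictor by ordinary least squares. The function name `fit_line` and the sample data are illustrative assumptions, not part of any particular library.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

The fitted line can then be used to predict the response for a new predictor value as `intercept + slope * x`.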

Generalized linear models − These models, and their generalization (generalized additive models), allow a categorical response variable (or some transformation of it) to be related to a set of predictor variables in a manner similar to the modeling of a numeric response variable using linear regression. Generalized linear models include logistic regression and Poisson regression.
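
To illustrate one generalized linear model, the sketch below fits a logistic regression with a single predictor by plain gradient ascent on the log-likelihood. The helper names, learning rate, step count, and data are assumptions chosen for the example; statistical packages normally use a faster fitting procedure.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by gradient ascent."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # observed minus predicted
            g0 += err
            g1 += err * x
        # Step along the average gradient of the log-likelihood.
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Illustrative binary response that tends to be 1 for larger x.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)
p_low = sigmoid(b0 + b1 * 0)
p_high = sigmoid(b0 + b1 * 7)
print(b0, b1, p_low, p_high)
```

The linear part `b0 + b1 * x` plays the same role as in linear regression; the logistic transformation maps it to a probability for the categorical response.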

Analysis of variance − These methods analyze experimental data for two or more populations described by a numeric response variable and one or more categorical variables (factors). In general, a single-factor ANOVA (analysis of variance) problem involves a comparison of k population or treatment means to determine whether at least two of the means are different.
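
The comparison of k treatment means can be sketched as the computation of the single-factor F statistic. The function name `one_way_anova_f` and the three sample groups are illustrative assumptions; the sketch stops at the F statistic and does not compute a p-value.

```python
def one_way_anova_f(groups):
    """Return the F statistic for a single-factor ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Three illustrative treatment groups (k = 3).
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(groups)
print(f_stat)  # 13.0 up to floating-point rounding
```

A large F statistic, judged against the F distribution with k - 1 and N - k degrees of freedom, suggests that at least two of the means differ.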

Mixed-effect models − These models are for analyzing grouped data, that is, data that can be classified according to one or more grouping variables. They typically describe relationships between a response variable and some covariates in data grouped according to one or more factors. Typical areas of application include multilevel data, repeated measures data, block designs, and longitudinal data.

Factor analysis − This method is used to determine which variables combine to form a given factor. For instance, for many psychiatric data sets, it is not possible to measure a specific factor of interest directly (such as intelligence); however, it is often possible to measure other quantities (such as student test scores) that reflect the factor of interest. Here, none of the variables is designated as dependent.

Discriminant analysis − This method is used to predict a categorical response variable. Unlike generalized linear models, it assumes that the independent variables follow a multivariate normal distribution.

The procedure tries to determine several discriminant functions (linear combinations of the independent variables) that discriminate among the groups defined by the response variable. Discriminant analysis is commonly used in the social sciences.
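
A single discriminant function for two groups can be sketched as follows. This is a minimal two-feature, two-class linear discriminant that assumes equal prior probabilities and a shared (pooled) covariance matrix; the function names and the sample points are illustrative assumptions.

```python
def mean_vec(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points, m):
    """Sums of squares and cross-products around the mean m."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy


def fit_lda(class0, class1):
    """Return weights w and a threshold for the rule w . x > threshold."""
    m0, m1 = mean_vec(class0), mean_vec(class1)
    a0, b0, c0 = scatter(class0, m0)
    a1, b1, c1 = scatter(class1, m1)
    dof = len(class0) + len(class1) - 2
    # Pooled covariance matrix [[a, b], [b, c]].
    a, b, c = (a0 + a1) / dof, (b0 + b1) / dof, (c0 + c1) / dof
    det = a * c - b * b
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # w = inverse(pooled covariance) * (m1 - m0), using the 2x2 inverse.
    w = ((c * dx - b * dy) / det, (a * dy - b * dx) / det)
    mid = ((m0[0] + m1[0]) / 2, (m0[1] + m1[1]) / 2)
    threshold = w[0] * mid[0] + w[1] * mid[1]
    return w, threshold


def classify(x, w, threshold):
    return 1 if w[0] * x[0] + w[1] * x[1] > threshold else 0


# Illustrative, well-separated groups.
class0 = [(1, 2), (2, 1), (2, 3), (3, 2)]
class1 = [(6, 5), (7, 7), (8, 6), (7, 6)]
w, threshold = fit_lda(class0, class1)
labels0 = [classify(p, w, threshold) for p in class0]
labels1 = [classify(p, w, threshold) for p in class1]
print(w, threshold, labels0, labels1)
```

The weights `w` define the linear combination of the independent variables; a new observation is assigned to a group according to which side of the threshold its score falls on.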

Time series analysis − There are many statistical techniques for analyzing time-series data, including autoregression methods, univariate ARIMA (autoregressive integrated moving average) modeling, and long-memory time-series modeling.
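
As a minimal sketch of an autoregression method, the following simulates a first-order autoregressive series, x_t = phi * x_(t-1) + noise, and recovers phi by least squares. The function names, the true coefficient 0.7, and the seed are assumptions made for the example; the model has no intercept, so it assumes a zero-mean series.

```python
import random


def simulate_ar1(phi, n, seed=0):
    """Generate n points of a zero-mean AR(1) series with unit-variance noise."""
    rng = random.Random(seed)
    series = [0.0]
    for _ in range(n - 1):
        series.append(phi * series[-1] + rng.gauss(0.0, 1.0))
    return series


def fit_ar1(series):
    """Least-squares estimate of phi from regressing x_t on x_(t-1)."""
    num = sum(cur * prev for prev, cur in zip(series, series[1:]))
    den = sum(prev * prev for prev in series[:-1])
    return num / den


series = simulate_ar1(phi=0.7, n=2000)
phi_hat = fit_ar1(series)
print(phi_hat)  # should be close to the true value 0.7
```

ARIMA modeling extends this idea with differencing and moving-average terms.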

Survival analysis − Several well-established statistical methods exist for survival analysis. These methods were originally designed to predict the probability that a patient undergoing a medical treatment will survive at least to time t.
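
One widely used estimator of this survival probability is the Kaplan-Meier (product-limit) estimator, named here as an example because the text above does not single out a method. The sketch below is a minimal version; the function name and the follow-up data are illustrative assumptions. An `observed` flag of 0 marks a censored patient, one who left the study before the event occurred.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct event time t."""
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Events (not censorings) occurring exactly at time t.
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


# Illustrative follow-up times; 1 = event observed, 0 = censored.
durations = [2, 3, 3, 5, 8]
observed = [1, 1, 0, 1, 0]
curve = kaplan_meier(durations, observed)
print(curve)  # roughly [(2, 0.8), (3, 0.6), (5, 0.3)]
```

Each entry gives the estimated probability of surviving beyond the listed time, taking censored patients into account only while they remain at risk.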

Quality control − Various statistics can be used to prepare charts for quality control, such as Shewhart charts and CUSUM charts (both of which display group summary statistics). These statistics include the mean, standard deviation, range, count, moving average, moving standard deviation, and moving range.
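
As a minimal sketch of a Shewhart chart, the following computes the center line and control limits of an X-bar chart from subgroup means and ranges. It assumes subgroups of size 5, for which the standard tabulated constant A2 is 0.577; the function name and measurements are illustrative.

```python
A2_N5 = 0.577  # standard X-bar chart constant for subgroups of size 5


def xbar_limits(subgroups, a2=A2_N5):
    """Return (center, lower limit, upper limit, subgroup means)."""
    means = [sum(g) / len(g) for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    center = sum(means) / len(means)       # grand mean
    mean_range = sum(ranges) / len(ranges)  # average subgroup range
    return center, center - a2 * mean_range, center + a2 * mean_range, means


# Illustrative measurements: four subgroups of five items each.
subgroups = [
    [10.1, 9.9, 10.0, 10.2, 9.8],
    [10.0, 10.1, 9.9, 10.0, 10.0],
    [9.8, 10.2, 10.1, 9.9, 10.0],
    [10.3, 10.0, 9.9, 10.1, 10.2],
]
center, lcl, ucl, means = xbar_limits(subgroups)
out_of_control = [m for m in means if m < lcl or m > ucl]
print(center, lcl, ucl, out_of_control)
```

A subgroup mean falling outside the limits signals that the process may have shifted and should be investigated.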