JSS SCIENCE AND TECHNOLOGY UNIVERSITY

MYSURU-570006
Department of Information Science and Engineering

Advanced Data Mining Techniques Assignment on:


Statistical Data Mining

Submitted by:
Anirudh D (01JST19PSE001)
Statistical Data Mining

Data mining is the process of extracting information from huge sets of data. In other words,
it is the process of mining knowledge from data.

Because data mining has such a broad scope, a large variety of data mining methodologies exist.
These include statistical data mining, foundations of data mining, and visual and audio data
mining.

Statistical data mining lies at the intersection of statistics and computer science. Statistics
is a component of data mining that provides the tools and analytical techniques for dealing
with large amounts of data. It is the science of learning from data and covers everything from
collecting and organizing data to analysing and presenting it. Statistics focuses on
probabilistic models, specifically on inference from data. Data mining itself is the process of
finding patterns in large data sets using methods from artificial intelligence, machine
learning, and database systems. The overall goal of data mining is to extract information from
a data set and transform it into an understandable structure for further use. Besides the
analysis step, it involves database and data management aspects, data pre-processing, and
modelling.

Statistical data mining techniques are designed for the efficient handling of huge amounts of
data that are typically multidimensional and possibly of various complex types. Major
statistical methods for data analysis include:

• Regression
• Generalized linear models
• Analysis of variance
• Mixed-effect models
• Factor analysis
• Discriminant analysis
• Survival analysis
• Quality control

1. Regression: Regression is a data mining technique used to fit an equation to a dataset.
Regression analysis is a statistical method for modelling the relationship between a
dependent (target) variable and one or more independent (predictor) variables. More
specifically, regression analysis helps us understand how the value of the dependent
variable changes when an independent variable changes. It predicts continuous/real
values such as temperature, age, salary, price, etc.
2. Generalized linear models: These models, and their generalizations, allow a categorical
response variable (or a transformation of it, such as its log-odds) to be related to a set of
predictor variables in a manner similar to the modelling of a numeric response variable
using linear regression. Generalized linear models include logistic regression (for binary
responses) and Poisson regression (for count responses).
3. Analysis of variance: Analysis of variance (ANOVA) is an analysis tool used in
statistics that splits the observed aggregate variability found inside a data set into two
parts: systematic factors and random factors. The systematic factors have a statistical
influence on the given data set, while the random factors do not. Analysts use the
ANOVA test to determine the influence that independent variables have on the
dependent variable in a regression study. A common use is testing whether the means of
several groups differ significantly.
4. Mixed-effect models: Mixed-effect models are an extension of simple linear models
to allow both fixed and random effects, and are particularly used when there is non-
independence in the data, such as arises from a hierarchical structure. They describe
relationships between a response variable and some covariates in data grouped
according to one or more factors. Common areas of application include multilevel data,
repeated measures data, block designs, and longitudinal data.
5. Factor analysis: Factor analysis is an exploratory data analysis method used to search for
influential underlying factors among a set of observed variables. It aids data
interpretation by reducing the number of variables. It extracts the maximum common
variance from all variables and puts it into a common score. Factor analysis is
widely used in market research, advertising, psychology, finance, and operations
research.
6. Discriminant analysis: Discriminant analysis is a statistical method that helps to
understand the relationship between a dependent variable and one or more independent
variables. The dependent variable is the variable that one tries to explain or predict from
the values of the independent variables; in discriminant analysis it is categorical (a group
or class label). Discriminant analysis is similar to regression analysis and analysis of
variance (ANOVA). It is commonly used in the social sciences.
7. Survival analysis: Survival analysis is one of the primary statistical methods for
analysing data on the time until an event such as death, heart attack, or device failure.
Such analysis is essential for many facets of legal proceedings, including apportioning the
cost of future medical care, estimating years of life lost, evaluating product reliability,
and assessing drug safety. Common methods include Kaplan-Meier estimates of survival and
Cox proportional hazards regression models.
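8. Quality control: Quality control is a set of methods used by organizations to achieve
quality parameters or quality goals and to continually improve the organization's ability
to ensure that a product (for example, a software product) meets those goals. It confirms
that the standards are followed while working on the product. In statistical quality
control, tools such as control charts are used to detect when a process drifts outside its
expected range of variation.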
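The sketch below illustrates the regression method from item 1. It is a minimal example in plain Python with no libraries assumed. It fits a simple linear regression y = b0 + b1 * x by ordinary least squares, using the closed-form slope b1 = Sxy / Sxx and intercept b0 = mean(y) - b1 * mean(x). The function names and the experience/salary figures are hypothetical and chosen only for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxy: sum of cross-deviations; Sxx: sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predicted (continuous) value of the dependent variable at x."""
    return intercept + slope * x


# Hypothetical data: years of experience vs. salary (in thousands).
years = [1, 2, 3, 4, 5]
salary = [32, 37, 41, 47, 53]
slope, intercept = fit_simple_linear_regression(years, salary)
salary_at_6_years = predict(slope, intercept, 6)
```

Python 3.10 and later also provide `statistics.linear_regression(x, y)`, which returns the same slope and intercept for this one-predictor case.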
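Logistic regression is one of the generalized linear models named in item 2. It relates a binary response to a predictor through the logit link: P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x))). The sketch below estimates b0 and b1 by plain gradient ascent on the log-likelihood. Statistical packages usually fit GLMs by iteratively reweighted least squares instead, so treat this as an illustration of the model rather than a production fitting routine. The hours-studied/pass data, the learning rate, and the iteration count are all assumptions.

```python
import math


def sigmoid(z):
    """Logistic function, the inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(xs, ys, learning_rate=0.5, iterations=5000):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        # Gradient of the average log-likelihood: residuals (y - p) times each input.
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            residual = y - sigmoid(b0 + b1 * x)
            g0 += residual
            g1 += residual * x
        b0 += learning_rate * g0 / n
        b1 += learning_rate * g1 / n
    return b0, b1


# Hypothetical data: hours studied (x) and whether the student passed (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic_regression(hours, passed)
p_low = sigmoid(b0 + b1 * 1.0)   # estimated pass probability after 1 hour
p_high = sigmoid(b0 + b1 * 4.5)  # estimated pass probability after 4.5 hours
```

The classes in this toy data overlap on purpose. When the two classes are perfectly separable, the maximum-likelihood slope is infinite and the loop would keep growing b1.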
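The sketch below makes the split into systematic and random variability from item 3 concrete. It computes the one-way ANOVA F statistic: the between-group mean square (the systematic part) divided by the within-group mean square (the random part). The three groups of measurements are hypothetical. A large F suggests that the group means differ. Converting F to a p-value needs the F distribution, which the Python standard library does not provide; SciPy's `scipy.stats.f_oneway` returns both the statistic and the p-value.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group (systematic) sum of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group (random) sum of squares.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements under three treatments.
groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]
f_stat, df_b, df_w = one_way_anova(groups)
```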
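General mixed-effect models (item 4) are usually fitted by maximum likelihood or REML in a dedicated package, for example `MixedLM` in statsmodels. The sketch below covers a much smaller case. It assumes a balanced design with only a fixed overall mean and a random intercept per group, y_ij = mu + b_i + e_ij. For that model, the classical ANOVA (method-of-moments) estimator recovers the two variance components. The grouping (repeated measurements on three hypothetical subjects) is made up for illustration.

```python
def random_intercept_variance_components(groups):
    """ANOVA (method-of-moments) estimates for y_ij = mu + b_i + e_ij with equal group sizes.

    Returns (mu_hat, between_group_variance, within_group_variance).
    """
    k = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this simple estimator assumes a balanced design")
    means = [sum(g) / n for g in groups]
    mu_hat = sum(means) / k  # estimate of the fixed effect (overall mean)
    msb = n * sum((m - mu_hat) ** 2 for m in means) / (k - 1)
    msw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (k * (n - 1))
    # E[MSB] = sigma_e^2 + n * sigma_b^2 and E[MSW] = sigma_e^2; solve for sigma_b^2,
    # truncating at zero because a variance cannot be negative.
    sigma_b2 = max(0.0, (msb - msw) / n)
    return mu_hat, sigma_b2, msw


# Hypothetical data: three repeated measurements on each of three subjects.
subjects = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]
mu_hat, between_var, within_var = random_intercept_variance_components(subjects)
# Intra-class correlation: share of total variance due to differences between subjects.
icc = between_var / (between_var + within_var)
```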
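The sketch below applies discriminant analysis (item 6) as linear discriminant analysis with a single predictor. Each class is modelled as normal with its own mean mu_k and a shared (pooled) variance s^2, and has prior pi_k. An observation x is assigned to the class with the largest score x * mu_k / s^2 - mu_k^2 / (2 * s^2) + log(pi_k). The class labels and values are hypothetical. Real analyses normally use several predictors and a library implementation.

```python
import math


def fit_lda_1d(samples_by_class):
    """Fit 1-D linear discriminant analysis with a pooled (shared) variance.

    samples_by_class maps class label -> list of observed values.
    Returns a classify(x) function that picks the class with the highest score.
    """
    n_total = sum(len(v) for v in samples_by_class.values())
    k = len(samples_by_class)
    means = {c: sum(v) / len(v) for c, v in samples_by_class.items()}
    priors = {c: len(v) / n_total for c, v in samples_by_class.items()}
    # Pooled within-class variance, shared by all classes (the LDA assumption).
    pooled_var = sum(
        (x - means[c]) ** 2 for c, v in samples_by_class.items() for x in v
    ) / (n_total - k)

    def score(c, x):
        # Linear discriminant function derived from the normal log-density.
        mu = means[c]
        return x * mu / pooled_var - mu * mu / (2 * pooled_var) + math.log(priors[c])

    def classify(x):
        return max(samples_by_class, key=lambda c: score(c, x))

    return classify


# Hypothetical training data: one measured feature for two groups.
training = {
    "group_a": [1.0, 1.5, 2.0, 2.5],
    "group_b": [4.0, 4.5, 5.0, 5.5],
}
classify = fit_lda_1d(training)
```

With equal priors, the decision boundary is the midpoint of the two class means (3.25 here).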
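The Kaplan-Meier estimator mentioned in item 7 can be sketched in a few lines. Survival data are typically censored: some subjects (or devices) have not experienced the event when observation stops, so they count as "at risk" only up to their last observed time. At each distinct event time t, the running estimate is multiplied by (1 - d_t / n_t). Here d_t is the number of events at t and n_t is the number still at risk just before t. The failure times below are hypothetical.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each subject.
    events: 1 if the event (e.g. failure) was observed, 0 if the subject was censored.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects that share the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Subjects with an event and censored subjects both leave the risk set after t.
        at_risk -= removed
    return curve


# Hypothetical device lifetimes in months; 0 marks a device still working when observation stopped.
months = [2, 3, 3, 5, 6, 8]
failed = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(months, failed)
```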
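The sketch below shows one concrete statistical quality-control tool for item 8: 3-sigma limits for an individuals (X) control chart. It estimates the process standard deviation as the average moving range divided by the constant d2 = 1.128, the usual value for ranges of two consecutive points. The baseline measurements (for example, fill weights from a production line) and the new readings are hypothetical.

```python
def individuals_control_limits(baseline):
    """Return (lower, center, upper) 3-sigma limits for an individuals (X) chart."""
    center = sum(baseline) / len(baseline)
    # Moving ranges between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    # Estimate sigma as average moving range / d2, with d2 = 1.128 for subgroups of size 2.
    sigma_hat = (sum(moving_ranges) / len(moving_ranges)) / 1.128
    return center - 3 * sigma_hat, center, center + 3 * sigma_hat


def out_of_control(points, limits):
    """Indices of points that fall outside the control limits."""
    lower, _, upper = limits
    return [i for i, x in enumerate(points) if x < lower or x > upper]


# Hypothetical in-control baseline, then new readings to monitor.
baseline = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 10.0]
limits = individuals_control_limits(baseline)
new_readings = [10.1, 9.9, 11.2, 10.0]
flagged = out_of_control(new_readings, limits)
```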

Applications of Statistical Data Mining

• Use a sample to estimate the values of a population’s parameters.
• Carry out hypothesis tests to see if two datasets are similar or disparate.
• Carry out correlation analysis to examine if two variables are interdependent.
• Conduct linear- or multiple-regression analysis to model and explain relationships between variables.
• Analyse the spread of data.
• Understand the distribution of data.
• Exclude data to ensure only relevant data is used for analyses.
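The first application, estimating a population parameter from a sample, is commonly done with a confidence interval for the mean. The sketch below uses the normal approximation, taking the z critical value from `statistics.NormalDist`. That is an assumption suited to large samples; for small samples, Student's t critical value is the usual choice. The sample values are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean (normal approximation)."""
    n = len(sample)
    m = mean(sample)
    # Standard error of the mean, using the sample standard deviation.
    standard_error = stdev(sample) / n ** 0.5
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return m - z * standard_error, m + z * standard_error


# Hypothetical sample drawn from a larger population.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
low, high = mean_confidence_interval(sample)
```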
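For hypothesis testing, one distribution-free option is a two-sample permutation test on the difference in means. The assignment does not name a specific test; this one is swapped in for illustration. The procedure repeatedly shuffles the pooled data, re-splits it into groups of the original sizes, and counts how often the shuffled difference is at least as extreme as the observed one. The data, the random seed, and the number of permutations are assumptions.

```python
import random


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means; returns (observed_diff, p_value)."""
    rng = random.Random(seed)
    observed = sum(a) / len(a) - sum(b) / len(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        # Re-split the shuffled pool into groups of the original sizes.
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a)
        # Small tolerance so float rounding does not hide ties with the observed value.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Counting the observed arrangement itself (+1) keeps the p-value above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical measurements from two groups.
group_a = [12, 14, 15, 16, 18, 19]
group_b = [8, 9, 10, 11, 12, 13]
diff, p = permutation_test(group_a, group_b)
```

A small p-value (for example, below 0.05) suggests that the two datasets differ. A large one means the observed difference is consistent with random relabelling.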
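Correlation analysis is most often based on Pearson's correlation coefficient r: the covariance of two variables divided by the product of their standard deviations. It ranges from -1 to 1 and measures linear association only. A minimal sketch with hypothetical data follows.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson's r: covariance of x and y divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # The 1/n factors cancel, so sums of deviations can be used directly.
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: years of experience vs. salary (in thousands).
years = [1, 2, 3, 4, 5]
salary = [32, 37, 41, 47, 53]
r = pearson_correlation(years, salary)
```

Python 3.10 and later offer the same computation as `statistics.correlation(x, y)`. A strong correlation does not by itself establish causation.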
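The last three applications (analysing spread, understanding distribution, and excluding irrelevant data) can be illustrated together. The sketch below reports the range, the sample standard deviation, and the interquartile range (IQR). It then applies Tukey's fences as one common rule for excluding outliers, keeping only values within [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]. The readings, including the suspicious value 95, are hypothetical. Whether an outlier should really be discarded is a judgement that depends on the data.

```python
from statistics import quantiles, stdev


def describe_spread(data):
    """Return basic spread measures: range, sample standard deviation, and IQR."""
    q1, _, q3 = quantiles(data, n=4)  # quartile cut points
    return {
        "range": max(data) - min(data),
        "stdev": stdev(data),
        "iqr": q3 - q1,
    }


def exclude_outliers_iqr(data, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if lower <= x <= upper]


# Hypothetical sensor readings with one suspicious value.
readings = [10, 12, 11, 13, 12, 11, 95, 12, 10, 13]
summary = describe_spread(readings)
cleaned = exclude_outliers_iqr(readings)
```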
