JSS Science and Technology University, MYSURU-570006
Department of Information Science and Engineering
Submitted by:
Anirudh D (01JST19PSE001)
Statistical Data Mining
Data mining is the extraction of information from huge sets of data; in other words, it is
the procedure of mining knowledge from data.
Because the scope of data mining is broad, a large variety of data mining methodologies
exist. These include Statistical Data Mining, Foundations of Data Mining, Visual and
Audio Data Mining, etc.
Statistical data mining techniques are designed for the efficient handling of huge amounts of
data that are typically multidimensional and possibly of various complex types. Major
statistical methods for data analysis include:
• Regression
• Generalized linear models
• Analysis of variance
• Mixed-effect models
• Factor analysis
• Discriminant analysis
• Survival analysis
• Quality control
1. Regression: Regression is a data mining technique used to fit an equation to a dataset.
   Regression analysis is a statistical method for modelling the relationship between a
   dependent variable and one or more independent variables. More specifically,
   regression analysis helps us to understand how the value of the dependent variable
   changes as an independent variable varies. It predicts continuous/real values such as
   temperature, age, salary, price, etc.
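The least-squares fit described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation; the years-vs-salary numbers are invented for the example.

```python
# Ordinary least squares for simple linear regression (one predictor).
# A minimal sketch using only built-ins; the data below are invented.

def fit_simple_regression(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

years = [1, 2, 3, 4, 5]
salary = [30, 35, 40, 45, 50]   # perfectly linear: salary = 25 + 5 * years
b0, b1 = fit_simple_regression(years, salary)
print(b0, b1)  # → 25.0 5.0
```

The fitted equation can then be used to predict a continuous value (here, salary) for a new input.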
2. Generalized linear models: These models, and their generalizations, allow a categorical
   response variable to be related to a set of predictor variables in a manner similar to the
   modelling of a numeric response variable using linear regression. Generalized linear
   models include logistic regression and Poisson regression.
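As a sketch of how a generalized linear model relates a categorical response to a predictor, the following fits a logistic regression (the GLM with a logit link) by plain gradient ascent on the log-likelihood. The hours-vs-pass data are invented, and the fixed learning rate and iteration count are ad hoc choices for the example.

```python
import math

# Logistic regression fitted by gradient ascent on the log-likelihood.
# Minimal sketch; the binary pass/fail data below are invented.

def fit_logistic(xs, ys, lr=0.02, steps=20000):
    """Return (intercept, slope) for P(y=1|x) = 1/(1+exp(-(b0+b1*x)))."""
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p          # gradient of log-likelihood w.r.t. b0
            g1 += (y - p) * x    # gradient w.r.t. b1
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1

hours = [1, 2, 3, 4, 5, 6]
passed = [0, 0, 1, 0, 1, 1]      # categorical (binary) response
b0, b1 = fit_logistic(hours, passed)
p_at_6 = 1.0 / (1.0 + math.exp(-(b0 + b1 * 6)))
p_at_1 = 1.0 / (1.0 + math.exp(-(b0 + b1 * 1)))
```

The slope comes out positive, so the predicted probability of passing increases with study hours, which is exactly the categorical-response analogue of a positive linear-regression coefficient.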
3. Analysis of variance: Analysis of variance (ANOVA) is a statistical analysis tool that
   splits the observed aggregate variability found inside a data set into two parts:
   systematic factors and random factors. The systematic factors have a statistical
   influence on the given data set, while the random factors do not. Analysts use the
   ANOVA test to determine the influence that independent variables have on the
   dependent variable in a regression study.
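The variance split described above can be computed by hand for a one-way ANOVA: the between-group sum of squares captures the systematic factor, the within-group sum of squares captures the random factor, and their ratio of mean squares is the F statistic. The three groups of measurements below are invented for illustration.

```python
# One-way ANOVA: partition total variability into between-group
# (systematic) and within-group (random) parts and form the F statistic.

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of sample lists."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # between-group sum of squares (systematic factor)
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # within-group sum of squares (random factor)
    ssw = 0.0
    for g in groups:
        m = sum(g) / len(g)
        ssw += sum((x - m) ** 2 for x in g)
    df_b, df_w = k - 1, n - k
    f_stat = (ssb / df_b) / (ssw / df_w)
    return f_stat, df_b, df_w

groups = [[10, 12, 11], [20, 22, 21], [30, 32, 31]]
f, df_b, df_w = one_way_anova(groups)
print(f, df_b, df_w)  # → 300.0 2 6
```

A large F (here, the group means differ far more than the within-group scatter) indicates that the grouping factor has a real influence on the response.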
4. Mixed-effect models: Mixed-effect models are an extension of simple linear models
   that allow both fixed and random effects, and are particularly useful when there is non-
   independence in the data, such as arises from a hierarchical structure. They describe
   relationships between a response variable and some covariates in data grouped
   according to one or more factors. Common areas of application include multilevel data,
   repeated measures data, block designs, and longitudinal data.
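One simple way to see the non-independence that motivates a random-intercept mixed model is the intraclass correlation (ICC): observations in the same group share a group effect and are therefore correlated. The sketch below uses the classical ANOVA-mean-squares estimator of the ICC for balanced groups; the grouped data are invented, with groups shifted strongly apart so the ICC should come out near 1.

```python
# Random-intercept intuition: the ICC estimated from one-way ANOVA mean
# squares measures how strongly observations within a group resemble each
# other. Minimal sketch for a balanced design with invented data.

def icc_random_intercept(groups):
    """ANOVA estimator of ICC: (MSB - MSW) / (MSB + (m - 1) * MSW)."""
    k = len(groups)
    m = len(groups[0])            # observations per group (balanced design)
    n = k * m
    grand = sum(sum(g) for g in groups) / n
    msb = sum(m * (sum(g) / m - grand) ** 2 for g in groups) / (k - 1)
    ssw = 0.0
    for g in groups:
        mean_g = sum(g) / m
        ssw += sum((x - mean_g) ** 2 for x in g)
    msw = ssw / (n - k)
    return (msb - msw) / (msb + (m - 1) * msw)

groups = [[1, 2, 3], [11, 12, 13], [21, 22, 23]]
icc = icc_random_intercept(groups)
```

An ICC near 1 means most variability lies between groups, which is precisely the situation where treating observations as independent (a simple linear model) would be wrong and a mixed-effect model is called for.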
5. Factor analysis: Factor analysis is an exploratory data analysis method used to search
   for influential underlying factors in a set of observed variables. It helps in data
   interpretation by reducing the number of variables. It extracts the maximum common
   variance from all variables and puts it into a common score. Factor analysis is
   widely utilized in market research, advertising, psychology, finance, and operations
   research.
6. Discriminant analysis: Discriminant analysis is a statistical method that helps to
   understand the relationship between a categorical dependent variable and one or more
   independent variables. The dependent variable is the variable that is explained or
   predicted from the values of the independent variables. Discriminant analysis is similar
   to regression analysis and analysis of variance (ANOVA). It is commonly used in the
   social sciences.
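In its simplest setting, two classes, one feature, equal variances and equal priors, the linear discriminant rule reduces to assigning a new observation to the nearest class mean, i.e. thresholding halfway between the two means. The measurements below are invented for illustration.

```python
# Simplest linear discriminant: two classes, one feature, equal variances
# and priors, so the decision boundary is the midpoint of the class means.
# Minimal sketch with invented data.

def fit_lda_1d(class_a, class_b):
    """Return a classifier assigning a new x to class 'A' or 'B'."""
    mean_a = sum(class_a) / len(class_a)
    mean_b = sum(class_b) / len(class_b)
    threshold = (mean_a + mean_b) / 2.0
    def classify(x):
        if mean_a < mean_b:
            return "A" if x < threshold else "B"
        return "A" if x > threshold else "B"
    return classify

classify = fit_lda_1d([1.0, 2.0, 3.0], [7.0, 8.0, 9.0])
print(classify(4.0), classify(6.0))  # → A B
```

Note the parallel to regression and ANOVA mentioned above: the independent variable (the measurement) is used to predict the categorical dependent variable (the class).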
7. Survival analysis: Survival analysis is one of the primary statistical methods for
   analysing data on time to an event, such as death, heart attack, or device failure. Such
   data analysis is essential for many facets of legal proceedings, including apportioning
   the cost of future medical care, estimating years of life lost, evaluating product
   reliability, and assessing drug safety. Methods include Kaplan-Meier estimates of
   survival and Cox proportional hazards regression models.
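The Kaplan-Meier estimator mentioned above can be sketched directly: at each observed event time the survival probability is multiplied by the fraction of at-risk subjects who survive that time, and right-censored observations leave the risk set without triggering a drop. The (time, event) pairs below are invented, with event = 1 meaning the event (e.g. device failure) occurred and event = 0 meaning the observation was censored.

```python
# Kaplan-Meier estimate of the survival function S(t), handling
# right-censored observations. Minimal sketch with invented data.

def kaplan_meier(times, events):
    """Return [(t, S(t))] with S updated at each distinct event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    s = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        # count events (deaths) and all subjects leaving the risk set at t
        j, deaths = i, 0
        while j < len(data) and data[j][0] == t:
            deaths += data[j][1]
            j += 1
        if deaths > 0:
            s *= (n_at_risk - deaths) / n_at_risk
            curve.append((t, s))
        n_at_risk -= j - i       # both failures and censored leave the set
        i = j
    return curve

times  = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]         # 0 = censored observation
curve = kaplan_meier(times, events)
```

For these data the curve steps down to 0.8 at t=2, 0.6 at t=3, and 0.3 at t=5; the censored subjects shrink the risk set without producing a step of their own.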
8. Quality Control: Quality control is a set of methods used by organizations to achieve
   quality goals and continually improve the organization's ability to ensure that a
   software product will meet them. It confirms that the agreed standards are followed
   while working on the product.
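A classic statistical tool here is the control chart: measurements falling outside control limits signal that the process has drifted from its standard. The sketch below uses a simplified Shewhart-style individuals chart with limits at the mean ± 3 standard deviations; real charts usually estimate sigma from moving ranges, and the measurements are invented.

```python
import math

# Simplified Shewhart individuals control chart: flag any measurement
# outside mean ± 3 standard deviations of the historical process data.
# Minimal sketch with invented measurements.

def control_limits(samples):
    """Return (LCL, center line, UCL) from in-control historical data."""
    n = len(samples)
    mean = sum(samples) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
    return mean - 3 * sd, mean, mean + 3 * sd

def out_of_control(samples, lcl, ucl):
    """Return the measurements that violate the control limits."""
    return [x for x in samples if x < lcl or x > ucl]

history = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.1, 9.9]
lcl, center, ucl = control_limits(history)
flagged = out_of_control(history + [12.0], lcl, ucl)
```

All historical measurements fall inside the limits, while the new reading of 12.0 is flagged, prompting the quality team to investigate before the product drifts further from its goals.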