CLC - Calibration and External Validation Literature Review

Business analysis and data mining are complementary disciplines that use analytical techniques and tools to extract insights from data and address business challenges. They enable organizations to harness the power of data to make informed decisions, improve efficiency, and gain a competitive advantage in the market.

By leveraging the power of data mining techniques, business analysts can identify
patterns and trends that drive business outcomes, predict future behaviors, optimize
processes, and improve overall decision-making. The combination of business
analysis and data mining provides a comprehensive approach to understanding and
utilizing data to solve complex business problems, optimize operations, and drive
organizational success.

Integration of Business Analysis and Data Mining


Business analysis and data mining go hand in hand, as data mining techniques are an
essential part of the business analysis process. Data mining helps business analysts
uncover valuable insights from the available data, enabling them to identify
opportunities, make data-driven decisions, and develop effective strategies. The
insights gained from data mining can inform various aspects of business analysis,
such as market analysis, customer segmentation, risk assessment, and performance
evaluation.

Data mining is a specific analytical approach within business analysis that focuses on
discovering patterns, relationships, and trends in large datasets. It involves applying
statistical and machine learning algorithms to extract meaningful information and
insights from structured and unstructured data. Data mining techniques can uncover
hidden patterns, associations, and correlations that may not be apparent through
traditional analysis methods. By mining the data, analysts can gain valuable insights
into customer behavior, market trends, and other factors that impact business
performance.

Business analysis is the process of understanding business needs and identifying solutions to meet those needs. It involves gathering and analyzing data from various
sources within an organization to gain insights into its operations, processes, and
performance. Business analysts use a range of analytical techniques to identify
problems, opportunities, and areas for improvement. They translate complex business
requirements into clear, actionable recommendations and help organizations make
informed decisions.

By incorporating data validation together with model validation and verification, analysts can enhance the reliability and credibility of their models. These steps help identify and address data issues, assess the model's performance, and ensure that the model aligns with its intended purpose. Ultimately, validation and verification contribute to building robust and trustworthy models that can inform decision-making and drive successful outcomes.

Verification involves reviewing the model's assumptions, methodologies, algorithms, and implementation to confirm that they align with the intended use case. It also
involves validating the model against real-world scenarios or expert knowledge to
assess its practicality and usefulness.
Cross-Validation
Cross-validation is a technique used to assess the performance of a model by
partitioning the available data into multiple subsets or folds. The model is trained on a
subset of the data and evaluated on the remaining fold(s). This process is repeated
multiple times, with each fold serving as both a training set and a validation set. The
results from each iteration are then averaged to obtain an overall assessment of the
model's performance.
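As a minimal illustration (the document does not specify any tooling, so Python with scikit-learn, the bundled breast-cancer dataset, five folds, and accuracy scoring are all assumptions), the sketch below shows how k-fold cross-validation averages per-fold scores into one overall performance estimate.

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Illustrative data and model; any estimator with fit/predict would work here.
X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# Each of the 5 folds is held out once while the model trains on the remaining folds;
# the per-fold accuracies are then averaged into an overall performance estimate.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print("Per-fold accuracy:", scores.round(3))
print("Mean accuracy:", round(scores.mean(), 3))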

External Validation
External validation involves assessing the performance of a model using independent,
unseen data that was not used during the model development phase. The purpose is to
determine how well the model performs on new data and to validate its ability to
generalize beyond the training dataset. This is important because a model that
performs well on the training data may not necessarily perform well on new, unseen
data.
To perform external validation, the dataset is typically divided into two parts: a
training set and a validation set. The model is trained on the training set and then
evaluated on the validation set. The evaluation metrics, such as accuracy, precision,
recall, or area under the ROC curve, are calculated to assess the model's performance.
If the model performs well on the validation set, it indicates that it has good
generalization capability.
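A hedged sketch of such a holdout evaluation follows; the 70/30 split, the random-forest model, and the scikit-learn example dataset are assumptions chosen only to make the example self-contained, not the project's actual protocol.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Hold out 30% of the data; it is never used during model development.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Evaluate on the unseen validation set with the metrics named above.
pred = model.predict(X_valid)
proba = model.predict_proba(X_valid)[:, 1]
print("Accuracy:", accuracy_score(y_valid, pred))
print("Precision:", precision_score(y_valid, pred))
print("Recall:", recall_score(y_valid, pred))
print("ROC AUC:", roc_auc_score(y_valid, proba))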

Validation Method

The sufficiency of our validation method depends on various factors, including the
size and representativeness of the validation data, the performance metrics used, and
the requirements of the business problem. It is important to ensure that the validation
data accurately reflects the real-world scenarios and encompasses a diverse range of
cases to assess the model's generalizability. Additionally, using appropriate
performance metrics helps in determining the effectiveness of the model in meeting
the desired objectives.

In terms of consistency with theories in the field, this depends on the domain and the research question being addressed. The model results are compared and evaluated against existing theories, prior research, and established benchmarks in the field. This helps in assessing whether the model aligns with existing knowledge and whether the results support or contradict existing theories. If the model results are consistent with established theories, it adds credibility to the model and increases confidence in its validity. However, if the results contradict established theories, it may require further investigation and analysis to understand the reasons behind the discrepancies and assess the implications for the field.

The sufficiency of the validation method is evaluated based on the context and requirements of the project. Additionally, assessing the consistency of the model results with existing theories and knowledge in the field provides valuable insights into the validity and applicability of the model.

Next Steps
The validation method should be tailored to the project requirements, and assessing the consistency of the model results with existing theories and knowledge in the field is essential to establish the model's validity and applicability.

The validation method should be designed to assess the performance and generalizability of the model in a way that aligns with the project's goals. This
includes ensuring that the validation data is representative of the target population and
covers a wide range of scenarios and variations that the model is expected to
encounter in real-world applications.

Additionally, comparing the model results with existing theories and knowledge in the
field is crucial to evaluate the validity and applicability of the model. Consistency
with established theories provides confidence in the model's ability to capture relevant
patterns and relationships within the data. It also allows for the identification of any
inconsistencies or deviations that may require further investigation.

Furthermore, external validation by independent researchers and experts in the field can provide an unbiased evaluation of the model's performance and lend credibility to
its findings. This external validation ensures that the model's results can be trusted
and relied upon for decision-making purposes.

Encountered Shortcomings

Shortcomings or limitations include:

Overfitting or underfitting

If the model performs extremely well on the training data but fails to generalize well to new or unseen data, it may indicate overfitting or underfitting. In such cases, model revision may involve adjusting the model's complexity and incorporating regularization techniques to improve its generalization ability.
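For illustration only, the sketch below shows how varying a regularization strength can trade off overfitting and underfitting; ridge regression, the synthetic data, and the alpha grid are assumptions, not the model actually used in this project.

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic data with more features than the signal really needs, to invite overfitting.
X, y = make_regression(n_samples=200, n_features=50, noise=10.0, random_state=0)

# A larger alpha shrinks coefficients harder (less variance, more bias); the
# cross-validated score indicates which setting generalizes best.
for alpha in [0.01, 1.0, 100.0]:
    r2 = cross_val_score(Ridge(alpha=alpha), X, y, cv=5, scoring="r2").mean()
    print("alpha =", alpha, "-> mean CV R^2 =", round(r2, 3))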

Data quality issues


If the model's performance is affected by poor-quality data, such as missing values, outliers, or inconsistencies, it may require data cleansing or preprocessing techniques to address these issues. This may involve imputing missing values, removing outliers, or handling data inconsistencies to improve the model's accuracy.
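A small, hedged example of such preprocessing follows; the toy DataFrame, median imputation, and the 1.5 x IQR outlier rule are illustrative choices rather than the project's actual cleansing pipeline.

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical raw data with a missing value in each column and one extreme revenue figure.
df = pd.DataFrame({"revenue": [120.0, 95.0, np.nan, 110.0, 5000.0],
                   "orders": [12.0, 9.0, 11.0, np.nan, 13.0]})

# Impute missing values with each column's median.
df[["revenue", "orders"]] = SimpleImputer(strategy="median").fit_transform(df)

# Drop rows whose revenue falls outside 1.5 * IQR of the middle 50% of values.
q1, q3 = df["revenue"].quantile([0.25, 0.75])
iqr = q3 - q1
df_clean = df[df["revenue"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
print(df_clean)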

Model assumptions
If the model's assumptions are violated or not aligned with the underlying data, it may result in biased or unreliable predictions. In such cases, revisiting and revising the model assumptions or exploring alternative modeling approaches may be necessary.

Variable selection
If the model includes irrelevant or redundant variables that do not contribute
significantly to the prediction accuracy, it may be necessary to refine the variable
selection process or explore feature engineering techniques to improve the model's
performance.
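To make this concrete, the sketch below prunes uninformative variables with recursive feature elimination; the synthetic data, the logistic-regression estimator, and keeping five features are all assumptions for illustration.

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data in which only a handful of the 20 features carry real signal.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_redundant=5, random_state=0)

# Recursively drop the weakest variables until five remain.
selector = RFE(LogisticRegression(max_iter=2000), n_features_to_select=5).fit(X, y)
print("Selected feature indices:", list(selector.get_support(indices=True)))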

Future Recommendations

These recommendations aim to refine and improve the model's performance, address
potential limitations, and ensure its continued relevance and usefulness in solving the
identified business problem.

Collect more diverse and representative data


If the current dataset was limited in terms of its diversity or representativeness,
acquiring additional data from different sources or expanding the dataset's scope
could help improve the model's generalization and predictive capabilities.

Feature engineering
Exploring additional variables and transforming existing variables through feature engineering techniques can provide more informative features for the model. This may involve creating new variables, combining existing ones, or deriving more complex features that capture important patterns and relationships in the data.
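A minimal sketch of that idea follows; the column names and derived ratios are hypothetical examples, not fields from the project's dataset.

import pandas as pd

# Hypothetical raw columns.
df = pd.DataFrame({"revenue": [1200.0, 800.0, 1500.0],
                   "orders": [30, 20, 25],
                   "returns": [3, 5, 2]})

# Derive new, potentially more informative features by combining existing ones.
df["avg_order_value"] = df["revenue"] / df["orders"]
df["return_rate"] = df["returns"] / df["orders"]
df["revenue_per_kept_order"] = df["revenue"] / (df["orders"] - df["returns"])
print(df)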

Fine-tune hyperparameters
Model performance can often be improved by fine-tuning the hyperparameters, such as the learning rate, regularization parameters, or tree depth, depending on the specific algorithm used. This is done through systematic experimentation and optimization techniques, such as grid search or Bayesian optimization, to find the best combination of hyperparameters.
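As a hedged sketch of grid search (Bayesian optimization would require an additional library such as Optuna), the example below tunes two gradient-boosting hyperparameters; the parameter grid, scoring metric, and dataset are illustrative assumptions.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)

# Candidate values for the learning rate and tree depth.
param_grid = {"learning_rate": [0.05, 0.1], "max_depth": [2, 3]}

# Every combination is scored with 5-fold cross-validation and the best is refit on all data.
search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                      param_grid, cv=5, scoring="roc_auc")
search.fit(X, y)
print("Best hyperparameters:", search.best_params_)
print("Best CV ROC AUC:", round(search.best_score_, 3))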

Ensembling or model stacking


Consider combining multiple models using ensemble techniques, such as bagging, boosting, or stacking, to leverage the strengths of different models and improve overall prediction performance. This helps mitigate the weaknesses and biases of individual models and enhances predictive accuracy.
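The sketch below stacks two different base models under a logistic-regression meta-learner; the particular estimators are illustrative choices rather than a recommendation for this project.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Two dissimilar base models; their out-of-fold predictions feed the meta-learner.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("svm", make_pipeline(StandardScaler(), SVC(probability=True)))],
    final_estimator=LogisticRegression(max_iter=1000))

print("Stacked CV accuracy:", round(cross_val_score(stack, X, y, cv=5).mean(), 3))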
Continuous monitoring and model updating
Implement a robust monitoring system to track the model's performance over time and identify any potential degradation or concept drift. Regularly updating the model with new data and retraining it can help ensure its relevance and accuracy in dynamic and evolving business environments.
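As a simple illustration of such monitoring, the sketch below scores a trained model on successive batches of newer data and flags drops below a tolerance; the batch split, the 0.05 threshold, and the estimator are assumptions made only for this example.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_new, y_train, y_new = train_test_split(X, y, test_size=0.4, random_state=0)
model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
baseline = accuracy_score(y_train, model.predict(X_train))

# Treat consecutive slices of the newer data as incoming batches and compare to baseline.
for i, (Xb, yb) in enumerate(zip(np.array_split(X_new, 3), np.array_split(y_new, 3)), start=1):
    acc = accuracy_score(yb, model.predict(Xb))
    status = "OK" if acc >= baseline - 0.05 else "possible degradation - consider retraining"
    print("Batch", i, "accuracy", round(acc, 3), "-", status)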

Model explainability and interpretability


Enhance the model's transparency by using techniques and algorithms that provide
interpretable results. This can help stakeholders understand the underlying factors
driving the predictions and facilitate better decision-making based on the model's
insights.
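One hedged way to do this is permutation importance, sketched below; the random-forest model and the scikit-learn example dataset are assumptions used only to keep the example self-contained.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_valid, y_train, y_valid = train_test_split(
    data.data, data.target, test_size=0.3, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature on held-out data and record the drop in score; larger drops
# mean the prediction relies more heavily on that feature.
result = permutation_importance(model, X_valid, y_valid, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1][:5]:
    print(data.feature_names[i], round(result.importances_mean[i], 4))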

External validation
Seek external validation from independent experts and domain specialists to validate the model's findings, assumptions, and predictions. This provides additional confidence in the model's reliability and applicability to real-world scenarios.

Peer-Reviewed Sources

Bouckaert, R. R., & Frank, E. (2004). Evaluating the replicability of significance tests
for comparing learning algorithms. In Advances in Knowledge Discovery and Data
Mining (pp. 3-12). Springer.

Arlot, S., & Celisse, A. (2010). A survey of cross-validation procedures for model
selection. Statistics Surveys, 4, 40-79.

Japkowicz, N., & Shah, M. (2011). Evaluating learning algorithms: A classification perspective. Cambridge University Press.
