CLC - Calibration and External Validation Literature Review

Business analysis and data mining are complementary disciplines that use analytical techniques and tools to extract insights from data and address business challenges. They enable organizations to harness the power of data to make informed decisions, improve efficiency, and gain a competitive advantage in the market.

By leveraging the power of data mining techniques, business analysts can identify
patterns and trends that drive business outcomes, predict future behaviors, optimize
processes, and improve overall decision-making. The combination of business
analysis and data mining provides a comprehensive approach to understanding and
utilizing data to solve complex business problems, optimize operations, and drive
organizational success.

Integration of Business Analysis and Data Mining


Business analysis and data mining go hand in hand, as data mining techniques are an
essential part of the business analysis process. Data mining helps business analysts
uncover valuable insights from the available data, enabling them to identify
opportunities, make data-driven decisions, and develop effective strategies. The
insights gained from data mining can inform various aspects of business analysis,
such as market analysis, customer segmentation, risk assessment, and performance
evaluation.

Data mining is a specific analytical approach within business analysis that focuses on
discovering patterns, relationships, and trends in large datasets. It involves applying
statistical and machine learning algorithms to extract meaningful information and
insights from structured and unstructured data. Data mining techniques can uncover
hidden patterns, associations, and correlations that may not be apparent through
traditional analysis methods. By mining the data, analysts can gain valuable insights
into customer behavior, market trends, and other factors that impact business
performance.

Business analysis is the process of understanding business needs and identifying solutions to meet those needs. It involves gathering and analyzing data from various
sources within an organization to gain insights into its operations, processes, and
performance. Business analysts use a range of analytical techniques to identify
problems, opportunities, and areas for improvement. They translate complex business
requirements into clear, actionable recommendations and help organizations make
informed decisions.

By incorporating data validation together with model validation and verification, analysts can enhance the reliability and credibility of their models. These steps help identify and address data issues, assess the model's performance, and ensure that the model aligns with its intended purpose. Ultimately, validation and verification contribute to building robust and trustworthy models that can inform decision-making and drive successful outcomes.

Verification involves reviewing the model's assumptions, methodologies, algorithms, and implementation to confirm that they align with the intended use case. It also
involves validating the model against real-world scenarios or expert knowledge to
assess its practicality and usefulness.
Cross-Validation
Cross-validation is a technique used to assess the performance of a model by
partitioning the available data into multiple subsets or folds. The model is trained on a
subset of the data and evaluated on the remaining fold(s). This process is repeated
multiple times, with each fold serving as both a training set and a validation set. The
results from each iteration are then averaged to obtain an overall assessment of the
model's performance.
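As a minimal illustration (the document does not specify any tooling, so Python with scikit-learn, the bundled breast-cancer dataset, five folds, and accuracy scoring are all assumptions), the sketch below shows how k-fold cross-validation averages per-fold scores into one overall performance estimate.

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Illustrative data and model; any estimator with fit/predict would work here.
X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# Each of the 5 folds is held out once while the model trains on the remaining folds;
# the per-fold accuracies are then averaged into an overall performance estimate.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print("Per-fold accuracy:", scores.round(3))
print("Mean accuracy:", round(scores.mean(), 3))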

External Validation
External validation involves assessing the performance of a model using independent,
unseen data that was not used during the model development phase. The purpose is to
determine how well the model performs on new data and to validate its ability to
generalize beyond the training dataset. This is important because a model that
performs well on the training data may not necessarily perform well on new, unseen
data.
To perform external validation, the dataset is typically divided into two parts: a
training set and a validation set. The model is trained on the training set and then
evaluated on the validation set. The evaluation metrics, such as accuracy, precision,
recall, or area under the ROC curve, are calculated to assess the model's performance.
If the model performs well on the validation set, it indicates that it has good
generalization capability.
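A hedged sketch of such a holdout evaluation follows; the 70/30 split, the random-forest model, and the scikit-learn example dataset are assumptions chosen only to make the example self-contained, not the project's actual protocol.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Hold out 30% of the data; it is never used during model development.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Evaluate on the unseen validation set with the metrics named above.
pred = model.predict(X_valid)
proba = model.predict_proba(X_valid)[:, 1]
print("Accuracy:", accuracy_score(y_valid, pred))
print("Precision:", precision_score(y_valid, pred))
print("Recall:", recall_score(y_valid, pred))
print("ROC AUC:", roc_auc_score(y_valid, proba))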

Validation Method

The sufficiency of our validation method depends on various factors, including the
size and representativeness of the validation data, the performance metrics used, and
the requirements of the business problem. It is important to ensure that the validation
data accurately reflects the real-world scenarios and encompasses a diverse range of
cases to assess the model's generalizability. Additionally, using appropriate
performance metrics helps in determining the effectiveness of the model in meeting
the desired objectives.

In terms of consistency with theories in the field, this depends on the domain and the research question being addressed. The model results are compared and evaluated against existing theories, prior research, and established benchmarks in the field. This helps in assessing whether the model aligns with existing knowledge and whether the results support or contradict existing theories. If the model results are consistent with established theories, it adds credibility to the model and increases confidence in its validity. However, if the results contradict established theories, it may require further investigation and analysis to understand the reasons behind the discrepancies and assess the implications for the field.

The sufficiency of the validation method is evaluated based on the context and requirements of the project. Additionally, assessing the consistency of the model results with existing theories and knowledge in the field provides valuable insights into the validity and applicability of the model.

Next Steps
The validation method should be tailored to the project requirements, and assessing the consistency of the model results with existing theories and knowledge in the field is essential to establish the model's validity and applicability.

The validation method should be designed to assess the performance and generalizability of the model in a way that aligns with the project's goals. This
includes ensuring that the validation data is representative of the target population and
covers a wide range of scenarios and variations that the model is expected to
encounter in real-world applications.

Additionally, comparing the model results with existing theories and knowledge in the
field is crucial to evaluate the validity and applicability of the model. Consistency
with established theories provides confidence in the model's ability to capture relevant
patterns and relationships within the data. It also allows for the identification of any
inconsistencies or deviations that may require further investigation.

Furthermore, external validation by independent researchers and experts in the field can provide an unbiased evaluation of the model's performance and lend credibility to
its findings. This external validation ensures that the model's results can be trusted
and relied upon for decision-making purposes.

Encountered Shortcomings

Shortcomings or limitations include:

Overfitting or underfitting

If the model performs extremely well on the training data but fails to generalize well to new or unseen data, it may indicate overfitting or underfitting. In such cases, model revision may involve adjusting the model's complexity and incorporating regularization techniques to improve its generalization ability.
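For illustration only, the sketch below shows how varying a regularization strength can trade off overfitting and underfitting; ridge regression, the synthetic data, and the alpha grid are assumptions, not the model actually used in this project.

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic data with more features than the signal really needs, to invite overfitting.
X, y = make_regression(n_samples=200, n_features=50, noise=10.0, random_state=0)

# A larger alpha shrinks coefficients harder (less variance, more bias); the
# cross-validated score indicates which setting generalizes best.
for alpha in [0.01, 1.0, 100.0]:
    r2 = cross_val_score(Ridge(alpha=alpha), X, y, cv=5, scoring="r2").mean()
    print("alpha =", alpha, "-> mean CV R^2 =", round(r2, 3))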

Data quality issues


If the model's performance is affected by poor-quality data, such as missing values, outliers, or inconsistencies, it may require data cleansing or preprocessing techniques to address these issues. This may involve imputing missing values, removing outliers, or handling data inconsistencies to improve the model's accuracy.
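A small, hedged example of such preprocessing follows; the toy DataFrame, median imputation, and the 1.5 x IQR outlier rule are illustrative choices rather than the project's actual cleansing pipeline.

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical raw data with a missing value in each column and one extreme revenue figure.
df = pd.DataFrame({"revenue": [120.0, 95.0, np.nan, 110.0, 5000.0],
                   "orders": [12.0, 9.0, 11.0, np.nan, 13.0]})

# Impute missing values with each column's median.
df[["revenue", "orders"]] = SimpleImputer(strategy="median").fit_transform(df)

# Drop rows whose revenue falls outside 1.5 * IQR of the middle 50% of values.
q1, q3 = df["revenue"].quantile([0.25, 0.75])
iqr = q3 - q1
df_clean = df[df["revenue"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
print(df_clean)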

Model assumptions
If the model's assumptions are violated or not aligned with the underlying data, it may result in biased or unreliable predictions. In such cases, revisiting and revising the model assumptions or exploring alternative modeling approaches may be necessary.

Variable selection
If the model includes irrelevant or redundant variables that do not contribute
significantly to the prediction accuracy, it may be necessary to refine the variable
selection process or explore feature engineering techniques to improve the model's
performance.
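To make this concrete, the sketch below prunes uninformative variables with recursive feature elimination; the synthetic data, the logistic-regression estimator, and keeping five features are all assumptions for illustration.

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data in which only a handful of the 20 features carry real signal.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_redundant=5, random_state=0)

# Recursively drop the weakest variables until five remain.
selector = RFE(LogisticRegression(max_iter=2000), n_features_to_select=5).fit(X, y)
print("Selected feature indices:", list(selector.get_support(indices=True)))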

Future Recommendations

These recommendations aim to refine and improve the model's performance, address
potential limitations, and ensure its continued relevance and usefulness in solving the
identified business problem.

Collect more diverse and representative data


If the current dataset was limited in terms of its diversity or representativeness,
acquiring additional data from different sources or expanding the dataset's scope
could help improve the model's generalization and predictive capabilities.

Feature engineering
Exploring additional variables and transforming existing variables through feature engineering techniques can provide more informative features for the model. This may involve creating new variables, combining existing ones, or deriving more complex features that capture important patterns and relationships in the data.
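A minimal sketch of that idea follows; the column names and derived ratios are hypothetical examples, not fields from the project's dataset.

import pandas as pd

# Hypothetical raw columns.
df = pd.DataFrame({"revenue": [1200.0, 800.0, 1500.0],
                   "orders": [30, 20, 25],
                   "returns": [3, 5, 2]})

# Derive new, potentially more informative features by combining existing ones.
df["avg_order_value"] = df["revenue"] / df["orders"]
df["return_rate"] = df["returns"] / df["orders"]
df["revenue_per_kept_order"] = df["revenue"] / (df["orders"] - df["returns"])
print(df)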

Fine-tune hyperparameters
Model performance can often be improved by fine-tuning the hyperparameters, such as the learning rate, regularization parameters, or tree depth, depending on the specific algorithm used. This is done through systematic experimentation and optimization techniques, such as grid search or Bayesian optimization, to find the best combination of hyperparameters.
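As a hedged sketch of grid search (Bayesian optimization would require an additional library such as Optuna), the example below tunes two gradient-boosting hyperparameters; the parameter grid, scoring metric, and dataset are illustrative assumptions.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)

# Candidate values for the learning rate and tree depth.
param_grid = {"learning_rate": [0.05, 0.1], "max_depth": [2, 3]}

# Every combination is scored with 5-fold cross-validation and the best is refit on all data.
search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                      param_grid, cv=5, scoring="roc_auc")
search.fit(X, y)
print("Best hyperparameters:", search.best_params_)
print("Best CV ROC AUC:", round(search.best_score_, 3))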

Ensembling or model stacking


Consider combining multiple models using ensemble techniques, such as bagging, boosting, or stacking, to leverage the strengths of different models and improve overall prediction performance. This helps mitigate the weaknesses and biases of individual models and enhances predictive accuracy.
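The sketch below stacks two different base models under a logistic-regression meta-learner; the particular estimators are illustrative choices rather than a recommendation for this project.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Two dissimilar base models; their out-of-fold predictions feed the meta-learner.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("svm", make_pipeline(StandardScaler(), SVC(probability=True)))],
    final_estimator=LogisticRegression(max_iter=1000))

print("Stacked CV accuracy:", round(cross_val_score(stack, X, y, cv=5).mean(), 3))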
Continuous monitoring and model updating
Implement a robust monitoring system to track the model's performance over time and identify any potential degradation or concept drift. Regularly updating the model with new data and retraining it can help ensure its relevance and accuracy in dynamic and evolving business environments.
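As a simple illustration of such monitoring, the sketch below scores a trained model on successive batches of newer data and flags drops below a tolerance; the batch split, the 0.05 threshold, and the estimator are assumptions made only for this example.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_new, y_train, y_new = train_test_split(X, y, test_size=0.4, random_state=0)
model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
baseline = accuracy_score(y_train, model.predict(X_train))

# Treat consecutive slices of the newer data as incoming batches and compare to baseline.
for i, (Xb, yb) in enumerate(zip(np.array_split(X_new, 3), np.array_split(y_new, 3)), start=1):
    acc = accuracy_score(yb, model.predict(Xb))
    status = "OK" if acc >= baseline - 0.05 else "possible degradation - consider retraining"
    print("Batch", i, "accuracy", round(acc, 3), "-", status)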

Model explainability and interpretability


Enhance the model's transparency by using techniques and algorithms that provide
interpretable results. This can help stakeholders understand the underlying factors
driving the predictions and facilitate better decision-making based on the model's
insights.
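One hedged way to do this is permutation importance, sketched below; the random-forest model and the scikit-learn example dataset are assumptions used only to keep the example self-contained.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_valid, y_train, y_valid = train_test_split(
    data.data, data.target, test_size=0.3, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature on held-out data and record the drop in score; larger drops
# mean the prediction relies more heavily on that feature.
result = permutation_importance(model, X_valid, y_valid, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1][:5]:
    print(data.feature_names[i], round(result.importances_mean[i], 4))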

External validation
Seek external validation from independent experts and domain specialists to validate the model's findings, assumptions, and predictions. This provides additional confidence in the model's reliability and applicability to real-world scenarios.

Peer-Reviewed Sources

Bouckaert, R. R., & Frank, E. (2004). Evaluating the replicability of significance tests
for comparing learning algorithms. In Advances in Knowledge Discovery and Data
Mining (pp. 3-12). Springer.

Arlot, S., & Celisse, A. (2010). A survey of cross-validation procedures for model
selection. Statistics Surveys, 4, 40-79.

Japkowicz, N., & Shah, M. (2011). Evaluating learning algorithms: A classification perspective. Cambridge University Press.
