Research Notes Draft 2

Input & output data
Regression analysis: Input: features and labels. Output: predicted labels. For example, crop yield could be predicted based on weather conditions and soil quality.
Cluster analysis: Input: unlabeled data. Output: clusters of similar data points. For example, farmers could be clustered based on their farming practices.

Disadvantages
Regression analysis: Regression models do not work properly when the input data contains errors (that is, poor-quality data). If preprocessing does not adequately handle missing values, redundant data, outliers, or an imbalanced data distribution, the validity of the regression model suffers.
Cluster analysis: There are many different clustering algorithms, and they can produce different outcomes on the same data. As with regression, if preprocessing does not adequately handle missing values, redundant data, outliers, or an imbalanced data distribution, the validity of the cluster model suffers.

Usage
Regression analysis: Used for prediction and forecasting, assessing the strength of the relationship between variables, and modeling the future relationship between them. Widely used in financial analysis.
Cluster analysis: Used in machine learning, image analysis, data mining, and pattern recognition. Popular in marketing for customer segmentation.

Use of labels
Regression analysis: Regression analysis uses labels. It predicts a dependent variable (label) based on independent variables (features).
Cluster analysis: Cluster analysis doesn't use labels. It groups data based on similarities among the features.

Challenges
Regression analysis: The challenge with regression analysis is that it requires a large amount of data for accurate predictions. Also, the choice of the regression model is crucial.
Cluster analysis: The challenge with cluster analysis is that it requires a good understanding of the data and the right choice of clustering algorithm.

Data preprocessing
Regression analysis: Similar to cluster analysis, data preprocessing for regression analysis involves cleaning, transforming, and normalizing data.
Cluster analysis: Data preprocessing for cluster analysis involves dealing with missing or erroneous data, transforming data into a usable format, normalizing data, and reducing dimensionality.

Model evaluation
Regression analysis: Regression analysis has a clear measure of accuracy, which is typically based on the difference between the predicted and actual values.
Cluster analysis: Cluster analysis doesn't have a natural measure of accuracy, because there are no labels to compare the result against. Instead, the goal is to group objects into clusters based only on their observable features.
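
As a rough illustration of the contrast drawn in the comparison above, the sketch below fits a one-feature regression on labeled data and scores it against the actual values, then groups unlabeled points with a basic k-means loop. Everything specific here is an assumption made for illustration and does not come from the sources: the data values, the variable names (rainfall_mm, yield_t, fertilizer_kg), the helper functions, and the choice of k-means as the clustering algorithm. A real analysis would normally use several features and an established library.

```python
from statistics import mean

# --- Regression: features AND labels are given (hypothetical data) ---
rainfall_mm = [100, 150, 200, 250, 300]   # feature (independent variable)
yield_t = [2.0, 2.6, 2.9, 3.6, 4.0]       # label (dependent variable)

def fit_line(xs, ys):
    """Ordinary least squares with one feature; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

slope, intercept = fit_line(rainfall_mm, yield_t)
predicted = [slope * x + intercept for x in rainfall_mm]

# Evaluation is possible because actual labels exist to compare against.
mse = mean([(y - p) ** 2 for y, p in zip(yield_t, predicted)])

# --- Clustering: only features, no labels (hypothetical data) ---
fertilizer_kg = [20, 25, 22, 80, 85, 90]

def assign(points, centroids):
    """Put each point into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    return clusters

def kmeans_1d(points, start_centroids, iterations=10):
    """Basic k-means on 1-D data with caller-supplied starting centroids."""
    centroids = list(start_centroids)
    for _ in range(iterations):
        clusters = assign(points, centroids)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, assign(points, centroids)

centroids, clusters = kmeans_1d(fertilizer_kg, start_centroids=[20, 90])

print(f"regression: slope={slope:.3f}, intercept={intercept:.2f}, MSE={mse:.4f}")
print(f"clustering: centroids={centroids}, clusters={clusters}")
```

The regression half ends with an error figure computed against known labels, while the clustering half only returns groups. The starting centroids are also an arbitrary choice, and different starting points or algorithms can give different groupings, which is the disadvantage noted for cluster analysis above.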