Module 5 Machine Learning
• Risk Assessment: By using the Random Forest model, the bank can more accurately
assess the risk of lending to an applicant. This allows the bank to better manage its
loan portfolio by identifying high-risk applicants who are likely to default.
• Personalized Loan Offers: For applicants with a lower likelihood of defaulting, the
bank can offer better terms such as lower interest rates, whereas for high-risk
applicants, the bank might offer higher interest rates or ask for additional collateral.
• Prevention of Default: The model helps the bank identify customers who are at a
higher risk of default. The bank can take proactive steps, such as offering financial
counseling or restructuring loans, to help these customers before they default.
• Regulatory Compliance: Predicting defaults accurately can help the bank meet
regulatory requirements by maintaining an appropriate risk profile for its loans and
avoiding financial crises due to excessive bad loans.
• In this example, the Random Forest algorithm is used by a bank to
predict whether a loan applicant is likely to default. By training
multiple decision trees on random subsets of data and features, the
Random Forest model creates a more robust prediction compared to
a single decision tree. The bank can then use this predictive model to
assess risk more accurately, reduce potential defaults, and make more
informed lending decisions.
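The idea of training many trees on random subsets of data and features can be sketched in a few lines. This is a minimal illustration, not the bank's actual model. It makes several assumptions: each "tree" is reduced to a single split (a decision stump), the applicant records and the three features (income in thousands, debt-to-income ratio, number of late payments) are invented, and the function names are hypothetical. In practice a library implementation such as scikit-learn's RandomForestClassifier would normally be used instead.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature_indices):
    """Pick the single (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_indices:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            # 'left' is never empty because the threshold is one of the observed values.
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]  # (feature, threshold, left_label, right_label)


def train_forest(rows, labels, n_trees=25, n_features=2, seed=0):
    """Train each stump on a bootstrap sample and a random subset of features."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]            # rows drawn with replacement
        features = rng.sample(range(len(rows[0])), n_features)   # random feature subset
        forest.append(train_stump([rows[i] for i in sample],
                                  [labels[i] for i in sample],
                                  features))
    return forest


def predict(forest, row):
    """Majority vote over all stumps."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


# Invented applicants: (income in thousands, debt-to-income ratio, late payments)
applicants = [(85, 0.15, 0), (62, 0.22, 1), (40, 0.45, 3), (30, 0.55, 4),
              (95, 0.10, 0), (35, 0.50, 2), (70, 0.20, 0), (28, 0.60, 5)]
defaulted = [0, 0, 1, 1, 0, 1, 0, 1]  # 1 = defaulted, 0 = repaid

forest = train_forest(applicants, defaulted)
print(predict(forest, (120, 0.05, 0)))  # a clearly low-risk applicant
print(predict(forest, (25, 0.65, 5)))   # a clearly high-risk applicant
```

Because every stump sees a different sample and a different pair of features, individual stumps can disagree; the majority vote is what makes the combined prediction more stable than any single one.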
Unsupervised Learning
• Unsupervised learning refers to a type of machine learning where the
algorithm is given data without labeled outcomes or target variables.
Instead, the algorithm tries to find patterns, relationships, or
groupings in the data. Common techniques in unsupervised learning
include clustering, dimensionality reduction, and anomaly detection.
Clustering
• Clustering: The task of grouping a set of data points so that points in the
same group (called a cluster) are more similar to each other than to points
in other groups.
• It places data instances that are similar to (near) each other in the same
cluster and instances that are very different (far away) from each other in
different clusters.
Types of Clustering
1. Exclusive Clustering: K-means
2. Overlapping Clustering: Fuzzy C-means
3. Hierarchical Clustering: Agglomerative clustering, divisive clustering
4. Probabilistic Clustering: Mixture of Gaussian models
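The difference between exclusive and overlapping clustering shows up in how a single point is assigned once the cluster centers are known. The sketch below is illustrative only: the centers and the point are made up, the function names are hypothetical, and the centers are assumed to be distinct. The soft assignment uses the standard fuzzy C-means membership formula with fuzzifier m (m = 2 is a common choice).

```python
import math


def hard_assignment(point, centers):
    """Exclusive clustering (K-means style): the point belongs only to its nearest center."""
    distances = [math.dist(point, c) for c in centers]
    return distances.index(min(distances))


def fuzzy_memberships(point, centers, m=2.0):
    """Overlapping clustering (fuzzy C-means style): membership degrees that sum to 1."""
    distances = [math.dist(point, c) for c in centers]
    if 0.0 in distances:  # the point lies exactly on a center
        return [1.0 if d == 0.0 else 0.0 for d in distances]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** exponent for d_k in distances)
            for d_j in distances]


centers = [(0.0, 0.0), (4.0, 0.0)]  # made-up cluster centers
point = (1.0, 0.0)

cluster = hard_assignment(point, centers)
memberships = fuzzy_memberships(point, centers)
print(cluster, memberships)
```

Here the hard assignment puts the point entirely in the first cluster, while the fuzzy memberships come out at roughly 0.9 and 0.1, so the point belongs mostly, but not exclusively, to the first cluster.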
How to choose a clustering
algorithm
• A vast collection of algorithms is available.
• Which one should we choose for our problem?
• Choosing the “best” algorithm is a challenge.
• Every algorithm has limitations and works well with certain data distributions.
• It is very hard, if not impossible, to know what distribution the application
data follow.
• The data may not fully follow any “ideal” structure or distribution required by
the algorithms.
• One also needs to decide how to standardize the data, to choose a suitable
distance function and to select other parameter values.
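To see why standardization and the choice of distance function matter, consider the small sketch below. The four customers (annual spending in dollars, visits per month) are invented, and z-score standardization is only one of several possible scaling choices.

```python
import math
import statistics


def standardize(rows):
    """Rescale every column to mean 0 and standard deviation 1 (z-scores)."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [
        tuple((v - mu) / sd if sd > 0 else 0.0 for v, mu, sd in zip(row, means, stdevs))
        for row in rows
    ]


def euclidean(a, b):
    """Straight-line distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute per-feature differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Invented customers: (annual spending in dollars, visits per month)
raw = [(5000, 2), (5200, 20), (1000, 2), (1200, 20)]
scaled = standardize(raw)

raw_ab, raw_ac = euclidean(raw[0], raw[1]), euclidean(raw[0], raw[2])
std_ab, std_ac = euclidean(scaled[0], scaled[1]), euclidean(scaled[0], scaled[2])

print("raw:         ", raw_ab, raw_ac)
print("standardized:", std_ab, std_ac)
print("manhattan:   ", manhattan(scaled[0], scaled[1]))
```

On the raw data, the first customer looks about 20 times closer to the second than to the third, simply because spending is measured in much larger numbers than visits. After standardization, the two distances are nearly equal, so the large gap in visit frequency counts as much as the large gap in spending.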
• Due to these complexities, the common practice is to run several algorithms
using different distance functions and parameter settings, and then carefully
analyze and compare the results.
• The interpretation of the results must be based on insight into the meaning
of the original data together with knowledge of the algorithms used.
• Clustering is highly application-dependent and, to a certain extent, subjective
(a matter of personal preference).
Case Study
• Objective: Segmenting customers into different groups based on their purchasing behavior.
• Scenario:
• A retail company wants to better understand its customer base in order to tailor marketing
strategies, product recommendations, and promotions. The company has data on customers'
past purchases and wants to identify groups of customers with similar purchasing behavior.
This can be done with a clustering technique such as K-means, which groups similar data
points together.
• Data Variables (Features):
• Annual spending (in dollars) on products
• Product category preferences (e.g., electronics, clothing, groceries)
• Frequency of visits (number of store visits or website visits per month)
• Demographic data (e.g., age, location, income level)
• Purchase history (total number of items bought over the last year)
• Approach:
• The company uses K-means clustering, a popular unsupervised learning
technique, to identify groups of customers that behave similarly. Here’s how it
works:
• Preprocessing: Clean and normalize the data to ensure all variables are on a
similar scale (for example, normalizing spending and visit frequency).
• Selecting the number of clusters: The company decides to segment customers
into 3 groups (clusters) based on a business strategy (for example, targeting
"high-value" customers, "frequent shoppers", and "occasional buyers").
• Running the clustering algorithm: K-means will then partition the customer
data into 3 distinct clusters based on the similarity of their purchasing behavior.
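The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the company's actual pipeline: the nine customers (annual spending in dollars, visits per month) are invented, only two of the listed features are used, and min-max scaling stands in for "normalization". The starting centers are picked with a simple farthest-point heuristic so that the result is deterministic; the source does not say how the centers are initialized, and common implementations use random or k-means++ starts with several restarts.

```python
import math


def min_max_normalize(rows):
    """Preprocessing: rescale every feature to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def farthest_first_centers(points, k):
    """Start from the first point, then repeatedly add the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def k_means(points, k, max_iterations=100):
    centers = farthest_first_centers(points, k)
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        new_assignments = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_assignments == assignments:  # nothing changed, so the algorithm has converged
            break
        assignments = new_assignments
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centers[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centers


# Invented customers: (annual spending in dollars, visits per month)
customers = [(9000, 2), (9500, 3), (8800, 2),      # spend a lot, visit rarely
             (3000, 15), (3200, 18), (2800, 16),   # moderate spend, visit often
             (1500, 1), (1200, 1), (1800, 2)]      # low spend, visit rarely

assignments, centers = k_means(min_max_normalize(customers), k=3)
print(assignments)
```

With this toy data the three groups are well separated, so the algorithm recovers them: the printed assignments are [0, 0, 0, 1, 1, 1, 2, 2, 2]. The cluster numbers themselves carry no meaning; labels such as "high-value" or "frequent shopper" come from inspecting each cluster afterwards.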
• Example of Clusters Identified:
• Cluster 1: High-Value Customers
• Customers who spend a lot annually but may not visit the store frequently.
• They may prefer high-end products or luxury items.
• These customers are likely more influenced by product quality or exclusivity.
• Cluster 2: Frequent Shoppers
• Customers who visit the store or website frequently, but their total annual spend is moderate.
• They might shop for basic or everyday items and tend to buy in smaller quantities but regularly.
• These customers could be price-sensitive and value promotions or discounts.
• Cluster 3: Occasional Buyers
• Customers who rarely visit but make large purchases when they do.
• These might be customers who prefer to buy in bulk or make seasonal purchases (e.g., holiday
shopping).
• They may be less brand-loyal but still responsive to targeted campaigns.
Results & Use in Business:
• Targeted Marketing: Each group of customers can be targeted with different marketing
strategies. For instance:
• For high-value customers, the company could offer exclusive product releases or loyalty programs.
• For frequent shoppers, the company might offer personalized discounts or promotions to drive more
frequent purchases.
• For occasional buyers, the company could offer seasonal sales or reminders about products they
might need based on past purchases.
• Personalized Recommendations: The company can tailor recommendations based on the
identified clusters. For example, customers in the "high-value" group might receive
recommendations for premium products, while those in the "frequent shopper" cluster
could receive suggestions for everyday items.
• Resource Allocation: The company can allocate resources more effectively by focusing on
high-value customers for special promotions and tailoring its inventory to the
preferences of each cluster.