Assignment Report Predictive Modeling

Uploaded by

Paurik Trivedi
Assignment Report: Predictive Modeling for Purchase of 'Art History of Florence'
The goal of this assignment was to build a predictive model to determine whether a
customer is likely to purchase the book 'Art History of Florence' based on their purchase
behavior across various categories of books. We aimed to maximize profit by targeting only
potential buyers with marketing mailers, thus minimizing costs and maximizing revenue.

Step 1: Data Preparation


We began by cleaning the dataset to focus only on the relevant variables. Columns unrelated
to purchase behavior, such as identification numbers and unrelated demographic data, were
removed. The key variables selected included customer interactions with different types of
books such as 'Child Books', 'Youth Books', 'Art Books', and others. This selection aimed to
capture purchase behavior patterns that might indicate an interest in art-related products.
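
The column selection described above can be sketched roughly as follows. This is only an illustration: the three feature names are the ones quoted in this report (the full list is not given), while the target column name "Florence" and the "ID" and "Gender" fields are hypothetical placeholders.

```python
# Feature names quoted in the report; the full list is not given there.
FEATURE_COLUMNS = ["Child Books", "Youth Books", "Art Books"]
TARGET_COLUMN = "Florence"  # hypothetical name for the purchase indicator


def keep_relevant_columns(rows, feature_columns=FEATURE_COLUMNS,
                          target_column=TARGET_COLUMN):
    """Drop every column except the chosen features and the target."""
    wanted = list(feature_columns) + [target_column]
    return [{name: row[name] for name in wanted} for row in rows]


# One made-up record; "ID" and "Gender" stand in for the removed columns.
raw_rows = [
    {"ID": 101, "Gender": 1, "Child Books": 2,
     "Youth Books": 0, "Art Books": 1, "Florence": 0},
]
cleaned = keep_relevant_columns(raw_rows)
```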

Step 2: Data Splitting


The cleaned dataset was divided into three parts: 1800 samples for training, 1400 samples
for validation, and 800 samples for testing. This approach ensures that the model is trained
on one portion of the data, validated on another to tune its performance, and finally tested
on a separate set to evaluate generalization. This helps prevent overfitting and gives a
clearer indication of how the model will perform on unseen data.
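
A minimal sketch of such a three-way split is shown below. It assumes the rows were assigned by a seeded random shuffle, which the report does not state; the function name and the seed are illustrative.

```python
import random


def split_indices(n_rows, n_train, n_valid, n_test, seed=0):
    """Shuffle row indices and cut them into three disjoint partitions."""
    if n_train + n_valid + n_test != n_rows:
        raise ValueError("partition sizes must add up to the number of rows")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]
    return train, valid, test


# Sizes from the report: 1800 + 1400 + 800 = 4000 rows in total.
train_idx, valid_idx, test_idx = split_indices(4000, 1800, 1400, 800)
```

Because each index appears in exactly one partition, no customer used for training can leak into the validation or test sets.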

Step 3: Model Building


We chose a logistic regression model because it is a widely used, interpretable
classification model that can estimate the probability of a purchase. Logistic regression is
effective for binary classification tasks, making it suitable for predicting whether a purchase
will occur ('Yes' or 'No'). The model was trained using various book categories as features
to understand which behaviors most strongly predict interest in the 'Art History of
Florence' book.
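
To illustrate how a fitted logistic regression turns purchase counts into a probability, here is a small sketch. The coefficients, intercept, and example customer are hypothetical values chosen for illustration; the report does not list the fitted parameters.

```python
import math


def purchase_probability(features, coefficients, intercept):
    """Logistic model: sigmoid of a weighted sum of the feature values."""
    score = intercept + sum(
        coefficients[name] * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical coefficients; not the values fitted in the assignment.
example_coefficients = {"Child Books": -0.1, "Youth Books": -0.05, "Art Books": 0.9}
example_customer = {"Child Books": 1, "Youth Books": 0, "Art Books": 3}
p = purchase_probability(example_customer, example_coefficients, intercept=-2.0)
```

A positive coefficient raises the predicted probability as the corresponding count grows, which is what makes the model easy to interpret.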

Step 4: Threshold Selection to Maximize Profit


Rather than using the standard 0.5 threshold, we explored different thresholds to maximize
profit. The threshold determines how confident the model must be to predict a 'Yes'
(purchase). A higher threshold results in fewer positive predictions, focusing only on
customers most likely to buy, thus minimizing unnecessary mailer costs. Conversely, a
lower threshold may increase potential sales but also raises marketing costs.

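We calculated profit for thresholds ranging from 0.5 to 0.95. Profit was determined using
the following formulas, where each purchase is valued at three times the cost of one mailer
and a mailer is sent to every customer predicted as 'Yes':
- Revenue = (True Positives) × (3 × Cost of Mailer)
- Cost = (Total Mailers Sent) × Cost of Mailer, where Total Mailers Sent = True Positives + False Positives
- Profit = Revenue - Cost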
The optimal threshold was identified as the one that maximized this profit, balancing the
trade-off between reaching potential buyers and minimizing unnecessary expenditures.
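
The threshold search can be sketched as below, under a few assumptions: the candidate thresholds step by 0.05 (the report gives only the range), the mailer cost is set to 1.0 as a unit, and the probabilities and outcomes are made-up values rather than the assignment's data.

```python
def profit_at_threshold(probabilities, purchased, threshold, mailer_cost=1.0):
    """Profit from mailing every customer whose probability >= threshold."""
    mailed = [bought for p, bought in zip(probabilities, purchased) if p >= threshold]
    true_positives = sum(mailed)          # mailed customers who actually bought
    revenue = true_positives * 3 * mailer_cost
    cost = len(mailed) * mailer_cost      # every mailer sent costs the same
    return revenue - cost


def best_threshold(probabilities, purchased, thresholds, mailer_cost=1.0):
    """Return (threshold, profit) for the candidate with the highest profit."""
    return max(
        ((t, profit_at_threshold(probabilities, purchased, t, mailer_cost))
         for t in thresholds),
        key=lambda pair: pair[1],
    )


# Candidate thresholds 0.50, 0.55, ..., 0.95 (step size is an assumption).
thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]

# Made-up validation scores and outcomes (1 = bought), for illustration only.
probs = [0.97, 0.91, 0.84, 0.72, 0.66, 0.61, 0.58, 0.52, 0.40, 0.15]
bought = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
chosen, profit = best_threshold(probs, bought, thresholds)
```

With these toy numbers the search picks 0.70: four mailers reach three buyers, giving 3 × 3 - 4 = 5 units of profit, while both lower and higher thresholds earn less.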

Step 5: Model Evaluation


After identifying the optimal threshold, the model was evaluated on the testing dataset. The
performance metrics, including accuracy, precision, recall, and profit, were used to assess
how well the model generalized to new data. A confusion matrix was created to visualize
the model's performance, showing the number of true positives (correctly predicted
purchases), false positives (mailer sent but no purchase), true negatives, and false
negatives.
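
The metrics named above can be computed from the confusion-matrix counts as in this sketch. The predictions and outcomes are made-up examples, and the profit line reuses the formula from Step 4 with a unit mailer cost.

```python
def confusion_counts(predicted, actual):
    """Count TP, FP, TN, FN for binary labels (1 = purchase, 0 = no purchase)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)
    return tp, fp, tn, fn


def evaluate(predicted, actual, mailer_cost=1.0):
    """Accuracy, precision, recall, and profit for one set of predictions."""
    tp, fp, tn, fn = confusion_counts(predicted, actual)
    return {
        "accuracy": (tp + tn) / len(actual),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "profit": tp * 3 * mailer_cost - (tp + fp) * mailer_cost,
    }


# Made-up predictions and outcomes for eight test customers.
predicted = [1, 1, 1, 0, 0, 0, 0, 1]
actual = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = evaluate(predicted, actual)
```

Precision matters most for the profit calculation here, since every false positive is a mailer paid for without a sale.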

Conclusion
The assignment demonstrated that adjusting the classification threshold can
significantly impact the cost-efficiency of marketing campaigns. By choosing the
threshold that maximized profit instead of the default 0.5, we sent mailers only to
customers with a high predicted probability of purchasing 'Art History of Florence'.
This reduced spending on mailers unlikely to lead to a sale, resulting in a higher net profit.

The key takeaway is that predictive models, combined with thoughtful threshold selection,
can greatly enhance marketing strategies by enabling data-driven decision-making. Future
improvements could include experimenting with other machine learning models or more
sophisticated sampling techniques to further enhance predictive accuracy.
