Chirag DataScientist
PROFESSIONAL EXPERIENCE
WALGREENS BOOTS ALLIANCE, Chicago, IL May 2022 – January 2024
Senior Data Scientist
Recruited to contribute to statistical machine learning, propensity modeling, statistical modeling, data wrangling, and
reporting to drive marketing strategies for Walgreens Boots Alliance.
TECH STACK - Python, R, SQL, PowerBI, Hadoop, Microsoft Azure Databricks, PySpark
• Built a Customer Behavioral Progression propensity model for Walgreens Front of Store in Python using XGBoost for
multi-class classification. The model was trained on a dataset of 1.6M records and achieved an average accuracy of
85% across 10 classes on the test set. Wrote an efficient PySpark data processing pipeline and scored a
dataset of 200M+ records (all customers) using the trained XGBoost model. Used insights from the last decision tree in
the XGBoost ensemble to make recommendations to the business.
• Built and deployed a Credit Card Propensity Tool (a propensity prediction framework) for Walgreens Front of Store
customers in Python using an XGBoost multi-class classification model trained on a dataset of 200K customers, with an
accuracy of 76% across 4 classes. Wrote an efficient PySpark data processing pipeline and scored 132M
customers using the trained XGBoost model. The propensity scores from the model are being used to generate new
credit card offers for Walgreens customers.
• Engineered complex, efficient multi-layered SQL queries processing >40B rows of data via Microsoft Azure
Databricks, creating a robust Customer Demographics Table and visualizing it in a Store Level Insights Dashboard
using PowerBI.
• Performed K-means clustering in PySpark to segment a dataset of customers into groups based on customer behavior
metrics.
• Wrote complex SQL queries to pull data for ad hoc requests from internal business partners.
CHIRAG SUBRAMANIAN (224) 622-1395| [email protected]
• Reviewed SQL code written by junior associates and recommended best practices.
• Mentored and guided junior associates.
• Acted as the lead interviewer for Associate Data Scientist roles in the company.
• Increased the size of the historical tornado data set from 64,000 events to 90,000+ events after de-trending; created a
comprehensive stochastic set of 26 million tornado events covering the entire United States in R. Created
and visualized historical and simulated tornado data in R using histograms, 3D plots, and contour plots.
• Prepared presentation slides and presented the Tornado model at the IF US Quarterly Team Meeting in October 2019.
Performed convolution of damage function for different tornado intensities in R. Currently calculating hit probability for
26 million tornado events in the United States, and calculating loss by state for Tornado peril in R.
U.S. Hurricane
• Cleaned, sorted, and preprocessed hurricane data and performed linear interpolation. Wrote code in R using the
k-nearest neighbors algorithm to generate simulations of hurricane track and central pressure, which compared well with
historical measurements.
• Wrote code in R for linear interpolation, data cleaning, predictive modeling, simulation and data visualization.
Relevant Courses: Deep Learning, Reinforcement Learning, Machine Learning, Regression Analysis, Database System
Concepts and Design, Computing for Data Analysis, Introduction to Analytics Modeling, Data and Visual Analytics,
Deterministic Operations Research, Business Fundamentals for Analytics, Data Analytics in Business, Applied Analytics
Practicum
Relevant Courses: Deterministic Operations Research, Applied Probability and Statistics, Probabilistic Operations Research,
Data Mining in Engineering, Statistical Data Mining, Optimization and Complexity, Master’s Thesis