IIM PBA Assignment 2
Content
As a data analyst, you will work with a dataset containing sales records, customer
information, and product details. Your tasks will involve utilizing object-oriented
programming principles, working with Python modules and packages, handling files, and
scraping data from the web. Additionally, you will perform data manipulation using NumPy
and Pandas, and create visualizations with Matplotlib, Plotly, and Seaborn to uncover
insights and present your findings effectively.
Data Description
The dataset for this assignment is publicly available on Kaggle and consists of the following
files:
1. Sales Data: Contains transactional sales records, including transaction ID, product ID,
customer ID, date of purchase, quantity, and price.
2. Customer Data: Includes customer demographics such as age, gender, location, and
loyalty membership status.
3. Product Data: Lists product details like product ID, name, category, and supplier
information.
Objective
The objective of this assignment is to integrate various programming, data handling, and
visualization skills to perform a comprehensive analysis of the sales data. By the end of this
assignment, you should be able to:
1. Apply object-oriented programming (OOP) principles in Python.
2. Utilize Python modules and packages for data handling and web scraping.
3. Perform data cleaning, manipulation, and analysis using NumPy and Pandas.
4. Create insightful visualizations using Matplotlib, Plotly, and Seaborn.
5. Implement best practices and PEP standards in your Python code.
Tasks
1. Data Loading and Inspection
- Load the sales, customer, and product datasets into Pandas DataFrames.
- Inspect the data for missing values, inconsistencies, and outliers.
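As a starting point, the loading and inspection step might look like the minimal sketch below. The column names and the inline sample CSV are assumptions made for illustration; the actual Kaggle files may use different names. The IQR rule used for outliers is one common choice, not a requirement of the assignment.

```python
import io

import pandas as pd

# Hypothetical inline sample standing in for the sales CSV; the real file
# and its column names may differ.
SAMPLE_SALES_CSV = """transaction_id,product_id,customer_id,date,quantity,price
1,P1,C1,2023-01-05,2,10.0
2,P2,C2,2023-01-06,,12.5
3,P1,C3,2023-01-07,1,11.0
4,P3,C1,2023-01-08,3,9.5
5,P2,C2,2023-01-09,2,500.0
"""


def inspect(df: pd.DataFrame, numeric_cols: list) -> dict:
    """Return missing-value counts, duplicate-row count, and IQR outlier counts."""
    report = {
        "missing": df.isna().sum().to_dict(),
        "duplicates": int(df.duplicated().sum()),
        "outliers": {},
    }
    for col in numeric_cols:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        # Flag values more than 1.5 * IQR outside the quartiles.
        mask = (df[col] < q1 - 1.5 * iqr) | (df[col] > q3 + 1.5 * iqr)
        report["outliers"][col] = int(mask.sum())
    return report


# With a real file this would be pd.read_csv("<path to sales file>", ...).
sales = pd.read_csv(io.StringIO(SAMPLE_SALES_CSV), parse_dates=["date"])
report = inspect(sales, ["quantity", "price"])
print(report)
```

The same inspect() helper can be applied to the customer and product DataFrames by passing their own numeric columns.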
5. File Handling
- Write functions to read, write, and append to files in Python.
- Save cleaned data to new CSV files.
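One possible shape for these helpers, using only the standard library, is sketched below. The function names (read_rows, write_rows, append_row), the file name sales_cleaned.csv, and the field names are hypothetical; the demo writes to a temporary directory so it leaves no files behind.

```python
import csv
import tempfile
from pathlib import Path


def read_rows(path):
    """Read a CSV file into a list of dicts; return [] if the file is missing."""
    path = Path(path)
    if not path.exists():
        return []
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def write_rows(path, rows, fieldnames):
    """Write rows to a new CSV file, overwriting any existing file."""
    with Path(path).open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def append_row(path, row, fieldnames):
    """Append one row, writing the header first if the file is new or empty."""
    path = Path(path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if is_new:
            writer.writeheader()
        writer.writerow(row)


# Hypothetical field names for a cleaned sales file.
FIELDS = ["transaction_id", "quantity"]

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "sales_cleaned.csv"
    write_rows(out, [{"transaction_id": "1", "quantity": "2"}], FIELDS)
    append_row(out, {"transaction_id": "2", "quantity": "5"}, FIELDS)
    rows = read_rows(out)

print(rows)
```

If the cleaned data already lives in a Pandas DataFrame, DataFrame.to_csv(path, index=False) is a shorter route to the same result.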
6. Web Scraping
- Use BeautifulSoup or Scrapy to scrape additional data (e.g., product reviews) from a
relevant e-commerce website.
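The parsing half of this task can be sketched with BeautifulSoup as follows. The HTML snippet, the CSS class names (review, rating, text), and the parse_reviews function are all invented for illustration; a real site will use its own markup, and the page would first have to be fetched (for example with the requests library) in line with that site's terms of use.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded product page.
SAMPLE_HTML = """
<div class="review"><span class="rating">4</span><p class="text">Good value.</p></div>
<div class="review"><span class="rating">2</span><p class="text">Arrived late.</p></div>
<div class="review"><p class="text">No rating given.</p></div>
"""


def parse_reviews(html):
    """Extract rating/text pairs from review blocks, skipping incomplete ones."""
    soup = BeautifulSoup(html, "html.parser")
    reviews = []
    for node in soup.select("div.review"):
        rating = node.select_one("span.rating")
        text = node.select_one("p.text")
        if rating is None or text is None:
            continue  # skip reviews missing either field
        reviews.append(
            {
                "rating": int(rating.get_text(strip=True)),
                "text": text.get_text(strip=True),
            }
        )
    return reviews


reviews = parse_reviews(SAMPLE_HTML)
print(reviews)
```

Keeping the parsing in its own function, separate from the network request, makes it easy to test against saved HTML.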
9. Geospatial Analysis
- Use latitude and longitude data to create maps and charts.
- Plot customer distribution and sales hotspots on a map.
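A rough way to approach both bullets is sketched below: sales are summed per one-degree grid cell to find hotspots, and customers are drawn as a scatter plot sized by sales. The coordinates, the total_sales column, and the sales_hotspots function are assumptions for illustration; the source dataset may not provide coordinates under these names. A plain Matplotlib scatter is used here instead of a true basemap; Plotly's map charts would be one alternative.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical customer locations and sales totals.
customers = pd.DataFrame(
    {
        "customer_id": ["C1", "C2", "C3", "C4"],
        "latitude": [28.61, 28.65, 19.08, 19.10],
        "longitude": [77.21, 77.23, 72.88, 72.85],
        "total_sales": [120.0, 80.0, 300.0, 50.0],
    }
)


def sales_hotspots(df, cell_deg=1.0):
    """Sum sales per grid cell of `cell_deg` degrees, largest total first."""
    cells = df.assign(
        lat_cell=(df["latitude"] // cell_deg) * cell_deg,
        lon_cell=(df["longitude"] // cell_deg) * cell_deg,
    )
    return (
        cells.groupby(["lat_cell", "lon_cell"])["total_sales"]
        .sum()
        .sort_values(ascending=False)
        .reset_index()
    )


hotspots = sales_hotspots(customers)
print(hotspots)

# Customer distribution: one point per customer, marker area scaled by sales.
fig, ax = plt.subplots()
ax.scatter(
    customers["longitude"],
    customers["latitude"],
    s=customers["total_sales"],
    alpha=0.6,
)
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.set_title("Customer locations sized by sales")
```

Calling fig.savefig("<output path>") writes the chart to disk; the grid size cell_deg controls how coarse the hotspot cells are.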