Task 1: Importing and Cleaning Data
Objectives
→ Import the data into Excel.
→ Fill the empty cells in the dataset. Also remove any entry that is
completely missing.
→ Remove non-numeric characters from the Tel Number column so that
its values are homogeneous.
1.1 Import Data into Excel:
1.2 Filling the Empty Cells:
Below are the steps to fill the empty cells and clean the dataset.
→ A Visual Basic (VBA) macro in Microsoft Excel is used to fill in the
missing values in the dataset.
→ This ensures that the dataset is fully populated, with each missing
value filled in from the previous value.
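The fill-down behaviour described above can be sketched outside Excel as well. The Python snippet below is only an illustration and is not the VBA macro used in this report. The sample rows, the column layout and the function names `is_blank` and `fill_down` are assumptions made for the example. It drops rows that are completely empty and fills each remaining blank cell from the row above.

```python
# Hypothetical sketch of a fill-down clean-up; not the report's VBA macro.

def is_blank(value):
    """Treat None and whitespace-only strings as empty cells."""
    return value is None or str(value).strip() == ""

def fill_down(rows):
    """Drop fully empty rows, then fill each blank cell from the row above."""
    filled = []
    previous = None
    for row in rows:
        if all(is_blank(cell) for cell in row):
            continue  # the whole record is missing, so it is removed
        current = list(row)
        if previous is not None:
            for i, cell in enumerate(current):
                if is_blank(cell):
                    current[i] = previous[i]  # copy the previous value
        filled.append(current)
        previous = current
    return filled

# Made-up sample rows: pet, region, units sold.
sample = [
    ["Dog", "USA", 120],
    ["", "USA", 95],
    [None, None, None],
    ["Cat", "", 40],
]
cleaned = fill_down(sample)
for row in cleaned:
    print(row)
```

Note that a blank in the very first row has no previous value to copy, so this sketch leaves it blank.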
1.3 Homogenize the “Tel Number” Entities:
Visual Basic is used again, this time to remove non-numeric characters.
Below are the steps for using Visual Basic:
→ Open the VBA editor using the shortcut key Alt+F11.
→ Go to Insert and then click Module. Write the VB macro in the editor.
Save it and then close the editor.
→ Select all the cells under the Tel Number column and then press Alt+F8
to open the Macro dialog and run the macro.
Below is the screenshot of the script:
This way the data is cleaned and free of unwanted characters. Below is
the screenshot of the final, cleaned and homogeneous data.
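For readers without the screenshot, the idea of stripping non-numeric characters can be sketched in a few lines of Python. This is not the VB macro from the report. The function name `digits_only` and the sample telephone entries are hypothetical, and the sketch assumes that only the digits 0 to 9 should be kept.

```python
import re

def digits_only(raw):
    """Return only the characters 0-9 from a telephone entry."""
    return re.sub(r"[^0-9]", "", str(raw))

# Made-up entries showing brackets, spaces, plus signs and stray letters.
samples = ["(0161) 555-0199", "+44 20 7946 0958", "0141.555.0123 ext"]
cleaned_numbers = [digits_only(s) for s in samples]
print(cleaned_numbers)
```

Whether a leading "+" or a country code should also be normalised is a separate decision that this sketch does not make.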
Task 2: Descriptive Analysis and Hypothesis Testing
Objectives
→ Summarize the key attributes using the mean (average), maximum and
standard deviation to describe what the data actually says.
→ Design and test a hypothesis to find out whether there is a significant
gap between the salaries of people in the USA and the salaries of people
in other regions.
2.1 Statistical Analysis:
The results of the analysis of the key metrics `Units Sold`, `Revenue`,
`Cost`, and `Profit` are given below:
Units Sold: On average 1,118 units were sold, with a standard deviation
of 878.
Revenue: Mean revenue is $6,842 with a standard deviation of $4,705.
This indicates that revenue varies sharply from entity to entity.
Cost: Taken over all the products, the mean cost comes out to $2,812.
The standard deviation is again high relative to the mean, at $2,070.
This again shows great variation in the cost of different products.
Profit: Mean profit is $4,016.70 with a standard deviation of $2,658.
This shows that some products are highly profitable while others are
much less so.
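As a rough illustration of how such summary figures are obtained, the Python sketch below computes the mean, maximum and sample standard deviation of a short list. The values in `units_sold` are invented for the example and are not taken from the dataset. The sketch assumes the sample standard deviation (the equivalent of Excel's STDEV.S), since the report does not say which variant was used.

```python
import statistics

# Hypothetical values, not taken from the dataset.
units_sold = [200, 450, 1200, 3000, 650]

mean_units = statistics.mean(units_sold)
max_units = max(units_sold)
stdev_units = statistics.stdev(units_sold)  # sample standard deviation

# Ratio of spread to mean: a quick way to judge how large the variation is.
variation_ratio = stdev_units / mean_units

print(mean_units, max_units, round(stdev_units, 2), round(variation_ratio, 2))
```

A standard deviation that is a large fraction of the mean, as with the revenue, cost and profit figures above, is what supports the remark that values vary sharply between entities.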
2.2 Hypothesis Testing:
We have to design and test a hypothesis to determine whether the given
statement is true or false.
Statement:
“Average Salaries of the people of the USA are greater than the
Average Salaries of the people of other regions.”
Hypothesis:
We can design two types of hypothesis:
Null Hypothesis (H₀): The statement is false and there is no
significant difference in the average pay of people between the USA
and other regions.
Alternative Hypothesis (H₁): The statement is true and there is a
significant difference in the average pay of people between the USA
and other regions.
Conclusion:
A two-sample t-test gave a p-value greater than 0.05, so the null
hypothesis could not be rejected. The conclusion is that there was no
significant difference in average pay.
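The decision rule used above can be sketched as follows. The report does not state which variant of the two-sample t-test was run, so this Python sketch assumes Welch's unequal-variance version and a two-sided p-value. The two samples and the function names `welch_t`, `t_pdf` and `two_sided_p` are hypothetical. The p-value is obtained by numerically integrating the Student's t density with Simpson's rule, which avoids any external library.

```python
import math
import statistics

def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * total * h / 3)

# Made-up pay samples (in thousands); not the report's data.
usa = [52, 48, 55, 60, 47, 53]
other = [50, 49, 54, 58, 46, 51]

t_stat, dof = welch_t(usa, other)
p_value = two_sided_p(t_stat, dof)
reject_null = p_value < 0.05
print(round(t_stat, 3), round(dof, 2), round(p_value, 3), reject_null)
```

With these invented samples the p-value is well above 0.05, so H₀ is not rejected, which mirrors the kind of outcome reported above. If the claim is tested strictly as "greater than", a one-sided p-value would be the more natural choice.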
Task 3: Creating an Interactive Dashboard
Objective:
→ Build an interactive dashboard using Pivot Tables for comparative data
analysis.
Dashboard Entities:
The following entities are created in the dashboard for comparative study.
1) Pet vs. Sales Performance:
→ This table summarizes the total units sold and revenue generated for different types of
pets (Bird, Cat, Dog, Fish, Hamster, Rabbit).
→ It provides insights into which pet category contributes the most to sales volume and
revenue.
2) Region vs. Financial Metrics:
→ This table compares total revenue and profit across different regions (Rest of World
and USA).
→ It shows how revenue and profit vary geographically, highlighting the financial
performance of each region.
3) Area vs. Sales Metrics:
→ This table displays the total revenue and units sold for different areas (Blackpool,
Dudley, Glasgow, Manchester, Margate).
→ It helps identify which areas have the highest sales and revenue contributions, useful
for regional sales analysis.
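Each of the three tables above is a group-and-sum over one field. As a minimal sketch of what a Pivot Table does in that case, the Python snippet below totals chosen value fields per category. The records, the field names and the helper `pivot_sum` are assumptions made for illustration and do not reproduce the dashboard's data.

```python
from collections import defaultdict

def pivot_sum(records, key, value_fields):
    """Sum each value field per distinct value of the key field."""
    totals = defaultdict(lambda: {field: 0 for field in value_fields})
    for record in records:
        for field in value_fields:
            totals[record[key]][field] += record[field]
    return dict(totals)

# Made-up records; field names follow the columns mentioned in this report.
records = [
    {"Pet": "Dog", "Region": "USA", "Units Sold": 10, "Revenue": 200},
    {"Pet": "Cat", "Region": "Rest of World", "Units Sold": 5, "Revenue": 80},
    {"Pet": "Dog", "Region": "Rest of World", "Units Sold": 7, "Revenue": 150},
]

by_pet = pivot_sum(records, "Pet", ["Units Sold", "Revenue"])
by_region = pivot_sum(records, "Region", ["Revenue"])
print(by_pet)
print(by_region)
```

Changing the key from "Pet" to "Region" or to an area field gives the other two views, which is the same switch a Pivot Table makes when a different field is placed in its Rows area.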