Group 7 - Statistic For Business - Group Report-1
TABLE OF CONTENTS
INTRODUCTION
DATA VISUALIZATION AND DATA DESCRIPTIVE ANALYSIS
I. Data Visualization
1. Salary distribution by Field in India (2022)
2. Salary distribution by Job Level in India (2022)
3. Salary distribution by Location in India (2022)
II. Descriptive data
1. Mild Right-Skewed Distribution
2. Income Inequality
3. Salary Clustering
4. Outliers
ANOVA
I. Differences between Salary and 4-related fields
II. Differences of Salary within 4-related fields
CONCLUSION
REFERENCES
INTRODUCTION
Understanding salary distribution supports a better evaluation of the job market and more effective
personal career decisions. This study therefore examines salary trends in India by field, job level,
and location. It also discusses the key variables behind these differences to give a full picture of
the salary landscape.
The market-overview part of this report is divided into three sections. The first covers salary
distribution in four fields of the data profession: data science, data engineering, machine learning,
and data analytics. The second covers salary levels across job seniority, from entry-level to
managerial roles. The third takes a geographical perspective and examines the patterns and trends
in different locations in India. Nearly all insights are supported by statistical visualisations and
descriptive statistics. Finally, the report conducts an ANOVA on salaries in the four fields to
identify statistically significant differences and their meaning.
Evidence is supported with statistical
Mainly, the report adds to the knowledge base of job seekers, giving them a better idea of salary
expectations and market demand. It helps people see which skills, experience, and locations offer
strong earning potential, so that they can align their careers accordingly. Individuals can thus make
informed decisions about a career path that brings them more satisfaction and a sound financial
position. The findings can also guide organisations in refining their recruitment strategy to keep
compensation competitive and opportunities equal.
DATA VISUALIZATION AND DATA DESCRIPTIVE ANALYSIS
I. Data Visualization
1. Salary distribution by Field in India (2022)
The boxplot illustrates the distribution of salary in four related fields.
Overall, the minimum salary was consistent across the four fields, showing that the entry-level
baseline is relatively comparable. All four distributions were positively skewed and narrow, showing
that a large proportion of salaries is concentrated at the lower end.
Besides, data science stands out with the highest median value, which indicates that data science
pays more than the others, plausibly because of the complexity of its technical skills and its higher
educational requirements (Kumar, 2024). This is consistent with the escalating demand for data
science in the current data-driven market (Labb, 2024). Moreover, the Data Analyst box is narrower
than those of the other fields, which means that salaries within the field vary little.
Additionally, the chart shows numerous outliers in each field, which implies considerable
dispersion in the salary distribution. These outliers are potentially influenced by geographical
location and varying levels of expertise. In particular, the data science field has a dense cluster of
high-end outliers (up to approximately $60,000), which indicates profitable opportunities
for individuals with advanced expertise or specialized roles. In contrast, the Data Analyst field shows
a notable outlier near $20,000, with only one data point reaching the highest value which results
2. Salary distribution by Job Level in India (2022)
The boxplot graph and the table indicate the distribution of salary among five major levels and the
Overall, the salary distribution varies across levels, with individual levels showing positive,
symmetric, or negative skewness. The Manager level has the highest and widest salary range, while
the Junior level has the lowest and narrowest.
In the boxplot graph, the minimum salary values across levels were consistently low, in contrast to
the maximum values, indicating that the entry baseline for each level was relatively uniform.
Although this could lower the barrier to entry, it might also pose challenges for job seekers, as low
salaries may affect their livelihood, and it could harm companies that require skilled labour.
Besides, the Associate level recorded the top salary of about $80,000, with numerous widely
dispersed outliers; this result is likely influenced by geographical location and the specialized
skills of individuals within the level. Such variation could benefit individuals seeking
job advancement and commission-based roles. On the other hand, other levels lack outliers,
Moreover, median salaries also vary between the levels. The manager standards displayed higher
(mean > median > mode). A likely reason is that managers usually face higher risk and pressure
than other levels, so a higher salary compensates for that level of responsibility
(thegrownupschool, 2024). The Consultant level distribution is quite symmetrical, with a median of
approximately $20,000. In contrast, although the median of the Junior level is close to the Q3
percentile, the interquartile box was relatively symmetric with few outliers, indicating low variation
3. Salary distribution by Location in India (2022)
The boxplot illustrates the distribution of salaries in different locations in India (2022).
Overall, salaries were broadly consistent across locations, especially the median salary, which was
about $10,000 in most locations. The exception was Bangalore, whose median was slightly higher,
likely owing to differences in regional economies and living standards. The data suggest that salary
inequality between locations in India was low.
Besides, the interquartile ranges are also similar across locations, suggesting consistent
variability in salary within each location. The differences in salary between locations are therefore
not pronounced. Additionally, the distributions are positively skewed, with longer upper whiskers in
most locations. This skewness implies that a large proportion of employees earn within a narrow
range, while a smaller share earn significantly more.
Moreover, the graph shows numerous outliers in each city, indicating that a substantial number of
employees earned more than the typical range. This suggests that specialized skills and high
performance commanded higher salaries. Additionally, outliers in Bangalore and Mumbai reached
approximately $80,000, as these two cities are major economic (Briefing, 2024) and technology
(Saldanha, 2024) hubs in India. Professionals with advanced skills and experience could therefore
benefit by leveraging the economic and technological advantages of these cities.
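The side-by-side boxplots in this section compare, for each group, a five-number summary (minimum, Q1, median, Q3, maximum). The sketch below shows one way such summaries could be computed per location. The records, the function names and the choice of the "inclusive" quartile method are assumptions made for illustration; the salary values are invented and are not taken from the dataset.

```python
from collections import defaultdict
from statistics import median, quantiles

# Hypothetical (location, salary in USD) records, invented for illustration only.
records = [
    ("Bangalore", 9000), ("Bangalore", 11000), ("Bangalore", 12000),
    ("Bangalore", 14000), ("Bangalore", 30000),
    ("Mumbai", 8000), ("Mumbai", 9000), ("Mumbai", 10000),
    ("Mumbai", 12000), ("Mumbai", 28000),
]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a boxplot is built from."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median(values), q3, max(values)


def summarise_by_group(rows):
    """Group salaries by their label and summarise each group."""
    groups = defaultdict(list)
    for label, salary in rows:
        groups[label].append(salary)
    return {label: five_number_summary(vals) for label, vals in groups.items()}


summary = summarise_by_group(records)
for label, (low, q1, med, q3, high) in summary.items():
    print(f"{label}: min={low} Q1={q1} median={med} Q3={q3} max={high} IQR={q3 - q1}")
```

Comparing the medians and IQRs returned for each group mirrors the visual comparison of box positions and box widths. Plotting software may use a slightly different quartile convention, so its boxes can differ marginally from these numbers.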
II. Descriptive data
The goal of this analysis is to explore the distribution of salary data, examine its key statistical
properties, and derive insights about income patterns. The analysis combines descriptive statistics
Mean 11703.96
Median 9144.551
Third Quartile (Q3) 15403.421
The dataset reveals several key observations about salary distribution. Firstly, the mean salary
(11,703.96) is greater than the median (9,144.55), indicating a mild right skew. However,
this difference is not substantial, suggesting that the distribution is closer to symmetric than highly
skewed. From the histogram, it is evident that most salaries fall below 20,000, with a long tail
extending toward higher values. Secondly, the middle 50% of salaries (the interquartile range, IQR)
lies roughly between 5,000 and 15,000: the 25th percentile (Q1) is 5,450.86 and the 75th percentile
(Q3) is 15,403.42, so half of all salaries fall between these two values.
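The reported quartiles are enough to quantify both the spread and the direction of the skew. The sketch below computes the IQR, the quartile (Bowley) skewness coefficient, and outlier fences from the figures quoted above. The Bowley coefficient and the 1.5 × IQR fence rule are standard conventions chosen here as assumptions; the report does not state which skewness measure or outlier rule was used.

```python
# Summary statistics as reported in the descriptive analysis.
Q1 = 5450.86
MEDIAN = 9144.55
MEAN = 11703.96
Q3 = 15403.42
MAXIMUM = 78572.08


def iqr(q1, q3):
    """Interquartile range: the width of the middle 50% of the data."""
    return q3 - q1


def bowley_skewness(q1, med, q3):
    """Quartile skewness in [-1, 1]; a positive value indicates right skew."""
    return (q3 + q1 - 2 * med) / (q3 - q1)


def tukey_fences(q1, q3, k=1.5):
    """Lower and upper fences; values outside them are flagged as outliers."""
    spread = iqr(q1, q3)
    return q1 - k * spread, q3 + k * spread


spread = iqr(Q1, Q3)
skew = bowley_skewness(Q1, MEDIAN, Q3)
lower_fence, upper_fence = tukey_fences(Q1, Q3)

print(f"IQR = {spread:.2f}")
print(f"Mean - median = {MEAN - MEDIAN:.2f}")
print(f"Bowley skewness = {skew:.3f}")
print(f"Upper fence = {upper_fence:.2f}")
print(f"Maximum flagged as outlier: {MAXIMUM > upper_fence}")
```

With the reported figures, the IQR is 9,952.56 and the Bowley coefficient is about 0.26, a positive but modest value that agrees with the mild right skew described above. Under the 1.5 × IQR assumption the upper fence is about 30,332, so the maximum of 78,572.08 would be flagged as an outlier.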
The dataset also shows that low salaries dominate: most salaries cluster in the 0–20,000 range,
and half of them, by definition, fall below the median of 9,144.55. In contrast, high salaries
are rare, with a few outliers near the maximum of 78,572.08 pulling the mean above the
median. Lastly, salaries vary widely, from a minimum of 140.58 to a maximum of 78,572.08,
indicating significant disparity. The IQR of 9,952.56 illustrates
that while the middle 50% of salaries show considerable variation, they remain far below the
Insight:
1. Mild Right-Skewed Distribution
The salary data is slightly skewed to the right, with most employees earning less than 15,000, while a
2. Income Inequality
Salaries are concentrated in the lower ranges, with a substantial portion earning near or below the
median.
High-income outliers create a disparity, but they are not representative of the majority of employees.
3. Salary Clustering
Salaries cluster predominantly in the 5,000–15,000 range, making this the typical earning range for
most employees.
Few employees earn above 40,000, which reflects a smaller proportion of high earners.
4. Outliers
Salaries approaching the maximum of 78,572.08 are likely outliers that could distort overall
Recommendation:
The analysis of salary distribution suggests several actionable recommendations. First, it is essential
to investigate the lower salary range, specifically salaries below the 25th percentile (5,450.86), as
they encompass the lowest-paid employees. This investigation should determine whether these low
salaries are influenced by job type or role, inadequate skill levels, or systemic factors such as pay
inequity. Understanding the underlying reasons can help address potential issues effectively.
Second, high salaries near the maximum value (78,572.08) should be validated and addressed. These
high figures could represent genuine outliers, such as executive-level roles, or data anomalies
requiring further validation. Reviewing these salaries is crucial to ensure consistency and fairness
Third, enhancing pay equity should be a priority if salary disparity is found to be systemic. This can
involve implementing pay adjustments for employees in the lowest quartile and introducing
upskilling programs to help lower-paid employees improve their earning potential. Such initiatives
Lastly, performing further segmentation of the salary data could provide valuable insights. Analyzing
salaries by additional factors such as department, role, experience level, or location may help identify
patterns of pay disparity or concentration. This detailed segmentation can guide the development of
more targeted strategies to address any identified issues and ensure fair compensation practices
ANOVA
I. Differences between Salary and 4-related fields
The ANOVA results show an extremely small p-value (less than 2 × 10^-16, effectively 0). This
means that the difference in mean salary between groups (industries) is highly statistically
significant. In other words, the probability of this difference
This confirms that average salaries differ significantly between industry groups. The difference
may result from factors specific to each field, such as the level of expertise required, labor supply
and demand, or the economic value that each industry brings. However, ANOVA alone does not show
which groups differ; to determine which industry groups differ from one another, it is necessary to
perform additional Tukey tests
In conclusion, this result emphasizes that the occupational factor plays an important role in
influencing wages, and this is a factor that needs to be carefully considered in labor market-related
analyses or policies.
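The F statistic behind a one-way ANOVA is the ratio of between-group to within-group variance. The sketch below computes it from first principles with the Python standard library. The function name and the toy salaries are assumptions made for illustration; the values are invented and do not reproduce the report's result.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a dict of name -> list of values."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Variation of each observation around its own group mean.
    ss_within = 0.0
    for values in groups.values():
        group_mean = mean(values)
        ss_within += sum((v - group_mean) ** 2 for v in values)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical salaries (in thousands), invented purely for illustration.
toy = {
    "Data Analyst": [4.0, 5.0, 6.0],
    "Data Engineering": [7.0, 8.0, 9.0],
    "Data Science": [10.0, 11.0, 12.0],
}
f_stat, df_b, df_w = one_way_anova(toy)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

For the toy data, the group means are 5, 8 and 11, so the between-group mean square is 27 and the within-group mean square is 1, giving F(2, 6) = 27. The p-value is the upper-tail probability of that F value under the F distribution with the same degrees of freedom; a statistics library such as SciPy (`scipy.stats.f_oneway`) returns both the statistic and the p-value.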
II. Differences of Salary within 4-related fields
The findings indicate a noteworthy disparity in salaries across four areas of the data
profession, namely Data Science, Data Engineering, Machine Learning, and Data Analyst, in
that order from the highest to the lowest salary. Based on the ANOVA and Tukey's test results, Data
Science provides the highest annual remuneration, whereas Data Analyst offers the lowest. The
pairwise comparisons show that every pair of fields differs significantly in salary, since all
p-values were less than 0.1.
Specific differences:
Data Engineering vs. Data Analyst: p~0 so they have a huge difference in their annual salary,
Data Science vs. Data Analyst: p~0 so Data Science will have more salary than Data
Machine Learning vs. Data Analyst: p~0 so Machine Learning has nearly $2,913 higher in annual
Data Science vs. Data Engineering: p~0 so people in the Data Science field will gain $2,271 higher
Machine Learning vs. Data Engineering: p~0.01 so the annual salary in the Machine Learning
Machine Learning vs. Data Science: p~0 so Data Science will receive $3,810 higher in annual
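The three dollar gaps quoted in the comparisons above can be checked against the stated salary ordering. The sketch below places Data Analyst at zero and positions the other fields relative to it. It assumes the quoted figures are differences in group means, as Tukey output normally reports, and that they add up exactly. The derived offsets are arithmetic implications of that assumption, not values reported in the analysis.

```python
# Gaps in USD per year as quoted in the pairwise comparisons,
# read here as differences in group means (an assumption).
ML_MINUS_DA = 2913  # Machine Learning over Data Analyst
DS_MINUS_DE = 2271  # Data Science over Data Engineering
DS_MINUS_ML = 3810  # Data Science over Machine Learning

# Position every field relative to Data Analyst, which is set to 0.
offsets = {"Data Analyst": 0}
offsets["Machine Learning"] = offsets["Data Analyst"] + ML_MINUS_DA
offsets["Data Science"] = offsets["Machine Learning"] + DS_MINUS_ML
offsets["Data Engineering"] = offsets["Data Science"] - DS_MINUS_DE

# Highest-paid field first.
ranking = sorted(offsets, key=offsets.get, reverse=True)
print(ranking)
for field in ranking:
    print(f"{field}: +{offsets[field]} relative to Data Analyst")
```

Under these assumptions the implied ordering is Data Science, Data Engineering, Machine Learning, Data Analyst, which matches the ordering stated at the start of this section. The smallest implied gap is between Data Engineering and Machine Learning, the pair with the largest reported p-value.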
Additional Insights:
Salary trends: The data point to a distinct ordering of salaries, with specialised roles such
as Data Science attracting more pay than general roles such as Data Analyst. For example, one report
indicates that the average salary for Data Science can reach around $100,000 per year, while the
salary for Data Analysis usually ranges from $60,000 to $70,000 (Hillier, 2023).
Market Demand: Given the attractive salaries offered to data scientists and data
engineers, demand for these roles is likely to be sustained over a long period in the
job market. Looking ahead, the uptake of Data Science and Data Engineering in the Indian market is
expected to increase appreciably. According to NASSCOM, the data science industry in India is
likely to reach approximately $16 bn by 2025, growing at a CAGR of almost 30% (Ait, 2024).
Besides, the salaries for Data Science positions average around $100,000 per annum, and Data
Engineering
Skill Requirements:
Another reason for the variation in compensation may be the different skills needed in these
roles. For instance, Data Science usually requires a stronger grounding in statistics,
programming, and machine learning, whereas Data Engineering deals with building data
architecture and infrastructure. Typically, a higher-salary role will require an advanced degree,
CONCLUSION
The analysis of salary data shows a mildly right-skewed distribution. The majority of employees
earn between 5,000 and 15,000, while far fewer earn in excess of 40,000, and these high earners
raise the average considerably. Employees with salaries below the 25th percentile (5,450.86) are
the lowest paid, while a few outliers close to the maximum salary (78,572.08) skew the overall
picture. These few outliers illustrate the disparity in earnings captured in the dataset. Data
science and data engineering command much higher salaries. In particular, the data science field
has a dense cluster of high-end outliers (up to approximately $60,000), whereas the Data Analyst
field shows a notable outlier near $20,000. However, the skills and qualifications needed to obtain
such high-paying roles include expertise in machine learning, programming, and data architecture.
These trends are consistent with the expected growth of India's data science industry, estimated to
reach $16 billion by 2025 at a CAGR of almost 30% (Ait, 2024). Pay levels also differ by geography
and by job seniority; the highest pay is mostly restricted to management roles or to specific
locations. ANOVA and the statistical visualizations show that there are significant differences
across factors such as field, job level, and region. The analysis thus helps individuals stay
updated on salary trends, because it clarifies earning potential across fields and roles and
provides a better understanding of market demand and skill
REFERENCES
Briefing, I. (2024, February 16). Mumbai investment profile: Economy, infrastructure, industries.
https://www.india-briefing.com/news/mumbai-india-economy-investment-profile-6704.html/
Kumar, A. (2024). Data analyst vs. data scientist: Key difference in 2023 | simplilearn.
Simplilearn.com.
https://www.simplilearn.com/tutorials/data-analytics-tutorial/data-analyst-vs-data-scientist
Labb, Z. L. (2024, February 26). Discover the top data science salaries in india. this comprehensive
article provides insights into the highest-paying jobs in the field. Linkedin.com.
https://www.linkedin.com/pulse/which-data-science-roles-highest-paid-india-learning-labb-4d86c
Saldanha, K. (2024, March 20). List of top IT cities in india in [2024]. Mygate.
https://mygate.com/blog/neighbourhood/it-cities-in-india/
thegrownupschool. (2024, May 14). Why do top-level managers usually receive high salaries? - the
https://thegrownupschool.com/why-do-top-level-managers-usually-receive-high-salaries/
Ait, S. (2024, April 27). SOM - Đào tạo sau đại học Tổng quan, nhu cầu và tương lai ngành phân
https://som.edu.vn/nhu-cau-va-tuong-lai-cua-nganh-phan-tich-du-lieu/
Hillier, W. (2023, September 28). What Are the Best Data Analyst Salaries by Industry?
Careerfoundry.com.
https://careerfoundry.com/en/blog/data-analytics/data-analyst-salaries-by-industry/
Koyla Tevosyan. (2024, April 5). Best Jobs for Veterans to Civilian Careers - Hiring America.
https://hiringamerica.com/best-jobs-for-veterans-to-civilian-careers/
Yosifova, A. (2024, February 29). Data Scientist Job Market in 2024: Analysis, Trends, and
https://365datascience.com/career-advice/data-scientist-job-market/