Group 7 - Statistic For Business - Group Report-1

This document is a group assignment cover sheet detailing student information, unit and tutorial details, and assignment specifics for a report on salary distribution in India. The report includes data visualization, descriptive analysis, and ANOVA results, highlighting salary trends across different fields, job levels, and locations. Key findings indicate significant disparities in salaries based on profession and suggest actionable recommendations for addressing pay equity and improving salary structures.

GROUP ASSIGNMENT COVER SHEET

STUDENT DETAILS

Student name: Lại Khiết Ân Student ID number: 23004863

Student name: Võ Thị Mỹ Duyên Student ID number: 22004047

Student name: Hồ Ái Linh Student ID number: 23006029

Student name: Võ Thu Ngân Student ID number: 22003950

Student name: Nguyễn Kim Thảo Student ID number: 23005095

Student name: Nguyễn Ngọc Như Ý Student ID number: 23005903


UNIT AND TUTORIAL DETAILS

Unit name: Statistic for Business Unit number: SB-T12425PWB-3


Tutorial/Lecture: Lecture Class day and time: Saturday, 8:00am-11:15am
Lecturer or Tutor name: Bùi Anh Tuấn
ASSIGNMENT DETAILS

Title: Group Report


Length: 2,807 Due date: 30/11/2024 Date submitted: 29/11/2024

DECLARATION
I hold a copy of this assignment if the original is lost or damaged.
I hereby certify that no part of this assignment or product has been copied from any other student’s work or from any
other source except where due acknowledgement is made in the assignment.
I hereby certify that no part of this assignment or product has been submitted by me in another (previous or
current) assessment, except where appropriately referenced, and with prior permission from the Lecturer / Tutor
/ Unit Coordinator for this unit.
No part of the assignment/product has been written/ produced for me by any other person except where
collaboration has been authorised by the Lecturer / Tutor /Unit Coordinator concerned.
I am aware that this work may be reproduced and submitted to plagiarism detection software programs for the purpose
of detecting possible plagiarism (which may retain a copy on its database for future plagiarism checking).

Student’s signature: Lai Khiet An


Student’s signature: Vo Thi My Duyen
Student’s signature: Ho Ai Linh
Student’s signature: Vo Thu Ngan
Student’s signature: Nguyen Kim Thao
Student’s signature: Nguyen Ngoc Nhu Y

Note: An examiner or lecturer / tutor has the right to not mark this assignment if the above declaration has not been signed.

TABLE OF CONTENTS

INTRODUCTION
DATA VISUALIZATION AND DATA DESCRIPTIVE ANALYSIS
I. Data Visualization
1. Salary distribution by Field in India (2022)
2. Salary distribution by Job Level in India (2022)
3. Salary distribution by Location in India (2022)
II. Descriptive data
1. Mild Right-Skewed Distribution
2. Income Inequality
3. Salary Clustering
4. Outliers
ANOVA
I. Differences between Salary and 4-related fields:
II. Differences of Salary within 4-related fields
CONCLUSION
REFERENCES

INTRODUCTION

Understanding how salaries are distributed supports a better evaluation of the job market and better-informed personal career decisions. This study therefore examines salary trends in India by field, job level, and location, and discusses the key variables behind the differences to give a full picture of the salary landscape.

The market-overview part of this report is divided into three sections. The first covers salary distribution in four fields of the data profession: data science, data engineering, machine learning, and data analytics. The second covers salary levels across job seniority, from entry-level to managerial roles. The third takes a geographical perspective and looks at patterns and trends across different locations in India. These insights are supported by statistical visualisations and descriptive data. Finally, the report conducts an ANOVA on salaries in the four fields to identify statistically significant differences and interpret their meaning.

Above all, the report adds to the knowledge base of job seekers by giving a clearer idea of salary expectations and market demand. It helps people identify the skills, experience, and locations that offer the greatest earning potential and align their careers accordingly, so that they can make informed decisions about a career path that brings more satisfaction and a sound financial position. The findings can also guide organisations in refining their recruitment strategies to maintain competitive compensation and equal opportunities.

DATA VISUALIZATION AND DATA DESCRIPTIVE ANALYSIS

I. Data Visualization

1. Salary distribution by Field in India (2022)

The boxplot illustrates the distribution of salary in the four related fields.

Overall, the minimum salary was consistent across the four fields, showing that the entry-level baseline is broadly comparable. The distributions in all fields were positively skewed and narrow, showing that a large proportion of salaries is concentrated at the lower end. Data science stands out with the highest median, which indicates that it is more lucrative than the other fields because of the complexity of its technical skills and its higher educational requirements (Kumar, 2024). This is consistent with the escalating demand for data science in the current data-driven market (Labb, 2024). Moreover, the box for Data Analyst is narrower than those of the other fields, which means that salaries within the field vary little.

Additionally, the chart shows numerous outliers in each field, which implies significant diversity and dispersion in the salary distribution. These outliers are potentially influenced by geographical location and varying levels of expertise. In particular, the data science field shows a dense group of high-end outliers (up to approximately $60,000), which indicates lucrative opportunities for individuals with advanced expertise or specialised roles. In contrast, the Data Analyst field shows a notable outlier near $20,000, with only one data point reaching that highest value, which suggests fewer lucrative opportunities than in the other fields.
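How a boxplot arrives at its box, whiskers, and outlier points can be sketched in Python. This is a minimal illustration that assumes the conventional 1.5 × IQR whisker rule; the report does not state which rule or software produced its plots, and the `box_stats` helper and the sample salaries are hypothetical.

```python
from statistics import quantiles


def box_stats(values):
    """Return the quantities a conventional boxplot draws for one group."""
    data = sorted(values)
    # Quartiles by linear interpolation over the observed data points.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations inside the fences;
    # anything beyond the fences is plotted as an individual outlier.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Hypothetical salaries for one field (not taken from the report's dataset).
sample = [4000, 5000, 6000, 7000, 8000, 9000, 10000, 12000, 60000]
stats = box_stats(sample)
print(stats)
```

A single very high salary leaves the box and whiskers almost unchanged but appears as a separate point, which is the pattern the outlier discussion above relies on.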

2. Salary distribution by Job Level in India (2022)

The boxplot and the table show the distribution of salary across five major job levels and the frequency of each level in a sample of 4,311 salary records from 2022.

Overall, the salary distributions differ across levels, with positive, symmetric, and negative skewness all appearing in the data. The Manager level has the highest salaries and the widest range, while the Junior level has the lowest salaries and the narrowest range.

In the boxplot, the minimum salaries were consistently low across levels, in direct contrast to the maximum values, which indicates that the entry baseline for each level was relatively uniform. Although this could lower the barrier to entry, it might also pose challenges: low salaries can affect job seekers' livelihoods and harm companies that need to attract skilled labour.

The Associate level nevertheless recorded the top salary of about $80,000, and its substantial outliers show high diversity and dispersion, likely influenced by geographical location and the specialised skills of individuals within the level. Such variation could benefit individuals seeking job advancement or commission-based roles. The other levels, by contrast, show few or no outliers, which implies tighter clustering of salaries.

Median salaries also vary between levels. The Manager level displayed a higher median salary than the others, with a positively skewed distribution (mean > median > mode). A likely reason is that managers usually face higher risk and pressure than other levels, so a higher salary compensates for the demands of the role (thegrownupschool, 2024). The Consultant distribution is fairly symmetrical, with a median of approximately $20,000. In contrast, although the Junior median lies close to the third quartile (Q3), the interquartile box is otherwise compact with few outliers, indicating low variation and dispersion in salary.
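The labels "positive", "symmetric", and "negative" skewness used above can be made concrete with a moment-based skewness coefficient. The sketch below is illustrative only: the report reads skewness off the boxplots, and the function names, the 0.1 tolerance, and the three samples are assumptions introduced here.

```python
from statistics import mean, median


def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5, using population moments."""
    n = len(values)
    centre = mean(values)
    m2 = sum((v - centre) ** 2 for v in values) / n
    m3 = sum((v - centre) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def skew_label(values, tolerance=0.1):
    """Classify a sample; the tolerance is an arbitrary illustrative cut-off."""
    g1 = sample_skewness(values)
    if g1 > tolerance:
        return "positive"
    if g1 < -tolerance:
        return "negative"
    return "symmetric"


# Hypothetical samples (not taken from the report's dataset).
right_tailed = [10, 11, 12, 13, 14, 40]
balanced = [10, 12, 14, 16, 18]
left_tailed = [1, 27, 28, 29, 30, 31]

for name, sample in [("right_tailed", right_tailed),
                     ("balanced", balanced),
                     ("left_tailed", left_tailed)]:
    print(name, skew_label(sample), "mean:", mean(sample), "median:", median(sample))
```

In the right-tailed sample the mean exceeds the median, which matches the mean > median pattern cited for the Manager level.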

3. Salary distribution by Location in India (2022)

The boxplot illustrates the distribution of salaries in different locations in India (2022).

Overall, salaries varied little between locations, especially in the median: most locations had a median of about $10,000, except Bangalore, where the median was slightly higher because of differences in regional economic conditions and standards. The data suggest that salary inequality between locations in India was low.

The interquartile ranges are also similar across locations, suggesting consistent variability in salary within each location. The differences in salary between locations are therefore not pronounced. Additionally, the distributions are positively skewed, with longer upper whiskers in most locations. This skewness implies that a large proportion of employees earn within a narrow range, while a smaller proportion earn significantly more.

Moreover, the graph shows numerous outliers in each city, indicating that a substantial number of employees earned more than the typical range. This suggests that specialised skills and high performance were rewarded with higher salaries. In addition, outliers in Bangalore and Mumbai reached approximately $80,000, as these two cities are major economic (Briefing, 2024) and technology (Saldanha, 2024) hubs in India. Professionals with advanced skills and experience can therefore benefit from the economic and technological advantages of these cities.

II. Descriptive data

The goal of this analysis is to explore the distribution of salary data, examine its key statistical

properties, and derive insights about income patterns. The analysis combines descriptive statistics

with visual information from the histogram provided.

Statistic              Value
Mean                   11703.96
Median                 9144.551
Minimum (Min)          140.582
First Quartile (Q1)    5450.861
Third Quartile (Q3)    15403.421
Maximum (Max)          78572.078
Sample Size            4311

The dataset reveals several key observations about salary distribution. Firstly, the mean salary (11,703.96) is greater than the median (9,144.55) by about 2,559, indicating a right skew. The gap is moderate relative to the overall range, so the distribution is treated here as mildly rather than heavily skewed. From the histogram, most salaries fall below 20,000, with a long tail extending toward higher values. Secondly, the middle 50% of salaries (the interquartile range, IQR) lies between the 25th percentile (Q1) of 5,450.86 and the 75th percentile (Q3) of 15,403.42, so salaries are concentrated roughly in the 5,000 to 15,000 range.

The dataset also shows that low salaries dominate: most salaries cluster in the 0–20,000 range, and by definition half of them fall below the median of 9,144.55. In contrast, high salaries are rare, with a few outliers near the maximum of 78,572.08 pulling the mean above the median. Lastly, there is notable variability in salaries, with values ranging from a minimum of 140.58 to a maximum of 78,572.08, indicating significant disparity. The IQR of 9,952.56 shows that the middle 50% of salaries varies considerably yet remains far below the extreme outliers.
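The derived figures in this discussion follow directly from the reported summary statistics. The short Python sketch below recomputes them; the input values are copied from the summary above, while the variable names and the choice to express the mean-median gap as a share of the median are illustrative additions.

```python
# Summary statistics as reported in the descriptive table of this report.
reported = {
    "mean": 11703.96,
    "median": 9144.551,
    "min": 140.582,
    "q1": 5450.861,
    "q3": 15403.421,
    "max": 78572.078,
}

# Spread of the middle 50% of salaries.
iqr = reported["q3"] - reported["q1"]
# Full spread between the lowest and highest salary.
value_range = reported["max"] - reported["min"]
# A positive gap between mean and median points to a right skew.
mean_median_gap = reported["mean"] - reported["median"]
gap_share_of_median = mean_median_gap / reported["median"]

print(f"IQR: {iqr:,.2f}")
print(f"Range: {value_range:,.2f}")
print(f"Mean - median: {mean_median_gap:,.2f}")
print(f"Gap as share of median: {gap_share_of_median:.1%}")
```

Expressing the gap relative to the median is only one way to gauge the strength of the skew; a histogram or a skewness coefficient computed on the raw data would give a fuller picture.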

Insight:

1. Mild Right-Skewed Distribution

The salary data is slightly skewed to the right: most employees earn less than 15,000, while a few high earners raise the average.

2. Income Inequality

Salaries are concentrated in the lower ranges, with half of employees earning at or below the median.

High-income outliers create a disparity, but they are not representative of the majority of employees.

3. Salary Clustering

The middle half of salaries falls in the 5,000–15,000 range, making this the typical earning range.

Few employees earn above 40,000, so high earners make up only a small proportion of the sample.

4. Outliers

Salaries approaching the maximum of 78,572.08 are likely outliers that could distort overall statistical summaries. These may require further investigation.
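One common way to decide which salaries count as outliers is the 1.5 × IQR rule used by conventional boxplots. The sketch below applies that rule to the reported quartiles; the rule itself is an assumption, since the report does not say how it identified outliers, and the variable names are illustrative.

```python
# Quartiles and extremes as reported in the descriptive statistics.
q1, q3 = 5450.861, 15403.421
minimum, maximum = 140.582, 78572.078

iqr = q3 - q1
# Fences under the assumed 1.5 x IQR convention.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

max_is_outlier = maximum > upper_fence
min_is_outlier = minimum < lower_fence

print(f"Lower fence: {lower_fence:,.2f}")
print(f"Upper fence: {upper_fence:,.2f}")
print(f"Maximum flagged as outlier: {max_is_outlier}")
print(f"Minimum flagged as outlier: {min_is_outlier}")
```

Under this convention the lower fence is negative, so no low salary can be flagged, while every salary above the upper fence would be, including the values near the maximum singled out above.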

Recommendation:

The analysis of salary distribution suggests several actionable recommendations. First, it is essential

to investigate the lower salary range, specifically salaries below the 25th percentile (5,450.86), as

they encompass the lowest-paid employees. This investigation should determine whether these low

salaries are influenced by job type or role, inadequate skill levels, or systemic factors such as pay

inequity. Understanding the underlying reasons can help address potential issues effectively.

Second, high salaries near the maximum value (78,572.08) should be validated and addressed. These

high figures could represent genuine outliers, such as executive-level roles, or data anomalies

requiring further validation. Reviewing these salaries is crucial to ensure consistency and fairness

across the organization.

Third, enhancing pay equity should be a priority if salary disparity is found to be systemic. This can

involve implementing pay adjustments for employees in the lowest quartile and introducing

upskilling programs to help lower-paid employees improve their earning potential. Such initiatives

can contribute to a more equitable and motivated workforce.

Lastly, performing further segmentation of the salary data could provide valuable insights. Analyzing

salaries by additional factors such as department, role, experience level, or location may help identify

patterns of pay disparity or concentration. This detailed segmentation can guide the development of

more targeted strategies to address any identified issues and ensure fair compensation practices

across the organization.

ANOVA

I. Differences between Salary and 4-related fields:

                    Df     Sum Sq    Mean Sq  F value   Pr(>F)
as.factor(Field)     3  3.075e+10  1.025e+10    125.2   <2e-16 ***
Residuals         4307  3.528e+11  8.191e+07

The ANOVA results show an extremely small p-value (less than 2 × 10^-16, effectively 0). The differences between the groups (fields) are therefore highly statistically significant: if mean salaries were in fact equal across the four fields, an F value this large would be extremely unlikely to arise by chance.
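The entries of the ANOVA table are linked by simple arithmetic, which can be checked in a few lines of Python. The sums of squares and degrees of freedom below are copied from the table; the eta squared effect size is an addition not reported in the original analysis, included as a rough indication of how much salary variation the field accounts for.

```python
# Values copied from the ANOVA table (rounded as printed there).
df_between, df_within = 3, 4307
ss_between, ss_within = 3.075e10, 3.528e11

# Mean square = sum of squares / degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F statistic = between-group mean square / within-group mean square.
f_value = ms_between / ms_within

# Eta squared: share of total variation attributable to the field
# (not part of the original output; added here for illustration).
eta_squared = ss_between / (ss_between + ss_within)

# Total observations implied by the degrees of freedom.
n_observations = df_between + df_within + 1

print(f"Mean Sq (field): {ms_between:.3e}")
print(f"Mean Sq (residuals): {ms_within:.3e}")
print(f"F value: {f_value:.1f}")
print(f"Eta squared: {eta_squared:.3f}")
print(f"Observations: {n_observations}")
```

Because the inputs are rounded, the recomputed F value can differ slightly from the printed one. The implied number of observations matches the sample size given in the descriptive statistics.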

This confirms that average salaries differ significantly between the fields. The differences may result from field-specific factors such as the level of expertise required, labour supply and demand, or the economic value that each field generates. However, ANOVA alone does not show which fields differ from one another; a Tukey test is needed to compare each pair of groups in detail.

In conclusion, this result emphasises that the occupational field plays an important role in determining wages and should be carefully considered in labour-market analyses and policies.

II. Differences of Salary within 4-related fields

                                         Diff        lwr        upr      p adj
Data Engineering-Data Analyst        4452.660   4154.406   4750.914  0.0000000
Data Science-Data Analyst            6724.063   6478.899   6969.227  0.0000000
Machine Learning-Data Analyst        2913.164   2587.193   3239.134  0.0000000
Data Science-Data Engineering        2271.403   2007.834   2534.973  0.0000001
Machine Learning-Data Engineering   -1539.496  -1879.526  -1199.466  0.0093616
Machine Learning-Data Science       -3810.900  -4105.468  -3516.331  0.0000000

The findings indicate notable salary disparities among the four areas of the data profession, which rank from highest to lowest salary as Data Science, Data Engineering, Machine Learning, and Data Analyst. Based on the ANOVA and Tukey’s test results, Data Science provides the highest annual remuneration and Data Analyst the lowest. The pairwise comparisons show that every pair of fields differs significantly in earnings, since all adjusted p-values are below 0.01, well under the conventional 0.05 significance level.

Specific differences:

Data Engineering vs. Data Analyst: p ≈ 0, so the difference is significant; annual salary is approximately $4,453 higher for Data Engineering.

Data Science vs. Data Analyst: p ≈ 0, so the difference is significant; annual salary is approximately $6,724 higher for Data Science.

Machine Learning vs. Data Analyst: p ≈ 0, so the difference is significant; annual salary is approximately $2,913 higher for Machine Learning.

Data Science vs. Data Engineering: p ≈ 0, so the difference is significant; annual salary is approximately $2,271 higher for Data Science.

Machine Learning vs. Data Engineering: p ≈ 0.009, so the difference is significant; annual salary is approximately $1,539 lower for Machine Learning.

Machine Learning vs. Data Science: p ≈ 0, so the difference is significant; annual salary is approximately $3,811 higher for Data Science.
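The ranking and the significance claims can be checked against the Tukey output with a short Python sketch. The four numbers per comparison are copied from the table; the helper name, the choice of Data Analyst as the baseline, and the 0.01 tolerance are illustrative assumptions. The report does not state the confidence level of the intervals, so "significant" here means only that an interval excludes zero.

```python
# (diff, lwr, upr, p_adj) for "first field minus second field",
# copied from the Tukey output in this section.
tukey = {
    ("Data Engineering", "Data Analyst"): (4452.660, 4154.406, 4750.914, 0.0000000),
    ("Data Science", "Data Analyst"): (6724.063, 6478.899, 6969.227, 0.0000000),
    ("Machine Learning", "Data Analyst"): (2913.164, 2587.193, 3239.134, 0.0000000),
    ("Data Science", "Data Engineering"): (2271.403, 2007.834, 2534.973, 0.0000001),
    ("Machine Learning", "Data Engineering"): (-1539.496, -1879.526, -1199.466, 0.0093616),
    ("Machine Learning", "Data Science"): (-3810.900, -4105.468, -3516.331, 0.0000000),
}


def excludes_zero(lwr, upr):
    """True when the confidence interval lies entirely on one side of zero."""
    return lwr > 0 or upr < 0


all_significant = all(excludes_zero(lwr, upr) for _, lwr, upr, _ in tukey.values())

# Express each field's mean salary relative to the Data Analyst baseline.
baseline = "Data Analyst"
offsets = {baseline: 0.0}
for (first, second), (diff, *_rest) in tukey.items():
    if second == baseline:
        offsets[first] = diff

ranking = sorted(offsets, key=offsets.get, reverse=True)

# The remaining differences should equal the gap between two offsets.
consistent = all(
    abs((offsets[first] - offsets[second]) - diff) < 0.01
    for (first, second), (diff, *_rest) in tukey.items()
    if second != baseline
)

print("All intervals exclude zero:", all_significant)
print("Ranking (highest first):", ranking)
print("Differences mutually consistent:", consistent)
```

The consistency check confirms that the six reported differences describe a single ordering of four group means rather than six unrelated comparisons.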

Additional Insights:

Salary trends: The data point to a distinct salary order, with specialised roles such as Data Science attracting more pay than general roles such as Data Analyst. For comparison, one report indicates that the average salary for Data Science can reach around $100,000 per year, while the salary for Data Analysis usually ranges from $60,000 to $70,000 (Hillier, 2023).

Market Demand: Given the attractive salaries offered to data scientists and data engineers, demand for these roles is likely to be sustained in the job market over a long period. Looking ahead, the uptake of Data Science and Data Engineering in the Indian market is expected to increase appreciably. According to NASSCOM, the data science industry in India is likely to reach approximately $16 bn by 2025, growing at a CAGR of almost 30% (Ait, 2024). In addition, salaries for Data Science positions average around $100,000 per annum, and Data Engineering pays almost the same (Yosifova, 2024).

Skill Requirements: Another reason for the variation in compensation may be the different skills needed in these roles. For instance, Data Science usually requires more attention to statistics, programming, and machine learning, whereas Data Engineering deals with constructing the data architecture and infrastructure. Typically, a higher-salary role requires an advanced degree such as a master's, or pertinent certifications, as a minimum qualification (Koyla Tevosyan, 2024).

CONCLUSION

The analysis of salary data shows a mildly right-skewed distribution. The middle half of employees earn between 5,000 and 15,000, while the few employees who earn in excess of 40,000 raise the average considerably. Employees below the 25th percentile (5,450.86) are the lowest paid, while a few outliers close to the maximum salary (78,572.08) skew the overall picture and illustrate the earnings disparity captured in the dataset. Data science and data engineering command much higher salaries. In particular, the data science field shows a dense group of high-end outliers (up to approximately $60,000); in contrast, the Data Analyst field shows a notable outlier near $20,000. However, obtaining such high-paying roles requires skills and qualifications that include expertise in machine learning, programming, and data architecture. These trends are consistent with the expected growth of India's data science industry, estimated to reach $16 billion by 2025 at a CAGR of 30% (Ait, 2024). The visualisations also show that pay levels differ by geography and by job seniority, with the highest pay mostly confined to management positions or to specific locations, and the ANOVA confirms statistically significant differences in mean salary between fields. The analysis thus helps individuals stay informed about salary trends: it clarifies earning potential across fields and roles and provides a better understanding of market demand and skill requirements in an evolving job market.

REFERENCES

Ait, S. (2024, April 27). Tổng quan, nhu cầu và tương lai ngành phân tích dữ liệu [Overview, demand, and future of the data analytics industry]. SOM - Đào Tạo Sau Đại Học. https://som.edu.vn/nhu-cau-va-tuong-lai-cua-nganh-phan-tich-du-lieu/

Briefing, I. (2024, February 16). Mumbai investment profile: Economy, infrastructure, industries. India Briefing News. https://www.india-briefing.com/news/mumbai-india-economy-investment-profile-6704.html/

Hillier, W. (2023, September 28). What Are the Best Data Analyst Salaries by Industry? Careerfoundry.com. https://careerfoundry.com/en/blog/data-analytics/data-analyst-salaries-by-industry/

Koyla Tevosyan. (2024, April 5). Best Jobs for Veterans to Civilian Careers - Hiring America. Hiring America - Improving Veteran’s Lives. https://hiringamerica.com/best-jobs-for-veterans-to-civilian-careers/

Kumar, A. (2024). Data analyst vs. data scientist: Key difference in 2023 | simplilearn. Simplilearn.com. https://www.simplilearn.com/tutorials/data-analytics-tutorial/data-analyst-vs-data-scientist

Labb, Z. L. (2024, February 26). Discover the top data science salaries in india. this comprehensive article provides insights into the highest-paying jobs in the field. Linkedin.com. https://www.linkedin.com/pulse/which-data-science-roles-highest-paid-india-learning-labb-4d86c

Saldanha, K. (2024, March 20). List of top IT cities in india in [2024]. Mygate. https://mygate.com/blog/neighbourhood/it-cities-in-india/

thegrownupschool. (2024, May 14). Why do top-level managers usually receive high salaries? - the grown-up school. The Grown-up School. https://thegrownupschool.com/why-do-top-level-managers-usually-receive-high-salaries/

Yosifova, A. (2024, February 29). Data Scientist Job Market in 2024: Analysis, Trends, and Opportunities. 365 Data Science. https://365datascience.com/career-advice/data-scientist-job-market/
