BA Tableau Final Capstone A Section

Uploaded by

Rudra Prasad PRK

RASHTREEYA SIKSHANA SAMITHI TRUST

R V INSTITUTE OF MANAGEMENT
CA 17, 26 Main, 36th Cross, 4th T Block, Jayanagar
Bengaluru, Karnataka 560 041

II SEMESTER
Batch 2023-2025

SUBJECT CODE: 23MBA721

SUBJECT: BUSINESS ANALYTICS SKILLS

CAPSTONE PROJECT ON
“PYTHON”

NAME OF THE STUDENT Rudraprasad N

SEMESTER 2nd Semester, ‘C’ Section

REGISTERED NUMBER P18FW23M015109

NAME OF THE FACULTY Prof. Mithun D J


TABLE OF CONTENTS

SL NO.  TOPIC

1       INTRODUCTION (DATASET)

2       DESCRIPTIVE STATISTICS

3       LINEAR REGRESSION MODEL

4       GRAPH and HEAT MAP

DATA SET: Indian State Work Participation Rates (2011)

Source of Data set: www.kaggle.com

Data summary: The Work Participation Rate (WPR) is a key demographic and labour market indicator
that measures the percentage of the working-age population (usually defined as individuals aged 15 to 64 years)
who are either employed or actively seeking employment. This rate provides valuable insights into the level of
economic engagement and labour force participation within a specific population. The principal classification in
the context of Work Participation Rate is the percentage itself, indicating the proportion of the working-age
population involved in some form of work-related activity. It offers a broad overview of the level of workforce
engagement within a particular demographic or geographic area. Subsidiary classifications may include details
such as gender, age groups, and different sectors of employment. Breaking down the Work Participation Rate by
these factors allows for a more nuanced understanding of the labour market dynamics and highlights variations in
workforce participation based on specific demographic characteristics.

The dataset contains work participation rates and related worker statistics by Indian states for the year 2011. The
dataset includes 35 rows and 19 columns.

Description of the Variables:

• srcStateName: Indicates the name of the Indian state.
  o Type of variable: Categorical (Nominal).
• srcYear: Indicates the year of the data (2011).
  o Type of variable: Categorical (Ordinal).
• Work participation rate: Percentage of the population involved in the workforce.
  o Type of variable: Numerical (Continuous).
• Main workers to total population: Percentage of main (full-time) workers relative to the total population.
  o Type of variable: Numerical (Continuous).
• Main workers to total worker: Percentage of main workers relative to the total workforce.
  o Type of variable: Numerical (Continuous).
• Cultivators (CL) to main worker: Percentage of cultivators among main workers.
  o Type of variable: Numerical (Continuous).
• Agricultural Labourers (AL) to main worker: Percentage of agricultural labourers among main workers.
  o Type of variable: Numerical (Continuous).
• Workers in Household (HH) Industry to main worker: Percentage of workers in household industries among main workers.
  o Type of variable: Numerical (Continuous).
• Other workers (OT) to main worker: Percentage of other types of workers among main workers.
  o Type of variable: Numerical (Continuous).
• Marginal workers to total population: Percentage of marginal (part-time or irregular) workers relative to the total population.
  o Type of variable: Numerical (Continuous).
• Marginal workers to total worker: Percentage of marginal workers relative to the total workforce.
  o Type of variable: Numerical (Continuous).
• Cultivators (CL) to marginal worker: Percentage of cultivators among marginal workers.
  o Type of variable: Numerical (Continuous).
• Agricultural Labourers (AL) to marginal worker: Percentage of agricultural labourers among marginal workers.
  o Type of variable: Numerical (Continuous).
• Workers in Household (HH) Industry to marginal worker: Percentage of workers in household industries among marginal workers.
  o Type of variable: Numerical (Continuous).
• Other workers (OT) to marginal worker: Percentage of other types of workers among marginal workers.
  o Type of variable: Numerical (Continuous).
• Non worker to total population: Percentage of non-workers relative to the total population.
  o Type of variable: Numerical (Continuous).

DESCRIPTIVE STATISTICS:

Descriptive statistics summarize and describe the main features of a dataset through measures like mean,
median, mode, and standard deviation. They provide a quick overview of the data's central tendency, variability,
and distribution. These statistics help in understanding the overall pattern and characteristics of the data without
drawing conclusions beyond the immediate dataset.
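The measures named above can be computed in a few lines. The sketch below is illustrative only: it uses Python's standard `statistics` module (the library used in the original project is not shown in this report) and just the four work participation rates quoted later in the interpretation, not the full 35-row dataset. The `describe` helper is a hypothetical name introduced for this example.

```python
import statistics

# Work participation rates (%) quoted in this report's interpretation;
# four values only, NOT the full 35-row dataset.
rates = {
    "Daman & Diu": 53.58,
    "Andaman & Nicobar Islands": 40.47,
    "Bihar": 28.62,
    "Lakshadweep": 28.01,
}


def describe(values):
    """Return basic descriptive statistics for an iterable of numbers."""
    values = list(values)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # multimode returns every most-common value
        # (all values when none repeats, as is typical for continuous data)
        "mode": statistics.multimode(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
    }


summary = describe(rates.values())
for name, value in summary.items():
    print(f"{name}: {value}")
```

Because continuous percentages rarely repeat, the mode is usually not informative for columns like these, which is why `multimode` returns all four values here.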

MEAN

MEDIAN

MODE

VARIANCE

STANDARD DEVIATION

Interpretation:

MEAN: The highest work participation rates are observed in Daman & Diu (53.58) and Andaman & Nicobar Islands (40.47). Lower rates are seen in Bihar (28.62) and Lakshadweep (28.01). Work participation varies across regions, with most states falling within the 30-40% range.

MEDIAN: For the Agricultural Labourers (AL) to main worker ratio, Bihar (11.83) and Andhra Pradesh (7.45) have the higher values, indicating a larger proportion of agricultural labourers among main workers. Lakshadweep (0.00) and Daman & Diu (0.18) have almost no agricultural labourers among main workers. Most states fall between 1% and 5%, showing that reliance on agricultural labour varies across regions.

MODE: For the Cultivators (CL) to main worker ratio, Mizoram (16.35) and Manipur (14.69) have the highest values, indicating a strong focus on cultivation in these regions. Delhi (0.31) and Lakshadweep (0.00) have very low proportions of cultivators among main workers.

VARIANCE: The variance for the ratio of Agricultural Labourers (AL) to main workers is 6.60, indicating
a relatively low variability in the distribution of this ratio. The Main workers to total workers ratio has a
much higher variance of 42.90, suggesting a wider dispersion of values in this category. Finally, the Work
participation rate has a variance of 23.05, representing a moderate level of variability. In general, higher
variance values indicate more spread in the data, while lower values show that data points are more tightly
clustered around the mean.

STANDARD DEVIATION: The Work participation rate has a standard deviation of 4.80, indicating a
moderate level of variability in this category. The Main workers to total workers ratio has a higher standard
deviation of 6.55, suggesting a broader spread of values around the mean. In contrast, the Agricultural
Labourers (AL) to main workers ratio has a lower standard deviation of 2.57, meaning that the data points are
more tightly clustered around the average. Overall, higher standard deviation values indicate greater
dispersion in the dataset, while lower values suggest less variability.
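The standard deviation is the square root of the variance, so the two sets of figures reported above can be cross-checked against each other. The short sketch below does this with the reported values; the `is_consistent` helper and its rounding tolerance are assumptions introduced for this example.

```python
import math

# (variance, standard deviation) pairs as reported in this section
reported = {
    "Work participation rate": (23.05, 4.80),
    "Main workers to total worker": (42.90, 6.55),
    "Agricultural Labourers (AL) to main worker": (6.60, 2.57),
}


def is_consistent(variance, std_dev, tol=0.005):
    """True if std_dev matches sqrt(variance) to two decimal places."""
    return abs(math.sqrt(variance) - std_dev) <= tol


checks = {name: is_consistent(v, s) for name, (v, s) in reported.items()}
for name, ok in checks.items():
    print(f"{name}: {'consistent' if ok else 'inconsistent'}")
```

All three pairs agree to two decimal places, so the variance and standard deviation figures describe the same spread.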

LINEAR REGRESSION MODEL: Linear regression is a statistical method used to predict a continuous dependent variable (target variable) from one or more independent variables (predictor variables). The technique assumes a linear relationship between the dependent and independent variables: each unit change in an independent variable shifts the predicted value of the dependent variable by a constant amount. In other words, linear regression is used to determine the extent to which one or more variables can predict the value of the dependent variable.
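To make the idea concrete, here is a minimal sketch of ordinary least squares with a single predictor, written in plain Python. It is not the project's model (which uses two predictors and is not reproduced in this report); the `fit_line` helper and the data are hypothetical and chosen so the fitted line is easy to verify by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on the line y = 1 + 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
predicted = intercept + slope * 5.0  # prediction for a new x value
print(slope, intercept, predicted)
```

The slope is the constant change in the predicted value per unit change in the predictor, which is the linearity assumption described above.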

Interpretation

A linear regression model is built and evaluated to predict the Work participation rate from two independent variables: Main workers to total worker and Agricultural Labourers (AL) to main worker. The model is evaluated on the test data (X_test, y_test). The Mean Squared Error (MSE) is 49.47, which is the average squared difference between actual and predicted values. The R-squared value, which measures how much of the variance in the dependent variable the model explains, is -0.027. A negative R-squared means the model performs slightly worse on the test data than simply predicting the mean of the test values, so it is not a good fit.
Overall, the model shows poor predictive performance: the two independent variables do not adequately explain the variability in the Work participation rate, and the high mean squared error further confirms the model's weak predictive ability.
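A negative R-squared can look surprising, so the sketch below computes MSE and R-squared directly from their definitions on a tiny hypothetical example. The function names and the numbers are assumptions made for illustration; they are not the project's test data.

```python
def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot, where SS_tot uses the mean of y_true."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical actual values and two sets of predictions
y_true = [30.0, 35.0, 40.0]
baseline = [35.0, 35.0, 35.0]   # always predict the mean of y_true
bad_model = [40.0, 35.0, 30.0]  # predictions that trend the wrong way

print(r_squared(y_true, baseline))   # the mean-only baseline scores 0
print(r_squared(y_true, bad_model))  # worse than the baseline: negative
print(mse(y_true, bad_model))
```

R-squared is zero for the mean-only baseline and drops below zero whenever a model's squared errors exceed that baseline's, which is the situation the reported value of -0.027 describes.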

GRAPH

Interpretation

There's significant variation in work participation rates across states, ranging from about 28% to over 50%.
Daman & Diu has the highest work participation rate at approximately 54%. Bihar appears to have the lowest
rate at around 28%. The majority of states fall within the 30-40% range for work participation. After Daman
& Diu, states like Sikkim, Mizoram, and Andaman & Nicobar Islands have relatively high participation rates
(above 40%). States like Uttar Pradesh, Punjab, and Assam are on the lower end of the spectrum with rates
below 35%.

HEAT MAP

Interpretation

• Work participation rate and Main workers to total population: These two variables have a very strong positive correlation (0.94). This indicates that as the work participation rate increases, the proportion of main workers to the total population also increases almost proportionally.

• Work participation rate and Cultivators to main worker: There's a very weak positive correlation (0.09) between these variables. This suggests that there's virtually no linear relationship between the overall work participation rate and the proportion of cultivators among main workers.

• Main workers to total population and Cultivators to main worker: There's a very weak negative correlation (-0.09) between these variables. This implies that there's practically no linear relationship between the proportion of main workers in the population and the proportion of cultivators among main workers.

• The strong correlation between work participation rate and main workers to total population suggests these metrics are closely related and may be measuring similar aspects of employment.

• The weak correlations with the cultivators to main worker ratio indicate that the agricultural workforce doesn't strongly influence (or isn't strongly influenced by) overall work participation or the proportion of main workers in the population.

• This data suggests that increases in work participation or the proportion of main workers aren't necessarily tied to changes in the agricultural workforce, pointing to possible diversification in the types of work contributing to these metrics.
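Each cell of a correlation heat map is a Pearson correlation coefficient between two columns. The sketch below computes that coefficient from its definition and builds a small correlation matrix in plain Python. The column names `a`, `b`, `c` and their values are hypothetical, not taken from the dataset, and the `pearson` helper is introduced only for this example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical columns: b is an exact multiple of a, c only loosely follows a
columns = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [2.0, 1.0, 4.0, 3.0],
}

# Correlation matrix as a nested dict: matrix[row][column]
matrix = {
    row: {col: pearson(columns[row], columns[col]) for col in columns}
    for row in columns
}

for row, values in matrix.items():
    print(row, {col: round(r, 2) for col, r in values.items()})
```

Values near +1 or -1 indicate a strong linear relationship (like the 0.94 above), while values near 0 (like 0.09 and -0.09) indicate almost none.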
