BA Tableau Final Capstone A Section
R V INSTITUTE OF MANAGEMENT
CA 17, 26 Main, 36th Cross, 4th T Block, Jayanagar
Bengaluru, Karnataka 560 041
II SEMESTER
Batch 2023-2025
CAPSTONE PROJECT ON
“PYTHON”
1 INTRODUCTION (DATASET)
2 DESCRIPTIVE STATISTICS
3 LINEAR REGRESSION MODEL
DATA SET: Indian State Work Participation Rates (2011)
Data summary: The Work Participation Rate (WPR) is a key demographic and labour market indicator
that measures the percentage of the working-age population (usually defined as individuals aged 15 to 64 years)
who are either employed or actively seeking employment. This rate provides valuable insights into the level of
economic engagement and labour force participation within a specific population. The principal classification in
the context of Work Participation Rate is the percentage itself, indicating the proportion of the working-age
population involved in some form of work-related activity. It offers a broad overview of the level of workforce
engagement within a particular demographic or geographic area. Subsidiary classifications may include details
such as gender, age groups, and different sectors of employment. Breaking down the Work Participation Rate by
these factors allows for a more nuanced understanding of the labour market dynamics and highlights variations in
workforce participation based on specific demographic characteristics.
The dataset contains work participation rates and related worker statistics by Indian states for the year 2011. The
dataset includes 35 rows and 19 columns.
DESCRIPTIVE STATISTICS:
Descriptive statistics summarize and describe the main features of a dataset through measures like mean,
median, mode, and standard deviation. They provide a quick overview of the data's central tendency, variability,
and distribution. These statistics help in understanding the overall pattern and characteristics of the data without
drawing conclusions beyond the immediate dataset.
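The measures named above can be sketched with Python's standard `statistics` module. This is only an illustration, not the code used in the project: it runs on the four work participation rates quoted in this report's interpretation rather than on the full 35-row dataset, and it assumes the sample (n - 1) form of variance and standard deviation. The variable names are hypothetical.

```python
import statistics

# Illustrative subset only: four work participation rates (%) quoted in this
# report, NOT the full 35-row dataset.
wpr = {
    "Daman & Diu": 53.58,
    "Andaman & Nicobar Islands": 40.47,
    "Bihar": 28.62,
    "Lakshadweep": 28.01,
}
values = list(wpr.values())

mean_wpr = statistics.mean(values)          # arithmetic average
median_wpr = statistics.median(values)      # middle value of the sorted data
modes_wpr = statistics.multimode(values)    # every value tied for most frequent
variance_wpr = statistics.variance(values)  # sample variance (assumed n - 1 divisor)
std_wpr = statistics.stdev(values)          # square root of the sample variance

print(f"mean={mean_wpr:.2f} median={median_wpr:.2f} modes={modes_wpr}")
print(f"variance={variance_wpr:.2f} std={std_wpr:.2f}")
```

Because every value in this subset occurs exactly once, `multimode` returns all four values, which is a reminder that the mode is rarely informative for continuous rates.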
MEAN
MEDIAN
MODE
VARIANCE
STANDARD DEVIATION
Interpretation:
MEAN: Highest participation rates are observed in Daman & Diu (53.58) and Andaman & Nicobar Islands
(40.47). Lower participation rates are seen in states like Bihar (28.62) and Lakshadweep (28.01). The data
reveals a variation in work participation across the regions, with most states falling within the 30-40% range.
MEDIAN: For the ratio of Agricultural Labourers (AL) to main workers, Bihar (11.83) and Andhra Pradesh (7.45)
have higher ratios, indicating a higher proportion of agricultural labourers relative to other workers.
Lakshadweep (0.00) and Daman & Diu (0.18) have very few or no agricultural labourers compared to main
workers. Most states fall between 1-5%, showing varied reliance on agricultural labour across different regions.
MODE: Mizoram (16.35) and Manipur (14.69) have the highest ratios, indicating a strong focus on
cultivation in these regions. Delhi (0.31) and Lakshadweep (0.00) have very low proportions of cultivators
relative to main workers.
VARIANCE: The variance for the ratio of Agricultural Labourers (AL) to main workers is 6.60, indicating
a relatively low variability in the distribution of this ratio. The Main workers to total workers ratio has a
much higher variance of 42.90, suggesting a wider dispersion of values in this category. Finally, the Work
participation rate has a variance of 23.05, representing a moderate level of variability. In general, higher
variance values indicate more spread in the data, while lower values show that data points are more tightly
clustered around the mean.
STANDARD DEVIATION: The Work participation rate has a standard deviation of 4.80, indicating a
moderate level of variability in this category. The Main workers to total workers ratio has a higher standard
deviation of 6.55, suggesting a broader spread of values around the mean. In contrast, the Agricultural
Labourers (AL) to main workers ratio has a lower standard deviation of 2.57, meaning that the data points are
more tightly clustered around the average. Overall, higher standard deviation values indicate greater
dispersion in the dataset, while lower values suggest less variability.
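The variance and standard deviation figures above are two views of the same spread, since the standard deviation is the square root of the variance. A short check using only the values quoted in this report (the dictionary keys are shortened labels chosen for this sketch):

```python
import math

# Variances quoted in the interpretation above.
variances = {
    "AL to main workers": 6.60,
    "Main workers to total workers": 42.90,
    "Work participation rate": 23.05,
}

# Standard deviation = square root of variance, rounded to two decimals.
std_devs = {name: round(math.sqrt(v), 2) for name, v in variances.items()}

for name, sd in std_devs.items():
    print(f"{name}: {sd:.2f}")
```

The rounded square roots agree with the reported standard deviations of 2.57, 6.55 and 4.80.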
LINEAR REGRESSION MODEL: Linear Regression Model is a statistical method that is
used to predict a continuous dependent variable (target variable) based on one or more independent
variables (predictor variables). This technique assumes a linear relationship between the dependent
and independent variables, which implies that the dependent variable changes proportionally with
changes in the independent variables. In other words, linear regression is used to determine the extent
to which one or more variables can predict the value of the dependent variable.
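To make the idea of a linear relationship concrete, here is a minimal sketch of ordinary least squares with a single predictor, written in plain Python. It is a simplification: the project's model uses two predictors, and the numbers below are made-up toy values, not taken from the dataset. The function name `fit_line` is hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = co-variation of x and y divided by the variation of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up toy values (NOT from the dataset); they lie exactly on y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_line(x, y)
prediction_at_5 = intercept + slope * 5.0
print(f"intercept={intercept:.2f} slope={slope:.2f} prediction={prediction_at_5:.2f}")
```

With more than one predictor the same principle applies, but the coefficients are found by solving a system of equations, which libraries normally handle.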
Interpretation
A linear regression model is built and evaluated to predict the Work participation rate from two
independent variables: Main workers to total workers and Agricultural Labourers (AL) to main workers. The
model is evaluated on the test data (X_test, y_test). The Mean Squared Error (MSE) is 49.47, which is the
average squared difference between actual and predicted values. The R-squared value is -0.027. R-squared
measures the share of variance in the dependent variable that the model explains: a value close to zero means
the model explains almost none of it, and a negative value means the predictions on the test data are slightly
worse than simply predicting the mean of the test values.
The linear regression model therefore shows poor predictive performance: the independent variables do not
adequately explain the variability in the Work participation rate, and the high mean squared error further
confirms the model's weak predictive ability.
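The two metrics can be sketched by hand to show how an R-squared below zero arises. The functions below follow the standard definitions (MSE as the mean of squared errors, R-squared as 1 minus the ratio of residual to total sum of squares); the numbers are made-up toy values, not the project's test data, and the function names are hypothetical.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot, where SS_tot is measured around the mean of y_true."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up toy values (NOT the project's test split).
y_test = [30.0, 35.0, 40.0]
mean_baseline = [35.0, 35.0, 35.0]  # always predict the mean of y_test
poor_preds = [38.0, 35.0, 33.0]     # predictions that move the wrong way

r2_baseline = r_squared(y_test, mean_baseline)
r2_poor = r_squared(y_test, poor_preds)
mse_poor = mean_squared_error(y_test, poor_preds)
print(f"baseline R2={r2_baseline:.2f} poor R2={r2_poor:.2f} poor MSE={mse_poor:.2f}")
```

Predicting the mean gives an R-squared of exactly zero, so any model whose squared error on the test data exceeds that baseline ends up with a negative value.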
GRAPH
Interpretation
There's significant variation in work participation rates across states, ranging from about 28% to over 50%.
Daman & Diu has the highest work participation rate at approximately 54%. Bihar appears to have the lowest
rate at around 28%. The majority of states fall within the 30-40% range for work participation. After Daman
& Diu, states like Sikkim, Mizoram, and Andaman & Nicobar Islands have relatively high participation rates
(above 40%). States like Uttar Pradesh, Punjab, and Assam are on the lower end of the spectrum with rates
below 35%.
HEAT MAP
Interpretation
Work participation rate and Main workers to total population: These two variables have a very strong
positive correlation (0.94). This indicates that states with a higher work participation rate also tend
to have a higher proportion of main workers in the total population.
Work participation rate and Cultivators to main worker: There's a very weak positive correlation
(0.09) between these variables. This suggests that there's virtually no linear relationship between the
overall work participation rate and the proportion of cultivators among main workers.
Main workers to total population and Cultivators to main worker: There's a very weak negative
correlation (-0.09) between these variables. This implies that there's practically no linear relationship
between the proportion of main workers in the population and the proportion of cultivators among
main workers.
The strong correlation between work participation rate and main workers to total population suggests
these metrics are closely related and may be measuring similar aspects of employment.
The weak correlations with the cultivators to main worker ratio indicate that the agricultural
workforce doesn't strongly influence (or isn't strongly influenced by) overall work participation or the
proportion of main workers in the population.
This data suggests that increases in work participation or the proportion of main workers aren't
necessarily tied to changes in the agricultural workforce, pointing to possible diversification in the
types of work contributing to these metrics.
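Each cell of a correlation heat map is a Pearson correlation coefficient between two columns. A minimal sketch of that calculation in plain Python follows; the function name `pearson_r` is hypothetical and the numbers are made-up toy values, not columns from the dataset.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up toy values (NOT from the dataset).
a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 8.0]  # rises with a  -> correlation near +1
c = [8.0, 6.0, 4.0, 2.0]  # falls as a rises -> correlation near -1

r_ab = pearson_r(a, b)
r_ac = pearson_r(a, c)
print(f"r(a, b)={r_ab:.2f} r(a, c)={r_ac:.2f}")
```

Values near +1 or -1 indicate a strong linear association, while values near 0, such as the 0.09 and -0.09 reported above, indicate almost none. A correlation describes association only and does not by itself show that one variable influences the other.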