Code PLFS MVPA

python code for regression on PLFS

Uploaded by

vtechonlinejobs


4/2/24, 2:43 PM PLFS_MVPA

In [1]: # Step 1
# Load the dataset

import pandas as pd
df = pd.read_excel('C:/Users/user/Desktop/PLFS_2022_23.xlsx')

In [2]: #List the columns in the dataset

df.columns.tolist()

Out[2]:
['Sector',
'State',
'Religion',
'Social Group',
'Sex',
'Age',
'Marital Status',
'General Education',
'Technical Education',
'No of years in formal education',
'Status of Current Attendance in Educational Institution',
'Whether received any Vocational/ Technical Training',
'Duration of Training',
'Status Code',
'Industry Code',
'Whether Engaged in any work in Subsidiary Capacity',
'No of Workers in the Enterprise',
'Type of Job Contract',
'Eligible of Paid Leave',
'Social Security Benefits',
'Earning for Regular Salaried/ Wage Workers',
'Earnings for Self Employed']

In [3]: # Data cleaning step - 2

# Sector variable - Assigning Rural as 0 and Urban as 1

df['Sector'] = df['Sector'].apply(lambda x: 1 if x == 2 else 0)

In [4]: df['Sector'].value_counts()



Out[4]:
0 56713
1 31542
Name: Sector, dtype: int64

In [5]: # Data Cleaning - Step 3

# Assign 1 for Hinduism (majority) and 0 for other religions (minority)

df['Religion'] = df['Religion'].apply(lambda x: 1 if x == 1 else 0)

In [6]: df['Religion'].value_counts()

Out[6]:
1 72706
0 15549
Name: Religion, dtype: int64

In [7]: # Data Cleaning - Step 4

# Assign 0 to SC/ST/OBC and 1 for others

df['Social Group'] = df['Social Group'].apply(lambda x: 1 if x == 9 else 0)

df['Social Group'].value_counts()

Out[7]:
0 71226
1 17029
Name: Social Group, dtype: int64

In [8]: # Data Cleaning - Step 5

# Assign 1 to Male and 0 for others

df['Sex'] = df['Sex'].apply(lambda x: 1 if x == 1 else 0)

df['Sex'].value_counts()

Out[8]:
1 45439
0 42816
Name: Sex, dtype: int64

In [9]: # Data Cleaning - Step 6

# Assign 0 for education up to higher secondary and 1 for education above higher secondary

df['General Education'] = df['General Education'].apply(lambda x: 0 if x in (1,2,3,4,5,6,7,8,10) else 1)

In [10]: # Data Cleaning - Step 7

# Assign 0 to NO technical education and 1 for others

df['Technical Education'] = df['Technical Education'].apply(lambda x: 0 if x == 1 else 1)


In [11]: # Data Cleaning - Step 8

# Assign 0 to received no vocational/technical training and 1 for others

df['Whether received any Vocational/ Technical Training'] = df['Whether received any Vocational/ Technical Training'].apply(lambd

In [12]: # Data Cleaning - Step 9

# Assign 0 to being Engaged in any work in Subsidiary Capacity and 1 for No

df['Whether Engaged in any work in Subsidiary Capacity'] = df['Whether Engaged in any work in Subsidiary Capacity'].apply(lambda

In [13]: # Data Cleaning - Step 10

# Assign 0 to NO written contract and 1 for others

df['Type of Job Contract'] = df['Type of Job Contract'].apply(lambda x: 0 if x == 1 else 1)

In [14]: # Data Cleaning - Step 11

# Assign 1 to currently married and 0 for others

df['Marital Status'] = df['Marital Status'].apply(lambda x: 1 if x == 2 else 0)
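
The recoding cells above all use `apply(lambda ...)` with an equality test. One side effect is worth checking: a missing value never equals any code, so it silently falls into the `else` branch (0 for Sector, but 1 for Technical Education). The following is a minimal sketch of a vectorized alternative that keeps missing values as NaN. The helper name `to_dummy` and the sample codes are hypothetical and not part of the original notebook.

```python
import numpy as np
import pandas as pd


def to_dummy(series, ones):
    """Return 1.0 where the code is in `ones`, 0.0 for other codes, NaN where missing."""
    dummy = series.isin(ones).astype(float)
    # Keep the dummy only where the original value was present
    return dummy.where(series.notna())


# Hypothetical raw Sector codes (1 = rural, 2 = urban) with one missing entry
sector = pd.Series([1, 2, 2, np.nan, 1])
sector_dummy = to_dummy(sector, ones=[2])
print(sector_dummy.tolist())
```

Whether the PLFS extract actually contains missing codes is not shown in the notebook, so this only matters if `df.isna().sum()` reports any.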

In [15]: # Data Cleaning - Step 12

# Add log-transformed columns to reduce the heavy skew in both earnings columns

import numpy as np

epsilon = 1e-7

df['log_sal'] = np.log(df['Earning for Regular Salaried/ Wage Workers'] + epsilon)

df['log_self'] = np.log(df['Earnings for Self Employed']+ epsilon)
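
Adding `epsilon` avoids `log(0)`, but it turns every zero-earnings row into a large negative value rather than flagging it. The sketch below, on made-up earnings figures, compares that with taking logs only where earnings are positive. The notebook later filters to earnings greater than zero before fitting, so its models should be unaffected; the point is only that the `log_sal` and `log_self` columns in the full `df` contain these artificial values.

```python
import numpy as np

# Hypothetical monthly earnings, including one zero
earnings = np.array([0.0, 5000.0, 12000.0])
epsilon = 1e-7

# Approach used in the notebook: shift by a tiny constant before taking logs
log_eps = np.log(earnings + epsilon)

# Alternative: log only the positive values and leave the rest as NaN
log_pos = np.full(earnings.shape, np.nan)
positive = earnings > 0
log_pos[positive] = np.log(earnings[positive])

print(log_eps)  # first entry is log(1e-7), roughly -16.1
print(log_pos)  # first entry is NaN
```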

In [16]: # Data Cleaning - Step 13

# Add squared terms to allow for non-linear relationships

df['Age_sq'] = df['Age'] ** 2
df['Formal_Edu_sq'] = df['No of years in formal education'] ** 2

In [17]: #List all the final columns in the dataset

df.columns


Out[17]:
Index(['Sector', 'State', 'Religion', 'Social Group', 'Sex', 'Age',
'Marital Status', 'General Education', 'Technical Education',
'No of years in formal education',
'Status of Current Attendance in Educational Institution',
'Whether received any Vocational/ Technical Training',
'Duration of Training', 'Status Code', 'Industry Code',
'Whether Engaged in any work in Subsidiary Capacity',
'No of Workers in the Enterprise', 'Type of Job Contract',
'Eligible of Paid Leave', 'Social Security Benefits',
'Earning for Regular Salaried/ Wage Workers',
'Earnings for Self Employed', 'log_sal', 'log_self', 'Age_sq',
'Formal_Edu_sq'],
dtype='object')

In [18]: # Data Cleaning - Step 14

# Segregating the salaried population and the self-employed population into 2 separate dataframes

df_sal = df[df['Earning for Regular Salaried/ Wage Workers'] > 0]

df_self = df[df['Earnings for Self Employed'] > 0]

In [19]: #Step - 15 - Regression Model

# MODEL No. 1 - Estimating Earnings for Regular Salaried/ Wage Workers

import statsmodels.api as sm

# Define the independent variables

independent_vars = [
'Sector',
'Religion',
'Sex',
'Age',
'Social Group',
'General Education',
'Marital Status',
'Technical Education',
'No of years in formal education',
'Whether received any Vocational/ Technical Training',
'Whether Engaged in any work in Subsidiary Capacity',
'Type of Job Contract',
'Age_sq',
'Formal_Edu_sq'
]

# Add a constant to the independent variables

X = sm.add_constant(df_sal[independent_vars])

# Define the target variable :

y = df_sal['log_sal']

# Fit the linear regression model

model = sm.OLS(y, X).fit()

# Print the model summary

print(model.summary())


OLS Regression Results

==============================================================================
Dep. Variable: log_sal R-squared: 0.502
Model: OLS Adj. R-squared: 0.501
Method: Least Squares F-statistic: 403.0
Date: Tue, 02 Apr 2024 Prob (F-statistic): 0.00
Time: 14:36:52 Log-Likelihood: -4655.5
No. Observations: 5610 AIC: 9341.
Df Residuals: 5595 BIC: 9441.
Df Model: 14
Covariance Type: nonrobust
=======================================================================================================================
coef std err t P>|t| [0.025 0.975]
-----------------------------------------------------------------------------------------------------------------------
const 6.7024 0.090 74.836 0.000 6.527 6.878
Sector 0.1722 0.017 9.907 0.000 0.138 0.206
Religion 0.0626 0.021 3.018 0.003 0.022 0.103
Sex 0.5594 0.022 25.993 0.000 0.517 0.602
Age 0.0596 0.005 13.043 0.000 0.051 0.069
Social Group 0.1134 0.017 6.624 0.000 0.080 0.147
General Education 0.0320 0.036 0.898 0.369 -0.038 0.102
Marital Status 0.1157 0.022 5.297 0.000 0.073 0.159
Technical Education 0.2505 0.029 8.581 0.000 0.193 0.308
No of years in formal education 0.0017 0.006 0.283 0.777 -0.010 0.013
Whether received any Vocational/ Technical Training 0.0275 0.015 1.815 0.070 -0.002 0.057
Whether Engaged in any work in Subsidiary Capacity 0.2856 0.028 10.222 0.000 0.231 0.340
Type of Job Contract 0.4822 0.017 27.850 0.000 0.448 0.516
Age_sq -0.0006 5.59e-05 -11.248 0.000 -0.001 -0.001
Formal_Edu_sq 0.0022 0.000 5.798 0.000 0.001 0.003
==============================================================================
Omnibus: 173.792 Durbin-Watson: 1.969
Prob(Omnibus): 0.000 Jarque-Bera (JB): 355.426
Skew: -0.205 Prob(JB): 6.61e-78
Kurtosis: 4.163 Cond. No. 2.05e+04
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.05e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
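
Because the dependent variable is `log_sal`, a dummy coefficient b corresponds to a proportional change of exp(b) - 1 in earnings, and the Age and Age_sq coefficients together imply a turning point at -b_age / (2 * b_age_sq). A small sketch of both calculations, using the coefficients as printed in the summary above. The Age_sq coefficient is shown to only four decimals, so the turning point is a rough figure.

```python
import math

# Coefficients copied from the Model 1 summary (rounded as printed)
b_sex = 0.5594
b_age = 0.0596
b_age_sq = -0.0006

# Percentage earnings difference associated with Sex = 1, holding other variables fixed
pct_sex = (math.exp(b_sex) - 1) * 100

# Age at which the fitted quadratic age profile peaks
peak_age = -b_age / (2 * b_age_sq)

print(round(pct_sex, 1))
print(round(peak_age, 1))
```

These are descriptive associations from the fitted model, not causal effects.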

In [20]: # Step - 16
# Calculate VIF values for the Model - 1


from statsmodels.stats.outliers_influence import variance_inflation_factor

from statsmodels.tools.tools import add_constant

# Reuse independent_vars, the predictors used in the regression model

# Add a constant column for intercept (necessary for VIF calculation)

df_with_const = add_constant(df_sal[independent_vars])

# Calculate VIF for each independent variable

vif_data = pd.DataFrame()
vif_data["Variable"] = df_with_const.columns
vif_data["VIF"] = [variance_inflation_factor(df_with_const.values, i) for i in range(df_with_const.shape[1])]

print(vif_data)


Variable VIF
0 const 145.780813
1 Sector 1.190473
2 Religion 1.066603
3 Sex 1.091667
4 Age 48.178772
5 Social Group 1.086503
6 General Education 5.226346
7 Marital Status 1.666061
8 Technical Education 1.353767
9 No of years in formal education 15.317316
10 Whether received any Vocational/ Technical Tra... 1.045240
11 Whether Engaged in any work in Subsidiary Capa... 1.182732
12 Type of Job Contract 1.218561
13 Age_sq 44.150284
14 Formal_Edu_sq 25.843036
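
The high VIFs for Age, Age_sq, No of years in formal education and Formal_Edu_sq are expected, since each squared term is built from its own linear term. One common remedy is to center a variable before squaring it. The sketch below illustrates the effect on hypothetical ages, using the two-predictor shortcut VIF = 1 / (1 - r^2); the VIFs in the table above come from regressing each variable on all the others, so the numbers are not directly comparable.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def two_var_vif(xs, ys):
    """VIF when the only other predictor is ys: 1 / (1 - r^2)."""
    r = pearson(xs, ys)
    return 1.0 / (1.0 - r * r)


# Hypothetical ages, one person per year from 15 to 65
ages = list(range(15, 66))
mean_age = sum(ages) / len(ages)

raw_sq = [a ** 2 for a in ages]
centered = [a - mean_age for a in ages]
centered_sq = [c ** 2 for c in centered]

vif_raw = two_var_vif(ages, raw_sq)
vif_centered = two_var_vif(centered, centered_sq)
print(vif_raw, vif_centered)
```

Centering leaves the coefficient on the squared term and the fitted values unchanged; it only changes the intercept and the interpretation of the linear term.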

In [21]: # Step - 17
# MODEL No. 2 - Estimating Earnings for Self Employed

import statsmodels.api as sm

# Reuse the independent_vars list defined for Model 1

# Add a constant to the independent variables

X = sm.add_constant(df_self[independent_vars])

# Define the target variable :

y = df_self['log_self']

# Fit the linear regression model

model = sm.OLS(y, X).fit()

# Print the model summary

print(model.summary())


OLS Regression Results

==============================================================================
Dep. Variable: log_self R-squared: 0.418
Model: OLS Adj. R-squared: 0.418
Method: Least Squares F-statistic: 715.8
Date: Tue, 02 Apr 2024 Prob (F-statistic): 0.00
Time: 14:37:51 Log-Likelihood: -12329.
No. Observations: 13960 AIC: 2.469e+04
Df Residuals: 13945 BIC: 2.480e+04
Df Model: 14
Covariance Type: nonrobust
=======================================================================================================================
coef std err t P>|t| [0.025 0.975]
-----------------------------------------------------------------------------------------------------------------------
const 6.4892 0.080 80.708 0.000 6.332 6.647
Sector 0.3093 0.012 26.292 0.000 0.286 0.332
Religion -0.0239 0.014 -1.753 0.080 -0.051 0.003
Sex 0.8892 0.014 64.611 0.000 0.862 0.916
Age 0.0461 0.003 17.629 0.000 0.041 0.051
Social Group 0.1482 0.013 11.301 0.000 0.122 0.174
General Education 0.0799 0.033 2.435 0.015 0.016 0.144
Marital Status 0.0814 0.016 5.022 0.000 0.050 0.113
Technical Education 0.0932 0.043 2.156 0.031 0.008 0.178
No of years in formal education 0.0147 0.004 3.808 0.000 0.007 0.022
Whether received any Vocational/ Technical Training 0.0205 0.011 1.946 0.052 -0.000 0.041
Whether Engaged in any work in Subsidiary Capacity 0.1606 0.013 11.935 0.000 0.134 0.187
Type of Job Contract 0.5474 0.058 9.418 0.000 0.433 0.661
Age_sq -0.0005 2.87e-05 -18.217 0.000 -0.001 -0.000
Formal_Edu_sq 0.0004 0.000 1.271 0.204 -0.000 0.001
==============================================================================
Omnibus: 736.413 Durbin-Watson: 1.970
Prob(Omnibus): 0.000 Jarque-Bera (JB): 1526.202
Skew: -0.368 Prob(JB): 0.00
Kurtosis: 4.443 Cond. No. 4.34e+04
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 4.34e+04. This might indicate that there are
strong multicollinearity or other numerical problems.

In [22]: # Step - 18
# Calculate VIF Values for Model - 2


from statsmodels.stats.outliers_influence import variance_inflation_factor

from statsmodels.tools.tools import add_constant

# Reuse independent_vars, the predictors used in the regression model

# Add a constant column for intercept (necessary for VIF calculation)

df_with_const = add_constant(df_self[independent_vars])

# Calculate VIF for each independent variable

vif_data = pd.DataFrame()
vif_data["Variable"] = df_with_const.columns
vif_data["VIF"] = [variance_inflation_factor(df_with_const.values, i) for i in range(df_with_const.shape[1])]

print(vif_data)


Variable VIF
0 const 263.213006
1 Sector 1.163624
2 Religion 1.074025
3 Sex 1.220613
4 Age 44.292376
5 Social Group 1.096183
6 General Education 4.291929
7 Marital Status 1.206137
8 Technical Education 1.179082
9 No of years in formal education 16.354550
10 Whether received any Vocational/ Technical Tra... 1.109433
11 Whether Engaged in any work in Subsidiary Capa... 1.116591
12 Type of Job Contract 1.026795
13 Age_sq 43.738403
14 Formal_Edu_sq 25.047101
