Final Report Econometrics


REPORT FINAL EXAMINATION

Subject: Econometrics
PROJECT 1

Class : INS304902 Semester 1 (2023-2024)

Lecturers : Dr. Tran Quang Tuyen; MSc. Le Van Dao

Group : 1

Member : Nguyen Tuan Dat - 21070812

Nguyen Thi Lan Anh - 21070572

Nguyen Minh Chau - 21070197

Le Dinh Huong Giang - 21070185


Table of Contents

I. Introduction
   1. Background and motivation
   2. Objectives of research
II. Literature review
III. Dataset description
IV. Research design
   1. Economic model
   2. Econometric model
      2.1. Functional form
      2.2. Multiple regression model
      2.3. Multiple regression assumptions
V. Results and discussion
VI. Conclusion and policy recommendations
VII. References

List of figures

Figure 1. Multiple regression model
Figure 2. Multiple regression model of income
Figure 3. The results of VIF test for collinearity
Figure 4. The result of Ramsey RESET test for omitted variables
Figure 5. The correlation between residual and explanatory variables
Figure 6. The result of White test for heteroscedasticity
Figure 7. The normal distribution of residuals of log_income regression
Figure 8. Beta coefficient
Figure 9. Test the alternative hypothesis that the effect of education on income is larger than 5% and 7%
Figure 10. Quadratic relationship between age and log of income

List of tables

Table 1. Definition of variables in dataset
Table 2. Expected sign of explanatory variables
I. Introduction

1. Background and motivation

Vietnam has a population of more than 99.2 million, accounting for 1.23% of the world's population (vietnam population). Its 54 ethnic groups are distributed across six economic regions, of which the Northern Midlands and Mountains and the Central Highlands together account for only 14.4% of the total population (PRESS RELEASE RESULTS OF THE 2019 POPULATION AND HOUSING CENSUS). In addition, GDP per capita of the Northern Midlands and Mountains region reached 43.72 million VND/person/year in 2018, its poverty rate in 2015 was three times the national average, and some localities are at risk of falling back into poverty (4.17%) (Chung, Dung, & Duy, 2015). Household income in the Northwest region is also very low, about 2,604,000 VND/person/month, compared with the national average of 4,295,000 VND/person/month. These figures point to differences in the income sources available to people in the region, and the resulting economic gap has persisted over the years. Many studies, both internationally and in Vietnam, have examined the factors affecting household income, and their results show that identifying the factors that affect farming households is important for designing recommendations to improve household income and living standards. This study therefore investigates the factors affecting the household income of ethnic minorities in Vietnam.
2. Objectives of research

The study examines the factors affecting household income in the Northern Midlands and Mountains of Vietnam. It focuses on factors such as age, gender, and education, whose effects are estimated with econometric models. The study also checks whether the multiple linear regression (MLR) assumptions required for unbiased coefficient estimates are satisfied, and it investigates income differences between household groups. Based on the empirical evidence, it offers insights and policy recommendations to promote inclusive economic growth and reduce income disparity in the region.
II. Literature review

Household income is influenced by many factors, chiefly household size, the age and gender of household members, household composition, health, education, resources, social capital and assets, and employment. Community factors such as weather, prices, and infrastructure also matter (Benin and Randriamamonjy, 2008). In Vietnam, research on farm household income has mostly focused on small areas such as a single district or province, with few studies covering larger areas, especially the Northwest provinces, where economic activity and incomes are among the lowest in the country.

Research by Tran Quang Tuyen (2015) on the factors affecting the income of ethnic minority households in the Northwest region shows that education level, non-agricultural employment, and land size, along with other factors such as the transportation system, post offices, and job opportunities in non-agricultural sectors, affect household income.

Many studies have found a positive correlation between education level and income, especially in rural areas: individuals with more education tend to earn higher incomes than those with less education (Duncan, Yeung, Brooks-Gunn and Smith, 1998).

Age and gender are also important factors. According to the Tax Foundation, income tends to rise with age as people accumulate experience and then declines slightly as taxpayers enter retirement, tracing an inverted U-shape. Further, female household heads typically earn less than their male counterparts because of prevailing socio-cultural norms and obstacles such as limited access to resources or to income-generating employment. Household income can be influenced not only by individual characteristics such as ethnicity or marital status, but also by family groupings (Barnard & Turner, 2011).
III. Dataset description

The data are drawn from an existing dataset collected in 14 provinces of the Northern Midlands and Mountains of Vietnam, covering 7,417 households and a range of household characteristics. Table 1 lists the variables used to study the per capita income of households, together with their definitions.

Table 1. Definition of variables in dataset

Role                    Variable     Definition
Explanatory variables   age          Age of household head (years)
                        edu          Years of schooling of the household head
                        gender       Gender of household head: 1 = male; 0 = female
                        married      Marital status of household head: 1 = married; 0 = otherwise
                        ethnicity    Ethnicity of household head: 1 = Kinh; 0 = ethnic minority
                        hhsize       Total household members (persons)
                        dep_ratio    Dependency ratio: number of dependents divided by household size
                        log_aland    Natural log of the size of annual cropland
                        log_pland    Natural log of the size of perennial cropland
                        log_fland    Natural log of the size of forestland
                        log_gland    Natural log of the size of garden land
                        province     Categorical variable with 14 categories (province names)
                        urban        1 = living in urban areas; 0 = living in rural areas
Dependent variable      income       Household income per capita per month (thousand VND)
IV. Research design

1. Economic model

The economic model specifies monthly per capita household income as a function of the following variables:

Income = f(age, edu, gender, marital status, ethnicity, household size, dependency ratio, annual cropland, perennial cropland, forestland, garden land, province, urban)

Table 2. Expected sign of explanatory variables

Explanatory variable    Expected sign
Age                     +/-
Edu                     +
Gender                  +/-
Marital status          +/-
Ethnicity               +/-
Household size          -
Dependency ratio        -
Annual cropland         +
Perennial cropland      +
Forestland              +
Garden land             +
Province                +/-
Urban                   +/-
2. Econometric model

Because the dependent variable (household income per capita) is continuous, the study estimates the model by ordinary least squares (OLS), following Tuyen (2015), and uses the regression to analyze the relationship between per capita household income and the explanatory variables. Stata is used to estimate the model coefficients, generate graphs, test hypotheses, and compute predictions.

2.1. Functional form

Dependent variable: log of monthly household income per capita (log_income)

Independent variables:

- A quadratic function of age (age and age2), to capture increasing or decreasing marginal effects (Wooldridge, p. 194).

- Province dummies: Ha Giang is chosen as the base (benchmark) province, that is, the province against which comparisons are made (Wooldridge, p. 230). The model therefore includes dummies for the 13 remaining provinces, each measuring the income difference relative to Ha Giang.

- An ethnicity-urban interaction term.

- Remaining variables: edu, gender, marital status, household size, dependency ratio, and the natural logs of annual cropland, perennial cropland, forestland, and garden land.
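The regressor layout described above can be sketched in Python. This is a minimal illustration, assuming each household is stored as a dictionary keyed by the Table 1 variable names; the helper name `design_row`, the `prov_` prefix, and the province ordering are hypothetical choices, not taken from the study's Stata code. It shows how age2, the ethnicity-urban interaction, and the 13 province dummies (with Ha Giang as the omitted base) are formed.

```python
# Hypothetical layout: variable names follow Table 1; the province order is
# an illustrative assumption. Ha Giang is the base and gets no dummy.
PROVINCES = [
    "Ha Giang", "Cao Bang", "Bac Kan", "Tuyen Quang", "Lao Cai", "Dien Bien",
    "Lai Chau", "Son La", "Yen Bai", "Hoa Binh", "Thai Nguyen", "Lang Son",
    "Bac Giang", "Phu Tho",
]
BASE_PROVINCE = "Ha Giang"
OTHER_VARS = ("edu", "gender", "married", "hhsize", "dep_ratio",
              "log_aland", "log_pland", "log_fland", "log_gland")


def design_row(hh: dict) -> dict:
    """Map one household record to the regressors of the log-income model."""
    row = {"const": 1.0}
    row["age"] = float(hh["age"])
    row["age2"] = float(hh["age"]) ** 2                       # quadratic age term
    row["ethnicity"] = float(hh["ethnicity"])                 # 1 = Kinh
    row["urban"] = float(hh["urban"])                         # 1 = urban
    row["ethnicity_urban"] = row["ethnicity"] * row["urban"]  # interaction
    for name in OTHER_VARS:
        row[name] = float(hh[name])
    # One dummy per non-base province; a Ha Giang household gets all zeros.
    for p in PROVINCES:
        if p != BASE_PROVINCE:
            key = "prov_" + p.replace(" ", "_")
            row[key] = 1.0 if hh["province"] == p else 0.0
    return row


# Hypothetical household, for illustration only.
example = {"age": 40, "edu": 9, "gender": 1, "married": 1, "ethnicity": 0,
           "urban": 0, "hhsize": 5, "dep_ratio": 0.4, "log_aland": 1.2,
           "log_pland": 0.0, "log_fland": 0.0, "log_gland": 0.5,
           "province": "Lao Cai"}
row = design_row(example)
print(len(row), row["age2"], row["prov_Lao_Cai"])
```

Omitting the base province's dummy is what avoids perfect collinearity with the intercept; each province coefficient is then read relative to Ha Giang.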

2.2. Multiple regression model

Figure 1. Multiple regression model

The sample regression function (SRF) is as follows:

$$
\widehat{\log(income)} = 6.549 + \beta_1 \cdot province + 0.365\,ethnicity + 0.475\,urban - 0.332\,ethnicity \times urban - 0.0313\,gender + 0.0334\,age - 0.0002659\,age^2 + 0.059\,married + 0.058\,edu - 0.061\,hhsize - 0.448\,dep\_ratio - 0.034\,log\_aland + 0.0012\,log\_pland + 0.0033\,log\_fland + 0.0018\,log\_gland
$$

where $\beta_1 \cdot province$ stands for the terms on the 13 province dummies, with Ha Giang as the base province.
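To make the SRF concrete, the sketch below evaluates it for a hypothetical household. It uses the rounded coefficients shown above and leaves out the province terms, so it describes a household in the base province, Ha Giang; the household values are invented for illustration. It shows two points: one more year of schooling raises the fitted log income by the edu coefficient (0.058), and because of the interaction term the urban premium for a Kinh household is 0.475 - 0.332 = 0.143 rather than 0.475.

```python
# Rounded coefficients from the SRF above. Province terms are left out, which
# corresponds to a household in the base province (Ha Giang).
COEF = {
    "const": 6.549, "ethnicity": 0.365, "urban": 0.475,
    "ethnicity_urban": -0.332, "gender": -0.0313, "age": 0.0334,
    "age2": -0.0002659, "married": 0.059, "edu": 0.058, "hhsize": -0.061,
    "dep_ratio": -0.448, "log_aland": -0.034, "log_pland": 0.0012,
    "log_fland": 0.0033, "log_gland": 0.0018,
}


def predicted_log_income(hh: dict) -> float:
    """Fitted log(income) for a Ha Giang household described by Table 1 names."""
    x = dict(hh)
    x["age2"] = hh["age"] ** 2
    x["ethnicity_urban"] = hh["ethnicity"] * hh["urban"]
    return COEF["const"] + sum(b * x[k] for k, b in COEF.items() if k != "const")


# Hypothetical rural Kinh household, for illustration only.
hh = {"ethnicity": 1, "urban": 0, "gender": 1, "age": 45, "married": 1,
      "edu": 9, "hhsize": 4, "dep_ratio": 0.5, "log_aland": 1.0,
      "log_pland": 0.0, "log_fland": 0.0, "log_gland": 0.5}

# One more year of schooling shifts the fitted log income by the edu coefficient.
edu_gap = predicted_log_income(dict(hh, edu=10)) - predicted_log_income(hh)
# For a Kinh household the urban effect is urban + interaction.
urban_gap_kinh = predicted_log_income(dict(hh, urban=1)) - predicted_log_income(hh)
print(round(edu_gap, 4), round(urban_gap_kinh, 4))
```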

2.3. Multiple regression assumptions

The estimation results are shown in Figure 2. Evaluating and testing the underlying assumptions is a crucial task for researchers using multiple regression or any other statistical technique, because violations affect the reliability of significance tests and of assessments of the precision of the regression coefficients. This part gives an overview of the assumptions required for the multiple regression model.

Figure 2. Multiple regression model of income

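A. Assumption MLR.1 (Linear in parameters)

The model must be linear in the parameters: the dependent variable is a linear function of the parameters β0, β1, ..., βk plus the error term. Each βj appears only to the power of one and is not multiplied or divided by any other parameter. The variables themselves may enter nonlinearly, as age2 and the log-transformed variables do in this model.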

B. Assumption MLR.2 (Random sampling)

The data must be a random sample drawn from the population, meaning that each observation has an equal chance of being included in the sample. The dataset is a random sample of 7,417 households in 14 provinces of the Northern Midlands and Mountains of Vietnam, so the model satisfies Assumption MLR.2.

C. Assumption MLR.3 (No perfect collinearity)

Multicollinearity refers to a high correlation between two or more independent variables in a regression model. Perfect collinearity occurs when there is an exact linear relationship among the independent variables (Assumptions of Multiple Linear Regression, 2019); in that case the OLS estimates cannot be computed at all. High but imperfect multicollinearity does not violate MLR.3, but it inflates the variances of the affected coefficient estimates, making them imprecise and highly sensitive to small changes in the data.

Figure 3. The results of VIF test for collinearity

To identify potential signs of multicollinearity, we examined the correlation matrix and the variance inflation factors (VIFs). All VIFs are below 10, the common rule-of-thumb threshold, so multicollinearity is not a concern in this model, and the model satisfies MLR.3.
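The VIF of a regressor is built from an auxiliary regression: VIF_j = 1 / (1 - R_j²), where R_j² is the R-squared from regressing x_j on all other explanatory variables. The minimal sketch below shows this relationship; with only two regressors the auxiliary R² is simply their squared correlation, so the example uses two short made-up series rather than the study's data. It also shows that the cutoff of 10 corresponds to R_j² = 0.9.

```python
import math
from typing import Sequence


def vif_from_r2(r2_j: float) -> float:
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on the
    other explanatory variables."""
    return 1.0 / (1.0 - r2_j)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample correlation coefficient between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# With only two regressors, the auxiliary R^2 is the squared correlation.
x1 = [1.0, 2.0, 3.0, 4.0]   # hypothetical values
x2 = [2.0, 1.0, 4.0, 3.0]   # hypothetical values
r = pearson_r(x1, x2)
print(r, vif_from_r2(r ** 2))
print(vif_from_r2(0.9))     # the rule-of-thumb cutoff of 10
```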

D. Assumption MLR.4 (Zero conditional mean)

This assumption states that the error term u has an expected value of zero given any values of the explanatory variables: $E(u \mid x_1, x_2, \ldots, x_k) = E(u) = 0$. In other words, the explanatory variables contain no information about the mean of the unobserved factors in u, so x and u must have no relationship, either linear or nonlinear. This is one of the most crucial, abstract, and easily violated assumptions in model construction, and it is what guarantees that the estimates from a sample are unbiased for the population parameters. One way MLR.4 can fail is by omitting an important factor that is correlated with any of $x_1, x_2, \ldots, x_k$; another is a misspecified functional form. We use the Ramsey RESET test to detect functional form misspecification in the regression model.

Figure 4. The result of Ramsey RESET test for omitted variables

The test result (p-value = 0.0000) rejects the null hypothesis that the model has no omitted variables, indicating misspecification, which is one reason why Assumption MLR.4 fails. We tried to satisfy this assumption by adding age2 and the ethnicity-urban interaction term to the model; however, the p-value (0.0314) was still below 0.05.
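In Wooldridge's presentation, RESET adds $\hat{y}^2$ and $\hat{y}^3$ to the model and tests their joint significance with an F test (q = 2); Stata's implementation may add a different set of powers. The sketch below computes that F statistic from the sums of squared residuals of the original and augmented regressions. The SSR values are hypothetical and unrelated to Figure 4. With thousands of observations, qF is approximately chi-square with q degrees of freedom, which for q = 2 gives a closed-form p-value.

```python
import math


def reset_f(ssr_r: float, ssr_ur: float, q: int, df_ur: int) -> float:
    """F = [(SSR_r - SSR_ur) / q] / [SSR_ur / df_ur].

    ssr_r:  SSR of the original (restricted) model
    ssr_ur: SSR after adding q powers of the fitted values
    df_ur:  residual degrees of freedom of the augmented model, n - k - 1 - q
    """
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)


def approx_p_value_q2(f_stat: float) -> float:
    """Large-sample p-value when q = 2: q*F is approximately chi-square(2),
    whose upper-tail probability at x is exp(-x / 2)."""
    return math.exp(-2.0 * f_stat / 2.0)


# Hypothetical sums of squared residuals, for illustration only.
f_stat = reset_f(ssr_r=120.0, ssr_ur=100.0, q=2, df_ur=980)
print(f_stat, approx_p_value_q2(f_stat))
```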

If the first four assumptions (MLR.1-MLR.4) are satisfied, the OLS estimators are unbiased: $E(\hat{\beta}_j) = \beta_j$. In practice, however, MLR.4 is often difficult to satisfy, and our model fails it as well (Wooldridge, p. 86). MLR.4 can be replaced by a weaker, more attainable assumption, MLR.4'; if the model satisfies MLR.4', the OLS estimators are still consistent.

E. Assumption MLR.4' (Zero mean and zero correlation) (Williams et al., 2013)

The error term has zero mean and is uncorrelated with each explanatory variable: $E(u) = 0$ and $\operatorname{Cov}(x_j, u) = 0$ for $j = 1, \ldots, k$. Replacing MLR.4 with MLR.4' yields estimates that are consistent but may still be biased in finite samples; mathematically, $\operatorname{plim}_{n \to \infty} \hat{\beta}_j = \beta_j$.

Figure 5. The correlation between residual and explanatory variables
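One caution when reading Figure 5: for any OLS regression that includes an intercept, the residuals have zero sample mean and zero sample covariance with every included regressor, because these are the first-order conditions that define the OLS estimates. A near-zero correlation between the residuals and the explanatory variables is therefore guaranteed by construction and cannot, on its own, confirm MLR.4', which concerns the unobserved population error u. The minimal simple-regression sketch below, with made-up data, illustrates this property.

```python
from typing import List, Sequence, Tuple


def ols_simple(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Intercept and slope of a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    return my - b1 * mx, b1


def ols_residuals(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Residuals y - (b0 + b1 * x) from the simple OLS fit."""
    b0, b1 = ols_simple(x, y)
    return [b - (b0 + b1 * a) for a, b in zip(x, y)]


x = [1.0, 2.0, 3.0, 4.0, 5.0]     # hypothetical regressor
y = [2.1, 3.9, 6.2, 7.8, 10.3]    # hypothetical outcome
u_hat = ols_residuals(x, y)
# OLS first-order conditions: sum(u_hat) = 0 and sum(x * u_hat) = 0,
# so the sample correlation between u_hat and x is zero for any data set.
print(sum(u_hat), sum(a * u for a, u in zip(x, u_hat)))
```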

F. Assumption MLR.5 (Homoskedasticity)

Assumption MLR.5 requires the variance of the error term u to be the same for all values of the explanatory variables:

$$
\operatorname{Var}(u \mid x_1, \ldots, x_k) = \sigma^2
$$

Under assumptions MLR.1-MLR.5, the OLS estimators are BLUE (Best Linear Unbiased Estimators): among all linear unbiased estimators, they have the smallest variance, which makes them the most efficient.
Figure 6. The result of White test for heteroscedasticity

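The White test for heteroscedasticity gives a p-value of 0.0000, so we reject the null hypothesis of homoscedasticity: the error variance changes with the explanatory variables. Heteroscedasticity does not bias the OLS coefficients, but it invalidates the usual standard errors and t statistics, so it must be addressed when the homoscedasticity assumption is violated. Several methods for handling heteroscedasticity are:

1. Log-transforming the dependent variable.

2. Using heteroscedasticity-robust standard errors (the "robust" option provided by Stata).

3. Using other estimators, such as weighted least squares (WLS).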
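Method 2 keeps the OLS coefficients and changes only their standard errors. For a simple regression, White's heteroscedasticity-robust variance of the slope is Σ(x_i - x̄)² û_i² / SST_x², often called HC0. The sketch below compares it with the usual standard error on a small made-up data set; it is an illustration of the formula, not the study's output. Stata's robust option applies an additional degrees-of-freedom scaling by default (HC1), so its figures differ slightly.

```python
import math
from typing import Sequence, Tuple


def slope_with_se(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (slope, usual SE, HC0 robust SE) for a simple OLS regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sst_x = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sst_x
    b0 = my - b1 * mx
    u = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    # Usual SE: assumes a constant error variance (MLR.5).
    sigma2 = sum(r ** 2 for r in u) / (n - 2)
    se_usual = math.sqrt(sigma2 / sst_x)
    # HC0: weights each squared residual by the squared deviation of its x.
    var_hc0 = sum((a - mx) ** 2 * r ** 2 for a, r in zip(x, u)) / sst_x ** 2
    return b1, se_usual, math.sqrt(var_hc0)


# Hypothetical data, for illustration only.
b1, se_usual, se_robust = slope_with_se([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 6.0])
print(b1, se_usual, se_robust)
```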

G. Assumption MLR.6 (Normality)

Unbiasedness and consistency describe how the estimators behave over repeated sampling, but exact t and F tests also require the full sampling distribution of the estimators. Assumption MLR.6 supplies it: the population error u is independent of the explanatory variables $x_1, x_2, \ldots, x_k$ and is normally distributed with zero mean and variance $\sigma^2$, that is, $u \sim N(0, \sigma^2)$ (Wooldridge, p. 118). In large samples, t and F tests remain approximately valid even without normality, because the OLS estimators are asymptotically normal.

We use the Jarque-Bera test to check whether the residuals follow a normal distribution. The residuals of the regression of income are not normally distributed, but those of the regression of log_income meet the normality assumption.
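A minimal sketch of the Jarque-Bera statistic in its textbook form follows. It combines the sample skewness S and kurtosis K of the residuals as JB = n/6 · (S² + (K - 3)²/4); under normality JB is approximately chi-square with 2 degrees of freedom, so a large value (small p-value) signals a departure from normality. The residuals below are made up for illustration and are not the study's data.

```python
import math
from typing import Sequence, Tuple


def jarque_bera(resid: Sequence[float]) -> Tuple[float, float]:
    """Return (JB statistic, p-value) for a set of residuals.

    JB = n/6 * (S^2 + (K - 3)^2 / 4), where S is skewness and K kurtosis.
    Under normality JB is approximately chi-square(2), whose upper-tail
    probability at x is exp(-x / 2).
    """
    n = len(resid)
    mean = sum(resid) / n
    d = [r - mean for r in resid]
    m2 = sum(v ** 2 for v in d) / n
    m3 = sum(v ** 3 for v in d) / n
    m4 = sum(v ** 4 for v in d) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)


jb, p = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])  # hypothetical residuals
print(jb, p)
```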

Figure 7. The normal distribution of residuals of log_income regression

V. Results and discussion

Group 1: Kinh households in urban areas

Group 2: Ethnic minority households in urban areas

Group 3: Kinh households in rural areas

Group 4: Ethnic minority households in rural areas (base group)

What is the proportional differential in income between these groups?

Holding other factors fixed, and compared with ethnic minority households in rural areas (the base group), predicted income is about 78.3% higher for Kinh households in urban areas, 47.5% higher for ethnic minority households in urban areas, and 36.5% higher for Kinh households in rural areas.
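Because log_income is the dependent variable, a dummy coefficient b is read approximately as a 100·b percent difference; this is how the 47.5% and 36.5% figures follow from the urban (0.475) and ethnicity (0.365) coefficients in the SRF. The exact percentage difference is 100·(exp(b) - 1), and the gap between the two grows with the size of b. A minimal sketch comparing them:

```python
import math


def approx_pct(b: float) -> float:
    """Approximate percentage difference implied by a log-model dummy coefficient."""
    return 100.0 * b


def exact_pct(b: float) -> float:
    """Exact percentage difference: 100 * (exp(b) - 1)."""
    return 100.0 * (math.exp(b) - 1.0)


for label, b in [("urban (ethnic minority)", 0.475), ("ethnicity (rural)", 0.365)]:
    print(f"{label}: approx {approx_pct(b):.1f}%, exact {exact_pct(b):.1f}%")
```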

What is the proportional differential in income between the base province (Ha Giang) and

other provinces?

In the multiple regression model, Ha Giang is the base group for the province variable, which allows its income level to be compared with the 13 other provinces in the dataset. The results indicate substantial differences in income levels across provinces.

The regression table shows that five provinces have higher predicted income than Ha Giang: Tuyen Quang, Lao Cai, Hoa Binh, Thai Nguyen, and Bac Giang. Bac Giang has the largest gap, with predicted income 15.37% higher, followed by Lao Cai and Tuyen Quang at 6.05% and 2.62%. Although Hoa Binh and Thai Nguyen also have higher predicted income, these differences are not statistically significant.

By contrast, five provinces have lower predicted income than Ha Giang: Cao Bang, Bac Kan, Dien Bien, Son La, and Phu Tho, by 25.35%, 17.13%, 16.57%, 17.99%, and 13.65%, respectively. The remaining three provinces, Lai Chau, Yen Bai, and Lang Son, show negligible differences in income compared with Ha Giang.

Interpret the effect of various types of land on income.

log_pland, log_fland, and log_gland have p-values of 0.6, 0.091, and 0.522, respectively, so at the 0.05 significance level none of these three variables is statistically significant. We therefore find no evidence that the sizes of perennial cropland, forestland, or garden land are related to household income once the other factors are held fixed; only annual cropland has a significant effect. Because both income and annual cropland are in logs, its coefficient (-0.034) is an elasticity: a 10% increase in the size of annual cropland is associated with approximately a 0.34% decrease in household income per capita.
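In a log-log specification the coefficient β means that a 1% increase in x is associated with roughly a β% change in y. The sketch below converts the log_aland coefficient from the SRF (-0.034) into the effect of a 10% increase in annual cropland, using both the linear approximation quoted above and the exact formula 100·((1 + %Δx/100)^β - 1); the two agree closely (about -0.34% versus -0.32%).

```python
from typing import Tuple


def loglog_effect(beta: float, pct_change_x: float) -> Tuple[float, float]:
    """Percentage change in y for a pct_change_x % change in x when
    log(y) = ... + beta * log(x).

    Returns (approximation beta * %dx, exact 100 * ((1 + %dx/100)**beta - 1)).
    """
    approx = beta * pct_change_x
    exact = 100.0 * ((1.0 + pct_change_x / 100.0) ** beta - 1.0)
    return approx, exact


approx, exact = loglog_effect(-0.034, 10.0)  # log_aland coefficient from the SRF
print(f"approx {approx:.3f}%, exact {exact:.3f}%")
```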

How do you quantify and compare the relative importance of each individual explanatory

variable to the dependent variable (income)?

We use beta coefficients (standardized coefficients), which measure the effects of the independent variables on the dependent variable in standard deviation units. They are obtained from an OLS regression in which the dependent and independent variables have first been transformed into z-scores (Wooldridge, p. 216), which allows us to compare the strength of the effect of each explanatory variable on the dependent variable. The larger the absolute value of a beta coefficient, the stronger the effect of that variable, measured in standard deviations.

The variables with the largest effects are edu (0.282), ethnicity (0.232), dep_ratio (-0.199), and log_aland (-0.157). For example, if edu increases by one standard deviation, predicted log income, $\widehat{\log(income)}$, changes by 0.282 standard deviations. Conversely, the least important variables are gender (-0.013), log_pland (0.008), log_fland (0.016), and log_gland (0.004).
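The standardized coefficients can be recovered from the ordinary ones: the beta coefficient of x_j equals b_j × sd(x_j) / sd(y), which is the same number obtained by regressing the z-score of y on the z-scores of the regressors (Stata reports these with the beta option of regress). The sketch below checks this identity on a small made-up data set with a single regressor, using Python's statistics module; it is not the study's Stata code.

```python
import statistics
from typing import Sequence


def beta_coefficient(b_j: float, x_j: Sequence[float], y: Sequence[float]) -> float:
    """Standardized (beta) coefficient: b_j * sd(x_j) / sd(y)."""
    return b_j * statistics.stdev(x_j) / statistics.stdev(y)


def slope_on_zscores(x: Sequence[float], y: Sequence[float]) -> float:
    """OLS slope after converting x and y to z-scores (simple regression)."""
    mx, sx = statistics.mean(x), statistics.stdev(x)
    my, sy = statistics.mean(y), statistics.stdev(y)
    zx = [(a - mx) / sx for a in x]
    zy = [(b - my) / sy for b in y]
    # z-scores have mean zero, so the slope is sum(zx*zy) / sum(zx^2).
    return sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)


x = [1.0, 2.0, 3.0, 4.0]   # hypothetical regressor
y = [1.0, 3.0, 2.0, 6.0]   # hypothetical outcome
b_raw = 1.4                # OLS slope of y on x for these data
print(beta_coefficient(b_raw, x, y), slope_on_zscores(x, y))
```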

Figure 8. Beta coefficient

Test the alternative hypothesis that the effect of education on income is larger than 5% and

7%.

The hypotheses are $H_0: \beta_{edu} = 0.05$ against $H_1: \beta_{edu} > 0.05$, where $\beta_{edu}$ is the coefficient on edu in the log-income equation, so 0.05 corresponds to approximately a 5% increase in income per additional year of schooling. A right-tailed t-test gives a p-value of about 0.002241, below the 0.05 (5%) significance level. We therefore reject $H_0$ at the 5% level and conclude that the effect of education on income is larger than 5%.

For the second test, $H_0: \beta_{edu} = 0.07$ against $H_1: \beta_{edu} > 0.07$. The right-tailed t-test gives a very large p-value (0.9998391), so we fail to reject $H_0$: at the 5% significance level, there is not enough evidence to conclude that the effect of education on income is greater than 7%.
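The right-tailed test can be sketched as follows. The standard error of the edu coefficient is not reported in the text, so SE_EDU below is a hypothetical value and the resulting p-values will not reproduce 0.002241 and 0.9998391; the sketch only shows how the t statistic (b̂ - b_null) / se and the one-sided p-value are formed. Stata's test uses the t distribution; with about 7,400 observations the standard normal approximation used here is very close.

```python
from statistics import NormalDist
from typing import Tuple


def right_tail_test(b_hat: float, se: float, b_null: float) -> Tuple[float, float]:
    """t statistic and right-tail p-value for H0: b = b_null vs H1: b > b_null.

    Uses the standard normal as a large-sample approximation to the t
    distribution.
    """
    t = (b_hat - b_null) / se
    return t, 1.0 - NormalDist().cdf(t)


B_EDU = 0.058   # rounded edu coefficient from the SRF
SE_EDU = 0.003  # HYPOTHETICAL standard error, for illustration only

results = {h0: right_tail_test(B_EDU, SE_EDU, h0) for h0 in (0.05, 0.07)}
for h0, (t, p) in results.items():
    print(f"H0: b_edu = {h0}: t = {t:.2f}, p = {p:.4f}")
```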

Figure 9. Test the alternative hypothesis that the effect of education on income is larger than 5% and 7%

Test whether or not there is a quadratic relationship between age and the log of income.

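The user-written Stata command utest is used after estimation commands to test for the presence of a U-shaped or inverse U-shaped relationship between an explanatory variable and the outcome variable (Lind & Mehlum). When testing for an inverse U shape, the null hypothesis is that the relationship is monotone or U-shaped, and the alternative is an inverse U shape.

We reject $H_0$ if the p-value is below 0.05. Because the p-value is $1.26 \times 10^{-37} < 0.05$, we reject $H_0$: there is an inverse U-shaped relationship, in which the positive effect of age diminishes and eventually turns negative.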

The estimated extreme (turning) point is at age 60. For household heads aged between 19 and 60, each additional year of age is associated with higher monthly household income per capita; after age 60, further increases in age are associated with lower income. This indicates a negative effect of aging on household income beyond that point, possibly due to reduced working capacity or changes in the way households manage their income sources.
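For a quadratic in age, the marginal effect of age on log income is β_age + 2·β_age2·age, and the turning point is -β_age / (2·β_age2) (Wooldridge discusses this formula for quadratic models). The sketch below evaluates both with the rounded SRF coefficients. With these values the turning point is about 62.8 years, somewhat above the figure of 60 quoted above, so the reported extreme point is worth re-checking against the unrounded Stata estimates and the utest output.

```python
def age_marginal_effect(b_age: float, b_age2: float, age: float) -> float:
    """d log(income) / d age = b_age + 2 * b_age2 * age."""
    return b_age + 2.0 * b_age2 * age


def turning_point(b_age: float, b_age2: float) -> float:
    """Age at which the marginal effect is zero: -b_age / (2 * b_age2)."""
    return -b_age / (2.0 * b_age2)


B_AGE, B_AGE2 = 0.0334, -0.0002659  # rounded SRF coefficients
peak = turning_point(B_AGE, B_AGE2)
print(f"turning point = {peak:.1f} years")
for a in (30, 50, 70):
    print(a, round(age_marginal_effect(B_AGE, B_AGE2, a), 4))
```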

Higher income is associated with social participation over time, and social participation is less likely among older people and those living in rural areas (Feng et al., 2020). Since older people tend to participate less in social activities, and social participation is associated with higher income, a decline in household income after age 60 is plausible. The retirement age in Vietnam may also contribute: under the 2006 Law on Social Insurance, employees qualify for retirement benefits if they have paid social insurance for 20 years or more and meet one of the conditions set out in the law, such as reaching age 60 for men or age 55 for women (Hương, 2018).

Use Stata to draw graphs showing the quadratic relationship between age and the log of

income.

Figure 10. Quadratic relationship between age and log of income

VI. Conclusion and policy recommendations

The objective of this paper is to investigate the factors influencing household income in the Northern Midlands and Mountains of Vietnam. Using the dataset provided by Dr. Tran Quang Tuyen and MSc. Le Van Dao, this research provides initial findings on the determinants of household income in the most economically deprived area of Vietnam. Based on these findings, we propose several policy recommendations to help the state design and implement economic policies for the Northern Midlands and Mountains provinces.

The results indicate that education has the strongest positive influence on monthly household income: people with more education tend to have higher incomes. The Northern Midlands and Mountains region is a distinctive geographical area. It has complex mountainous terrain and covers nearly one third of the country's area, with about 14.7 million inhabitants, of whom ethnic minorities make up nearly 50% (Huyền, 2023). Its average population density is also low, at 50 to 100 people/km². As a result, building school infrastructure and universalizing basic education in the region is very difficult, and people's access to education depends on many factors such as ethnicity, province, and distance from home to school. We suggest that the government prioritize infrastructure development and implement suitable policies to facilitate and support children's access to education. Furthermore, benefits for teachers working in disadvantaged areas also need to be improved.

Ethnicity ranks second only to education in the strength of its association with household income. The government has many policies to gradually narrow the gap in living standards and income between ethnic minority and mountainous areas and the national average (Dương, 2020). We believe these policies have been considerably effective, but raising awareness among ethnic minorities remains crucial.

The land variables log_pland, log_fland, and log_gland show no statistically significant relationship with household income, unlike log_aland. In other words, having more or less perennial cropland, forestland, or garden land does not appear to affect income. This may indicate that a large amount of land is being used inefficiently and brings little value to its owners. The government should implement a more streamlined land management system.

To reduce the dependency ratio and facilitate income growth, supportive policies need to be designed and implemented carefully. Key strategies include family planning policies to control birth rates and manage population growth, improved reproductive health education, and the provision of effective contraceptives to prevent unintended pregnancies.
VII. References

Assumptions of Multiple Linear Regression. (2019, March 21). Retrieved from Statistics Solutions: https://www.statisticssolutions.com/free-resources/directory-of-statistical-analyses/assumptions-of-multiple-linear-regression/

Chung, D. K., Dung, T. K., & Duy, L. V. (2015). Influence of some factors on poverty reduction in the Northwest region. Retrieved from https://www.researchgate.net/publication/332291011_ANH_HUONG_CUA_MOT_SO_YEU_TO_DEN_GIAM_NGHEO_O_VUNG_TAY_BAC

Dương, T. (2020, June 22). Focus on socio-economic development in ethnic minority and mountainous areas. Retrieved from NIF: https://mof.gov.vn/webcenter/portal/vclvcstc/pages_r/l/chi-tiet-tin?dDocName=MOFUCM178315

Feng, Z., et al. (2020). The longitudinal relationship between income and social participation among Chinese older people. Retrieved from https://www.sciencedirect.com/science/article/pii/S2352827320302731

Hương, T. L. (2018, April 3). Is it necessary to raise the retirement age? Retrieved from kiemsat: https://kiemsat.vn/co-can-thiet-tang-tuoi-nghi-huu-49466.html

Huyền, T. (2023, February 13). Removing difficulties and obstacles in implementing national target programs in the Northern Midlands and Mountainous areas. Retrieved from Tin Thời sự.

Lind, J. T., & Mehlum, H. (n.d.). utest. Retrieved from bocode: http://fmwww.bc.edu/RePEc/bocode/u/utest.html

PRESS RELEASE RESULTS OF THE 2019 POPULATION AND HOUSING CENSUS. (n.d.). Retrieved from General Statistics Office of Vietnam (Tổng cục Thống kê): https://www.gso.gov.vn/su-kien/2019/12/thong-cao-bao-chi-ket-qua-tong-dieu-tra-dan-so-va-nha-o-nam-2019/

Tuyen, T. Q. (2015). Socio-economic determinants of household income among ethnic minorities in the North-West Mountains, Vietnam. Retrieved from https://www.researchgate.net/publication/279298912_Socio-Economic_Determinants_of_Household_Income_among_Ethnic_Minorities_in_the_North-West_Mountains_Vietnam

Vietnam population. (n.d.). Retrieved from Worldometers: https://www.worldometers.info/world-population/vietnam-population/

Williams, M. N., et al. (2013). Assumptions of multiple regression: Correcting two misconceptions. Retrieved from https://scholarworks.umass.edu/pare/vol18/iss1/11/

Wooldridge, J. M. (n.d.). Introductory econometrics: A modern approach.
