Final Report Econometrics
Subject: Econometrics
PROJECT 1
Group: 1
List of figures
Figure 4. The result of Ramsey RESET test for omitted variables
Figure 9. Test the alternative hypothesis that the effect of education on income is larger than 5% and 7%
Figure 10. Quadratic relationship between age and log of income
List of tables
Table 1. Definition of variables in dataset
I. Introduction
Vietnam has more than 99.2 million people, accounting for 1.23% of the world's population
Northern midlands and mountainous regions and Central Highlands account for only 14.4% of
the total population (PRESS RELEASE RESULTS OF THE 2019 POPULATION AND
HOUSING CENSUS). In addition, GDP per capita in the entire Northern Midlands and Mountains
region reached 43.72 million VND/person/year in 2018, the poverty rate in 2015 was three times
the national average, and some places are at risk of falling back into poverty (4.17%)
(Chung, Dung, & Duy, 2015). Furthermore, household income in the Northwest region is very
low, about 2,604,000 VND/person/month, compared to the average income of other countries
people here, creating a huge economic gap over the years. Both worldwide and in Vietnam, there
have been many studies on factors affecting household income. Research results show that identifying
improve household income and living standards. Therefore, the study was created for the purpose of
2. Objectives of research
The study examines factors affecting household income in the Northern Midlands and
Mountains of Vietnam. It focuses on factors such as age, gender, and education, whose effects
are estimated with econometric models. The study also checks the multiple linear regression
(MLR) assumptions required for unbiased coefficient estimates, and investigates the economic
differences between various family groupings.
II. Literature review
Household income is influenced by main factors such as household size, age, and gender of
household members, household composition, health, education, resources, social capital and
assets, employment, and many other factors. There are also other community factors such as
weather, prices and infrastructure (Benin and Randriamamonjy, 2008). In Vietnam, research on
farm household income is mainly concentrated in small areas such as a district or province, with
few studies covering large areas, especially the Northwest provinces, where economic
development and incomes are low relative to the rest of Vietnam.
Research by Tran Quang Tuyen (2015) on factors affecting the income of ethnic minority
households in the Northwest region shows that education level, non-agricultural employment, land
size Belt along with other factors such as transportation system, post office and job opportunities
Many studies have shown that there is a positive correlation between education level and
income, especially in rural areas. Individuals with higher levels of education tend to earn higher
incomes than those with lower levels of education (Duncan, Yeung, Brooks-Gunn and
Smith, 1998).
Age and gender are also important factors to consider. According to the Tax Foundation,
income tends to increase with age as people accumulate more experience and then decreases
slightly as taxpayers enter retirement, forming an inverted U-shape. Further, female household
heads typically earn less than their male counterparts due to existing socio-cultural tendencies
and potential obstacles related to their limited access to resources or to employment suited for
income generation. Household income level can be influenced not only by individual factors such
as ethnicity or marital status, but also by family groupings (Barnard & Turner, 2011).
III. Dataset description
The data has been collected and extracted from an available dataset that has been
various factors; the sample includes 7417 households. Based on the data, Table 1 lists the
factors that affect the per capita income of the household and their definitions.
Table 1. Definition of variables in dataset (columns: Variable, Explanation)
IV. Research design
1. Economic model
This economic model delineates several variables that impact monthly per capita income.
Income = f(age, edu, gender, marital status, ethnicity, household size, dependency ratio,
Variable: expected sign of effect on income
Age: +/-
Edu: +
Gender: +/-
Ethnicity: +/-
Household size: -
Dependency ratio: -
Annual cropland: +
Perennial cropland: +
Forestland: +
Garden land: +
Province: +/-
Urban: +/-
2. Econometric model
Because the dependent variable (household income per capita) is continuous, the study
estimates its models by ordinary least squares (OLS) (Tuyen, 2015). The regression models are
used to analyze the relationships between per capita household income and
various explanatory variables (Tuyen, 2015). In this research, Stata will be used for
Independent variables:
(M.Wooldridge, p. 194).
province, that is, the province against which comparisons are made (M.Wooldridge, p.
230). The model includes dummy variables for the remaining 13 provinces, allowing us to
evaluate the impact of
- Remaining variables: edu, gender, marital status, household size, dependency ratio,
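The base-group coding of the province variable can be sketched as follows. This is a minimal illustration in Python rather than the Stata code used for the report; the function name `province_dummies` and the four-household sample are hypothetical and are not taken from the dataset.

```python
def province_dummies(provinces, base):
    """Build 0/1 dummy columns for every province except the base group."""
    levels = sorted(set(provinces) - {base})
    rows = [[1 if p == level else 0 for level in levels] for p in provinces]
    return levels, rows


# Hypothetical mini-sample; with 14 provinces the same call yields 13 dummies.
sample = ["Ha Giang", "Lao Cai", "Bac Giang", "Ha Giang"]
levels, rows = province_dummies(sample, base="Ha Giang")
for province, row in zip(sample, rows):
    print(province, dict(zip(levels, row)))
```

Households in the base province get zeros in every dummy column, so each dummy coefficient is read as a difference relative to that province.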
2.2. Multiple regression model
\widehat{\log(income)} = 6.549 + β1*province + 0.365*ethnicity + 0.475*urban -
0.058*edu - 0.061*hhsize - 0.448*dep_ratio - 0.034*log_aland + 0.0012*log_pland +
0.0033*log_fland + 0.0018*log_gland
We regress the model to obtain the following outcomes. Evaluating and testing assumptions is
a crucial task for researchers using multiple regression or any statistical technique. Significant
First, multiple linear regression requires the model to be linear in parameters: the dependent
variable is a linear function of the coefficients. Each βj appears with a power of 1 only and is not multiplied or
The data is a random sample drawn from the population, meaning that each observation
has an equal chance of being included in the sample. This dataset is a random sample of 7417
households in 14 provinces of the Northern Midlands and Mountains in Vietnam, which enables our
a regression model. Perfect multicollinearity occurs when there is an exact linear relationship
between the independent variables (Assumptions of Multiple Linear Regression, 2019). Under
perfect multicollinearity, the OLS coefficients cannot be uniquely estimated. Under strong but
imperfect multicollinearity, the estimates become imprecise and unreliable, and they can be
extremely sensitive to small changes in the data.
To identify potential signs of multicollinearity, a correlation matrix analysis and a study of
the variance inflation factor (VIF) were conducted. The results indicate that this model has no
serious multicollinearity, as all VIFs are less than 10, so our model satisfies MLR.3.
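As an illustration of how a VIF is obtained, the sketch below computes VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing regressor j on the other regressors. It is a minimal standard-library Python sketch, not the Stata routine used in the report; the helper names `r_squared` and `vif` and the two-column data are hypothetical.

```python
def r_squared(X, y):
    """R^2 from an OLS regression of y on the columns of X plus an intercept."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Normal equations (X'X) b = X'y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * p for a, p in zip(A[r], A[col])]
            c[r] -= f * c[col]
    # Back substitution
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    fitted = [sum(bi * xi for bi, xi in zip(b, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ssr / sst


def vif(X):
    """VIF for each column of X, regressing it on all other columns."""
    out = []
    for j in range(len(X[0])):
        target = [row[j] for row in X]
        others = [row[:j] + row[j + 1:] for row in X]
        out.append(1.0 / (1.0 - r_squared(others, target)))
    return out


# Hypothetical data: two strongly correlated regressors.
X = [[1, 1], [2, 2], [3, 3], [4, 4], [5, 6]]
vifs = vif(X)
print([round(v, 2) for v in vifs])
```

In this made-up example both VIFs are far above 10, which is the kind of result the rule of thumb in the text would flag.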
The assumption regarding the expectation of the error term states that, at each value of the
explanatory variables in the model, the error term should have a mean of zero: 𝐸(𝑢|𝑥) = 𝐸(𝑢) = 0.
That is, the independent variables (x) do not contain any information
about the mean of the unobserved factors (u). This assumption also requires that x and u have no
relationship, either linear or nonlinear. This is one of the most crucial, abstract, and easily
violated assumptions in model construction. Moreover, this assumption plays a crucial role in
the unbiasedness of the estimates. One way that Assumption MLR.4
can fail is by omitting an important factor that is correlated with any of x1, x2, ... We use Ramsey
The outcome (p-value = 0.0000) rejects the null hypothesis of no omitted variables, indicating
that the functional form is misspecified, which is one of the ways the 4th assumption fails. We
tried to satisfy this assumption by adding age squared (age2) and an ethnicity-urban interaction
term to the model; however, the p-value (0.0314) was still less than 0.05.
If the first four assumptions (MLR.1-MLR.4) mentioned above are satisfied, the OLS
estimators are unbiased, i.e., E(β̂j) = βj. However, in practice, the MLR.4
assumption is often difficult to satisfy, and our model fails it too (M.Wooldridge, p. 86). This
assumption can be replaced by a weaker, more attainable assumption, referred to as MLR.4', if the model
The covariance or linear correlation between u and x is zero: Cov(x, u) = 0. In that case,
replacing the MLR.4 assumption with MLR.4' results in consistent estimates but they may still
Under assumptions MLR.1-MLR.5, the OLS estimators are BLUE (Best Linear
Unbiased Estimators): they have the smallest variance among all linear unbiased estimators,
which makes them the most efficient in that class. Assumption MLR.5 (homoskedasticity) requires
that the variance of u remain constant for all values of the independent variables:
Var(u_i | x_i) = \sigma^2
Figure 6. The result of White test for heteroscedasticity
Using the White test for heteroscedasticity, the result (p-value = 0.0000) rejects the null
hypothesis of constant error variance, so the error variance changes across observations. It is
vital to handle this problem when the
heteroscedasticity:
1. Log-transformation.
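The idea behind the White test can be sketched for the simplest case of one regressor: regress the squared OLS residuals on x and x^2 and compare LM = n * R^2 with a chi-square distribution with 2 degrees of freedom, whose survival function is exp(-LM/2). This is a standard-library Python illustration under those assumptions, not the Stata output reported in Figure 6; the helper names and the simulated data are hypothetical.

```python
import math


def ols_fit(X, y):
    """OLS of y on the columns of X plus an intercept; returns (fitted, R^2)."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting, then back substitution
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * p for a, p in zip(A[r], A[col])]
            c[r] -= f * c[col]
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    fitted = [sum(bi * xi for bi, xi in zip(b, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return fitted, 1.0 - ssr / sst


def white_test_simple(x, y):
    """White test for a one-regressor model: LM = n * R^2 of e^2 on x and x^2."""
    fitted, _ = ols_fit([[xi] for xi in x], y)
    e2 = [(yi - fi) ** 2 for yi, fi in zip(y, fitted)]
    _, r2_aux = ols_fit([[xi, xi ** 2] for xi in x], e2)
    lm = len(y) * r2_aux
    p_value = math.exp(-lm / 2.0)  # chi-square survival function, 2 df
    return lm, p_value


# Hypothetical data whose error spread grows with x.
x = [float(i) for i in range(1, 21)]
y = [2.0 + 0.5 * xi + 0.3 * xi * (-1) ** i for i, xi in enumerate(x, start=1)]
lm, p_value = white_test_simple(x, y)
print(round(lm, 2), p_value)
```

A small p-value, as in the report, leads to rejecting constant error variance. With more regressors the auxiliary regression also includes cross products and the degrees of freedom change, so the closed-form p-value used here no longer applies.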
The concepts of consistency and unbiasedness both relate to repeated sampling in
the estimation process. Because estimates vary from sample to sample, exact inference with the
t-test requires Assumption MLR.6, the normality assumption: the population error u is
independent of the explanatory variables x1, x2, …, xk and is
normally distributed with zero mean and variance σ², 𝑢~𝑁(0, 𝜎²) (M.Wooldridge, p. 118).
We use the Jarque-Bera test to check whether the residuals follow a normal distribution. The
regression model of income shows that the distribution of residuals is not normal, but the
Figure 7. The normal distribution of residuals of log_income regression
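For reference, the Jarque-Bera statistic combines the sample skewness S and kurtosis K as JB = n/6 * (S^2 + (K - 3)^2 / 4) and is compared with a chi-square distribution with 2 degrees of freedom. The sketch below is a minimal standard-library Python illustration; the function name `jarque_bera` and both samples are hypothetical and unrelated to the report's residuals.

```python
import math


def jarque_bera(values):
    """Return the Jarque-Bera statistic and its asymptotic p-value."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # chi-square survival function, 2 df
    return jb, p_value


# Hypothetical samples: one symmetric, one with a single extreme value.
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
skewed = [1.0] * 99 + [100.0]
jb_sym, p_sym = jarque_bera(symmetric)
jb_skew, p_skew = jarque_bera(skewed)
print(round(jb_sym, 4), round(p_sym, 4))
print(round(jb_skew, 1), p_skew)
```

The p-value is asymptotic, so it is only a rough guide for very small samples such as the five-point example.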
V. Results and discussion
Holding other factors fixed, compared with ethnic minority households in rural areas (base
group), predicted income is about 78.3% higher for Kinh households in urban
areas, 47.5% higher for ethnic minority households in urban areas and 36.5% higher for
What is the proportional differential in income between the base province (Ha Giang) and
other provinces?
In the multiple regression model, Ha Giang was designated as the base group for the
province variable, facilitating a comparison of income levels with 13 other provinces in the
dataset. The results indicate significant differences in income levels among the various
provinces.
From the regression table, we can see that most provinces have a lower predicted income than
Ha Giang; only 5 provinces have a higher income: Tuyen Quang, Lao Cai, Hoa Binh, Thai
Nguyen and Bac Giang. Bac Giang has the largest gap in predicted income at 15.37%, followed
by Lao Cai and Tuyen Quang with 6.05% and 2.62%. Although Hoa Binh and Thai
In contrast, 5 provinces have predicted income levels lower than Ha Giang: Cao Bang,
Bac Kan, Dien Bien, Son La and Phu Tho, with gaps of 25.35%, 17.13%, 16.57%, 17.99% and
13.65% respectively. The remaining three provinces, Lai Chau, Yen Bai and Lang Son, have
negligible differences in income levels compared to Ha Giang.
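Because the dependent variable is in logs, a dummy coefficient b corresponds to an approximate percentage gap of 100*b and an exact gap of 100*(exp(b) - 1); the two diverge as b grows. The sketch below compares them in Python, assuming (this is not stated in the report) that the gaps quoted above are 100 times the estimated coefficients, so that the Bac Giang and Cao Bang coefficients are 0.1537 and -0.2535.

```python
import math


def approx_pct(coef):
    """Approximate percentage gap read directly from a log-level coefficient."""
    return 100.0 * coef


def exact_pct(coef):
    """Exact percentage gap implied by a log-level coefficient."""
    return 100.0 * (math.exp(coef) - 1.0)


# Assumed coefficients, backed out from the reported gaps (gap = 100 * coef).
for name, coef in [("Bac Giang", 0.1537), ("Cao Bang", -0.2535)]:
    print(name, round(approx_pct(coef), 2), round(exact_pct(coef), 2))
```

Under this assumption the exact gap is somewhat larger than the approximation for Bac Giang and somewhat smaller in absolute value for Cao Bang.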
log_pland, log_fland and log_gland have p-values of 0.6, 0.091 and 0.522 respectively. At
the 0.05 significance level, these three variables are not statistically
significant. This implies that, in this model, the sizes of these land types have no
statistically significant association with household income; the exception is annual cropland.
A 10% increase in the size of annual cropland is
How do you quantify and compare the relative importance of each individual explanatory
We use “beta coefficients”, which measure the effects of the independent variables on the
dependent variable in standard deviation units. The beta coefficients are obtained from a standard
OLS regression after the dependent and independent variables have been transformed into z-
scores (M.Wooldridge, p. 216). The result allows us to compare the strength of the effect of each
individual explanatory variable on the dependent variable. The greater the absolute magnitude of
The variables with the greatest effect are: edu (0.282), ethnicity (0.232), dep_ratio (-0.199)
and log_aland (-0.157). For example, if edu increases by one standard deviation, predicted
log(income) increases by 0.282 standard deviations. In contrast, the least important variables
are: gender (-0.013), log_pland (0.008), log_fland (0.016) and log_gland (0.004).
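A beta coefficient can also be recovered from an ordinary coefficient as b * sd(x) / sd(y), which is equivalent to running the regression on z-scores. The sketch below shows this in Python for a one-regressor case, where the beta coefficient equals the sample correlation; the function names and the five data points are hypothetical.

```python
import statistics


def slope(x, y):
    """OLS slope from a simple regression of y on x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def beta_coefficient(b, x, y):
    """Standardized coefficient: effect in standard deviations of y per sd of x."""
    return b * statistics.stdev(x) / statistics.stdev(y)


# Hypothetical data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b = slope(x, y)
beta = beta_coefficient(b, x, y)
print(round(b, 3), round(beta, 4))
```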
Figure 8. Beta coefficient
Test the alternative hypothesis that the effect of education on income is larger than 5% and
7%.
The hypothesis pair under consideration is as follows: H_0: the effect of education on income
equals 5%; H_A: the effect of education on income is greater than 5%. Using a right-tailed
t-test, the calculated p-value is approximately 0.002241, which is below
the chosen significance level of 0.05 (or 5%). We therefore reject H_0 and conclude, at the 5%
significance level, that the effect of education on income is larger
than 5%.
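The mechanics of the right-tailed test can be sketched as t = (b - b0) / se, with the p-value taken from the upper tail. In the Python sketch below, the estimate 0.058 is the edu coefficient from the regression equation, but the standard error 0.0028 is a hypothetical value chosen for illustration (the report's standard error is not shown here), and the normal distribution is used as a large-sample approximation to the t distribution.

```python
import math


def right_tail_test(b_hat, se, b_null):
    """Return the t statistic and right-tail p-value (normal approximation)."""
    t_stat = (b_hat - b_null) / se
    p_value = 0.5 * math.erfc(t_stat / math.sqrt(2.0))
    return t_stat, p_value


B_HAT = 0.058   # edu coefficient from the regression equation
SE = 0.0028     # hypothetical standard error, for illustration only

t5, p5 = right_tail_test(B_HAT, SE, 0.05)
t7, p7 = right_tail_test(B_HAT, SE, 0.07)
print(round(t5, 3), p5)
print(round(t7, 3), p7)
```

With these inputs the test against 5% gives a small p-value and the test against 7% gives a p-value close to 1, the same qualitative pattern as the results reported in this section.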
For the second test, H_0 is that the effect of education on income equals 7% and H_A
is that the effect of education on income is greater than 7%. Using a right-tailed t-test,
the p-value is very large (0.9998391). Therefore, the null hypothesis is not
rejected, and there is not enough evidence to conclude, at the significance level of 0.05 (5%),
Figure 9. Test the alternative hypothesis that the effect of education on income is larger than 5% and 7%
Test whether or not there is a quadratic relationship between age and the log of income.
“utest” is used after estimation commands to test for the presence of a U-shaped or inverse
U-shaped relationship between an explanatory variable and the outcome variable (Jo Thori
Lind).
We reject H0 if p-value < 0.05. Because p-value = 1.26*10^(-37) < 0.05, we reject H0.
The extreme point is at age 60. In the age interval [19, 60], each additional year of
age of the household head is associated with an increase in monthly household income per
capita. After age 60, further increases in the household head's age are associated with lower
monthly income. This indicates a negative impact of aging on household income, possibly due to reduced working
Higher income is associated with social participation over time. Social participation is less
likely among older people and those living in rural areas (Feng et al., 2020). Since older
people tend to have less social participation, and social participation is associated with higher
income, a decrease in household income after age 60 is plausible. Another
reason that may contribute to this argument is the retirement age in Vietnam. Under the Law on
Social Insurance 2006, the conditions for receiving retirement benefits are that employees have
paid social insurance for 20 years or more and meet one of the following conditions: Men are 60 years old,
Use Stata to draw graphs showing the quadratic relationship between age and the log of
income.
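In a quadratic specification log(income) = ... + b1*age + b2*age^2, the extreme point is at age* = -b1 / (2*b2) and the marginal effect of age is b1 + 2*b2*age. The Python sketch below illustrates this with hypothetical coefficients chosen only so that the turning point equals the reported value of 60; they are not the report's estimates.

```python
def turning_point(b_age, b_age2):
    """Age at which the quadratic in age reaches its extreme point."""
    return -b_age / (2.0 * b_age2)


def marginal_effect(b_age, b_age2, age):
    """Change in log(income) for one more year of age, evaluated at `age`."""
    return b_age + 2.0 * b_age2 * age


# Hypothetical coefficients (not taken from the report's regression output).
B_AGE, B_AGE2 = 0.024, -0.0002

peak = turning_point(B_AGE, B_AGE2)
print(round(peak, 6))
print(marginal_effect(B_AGE, B_AGE2, 40) > 0, marginal_effect(B_AGE, B_AGE2, 70) < 0)
```

A negative coefficient on age squared gives the inverse U-shape: the marginal effect is positive before the turning point and negative after it.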
VI. Conclusion and policy recommendations
The objective of this paper is to investigate the factors influencing household income in the
Northern Midlands and Mountains of Vietnam. Using the dataset provided by Dr. Tran Quang
Tuyen and Master Le Van Dao, this research provides initial findings on the determinants of
household income in the most economically deprived area of Vietnam. Drawing on insights from
the study, we propose a few policy recommendations to help the state more easily implement
policies aimed at the economy of the Northern Midlands and Mountains provinces.
The results of the study indicate that education has the strongest positive association with
monthly household income: people with higher education levels tend to have higher incomes. The
Northern Midlands and Mountains region is a geographically distinctive area. It has complex
mountainous terrain, accounts for nearly 1/3 of the country's area, and has about 14.7 million
inhabitants, of whom ethnic minorities make up nearly 50% (Huyền,
2023). The average population density is also quite low, at 50-100 people/km2. Therefore,
building school infrastructure and universalizing basic education in this region is very
difficult. People's access to education depends on many factors such as ethnicity, province,
and distance from home to school. We suggest that the government prioritize the development of
infrastructure and implement suitable policies to facilitate and support children's access to
education. Furthermore, the benefits policy for teachers working in difficult areas also needs
to be improved.
Ethnicity ranks second only to education in the strength of its relationship with household
income. The government has many policies to gradually narrow the gap in living standards and
income between ethnic minority and mountainous areas and the national average (Dương, 2020). We
believe that these policies demonstrate considerable effectiveness, yet the crucial aspect lies in
Land variables, including log_pland, log_fland, and log_gland, have no clear impact on
household income, in contrast to log_aland. That means having more or less perennial cropland,
forestland or garden land has no statistically significant effect on income in our model. This
suggests that a large amount of land is being used inefficiently, bringing low value to the
owner. The government
To reduce the dependency ratio and facilitate income growth, it is essential to design and
implement supportive policies carefully. Key strategies include the implementation
of family planning policies to control birth rates and manage population growth, the
contraceptives.
VII. References
statisticssolutions: https://www.statisticssolutions.com/free-resources/directory-of-statistical-analyses/assumptions-of-multiple-linear-regression/
Chung, D. K., Dung, T. K., & Duy, L. V. (2015). INFLUENCE OF SOME FACTORS ON
https://www.researchgate.net/publication/332291011_ANH_HUONG_CUA_MOT_SO_YEU_TO_DEN_GIAM_NGHEO_O_VUNG_TAY_BAC
Dương, T. (2020, June 22). Focus on socio-economic development in ethnic minority and
https://mof.gov.vn/webcenter/portal/vclvcstc/pages_r/l/chi-tiet-tin?dDocName=MOFUCM178315
Hương, T. L. (2018, April 03). Is there a request for equipment to increase the retirement age?
Huyền, T. (2023, February 13). Removing difficulties and obstacles in implementing national
target programs in the Northern Midlands and Mountainous areas. Retrieved from Tin
Thời sự.
http://fmwww.bc.edu/RePEc/bocode/u/utest.html
https://scholarworks.umass.edu/pare/vol18/iss1/11/
PRESS RELEASE RESULTS OF THE 2019 POPULATION AND HOUSING CENSUS. (n.d.).
bao-chi-ket-qua-tong-dieu-tra-dan-so-va-nha-o-nam-2019/
https://www.researchgate.net/publication/279298912_Socio-Economic_Determinants_of_Household_Income_among_Ethnic_Minorities_in_the_North-West_Mountains_Vietnam
population/vietnam-population/
Feng, Z., et al. (2020). The longitudinal relationship between income and social participation
https://www.sciencedirect.com/science/article/pii/S2352827320302731