Predicting Population Using Least Squares
Jason Dominic D. Santos
Institute of Mathematical Sciences and Physics, University of the Philippines Los Baños
Abstract

The Philippines is among the Third World, or developing, countries, a status associated with economic deprivation. The economy plays a major role in the development of a country, and one of the major factors affecting the Philippine economy is population. Using data from the census of the Philippines, two fitted equations were compared in order to predict the future size of the Philippine population. Averaging the two predictions gives a population of about 106,292,122 for the year 2020. The prediction is expected to be reasonably accurate provided that the assumptions made are satisfied. This study can be used by future economists for the further improvement of the Philippine economy.

Keywords: Population, forecasting, Philippine economy, population policy
1. Introduction
The term Third World refers to less advanced, or developing, countries, located mainly in Asia, Africa, and Latin America. These countries have poor economic status, and the Philippines is counted among them. Economic development is crucial to the advancement of a country, but there are hindrances to moving forward. One of those hindrances is population growth.

Population growth is the change in a population over a period of time. It can be measured as the change in the number of individuals in the population per unit of time. According to some economists, a population grows in the same way that money grows when it is left to earn compound interest in a bank. With money, growth comes from accumulating interest upon interest: the interest payments accumulated eventually earn interest themselves, increasing the balance. With population growth, new members of the population eventually produce other new members. Thus the population increases exponentially as time passes. The only difference between money and population is that, unlike population, money can increase without bound.

Population growth is a very serious problem in the Philippines. The population increases rapidly but cannot increase indefinitely, since it is bounded by the availability of resources such as food, jobs, and housing.

This study aimed to determine the trend of the population. The specific objectives were: (1) to predict the future population by comparing two methods, namely Least Squares polynomial extrapolation and curve fitting; (2) to answer questions such as when the economy will boom; and (3) to show future trends of population growth.
2. Review of Related Literature

Earth is the place people call home. It is like a container that can hold only a limited volume. The limit of the Earth is determined by natural constraints and by human choices regarding the environment, traditions, and cultures. According to Cohen (1995), the population will grow rapidly and then its growth rate will slow. In studying population growth, Cohen identified three approaches to easing the future population trend with regard to economic well-being, environmental quality, and traditional values: first, develop more advanced technology; second, slow or stop population growth; and third, improve the terms under which people interact.

In 2003, Mason studied population change in connection with economic development. His paper asked whether demographic change plays a significant role in a country's economic fate and whether the decrease in the population was an outcome of the country's economic status. He drew three major lessons. First, provided that all the assumptions are satisfied, the rate of population growth will decrease at a remarkable speed. Second, economic development plays a major role in lowering population growth. Third, demographic changes affect population size, growth rates, and age structures, which in turn affect economic roles such as those of the working-age population and of women, as well as decision making and investment.

This paper builds on these past studies. It aims to determine the population trend using mathematical tools, namely SCILAB and Curve Expert Pro. The outcome of the paper will help future economists and politicians decide which plans and strategies to adopt regarding the size of the population.

3. Preliminaries

Certain programs and methods are needed to fulfill the objectives of this paper. The programs used to perform the mathematical processes are SCILAB and Curve Expert Pro.

SCILAB was developed for system control and signal processing applications. It is an interpreter with libraries of functions and of Fortran and C routines. It can handle matrices, including basic matrix manipulation, concatenation, inversion, and transposition. Here it is used to implement an algorithm for the problem.

Curve Expert Pro (Curve Expert Professional) is a cross-platform solution for fitting curves and analyzing data. From the given data it can build various models that fit the data points, such as linear regression models, nonlinear regression models, and various kinds of splines. The program is well suited to devising interpolating and extrapolating functions, and it offers high-quality results while saving the user time.

Extrapolation is the process of estimating a value outside the observation interval. The values of the variables are assumed to be related to each other. Extrapolation is like interpolation, except that extrapolation estimates a value outside the range of the given data, while interpolation (from the prefix inter-) estimates a value inside it. One type of extrapolation uses the Least Squares Polynomial. This method fits a polynomial to the data by solving linear equations. Here the function is first linearized so that the fitted line estimates and captures the future trend of the data. Linearization is the process by which a nonlinear equation is transformed into a linear equation that approximates it.
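For reference, the least-squares line can be written in closed form. The sketch below assumes $n$ data points $(X_k, Y_k)$ and a line $Y = AX + B$; the symbols $E$, $\bar{X}$, and $\bar{Y}$ (the squared error and the two sample means) are introduced here for illustration. The expressions for $A$ and $B$ correspond to the slope and y-intercept computed by the routine in Appendix B.

```latex
E(A, B) = \sum_{k=1}^{n} \left( A X_k + B - Y_k \right)^2

A = \frac{n \sum_{k=1}^{n} X_k Y_k - \sum_{k=1}^{n} X_k \sum_{k=1}^{n} Y_k}
         {n \sum_{k=1}^{n} X_k^2 - \left( \sum_{k=1}^{n} X_k \right)^2},
\qquad
B = \bar{Y} - A \bar{X}
```

Setting the partial derivatives of $E$ with respect to $A$ and $B$ to zero yields these two expressions, so the fitted line always passes through the point $(\bar{X}, \bar{Y})$.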
Before applying the algorithm, linearization is needed in order to minimize the error and to linearize the equation $y = Ce^{Ax}$, where $C$ and $A$ are constants. First, take the logarithm of both sides:

$\ln(y) = Ax + \ln(C)$

Then introduce the change of variables:

$Y = \ln(y)$, $X = x$, and $B = \ln(C)$

This results in a linear relation between the new variables $X$ and $Y$:

$Y = AX + B$

Table 1.1. Data used to find the values of A and B.
Year (X)    Population (y)    ln(y)
1799        1,502,574         14.2226902
1800        1,561,251         14.26099798
1812        1,933,331         14.47475498
1819        2,106,230         14.56041018
1829        2,593,287         14.76843674
1840        3,096,031         14.94563153
1850        3,857,424         15.16551016
1858        4,290,381         15.2718861
1870        4,712,006         15.36562428
1877        5,567,685         15.53248991
1887        5,984,727         15.60472128
1896        6,261,339         15.64990462
1903        7,635,426         15.84830929
1918        10,314,310        16.14904281
1939        16,000,303        16.58811822
1948        19,234,182        16.77219957
1960        27,087,685        17.11458975
1970        36,684,486        17.4178645
1975        42,070,660        17.55486114
1980        48,098,460        17.68876072
1990        60,703,206        17.92150707
1995        68,616,536        18.04404411
2000        76,506,928        18.15289186
2007        88,566,732        18.29926686
2010        92,337,852        18.34096471
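The linearized fit can be sketched in a few lines of Python. This is a minimal illustration, not the SCILAB routine used in the study: the names `CENSUS` and `fit_line` are hypothetical, and only the standard library is assumed. It takes the logarithm of each population in the table above and fits a straight line to the pairs (year, ln(population)).

```python
import math

# Census data from Table 1.1 as (year, population) pairs.
CENSUS = [
    (1799, 1502574), (1800, 1561251), (1812, 1933331), (1819, 2106230),
    (1829, 2593287), (1840, 3096031), (1850, 3857424), (1858, 4290381),
    (1870, 4712006), (1877, 5567685), (1887, 5984727), (1896, 6261339),
    (1903, 7635426), (1918, 10314310), (1939, 16000303), (1948, 19234182),
    (1960, 27087685), (1970, 36684486), (1975, 42070660), (1980, 48098460),
    (1990, 60703206), (1995, 68616536), (2000, 76506928), (2007, 88566732),
    (2010, 92337852),
]


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Centered sums are algebraically equal to the raw-sum formulas
    # and behave better numerically for large x values such as years.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


years = [year for year, _ in CENSUS]
log_pop = [math.log(pop) for _, pop in CENSUS]  # Y = ln(y)

A, B = fit_line(years, log_pop)  # Y = A*X + B
C = math.exp(B)                  # back-transform: y = C * exp(A*x)
print(f"A = {A:.7f}, B = {B:.6f}")
```

Run on these 25 points, the sketch should give a slope close to 0.0194 and an intercept close to -20.89. The centered form of the sums is a design choice of this sketch; the raw-sum form in Appendix B is algebraically equivalent.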
4. Results and Discussions

Appendix A contains the data gathered from the census of the Philippines from 1799 to 2010. From the data, the result of the study is expected to show an increasing trend. A total of 25 data points were used to produce two fitted equations: a linear function obtained with SCILAB and a 4th degree polynomial obtained with Curve Expert Pro.

Two conditions must be satisfied before using the proposed algorithm for the Least Squares Polynomial in SCILAB: the data should be only increasing or only decreasing, and only one value of Y should correspond to each value of X. The proposed algorithm is given in Appendix B. All conditions were satisfied, yet the algorithm is not directly suitable for the data, because the graph of the data closely resembles the graph of an exponential function such as $y = e^{x}$. The data were therefore linearized before the algorithm was applied.
The original points $(x_k, y_k)$ in the $xy$-plane are transformed into points $(X_k, Y_k) = (x_k, \ln(y_k))$ in the $XY$-plane. Using the new data set and the proposed algorithm in SCILAB, the values of $A$ and $B$ were found to be 0.0194433 and -20.894094, respectively. Therefore the linear equation is:

$Y = 0.0194433X - 20.894094$

Note that $Y = \ln(y)$ and $X = x$; hence $y = e^{Y}$. Table 1.2 summarizes the results obtained using the Least Squares Polynomial.

Table 1.2. Summary of results using the Least Squares Polynomial.
Year (X)    ln(y) = 0.0194433(X) - 20.894094    y = e^Y       Population (y)
1799        14.0844027                          1,308,514     1,502,574
1800        14.103846                           1,334,205     1,561,251
1812        14.3371656                          1,684,814     1,933,331
1819        14.4732687                          1,930,460     2,106,230
1829        14.6677017                          2,344,779     2,593,287
1840        14.881578                           2,903,938     3,096,031
1850        15.076011                           3,527,187     3,857,424
1858        15.2315574                          4,120,799     4,290,381
1870        15.464877                           5,203,682     4,712,006
1877        15.6009801                          5,962,379     5,567,685
1887        15.7954131                          7,242,038     5,984,727
1896        15.9704028                          8,626,961     6,261,339
1903        16.1065059                          9,884,772     7,635,426
1918        16.3981554                          13,232,089    10,314,310
1939        16.8064647                          19,904,666    16,000,303
1948        16.9814544                          23,711,114    19,234,182
1960        17.214774                           29,942,038    27,087,685
1970        17.409207                           36,368,261    36,684,486
1975        17.5064235                          40,081,423    42,070,660
1980        17.60364                            44,173,694    48,098,460
1990        17.798073                           53,654,345    60,703,206
1995        17.8952895                          59,132,397    68,616,536
2000        17.992506                           65,169,753    76,506,928
2007        18.1286091                          74,671,504    88,566,732
2010        18.186939                           79,156,622    92,337,852

[Figure: graph comparing the approximated data with the actual values. Legend: Approximated Data, Actual value]
As seen in the table and the graph, the values given by the fitted equation are not accurate enough, although the equation captures the trend of the data. The average error of the Least Squares Polynomial is 3,354,990, which is very large. Therefore predictions of the Philippine population made using this method will not be accurate enough. Hence curve fitting is used to compare the result of an nth degree polynomial with the linear equation given by the Least Squares Polynomial approximation.

Using the same assumptions and data as in the Least Squares Polynomial approximation, Curve Expert Professional was used to formulate an nth degree polynomial that approximates the gathered data. After the data were entered, Curve Expert gave a 4th degree polynomial that approximates the true values:

$y = 1.057808698339420 \times 10^{12} - 2.310990570858107 \times 10^{9}\,x + 1.892960313313528 \times 10^{6}\,x^{2} - 6.889100296114909 \times 10^{2}\,x^{3} + 9.400544032120271 \times 10^{-2}\,x^{4}$

Table 2.1. Summary of results using the 4th degree polynomial.

Year (x)    Population (y)    Approximation (Curve Fit)
1799        1,502,574         1,243,795
1800        1,561,251         1,303,552
1812        1,933,331         2,034,931
1819        2,106,230         2,449,150
1829        2,593,287         2,993,314
1840        3,096,031         3,509,974
1850        3,857,424         3,911,065
1858        4,290,381         4,203,972
1870        4,712,006         4,657,367
1877        5,567,685         4,972,359
1887        5,984,727         5,562,728
1896        6,261,339         6,319,522
1903        7,635,426         7,120,030
1918        10,314,310        9,724,755
1939        16,000,303        16,409,895
1948        19,234,182        20,793,702
1960        27,087,685        28,490,666
1970        36,684,486        36,834,777
[Figure: actual data and approximated values plotted against year, with the year axis running from 1800 to 2050. Legend: Actual Data, Approximated Value]
As seen in the graph and the table, Curve Expert Pro gave a 4th degree polynomial that is accurate enough to capture the trend of the data. The average error of the polynomial is 546,743, which is considerably smaller than the average error of the Least Squares fit. Hence it is accurate enough to predict the future population of the Philippines. The predicted population of the Philippines in 2020 is 96,145,382 using Least Squares, which has an average relative error of 0.141281, and 116,438,862 using the 4th degree polynomial, which has an average relative error of 0.0579.
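The error figures and the 2020 forecast quoted for the Least Squares model can be checked with a short Python sketch. It assumes that the average error is the mean absolute difference between the approximated and actual populations, and that the average relative error is the mean of those differences divided by the actual values; the names `CENSUS` and `predict` are illustrative. The sketch covers only the exponential model $y = e^{AX + B}$ with the reported coefficients, not the 4th degree polynomial.

```python
import math

# Coefficients reported in the text for Y = A*X + B, where Y = ln(y).
A = 0.0194433
B = -20.894094

# Census data as (year, actual population) pairs.
CENSUS = [
    (1799, 1502574), (1800, 1561251), (1812, 1933331), (1819, 2106230),
    (1829, 2593287), (1840, 3096031), (1850, 3857424), (1858, 4290381),
    (1870, 4712006), (1877, 5567685), (1887, 5984727), (1896, 6261339),
    (1903, 7635426), (1918, 10314310), (1939, 16000303), (1948, 19234182),
    (1960, 27087685), (1970, 36684486), (1975, 42070660), (1980, 48098460),
    (1990, 60703206), (1995, 68616536), (2000, 76506928), (2007, 88566732),
    (2010, 92337852),
]


def predict(year):
    """Population given by the linearized model, y = exp(A*year + B)."""
    return math.exp(A * year + B)


abs_errors = [abs(predict(year) - pop) for year, pop in CENSUS]
rel_errors = [abs(predict(year) - pop) / pop for year, pop in CENSUS]

mean_abs_error = sum(abs_errors) / len(CENSUS)  # assumed "average error"
mean_rel_error = sum(rel_errors) / len(CENSUS)  # assumed "average relative error"
forecast_2020 = predict(2020)                   # extrapolation beyond the data

print(f"mean absolute error: {mean_abs_error:,.0f}")
print(f"mean relative error: {mean_rel_error:.6f}")
print(f"2020 forecast:       {forecast_2020:,.0f}")
```

Under these assumed definitions, the three printed values should agree closely with the 3,354,990, 0.141281, and 96,145,382 reported for the Least Squares model.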
5. Conclusion

The Philippines is among the Third World, or developing, countries, a status associated with economic deprivation. The economy plays a major role in the development of a country, and one of the major factors affecting the Philippine economy is population. Using the data gathered in the census of the Philippines, two equations were formulated: a Least Squares equation and a 4th degree polynomial. These two equations were used to predict the future size of the population of the Philippines. Comparing their results, the 4th degree polynomial is clearly preferable, since it has a smaller relative error than Least Squares. Through the use of these equations, economists and politicians can devise strategies to control the population and at the same time improve the economy of the Philippines.

6. Acknowledgement

I would like to express my appreciation to my friends and family, who supported me in writing this paper. To my professors, Mr. Jonathan Mamplata and Ms. Destiny Lutero: thank you very much, Sir and Ma'am, for your patience in guiding me. To my special someone, Mary Joi Librea: thank you very much for your continuing love. I am very lucky because you are always here supporting and encouraging me to study hard and do my best. I love you!

7. References

(n.d.). Retrieved July 29, 2013, from http://countrystudies.us/philippines/34.htm
(n.d.). Retrieved July 29, 2013, from http://www.tradingeconomics.com/philippines/population
(n.d.). Retrieved July 30, 2013, from http://www.nscb.gov.ph/beyondthenumbers/2012/11162012_jrga_popn.asp
Giordano, F. R., Weir, M. D., & Fox, W. P. (2003). A First Course in Mathematical Modeling (p. 535). Brooks/Cole.
National Statistics Office. (n.d.). Retrieved July 29, 2013, from http://www.census.gov.ph/
(n.d.). Retrieved September 12, 2013, from http://serc.carleton.edu/quantskills/methods/quantlit/popgrowth.html
Cohen, J. E. (1995, July 21). Population Growth and Earth's Human Carrying Capacity. p. 7.
Mason, A. (2003). Population change and economic development: What have we learned from the East Asia experience? p. 12. Retrieved September 26, 2013.
Scilab. (n.d.). Retrieved from http://www.scilab.org/
Curve Expert. (n.d.). Retrieved September 28, 2013, from http://www.curveexpert.net/
APPENDIX A

Table A. Data gathered from the census of the Philippines

Year (X)    Population (Y)    Source of Data            Average annual rate of increase (%)
1799        1,502,574         Fr. Buzeta
1800        1,561,251         Fr. Zuniga                3.91
1812        1,933,331         Cedulas                   1.80
1819        2,106,230         Cedulas                   1.23
1829        2,593,287         Church                    2.10
1840        3,096,031         Local Officials           1.62
1850        3,857,424         Fr. Buzeta                2.22
1858        4,290,381         Bowring                   1.34
1870        4,712,006         Guia de Manila            0.78
1877        5,567,685         Census                    2.41
1887        5,984,727         Census                    0.72
1896        6,261,339         Prof. Plehn's estimate    0.50
1903        7,635,426         Census                    2.87
1918        10,314,310        Census                    2.03
1939        16,000,303        Census                    2.11
1948        19,234,182        Census                    2.07
1960        27,087,685        Census                    2.89
1970        36,684,486        Census                    3.08
1975        42,070,660        Census                    2.78
1980        48,098,460        Census                    2.71
1990        60,703,206        Census                    2.35
1995        68,616,536        Census                    2.32
2000        76,506,928        Census                    2.34
2007        88,566,732        Census                    2.04
2010        92,337,852        Census                    1.9
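The source does not state how the average annual rate of increase was computed. As an illustration only, the Python sketch below assumes a compound (geometric) annual growth rate between consecutive counts; the function name `annual_growth_rate` is hypothetical. Under that assumption the first few rates in the table are reproduced to two decimal places, but this has not been confirmed for every row, and some entries may refer to a different base year.

```python
def annual_growth_rate(pop_start, pop_end, years):
    """Compound annual growth rate, in percent, between two counts.

    Assumed formula (not stated in the source):
        rate = ((pop_end / pop_start) ** (1 / years) - 1) * 100
    """
    return ((pop_end / pop_start) ** (1.0 / years) - 1.0) * 100.0


# First three intervals from Table A.
rate_1800 = annual_growth_rate(1502574, 1561251, 1800 - 1799)
rate_1812 = annual_growth_rate(1561251, 1933331, 1812 - 1800)
rate_1819 = annual_growth_rate(1933331, 2106230, 1819 - 1812)

print(f"1799-1800: {rate_1800:.2f}%")
print(f"1800-1812: {rate_1812:.2f}%")
print(f"1812-1819: {rate_1819:.2f}%")
```

A constant annual rate compounds in the same way as the interest analogy in the Introduction, which is why the geometric form is used here in place of a simple average of yearly differences.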
APPENDIX B

Proposed algorithm used to obtain the Least Squares Polynomial

    // The linear regression function takes the x-values and y-values of the
    // data in the column vectors 'X' and 'Y' and finds the best-fit line
    // through the data points. It returns the slope and y-intercept of the
    // line as well as the coefficient of determination. The function call
    // should be of the form: '[m,b,r2]=Linear_Regression(x,y)'
    function [slope, y_int, r_sq]=Linear_Regression(X, Y)
        // determine the number of data points
        n=size(X,'r');
        // initialize each summation
        sum_x=0;
        sum_y=0;
        sum_xy=0;
        sum_x_sq=0;
        sum_y_sq=0;
        // calculate each sum required to find the slope, y-intercept and r_sq
        for i=1:n
            sum_x=sum_x+X(i);
            sum_y=sum_y+Y(i);
            sum_xy=sum_xy+X(i)*Y(i);
            sum_x_sq=sum_x_sq+X(i)*X(i);
            sum_y_sq=sum_y_sq+Y(i)*Y(i);
        end
        // determine the average x and y values for the
        // y-intercept calculation
        x_bar=sum_x/n;
        y_bar=sum_y/n;
        // calculate the slope, y-intercept and r_sq and return the results
        slope=(n*sum_xy-sum_x*sum_y)/(n*sum_x_sq-sum_x^2);
        y_int=y_bar-slope*x_bar;
        r_sq=((n*sum_xy-sum_x*sum_y)/(sqrt(n*sum_x_sq-sum_x^2)*sqrt(n*sum_y_sq-sum_y^2)))^2;
        // determine the appropriate axes size
        axes_size=[min(X)-0.1*(max(X)-min(X)),min(Y)-0.1*(max(Y)-min(Y)),max(X)+0.1*(max(X)-min(X)),max(Y)+0.1*(max(Y)-min(Y))];
        // plot the provided data
        plot2d(X,Y,style=-4,rect=axes_size);
        // plot the calculated regression line
        plot2d(X,(slope*X+y_int));
    endfunction

    X=[1799;1800;1812;1819;1829;1840;1850;1858;1870;1877;1887;1896;1903;1918;1939;1948;1960;1970;1975;1980;1990;1995;2000;2007;2010];
    Y=[14.2226902;14.26099798;14.47475498;14.56041018;14.76843674;14.94563153;15.16551016;15.2718861;15.36562428;15.53248991;15.60472128;15.64990462;15.84830929;16.14904281;16.58811822;16.77219957;17.11458975;17.4178645;17.55486114;17.6887607;17.92150707;18.04404411;18.15289186;18.29926686;18.34096471];
    [P, Q, R]=Linear_Regression(X, Y)