
MM 501

This document is a statistical analysis report submitted by two students, Dhruvi Babubhai Nakrani and Vishal Pravinbhai Parmar, at Sardar Vallabhbhai National Institute of Technology under the supervision of Dr. Neeru Adlakha. The report analyzes and compares the population of different states in India between 2018 and 2019, calculates various statistical measures of the population data, and examines the correlation between population and internet users in different states.

Uploaded by

Dhruvi Nakrani



Statistical Analysis of Population of Different States in India

submitted in partial fulfillment of the requirements

for the course of MM-501 (Mini Project)
in
5th Year (9th Semester) of

(Five Year Integrated Program)

in Mathematics
submitted by

Dhruvi Babubhai Nakrani

Vishal Pravinbhai Parmar

under the supervision of

Dr Neeru Adlakha
Professor
(Department of Mathematics and Humanities)

DEPARTMENT OF MATHEMATICS AND HUMANITIES

SARDAR VALLABHBHAI NATIONAL INSTITUTE OF
TECHNOLOGY
SURAT-395007, GUJARAT, INDIA

November 2022
DECLARATION

We hereby declare that the report entitled “Statistical analysis of Population of different
states in India” is a genuine record of work carried out by us and no part of this report has
been submitted to any University or Institution for the completion of any course.

Dhruvi Babubhai Nakrani

Admission No.: I18MA015

Vishal Pravinbhai Parmar

Admission No.: I18MA016

Department of Mathematics & Humanities

Sardar Vallabhbhai National Institute of Technology
Surat-395007

Date: November 2022

Place: Surat

Acknowledgment

As students of the 5-Year Integrated M.Sc. program in Mathematics at Sardar Vallabhbhai National Institute of Technology, Surat, we are very grateful to many people. First, we would like to express our sincere appreciation to our supervisor, Dr. Neeru Adlakha, for her guidance throughout this work. Working with her has been a great opportunity and a pleasure for us. We are also thankful to the Director of SVNIT; to Dr. Jayesh M. Dhodiya, Head of the Department of Mathematics & Humanities; and to all other faculty members, research scholars and non-teaching staff of our department for their regular help, moral support and encouragement.

Dhruvi Nakrani - I18MA015

Vishal Parmar - I18MA016

Contents

1 Abstract
2 Introduction
3 Literature Review
4 Relevant Theory
5 Methodology
6 Comparison of the Population of Different States in India between 2018 and 2019
7 A Statistical Evaluation of our Main Data Set
7.1 Calculation of Mean
7.2 Calculation of Median
7.3 Calculation of Variance (σ²) and Standard Deviation (σ)
7.4 Calculation of Moments
7.5 Calculation of the Coefficient of Variation
7.6 Calculation of Skewness
7.7 Calculation of Kurtosis
8 Correlation Between Population and Internet Users of Different States of India
9 Conclusion and Future Scope

Chapter 1

Abstract

India occupies the greater part of South Asia and is one of the oldest civilizations in the world, with a rich cultural heritage. Its population is highly diverse, consisting of many ethnic groups and speaking several languages.
India is a federal union comprising 28 states and 8 union territories, 36 entities in total. According to the Census Population Projection Report, India’s population in 2022 is estimated at 1,375,586,000 (1.38 billion, or 138 crore). India has witnessed huge growth in its population over the last 50 years, and according to estimates it will become the most populous country in the world by 2028, overtaking China. Nearly half of India’s total population lives in five states: Uttar Pradesh, Maharashtra, Bihar, West Bengal and Andhra Pradesh. In this project we try to predict the population in 2036 based on data from 2018 and 2019. We also evaluate several statistical measures of interest for the data used, and finally try to draw some correlation between the number of internet users and the population of different states in India, year by year.

Chapter 2

Introduction

The populations of Indian states such as Uttar Pradesh, Maharashtra and Bihar exceed those of many countries around the world. Uttar Pradesh, the most populous state in India, is currently home to over 237 million people. Most Indian states are very densely populated compared with other places in the world, which creates a risk of environmental imbalance. The population growth rate of many highly populated states in India is 5% to 18% per decade.
First, we note down some basic relevant statistical concepts. We then dive into a state-by-state comparison of the yearly population of different states between 2018 and 2019, and try to draw inferences from it. The data sets were mainly taken from the Census of India (the Government’s official website for population demographics). The tools used for data analysis and visualization were Excel and Python in Jupyter notebooks. Using the tables and the visualizations afforded by the graphs, we were able to relate the trends in the figures to the actual on-ground situation with respect to population in India.
Next, we computed statistical measures such as the mean, median, variance, coefficient of variation, standard deviation, skewness and kurtosis for the 2019 population data of the 15 selected states. These measures give us information about the distribution of the data at hand.
Lastly, we tried to correlate the number of internet users with the population of different states of India year-wise. We obtained only a moderate correlation, and we explored the reasons why this might be the case.

Chapter 3

Literature Review

There is extensive literature on the impact of the population of different Indian states on various aspects of life, the economy, livelihoods, the environment, etc. Both governmental and non-governmental sources carry reports on the population of India as a whole as well as of its individual states.

• The detailed reports and data sets on the population of different states for different years, published on the official website of the Census of India (Government of India), are among the most credible sources.

• Non-governmental websites such as StatisticsTimes and Statista also provide detailed analyses of population growth in different states of India over past years.

• More general online sources such as Wikipedia also carry detailed information on different aspects of the population, along with many further sources and references.

Chapter 4

Relevant Theory

• Mean of a set of observations is the sum of all observations divided by the total number of observations. Thus, if $X_1, X_2, \ldots, X_N$ represent the values of N items or observations, the arithmetic mean, denoted by $\bar{X}$ or µ, is defined as:
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N}.$$
While the essence remains the same, the formula changes for grouped data, or if we want to take a weighted mean.
• Median is a measure of central tendency that gives the middle value of the data when the observations are arranged in ascending (or descending) order. For ungrouped data,
$$\text{Median} = \text{size of the } \left(\frac{N+1}{2}\right)^{\text{th}} \text{ observation}.$$
For grouped data,
$$\text{Median} = L + \frac{N/2 - cf}{f} \times i,$$
where
L = lower limit of the median class, i.e., the class in which the middle observation of the distribution lies,
cf = cumulative frequency of the class preceding the median class,
f = frequency of the median class, and
i = class interval (width) of the median class.
• Mode is the data value with the highest frequency. For ungrouped data, one can directly count how many times each value repeats; the value that occurs the maximum number of times is the modal value. For grouped data, the following formula is used to calculate the mode:
$$M_o = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i,$$
where
L = lower limit of the modal class,
$f_0$ = frequency of the class preceding the modal class,
$f_1$ = frequency of the modal class,
$f_2$ = frequency of the class succeeding the modal class, and
i = size (width) of the modal class.

• Variance is defined as the expectation of squared deviations about the mean of given data. It is a measure of spread or dispersion. For N observations $X_1, \ldots, X_N$ with mean µ,
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (X_i - \mu)^2.$$
• Standard Deviation (σ) is the square root of the variance. It is also a measure of spread or dispersion.
• Percentage Growth is given mathematically as
$$\text{Percentage Growth} = \frac{\text{Final Value} - \text{Initial Value}}{\text{Initial Value}} \times 100.$$
• Correlation Coefficient is a statistical measure of the strength of the relationship between the relative movements of two variables. Denoted by the symbol r, it summarizes in one figure the direction and degree of correlation. For our purposes, we have used the Karl Pearson Coefficient of Correlation, which assumes a linear relationship between the variables. Let X and Y be two variables whose coefficient of correlation we are interested in. Then,
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2}\,\sqrt{\sum (Y - \bar{Y})^2}}.$$
• Trend Line: A line on a graph showing the general direction that a group of points seems to follow.
• Linear Regression attempts to model the relationship between two variables by fit-
ting a linear equation to observed data points.
• Regression Line: The line corresponding to the fitted linear equation above is the
regression line.
• The first thing one usually notices about a distribution’s shape is whether it has one mode (peak) or more than one. If it’s unimodal (has just one peak), like most data sets, the next thing to notice is whether it’s symmetric or skewed to one side. If the bulk of the data is at the left and the right tail is longer, we say that the distribution is skewed right or positively skewed. If the peak is toward the right and the left tail is longer, we say that the distribution is skewed left or negatively skewed. This asymmetry is measured by the skewness, which is based on the third moment about the mean: $\gamma_1 = \mu_3/\mu_2^{3/2}$, whose square is $\beta_1 = \mu_3^2/\mu_2^3$.
• The other common measure of shape is called the kurtosis. As skewness involves the
third moment of the distribution, kurtosis involves the fourth moment. Higher values
indicate a higher, sharper peak; lower values indicate a lower, less distinct peak. The
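reference standard is a normal distribution, which has a kurtosis of 3. For this reason, the excess kurtosis, defined simply as kurtosis − 3, is often presented instead. In terms of moments about the mean, the kurtosis is $\beta_2 = \mu_4/\mu_2^2$ and the excess kurtosis is $\gamma_2 = \beta_2 - 3$.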
– A normal distribution has kurtosis exactly 3 (excess kurtosis exactly 0). Any dis-
tribution with kurtosis ≈ 3 (excess ≈ 0) is called mesokurtic.
– A distribution with kurtosis < 3 (excess kurtosis < 0) is called platykurtic.
Compared to a normal distribution, its tails are shorter and thinner, and often its
central peak is lower and broader.
– A distribution with kurtosis > 3 (excess kurtosis > 0) is called leptokurtic.
Compared to a normal distribution, its tails are longer and fatter, and often its
central peak is higher and sharper.
• Coefficient of Variation:
$$\text{C.V.} = \frac{\sigma}{\mu} \times 100\%.$$
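The grouped-data formulas for the median and the mode above can be checked numerically. Below is a minimal Python sketch; Python is the language used in Chapter 5. The function names `grouped_median` and `grouped_mode` are hypothetical, and the frequency table in the example is invented for illustration, not data from this report. The sketch assumes contiguous classes listed in increasing order and a single modal class.

```python
def grouped_median(classes, freqs):
    """Median = L + (N/2 - cf) / f * i for a grouped frequency distribution.

    classes: list of (lower, upper) class limits, in increasing order
    freqs:   list of class frequencies, same length as classes
    """
    n = sum(freqs)
    cf = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        # the median class is the first one whose cumulative frequency reaches N/2
        if f > 0 and cf + f >= n / 2:
            return lower + (n / 2 - cf) / f * (upper - lower)
        cf += f
    raise ValueError("frequencies must contain at least one positive value")


def grouped_mode(classes, freqs):
    """Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * i; f0 or f2 is 0 at either end."""
    k = max(range(len(freqs)), key=lambda j: freqs[j])  # index of the modal class
    lower, upper = classes[k]
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)


# Hypothetical example: five classes of width 10
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 12, 7, 3]
print(grouped_median(classes, freqs))  # 20 + (17.5 - 13) / 12 * 10
print(grouped_mode(classes, freqs))    # 20 + (12 - 8) / (24 - 8 - 7) * 10
```

In this example the median class is 20–30, with cumulative frequency 13 before it, and it is also the modal class. The two results therefore follow directly from the formulas above.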

Chapter 5

Methodology

Publicly available data sets were obtained from sources such as the Census of India’s official website. To handle the data, create graphs and make calculations, we used the Python programming language.
The pandas library was used to pre-process the population figures and reduce them to a more manageable data set. Some exploratory data analysis and relevant calculations were also performed with this library. To visualize the results, the matplotlib library was used to generate several of the plots. In later stages, Excel was also used for this purpose.

• Some required Python libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os
from sympy import symbols, Eq, solve

• To load the excel data sets

data = pd.read_excel("forgraphs.xlsx")

• To calculate the Regression line and plot the graphs

x1 = np.array(range(15)) #regression Line 1
y1 = 1.43097 + 0.29372*x1

x2 = np.array(range(15)) #regression Line 2

y2 = 0.32972 +2.02979*x2

data = pd.read_excel("forgraphs.xlsx") #importing excel file

states = list(data['State'])
p2020 = list(data["p2020"])
p2019 = list(data["p2019"]) #population of 2019
p2018 = list(data["p2018"]) #population of 2018
p2017 = list(data["p2017"]) #population of 2017
p2016 = list(data["p2016"]) #population of 2016
p2015 = list(data["p2015"]) #population of 2015
p2014 = list(data["p2014"]) #population of 2014

i2020 = list(data["i2020"])
i2019 = list(data["i2019"]) #internetUsers of 2019
i2018 = list(data["i2018"]) #internetUsers of 2018

i2017 = list(data["i2017"]) #internetUsers of 2017
i2016 = list(data["i2016"]) #internetUsers of 2016
i2015 = list(data["i2015"]) #internetUsers of 2015
i2014 = list(data["i2014"]) #internetUsers of 2014

x_axis = np.arange(len(states))

#Comparison Graph
plt.figure(figsize=(15,8), dpi=120)
plt.xticks(x_axis, states, rotation=45)
plt.plot(x_axis, p2019, marker="o")
plt.plot(x_axis, p2018, marker="o")
plt.ylabel("Population in Crores")
plt.legend(["2019","2018"])  # same order as the plot() calls: p2019 first, then p2018
plt.savefig('statevspop18-19.png', dpi=300, bbox_inches='tight')
plt.show()

#Bar Graph
width = 0.1
plt.figure(figsize=(15,8), dpi=120)
plt.bar(x_axis, p2020, width=width, label='2020')
plt.bar(x_axis+width, p2019, width=width, label='2019')
plt.bar(x_axis+width*2, p2018, width=width, label='2018')
plt.bar(x_axis+width*3, p2017, width=width, label='2017')
plt.bar(x_axis+width*4, p2016, width=width, label='2016')
plt.bar(x_axis+width*5, p2015, width=width, label='2015')
plt.bar(x_axis+width*6, p2014, width=width, label='2014')
plt.xticks(x_axis + width*3, states, rotation=45)  # centre each label under its group of 7 bars
plt.ylabel("Population(in Cr.)")
plt.legend()
plt.savefig('barGraphAllYears.png', dpi=300, bbox_inches='tight')
plt.show()

#Regression Lines
plt.figure(figsize=(15,8), dpi=120)
plt.grid(True)
plt.plot(x1,y1, linestyle='--', color='red')
plt.plot(x2,y2, color='blue')
plt.axvline(0.634, color='gray', linestyle='--')
plt.axhline(1.617, color='gray', linestyle='--')
plt.xlabel("X")
plt.ylabel("Y")
plt.legend(["Y = 1.43097 + 0.29372 X","Y = 0.32972 +2.02979 X"])
plt.savefig('regressionLines.png', dpi=300, bbox_inches='tight')
plt.show()

#Regression Line population

plt.figure(figsize=(15,8), dpi=120)
plt.xticks(x_axis, states, rotation=45)
plt.plot(x_axis, p2019, 'o-')
plt.plot(x2,y2)
plt.xlabel("States")
plt.ylabel("Population (in Cr.)")  # this figure plots p2019, not internet users
plt.legend(["2019","X on Y"])
plt.savefig('regressionLinePopulation.png', dpi=300, bbox_inches='tight')
plt.show()

#Regression Line Internet Users

plt.figure(figsize=(15,8), dpi=120)
plt.xticks(x_axis, states, rotation=45)
plt.plot(x_axis, i2019, 'o-', color='green')
#plt.plot(x1,y1, linestyle='--', color='red')
plt.plot(x2,y2, color='blue')
plt.xlabel("States")
plt.ylabel("Internet Users (in Cr.)")
plt.legend(["2019","Y on X"])
plt.savefig('regressionLineInternet.png', dpi=300, bbox_inches='tight')
plt.show()

#Internet Users During Different Years

width=0.5
plt.figure(figsize=(12,8), dpi=120)
plt.bar(x_axis, i2020, width,label='2020')
plt.bar(x_axis, i2019, width,label='2019')
plt.bar(x_axis, i2018, width,label='2018')
plt.bar(x_axis, i2017, width,label='2017')
plt.bar(x_axis, i2016, width,label='2016')
plt.bar(x_axis, i2015, width,label='2015')
plt.bar(x_axis, i2014, width,label='2014')
plt.xticks(x_axis, states, rotation=45)
plt.xlabel("States")
plt.ylabel("Internet users(in Cr.)")
plt.legend()
plt.savefig('stateInternetUsers2014-20.png', dpi=300, bbox_inches='tight')
plt.show()

#Comparison of Internet users and Population

plt.figure(figsize=(15,8), dpi=120)
plt.xticks(x_axis, states, rotation=45)
plt.grid(True)
plt.plot(x_axis, i2019, 'o-', color='green')
plt.plot(x_axis, p2019, 'o-', color='blue')
plt.ylabel("Internet Users / Population (in Cr.)")  # both series share this axis
plt.legend(["Internet Users","Population"])
plt.savefig('compPopIu.png', dpi=300, bbox_inches='tight')
plt.show()
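The slopes and intercepts of `y1` and `y2` above are entered as constants. As a hedged sketch of deriving least-squares regression lines directly from the data, the snippet below fits two lines to the 2019 figures in crores: the line of Y on X and the line of X on Y. X is population, from Table 6.1. Y is internet users, from Chapter 8 (Table 8.1). The helper `least_squares_line` is a hypothetical name introduced here, and the sketch is not claimed to reproduce the hard-coded constants.

```python
# 2019 populations of the 15 states (Table 6.1), in persons
POP_2019 = [237_882_725, 123_144_223, 124_799_926, 99_609_303, 53_903_393,
            85_358_965, 77_841_267, 81_032_689, 67_562_686, 63_872_399,
            46_356_334, 35_699_443, 38_593_948, 35_607_039, 30_141_373]
X = [p / 1e7 for p in POP_2019]  # population in crores
# 2019 internet users of the same states (Table 8.1), in crores
Y = [7.703, 8.032, 3.934, 2.683, 4.929, 4.140, 4.548, 3.597,
     4.039, 4.018, 1.581, 2.654, 1.741, 1.424, 2.613]


def least_squares_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


a_yx, b_yx = least_squares_line(X, Y)  # Y on X:  Y = a_yx + b_yx * X
a_xy, b_xy = least_squares_line(Y, X)  # X on Y:  X = a_xy + b_xy * Y
print(f"Y on X: Y = {a_yx:.5f} + {b_yx:.5f} X")
print(f"X on Y: X = {a_xy:.5f} + {b_xy:.5f} Y")
```

Both fitted lines pass through the point of means, and the product of the two slopes equals r². This gives a quick consistency check against the correlation coefficient computed in Chapter 8.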

Chapter 6

Comparison of the Population of

Different States in India between 2018
and 2019

We first look at population data for the fifteen most populous states in India: Uttar Pradesh, Maharashtra, Bihar, West Bengal, Andhra Pradesh, Madhya Pradesh, Tamil Nadu, Rajasthan, Karnataka, Gujarat, Odisha, Kerala, Jharkhand, Assam and Punjab. We obtained data for both 2018 and 2019 for these states so that we can comment on deviations from the usual or expected figures for certain states in 2019. The reasons for these deviations will be discussed in detail in the next section. For this purpose, we obtained a publicly available data set from the Census of India’s official website. Figure 6.1 shows the comparison between the populations of the fifteen states in 2018 and 2019.

Figure 6.1: Population of different states in year 2018 and 2019

State 2019 2018
Uttar Pradesh 237,882,725 223,897,418
Maharashtra 123,144,223 124,945,748
Bihar 124,799,926 121,741,741
West Bengal 99,609,303 98,785,114
Andhra Pradesh 53,903,393 87,641,369
Madhya Pradesh 85,358,965 82,961,852
Tamil Nadu 77,841,267 80,288,487
Rajasthan 81,032,689 77,122,315
Karnataka 67,562,686 68,159,821
Gujarat 63,872,399 68,927,491
Odisha 46,356,334 46,172,447
Kerala 35,699,443 34,732,356
Jharkhand 38,593,948 34,149,478
Assam 35,607,039 32,652,597
Punjab 30,141,373 30,471,254

Table 6.1: Population in Different States in Year 2019 and 2018

State 2019 2018 Percentage Growth (%)
Uttar Pradesh 237,882,725 223,897,418 6.2463
Maharashtra 123,144,223 124,945,748 -1.4418
Bihar 124,799,926 121,741,741 2.5120
West Bengal 99,609,303 98,785,114 0.8343
Andhra Pradesh 53,903,393 87,641,369 -38.4954
Madhya Pradesh 85,358,965 82,961,852 2.8894
Tamil Nadu 77,841,267 80,288,487 -3.0481
Rajasthan 81,032,689 77,122,315 5.0704
Karnataka 67,562,686 68,159,821 -0.8761
Gujarat 63,872,399 68,927,491 -7.3339
Odisha 46,356,334 46,172,447 0.3983
Kerala 35,699,443 34,732,356 2.7844
Jharkhand 38,593,948 34,149,478 13.0148
Assam 35,607,039 32,652,597 9.0481
Punjab 30,141,373 30,471,254 -1.0826

Table 6.2: Yearly Population Growth for year 2018 and 2019

Now we plot a bar graph of the population of the most populous states in India (as described above) to compare the years 2020, 2019, 2018, 2017, 2016, 2015 and 2014.

Figure 6.2: Population of different states from year 2020 to 2014

State 2020 2019 2018 2017 2016 2015 2014
Uttar Pradesh 236,693,311 237,882,725 223,897,418 246,035,979 243,209,093 234,125,886 231,048,278
Maharashtra 122,528,502 123,144,223 124,945,748 127,364,900 125,901,512 121,199,429 119,606,250
Bihar 124,175,926 124,799,926 121,741,741 129,077,351 127,594,287 122,828,983 121,214,384
West Bengal 99,111,256 99,609,303 98,785,114 103,023,338 101,839,628 98,036,191 96,747,496
Andhra Pradesh 53,633,876 53,903,393 87,641,369 55,750,892 55,110,329 53,052,106 52,354,731
Madhya Pradesh 84,932,170 85,358,965 82,961,852 88,284,580 87,270,215 84,010,906 82,906,575
Tamil Nadu 77,452,061 77,841,267 80,288,487 80,509,219 79,584,190 76,611,934 75,604,863
Rajasthan 80,627,526 81,032,689 77,122,315 83,810,024 82,847,070 79,752,954 78,704,594
Karnataka 67,224,873 67,562,686 68,159,821 69,878,347 69,075,464 66,495,681 65,621,588
Gujarat 63,553,037 63,872,399 68,927,491 66,061,578 65,302,549 62,863,674 62,037,325
Odisha 46,124,552 46,356,334 46,172,447 47,945,163 47,394,286 45,624,237 45,024,502
Kerala 35,520,946 35,699,443 34,732,356 36,923,015 36,498,780 35,135,648 34,673,787
Jharkhand 38,400,978 38,593,948 34,149,478 39,916,727 39,458,095 37,984,441 37,485,132
Assam 35,429,004 35,607,039 32,652,597 36,827,444 36,404,307 35,044,703 34,584,037
Punjab 29,990,666 30,141,373 30,471,254 31,174,446 30,816,260 29,665,356 29,275,402

Table 6.3: Population of different states from year 2020 to 2014
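The Percentage Growth column of Table 6.2 applies the Percentage Growth formula of Chapter 4 to the 2018 and 2019 columns of Table 6.1. A minimal Python sketch that recomputes it is given below. The names `POPULATION` and `percentage_growth` are illustrative, not part of the original notebook. A pandas version would apply the same formula to the `p2018` and `p2019` columns.

```python
# (2018, 2019) populations from Table 6.1
POPULATION = {
    "Uttar Pradesh": (223_897_418, 237_882_725),
    "Maharashtra": (124_945_748, 123_144_223),
    "Bihar": (121_741_741, 124_799_926),
    "West Bengal": (98_785_114, 99_609_303),
    "Andhra Pradesh": (87_641_369, 53_903_393),
    "Madhya Pradesh": (82_961_852, 85_358_965),
    "Tamil Nadu": (80_288_487, 77_841_267),
    "Rajasthan": (77_122_315, 81_032_689),
    "Karnataka": (68_159_821, 67_562_686),
    "Gujarat": (68_927_491, 63_872_399),
    "Odisha": (46_172_447, 46_356_334),
    "Kerala": (34_732_356, 35_699_443),
    "Jharkhand": (34_149_478, 38_593_948),
    "Assam": (32_652_597, 35_607_039),
    "Punjab": (30_471_254, 30_141_373),
}


def percentage_growth(initial, final):
    """(Final Value - Initial Value) / Initial Value * 100."""
    return (final - initial) / initial * 100


growth = {state: percentage_growth(p2018, p2019)
          for state, (p2018, p2019) in POPULATION.items()}
for state, g in growth.items():
    print(f"{state:<16}{g:9.4f}")
```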

Chapter 7

A Statistical Evaluation of our Main

Data Set

We evaluate statistical measures such as the mean, median, variance, coefficient of variation, standard deviation, skewness and kurtosis for the main data set of interest, namely the population of the fifteen states for the year 2019. The data can be found in Table 6.2. In the calculations below, populations are expressed in crores (1 crore = 10^7), so that, for example, X = 23.7883 corresponds to 237,882,725 people.
X          X − µ      (X − µ)²    (X − µ)³     (X − µ)⁴
23.7883    15.7789    248.9737    3928.5317    61,987.9129
12.3144    4.3051     18.5335     79.7875      343.4893
12.4800    4.4706     19.9865     89.3519      399.4583
9.9609     1.9516     3.8086      7.4327       14.5053
5.3903     -2.6190    6.8593      -17.9648     47.0504
8.5359     0.5265     0.2772      0.1460       0.0769
7.7841     -0.2252    0.0507      -0.0114      0.0026
8.1033     0.0939     0.0088      0.0008       0.0001
6.7563     -1.2531    1.5703      -1.9677      2.4657
6.3872     -1.6221    2.6313      -4.2683      6.9238
4.6356     -3.3737    11.3821     -38.4003     129.5524
3.5699     -4.4394    19.7085     -87.4945     388.4255
3.8594     -4.1500    17.2223     -71.4722     296.6078
3.5607     -4.4487    19.7906     -88.0420     391.6695
3.0141     -4.9952    24.9524     -124.6429    622.6205
ΣX = 120.1406   Σ(X − µ) = 0.0000   Σ(X − µ)² = 395.7558   Σ(X − µ)³ = 3670.9864   Σ(X − µ)⁴ = 64,630.7609

Table 7.1: Table for Calculating the First Four Moments about the Mean (µ = 8.0094; X in crores)

7.1 Calculation of Mean

Mean,
$$\mu = \frac{\sum X}{N} = \frac{23.7883 + 12.3144 + 12.4800 + 9.9609 + 5.3903 + 8.5359 + 7.7841 + 8.1033 + 6.7563 + 6.3872 + 4.6356 + 3.5699 + 3.8594 + 3.5607 + 3.0141}{15} = \frac{120.1406}{15}$$
∴ µ = 8.0094
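The arithmetic above can be reproduced in a few lines of Python. This sketch assumes the 2019 figures of Table 6.1 as input and converts them to crores. The names `POP_2019` and `mean` are illustrative.

```python
# 2019 populations of the 15 states, in persons (Table 6.1)
POP_2019 = [237_882_725, 123_144_223, 124_799_926, 99_609_303, 53_903_393,
            85_358_965, 77_841_267, 81_032_689, 67_562_686, 63_872_399,
            46_356_334, 35_699_443, 38_593_948, 35_607_039, 30_141_373]

X = [p / 1e7 for p in POP_2019]  # convert to crores (1 crore = 10**7)


def mean(values):
    """Arithmetic mean: sum of the observations divided by their number."""
    return sum(values) / len(values)


mu = mean(X)
print(f"sum = {sum(X):.4f} crore, mean = {mu:.4f} crore")
```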

7.2 Calculation of Median
Observations in ascending order are: 3.0141, 3.5607, 3.5699, 3.8594, 4.6356, 5.3903, 6.3872, 6.7563, 7.7841, 8.1033, 8.5359, 9.9609, 12.3144, 12.4800 and 23.7883. Here N = 15 is odd, so
$$M = \text{value of the } \left(\frac{N+1}{2}\right)^{\text{th}} \text{ observation} = \text{value of the } \left(\frac{15+1}{2}\right)^{\text{th}} = 8^{\text{th}} \text{ observation}.$$
∴ M = 6.7563
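Below is a short Python sketch of the same median rule, again assuming the 2019 figures of Table 6.1 in crores. The helper name `median` is illustrative, and the result is cross-checked against the standard library's `statistics.median`.

```python
import statistics

# 2019 populations of the 15 states, in persons (Table 6.1)
POP_2019 = [237_882_725, 123_144_223, 124_799_926, 99_609_303, 53_903_393,
            85_358_965, 77_841_267, 81_032_689, 67_562_686, 63_872_399,
            46_356_334, 35_699_443, 38_593_948, 35_607_039, 30_141_373]
X = [p / 1e7 for p in POP_2019]  # crores


def median(values):
    """Middle value of the sorted data (mean of the two middle values if N is even)."""
    s = sorted(values)
    n = len(s)
    mid = n // 2  # for odd n, 0-based index n // 2 is the ((n + 1) / 2)-th value
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


m = median(X)
assert m == statistics.median(X)  # agrees with the standard library
print(f"median = {m:.4f} crore")
```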

7.3 Calculation of Variance (σ²) and Standard Deviation (σ)

The second moment about the mean, µ2, is the variance σ². Using the column totals of Table 7.1,
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{395.7558}{15}$$
∴ σ² = 26.3837

Thus, the standard deviation is
$$\sigma = \sqrt{26.3837} = 5.1365.$$

Thus, the standard deviation over the....
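The variance in Section 7.3 is the population variance, with divisor N rather than N − 1. Below is a minimal Python sketch under that assumption, using the 2019 figures of Table 6.1 in crores. `population_variance` is an illustrative name. The result is compared with `statistics.pvariance` and `statistics.pstdev` from the standard library.

```python
import math
import statistics

# 2019 populations of the 15 states, in persons (Table 6.1)
POP_2019 = [237_882_725, 123_144_223, 124_799_926, 99_609_303, 53_903_393,
            85_358_965, 77_841_267, 81_032_689, 67_562_686, 63_872_399,
            46_356_334, 35_699_443, 38_593_948, 35_607_039, 30_141_373]
X = [p / 1e7 for p in POP_2019]  # crores


def population_variance(values):
    """sigma^2 = sum((x - mu)^2) / N, dividing by N as in Section 7.3."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


var = population_variance(X)
sd = math.sqrt(var)
# Cross-check against the standard library's population variance / standard deviation
assert math.isclose(var, statistics.pvariance(X))
assert math.isclose(sd, statistics.pstdev(X))
print(f"variance = {var:.4f}, standard deviation = {sd:.4f}")
```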

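7.4 Calculation of Moments

• $\mu_1 = \frac{\sum (X - \mu)}{N} = \frac{0}{15} = 0$
• $\mu_2 = \frac{\sum (X - \mu)^2}{N} = \frac{395.7558}{15} = 26.3837$
• $\mu_3 = \frac{\sum (X - \mu)^3}{N} = \frac{3670.9864}{15} = 244.7324$
• $\mu_4 = \frac{\sum (X - \mu)^4}{N} = \frac{64630.7609}{15} = 4308.7174$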
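All four moments about the mean can be computed with one helper. This sketch assumes the 2019 figures of Table 6.1 in crores and divides by N, as in the calculations above. `central_moment` is an illustrative name.

```python
# 2019 populations of the 15 states, in persons (Table 6.1)
POP_2019 = [237_882_725, 123_144_223, 124_799_926, 99_609_303, 53_903_393,
            85_358_965, 77_841_267, 81_032_689, 67_562_686, 63_872_399,
            46_356_334, 35_699_443, 38_593_948, 35_607_039, 30_141_373]
X = [p / 1e7 for p in POP_2019]  # crores


def central_moment(values, r):
    """r-th moment about the mean: mu_r = sum((x - mu)**r) / N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** r for x in values) / n


moments = {r: central_moment(X, r) for r in range(1, 5)}
for r, m in moments.items():
    print(f"mu_{r} = {m:.4f}")
```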

7.5 Calculation of the Coefficient of Variation

Coefficient of Variation,
$$\text{C.V.} = \frac{\sigma}{\mu} \times 100\% = \frac{5.1365}{8.0094} \times 100\% = 64.13\%$$
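Below is a Python sketch of the coefficient of variation. It assumes the same 2019 data in crores and the population standard deviation (divisor N) used in Section 7.3.

```python
import math

# 2019 populations of the 15 states, in persons (Table 6.1)
POP_2019 = [237_882_725, 123_144_223, 124_799_926, 99_609_303, 53_903_393,
            85_358_965, 77_841_267, 81_032_689, 67_562_686, 63_872_399,
            46_356_334, 35_699_443, 38_593_948, 35_607_039, 30_141_373]
X = [p / 1e7 for p in POP_2019]  # crores

n = len(X)
mu = sum(X) / n
sigma = math.sqrt(sum((x - mu) ** 2 for x in X) / n)  # population SD (divisor N)
cv = sigma / mu * 100  # coefficient of variation, in percent
print(f"C.V. = {cv:.2f}%")
```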

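7.6 Calculation of Skewness
Skewness,
$$\gamma_1 = +\sqrt{\beta_1} = +\sqrt{\frac{\mu_3^2}{\mu_2^3}} = +\sqrt{\frac{(244.7324)^2}{(26.3837)^3}} = +\sqrt{\frac{59893.9619}{18365.7309}} = +\sqrt{3.26118}$$
∴ γ1 = 1.8059
The positive square root is taken because µ3 > 0; γ1 always has the same sign as µ3.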

Thus the data set is .....
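Taken on its own, $\sqrt{\beta_1}$ loses the sign of µ3. A convenient equivalent is $\gamma_1 = \mu_3/\mu_2^{3/2}$, which carries that sign automatically. The sketch below computes both β1 and γ1 this way, assuming the 2019 figures of Table 6.1 in crores. `central_moment` is an illustrative helper.

```python
# 2019 populations of the 15 states, in persons (Table 6.1)
POP_2019 = [237_882_725, 123_144_223, 124_799_926, 99_609_303, 53_903_393,
            85_358_965, 77_841_267, 81_032_689, 67_562_686, 63_872_399,
            46_356_334, 35_699_443, 38_593_948, 35_607_039, 30_141_373]
X = [p / 1e7 for p in POP_2019]  # crores


def central_moment(values, r):
    """r-th moment about the mean, dividing by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** r for x in values) / n


mu2 = central_moment(X, 2)
mu3 = central_moment(X, 3)
beta1 = mu3 ** 2 / mu2 ** 3
gamma1 = mu3 / mu2 ** 1.5  # square root of beta1, with the sign of mu3
print(f"beta1 = {beta1:.4f}, gamma1 = {gamma1:.4f}")
```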

7.7 Calculation of Kurtosis

Kurtosis,
$$\gamma_2 = \beta_2 - 3 = \frac{\mu_4}{\mu_2^2} - 3 = \frac{4308.7174}{(26.3837)^2} - 3 = \frac{4308.7174}{696.1008} - 3$$
∴ γ2 = 6.1898 − 3
∴ γ2 = 3.1898
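The kurtosis calculation, together with the classification rule from Chapter 4 (mesokurtic, platykurtic, leptokurtic), can be sketched in Python as follows. The data are again assumed to be the 2019 figures of Table 6.1 in crores. The helper and variable names are illustrative. The strict sign test used for classification simplifies the "approximately 3" wording of Chapter 4.

```python
# 2019 populations of the 15 states, in persons (Table 6.1)
POP_2019 = [237_882_725, 123_144_223, 124_799_926, 99_609_303, 53_903_393,
            85_358_965, 77_841_267, 81_032_689, 67_562_686, 63_872_399,
            46_356_334, 35_699_443, 38_593_948, 35_607_039, 30_141_373]
X = [p / 1e7 for p in POP_2019]  # crores


def central_moment(values, r):
    """r-th moment about the mean, dividing by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** r for x in values) / n


mu2 = central_moment(X, 2)
mu4 = central_moment(X, 4)
beta2 = mu4 / mu2 ** 2
gamma2 = beta2 - 3  # excess kurtosis

# Classification following Chapter 4 (strict sign test, a simplification)
if gamma2 > 0:
    label = "leptokurtic"
elif gamma2 < 0:
    label = "platykurtic"
else:
    label = "mesokurtic"
print(f"beta2 = {beta2:.4f}, gamma2 = {gamma2:.4f} ({label})")
```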

Chapter 8

Correlation Between Population and

Internet Users of Different States of
India

We shall now examine how strongly the population data of the different states are correlated with the number of internet users in the same states in 2019. We expect the number of internet users to increase as the population increases, and similarly to decrease as the population decreases.
The data for internet users in different states of India per year were obtained from the official website of the Department of Telecommunications, Government of India.

State             X (Population, 2019)    Y (Internet users, 2019)
Uttar Pradesh     23.7883                 7.7030
Maharashtra       12.3144                 8.0320
Bihar             12.4800                 3.9340
West Bengal       9.9609                  2.6830
Andhra Pradesh    5.3903                  4.9290
Madhya Pradesh    8.5359                  4.1400
Tamil Nadu        7.7841                  4.5480
Rajasthan         8.1033                  3.5970
Karnataka         6.7563                  4.0390
Gujarat           6.3872                  4.0180
Odisha            4.6356                  1.5810
Kerala            3.5699                  2.6540
Jharkhand         3.8594                  1.7410
Assam             3.5607                  1.4240
Punjab            3.0141                  2.6130
Total             ΣX = 120.1406           ΣY = 57.6360

Table 8.1: Number of Internet Users (Y) and Population (X) of year 2019 (in crore)

For our purpose, we shall use the Karl Pearson Coefficient of Correlation (r), given by
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2}\,\sqrt{\sum (Y - \bar{Y})^2}},$$
where X and Y are the variables being examined for correlation. One thing to note about this coefficient is that it assumes a linear relationship between the variables. We shall calculate it using the following equivalent formula, where dX = X − µx and dY = Y − µy:
$$r = \frac{N\sum dX\,dY - \sum dX \sum dY}{\sqrt{N\sum dX^2 - \left(\sum dX\right)^2}\,\sqrt{N\sum dY^2 - \left(\sum dY\right)^2}}.$$

Now, µx = 8.0094 has already been calculated in Section 7. We calculate the same for Y:
$$\mu_y = \frac{7.703 + 8.032 + 3.934 + 2.683 + 4.929 + 4.140 + 4.548 + 3.597 + 4.039 + 4.018 + 1.581 + 2.654 + 1.741 + 1.424 + 2.613}{15} = \frac{57.6360}{15}$$
∴ µy = 3.8424

X          Y        dX         dY         dX·dY      dX²         dY²
23.7883    7.703    15.7789    3.8606     60.9160    248.9737    14.9042
12.3144    8.032    4.3051     4.1896     18.0366    18.5335     17.5527
12.4800    3.934    4.4706     0.0916     0.4095     19.9865     0.0084
9.9609     2.683    1.9516     -1.1594    -2.2627    3.8086      1.3442
5.3903     4.929    -2.6190    1.0866     -2.8458    6.8593      1.1807
8.5359     4.140    0.5265     0.2976     0.1567     0.2772      0.0886
7.7841     4.548    -0.2252    0.7056     -0.1589    0.0507      0.4979
8.1033     3.597    0.0939     -0.2454    -0.0230    0.0088      0.0602
6.7563     4.039    -1.2531    0.1966     -0.2464    1.5703      0.0387
6.3872     4.018    -1.6221    0.1756     -0.2848    2.6313      0.0308
4.6356     1.581    -3.3737    -2.2614    7.6293     11.3821     5.1139
3.5699     2.654    -4.4394    -1.1884    5.2758     19.7085     1.4123
3.8594     1.741    -4.1500    -2.1014    8.7208     17.2223     4.4159
3.5607     1.424    -4.4487    -2.4184    10.7587    19.7906     5.8487
3.0141     2.613    -4.9952    -1.2294    6.1411     24.9524     1.5114
ΣX = 120.1406  ΣY = 57.6360  ΣdX = 0.0000  ΣdY = 0.0000  ΣdX·dY = 112.2229  ΣdX² = 395.7558  ΣdY² = 54.0086

Table 8.2: Table for calculating Correlation Coefficient

$$r = \frac{15 \times 112.2229 - (0)(0)}{\sqrt{15 \times 395.7558 - (0)^2}\,\sqrt{15 \times 54.0086 - (0)^2}} = \frac{1683.3435}{2192.9885}$$
∴ r = 0.7676

Thus, the two data sets are highly correlated (|r| > 0.75). However, we are also interested in visualizing these data sets, so that we can better comment on the relationships at work and see what inferences can be drawn.
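As a cross-check of the hand calculation, the Karl Pearson coefficient can be computed in Python in two ways: from deviations about the means, and from the equivalent raw-sum form. This sketch assumes X is the 2019 population of Table 6.1 and Y the 2019 internet users of Table 8.1, both in crores. `pearson_r` and `pearson_r_raw` are illustrative names.

```python
import math

# 2019 populations of the 15 states, in persons (Table 6.1)
POP_2019 = [237_882_725, 123_144_223, 124_799_926, 99_609_303, 53_903_393,
            85_358_965, 77_841_267, 81_032_689, 67_562_686, 63_872_399,
            46_356_334, 35_699_443, 38_593_948, 35_607_039, 30_141_373]
X = [p / 1e7 for p in POP_2019]  # population in crores
# 2019 internet users (Table 8.1), in crores
Y = [7.703, 8.032, 3.934, 2.683, 4.929, 4.140, 4.548, 3.597,
     4.039, 4.018, 1.581, 2.654, 1.741, 1.424, 2.613]


def pearson_r(x, y):
    """Karl Pearson coefficient from deviations about the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    dx = [a - x_bar for a in x]
    dy = [b - y_bar for b in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / (math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy)))


def pearson_r_raw(x, y):
    """Equivalent computational form using raw sums instead of deviations."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / (math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2))


r = pearson_r(X, Y)
assert math.isclose(r, pearson_r_raw(X, Y), rel_tol=1e-9)  # both forms agree
print(f"r = {r:.4f}")
```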

Figure 8.1: Regression Lines

Figure 8.2: Regression line with Population year 2019

Figure 8.3: Regression line with Population year 2019

Figure 8.4: Internet Users during different years

Figure 8.5: Comparison of Internet user and Population year 2019

Chapter 9

Conclusion and Future Scope
