Final P General Class
Data Encoding
Microsoft Excel
Microsoft Excel is a spreadsheet developed by Microsoft for Windows, macOS, Android and other
platforms. It features calculation, graphing tools, pivot tables, and a macro programming language
called Visual Basic for Applications (VBA). It has been a very widely used spreadsheet on these
platforms, especially since version 5 in 1993, and it has replaced Lotus 1-2-3 as the industry
standard for spreadsheets. Excel forms part of Microsoft Office. In 1995, Excel was released as
part of Microsoft Office 95.
1. Graphing: This package plays a very important role in graphing, as it can produce a variety
of charts that different departments can use to represent statistical data in a more visual way.
Cell: A cell in a Microsoft Excel worksheet is a rectangular box with a default gray border,
although the user can change the border color. Each cell is identified by the intersection of a
column letter and a row number (for example, B3 is column B, row 3). Any combination of numbers
or words can be entered in a cell.
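Because each cell is named by its column letters followed by its row number, an address can be turned into numeric coordinates with base-26 counting over the letters A to Z. The sketch below is only an illustration: `parse_cell_address` is a hypothetical helper, not an Excel feature, and it assumes plain A1-style addresses without `$` signs or sheet names.

```python
import re


def parse_cell_address(address: str) -> tuple[int, int]:
    """Convert an A1-style address such as 'B3' or 'AA10' to (row, column), both 1-based."""
    match = re.fullmatch(r"([A-Z]+)([1-9][0-9]*)", address.upper())
    if match is None:
        raise ValueError(f"not an A1-style cell address: {address!r}")
    letters, digits = match.groups()
    column = 0
    for ch in letters:
        # Column letters count in base 26 with A=1 ... Z=26 (there is no zero digit).
        column = column * 26 + (ord(ch) - ord("A") + 1)
    return int(digits), column


print(parse_cell_address("B3"))    # (3, 2)
print(parse_cell_address("AA10"))  # (10, 27)
```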
Excel Worksheet: An Excel Worksheet is a single spreadsheet that contains cells organized
by rows and columns. A worksheet begins with row number 1 and column A. Each cell can
contain a number, text or formula.
Sheet Tab: Each worksheet has a tab at the bottom of the workbook window with the name
of the worksheet on it.
Toolbar: Sometimes referred to simply as a bar, the toolbar is a row of boxes, often at the top
of an application window, that control various functions of the software. The boxes often
contain images that correspond to the functions they control, as demonstrated in the image below.
Formula Bar:
The Formula Bar is a toolbar at the top of the Microsoft Excel spreadsheet window that you can
use to enter a formula, or copy an existing one, into cells or charts. It is labeled with the
function symbol (fx). Clicking in the Formula Bar, or typing an equals sign (=) in a cell,
activates it.
Data Tabulation
The process of placing classified data into tabular form is known as Tabulation.
A table is a systematic arrangement of statistical data in rows and columns. Rows are horizontal
arrangements, whereas columns are vertical arrangements. A table may be simple, double or complex
depending upon the type of classification.
Types of Tabulation:
1. Simple Tabulation or One-way Tabulation:
When data are tabulated according to one characteristic, it is said to be simple tabulation.
Ex- Religion
2. Double Tabulation or Two-way Tabulation:
When data are tabulated according to two characteristics, it is said to be double tabulation.
Ex- Religion and Level of Education
3. Complex Tabulation:
When data are tabulated according to many characteristics, it is said to be complex tabulation.
Ex- Religion, Age-sex and Literacy etc.
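A minimal Python sketch of simple and double tabulation, assuming a small set of hypothetical survey records (the religions, education levels and counts are invented for illustration): a one-way table counts records by one characteristic, and a two-way table counts them by a pair of characteristics.

```python
from collections import Counter

# Hypothetical survey records: (religion, level of education).
records = [
    ("Hindu", "Primary"), ("Muslim", "Secondary"), ("Hindu", "Secondary"),
    ("Christian", "Primary"), ("Hindu", "Primary"), ("Muslim", "Primary"),
]

# Simple (one-way) tabulation: frequencies by one characteristic.
one_way = Counter(religion for religion, _ in records)

# Double (two-way) tabulation: frequencies by two characteristics together.
two_way = Counter(records)

religions = sorted(one_way)
levels = sorted({level for _, level in records})
print("Religion".ljust(10), *(lvl.ljust(10) for lvl in levels), "Total")
for rel in religions:
    counts = [two_way[(rel, lvl)] for lvl in levels]  # missing pairs count as 0
    print(rel.ljust(10), *(str(c).ljust(10) for c in counts), one_way[rel])
```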
Data Manipulation:
Data manipulation is the process of changing data in an effort to make it easier to read or
be more organized. For example, a log of data could be organized in alphabetical order,
making individual entries easier to locate. Data manipulation is often used on web server
logs to allow a website owner to view their most popular pages as well as their traffic
sources.
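As a rough illustration of the web-log example, the sketch below sorts hypothetical log entries alphabetically and counts the most popular pages and traffic sources. It assumes a simplified two-field line format (`page referrer`); real server logs contain more fields and would need proper parsing.

```python
from collections import Counter

# Hypothetical, simplified web-server log lines: "<page> <referrer>".
log_lines = [
    "/index.html google.com",
    "/about.html bing.com",
    "/index.html twitter.com",
    "/contact.html google.com",
    "/index.html google.com",
]

# Alphabetical order makes individual entries easier to locate.
for line in sorted(log_lines):
    print(line)

pages = Counter(line.split()[0] for line in log_lines)
sources = Counter(line.split()[1] for line in log_lines)
print(pages.most_common(1))    # most popular page
print(sources.most_common(1))  # main traffic source
```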
Data Restoration:
Data Restoration is the process of salvaging inaccessible, corrupted or damaged data from
secondary storage, removable media or files when the data they store cannot be accessed in the
normal way.
A diagram is a visual form for the presentation of statistical data. Diagrams refer to various
types of devices such as bars, circles, maps, pictorials, cartograms, etc. These devices can take
many attractive forms. Strictly speaking, these are not graphic devices. Diagrams do not add any
new meaning to the statistical facts, but they exhibit the results more clearly. An ordinary
person can understand pictures and diagrams more easily than figures.
There are various diagrammatic devices by which statistical data can be presented. We shall
discuss a few of the most commonly used ones. The following are the common types of diagrams:
i. One-Dimensional Diagram:
A population pyramid, also called age pyramid or age picture, is a graphical illustration that
shows the distribution of various age groups in a population (typically that of a country or region
of the world), which forms the shape of a pyramid when the population is growing.
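The shape of a pyramid can be previewed even without charting software. This sketch draws a text pyramid from the Bihar 2011 percentages given in the table below, with male bars on the left and female bars on the right; the `scale` of two characters per percentage point is an arbitrary choice. (In a spreadsheet, a common approach is to enter the male percentages as negative values in a bar chart so the bars extend to the left.)

```python
# Age-wise % of male and female population of Bihar, 2011 (from the table in this report).
BIHAR = [
    ("0-9", 13.86093671, 12.9495),
    ("10-14", 7.062793677, 6.36095),
    ("15-19", 5.102013145, 4.03305),
    ("20-24", 4.06051056, 3.71678),
    ("25-29", 3.692224238, 3.70604),
    ("30-34", 3.385470696, 3.41878),
    ("35-39", 3.190250921, 3.03686),
    ("40-44", 2.65855031, 2.32303),
    ("45-49", 2.13122499, 1.98694),
    ("50-54", 1.743968786, 1.4507),
    ("55-59", 1.282939208, 1.41406),
    ("60-64", 1.504325868, 1.39647),
    ("65-69", 1.064379081, 0.95092),
    ("70-74", 0.756425878, 0.57394),
    ("75-79", 0.287352592, 0.24351),
    ("80+", 0.347741643, 0.30739),
]


def text_pyramid(rows, scale=2.0):
    """Return lines of a text population pyramid: male bars left, female bars right.

    `scale` is the number of characters drawn per percentage point.
    The oldest age group is printed at the top, as in a drawn pyramid.
    """
    width = max(round(max(m, f) * scale) for _, m, f in rows)
    lines = []
    for age, male, female in reversed(rows):
        left = "#" * round(male * scale)
        right = "#" * round(female * scale)
        lines.append(f"{left:>{width}} |{age:^7}| {right}")
    return lines


for line in text_pyramid(BIHAR):
    print(line)
```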
Advantages of age and sex pyramid:
Disadvantages:
Table: Age-wise % of male and female population of Bihar, 2011
Bihar
Age group Male % Female %
0-9 13.86093671 12.9495
10-14 7.062793677 6.36095
15-19 5.102013145 4.03305
20-24 4.06051056 3.71678
25-29 3.692224238 3.70604
30-34 3.385470696 3.41878
35-39 3.190250921 3.03686
40-44 2.65855031 2.32303
45-49 2.13122499 1.98694
50-54 1.743968786 1.4507
55-59 1.282939208 1.41406
60-64 1.504325868 1.39647
65-69 1.064379081 0.95092
70-74 0.756425878 0.57394
75-79 0.287352592 0.24351
80+ 0.347741643 0.30739
Figure: Age-sex pyramid (Bihar), 2011, showing male % and female % by age group (0-9 at the base to 80+ at the top).
Kerala
Age group Male % Female %
0-9 7.643403106 7.36399
10-14 4.311804283 4.1468
15-19 3.980330566 3.84235
20-24 3.892012888 4.09625
25-29 3.607794957 4.19553
30-34 3.380772409 3.97729
35-39 3.481462892 4.24869
40-44 3.348430513 3.88077
45-49 3.312993169 3.72452
50-54 2.790371746 2.98743
55-59 2.581619237 2.63961
60-64 2.053052635 2.1861
65-69 1.376117249 1.62684
70-74 0.978563343 1.21903
75-79 0.624234847 0.87814
80+ 0.612341491 1.01134
Figure: Age-sex pyramid (Kerala), showing male % and female % by age group (0-9 at the base to 80+ at the top).
Interpretation: The calculated values and the age-sex pyramid show that in Bihar there is no
large disparity between the percentage of males and the percentage of females. In 2011 the male
population was higher than the female population in most age groups. In this state the working
population (ages 15-59) was 28254194 for males and 26013423 for females, and the dependent
population was 100529600 for males and 12024367 for females. In Kerala, on the other hand, the
working population was very high: 10136879 for males and 11210329 for females, while the
dependent population was 5873236 for males and 18.43224661 for females. So we can say that the
male-female disparity is low in Kerala.
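The working and dependent shares discussed above can also be summarised as a dependency ratio, (population aged 0-14 and 60+) / (population aged 15-59) × 100. The sketch below computes it from the percentage tables for Bihar and Kerala; the 15-59 working-age band follows the interpretation, while the ratio itself is a standard demographic measure that the report does not use, so treat the result as a supplementary check.

```python
# (age group, male %, female %) from the Bihar and Kerala tables in this report.
BIHAR = [
    ("0-9", 13.86093671, 12.9495), ("10-14", 7.062793677, 6.36095),
    ("15-19", 5.102013145, 4.03305), ("20-24", 4.06051056, 3.71678),
    ("25-29", 3.692224238, 3.70604), ("30-34", 3.385470696, 3.41878),
    ("35-39", 3.190250921, 3.03686), ("40-44", 2.65855031, 2.32303),
    ("45-49", 2.13122499, 1.98694), ("50-54", 1.743968786, 1.4507),
    ("55-59", 1.282939208, 1.41406), ("60-64", 1.504325868, 1.39647),
    ("65-69", 1.064379081, 0.95092), ("70-74", 0.756425878, 0.57394),
    ("75-79", 0.287352592, 0.24351), ("80+", 0.347741643, 0.30739),
]
KERALA = [
    ("0-9", 7.643403106, 7.36399), ("10-14", 4.311804283, 4.1468),
    ("15-19", 3.980330566, 3.84235), ("20-24", 3.892012888, 4.09625),
    ("25-29", 3.607794957, 4.19553), ("30-34", 3.380772409, 3.97729),
    ("35-39", 3.481462892, 4.24869), ("40-44", 3.348430513, 3.88077),
    ("45-49", 3.312993169, 3.72452), ("50-54", 2.790371746, 2.98743),
    ("55-59", 2.581619237, 2.63961), ("60-64", 2.053052635, 2.1861),
    ("65-69", 1.376117249, 1.62684), ("70-74", 0.978563343, 1.21903),
    ("75-79", 0.624234847, 0.87814), ("80+", 0.612341491, 1.01134),
]


def dependency_ratio(rows):
    """(Population aged 0-14 and 60+) / (population aged 15-59) x 100.

    Works directly on percentage shares: every share is a percentage of the
    same total population, so the total cancels out of the ratio.
    """
    working = dependent = 0.0
    for age, male, female in rows:
        lower = int(age.split("-")[0].rstrip("+"))  # "80+" -> 80, "15-19" -> 15
        if 15 <= lower <= 55:  # groups 15-19 up to 55-59
            working += male + female
        else:
            dependent += male + female
    return dependent / working * 100


dep_bihar = dependency_ratio(BIHAR)
dep_kerala = dependency_ratio(KERALA)
print(f"Bihar: {dep_bihar:.1f}, Kerala: {dep_kerala:.1f}")
```

With these tables the ratio comes out at roughly 91 for Bihar and 56 for Kerala, consistent with the higher working-age share described for Kerala.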
LINE GRAPH
Census year    Total population (in millions)
1901 238.4
1911 252.1
1921 251.3
1931 279
1941 318.7
1951 361.1
1961 439.2
1971 548.2
1981 683.3
1991 846.4
2001 1028.7
2011 1210.2
Source: Census of India 2011
Figure: Line graph of the total population of India (in millions) by census year, 1901 to 2011.
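Interpretation:
The graph shows that India's total population increased from 238.4 million in 1901 to 1210.2
million in 2011, with a slight decline only between 1911 and 1921 (252.1 to 251.3 million). There
is a clear increasing trend in India's population, which becomes much steeper after 1951.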
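Decadal growth, the percentage change between consecutive censuses, makes the trend in the table easier to read. A minimal sketch using the census figures above:

```python
# Census of India: total population (in millions) by census year, from the table above.
population = {
    1901: 238.4, 1911: 252.1, 1921: 251.3, 1931: 279.0, 1941: 318.7, 1951: 361.1,
    1961: 439.2, 1971: 548.2, 1981: 683.3, 1991: 846.4, 2001: 1028.7, 2011: 1210.2,
}

years = sorted(population)
# Decadal growth (%) = (P_t - P_{t-10}) / P_{t-10} x 100
decadal_growth = {
    year: (population[year] - population[prev]) / population[prev] * 100
    for prev, year in zip(years, years[1:])
}
for year, growth in decadal_growth.items():
    print(f"{year - 10}-{year}: {growth:6.2f} %")
```

The 1911-1921 decade is the only one with negative growth, and 2001-2011 comes out at about 17.64 %.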
BAR GRAPH:
Simple Bar Graph:
Figure: Simple bar graph of ST population (%) by district (x-axis: Districts; y-axis: % of ST population).
Interpretation: The above diagram shows that the percentage of ST population is highest in
Darjeeling district, whereas in districts such as Kochbihar, Murshidabad, Haora, Kolkata and
Purba Medinipur the percentage of ST population is negligible. The rest of the districts show a
significant concentration of ST population.
State              Male Literacy Rate (%)   Female Literacy Rate (%)
Uttarakhand        88.33                    70.7
Rajasthan          80.51                    52.66
Uttar Pradesh      79.24                    59.26
Bihar              73.39                    53.33
Jharkhand          78.45                    56.21
Odisha             82.4                     64.36
Chhattisgarh       81.45                    60.59
Madhya Pradesh     80.53                    60.02
Source: Census of India 2011
Figure: Bar graph of male and female literacy rates (%) in the EAG states (x-axis: EAG States).
Interpretation: The above diagram shows that the highest male literacy rate is found in
Uttarakhand (88.33%), and the highest female literacy rate is also found in Uttarakhand (70.7%),
whereas the lowest male literacy rate is found in Bihar (73.39%) and the lowest female literacy
rate is found in Rajasthan (52.66%).
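With only eight states the extremes can be read from the table, but a short sketch makes the check repeatable for both columns (data copied from the Census 2011 literacy table above):

```python
# Male and female literacy rates (%) in the EAG states, Census of India 2011.
literacy = {
    "Uttarakhand": (88.33, 70.70),
    "Rajasthan": (80.51, 52.66),
    "Uttar Pradesh": (79.24, 59.26),
    "Bihar": (73.39, 53.33),
    "Jharkhand": (78.45, 56.21),
    "Odisha": (82.40, 64.36),
    "Chhattisgarh": (81.45, 60.59),
    "Madhya Pradesh": (80.53, 60.02),
}

extremes = {}
for label, index in (("male", 0), ("female", 1)):
    highest = max(literacy, key=lambda state: literacy[state][index])
    lowest = min(literacy, key=lambda state: literacy[state][index])
    extremes[label] = (highest, lowest)
    print(f"{label}: highest {highest} ({literacy[highest][index]}), "
          f"lowest {lowest} ({literacy[lowest][index]})")
```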
Figure: Bar graph of main workers, marginal workers and non-workers by block (x-axis: Blocks, from Harishchandrapur 1 to Kaliachak 3).
Interpretation: The above diagram shows that the highest number of workers is observed in
Gazole, where main workers predominate. Kaliachak 3 has the highest number of marginal workers.
In all blocks, non-workers outnumber both main and marginal workers. This indicates a low work
participation rate and a high concentration of dependent population.
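The work participation rate mentioned above is usually computed as workers (main plus marginal) as a percentage of the total population. A minimal sketch, assuming that main workers, marginal workers and non-workers together make up the whole population of a block; the counts are hypothetical, since the block-level figures are not reproduced in this report:

```python
def work_participation_rate(main, marginal, non_workers):
    """Percentage of the total population that works (main + marginal workers)."""
    total = main + marginal + non_workers
    return (main + marginal) / total * 100


# Hypothetical counts for one block (not taken from the report's data).
wpr = work_participation_rate(main=40_000, marginal=25_000, non_workers=95_000)
print(f"Work participation rate: {wpr:.2f} %")
```

A block where non-workers exceed half the population necessarily has a work participation rate below 50 %.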
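Two-dimensional diagram
PIE DIAGRAM
Figure: Pie diagram of main workers and marginal workers (one sector labeled 26%).
Formula: \( \text{Male (\%)} = \dfrac{\text{Number of males}}{\text{Number of males} + \text{Number of females}} \times 100 \), and likewise for females.
Figure: Pie diagram of the sex-wise population distribution of Malda district (male 51.4319558669241%, female 48.5680441330761%).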
Interpretation: This diagram clearly shows that males outnumber females in the sex-wise
population distribution of Malda district. The percentage of male population is 51.43, whereas
the percentage of female population is 48.57.
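Each sector of a pie diagram is drawn with an angle proportional to its share: angle = (value / total) × 360°, i.e. 3.6° per percentage point. A small sketch using the Malda percentages reported above (raw counts would work the same way):

```python
def pie_angles(values):
    """Convert category values into (percentage share, sector angle in degrees)."""
    total = sum(values.values())
    return {
        name: (value / total * 100, value / total * 360)
        for name, value in values.items()
    }


# Sex-wise shares for Malda district as reported above.
sectors = pie_angles({"male": 51.4319558669241, "female": 48.5680441330761})
for name, (share, angle) in sectors.items():
    print(f"{name}: {share:.2f} % -> {angle:.1f} degrees")
```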
Three-Dimensional Diagram
STAR DIAGRAM
Figure: Star diagram of decadal growth rates in the EAG states (visible labels include Rajasthan, MP and Odisha).
Interpretation: The above diagram shows that the decadal growth rate decreased from 2001 to 2011
in all the EAG states except Chhattisgarh. Bihar has the highest growth rate, although its growth
rate is lower than in the previous decade. The lowest decadal growth rate in the last two decades
is found in Odisha.
Some measures that are commonly used to describe a data set are measures of central tendency
and measures of variability or dispersion. Measures of central tendency include the mean, median
and mode, while measures of variability include the standard deviation (or variance) and the
minimum and maximum values of the variables; kurtosis and skewness describe the shape of the
distribution.
Objective:
Descriptive Measures:
Central Tendency Measures: They are computed to give a “centre” around which the
measurements in the data are distributed.
Variation or Variability measures: They describe “data spread” or how far away the
measurements are from the centre.
Relative standing Measures: They describe the relative position of specific
measurements in the data.
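A common relative-standing measure is the z-score, (x - mean) / standard deviation, alongside a percentile rank. The sketch below applies both to the male literacy rates from the Census 2011 table in this report. The "percentage of values strictly below x" rule is only one of several percentile-rank conventions (Excel's PERCENTRANK functions use others); Excel's STANDARDIZE(x, mean, standard_dev) returns the same z-score when given these mean and SD values.

```python
from statistics import mean, pstdev

# Male literacy rates (%) of the EAG states, Census 2011.
male_literacy = {
    "Uttarakhand": 88.33, "Rajasthan": 80.51, "Uttar Pradesh": 79.24, "Bihar": 73.39,
    "Jharkhand": 78.45, "Odisha": 82.40, "Chhattisgarh": 81.45, "Madhya Pradesh": 80.53,
}

values = list(male_literacy.values())
mu, sigma = mean(values), pstdev(values)


def z_score(x):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


def percent_below(x):
    """Percentage of observations strictly below x (one simple percentile-rank rule)."""
    return sum(v < x for v in values) / len(values) * 100


for state in ("Uttarakhand", "Bihar"):
    x = male_literacy[state]
    print(f"{state}: z = {z_score(x):+.2f}, {percent_below(x):.1f} % of states below")
```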
Mean:
The most popular and widely used measure for representing the entire data by one value is what
most laymen call an "average" and what statisticians call the mean.
Methodology:
Enter the values > type =AVERAGE(select the cells containing the values) > press Enter
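The spreadsheet steps above correspond to dividing the sum of the observations by their number. A quick Python cross-check, assuming the eight male literacy rates from the Census 2011 table in this report sit in cells A1:A8:

```python
from statistics import mean

# Male literacy rates (%) of the eight EAG states, as they might sit in cells A1:A8.
values = [88.33, 80.51, 79.24, 73.39, 78.45, 82.40, 81.45, 80.53]

# Arithmetic mean = sum of observations / number of observations,
# the same quantity a spreadsheet returns for =AVERAGE(A1:A8).
manual_mean = sum(values) / len(values)
print(round(manual_mean, 4), round(mean(values), 4))
```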
Advantage-
I. The mean is readily understood and hence needs no explanation when used.
II. It is a very stable and reliable average as regards sampling fluctuations.
III. Computation of the mean is very easy.
Disadvantage-
I. The mean cannot be obtained by inspection, unlike the median and mode.
II. It is highly affected by the presence of even a few extremely large or small
observations.
III. The mean may not be an actual value of the variable.
Median:
Methodology:
Advantage-
I. It is easier to calculate than the mean.
II. Extreme values do not affect the median as strongly as they do the mean.
III. The value of the median can be determined graphically.
Disadvantage-
I.
II. It is not capable of further algebraic treatment.
III. It is erratic if the number of items is small.
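A minimal sketch of how the median is found (sort, then take the middle value, or the mean of the two middle values when n is even), checked against Python's `statistics.median`; in Excel the equivalent would be =MEDIAN(A1:A8). The second data set is hypothetical and illustrates advantage II: one extreme value moves the mean far more than the median.

```python
from statistics import median


def median_by_hand(data):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Male literacy rates (%) of the eight EAG states, Census 2011.
values = [88.33, 80.51, 79.24, 73.39, 78.45, 82.40, 81.45, 80.53]
print(median_by_hand(values), median(values))

skewed = [10, 12, 13, 15, 200]  # hypothetical data with one extreme value
print(median_by_hand(skewed), sum(skewed) / len(skewed))  # 13 50.0
```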
Mode:
The mode, or modal value, is the value in a series of observations that occurs with the greatest
frequency.
Methodology:
Advantage-
I. The mode is not unduly affected by extreme values.
II. It can be used to describe qualitative phenomena.
III. It can also be determined graphically.
Disadvantage-
I. A unique mode cannot be determined in the case of a bimodal series.
II. Its value is not based on every item of the series.
III. It is not capable of further algebraic manipulation.
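The sketch below returns every value tied for the highest frequency, which makes the bimodal case (disadvantage I) visible instead of hiding it; it is compared with `statistics.multimode` from Python's standard library. The household-size data are hypothetical. In Excel, MODE.SNGL returns a single mode and MODE.MULT returns all of them.

```python
from collections import Counter
from statistics import multimode


def modes(data):
    """Return every value that occurs with the greatest frequency."""
    counts = Counter(data)
    top = max(counts.values())
    return [value for value, freq in counts.items() if freq == top]


# Hypothetical household sizes from a small survey.
unimodal = [4, 5, 4, 3, 4, 6, 5]
bimodal = [2, 3, 3, 5, 5, 6]

print(modes(unimodal), multimode(unimodal))  # [4] [4]
print(modes(bimodal), multimode(bimodal))    # [3, 5] [3, 5]
```

The same function also works for qualitative data such as religion labels, in line with advantage II.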
Standard Deviation (SD):
The term standard deviation was first used in writing by Karl Pearson in 1894. The standard
deviation of a set of observations is the square root of the arithmetic mean of the squares of
deviations from the arithmetic mean.
Methodology:
Advantage-
I. It is possible to calculate the combined standard deviation of two or more groups.
II. It is the measure of dispersion most widely used in further statistical work.
III.
Disadvantages:-
1. It is difficult to compute.
2. It gives more weight to extreme items and less to those near the mean.
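The definition above (square root of the mean squared deviation from the mean) is the population standard deviation, Excel's STDEV.P; STDEV.S divides by n - 1 instead. The sketch below computes it directly and also illustrates advantage I with the usual combined-SD formula for two groups, using the male and female literacy rates from the Census 2011 table as the two groups:

```python
from math import sqrt
from statistics import mean, pstdev


def sd_by_definition(data):
    """Square root of the mean of squared deviations from the arithmetic mean."""
    m = sum(data) / len(data)
    return sqrt(sum((x - m) ** 2 for x in data) / len(data))


def combined_sd(n1, mean1, sd1, n2, mean2, sd2):
    """SD of two groups pooled together, from each group's n, mean and SD only.

    sigma^2 = [n1 (sd1^2 + d1^2) + n2 (sd2^2 + d2^2)] / (n1 + n2),
    where d_i = mean_i - combined mean.
    """
    n = n1 + n2
    combined_mean = (n1 * mean1 + n2 * mean2) / n
    d1, d2 = mean1 - combined_mean, mean2 - combined_mean
    return sqrt((n1 * (sd1**2 + d1**2) + n2 * (sd2**2 + d2**2)) / n)


male = [88.33, 80.51, 79.24, 73.39, 78.45, 82.40, 81.45, 80.53]
female = [70.70, 52.66, 59.26, 53.33, 56.21, 64.36, 60.59, 60.02]

print(sd_by_definition(male), pstdev(male))
pooled = combined_sd(len(male), mean(male), pstdev(male),
                     len(female), mean(female), pstdev(female))
print(pooled, pstdev(male + female))  # the two values agree
```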
1. When the mean value is close to zero, the coefficient of variation approaches infinity and is
therefore sensitive to small changes in the mean. This is often the case if the values do not
originate from a ratio scale.
2. Unlike the standard deviation, it cannot be used directly to construct confidence intervals
for the mean.
3. CVs are not an ideal index of the certainty of a measurement when the number of replicates
varies across samples, because the CV is invariant to the number of replicates while the
certainty of the mean improves with increasing replicates. In this case, the standard error in
percent is suggested to be superior.
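The points above refer to the coefficient of variation, CV = SD / mean × 100. A minimal sketch (using the population SD, as in the definition of standard deviation; a sample-based CV would use `statistics.stdev` instead), applied to the male and female literacy rates from the Census 2011 table:

```python
from statistics import mean, pstdev


def coefficient_of_variation(data):
    """CV (%) = standard deviation / mean x 100; undefined when the mean is zero."""
    m = mean(data)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return pstdev(data) / m * 100


male = [88.33, 80.51, 79.24, 73.39, 78.45, 82.40, 81.45, 80.53]
female = [70.70, 52.66, 59.26, 53.33, 56.21, 64.36, 60.59, 60.02]

cv_male = coefficient_of_variation(male)
cv_female = coefficient_of_variation(female)
print(f"CV male: {cv_male:.2f} %, CV female: {cv_female:.2f} %")
```

For these data the female literacy rates show the larger relative dispersion.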
Figure: Annual population growth rate of India by census year, 1911 to 2011, with the mean growth rate line (x-axis: Year).
Interpretation: The above table and diagram show that the average annual population growth rate
from 1911 to 2011 is 1.48%, and that population growth was negative between 1911 and 1921. From
1931 onward the population started to increase. Population growth was highest around 1971 to
1981 and then started to decrease. At present the population growth rate is 1.64%.
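The report does not state how the yearly growth rate was computed. One common convention, assumed here, is the compound annual rate over each decade, r = (P_t / P_{t-10})^(1/10) - 1. Applied to the census table, it gives about 1.64 % for 2001-2011, matching the value above; the mean over the decades may differ slightly from 1.48 depending on the exact convention used.

```python
# Census of India: total population (in millions), from the line-graph table.
population = {
    1901: 238.4, 1911: 252.1, 1921: 251.3, 1931: 279.0, 1941: 318.7, 1951: 361.1,
    1961: 439.2, 1971: 548.2, 1981: 683.3, 1991: 846.4, 2001: 1028.7, 2011: 1210.2,
}

years = sorted(population)
# Compound annual growth rate (%) over each interval: ((P_t / P_prev) ** (1 / years) - 1) x 100
annual_growth = {
    year: ((population[year] / population[prev]) ** (1 / (year - prev)) - 1) * 100
    for prev, year in zip(years, years[1:])
}
mean_growth = sum(annual_growth.values()) / len(annual_growth)

for year, rate in annual_growth.items():
    print(f"{year}: {rate:5.2f} %")
print(f"Mean of decade rates: {mean_growth:.2f} %")
```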
Objective:
1. Trend analysis enables us to compare two or more time series over different periods of time,
and this helps us draw conclusions about them.
2. The trend describes the basic growth tendency, ignoring short-term fluctuations.
3. It describes the pattern of behavior which has characterized the series in the past.
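Spreadsheet trend lines are least-squares fits. The sketch below fits a linear trend, y = a + bt, to the census population series from the line-graph table and reports R², the goodness-of-fit value shown on the trend-line charts; it is written out by hand so each step is visible.

```python
# Census years and total population (in millions), from the line-graph table.
years = [1901, 1911, 1921, 1931, 1941, 1951, 1961, 1971, 1981, 1991, 2001, 2011]
pop = [238.4, 252.1, 251.3, 279.0, 318.7, 361.1, 439.2, 548.2, 683.3, 846.4, 1028.7, 1210.2]


def linear_trend(x, y):
    """Least-squares line y = a + b*x; returns (a, b, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    fitted = [intercept + slope * xi for xi in x]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained variation
    ss_tot = sum((yi - my) ** 2 for yi in y)                   # total variation
    return intercept, slope, 1 - ss_res / ss_tot


a, b, r2 = linear_trend(years, pop)
print(f"trend: y = {a:.1f} + {b:.3f} t, R^2 = {r2:.3f}")
```

Because India's population grew faster in later decades, a straight line cannot capture the curvature fully, which is why curved trend types are also worth comparing.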
Figure: Total Fertility Rate and Contraceptive use in EAG States (Linear). X-axis: Literacy Rate (%); y-axes: TFR and Contraceptive use. Linear trend lines:
TFR: \( y = -0.107201632965491x + 10.1199682124556 \), \( R^2 = 0.821699367689939 \)
Contraceptive use: \( y = 2.36933086687504x - 114.382426484659 \), \( R^2 = 0.758911608078426 \)
Interpretation: This diagram shows the relation of literacy rate with total fertility rate and
contraceptive use. Both relations are strong, as the R² values are 0.821 and 0.758 respectively:
total fertility rate decreases and contraceptive use increases as the literacy rate rises.
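Figure: Total fertility rate and contraceptive use against Literacy Rate (%) in EAG States, plotted with logarithmic y-axis scales (TFR axis 0.7 to 70; second axis 1 to 100).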
Interpretation: This diagram shows the relation of literacy rate with total fertility rate and
contraceptive use. Both relations are strong, as the R² values are 0.831 and 0.770 respectively.
Figure: Total fertility rate and contraceptive use against Literacy Rate (%) in EAG States, with polynomial trend lines for both series.
Interpretation: This diagram shows the relation of literacy rate with total fertility rate and
contraceptive use. Both relations are strong, as the R² values are 0.850 and 0.803 respectively.
Figure: Total fertility rate and contraceptive use against Literacy Rate (%) in EAG States, with exponential trend lines. TFR: \( y = 41.5107783974215\,e^{-0.039682727781477x} \), \( R^2 = 0.823723920537947 \).
Interpretation: This diagram shows the relation of literacy rate with total fertility rate and
contraceptive use. Both relations are strong, as the R² values are 0.823 and 0.793 respectively.
Figure: Total fertility rate and contraceptive use against Literacy Rate (%) in EAG States, with power trend lines. TFR: \( y = 315002.711551821\,x^{-2.75768736348001} \), \( R^2 = 0.828397555044674 \).
Interpretation: This diagram shows the relation of literacy rate with total fertility rate and
contraceptive use. Both relations are strong, as the R² values are 0.828 and 0.808 respectively.
Conclusion: Among the above trend lines, the polynomial trend line is the best fit for the
relation between literacy rate and total fertility rate (R² = 0.850), and the power trend line is
the best fit for the relation between literacy rate and contraceptive use (R² = 0.808).
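To see what the fitted TFR equations imply, they can be evaluated at a few literacy rates. The sketch uses the linear, exponential and power equations displayed on the trend-line charts (the polynomial equation is not shown, so it is left out); assigning these equations to the TFR series is inferred from their negative slopes and from R² values matching the interpretations.

```python
from math import exp


# TFR trend equations read off the charts (x = literacy rate in %).
def tfr_linear(x):
    return -0.107201632965491 * x + 10.1199682124556


def tfr_exponential(x):
    return 41.5107783974215 * exp(-0.039682727781477 * x)


def tfr_power(x):
    return 315002.711551821 * x ** -2.75768736348001


for literacy in (65, 70, 75):
    predictions = [round(f(literacy), 2) for f in (tfr_linear, tfr_exponential, tfr_power)]
    print(literacy, predictions)
```

All three equations give a TFR of roughly 2.6 at a 70 % literacy rate and fall as literacy rises, so within the observed range the choice of trend type changes the predictions only slightly.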