UGC NET Statistics

❖ Statistics:

We may define statistics either in a singular sense or in a plural sense. Statistics, when used as
a plural noun, may be defined as data, qualitative as well as quantitative, that are collected,
usually with a view to statistical analysis. Statistics, when used as a singular noun, may be
defined as the scientific method employed for collecting, analyzing and presenting data,
leading finally to the drawing of statistical inferences about some important characteristics;
in this sense it is the 'science of counting' or the 'science of averages'.

According to A.L. Bowley, “Statistics may be called the science of counting”.

According to Croxton and Cowden, “Statistics may be defined as the collection, presentation,
analysis and interpretation of numerical data.”

❖ Data:

Data are the values of subjects with respect to qualitative or quantitative variables. We may
define ‘data’ as quantitative information about some particular characteristic(s) under
consideration. Although a distinction can be made between a qualitative characteristic and a
quantitative characteristic, so far as statistical analysis is concerned, qualitative information
needs to be converted into quantitative information by providing a numeric description of the
given characteristic.

❖ Data analysis:

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the
goal of discovering useful information, informing conclusions, and supporting decision-making.
Data analysis has multiple facets and approaches, encompassing diverse techniques under a
variety of names, while being used in different business, science, and social science domains.



❖ Data set:
A data set is a collection of data. It corresponds to the contents of a single database table, or a
single statistical data matrix, where every column of the table represents a particular variable,
and each row corresponds to a given member of the data set in question. The data set lists
values for each of the variables, such as height and weight of an object, for each member of
the data set. Each value is known as a datum. The data set may comprise data for one or more
members, corresponding to the number of rows.

❖ Data type:
Understanding the different data types, also called measurement scales, is a crucial
prerequisite for doing Exploratory Data Analysis (EDA), since certain statistical measurements
can be used only with specific data types. Nominal, Ordinal, Interval and Ratio are the four
fundamental levels of measurement scales used to capture data.



❖ Nominal Scale:

Nominal Scale is used for labeling variables into distinct classifications and doesn’t involve
a quantitative value or order. Nominal scale is the most fundamental research scale.
Nominal scale is often used in research surveys and questionnaires where only variable
labels hold significance. For instance, a customer survey asking “Which brand of
smartphones do you prefer?” Options: “Apple”- 1, “Samsung”-2, “OnePlus”-3.

❖ Ordinal Scale:

Ordinal Scale is defined as a variable measurement scale used to simply depict the order of
variables and not the difference between each of the variables. These scales are generally
used to depict non-mathematical ideas such as frequency, satisfaction, happiness, degree
of pain etc. Example - How satisfied are you with our services? 1 - Very Unsatisfied, 2 -
Unsatisfied, 3 - Neutral, 4 - Satisfied, 5 - Very Satisfied.

❖ Interval Scale:

Interval Scale is defined as a numerical scale where the order of the variables is known as
well as the difference between these variables. Interval scale contains all the properties of
ordinal scale, in addition to which, it offers a calculation of the difference between
variables. For example - temperature scale or time scale.
❖ Ratio Scale:

Ratio scales are the most informative of the measurement scales and provide a wealth of
possibilities for statistical analysis. Ratio variables can be meaningfully added, subtracted,
multiplied and divided (hence “ratio”). In addition, everything said above about interval data
applies to ratio scales, which also possess a true zero point. Good examples of ratio variables
include height and weight.

❖ Primary and Secondary Data:

Collection of data plays a very important role in any statistical analysis. Data collected for the
first time by an investigator or agency are known as primary data, whereas data already
collected and subsequently used by a different person or agency are known as secondary
data. Example - if Mr. C collects the data on the height of every student in his class, then these
would be primary data for him. If, however, another person, say Mr. D, uses the data collected
by Mr. C for finding the average height of the students belonging to that class, then the data
would be secondary for Mr. D.

Methods employed for Primary Data:

The following methods are employed for the collection of primary data:



(i) Interview method – Interview may be conducted personally or telephonically.

(ii) Mailed questionnaire method - A wide area can be covered using the mailed questionnaire
method, but the amount of non-responses is also likely to be maximum in this method.

(iii) Observation method – Data on height, weight etc. can be collected by direct observation.

(iv) Questionnaires filled in and sent by enumerators - Enumerators collect information directly
by interviewing the persons having the information; questions are explained by the
enumerator and the data are then recorded.

❖ Census:

The census method is that method of statistical enumeration where all members of the
population are studied. A population refers to the set of all observations under concern. For
example, if you want to carry out a survey to find out students’ feedback about the facilities of
your school, all the students of your school would form a part of the ‘population’ for your
study.

❖ Sample and Sampling:

Sample refers to a set of observations drawn from a population. Often, it is necessary to
use samples for research, because it is impractical to study the whole population. For
example, suppose we wanted to know the average height of 12-year-old American boys.
We could not measure all of the 12-year-old boys in America, but we could measure a
sample of boys.

Sampling is the process of selecting a fraction of the population to represent the
characteristics of the larger group. This method is used for statistical testing where it is not
possible to consider all members or observations, as the population size is very large.

The units which constitute the sample are known as ‘sampling units’. The complete list
containing all sampling units is called the ‘sampling frame’.



❖ Sampling techniques:

Sampling techniques fall into two broad families: probability sampling (simple random,
systematic, stratified and cluster sampling), in which every unit has a known chance of
selection, and non-probability sampling (convenience, judgmental and quota sampling), in
which selection depends on the researcher's access or judgment. The main techniques are
described below.


❖ Simple Random Sampling:

A sampling technique where every item in the population has an equal chance and
likelihood of being selected in the sample. Here the selection of items depends completely
on chance, and therefore this sampling technique is also sometimes known as a method of
chance selection.
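
As a quick illustration, here is a minimal sketch of simple random sampling using Python's standard library (the frame of 100 numbered units and the sample size of 10 are hypothetical):

```python
import random

population = list(range(1, 101))   # hypothetical frame of 100 sampling units
random.seed(42)                    # fixed seed so the sketch is reproducible

# Simple random sample: every unit has an equal chance of selection
sample = random.sample(population, 10)
print(sample)
```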

❖ Cluster Sampling:

Cluster sampling is a sampling plan used when mutually homogeneous yet internally
heterogeneous groupings are evident in a statistical population. It is a method where the
researchers divide the entire population into sections or clusters that represent a
population. Clusters are identified and included in a sample on the basis of defining
demographic parameters such as age, location, sex etc.

❖ Systematic Sampling:

Using the systematic sampling method, members of a sample are chosen at regular intervals
from a population. Example - in a sample of 500 people out of 5000 people, every 10th
individual may be selected.

❖ Stratified Random Sampling:

Stratified random sampling is a method of sampling that involves dividing a population
into multiple non-overlapping, homogeneous, smaller groups known as strata and randomly
choosing final members from the various strata for research, which reduces cost and improves
efficiency. Members in each of these groups should be distinct, so that every member of every
group gets an equal opportunity of selection by simple probability. Stratified random
sampling is also called proportional random sampling or quota random sampling. For example,
a researcher looking to analyze the characteristics of people belonging to different annual
income divisions will create strata (groups) according to annual family income.
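
A minimal sketch of proportional stratified sampling in Python follows; the frame, the three income strata and the 10% sampling fraction are all made-up assumptions for illustration:

```python
import random
from collections import defaultdict

random.seed(1)

# Hypothetical frame: (person_id, income_stratum)
frame = [(i, random.choice(["low", "middle", "high"])) for i in range(1000)]

# Group the frame into strata
strata = defaultdict(list)
for unit, stratum in frame:
    strata[stratum].append(unit)

# Proportional allocation: simple random sample of ~10% from each stratum
sample = []
for stratum, units in strata.items():
    sample.extend(random.sample(units, max(1, len(units) // 10)))

print(len(sample), "units sampled across", len(strata), "strata")
```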

❖ Convenience sampling:

This method depends on the ease of access to subjects, such as surveying customers at
a mall or passers-by on a busy street. It is termed convenience sampling because it is
carried out on the basis of how easy it is for a researcher to get in touch with the subjects.
❖ Judgmental or Purposive Sampling:

In judgmental or purposive sampling, the sample is formed at the discretion of the researcher,
purely considering the purpose of the study along with an understanding of the target audience.

❖ Quota Sampling: This is another non-probability sampling method, wherein the population is
divided into mutually exclusive sub-groups from which the sample items are selected on
the basis of a given proportion. For example, suppose an interviewer is told to interview
250 people living in a certain geographical area, of whom 100 males, 100 females and
50 children are to be interviewed. Within these quotas, the interviewer can select any
person on the basis of personal judgment.

❖ Sampling Error:

A sampling error is a statistical error that occurs when an analyst does not select a sample that
represents the entire population of data, so that the results found in the sample do not
represent the results that would be obtained from the entire population. Sampling error can
be reduced by increasing the sample size and by ensuring that the sample adequately
represents the entire population.



❖ Non-Sampling Errors:

Non-sampling errors are those which creep in due to human factors, which vary from
one investigator to another. In other words, a non-sampling error is an error that arises
during data collection, causing the data to differ from the true values. Non-sampling errors
may be present in both samples and censuses (in which an entire population is surveyed) and
may be random or systematic. While increasing the sample size will help minimize sampling
error, it will not have any effect on reducing non-sampling error. Unfortunately, non-sampling
errors are often difficult to detect, and it is virtually impossible to eliminate them entirely.



❖ Central Limit Theorem – CLT:
The central limit theorem (CLT) is a statistical theory which states that, given a sufficiently
large sample size drawn from a population with a finite variance, the sampling distribution of
the sample mean will be approximately normal, with its mean equal to the mean of the
population, regardless of the shape of the population's distribution.
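
The theorem can be checked empirically. The sketch below (hypothetical skewed population, arbitrary sample size and trial count) draws repeated samples and shows that the mean of the sample means tracks the population mean, with spread close to σ/√n:

```python
import random
import statistics

random.seed(0)

# A skewed population with finite variance (exponential, mean 1, sd 1)
population = [random.expovariate(1.0) for _ in range(100_000)]

n, trials = 50, 2_000
sample_means = [
    statistics.fmean(random.sample(population, n)) for _ in range(trials)
]

print("population mean:     ", round(statistics.fmean(population), 3))
print("mean of sample means:", round(statistics.fmean(sample_means), 3))
print("sd of sample means:  ", round(statistics.stdev(sample_means), 3))
# The sd of the sample means is close to sigma / sqrt(n) ~= 0.141, and a
# histogram of sample_means would look approximately normal.
```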

❖ Data Presentation:

Data can be presented in textual, tabular or diagrammatic form. The most popular form of
data presentation is the diagrammatic form, as it can be understood by both the educated and
uneducated sections of society. Furthermore, any hidden trend present in the given data is
most easily noticed in this mode of representation.

❖ Line Diagram: A line graph, also known as a line chart, is a type of chart used to visualize
the value of something over time. For example, a finance department may plot the change
in the amount of cash the company has on hand over time. The line graph consists of a
horizontal x-axis and a vertical y-axis.



❖ Bar Diagram:

A bar chart or bar graph is a chart or graph that presents categorical data with rectangular bars
with heights or lengths proportional to the values that they represent. A bar graph may run
horizontally or vertically.

❖ Pie Chart: A pie chart (or a circle chart) is a circular statistical graphic, which is divided into
slices to illustrate numerical proportion. It shows proportions and percentages between
categories, by dividing a circle into proportional segments. In a pie chart, the arc length of
each slice (and consequently its central angle and area), is proportional to the quantity it
represents.



❖ Histogram: A histogram is a very convenient way to represent a frequency distribution: contiguous rectangles are drawn over the class intervals, with areas proportional to the class frequencies.

❖ Frequency Polygon:

Another type of graph that can be drawn to represent the same set of data as a histogram
represents is a frequency polygon. A frequency polygon is a graph constructed by using lines to
join the midpoints of each interval.



❖ Ogives or Cumulative Frequency Graph:

By plotting cumulative frequency against the respective class boundary, we get ogives. There
are two ogives: the less-than type ogive, obtained by taking less-than cumulative frequency on
the vertical axis, and the more-than type ogive, obtained by plotting more-than cumulative
frequency on the vertical axis, in each case joining the plotted points.

❖ Measures of central tendency:



Measures of central tendency are measures of the location of the middle or the center of a
distribution. The most frequently used measures of central tendency are the mean, median
and mode.
The mean is obtained by summing the values of all the observations and dividing by the
number of observations.
The median (also referred to as the 50th percentile) is the middle value in a sample of
ordered values. Half the values are above the median and half are below the median.
The mode is a value occurring most frequently. It is rarely of any practical use for numerical
data.
A comparison of the mean, median and mode can reveal information about skewness. The
mean, median and mode are similar when the distribution is symmetrical. When the
distribution is skewed, the median is more appropriate as a measure of central tendency.
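
A small worked example (hypothetical data chosen so that one outlier pulls the mean away from the median):

```python
import statistics

data = [2, 3, 3, 4, 5, 5, 5, 6, 7, 40]   # 40 is an outlier

print("mean:  ", statistics.fmean(data))    # 8.0  -- pulled up by the outlier
print("median:", statistics.median(data))   # 5.0  -- robust to the outlier
print("mode:  ", statistics.mode(data))     # 5    -- most frequent value
```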

❖ Dispersion:

Dispersion is the tendency of data to be scattered over a range and is an important feature
of a frequency distribution. It is also called spread or variation. Range, variance and standard
deviation are all measures of dispersion.



❖ Standard deviation:
Standard deviation is a measure that is used to quantify the amount of variation
or dispersion of a set of data values. A low standard deviation indicates that the data points
tend to be close to the mean (also called the expected value) of the set, while a high standard
deviation indicates that the data points are spread out over a wider range of values.

❖ Standard Error:
The standard error is the standard deviation of a sample statistic over repeated sampling. In
particular, a sample mean deviates from the actual mean of the population; the typical size of
this deviation is the standard error of the mean.
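
A minimal sketch computing both quantities for a hypothetical sample, with the standard error of the mean taken as s/√n:

```python
import math
import statistics

sample = [12, 15, 11, 14, 13, 16, 12, 15]   # hypothetical observations

s = statistics.stdev(sample)        # sample standard deviation (n - 1 divisor)
se = s / math.sqrt(len(sample))     # standard error of the sample mean

print(f"standard deviation = {s:.3f}")
print(f"standard error     = {se:.3f}")
```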



❖ Skewness and Kurtosis:

Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or
data set, is symmetric if it looks the same to the left and right of the center point.

Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal
distribution. That is, data sets with high kurtosis tend to have heavy tails, or outliers, while
data sets with low kurtosis tend to have light tails, or a lack of outliers. A uniform distribution
is an extreme case of light tails.

❖ Theoretical Probability Distribution:


Theoretical probability is the probability of an event when all outcomes are equally likely. A
theoretical probability distribution describes how probability is distributed over the possible
values of a random variable. A probability distribution may be either a discrete probability
distribution or a continuous probability distribution, depending on the random variable under
study.

Two important discrete probability distributions are (a) Binomial Distribution and (b)
Poisson distribution.

Some important continuous probability distributions are:


(a) Normal Distribution
(b) Chi-square Distribution
(c) Student’s t-Distribution
(d) F-Distribution

❖ Binomial Distribution:

It is derived from a particular type of random experiment known as a Bernoulli process,
named after the mathematician Jacob Bernoulli. A binomial distribution can be thought of as
simply the probability of a SUCCESS or FAILURE outcome in an experiment or survey that is
repeated multiple times. The binomial is a type of distribution that has two possible outcomes
(the prefix “bi” means two, or twice). For example, a coin toss has only two possible outcomes,
heads or tails, and taking a test could have two possible outcomes, pass or fail.

Important characteristics of Binomial Distribution:

(i) Each trial is associated with two mutually exclusive and exhaustive outcomes; the
occurrence of one is termed a 'success' and its non-occurrence a 'failure'.
(ii) The trials are independent.
(iii) The probability of a success, usually denoted by p, and hence that of a failure, usually
denoted by q = 1 − p, remain unchanged throughout the process.
(iv) The binomial distribution is known as a biparametric distribution, as it is characterised by
two parameters, n and p. This means that if the values of n and p are known, then the
distribution is known completely.
(v) The binomial distribution has mean np and variance np(1 − p).

Formula for Binomial Distribution:

P(X = x) = nCx · p^x · (1 − p)^(n − x), for x = 0, 1, 2, …, n

where nCx = n! / (x! (n − x)!) is the number of ways of choosing x successes out of n trials.
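
The formula can be evaluated directly; a minimal sketch (the coin-toss numbers are just an example):

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) = nCx * p**x * (1 - p)**(n - x)"""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Probability of exactly 3 heads in 5 tosses of a fair coin
print(binomial_pmf(3, n=5, p=0.5))   # 0.3125

# Mean and variance, as stated in property (v) above
n, p = 5, 0.5
print("mean =", n * p, ", variance =", n * p * (1 - p))
```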


❖ Poisson Distribution:

The Poisson distribution is the discrete probability distribution of the number of events
occurring in a given time period, given the average number of times the event occurs over
that period. It is applied in situations where there is a large number of independent Bernoulli
trials, each with a very small probability of success p. Commonly encountered examples of the
Poisson distribution include the number of aircraft or road accidents in a given time interval.

The probability distribution of a Poisson random variable X, representing the number of
successes occurring in a given time interval or a specified region of space, is given by the
formula:

P(X = x) = e^(−μ) · μ^x / x!, for x = 0, 1, 2, …

Note: the mean, or average, number of successes in n trials will be equal to np. If μ is the
average number of successes occurring in a given time interval or region, then the mean and
the variance of the Poisson distribution are both equal to μ.
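
A minimal sketch evaluating this PMF (the accident rate μ = 2 per day is a hypothetical figure):

```python
from math import exp, factorial

def poisson_pmf(x: int, mu: float) -> float:
    """P(X = x) = exp(-mu) * mu**x / x!"""
    return exp(-mu) * mu**x / factorial(x)

# If road accidents occur at an average rate of 2 per day, the
# probability of observing exactly 3 accidents on a given day:
print(round(poisson_pmf(3, mu=2.0), 4))   # 0.1804
```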

❖ Normal Distribution:



The normal distribution, also known as the Gaussian distribution, is a probability distribution
defined as a continuous frequency distribution of infinite range. The normal distribution is a
descriptive model that describes many real-world situations; graphically, it appears as the
familiar bell-shaped curve.

A random variable X is said to be normally distributed with mean μ and standard deviation σ
if its probability density is given by:

f(x) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²)), for −∞ < x < ∞

Examples of the normal distribution: body temperature of healthy humans, heights and
weights of adults, thickness and dimensions of a product, IQ and standardized test scores,
quality control test results, errors in measurements etc.

Properties of Normal Distribution:

1) The normal curve is bell shaped in appearance.

2) There is one maximum point of the normal curve, which occurs at the mean.

3) As it has only one maximum point, it is unimodal.


4) In the binomial and Poisson distributions the variable is discrete, while here it is continuous.

5) Here mean = median = mode.

6) The total area of normal curve is 1.

7) The area to the left of the mean and the area to the right of the mean are each 0.5.

Percentage of Area under the normal curve:

This means that 68.27% of the scores lie within 1 standard deviation of the mean, 95.45% of
the scores lie within 2 standard deviations of the mean, and 99.73% of the scores lie within 3
standard deviations of the mean.
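
These percentages can be reproduced from the standard normal CDF; a minimal sketch using Python's statistics module:

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mean 0, standard deviation 1

for k in (1, 2, 3):
    area = Z.cdf(k) - Z.cdf(-k)
    print(f"P(-{k} <= Z <= {k}) = {area:.2%}")
# Prints 68.27%, 95.45% and 99.73% -- the figures quoted above
```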

❖ Hypothesis Testing:

Hypothesis testing is an objective method of making decisions or inferences from sample data
(evidence). Sample data are used to choose between two choices, i.e. hypotheses or
statements about a population. A statistical hypothesis is an assumption about a population
parameter; this assumption may or may not be true. Hypothesis testing refers to the formal
procedures used by statisticians to accept or reject statistical hypotheses. Typically this is
carried out by comparing what we have observed with what we would expect if one of the
statements (the null hypothesis) were true.



❖ Null Hypothesis:

A null hypothesis is a statistical hypothesis which states that no significant difference exists
between the set of variables. It is the original or default statement, with no effect, often
represented by H0 (H-zero). It is always the hypothesis that is tested. It denotes a certain
value of a population parameter such as µ, σ or p. A null hypothesis can be rejected, but it
cannot be accepted just on the basis of a single test.

❖ Alternative Hypothesis:

A statistical hypothesis used in hypothesis testing which states that there is a significant
difference between the set of variables. It is often referred to as the hypothesis other than the
null hypothesis, and is often denoted by H1 (H-one). It is what the researcher seeks to prove in
an indirect way, by using the test. It refers to a certain value of a sample statistic, e.g., x̄, s or p.

The acceptance of the alternative hypothesis depends on the rejection of the null hypothesis:
until and unless the null hypothesis is rejected, an alternative hypothesis cannot be accepted.

❖ Differences between Null and Alternative Hypothesis:

1. A null hypothesis is a statement in which there is no relationship between two
variables. An alternative hypothesis is a statement that is simply the inverse of the null
hypothesis, i.e. there is some statistical significance between two measured
phenomena.
2. A null hypothesis is what the researcher tries to disprove, whereas an alternative
hypothesis is what the researcher wants to prove.
3. A null hypothesis represents no observed effect, whereas an alternative hypothesis
reflects some observed effect.



4. If the null hypothesis is accepted, no changes will be made in opinions or actions.
Conversely, if the alternative hypothesis is accepted, it will result in changes in
opinions or actions.
5. As the null hypothesis refers to a population parameter, the testing is indirect and
implicit. On the other hand, the alternative hypothesis indicates a sample statistic,
wherein the testing is direct and explicit.
6. A null hypothesis is labelled H0 (H-zero) while an alternative hypothesis is
represented by H1 (H-one).
7. The mathematical formulation of a null hypothesis uses an equality sign (=), whereas
that of an alternative hypothesis uses an inequality sign (≠, < or >).
8. Under the null hypothesis, the observations are the outcome of chance, whereas
under the alternative hypothesis, the observations are the outcome of a real effect.

❖ Type I error:

A Type I error occurs when the researcher rejects a null hypothesis when it is true. The
probability of committing a Type I error is called the significance level. This probability is also
called alpha, and is often denoted by α.

❖ Type II error:

A Type II error occurs when the researcher fails to reject a null hypothesis that is false. The
probability of committing a Type II error is called Beta, and is often denoted by β. The
probability of not committing a Type II error is called the Power of the test.



Decision Rules:
The analysis plan includes decision rules for rejecting the null hypothesis. In practice,
statisticians describe these decision rules in two ways - with reference to a P-value or with
reference to a region of acceptance.

❖ P-value:

P values evaluate how well the sample data support the devil’s advocate argument that the
null hypothesis is true. They measure how compatible your data are with the null hypothesis:
how likely is the effect observed in your sample data if the null hypothesis is true?

• High P values: your data are likely under a true null.

• Low P values: your data are unlikely under a true null.

A low P value suggests that your sample provides enough evidence that you can reject the null
hypothesis for the entire population. P value is the probability of obtaining an effect at least as
extreme as the one in your sample data, assuming the truth of the null hypothesis. For
example, suppose that a vaccine study produced a P value of 0.04. This P value indicates that if
the vaccine had no effect, you’d obtain the observed difference or a larger one in 4% of
studies purely due to random sampling error.
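
As an illustration of the computation, here is a minimal sketch of a two-tailed z-test p-value (all the numbers are hypothetical, and a known population σ is assumed so that the standard normal applies):

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical example: H0: mu = 100 vs H1: mu != 100
mu0, xbar, sigma, n = 100, 103, 10, 50   # sigma assumed known

z = (xbar - mu0) / (sigma / sqrt(n))              # test statistic
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.3f}, two-tailed p-value = {p_two_tailed:.4f}")
# A p-value below the chosen significance level (say 0.05) leads us to reject H0
```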



❖ Region of acceptance:
The region of acceptance is a range of values. If the test statistic falls within the region of
acceptance, the null hypothesis is not rejected. The region of acceptance is defined so that the
chance of making a Type I error is equal to the significance level.

❖ Region Of Rejection:

The set of values outside the region of acceptance is called the region of rejection. If the test
statistic falls within the region of rejection, the null hypothesis is rejected. In such cases, we
say that the hypothesis has been rejected at the α level of significance.

❖ One-Tailed and Two-Tailed Tests:

A test of a statistical hypothesis, where the region of rejection is on only one side of
the sampling distribution, is called a one-tailed test. For example, suppose the null hypothesis
states that the mean is less than or equal to 10. The alternative hypothesis would be that the
mean is greater than 10. The region of rejection would consist of a range of numbers located
on the right side of sampling distribution; that is, a set of numbers greater than 10.

A test of a statistical hypothesis, where the region of rejection is on both sides of the sampling
distribution, is called a two-tailed test. For example, suppose the null hypothesis states that
the mean is equal to 10. The alternative hypothesis would be that the mean is less than 10 or
greater than 10. The region of rejection would consist of a range of numbers located on both
sides of the sampling distribution; that is, the region of rejection would consist partly of numbers
that were less than 10 and partly of numbers that were greater than 10.



❖ Right-Tailed:

The critical value for conducting the right-tailed test H0: μ = 3 versus HA: μ > 3 is the t-value,
denoted t(α, n − 1), such that the probability to the right of it is α.

❖ Left-Tailed:

The critical value for conducting the left-tailed test H0: μ = 3 versus HA: μ < 3 is the t-value,
denoted −t(α, n − 1), such that the probability to the left of it is α.



❖ Critical value:

In hypothesis testing, a critical value is a point on the test distribution that is compared to the
test statistic to determine whether to reject the null hypothesis. If the absolute value of your
test statistic is greater than the critical value, you can declare statistical significance and reject
the null hypothesis. Critical values correspond to α, so their values become fixed when you
choose the test's α.

Level of significance (α)                α = 0.10    α = 0.05    α = 0.01    α = 0.005
Critical values of Z (one-tailed test)   ±1.28       ±1.645      ±2.33       ±2.58
Critical values of Z (two-tailed test)   ±1.645      ±1.96       ±2.58       ±2.81
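
The table can be reproduced from the inverse CDF of the standard normal; a minimal sketch:

```python
from statistics import NormalDist

Z = NormalDist()

for alpha in (0.10, 0.05, 0.01, 0.005):
    one_tailed = Z.inv_cdf(1 - alpha)        # alpha in a single tail
    two_tailed = Z.inv_cdf(1 - alpha / 2)    # alpha split across both tails
    print(f"alpha = {alpha}: one-tailed ±{one_tailed:.3f}, "
          f"two-tailed ±{two_tailed:.3f}")
```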

❖ Level of significance:

This refers to the degree of significance with which we accept or reject a particular hypothesis.
The significance level, also denoted as alpha or α, is the probability of rejecting the null
hypothesis when it is true. For example, a significance level of 0.05 indicates a 5% risk of
concluding that a difference exists when there is no actual difference.



❖ Degree of freedom:
Degree of freedom is the number of values in the final calculation of a statistic that are free to
vary. More generally, it is the number of independent ways in which a dynamic system can
move without violating any constraint imposed on it.

❖ Causation:
Causation indicates that one event is the result of the occurrence of the other event; i.e. there
is a causal relationship between the two events. This is also referred to as cause and effect.



❖ Correlation:
Correlation is a statistical technique that can show whether, and how strongly, pairs of variables
are related. For example, height and weight are related; taller people tend to be heavier than
shorter people. It is a statistical method to determine the relationship between two or more
variables; the coefficient's value can lie between +1 and −1, and the direction of the relation is
indicated by its sign.

❖ Correlation Coefficient:

The main result of a correlation is called the correlation coefficient (or "r"). It ranges from −1.0
to +1.0. The closer r is to +1 or −1, the more closely the two variables are related. If r is close to
0, it means there is no linear relationship between the variables. If r is positive, it means that as
one variable gets larger the other gets larger. If r is negative, it means that as one gets larger,
the other gets smaller (often called an "inverse" correlation).



Methods of determining Correlation:

Following are the most popular methods of determining correlation:

(a) Scatter diagram
(b) Karl Pearson’s product moment correlation coefficient
(c) Spearman’s rank correlation coefficient
(d) Coefficient of concurrent deviations

❖ Scatter Diagram:

The scatter diagram is known by many names, such as scatter plot, scatter graph, and
correlation chart. This diagram is drawn with two variables, usually the first variable is
independent and the second variable is dependent on the first variable.



❖ Karl Pearson’s coefficient of correlation:
The Pearson correlation coefficient is a very helpful statistical formula that measures the
strength of the linear relationship between two variables. In the field of statistics, this formula
is often referred to as the Pearson R test.



There are different variations of the formula which can be used to find it. In its simplest form:

r = cov(X, Y) / (σX · σY)

where cov(X, Y) is the covariance between X and Y, and σX and σY are the standard deviations
of the distributions X and Y.

Important Properties of Coefficient of Correlation:


• The coefficient of correlation is a unit-free measure.
• The coefficient of correlation is invariant under a change of origin; under a change of
scale its magnitude is unchanged, while its sign follows the sign of the product of the
scale factors.
• The coefficient of correlation always lies between −1 and 1, including both limiting
values.
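
A minimal sketch computing r from the covariance formula above, on hypothetical data (statistics.covariance and statistics.correlation require Python 3.10+):

```python
import statistics

x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 5, 7]

# r = cov(X, Y) / (sigma_X * sigma_Y), with sample (n - 1) estimates throughout
r = statistics.covariance(x, y) / (statistics.stdev(x) * statistics.stdev(y))
print(round(r, 4))                              # 0.8783

# Python 3.10+ also provides the same result directly:
print(round(statistics.correlation(x, y), 4))   # 0.8783
```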

❖ Spearman’s Rank Correlation Coefficient:

When we need to find the correlation between two qualitative characteristics, say beauty and
intelligence, we take recourse to the rank correlation coefficient. In order to find the
correlation, the characteristics are first assigned rankings. The Spearman correlation
coefficient, rs, can take values from +1 to −1. In formula terms, it is given by:

rs = 1 − (6 Σ d²) / (n(n² − 1))

where d is the difference between the ranks of corresponding values and n is the number of
pairs of observations.
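
A minimal sketch implementing this formula (the two score lists are hypothetical and tie-free, as the basic formula requires):

```python
def ranks(values):
    """1-based ranks; assumes no tied values, as the basic formula requires."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman(x, y):
    """r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), d = rank differences."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

beauty       = [7, 4, 3, 9, 5]   # hypothetical scores on two characteristics
intelligence = [8, 5, 6, 9, 4]
print(spearman(beauty, intelligence))   # 0.6
```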



Coefficient of Concurrent Deviations: This method of studying correlation is the simplest of all
the methods. The only thing required under this method is to find out the direction of
change of the X variable and the Y variable. The applicable formula is:

r = ± √(± (2c − m) / m)

where c is the number of concurrent deviations (pairs of deviations with the same direction)
and m is the number of pairs of deviations.

Note: If (2c − m) > 0, then we take the positive sign both inside and outside the radical sign,
and if (2c − m) < 0, we take the negative sign both inside and outside the radical sign.

❖ Regression analysis:



In regression analysis, we are concerned with the estimation of one variable for a given value
of another variable (or for a given set of values of a number of variables) on the basis of an
average mathematical relationship between the two variables (or a number of variables).

When there are two variables x and y, and y is influenced by x, i.e. y depends on x, we get a
simple linear regression or simple regression. y is known as the dependent variable, regressand
or explained variable, and x is known as the independent variable, predictor or explanatory
variable. In the case of a simple regression model, if y depends on x, then the regression line of
y on x is given by:

y = a + bx

Here a and b are two constants, also known as regression parameters. Furthermore, b is known
as the regression coefficient of y on x and is also denoted by byx.

Regression coefficient:

A regression coefficient is a statistical measure of the average functional relationship between
two or more variables, denoted by b. Between two variables x and y, two values of the
regression coefficient can be obtained: one when we consider x as independent and y as
dependent, and the other when we consider y as independent and x as dependent. The
regression coefficient of y on x is represented as byx and that of x on y as bxy. In formula terms:

byx = cov(x, y) / σx² = r · (σy / σx)
bxy = cov(x, y) / σy² = r · (σx / σy)

The Regression Line is the line that best fits the data, such that the overall distance from the
line to the points (variable values) plotted on a graph is the smallest.
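
A minimal sketch fitting the least-squares line y = a + bx to hypothetical data (statistics.linear_regression requires Python 3.10+):

```python
import statistics

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# b_yx = cov(x, y) / var(x);  a = ybar - b * xbar
b = statistics.covariance(x, y) / statistics.variance(x)
a = statistics.fmean(y) - b * statistics.fmean(x)
print(f"y = {a:.2f} + {b:.2f}x")             # y = 2.20 + 0.60x

# Python 3.10+ provides the same fit directly:
slope, intercept = statistics.linear_regression(x, y)
print(slope, intercept)                      # slope ~= 0.6, intercept ~= 2.2
```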



There are as many regression lines as there are variables. If we take two variables, say X and
Y, then there will be two regression lines:
▪ Regression line of Y on X: This gives the most probable values of Y from the given values of
X. The regression line of y on x is the line of best fit obtained by the method of least
squares and is used for estimating the value of the dependent variable y for a known value
of the independent variable x.
▪ Regression line of X on Y: This gives the most probable values of X from the given values of
Y.

❖ Properties of regression lines:

(i) The regression coefficients remain unchanged due to a shift of origin but change due
to a shift of scale.
(ii) The two lines of regression intersect at the point (mean of "x", mean of "y"), where x
and y are the variables under consideration.
(iii) The coefficient of correlation between two variables x and y is the simple geometric
mean of the two regression coefficients. The sign of the correlation coefficient is the
common sign of the two regression coefficients.
(iv) The two lines of regression coincide i.e. become identical when r = –1 or 1.
(v) The two lines of regression are perpendicular to each other when r = 0.



❖ Regression vs correlation:

Correlation measures the degree and direction of the association between two variables,
whereas regression expresses the relationship as an equation and estimates the value of one
variable for a given value of the other. Correlation is symmetric between x and y; regression
distinguishes the dependent variable from the independent variable.

❖ Parametric and non-parametric tests:

Nonparametric statistics refers to statistical methods in which the data are not required to fit a
normal distribution. Nonparametric statistics often uses ordinal data, meaning data that rely
not on numbers but on a ranking or order of sorts. For example, a survey conveying
consumer preferences ranging from like to dislike would be considered ordinal data.

The parametric test is one which has information about the population parameter. The
parametric test is a hypothesis test which provides generalisations for making statements
about the mean of the parent population. A t-test, based on Student’s t-statistic, is often used
in this regard.
❖ t-test:
The t-test is a small-sample test. It was developed by William Sealy Gosset in 1908, who
published it under the pen name “Student”; hence it is known as Student’s t-test. For applying
the t-test, the value of the t-statistic is computed using the following formula:

t = (deviation of the sample statistic from the population parameter) / (standard error of the sample statistic)

A t-test is a type of inferential statistic used to determine whether there is a significant
difference between the means of two groups which may be related in certain features. It is
mostly used when the data sets, like the set of data recorded as outcomes from flipping a coin
100 times, would follow a normal distribution and may have unknown variances. The t-test is
used as a hypothesis-testing tool, which allows testing of an assumption applicable to a
population.
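
A minimal sketch of a one-sample t-statistic using the formula above (the measurements and the hypothesised mean are made up):

```python
import math
import statistics

sample = [101, 98, 103, 100, 99, 104, 102, 97]   # hypothetical measurements
mu0 = 100                                        # hypothesised population mean

n = len(sample)
xbar = statistics.fmean(sample)
s = statistics.stdev(sample)

t = (xbar - mu0) / (s / math.sqrt(n))   # deviation / standard error
print(f"t = {t:.3f} with {n - 1} degrees of freedom")
# Compare |t| with the critical t-value for the chosen alpha and n - 1 df
```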

❖ F-test (variance ratio test):

The F-test is named after the great statistician R.A. Fisher. The F-test is used to test whether
two independent estimates of population variance differ significantly, or whether two samples
may be regarded as drawn from normal populations having the same variance. The F-statistic
is defined as:

F = (larger estimate of population variance) / (smaller estimate of population variance)
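
A minimal sketch of the variance ratio for two hypothetical samples:

```python
import statistics

sample1 = [12, 15, 11, 14, 13, 16]   # hypothetical data
sample2 = [22, 19, 25, 21, 24, 20]

v1 = statistics.variance(sample1)    # unbiased sample variances
v2 = statistics.variance(sample2)

f = max(v1, v2) / min(v1, v2)        # larger estimate over smaller, as above
print(f"F = {f:.3f}")
# Compare F with the critical F-value for the two sets of degrees of freedom
```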

❖ Chi-square test:

The test is applied when you have two categorical variables from a single population. It is used
to determine whether there is a significant association between the two variables. For
example, in an election survey, voters might be classified by gender (male or female) and
voting preference (Democrat, Republican, or Independent). We could use a chi-square test for
independence to determine whether gender is related to voting preference.
The chi-squared test is used to determine whether there is a significant difference between the
expected frequencies and the observed frequencies in one or more categories. The computed
value of chi-square is compared with the critical value for a pre-determined level of
significance and the degrees of freedom; when the computed chi-square statistic exceeds the
critical value, we can reject the null hypothesis.



Important Properties of Chi-square Test:

• The mean of the distribution is equal to the number of degrees of freedom.


• The variance is equal to two times the number of degrees of freedom.
• As the degrees of freedom increase, the chi-square curve approaches a normal
distribution.
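
To make the mechanics concrete, here is a minimal sketch computing the chi-square statistic for a hypothetical gender-by-preference contingency table (expected counts under independence are row total × column total / grand total; all counts are invented):

```python
# Observed counts: rows = gender, columns = voting preference (hypothetical)
observed = [
    [60, 54, 46],   # male:   Democrat, Republican, Independent
    [40, 44, 56],   # female
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
print(f"chi-square = {chi2:.3f}, df = {df}")
# Reject H0 (independence) if chi2 exceeds the critical value for alpha and df
```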

❖ Analysis of variance (ANOVA):

Analysis of variance (ANOVA) is a collection of statistical models and their associated


estimation procedures (such as the "variation" among and between groups) used to
analyze the differences among group means in a sample. ANOVA was developed
by statistician and evolutionary biologist Ronald Fisher. In the ANOVA setting, the
observed variance in a particular variable is partitioned into components attributable to
different sources of variation.
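
A minimal sketch of the one-way ANOVA partition for three hypothetical groups, computing the F ratio as (between-group mean square) / (within-group mean square):

```python
import statistics

groups = [                      # hypothetical scores for three methods
    [85, 90, 88, 75, 95],
    [70, 65, 80, 72, 68],
    [90, 95, 88, 92, 85],
]

k = len(groups)
n = sum(len(g) for g in groups)
grand_mean = statistics.fmean([x for g in groups for x in g])

# Partition the total variation into between-group and within-group parts
ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum(sum((x - statistics.fmean(g)) ** 2 for x in g) for g in groups)

f = (ss_between / (k - 1)) / (ss_within / (n - k))
print(f"F = {f:.3f} with ({k - 1}, {n - k}) degrees of freedom")
```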



❖ Data processing:

Data processing refers to operations performed on data in order to derive new information
according to a given set of rules. Data processing may involve various processes, including data
validation, summarization, aggregation, analysis and reporting.



❖ Elements of data processing:

The three basic structural elements of a data processing system are files, flows, and processes.
Files are collections of permanent records in the system, flows are data interfaces between
the system and the environment, and processes are functionally defined logical manipulations
of the data.

❖ Database management system:


A database management system (DBMS) is software that maintains a systematically organized
repository of indexed information and allows for easy retrieval, updating, analysis, and output
of data. Relational databases, NoSQL databases, etc. fall under this type of processing. A DBMS
is similar to those organized bins of treats in a candy store - each bin is a repository of a certain
kind of candy. Your choices are limited to what’s in the available containers, and you can
randomly pick candy from each bin.

❖ Data entry:

Data entry is the direct input of data into the appropriate data fields of a database, through
the use of a human data-input device such as a keyboard, mouse, stylus, or touch screen, or
through speech recognition software.



❖ Data storage:

Data storage, often called storage or memory, is a technology consisting of computer
components and recording media that are used to retain digital data. It is a core function and
fundamental component of computers.

❖ Computer:

'Computer' is basically derived from the word 'compute', which means to calculate something.
A computer is a fast electronic device that processes input data according to the instructions
given by the programmer/user and provides the desired information as output.



❖ Characteristics of Computer:

• Speed: A computer is a very fast device. It can perform a large amount of work in a few
seconds.
• Accuracy: A computer performs arithmetical calculations and logic operations with
consistent accuracy; in practice, almost all mistakes originate with the users rather than
the machine.
• Diligence: A computer can operate twenty-four hours continuously without taking any
rest. It has no feelings or emotions.
• Versatility: Versatility is one of the most wonderful features of the computer. One
moment it is preparing the results of a particular examination, the next moment it is
busy preparing electricity bills, and in between it may be helping an office secretary to
trace an important letter in seconds. It can do multiple kinds of work. It is also used in
data processing jobs, weather forecasting, ticket reservation, multimedia design,
animation, accountancy etc.

❖ Hardware: - Hardware refers to all the physical parts and components of the computer.



❖ Central Processing Unit: – The Central Processing Unit (CPU) or Microprocessor is the heart
of the computer, where all the processing of the data is carried out. The data and
instructions that are entered into the computer system are fed into the CPU before the
final results are displayed on the Output Unit. The CPU stores the data and instructions,
does all the calculations and problem solving, and also controls the functions of all other
units.

❖ The components of the CPU are as follows:

(1) Memory Unit or Storage Unit

(2) Arithmetic & Logical Unit (ALU)

(3) Control Unit

❖ INPUT DEVICES: - In a computerized system, before any processing takes place, the data
and instructions must be fed in. This is achieved through input devices, which provide a
communication medium between the user and the machine. The most common input
device is the keyboard, which resembles a typewriter; with its help, the user types in data
and instructions.



❖ Input devices are divided into two categories:

1) Text input devices: the mainly used text input device is the keyboard.

2) Cursor control and other devices: cursor control devices include the mouse and joystick;
other common input devices include the scanner.

❖ Operating System:

An operating system is a program that acts as an interface between the user and the computer
hardware and controls the execution of all kinds of programs. It is the most important
program in the computer system. It is the one program that executes all the time while the
computer is operational, and it exits only when the computer is shut down. The OS is the
program that makes the computer work, hence the name 'operating system'. It takes
instructions in the form of commands from the user and translates them into
machine-understandable instructions; it gets the instructions executed by the CPU and
translates the result back into a user-understandable form.



❖ Application of Computer:

1. Fine Arts:

• To draw something
• To make changes in photographs
• To scan images

2. Business/Commerce:

• Work gets done faster, more accurately and more efficiently
• With the use of e-mail, it is faster to deal with other parties

3. Banks:

• Bank records are updated regularly
• Records of the customers are put into the machine

4. Education:

• Self-learning packages are available
• Helps in school administration

5. Entertainment:

• Used to play games
• Helps in film editing and audio editing

6. Libraries:

• To do routine jobs in a faster way
• To keep statistical data

7. Research and Development:

• Collecting data
• Interpreting data and getting output

8. Publishing:

• Computers enable desktop publishing
• Editing becomes very easy

❖ Application of computers in Accounting:

Accounting software is used to implement a computerized accounting system. The
computerized accounting system is based on the concept of databases. Its advantages include:

1. Keeping the accounting records of a big company becomes possible
2. Separate payroll accounting is possible
3. Automation of all financial accounts
4. Graphic presentation of accounting results
5. Better inventory control



❖ Computer application to Inventory control:

A computerized inventory control system is the integration of the sub-functions involved in the
management of inventory into a single cohesive system. It is software installed on computer
systems that enables a firm to keep a check on inventory levels by performing automatic
counting of inventories, recording withdrawals and revising the stock balance.

❖ Computer application to Marketing:

Marketing professionals use computer technology to plan, manage and monitor campaigns. By
analyzing and manipulating data on computers, they can increase the precision of marketing
campaigns, personalize customer and prospect communications, and improve customer
relationship management. Computer technology also makes it easier for marketing
professionals to collaborate with colleagues, agencies and suppliers.

❖ Statistical software:
SPSS: The Statistical Package for the Social Sciences (SPSS) is a software package used in
statistical analysis of data. It was developed by SPSS Inc. and acquired by IBM in 2009. In 2014,
the software was officially renamed IBM SPSS Statistics. The software was originally meant for
the social sciences, but has become popular in other fields such as health sciences and
especially in marketing, market research and data mining.

SAS stands for Statistical Analysis System. It was developed at the North Carolina State
University in 1966, so is contemporary with SPSS.

Stata is a more recent statistical package, with Version 1 released in 1985. Since then, it has
become increasingly popular in the areas of epidemiology and economics, and probably
now rivals SPSS and SAS in its user base. It is now on Version 14.

S-PLUS is a commercial implementation of the S statistical programming language, developed
in Seattle in 1988; R is a free implementation of the same language, developed in 1996.
Since then the original R team has expanded to include dozens of individuals from all over
the globe. Because it is a programming language and environment, it is used by giving the
software a series of commands, often saved in text documents called syntax files or
scripts, rather than through a menu-based system. Because of this, it is probably best used by
people already reasonably expert at statistical analysis, or who have an affinity for computers.

