
Answer

Sampling is that part of statistical practice concerned with the selection of individual observations intended to yield some knowledge about a population of concern, especially for the purposes of statistical inference. In particular, results from probability theory and statistical theory are employed to guide practice.

The sampling process consists of five stages:

Definition of the population of concern

Specification of a sampling frame, a set of items or events that it is possible to measure

Specification of a sampling method for selecting items or events from the frame

Sampling and data collecting

Review of the sampling process

Sampling methods

Within any of the types of frame identified above, a variety of sampling methods can be employed, individually or in combination. Factors commonly influencing the choice between these designs include:
Nature and quality of the frame

Availability of auxiliary information about units on the frame

Accuracy requirements, and the need to measure accuracy

Whether detailed analysis of the sample is expected

Cost/operational concerns

Simple random sampling

In a simple random sample ('SRS') of a given size, all such subsets of the frame are given an equal probability. Each element of the frame thus has an equal probability of selection: the frame is not subdivided or partitioned. Furthermore, any given pair of elements has the same chance of selection as any other such pair (and similarly for triples, and so on). This minimises bias and simplifies analysis of results. In particular, the variance between individual results within the sample is a good indicator of variance in the overall population, which makes it relatively easy to estimate the accuracy of results.

However, SRS can be vulnerable to sampling error because the randomness of the selection may result in a sample that doesn't reflect the makeup of the population. For instance, a simple random sample of ten people from a given country will on average produce five men and five women, but any given trial is likely to overrepresent one sex and underrepresent the other. Systematic and stratified techniques, discussed below, attempt to overcome this problem by using information about the population to choose a more representative sample.

SRS may also be cumbersome and tedious when sampling from an unusually large target population. In some cases, investigators are interested in research questions specific to subgroups of the population. For example, researchers might be interested in examining whether cognitive ability as a predictor of job performance is equally applicable across racial groups. SRS cannot accommodate the needs of researchers in this situation because it does not provide subsamples of the population. Stratified sampling, which is discussed below, addresses this weakness of SRS.

Simple random sampling is always an EPS (equal probability of selection) design, but not all EPS designs are simple random sampling.
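As a rough illustration of the mechanics, here is a minimal Python sketch of drawing a simple random sample; the frame of 1,000 numbered units and the sample size of 50 are made-up assumptions, not from the text.

```python
import random

# Hypothetical sampling frame: 1,000 numbered units (e.g. household IDs).
frame = list(range(1, 1001))

# Draw an SRS of 50 units: every subset of size 50 is equally likely,
# so every unit (and every pair, triple, ...) has the same selection chance.
srs = random.sample(frame, k=50)
print(sorted(srs)[:10])  # first few selected IDs, for inspection
```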

Systematic sampling

Systematic sampling relies on arranging the target population according to some ordering scheme and then selecting elements at regular intervals through that ordered list. Systematic sampling involves a random start and then proceeds with the selection of every kth element from then onwards. In this case, k = (population size / sample size). It is important that the starting point is not automatically the first in the list, but is instead randomly chosen from within the first to the kth element in the list. A simple example would be to select every 10th name from the telephone directory (an 'every 10th' sample, also referred to as 'sampling with a skip of 10').

As long as the starting point is randomized, systematic sampling is a type of probability sampling. It is easy to implement and the stratification induced can make it efficient, if the variable by which the list is ordered is correlated with the variable of interest. 'Every 10th' sampling is especially useful for efficient sampling from databases.
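A minimal Python sketch of the random-start, every-kth-element procedure described above; the directory of 1,000 names and the sample size of 100 are illustrative assumptions (and the sketch assumes the population size is a multiple of the sample size).

```python
import random

def systematic_sample(frame, sample_size):
    """Select every k-th element after a random start within the first k elements."""
    k = len(frame) // sample_size            # sampling interval (the 'skip')
    start = random.randrange(k)              # random start in the first k positions
    return [frame[i] for i in range(start, start + k * sample_size, k)]

# Example: 100 names sampled from a 1,000-entry directory.
directory = [f"name_{i}" for i in range(1, 1001)]
print(systematic_sample(directory, 100)[:5])
```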

Example: Suppose we wish to sample people from a long street that starts in a poor district (house #1) and ends in an expensive district (house #1000). A simple random selection of addresses from this street could easily end up with too many from the high end and too few from the low end (or vice versa), leading to an unrepresentative sample. Selecting (e.g.) every 10th street number along the street ensures that the sample is spread evenly along the length of the street, representing all of these districts. (Note that if we always start at house #1 and end at #991, the sample is slightly biased towards the low end; by randomly selecting the start between #1 and #10, this bias is eliminated.)

However, systematic sampling is especially vulnerable to periodicities in the list. If periodicity is present and the period is a multiple or factor of the interval used, the sample is especially likely to be unrepresentative of the overall population, making the scheme less accurate than simple random sampling.

Example: Consider a street where the odd-numbered houses are all on the north (expensive) side of the road, and the even-numbered houses are all on the south (cheap) side. Under the sampling scheme given above, it is impossible to get a representative sample; either the houses sampled will all be from the odd-numbered, expensive side, or they will all be from the even-numbered, cheap side.

Another drawback of systematic sampling is that even in scenarios where it is more accurate than SRS, its theoretical properties make it difficult to quantify that accuracy. (In the two examples of systematic sampling that are given above, much of the potential sampling error is due to variation between neighbouring houses - but because this method never selects two neighbouring houses, the sample will not give us any information on that variation.)

As described above, systematic sampling is an EPS method, because all elements have the same probability of selection (in the example given, one in ten). It is not 'simple random sampling' because different subsets of the same size have different selection probabilities - e.g. the set {4,14,24,...,994} has a one-in-ten probability of selection, but the set {4,13,24,34,...} has zero probability of selection.

Systematic sampling can also be adapted to a non-EPS approach; for an example, see the discussion of PPS samples below.

Stratified sampling

Where the population embraces a number of distinct categories, the frame can be organized by these categories into separate "strata." Each stratum is then sampled as an independent sub-population, out of which individual elements can be randomly selected[3]. There are several potential benefits to stratified sampling.

First, dividing the population into distinct, independent strata can enable researchers to draw inferences about specific subgroups that may be lost in a more generalized random sample.

Second, utilizing a stratified sampling method can lead to more efficient statistical estimates (provided that strata are selected based upon relevance to the criterion in question, instead of availability of the samples). It is important to note that even if a stratified sampling approach does not lead to increased statistical efficiency, such a tactic will not result in less efficiency than would simple random sampling, provided that each stratum is proportional to the group's size in the population.

Third, it is sometimes the case that data are more readily available for individual, pre-existing strata within a population than for the overall population; in such cases, using a stratified sampling approach may be more convenient than aggregating data across groups (though this may potentially be at odds with the previously noted importance of utilizing criterion-relevant strata).

Finally, since each stratum is treated as an independent population, different sampling approaches can be applied to different strata, potentially enabling researchers to use the approach best suited (or most cost-effective) for each identified subgroup within the population.

There are, however, some potential drawbacks to using stratified sampling. First, identifying strata and implementing such an approach can increase the cost and complexity of sample selection, as well as leading to increased complexity of population estimates. Second, when examining multiple criteria, stratifying variables may be related to some, but not to others, further complicating the design, and potentially reducing the utility of the strata. Finally, in some cases (such as designs with a large number of strata, or those with a specified minimum sample size per group), stratified sampling can potentially require a larger sample than would other methods (although in most cases, the required sample size would be no larger than would be required for simple random sampling).

A stratified sampling approach is most effective when three conditions are met:

Variability within strata is minimized

Variability between strata is maximized

The variables upon which the population is stratified are strongly correlated with the desired dependent variable.
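To make the proportional-allocation idea from above concrete, here is a minimal Python sketch; the three strata, their sizes, and the overall sample size are made-up assumptions, and the rounding of allocations is deliberately naive.

```python
import random

# Hypothetical frame grouped into strata (e.g. regions), with made-up sizes.
strata = {
    "north": [f"n{i}" for i in range(600)],
    "south": [f"s{i}" for i in range(300)],
    "east":  [f"e{i}" for i in range(100)],
}
total = sum(len(units) for units in strata.values())
sample_size = 50

# Proportional allocation: each stratum's share of the sample matches its
# share of the population; an SRS is then drawn independently in each stratum.
sample = []
for name, units in strata.items():
    n_h = round(sample_size * len(units) / total)   # naive rounding of the allocation
    sample.extend(random.sample(units, n_h))

print(len(sample), sample[:5])
```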

Advantages over other sampling methods

Focuses on important subpopulations and ignores irrelevant ones.

Allows use of different sampling techniques for different subpopulations.

Improves the accuracy/efficiency of estimation.

Permits greater balancing of statistical power of tests of differences between strata by sampling equal numbers from strata varying widely in size.

Disadvantages

Requires selection of relevant stratification variables, which can be difficult.

Is not useful when there are no homogeneous subgroups.

Can be expensive to implement.

Poststratification

Stratification is sometimes introduced after the sampling phase in a process called "poststratification". This approach is typically implemented due to a lack of prior knowledge of an appropriate stratifying variable or when the experimenter lacks the necessary information to create a stratifying variable during the sampling phase. Although the method is susceptible to the pitfalls of post hoc approaches, it can provide several benefits in the right situation. Implementation usually follows a simple random sample. In addition to allowing for stratification on an ancillary variable, poststratification can be used to implement weighting, which can improve the precision of a sample's estimates.

Oversampling

Choice-based sampling is one of the stratified sampling strategies. In choice-based sampling the data are stratified on the target and a sample is taken from each stratum so that the rare target class will be more represented in the sample. The model is then built on this biased sample. The effects of the input variables on the target are often estimated with more precision with the choice-based sample even when a smaller overall sample size is taken, compared to a random sample. The results usually must be adjusted to correct for the oversampling.

Probability proportional to size sampling

In some cases the sample designer has access to an "auxiliary


variable" or "size measure", believed to be correlated to the
variable of interest, for each element in the population. This
data can be used to improve accuracy in sample design. One
option is to use the auxiliary variable as a basis for
stratification, as discussed above.

Another option is probability-proportional-to-size ('PPS')


sampling, in which the selection probability for each element is
set to be proportional to its size measure, up to a maximum of
1. In a simple PPS design, these selection probabilities can
then be used as the basis for Poisson sampling. However, this
has the drawbacks of variable sample size, and different
portions of the population may still be over- or under-
represented due to chance variation in selections. To address
this problem, PPS may be combined with a systematic
approach.

Example: Suppose we have six schools with populations of 150, 180, 200, 220, 260, and 490 students respectively (total 1500 students), and we want to use student population as the basis for a PPS sample of size three. To do this, we could allocate the first school numbers 1 to 150, the second school 151 to 330 (= 150 + 180), the third school 331 to 530, and so on to the last school (1011 to 1500). We then generate a random start between 1 and 500 (equal to 1500/3) and count through the school populations by multiples of 500. If our random start was 137, we would select the schools which have been allocated numbers 137, 637, and 1137, i.e. the first, fourth, and sixth schools.
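The school example translates directly into a short Python sketch of systematic PPS selection; the function name and the random start are illustrative, while the size list mirrors the worked example above.

```python
import itertools
import random

def systematic_pps(sizes, n):
    """Systematic probability-proportional-to-size selection of n units."""
    total = sum(sizes)
    interval = total / n
    cumulative = list(itertools.accumulate(sizes))   # upper bound of each unit's range
    start = random.uniform(0, interval)              # random start within one interval
    picks = [start + i * interval for i in range(n)]
    # For each pick, select the first unit whose cumulative total covers it.
    return [next(idx for idx, c in enumerate(cumulative) if pick <= c) for pick in picks]

sizes = [150, 180, 200, 220, 260, 490]   # the six school populations
print(systematic_pps(sizes, 3))          # e.g. [0, 3, 5] -> first, fourth and sixth schools
```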

The PPS approach can improve accuracy for a given sample size by concentrating sample on large elements that have the greatest impact on population estimates. PPS sampling is commonly used for surveys of businesses, where element size varies greatly and auxiliary information is often available - for instance, a survey attempting to measure the number of guest-nights spent in hotels might use each hotel's number of rooms as an auxiliary variable. In some cases, an older measurement of the variable of interest can be used as an auxiliary variable when attempting to produce more current estimates.

Cluster sampling

Sometimes it is cheaper to 'cluster' the sample in some way, e.g. by selecting respondents from certain areas only, or certain time-periods only. (Nearly all samples are in some sense 'clustered' in time - although this is rarely taken into account in the analysis.)

Cluster sampling is an example of 'two-stage sampling' or 'multistage sampling': in the first stage a sample of areas is chosen; in the second stage a sample of respondents within those areas is selected.

This can reduce travel and other administrative costs. It also means that one does not need a sampling frame listing all elements in the target population. Instead, clusters can be chosen from a cluster-level frame, with an element-level frame created only for the selected clusters. Cluster sampling generally increases the variability of sample estimates above that of simple random sampling, depending on how the clusters differ between themselves, as compared with the within-cluster variation.

One disadvantage of cluster sampling is that the precision of sample estimates depends on the actual clusters chosen. If the chosen clusters are biased in a certain way, inferences drawn about population parameters from these sample estimates will be far from accurate.

Multistage sampling

Multistage sampling is a complex form of cluster sampling in which two or more levels of units are embedded one in the other. The first stage consists of constructing the clusters that will be used to sample from. In the second stage, a sample of primary units is randomly selected from each cluster (rather than using all units contained in all selected clusters). In following stages, in each of those selected clusters, additional samples of units are selected, and so on. All ultimate units (individuals, for instance) selected at the last step of this procedure are then surveyed.

This technique, thus, is essentially the process of taking random samples of preceding random samples. It is not as effective as true random sampling, but it probably solves more of the problems inherent to random sampling. Moreover, it is an effective strategy because it banks on multiple randomizations. As such, it is extremely useful.

Multistage sampling is used frequently when a complete list of all members of the population does not exist or is inappropriate. Moreover, by avoiding the use of all sample units in all selected clusters, multistage sampling avoids the large, and perhaps unnecessary, costs associated with traditional cluster sampling.

Matched random sampling

A method of assigning participants to groups in which pairs of participants are first matched on some characteristic and then individually assigned randomly to groups.[5]

The procedure for matched random sampling can be summarized with the following contexts:

Two samples in which the members are clearly paired, or are matched explicitly by the researcher. For example, IQ measurements or pairs of identical twins.

Those samples in which the same attribute, or variable, is measured twice on each subject, under different circumstances. Commonly called repeated measures. Examples include the times of a group of athletes for 1500m before and after a week of special training; the milk yields of cows before and after being fed a particular diet.

Quota sampling

In quota sampling, the population is first segmented into mutually exclusive sub-groups, just as in stratified sampling. Then judgment is used to select the subjects or units from each segment based on a specified proportion. For example, an interviewer may be told to sample 200 females and 300 males between the ages of 45 and 60.

It is this second step which makes the technique one of non-probability sampling. In quota sampling the selection of the sample is non-random. For example, interviewers might be tempted to interview those who look most helpful. The problem is that these samples may be biased because not everyone gets a chance of selection. This non-random element is its greatest weakness, and quota versus probability sampling has been a matter of controversy for many years.

Mechanical sampling

Mechanical sampling is typically used in sampling solids, liquids and gases, using devices such as grabs, scoops, thief probes, the COLIWASA and riffle splitter.

Care is needed in ensuring that the sample is representative of the frame. Much work in the theory and practice of mechanical sampling was developed by Pierre Gy and Jan Visman.

Convenience sampling

Convenience sampling (sometimes known as grab or opportunity sampling) is a type of nonprobability sampling which involves the sample being drawn from that part of the population which is close to hand. That is, a sample population selected because it is readily available and convenient. The researcher using such a sample cannot scientifically make generalizations about the total population from this sample because it would not be representative enough. For example, if the interviewer were to conduct such a survey at a shopping center early in the morning on a given day, the people that he/she could interview would be limited to those present there at that given time, and their views would not represent those of other members of society in such an area, which could only be captured if the survey were conducted at different times of day and several times per week. This type of sampling is most useful for pilot testing. Several important considerations for researchers using convenience samples include:

Are there controls within the research design or experiment which can serve to lessen the impact of a non-random, convenience sample, thereby ensuring the results will be more representative of the population?

Is there good reason to believe that a particular convenience sample would or should respond or behave differently than a random sample from the same population?

Is the question being asked by the research one that can adequately be answered using a convenience sample?

In social science research, snowball sampling is a similar technique, where existing study subjects are used to recruit more subjects into the sample.

Line-intercept sampling

Line-intercept sampling is a method of sampling elements in a region whereby an element is sampled if a chosen line segment, called a "transect", intersects the element.

Panel sampling

Panel sampling is the method of first selecting a group of participants through a random sampling method and then asking that group for the same information again several times over a period of time. Therefore, each participant is given the same survey or interview at two or more time points; each period of data collection is called a "wave". This sampling methodology is often chosen for large scale or nation-wide studies in order to gauge changes in the population with regard to any number of variables from chronic illness to job stress to weekly food expenditures. Panel sampling can also be used to inform researchers about within-person health changes due to age or help explain changes in continuous dependent variables such as spousal interaction. There have been several proposed methods of analyzing panel sample data, including MANOVA, growth curves, and structural equation modeling with lagged effects. For a more thorough look at analytical techniques for panel data, see Johnson (1995).

Event Sampling Methodology

Event Sampling Methodology (ESM) is a new form of sampling method that allows researchers to study ongoing experiences and events that vary across and within days in their naturally occurring environment. Because of the frequent sampling of events inherent in ESM, it enables researchers to measure the typology of activity and detect the temporal and dynamic fluctuations of work experiences. The popularity of ESM as a new form of research design has increased over recent years because it addresses a shortcoming of cross-sectional research: researchers can now detect intra-individual variances across time, which they were previously unable to do. In ESM, participants are asked to record their experiences and perceptions in a paper or electronic diary.

There are three types of ESM:

Signal contingent – random beeping notifies participants to record data. The advantage of this type of ESM is minimization of recall bias.

Event contingent – records data when certain events occur.

Interval contingent – records data according to the passing of a certain period of time.

ESM has several disadvantages. One of the disadvantages of ESM is that it can sometimes be perceived as invasive and intrusive by participants. ESM also leads to possible self-selection bias. It may be that only certain types of individuals are willing to participate in this type of study, creating a non-random sample. Another concern is related to participant cooperation. Participants may not actually fill out their diaries at the specified times. Furthermore, ESM may substantively change the phenomenon being studied. Reactivity or priming effects may occur, such that repeated measurement may cause changes in the participants' experiences. This method of sampling data is also highly vulnerable to common method variance.[6]

Further, it is important to think about whether or not an appropriate dependent variable is being used in an ESM design. For example, it might be logical to use ESM in order to answer research questions which involve dependent variables with a great deal of variation throughout the day. Thus, variables such as change in mood, change in stress level, or the immediate impact of particular events may be best studied using ESM. However, it is not likely that utilizing ESM will yield meaningful predictions when measuring someone performing a repetitive task throughout the day or when dependent variables are long-term in nature (coronary heart problems).

Answer 2

In probability theory and statistics, correlation (often measured as a correlation coefficient) indicates the strength and direction of a linear relationship between two random variables.

In statistics, regression analysis refers to techniques for the modeling and analysis of numerical data consisting of values of a dependent variable (also called a response variable) and of one or more independent variables (also known as explanatory variables or predictors). The dependent variable in the regression equation is modeled as a function of the independent variables, corresponding parameters ("constants"), and an error term. The error term is treated as a random variable. It represents unexplained variation in the dependent variable. The parameters are estimated so as to give a "best fit" of the data. Most commonly the best fit is evaluated by using the least squares method, but other criteria have also been used.
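As a brief, hedged illustration of both ideas, the Python sketch below computes a Pearson correlation coefficient and a least-squares line for some made-up data; the numbers are assumptions, not taken from the text.

```python
import numpy as np

# Made-up paired observations (independent variable x, dependent variable y).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8, 12.1])

# Pearson correlation coefficient: strength and direction of the linear relationship.
r = np.corrcoef(x, y)[0, 1]

# Least-squares fit y ≈ a*x + b: parameters chosen to minimise the squared errors.
a, b = np.polyfit(x, y, deg=1)

print(f"correlation r = {r:.3f}, fitted line y = {a:.2f}x + {b:.2f}")
```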

Answer 3

Forecasting involves making the best possible judgment about some future event. It is no longer reasonable to rely solely on intuition, or one's feel for the situation, in projecting sales, inventory needs, personnel requirements, and other important economic or business variables.

Who uses forecasts?

Accountants - costs, revenues, tax-planning

Personnel Departments - recruitment of new employees

Financial Experts - interest rates


Production Managers - raw materials needs, inventories

Marketing Managers - sales forecasts for promotions

Major Types of Forecasting Methods

Subjective Methods

Sales Force Composites

Customer Surveys

Jury of Executive Opinions

Delphi Method

Time series analysis

In statistics, signal processing, and many other fields, a time series is a sequence of data points, measured typically at successive times, spaced at (often uniform) time intervals. Time series analysis comprises methods that attempt to understand such time series, often either to understand the underlying context of the data points (Where did they come from? What generated them?), or to make forecasts (predictions). Time series forecasting is the use of a model to forecast future events based on known past events: to forecast future data points before they are measured. A standard example in econometrics is the opening price of a share of stock based on its past performance.

The term time series analysis is used to distinguish a problem, firstly from more ordinary data analysis problems (where there is no natural ordering of the individual observations), and secondly from spatial data analysis, where the observations (often) relate to geographical locations. There are additional possibilities in the form of space-time models (often called spatial-temporal analysis). A time series model will generally reflect the fact that observations close together in time will be more closely related than observations further apart. In addition, time series models will often make use of the natural one-way ordering of time so that values in a series for a given time will be expressed as deriving in some way from past values, rather than from future values (see time reversibility).

Methods for time series analyses are often divided into two classes: frequency-domain methods and time-domain methods. The former centre around spectral analysis and, recently, wavelet analysis, and can be regarded as model-free analyses well-suited to exploratory investigations. Time-domain methods have a model-free subset consisting of the examination of auto-correlation and cross-correlation analysis, but it is here that partially and fully specified time series models make their appearance.

Prior moving average

A simple moving average (SMA) is the unweighted mean of the previous n data points. For example, a 10-day simple moving average of closing price is the mean of the previous 10 days' closing prices. If those prices are $p_M, p_{M-1}, \dots, p_{M-n+1}$, then the formula is

$$\text{SMA} = \frac{p_M + p_{M-1} + \cdots + p_{M-n+1}}{n}$$

When calculating successive values, a new value comes into the sum and an old value drops out, meaning a full summation each time is unnecessary:

$$\text{SMA}_{\text{today}} = \text{SMA}_{\text{yesterday}} - \frac{p_{M-n+1}}{n} + \frac{p_{M+1}}{n}$$
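A minimal Python sketch of the running-sum update above; the list of closing prices is a made-up assumption.

```python
def simple_moving_averages(prices, n):
    """Return the n-period SMA series, updating a running window sum incrementally."""
    smas = []
    window_sum = 0.0
    for i, price in enumerate(prices):
        window_sum += price
        if i >= n:
            window_sum -= prices[i - n]   # the old value drops out of the sum
        if i >= n - 1:
            smas.append(window_sum / n)   # a full window is available: record its mean
    return smas

prices = [22.3, 22.9, 23.1, 22.7, 23.4, 23.8, 24.0]  # hypothetical closing prices
print(simple_moving_averages(prices, n=3))
```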

In technical analysis there are various popular values for n, like 10 days, 40 days, or 200 days. The period selected depends on the kind of movement one is concentrating on, such as short, intermediate, or long term. In any case moving average levels are interpreted as support in a rising market, or resistance in a falling market.
In all cases a moving average lags behind the latest data point,
simply from the nature of its smoothing. An SMA can lag to an
undesirable extent, and can be disproportionately influenced
by old data points dropping out of the average. This is
addressed by giving extra weight to more recent data points,
as in the weighted and exponential moving averages.

One characteristic of the SMA is that if the data have a periodic fluctuation, then applying an SMA of that period will eliminate that variation (the average always containing one complete cycle). But a perfectly regular cycle is rarely encountered in economics or finance.

Central moving average

For a number of applications it is advantageous to avoid the shifting induced by using only 'past' data. Hence a central moving average can be computed, using both 'past' and 'future' data. The 'future' data in this case are not predictions, but merely data obtained after the time at which the average is to be computed.

Weighted and exponential moving averages (see below) can also be computed centrally.

Cumulative moving average

The cumulative moving average is also frequently called a running average or a long running average, although the term running average is also used as a synonym for a moving average. This article uses the term cumulative moving average or simply cumulative average since this term is more descriptive and unambiguous.

In some data acquisition systems, the data arrive in an ordered data stream and the statistician would like to get the average of all of the data up until the current data point. For example, an investor may want the average price of all of the stock transactions for a particular stock up until the current time. As each new transaction occurs, the average price at the time of the transaction can be calculated for all of the transactions up to that point using the cumulative average. This is the cumulative average, which is typically an unweighted average of the sequence of i values x1, ..., xi up to the current time:

$$CA_i = \frac{x_1 + x_2 + \cdots + x_i}{i}$$

The brute force method to calculate this would be to store all of the data and calculate the sum and divide by the number of data points every time a new data point arrived. However, it is possible to simply update the cumulative average as a new value xi+1 becomes available, using the formula:

$$CA_{i+1} = CA_i + \frac{x_{i+1} - CA_i}{i + 1}$$

where CA0 can be taken to be equal to 0.

Thus the current cumulative average for a new data point is equal to the previous cumulative average plus the difference between the latest data point and the previous average divided by the number of points received so far. When all of the data points arrive (i = N), the cumulative average will equal the final average.

The derivation of the cumulative average formula is straightforward. Using

$$CA_i = \frac{x_1 + x_2 + \cdots + x_i}{i}$$

and similarly for i + 1, it is seen that

$$x_{i+1} = (i + 1)\,CA_{i+1} - i\,CA_i$$

Solving this equation for CAi+1 results in:

$$CA_{i+1} = \frac{x_{i+1} + i\,CA_i}{i + 1} = CA_i + \frac{x_{i+1} - CA_i}{i + 1}$$
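A small Python sketch of this incremental update, applied to a made-up stream of transaction prices.

```python
def cumulative_averages(stream):
    """Yield the cumulative average after each new value, without storing the data."""
    ca = 0.0
    for i, x in enumerate(stream):            # i counts the points seen before x
        ca = ca + (x - ca) / (i + 1)          # CA_{i+1} = CA_i + (x_{i+1} - CA_i)/(i+1)
        yield ca

prices = [10.0, 12.0, 11.0, 13.0]             # hypothetical transaction prices
print(list(cumulative_averages(prices)))      # [10.0, 11.0, 11.0, 11.5]
```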

Weighted moving average

A weighted average is any average that has multiplying factors to give different weights to different data points. Mathematically, the moving average is the convolution of the data points with a moving average function; in technical analysis, a weighted moving average (WMA) has the specific meaning of weights which decrease arithmetically. In an n-day WMA the latest day has weight n, the second latest n − 1, etc., down to one:

$$\text{WMA}_M = \frac{n\,p_M + (n-1)\,p_{M-1} + \cdots + 2\,p_{M-n+2} + p_{M-n+1}}{n + (n-1) + \cdots + 2 + 1}$$

[Figure: WMA weights, n = 15]

The denominator is a triangle number, and can be easily computed as

$$\frac{n(n + 1)}{2}$$

When calculating the WMA across successive values, it can be noted that the difference between the numerators of WMA_{M+1} and WMA_M is n p_{M+1} − p_M − ... − p_{M−n+1}. If we denote the sum p_M + ... + p_{M−n+1} by Total_M, then

$$\text{Total}_{M+1} = \text{Total}_M + p_{M+1} - p_{M-n+1}$$

$$\text{Numerator}_{M+1} = \text{Numerator}_M + n\,p_{M+1} - \text{Total}_M$$

$$\text{WMA}_{M+1} = \frac{\text{Numerator}_{M+1}}{n(n+1)/2}$$

The graph at the right shows how the weights decrease, from
highest weight for the most recent data points, down to zero.
It can be compared to the weights in the exponential moving
average which follows.
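For illustration, here is a short Python sketch of the arithmetic-weight WMA; the price series is a made-up assumption, and for clarity the weighted sum is recomputed for each window rather than using the rolling-numerator shortcut above.

```python
def weighted_moving_average(prices, n):
    """n-period WMA: weights n, n-1, ..., 1 from newest to oldest point in the window."""
    denominator = n * (n + 1) / 2             # triangle number
    wmas = []
    for i in range(n - 1, len(prices)):
        window = prices[i - n + 1 : i + 1]    # oldest ... newest
        numerator = sum(w * p for w, p in zip(range(1, n + 1), window))
        wmas.append(numerator / denominator)
    return wmas

prices = [22.3, 22.9, 23.1, 22.7, 23.4]       # hypothetical closing prices
print(weighted_moving_average(prices, n=3))
```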

Exponential moving average

[Figure: EMA weights, N = 15]

An exponential moving average (EMA), sometimes also called an exponentially weighted moving average (EWMA), applies weighting factors which decrease exponentially. The weighting for each older data point decreases exponentially, giving much more importance to recent observations while still not discarding older observations entirely. The graph at right shows an example of the weight decrease.

Parameters:

The degree of weighting decrease is expressed as a constant smoothing factor α, a number between 0 and 1. α may be expressed as a percentage, so a smoothing factor of 10% is equivalent to α = 0.1. A higher α discounts older observations faster. Alternatively, α may be expressed in terms of N time periods, where α = 2 / (N + 1). For example, N = 19 is equivalent to α = 0.1. The half-life of the weights (the interval over which the weights decrease by a factor of two) is approximately N / 2.8854 (within 1% if N > 5).

The observation at a time period t is designated Yt, and the value of the EMA at any time period t is designated St. S1 is undefined. S2 may be initialized in a number of different ways, most commonly by setting S2 to Y1, though other techniques exist, such as setting S2 to an average of the first 4 or 5 observations. The prominence of the S2 initialization's effect on the resultant moving average depends on α; smaller α values make the choice of S2 relatively more important than larger α values, since a higher α discounts older observations faster.
Formula:

The formula for calculating the EMA at time periods t > 2 is

$$S_t = \alpha\,Y_{t-1} + (1 - \alpha)\,S_{t-1}$$

This formulation is according to Hunter (1986)[2]. The weight applied to Y_{t−(x+1)} is α(1 − α)^x. An alternate approach by Roberts (1959) uses Yt in lieu of Yt−1[3]:

$$S_t = \alpha\,Y_t + (1 - \alpha)\,S_{t-1}$$

This formula can also be expressed in technical analysis terms as follows, showing how the EMA steps towards the latest data point, but only by a proportion of the difference (each time):[4]

$$\text{EMA}_{\text{today}} = \text{EMA}_{\text{yesterday}} + \alpha\left(\text{price}_{\text{today}} - \text{EMA}_{\text{yesterday}}\right)$$

Expanding out EMAyesterday each time results in the following power series, showing how the weighting factor on each data point p1, p2, etc., decreases exponentially (p1 being the latest data point, p2 the next most recent, and so on):

$$\text{EMA} = \alpha\left[p_1 + (1 - \alpha)\,p_2 + (1 - \alpha)^2\,p_3 + (1 - \alpha)^3\,p_4 + \cdots\right]$$

In theory this is an infinite sum, but because 1 − α is less than 1, the terms become smaller and smaller, and can be ignored once small enough.

The N periods in an N-day EMA only specify the α factor. N is not a stopping point for the calculation in the way it is in an SMA or WMA. The first N data points in an EMA represent about 86% of the total weight in the calculation.

The power formula above gives a starting value for a particular day, after which the successive days formula shown first can be applied.
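As a simple illustration, the Python sketch below applies the step-towards-the-latest-price recurrence; the prices, the choice of N, and seeding the series with the first price (rather than the power formula) are assumptions made for brevity.

```python
def exponential_moving_average(prices, n):
    """EMA with smoothing factor alpha = 2 / (n + 1), seeded with the first price."""
    alpha = 2.0 / (n + 1)
    ema = prices[0]                            # simple seed; other initialisations exist
    emas = [ema]
    for price in prices[1:]:
        ema = ema + alpha * (price - ema)      # step towards the latest data point
        emas.append(ema)
    return emas

prices = [22.3, 22.9, 23.1, 22.7, 23.4]        # hypothetical closing prices
print(exponential_moving_average(prices, n=3))
```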

The question of how far back to go for an initial value depends, in the worst case, on the data. If there are huge p price values in old data then they'll have an effect on the total even if their weighting is very small. If one assumes prices don't vary too wildly then just the weighting can be considered. The weight omitted by stopping after k terms is

$$\alpha\left[(1 - \alpha)^k + (1 - \alpha)^{k+1} + (1 - \alpha)^{k+2} + \cdots\right]$$

which is

$$\alpha\,(1 - \alpha)^k \cdot \frac{1}{\alpha} = (1 - \alpha)^k$$

i.e. a fraction (1 − α)^k out of the total weight.

For example, to have 99.9% of the weight,

$$k = \frac{\log(0.001)}{\log(1 - \alpha)}$$

terms should be used. Since log(1 − α) approaches −2/(N + 1) as N increases, this simplifies to approximately

k = 3.45(N + 1)

for this example (99.9% weight).

Modified Moving Average

This is called modified moving average (MMA), running moving average (RMA), or smoothed moving average.

Definition

In short, this is an exponential moving average with α = 1/N.


Application of exponential moving average to OS performance metrics

Some computer performance metrics use a form of exponential moving average, for example, the average process queue length, or the average CPU utilization.

Here α is defined as a function of the time between two readings. An example of a coefficient giving bigger weight to the current reading, and smaller weight to the older readings, is

$$\alpha = 1 - e^{-\frac{t_n - t_{n-1}}{W \cdot 60}}$$

where the time for readings tn is expressed in seconds, and W is the period of time in minutes over which the reading is said to be averaged (the mean lifetime of each reading in the average). Given the above definition of α, the moving average can be expressed as

$$S_n = \alpha\,Y_n + (1 - \alpha)\,S_{n-1} = \left(1 - e^{-\frac{t_n - t_{n-1}}{W \cdot 60}}\right) Y_n + e^{-\frac{t_n - t_{n-1}}{W \cdot 60}}\,S_{n-1}$$

For example, a 15-minute average L of a process queue length Q, measured every 5 seconds (time difference is 5 seconds), is computed as

$$L_n = \left(1 - e^{-\frac{5}{15 \cdot 60}}\right) Q_n + e^{-\frac{5}{15 \cdot 60}}\,L_{n-1} = \left(1 - e^{-\frac{1}{180}}\right) Q_n + e^{-\frac{1}{180}}\,L_{n-1}$$
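A rough Python sketch of this kind of time-weighted exponential average; the sampling interval, the averaging window, and the queue-length readings are illustrative assumptions.

```python
import math

def time_weighted_ema(readings, interval_s, window_min):
    """EMA whose decay depends on the time between readings (load-average style)."""
    decay = math.exp(-interval_s / (window_min * 60.0))  # weight kept by the old average
    avg = readings[0]                                    # seed with the first reading
    averages = [avg]
    for y in readings[1:]:
        avg = (1.0 - decay) * y + decay * avg
        averages.append(avg)
    return averages

queue_lengths = [0, 2, 5, 3, 4, 1]        # hypothetical process-queue samples every 5 s
print(time_weighted_ema(queue_lengths, interval_s=5, window_min=15))
```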

Answer 4

Statistics is the scientific application of mathematical principles to the collection, analysis, and presentation of numerical data. Statisticians contribute to scientific enquiry by applying their mathematical and statistical knowledge to the design of surveys and experiments; the collection, processing, and analysis of data; and the interpretation of the results. Statisticians may apply their knowledge of statistical methods to a variety of subject areas, such as biology, economics, engineering, medicine, public health, psychology, marketing, education, and sports.

Many economic, social, political, and military decisions cannot be made without statistical techniques, such as the design of experiments to gain federal approval of a newly manufactured drug.

Characteristics of Statistics

Some of its important characteristics are given below:

Statistics are aggregates of facts.

Statistics are numerically expressed.

Statistics are affected to a marked extent by a multiplicity of causes.

Statistics are enumerated or estimated according to a reasonable standard of accuracy.

Statistics are collected for a predetermined purpose.

Statistics are collected in a systematic manner.

Statistics must be comparable to each other.

Limitations of statistics

Actually, statistics is an exact science. It is the application of the results that leads to problems.

Perhaps the most commonly encountered limitation of statistics is the misunderstanding that a statistical measure can be used as a measure of the accuracy of a measurement.

Statistics, in general, provide very little information on the intrinsic accuracy of a measurement. Statistics can only provide an estimate of the minimal error that might be in the measurement. The actual error can be much greater than the minimal (statistical) error.

Another way to put it is that statistics measure the variability of a measurement, not the accuracy of a measurement.

First of all, there's the sample size. If you don't have a big
enough sample, you can't give a very reliable answer. Of
course, with limited resources it might not be possible to
collect a large sample. There's a tradeoff between sample size
and reliability.

Also, there are aspects of statistics which are counterintuitive and tend to confuse people. For instance, Simpson's Paradox. We all know that lots of sunlight is good for crops. But if you do a survey of crop yield and compare it to weather records, you might actually find that the sunnier it is, the less the crops grow. This is different from the problem of small sample size outlined above. I'm not talking about a few freak years with lots of pests throwing the statistics out. The problem is that when it's sunny, it generally isn't raining. So the years with lots of sunshine have lower crop yields because they have less rain. It is possible to correct for this, by looking at records of both sunshine and rainfall. Basically, if you're looking for a direct relationship between two variables, you need to think of all the other variables that might have an effect and take those into account. But I think this increases the sample size needed, if you overdo it.

Another fallacy is to assume that correlation implies causality. For instance, you could do a survey and discover that poorer neighbourhoods have more crime. You might conclude that living in such a place makes someone a criminal. But this might not be the case. It might be that wealth affects both crime and place of residence, i.e. poor people tend to live in certain places (because they can't afford to live anywhere else), and poor people often turn to crime (to feed their families). Or there might be some other variable we haven't thought of yet.

Answer 5
1. Background

A strengthened planning system is necessary in the formation for the construction of a society based on sustainable development as well as justice and equity, with the reduction of poverty and institutionalization of peace and democracy. It will be necessary to develop an effective planning system and to make reforms on the institutional and procedural aspects of the planning organization timely with revisions, in order to support the state mechanism in intensifying the development activities by mainstreaming a balance between the growing ambition of the people and limited resources and means.

For economic transformation in line with the democratic values and beliefs, a competent statistical system needs to be developed for dynamic plan formulation, in accordance with the latest liberal economic philosophy and the geo-physical, economic and social conditions.

2. Review of the Current Situation

The Tenth Plan had made commitments to coordinate long-term and periodic plans for the raising of living standards of the poor, backward groups and regions, policy formulation, input projection, selection of development programs, approval, implementation and monitoring and evaluation works; to supply all types of reliable and quality statistics; to make institutional strengthening of CBS, legal reforms and human resource development; to fully implement the overall national statistical plans; to strengthen national accounts; and to prepare an Action Plan in an integrated way in order to specify the status of the designated indicators in the Tenth Plan. Against these commitments the achievements within the Tenth Plan period were: PMAS was established and the documentation on this was made public on a yearly basis, and NPC was restructured. Similarly, the MTEF system was initiated and institutionalized to make resource allocation and selection of projects logical in accordance with the national goal of poverty reduction. Assistance was provided in a special way for the implementation of programs of national importance under the Immediate Action Plan. Some exercises were done on the implementation of business plans in some ministries to bring about effectiveness in the work performance, monitoring indicators were developed for the implementation of the monitoring subsystem, tasks on impact and cost effectiveness study were carried out, the MDG progress report was prepared, and a needs assessment was made for the MDGs in five districts. National accounts statistics were made timely in accordance with the System of National Accounts (SNA) 1993. Specified indicators of the MDGs and the Tenth Plan were determined, Nepal Info was prepared, living standard measurement and poverty mapping were accomplished, digital mapping of all the VDCs was updated using GIS, and the National Development Volunteers Program was launched in 42 districts.

3. Problems, Challenges and Opportunities

Some of the problems are duplication in the statistics in Nepal, lack of proper level and standards as well as coordination, and limited use of information technology in plan preparation and data processing. Due to these problems, the following have become challenging and serious:

• To clarify the role of NPC in the liberal economic system.

• To make planning and action based on demand, information and facts.

• To establish coordination on the above among the involved agencies.

• Professional achievement with capacity enhancement.

• Institutional and legal reforms.

With successful coordination of the above actions, the plan, policy and program being prepared will be effective, and simplification of implementation, monitoring and evaluation will occur; it is challenging to give attention to making the different stages of planning development strong in the coming days. By utilizing the experience of planned development over the past fifty years, efforts will now be concentrated on formulating plans capable of bringing intensity to the overall development through the maximum mobilization of social and economic infrastructure.

4. Long Term Vision

To develop a strengthened planning system, capable of playing a timely role in the construction of a Prosperous, Modern and Just Nepal.

5. Objectives

To institutionalize an effective planning system, by developing a reliable statistical system.

6. Strategies

• To prepare the infrastructure for a strengthened and dynamic planning system.

• To carry out institutional strengthening and enhance the capacity of planning organizations, statistics and planning units.

• Institutional strengthening of the National Statistical System and the National Accounts System will be done.

7. Policy and Working Policies

• Planning system based on research and objective analysis and a favorable dynamic political, economic and social context will be developed.

• A framework will be prepared after making a study on the planning and statistical system to be adopted in accordance with a federal structure.

• NPC and CBS will be structured and strengthened with a view of making notable reforms in the working system and effectiveness.

• Planning and statistical system will gradually be made inclusive and engendered.

8. Programs

System Development

• Program to make the current planning practice gender accountable, pro-people, and effective by incorporating programs like study, research, survey, tours, seminars, conferences, public debate, and the enhancement of public awareness.

• In accordance with a federal structure, in order to prepare an appropriate framework for the planning and statistical system, programs like necessary studies, surveys, debate, seminar, discussion, meeting/conference, tour and awareness enhancement will be carried out.

• Institutional and procedural arrangement will be made in order to develop an effective training system in the field of planning, programming, monitoring and evaluation and statistics.

• National statistical system will be enforced.

• National Accounts System strengthening program will be launched.

Institutional Strengthening

• Institutional, legal and work procedural restructuring of NPC will be done after a comprehensive study, and it will be made more timely, more effective and competent.

• By constructing a modern and well-equipped planning house, PMIS will be followed by using Information and Communication Technology. To modernize the plan formulation system, networks with other ministries and stakeholders will be established.

• Working environment will be improved, by making physical amenities available, and monitoring and evaluation reforms.

• Extension and reforms of MTEF and the adoption and extension works will be done on gender planning.

• A separate unit will be formed and institutionalized for study, research and analysis, to strengthen planning divisions/sections and to make effective use of the concerned agencies by coordinating the actions of different ministries and agencies.

• To make institutional strengthening of GIS by making maximum utilization of information technology in the collection of statistics.

• To construct a new building for GIS and cartography, library, training centers and data processing.

Preparation of Long Term Vision Paper and Policy Research and Other Provisions

• A long-term vision paper will be prepared to design the destination of economic and social development of the country and where it should be in 20 years, and this paper will be adopted.

• By developing a competent policy study and analysis system, the commission can give the country advice on timely policy formulation and play its role in a competent way.

• In order to develop the National Statistical System, problems related to the availability of data, duplication, quality, integrated system, coordination and legitimacy will be solved through legal, institutional, human resources and quality related aspects, addressing the ongoing programs, reforms and institutional strengthening, development of institutional memory, and survey, study and research works.

• As the National Development Volunteers Service program has been found to help in the upliftment of the marginalized groups and regions, this will be developed as an autonomous agency after its institutional strengthening.

• By formulating model community development programs, arrangements will be made for their implementation in the designated VDCs and areas.

• Surveys and studies including the Industrial Census, Nepal Labor Power Survey and Nepal Living Standards Survey will be carried out. Likewise, preparation works for Census 2011 and Agricultural Census 2011 will be made. In conducting these censuses and surveys, on the basis of feasibility as well as importance, the contribution of women in the national economy, tourism, health and the informal sector will gradually be incorporated, as in the preparation of satellite accounting, for their contribution even in non-economic activities.

Most often we collect statistical data by doing surveys or experiments. To do a survey, we pick a small number of people and ask them questions. Then, we use their answers as the data. The choice of which individuals to take for a survey or data collection is very important, as it directly influences the statistics. Once the statistics are done, it can no longer be determined which individuals were taken. Suppose we want to measure the water quality of a big lake. If we take samples next to the waste drain, we will get different results than if the samples are taken in a far away, hard to reach, spot of the lake.

There are two kinds of problems which are commonly found when taking samples:

If there are many samples, the samples will likely be very close
to what they are in the real population. If there are very few
samples, however, they might be very different from what they
are in the real population. This error is called a chance error.

The individuals for the samples need to be chosen carefully; usually they will be chosen randomly. If this is not the case, the samples might be very different from what they really are in the total population. This is true even if a great number of samples is taken. This kind of error is called bias.

Errors

We can avoid chance errors by taking a larger sample, and we can avoid some bias by choosing randomly. However, sometimes large random samples are hard to take. And bias can happen if some people refuse to answer our questions, or if they know they are getting a fake treatment. These problems can be hard to fix.

Descriptive statistics

Finding the middle of the data

The middle of the data is often called an average. The average tells us about a typical individual in the population. There are three kinds of average that are often used: the mean, the median and the mode.

The examples below use this sample data:

Name | A B C D E F G H I J

---------------------------------------------
score| 23 26 49 49 57 64 66 78 82 92

Mean

The formula for the mean is

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

where x1, ..., xN are the data and N is the population size (see Sigma Notation).

This means that you add up all the values, and then divide by the number of values.

In our example

$$\bar{x} = \frac{23 + 26 + 49 + 49 + 57 + 64 + 66 + 78 + 82 + 92}{10} = \frac{586}{10} = 58.6$$

The problem with the mean is that it doesn't tell anything about how the values are distributed. Values that are very large or very small change the mean a lot. In statistics, these extreme values might be errors of measurement, but sometimes the population really does contain these values. For example, suppose in a room there are 10 people who make $10/day and 1 who makes $1,000,000/day. The mean of the data is about $90,918/day. Even though it is the average amount, the mean in this case is not the amount any single person makes, and is probably useless.

Median

The median is the middle item of the data. To find the median
we sort the data from the smallest number to the largest
number and then choose the number in the middle. If there are
an even number of data, there won't be a number right in the
middle, so we choose the two middle ones and calculate their
mean. In our example there are 10 items of data, the two
middle ones are "E" and "F", so the median is (57+64)/2 =
60.5.

Mode
The mode is the most frequent item of data. For example the
most common letter in English is the letter "e". We would say
that "e" is the mode of the distribution of the letters.

The mode is the only form of average that can be used for
numbers that can't be put in order.
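The three averages for the sample scores above can be checked with Python's standard statistics module; a minimal sketch:

```python
import statistics

scores = [23, 26, 49, 49, 57, 64, 66, 78, 82, 92]

print(statistics.mean(scores))     # 58.6
print(statistics.median(scores))   # 60.5 -- mean of the two middle values, 57 and 64
print(statistics.mode(scores))     # 49   -- the only repeated score
```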

Finding the spread of the data

Another thing we can say about a set of data is how spread out
it is. A common way to describe the spread of a set of data is
the standard deviation. If the standard deviation of a set of
data is small, then most of the data is very close to the
average. If the standard deviation is large, though, then a lot
of the data is very different from the average.

If the data follows the common pattern called the normal distribution, then it is very useful to know the standard deviation. If the data follows this pattern (we would say the data is normally distributed), about 68 of every 100 pieces of data will be off the average by less than the standard deviation. Not only that, but about 95 of every 100 measurements will be off the average by less than two times the standard deviation, and about 997 in 1000 will be closer to the average than three standard deviations.
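As a small illustration, the Python sketch below computes the standard deviation of the sample scores and counts how many fall within one standard deviation of the mean; the 68/95/99.7 figures strictly apply to normally distributed data, so with only ten scores this is just a demonstration.

```python
import statistics

scores = [23, 26, 49, 49, 57, 64, 66, 78, 82, 92]

mean = statistics.mean(scores)      # 58.6
sd = statistics.pstdev(scores)      # population standard deviation

within_one_sd = [s for s in scores if abs(s - mean) < sd]
print(f"mean={mean}, sd={sd:.1f}, {len(within_one_sd)} of {len(scores)} scores within 1 sd")
```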

Other descriptive statistics

We also can use statistics to find out that some percent, percentile, number, or fraction of people or things in a group do something or fit in a certain category.

For example, social scientists used statistics to find out that 49% of people in the world are males.
