Permutations: K-Subsets
Permutations
A permutation, also called an "arrangement number" or "order," is a rearrangement of the elements of an
ordered list S into a one-to-one correspondence with S itself. The number of permutations on a set of n
elements is given by n! (n factorial; Uspensky 1937, p. 18). For example, there are 2! = 2 · 1 = 2 permutations
of {1, 2}, namely {1, 2} and {2, 1}, and 3! = 3 · 2 · 1 = 6 permutations of {1, 2, 3}, namely {1, 2, 3}, {1, 3, 2},
{2, 1, 3}, {2, 3, 1}, {3, 1, 2}, and {3, 2, 1}. The permutations of a list can be found in the Wolfram
Language using the command Permutations. A list of length n can be tested to see if it is a permutation of 1,
..., n in the Wolfram Language using the command PermutationListQ.
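The Wolfram Language commands above have no direct counterpart here. As an assumed Python analogue (not part of the original text), itertools.permutations enumerates the orderings of a list and math.factorial counts them; the helper is_permutation_list is a hypothetical stand-in for PermutationListQ.

```python
import math
from itertools import permutations

def is_permutation_list(lst):
    """Hypothetical helper: True if lst is an ordering of 1, ..., len(lst)."""
    return sorted(lst) == list(range(1, len(lst) + 1))

# All orderings of {1, 2, 3}; for sorted input they come out in lexicographic order.
perms = list(permutations([1, 2, 3]))
print(perms)  # [(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]
print(len(perms) == math.factorial(3))  # True
print(is_permutation_list([3, 1, 2]), is_permutation_list([1, 1, 3]))  # True False
```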
Sedgewick (1977) summarizes a number of algorithms for generating permutations, and identifies the minimum
change permutation algorithm of Heap (1963) to be generally the fastest (Skiena 1990, p. 10). Another method
of enumerating permutations was given by Johnson (1963; Séroul 2000, pp. 213-218).
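A minimal Python sketch of Heap's algorithm in its common iterative form, written from the standard description of the method rather than from Heap's or Sedgewick's papers. Each permutation after the first differs from the previous one by a single swap of two elements, which is what "minimum change" refers to. The function name heap_permutations is illustrative.

```python
def heap_permutations(items):
    """Yield every permutation of items; consecutive outputs differ by one swap."""
    a = list(items)
    n = len(a)
    c = [0] * n          # per-level loop counters replacing the recursion
    yield tuple(a)
    i = 1
    while i < n:
        if c[i] < i:
            # Even level: swap with the first element; odd level: swap with the c[i]-th.
            if i % 2 == 0:
                a[0], a[i] = a[i], a[0]
            else:
                a[c[i]], a[i] = a[i], a[c[i]]
            yield tuple(a)
            c[i] += 1
            i = 1
        else:
            c[i] = 0
            i += 1

print(list(heap_permutations([1, 2, 3])))
# [(1, 2, 3), (2, 1, 3), (3, 1, 2), (1, 3, 2), (2, 3, 1), (3, 2, 1)]
```

Because each step touches only two positions, the method is cheap per permutation, which is consistent with its reputation as a fast generator.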
The number of ways of obtaining an ordered subset of k elements from a set of n elements is given by
nPk = n!/(n – k)! (Uspensky 1937, p. 18), where n! is a factorial. For example, there are 4!/2! = 12 ordered
2-subsets of {1, 2, 3, 4}, namely {1, 2}, {1, 3}, {1, 4}, {2, 1}, {2, 3}, {2, 4}, {3, 1}, {3, 2}, {3, 4}, {4, 1},
{4, 2}, and {4, 3}. The unordered subsets containing k elements are known as the k-subsets of a given set.
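A short Python sketch, assuming the standard library's itertools.permutations (with a length argument) and math.perm, that lists the ordered 2-subsets of {1, 2, 3, 4} and then discards order to recover the unordered k-subsets:

```python
import math
from itertools import permutations

# Ordered 2-subsets of {1, 2, 3, 4}: (1, 2) and (2, 1) count as different.
ordered = list(permutations([1, 2, 3, 4], 2))
print(len(ordered), math.perm(4, 2), math.factorial(4) // math.factorial(2))  # 12 12 12

# Forgetting the order (frozenset) merges each pair of orderings into one 2-subset.
unordered = {frozenset(pair) for pair in ordered}
print(len(unordered))  # 6
```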
A representation of a permutation as a product of permutation cycles is unique (up to the ordering of the
cycles). An example of a cyclic decomposition is the permutation {4, 2, 1, 3} of {1, 2, 3, 4}. This is
denoted (2)(143), corresponding to the disjoint permutation cycles (2) and (143). There is a great deal of
freedom in picking the representation of a cyclic decomposition since (1) the cycles are disjoint and can
therefore be specified in any order, and (2) any rotation of a given cycle specifies the same cycle (Skiena 1990,
p. 20). Therefore, (431)(2), (314)(2), (143)(2), (2)(431), (2)(314), and (2)(143) all describe the same permutation.
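The decomposition can be computed by following each element until its cycle closes. This Python sketch assumes the convention that the list {4, 2, 1, 3} sends 1 to 4, 2 to 2, 3 to 1, and 4 to 3 (position i holds the image of i), which reproduces the cycles (143) and (2) above; the function name cycle_decomposition is hypothetical.

```python
def cycle_decomposition(perm):
    """Disjoint cycles of perm, where perm[i - 1] is the image of i.

    Each cycle is written starting from its smallest element, and cycles are
    listed in order of that element: one of the many equivalent representations.
    """
    n = len(perm)
    seen = [False] * (n + 1)
    cycles = []
    for start in range(1, n + 1):
        if seen[start]:
            continue
        cycle = []
        i = start
        while not seen[i]:      # follow i -> perm(i) -> ... until we return
            seen[i] = True
            cycle.append(i)
            i = perm[i - 1]
        cycles.append(tuple(cycle))
    return cycles

print(cycle_decomposition([4, 2, 1, 3]))  # [(1, 4, 3), (2,)], i.e. (143)(2)
```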
Another notation that explicitly identifies the positions occupied by elements before and after application of a
permutation on n elements uses a 2 × n matrix, where the first row is (1 2 3 ... n) and the second row is the new
arrangement. For example, the permutation which switches elements 1 and 2 and fixes 3 would be written as the
2 × 3 matrix with first row (1 2 3) and second row (2 1 3).
The number of wrong permutations of n objects is [n!/e], where [x] is the nearest integer function. A
permutation of n ordered objects in which no object is in its natural place is called a derangement (or
sometimes, a complete permutation) and the number of such permutations is given by the subfactorial !n.
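A small Python check of the derangement count, assuming the standard recurrence !n = (n – 1)(!(n – 1) + !(n – 2)) with !0 = 1 and !1 = 0 (the recurrence is not stated in the text above). It compares the recurrence with brute force and with the nearest-integer formula [n!/e] for n ≥ 1.

```python
import math
from itertools import permutations

def subfactorial(n):
    """!n via the recurrence !n = (n - 1) * (!(n - 1) + !(n - 2)), with !0 = 1, !1 = 0."""
    if n == 0:
        return 1
    before, current = 1, 0  # !0 and !1
    for m in range(2, n + 1):
        before, current = current, (m - 1) * (current + before)
    return current

def count_derangements(n):
    """Brute force: permutations of 0..n-1 with no element in its own position."""
    return sum(all(p[i] != i for i in range(n)) for p in permutations(range(n)))

print([subfactorial(n) for n in range(7)])  # [1, 0, 1, 2, 9, 44, 265]
print(all(subfactorial(n) == count_derangements(n) for n in range(7)))  # True
print(all(subfactorial(n) == round(math.factorial(n) / math.e) for n in range(1, 15)))  # True
```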
Using
with gives
The set of all permutations of a set of elements 1, ..., n can be obtained using a recursive procedure.
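The recursive procedure itself did not survive in this text. As an assumption, one common recursive scheme is sketched below in Python: build every permutation of 1, ..., n – 1, then insert n into each possible position of each one. Its output order may differ from the procedure originally shown, and the function name is hypothetical.

```python
def permutations_by_insertion(n):
    """All permutations of 1..n: insert n into every gap of each permutation of 1..n-1."""
    if n == 0:
        return [()]  # the empty arrangement is the single permutation of nothing
    result = []
    for p in permutations_by_insertion(n - 1):
        for pos in range(len(p) + 1):
            result.append(p[:pos] + (n,) + p[pos:])
    return result

print(permutations_by_insertion(3))
# [(3, 2, 1), (2, 3, 1), (2, 1, 3), (3, 1, 2), (1, 3, 2), (1, 2, 3)]
```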
Consider permutations in which no two consecutive integers occupy adjacent positions (i.e., no rising or falling
successions occur). For n = 1, 2, ... elements, the numbers of such permutations are 1, 0, 0, 2, 14, 90, 646, 5242,
47622, ... (OEIS A002464).
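For small n, the sequence can be checked by brute force. This Python sketch (an illustration, practical only for small n since it visits all n! permutations) counts permutations of 1, ..., n in which no two adjacent entries differ by exactly 1:

```python
from itertools import permutations

def has_succession(p):
    """True if some adjacent pair differs by exactly 1 (a rising or falling succession)."""
    return any(abs(a - b) == 1 for a, b in zip(p, p[1:]))

def count_without_successions(n):
    return sum(not has_succession(p) for p in permutations(range(1, n + 1)))

print([count_without_successions(n) for n in range(1, 8)])
# [1, 0, 0, 2, 14, 90, 646]
```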
Let the set of integers 1, 2, ..., n be permuted and the resulting sequence be divided into increasing runs.
Denote the average length of the kth run as n approaches infinity by L_k. The first few values are summarized in
the following table, where e is the base of the natural logarithm (Le Lionnais 1983, pp. 41-42; Knuth 1998).
k   L_k                    OEIS      approximate
1   e – 1                  A091131   1.7182818...
2   e^2 – 2e               A091132   1.9524...
3   e^3 – 3e^2 + (3/2)e    A091133   1.9957...
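The limiting values can be checked numerically. This Python sketch, an illustration only, evaluates the closed forms in the table and compares them with a Monte Carlo estimate from random permutations. The permutation length n = 40, the number of trials, and the seed are arbitrary assumptions, so the estimates agree only to about two decimal places.

```python
import math
import random

def increasing_runs(seq):
    """Split seq into maximal increasing runs, e.g. [3, 5, 1, 4, 2] -> [[3, 5], [1, 4], [2]]."""
    runs = [[seq[0]]]
    for prev, cur in zip(seq, seq[1:]):
        if cur > prev:
            runs[-1].append(cur)   # still rising: extend the current run
        else:
            runs.append([cur])     # a descent starts a new run
    return runs

def mean_run_lengths(n=40, trials=20000, k_max=3, seed=2024):
    """Monte Carlo estimate of the mean length of the first k_max runs."""
    rng = random.Random(seed)
    totals = [0] * k_max
    for _ in range(trials):
        perm = list(range(1, n + 1))
        rng.shuffle(perm)
        runs = increasing_runs(perm)
        for k in range(k_max):
            totals[k] += len(runs[k])
    return [t / trials for t in totals]

e = math.e
closed_form = [e - 1, e**2 - 2 * e, e**3 - 3 * e**2 + 1.5 * e]
print([round(x, 4) for x in closed_form])  # [1.7183, 1.9525, 1.9958]
print([round(x, 2) for x in mean_run_lengths()])
```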
For example: how many different integers can be written using 5 distinct nonzero digits in succession, with each
digit used exactly once in each number? The answer is 5! = 1 · 2 · 3 · 4 · 5 = 120.
In short, a permutation rearranges the elements of a given ordered list in a one-to-one correspondence.
Formula
The number of permutations of a list of n elements is n!, read "n factorial." n! is the product of the natural
numbers from 1 to n, that is,
n! = n (n – 1) (n – 2) ⋯ 3 · 2 · 1
For example, the set {3, 4, 7} has 3! = 6 arrangements, which are {3, 4, 7}, {3, 7, 4}, {4, 3, 7}, {4, 7, 3}, {7, 4, 3}, and {7, 3, 4}.
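A minimal Python sketch of the product formula, computing n! as a running product (with the usual convention 0! = 1) and checking it against the arrangements of {3, 4, 7} listed above; the function name factorial is illustrative (the standard library also offers math.factorial).

```python
from itertools import permutations

def factorial(n):
    """n! = n (n - 1) ... 3 . 2 . 1, computed as a running product; 0! = 1."""
    result = 1
    for m in range(2, n + 1):
        result *= m
    return result

arrangements = list(permutations([3, 4, 7]))
print(factorial(3), len(arrangements))  # 6 6
print(factorial(5))                     # 120, the five-digit example
```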
To find the number of arrangements of k elements chosen from a given set of n elements, we use the
following formula:
n (n – 1) (n – 2) ⋯ (n – k + 1) = n!/(n – k)!
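The product above stops after k factors. A Python sketch of that falling factorial, cross-checked against n!/(n – k)! and the standard library's math.perm (the function name falling_factorial is illustrative):

```python
import math

def falling_factorial(n, k):
    """n (n - 1) (n - 2) ... (n - k + 1): arrangements of k items chosen from n."""
    result = 1
    for m in range(n, n - k, -1):   # exactly k factors, from n down to n - k + 1
        result *= m
    return result

print(falling_factorial(4, 2))  # 12
print(falling_factorial(5, 3) == math.factorial(5) // math.factorial(2) == math.perm(5, 3))  # True
```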
The number of combinations is the number of ways of picking k unordered outcomes from n possibilities. It is also
known as the binomial coefficient or choice number and read "n choose k":
C(n, k) = n!/(k! (n – k)!),
where n! is a factorial (Uspensky 1937, p. 18). For example, there are 4!/(2! 2!) = 6 combinations of two elements
out of the set {1, 2, 3, 4}, namely {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, and {3, 4}. These combinations are known
as k-subsets.
The number of combinations can be computed in the Wolfram Language using Binomial[n, k], and the
combinations themselves can be enumerated in the Wolfram Language using Subsets[Range[n], {k}].
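As an assumed Python analogue (not from the original), math.comb plays the role of Binomial and itertools.combinations that of Subsets; the explicit factorial formula is included for comparison, under the illustrative name binomial.

```python
import math
from itertools import combinations

def binomial(n, k):
    """n choose k = n! / (k! (n - k)!)."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

subsets = list(combinations(range(1, 5), 2))
print(subsets)  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(binomial(4, 2), math.comb(4, 2), len(subsets))  # 6 6 6
```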
Statistics is defined as:
1. the science that deals with the collection, classification, analysis, and interpretation of numerical facts or
data, and that, by use of mathematical theories of probability, imposes order and regularity on aggregates of more
or less disparate elements;
2. (used with a plural verb) the numerical facts or data themselves.
Statistics deals with the analysis of data; statistical methods are developed to analyze large
volumes of data and their properties. Statistical methods are used by various organizations
and governments to calculate collective properties of employees or people; such
properties then influence the decisions taken by the organizations and governments. For
example, a government may want to know how many children below the age of
12 are malnourished in the country; in the same way, an organization may want to know
how many of its employees have stress disorders. Depending upon the
number of malnourished children, the government may come up with a new policy to
deal with the malnutrition problem.
If a country has 1000 citizens, it may keep a full database of all its inhabitants, and
the statistical calculations may be easy; but when the population runs into billions, gathering
complete data becomes impractical or impossible. In such cases, a
random sample of the population is selected to represent the whole population, and
the statistical calculations are carried out on the sample.
In this article we will explore some of the basic concepts of statistical analysis that form the
basis of more complex statistical problems. The basic concepts of mean, median, mode, variance
and standard deviation are the stepping stones to almost all statistical calculations. So let’s
explore them one by one.
MEAN OR AVERAGE
Mean or average is the sum of all the elements of a set divided by the number of elements
in the set. The mean can be treated as a collective property of the whole set of values: you can get
a fairly good idea about the whole set of data by calculating its mean. For a set of n values
x1, x2, ..., xn, the formula for the mean is
mean = ( Σ xi ) / n = ( x1 + x2 + … + xn ) / n
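A minimal Python sketch of the formula; the function name mean and the sample values are illustrative only.

```python
def mean(values):
    """Sum of the values divided by the number of values."""
    if not values:
        raise ValueError("mean of an empty dataset is undefined")
    return sum(values) / len(values)

print(mean([2, 4, 6, 8]))  # 5.0
```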
The importance of mean lies in its ability to summarize the whole dataset with a single value. For
example, you may want to compare the average household income of County 1 to County 2. To
compare the household incomes between the two counties you cannot compare each and every
household income of one county to the other. The best solution would be to find the average
household incomes of the two counties and then compare them with each other. By comparing the
two means, we can form a reasonable judgment as to which county is more prosperous than the other.
MEDIAN
Simply put, the median is the middle value of a set once its values are sorted. If a set contains an odd number
of elements, the middle value is the median of the set; if it contains an even number of elements,
the median is the average of the two middle values. The median separates a set of data into two
halves.
To find the median of a set, write the elements of the set in increasing order, count the number of
elements, and then take the middle value (or the average of the two middle values). The median is especially
useful when the dataset contains outliers. An outlier is a value that lies far away from the rest of
the values in the set. For example, if a set consists of the values 1, 2, 3, 4,
10000, then the value 10000 is an outlier. Outliers can badly distort the mean. For
example, the mean of the above set is 10010/5 = 2002, while the median is 3. Thus, in this case the median
summarizes the set much better than the mean.
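A Python sketch of the median procedure described above, applied to the outlier example; statistics.median from the standard library is used only as a cross-check, and the function name median is illustrative.

```python
import statistics

def median(values):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    s = sorted(values)
    n = len(s)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

data = [1, 2, 3, 4, 10000]
print(sum(data) / len(data), median(data))       # 2002.0 3
print(median([1, 2, 3, 4]))                      # 2.5
print(median(data) == statistics.median(data))  # True
```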
MODE
The mode in a dataset is the value that is most frequent in a dataset. Like mean and median, mode
is also used to summarize a set with a single piece of information. For example, the mode of the
dataset S = {1, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6, 7} is 3, since it occurs more often (five times) than any other value in S.
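In Python, collections.Counter can tally how often each value occurs; the sketch below (an illustration, not part of the original) finds the mode of S. statistics.multimode is also shown, because a dataset can have several equally frequent values.

```python
import statistics
from collections import Counter

S = [1, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6, 7]
counts = Counter(S)                     # value -> number of occurrences
value, freq = counts.most_common(1)[0]  # the most frequent value and its count
print(value, freq)                      # 3 5
print(statistics.multimode(S))          # [3]; would list every value tied for most frequent
```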
An important property of the mode is that it is equal to the mean and the median in the case of a
normal distribution. In other, skewed distributions the mode may differ from the
other two. In a normal distribution the data are symmetric about a central value: the normal distribution curve is
symmetric about the vertical line through the mean. Another important property of normal distributions is that half of
the values are larger than the mean and half are smaller.
VARIANCE
You may want to measure the deviation of a set of data from the mean value. For example, a huge
variance of the household income data of a country may be interpreted as an economy with high
inequality. Many useful interpretations can be carried out by analyzing the variance in data. The
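variance is obtained by:
1. Finding the difference between each value in the set and the mean.
2. Squaring those differences.
3. Adding the squared differences.
4. Dividing the sum by the number of values (the sample variance, described below, divides by n – 1 instead).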
Because it is built from squared differences, the variance of a dataset is never negative; it is zero only when
all the values are equal. The most important use of variance is in the calculation of the standard deviation,
which is one of the most important concepts of statistics.
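A minimal Python sketch of the procedure above, finishing by dividing the sum of squared differences by the number of values (this gives the population variance; the sample version divides by n – 1). The data values are arbitrary.

```python
def population_variance(values):
    """Mean of the squared differences from the mean (divide by N)."""
    n = len(values)
    if n == 0:
        raise ValueError("variance of an empty dataset is undefined")
    mean = sum(values) / n
    differences = [x - mean for x in values]  # difference of each value from the mean
    squared = [d * d for d in differences]    # square each difference
    total = sum(squared)                      # add the squared differences
    return total / n                          # divide by the number of values

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))  # 4.0
```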
STANDARD DEVIATION
The standard deviation is the square root of the variance of the data. Because the variance is obtained
by squaring the deviations, it is expressed in squared units (for example, dollars squared), which makes it
hard to interpret directly. The standard deviation is expressed in the same unit as the elements of the set,
so it describes the dispersion of values in a dataset in directly interpretable terms. This makes the standard
deviation the usual quantity for comparing the spread of datasets. Standard deviation is also closely related
to probability.
A standard use of the standard deviation is to measure how much the values of a dataset typically differ from the mean.
Let’s understand standard deviation with the help of an example:
Suppose a particular country claims that the average salary of its people is 5000 dollars per month,
and hence that the country is very prosperous. A statistician faced with this claim may ask the
claimant for the standard deviation of the salary distribution. If the standard deviation is very
large, the dispersion in salaries is very large, and the prosperity claim should be viewed with
suspicion, since a few very high salaries could be pulling the mean up. If the standard deviation
is small, the claim may well be credible, because individual salaries differ little from the mean
salary.
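To make the argument concrete, the Python sketch below uses two hypothetical salary lists (invented for illustration, not real data) that share the same mean of 5000 dollars but have very different spreads; statistics.pstdev computes the population standard deviation.

```python
import statistics

# Hypothetical monthly salaries in dollars; both lists average exactly 5000.
country_a = [4800, 4900, 5000, 5100, 5200]
country_b = [500, 1000, 1500, 2000, 20000]

for name, salaries in [("A", country_a), ("B", country_b)]:
    print(name, statistics.mean(salaries), round(statistics.pstdev(salaries), 1))
```

Country A's standard deviation is about 141 dollars, while country B's is about 7500 dollars, even though the mean is identical: the mean alone hides the fact that most people in B earn far less than 5000.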
A rule of thumb for the standard deviation (the 68–95–99.7 rule) is that, for data that are approximately
normally distributed, about 68% of the values lie within one standard deviation of the mean, about 95% within
two standard deviations and about 99.7% within three standard deviations of the mean. Thus, if somebody says
that 95% of a state’s population is aged between 4 and 84 and asks you to find the mean, you can take the mean
age to be the midpoint of that range, (4 + 84)/2 = 44, and the standard deviation to be (84 – 4)/4 = 20,
assuming the ages are roughly normally distributed.
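A Python sketch of both ideas, assuming a normal distribution: the fraction of values within k standard deviations is erf(k/√2), computed with math.erf, and the age example treats 4 and 84 as the mean minus and plus two standard deviations.

```python
import math

# Fraction of a normal distribution lying within k standard deviations of the mean.
for k in (1, 2, 3):
    print(k, round(math.erf(k / math.sqrt(2)), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973

# Age example: 4..84 taken as mean +/- 2 standard deviations (normality assumed).
low, high = 4, 84
mean_age = (low + high) / 2  # 44.0
sd_age = (high - low) / 4    # 20.0, since the range spans 4 standard deviations
```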
POPULATION MEAN
The term population mean, which is the average score of the population on a given variable, is
represented by:
μ = ( Σ Xi ) / N
The symbol ‘μ’ represents the population mean. The symbol ‘Σ Xi’ represents the sum of all scores
present in the population (in this case, X1, X2, X3, and so on). The symbol ‘N’ represents the total
number of individuals or cases in the population.
The population standard deviation is a measure of the spread (variability) of the scores on a given
variable and is represented by:
σ = sqrt[ Σ ( Xi – μ )² / N ]
The symbol ‘σ’ represents the population standard deviation. The term ‘sqrt’ used in this statistical
formula denotes square root. The term ‘Σ ( Xi – μ )²’ used in the statistical formula represents the sum
of the squared deviations of the scores from their population mean.
POPULATION VARIANCE
The population variance is the square of the population standard deviation and is represented
by:
σ² = Σ ( Xi – μ )² / N
The symbol ‘σ²’ represents the population variance.
SAMPLE MEAN
The sample mean is the average score of a sample on a given variable and is represented by:
x_bar = ( Σ xi ) / n
The term “x_bar” represents the sample mean. The symbol ‘Σ xi’ used in this formula represents the
sum of all scores present in the sample (in this case, x1, x2, x3, and so on). The symbol
‘n’ represents the total number of individuals or observations in the sample.
The statistic called the sample standard deviation is a measure of the spread (variability) of the
scores in the sample on a given variable and is represented by:
s = sqrt [ Σ ( xi – x_bar )² / ( n – 1 ) ]
The term ‘Σ ( xi – x_bar )²’ represents the sum of the squared deviations of the scores from the sample
mean.
SAMPLE VARIANCE
The sample variance is the square of the sample standard deviation and is represented by:
s² = Σ ( xi – x_bar )² / ( n – 1 )
The symbol ‘s²’ represents the sample variance.
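A Python sketch of the sample formula, dividing by n – 1, alongside the standard library's statistics.variance (sample) and statistics.pvariance (population) for comparison; the data values are arbitrary and the function name is illustrative.

```python
import math
import statistics

def sample_variance(xs):
    """s2 = sum((x_i - x_bar)^2) / (n - 1); needs at least two observations."""
    n = len(xs)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

data = [3, 5, 7, 9]             # x_bar = 6; squared deviations sum to 20
s2 = sample_variance(data)      # 20 / 3
s = math.sqrt(s2)               # sample standard deviation
print(s2, statistics.variance(data))  # both 20/3, about 6.667
print(statistics.pvariance(data))     # 5, the population formula dividing by N = 4
```

Dividing by n – 1 rather than n makes the sample variance slightly larger, which compensates for measuring deviations from the sample mean instead of the unknown population mean.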
POOLED SAMPLE STANDARD DEVIATION
The pooled sample standard deviation is a weighted estimate of spread (variability) across multiple
samples. It is represented by: