Lorenz Curve and QQ Plot
• The Gini coefficient G summarises the Lorenz curve in a single number. The
smaller the coefficient, the lower the income inequality; the higher the
coefficient, the more unequally income is distributed.
• If income is distributed completely equally, G = 0 and the Lorenz curve
coincides with the diagonal line. The distribution of income becomes more
unequal as G increases and gets closer to 1.
• As 100% of the population must receive 100% of total income, the top right
hand end of the Lorenz curve must touch the top right hand end of the
diagonal.
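The properties above can be checked numerically. The following is a minimal sketch, not a definitive implementation: the function names `lorenz_curve` and `gini` and the sample incomes are illustrative assumptions, and the Gini coefficient is computed as one minus twice the area under the Lorenz curve using the trapezoid rule.

```python
def lorenz_curve(incomes):
    """Return the Lorenz curve as (population share, income share) points."""
    xs = sorted(incomes)              # poorest first
    n = len(xs)
    total = sum(xs)
    points = [(0.0, 0.0)]             # 0% of people hold 0% of income
    running = 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, running / total))
    return points


def gini(incomes):
    """Gini coefficient: 1 - 2 * (area under the Lorenz curve)."""
    pts = lorenz_curve(incomes)
    area = 0.0
    for (p0, l0), (p1, l1) in zip(pts, pts[1:]):
        area += (p1 - p0) * (l0 + l1) / 2   # trapezoid between two points
    return 1 - 2 * area


equal = [10, 10, 10, 10]      # hypothetical data: everyone earns the same
unequal = [0, 0, 0, 40]       # hypothetical data: one person earns everything

print(gini(equal))                # 0 for a completely equal distribution
print(gini(unequal))              # (n - 1) / n for n people; approaches 1 as n grows
print(lorenz_curve(unequal)[-1])  # the curve always ends at (1.0, 1.0)
```

With a finite sample of n incomes, the largest value this formula can return is (n - 1)/n, which is why G only gets close to 1 for large populations.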
Examples:
Lorenz Curve is scale invariant but not translation invariant
• When all incomes are scaled by the same percentage factor, the Lorenz
Curve does not change.
• This is because scaling all incomes by the same percentage also scales
total income by that percentage, so every cumulative income share stays
the same.
• When every income is increased by the same absolute amount of money, the
new Lorenz Curve is closer to the equi-distribution line (the diagonal) than
the original one.
• The opposite holds if every income is decreased by the same amount.
Subtracting equal amounts therefore generates more inequality in
Lorenz Curve terms.
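A small numerical check makes the two properties concrete. This is a sketch under stated assumptions: the helper `lorenz_shares` and the four sample incomes are hypothetical, chosen only for illustration.

```python
def lorenz_shares(incomes):
    """Cumulative income shares, poorest first (the Lorenz curve ordinates)."""
    xs = sorted(incomes)
    total = sum(xs)
    shares, running = [], 0.0
    for x in xs:
        running += x
        shares.append(running / total)
    return shares


base = [10, 20, 30, 40]                   # hypothetical incomes
scaled = [x * 1.5 for x in base]          # every income raised by 50%
shifted_up = [x + 10 for x in base]       # every income raised by 10 units
shifted_down = [x - 5 for x in base]      # every income lowered by 5 units

for name, data in [("base", base), ("scaled", scaled),
                   ("shifted up", shifted_up), ("shifted down", shifted_down)]:
    print(name, [round(s, 4) for s in lorenz_shares(data)])
```

The scaled shares match the base shares, the shares after adding a constant are larger at every interior point (closer to the diagonal), and the shares after subtracting a constant are smaller (further from the diagonal).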
Lorenz Curve is scale invariant…Example:
Lorenz dominance and intersection
• Lorenz dominance of one income distribution over another occurs when, for every cumulative
proportion of population p, its Lorenz Curve lies nowhere below, and somewhere strictly above,
the Lorenz Curve(s) of the other distribution(s).
• Given the Lorenz Curve and its properties, the dominating Lorenz Curve implies an income
distribution with less inequality.
• However, given two income distributions, there is no guarantee that one Lorenz-dominates the other.
• It may be the case that Lorenz Curves intersect. In this case, by considering only Lorenz Curves,
nothing can be said about which income distribution has less inequality.
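The dominance test can be sketched in a few lines. The code below is illustrative only: the names `lorenz_shares` and `lorenz_compare` and the sample data are assumptions, and it assumes both distributions have the same number of incomes so that the curves can be compared at the same population shares p.

```python
def lorenz_shares(incomes):
    """Cumulative income shares, poorest first."""
    xs = sorted(incomes)
    total = sum(xs)
    shares, running = [], 0.0
    for x in xs:
        running += x
        shares.append(running / total)
    return shares


def lorenz_compare(a, b, tol=1e-12):
    """Classify two equal-sized distributions by their Lorenz curves."""
    la, lb = lorenz_shares(a), lorenz_shares(b)
    if len(la) != len(lb):
        raise ValueError("this sketch assumes equal sample sizes")
    a_above = any(x > y + tol for x, y in zip(la, lb))
    b_above = any(y > x + tol for x, y in zip(la, lb))
    if a_above and b_above:
        return "curves cross"     # no ranking from Lorenz curves alone
    if a_above:
        return "a dominates"      # a has less inequality
    if b_above:
        return "b dominates"      # b has less inequality
    return "identical"


print(lorenz_compare([20, 25, 25, 30], [10, 20, 30, 40]))  # a dominates
print(lorenz_compare([5, 30, 30, 35], [10, 20, 30, 40]))   # curves cross
```

Because both curves are piecewise linear with corners at the same population shares, comparing them at those corners is enough to decide dominance in this equal-size setting.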
QQ Plot
What is a QQ plot?
• The quantile-quantile (q-q) plot is a graphical technique for determining if two data sets come
from populations with a common distribution.
• A q-q plot is a plot of the quantiles of the first data set (x) against the quantiles of the second
data set (y).
• A 45-degree reference line is also plotted. If the two sets come from a population with the same
distribution, the points should fall approximately along this reference line.
• The greater the departure from this reference line, the greater the evidence for the conclusion
that the two data sets have come from populations with different distributions.
Features of QQ Plot
• The points plotted in a Q–Q plot are always non-decreasing
when viewed from left to right.
• If the two distributions being compared are identical, the
Q–Q plot follows the 45° line y = x.
• If the two distributions agree after linearly transforming
the values in one of the distributions, then the Q–Q plot
follows some line, but not necessarily the line y = x.
• If the general trend of the Q–Q plot is flatter than the line y
= x, the distribution plotted on the horizontal axis is more
dispersed than the distribution plotted on the vertical axis.
• Conversely, if the general trend of the Q–Q plot is steeper
than the line y = x, the distribution plotted on the vertical
axis is more dispersed than the distribution plotted on the
horizontal axis.
• Q–Q plots are often arced, or "S" shaped, indicating that
one of the distributions is more skewed than the other, or
that one of the distributions has heavier tails than the
other.
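The linear-transformation and slope features can be illustrated with a small sketch. The function name `qq_line` and the sample values are hypothetical, and equal sample sizes are assumed so that the Q-Q points are simply the paired sorted values.

```python
def qq_line(x, y):
    """Least-squares slope and intercept through the Q-Q points of x and y."""
    xs, ys = sorted(x), sorted(y)
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((a - mean_x) ** 2 for a in xs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


x = [2.0, 5.0, 1.0, 9.0, 4.0]         # hypothetical sample on the horizontal axis
y = [0.5 * v + 3 for v in x]          # same shape, shifted and half as spread out

slope, intercept = qq_line(x, y)
print(slope, intercept)               # close to 0.5 and 3
```

Here the Q-Q points fall on a straight line that is flatter than y = x (slope below 1), which matches the rule that the distribution on the horizontal axis is the more dispersed one.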
Construction of QQ Plot
• If the data sets have the same size, the q-q plot is essentially a plot of
sorted data set 1 against sorted data set 2.
• If the data sets are not of equal size, the quantiles are usually picked
to correspond to the sorted values from the smaller data set and then
the quantiles for the larger data set are interpolated.
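The construction described above can be sketched as follows. This is one possible reading, not a definitive method: the names `quantile` and `qq_points` are hypothetical, and the choice of probabilities i/(n - 1) for the sorted values of the smaller sample is an assumption (several plotting-position conventions exist).

```python
def quantile(sorted_xs, p):
    """Linearly interpolated quantile of sorted data at probability p in [0, 1]."""
    pos = p * (len(sorted_xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    frac = pos - lo
    return sorted_xs[lo] + frac * (sorted_xs[hi] - sorted_xs[lo])


def qq_points(x, y):
    """Return (x quantile, y quantile) pairs for a two-sample Q-Q plot."""
    xs, ys = sorted(x), sorted(y)
    x_is_small = len(xs) <= len(ys)
    small, large = (xs, ys) if x_is_small else (ys, xs)
    n = len(small)
    # Assumed convention: the i-th sorted value of the smaller sample
    # corresponds to probability i / (n - 1).
    probs = [i / (n - 1) for i in range(n)] if n > 1 else [0.5]
    interpolated = [quantile(large, p) for p in probs]
    if x_is_small:
        return list(zip(small, interpolated))
    return list(zip(interpolated, small))


# Equal sizes: sorted data set 1 against sorted data set 2.
print(qq_points([3, 1, 2], [30, 10, 20]))
# Unequal sizes: quantiles of the larger set are interpolated.
print(qq_points([1, 2, 3], [10, 20, 30, 40, 50]))
```

Plotting these pairs together with the 45-degree line y = x gives the Q-Q plot.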
Probability Plot
• The term "probability plot" sometimes refers specifically to a Q–Q
plot.
• For a probability plot, the quantiles for one of the data samples are
replaced with the quantiles of a theoretical distribution.
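As an illustration, a normal probability plot replaces one sample with quantiles of the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name `normal_probability_points`, the sample values, and the plotting positions (i - 0.5)/n are assumptions made for this example, and other conventions are in use.

```python
from statistics import NormalDist


def normal_probability_points(sample):
    """Pair standard normal quantiles with the sorted sample values."""
    xs = sorted(sample)
    n = len(xs)
    std_normal = NormalDist(0, 1)
    # Assumed plotting positions: (i - 0.5) / n for i = 1..n.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, xs))


sample = [4.1, 5.3, 3.8, 6.0, 4.9]    # hypothetical measurements
for theo_q, obs in normal_probability_points(sample):
    print(round(theo_q, 3), obs)
```

If the sample is roughly normal, these points fall close to a straight line whose intercept and slope reflect the sample's location and scale.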
Uses of Q-Q plot
• A Q-Q plot is commonly used to check the validity of a distributional
assumption for a data set.
• It is used to compare the distribution of a sample to a theoretical
distribution, such as the standard normal distribution N(0, 1), as in a
normal probability plot.
• It is also used to compare two theoretical distributions.
• It is also used to compare two data sets. One can use it to answer the
following questions:
• Do two data sets come from populations with a common distribution?
• Do two data sets have common location and scale?
• Do two data sets have similar distributional shapes?
• Do two data sets have similar tail behavior?