Measurement and Scaling
In interval scales, numbers form a continuum and provide information about the amount of
difference, but the scale lacks a true zero. The differences between adjacent numbers are equal
or known. If zero is used, it simply serves as a reference point on the scale but does not
indicate the complete absence of the characteristic being measured.
The Fahrenheit and Celsius temperature scales are examples of interval measurement. In
those scales, 0 °F and 0 °C do not indicate an absence of temperature.
For example, a well-known watch brand carried out semantic differential
scaling to understand customers' attitudes towards its product. The pictorial
representation of this technique is as follows:
Semantic Differential Scale
From the diagram, we can infer that customers find the product to be of
superior quality; however, the brand needs to focus more on the styling of its
watches.
Scale Reliability and Validity
There are some difficulties with measuring constructs in social science
research. For instance, how do we know whether we are measuring
"compassion" and not "empathy", since both constructs are somewhat
similar in meaning? Or is compassion the same thing as empathy? What makes
it more complex is that sometimes these constructs are imaginary concepts (i.e.,
they don't exist in reality) and multi-dimensional (in which case we have the
added problem of identifying their constituent dimensions). Hence, it is not
adequate just to measure social science constructs using any scale that we
prefer. We also need to ensure that:
(1) these scales indeed measure the unobservable construct that we wanted to
measure (i.e., the scales are “valid”), and
(2) they measure the intended construct consistently and precisely (i.e., the
scales are “reliable”).
Note that reliability implies consistency but not accuracy. In the previous example of the weight scale, if the
weight scale is calibrated incorrectly (say, to shave off ten pounds from your true weight, just to make you feel
better!), it will not measure your true weight and is therefore not a valid measure. Nevertheless, the
miscalibrated weight scale will still give you the same weight every time (which is ten pounds less than your
true weight), and hence the scale is reliable.
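The weight-scale analogy above can be sketched in a few lines of Python (an illustrative toy, not from the text): the miscalibrated scale returns the same reading every time, so it is reliable, but it never returns the true weight, so it is not valid.

```python
# Toy illustration (hypothetical) of reliability vs. validity:
# a scale miscalibrated to shave off ten pounds.

def miscalibrated_scale(true_weight):
    """Consistently reads ten pounds light: reliable, but not valid."""
    return true_weight - 10

# Five repeated measurements of a person whose true weight is 160 pounds
readings = [miscalibrated_scale(160) for _ in range(5)]

print(readings)                 # [150, 150, 150, 150, 150]
print(len(set(readings)) == 1)  # True  -> consistent, hence reliable
print(readings[0] == 160)       # False -> inaccurate, hence not valid
```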
What are the sources of unreliable observations in social science measurements? One of the primary sources is
the observer’s (or researcher’s) subjectivity. If employee morale in a firm is measured by watching whether the
employees smile at each other, whether they make jokes, and so forth, then different observers may infer
different measures of morale if they are watching the employees on a very busy day (when they have no time to
joke or chat) or a light day (when they are more jovial or chatty). Two observers may also infer different levels
of morale on the same day, depending on what they view as a joke and what is not. “Observation” is a qualitative
measurement technique. Sometimes, reliability may be improved by using quantitative measures, for instance,
by counting the number of grievances filed over one month as a measure of (the inverse of) morale. Of course,
grievances may or may not be a valid measure of morale, but it is less subject to human subjectivity, and
therefore more reliable. A second source of unreliable observation is asking imprecise or ambiguous questions.
For instance, if you ask people what their salary is, different respondents may interpret this question differently
as monthly salary, annual salary, or per hour wage, and hence, the resulting observations will likely be highly
divergent and unreliable. A third source of unreliability is asking questions about issues that respondents are not
very familiar about or care about, such as asking an American college graduate whether he/she is satisfied with
Canada’s relationship with Slovenia, or asking a Chief Executive Officer to rate the effectiveness of his
company’s technology strategy – something that he has likely delegated to a technology executive.
So how can you create reliable measures? If your measurement involves
soliciting information from others, as is the case with much of social science
research, then you can start by replacing data collection techniques that
depend more on researcher subjectivity (such as observations) with those that
are less dependent on subjectivity (such as questionnaires), by asking only
those questions that respondents are likely to know the answers to or issues
that they care about, by avoiding ambiguous items in your measures (e.g., by
clearly stating whether you are looking for annual salary), and by simplifying
the wording in your indicators so that they are not misinterpreted by some
respondents (e.g., by avoiding difficult words whose meanings they may not know). These
strategies can improve the reliability of our measures, even though they will not
necessarily make the measurements completely reliable. Measurement
instruments must still be tested for reliability.
There are many ways of estimating reliability; some of these tests are mentioned here:
➢ Inter-rater reliability
➢ Test-retest reliability
➢ Split-half reliability
➢ Internal consistency reliability
➢ Cronbach’s alpha
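As a rough sketch of one of these estimates, Cronbach's alpha can be computed from the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The four-item, five-respondent ratings below are made-up illustration data, not taken from the text.

```python
# Hedged sketch of Cronbach's alpha for internal consistency reliability.
# Assumes k items rated by the same respondents; the data are invented.

def variance(xs):
    """Sample variance (n - 1 in the denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(items):
    """items: list of k lists, one per item, scores across respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent totals
    item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

items = [
    [4, 5, 3, 4, 2],  # item 1, rated by 5 respondents
    [4, 4, 3, 5, 2],  # item 2
    [3, 5, 4, 4, 1],  # item 3
    [5, 4, 3, 4, 2],  # item 4
]
print(round(cronbach_alpha(items), 3))  # 0.914
```

By convention, an alpha above roughly 0.7 is often read as acceptable reliability, though that cut-off is a rule of thumb rather than a strict test.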
Validity
• Validity, often called construct validity, refers to the extent to which a measure adequately represents
the underlying construct that it is supposed to measure. For instance, is a measure of compassion
really measuring compassion, and not measuring a different construct such as empathy? Validity can
be assessed using theoretical or empirical approaches, and should ideally be measured using both
approaches. Theoretical assessment of validity focuses on how well the idea of a theoretical construct
is translated into or represented in an operational measure. This type of validity is called translational
validity (or representational validity), and consists of two subtypes: face and content validity.
Translational validity is typically assessed using a panel of expert judges, who rate each item
(indicator) on how well it fits the conceptual definition of that construct, and a qualitative technique
called Q-sort.
• Empirical assessment of validity examines how well a given measure relates to one or more external
criteria, based on empirical observations. This type of validity is called criterion-related validity,
which includes four sub-types: convergent, discriminant, concurrent, and predictive validity. While
translational validity examines whether a measure is a good reflection of its underlying construct,
criterion-related validity examines whether a given measure behaves the way it should, given the
theory of that construct. This assessment is based on quantitative analysis of observed data using
statistical techniques such as correlational analysis, factor analysis, and so forth. The distinction
between theoretical and empirical assessment of validity is illustrated in Figure 7.2. However, both
approaches are needed to adequately ensure the validity of measures in social science research.
• Note that the different types of validity discussed here refer to the validity of the measurement
procedures , which is distinct from the validity of hypotheses testing procedures , such as internal
validity (causality), external validity (generalizability), or statistical conclusion validity.
Classification of Data
Meaning of Classification of Data
• It is the process of arranging data into homogeneous (similar) groups according to their common characteristics.
• Raw data cannot be easily understood, and it is not fit for further analysis and interpretation. Arrangement of data
helps users in comparison and analysis.
• For example, the population of a town can be grouped according to sex, age, marital status, etc.
Classification of data
The method of arranging data into homogeneous classes according to the common features present in the data is
known as classification.
A planned data analysis system makes the fundamental data easy to find and recover. This can be of particular interest
for legal discovery, risk management, and compliance. Written methods and sets of guidelines for data classification
should determine what levels and measures the company will use to organise data and define the roles of employees
within the business regarding input stewardship.
Once a data-classification scheme has been designed, the security standards that stipulate proper handling
practices for each category, and the storage criteria that determine the data's lifecycle demands, should be discussed.
Data may be classified in the following four ways:
• Geographical classification
• Chronological classification
• Qualitative classification
• Quantitative classification
When data are classified on the basis of location, region, or area, it is called geographical classification. For example, if we present the data
regarding production of sugarcane or wheat or rice, in view of the four main regions in India, this would be known as
geographical classification as given below. Geographical classification is usually listed in alphabetical order for easy reference.
Items may also be listed by size to emphasise the magnitude of the areas under consideration, such as ranking the states based on
population. Normally, in reference tables, the first approach (i.e. listing in alphabetical order) is followed.
Example: Classification of production of food grains in different states in India.
In qualitative classification, data are classified on the basis of some attribute or quality such as sex, colour of hair, literacy and
religion. You should note that in this type of classification the attribute under study cannot be measured quantitatively. One can
only count it according to its presence or absence among the individuals of the population under study.
The number of farmers based on their land holdings can be given as follows:

Type of farmers      Number of farmers
Marginal             907
Medium               1041
Large                1948
Total                3896
Qualitative classification can be of two types as follows
o Simple classification
o Manifold classification
Quantitative classification is based on characteristics that can be measured. The table below presents data on land holdings by
farmers in a block; land holding is the variable in this example.
Land holding (hectare)    Number of farmers
<1                        442
1-2                       908
2-5                       471
>5                        124
Total                     1945
Variable
Variable refers to the characteristic that varies in magnitude or quantity. E.g. weight of the students. A variable may be discrete or
continuous.
Discrete variable
A discrete variable can take only certain specific values that are whole numbers (integers). E.g. Number of children in a family or
Number of class rooms in a school.
Continuous variable
A continuous variable can take any numerical value within a specific interval.
Example: the weight of a student in a particular class can take any value between 60 and 80 kg.
Frequency
Frequency refers to the number of times a particular value occurs. For example, if there are 50 students having a weight of 60 kg, then 50 is the frequency of the value 60 kg.
Frequency distribution
Frequency distribution refers to data classified on the basis of some variable that can be measured such as prices, weight, height,
wages etc.
The following are the two examples of discrete and continuous frequency distribution
The following technical terms are important when a continuous frequency distribution is formed
Class limits: Class limits are the lowest and highest values that can be included in a class. For example take the class 40-50. The
lowest value of the class is 40 and the highest value is 50. In this class there can be no value lesser than 40 or more than 50. 40 is
the lower class limit and 50 is the upper class limit.
Class interval: The difference between the upper and lower limit of a class is known as class interval of that class. Example in
the class 40-50 the class interval is 10 (i.e. 50 minus 40).
Class frequency: The number of observations corresponding to a particular class is known as the frequency of that class
Example:
Income (Rs) No. of persons
1000 - 2000 50
In the above example, 50 is the class frequency. This means that 50 persons earn an income between Rs. 1,000 and Rs. 2,000.
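The tallying behind such a frequency distribution can be sketched in Python. The income figures and class limits below are hypothetical, and the exclusive method is assumed (the upper limit of each class belongs to the next class):

```python
# Hypothetical raw incomes (Rs) to be classified into a frequency distribution
incomes = [1200, 1850, 2400, 1500, 3100, 1999, 2750, 1000, 2200, 1600]

# Class limits: lower limit included, upper limit excluded (exclusive method)
classes = [(1000, 2000), (2000, 3000), (3000, 4000)]

# Class frequency = number of observations falling in each class
distribution = {
    f"{lo}-{hi}": sum(1 for x in incomes if lo <= x < hi)
    for lo, hi in classes
}
print(distribution)  # {'1000-2000': 6, '2000-3000': 3, '3000-4000': 1}
```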
A table is a systematic arrangement of statistical data in columns and rows. Rows are horizontal arrangements whereas the
columns are vertical ones.
Presentation of Data
Statistics is all about data. Presenting data effectively and efficiently is an art. You may have uncovered many truths that
are complex and need long explanations while writing. This is where the importance of the presentation of data comes
in. You have to present your findings in such a way that the readers can go through them quickly and understand each
and every point that you wanted to showcase. As time progressed and new and complex research started happening,
people realized the importance of the presentation of data to make sense of the findings.
The three main methods of presenting data are as follows:
• Textual
• Tabular
• Diagrammatic
Out of the different methods of data presentation, the textual method is the simplest one. You just write your findings in a
coherent manner and your job is done. The demerit of this method is that one has to read the whole text to get a clear picture,
although the introduction, summary, and conclusion can help condense the information.
Tabular Ways of Data Presentation and Analysis
To avoid the complexities involved in the textual way of data presentation, people use tables and charts to present data.
In this method, data is presented in rows and columns - just like you see in a cricket match showing who made how
many runs. Each row and column has an attribute (name, year, sex, age, and other things like these). It is against these
attributes that data is written within a cell.
• Geometric Diagram
When a diagrammatic presentation involves shapes like a bar or circle, we call that a Geometric Diagram. Bar diagrams and pie charts are examples of geometric diagrams.
Pie Chart
A pie chart is a chart where you divide a pie (a circle) into different parts based on the data. Each data value is first
transformed into a percentage, and that percentage figure is then multiplied by 3.6 degrees (since 360 degrees / 100 = 3.6). The
result is the angle of the corresponding sector to be drawn in the pie chart. So, for example, if you get 30 degrees as the
result, you draw that angle from the centre of the pie.
Pie charts provide a descriptive, two-dimensional depiction of data, well suited to comparing the relative sizes of the parts of a
whole.
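The percentage-to-degrees arithmetic described above can be sketched as follows (the brand names and sales figures are hypothetical):

```python
# Convert each value to a percentage of the total, then multiply by 3.6
# (since 360 degrees / 100 percentage points = 3.6 degrees per point).

values = {"Brand A": 50, "Brand B": 30, "Brand C": 20}  # hypothetical sales
total = sum(values.values())

angles = {name: v * 100 / total * 3.6 for name, v in values.items()}
print(angles)  # {'Brand A': 180.0, 'Brand B': 108.0, 'Brand C': 72.0}
# The sector angles always add up to 360 degrees
print(sum(angles.values()))  # 360.0
```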
Bar charts
A bar chart shows the data with rectangular bars whose lengths are directly proportional to the values they represent. The bars
can be placed either vertically or horizontally depending on the data being represented.
Frequency Diagram
Suppose you want to present data that shows how many students have 1 to 2 pens, how many have 3 to 5 pens, and how
many have 6 to 10 pens (grouped frequency); you do that with the help of a Frequency Diagram. A Frequency Diagram
can be of many kinds:
Column chart
A column chart is a simplified version of the pictorial presentation that can manage a larger amount of data in a presentation
while providing suitable clarity to the insights of the data.
Histograms
A histogram presents the spread of numerical data. The main feature that separates bar graphs from histograms is the gaps: a
bar graph leaves gaps between its bars, while the bars of a histogram are drawn adjacent to one another over continuous class intervals.
Pictorial Presentation
It is the simplest form of data Presentation often used in schools or universities to provide a clearer picture to students, who
are better able to capture the concepts effectively through a pictorial Presentation of simple data.
Box plots
A box plot (or box-and-whisker plot) represents groups of numerical data through their quartiles. This style of graph makes it
easy to compare distributions and to pick out even minute differences between groups.
Maps
Map data graphs help you present data over an area, to display the regions of concern. Map graphs are useful for an exact
depiction of data over a vast geographical area.
All these visual presentations share a common goal of creating meaningful insights and a platform to understand and manage
the data in relation to the growth and expansion of one’s in-depth understanding of data & details to plan or execute future
decisions or actions.
Importance of Data Presentation
Data presentation can be either a deal maker or a deal breaker, depending on the delivery of the content in the context of visual
depiction.
Data presentation tools are powerful communication tools: they simplify the data by making it easily understandable and
readable while attracting and keeping the interest of readers, and they effectively showcase large amounts of complex data in a
simplified manner.
If the user can create an insightful presentation of the data in hand with the same sets of facts and figures, then the results promise
to be impressive.
There have been situations where the user has had a great amount of data and vision for expansion but the presentation drowned
his/her vision.
To impress the higher management and top brass of a firm, effective presentation of data is needed.
Data presentation helps the clients or the audience grasp the concept and the future alternatives of the business without
spending much time, and convinces them to invest in the company and turn it profitable for both the investors and the company.
Although data presentation has a lot to offer, the following are some of the major reasons behind the need for an effective
presentation:
• Many consumers or higher authorities are interested in the interpretation of data, not the raw data itself. Therefore, after the
analysis of the data, users should represent the data with a visual aspect for better understanding and knowledge.
• The user should not overwhelm the audience with a large number of text-heavy slides; instead, inject pictures that will speak
for themselves.
• Data presentation often happens in a nutshell with each department showcasing their achievements towards company
growth through a graph or a histogram.
• Providing a brief description helps the user capture the audience's attention in a small amount of time while informing them
about the context of the presentation.
• The inclusion of pictures, charts, graphs and tables in the presentation helps the audience better understand the potential outcomes.
• An effective presentation allows the organization to determine how it differs from fellow organizations and to
acknowledge its flaws. Comparison of data assists in decision making.
Correlation Coefficient
Meaning
• A correlation coefficient of 1 means that for every positive increase in one variable, there is a positive increase of a fixed proportion
in the other. For example, shoe sizes go up in (almost) perfect correlation with foot length.
• A correlation coefficient of -1 means that for every positive increase in one variable, there is a decrease of a fixed
proportion in the other. For example, the amount of gas in a tank decreases in (almost) perfect correlation with speed.
• Zero means that for every increase, there isn’t a positive or negative increase. The two just aren’t related.
The absolute value of the correlation coefficient gives us the relationship strength. The larger the number, the stronger the relationship. For
example, |-.75| = .75, which has a stronger relationship than .65.
One of the most commonly used formulas is Pearson's correlation coefficient formula. If you're taking a basic stats class, this is the one you'll
probably use:
r = [n(Σxy) − (Σx)(Σy)] / √([nΣx² − (Σx)²][nΣy² − (Σy)²])
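As a sketch, Pearson's formula can be implemented directly in Python; the data in the example calls are invented, chosen so that perfectly linear pairs give r = 1 (or -1):

```python
# Pearson's correlation coefficient:
# r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))
import math

def pearson_r(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0  (perfect positive)
print(pearson_r([1, 2, 3], [6, 4, 2]))        # -1.0 (perfect negative)
```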
Two other formulas are commonly used: the sample correlation coefficient and the population correlation coefficient.
The population correlation coefficient uses σx and σy as the population standard deviations, and σxy as the population covariance.
Pearson correlation is used in thousands of real-life situations. For example, scientists in China wanted to know if there was a
relationship between how weedy rice populations differ genetically. The goal was to find out the evolutionary potential of
the rice. Pearson's correlation between the groups was analyzed. It showed positive Pearson product-moment correlations of
between 0.783 and 0.895 for weedy rice populations. These figures are quite high, which suggested a fairly strong relationship.
What Is Regression?
Regression is a statistical method used in finance, investing, and other disciplines that attempts to determine the strength and
character of the relationship between one dependent variable (usually denoted by Y) and a series of other variables (known as
independent variables).
Regression helps investment and financial managers to value assets and understand the relationships between variables, such
as commodity prices and the stocks of businesses dealing in those commodities.
Regression Explained
The two basic types of regression are simple linear regression and multiple linear regression, although there are non-linear
regression methods for more complicated data and analysis. Simple linear regression uses one independent variable to explain or
predict the outcome of the dependent variable Y, while multiple linear regression uses two or more independent variables to
predict the outcome.
Regression can help finance and investment professionals as well as professionals in other businesses. Regression can also help
predict sales for a company based on weather, previous sales, GDP growth, or other types of conditions. The capital asset pricing
model (CAPM) is an often-used regression model in finance for pricing assets and discovering costs of capital.
The general form of each type of regression is:
Simple linear regression: Y = a + bX + u
Multiple linear regression: Y = a + b1X1 + b2X2 + b3X3 + ... + btXt + u
Where:
Y = the variable that you are trying to predict (dependent variable)
X = the variable(s) that you are using to predict Y (independent variables)
a = the intercept
b = the slope
u = the regression residual (error term)
Regression takes a group of random variables, thought to be predicting Y, and tries to find a mathematical relationship between
them. This relationship is typically in the form of a straight line (linear regression) that best approximates all the individual data
points. In multiple regression, the separate variables are differentiated by using subscripts.
Regression analysis is mainly used to find equations that will fit the data. Linear analysis is one type of
regression analysis. The equation for a line is Y = a + bX. Y is the dependent variable in the formula, whose
future value one is trying to predict when X, the independent variable, changes by a certain amount. "a" in
the formula is the intercept, the value that remains fixed irrespective of changes in the independent
variable, and the term "b" is the slope, which signifies how much the dependent variable changes for a
unit change in the independent variable.
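A minimal least-squares sketch of fitting the line y = a + bX follows; the data points are invented, b uses the usual normal-equation form, and a is recovered from the means.

```python
# Least-squares fit of the line y = a + bX:
# b = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2),  a = mean(y) - b * mean(x)

def fit_line(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = sy / n - b * sx / n
    return a, b

# Points lying exactly on y = 1 + 2X recover the intercept and slope
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```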
Regression is often used to determine how many specific factors such as the price of a commodity, interest rates, particular
industries, or sectors influence the price movement of an asset. The aforementioned CAPM is based on regression, and it is
utilized to project the expected returns for stocks and to generate costs of capital. A stock's returns are regressed against the
returns of a broader index, such as the S&P 500, to generate a beta for the particular stock.
Beta is the stock's risk in relation to the market or index and is reflected as the slope in the CAPM model. The return for the
stock in question would be the dependent variable Y, while the independent variable X would be the market risk premium.
Additional variables such as the market capitalization of a stock, valuation ratios, and recent returns can be added to the CAPM
model to get better estimates for returns. These additional factors are known as the Fama-French factors, named after the
professors who developed the multiple linear regression model to better explain asset returns.
Regression analysis is a widely used statistical method for estimating the relationships between one or more
independent variables and a dependent variable. Regression is a powerful tool: it is used to assess the
strength of the relationship between two or more variables, and it can then be used to model the
relationship between those variables in the future.
Y = a + bX + ∈
Where:
Y = the dependent variable
X = the independent variable
a = the intercept
b = the slope
∈ = the error term (residual)
Examples
Example #1
Consider the following two variables x and y, you are required to do the calculation of the regression.
Solution:
Using the above formula, we can do the calculation of linear regression in Excel as follows.
We have all the values in the above table with n = 5.
Now, first, calculate the intercept and slope for the regression:
a = 0.52
b = 1.20
Let’s now input the values in the regression formula to get regression.
Example #2
State Bank of India recently established a new policy of linking its savings account interest rate to the repo rate, and
the auditor of State Bank of India wants to conduct an independent analysis of the bank's interest rate decisions,
checking whether its rates changed whenever there were changes in the repo rate. A summary of the repo rate and the
bank's savings account interest rate that prevailed in those months is given below.
The auditor of State Bank has approached you to conduct an analysis and provide a presentation on it at the next
meeting. Use the regression formula and determine whether the bank's rate changed as and when the repo rate was
changed.
Solution:
Using the formula discussed above, we can do the calculation of linear regression in Excel, treating
the repo rate as the independent variable, i.e., X, and the bank's rate as the dependent variable, i.e., Y.
We have all the values in the above table with n = 6.
Now, first, calculate the intercept and slope for the regression.
b = -0.04
Let’s now input the values in the formula to arrive at the figure.
Analysis: It appears that State Bank of India is indeed following the rule of linking its savings rate to the repo rate,
as the non-zero slope signals a relationship between the repo rate and the bank's savings account
rate.
Example #3
ABC laboratory is conducting research on height and weight and wants to know whether there is any relationship,
such as the weight increasing as the height increases. They gathered a sample of 1,000 people for
each of the categories and came up with an average height in each group.
You are required to do the calculation of regression and come up with a conclusion on whether any such
relationship exists.
Solution:
Using the formula discussed above, we can do the calculation of linear regression in Excel, treating height as
the independent variable, i.e., X, and weight as the dependent variable, i.e., Y.
We have all the values in the above table with n = 6
Now, first, calculate the intercept and slope for the regression.
a = 68.63
b = -0.07
Let’s now input the values in the formula to arrive at the figure.
Analysis: It appears that there is only a very weak relationship between height and weight, as the slope is
very low.