Research Methodology - 11

The document outlines key concepts in data analysis, focusing on measures of central tendency (mean, median, mode) and measures of dispersion (range, standard deviation). It also discusses correlation and simple linear regression, providing formulas and examples to illustrate these statistical methods. An integrated example involving a bakery's advertising spend and sales data is included to demonstrate the application of these concepts.

Uploaded by

cryptoworld20182
Research Methodology - 11
MODULE XI: DATA ANALYSIS

A. MEASURES OF CENTRAL TENDENCY


Describe the “center” or “typical” value of a distribution.

1. Mean (Arithmetic Average)

Definition: Sum of all observations divided by the number of observations.

Formula: $\bar{x} = \tfrac{1}{n}\sum_{i=1}^{n} x_i$

Example: Monthly sales (₹ 000s) for 5 months: 12, 15, 18, 20, 25

Sum = 12+15+18+20+25 = 90

Mean = 90 ÷ 5 = 18

Key Point: Sensitive to extreme values.

2. Median (Middle Value)

Definition: The middle observation when data are ordered. If n is even, it’s the average of the two central values.

Example (Odd n): Sales above → ordered [12, 15, 18, 20, 25], median = 18 (3rd value).

Example (Even n): Add one more month, say 30 → [12, 15, 18, 20, 25, 30], median = (18+20)/2 = 19

Key Point: Robust to outliers (a single extreme value barely moves it); often preferred over the mean for skewed distributions.
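The contrast between the two key points (mean vs. median) can be sketched in Python. This is only an illustration: the extra value 250 is a hypothetical extreme month added for the demonstration and is not part of the notes' data.

```python
from statistics import mean, median

# Monthly sales from the examples above (₹ 000s)
sales = [12, 15, 18, 20, 25]

# Hypothetical extreme month (assumption, not from the notes)
sales_with_outlier = sales + [250]

mean_before, median_before = mean(sales), median(sales)
mean_after, median_after = mean(sales_with_outlier), median(sales_with_outlier)

# The mean is pulled strongly toward the outlier; the median barely moves.
print("mean:  ", mean_before, "->", round(mean_after, 2))
print("median:", median_before, "->", median_after)
```

The mean roughly triples once the outlier is included, while the median shifts only from the 3rd value to the average of the 3rd and 4th.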

3. Mode (Most Frequent Value)

Definition: Observation that occurs most often. A distribution may be unimodal, bimodal, or multimodal.

Example: Customer visits per week: [2, 3, 3, 4, 5, 3, 4] → 3 occurs 3 times → mode = 3

Key Point: Only measure defined for nominal data; may not exist (all unique) or not be unique (multiple modes).
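As a quick check, the three measures can be reproduced with Python's standard `statistics` module. This is a minimal sketch using the example data from this section; the variable names are illustrative.

```python
from statistics import mean, median, multimode

sales = [12, 15, 18, 20, 25]          # monthly sales (₹ 000s)
visits = [2, 3, 3, 4, 5, 3, 4]        # customer visits per week

sales_mean = mean(sales)                   # sum of observations / n
sales_median_odd = median(sales)           # middle value, n = 5
sales_median_even = median(sales + [30])   # average of two central values, n = 6
visit_modes = multimode(visits)            # list of all most frequent values

print(sales_mean, sales_median_odd, sales_median_even, visit_modes)
```

`multimode` returns a list, so it also covers the bimodal and multimodal cases mentioned above; when every value is unique it returns all of them.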

B. MEASURES OF DISPERSION
Quantify the spread or variability in the data.

1. Range

Definition: Difference between the maximum and minimum values.

Formula: Range = $x_{\max} - x_{\min}$

Example (Sales data): Max = 25, Min = 12 → range = 13

Key Point: Simple but highly sensitive to outliers; ignores intermediate values.

2. Standard Deviation (SD)

Definition: Typical distance of observations from the mean, computed as the square root of the average squared deviation (the sample version divides by n–1).

Formula: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$

Example: Sales [12, 15, 18, 20, 25], mean=18

1. Compute deviations: –6, –3, 0, +2, +7

2. Square them: 36, 9, 0, 4, 49 → sum = 98

3. Divide by n–1 = 4: 98/4 = 24.5

4. SD = √24.5 ≈ 4.95

Key Point: Uses all data; common basis for further analysis (e.g., z-scores).
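The four SD steps above, together with the range and the z-scores mentioned in the key point, can be sketched as follows. The manual calculation is compared with `statistics.stdev`, which also uses the n–1 divisor.

```python
from math import sqrt
from statistics import mean, stdev

sales = [12, 15, 18, 20, 25]  # monthly sales (₹ 000s)

data_range = max(sales) - min(sales)

x_bar = mean(sales)
deviations = [x - x_bar for x in sales]          # step 1: deviations from the mean
squared_devs = [d ** 2 for d in deviations]      # step 2: squared deviations
sample_var = sum(squared_devs) / (len(sales) - 1)  # step 3: divide by n - 1
sd_manual = sqrt(sample_var)                     # step 4: square root
sd_library = stdev(sales)                        # library sample SD for comparison

# z-score: how many SDs each observation lies from the mean
z_scores = [d / sd_manual for d in deviations]

print(data_range, sum(squared_devs), sample_var, round(sd_manual, 2))
print([round(z, 2) for z in z_scores])
```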

C. CONCEPT OF CORRELATION & REGRESSION

1. Correlation

Purpose: Measures the strength and direction of the linear relationship between two variables X and Y.

Pearson's r formula:
$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \; \sum (y_i - \bar{y})^2}}$

Example: Hours studied vs. exam score for 5 students:

X: [2, 4, 6, 8, 10] Y: [50, 60, 70, 80, 90]

Perfect positive linear relationship (each extra 2 hours adds exactly 10 marks) → r = +1

Interpretation:

r ≈ +1 strong positive

r ≈ 0 no linear relationship

r ≈ –1 strong negative
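Pearson's r for the study-hours example can be computed directly from the formula. This is a minimal sketch; `pearson_r` is an illustrative helper name, not a library function.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation from sums of deviation products."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

hours = [2, 4, 6, 8, 10]
scores = [50, 60, 70, 80, 90]

r_positive = pearson_r(hours, scores)        # perfectly increasing line
r_negative = pearson_r(hours, scores[::-1])  # same scores in reverse order

print(r_positive, r_negative)
```

Reversing the scores turns the perfect positive relationship into a perfect negative one, which matches the interpretation list above.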

2. Simple Linear Regression

Purpose: Predict Y from X via the line $Y = a + bX$.

Slope & Intercept:

$b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \quad a = \bar{y} - b\,\bar{x}$

Example (continued):

With the perfectly linear data above, x̄ = 6, ȳ = 70, Σ(x−x̄)(y−ȳ) = 200 and Σ(x−x̄)² = 40, so b = 200/40 = 5 and a = 70 − 5×6 = 40

Regression equation: Score = 40 + 5 × (Hours studied)

Predict: study 5 hrs → score = 40 + 5×5 = 65

Key Points:

b = change in Y per one‐unit change in X.

a = predicted Y when X = 0.

R² (coefficient of determination) equals r² in simple linear regression and gives the proportion of variance in Y explained by X.
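The slope and intercept formulas can be applied to the study-hours data with a few lines of Python. This is a sketch under the least-squares definitions given above; `fit_line` is an illustrative helper name.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for Y = a + bX."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope: change in Y per one-unit change in X
    a = y_bar - b * x_bar    # intercept: predicted Y when X = 0
    return a, b

hours = [2, 4, 6, 8, 10]
scores = [50, 60, 70, 80, 90]

a, b = fit_line(hours, scores)
predicted_5_hours = a + b * 5

# For perfectly linear data every residual (actual - fitted) should be zero.
residuals = [y - (a + b * x) for x, y in zip(hours, scores)]

print(a, b, predicted_5_hours, residuals)
```

Checking the residuals is a simple way to confirm that a fitted line actually passes through the data it was derived from.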

INTEGRATED EXAMPLE

A small bakery tracks daily advertising spend (₹000s) and daily sales (₹000s) over 7 days:

Day   Ad Spend (X, ₹000s)   Sales (Y, ₹000s)
1     2                     20
2     3                     30
3     5                     45
4     4                     40
5     6                     60
6     7                     70
7     8                     80

Mean X = 35/7 = 5; Mean Y = 345/7 ≈ 49.29

Range sales = 80 – 20 = 60

SD (X) ≈ 2.16; SD (Y) ≈ 21.68

r ≈ +0.996 (very strong positive)

Regression line:

b = Cov(X,Y)/Var(X) = Σ(x−x̄)(y−ȳ) / Σ(x−x̄)² = 280 / 28 = 10

a = ȳ – b x̄ = 49.29 – 10×5 ≈ –0.71

Y ≈ –0.71 + 10 X (each additional ₹1,000 of ad spend is associated with about ₹10,000 of additional sales)
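The summary statistics for the bakery data can be recomputed from the seven table rows as a check. This is a minimal sketch using the sample (n–1) standard deviation, as in Section B; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Table values: daily ad spend and daily sales, both in ₹000s
ad_spend = [2, 3, 5, 4, 6, 7, 8]
sales = [20, 30, 45, 40, 60, 70, 80]

x_bar, y_bar = mean(ad_spend), mean(sales)
sales_range = max(sales) - min(sales)
sd_x, sd_y = stdev(ad_spend), stdev(sales)

# Sums of deviation products used by both r and the regression slope
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ad_spend, sales))
s_xx = sum((x - x_bar) ** 2 for x in ad_spend)
s_yy = sum((y - y_bar) ** 2 for y in sales)

r = s_xy / sqrt(s_xx * s_yy)
b = s_xy / s_xx
a = y_bar - b * x_bar

print(f"mean X = {x_bar}, mean Y = {y_bar:.2f}, range Y = {sales_range}")
print(f"SD X = {sd_x:.2f}, SD Y = {sd_y:.2f}, r = {r:.3f}")
print(f"Y = {a:.2f} + {b:.2f} X")
```

Because day 3 has X equal to the mean of X, its sales value does not affect the slope, but it does affect the mean of Y and therefore the intercept.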
