SDM Lab Report
Laboratory Certificate
This is to certify that Mr. / Ms…………………………………… has satisfactorily
completed the course of Experiments in Statistics in Decision making prescribed by
the Department during the year 2024-2025
USN: ………………………………
Semester: I
Signature of the staff in-charge Head of the Department
VISION
Imparting innovation and value based education in Industrial Engineering and Management for
steering organizations to global standards with an emphasis on sustainable and inclusive
development.
MISSION
- To impart scientific knowledge, engineering and managerial skills for driving organizations
to global excellence.
- To institute collaborative academic and research exchange programs with national and
globally renowned academia, industries and other organizations.
- To establish and nurture centers of excellence in the niche areas of Industrial and Systems
Engineering.
II. Develop competency to adapt to changing roles for achieving organizational excellence.
III. Design and develop sustainable technologies and solutions for betterment of society.
IV. Pursue entrepreneurial venture with a focus on creativity and innovation for developing
newer products, processes and systems.
2. Problem analysis: Identify, formulate, review research literature, and analyze complex
engineering problems reaching substantiated conclusions using first principles of mathematics,
natural sciences, and engineering sciences.
5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and
modern engineering and IT tools including prediction and modeling to complex
engineering activities with an understanding of the limitations.
6. The engineer and society: Apply reasoning informed by the contextual knowledge to
assess societal, health, safety, legal and cultural issues and the consequent responsibilities
relevant to the professional engineering practice.
8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities
and norms of the engineering practice.
11. Project management and finance: Demonstrate knowledge and understanding of the engineering
and management principles and apply these to one’s own work, as a member and leader in a
team, to manage projects and in multidisciplinary environments.
12. Life-long learning: Recognize the need for, and have the preparation and ability to
engage in independent and life-long learning in the broadest context of technological
change.
PROGRAM SPECIFIC OUTCOMES (PSO)
1. Design, develop, implement and improve integrated systems that include people,
materials, information, equipment and energy.
2. Apply statistical and simulation tools, optimization and meta heuristics techniques for
analysis of various systems leading to better decision making.
Index
1. Introduction
2. V lookup
3. Data survey
4. Data Analysis and linear regression of Tesla closing stock prices
Introduction
In this course “Statistics in Decision Making”, we embark on a quest to unravel the
intricate tapestry of statistical methodologies that underpin sound decision making
processes. This project serves as a gateway into applying statistical tools in real-world
scenarios, exploring the significance of data analysis in shaping informed decisions
across various domains.
Through hands-on applications and case studies, we dive into the process of transforming
raw data into meaningful insights, equipping ourselves with the skills necessary to extract
valuable information from the vast sea of data. This project aims to showcase the practical
relevance of statistics, emphasizing its role as a powerful tool in guiding decision-making
processes in diverse fields.
This project would not be possible without the support and guidance of our esteemed
faculty from the “Industrial Engineering and Management” department. Their expertise
and guidance provided the necessary scaffolding for our academic endeavors, propelling
us toward a deeper comprehension of statistical principles.
VLOOKUP
Name: Medhansh Jain
USN: 24RVIGCP23
Subject: Statistics in Decision Making
Aim: To understand and apply the VLOOKUP function in Excel to search for
specific information in a given data set and get corresponding values based on a
given criteria.
Apparatus used: Microsoft Excel
Introduction:
During this class we were taught how to use the VLOOKUP function in Excel
using a data set of office chair sales. This helped us understand how to simplify data
retrieval when a large set of data is given.
Theory:
The VLOOKUP function in Excel searches for a value in the first column of a range and
returns a value in the same row from a specified column. The syntax for this function is:
=VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup])
Where
- lookup_value: the value you want to search for
- table_array: the range of cells containing the data (lookup table)
- col_index_num: the column number in the table from which the value is
retrieved
- range_lookup: TRUE (approximate match) or FALSE (exact match)
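To make the lookup logic concrete, here is a minimal Python sketch that mimics what VLOOKUP does on a small in-memory table. The function name `vlookup`, the `customer_info` rows and the company names are hypothetical and only for illustration; they are not taken from the lab data set, and this is not how Excel implements the function internally.

```python
def vlookup(lookup_value, table_array, col_index_num, range_lookup=False):
    """Mimic Excel's VLOOKUP on a list of rows (each row is a tuple or list)."""
    if not range_lookup:
        # Exact match: return the requested column of the first matching row.
        for row in table_array:
            if row[0] == lookup_value:
                return row[col_index_num - 1]  # Excel columns are 1-based
        return "#N/A"
    # Approximate match: assumes the first column is sorted in ascending order
    # and picks the row with the largest key that is <= lookup_value.
    best = None
    for row in table_array:
        if row[0] <= lookup_value:
            best = row
        else:
            break
    return best[col_index_num - 1] if best is not None else "#N/A"


# Hypothetical lookup table: (customer ID, company name, city)
customer_info = [
    (101, "Acme Furnishings", "Pune"),
    (102, "Blue Desk Co", "Mysuru"),
    (103, "Chairworks", "Bengaluru"),
]

print(vlookup(102, customer_info, 2))  # company name for customer 102
print(vlookup(999, customer_info, 2))  # no such customer
```

With `range_lookup` left as `False`, the sketch behaves like the exact-match form (last argument 0 or FALSE) used in this experiment.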
Procedure:
1. Open the sample dataset “Office chair sales Q1-Q2 2020” given in MS Excel
2. Convert the given dataset to a table using Table in the Insert tab
3. Create a new column named ‘Company name’
4. Under the new column, write the function as follows:
=VLOOKUP([@[customer ID]],'Customer Info'!$A$4:$C$12,2,0)
and press ‘Enter’
Output:
After using the VLOOKUP function:
Conclusion:
As you can see, the function helps us organize data and makes it easier to search for a
value in a table or range by row and retrieve the corresponding data from another column
in the same row.
Data Survey
Name: Jai Adhvik, Ayushmaan, Prarthana, Neha
USN: 24RVIGCP21
Subject: Statistics in Decision Making
Aim: To find out the general habits of students at the university.
Apparatus used: Pen, Book, Microsoft Excel
Introduction
The purpose of this report is to analyze the music listening habits of 62 respondents,
focusing on their preferred genres, frequency of music consumption, attendance at live
music events, and preferred music streaming platforms. Additionally, demographic
information regarding gender identity has been considered for a comprehensive
understanding.
Objective: To find out the general habits and music preferences of students using the questions
below:
1. What is your gender?
2. What type of music do you listen to?
3. Do you use your phone in class?
4. How many hours do you study daily?
5. What do you like to do in your free time?
6. Do you stay in the hostel?
Method
1. Data collection process:
- Six questions regarding the general habits of students were formed; these
questions help us understand the general habits of the students
- Using these questions, individual interviews were conducted around
the campus; we conducted around 60-70 interviews
- The data was collected using pen and paper, writing down the answers
given by the students
2. Data analysis process:
- Once the data was collected, it was entered into Microsoft Excel to make
individual graphs representing each of the data sets; the graphs were made
using Excel’s built-in options.
- Inferences were drawn from the graphs, and these inferences were
presented to the class using a PowerPoint presentation.
Findings:
This graph represents the daily screen time of the 70 students that were interviewed.
Around 23 students spend 2-4 hours a day in front of a screen, and
around 27 students spend 4-6 hours. A bar graph was used here to
represent the discrete distribution of the data.
This pie chart illustrates the various apps used by college students. A pie chart was used
here to show each app's share of the whole. Out of the 70 students that were
interviewed, a majority made use of Instagram. Second was WhatsApp
with 7.1% (around 5 people).
This graph shows the number of hours students spend studying in a day. It can be seen
that around 34 people spend only 2-4 hours studying. This suggests that the largest group
of students dedicates a moderate amount of time (2-4 hours) to studying, with fewer
committing to significantly longer or shorter periods.
Another pie chart was used to show the number of people who use their phones in class.
From the graph above it can be seen that more than 60% of the students that were
interviewed use their phones during class.
In order to find out the number of students who lived in a PG (paying guest accommodation),
lived in the hostel or were day scholars, we made use of a pie chart, which helped us
understand the percentage of people in each of the 3 categories.
Our last question was what people do during their free time. This is a perfect use of a pie
chart as it helped us show the percentage of people for each category. As it can be seen,
the majority of the people spend their time watching something, or keeping fit through
sports and fitness.
Conclusion:
The survey that was conducted informs us of many key findings. We found out that
despite the strict rule against using phones during class, many students still find
ways to use their phones during class. Moreover, it was seen that most of the students
spend a moderate amount of time (2-4 hours) studying. However, when we asked
students what they do during their free time and what social media apps they use the
most, it was not a surprise to see students spend most of their free time on social media
apps such as Instagram, YouTube and WhatsApp. Overall, this project of collecting,
representing and analysing data helped us gain a better understanding of the
general habits of college students at RVCE.
Data Analysis and Linear Regression
Name: Jai Adhvik, Ayushmaan, Neha, Prarthana
USN: 24RVIGCP21
Subject: Statistics in Decision Making
Aim: Present data analysis of a given data set - Histogram, Linear regression, and
Correlation.
Apparatus used: Microsoft Excel, Tesla stock prices
Introduction:
In this project, using any set of data and the built-in graphing functions of Excel, we had
to make a histogram, a linear regression chart and a correlation analysis. We chose the
opening stock prices of Tesla and carried out all three on this data set.
Microsoft Excel provides a versatile and user-friendly
platform for making histograms and for doing linear regression and correlation
analysis on a given set of data. In this exploration, we show how Excel's features
can help us make histograms and regression charts.
Method:
1. For the data, we took the first 2000+ opening stock
prices of Tesla from a website called Kaggle.
2. Then, using the built-in data analysis add-in ‘ToolPak’, the histogram was made.
After changing the range of the bins and the frequency for each bin, and formatting the axis,
we get a histogram.
3. The descriptive analysis of the data was also done using the built-in
‘Data analysis’ function in Excel. By highlighting the columns for the
analysis, the mean, median, mode, variance, standard deviation, range, sum, count,
skewness and more were found.
4. Lastly, to conduct the linear regression and correlation analysis, we again made
use of the built-in functions of Excel. Again highlighting the columns required, the
values for linear regression and correlation were found.
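Steps 2 and 3 can be sketched outside Excel as well. The short Python example below computes the same kind of descriptive statistics and bin counts on a small list of prices. The `prices` and `bins` values are hypothetical and chosen only for illustration; they are not the Kaggle data used in this report.

```python
import statistics

# Hypothetical opening prices, for illustration only (not the Kaggle data).
prices = [17.0, 19.5, 21.0, 21.0, 23.5, 25.0, 28.5, 30.0]

summary = {
    "count": len(prices),
    "sum": sum(prices),
    "mean": statistics.mean(prices),
    "median": statistics.median(prices),
    "mode": statistics.mode(prices),
    "variance": statistics.variance(prices),  # sample variance (n - 1)
    "std_dev": statistics.stdev(prices),      # sample standard deviation
    "range": max(prices) - min(prices),
}


def histogram_counts(values, bin_upper_limits):
    """Count values per bin: each value goes into the first bin whose upper
    limit is >= the value; values above the last limit go into a final
    'More' bin."""
    counts = [0] * (len(bin_upper_limits) + 1)
    for v in values:
        for i, upper in enumerate(bin_upper_limits):
            if v <= upper:
                counts[i] += 1
                break
        else:
            counts[-1] += 1
    return counts


bins = [20, 25, 30]  # hypothetical bin upper limits
print(summary)
print(histogram_counts(prices, bins))
```

The binning rule in the sketch (a value is counted in a bin if it is at most the bin limit and above the previous one) is meant to match how the Histogram tool reports frequencies.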
Output:
Data analysis:
The above table shows us the data analysis of the data. The built-in function of data
analysis in excel made it easier for us to find out variance, standard deviation, skewness
and more.
Histogram:
The above image shows us a tabular representation of the graph above. By using the
built-in functions to find the value of the bins and frequencies, Excel made it possible
to make a histogram in minutes, whereas the manual way would have taken a
lot of time given that more than 2000 opening stock values were considered.
Regression analysis:
Again, making use of the built-in functions of Excel, we were able to find the ‘r’ and
‘r^2’ values. These values are crucial as they help us understand how well the data fits
the regression model. For example, in the above data, we got an r^2 value of 0.44639,
which tells us that the relationship between the variables (date (x) and
opening stock value (y)) is not very strong. The numerical value tells us that approximately 44.64% of the
variance in the dependent variable is explained by the independent variable.
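For readers who want to see where ‘r’ and ‘r^2’ come from, the following is a minimal Python sketch of a least-squares fit. The `days` and `opens` values are hypothetical and are not the Tesla data; the function name `linear_regression` is likewise only illustrative.

```python
import math


def linear_regression(x, y):
    """Least-squares slope, intercept, Pearson r and r^2 for paired data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r, r ** 2


# Hypothetical data: day index (x) and opening price (y)
days = [1, 2, 3, 4, 5]
opens = [2.0, 4.0, 5.0, 4.0, 5.0]

slope, intercept, r, r2 = linear_regression(days, opens)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.4f}, r^2={r2:.4f}")
```

Since r^2 is the square of r in simple linear regression, an r^2 of 0.44639 corresponds to a correlation coefficient of about 0.668 in magnitude.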
Conclusion:
In conclusion, Microsoft Excel showed a different way to execute data analysis and linear
regressions which helped draw statistical inferences which then helped us understand the
relationship between the 2 variables.
which represents a new effect or difference, and using statistical techniques to determine
which is more likely to be true.
- The null hypothesis (H0): represents the default assumption typically indicating no
effect or no difference.
- Alternative hypothesis (H1): Represents the claim or effect being tested, usually
indicating a significant difference or relationship.
Procedure
1. We first have to define H0 and H1, and choose a significance level (usually 0.05 or
0.01)
2. Collect the data: gather sample data which are relevant to the hypotheses
3. Calculate the test statistic: using statistical methods we can compute a test statistic
from the data
4. Make a decision: compare the test statistic to the critical values or use a p-value
to determine whether to reject or fail to reject the null hypothesis
5. Draw conclusions: interpret the results in the context of the original research
question. Hypothesis testing helps determine if observed data deviates
significantly from what would be expected under the null hypothesis
Theory:
Single mean: A single mean hypothesis test involves testing whether the mean of a
population is equal to a specified value. An example is shown below for better
understanding.
Example:
Q: Suppose you want to test if the average height of a group of students is 170 cm.
After collecting data from a sample, you calculate the sample mean and standard
deviation, perform the test, and interpret the results to see if there's enough evidence to
conclude that the mean height differs from 170 cm
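As a rough illustration of this example, the sketch below computes a one-sample t statistic in Python. The `heights` list is hypothetical (no sample data is given in this report), and the use of a t statistic assumes the population standard deviation is unknown.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t statistic for H0: population mean == mu0 (sigma unknown)."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)           # sample standard deviation
    t = (mean - mu0) / (s / math.sqrt(n))
    return t, n - 1                        # statistic and degrees of freedom


# Hypothetical heights in cm, for illustration only
heights = [168, 172, 171, 169, 175, 170, 173, 174]

t_stat, dof = one_sample_t(heights, 170)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

For this made-up sample the statistic is about 1.73, which is below the two-sided critical value of 2.365 for 7 degrees of freedom at the 0.05 level, so the null hypothesis would not be rejected.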
Double mean: A double mean hypothesis test typically involves comparing the means of
two independent groups to determine if there is a significant difference between them.
This is often referred to as an independent samples t-test. An example is shown below for
better understanding.
Example:
Q: Imagine you want to compare the test scores of 2 different teaching methods:
You would calculate the t-statistic and then decide whether to reject the null hypothesis
based on your results.
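The t-statistic mentioned here can be sketched in Python as follows. The scores in `method_a` and `method_b` are hypothetical, and the sketch assumes the pooled-variance (equal variances) form of the independent samples t-test; other variants exist and the report does not say which one was intended.

```python
import math
import statistics


def two_sample_t(a, b):
    """Pooled-variance t statistic for H0: mean(a) == mean(b)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled_var = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, na + nb - 2                  # statistic and degrees of freedom


# Hypothetical test scores for two teaching methods
method_a = [78, 82, 85, 80, 75]
method_b = [72, 70, 78, 74, 76]

t_stat2, dof2 = two_sample_t(method_a, method_b)
print(f"t = {t_stat2:.3f} with {dof2} degrees of freedom")
```

For these made-up scores the statistic is about 2.71, which exceeds the two-sided critical value of 2.306 for 8 degrees of freedom at the 0.05 level, so the null hypothesis of equal means would be rejected.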
Conclusion
Name: Jai Adhvik, Ayushmaan, Arjun, Nathan, Ishan, Utkarsh, Shanmuka Saketh
USN: 24RVIGCP21
Subject: Statistics in Decision Making
Aim: To check if a data follows uniform distribution
Apparatus used: Die
Aim: To determine the goodness of fit for a set of data from an activity assumed to
produce uniformly distributed data.
Apparatus: Die
Theory: The Uniform distribution is a discrete distribution bounded by a max and min
[min,max] with constant probability at every value on or between the bounds. Sometimes
called the discrete rectangular distribution, it arises when an event can have a finite and
equally probable number of outcomes. Note that the probabilities are actually weights at
each integer, but are represented by broader bars for visibility.
Formula:
$$p(x) = \frac{1}{\max - \min + 1}$$
Where,
Min and max: are lower and upper limits of the distribution
Hypotheses:
H0 = Experiment of a single die follows uniform distribution
H1 = Experiment of a single die does not follow uniform distribution
Procedure:
1. Toss one die 120 times
2. After each toss, note down the face-up value of the die
3. Repeat step 2 until 120 values have been tabulated
4. Determine the probability under the uniform distribution (since there are 6 faces on the
die, the probability of occurrence of any one face is equal to 1/6)
5. Calculate the expected frequency:
expected frequency = probability of occurrence × total number of observations
$$e_i = \frac{1}{6} \times 120 = 20$$
6. Determine the chi square table value with degrees of freedom = m - k - 1
where m = no of rows, k = no of parameters estimated
Significance level (α) = 0.05
7. If the calculated chi square value is less than the chi square table value, then the data fits
the uniform distribution
Experimental setup:
Tabulation:
(Oi' = observed frequency, Pi = probability, ei = expected frequency)
Face no   Oi'   Pi    ei = Pi × ΣOi'   Pooled Oi   Pooled ei   (Oi − ei)²/ei
1         24    1/6   20               24          20          0.8
2         24    1/6   20               24          20          0.8
3         17    1/6   20               17          20          0.45
4         16    1/6   20               16          20          0.8
5         18    1/6   20               18          20          0.2
6         21    1/6   20               21          20          0.05
Total     120   1     120              120         120         3.1
Calculations:
Calculation of degrees of freedom:
m=6
k = 0
Degrees of freedom = m - k - 1 = 6 - 0 - 1 = 5
Inference:
From the information above, since the calculated chi square value is less than the chi
square table value ($\chi^2_{cal} < \chi^2_{tab}$, i.e. 3.1 < 11.07), we fail to reject H0. Since H0 is not
rejected, the data are consistent with the single-die experiment following a uniform distribution.
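The calculation behind this inference can be reproduced with a few lines of Python. This is only a sketch using the observed frequencies from the tabulation; the table value 11.07 is taken from the report rather than computed.

```python
# Observed frequencies for faces 1..6, taken from the tabulation
observed = [24, 24, 17, 16, 18, 21]
total = sum(observed)                 # 120 tosses
expected = [total / 6] * 6            # 20 per face under H0

# Chi-square statistic: sum of (O - e)^2 / e over all faces
chi_sq_cal = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

dof = len(observed) - 0 - 1           # m - k - 1 with k = 0
chi_sq_tab = 11.07                    # table value at alpha = 0.05, 5 dof

print(f"chi-square calculated = {chi_sq_cal:.2f}, dof = {dof}")
print("fail to reject H0" if chi_sq_cal < chi_sq_tab else "reject H0")
```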
Name: Jai Adhvik, Ayushmaan, Arjun, Nathan, Ishan, Utkarsh, Shanmuka Saketh
USN: 24RVIGCP21
Subject: Statistics in Decision Making
Aim: To check if a data follows Binomial distribution
Apparatus used: 10 dice
Aim: To determine the goodness of fit for a set of data from an activity assumed to
produce binomially distributed data.
Apparatus: Dice
of the event occurring. Each single trial is assumed to be independent of all others. For
large n, the binomial distribution may be approximated by the Normal distribution.
Formulae:
$$p(x) = \binom{n}{x} p^{x} (1 - p)^{n - x}$$
Where x = 0, 1, …. n
p = probability of an event occurring
n = number of trials
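A minimal Python sketch of the binomial mass function is given below. The values n = 10 and p = 1/6 are assumptions based on tossing 10 dice and counting ones; the report does not state which n and p were used for its own table, so these numbers may not reproduce it.

```python
import math


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


# Assumed parameters: 10 dice per toss, success = a die showing 1
n, p = 10, 1 / 6
n_tosses = 80

probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
expected = [n_tosses * q for q in probs]   # expected frequency per count

for x in range(6):
    print(f"x={x}: P={probs[x]:.4f}, expected freq={expected[x]:.2f}")
```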
Hypotheses:
H0 = The experiment conducted follows a binomial distribution
H1 = The experiment conducted does not follow a binomial distribution
Procedure:
1. Toss 10 dice 80 times
2. Count the number of times the number 1 occurs on the dice in each toss and record it in the observed
frequency table
3. Determine the binomial probability using the mass function
4. Calculate the expected frequency
5. Determine the chi square table value with degrees of freedom = m - k - 1
where m = no of rows, k = number of parameters estimated, and significance level
α = 0.05
6. If the chi square calculated value is less than or equal to the chi square table value, the data
follows a binomial distribution
Sl no   No of defectives   Tally marks   Observed frequency Oi'
1       0                                11
2       1                                27
3       2                                18
4       3                                17
5       4                                7
6       5
7       6
Total                                    80
Tabulation:
(Oi' = observed frequency, Pi = individual probability, Ei = expected frequency)
No of defects   Oi'   Pi      Ei = Pi × ΣOi'   Pooled Oi   Pooled ei   (Oi − ei)²/ei
0               11    0.129   10.33            11          10.33       0.043
1               27    0.307   24.56            27          24.56       0.242
2               18    0.313   25.03            18          25.03       1.974
3               17    0.177   14.17            17          14.17       0.565
4               7     0.060   4.81             7           5.79        0.253
5               0     0.014   1.11             0           0.12        0.120
Total           80            80.00            80          80.00       3.197
$$\chi^2_{cal} = 3.197$$
DOF = m - k - 1 = 6 - 1 - 1 = 4
$$\chi^2_{0.05,4} = 9.488$$
Specimen calculations:
$$\chi^2_{cal} = \sum_i \frac{(O_i - e_i)^2}{e_i} = 3.197$$
Inference:
From the above calculations and tabulations it can be seen that the chi square table value
is greater than the chi square calculated value ($\chi^2_{0.05,4} > \chi^2_{cal}$, i.e. 9.488 > 3.197). Therefore, we fail to
reject the null hypothesis; in other words, the data are consistent with a binomial distribution.
Name: Jai Adhvik, Ayushmaan, Arjun, Nathan, Ishan, Utkarsh, Shanmuka Saketh
USN: 24RVIGCP21
Subject: Statistics in Decision Making
Aim: To check if a data follows Poisson distribution
Apparatus used: Tray, box, black and white beads
Aim: To determine the goodness of fit for the number-of-defectives data in a sample drawn
from a lot, assuming a Poisson distribution
Apparatus: Sampling gadget, beads - Black (920 Nos), and White (80 Nos)
Theory:
The Poisson distribution is a discrete distribution bounded at 0 on the low side and
unbounded on the high side. The Poisson distribution is a limiting form of the
hypergeometric distribution.
The Poisson distribution finds frequent use because it represents the infrequent
occurrence of events whose rate is constant. This includes many types of events in time
or space such as arrivals of telephone calls, defects in semiconductor manufacturing,
defects in all aspects of quality control, molecular distributions, stellar distributions etc. It
is an important starting point in queuing theory and reliability theory. Note that the time
between arrivals or the space between defects is exponentially distributed, which makes
this distribution a particularly convenient starting point even when the process is more
complex.
Formula:
$$P(X = x) = \frac{e^{-\mu} \times \mu^{x}}{x!}$$
Where, μ= average number of outcomes occurring in the given time interval or specified
region
The Poisson distribution is usually applicable when conditions such as N = 10n,
where ‘N’ is the lot size and ‘n’ is the sample size, and a proportion defective ‘P’ of less
than 0.1 are satisfied.
Hypothesis:
H0 = The experiment follows a Poisson distribution
H1 = The experiment does not follow a Poisson distribution
Procedure:
1. Calculate the no of black and white beads to suit the specified proportion
2. Mix the beads thoroughly in the top portion of the sampling gadget
3. Mix thoroughly and draw a sample of beads as per the sample size given
4. Observe the number of defects (white beads) in the sample
5. Record the frequency of occurrence of defectives in a table
6. Repeat the procedure of drawing the sample and observing the number of defects
in each sample for 80 times
7. Calculate the chi square value using the formula: $\chi^2_{cal} = \sum_i \frac{(O_i - e_i)^2}{e_i}$
8. $\chi^2_{tab}$ can be obtained by referring to the chi square distribution table for the
degrees of freedom
Experimental Setup:
Tabulation
No of defects   Tally marks   Observed freq Oi'
0                             4
1                             10
2                             12
3                             18
4                             18
5                             9
6                             6
7                             1
8                             2
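As a hedged sketch of how expected frequencies could be derived from the observed table, the Python snippet below estimates the mean from the data and evaluates the Poisson formula for each count. It does no pooling of small classes, so it is not guaranteed to reproduce the chi square value reported in this experiment.

```python
import math

# Observed frequencies from the tabulation: {number of defects: frequency}
observed = {0: 4, 1: 10, 2: 12, 3: 18, 4: 18, 5: 9, 6: 6, 7: 1, 8: 2}

n_samples = sum(observed.values())                          # 80 samples
mu = sum(x * f for x, f in observed.items()) / n_samples    # estimated mean


def poisson_pmf(x, mu):
    """P(X = x) for a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu ** x / math.factorial(x)


# Expected frequency for each observed count (no pooling applied)
expected = {x: n_samples * poisson_pmf(x, mu) for x in observed}

print(f"estimated mean = {mu:.3f}")
for x in observed:
    print(f"x={x}: observed={observed[x]}, expected={expected[x]:.2f}")
```

Because the mean is estimated from the data, one parameter is counted as estimated (k = 1) when working out the degrees of freedom.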
$$\chi^2_{cal} = 2.44$$
DOF = m - k - 1 = 8 - 1 - 1 = 6
$$\chi^2_{0.05,6} = 12.592$$
Specimen calculations:
$$\chi^2_{cal} = \sum_i \frac{(O_i - e_i)^2}{e_i} = 2.44$$
Inference:
From the above calculations and tabulations it is seen that, since the chi square table
value is greater than the chi square calculated value ($\chi^2_{0.05,6} > \chi^2_{cal}$, i.e. 12.592 > 2.44), we fail to
reject the null hypothesis. Hence, it can be said that the data are consistent with a Poisson
distribution.
Name: Jai Adhvik, Ayushmaan, Arjun, Nathan, Ishan, Utkarsh, Shanmuka Saketh
USN: 24RVIGCP21
Subject: Statistics in Decision Making
Aim: To check if a data follows normal distribution
Apparatus used: inspection gauge, 120 bolts, chart of scale
Aim: To test whether the width across the flats of the heads of 100 hexagonal head bolts
follows a normal distribution.
Apparatus: Inspection Gauge, measuring guide and 100 hexagonal head shaped bolts
Formula:
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]$$
μ = average = shift parameter
σ = standard deviation = scale parameter
Experiment: The experiment was conducted using hexagonal bolt heads. The
quality characteristic measured here was the width across the flats. The experiment
was conducted to see whether or not the data follows a normal distribution.
Hypotheses:
H0 = The data follows a normal distribution
H1 = The data does not follow a normal distribution
Procedure:
1. The given test sample of 100 hexagonal bolts is taken from a group of bolts placed
in different boxes
2. Each bolt is drawn individually and with the marked face adjacent to the inner
vertical face of the bar, it is traversed along till it comes to a maximum traverse
position. It should be ensured that the bolt head always touches the bottom peak of
the gauge
3. The measuring guide is then traversed on the gauge, keeping the shorter arm on
the outside vertical face of the bar till it touches the bolt body.(the traverse
direction is opposite to that of the bolt)
4. Using this as a reference, the width across the flats is gauged by reading the class
in which it falls
5. The value is tabulated in its appropriate class as shown in the observation sheet.
A chart of calibration was used to determine the upper and lower specification of
the class interval
6. The steps are repeated for the remaining bolts
7. The mean and standard deviation of the sample data are calculated using the
formula.
The calculations are shown below
8. The standard normal variate Z = (x − μ)/σ is calculated, where x = the upper
specification limit of the class interval, μ = mean, and σ = standard deviation
From the normal tables, the probabilities corresponding to Z are obtained.
This is the cumulative probability
The individual probability for each class interval is determined
The expected frequency is given by the formula Pi × ΣOi
9. The $\chi^2$ test is reliable only when each expected frequency is larger
than 5. In order to guarantee this, Oi and ei are pooled suitably.
10. $\sum_i (O_i - e_i)^2 / e_i$ gives the chi square calculated value, $\chi^2_{cal}$
11. From the chi square table, the value $\chi^2_{tab}$ is noted for the specific degrees of freedom (DOF)
and significance level α
DOF = m - k - 1, m = no of class intervals after pooling, k = no of parameters
estimated
12. The data fits the normal distribution provided $\chi^2_{cal} < \chi^2_{tab}$
Observation sheet for Goodness of fit of Normal distribution
Class   Lower limit (XL)   Upper limit (XU)   Class midpoint (Xu)   Tally marks   Observed frequency Oi'
A       19.87              19.95              19.91                               0
B       19.95              20.03              19.99                               1
C       20.03              20.11              20.07                               3
D       20.11              20.19              20.15                               2
E       20.19              20.27              20.23                               11
F       20.27              20.35              20.31                               14
G       20.35              20.43              20.39                               23
H       20.43              20.51              20.47                               32
I       20.51              20.59              20.55                               18
J       20.59              20.67              20.63                               11
K       20.67              20.75              20.71                               1
L       20.75              20.83              20.79                               2
M       20.83              20.90              20.86                               1
N       20.90              20.98              20.94                               0
O       20.98              21.06              21.02                               0
Total                                                                             120
Table for computation of sample statistics
Mean = 20.468, sigma (σ) = 0.34287
Class   Range   Xu      Z = (x − μ)/σ   Pi        Cum Pi (ΣPi)   Oi'   ei = Pi × ΣOi'   Pooled ei   Pooled Oi   (Oi − ei)²/ei
A       0-1     19.91   -1.627          0.05182   0.05182        0     6.22             6.22        0           6.22
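The row for class A can be checked with a short Python sketch. It uses the mean, sigma and Xu values from the table above and replaces the normal table lookup with the error function; the helper name `normal_cdf` is only illustrative.

```python
import math


def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


mean, sigma = 20.468, 0.34287   # sample statistics from the table
total = 120                     # total observed frequency from the table
x_a = 19.91                     # Xu value listed for class A

z = (x_a - mean) / sigma        # standard normal variate
cum_p = normal_cdf(z)           # cumulative probability up to class A
e_a = cum_p * total             # expected frequency for class A

print(f"Z = {z:.3f}, cumulative P = {cum_p:.5f}, expected freq = {e_a:.2f}")
```

For the first class the individual probability equals the cumulative probability, which is why both columns show the same value in that row.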
$$\chi^2_{cal} = 98.53$$
$$\chi^2_{0.05,12} = 21.03$$
Specimen calculation:
$$\chi^2_{cal} = \sum_i \frac{(O_i - e_i)^2}{e_i} = 98.53$$
Inference:
From the above calculations and tabulations, it is seen that the chi square calculated value is greater than the chi square table value ($\chi^2_{cal} > \chi^2_{0.05,12}$, i.e. 98.53 > 21.03).
Therefore, the null hypothesis (the data follows a normal distribution) is rejected,
and we conclude that the data above does not follow a normal distribution.