
Data Preparation and Exploration File

Uploaded by Hanish Verma

GURU GOBIND SINGH INDRAPRASTHA UNIVERSITY, DWARKA, DELHI – 110064

UNIVERSITY SCHOOL OF MANAGEMENT STUDIES
INFORMATION TECHNOLOGY MANAGEMENT LAB FILE
SUBJECT CODE: MS 117
(2024-2026)

Submitted To: Mrs. Shitika
Submitted By: Prayas Samal (00916619824), MBA (Analytics)

INDEX
1. Lab 1 (Descriptive Statistics)
2. Lab 2 (PivotTable)
3. Lab 3 (Sports Data Analysis)
4. Lab 4 (Jamovi)
5. Lab 5 (Multiple Linear Regression)
6. Lab 6 (Handling Outliers)

LAB - 1 (Descriptive Statistics)
Ques: Perform Descriptive Statistics on the data you have collected.
Ans:
The data collected:

Select “Data Analysis” under the Data tab (available once the Analysis ToolPak add-in is enabled):

Select “Descriptive Statistics” and click OK:

Now select the input range and the output range, check the “Summary statistics” box and click OK:

We get the following table:
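For readers who want to check the ToolPak output outside Excel, the sketch below recomputes the rows of the Summary statistics table in plain Python. It is a minimal sketch, assuming the sample (n - 1) definitions that Excel uses for standard deviation and variance and Excel's SKEW and KURT formulas; the helper name `describe` and the sample numbers are hypothetical.

```python
import math
import statistics

def describe(values):
    """Recreate the rows of Excel's Descriptive Statistics table (hypothetical helper)."""
    n = len(values)
    if n < 4:
        raise ValueError("Excel's KURT needs at least 4 values")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation, like STDEV.S
    # Excel SKEW: n / ((n-1)(n-2)) * sum(((x - mean) / sd)^3)
    skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)
    # Excel KURT: sample-adjusted excess kurtosis
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
            * sum(((x - mean) / sd) ** 4 for x in values)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "Mean": mean,
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(values),
        "Mode": statistics.mode(values),  # first mode found, like MODE.SNGL
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(values),
        "Kurtosis": kurt,
        "Skewness": skew,
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Count": n,
    }

# Hypothetical sample, not the data collected in this lab
for name, value in describe([12, 15, 15, 18, 20, 22, 25]).items():
    print(f"{name:20} {value}")
```

One difference to keep in mind: Excel reports #N/A for Mode when no value repeats, whereas `statistics.mode` returns the first value in that case.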

Lab 2 (Pivot Table)
Ques:

Q1:

Ans:
Create a PivotTable and drag and drop the required fields to get the desired results:
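What a PivotTable does when a field goes into Rows, another into Columns and a numeric field into Values can be sketched as a grouped sum. This is a minimal illustration only; the field names `Region`, `Product` and `Sales` and the records are hypothetical, since the lab's questions and dataset are not reproduced here.

```python
from collections import defaultdict

def pivot_sum(records, row_field, col_field, value_field):
    """Sum value_field for every (row, column) pair, like a PivotTable's 'Sum of' values."""
    table = defaultdict(lambda: defaultdict(float))
    for rec in records:
        table[rec[row_field]][rec[col_field]] += rec[value_field]
    # Convert the nested defaultdicts into plain dicts for printing
    return {row: dict(cols) for row, cols in table.items()}

# Hypothetical records; field names are illustrative only
records = [
    {"Region": "EMEA", "Product": "A", "Sales": 100.0},
    {"Region": "EMEA", "Product": "B", "Sales": 50.0},
    {"Region": "APAC", "Product": "A", "Sales": 70.0},
    {"Region": "EMEA", "Product": "A", "Sales": 30.0},
]
print(pivot_sum(records, "Region", "Product", "Sales"))
```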

Q2:

Ans:

Q3:

Ans:

Q4:

Ans:

Q5:

Ans:

Q6:

Ans:

Q7:

Ans:

Q8:

Ans:
LATAM and NA were replaced by AMERICAS.
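The relabelling step can be sketched as a lookup applied before regrouping, so that LATAM and NA totals merge into a single AMERICAS row. The sketch assumes a `Region` field and hypothetical sales figures; it does not reproduce how the replacement was actually carried out in Excel.

```python
from collections import defaultdict

# Mapping taken from the answer above: LATAM and NA become AMERICAS
REGION_MAP = {"LATAM": "AMERICAS", "NA": "AMERICAS"}

def regroup(records, field="Region"):
    """Return copies of the records with the region labels replaced."""
    out = []
    for rec in records:
        new = dict(rec)  # copy so the original records stay unchanged
        new[field] = REGION_MAP.get(rec[field], rec[field])
        out.append(new)
    return out

def total_by(records, field, value_field):
    """Sum value_field per distinct value of field."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[field]] += rec[value_field]
    return dict(totals)

# Hypothetical records
records = [
    {"Region": "NA", "Sales": 120.0},
    {"Region": "LATAM", "Sales": 80.0},
    {"Region": "EMEA", "Sales": 90.0},
]
print(total_by(regroup(records), "Region", "Sales"))
```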

Q9:

Ans:
The given data was added to the end of our data sheet and all the pivot tables were updated (refreshed); the new data starts from number 2341:

Q10:

Ans:
After filtering the data via the PivotTable, go to the Insert tab and select Pie Chart in the Charts section:

Q11:

Ans:
To add a slicer, go to the Insert tab and select Slicer, then select the field (category) you want to filter by:

LAB 3 (Sports Data Analysis)
Given Data:

Q1:

Ans:
Using the following formula:

Result:

Q2:

Ans:
Right-click the column, select Format Cells, choose Custom and enter a three-digit format code (for example, "000"); format the date column in the same way:
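The same formatting can be mimicked in Python, which is useful when the IDs or dates must be exported as text. This is a sketch under assumptions: the three-digit zero padding corresponds to the custom code "000", and the DD-MM-YYYY date pattern is hypothetical because the actual date format used is not stated.

```python
from datetime import date

def pad_id(n, width=3):
    """Zero-pad an integer, like the Excel custom number format "000"."""
    return f"{n:0{width}d}"

def format_date(d, pattern="%d-%m-%Y"):
    """Render a date with an assumed DD-MM-YYYY pattern."""
    return d.strftime(pattern)

print(pad_id(7))                      # 007
print(format_date(date(2024, 8, 5)))  # 05-08-2024
```

Note the design difference: an Excel custom format changes only how the cell is displayed, while these functions return new strings.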

Q3:

Ans:
Open a new worksheet, select the Gender column and choose Remove Duplicates under the Data tab. Then copy the distinct values and paste them using the Transpose option. Repeat these steps for the Country column. Finally, use the following formula to count the candidates:
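The formula itself is not reproduced in the text; a COUNTIFS on the Country and Gender columns is assumed here. The sketch below builds the same country-by-gender count grid: the distinct values play the role of Remove Duplicates, and the gender list runs across the columns as in the transposed paste. The records and the function name `count_matrix` are hypothetical.

```python
from collections import Counter

def count_matrix(records):
    """Count candidates per (Country, Gender), like =COUNTIFS(country_col, c, gender_col, g)."""
    counts = Counter((r["Country"], r["Gender"]) for r in records)
    countries = sorted({r["Country"] for r in records})  # distinct values, like Remove Duplicates
    genders = sorted({r["Gender"] for r in records})     # laid out across the columns
    # Counter returns 0 for missing pairs, matching COUNTIFS on no matches
    return countries, genders, [[counts[(c, g)] for g in genders] for c in countries]

# Hypothetical records
records = [
    {"Country": "India", "Gender": "M"},
    {"Country": "India", "Gender": "F"},
    {"Country": "India", "Gender": "M"},
    {"Country": "Kenya", "Gender": "F"},
]
countries, genders, matrix = count_matrix(records)
print(genders)
for country, row in zip(countries, matrix):
    print(country, row)
```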

We then get the following result:

Q4:

Ans:
To create a PivotTable, go to the Insert tab, click PivotTable, select the data range and click OK. Then drag and drop the required fields into the Columns area. To change the report layout, right-click the PivotTable, select PivotTable Options and check “Classic PivotTable layout”. Also uncheck the “Expand and Collapse Buttons” option and click OK. Finally, drag the “Sport Location” field into the Filters area. After all these steps, the result looks like this:

Q5:

Ans:
To check this, we calculate the correlation between the two variables using the CORREL function, and we also plot a scatter plot by selecting the data and clicking Scatter under the Insert tab. The result is as follows:

From the above, we can see that there is a moderate positive relationship between the two variables.
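CORREL returns the Pearson correlation coefficient, which is the covariance of the two variables divided by the product of their standard deviations. A minimal Python version is sketched below; the input numbers are hypothetical, since the two sports variables compared in Q5 are not listed in the text.

```python
import math

def correl(xs, ys):
    """Pearson correlation coefficient, the quantity Excel's CORREL returns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-deviation
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = correl([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical data
print(round(r, 3))  # 0.775
```

A value near +1 means the points on the scatter plot lie close to an upward-sloping line; values in the middle of the range correspond to the looser upward cloud described above.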

Lab 4 (Jamovi)
Ques: Perform various functions of Jamovi on dummy data.
Ans:
Principal Component Analysis:
It is a statistical method used to reduce the dimensionality of a dataset while preserving the
most important patterns or relationships between the variables.

From the above results,


Uniqueness indicates the proportion of variance in each variable that is not explained by the
component.
In Bartlett’s Test of Sphericity, a significant result (p < .001) indicates that the correlation matrix is not an identity matrix, i.e. the variables are correlated enough for PCA to be appropriate.

In the KMO Measure of Sampling Adequacy, the overall MSA is 0.546, which is only marginally acceptable (values below 0.5 are generally considered unacceptable). The MSA for "Gender" is particularly low (0.315), suggesting that this variable might not be well suited for PCA.
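The quantities discussed above (component eigenvalues, uniqueness, Bartlett's statistic and the KMO measure) can be computed from the correlation matrix with NumPy. This is a sketch on hypothetical dummy data, not a reproduction of Jamovi's output: loadings here are unrotated and their signs are arbitrary, although uniqueness is unaffected by an orthogonal rotation such as varimax. The Bartlett p-value is left out because it needs a chi-square distribution function (for example from SciPy).

```python
import numpy as np

def pca_summary(data, n_components):
    """PCA on the correlation matrix: eigenvalues, loadings, uniqueness, Bartlett's statistic."""
    x = np.asarray(data, dtype=float)
    n, p = x.shape
    r = np.corrcoef(x, rowvar=False)             # p x p correlation matrix
    eigvals, eigvecs = np.linalg.eigh(r)         # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # reorder largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Loadings: eigenvectors scaled by the square root of their eigenvalues
    loadings = eigvecs[:, :n_components] * np.sqrt(eigvals[:n_components])
    # Uniqueness: share of each variable's variance NOT explained by the kept components
    uniqueness = 1.0 - (loadings ** 2).sum(axis=1)
    # Bartlett's test of sphericity: chi-square statistic and degrees of freedom
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(r))
    df = p * (p - 1) // 2
    return eigvals, loadings, uniqueness, chi2, df

def kmo(data):
    """Kaiser-Meyer-Olkin measure: overall MSA and per-variable MSA."""
    r = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    inv = np.linalg.inv(r)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)              # partial correlations (off-diagonal)
    off = ~np.eye(r.shape[0], dtype=bool)        # ignore the diagonal
    r2 = np.where(off, r ** 2, 0.0)
    p2 = np.where(off, partial ** 2, 0.0)
    per_variable = r2.sum(axis=1) / (r2.sum(axis=1) + p2.sum(axis=1))
    overall = r2.sum() / (r2.sum() + p2.sum())
    return overall, per_variable

# Hypothetical dummy data: two correlated variables and one unrelated variable
rng = np.random.default_rng(0)
base = rng.normal(size=200)
data = np.column_stack([
    base + 0.5 * rng.normal(size=200),
    base + 0.5 * rng.normal(size=200),
    rng.normal(size=200),
])
eigvals, loadings, uniqueness, chi2, df = pca_summary(data, n_components=1)
overall_msa, msa = kmo(data)
print(eigvals.round(3), uniqueness.round(3), round(chi2, 1), df)
print(round(overall_msa, 3), msa.round(3))
```

A variable that is unrelated to the others, like the third column here, ends up with a high uniqueness and a low per-variable MSA, which is the same pattern the text describes for "Gender".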
Exploratory Factor Analysis:
It is a statistical method used to uncover the underlying structure of a relatively large set of
variables. It helps identify groups of variables that are highly correlated with each other,
suggesting that they might be measuring the same underlying construct or factor.

From the above results,


We can see that Age Group loads strongly on Factor 1, whereas Education loads strongly on Factor 2.
Uniqueness indicates the proportion of variance in each variable that is not explained by the factors.
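Reading an EFA loadings table comes down to two steps: assign each variable to the factor with the largest absolute loading, and compute uniqueness as one minus the sum of squared loadings. The sketch below does both. The loadings are hypothetical, not the values from the Jamovi output, and the uniqueness formula assumes orthogonal (uncorrelated) factors; with an oblique rotation such as oblimin, the factor correlations also enter the communality.

```python
def interpret_loadings(loadings, names):
    """For each variable: the factor with the largest absolute loading, and its uniqueness."""
    result = {}
    for name, row in zip(names, loadings):
        best = max(range(len(row)), key=lambda j: abs(row[j]))
        communality = sum(l * l for l in row)  # variance explained by all factors (orthogonal case)
        result[name] = (f"Factor {best + 1}", 1 - communality)
    return result

# Hypothetical two-factor loadings, NOT the values from the lab output
names = ["Age Group", "Education", "Gender"]
loadings = [[0.85, 0.10], [0.05, 0.80], [0.20, 0.15]]
for name, (factor, uniqueness) in interpret_loadings(loadings, names).items():
    print(f"{name:10} {factor}  uniqueness = {uniqueness:.3f}")
```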

Confirmatory Factor Analysis:
Confirmatory Factor Analysis (CFA) is a statistical technique used to test a hypothesized
factor structure of a set of observed variables. Unlike Exploratory Factor Analysis (EFA),
which is used to discover the underlying structure of a set of variables, CFA is used to test a
specific, pre-defined structure.

From the above results,


The standard error (SE) measures the sampling variability of an estimate.
The high z-scores (14.1 for all the factor loadings, where z = estimate / SE) indicate that these relationships are statistically significant.
p < .001 means there is very strong evidence for the relationship between each variable and the factor.
A df (degrees of freedom) of 0 indicates a saturated (just-identified) model.
A CFI value of 1 and a TLI value of 1 indicate a perfect fit: the model exactly reproduces the observed covariance matrix. Because the model is saturated (df = 0), however, this perfect fit holds by construction, so these indices cannot be used to judge how well the model fits.
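The numbers in the CFA output follow from simple formulas: each z is the estimate divided by its SE, the two-sided p-value comes from the standard normal distribution, and df is the number of observed variances and covariances minus the number of free parameters. The sketch below illustrates these formulas. The one-factor parameter count (factor variance fixed to 1) is an assumption chosen to show one way a df of 0 arises; the number of indicators in the lab model is not stated.

```python
import math

def z_score(estimate, se):
    """Wald z statistic: estimate divided by its standard error."""
    return estimate / se

def two_sided_p(z):
    """Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))

def one_factor_df(p):
    """df of a one-factor CFA with p indicators and the factor variance fixed to 1 (assumed setup).

    Observed moments: p(p+1)/2 variances and covariances.
    Free parameters: p loadings + p residual variances.
    """
    return p * (p + 1) // 2 - 2 * p

print(two_sided_p(14.1) < 0.001)  # True
print(one_factor_df(3))           # 0 -> saturated model
print(one_factor_df(4))           # 2
```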

Lab 5 (Multiple Linear Regression)
Ques:

Ans:
To make the prediction, we select the Regression option under Data Analysis in the Data tab. We then enter insurance cost as the dependent variable (Input Y Range) and BMI and age as the independent variables (Input X Range), and click OK. This produces the following result:

From the above result, we can conclude that the regression model suggests a moderate positive relationship between age, BMI and insurance costs. However, the model explains only a small portion of the variation in insurance costs (as reflected in its R² value). While age and BMI are positively associated with insurance costs, this does not necessarily mean that higher age and BMI directly cause higher insurance costs. Other factors, such as health conditions, lifestyle and demographics, could also influence insurance costs.
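The Regression tool fits an ordinary least-squares model, cost = b0 + b1*age + b2*BMI, and reports R Square and Adjusted R Square among its statistics. The sketch below reproduces those quantities with NumPy. The data are hypothetical and built to fit exactly, so the recovered coefficients are easy to check; real insurance data would give a much lower R², as the conclusion above notes.

```python
import numpy as np

def fit_ols(age, bmi, cost):
    """Least-squares fit of cost = b0 + b1*age + b2*bmi, with R^2 and adjusted R^2."""
    age, bmi, y = (np.asarray(a, dtype=float) for a in (age, bmi, cost))
    X = np.column_stack([np.ones_like(age), age, bmi])  # intercept column + 2 predictors
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    ss_res = float(resid @ resid)                 # unexplained variation
    ss_tot = float(((y - y.mean()) ** 2).sum())   # total variation
    r2 = 1 - ss_res / ss_tot
    n, k = len(y), X.shape[1] - 1                 # observations, predictors
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return coef, r2, adj_r2

# Hypothetical data generated as cost = 100 + 5*age + 10*bmi (exact fit)
age = [20, 30, 40, 50, 60]
bmi = [22, 25, 28, 24, 30]
cost = [100 + 5 * a + 10 * b for a, b in zip(age, bmi)]
coef, r2, adj_r2 = fit_ols(age, bmi, cost)
prediction = float(coef @ [1, 45, 27])  # predicted cost for age 45, BMI 27
print(coef.round(3), round(r2, 3), round(prediction, 2))
```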

Lab 6 (How to Treat Outliers)
Ques: Showcase the different ways to identify outliers in Excel.

Ans:
By Sorting the Data
First, select the column you want to sort, then open the “Sort & Filter” menu on the Home tab. Select Custom Sort, sort by Sales and choose the order Largest to Smallest. This produces the following result, with any extreme values appearing at the top or bottom of the column:

By Using Quartile Functions


In this method, we calculate the first and third quartiles (QTL1 and QTL3) using the following formula:

Next, we calculate the IQR as QTL3 - QTL1, and then the Upper Limit as QTL3 + 1.5*IQR and the Lower Limit as QTL1 - 1.5*IQR. Finally, we check whether each value lies above the upper limit or below the lower limit; any such values are our outliers:
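The quartile method can be sketched in a few lines of Python. `statistics.quantiles` with `method="inclusive"` uses the same linear interpolation as Excel's QUARTILE / QUARTILE.INC, so the limits should match the worksheet. The sales figures and the helper name `iqr_outliers` are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # "inclusive" interpolates like Excel's QUARTILE.INC
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return q1, q3, lower, upper, outliers

sales = [10, 12, 12, 13, 12, 11, 14, 13, 15, 10, 10, 100]  # hypothetical data
print(iqr_outliers(sales))  # (10.75, 13.25, 7.0, 17.0, [100])
```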

By Using the LARGE and SMALL Functions
We simply apply the following formulas:

We are then able to identify our outliers:
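The exact formulas are not reproduced in the text; LARGE(range, k) and SMALL(range, k) for k = 1, 2, 3 are assumed here. Listing the few largest and smallest values side by side makes an extreme value stand out, for example when the largest value is far from the second largest. The sales figures are hypothetical.

```python
def large(values, k):
    """k-th largest value, counting duplicates, like Excel's LARGE(range, k)."""
    if not 1 <= k <= len(values):
        raise ValueError("k out of range (Excel returns #NUM!)")
    return sorted(values, reverse=True)[k - 1]

def small(values, k):
    """k-th smallest value, counting duplicates, like Excel's SMALL(range, k)."""
    if not 1 <= k <= len(values):
        raise ValueError("k out of range (Excel returns #NUM!)")
    return sorted(values)[k - 1]

sales = [10, 12, 12, 13, 12, 11, 14, 13, 15, 10, 10, 100]  # hypothetical data
top3 = [large(sales, k) for k in (1, 2, 3)]     # [100, 15, 14]
bottom3 = [small(sales, k) for k in (1, 2, 3)]  # [10, 10, 10]
print(top3, bottom3)
```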

To handle these outliers, we can remove them from the data, normalize (transform) the data so that the outliers can still be used, or replace them with a specified value, such as the maximum or minimum non-outlier value.
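Two of these options, removal and replacement with the nearest non-outlier value, are sketched below. The limits 7.0 and 17.0 are hypothetical values of the kind the quartile method produces; in practice they would come from the Lower Limit and Upper Limit cells.

```python
def remove_outliers(values, lower, upper):
    """Drop values outside [lower, upper]."""
    return [v for v in values if lower <= v <= upper]

def cap_outliers(values, lower, upper):
    """Replace outliers with the maximum/minimum non-outlier value."""
    inliers = remove_outliers(values, lower, upper)
    lo_rep, hi_rep = min(inliers), max(inliers)
    return [hi_rep if v > upper else lo_rep if v < lower else v for v in values]

sales = [10, 12, 12, 13, 12, 11, 14, 13, 15, 10, 10, 100]  # hypothetical data
print(remove_outliers(sales, 7.0, 17.0))  # the 100 is dropped
print(cap_outliers(sales, 7.0, 17.0))     # the 100 becomes 15
```

Replacing rather than removing keeps the number of observations unchanged, which matters when the rows also carry other fields.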
