Data Preparation and Exploration File
INDEX
S.No Particulars
1. Lab 1 (Descriptive Statistics)
2. Lab 2 (PivotTable)
3. Lab 3 (Sports Data Analysis)
4. Lab 4 (Jamovi)
5. Lab 5 (Multiple Linear Regression)
6. Lab 6 (Handling Outliers)
LAB - 1 (Descriptive Statistics)
Ques: Perform Descriptive Statistics on the data you have collected.
Ans:
The Data collected,
Now, select the input range and output range, then check the “Summary Statistics” box and
click OK,
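The same kind of summary can be cross-checked outside Excel. The following is a minimal Python sketch, assuming the collected data is a single numeric column; the `sample` values and the `summary_statistics` helper are hypothetical and only cover some of the measures that the Summary Statistics output reports.

```python
import statistics


def summary_statistics(values):
    """Return a few of the measures listed in Excel's Summary Statistics output."""
    n = len(values)
    stdev = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return {
        "Mean": statistics.mean(values),
        "Standard Error": stdev / n ** 0.5,
        "Median": statistics.median(values),
        "Standard Deviation": stdev,
        "Sample Variance": statistics.variance(values),
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Count": n,
    }


# Hypothetical sample; replace with the collected data column.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
stats = summary_statistics(sample)
for name, value in stats.items():
    print(f"{name}: {value}")
```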
Lab 2 (Pivot Table)
Ques:
Q1:
Ans:
Create a Pivot table and Drag & Drop the required fields to get the desired results,
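What the drag and drop step does can be sketched in Python: one field goes to the rows, one to the columns, and a numeric field is summed in each cell. The field names (`Region`, `Category`, `Sales`) and the `orders` rows below are assumptions made for illustration, not the lab's data.

```python
from collections import defaultdict


def pivot_sum(rows, row_field, column_field, value_field):
    """Sum value_field for every (row_field, column_field) combination."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[row_field]][row[column_field]] += row[value_field]
    # Convert to plain dicts so the result prints and compares cleanly.
    return {key: dict(columns) for key, columns in table.items()}


# Hypothetical data sheet rows.
orders = [
    {"Region": "EMEA", "Category": "Hardware", "Sales": 120.0},
    {"Region": "EMEA", "Category": "Software", "Sales": 80.0},
    {"Region": "APAC", "Category": "Hardware", "Sales": 200.0},
    {"Region": "EMEA", "Category": "Hardware", "Sales": 30.0},
]

pivot = pivot_sum(orders, "Region", "Category", "Sales")
for region, columns in pivot.items():
    print(region, columns)
```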
Q2:
Ans:
Q3:
Ans:
Q4:
Ans:
Q5:
Ans:
Q6:
Ans:
Q7:
Ans:
Q8:
Ans:
Replaced LATAM and NA with AMERICAS.
Q9:
Ans:
The given data has been added to the end of our data sheet and all the pivot tables have been
updated. The updated data starts from number 2341,
Q10:
Ans:
After filtering the data with the pivot table, go to the “Insert” tab and select
Pie Chart, which is in the Charts section,
Q11:
Ans:
To add a slicer, go to the Insert tab and select Slicer, then select the category
you want to slice by.
LAB 3 (Sports Data Analysis)
Given Data:
Q1:
Ans:
By using formula,
Result,
Q2:
Ans:
Right-click the column to be formatted, select Custom, then enter a 3-digit format. Do the
same for the date column,
Q3:
Ans:
Open a new worksheet, select the Gender column then select Remove Duplicates under the
Data tab. After that copy the distinct values and paste the values using the transpose option.
Do these steps with the Country column too. After that use the following formula to count the
candidates,
Then we will get the following result,
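The counting step can also be sketched in Python. The formula used in the worksheet is not reproduced in this file, so the sketch below only assumes that it counts candidates for each Gender and Country pair; the `candidates` list is hypothetical.

```python
from collections import Counter

# Hypothetical (gender, country) pairs, one per candidate.
candidates = [
    ("Female", "India"),
    ("Male", "India"),
    ("Female", "USA"),
    ("Female", "India"),
    ("Male", "Kenya"),
    ("Male", "USA"),
]

# Distinct values, like the result of Remove Duplicates on each column.
genders = sorted({gender for gender, _ in candidates})
countries = sorted({country for _, country in candidates})

# One count per (gender, country) pair; pairs that never occur count as 0.
counts = Counter(candidates)

print("Country", *genders)
for country in countries:
    print(country, *[counts[(gender, country)] for gender in genders])
```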
Q4:
Ans:
To create a pivot table, go to the Insert tab, click PivotTable, select the range of
the data and click OK. After that, drag & drop the required fields into the Columns area. To change
the report layout, right-click the PivotTable, select PivotTable Options and check “Classic
PivotTable layout”. Also uncheck “Expand and Collapse Buttons” and click OK. Finally,
drag the “Sport Location” field into the Filter area. After all these steps
the result will look like this,
Q5:
Ans:
To check, we will calculate the correlation between the two variables using the CORREL
function and also plot a scatter plot by selecting the data and clicking Scatter under the
Insert tab. The result will be like this,
From the above we can see there is a moderate positive relationship between the two.
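For reference, CORREL returns the Pearson correlation coefficient. A minimal Python sketch of that calculation is given here; the `pearson_r` helper and the `x` and `y` values are hypothetical and are not the lab's data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long numeric lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-deviations and sums of squared deviations.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    squares_x = sum((a - mean_x) ** 2 for a in x)
    squares_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(squares_x * squares_y)


# Hypothetical paired observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(x, y)
print(f"r = {r:.3f}")
```

A value near +1 or -1 indicates a strong linear relationship, and a value near 0 indicates a weak one.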
Lab 4 (Jamovi)
Ques: Perform various functions of Jamovi on a dummy data
Ans:
Principal Component Analysis:
It is a statistical method used to reduce the dimensionality of a dataset while preserving the
most important patterns or relationships between the variables.
In the KMO Measure of Sampling Adequacy, the overall MSA is 0.546, which is only marginally
acceptable (values below 0.5 are generally considered unacceptable). In addition, the MSA for
"Gender" is low (0.315), suggesting that this variable might not be well-suited for PCA.
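To show what reducing dimensionality means, the sketch below finds only the first principal component of a small dataset using power iteration on the covariance matrix. This is an illustration under stated assumptions, not what Jamovi does internally: Jamovi's output may be based on the correlation matrix and may apply a rotation, and the `measurements` data and helper names here are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of rows (one observation per row)."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_principal_component(rows, iterations=200):
    """Return (unit direction, share of total variance) of the first component."""
    cov = covariance_matrix(rows)
    d = len(cov)
    vector = [1.0 / math.sqrt(d)] * d  # arbitrary starting direction
    for _ in range(iterations):
        product = [sum(cov[i][j] * vector[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(p * p for p in product))
        if norm == 0.0:
            raise ValueError("starting vector is orthogonal to the data variance")
        vector = [p / norm for p in product]
    # Rayleigh quotient gives the variance along the found direction.
    eigenvalue = sum(
        vector[i] * sum(cov[i][j] * vector[j] for j in range(d)) for i in range(d)
    )
    total_variance = sum(cov[i][i] for i in range(d))
    return vector, eigenvalue / total_variance


# Hypothetical two-variable dataset.
measurements = [
    (2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
    (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9),
]
direction, explained = first_principal_component(measurements)
print("direction:", direction)
print("share of variance explained:", explained)
```

If one component explains most of the variance, the two variables can be summarized by a single score with little loss of information.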
Exploratory Factor Analysis:
It is a statistical method used to uncover the underlying structure of a relatively large set of
variables. It helps identify groups of variables that are highly correlated with each other,
suggesting that they might be measuring the same underlying construct or factor.
Confirmatory Factor Analysis:
Confirmatory Factor Analysis (CFA) is a statistical technique used to test a hypothesized
factor structure of a set of observed variables. Unlike Exploratory Factor Analysis (EFA),
which is used to discover the underlying structure of a set of variables, CFA is used to test a
specific, pre-defined structure.
Lab 5 (Data Analysis)
Ques:
Ans:
To predict, we will select the Regression option by clicking Data Analysis in the Data tab.
After that, we will input insurance cost as our dependent variable (Input Y Range) and BMI & age
as the independent variables (Input X Range) and click OK. It will produce the following result,
From the above result, we can conclude that the regression model suggests a moderate
positive relationship between age, BMI, and insurance costs. However, the model explains
only a small portion of the variation in insurance costs. While there is a positive correlation
between age, BMI, and insurance costs, this does not necessarily mean that higher age and BMI
directly cause higher insurance costs. Other factors, such as health conditions, lifestyle, and
demographics, could also influence insurance costs.
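The calculation behind the Regression tool can be sketched as ordinary least squares with an intercept, solved here through the normal equations. The `fit_linear_regression` helper and the `ages`, `bmis` and `costs` values are hypothetical; the costs are constructed to fit a line exactly, so the R squared of this toy example is 1 and says nothing about the lab's result.

```python
def fit_linear_regression(features, target):
    """Ordinary least squares with intercept; returns (coefficients, r_squared)."""
    rows = [[1.0] + list(row) for row in features]  # leading 1.0 is the intercept term
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, target))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("independent variables are collinear")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    coefficients = [a[i][k] / a[i][i] for i in range(k)]
    predictions = [sum(c * v for c, v in zip(coefficients, r)) for r in rows]
    mean_y = sum(target) / len(target)
    ss_res = sum((t - p) ** 2 for t, p in zip(target, predictions))
    ss_tot = sum((t - mean_y) ** 2 for t in target)
    return coefficients, 1.0 - ss_res / ss_tot


# Hypothetical data: cost = 100 + 50 * age + 20 * BMI, with no noise.
ages = [20.0, 30.0, 40.0, 50.0, 60.0]
bmis = [22.0, 30.0, 25.0, 35.0, 28.0]
costs = [100.0 + 50.0 * age + 20.0 * bmi for age, bmi in zip(ages, bmis)]

coefficients, r_squared = fit_linear_regression(list(zip(ages, bmis)), costs)
print("intercept, age, BMI coefficients:", coefficients)
print("R squared:", r_squared)
```

With real data, a low R squared alongside positive coefficients corresponds to the situation described above: the predictors are related to the cost but leave most of its variation unexplained.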
Lab 6 (How to Treat Outliers)
Ques: Showcase the different ways to identify outliers in Excel.
Ans:
By Sorting the Data
First, select the column you want to sort then hover over “Sort & Filter” option under Home
tab. Select Custom Sort, sort by Sales and order Largest to Smallest. It will produce the
following result,
Now we will calculate the IQR as QTL3-QTL1 and after that calculate the Upper Limit and Lower
Limit as QTL3+1.5*IQR and QTL1-1.5*IQR. Then we check whether any value lies above the
Upper Limit or below the Lower Limit. If yes, then those values are our outliers,
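The IQR rule can be sketched in Python as follows. The `sales` values are hypothetical, and the `inclusive` quantile method is used on the assumption that it matches the interpolation of Excel's QUARTILE function.

```python
import statistics


def iqr_limits(values):
    """Return (lower_limit, upper_limit) using the 1.5 * IQR rule."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Hypothetical Sales column.
sales = [10, 12, 13, 14, 15, 16, 18, 19, 95]

lower_limit, upper_limit = iqr_limits(sales)
outliers = [value for value in sales if value < lower_limit or value > upper_limit]

print("Lower Limit:", lower_limit)
print("Upper Limit:", upper_limit)
print("Outliers:", outliers)
```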
By using LARGE and SMALL functions,
We will simply apply the following formulas,
To handle these outliers, we can remove them from the data, normalize the data so that the
outliers can still be used, or replace the outliers with a specified value, such as the maximum or
minimum non-outlier value.
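Two of these options, removing the outliers and replacing them with the nearest non-outlier value, can be sketched as below. The `sales` values and the two limits are hypothetical, and the helper names are made up for this example.

```python
def remove_outliers(values, lower_limit, upper_limit):
    """Keep only the values that lie within the two limits."""
    return [v for v in values if lower_limit <= v <= upper_limit]


def cap_outliers(values, lower_limit, upper_limit):
    """Replace each outlier with the minimum or maximum non-outlier value."""
    kept = remove_outliers(values, lower_limit, upper_limit)
    if not kept:
        raise ValueError("no values lie within the limits")
    low_cap, high_cap = min(kept), max(kept)
    return [min(max(v, low_cap), high_cap) for v in values]


# Hypothetical Sales column and limits.
sales = [10, 12, 13, 14, 15, 16, 18, 19, 95]
lower_limit, upper_limit = 5.5, 25.5

removed = remove_outliers(sales, lower_limit, upper_limit)
capped = cap_outliers(sales, lower_limit, upper_limit)
print("After removal:", removed)
print("After capping:", capped)
```

Removing shortens the column, while capping keeps the row count unchanged, which matters when other columns refer to the same rows.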