CCW331 Set4
(Regulations 2021)
Time: 3 Hours          Answer any one Question          Max. Marks: 100

Aim/Procedure   Tabulation   Calculation & Results   Viva-Voce   Record   Total
20              30           30                      10          10       100
1. You are given a set of data from Holmen Shipping, which runs tours of the Stockholm
archipelago in Sweden during the summer months. At each port it visits, the company picks up
passengers and records how many passengers get on. Summarize the total number of people who
get on the boat for each port and for each month. Find the averages, maximum values, minimum
values, square roots, and rounded values.
2. In a class of 10 students, a medical check-up took place in which the students were weighed
and the following data was captured. Calculate the mean, median, mode, standard deviation,
variance, skewness, and kurtosis of the data set based on the given information.
Students Weight
1 30
2 28
3 31
4 33
5 32
6 30
7 29
8 32
9 27
10 29
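The descriptive statistics asked for above can be sketched in Python with the standard library. The sample (n - 1) form of variance and the moment-based (population) forms of skewness and excess kurtosis are assumptions; Excel's SKEW and KURT apply sample-size corrections, so their values will differ slightly.

```python
import statistics as st

# Weights of the 10 students, copied from the table above.
weights = [30, 28, 31, 33, 32, 30, 29, 32, 27, 29]

n = len(weights)
mean = st.mean(weights)
median = st.median(weights)
modes = st.multimode(weights)        # every value tied for most frequent
sample_var = st.variance(weights)    # divides by n - 1 (assumption)
sample_sd = st.stdev(weights)

# Central moments (dividing by n) for skewness and excess kurtosis.
m2 = sum((x - mean) ** 2 for x in weights) / n
m3 = sum((x - mean) ** 3 for x in weights) / n
m4 = sum((x - mean) ** 4 for x in weights) / n
skewness = m3 / m2 ** 1.5
excess_kurtosis = m4 / m2 ** 2 - 3

print("mean:", mean, "median:", median, "modes:", sorted(modes))
print("variance:", round(sample_var, 4), "std dev:", round(sample_sd, 4))
print("skewness:", round(skewness, 4), "excess kurtosis:", round(excess_kurtosis, 4))
```

Three weights (29, 30, and 32) each occur twice, so the data set has more than one mode; `multimode` reports all of them.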
3. You have a dataset representing the number of customer complaints received by a company each
day for a month:
4, 2, 6, 5, 3, 4, 2, 5, 4, 3, 7, 4, 5, 2, 3, 4, 5, 6, 5, 4
a. Calculate the mean, median, and mode of the number of complaints.
b. Calculate the standard deviation and variance.
c. Determine the skewness and kurtosis of the distribution.
5. Twenty customers are working in a shop: 10 are male and 10 are female. The ages of the males and
females are given below. Null hypothesis: there is no significant difference between the mean ages of
males and females. Alternative hypothesis: there is a significant difference between the mean ages of
males and females. Perform a Z-test to check whether the null hypothesis is accepted or rejected.
6. A pharmaceutical company is testing the effectiveness of four different drug formulations to treat a
specific condition. The data shows the improvement scores of patients after treatment with each
formulation:
Formulation 1: [12, 15, 14, 13, 16]
Formulation 2: [18, 17, 20, 19, 15]
Formulation 3: [10, 8, 12, 11, 9]
Formulation 4: [22, 23, 21, 24, 20]
Perform a one-way ANOVA to determine if there are significant differences in the mean improvement
scores among the four formulations.
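A minimal Python sketch of the one-way ANOVA computation for the four formulations, using only built-in functions. The 5% significance level and the tabulated critical value F(3, 16) of about 3.24 are assumptions; the question does not state a level.

```python
# Improvement scores per formulation, copied from the question.
groups = {
    "Formulation 1": [12, 15, 14, 13, 16],
    "Formulation 2": [18, 17, 20, 19, 15],
    "Formulation 3": [10, 8, 12, 11, 9],
    "Formulation 4": [22, 23, 21, 24, 20],
}

all_values = [x for g in groups.values() for x in g]
grand_mean = sum(all_values) / len(all_values)

# Between-group and within-group sums of squares.
ss_between = sum(
    len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
)
ss_within = sum(
    (x - sum(g) / len(g)) ** 2 for g in groups.values() for x in g
)

df_between = len(groups) - 1               # k - 1 = 3
df_within = len(all_values) - len(groups)  # N - k = 16

f_stat = (ss_between / df_between) / (ss_within / df_within)

F_CRIT = 3.24  # assumed: F table value for alpha = 0.05, df = (3, 16)
reject_null = f_stat > F_CRIT

print(f"SSB={ss_between:.2f}  SSW={ss_within:.2f}  F={f_stat:.2f}")
print("Reject H0 (means differ):", reject_null)
```

In Excel, the same table can be produced with the Data Analysis ToolPak option "Anova: Single Factor".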
7. A gardening enthusiast is studying the growth of three different plant species (A, B, C) under two
different light conditions (Full Sunlight and Partial Shade). The data represents the plant heights (in
centimetres) after six weeks of growth:
Full Sunlight:
Plant A: [40, 42, 38, 41, 39]
Plant B: [48, 50, 47, 49, 45]
Plant C: [35, 37, 36, 34, 38]
Partial Shade:
Plant A: [30, 32, 28, 31, 29]
Plant B: [38, 40, 36, 39, 37]
Plant C: [25, 27, 26, 24, 28]
Conduct a two-way ANOVA to assess the effects of plant species and light conditions on plant growth.
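The two-way ANOVA with replication can be sketched in Python as follows. This is a hand computation for a balanced design (5 plants per cell), not a library routine; the variable names are illustrative.

```python
from statistics import mean

# Plant heights (cm) per (light condition, species), copied from the question.
heights = {
    ("Full Sunlight", "A"): [40, 42, 38, 41, 39],
    ("Full Sunlight", "B"): [48, 50, 47, 49, 45],
    ("Full Sunlight", "C"): [35, 37, 36, 34, 38],
    ("Partial Shade", "A"): [30, 32, 28, 31, 29],
    ("Partial Shade", "B"): [38, 40, 36, 39, 37],
    ("Partial Shade", "C"): [25, 27, 26, 24, 28],
}
lights = sorted({light for light, _ in heights})
species = sorted({sp for _, sp in heights})
reps = 5  # observations per cell (balanced design)

grand = mean(x for cell in heights.values() for x in cell)
light_mean = {
    l: mean(x for (l2, _), cell in heights.items() if l2 == l for x in cell)
    for l in lights
}
species_mean = {
    s: mean(x for (_, s2), cell in heights.items() if s2 == s for x in cell)
    for s in species
}

# Sums of squares for each source of variation.
ss_light = sum(len(species) * reps * (light_mean[l] - grand) ** 2 for l in lights)
ss_species = sum(len(lights) * reps * (species_mean[s] - grand) ** 2 for s in species)
ss_cells = sum(reps * (mean(cell) - grand) ** 2 for cell in heights.values())
ss_interaction = ss_cells - ss_light - ss_species
ss_error = sum((x - mean(cell)) ** 2 for cell in heights.values() for x in cell)

df_light = len(lights) - 1
df_species = len(species) - 1
df_interaction = df_light * df_species
df_error = len(heights) * (reps - 1)

ms_error = ss_error / df_error
f_light = (ss_light / df_light) / ms_error
f_species = (ss_species / df_species) / ms_error
f_interaction = (ss_interaction / df_interaction) / ms_error

print(f"F(light)={f_light:.2f}  F(species)={f_species:.2f}  "
      f"F(interaction)={f_interaction:.4f}")
```

Each F value is then compared with the tabulated critical value for its degrees of freedom at the chosen significance level.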
8. A company wants to test if there is a significant difference in the average productivity between two
different departments. They collect data on the number of tasks completed in a day for each department:
Department A: [24, 27, 30, 22, 26]
Department B: [31, 29, 34, 28, 32]
Perform an independent samples t-test to determine if there's a significant difference in productivity
between the two departments.
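A minimal Python sketch of the independent samples t-test for the two departments. It assumes equal variances (the pooled Student's t-test), a two-tailed test at the 5% level, and the tabulated critical value t(8) of about 2.306; none of these choices is stated in the question.

```python
from math import sqrt
from statistics import mean, variance

# Tasks completed per day, copied from the question.
dept_a = [24, 27, 30, 22, 26]
dept_b = [31, 29, 34, 28, 32]

n_a, n_b = len(dept_a), len(dept_b)
df = n_a + n_b - 2

# Pooled variance (assumes both departments share one variance).
pooled_var = (
    (n_a - 1) * variance(dept_a) + (n_b - 1) * variance(dept_b)
) / df
std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))

t_stat = (mean(dept_a) - mean(dept_b)) / std_err

T_CRIT = 2.306  # assumed: two-tailed t table value for alpha = 0.05, df = 8
reject_null = abs(t_stat) > T_CRIT

print(f"t = {t_stat:.4f}, df = {df}, reject H0: {reject_null}")
```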
9. You have a dataset in Excel containing information about customer orders. However, some of the "Order
Amount" values are missing (blank cells). Your task is to handle the missing data appropriately.
Load the dataset into Excel and identify rows with missing "Order Amount" values.
Decide on a strategy to handle the missing data (e.g., removing rows, filling with the mean, or
interpolating values).
Apply your chosen strategy and provide the modified dataset.
10. You have a dataset of exam scores for two subjects, "Math" and "English." The scores are on different
scales, making comparisons difficult. You want to normalize the scores for each subject to a common
scale between 0 and 1.
Load the dataset into Excel and calculate the mean and standard deviation for both "Math" and
"English" scores.
Using Excel formulas, normalize the scores for each subject to a scale between 0 and 1.
Provide the normalized dataset for both subjects.
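Scaling to the range 0 to 1 is usually done with min-max normalization, (x - min) / (max - min). A short Python sketch follows; the scores are hypothetical because the question supplies no data, and `min_max` is an illustrative helper name.

```python
from statistics import mean, stdev

# Hypothetical scores on two different scales (not from the question).
scores = {
    "Math": [45, 60, 75, 90, 100],
    "English": [12, 15, 18, 20, 25],
}

def min_max(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]  # assumes hi != lo

for subject, vals in scores.items():
    print(subject, "mean:", mean(vals), "std dev:", round(stdev(vals), 3))

normalized = {subject: min_max(vals) for subject, vals in scores.items()}
print(normalized)
```

An equivalent Excel formula, assuming the scores sit in A2:A6, is `=(A2-MIN(A$2:A$6))/(MAX(A$2:A$6)-MIN(A$2:A$6))`, filled down the column.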
11. Given a dataset with three features, follow these steps to perform PCA:
Create a dataset with the following data points:
Feature 1: [2, 3, 5, 7, 8]
Feature 2: [5, 4, 6, 8, 7]
Feature 3: [1, 2, 1, 3, 2]
1. Standardize the data (mean = 0, standard deviation = 1).
2. Calculate the covariance matrix.
3. Find the eigenvalues and eigenvectors of the covariance matrix.
4. Sort the eigenvalues in descending order and select the corresponding eigenvectors.
5. Transform the data into the new feature space using the selected eigenvectors.
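The five PCA steps above can be sketched with NumPy. The use of the sample standard deviation (ddof=1) and the choice to keep two components are assumptions.

```python
import numpy as np

# Rows are the five data points; columns are Feature 1, 2, 3 from the question.
X = np.array([
    [2, 5, 1],
    [3, 4, 2],
    [5, 6, 1],
    [7, 8, 3],
    [8, 7, 2],
], dtype=float)

# 1. Standardize each feature to mean 0 and standard deviation 1.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# 2. Covariance matrix of the standardized data.
cov = np.cov(Z, rowvar=False)

# 3. Eigenvalues and eigenvectors (eigh is meant for symmetric matrices).
eigvals, eigvecs = np.linalg.eigh(cov)

# 4. Sort in descending order of eigenvalue.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 5. Project the data onto the first k eigenvectors.
k = 2  # assumed number of components to keep
scores = Z @ eigvecs[:, :k]

explained = eigvals / eigvals.sum()
print("eigenvalues:", eigvals)
print("explained variance ratio:", explained)
print("transformed data:\n", scores)
```

The sign of each eigenvector is arbitrary, so the transformed coordinates may be mirrored compared with another tool's output.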
12. Perform KPCA on a simple dataset using a radial basis function (RBF) kernel:
1. Create a dataset with two features, "X" and "Y," in Excel:
X Y
2.5 2.4
3.5 3.1
4.5 3.6
5.0 5.0
5.5 6.0
2. Choose a suitable value for γ in the RBF kernel formula. You may experiment with different
values to see their effects.
3. Calculate the kernel matrix for the dataset. The kernel matrix should be an NxN matrix (N =
number of data points), where each element K(i, j) represents the kernel value between data
points i and j.
4. Perform KPCA on the kernel matrix to obtain the principal components.
5. Visualize the data in the new feature space created by KPCA. You can create a scatter plot with
the first two principal components as axes.
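A minimal NumPy sketch of the KPCA steps above. It assumes the common RBF form K(i, j) = exp(-γ ||x_i - x_j||²) and γ = 0.5; both are choices made for illustration, since the question leaves γ open.

```python
import numpy as np

# The five (X, Y) points from the question.
X = np.array([
    [2.5, 2.4],
    [3.5, 3.1],
    [4.5, 3.6],
    [5.0, 5.0],
    [5.5, 6.0],
])

gamma = 0.5  # assumed value; try others to see the effect

# Kernel matrix: squared Euclidean distances between all pairs of points.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
K = np.exp(-gamma * sq_dists)

# Center the kernel matrix in feature space.
n = K.shape[0]
one_n = np.full((n, n), 1.0 / n)
K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n

# Eigen-decomposition, sorted by descending eigenvalue.
eigvals, eigvecs = np.linalg.eigh(K_centered)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Coordinates of the training points on the first two kernel components.
components = eigvecs[:, :2] * np.sqrt(eigvals[:2])
print(components)
```

The two columns of `components` can be used as the axes of the scatter plot in step 5.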
13. Apply Principal Component Analysis (PCA) to reduce the dimensionality of a dataset with multiple
numerical variables. Create a scree plot to visualize the variance explained by each principal component.
14. 1. Download a sample CSV file (e.g., sales_data.csv) from a reliable source or create one with
fictional data.
2. Open Power BI Desktop.
3. Create a new report.
4. Load the data from the CSV file into Power BI.
5. Rename columns, set data types, and remove any unnecessary columns.
6. Create visuals to represent the data, such as a bar chart showing sales by product category.
15. 1. Import a single table dataset (e.g., a list of customers or products) into Power BI.
2. Create relationships between columns within the same table to demonstrate the concept of a one-
table data model.
3. Create visuals that utilize the relationships, such as a table, a matrix, and a slicer for filtering.
16. 1. Import a dataset containing sales data with columns for date, product, and revenue.
2. Create a simple two-table data model with a Sales table and a Date table, establishing a
relationship between them.
3. Develop basic DAX calculations to answer questions like:
- Total Sales
- Sales by Month
- Year-to-Date (YTD) Sales
17. 1. Import a simple dataset, such as sales data, into Power BI.
2. Develop a data model with at least one fact table (e.g., Sales) and a relevant dimension table (e.g.,
Products).
3. Create relationships between tables.
4. Design basic DAX calculations, such as Total Sales, Average Sales Price, and Year-to-Date
(YTD) Sales.
5. Build a report with visuals that utilize the DAX calculations, including tables, bar charts, and
slicers.
18. You work for a coffee shop and claim that the average wait time for customers is less than 5 minutes.
You take a random sample of 25 customer wait times and find the following wait times (in minutes): 4.2,
4.5, 4.0, 4.8, 4.3, 4.6, 4.1, 4.7, 4.2, 4.4, 4.2, 4.9, 4.5, 4.3, 4.6, 4.2, 4.7, 4.0, 4.5, 4.1, 4.6, 4.3, 4.8, 4.2. Use
Excel to perform a one-sample Z-test to determine if the average wait time is less than 5 minutes with a
5% significance level.
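A minimal Python sketch of the left-tailed one-sample Z-test. It uses the wait times exactly as listed (the list contains 24 values, although the question mentions a sample of 25) and assumes the sample standard deviation stands in for the unknown population standard deviation.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Wait times in minutes, copied from the question as listed.
wait_times = [
    4.2, 4.5, 4.0, 4.8, 4.3, 4.6, 4.1, 4.7, 4.2, 4.4, 4.2, 4.9,
    4.5, 4.3, 4.6, 4.2, 4.7, 4.0, 4.5, 4.1, 4.6, 4.3, 4.8, 4.2,
]

mu0 = 5.0     # hypothesized mean: H0 is mu >= 5, H1 is mu < 5
alpha = 0.05  # 5% significance level

n = len(wait_times)
sample_mean = mean(wait_times)
std_err = stdev(wait_times) / sqrt(n)  # sample SD used for sigma (assumption)

z_stat = (sample_mean - mu0) / std_err
p_value = NormalDist().cdf(z_stat)     # left-tail probability
z_crit = NormalDist().inv_cdf(alpha)   # about -1.645

reject_null = z_stat < z_crit
print(f"n={n}, mean={sample_mean:.4f}, z={z_stat:.3f}, p={p_value:.3g}")
print("Reject H0 (mean wait is under 5 minutes):", reject_null)
```

In Excel, `Z.TEST(range, 5)` returns the right-tail probability, so the left-tail p-value for this test is `1 - Z.TEST(range, 5)`.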
19. You want to determine if there is a significant difference in the average test scores between two groups of
students (Group A and Group B). Group A has a sample size of 30 with an average score of 85 and a
standard deviation of 8. Group B has a sample size of 40 with an average score of 88 and a standard
deviation of 9. Perform a two-sample independent t-test in Excel to determine if there is a significant
difference in average scores between the two groups with a 1% significance level.
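Because only summary statistics are given, the t statistic can be computed directly from them. The Python sketch below assumes equal variances (pooled t-test) and a two-tailed critical value of about 2.65 for 68 degrees of freedom at the 1% level, read from a t table.

```python
from math import sqrt

# Summary statistics from the question.
n_a, mean_a, sd_a = 30, 85.0, 8.0
n_b, mean_b, sd_b = 40, 88.0, 9.0

df = n_a + n_b - 2

# Pooled variance (assumes the two groups share one variance).
pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))

t_stat = (mean_a - mean_b) / std_err

T_CRIT = 2.65  # assumed: approximate two-tailed t value, alpha = 0.01, df = 68
reject_null = abs(t_stat) > T_CRIT

print(f"t = {t_stat:.4f}, df = {df}, reject H0: {reject_null}")
```

In Excel, the two-tailed p-value for a computed statistic can be obtained with `T.DIST.2T(ABS(t), df)`.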
20. Given a 3x3 matrix A:
A = [ 2, 1, 1 ]
[ -1, 2, -1 ]
[ 0, 1, 3 ]
Perform the SVD for this matrix and calculate its singular values, left singular vectors, and right singular
vectors.
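The decomposition can be checked numerically with NumPy's `numpy.linalg.svd`; this is a sketch for verifying a hand calculation, not a derivation.

```python
import numpy as np

# Matrix A from the question.
A = np.array([
    [2, 1, 1],
    [-1, 2, -1],
    [0, 1, 3],
], dtype=float)

# A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A)

print("singular values:", s)
print("left singular vectors (columns of U):\n", U)
print("right singular vectors (rows of Vt):\n", Vt)

# Rebuild A from its factors as a sanity check.
reconstructed = U @ np.diag(s) @ Vt
print("reconstruction matches A:", np.allclose(reconstructed, A))
```

Singular vectors are unique only up to sign, so a hand-derived answer may differ from this output by a factor of -1 in matching columns of U and rows of Vt.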