DS Practical (BSC CS)
BY
This is to certify that the work entered in this journal is the work of Shri MALI
KRISHNA VINOD of T.Y.B.Sc.CS division Computer Science Roll No. Uni. Exam No., who has
satisfactorily completed the required number of practicals and worked for the 2nd term of the
Year 2023-24 in the college laboratory as laid down by the university.
INDEX
1. Introduction to Excel.
4. Hypothesis Testing.
8. K-Means Clustering.
Practical No. 01
1. Highlight Cells Rules.
We will take the following marksheet data set to perform Conditional Formatting.
Step 1: Select the column ‘Percentage’ in your Excel sheet and click on ‘Conditional
Formatting’ then click on ‘Highlight Cells Rules’ and then click on ‘Greater Than’.
Step 2: A box will appear where you will enter the value and select a formatting type and
then click on OK.
The formatting appears as follows: cells whose values are above the value we entered are
highlighted in the formatting type we selected.
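The same ‘Greater Than’ rule can be expressed outside Excel. The following is a minimal sketch in Python with pandas, assuming a hypothetical marksheet with a ‘Percentage’ column and an illustrative threshold of 75; neither the values nor the threshold come from the journal's data set.

```python
import pandas as pd

# Hypothetical marksheet; names and values are illustrative only.
marks = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Percentage": [82.5, 64.0, 75.0, 91.2],
})

threshold = 75  # assumed value typed into the 'Greater Than' box

# True where the 'Greater Than' rule would highlight the cell.
# The comparison is strict, so a value equal to the threshold is not marked.
marks["Highlighted"] = marks["Percentage"] > threshold
print(marks)
```

The boolean column plays the role of the highlight: it marks exactly the rows that the rule would colour.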
2. Top/Bottom Rules.
Step 1: Select the column ‘Total’ in your Excel sheet and click on ‘Conditional Formatting’
then click on ‘Top/Bottom Rules’ and then click on ‘Top 10%’.
Step 2: A box will appear where you will enter the percentage of top values you want to
display and select a formatting type and then click on OK.
The formatting appears as follows: the cells in the top 20 percent of the values in the
column are highlighted in the formatting type we selected.
3. Data Bars.
Step 1: Select the column ‘Sub5’ in your Excel sheet and click on ‘Conditional Formatting’
then click on ‘Data Bars’ and then select whichever formatting you like.
This gives us the formatting of Data Bars according to the data available as shown above.
4. Color Scales.
Step 1: Select the column ‘Sub4’ in your Excel sheet and click on ‘Conditional Formatting’
then click on ‘Color Scales’ and then select whichever formatting you like.
5. Icon Sets.
Step 1: Select the column ‘Sub3’ in your Excel sheet and click on ‘Conditional Formatting’
then click on ‘Icon Sets’ and then select whichever formatting you like.
In the above figure you can see that the marks of Sub3 are rated using a 4-bar rating
icon set.
To create a pivot table, we are going to take the following data set of product orders.
Step 1: To create a pivot table, select the rows and columns, go to the ‘INSERT’ menu and
click on ‘Pivot Table’ in the Tables section.
Step 2: A dialog box will appear to create the pivot table. Just click on OK.
This will create a blank pivot table along with the "PivotTable Field List" pane.
Step 1: Drag the "Product" field to the Rows area, and the "Price" field to the Values area.
This will give you the total revenue generated by each product.
Step 2: Right-click on any cell in the ‘Price’ column within the pivot table. Select ‘Show
Values As’ from the context menu. From the dropdown menu, select ‘Rank Largest to
Smallest’.
Step 3: A dialog box will appear to select the base field. ‘Product’ will be the default
field; keep it as it is and click on ‘OK’.
Step 4: Now, to see the top 5 ranked products, click on the drop-down arrow next to ‘Row
Labels’ in the pivot table. Select ‘Value Filters’ from the drop-down menu and choose ‘Top
10...’. Enter the number ‘5’, as we want to display the top 5 selling products (or any other
number you prefer). Click ‘OK’.
If we want to see the sum for these products again, we can right-click on any cell of the
‘Sum of Price’ column, select ‘Show Values As’ and click on ‘No Calculation’.
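The pivot table steps above (sum per product, rank from largest to smallest, keep the top 5) can be sketched in Python with pandas. The order data below is hypothetical and only stands in for the journal's data set; the column names ‘Product’ and ‘Price’ follow the fields named in the steps.

```python
import pandas as pd

# Hypothetical product orders; the journal's actual data set is not reproduced here.
orders = pd.DataFrame({
    "Product": ["Pen", "Book", "Pen", "Bag", "Book", "Lamp", "Cup", "Desk"],
    "Price": [10, 50, 15, 80, 45, 30, 5, 120],
})

# Rows = Product, Values = Sum of Price (the basic pivot table).
revenue = orders.pivot_table(index="Product", values="Price", aggfunc="sum")

# 'Show Values As > Rank Largest to Smallest': rank 1 is the highest total.
revenue["Rank"] = revenue["Price"].rank(ascending=False, method="min").astype(int)

# 'Value Filters > Top 10...' with 5: keep the five largest totals.
top5 = revenue.nlargest(5, "Price")
print(top5)
```

Unlike ‘Show Values As’, which replaces the displayed sum with the rank, this sketch keeps the sum and the rank side by side, so no step is needed to switch back.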
The data set we are going to use for the VLOOKUP function is as follows; it gives some
stats of football players.
We’ll just enter any ID from the IDs available. Then we click on the cell under the Player
column and click on Insert Function.
Now in the new dialog box there are 4 fields to be filled. In ‘Lookup_value’, select the cell
under the Id column where you entered the Id of the player you wish to see. In ‘Table_array’,
select the complete table of the players. In ‘Col_index_num’, enter the index of the column
to return; the index of the ‘Player’ column is 2. Set ‘Range_lookup’ to ‘FALSE’ to fetch an
exact match. Then click on ‘OK’.
As we can see we can fetch data of the players from their ID with the help of VLOOKUP.
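What VLOOKUP does with ‘Range_lookup’ set to FALSE can be illustrated with a short Python sketch. The player rows and the helper name `vlookup` are hypothetical and chosen only to mirror the four dialog fields; this is an illustration of the idea, not Excel's implementation.

```python
# Hypothetical player table: (ID, Player, Goals). Values are illustrative.
players = [
    (101, "Player A", 12),
    (102, "Player B", 7),
    (103, "Player C", 20),
]

def vlookup(lookup_value, table, col_index_num):
    """Exact-match lookup on the first column of each row.

    col_index_num is 1-based, as in Excel. Returns None when no row
    matches, where Excel would show #N/A.
    """
    for row in table:
        if row[0] == lookup_value:
            return row[col_index_num - 1]
    return None

# Column 2 is 'Player', as in the dialog box described above.
print(vlookup(102, players, 2))
```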
Now select the cell on which you want to perform Goal Seek analysis. Go to the Data tab,
click on What-If Analysis in the Forecast section and click on Goal Seek.
Enter the value you want to achieve, select the cell that should be changed to achieve that
value, and click on OK.
You can see the status of the goal reached. Click on Ok.
As you can see the changes are made and the goal is reached.
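The idea behind Goal Seek (adjust one input until a formula reaches a target) can be sketched in Python. The sketch below uses bisection as a simple stand-in; it is not claimed to be the method Excel uses, and the marks example and the function names `goal_seek` and `average` are hypothetical.

```python
def goal_seek(f, target, lo, hi, tol=1e-6, max_iter=100):
    """Find x in [lo, hi] with f(x) close to target using bisection.

    Assumes f is continuous and that f(lo) - target and f(hi) - target
    have opposite signs.
    """
    g_lo = f(lo) - target
    g_hi = f(hi) - target
    if g_lo * g_hi > 0:
        raise ValueError("target is not bracketed by [lo, hi]")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        g_mid = f(mid) - target
        if abs(g_mid) < tol:
            return mid
        if g_lo * g_mid <= 0:
            hi = mid              # the sign change lies in the lower half
        else:
            lo, g_lo = mid, g_mid  # the sign change lies in the upper half
    return (lo + hi) / 2

# Hypothetical example: which mark in the fifth subject gives an average of 80?
def average(sub5):
    return (70 + 85 + 78 + 90 + sub5) / 5

needed = goal_seek(average, target=80, lo=0, hi=100)
print(round(needed, 2))
```

Here `average` plays the role of the formula cell, `target` the value to achieve, and the argument `sub5` the cell that is changed.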
Practical No. 02
Read data from CSV and JSON files into a data frame.
To read data from a CSV file, we need a CSV file. We are going to use the following
‘employee_data.csv’ file for reading its data.
We are using Google Colab to read the above CSV file. For that we are going to upload this
‘employee_data.csv’ file to Google Colab.
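A minimal sketch of reading CSV data into a data frame with pandas follows. The rows and column names are hypothetical stand-ins for the contents of ‘employee_data.csv’, and an in-memory string is used so the example runs without the uploaded file.

```python
import io
import pandas as pd

# Stand-in for the uploaded file; these rows and column names are hypothetical.
csv_text = """id,name,department,salary
1,Asha,HR,30000
2,Ravi,IT,45000
3,Meena,IT,52000
"""

# With the real file uploaded, pass its path ('employee_data.csv')
# instead of io.StringIO(csv_text).
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv.head())
```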
To read data from a JSON file, we need a JSON file. We are going to use the following
‘employee_data.json’ file for reading its data.
We are using Google Colab to read the above JSON file. For that we are going to upload this
‘employee_data.json’ file to Google Colab.
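Reading JSON into a data frame follows the same pattern. In this sketch the records are hypothetical and the file is assumed to be a list of objects (one per employee); the actual layout of ‘employee_data.json’ is not shown in the journal, so the `orient` argument may need to change for the real file.

```python
import io
import pandas as pd

# Stand-in for the uploaded file; the records are hypothetical.
json_text = """[
  {"id": 1, "name": "Asha", "salary": 30000},
  {"id": 2, "name": "Ravi", "salary": 45000}
]"""

# orient="records" matches a list of objects, which is assumed here.
# With the real file uploaded, pass its path instead of io.StringIO(json_text).
df_json = pd.read_json(io.StringIO(json_text), orient="records")
print(df_json.head())
```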
We use the following CSV and JSON file to perform the above tasks.
CSV file: -
JSON file: -
We are using Google Colab to perform the above tasks. For that we are going to upload the
‘sales_data.csv’ and ‘sales_data.json’ files to Google Colab.
Practical No. 03
Wine.csv: -
Iris.csv: -
Practical No. 04
1. T-test.
Description: -
The aim of the program is to demonstrate the process of conducting a two-sample t-test and
drawing conclusions based on the results. Specifically, it aims to compare the means of two
samples (sample1 and sample2) drawn from normal distributions with different means but the
same standard deviation.
Detailed Breakdown: -
a) Generate Samples: The program generates two samples, each representing a different
population or group. These samples are generated from normal distributions with
means of 10 and 12, and a standard deviation of 2.
b) Perform Hypothesis Test: The program conducts a two-sample t-test to determine
whether there is a statistically significant difference between the means of the two
samples.
c) Set Significance Level: It sets a significance level (alpha) at 0.05, which is a common
threshold used in hypothesis testing.
d) Visualize Distributions: The program plots histograms of the two samples to visualize
their distributions and compare their means visually.
e) Highlight Critical Region: If the p-value from the t-test is less than the significance
level, the program highlights the critical region on the plot to indicate where the
observed difference in means is statistically significant.
f) Draw Conclusions: Based on the results of the t-test, the program draws conclusions
about whether there is significant evidence to reject the null hypothesis (i.e., the
means of the two populations are equal) and provides interpretations of the findings
based on the direction of the difference in means, if applicable.
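The breakdown above can be sketched as follows with NumPy and SciPy. The sample size of 30 and the fixed random seed are assumptions (the journal does not state them), and the plotting steps d) and e) are left out to keep the sketch short.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # fixed seed, assumed here for repeatability

# a) Two samples from normal distributions: means 10 and 12, std 2.
#    The sample size of 30 is an assumption.
sample1 = rng.normal(loc=10, scale=2, size=30)
sample2 = rng.normal(loc=12, scale=2, size=30)

# b) Two-sample t-test; equal variances are assumed, matching the shared std.
t_stat, p_value = stats.ttest_ind(sample1, sample2)

# c) Significance level.
alpha = 0.05

# f) Conclusion, including the direction of the difference when significant.
if p_value < alpha:
    direction = "greater" if sample1.mean() > sample2.mean() else "less"
    print(f"Reject H0: the mean of sample1 is significantly {direction} "
          "than the mean of sample2.")
else:
    print("Fail to reject H0: no significant difference between the means.")
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

The null hypothesis is that the two population means are equal; it is rejected only when the p-value falls below `alpha`.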
Conclusion: -
2. Chi-square Test.
Description: -
We apply the chi-square test to check whether there is an association between the two given
categorical variables.
Assumptions: -
Conclusion: -
There is sufficient evidence to reject the null hypothesis, indicating that there is a significant
association between 'horsepower_new' and 'modelyear_new' categories.
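A chi-square test of independence on two categorical columns can be sketched as follows. The column names follow the conclusion above, but the category labels and counts are hypothetical and deliberately chosen to show a strong association; they are not the journal's data.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical binned data: (horsepower category, model year category).
rows = (
    [("low", "old")] * 10 + [("low", "new")] * 30 +
    [("high", "old")] * 35 + [("high", "new")] * 5
)
cars = pd.DataFrame(rows, columns=["horsepower_new", "modelyear_new"])

# Contingency table of observed counts.
table = pd.crosstab(cars["horsepower_new"], cars["modelyear_new"])

# H0: the two variables are independent.
chi2, p_value, dof, expected = chi2_contingency(table)

alpha = 0.05
print(table)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.6f}")
if p_value < alpha:
    print("Reject H0: the categories are associated.")
else:
    print("Fail to reject H0: no evidence of association.")
```

The `expected` array holds the counts that independence would predict; the test is usually considered reliable only when these expected counts are not too small.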
Practical No. 05
Practical No. 06
Practical No. 07
Python Program: -
Practical No. 08
Practical No. 09