AI Lab File - C
Shivendra Singh
A2305220681
Practical-1 Date: 03-01-2023
Aim: Exploring the pandas DataFrame
You are given a CSV file named “student.csv”, whose first few records are shown below.
Write the Python code/command for the following questions.
(c) Print column names
Output:
(h) List names of all the male students
Output:
(l) Sort and print the data by Name, then by Age.
Output:
(o) Count and print the number of students by sex, and display the result with suitable column
headers.
Output:
(q) Print the data types of individual columns of the data frame
Output:
(r) Convert and print the datatype of a given column Age (int to float).
Output:
(s) Create a new column named “UpdatedMarks” which has 5.5% more marks than the existing
Marks column.
Output:
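A minimal sketch of how questions (c) to (s) could be answered is given below. The contents of “student.csv” are not reproduced in this file, so the sketch builds a small stand-in DataFrame; the column names (Name, Age, Sex, Marks) and all values are assumptions made only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for student.csv; the real file's columns and
# values may differ. With the real file: df = pd.read_csv("student.csv")
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena", "Ravi"],
    "Age": [20, 21, 19, 19],
    "Sex": ["F", "M", "F", "M"],
    "Marks": [80, 60, 90, 70],
})

# (c) Print column names
print(list(df.columns))

# (h) Names of all the male students
male_names = df.loc[df["Sex"] == "M", "Name"].tolist()
print(male_names)

# (l) Sort by Name, then by Age
sorted_df = df.sort_values(by=["Name", "Age"])
print(sorted_df)

# (o) Number of students per sex, with explicit column headers
sex_counts = df.groupby("Sex").size().reset_index(name="Count")
print(sex_counts)

# (q) Data types of the individual columns
print(df.dtypes)

# (r) Convert Age from int to float
df["Age"] = df["Age"].astype(float)
print(df["Age"].dtype)

# (s) New column with 5.5% more marks than the Marks column
df["UpdatedMarks"] = df["Marks"] * 1.055
print(df)
```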
Conclusion:
Hence, the basics of machine learning were studied using Python.
Evaluation:
Practical-2 Date: 10-01-2023
The given CSV file named “RSData.csv” contains real estate data for a particular city.
Write Python commands/code to answer the following questions on this dataset.
1. Write the command to read this data in the data frame.
Output:
4. Which column(s) do not have any missing values?
Output:
6. How many rows do not have any missing values?
Output:
8. Fill the street number of the record at index 2 with 77.
Output:
9. If there is any missing value in the “number of bedrooms” column, then replace it with the
median value of this column.
Output:
11. It is believed that the data entry operator might have entered integer values in the “owner
occupied” column. Count the number of such entries and replace them with the NumPy standard NaN
(np.nan).
Output:
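The cleaning steps asked for in questions 4, 6, 8, 9 and 11 can be sketched as follows. “RSData.csv” itself is not shown in this file, so the DataFrame below is a hypothetical stand-in; the column names (ST_NUM, ST_NAME, OWN_OCCUPIED, NUM_BEDROOMS) and the values are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for RSData.csv; column names and values are assumed.
# With the real file: df = pd.read_csv("RSData.csv")
df = pd.DataFrame({
    "ST_NUM": [104.0, 197.0, np.nan, 201.0, 203.0],
    "ST_NAME": ["PUTNAM", "LEXINGTON", "LEXINGTON", "BERKELEY", "BERKELEY"],
    "OWN_OCCUPIED": ["Y", "N", "N", 12, "Y"],
    "NUM_BEDROOMS": [3.0, np.nan, 1.0, np.nan, 4.0],
})

# Q4: columns that have no missing value
complete_cols = df.columns[df.notna().all()].tolist()
print(complete_cols)

# Q6: number of rows that have no missing value
complete_rows = int(df.notna().all(axis=1).sum())
print(complete_rows)

# Q8: set the street number of the record at index 2 to 77
df.loc[2, "ST_NUM"] = 77

# Q9: replace missing bedroom counts with the column median
median_bedrooms = df["NUM_BEDROOMS"].median()
df["NUM_BEDROOMS"] = df["NUM_BEDROOMS"].fillna(median_bedrooms)

# Q11: find numeric entries in the owner-occupied column, count them,
# and replace them with np.nan. pd.to_numeric(..., errors="coerce")
# turns every non-numeric entry into NaN, so notna() marks the numeric ones.
is_numeric = pd.to_numeric(df["OWN_OCCUPIED"], errors="coerce").notna()
n_numeric = int(is_numeric.sum())
df.loc[is_numeric, "OWN_OCCUPIED"] = np.nan
print(n_numeric)
print(df)
```

Note that the mask in Q11 flags any numeric entry, not only integers; if only integers should be counted, a stricter test would be needed.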
Conclusion:
Hence, the concept of data cleaning was studied and implemented successfully.
Evaluation:
Practical-3 Date: 17-01-2023
Load the “mtcar.csv” dataset and write Python code to perform the following operations.
model              mpg   cyl  disp   hp   drat  wt     qsec   vs  am  gear  carb
Mazda RX4          21    6    160    110  3.9   2.62   16.46  0   1   4     4
Mazda RX4 Wag      21    6    160    110  3.9   2.875  17.02  0   1   4     4
Datsun 710         22.8  4    108    93   3.85  2.32   18.61  1   1   4     1
Hornet 4 Drive     21.4  6    258    110  3.08  3.215  19.44  1   0   3     1
Hornet Sportabout  18.7  8    360    175  3.15  3.44   17.02  0   0   3     2
Valiant            18.1  6    225    105  2.76  3.46   20.22  1   0   3     1
Duster 360         14.3  8    360    245  3.21  3.57   15.84  0   0   3     4
Merc 240D          24.4  4    146.7  62   3.69  3.19   20     1   0   4     2
Merc 230           22.8  4    140.8  95   3.92  3.15   22.9   1   0   4     2
(a) Generate various summary statistics such as mean, standard deviation, minimum value,
maximum value, and the 1st, 2nd and 3rd quartiles for all the numerical attributes.
Output:
(b) Count the number of non-NaN items per feature
Output:
(f) Calculate mode for “hp” column
Output:
(h) Find the coefficient of correlation between all the numeric attributes.
Output:
(j) Plot a density diagram (KDE, kernel density estimation) for “displacement”
Output:
(l) Box plot the “hp” attribute.
Output:
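A possible way to answer questions (a), (b), (f) and (h) is sketched below. To keep it self-contained it uses only the mpg, disp and hp columns of the nine records listed at the start of this practical, rather than reading “mtcar.csv”; the variable names are illustrative.

```python
import pandas as pd

# mpg, disp and hp for the nine records listed at the start of this practical.
# With the real file: cars = pd.read_csv("mtcar.csv")
cars = pd.DataFrame({
    "mpg": [21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8],
    "disp": [160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8],
    "hp": [110, 110, 93, 110, 175, 105, 245, 62, 95],
})

# (a) Count, mean, std, min, 25%, 50%, 75% and max for numeric columns
summary = cars.describe()
print(summary)

# (b) Number of non-NaN items per feature
non_nan_counts = cars.count()
print(non_nan_counts)

# (f) Mode of the hp column (mode() returns every value tied for most frequent)
hp_mode = cars["hp"].mode().tolist()
print(hp_mode)

# (h) Pearson correlation coefficients between all numeric attributes
correlations = cars.corr()
print(correlations)
```

For the plots in (j) and (l), one option is `cars["disp"].plot(kind="kde")` and `cars["hp"].plot(kind="box")`; both need matplotlib, and the KDE plot also needs SciPy.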
Conclusion:
Hence, the concept of plotting different types of graphs was studied and implemented
successfully.
Evaluation:
Practical-4 Date: 24-01-2023
(b) Predict the value of CO2 emission on the basis of engine size for enginesize = 2.4, using the
data given in the “CO2Small.csv” file.
(d) Calculate the above details on the larger dataset, i.e., the file “CO2Full Data.csv”.
(e) Print the final equation of the line
(f) Plot final line (best fit line) along with other data points.
(g) Give your comments by comparing the R^2 of both the datasets.
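The calculation behind questions (b), (e) and (g) can be sketched with the closed-form least-squares formulas. The four data points below are made up for illustration and are not taken from “CO2Small.csv”; the variable and function names are likewise assumptions.

```python
import numpy as np

# Hypothetical (engine size, CO2 emission) pairs; NOT the lab's data.
engine_size = np.array([1.0, 2.0, 3.0, 4.0])
co2 = np.array([150.0, 190.0, 240.0, 280.0])

# Least-squares estimates for the line y = intercept + slope * x
x_mean, y_mean = engine_size.mean(), co2.mean()
slope = (np.sum((engine_size - x_mean) * (co2 - y_mean))
         / np.sum((engine_size - x_mean) ** 2))
intercept = y_mean - slope * x_mean


def predict(x):
    """Predicted CO2 emission for engine size x."""
    return intercept + slope * x


# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((co2 - predict(engine_size)) ** 2)
ss_tot = np.sum((co2 - y_mean) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"CO2 = {intercept:.2f} + {slope:.2f} * EngineSize")
print(f"Prediction for engine size 2.4: {predict(2.4):.2f}")
print(f"R^2 = {r2:.4f}")
```

Running the same steps on both files gives the two R^2 values to compare in (g).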
Conclusion:
Hence, the prediction of CO2 Emission using Simple Linear Regression was implemented
successfully.
Evaluation:
Practical-5 Date: 31-01-2023
(a) Scatter plot “EngineSize vs CO2Emissions”
(b) Calculate R^2 value
(c) Print the final equation of the line
(d) Plot final line (best fit line) along with other data points.
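A minimal sketch of batch gradient descent for fitting the line is given below. The data points, learning rate and iteration count are assumptions chosen so that the loop converges on this toy data; they are not the values used in the lab.

```python
import numpy as np

# Hypothetical (engine size, CO2 emission) pairs; NOT the lab's data.
engine_size = np.array([1.0, 2.0, 3.0, 4.0])
co2 = np.array([150.0, 190.0, 240.0, 280.0])

slope, intercept = 0.0, 0.0   # initial guess
learning_rate = 0.05          # assumed; too large a value diverges
n = len(engine_size)

for _ in range(5000):
    error = co2 - (slope * engine_size + intercept)
    # Gradients of the mean squared error with respect to each parameter
    grad_slope = (-2.0 / n) * np.sum(engine_size * error)
    grad_intercept = (-2.0 / n) * np.sum(error)
    slope -= learning_rate * grad_slope
    intercept -= learning_rate * grad_intercept

# (b) R^2 of the fitted line
predicted = slope * engine_size + intercept
r2 = 1 - np.sum((co2 - predicted) ** 2) / np.sum((co2 - co2.mean()) ** 2)

# (c) Final equation of the line
print(f"CO2 = {intercept:.2f} + {slope:.2f} * EngineSize")
print(f"R^2 = {r2:.4f}")
```

On this toy data the loop approaches the same slope and intercept as the closed-form least-squares solution, which is a useful sanity check for the learning rate.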
Conclusion:
Hence, the prediction of CO2 Emission using Gradient Descent was implemented
successfully.
Evaluation:
Practical-6 Date: 07-02-2023
AIM: Predict CO2 Emission using Multilinear Regression.
b. What is the accuracy level of your model?
ENGINESIZE
CYLINDERS
FUELCONSUMPTION_CITY
FUELCONSUMPTION_HWY
FUELCONSUMPTION_COMB.
d. Write your observation /comment about this data and the model.
The ENGINESIZE and CYLINDERS attributes are positively correlated with CO2
emissions, while the fuel consumption attributes are negatively correlated.
The linear regression model assumes a linear relationship between the input attributes
and the output variable, which may not be entirely accurate in this case.
There may be other factors that influence CO2 emissions that are not captured by the
input attributes in this dataset.
Overall, the model seems to provide a reasonably accurate prediction of CO2
emissions based on the available data.
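A multilinear fit of this kind can be sketched with ordinary least squares in NumPy. The rows below are synthetic: the target is generated from assumed coefficients with no noise, so the fit recovers them exactly and R^2 is 1, which will not happen on real data.

```python
import numpy as np

# Synthetic rows: ENGINESIZE, CYLINDERS, FUELCONSUMPTION_COMB (assumed values).
X = np.array([
    [2.0, 4, 8.5],
    [2.4, 4, 9.6],
    [1.5, 4, 5.9],
    [3.5, 6, 11.1],
    [3.5, 6, 10.6],
    [5.0, 8, 13.0],
])
# Target built from assumed coefficients purely for illustration.
true_coef = np.array([10.0, 7.0, 9.0])
true_intercept = 60.0
y = X @ true_coef + true_intercept

# Add a column of ones so the intercept is estimated with the coefficients
X_design = np.column_stack([np.ones(len(X)), X])
params, *_ = np.linalg.lstsq(X_design, y, rcond=None)
intercept, coef = params[0], params[1:]

# R^2 is one common answer to "accuracy level" for a regression model
y_pred = X_design @ params
r2 = 1 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)

print("Intercept:", round(float(intercept), 4))
print("Coefficients:", np.round(coef, 4))
print("R^2:", round(float(r2), 4))
```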
Conclusion:
Hence, the prediction of CO2 Emission using Multilinear Regression was implemented
successfully.
Evaluation:
Practical-7 Date: 14-02-2023
b. Print Classification matrix
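A classification matrix (often called a confusion matrix) can be tabulated directly from the actual and predicted labels. The sketch below uses made-up label vectors in place of the logistic regression output, which is not reproduced in this file.

```python
import pandas as pd

# Hypothetical actual and predicted class labels (1 = positive, 0 = negative).
y_true = pd.Series([1, 0, 1, 1, 0, 0, 1, 0], name="Actual")
y_pred = pd.Series([1, 0, 0, 1, 0, 1, 1, 0], name="Predicted")

# Rows are actual classes, columns are predicted classes
matrix = pd.crosstab(y_true, y_pred)
print(matrix)

# Accuracy is the share of predictions on the diagonal
accuracy = (y_true == y_pred).mean()
print("Accuracy:", accuracy)
```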
Conclusion:
Hence, Logistic Regression was implemented successfully.
Evaluation:
Practical-8 Date: 21-03-2023
Conclusion:
Hence, the Linear Discriminant Analysis on the IRIS dataset was done successfully.
Evaluation:
Practical-9 Date: 28-03-2023
AIM: Using Support Vector Machine, build a breast cancer detection
system. Use the breast cancer dataset available with the datasets package of
sklearn library.
a. Print the accuracy, precision and recall for the model built.
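One way such a model could be built and scored is sketched below. The train/test split, the feature scaling step and the RBF kernel are assumptions; the lab's own settings are not shown in this file.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Breast cancer dataset bundled with sklearn (label 0 = malignant, 1 = benign)
X, y = load_breast_cancer(return_X_y=True)

# Assumed 80/20 split, stratified so both classes appear in the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# SVMs are sensitive to feature scale, so scale before fitting
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# By default precision and recall are reported for label 1 (benign)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
print(f"Accuracy:  {accuracy:.3f}")
print(f"Precision: {precision:.3f}")
print(f"Recall:    {recall:.3f}")
```

If the malignant class is the one of interest, passing `pos_label=0` to `precision_score` and `recall_score` reports the metrics for that class instead.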
Conclusion:
Hence, a breast cancer detection system was built successfully using a Support Vector
Machine.
Evaluation:
Practical-10 Date: 07-03-2023
Conclusion:
Hence, an ML model on the “diabetes” dataset was built successfully using a decision tree.
Evaluation:
Practical-11 Date: 14-03-2023
AIM: Classify the “Iris Dataset” using KNN. Print Precision, Recall, F1 and
Support.
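A short sketch of this classification is given below, using the Iris data bundled with sklearn. The value k = 5 and the 70/30 stratified split are assumptions, not settings taken from the lab.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()

# Assumed 70/30 split, stratified so each species is equally represented
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42, stratify=iris.target
)

# k = 5 is an assumed choice for the number of neighbours
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# The report lists precision, recall, F1-score and support per class
report = classification_report(y_test, y_pred, target_names=iris.target_names)
print(report)
```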
Conclusion:
Hence, classification of the “Iris Dataset” using KNN was implemented successfully.
Evaluation:
Practical-12 Date: 21-03-2023
AIM: You are given the “Mall-Customer” dataset. Cluster the dataset using K-Means clustering.
The above code performs the following steps:
1. Loads the Mall-Customer dataset
2. Selects the columns to use for clustering
3. Scales the features using StandardScaler
4. Estimates the optimum value of k using the elbow method and the silhouette score
5. Clusters the dataset using the optimum value of k
6. Visualizes the clusters
The optimum value of k in k-means clustering can be estimated using the following steps:
Elbow method: Plot the within-cluster sum of squares (WCSS) against the number of clusters (k). The
WCSS is the sum of squared distances between each point in a cluster and its centroid. The plot will
have a shape like an elbow, and the optimum value of k will be at the "elbow" or the point where
the rate of decrease in WCSS slows down significantly. This method gives a visual representation of
the best k value for clustering.
Silhouette method: The silhouette score measures how similar a data point is to its own cluster
compared to other clusters. It ranges from -1 to 1, where a score closer to 1 indicates that the point
is well-matched to its own cluster and poorly matched to neighbouring clusters. Compute the
average silhouette score for different values of k and choose the k with the highest average score.
This method is more quantitative than the elbow method and can be used to find a more precise
value of k.
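Both methods described above can be sketched as follows. The Mall-Customer file is not reproduced here, so the sketch generates three synthetic, well-separated groups of points as a stand-in; the two features, the group centres and the range of k tried are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for two Mall-Customer columns (e.g. income, spending score)
rng = np.random.default_rng(0)
centers = np.array([[20.0, 20.0], [60.0, 50.0], [100.0, 80.0]])
X = np.vstack([c + rng.normal(scale=3.0, size=(50, 2)) for c in centers])
X_scaled = StandardScaler().fit_transform(X)

wcss = {}        # within-cluster sum of squares per k (elbow method)
silhouette = {}  # average silhouette score per k
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    wcss[k] = km.inertia_  # inertia_ is the WCSS of the fitted model
    silhouette[k] = silhouette_score(X_scaled, km.labels_)

# Silhouette method: choose the k with the highest average score
best_k = max(silhouette, key=silhouette.get)
print("WCSS by k:", wcss)
print("Silhouette by k:", silhouette)
print("Chosen k:", best_k)

# Final clustering with the chosen k
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X_scaled)
```

Plotting `wcss` against k gives the elbow curve; the silhouette scores give a numeric check on the k read off that curve.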
Conclusion:
Hence, clustering of the dataset using K-Means clustering was done successfully.
Evaluation: