AI Lab File - C

Here are the steps to perform simple linear regression to predict CO2 emission using the given dataset:
1. Import necessary libraries
2. Load the dataset
3. Select the target variable (CO2) and predictor variable (weight)
4. Create a linear regression model
5. Fit the model on the training data
6. Print the coefficients of the fitted model
7. Make predictions on test data
8. Calculate R-squared to evaluate the model
9. Plot the regression line overlaid on scattered data
10. Print statistics like mean absolute error and root mean squared error
This will help build a basic linear regression model to predict CO2 emission based on vehicle weight. The coefficients, prediction performance

Uploaded by

Debabrata Pain

INDEX

Exp No.  Aim of Experiment                                                     Date of Allotment  Date of Evaluation  Remarks
1        Exploring the pandas DataFrame                                        03-01-2023         10-01-2023
2        Data Cleansing                                                        10-01-2023         17-01-2023
3        Statistical Summary and basic plotting                                17-01-2023         24-01-2023
4        Predict CO2 Emission using Simple Linear Regression                   24-01-2023         31-01-2023
5        Predict CO2 Emission using Gradient Descent                           31-01-2023         07-02-2023
6        Predict CO2 Emission using Multilinear Regression.                    07-02-2023         14-02-2023
7        Perform logistic regression on the “ChurnData.csv”.                   14-02-2023         21-02-2023
8        Linear Discriminant Analysis on the IRIS dataset                      21-02-2023         28-02-2023
9        Using Support Vector Machine build a breast cancer detection system.  28-02-2023         07-03-2023
10       Using decision tree, build an ML model on the “diabetes” dataset.     07-03-2023         14-03-2023
11       Classify the “Iris Dataset” using KNN                                 14-03-2023         21-03-2023
12       Cluster the dataset using K-Means Clustering                          21-03-2023         28-03-2023

Shivendra Singh
A2305220681
Practical-1 Date: 03-01-2023
AIM: Exploring the pandas DataFrame

You are given a CSV file named 'student.csv', whose first few records are shown below.
Write the Python code or command for each of the following questions.

(a) Load this file into a pandas DataFrame


Output:

(b) Find the number of rows and columns in it.


Output:

(c) Print column names
Output:

(d) Change the column name ‘Name’ to the new name ‘FirstName’
Output:

(e) Print last 5 rows from the bottom
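
Parts (a) to (e) can be sketched as follows. The records of 'student.csv' are not reproduced in this text, so the rows below are hypothetical stand-ins. Only the column names (Name, Sex, Age, Marks, Grade), the name 'Nihar' and the age 46 are taken from the questions; every other value is an assumption made so that the sketch runs.

```python
import io
import pandas as pd

# Hypothetical stand-in for 'student.csv'; the real records are not shown here.
CSV_TEXT = """Name,Age,Sex,Marks,Grade
Nihar,21,M,78,B
Asha,20,F,91,A
Ravi,46,M,55,C
Meena,22,F,84,A
Kabir,19,M,67,B
Tara,23,F,73,B
"""

# (a) Load the file (with the real file: pd.read_csv("student.csv"))
df = pd.read_csv(io.StringIO(CSV_TEXT))

# (b) Number of rows and columns
n_rows, n_cols = df.shape
print(n_rows, n_cols)

# (c) Column names
print(list(df.columns))

# (d) Rename the column 'Name' to 'FirstName'
renamed = df.rename(columns={"Name": "FirstName"})
print(list(renamed.columns))

# (e) Last 5 rows
last_five = df.tail(5)
print(last_five)
```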


Output:

(f) Print the details of student with lowest marks


Output:

(g) Find total marks of all female students


Output:

(h) List names of all the male students
Output:

(i) Find mean age of the class


Output:

(j) Line plot marks of the class


Output:

(k) Find the index of the record of the oldest student in the class
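
A possible sketch for parts (f) to (i) and (k) is given here. The DataFrame is again a hypothetical stand-in for 'student.csv', so the printed values only illustrate the calls and are not the lab's results.

```python
import pandas as pd

# Hypothetical rows standing in for 'student.csv'.
df = pd.DataFrame({
    "Name": ["Nihar", "Asha", "Ravi", "Meena", "Kabir", "Tara"],
    "Age": [21, 20, 46, 22, 19, 23],
    "Sex": ["M", "F", "M", "F", "M", "F"],
    "Marks": [78, 91, 55, 84, 67, 73],
    "Grade": ["B", "A", "C", "A", "B", "B"],
})

# (f) Details of the student with the lowest marks
lowest = df.loc[df["Marks"].idxmin()]

# (g) Total marks of all female students
female_total = df.loc[df["Sex"] == "F", "Marks"].sum()

# (h) Names of all male students
male_names = df.loc[df["Sex"] == "M", "Name"].tolist()

# (i) Mean age of the class
mean_age = df["Age"].mean()

# (k) Index of the record of the oldest student
oldest_index = df["Age"].idxmax()

print(lowest, female_total, male_names, mean_age, oldest_index, sep="\n")
```

For part (j), `df["Marks"].plot(kind="line")` draws the line plot when matplotlib is installed.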
Output:

(l) Sort and print the data by Name, then by Age.
Output:

(m) Change the name 'Nihar' to 'Jason Bourne' in the Name column of the DataFrame.
Output:

(n) Change and print the order of the columns (Name, Sex, Age, Marks, Grade).
Output:

(o) Count and print the number of students sex-wise and display the result with suitable column headers.
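
Parts (l) to (o) might look like the sketch below. The rows are hypothetical, and the header name NumberOfStudents is an illustrative choice rather than something the question prescribes.

```python
import pandas as pd

# Hypothetical rows standing in for 'student.csv'.
df = pd.DataFrame({
    "Name": ["Nihar", "Asha", "Ravi", "Meena", "Kabir", "Tara"],
    "Age": [21, 20, 46, 22, 19, 23],
    "Sex": ["M", "F", "M", "F", "M", "F"],
    "Marks": [78, 91, 55, 84, 67, 73],
    "Grade": ["B", "A", "C", "A", "B", "B"],
})

# (l) Sort by Name first, then by Age to break ties
sorted_df = df.sort_values(by=["Name", "Age"])
print(sorted_df)

# (m) Change the name 'Nihar' to 'Jason Bourne'
df["Name"] = df["Name"].replace("Nihar", "Jason Bourne")

# (n) Reorder the columns
df = df[["Name", "Sex", "Age", "Marks", "Grade"]]
print(df)

# (o) Number of students per sex, with explicit column headers
counts = df.groupby("Sex").size().reset_index(name="NumberOfStudents")
print(counts)
```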
Output:

(p) Delete and print row where age=46


Output:

(q) Print the data types of individual columns of the data frame
Output:

(r) Convert and print the datatype of a given column Age (int to float).
Output:

(s) Create a new column named “UpdatedMarks” which has 5.5% more marks than the existing Marks column.
Output:

(t) Delete the “Marks” column
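
One way to approach parts (p) to (t) is sketched below, again on hypothetical rows. The factor 1.055 encodes the 5.5% increase asked for in part (s).

```python
import pandas as pd

# Hypothetical rows standing in for 'student.csv'.
df = pd.DataFrame({
    "Name": ["Nihar", "Asha", "Ravi", "Meena", "Kabir", "Tara"],
    "Age": [21, 20, 46, 22, 19, 23],
    "Sex": ["M", "F", "M", "F", "M", "F"],
    "Marks": [78, 91, 55, 84, 67, 73],
    "Grade": ["B", "A", "C", "A", "B", "B"],
})

# (p) Delete the row(s) where Age is 46 and print the result
df = df.drop(index=df.index[df["Age"] == 46])
print(df)

# (q) Data types of the individual columns
print(df.dtypes)

# (r) Convert Age from int to float
df["Age"] = df["Age"].astype(float)
print(df["Age"].dtype)

# (s) New column with 5.5% more marks than Marks
df["UpdatedMarks"] = df["Marks"] * 1.055

# (t) Delete the Marks column
df = df.drop(columns="Marks")
print(df)
```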


Output:

Conclusion:
Hence, the basics of machine learning were studied using Python.

Evaluation:

Practical-2 Date: 10-01-2023

AIM: Data Cleansing

The given CSV file named “RSData.csv” contains real estate data for a particular city.
Write Python commands or code to answer the following questions on this dataset.

propertyid  stno  stname             owneroccupied  numbedrooms  numbathrooms  areasqft
100001000   10    Kranti Road        Y              3            1             1000
100002000   17    Bhagat Singh Road  N              3            1.5           --
100003000         Bhagat Singh Road  N              n/a          1             850
100004000   20    Azad Mard          12             1            NaN           700
            23    Azad Mard          Y              3            2             1600
100006000   20    Azad Mard          Y              NA           1             800
100007000   NA    Shivaji Road                      2            Surya         950
100008000   13    Netaji Marg        Y              1            1
100009000   15    Netaji Marg        Y              na           2             1800

1. Write the command to read this data in the data frame.
Output:

2. List number of rows and columns in this dataset


Output:

3. Check if there is any missing value in this entire dataset?


Output:

4. Which column(s) do not have any missing value?
Output:

5. Which columns have the maximum number of missing values?


Output:

6. How many rows do not have any missing value?
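
Questions 1 to 6 could be answered along the following lines. The CSV text is transcribed from the nine records in the question, with blank cells read as missing. Treating the markers "na" and "--" as missing values is an assumption of this sketch (pandas already recognises "NA", "n/a" and "NaN" by default).

```python
import io
import pandas as pd

# The nine records from the question; empty fields are missing values.
CSV_TEXT = """propertyid,stno,stname,owneroccupied,numbedrooms,numbathrooms,areasqft
100001000,10,Kranti Road,Y,3,1,1000
100002000,17,Bhagat Singh Road,N,3,1.5,--
100003000,,Bhagat Singh Road,N,n/a,1,850
100004000,20,Azad Mard,12,1,NaN,700
,23,Azad Mard,Y,3,2,1600
100006000,20,Azad Mard,Y,NA,1,800
100007000,NA,Shivaji Road,,2,Surya,950
100008000,13,Netaji Marg,Y,1,1,
100009000,15,Netaji Marg,Y,na,2,1800
"""

# 1. Read the data (with the real file: pd.read_csv("RSData.csv", na_values=[...]))
df = pd.read_csv(io.StringIO(CSV_TEXT), na_values=["na", "--"])

# 2. Number of rows and columns
print(df.shape)

# 3. Is any value missing, and how many in total?
has_missing = bool(df.isnull().values.any())
total_missing = int(df.isnull().sum().sum())

# 4. Columns without any missing value
missing_per_column = df.isnull().sum()
complete_columns = missing_per_column[missing_per_column == 0].index.tolist()

# 5. Column(s) with the maximum number of missing values
most_missing = missing_per_column[
    missing_per_column == missing_per_column.max()
].index.tolist()

# 6. Number of rows without any missing value
complete_rows = len(df.dropna())

print(has_missing, total_missing, complete_columns, most_missing, complete_rows)
```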
Output:

7. If there is any missing value in the “areasqft” replace it with 900


Output:

8. Fill the street number of record at index 2 with 77
Output:

9. If there is any missing value in the “number of bedrooms” column, replace it with the median value of this column.
Output:

10. Give your comment on the “owner occupied” column.


Output:

11. It is believed that the data entry operator might have entered integer values in the “owner occupied” column. Count the number of such entries and replace them with NumPy's standard NaN (np.nan).
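
A hedged sketch for questions 7 to 11 follows. It reloads the same nine records and, as an assumption, treats "na" and "--" as missing markers. For question 11 it counts entries that parse as numbers, which is one reasonable reading of "integer values".

```python
import io
import numpy as np
import pandas as pd

CSV_TEXT = """propertyid,stno,stname,owneroccupied,numbedrooms,numbathrooms,areasqft
100001000,10,Kranti Road,Y,3,1,1000
100002000,17,Bhagat Singh Road,N,3,1.5,--
100003000,,Bhagat Singh Road,N,n/a,1,850
100004000,20,Azad Mard,12,1,NaN,700
,23,Azad Mard,Y,3,2,1600
100006000,20,Azad Mard,Y,NA,1,800
100007000,NA,Shivaji Road,,2,Surya,950
100008000,13,Netaji Marg,Y,1,1,
100009000,15,Netaji Marg,Y,na,2,1800
"""
df = pd.read_csv(io.StringIO(CSV_TEXT), na_values=["na", "--"])

# 7. Replace missing values in areasqft with 900
df["areasqft"] = df["areasqft"].fillna(900)

# 8. Fill the street number of the record at index 2 with 77
df.loc[2, "stno"] = 77

# 9. Replace missing bedroom counts with the column median
median_bedrooms = df["numbedrooms"].median()
df["numbedrooms"] = df["numbedrooms"].fillna(median_bedrooms)

# 10. Inspect the owneroccupied column before commenting on it
print(df["owneroccupied"].value_counts(dropna=False))

# 11. Count numeric entries in owneroccupied and replace them with np.nan
is_numeric_entry = pd.to_numeric(df["owneroccupied"], errors="coerce").notna()
n_numeric_entries = int(is_numeric_entry.sum())
df.loc[is_numeric_entry, "owneroccupied"] = np.nan
print(n_numeric_entries)
```

With these records the owneroccupied column holds Y/N flags, one numeric entry (12) and one blank, which is the kind of observation question 10 asks for.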
Output:

Conclusion:
Hence, the concept of data cleaning is studied and implemented successfully.

Evaluation:

Practical-3 Date: 17-01-2023

AIM: Statistical Summary and basic plotting

Load the “mtcar.csv” dataset and write Python code to perform the following operations.

model              mpg   cyl  disp   hp   drat  wt     qsec   vs  am  gear  carb
Mazda RX4          21    6    160    110  3.9   2.62   16.46  0   1   4     4
Mazda RX4 Wag      21    6    160    110  3.9   2.875  17.02  0   1   4     4
Datsun 710         22.8  4    108    93   3.85  2.32   18.61  1   1   4     1
Hornet 4 Drive     21.4  6    258    110  3.08  3.215  19.44  1   0   3     1
Hornet Sportabout  18.7  8    360    175  3.15  3.44   17.02  0   0   3     2
Valiant            18.1  6    225    105  2.76  3.46   20.22  1   0   3     1
Duster 360         14.3  8    360    245  3.21  3.57   15.84  0   0   3     4
Merc 240D          24.4  4    146.7  62   3.69  3.19   20     1   0   4     2
Merc 230           22.8  4    140.8  95   3.92  3.15   22.9   1   0   4     2

(a) Generate summary statistics such as mean, standard deviation, minimum value, maximum value, and the 1st, 2nd and 3rd quartiles for all the numerical attributes.
Output:

(b) Count the number of non-NaN items per feature
Output:

(c) Calculate mean absolute deviation.


Output:

(d) Calculate the median of any numeric column

Output:

(e) Calculate the mean of any numeric column


Output:

(f) Calculate mode for “hp” column
Output:

(g) Calculate skewness for “disp”


Output:

(h) Find the coefficient of correlation between all the numeric attributes.
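
Parts (a) to (h) could be sketched as below. The DataFrame holds the nine rows listed in the question, limited to four numeric columns for brevity; with the real file, `pd.read_csv("mtcar.csv")` would replace it. The mean absolute deviation is computed explicitly because `DataFrame.mad()` was removed in pandas 2.0.

```python
import pandas as pd

# The nine rows from the question, restricted to a few columns.
df = pd.DataFrame({
    "model": ["Mazda RX4", "Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive",
              "Hornet Sportabout", "Valiant", "Duster 360", "Merc 240D", "Merc 230"],
    "mpg": [21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8],
    "disp": [160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8],
    "hp": [110, 110, 93, 110, 175, 105, 245, 62, 95],
    "wt": [2.62, 2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19, 3.15],
})
numeric = df.select_dtypes(include="number")

summary = numeric.describe()                            # (a) mean, std, min, max, quartiles
non_nan_counts = df.count()                             # (b) non-NaN items per feature
mean_abs_dev = (numeric - numeric.mean()).abs().mean()  # (c) mean absolute deviation
median_mpg = df["mpg"].median()                         # (d) median of one numeric column
mean_mpg = df["mpg"].mean()                             # (e) mean of one numeric column
mode_hp = df["hp"].mode()                               # (f) mode of hp (may hold several values)
skew_disp = df["disp"].skew()                           # (g) skewness of disp
correlations = numeric.corr()                           # (h) pairwise correlation coefficients

print(summary, non_nan_counts, mean_abs_dev, sep="\n\n")
print(median_mpg, mean_mpg, mode_hp.tolist(), skew_disp)
print(correlations)
```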
Output:

(i) Scatter plot a graph between “hp” vs “mpg”


Output:

(j) Plot a density diagram (KDE, kernel density estimation) for “displacement”
Output:

(k) Plot a bar diagram for “CarName” vs “mpg”


Output:

(l) Box plot the “hp” attribute.
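
The four plots in parts (i) to (l) might be produced as in this sketch, which assumes matplotlib and SciPy are installed (pandas needs SciPy for the KDE plot). The question's “CarName” and “displacement” are read here as the model and disp columns of the table, which is an interpretation rather than something the source states.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# The nine rows from the question, restricted to the columns being plotted.
df = pd.DataFrame({
    "model": ["Mazda RX4", "Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive",
              "Hornet Sportabout", "Valiant", "Duster 360", "Merc 240D", "Merc 230"],
    "mpg": [21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8],
    "disp": [160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8],
    "hp": [110, 110, 93, 110, 175, 105, 245, 62, 95],
})

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# (i) Scatter plot of hp against mpg
df.plot.scatter(x="hp", y="mpg", ax=axes[0, 0], title="hp vs mpg")

# (j) Kernel density estimate of disp
df["disp"].plot.kde(ax=axes[0, 1], title="Density of disp")

# (k) Bar diagram of mpg per car model
df.plot.bar(x="model", y="mpg", ax=axes[1, 0], title="mpg per model")

# (l) Box plot of hp
df.boxplot(column="hp", ax=axes[1, 1])

fig.tight_layout()
```

Calling `plt.show()` (with an interactive backend) or `fig.savefig(...)` then displays or stores the figure.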
Output:

Conclusion:
Hence, the concept of plotting of different types of graphs is studied and implemented
successfully.

Evaluation:

Practical-4 Date: 24-01-2023

AIM: Predict CO2 Emission using Simple Linear Regression

You are given the following data.

(a) Scatter plot “EngineSize vs CO2Emissions”

(b) Predict the value of CO2 emission for enginesize=2.4 on the basis of engine size, using the data given in the “CO2Small.csv” file.

(c) Calculate R^2.

(d) Repeat the above calculations on the larger dataset, i.e., the file “CO2Full Data.csv”

(e) Print the final equation of the line

(f) Plot final line (best fit line) along with other data points.

(g) Give your comments by comparing the R^2 of both datasets.
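
Parts (b), (c) and (e) could be sketched with scikit-learn as shown here. The engine sizes and emission values are illustrative numbers only, since the contents of “CO2Small.csv” are not reproduced in this text; the resulting slope, intercept and R^2 therefore say nothing about the lab's actual data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Illustrative values only; these are NOT the contents of "CO2Small.csv".
engine_size = np.array([1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0]).reshape(-1, 1)
co2 = np.array([140.0, 190.0, 215.0, 250.0, 260.0, 300.0, 350.0])

# Fit CO2 = slope * EngineSize + intercept by ordinary least squares
model = LinearRegression().fit(engine_size, co2)
slope = model.coef_[0]
intercept = model.intercept_

# (b) Prediction for enginesize = 2.4
predicted = model.predict(np.array([[2.4]]))[0]

# (c) R^2 on the data used for fitting
r2 = r2_score(co2, model.predict(engine_size))

# (e) Final equation of the line
print(f"CO2 = {slope:.2f} * EngineSize + {intercept:.2f}")
print(f"Prediction at 2.4: {predicted:.2f}, R^2: {r2:.3f}")
```

For part (d) the same steps would be repeated after loading the larger file, and the two R^2 values compared for part (g).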

Conclusion:
Hence, the prediction of CO2 Emission using Simple Linear Regression was implemented
successfully.

Evaluation:

Practical-5 Date: 31-01-2023

AIM: Predict CO2 Emission using Gradient Descent

(a) Scatter plot “EngineSize vs CO2Emissions”

(b) Calculate R^2 value

(c) Print the final equation of the line

(d) Plot final line (best fit line) along with other data points.
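
A minimal sketch of batch gradient descent for the line CO2 = slope * EngineSize + intercept is given below, minimising the mean squared error. The data values, the learning rate of 0.05 and the 20000 iterations are all assumptions chosen so that the sketch converges; they are not taken from the lab's dataset or code.

```python
import numpy as np

# Illustrative values only; the lab's dataset is not reproduced here.
x = np.array([1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0])
y = np.array([140.0, 190.0, 215.0, 250.0, 260.0, 300.0, 350.0])


def gradient_descent(x, y, learning_rate=0.05, n_iterations=20000):
    """Fit y ~ slope * x + intercept by gradient descent on the MSE."""
    slope, intercept = 0.0, 0.0
    n = len(x)
    for _ in range(n_iterations):
        error = y - (slope * x + intercept)
        # Partial derivatives of the MSE with respect to slope and intercept
        grad_slope = (-2.0 / n) * np.sum(x * error)
        grad_intercept = (-2.0 / n) * np.sum(error)
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


slope, intercept = gradient_descent(x, y)

# (b) R^2 = 1 - SS_res / SS_tot
residuals = y - (slope * x + intercept)
r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

# (c) Final equation of the line
print(f"CO2 = {slope:.2f} * EngineSize + {intercept:.2f}, R^2 = {r2:.3f}")
```

On well-conditioned data the result should agree with the closed-form least-squares fit, which makes `np.polyfit(x, y, 1)` a convenient cross-check.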

Conclusion:
Hence, the prediction of CO2 Emission using Gradient Descent was implemented
successfully.

Evaluation:

Practical-6 Date: 07-02-2023
AIM: Predict CO2 Emission using Multilinear Regression.

a. Using the above data build a multi-variable linear regression.

b. What is the accuracy level of your model?

• The accuracy of the model is measured by the R^2 score

c. Which attributes have you used in this model?

• ENGINESIZE
• CYLINDERS
• FUELCONSUMPTION_CITY
• FUELCONSUMPTION_HWY
• FUELCONSUMPTION_COMB
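
A multi-variable regression on the five attributes listed above could be sketched as follows. The lab's CSV is not reproduced in this text, so the rows are generated synthetically, and the target column name CO2EMISSIONS as well as every coefficient used to generate the data are assumptions made purely to obtain a runnable example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): generated only so the sketch runs.
rng = np.random.default_rng(0)
n = 200
engine = rng.uniform(1.0, 6.0, n)
city = 5.0 + 2.0 * engine + rng.normal(0.0, 1.0, n)
hwy = 0.7 * city + rng.normal(0.0, 0.5, n)
df = pd.DataFrame({
    "ENGINESIZE": engine,
    "CYLINDERS": rng.choice([4, 6, 8], n),
    "FUELCONSUMPTION_CITY": city,
    "FUELCONSUMPTION_HWY": hwy,
    "FUELCONSUMPTION_COMB": np.round(0.55 * city + 0.45 * hwy, 1),
})
df["CO2EMISSIONS"] = 23.0 * df["FUELCONSUMPTION_COMB"] + rng.normal(0.0, 5.0, n)

features = ["ENGINESIZE", "CYLINDERS", "FUELCONSUMPTION_CITY",
            "FUELCONSUMPTION_HWY", "FUELCONSUMPTION_COMB"]

# Hold out 20% of the rows to measure R^2 on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["CO2EMISSIONS"], test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))

print(dict(zip(features, model.coef_.round(2))), round(model.intercept_, 2))
print(f"R^2 on the test split: {r2:.3f}")
```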

d. Write your observation /comment about this data and the model.

• The ENGINESIZE and CYLINDERS attributes are positively correlated with CO2 emissions. The fuel consumption attributes should be as well, since burning more fuel emits more CO2; a negative sign on one of them in the fitted model would more likely reflect the strong correlation among the three fuel consumption columns than a negative relationship with emissions.
• The linear regression model assumes a linear relationship between the input attributes and the output variable, which may not hold exactly in this case.
• There may be other factors that influence CO2 emissions that are not captured by the input attributes in this dataset.
• Overall, the model seems to provide a reasonably accurate prediction of CO2 emissions based on the available data.

Conclusion:
Hence, the prediction of CO2 Emission using Multilinear Regression was implemented
successfully.

Evaluation:

Practical-7 Date: 14-02-2023

AIM: Perform logistic regression on the “ChurnData.csv”.

Consider only following features from the file.

[['tenure', 'age', 'address', 'income', 'ed', 'employ', 'equip', 'callcard', 'wireless','churn']]

a. Print confusion matrix

b. Print Classification matrix
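
Both outputs could be produced along these lines. “ChurnData.csv” is not reproduced in this text, so the sketch generates synthetic rows with the column names listed above; the value ranges and the rule linking churn to tenure are assumptions. Part b is read here as scikit-learn's classification report.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for "ChurnData.csv" (assumption): same column names only.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "tenure": rng.integers(1, 73, n),
    "age": rng.integers(18, 70, n),
    "address": rng.integers(0, 40, n),
    "income": rng.uniform(10, 200, n),
    "ed": rng.integers(1, 6, n),
    "employ": rng.integers(0, 40, n),
    "equip": rng.integers(0, 2, n),
    "callcard": rng.integers(0, 2, n),
    "wireless": rng.integers(0, 2, n),
})
# Synthetic label: customers with shorter tenure are more likely to churn.
churn_probability = 1.0 / (1.0 + np.exp((df["tenure"] - 30) / 10.0))
df["churn"] = (rng.uniform(size=n) < churn_probability).astype(int)

X = df.drop(columns="churn")
y = df["churn"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=4
)

# Scale the features, then fit the logistic regression classifier
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

cm = confusion_matrix(y_test, y_pred)          # a. confusion matrix
print(cm)
print(classification_report(y_test, y_pred))   # b. precision, recall, F1, support
```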

Conclusion:
Hence, logistic regression was implemented successfully.

Evaluation:

Practical-8 Date: 21-02-2023

AIM: Linear Discriminant Analysis on the IRIS dataset
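
A short sketch of this experiment is given below, assuming the copy of the Iris dataset bundled with scikit-learn is acceptable. It projects the four features onto two discriminant axes and reports the accuracy on the data used for fitting, which is only a rough indicator because no held-out split is made.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Iris: 150 samples, 4 features, 3 classes
X, y = load_iris(return_X_y=True)

# With 3 classes, LDA can produce at most 2 discriminant components
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)

# Accuracy of LDA used as a classifier on the same data
train_accuracy = lda.score(X, y)

print(X_lda.shape)
print(lda.explained_variance_ratio_)
print(train_accuracy)
```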

Conclusion:
Hence, the Linear Discriminant Analysis on the IRIS dataset was done successfully.

Evaluation:

Practical-9 Date: 28-02-2023

AIM: Using Support Vector Machine build a breast cancer detection system. Use the breast cancer dataset available with the datasets package of sklearn library.

a. Print the accuracy, precision and recall for the model built.
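
One way to obtain the three metrics is sketched here. The RBF kernel, the feature scaling step, the 80/20 split and the random seed are choices of this sketch, not settings confirmed by the lab file.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# In scikit-learn's copy of the dataset, label 0 is malignant and 1 is benign.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# SVMs are sensitive to feature scale, so standardise before fitting
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)  # positive class = label 1 (benign)
recall = recall_score(y_test, y_pred)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

If the malignant class is the one of interest, passing `pos_label=0` to `precision_score` and `recall_score` reports the metrics for it instead.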

Conclusion:
Hence, a breast cancer detection system was built successfully using a Support Vector Machine.

Evaluation:

Practical-10 Date: 07-03-2023

AIM: Using decision tree, build an ML model on the “diabetes” dataset.
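
The lab file does not show which “diabetes” dataset was used. As an assumption, the sketch below uses the diabetes dataset bundled with scikit-learn, whose target is a continuous disease-progression score, and therefore fits a DecisionTreeRegressor. If the lab's file has a yes/no outcome column instead, DecisionTreeClassifier would be used in the same way with classification metrics.

```python
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# scikit-learn's diabetes data: 10 features, continuous target (assumed dataset)
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# A shallow tree (max_depth=3 is an illustrative choice) limits overfitting
tree = DecisionTreeRegressor(max_depth=3, random_state=42)
tree.fit(X_train, y_train)
y_pred = tree.predict(X_test)

print(f"R^2: {r2_score(y_test, y_pred):.3f}")
print(f"Mean absolute error: {mean_absolute_error(y_test, y_pred):.1f}")
```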

Conclusion:
Hence, using a decision tree, an ML model on the “diabetes” dataset was built successfully.

Evaluation:

Practical-11 Date: 14-03-2023

AIM: Classify the “Iris Dataset” using KNN. Print Precision, Recall, F1 and Support.
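
A possible sketch with scikit-learn's bundled Iris data follows. The value k=5, the 70/30 stratified split and the random seed are assumptions of the sketch.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, stratify=iris.target, random_state=42
)

# k = 5 neighbours (illustrative choice)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Per-class precision, recall, F1 and support as arrays
precision, recall, f1, support = precision_recall_fscore_support(y_test, y_pred)

# The same four quantities as a formatted table
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```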

Conclusion:
Hence, classification of the “Iris Dataset” using KNN was implemented successfully.

Evaluation:

Practical-12 Date: 21-03-2023

AIM: You are given the “Mall-Customer” dataset. Cluster the dataset using K-Means Clustering. Show the steps for estimation of the optimum value of k.


First few records of the dataset are given below.

The above code performs the following steps:
1. Loads the Mall-Customer dataset
2. Selects the columns to use for clustering
3. Scales the features using StandardScaler
4. Estimates the optimum value of k using the elbow method and the silhouette score
5. Clusters the dataset using the optimum value of k
6. Visualizes the clusters

The optimum value of k in k-means clustering can be estimated using the following steps:
Elbow method: Plot the within-cluster sum of squares (WCSS) against the number of clusters (k). The
WCSS is the sum of squared distances between each point in a cluster and its centroid. The plot will
have a shape like an elbow, and the optimum value of k will be at the "elbow" or the point where
the rate of decrease in WCSS slows down significantly. This method gives a visual representation of
the best k value for clustering.

Silhouette method: The silhouette score measures how similar a data point is to its own cluster
compared to other clusters. It ranges from -1 to 1, where a score closer to 1 indicates that the point
is well-matched to its own cluster and poorly matched to neighbouring clusters. Compute the
average silhouette score for different values of k and choose the k with the highest average score.
This method is more quantitative than the elbow method and can be used to find a more precise
value of k.
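
The scaling, elbow, silhouette and clustering steps described above could be sketched as follows. The Mall-Customer records are not reproduced in this text, so the sketch uses four synthetic, well-separated blobs as a stand-in for two numeric columns of that file; the blob centres, the range of k from 2 to 8 and the seeds are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for two numeric columns of the Mall-Customer data.
X_raw, _ = make_blobs(
    n_samples=200,
    centers=[(-5, -5), (-5, 5), (5, -5), (5, 5)],
    cluster_std=0.5,
    random_state=0,
)

# Scale the features so both contribute equally to the distances
X = StandardScaler().fit_transform(X_raw)

k_values = list(range(2, 9))
wcss = []          # within-cluster sum of squares, used for the elbow plot
silhouettes = []   # average silhouette score for each k
for k in k_values:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)
    silhouettes.append(silhouette_score(X, km.labels_))

# Choose the k with the highest average silhouette score
best_k = k_values[int(np.argmax(silhouettes))]

# Final clustering with the chosen k
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)

for k, w, s in zip(k_values, wcss, silhouettes):
    print(f"k={k}: WCSS={w:.2f}, silhouette={s:.3f}")
print("Chosen k:", best_k)
```

Plotting `wcss` against `k_values` gives the elbow curve, and a scatter plot of `X` coloured by `labels` visualizes the clusters.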
Conclusion:
Hence, clustering of the dataset using K-Means Clustering was done successfully.
Evaluation: