AI Final PDF

The document discusses various operations that can be performed on matrices and NumPy arrays like addition, subtraction, multiplication, inversion, transpose etc. It also discusses operations like finding minimum, maximum, mean, trace, rank and eigenvalues of matrices. Finally, it discusses performing similar operations on pandas DataFrames and analyzing the Titanic passenger data.

Uploaded by

Mr. G. Naga Chaitanya

WEEK:1,2

1. Write a Program to perform the following operations on matrices

a) Matrix addition
b) Matrix Subtraction
c) Matrix Multiplication
d) Matrix Inversion
e) Transpose of a Matrix

a) Matrix addition
import numpy as np
a=np.array([[1,2,3],[3,4,2],[2,4,3]])
b=np.array([[1,2,3],[3,4,2],[2,4,3]])
print("addition of matrix:\n",np.add(a,b))

OUTPUT:
addition of matrix:
[[2 4 6]
[6 8 4]
[4 8 6]]
b) Matrix Subtraction
print("subtraction of matrix:\n",np.subtract(a,b))

OUTPUT:
subtraction of matrix:
[[0 0 0]
[0 0 0]
[0 0 0]]
c) Matrix Multiplication
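# np.multiply(a,b) gives the element-wise product; np.matmul(a,b) (or a @ b) gives the matrix product
print("multiplication of matrix:\n",np.matmul(a,b))

OUTPUT:
multiplication of matrix:
[[13 22 16]
[19 30 23]
[20 32 23]]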
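
NumPy offers two different products that are easy to confuse. The sketch below reuses the matrices a and b from this exercise and contrasts the element-wise product with the matrix product; the names elementwise and matrix_product are illustrative only.

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 2], [2, 4, 3]])
b = np.array([[1, 2, 3], [3, 4, 2], [2, 4, 3]])

# Element-wise (Hadamard) product: multiplies entries at matching positions
elementwise = np.multiply(a, b)      # same result as a * b

# Matrix product: each row of a is combined with each column of b
matrix_product = np.matmul(a, b)     # same result as a @ b

print("element-wise product:\n", elementwise)
print("matrix product:\n", matrix_product)
```

The two results only coincide in special cases (for example, diagonal matrices), so the choice of function should follow the task statement.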
d) Matrix Inversion
print("inverse of matrix:\n",np.linalg.inv(a))

OUTPUT:
inverse of matrix:
[[ 0.66666667 1. -1.33333333]
[-0.83333333 -0.5 1.16666667]
[ 0.66666667 0. -0.33333333]]
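
A quick way to check an inverse is to multiply it back against the original matrix and compare with the identity. The following is a minimal sketch of that check for the matrix a used above; the names a_inv, identity_check and determinant are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 2], [2, 4, 3]])

# A matrix is invertible only when its determinant is non-zero
determinant = np.linalg.det(a)
print("determinant:", determinant)

a_inv = np.linalg.inv(a)

# a @ a_inv should equal the identity up to floating-point rounding
identity_check = a @ a_inv
print(np.allclose(identity_check, np.eye(3)))
```

np.allclose is used instead of == because the product usually differs from the exact identity by tiny rounding errors.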
e) Transpose of a Matrix
print("transpose of matrix:\n",np.transpose(a))

OUTPUT:
transpose of matrix:
[[1 3 2]
[2 4 4]
[3 2 3]]
2. Write a Program to perform the following operations
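a) Find the minimum and maximum element of the matrix
import numpy as np
a=np.array([[1,2,3],[3,4,2],[2,4,3]])
print("maximum value of matrix:\n",np.max(a))
print("minimum value of matrix:\n",np.min(a))

OUTPUT:
maximum value of matrix:
4
minimum value of matrix:
1

b) Find the minimum and maximum element of each row in the matrix
# axis=1 reduces along the columns, giving one value per row
print("minimum value of each row:\n",np.min(a,axis=1))
print("maximum value of each row:\n",np.max(a,axis=1))

OUTPUT:
minimum value of each row:
[1 2 2]
maximum value of each row:
[3 4 4]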

c) Find the minimum and maximum element of each column in the matrix
import numpy as np

A=np.array([[2,2,9],[2,5,8],[1,4,7]])
B=np.array([[9,5,1],[7,5,3],[8,9,6]])

#TO CALCULATE MAXIMUM ELEMENT IN COLUMN WISE OF A MATRIX

print(np.max(A,axis=0))

#TO CALCULATE MINIMUM ELEMENT IN COLUMN WISE OF A MATRIX

print(np.min(A,axis=0))

OUTPUT:
[2 5 9]
[1 2 7]
d) Find trace of the given matrix
import numpy as np

A=np.array([[2,2,9],[2,5,8],[1,4,7]])
B=np.array([[9,5,1],[7,5,3],[8,9,6]])

#TO CALCULATE TRACE OF A MATRIX

print(np.trace(A))

OUTPUT:
14

e) Find rank of the given matrix

import numpy as np

A=np.array([[2,2,9],[2,5,8],[1,4,7]])
B=np.array([[9,5,1],[7,5,3],[8,9,6]])

#TO CALCULATE RANK OF A MATRIX

print(np.linalg.matrix_rank(A))
OUTPUT:
3

f) Find eigenvalues and eigenvectors of the given matrix

import numpy as np

# Define the matrices A and B

A = np.array([[2, 2, 9], [2, 5, 8], [1, 4, 7]])
B = np.array([[9, 5, 1], [7, 5, 3], [8, 9, 6]])

# eigenvalues and eigenvectors for matrix A

eigenvalues_A, eigenvectors_A = np.linalg.eig(A)

print(eigenvalues_A)

OUTPUT:
[13.05054773+0.j 0.47472613+1.17633455j 0.47472613-1.17633455j]

print(eigenvectors_A)

OUTPUT:
[[ 0.54476501+0.j 0.89728078+0.j 0.89728078-0.j ]
[ 0.65530983+0.j -0.05423648-0.36470464j -0.05423648+0.36470464j]
[ 0.52325913+0.j -0.140014 +0.19832352j -0.140014 -0.19832352j]]
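
The result of np.linalg.eig can be checked against the defining relation A v = λ v, where column i of the eigenvector array pairs with eigenvalue i. A minimal sketch of that check for the matrix A above follows; the names lhs, rhs and trace_matches are illustrative.

```python
import numpy as np

A = np.array([[2, 2, 9], [2, 5, 8], [1, 4, 7]])
eigenvalues, eigenvectors = np.linalg.eig(A)

# Column i of `eigenvectors` belongs to eigenvalues[i]
for i in range(len(eigenvalues)):
    lhs = A @ eigenvectors[:, i]                 # A v
    rhs = eigenvalues[i] * eigenvectors[:, i]    # lambda v
    print(i, np.allclose(lhs, rhs))

# The eigenvalues of a matrix always sum to its trace
trace_matches = np.isclose(eigenvalues.sum(), np.trace(A))
print(trace_matches)
```

For a real matrix, complex eigenvalues come in conjugate pairs, which is why their imaginary parts cancel in the sum.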
WEEK:3
3. Write a program for the following.
a. To generate an array of random numbers from a normal distribution for the
array of a given shape.
import numpy as np
import pandas as pd
# a. to generate an array of random numbers from a normal distribution
random_array =np.random.normal(size=(3,3))
print(random_array)

OUTPUT:
[[-0.72882502 -0.96261731 0.75660861]
[ 0.26895935 0.33844856 -0.12340368]
[ 0.27789698 1.16782929 0.73894252]]

b. Implement Arithmetic operations on two arrays (perform broadcasting also.)

array1=np.random.randint(1,8,size=(2,3))
print(array1)
array2=np.random.randint(1,4,size=(2,1))
print(array2)
ADD = array1 + array2
print(ADD)
MUL = array1*array2
print(MUL)

OUTPUT:
[[2 2 3]
[3 5 5]]
[[3]
[1]]
[[5 5 6]
[4 6 6]]
[[6 6 9]
[3 5 5]]
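
Broadcasting is what allows a (2,3) array to be combined with a (2,1) array: the smaller array is virtually stretched along the axis of length 1. The sketch below makes that stretch explicit with the arrays from the output above (fixed here instead of random so the result is repeatable); the names stretched and row are illustrative.

```python
import numpy as np

array1 = np.array([[2, 2, 3], [3, 5, 5]])   # shape (2, 3)
array2 = np.array([[3], [1]])               # shape (2, 1)

# Show what broadcasting does implicitly: the single column is repeated 3 times
stretched = np.broadcast_to(array2, array1.shape)
print(stretched)

# The implicit and explicit versions give the same sum
print(np.array_equal(array1 + array2, array1 + stretched))

# A 1-D array of length 3 is broadcast across the rows instead
row = np.array([10, 20, 30])
print(array1 + row)
```

Shapes are compared from the last axis backwards, and two axes are compatible when they are equal or one of them is 1.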
c. Find minimum, maximum, mean in a given array. ( in both the axes )
# find min max mean of an array
import numpy as np
array1=np.random.randint(1,8,size=(3,3))
print(array1)
# axis 0: minimum of each column
MIN=(np.min(array1,0))
# axis 1: maximum of each row
MAX=(np.max(array1,axis=1))
print(MIN)
print(MAX)

OUTPUT:
[[1 5 2]
[3 1 6]
[6 5 2]]
[1 1 2]
[5 6 6]
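
The task asks for minimum, maximum and mean along both axes. A minimal sketch covering all six combinations is given below, using the array from the output above as fixed data; the names col_mean and row_mean are illustrative.

```python
import numpy as np

array1 = np.array([[1, 5, 2], [3, 1, 6], [6, 5, 2]])

# axis=0 collapses the rows (one result per column),
# axis=1 collapses the columns (one result per row)
for axis, label in [(0, "column-wise"), (1, "row-wise")]:
    print(label, "min :", np.min(array1, axis=axis))
    print(label, "max :", np.max(array1, axis=axis))
    print(label, "mean:", np.mean(array1, axis=axis))

col_mean = np.mean(array1, axis=0)
row_mean = np.mean(array1, axis=1)
```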

d. Implement np.arange and np.linspace functions.

#implement np.arange and np.linspace functions
arange_array=np.arange(0,10,2)
linspace_array=np.linspace(0,10,2)
print("\n np.arange array:")
print(arange_array)
print("\n np.linspace array:")
print(linspace_array)

OUTPUT:
np.arange array:
[0 2 4 6 8]
np.linspace array:
[ 0. 10.]
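
The third argument means different things in the two functions: for np.arange it is the step size, for np.linspace it is the number of points. The sketch below illustrates the difference; the variable names are illustrative.

```python
import numpy as np

# arange: start, stop (excluded), step
stepped = np.arange(0, 10, 2)

# linspace: start, stop (included by default), number of points
spaced = np.linspace(0, 10, 5)

# endpoint=False leaves the stop value out, like arange does
spaced_open = np.linspace(0, 10, 5, endpoint=False)

print(stepped)
print(spaced)
print(spaced_open)
```

This is why np.linspace(0,10,2) in the program above returns only the two end points.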
e. Create a pandas series from a given list.
list_data=[1,2,6,5]
pandas_series=pd.Series(list_data)
print("\n pandas series")
print(pandas_series)

OUTPUT:
pandas series
0 1
1 2
2 6
3 5
dtype: int64

f. Create pandas series with data and index and display the index values.
import pandas as pd
data_with_index={'a':30,'b':90,'c':46}
pandas_series=pd.Series(data_with_index)
print("\n pandas series")
print(pandas_series)

OUTPUT:

pandas series
a 30
b 90
c 46
dtype: int64
g. Create a data frame with columns and at least 5 observations
i. Select a particular column from the DataFrame
ii. Summarize the data frame and observe the stats of the DataFrame created
iii. Observe the mean and standard deviation of the data frame and print the values.
data={'Column1':[1,2,3,4,5],
'Column2':[10,20,30,40,50],
'Column3':[5.5,6.5,7.1,8.9,3.1],
'Column4':['A','B','C','D','E'],
'Column5':[True,False,True,False,True]}
data_frame=pd.DataFrame(data)
print("\ng. DataFrames with 5 columns:")
print(data_frame)
OUTPUT:
g. DataFrames with 5 columns:
Column1 Column2 Column3 Column4 Column5
0 1 10 5.5 A True
1 2 20 6.5 B False
2 3 30 7.1 C True
3 4 40 8.9 D False
4 5 50 3.1 E True
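
Sub-tasks i to iii can be sketched as follows on the same data. This is one possible way to do it, not necessarily the intended solution; the names column3, summary, numeric, means and stds are illustrative, and the numeric columns are picked by name to keep the result unambiguous.

```python
import pandas as pd

data = {'Column1': [1, 2, 3, 4, 5],
        'Column2': [10, 20, 30, 40, 50],
        'Column3': [5.5, 6.5, 7.1, 8.9, 3.1],
        'Column4': ['A', 'B', 'C', 'D', 'E'],
        'Column5': [True, False, True, False, True]}
data_frame = pd.DataFrame(data)

# i. select a particular column (returns a Series)
column3 = data_frame['Column3']
print(column3)

# ii. summary statistics (count, mean, std, min, quartiles, max)
summary = data_frame.describe()
print(summary)

# iii. mean and standard deviation of the numeric columns
numeric = data_frame[['Column1', 'Column2', 'Column3']]
means = numeric.mean()
stds = numeric.std()      # sample standard deviation (ddof=1)
print(means)
print(stds)
```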
WEEK:4
7. Write a Program to determine the following in the Titanic Survival data.

import numpy as np
import pandas as pd
df=pd.read_csv("titanic_data.csv")
print(df.head())

A. Determine the data type of each column.

print("A. Data Types:")
print(df.dtypes)

OUTPUT:
A. Data Types:
PassengerId int64
Survived int64
Pclass int64
Name object
Sex object
Age float64
SibSp int64
Parch int64
Ticket object
Fare float64
Cabin object
Embarked object
dtype: object
b. Find the number of non-null values in each column.
non_null_counts = df.notnull().sum()
print(non_null_counts)
OUTPUT:
PassengerId 891
Survived 891
Pclass 891
Name 891
Sex 891
Age 714
SibSp 891
Parch 891
Ticket 891
Fare 891
Cabin 204
Embarked 889
dtype: int64

c. Find out the unique values in each categorical column and frequency of each
unique value.
categorical_columns = df.select_dtypes(include='object').columns
# nunique and unique are methods and must be called with parentheses;
# without them pandas prints the bound method (and the whole frame) instead of the result
print(df[categorical_columns].nunique())

OUTPUT:
Name 891
Sex 2
Ticket 681
Cabin 147
Embarked 3
dtype: int64

print(df.Sex.unique())

OUTPUT:
['male' 'female']

print(df.Pclass.unique())

OUTPUT:
[3 1 2]
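
The frequency part of the task can be handled with value_counts on each categorical column. The sketch below uses a small hypothetical stand-in frame (sample) rather than the Titanic file so that it runs on its own; exclude='number' is used here as an assumption so the column selection also works when pandas stores text in a dedicated string dtype.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic frame
sample = pd.DataFrame({
    'Sex': ['male', 'female', 'female', 'male', 'male'],
    'Embarked': ['S', 'C', 'S', 'S', 'Q'],
    'Age': [22.0, 38.0, 26.0, 35.0, 54.0],
})

# Every non-numeric column is treated as categorical
categorical_columns = sample.select_dtypes(exclude='number').columns

frequencies = {}
for column in categorical_columns:
    print(column, "unique values:", sample[column].unique())
    # value_counts returns each unique value with its frequency
    frequencies[column] = sample[column].value_counts()
    print(frequencies[column])
```

With the real data, replacing sample by df gives the unique values and their counts for Name, Sex, Ticket, Cabin and Embarked.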

d. Find the number of rows where age is greater than the mean age of data.

age_mean = df['Age'].mean()
print(age_mean)
print(np.sum(df['Age']>age_mean))

OUTPUT:
29.69911764705882
330

e. Delete all the rows with missing values.

titanic_data_cleaned = df.dropna()
print(titanic_data_cleaned)
OUTPUT:
PassengerId Survived Pclass ... Fare Cabin Embarked
1 2 1 1 ... 71.2833 C85 C
3 4 1 1 ... 53.1000 C123 S
6 7 0 1 ... 51.8625 E46 S
10 11 1 3 ... 16.7000 G6 S
11 12 1 1 ... 26.5500 C103 S
.. ... ... ... ... ... ... ...
871 872 1 1 ... 52.5542 D35 S
872 873 0 1 ... 5.0000 B51 B53 B55 S
879 880 1 1 ... 83.1583 C50 C
887 888 1 1 ... 30.0000 B42 S
889 890 1 1 ... 30.0000 C148 C

[183 rows x 12 columns]

WEEK:5
8. Perform Data Analysis on the Titanic Data Set to answer the following.

import numpy as np
import pandas as pd
df=pd.read_csv("titanic_data.csv")
print(df.head())

A. Information regarding each column of the data

# df.info() prints its report directly, so it does not need to be wrapped in print()
df.info()

OUTPUT:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PassengerId 891 non-null int64
1 Survived 891 non-null int64
2 Pclass 891 non-null int64
3 Name 891 non-null object
4 Sex 891 non-null object
5 Age 714 non-null float64
6 SibSp 891 non-null int64
7 Parch 891 non-null int64
8 Ticket 891 non-null object
9 Fare 891 non-null float64
10 Cabin 204 non-null object
11 Embarked 889 non-null object
dtypes: float64(2), int64(5), object(5)

B. Impact of each column on the label

print(df.corr(method='pearson', min_periods=1, numeric_only=True))

OUTPUT:
PassengerId Survived Pclass ... SibSp Parch Fare
PassengerId 1.000000 -0.005007 -0.035144 ... -0.057527 -0.001652 0.012658
Survived -0.005007 1.000000 -0.338481 ... -0.035322 0.081629 0.257307
Pclass -0.035144 -0.338481 1.000000 ... 0.083081 0.018443 -0.549500
Age 0.036847 -0.077221 -0.369226 ... -0.308247 -0.189119 0.096067
SibSp -0.057527 -0.035322 0.083081 ... 1.000000 0.414838 0.159651
Parch -0.001652 0.081629 0.018443 ... 0.414838 1.000000 0.216225
Fare 0.012658 0.257307 -0.549500 ... 0.159651 0.216225 1.000000

[7 rows x 7 columns]
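
To read the impact on the label more directly, the Survived column of the correlation matrix can be extracted and ranked by absolute value. The sketch below does this on a small hypothetical frame (sample) so that it runs without the CSV file; the names label_corr and ranked are illustrative.

```python
import pandas as pd

# Hypothetical stand-in rows; the real frame is read from the Titanic CSV
sample = pd.DataFrame({
    'Survived': [0, 1, 1, 1, 0, 0],
    'Pclass':   [3, 1, 3, 1, 3, 3],
    'Fare':     [7.25, 71.28, 7.92, 53.10, 8.05, 8.46],
})

# Keep only the correlations with the label and drop its self-correlation
label_corr = sample.corr()['Survived'].drop('Survived')

# Order the features by strength of correlation, regardless of sign
ranked = label_corr.reindex(label_corr.abs().sort_values(ascending=False).index)
print(ranked)
```

Pearson correlation only captures linear association, so a value near zero does not prove that a column has no influence on survival.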

C. Number of survivals in each gender

print(df.Survived.value_counts())

OUTPUT:
Survived
0 549
1 342
Name: count, dtype: int64

print(df.Sex.value_counts())

OUTPUT:
Sex
male 577
female 314
Name: count, dtype: int64

gender_survival = df.groupby('Sex')['Survived'].sum()
print(gender_survival)

OUTPUT:
Sex
female 233
male 109
Name: Survived, dtype: int64

D. Number of survivals in each passenger class

print(df.Pclass.value_counts())

OUTPUT:
Pclass
3 491
1 216
2 184
Name: count, dtype: int64

print(df.Survived.value_counts())

OUTPUT:
Survived
0 549
1 342
Name: count, dtype: int64

# Group by 'Pclass' and count the number of survivors
class_survival = df.groupby('Pclass')['Survived'].sum()
print(class_survival)

OUTPUT:
Pclass
1 136
2 87
3 119
Name: Survived, dtype: int64
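
Raw survivor counts are hard to compare because the classes differ in size. A short worked calculation using the counts printed above turns them into survival rates; the Series names passengers, survivors and survival_rate are illustrative.

```python
import pandas as pd

# Counts taken from the value_counts and groupby outputs above
passengers = pd.Series({1: 216, 2: 184, 3: 491}, name='passengers')
survivors = pd.Series({1: 136, 2: 87, 3: 119}, name='survivors')

# Share of each class that survived
survival_rate = (survivors / passengers).round(4)
print(survival_rate)
```

On the full frame the same rates should come from df.groupby('Pclass')['Survived'].mean(), since Survived is coded as 0 or 1.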

E. The number of people who are not alone.

# Create a new column 'not_Alone' indicating whether the passenger has any family aboard
df['not_Alone'] = (df['SibSp'] + df['Parch']) > 0
print(df.SibSp.value_counts())

OUTPUT:
SibSp
0 608
1 209
2 28
4 18
3 16
8 7
5 5
Name: count, dtype: int64

print(df.Parch.value_counts())

OUTPUT:
Parch
0 678
1 118
2 80
5 5
3 5
4 4
6 1
Name: count, dtype: int64

print((df['SibSp'] + df['Parch']).value_counts())

OUTPUT:
0 537
1 161
2 102
3 29
5 22
4 15
6 12
10 7
7 6
Name: count, dtype: int64

# Count the number of people who are not alone

not_alone_count = df['not_Alone'].sum()
print(not_alone_count)

OUTPUT:
354
WEEK:6
9. Perform Data Analysis on the California House Price data to answer the
following

import pandas as pd
df=pd.read_csv("housing.csv")
print(df)

OUTPUT:
longitude latitude ... median_house_value ocean_proximity
0 -122.23 37.88 ... 452600.0 NEAR BAY
1 -122.22 37.86 ... 358500.0 NEAR BAY
2 -122.24 37.85 ... 352100.0 NEAR BAY
3 -122.25 37.85 ... 341300.0 NEAR BAY
4 -122.25 37.85 ... 342200.0 NEAR BAY
... ... ... ... ... ...
20635 -121.09 39.48 ... 78100.0 INLAND
20636 -121.21 39.49 ... 77100.0 INLAND
20637 -121.22 39.43 ... 92300.0 INLAND
20638 -121.32 39.43 ... 84700.0 INLAND
20639 -121.24 39.37 ... 89400.0 INLAND

[20640 rows x 10 columns]

A. Data Type of each column and info regarding each column.

print(df.dtypes)
# df.info() prints its report directly, so it does not need to be wrapped in print()
df.info()

OUTPUT:
longitude float64
latitude float64
housing_median_age float64
total_rooms float64
total_bedrooms float64
population float64
households float64
median_income float64
median_house_value float64
ocean_proximity object
dtype: object
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20640 entries, 0 to 20639
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 longitude 20640 non-null float64
1 latitude 20640 non-null float64
2 housing_median_age 20640 non-null float64
3 total_rooms 20640 non-null float64
4 total_bedrooms 20433 non-null float64
5 population 20640 non-null float64
6 households 20640 non-null float64
7 median_income 20640 non-null float64
8 median_house_value 20640 non-null float64
9 ocean_proximity 20640 non-null object
dtypes: float64(9), object(1)
b. The average age of a house in the data set.
# mean is a method and must be called with parentheses;
# without them pandas prints the bound method and the whole column instead of the average
average_age_house=df['housing_median_age'].mean()
print(round(average_age_house,2))

OUTPUT:
28.64

c. Determine the top 10 localities with the highest difference between income and
house value. Also, the top 10 localities that have the lowest difference.
df['New_column']=df['median_house_value']-df['median_income']
print(df['New_column'])

OUTPUT:
4861 500000.5001
6688 500000.5001
16642 500000.2975
15661 500000.1457
15652 500000.1000
...
5887 17497.6333
19802 14998.4640
2521 14997.3393
2799 14996.9000
9188 14994.8068
Name: New_column, Length: 20640, dtype: float64

df.sort_values(by='New_column',ascending=False,inplace=True)
print(df.head(10))

OUTPUT:
4861 -118.28 34.02 ... <1H OCEAN 500000.5001
6688 -118.08 34.15 ... INLAND 500000.5001
16642 -120.67 35.30 ... NEAR OCEAN 500000.2975
15661 -122.42 37.78 ... NEAR BAY 500000.1457
15652 -122.41 37.80 ... NEAR BAY 500000.1000
6639 -118.15 34.15 ... <1H OCEAN 499999.8333
459 -122.25 37.87 ... NEAR BAY 499999.8304
89 -122.27 37.80 ... NEAR BAY 499999.7566
10448 -117.67 33.47 ... <1H OCEAN 499999.3607
17819 -121.90 37.39 ... <1H OCEAN 499999.2639
[10 rows x 11 columns]

print(df.tail(10))

OUTPUT:
2779 -114.65 32.79 ... INLAND 24999.1429
16186 -121.29 37.95 ... INLAND 22499.2083
14326 -117.16 32.71 ... NEAR OCEAN 22498.9082
1825 -122.32 37.93 ... NEAR BAY 22497.3250
13889 -116.57 35.43 ... INLAND 22497.2862
5887 -118.33 34.15 ... <1H OCEAN 17497.6333
19802 -123.17 40.31 ... INLAND 14998.4640
2521 -122.74 39.71 ... INLAND 14997.3393
2799 -117.02 36.40 ... INLAND 14996.9000
9188 -117.86 34.24 ... INLAND 14994.8068
[10 rows x 11 columns]
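
An alternative sketch for the same task is shown below. It rests on the assumption that median_income in this dataset is recorded in tens of thousands of dollars, so it is converted to dollars before subtracting, and it uses nlargest and nsmallest instead of sorting the whole frame. The frame sample and the columns income_dollars and difference are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in rows with the same two columns as housing.csv
sample = pd.DataFrame({
    'median_income': [8.3252, 1.7000, 2.3886, 15.0001],
    'median_house_value': [452600.0, 92300.0, 89400.0, 500001.0],
})

# Assumption: income is stored in units of 10,000 dollars
sample['income_dollars'] = sample['median_income'] * 10_000
sample['difference'] = sample['median_house_value'] - sample['income_dollars']

# Rows with the largest and the smallest difference (2 here, 10 on the full data)
top = sample.nlargest(2, 'difference')
bottom = sample.nsmallest(2, 'difference')
print(top)
print(bottom)
```

Without the unit conversion the income term is negligible next to the house value, which is why the differences listed above are almost identical to the house values themselves.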
d. What is the ratio of bedrooms to total rooms in the data

A=df['total_bedrooms'].sum()
print(A)

OUTPUT:
10990309.0

B=df['total_rooms'].sum()
print(B)

OUTPUT:
54402150.0

ratio=A/B
print(ratio)
OUTPUT:
0.2020197547339581
e. Determine the average price of a house for each type of ocean_proximity.
Avg_house_price=df.groupby('ocean_proximity')['median_house_value'].mean()
print(Avg_house_price)

OUTPUT:
ocean_proximity
<1H OCEAN 240084.285464
INLAND 124805.392001
ISLAND 380440.000000
NEAR BAY 259212.311790
NEAR OCEAN 249433.977427
Name: median_house_value, dtype: float64
WEEK:7
10. Write a program to perform the following tasks
a. Determine the outliers in each non-categorical column of Titanic Data and
remove them.
numerical_columns=df.select_dtypes(exclude='object').columns
print(numerical_columns)
for column in numerical_columns:
    q1=df[column].quantile(0.25)
    q3=df[column].quantile(0.75)
    iqr = q3-q1
    low_b=q1-1.5*iqr
    up_b=q3+1.5*iqr
    # df1 is rebuilt from df on every pass, so after the loop it holds
    # only the filter for the last column (Fare)
    df1=df[(df[column] >= low_b) & (df[column] <= up_b)]
print(df1)

OUTPUT:
Index(['PassengerId', 'Survived', 'Pclass', 'Age', 'SibSp', 'Parch', 'Fare'],
dtype='object')
PassengerId Survived Pclass ... Fare Cabin Embarked
0 1 0 3 ... 7.2500 NaN S
2 3 1 3 ... 7.9250 NaN S
3 4 1 1 ... 53.1000 C123 S
4 5 0 3 ... 8.0500 NaN S
5 6 0 3 ... 8.4583 NaN Q
.. ... ... ... ... ... ... ...
886 887 0 2 ... 13.0000 NaN S
887 888 1 1 ... 30.0000 B42 S
888 889 0 3 ... 23.4500 NaN S
889 890 1 1 ... 30.0000 C148 C
890 891 0 3 ... 7.7500 NaN Q
[775 rows x 12 columns]
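
To remove outliers across all numeric columns at once, the per-column filters can be combined into one boolean mask. The following is a minimal sketch under that assumption, run on a small hypothetical frame (sample); the names keep and cleaned are illustrative, and missing values are deliberately kept rather than treated as outliers.

```python
import pandas as pd

# Hypothetical stand-in data with one obvious outlier per column
sample = pd.DataFrame({
    'Age':  [22.0, 38.0, 26.0, 35.0, 30.0, 28.0, 80.0],
    'Fare': [7.25, 8.05, 7.92, 8.46, 7.75, 512.33, 8.10],
})

# Start by keeping every row, then narrow the mask column by column
keep = pd.Series(True, index=sample.index)
for column in sample.columns:
    q1 = sample[column].quantile(0.25)
    q3 = sample[column].quantile(0.75)
    iqr = q3 - q1
    low_b = q1 - 1.5 * iqr
    up_b = q3 + 1.5 * iqr
    # between() is inclusive on both ends; NaN rows stay in the data
    keep &= sample[column].between(low_b, up_b) | sample[column].isna()

cleaned = sample[keep]
print(cleaned)
```

Applying the same loop to identifier or label columns such as PassengerId or Survived is rarely meaningful, so restricting it to genuine measurements is usually the better choice.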

b. Determine missing values in each column of Titanic data. If missing values
account for 30% of data, then remove the column.
th_p=30
missing_p= df.isnull().mean()*100
print(missing_p)
remove_c= missing_p[missing_p >= th_p].index
df2= df.drop(columns=remove_c)
print(df2)

OUTPUT:
PassengerId 0.000000
Survived 0.000000
Pclass 0.000000
Name 0.000000
Sex 0.000000
Age 19.865320
SibSp 0.000000
Parch 0.000000
Ticket 0.000000
Fare 0.000000
Cabin 77.104377
Embarked 0.224467
dtype: float64
PassengerId Survived Pclass ... Ticket Fare Embarked
0 1 0 3 ... A/5 21171 7.2500 S
1 2 1 1 ... PC 17599 71.2833 C
2 3 1 3 ... STON/O2. 3101282 7.9250 S
3 4 1 1 ... 113803 53.1000 S
4 5 0 3 ... 373450 8.0500 S
.. ... ... ... ... ... ... ...
886 887 0 2 ... 211536 13.0000 S
887 888 1 1 ... 112053 30.0000 S
888 889 0 3 ... W./C. 6607 23.4500 S
889 890 1 1 ... 111369 30.0000 C
890 891 0 3 ... 370376 7.7500 Q

[891 rows x 11 columns]

c. If missing values are less than 30% of entire data then create a new data frame
1. Missing values in numeric columns are filled with the mean of the
corresponding column.
2. Missing values in categorical columns are filled with the most frequently
occurring value.

num = df.select_dtypes(exclude='object').columns
cat = df.select_dtypes(include='object').columns
print("\n",num)
print("\n",cat)

for column in num:
    # assign the result back: fillna(inplace=True) on a selected column warns in
    # pandas 2.x and does not update df under copy-on-write
    df[column]=df[column].fillna(df[column].mean())
for column in cat:
    df[column]=df[column].fillna(df[column].mode()[0])
print(df.head())

OUTPUT:
Index(['PassengerId', 'Survived', 'Pclass', 'Age', 'SibSp', 'Parch', 'Fare'],
dtype='object')

Index(['Name', 'Sex', 'Ticket', 'Cabin', 'Embarked'], dtype='object')

PassengerId Survived Pclass ... Fare Cabin Embarked
0 1 0 3 ... 7.2500 B96 B98 S
1 2 1 1 ... 71.2833 C85 C
2 3 1 3 ... 7.9250 B96 B98 S
3 4 1 1 ... 53.1000 C123 S
4 5 0 3 ... 8.0500 B96 B98 S

[5 rows x 12 columns]
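
The two filling rules can be seen more easily on a tiny example. The sketch below applies them to a hypothetical four-row frame (sample) and writes the result to a new frame (filled), in line with the task's wording about creating a new data frame; all names are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in with one missing value per column
sample = pd.DataFrame({
    'Age': [22.0, np.nan, 26.0, 36.0],
    'Embarked': ['S', 'C', None, 'S'],
})

# Work on a copy so the original frame keeps its missing values
filled = sample.copy()
for column in filled.columns:
    if pd.api.types.is_numeric_dtype(filled[column]):
        # numeric column: replace NaN with the column mean
        filled[column] = filled[column].fillna(filled[column].mean())
    else:
        # categorical column: replace missing entries with the most frequent value
        filled[column] = filled[column].fillna(filled[column].mode()[0])

print(filled)
```

When several values are tied for most frequent, mode() returns them sorted, so index 0 picks the first in sort order.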
WEEK:9
11. Write a program to perform the following tasks

a. Determine the categorical columns in Titanic Dataset. Convert Columns
with string data type to numerical data using encoding techniques.

b. Convert data in each numerical column so that it lies in the range [0,1]

import pandas as pd
titanic_dataset=pd.read_csv("titanic_dataset.csv")
df=titanic_dataset
print(df)

OUTPUT:
PassengerId Survived Pclass ... Fare Cabin Embarked
0 1 0 3 ... 7.2500 NaN S
1 2 1 1 ... 71.2833 C85 C
………………………………………………………………………………………………………………
888 889 0 3 ... 23.4500 NaN S
889 890 1 1 ... 30.0000 C148 C
890 891 0 3 ... 7.7500 NaN Q

[891 rows x 12 columns]

a. Determine the categorical columns in Titanic Dataset. Convert Columns
with string data type to numerical data using encoding techniques.
categorical_columns=df.select_dtypes(include='object').columns

print(categorical_columns)

print("\ncategorical columns:",categorical_columns)

df=pd.get_dummies(df,columns=categorical_columns)
print(df.head())

OUTPUT:
Index(['Name', 'Sex', 'Ticket', 'Cabin', 'Embarked'], dtype='object')

categorical columns: Index(['Name', 'Sex', 'Ticket', 'Cabin', 'Embarked'], dtype='object')
PassengerId Survived Pclass ... Embarked_C Embarked_Q Embarked_S
0 1 0 3 ... 0 0 1
1 2 1 1 ... 1 0 0
2 3 1 3 ... 0 0 1
3 4 1 1 ... 0 0 1
4 5 0 3 ... 0 0 1

[5 rows x 1731 columns]

b. Convert data in each numerical column so that it lies in the range [0,1]

from sklearn.preprocessing import MinMaxScaler

numerical_columns=df.select_dtypes(exclude='object').columns
scaler=MinMaxScaler()
df[numerical_columns]=scaler.fit_transform(df[numerical_columns])
print(df.head())

OUTPUT:
PassengerId Survived Pclass ... Embarked_C Embarked_Q Embarked_S
0 0.000000 0.0 1.0 ... 0.0 0.0 1.0
1 0.001124 1.0 0.0 ... 1.0 0.0 0.0
2 0.002247 1.0 1.0 ... 0.0 0.0 1.0
3 0.003371 1.0 0.0 ... 0.0 0.0 1.0
4 0.004494 0.0 1.0 ... 0.0 0.0 1.0

[5 rows x 1731 columns]
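
With its default range of 0 to 1, MinMaxScaler rescales every column as (x - min) / (max - min). The sketch below applies that formula by hand to the PassengerId values 1 to 891 and should reproduce the first column of the output above; the names passenger_id and scaled_id are illustrative.

```python
import numpy as np

# PassengerId runs from 1 to 891 in the Titanic data
passenger_id = np.arange(1, 892)

# Min-max scaling: subtract the minimum, divide by the range
scaled_id = (passenger_id - passenger_id.min()) / (passenger_id.max() - passenger_id.min())

print(np.round(scaled_id[:5], 6))
print(scaled_id.min(), scaled_id.max())
```

The smallest value always maps to 0 and the largest to 1, so a single extreme value compresses all the others towards one end of the range.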

WEEK:10
12. Implement the following models on Titanic Dataset and determine
the values of accuracy, precision, recall, f1 score and confusion matrix
for the test data.

a. Logistic Regression

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score,precision_score,recall_score,f1_score,confusion_matrix

titanic_df = pd.read_csv("titanic_dataset.csv")

print(titanic_df)
print(titanic_df.info())

# fill missing ages with the median; assign back instead of using inplace on a selected column
titanic_df['Age']=titanic_df['Age'].fillna(titanic_df['Age'].median())
titanic_df.drop(['PassengerId','Name','Ticket','Cabin','Embarked','Sex'],
                axis=1, inplace=True)

print(titanic_df)

X = titanic_df.drop('Survived',axis=1)
y = titanic_df['Survived']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 2003)

#Training the logistic regression model
log_model = LogisticRegression()
log_model.fit(X_train, y_train)

#Making predictions on the test set
y_pred = log_model.predict(X_test)

#Calculating evaluation metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

print("Accuracy:",accuracy)
print("Precision:",precision)
print("recall:",recall)
print("F1 Score:",f1)
print("Confusion Matrix:")
print(conf_matrix)

OUTPUT:
PassengerId Survived Pclass ... Fare Cabin Embarked
0 1 0 3 ... 7.2500 NaN S
…………………………………………
………………………………………….
888 889 0 3 ... 23.4500 NaN S
889 890 1 1 ... 30.0000 C148 C
890 891 0 3 ... 7.7500 NaN Q

[891 rows x 12 columns]

<class 'pandas.core.frame.DataFrame'>
…………………………………..
……………………………………
888 0 3 28.0 1 2 23.4500
889 1 1 26.0 0 0 30.0000
890 0 3 32.0 0 0 7.7500

[891 rows x 6 columns]

Accuracy: 0.7374301675977654
Precision: 0.7906976744186046
recall: 0.4722222222222222
F1 Score: 0.591304347826087
Confusion Matrix:
[[98 9]
[38 34]]
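
All four scores can be recomputed from the confusion matrix alone. The sketch below does this by hand for the matrix printed above, relying on scikit-learn's layout in which rows are the true classes and columns the predicted classes; the variable names are illustrative.

```python
import numpy as np

# Confusion matrix from the output above: [[TN, FP], [FN, TP]]
conf_matrix = np.array([[98, 9], [38, 34]])
tn, fp, fn, tp = conf_matrix.ravel()

accuracy = (tp + tn) / conf_matrix.sum()      # correct predictions / all predictions
precision = tp / (tp + fp)                    # predicted survivors that really survived
recall = tp / (tp + fn)                       # actual survivors that were found
f1 = 2 * precision * recall / (precision + recall)

print("Accuracy:", accuracy)
print("Precision:", precision)
print("recall:", recall)
print("F1 Score:", f1)
```

The low recall reflects the 38 survivors that the model predicted as non-survivors.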
WEEK:11
13. Implement the following models on the California House Pricing
Dataset and determine the values of R2 score, the area under roc curve
and root mean squared error for the test set.
a. Linear Regression with Polynomial Features
b. Random Forest Regressor

import pandas as pd
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures,StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score,mean_squared_error

california_housing=fetch_california_housing()
x=california_housing.data
y=california_housing.target

print(x)
print(y)
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=42)
print(x_train,x_test,y_train,y_test)

scaler=StandardScaler()
x_train_scaled=scaler.fit_transform(x_train)
x_test_scaled=scaler.transform(x_test)

print(x_train_scaled)
print(x_test_scaled)

OUTPUT:
[[ 8.3252 41. 6.98412698 ... 2.55555556 37.88 -122.23 ]
[ 8.3014 21. 6.23813708 ... 2.10984183 37.86 -122.22 ]
[ 7.2574 52. 8.28813559 ... 2.80225989 37.85 -122.24 ]
...
[ 1.7 17. 5.20554273 ... 2.3256351 39.43 -121.22 ]
[ 1.8672 18. 5.32951289 ... 2.12320917 39.43 -121.32 ]
[ 2.3886 16. 5.25471698 ... 2.61698113 39.37 -121.24 ]]
[4.526 3.585 3.521 ... 0.923 0.847 0.894]

a. Linear Regression with Polynomial Features

poly=PolynomialFeatures(degree=2)
x_train_poly=poly.fit_transform(x_train_scaled)
x_test_poly=poly.transform(x_test_scaled)
print(x_train_poly)
print(x_test_poly)
linear_reg=LinearRegression()
linear_reg.fit(x_train_poly,y_train)
y_pred_linear=linear_reg.predict(x_test_poly)
r2_linear=r2_score(y_test,y_pred_linear)
rmse_linear=np.sqrt(mean_squared_error(y_test,y_pred_linear))
print("r2 score:",r2_linear)
print("root mean squared error:",rmse_linear)

OUTPUT:
r2 score: 0.6456819729261787
root mean squared error: 0.6813967448044771
b. Random Forest Regressor
random_forest=RandomForestRegressor(n_estimators=100,random_state=42)
random_forest.fit(x_train_scaled,y_train)
y_pred_rf=random_forest.predict(x_test_scaled)
r2_rf=r2_score(y_test,y_pred_rf)
rmse_rf=np.sqrt(mean_squared_error(y_test,y_pred_rf))
print("R2 score :",r2_rf)
print("root mean squared error:",rmse_rf)

OUTPUT:
R2 score : 0.8049425298843443
root mean squared error: 0.5055739907397444
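
Both scores have simple definitions that can be computed directly with NumPy. The sketch below works them out for four target values taken from the printed y array and a set of hypothetical predictions (y_hat), which are made up purely for illustration.

```python
import numpy as np

y_true = np.array([4.526, 3.585, 3.521, 0.923])
y_hat = np.array([4.1, 3.9, 3.0, 1.2])      # hypothetical predictions

residuals = y_true - y_hat

# Root mean squared error: square root of the average squared residual
rmse = np.sqrt(np.mean(residuals ** 2))

# R2: 1 minus (residual sum of squares / total sum of squares around the mean)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print("root mean squared error:", rmse)
print("r2 score:", r2)
```

RMSE is expressed in the units of the target, while R2 is unitless and compares the model with always predicting the mean.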
