AI Final PDF
a) Matrix addition
import numpy as np
a=np.array([[1,2,3],[3,4,2],[2,4,3]])
b=np.array([[1,2,3],[3,4,2],[2,4,3]])
print("addition of matrix:\n",np.add(a,b))
OUTPUT:
addition of matrix:
[[2 4 6]
[6 8 4]
[4 8 6]]
b) Matrix Subtraction
print("subtraction of matrix:\n",np.subtract(a,b))
OUTPUT:
subtraction of matrix:
[[0 0 0]
[0 0 0]
[0 0 0]]
c) Matrix Multiplication (element-wise)
print("multiplication of matrix:\n",np.multiply(a,b))
OUTPUT:
multiplication of matrix:
[[ 1 4 9]
[ 9 16 4]
[ 4 16 9]]
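np.multiply multiplies matching entries, which is why the output above is the square of each element. If the row-by-column matrix product is wanted instead, np.matmul (or the @ operator) can be used. A minimal sketch with the same two matrices:

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 2], [2, 4, 3]])
b = np.array([[1, 2, 3], [3, 4, 2], [2, 4, 3]])

# element-wise (Hadamard) product: multiplies matching entries
elementwise = np.multiply(a, b)

# matrix product: rows of a combined with columns of b (same as a @ b)
matrix_product = np.matmul(a, b)

print("element-wise product:\n", elementwise)
print("matrix product:\n", matrix_product)
```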
d) Matrix Inversion
print("inverse of matrix:\n",np.linalg.inv(a))
OUTPUT:
inverse of matrix:
[[ 0.66666667 1. -1.33333333]
[-0.83333333 -0.5 1.16666667]
[ 0.66666667 0. -0.33333333]]
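One way to check an inverse is to multiply it back with the original matrix and compare the result with the identity matrix. The sketch below does this for the same matrix a. np.allclose is used because floating-point results are rarely exactly 0 or 1.

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 2], [2, 4, 3]])

# a matrix is invertible only when its determinant is non-zero
det = np.linalg.det(a)
a_inv = np.linalg.inv(a)

# a multiplied by its inverse should be (approximately) the identity matrix
identity_check = a @ a_inv
is_inverse = np.allclose(identity_check, np.eye(3))

print("determinant:", det)
print("a @ inv(a) is identity:", is_inverse)
```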
e) Transpose of a Matrix
print("transpose of matrix:\n",np.transpose(a))
OUTPUT:
transpose of matrix:
[[1 3 2]
[2 4 4]
[3 2 3]]
2. Write a Program to perform the following operations
a) Find the minimum and maximum element of the matrix
import numpy as np
a=np.array([[1,2,3],[3,4,2],[2,4,3]])
print("maximum value of matrix:\n",np.max(a))
OUTPUT:
maximum value of matrix:
4
b) Find the minimum and maximum element of each row in the matrix
print("minimum value of matrix:\n",np.min(a))
OUTPUT:
minimum value of matrix:
1
c) Find the minimum and maximum element of each column in the matrix
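For the row-wise and column-wise parts, the axis argument selects the direction of the reduction: axis=1 reduces across each row, axis=0 reduces down each column. A minimal sketch using the matrix a from part a):

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 2], [2, 4, 3]])

# axis=1 collapses the columns, giving one value per row
row_min = a.min(axis=1)
row_max = a.max(axis=1)

# axis=0 collapses the rows, giving one value per column
col_min = a.min(axis=0)
col_max = a.max(axis=0)

print("minimum of each row:", row_min)
print("maximum of each row:", row_max)
print("minimum of each column:", col_min)
print("maximum of each column:", col_max)
```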
import numpy as np
A=np.array([[2,2,9],[2,5,8],[1,4,7]])
B=np.array([[9,5,1],[7,5,3],[8,9,6]])
# eigenvalues and eigenvectors of A (each column of eigenvectors_A is one eigenvector)
eigenvalues_A,eigenvectors_A=np.linalg.eig(A)
print(eigenvalues_A)
OUTPUT:
[13.05054773+0.j 0.47472613+1.17633455j 0.47472613-1.17633455j]
print(eigenvectors_A)
OUTPUT:
[[ 0.54476501+0.j 0.89728078+0.j 0.89728078-0.j ]
[ 0.65530983+0.j -0.05423648-0.36470464j -0.05423648+0.36470464j]
[ 0.52325913+0.j -0.140014 +0.19832352j -0.140014 -0.19832352j]]
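The result of np.linalg.eig can be checked against the defining relation A v = lambda v, where v is a column of the eigenvector matrix and lambda is the eigenvalue at the same position. A minimal sketch for matrix A:

```python
import numpy as np

A = np.array([[2, 2, 9], [2, 5, 8], [1, 4, 7]])
eigenvalues_A, eigenvectors_A = np.linalg.eig(A)

# column i of eigenvectors_A belongs to eigenvalues_A[i]
checks = []
for i in range(len(eigenvalues_A)):
    v = eigenvectors_A[:, i]
    lhs = A @ v                  # A v
    rhs = eigenvalues_A[i] * v   # lambda v
    checks.append(np.allclose(lhs, rhs))

print("A v = lambda v holds for every pair:", all(checks))
# the eigenvalues of a matrix add up to its trace
print("sum of eigenvalues:", eigenvalues_A.sum(), "trace:", np.trace(A))
```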
WEEK:3
3. Write a program for the following.
a. To generate an array of random numbers from a normal distribution for the
array of a given shape.
import numpy as np
import pandas as pd
# a. generate an array of random numbers from a normal distribution
random_array =np.random.normal(size=(3,3))
print(random_array)
OUTPUT:
[[-0.72882502 -0.96261731 0.75660861]
[ 0.26895935 0.33844856 -0.12340368]
[ 0.27789698 1.16782929 0.73894252]]
OUTPUT:
[[2 2 3]
[3 5 5]]
[[3]
[1]]
[[5 5 6]
[4 6 6]]
[[6 6 9]
[3 5 5]]
c. Find minimum, maximum, mean in a given array. ( in both the axes )
# find min, max, mean of an array
import numpy as np
array1=np.random.randint(1,8,size=(3,3))
print(array1)
# axis=0 gives one value per column, axis=1 gives one value per row
MIN=np.min(array1,axis=0)
MAX=np.max(array1,axis=1)
print(MIN)
print(MAX)
OUTPUT:
[[1 5 2]
[3 1 6]
[6 5 2]]
[1 1 2]
[5 6 6]
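The program above prints the column-wise minimum and the row-wise maximum only. To cover minimum, maximum and mean along both axes, the same calls can be repeated for axis=0 and axis=1. The sketch below uses the array from the output above as a fixed input so that the results are reproducible:

```python
import numpy as np

# fixed copy of the random array shown in the output above
array1 = np.array([[1, 5, 2], [3, 1, 6], [6, 5, 2]])

results = {}
for axis in (0, 1):
    # axis=0 works down the columns, axis=1 works across the rows
    results[axis] = {
        "min": array1.min(axis=axis),
        "max": array1.max(axis=axis),
        "mean": array1.mean(axis=axis),
    }
    print("axis", axis, results[axis])

# mean of all elements, ignoring the axes
overall_mean = array1.mean()
print("overall mean:", overall_mean)
```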
OUTPUT:
np.arange array:
[0 2 4 6 8]
np.linspace array:
[ 0. 10.]
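The code for this output is not shown. A sketch that would print the same two arrays is given below. The arguments passed to np.arange and np.linspace are assumptions chosen only to match the output; the original program may have used different ones.

```python
import numpy as np

# assumed arguments: start 0, stop 10 (excluded), step 2
arange_array = np.arange(0, 10, 2)

# assumed arguments: 2 evenly spaced points from 0 to 10 (both included)
linspace_array = np.linspace(0, 10, 2)

print("np.arange array:\n", arange_array)
print("np.linspace array:\n", linspace_array)
```

np.arange takes a step size and leaves out the stop value, while np.linspace takes the number of points and includes the stop value by default.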
e. Create a pandas series from a given list.
list_data=[1,2,6,5]
pandas_series=pd.Series(list_data)
print("\n pandas series")
print(pandas_series)
OUTPUT:
pandas series
0 1
1 2
2 6
3 5
dtype: int64
f. Create pandas series with data and index and display the index values.
import pandas as pd
data_with_index={'a':30,'b':90,'c':46}
pandas_series=pd.Series(data_with_index)
print("\n pandas series")
print(pandas_series)
OUTPUT:
pandas series
a 30
b 90
c 46
dtype: int64
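The question also asks to display the index values. These are available through the index attribute of the series. A minimal sketch with the same data, which also shows the alternative of passing data and index as separate lists:

```python
import pandas as pd

data_with_index = {'a': 30, 'b': 90, 'c': 46}
pandas_series = pd.Series(data_with_index)

# the dictionary keys become the index of the series
index_values = pandas_series.index
print("index values:", list(index_values))

# same series built from a list of data and an explicit index
explicit_series = pd.Series([30, 90, 46], index=['a', 'b', 'c'])
print(explicit_series)
```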
g. Create a data frame with columns and at least 5 observations
i. Select a particular column from the DataFrame
ii. Summarize the data frame and observe the stats of the DataFrame created
iii. Observe the mean and standard deviation of the data frame and print the values.
data={'Column1':[1,2,3,4,5],
'Column2':[10,20,30,40,50],
'Column3':[5.5,6.5,7.1,8.9,3.1],
'Column4':['A','B','C','D','E'],
'Column5':[True,False,True,False,True]}
data_frame=pd.DataFrame(data)
print("\ng. DataFrames with 5 columns:")
print(data_frame)
OUTPUT:
g. DataFrames with 5 columns:
Column1 Column2 Column3 Column4 Column5
0 1 10 5.5 A True
1 2 20 6.5 B False
2 3 30 7.1 C True
3 4 40 8.9 D False
4 5 50 3.1 E True
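The program above creates the DataFrame but does not show sub-parts i to iii. A minimal sketch of those steps on the same data follows. The choice of Column3 as the selected column is only an example.

```python
import pandas as pd

data = {'Column1': [1, 2, 3, 4, 5],
        'Column2': [10, 20, 30, 40, 50],
        'Column3': [5.5, 6.5, 7.1, 8.9, 3.1],
        'Column4': ['A', 'B', 'C', 'D', 'E'],
        'Column5': [True, False, True, False, True]}
data_frame = pd.DataFrame(data)

# i. select a particular column (returns a Series)
column3 = data_frame['Column3']
print(column3)

# ii. summary statistics (count, mean, std, min, quartiles, max)
summary = data_frame.describe()
print(summary)

# iii. mean and standard deviation of the numeric columns
numeric = data_frame[['Column1', 'Column2', 'Column3']]
means = numeric.mean()
stds = numeric.std()
print("mean:\n", means)
print("standard deviation:\n", stds)
```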
WEEK:4
7. Write a Program to determine the following in the Titanic Survival data.
import numpy as np
import pandas as pd
df=pd.read_csv("titanic_data.csv")
print(df.head())
c. Find out the unique values in each categorical column and frequency of each unique value.
categorical_columns = df.select_dtypes(include='object').columns
print(df.nunique)
OUTPUT:
PassengerId Survived Pclass ... Fare Cabin Embarked
0 1 0 3 ... 7.2500 NaN S
1 2 1 1 ... 71.2833 C85 C
2 3 1 3 ... 7.9250 NaN S
3 4 1 1 ... 53.1000 C123 S
4 5 0 3 ... 8.0500 NaN S
.. ... ... ... ... ... ... ...
886 887 0 2 ... 13.0000 NaN S
887 888 1 1 ... 30.0000 B42 S
888 889 0 3 ... 23.4500 NaN S
889 890 1 1 ... 30.0000 C148 C
890 891 0 3 ... 7.7500 NaN Q
print(df.Sex.unique)
OUTPUT:
1 female
2 female
3 female
4 male
886 male
887 female
888 female
889 male
890 male
Name: Sex, Length: 891, dtype: object>
print(df.Pclass.unique)
OUTPUT:
1 1
2 3
3 1
4 3
..
886 2
887 1
888 3
889 1
890 3
Name: Pclass, Length: 891, dtype: int64>
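Written without parentheses, df.nunique, df.Sex.unique and df.Pclass.unique only refer to the methods, so print shows the bound method together with the whole column (note the closing > in the outputs above) instead of the unique values. Calling the methods gives the actual result. A minimal sketch on a small made-up sample follows. The rows are hypothetical, not taken from the Titanic file.

```python
import pandas as pd

# hypothetical sample with the same kind of columns as the Titanic data
sample = pd.DataFrame({
    'Sex': ['male', 'female', 'female', 'male', 'male'],
    'Embarked': ['S', 'C', 'S', 'S', 'Q'],
    'Pclass': [3, 1, 3, 1, 3],
})

# non-numeric columns are treated as categorical here
categorical_columns = sample.select_dtypes(exclude='number').columns

unique_values = {}
frequencies = {}
for column in categorical_columns:
    # unique() lists the distinct values, value_counts() counts each of them
    unique_values[column] = list(sample[column].unique())
    frequencies[column] = sample[column].value_counts()
    print(column, "unique values:", unique_values[column])
    print(frequencies[column])
```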
[7 rows x 7 columns]
gender_survival = df.groupby('Sex')['Survived'].sum()
print(gender_survival)
OUTPUT:
Sex
female 233
male 109
Name: Survived, dtype: int64
print(df.Parch.value_counts())
OUTPUT:
Parch
0 678
1 118
2 80
5 5
3 5
4 4
6 1
Name: count, dtype: int64
5 22 4
15 6
12
10 7
7 6
Name: count, dtype: int64
import pandas as pd
df=pd.read_csv("housing.csv")
print(df)
OUTPUT:
longitude latitude ... median_house_value ocean_proximity
0 -122.23 37.88 ... 452600.0 NEAR BAY
1 -122.22 37.86 ... 358500.0 NEAR BAY
2 -122.24 37.85 ... 352100.0 NEAR BAY
3 -122.25 37.85 ... 341300.0 NEAR BAY
4 -122.25 37.85 ... 342200.0 NEAR BAY
... ... ... ... ... ...
20635 -121.09 39.48 ... 78100.0 INLAND
20636 -121.21 39.49 ... 77100.0 INLAND
20637 -121.22 39.43 ... 92300.0 INLAND
20638 -121.32 39.43 ... 84700.0 INLAND
20639 -121.24 39.37 ... 89400.0 INLAND
20635 25.0
20636 18.0
20637 17.0
20638 18.0
20639 16.0
Name: housing_median_age, Length: 20640, dtype: float64>
c. Determine the top 10 localities with the highest difference between income and
house value. Also, the top 10 localities that have the lowest difference.
df['New_column']=df['median_house_value']-df['median_income']
print(df['New_column'])
OUTPUT:
4861 500000.5001
6688 500000.5001
16642 500000.2975
15661 500000.1457
15652 500000.1000
5887 17497.6333
19802 14998.4640
2521 14997.3393
2799 14996.9000
9188 14994.8068
Name: New_column, Length: 20640, dtype: float64
df.sort_values(by='New_column',ascending=False,inplace=True)
print(df.head(10))
OUTPUT:
4861 -118.28 34.02 ... <1H OCEAN 500000.5001
6688 -118.08 34.15 ... INLAND 500000.5001
16642 -120.67 35.30 ... NEAR OCEAN 500000.2975
15661 -122.42 37.78 ... NEAR BAY 500000.1457
15652 -122.41 37.80 ... NEAR BAY 500000.1000
6639 -118.15 34.15 ... <1H OCEAN 499999.8333
459 -122.25 37.87 ... NEAR BAY 499999.8304
89 -122.27 37.80 ... NEAR BAY 499999.7566
10448 -117.67 33.47 ... <1H OCEAN 499999.3607
17819 -121.90 37.39 ... <1H OCEAN 499999.2639
[10 rows x 11 columns]
print(df.tail(10))
OUTPUT:
2779 -114.65 32.79 ... INLAND 24999.1429
16186 -121.29 37.95 ... INLAND 22499.2083
14326 -117.16 32.71 ... NEAR OCEAN 22498.9082
1825 -122.32 37.93 ... NEAR BAY 22497.3250
13889 -116.57 35.43 ... INLAND 22497.2862
5887 -118.33 34.15 ... <1H OCEAN 17497.6333
19802 -123.17 40.31 ... INLAND 14998.4640
2521 -122.74 39.71 ... INLAND 14997.3393
2799 -117.02 36.40 ... INLAND 14996.9000
9188 -117.86 34.24 ... INLAND 14994.8068
[10 rows x 11 columns]
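An alternative to sorting the whole DataFrame is to use nlargest and nsmallest, which return the top and bottom rows directly and leave the original order untouched. The sketch below uses a small made-up table, not the housing file. It also assumes that median_income is recorded in tens of thousands of dollars, as is commonly documented for the California housing data, and scales it by 10,000 before subtracting. If the file stores income in plain dollars, the scaling should be dropped.

```python
import pandas as pd

# hypothetical rows with the same column names as housing.csv
homes = pd.DataFrame({
    'median_income': [8.3252, 1.7, 2.3886, 5.0, 3.2],
    'median_house_value': [452600.0, 92300.0, 89400.0, 300000.0, 150000.0],
})

# assumption: median_income is in units of 10,000 dollars
homes['difference'] = homes['median_house_value'] - homes['median_income'] * 10_000

# n=2 here because the sample is tiny; use 10 on the full data
top = homes.nlargest(2, 'difference')
bottom = homes.nsmallest(2, 'difference')

print("highest difference:\n", top)
print("lowest difference:\n", bottom)
```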
d. What is the ratio of bedrooms to total rooms in the data
A=df['total_bedrooms'].sum()
print(A)
OUTPUT:
10990309.0
B=df['total_rooms'].sum()
print(B)
OUTPUT:
54402150.0
ratio=A/B
print(ratio)
OUTPUT:
0.2020197547339581
e. Determine the average price of a house for each type of ocean_proximity.
Avg_house_price=df.groupby('ocean_proximity')['median_house_value'].mean()
print(Avg_house_price)
OUTPUT:
ocean_proximity
<1H OCEAN 240084.285464
INLAND 124805.392001
ISLAND 380440.000000
NEAR BAY 259212.311790
NEAR OCEAN 249433.977427
Name: median_house_value, dtype: float64
WEEK:7
10. Write a program to perform the following tasks
a. Determine the outliers in each non-categorical column of Titanic Data and
remove them.
numerical_columns=df.select_dtypes(exclude='object').columns
print(numerical_columns)
for column in numerical_columns:
    # IQR fences for this column
    q1=df[column].quantile(0.25)
    q3=df[column].quantile(0.75)
    iqr=q3-q1
    low_b=q1-1.5*iqr
    up_b=q3+1.5*iqr
    # df1 is rebuilt from df on every pass, so only the last column's filter (Fare) remains
    df1=df[(df[column]>=low_b)&(df[column]<=up_b)]
print(df1)
OUTPUT:
Index(['PassengerId', 'Survived', 'Pclass', 'Age', 'SibSp', 'Parch', 'Fare'],
dtype='object')
PassengerId Survived Pclass ... Fare Cabin Embarked
0 1 0 3 ... 7.2500 NaN S
2 3 1 3 ... 7.9250 NaN S
3 4 1 1 ... 53.1000 C123 S
4 5 0 3 ... 8.0500 NaN S
5 6 0 3 ... 8.4583 NaN Q
.. ... ... ... ... ... ... ...
886 887 0 2 ... 13.0000 NaN S
887 888 1 1 ... 30.0000 B42 S
888 889 0 3 ... 23.4500 NaN S
889 890 1 1 ... 30.0000 C148 C
890 891 0 3 ... 7.7500 NaN Q
[775 rows x 12 columns]
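To remove rows that are outliers in any numerical column, the per-column conditions can be combined into a single mask instead of overwriting the filtered frame on each pass. A minimal sketch of that variant on a small made-up sample follows. The values are hypothetical, and this is one possible approach rather than the method used for the output above.

```python
import pandas as pd

# hypothetical sample: index 7 has an extreme Age, index 6 an extreme Fare
sample = pd.DataFrame({
    'Age': [22, 38, 26, 35, 35, 28, 30, 95],
    'Fare': [7.25, 71.28, 7.92, 53.1, 8.05, 8.46, 512.33, 13.0],
})

# start by keeping every row, then narrow the mask column by column
keep = pd.Series(True, index=sample.index)
for column in sample.columns:
    q1 = sample[column].quantile(0.25)
    q3 = sample[column].quantile(0.75)
    iqr = q3 - q1
    low_b = q1 - 1.5 * iqr
    up_b = q3 + 1.5 * iqr
    # between() is False for missing values, so rows with NaN are dropped too
    keep &= sample[column].between(low_b, up_b)

cleaned = sample[keep]
print(cleaned)
```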
c. If missing values are less than 30% of entire data then create a new data frame
1. Missing values in numeric columns are filled with the mean of the
corresponding column.
2. Missing values in categorical columns are filled with the most frequently
occurring value.
num = df.select_dtypes(exclude='object').columns
cat = df.select_dtypes(include='object').columns
print("\n",num)
print("\n",cat)
[5 rows x 12 columns]
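The filling step itself is not shown above. A minimal sketch of the two rules follows, on a small made-up sample whose values are hypothetical. The 30% check here is applied to the overall fraction of missing cells, which is one reading of the task. It could also be applied per column.

```python
import numpy as np
import pandas as pd

# hypothetical sample with one missing value in each column
sample = pd.DataFrame({
    'Age': [22.0, np.nan, 26.0, 36.0],
    'Embarked': ['S', 'C', None, 'S'],
})

# fraction of missing cells over the whole table
missing_fraction = sample.isna().sum().sum() / sample.size
print("missing fraction:", missing_fraction)

if missing_fraction < 0.30:
    filled = sample.copy()
    num = filled.select_dtypes(include='number').columns
    cat = filled.select_dtypes(exclude='number').columns
    # 1. numeric columns: fill with the column mean
    for column in num:
        filled[column] = filled[column].fillna(filled[column].mean())
    # 2. categorical columns: fill with the most frequent value
    for column in cat:
        filled[column] = filled[column].fillna(filled[column].mode()[0])
    print(filled)
```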
WEEK:9
11. Write a program to perform the following tasks
b. Convert data in each numerical column so that it lies in the range [0,1]
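One common way to bring a column into the range [0,1] is min-max scaling: subtract the column minimum and divide by the column range. A minimal sketch on a small made-up sample follows. The values are hypothetical, and sklearn's MinMaxScaler would be an alternative.

```python
import pandas as pd

# hypothetical sample with two numeric columns and one categorical column
sample = pd.DataFrame({
    'Age': [22.0, 38.0, 26.0, 35.0],
    'Fare': [7.25, 71.28, 7.92, 53.1],
    'Sex': ['male', 'female', 'female', 'male'],
})

numeric_columns = sample.select_dtypes(include='number').columns
scaled = sample.copy()
for column in numeric_columns:
    col_min = sample[column].min()
    col_max = sample[column].max()
    # a constant column (col_max == col_min) would divide by zero here
    scaled[column] = (sample[column] - col_min) / (col_max - col_min)

print(scaled)
```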
import pandas as pd
titanic_dataset=pd.read_csv("titanic_dataset.csv")
df=titanic_dataset
print(df)
# categorical (object-typed) columns to be one-hot encoded
categorical_columns=df.select_dtypes(include='object').columns
print(categorical_columns)
print("\ncategorical columns:",categorical_columns)
df=pd.get_dummies(df,columns=categorical_columns)
print(df.head())
OUTPUT:
Index(['Name', 'Sex', 'Ticket', 'Cabin', 'Embarked'], dtype='object')
PassengerId Survived Pclass ... Embarked_C Embarked_Q Embarked_S
0 1 0 3 ... 0 0 1
1 2 1 1 ... 1 0 0
2 3 1 3 ... 0 0 1
3 4 1 1 ... 0 0 1
4 5 0 3 ... 0 0 1
titanic_df = pd.read_csv("titanic_dataset.csv")
print(titanic_df)
print(titanic_df.info())
# fill missing ages with the median age
titanic_df['Age']=titanic_df['Age'].fillna(titanic_df['Age'].median())
titanic_df.drop(['PassengerId','Name','Ticket','Cabin','Embarked','Sex'],axis=1, inplace=True)
print(titanic_df)
# features and target
X = titanic_df.drop('Survived',axis=1)
y = titanic_df['Survived']
print("Accuracy:",accuracy)
print("Precision:",precision)
print("recall:",recall)
print("F1 Score:",f1)
print("Confusion Matrix:")
print(conf_matrix)
OUTPUT:-
PassengerId Survived Pclass ... Fare Cabin Embarked
0 1 0 3 ... 7.2500 NaN S
…………………………………………
………………………………………….
888 889 0 3 ... 23.4500 NaN S
889 890 1 1 ... 30.0000 C148 C
890 891 0 3 ... 7.7500 NaN Q
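The steps between building X and y and printing the metrics (splitting, training, predicting) are not shown above. A minimal sketch of how accuracy, precision, recall, f1 and conf_matrix could be obtained follows. The classifier (LogisticRegression) and the synthetic data from make_classification are assumptions for illustration only. The original program may have used a different model, and it ran on the Titanic features.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split

# synthetic stand-in for the Titanic features X and target y
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# assumed model: logistic regression
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# compare predictions with the true test labels
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

print("Accuracy:", accuracy)
print("Precision:", precision)
print("recall:", recall)
print("F1 Score:", f1)
print("Confusion Matrix:")
print(conf_matrix)
```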
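from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# load the features and the target
california_housing=fetch_california_housing()
x=california_housing.data
y=california_housing.target
print(x)
print(y)
# keep 20% of the rows for testing
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=42)
print(x_train,x_test,y_train,y_test)
# fit the scaler on the training data only, then apply it to both sets
scaler=StandardScaler()
x_train_scaled=scaler.fit_transform(x_train)
x_test_scaled=scaler.transform(x_test)
print(x_train_scaled)
print(x_test_scaled)
OUTPUT: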
[[ 8.3252 41. 6.98412698 ... 2.55555556 37.88 -122.23 ]
[ 8.3014 21. 6.23813708 ... 2.10984183 37.86 -122.22 ]
[ 7.2574 52. 8.28813559 ... 2.80225989 37.85 -122.24 ]
...
[ 1.7 17. 5.20554273 ... 2.3256351 39.43 -121.22 ]
[ 1.8672 18. 5.32951289 ... 2.12320917 39.43 -121.32 ]
[ 2.3886 16. 5.25471698 ... 2.61698113 39.37 -121.24 ]]
[4.526 3.585 3.521 ... 0.923 0.847 0.894]
OUTPUT:
R2 score : 0.8049425298843443
root mean squared error: 0.5055739907397444
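The model that produced these two scores is not shown. A minimal sketch of how an R2 score and a root mean squared error are computed after scaling follows. The regressor (LinearRegression) and the synthetic data from make_regression are assumptions for illustration, so the numbers it prints will differ from the ones above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# synthetic stand-in for the housing features and target
x, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=42)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42)

# fit the scaler on the training data only, then apply it to both sets
scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)

# assumed model: ordinary least squares regression
model = LinearRegression()
model.fit(x_train_scaled, y_train)
y_pred = model.predict(x_test_scaled)

# R2 compares the model with always predicting the mean; RMSE is in target units
r2 = r2_score(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))

print("R2 score :", r2)
print("root mean squared error:", rmse)
```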