
PAI Practical

The document contains details of Shivam (roll no. 23242) who is pursuing a Bachelor of Technology degree in Computer Science & Engineering from Dronacharya College of Engineering, Gurgaon. It includes a certificate stating that Shivam has completed the practical requirement for the degree by submitting a practical on "Big Data Lab" under supervision. It also contains a list of 10 practicals completed by Shivam along with signatures against each.

Uploaded by

Rohan 7

Department of AIML

PAI Practical File

NAME: Shivam
BRANCH: CSE(AI-ML)
SEM: 6TH
ROLL NO: 23242

Department of CSE AIML
Certificate
Certified that this Practical entitled “Big Data Lab”, submitted by Shivam (23242), student
of the Computer Science & Engineering Department, Dronacharya College of
Engineering, Gurgaon, in partial fulfillment of the requirements for the award of the
Bachelor of Technology (Branch) Degree of MDU, Rohtak, is a record of the student's own
study carried out under my supervision & guidance.

Sr. No.   Practical Name   Signature
1.   Introduction of various Python libraries used for machine learning.
2.   Write a program to perform data pre-processing techniques for effective machine learning.
3.   Write a program to apply different feature encoding schemes on the given dataset.
4.   Write a program to apply filter feature selection techniques.
5.
6.
7.
8.
9.
10.

PROGRAM 1: Introduction of various python libraries used for machine learning.

Code:

[1]: import pandas as pd
import numpy as np

[2]: # reading data
data = pd.read_csv("data.csv")

[3]: data

[3]: Country Age Salary Purchased


0 France 44.0 72000.0 No
1 Spain 27.0 48000.0 Yes
2 Germany 30.0 54000.0 No
3 Spain 38.0 61000.0 No
4 Germany 40.0 NaN Yes
5 France 35.0 58000.0 Yes
6 Spain NaN 52000.0 No
7 France 48.0 79000.0 Yes
8 Germany 50.0 83000.0 No
9 France 37.0 67000.0 Yes

[4]: student_data = {"Name": ['Prateek','Ronak','Geetanshu','Naman','Ankit'],
"exam_no": [18,25,45,34,36],
"Result": ['pass','fail','pass','pass','fail']}

df = pd.DataFrame(student_data)
df

[4] : Name exam_no Result


0 Prateek 18 pass
1 Ronak 25 fail
2 Geetanshu 45 pass
3 Naman 34 pass
4 Ankit 36 fail

[6]: # access data with the help of label
df.loc[2,['Name']]

[6]: Name Geetanshu
Name: 2, dtype: object

[7]: df.iloc[2,0]

[7] : 'Geetanshu'
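
The two lookups above reach the same cell in different ways: `loc` selects by label, `iloc` by integer position. The following is a minimal sketch of that difference, rebuilding the student table from cell [4]. The `set_index("Name")` step and the variable names are illustrative additions, not part of the original notebook.

```python
import pandas as pd

# Same student table as in cell [4] above.
student_data = {
    "Name": ["Prateek", "Ronak", "Geetanshu", "Naman", "Ankit"],
    "exam_no": [18, 25, 45, 34, 36],
    "Result": ["pass", "fail", "pass", "pass", "fail"],
}
df = pd.DataFrame(student_data)

# Label-based: row label 2, column label "Name" (returns a scalar).
by_label = df.loc[2, "Name"]

# Position-based: third row, first column (the same cell).
by_position = df.iloc[2, 0]

# Hypothetical extra step: index the rows by name so that labels and
# positions are no longer the same thing.
by_name = df.set_index("Name")
naman_result = by_name.loc["Naman", "Result"]  # lookup by row label
fourth_row_result = by_name.iloc[3, 1]         # lookup by row/column position
```

With the default integer index, labels and positions coincide, which is why `df.loc[2, ...]` and `df.iloc[2, ...]` hit the same row here. After re-indexing by name, only `iloc` still accepts integers.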


PROGRAM 2: Write a program to perform data pre-processing techniques for effective machine learning.
[1]:# import pandas
import pandas as pd

[47]:#read csv file


df=pd.read_csv('data.csv')

[30]:# print first 5 elements


df.head()

[30]: Country Age Salary Purchased


0 France 44.0 72000.0 No
1 Spain 27.0 48000.0 Yes
2 Germany 30.0 54000.0 No
3 Spain 38.0 61000.0 No
4 Germany 40.0 NaN Yes

[6]:# import numpy


import numpy as np

[7]:# import StringIO


from io import StringIO
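
`StringIO` is imported here but not used in the cells that follow. One common use, sketched below, is to feed CSV text to `pd.read_csv` without a file on disk. The CSV text is an assumption reconstructed from the table printed in Program 1; the actual layout of `data.csv` is not shown in this file, and `df_inline` is a hypothetical name.

```python
from io import StringIO

import pandas as pd

# Assumed CSV layout, reconstructed from the table printed in Program 1.
# Empty fields stand for the two missing values (NaN).
csv_text = """Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
Spain,38,61000,No
Germany,40,,Yes
France,35,58000,Yes
Spain,,52000,No
France,48,79000,Yes
Germany,50,83000,No
France,37,67000,Yes
"""

# StringIO wraps the string in a file-like object that read_csv can parse.
df_inline = pd.read_csv(StringIO(csv_text))

# Count missing values per column.
missing_per_column = df_inline.isnull().sum()
```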

[31]:# check for the null value


df.isnull()

[31]: Country Age Salary Purchased


0 False False False False
1 False False False False
2 False False False False
3 False False False False
4 False False True False
5 False False False False
6 False True False False
7 False False False False
8 False False False False
9 False False False False

[59]: # assign 10 in place of null value
df["Age"] = df["Age"].fillna(10)
df["Salary"] = df["Salary"].fillna(10)

[60]: # print updated dataset

df

[60]: Country Age Salary Purchased


0 France 44.0 72000.0 No
1 Spain 27.0 48000.0 Yes
2 Germany 30.0 54000.0 No
3 Spain 38.0 61000.0 No
4 Germany 40.0 10.0 Yes
5 France 35.0 58000.0 Yes
6 Spain 10.0 52000.0 No
7 France 48.0 79000.0 Yes
8 Germany 50.0 83000.0 No
9 France 37.0 67000.0 Yes

[34]: # check for null values after the update


df.isnull().sum()

[34]: Country 0
Age 0
Salary 0
Purchased 0
dtype: int64

[35]: # import SimpleImputer from sklearn


from sklearn.impute import SimpleImputer

[36]: # set model attributes


imr = SimpleImputer(strategy="constant",fill_value= 10 )

[37]: # Fit the data into the model


imr = imr.fit(df.values)

[54]: imputed_data = imr.transform(df.values)

[55]: # print data after transformation


imputed_data

[55]: array([['France', 44.0, 72000.0, 'No'],
['Spain', 27.0, 48000.0, 'Yes'],
['Germany', 30.0, 54000.0, 'No'],
['Spain', 38.0, 61000.0, 'No'],
['Germany', 40.0, 10, 'Yes'],
['France', 35.0, 58000.0, 'Yes'],
['Spain', 10, 52000.0, 'No'],
['France', 48.0, 79000.0, 'Yes'],
['Germany', 50.0, 83000.0, 'No'],
['France', 37.0, 67000.0, 'Yes']], dtype=object)
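
Filling with the constant 10 leaves values such as a salary of 10.0, far outside the range of the real data. A common alternative, sketched here as an assumption rather than as the method used in this practical, is to impute only the numeric columns with their column mean. The four rows are taken from the dataset above; `df_demo`, `numeric_cols`, and `mean_imputer` are illustrative names.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Four rows from the dataset above, including both missing values.
df_demo = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain"],
    "Age": [44.0, 27.0, 40.0, np.nan],
    "Salary": [72000.0, 48000.0, np.nan, 52000.0],
})

# Impute only the numeric columns; the "mean" strategy does not apply to strings.
numeric_cols = ["Age", "Salary"]
mean_imputer = SimpleImputer(strategy="mean")
df_demo[numeric_cols] = mean_imputer.fit_transform(df_demo[numeric_cols])

# The missing Age becomes (44 + 27 + 40) / 3 = 37.0 and the missing
# Salary becomes (72000 + 48000 + 52000) / 3.
```

Restricting the imputer to numeric columns also keeps the result as a DataFrame with numeric dtypes, instead of the `dtype=object` array produced when string columns are passed through.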

PROGRAM 3: Write a program to apply different feature encoding schemes on the given dataset.

[57]: df.describe()

[57]: Age Salary


count 9.000000 9.000000
mean 38.777778 63777.777778
std 7.693793 12265.579662
min 27.000000 48000.000000
25% 35.000000 54000.000000
50% 38.000000 61000.000000
75% 44.000000 72000.000000
max 50.000000 83000.000000

[42]: # import and apply LabelEncoder to the data
from sklearn.preprocessing import LabelEncoder
df_le = df.copy()  # copy, so that encoding does not overwrite df itself
class_le = LabelEncoder()
df_le['Country'] = class_le.fit_transform(df_le['Country'].values)
df_le

[42]: Country Age Salary Purchased


0 0 44.0 72000.0 No
1 2 27.0 48000.0 Yes
2 1 30.0 54000.0 No
3 2 38.0 61000.0 No
4 1 40.0 10.0 Yes
5 0 35.0 58000.0 Yes
6 2 10.0 52000.0 No
7 0 48.0 79000.0 Yes
8 1 50.0 83000.0 No
9 0 37.0 67000.0 Yes

[48]: df

[48]: Country Age Salary Purchased


0 France 44.0 72000.0 No
1 Spain 27.0 48000.0 Yes
2 Germany 30.0 54000.0 No
3 Spain 38.0 61000.0 No
4 Germany 40.0 NaN Yes
5 France 35.0 58000.0 Yes
6 Spain NaN 52000.0 No
7 France 48.0 79000.0 Yes
8 Germany 50.0 83000.0 No
9 France 37.0 67000.0 Yes

[61]: df_new=pd.get_dummies(df)

[62]: df_new

[62]: Age Salary Country_France Country_Germany Country_Spain \


0 44.0 72000.0 1 0 0
1 27.0 48000.0 0 0 1
2 30.0 54000.0 0 1 0
3 38.0 61000.0 0 0 1
4 40.0 10.0 0 1 0
5 35.0 58000.0 1 0 0
6 10.0 52000.0 0 0 1
7 48.0 79000.0 1 0 0
8 50.0 83000.0 0 1 0
9 37.0 67000.0 1 0 0

Purchased_No Purchased_Yes
0 1 0
1 0 1
2 1 0
3 1 0
4 0 1
5 0 1
6 1 0
7 0 1
8 1 0
9 0 1

[63]: df_le['Country']

[63]: 0 0
1 2
2 1
3 2
4 1
5 0
6 2
7 0
8 1
9 0
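
The integer codes above follow the alphabetical order of the country names (France 0, Germany 1, Spain 2), which implies an ordering that the countries do not have. In addition, `pd.get_dummies(df)` in cell [61] expanded the target column `Purchased` as well as `Country`. The sketch below shows one possible split, under the assumption that `Purchased` is the target: label-encode the target and one-hot encode only the nominal feature. The names `df_demo`, `target_le`, `X`, and `y` are illustrative.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# A few rows in the same shape as the dataset above.
df_demo = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain"],
    "Purchased": ["No", "Yes", "No", "No"],
})

# Label-encode the binary target: classes are sorted, so No -> 0, Yes -> 1.
target_le = LabelEncoder()
y = target_le.fit_transform(df_demo["Purchased"])

# Recover the label-to-code mapping from the fitted encoder.
mapping = {str(label): code for code, label in enumerate(target_le.classes_)}

# One-hot encode only the nominal feature; dtype=int gives 0/1 columns
# (recent pandas versions default to True/False).
X = pd.get_dummies(df_demo[["Country"]], columns=["Country"], dtype=int)

# inverse_transform maps the integer codes back to the original labels.
decoded = target_le.inverse_transform(y)
```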

PROGRAM 4: Write a program to apply filter feature selection techniques.
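
No code for this program survives in this copy of the file. The following is a minimal sketch of two standard filter techniques from scikit-learn, a variance threshold followed by a univariate ANOVA F-test. It is not the author's program. The synthetic data and all variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif

# Synthetic data (assumption): one feature related to the label, one pure
# noise feature, and one constant feature.
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)
informative = y + rng.normal(scale=0.3, size=n)  # tracks the label closely
noise = rng.normal(size=n)                       # unrelated to the label
constant = np.full(n, 5.0)                       # zero variance
X = np.column_stack([informative, noise, constant])

# Filter 1: drop features whose variance is zero. No label is needed.
vt = VarianceThreshold(threshold=0.0)
X_var = vt.fit_transform(X)
kept_by_variance = vt.get_support()

# Filter 2: score each remaining feature against the label with the
# ANOVA F-test and keep the single best one.
selector = SelectKBest(score_func=f_classif, k=1)
X_best = selector.fit_transform(X_var, y)
kept_by_score = selector.get_support()
```

Both steps score features independently of any model, which is what distinguishes filter methods from wrapper or embedded selection. `get_support()` returns a boolean mask over the input columns, so it can be used to look up the names of the retained features.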
