
Python

Uploaded by

dhairyalakhani08

1. Creating a rank 1 array


import numpy as np
arr= np.array([1,2,3])
print("the array is :", arr)

output :
the array is : [1 2 3]
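
The rank of an array is its number of dimensions. As a small sketch (the names rank1 and rank2 are only illustrative and are not part of the original programs), NumPy reports the rank through the ndim attribute and the size of each dimension through shape:

```python
import numpy as np

rank1 = np.array([1, 2, 3])                   # one dimension
rank2 = np.array([[1, 2, 3], [23, 56, 90]])   # two dimensions (rows, columns)

print(rank1.ndim, rank1.shape)   # 1 (3,)
print(rank2.ndim, rank2.shape)   # 2 (2, 3)
```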

2. Creating a rank 2 array


import numpy as np
arr= np.array([[1,2,3],['a','b','c']])
print("the array is : \n", arr)

output :
the array is :
 [['1' '2' '3']
 ['a' 'b' 'c']]

import numpy as np
arr= np.array([[1,2,3],[23,56,90]])
print("the array is : \n", arr)

output :
the array is :
 [[ 1  2  3]
 [23 56 90]]
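
In the first rank 2 program the numbers are printed in quotes, because a NumPy array holds a single data type and a mix of numbers and strings is stored as strings. A minimal sketch to check this (the names mixed and numbers are illustrative):

```python
import numpy as np

mixed = np.array([[1, 2, 3], ['a', 'b', 'c']])
numbers = np.array([[1, 2, 3], [23, 56, 90]])

print(mixed.dtype.kind)     # U  (Unicode string)
print(numbers.dtype.kind)   # i  (signed integer)
print(mixed[0, 0] == '1')   # True: the number 1 was stored as the string '1'
```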

3. Creating an array from a tuple


import numpy as np
arr= np.array((1,2,3))
print("the array is : \n", arr)

output :
the array is :
[1 2 3]

4. Creating a Series from a list of scalar values


import pandas as pd
s1=pd.Series([34,90,12])
print("the series is : \n", s1)

output :
the series is :
 0    34
1    90
2    12
dtype: int64
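
The numbers 0, 1, 2 on the left are the default index. As a small sketch (the labels used here are made up for illustration), the index parameter of pd.Series can give each value its own label:

```python
import pandas as pd

s2 = pd.Series([34, 90, 12], index=["first", "second", "third"])
print(s2)
print(s2["second"])   # 90, the value is looked up by its label
```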

5. Creation of a DataFrame from NumPy arrays


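import numpy as np
import pandas as pd
array1=np.array([90,100,110,120])
array2=np.array([50,60,70])   # one value fewer than the other arrays
array3=np.array([10,20,30,40])
# each array becomes one row; the missing value in row 1 becomes NaN
marksDF = pd.DataFrame([array1, array2, array3],
                       columns=[ 'A', 'B', 'C', 'D'])
print(marksDF)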

output :
    A    B    C      D
0  90  100  110  120.0
1  50   60   70    NaN
2  10   20   30   40.0
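
Column D is printed with decimal points because the second array has only three values, so pandas fills the gap with NaN, which is a floating point value. A minimal sketch to check this (the names rows and marks are illustrative):

```python
import numpy as np
import pandas as pd

rows = [np.array([90, 100, 110, 120]),
        np.array([50, 60, 70]),          # shorter row
        np.array([10, 20, 30, 40])]
marks = pd.DataFrame(rows, columns=['A', 'B', 'C', 'D'])

print(marks.dtypes)          # column D is a float column because it holds NaN
print(marks['D'].isnull())   # True only for row 1
```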

6. Creation of a DataFrame from a dictionary of arrays/lists:

import pandas as pd
data = {'Name':['Varun', 'Ganesh', 'Joseph', 'Abdul','Reena'],
'Age':[37,30,38, 39,40]}
df = pd.DataFrame(data)
print(df)

output :
     Name  Age
0   Varun   37
1  Ganesh   30
2  Joseph   38
3   Abdul   39
4   Reena   40

7. Creation of DataFrame from List of Dictionaries


listDict = [{'a':10, 'b':20}, {'a':5,'b':10,'c':20}]
a= pd.DataFrame(listDict)
print(a)

output :

    a   b     c
0  10  20   NaN
1   5  10  20.0

8. Adding a New Column to a DataFrame:


ResultSheet={
    'Rajat': pd.Series([90, 91, 97],index=['Maths','Science','Hindi']),
    'Amrita': pd.Series([92, 81, 96],index=['Maths','Science','Hindi']),
    'Meenakshi': pd.Series([89, 91, 88],index=['Maths','Science','Hindi']),
    'Rose': pd.Series([81, 71, 67],index=['Maths','Science','Hindi']),
    'Karthika': pd.Series([94, 95, 99],index=['Maths','Science','Hindi'])}
Result = pd.DataFrame(ResultSheet)
Result['Fathima']=[89,78,76]
print(Result)

output :

         Rajat  Amrita  Meenakshi  Rose  Karthika  Fathima
Maths       90      92         89    81        94       89
Science     91      81         91    71        95       78
Hindi       97      96         88    67        99       76
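
Assigning to a column label adds a new column only when that label does not exist yet. If the label already exists, the same statement replaces the values of that column. A minimal sketch with a smaller, made-up version of the result sheet:

```python
import pandas as pd

result = pd.DataFrame({'Rajat': [90, 91, 97], 'Amrita': [92, 81, 96]},
                      index=['Maths', 'Science', 'Hindi'])

result['Fathima'] = [89, 78, 76]   # new label: a column is added
result['Rajat'] = [95, 95, 95]     # existing label: the values are replaced
print(result)
```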

9. Adding a new row to the above DataFrame :

Result.loc['English'] = [90, 92, 89, 80, 90, 88]


print(Result)

output :

         Rajat  Amrita  Meenakshi  Rose  Karthika  Fathima
Maths       90      92         89    81        94       89
Science     91      81         91    71        95       78
Hindi       97      96         88    67        99       76
English     90      92         89    80        90       88

DataFrame.loc[] can also be used to change the data values of an
existing row. For example, to change the marks of Science:

10. Change the marks of Science

Result.loc['Science'] = [92, 84, 90, 72, 96, 88]


print(Result)

output :

         Rajat  Amrita  Meenakshi  Rose  Karthika  Fathima
Maths       90      92         89    81        94       89
Science     92      84         90    72        96       88
Hindi       97      96         88    67        99       76
English     90      92         89    80        90       88
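
Besides a whole row, loc[] can also address one cell by giving the row label and the column label together, and a single value assigned to a row is copied into every column of that row. A minimal sketch on a smaller, made-up dataframe:

```python
import pandas as pd

result = pd.DataFrame({'Rajat': [90, 91], 'Amrita': [92, 81]},
                      index=['Maths', 'Science'])

result.loc['Science', 'Amrita'] = 85   # change one cell only
result.loc['Maths'] = 0                # set every value of the Maths row to 0
print(result)
```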

11. Delete the row of “Hindi” from the above dataframe

Result = Result.drop('Hindi', axis=0)


print(Result)

output :

         Rajat  Amrita  Meenakshi  Rose  Karthika  Fathima
Maths       90      92         89    81        94       89
Science     92      84         90    72        96       88
English     90      92         89    80        90       88

12. Delete the columns of Rajat, Meenakshi and Karthika from the above dataframe.

Result = Result.drop(['Rajat','Meenakshi','Karthika'], axis=1)


print(Result)

output :

         Amrita  Rose  Fathima
Maths        92    81       89
Science      84    72       88
English      92    80       88
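
In both programs the result of drop() is assigned back to Result. This is needed because drop() returns a new dataframe and, by default, leaves the original unchanged. A minimal sketch with made-up data:

```python
import pandas as pd

result = pd.DataFrame({'Amrita': [92, 84, 92], 'Rose': [81, 72, 80]},
                      index=['Maths', 'Science', 'English'])

dropped = result.drop('Maths', axis=0)   # the result is a new dataframe

print(result.shape)    # (3, 2): the original still has the Maths row
print(dropped.shape)   # (2, 2)
```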

Attributes of DataFrames :

Consider the following dataframe :

import pandas as pd

# named data instead of dict, so that Python's built-in dict is not hidden
data = {"Student": pd.Series(["Arnav","Neha","Priya","Rahul"],
                             index=["Data 1","Data 2","Data 3","Data 4"]),
        "Marks": pd.Series([85, 92, 78, 83],
                           index=["Data 1","Data 2","Data 3","Data 4"]),
        "Sports": pd.Series(["Cricket","Volleyball","Hockey","Badminton"],
                            index=["Data 1","Data 2","Data 3","Data 4"])}

df = pd.DataFrame(data)

print(df)

output :
        Student  Marks      Sports
Data 1    Arnav     85     Cricket
Data 2     Neha     92  Volleyball
Data 3    Priya     78      Hockey
Data 4    Rahul     83   Badminton

1. To find out the index :

print(df.index)

output :

Index(['Data 1', 'Data 2', 'Data 3', 'Data 4'], dtype='object')

2. To print names of the columns :

print(df.columns)

output :

Index(['Student', 'Marks', 'Sports'], dtype='object')

3. To print shape of the dataframe :

print(df.shape)

Output :

(4, 3)

4. To print the first 5 rows of the dataframe :

print(df.head())

output : head() prints the first 5 rows by default. Here we have
only 4 rows, so it prints all 4 rows.

        Student  Marks      Sports
Data 1    Arnav     85     Cricket
Data 2     Neha     92  Volleyball
Data 3    Priya     78      Hockey
Data 4    Rahul     83   Badminton

If you want to print the first n rows of the dataframe :

print(df.head(n))  # where n is any number like 2, 20 or 200

5. To print the last 5 rows of the dataframe :

print(df.tail())

output : tail() prints the last 5 rows by default. Here we have
only 4 rows, so it prints all 4 rows.

        Student  Marks      Sports
Data 1    Arnav     85     Cricket
Data 2     Neha     92  Volleyball
Data 3    Priya     78      Hockey
Data 4    Rahul     83   Badminton

If you want to print the last n rows of the dataframe :

print(df.tail(n))  # where n is any number like 2, 20 or 200
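
Note that head and tail are methods, so they need parentheses when called. As a small sketch with n = 2 (the dataframe students is a shortened, illustrative copy of the one above):

```python
import pandas as pd

students = pd.DataFrame({"Student": ["Arnav", "Neha", "Priya", "Rahul"],
                         "Marks": [85, 92, 78, 83]},
                        index=["Data 1", "Data 2", "Data 3", "Data 4"])

first_two = students.head(2)   # rows Data 1 and Data 2
last_two = students.tail(2)    # rows Data 3 and Data 4
print(first_two)
print(last_two)
```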

Handling CSV Files :

1. Importing a csv file as our dataframe.

import pandas as pd
df=pd.read_csv("studentsmarks.csv")
# mention the entire path of your csv file if your .py file and
# .csv file are in different folders.
print(df)

output :

   rno     name  aimarks  mathsmarks
0    1  Akshita       89          91
1    2  Apoorva       91          87
2    3   Bhavik       88          76
3    4   Deepti       78          71
4    5   Farhan       84          84

You can also write :

import pandas as pd
df=pd.read_csv("studentsmarks.csv",sep =",", header=0)
print(df)

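Here the separator is mentioned with sep. If the separator in the
file is anything other than ‘,’ (comma), we need to specify it.
header=0 means the first row of the csv file holds the column
names (index 0 means the first row). A comma separator and a
header in the first row are also what read_csv assumes when
these parameters are left out.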
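
As a small sketch of a different separator, the data below uses ‘;’ between the values. The text and the name df_semi are made up for illustration, and io.StringIO is used only so that the example runs without a file on disk:

```python
import io
import pandas as pd

# Hypothetical file contents, using ';' instead of ',' as the separator
text = "rno;name;aimarks\n1;Akshita;89\n2;Apoorva;91\n"

# io.StringIO lets read_csv treat the string like an open file
df_semi = pd.read_csv(io.StringIO(text), sep=";", header=0)
print(df_semi)
print(df_semi.shape)   # (2, 3)
```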

2. Exporting our dataframe as a csv file.

The to_csv() function writes the dataframe to a new csv file at
the path given in path_or_buf.

df.to_csv(path_or_buf='C:/PANDAS/resultout.csv', sep=',')
# if you give only a file name such as "resultout.csv" instead of
# a complete path, the new file is created in the current working
# folder (usually the folder where your .py file is saved). If you
# want the new file to be created at some other place, mention the
# complete path.

Output:

,rno,name,aimarks,mathsmarks
0,1,Akshita,89,91
1,2,Apoorva,91,87
2,3,Bhavik,88,76
3,4,Deepti,78,71
4,5,Farhan,84,84

df.to_csv("resultout1.csv",index=False)

output :

rno,name,aimarks,mathsmarks
1,Akshita,89,91
2,Apoorva,91,87
3,Bhavik,88,76
4,Deepti,78,71
5,Farhan,84,84

df.to_csv("resultout2.csv")

output :

,rno,name,aimarks,mathsmarks
0,1,Akshita,89,91
1,2,Apoorva,91,87
2,3,Bhavik,88,76
3,4,Deepti,78,71
4,5,Farhan,84,84
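
The first column in the outputs with the leading comma is the index of the dataframe, and index=False leaves it out. A minimal sketch to compare both forms without creating files: when no path is given, to_csv() returns the csv text as a string (the dataframe df_out is made up for illustration):

```python
import pandas as pd

df_out = pd.DataFrame({"rno": [1, 2], "name": ["Akshita", "Apoorva"]})

# With no path given, to_csv() returns the csv text as a string
with_index = df_out.to_csv()
without_index = df_out.to_csv(index=False)

print(with_index.splitlines()[0])      # ,rno,name
print(without_index.splitlines()[0])   # rno,name
```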

1.3. Handling Missing Values


The two most common strategies for handling missing values
explained in this section are:
i) Drop the rows having missing values OR
ii) Estimate the missing values

For example, suppose studentsmarks.csv contains the data given
below (with some missing values) :

rno,name,aimarks,mathsmarks
1,Akshita,89,91
2,Apoorva,,87
3,Bhavik,88,76
4,Deepti,78,
5,Farhan,84,84

• Checking Missing Values :

Pandas provides the function isnull() to check whether any value
is missing in the DataFrame. It checks every value and returns
True where the value is missing, otherwise False. In the program
below it is applied to one column, aimarks.

import pandas as pd
df=pd.read_csv("studentsmarks.csv",sep =",", header=0)
print(df)
a=df.aimarks.isnull()
print(a)

output :

   rno     name  aimarks  mathsmarks
0    1  Akshita     89.0        91.0
1    2  Apoorva      NaN        87.0
2    3   Bhavik     88.0        76.0
3    4   Deepti     78.0         NaN
4    5   Farhan     84.0        84.0
0    False
1     True
2    False
3    False
4    False
Name: aimarks, dtype: bool
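
isnull() can also be applied to the whole dataframe. As a small sketch, adding up the True values with sum() gives the number of missing values in each column. The csv text is the sample data shown earlier, read through io.StringIO so that the example runs without a file (the name marks is illustrative):

```python
import io
import pandas as pd

csv_text = """rno,name,aimarks,mathsmarks
1,Akshita,89,91
2,Apoorva,,87
3,Bhavik,88,76
4,Deepti,78,
5,Farhan,84,84
"""
marks = pd.read_csv(io.StringIO(csv_text))

missing_per_column = marks.isnull().sum()   # each True counts as 1
rows_with_missing = marks.isnull().any(axis=1)

print(missing_per_column)    # aimarks 1, mathsmarks 1, others 0
print(rows_with_missing)     # True for rows 1 and 3
```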

• Drop missing values:

Dropping removes the entire row (object) having the missing
value(s). This strategy reduces the size of the dataset used in
data analysis, hence it should be used only when few objects
have missing values. The dropna() function drops every row of
the DataFrame that contains a missing value. It returns a new
DataFrame and leaves the original unchanged.

import pandas as pd
df=pd.read_csv("studentsmarks.csv",sep =",", header=0)
print(df)
print(df.dropna())

output :

   rno     name  aimarks  mathsmarks
0    1  Akshita     89.0        91.0
1    2  Apoorva      NaN        87.0
2    3   Bhavik     88.0        76.0
3    4   Deepti     78.0         NaN
4    5   Farhan     84.0        84.0
   rno     name  aimarks  mathsmarks
0    1  Akshita     89.0        91.0
2    3   Bhavik     88.0        76.0
4    5   Farhan     84.0        84.0

You can also store the edited dataframe in a new dataframe.

import pandas as pd
df=pd.read_csv("studentsmarks.csv",sep =",", header=0)
print(df)
a=df.dropna()
print(a)
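
If only some columns matter, the subset parameter of dropna() limits the check to those columns. A minimal sketch using the same sample data (read from a string so that it runs without a file; the names marks and only_ai are illustrative):

```python
import io
import pandas as pd

csv_text = """rno,name,aimarks,mathsmarks
1,Akshita,89,91
2,Apoorva,,87
3,Bhavik,88,76
4,Deepti,78,
5,Farhan,84,84
"""
marks = pd.read_csv(io.StringIO(csv_text))

# drop a row only when aimarks is missing; Deepti's row is kept
only_ai = marks.dropna(subset=["aimarks"])

print(only_ai)
print(len(marks), len(only_ai))   # 5 4
```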

• Estimating the missing values :

Missing values can be filled by using estimations or
approximations, e.g. the value just before (or after) the
missing value, or the average/minimum/maximum of the values of
that attribute. In some cases, missing values are replaced by
zeros (or ones). The fillna(num) function replaces missing
value(s) with the value specified in num. For example, fillna(0)
replaces each missing value with 0. Similarly, fillna(1)
replaces each missing value with 1.

import pandas as pd
df=pd.read_csv("studentsmarks.csv",sep =",", header=0)
print(df)
df=df.fillna(0)
print(df)

output :

   rno     name  aimarks  mathsmarks
0    1  Akshita     89.0        91.0
1    2  Apoorva      NaN        87.0
2    3   Bhavik     88.0        76.0
3    4   Deepti     78.0         NaN
4    5   Farhan     84.0        84.0
   rno     name  aimarks  mathsmarks
0    1  Akshita     89.0        91.0
1    2  Apoorva      0.0        87.0
2    3   Bhavik     88.0        76.0
3    4   Deepti     78.0         0.0
4    5   Farhan     84.0        84.0
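
The other estimates mentioned above can be sketched in the same way. The example below is one possible approach and is not part of the original program: it fills each marks column with the average of that column, and separately fills each gap with the value just before it using ffill(). The data is the same sample, read from a string:

```python
import io
import pandas as pd

csv_text = """rno,name,aimarks,mathsmarks
1,Akshita,89,91
2,Apoorva,,87
3,Bhavik,88,76
4,Deepti,78,
5,Farhan,84,84
"""
marks = pd.read_csv(io.StringIO(csv_text))

# estimate 1: replace each missing mark with the average of its column
filled_mean = marks.copy()
for col in ["aimarks", "mathsmarks"]:
    filled_mean[col] = filled_mean[col].fillna(filled_mean[col].mean())

# estimate 2: replace each missing mark with the value in the row before it
filled_prev = marks.ffill()

print(filled_mean)   # aimarks gap becomes 84.75, mathsmarks gap becomes 84.5
print(filled_prev)   # aimarks gap becomes 89.0, mathsmarks gap becomes 76.0
```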

A program that uses USA_Housing.csv (downloaded from the
internet): clean the file, identify the dependent and
independent variables, separate them out, train the model, test
the model and compare the predicted values with the actual
values.

import pandas as pd
df=pd.read_csv('USA_Housing.csv')
print(df.head()) # print first 5 rows.
print(df.shape) # ( 5000 , 7 )
print(df.describe()) # will give count,mean,std,min,25%,50%,75%,max
# (a count lower than the number of rows shows missing values)
df.drop(['Address'],inplace=True,axis=1) # deleting a column which
# does not have relevance.
print(df.shape) # (5000,6)
# as Price is the dependent variable, separating it from df
x=df.drop(['Price'],axis=1)
print(x.shape) # (5000,5) as the Price column is dropped from df and
# the rest is saved as dataframe x
y=df['Price']
print(y.shape) # (5000, ) only one column, Price, is saved in Series y

# now using train_test_split to divide the data for training the
# model and then testing it.
# the ratio will be 80%,20%

from sklearn.model_selection import train_test_split

x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.20)
print(x_train.shape) # (4000,5) i.e. 80% of 5000 rows
print(y_train.shape) # (4000,)
print(x_test.shape) # (1000,5) i.e. 20% of 5000 rows
print(y_test.shape) # (1000,)

# now applying the linear regression algorithm to train the model
# using 80% of the data.

from sklearn.linear_model import LinearRegression

m=LinearRegression()
m.fit(x_train,y_train) # training the model with input and output data.
y_predict=m.predict(x_test)
# in the above line y_predict is the numpy array created by
# predicting the prices for x_test i.e. test data.
print(y_predict[0:5]) # print 5 predicted values.
print(y_test.head()) # printing first 5 rows of testing data (1000 rows)
df1=pd.DataFrame({'Actual':y_test,'Predicted':y_predict})
# in the above line, a dictionary with keys Actual and Predicted
# is used to build a dataframe for comparison.
print(df1.head()) # print first 5 comparisons of actual and predicted.
# We observe that there is a difference between the actual and
# predicted values.
# Further, we need to calculate the error,
# evaluate the model and test the accuracy of the model.
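
The error calculation mentioned in the last comments can be sketched as follows. This is only an illustration with made-up numbers, not results from USA_Housing.csv: it computes the mean absolute error, the root mean squared error and the R squared score directly from their formulas. In the program above, the same steps would be applied to y_test and y_predict, and sklearn.metrics provides ready-made functions such as mean_absolute_error and r2_score for this purpose.

```python
import numpy as np

# Hypothetical actual and predicted prices, made up for illustration
actual = np.array([100.0, 200.0, 300.0, 400.0])
predicted = np.array([110.0, 190.0, 310.0, 380.0])

errors = actual - predicted
mae = np.mean(np.abs(errors))                    # mean absolute error
rmse = np.sqrt(np.mean(errors ** 2))             # root mean squared error
ss_res = np.sum(errors ** 2)                     # variation not explained by the model
ss_tot = np.sum((actual - actual.mean()) ** 2)   # total variation in the actual values
r2 = 1 - ss_res / ss_tot                         # R squared score, 1.0 is a perfect fit

print(mae)              # 12.5
print(round(rmse, 3))   # 13.229
print(round(r2, 3))     # 0.986
```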
