Ainotes dataframe

The document provides an overview of essential Python libraries for data manipulation and machine learning, particularly focusing on the Pandas library and its DataFrame data structure. It explains how to create, access, modify, and delete data within a DataFrame, as well as how to read from and write to CSV files using Pandas. Key functionalities include selecting subsets of data, adding and deleting rows and columns, and handling CSV file operations.

Uploaded by

Aditya Pandey
Copyright
© All Rights Reserved

Libraries

import math: Provides standard mathematical functions such as sin(), cos(), pow() and sqrt().

import numpy: Used for numerical computing. It provides n-dimensional arrays.
import pandas: Provides the Series (1-dimensional) and DataFrame (2-dimensional) data structures and simplifies data manipulation tasks.
import sklearn (scikit-learn): One of the most widely used Python libraries for machine learning.
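A quick way to see what each library provides is to call a function or build an object from each one. This is a minimal sketch assuming NumPy and pandas are installed; scikit-learn is left out so the snippet runs without it.

```python
import math

import numpy as np
import pandas as pd

# math: scalar mathematical functions
root = math.sqrt(16)     # 4.0
power = math.pow(2, 3)   # 8.0
cosine = math.cos(0)     # 1.0

# numpy: n-dimensional arrays
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.ndim, arr.shape)   # 2 (2, 3)

# pandas: 1-D Series and 2-D DataFrame
s = pd.Series([10, 20, 30])
frame = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
print(s.ndim, frame.ndim)    # 1 2
```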

DataFrame Data Structure (under Pandas)


• A DataFrame is a pandas data structure that stores data in two-dimensional (tabular) form.
• It has two axes: a row index (axis=0) and a column index (axis=1).
• Index labels can be numbers or strings.
• A DataFrame can hold heterogeneous data: different columns may have different data types.
• Both the size and the values of a DataFrame are mutable, i.e. rows and columns can be added or deleted, and values can be changed at any time.
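The two axes and the mutability described above can be seen in a small sketch. The data here is illustrative only; `sum()` is used simply because its `axis` argument makes the direction of each axis visible.

```python
import pandas as pd

# Small numeric DataFrame with string row labels (index) and column labels
df = pd.DataFrame({'age': [25, 35], 'salary': [3000, 4500]},
                  index=['e1', 'e2'])

# axis=0 runs down the rows, giving one result per column
col_totals = df.sum(axis=0)   # age 60, salary 7500
# axis=1 runs across the columns, giving one result per row
row_totals = df.sum(axis=1)   # e1 3025, e2 4535

# Mutability: add a column of a different type, then change one value
df['name'] = ['avi', 'ravi']
df.loc['e1', 'age'] = 26
print(df.dtypes)
```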

Syntax
import pandas as pd
<dataFrameObject>=pd.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)
A DataFrame object can be created by passing a NumPy array, a dictionary or a list.
Dataframe df:
    name  age  salary
e1   avi   25    3000
e2  ravi   35    4500
e3  savi   45    2500
e4  kavi   55    6000
# Creating the above DataFrame from a dictionary whose values are lists
import pandas as pd
d={'name':['avi','ravi','savi','kavi'],'age':[25,35,45,55],'salary':[3000,4500,2500,6000]}
df=pd.DataFrame(d,index=['e1','e2','e3','e4'])
print(df)

# Creating the above DataFrame from a nested list (one inner list per row)

import pandas as pd

L=[['avi',25,3000],['ravi',35,4500],['savi',45,2500],['kavi',55,6000]]

df=pd.DataFrame(L,index=['e1','e2','e3','e4'],columns=['name','age','salary'])
print(df)
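The examples above use a dictionary and a nested list; a NumPy array can be passed the same way. This sketch keeps only the numeric columns because a NumPy array holds a single dtype, so mixing names with numbers would turn every value into a string.

```python
import numpy as np
import pandas as pd

# 4x2 integer array: one row per employee, columns age and salary
arr = np.array([[25, 3000], [35, 4500], [45, 2500], [55, 6000]])

df = pd.DataFrame(arr, index=['e1', 'e2', 'e3', 'e4'],
                  columns=['age', 'salary'])
print(df)
```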

Selecting/Accessing a subset of a DataFrame:


#DISPLAYING ONE COLUMN
df.loc[:,'name'] or df['name'] or df.iloc[:,0]

#DISPLAYING MULTIPLE COLUMNS
df.loc[:,['name','salary']] or df[['name','salary']] or df.iloc[:,[0,2]]

#DISPLAYING ONE ROW:
df.loc['e2',:] or df.loc['e2'] or df.iloc[1,:]

#DISPLAYING MULTIPLE ROWS:
df.loc[['e2','e4'],:] or df.loc[['e2','e4']] or df.iloc[[1,3],:]

#DISPLAYING SELECTED ROW AND COLUMN:
df.loc['e2','name'] or df.iloc[1,0]
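The selection forms listed above can be checked against each other in one script: `loc` takes row/column labels, `iloc` takes integer positions. The last two lines add a point the notes do not cover: label slices with `loc` include the end label, while position slices with `iloc` exclude the end position.

```python
import pandas as pd

d = {'name': ['avi', 'ravi', 'savi', 'kavi'],
     'age': [25, 35, 45, 55],
     'salary': [3000, 4500, 2500, 6000]}
df = pd.DataFrame(d, index=['e1', 'e2', 'e3', 'e4'])

one_col = df['name']               # same as df.loc[:, 'name'] and df.iloc[:, 0]
two_cols = df[['name', 'salary']]  # same as df.iloc[:, [0, 2]]
one_row = df.loc['e2']             # same as df.iloc[1]
two_rows = df.loc[['e2', 'e4']]    # same as df.iloc[[1, 3]]
cell = df.loc['e2', 'name']        # same as df.iloc[1, 0]
print(cell)                        # ravi

# Slicing: 'e1':'e3' includes e3, while 0:3 stops before position 3 (e4)
by_label = df.loc['e1':'e3']
by_position = df.iloc[0:3]
```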

#adding one new record or row (one value per column)


df.loc['e5']=['sumit',65,2590]
#adding new column (the list needs exactly one value per row; df now has 5 rows)
df['Tourism rate']=[1,2,1,2,1]
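A minimal sketch of both additions on the example DataFrame. Assigning to an unused row label with `loc` appends a row; assigning a list to a new column name appends a column, and pandas raises `ValueError` when the list length does not match the number of rows. The column name `bad` is hypothetical, used only to trigger that error.

```python
import pandas as pd

d = {'name': ['avi', 'ravi', 'savi', 'kavi'],
     'age': [25, 35, 45, 55],
     'salary': [3000, 4500, 2500, 6000]}
df = pd.DataFrame(d, index=['e1', 'e2', 'e3', 'e4'])

# New row: one value per column, stored under the new label 'e5'
df.loc['e5'] = ['sumit', 65, 2590]

# New column: one value per row (5 rows after adding e5)
df['Tourism rate'] = [1, 2, 1, 2, 1]

# A list with the wrong length is rejected
raised = False
try:
    df['bad'] = [1, 2, 1, 2, 1, 1]   # 6 values for 5 rows
except ValueError as err:
    raised = True
    print('ValueError:', err)
```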
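Deletion of row or column in DataFrame
Syntax:
<DF>.drop([index label or sequence of labels], axis=0 or 1, inplace=True or False)

axis=0 (the default) deletes rows and axis=1 deletes columns.

inplace=True: makes the changes in the original DataFrame. With the default inplace=False, drop() returns a new DataFrame and leaves the original unchanged.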

Dataframe df (the original rows e1 to e4 with columns name, age, salary):

#deleting rows
df.drop(['e1','e3'],inplace=True)

#deleting columns
df.drop(['age','name'],axis=1,inplace=True)

#Dataframe after deleting rows e1, e3 and columns name, age:
    salary
e2    4500
e4    6000
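The difference between the default `inplace=False` and `inplace=True` can be sketched on the same DataFrame. Without `inplace`, `drop()` returns a new DataFrame and `df` is unchanged; with `inplace=True`, `df` itself is modified and `drop()` returns `None`. pandas also accepts `df.drop(columns=[...])` as an alternative to `axis=1`.

```python
import pandas as pd

d = {'name': ['avi', 'ravi', 'savi', 'kavi'],
     'age': [25, 35, 45, 55],
     'salary': [3000, 4500, 2500, 6000]}
df = pd.DataFrame(d, index=['e1', 'e2', 'e3', 'e4'])

# Without inplace: new DataFrames are returned, df keeps all its data
fewer_rows = df.drop(['e1', 'e3'])              # axis=0 (rows) is the default
fewer_cols = df.drop(['age', 'name'], axis=1)   # axis=1 means columns
print(df.shape, fewer_rows.shape, fewer_cols.shape)   # (4, 3) (2, 3) (4, 1)

# With inplace=True: df is changed and drop() returns None
result = df.drop(['e1', 'e3'], inplace=True)
df.drop(['age', 'name'], axis=1, inplace=True)
print(df)
```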
Attributes of dataframe:
Syntax:
Dataframe_name.attribute
Consider the DataFrame df (rows e1 to e4; columns name, age, salary):

>>>df.index
Index(['e1', 'e2', 'e3', 'e4'], dtype='object')
>>>df.columns
Index(['name', 'age', 'salary'], dtype='object')
>>>df.dtypes
name      object
age        int64
salary     int64
dtype: object
>>>df.values
array([['avi', 25, 3000],
       ['ravi', 35, 4500],
       ['savi', 45, 2500],
       ['kavi', 55, 6000]], dtype=object)

>>>df.shape
(4, 3)

>>>df.head(2)
    name  age  salary
e1   avi   25    3000
e2  ravi   35    4500

>>>df.tail(2)
    name  age  salary
e3  savi   45    2500
e4  kavi   55    6000
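The attributes above can be inspected in one script. Note that head() and tail() are methods rather than attributes, which is why they take parentheses; both show 5 rows when no number is given. The pandas documentation recommends DataFrame.to_numpy() over .values for getting the underlying array, so both are shown.

```python
import pandas as pd

d = {'name': ['avi', 'ravi', 'savi', 'kavi'],
     'age': [25, 35, 45, 55],
     'salary': [3000, 4500, 2500, 6000]}
df = pd.DataFrame(d, index=['e1', 'e2', 'e3', 'e4'])

print(df.index)      # row labels
print(df.columns)    # column labels
print(df.dtypes)     # one data type per column
print(df.shape)      # (rows, columns)
arr = df.values      # underlying data as a NumPy array
arr2 = df.to_numpy() # recommended equivalent of .values
first_two = df.head(2)
last_two = df.tail(2)
```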

CSV File

A CSV file (Comma Separated Values file) is a plain text file that arranges
tabular data in a simple structure: each line is one record (row), and the
fields within a line are separated by commas. CSV files are commonly produced
by programs that handle large amounts of data. They are a convenient way to
export data from spreadsheets and databases and to import it into other programs.
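To see that a CSV file really is plain text with one record per line, this sketch parses a small, hypothetical CSV string with Python's built-in csv module (no pandas needed). Every field comes back as a string; pandas read_csv goes further and converts numeric columns to numbers.

```python
import csv
import io

# Hypothetical CSV text: the first line is the header, each later line is one record
text = "name,age,salary\navi,25,3000\nravi,35,4500\n"

rows = list(csv.reader(io.StringIO(text)))
header, records = rows[0], rows[1:]
print(header)    # ['name', 'age', 'salary']
print(records)   # [['avi', '25', '3000'], ['ravi', '35', '4500']]
```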
Writing a CSV file (storing a DataFrame's data in a CSV file): the to_csv()
method is used.
Syntax:-
df.to_csv("file path")
Example:
import pandas as pd
d={'PRODUCT ID':[1001,1002,1003,1004,1005],
'PRODUCT NAME':['PEN','PENCIL','BOOK','ERASER','MARKER']}
df=pd.DataFrame(d)
df.to_csv("product.csv")
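OUTPUT (contents of product.csv; the unnamed first column holds the row index):
,PRODUCT ID,PRODUCT NAME
0,1001,PEN
1,1002,PENCIL
2,1003,BOOK
3,1004,ERASER
4,1005,MARKER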
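By default to_csv() also writes the row index as an unnamed first column. Passing index=False leaves it out, which keeps the file clean when it is read back. This sketch writes to a temporary folder (an assumption made only so the example leaves no file behind) and reads the file back with read_csv().

```python
import os
import tempfile

import pandas as pd

d = {'PRODUCT ID': [1001, 1002, 1003, 1004, 1005],
     'PRODUCT NAME': ['PEN', 'PENCIL', 'BOOK', 'ERASER', 'MARKER']}
df = pd.DataFrame(d)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'product.csv')
    df.to_csv(path, index=False)   # do not write the 0..4 row labels
    with open(path) as f:
        first_line = f.readline().strip()
    back = pd.read_csv(path)       # load the file into a new DataFrame

print(first_line)   # PRODUCT ID,PRODUCT NAME
```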

Reading csv file or loading data from csv to dataframe:

The read_csv() function is used.
Syntax:
df=pd.read_csv("file path", sep="separator character", names=[sequence of column names], index_col=column name)
For example,
import pandas as pd
df=pd.read_csv("student.csv")
print(df)
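The sep and index_col parameters from the syntax above can be tried without a real file by wrapping text in io.StringIO. The file contents and column names here are hypothetical: the separator is ';' instead of ',', and the rollno column is used as the row index.

```python
import io

import pandas as pd

# Hypothetical file contents using ';' as the separator
text = "rollno;name;class\n1;Amit;10\n2;Neha;11\n"

df = pd.read_csv(io.StringIO(text), sep=';', index_col='rollno')
print(df)
```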

Reading CSV file and specifying own column names


If the CSV file does not have a top row containing column headings, read_csv()
still takes its first row as the column headings, so that record is used as
headings instead of data. To give your own column headings, pass them in the
names parameter; the first row is then read as data.
Syntax:
df=pd.read_csv("path of csv file", names=["colname1", "colname2", ...])
For example,
import pandas as pd
df=pd.read_csv("student.csv", names=["Roll No", "Name", "Class"])
print(df)
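The effect of names can be checked on hypothetical header-less data. Without names, the first record becomes the column headings; with names, every line is kept as data. If a file does have a header row that you want to replace, the pandas documentation says to pass header=0 together with names.

```python
import io

import pandas as pd

# Hypothetical file with no header row: every line is a student record
no_header = "1,Amit,10\n2,Neha,11\n"

# Without names: the first record is wrongly used as the heading
wrong = pd.read_csv(io.StringIO(no_header))
# With names: both lines are kept as data
right = pd.read_csv(io.StringIO(no_header), names=["Roll No", "Name", "Class"])

# File that already has a header row: header=0 replaces it with names
with_header = "rollno,name,class\n1,Amit,10\n"
renamed = pd.read_csv(io.StringIO(with_header), header=0,
                      names=["Roll No", "Name", "Class"])
```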
