Ainotes

The document provides an overview of essential Python libraries for data manipulation and machine learning, specifically focusing on pandas and its DataFrame structure. It details how to create, manipulate, and access data within DataFrames, including adding and deleting rows and columns, as well as reading from and writing to CSV files. Syntax examples for creating DataFrames from various data structures and performing operations on them are also included.

Libraries:

import math: Provides mathematical functions such as sin(), cos(), pow() and sqrt().

import numpy: Used for numerical computing. It provides n-dimensional arrays.
import pandas: Provides data structures such as Series (1-dimensional) and
DataFrame (2-dimensional).
It simplifies data manipulation tasks.
import sklearn: scikit-learn, one of the most widely used libraries for machine learning. The package is installed as scikit-learn but imported as sklearn.
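
As a minimal sketch of what each of the first three libraries offers (the variable names and values here are illustrative assumptions, not taken from the notes; scikit-learn is left out to keep the example short):

```python
import math
import numpy as np
import pandas as pd

# math: functions that work on single numbers
root = math.sqrt(16)
power = math.pow(2, 3)
print(root, power)

# numpy: n-dimensional arrays (here a 2-D array with 2 rows and 3 columns)
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.ndim, arr.shape)

# pandas: a Series is 1-dimensional, a DataFrame is 2-dimensional
s = pd.Series([10, 20, 30])
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
print(s.ndim, df.ndim)
```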

DataFrame Data Structure


• A DataFrame is a pandas data structure that stores data in two-dimensional (tabular) form.
• It has 2 axes: a row index (axis=0) and a column index (axis=1).
• Index labels can be numbers, strings or letters.
• A DataFrame can hold heterogeneous data: different columns can have different data types.
• Both the size and the values of a DataFrame are mutable, i.e. values can be changed and the number of rows and columns can be increased or decreased at any time.
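
The properties listed above can be checked with a short sketch. The small table used here is an assumption made up for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["avi", "ravi"], "age": [25, 35]},
    index=["e1", "e2"],
)

# Heterogeneous: each column carries its own data type
print(df.dtypes)

# Value-mutable: change a single cell
df.loc["e1", "age"] = 26

# Size-mutable: add a column (axis=1 grows) and a row (axis=0 grows)
df["salary"] = [3000, 4500]
df.loc["e3"] = ["savi", 45, 2500]

print(df.shape)
```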

Syntax
import pandas as pd
<dataFrameObject>=pd.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)
A DataFrame object can be created by passing a numpy array, a dictionary or a list.

# Creating a DataFrame from a dictionary whose values are lists


import pandas as pd
d={'name':['avi','ravi','savi','kavi'],'age':[25,35,45,55],'salary':[3000,4500,2500,6000]}
df=pd.DataFrame(d,index=['e1','e2','e3','e4'])
print(df)

# Creating a DataFrame from numpy arrays:


import numpy as np
import pandas as pd
a1=np.array([10,20,30])
a2=np.array([50,60,70])
a3=np.array([60,35,23])
# each array becomes one row of the DataFrame
df=pd.DataFrame([a1,a2,a3],index=['r1','r2','r3'],columns=['c1','c2','c3'])
print(df)

# Creating a DataFrame from a nested list (each inner list is one row)


import pandas as pd
L=[['avi',25,3000],['ravi',35,4500],['savi',45,2500],['kavi',55,6000]]
df=pd.DataFrame(L,index=['e1','e2','e3','e4'],columns=['name','age','salary'])
print(df)

Selecting/Accessing a subset from a DataFrame-

df is the name of the DataFrame used in the examples below:


#DISPLAYING ONE COLUMN
df['Country']
#DISPLAYING MULTIPLE COLUMNS
df[['Country','Percent']]
#DISPLAYING ONE ROW (by label with loc, or by position with iloc):
df.loc['ES',:]
df.iloc[1,:]

#DISPLAYING MULTIPLE ROWS:


df.loc[['ES','FR'],:]
df.iloc[[1,3],:]

#DISPLAYING SELECTED ROW AND COLUMN:


df.loc['ES','Country']
df.iloc[1,0]
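
The table that these selections refer to is not reproduced in the notes, so the sketch below uses made-up data (an assumption) arranged so that 'ES' is the second row and 'FR' the fourth, matching the positions used with iloc:

```python
import pandas as pd

# Hypothetical data standing in for the missing table
df = pd.DataFrame(
    {"Country": ["India", "Spain", "Italy", "France"],
     "Percent": [10, 20, 30, 40]},
    index=["IN", "ES", "IT", "FR"],
)

one_col = df["Country"]                  # a Series
many_cols = df[["Country", "Percent"]]   # a DataFrame

row_by_label = df.loc["ES", :]           # loc selects by label
row_by_pos = df.iloc[1, :]               # iloc selects by integer position

rows_by_label = df.loc[["ES", "FR"], :]
rows_by_pos = df.iloc[[1, 3], :]

cell_by_label = df.loc["ES", "Country"]
cell_by_pos = df.iloc[1, 0]

print(rows_by_label)
print(cell_by_label)
```

loc and iloc return the same data here only because the labels and positions were chosen to line up; in general loc depends on the index labels and iloc on the row order.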
#adding one new record or row
df.loc['GF']=['Dubai',65,.76]
#adding new column
df['Tourism rate']=[1,2,1,2,1,1]
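
When adding a row, the list needs one value per existing column; when adding a column, it needs one value per existing row. A small sketch with made-up data (an assumption, since the original table is not shown) illustrates both, plus what happens when the lengths do not match:

```python
import pandas as pd

# Hypothetical starting table with 2 rows and 2 columns
df = pd.DataFrame(
    {"Country": ["Spain", "France"], "Percent": [20, 40]},
    index=["ES", "FR"],
)

df.loc["GF"] = ["Dubai", 65]      # new row: one value per column
df["Tourism rate"] = [1, 2, 1]    # new column: one value per row

try:
    df["Bad"] = [1, 2]            # only 2 values for 3 rows
except ValueError as err:
    print("Rejected:", err)

print(df)
```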
Deletion of row or column in DataFrame
Syntax:
<DF>.drop([index or sequence of indexes],axis,inplace)
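axis=0 (the default) deletes rows and axis=1 deletes columns.

inplace=True: Makes the changes in the original DataFrame. With the default inplace=False, the original is left unchanged and a new DataFrame is returned.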


Dataframe df:

#deleting rows
df.drop(['e1','e3'],inplace=True)
#deleting columns
df.drop(['age','name'],axis=1,inplace=True)

#Dataframe after deleting e1,e3 row and name, age column
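
A minimal sketch of the difference that inplace makes, using the name/age/salary table built earlier in these notes:

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["avi", "ravi", "savi", "kavi"],
     "age": [25, 35, 45, 55],
     "salary": [3000, 4500, 2500, 6000]},
    index=["e1", "e2", "e3", "e4"],
)

# Without inplace=True, drop() returns a new DataFrame and df is unchanged
smaller = df.drop(["e1", "e3"])
print(df.shape, smaller.shape)

# With inplace=True, df itself is modified
df.drop(["e1", "e3"], inplace=True)               # rows (axis=0 is the default)
df.drop(["age", "name"], axis=1, inplace=True)    # columns
print(df)
```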


Attributes of dataframe:
Syntax:
Dataframe_name.attribute
Consider the DataFrame df with columns name, age and salary and row labels e1 to e4:

>>>df.index
Index(['e1', 'e2', 'e3', 'e4'], dtype='object')
>>>df.columns
Index(['name', 'age', 'salary'], dtype='object')
>>>df.dtypes
name object
age int64
salary int64
dtype: object
>>>df.values
[['avi' 25 3000]
['ravi' 35 4500]
['savi' 45 2500]
['kavi' 55 6000]]
>>>df.shape
(4, 3)

>>>df.head(2)
name age salary
e1 avi 25 3000
e2 ravi 35 4500

>>>df.tail(2)
name age salary
e3 savi 45 2500
e4 kavi 55 6000

CSV File

A CSV file (Comma Separated Values file) is a plain text file that
stores tabular data: each line is one row, and the values in a row are
separated by commas. CSV files are normally created by programs that handle
large amounts of data. They are a convenient way to export data from
spreadsheets and databases and to import or use it in other programs.
Writing to a CSV file (storing a DataFrame's data in a CSV file):

The to_csv() method is used.


Syntax:
df.to_csv("file path")
Example:
import pandas as pd
d={'PRODUCT ID':[1001,1002,1003,1004,1005],
'PRODUCT NAME':['PEN','PENCIL','BOOK','ERASER','MARKER']}
df=pd.DataFrame(d)
df.to_csv("product.csv")
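
By default to_csv() also writes the row index as the first column of the file. The sketch below shows the file contents with and without it; the temporary directory and the shortened product list are assumptions made so the example cleans up after itself:

```python
import os
import tempfile
import pandas as pd

d = {"PRODUCT ID": [1001, 1002, 1003],
     "PRODUCT NAME": ["PEN", "PENCIL", "BOOK"]}
df = pd.DataFrame(d)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "product.csv")

    df.to_csv(path)                  # default: row index is written too
    with open(path) as f:
        with_index = f.read()

    df.to_csv(path, index=False)     # leave the row index out
    with open(path) as f:
        without_index = f.read()

print(with_index)
print(without_index)
```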


Reading a CSV file (loading data from a CSV file into a DataFrame)


The read_csv() function is used.
Syntax:
df=pd.read_csv("file path",sep="sep character",names=[column name sequence],index_col=column name)
For example,
import pandas as pd
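df=pd.read_csv("student.csv")
print(df)

Reading CSV file and specifying own column names

By default read_csv() takes the top row of the file as the column headings. If the CSV file does not have a top row containing column headings, the first row of data is therefore used as headings by mistake. To supply your own column headings, pass them with the names parameter.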
Syntax:
df=pd.read_csv("path of csv file", names=["colname-1","colname-2",...])
For example,
import pandas as pd
df=pd.read_csv("student.csv", names=["Roll No", "Name", "Class"])
print(df)
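
The effect of names can be seen without a file on disk by reading from an in-memory string. The student records below are made up for illustration (an assumption), and io.StringIO stands in for student.csv:

```python
import io
import pandas as pd

# Hypothetical file contents with no header row
text = "1,Asha,XII\n2,Ravi,XI\n"

# Default: the first line is consumed as the column headings
df_default = pd.read_csv(io.StringIO(text))
print(list(df_default.columns), len(df_default))

# names=[...]: every line is kept as data under the given headings
df_named = pd.read_csv(io.StringIO(text), names=["Roll No", "Name", "Class"])
print(df_named)

# index_col: use one of the columns as the row index
df_indexed = pd.read_csv(
    io.StringIO(text),
    names=["Roll No", "Name", "Class"],
    index_col="Roll No",
)
print(df_indexed)
```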
