09 - Pandas Slides


Uploaded by

wdm20361

Content

What is Pandas?
Basic Data Structures: Series and Data Frame
Basic Functions
Input/Output Tools

1. What is Pandas
pandas is a Python package providing fast, flexible, and expressive data
structures designed to make working with “relational” or “labeled” data
both easy and intuitive. It aims to be the fundamental high-level building
block for doing practical, real world data analysis in Python
(http://pandas.pydata.org/).

Pandas builds on top of NumPy to ease managing heterogeneous data sets.

1.1 Data Handled by Pandas

Pandas is well suited for many different kinds of data:

Tabular data with heterogeneously-typed columns (comparable to Excel, R or relational databases)
Time series data
Matrix data (homogeneously typed or heterogeneous) with row and column labels
Any other form of observational / statistical data sets

1.2 Feature Overview

Easy handling of missing data (represented as NaN)
Size mutability: columns can be inserted and deleted from
DataFrame and higher dimensional objects
Automatic and explicit data alignment: objects can be explicitly
aligned to a set of labels, or the user can simply ignore the
labels and let Series, DataFrame, etc. automatically align the
data for you in computations
Powerful, flexible group by functionality to perform
split-apply-combine operations on data sets, for both aggregating and
transforming data
Make it easy to convert ragged, differently-indexed data in
other Python and NumPy data structures into DataFrame
objects
Intelligent label-based slicing, fancy indexing, and subsetting of
large data sets
Intuitive merging and joining data sets
Flexible reshaping and pivoting of data sets
Hierarchical labeling of axes (possible to have multiple labels
per tick)
Robust IO tools for loading and storing data
Time series-specific functionality
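
Two of the features listed above, automatic label alignment and split-apply-combine, can be sketched in a few lines. The data and the names a, b, key and value are made up for illustration and do not come from the slides.

```python
import pandas as pd

# Two Series with partially overlapping labels (illustrative data).
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["y", "z"])

# Arithmetic aligns on labels; "x" has no partner in b, so the result is NaN there.
total = a + b

# Split-apply-combine: group rows by a key column and aggregate each group.
df = pd.DataFrame({"key": ["p", "q", "p", "q"], "value": [1, 2, 3, 4]})
per_key = df.groupby("key")["value"].sum()

print(total)
print(per_key)
```

The NaN for label "x" is the missing-data marker mentioned in the first bullet: alignment never silently pairs values that carry different labels.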

2. Pandas Data Structures

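Pandas is built around two data structures:

Series represents one-dimensional labeled data. Its values are held in a NumPy ndarray or a pandas extension array (Series was a subclass of ndarray only in early pandas releases).
DataFrame represents two-dimensional data as a collection of Series, one per column, that share a row index.

For all data structures, labels/indices can be defined per row and
column.

Data alignment is intrinsic, i.e. the link between labels and data will not
be broken unless you break it explicitly.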
Series:

Homogeneous data
Size Immutable
Values of Data Mutable

Data Frames:

Heterogeneous data
Size Mutable
Data Mutable

2.1. Series
Series is a one-dimensional labeled array capable of holding any data
type (integers, strings, floating point numbers, Python objects, etc.). The
axis labels are collectively referred to as the index. The basic method to
create a Series is to call:
pd.Series(data, index=index)

data may be a dict, a numpy.ndarray or a scalar value.

A series can be created using various inputs like −

Array
Dict
Scalar value or constant

2.1.1 Creating a series from ndarray

In [4]:

#import the pandas library and aliasing as pd

import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data)
print (s)

0 a
1 b
2 c
3 d
dtype: object

In [54]:

data = np.array(['a','b','c','d'])
s = pd.Series(data,index=[100,101,102,103])
print (s)

100 a
101 b
102 c
103 d
dtype: object

2.1.2 Creating a Series from dict

A dict can be passed as input. If no index is specified, the dictionary
keys are used to construct the index: current pandas keeps the keys in
insertion order, while releases before 0.23 sorted them. If index is
passed, the values in data corresponding to the labels in the index will
be pulled out.
In [55]:

data = {'a' : 0., 'b' : 1., 'c' : 2.}

s = pd.Series(data)
print (s)

a 0.0
b 1.0
c 2.0
dtype: float64

Dictionary keys are used to construct index.

In [56]:

data = {'a' : 0., 'b' : 1., 'c' : 2.}

s = pd.Series(data,index=['b','c','d','a'])
print (s)

b 1.0
c 2.0
d NaN
a 0.0
dtype: float64

Index order is persisted and the missing element is filled with NaN (Not
a Number).
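
Once a label has no value, the usual next step is to detect, fill or drop the NaN. The following is a minimal sketch that reuses the Series from the cell above; the fill value -1.0 is an arbitrary placeholder chosen for illustration.

```python
import pandas as pd

data = {'a': 0., 'b': 1., 'c': 2.}
s = pd.Series(data, index=['b', 'c', 'd', 'a'])

missing = s.isna()        # boolean mask, True only for label 'd'
filled = s.fillna(-1.0)   # replace NaN with a placeholder value
dropped = s.dropna()      # keep only the entries that have a value

print(missing)
print(filled)
print(dropped)
```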

2.1.3 Creating a Series from Scalar

If data is a scalar value, an index must be provided. The value will be
repeated to match the length of the index.
In [57]:

s = pd.Series(5, index=[0, 1, 2, 3])

print (s)

0 5
1 5
2 5
3 5
dtype: int64

In [9]:

#show the index

s.index

Out[9]:

Int64Index([0, 1, 2, 3], dtype='int64')

In [10]:

#show the value

s.values

Out[10]:

array([5, 5, 5, 5])

2.1.4 Series Indexing

Elements of a Series can be accessed either by integer position or by
index label.
In [12]:

s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

#retrieve a single element

s['a']

Out[12]:

1

In [13]:

s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

#retrieve multiple elements

s[['a','c','d']]

Out[13]:

a 1
c 3
d 4
dtype: int64
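
The difference between label-based and position-based access is easiest to see with the explicit .loc and .iloc indexers. This is a small sketch on the same Series; the variable names are illustrative only.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

by_label = s.loc['c']          # label-based lookup
by_position = s.iloc[2]        # position-based lookup (0-based)

label_slice = s.loc['b':'d']   # label slices include both endpoints
position_slice = s.iloc[1:3]   # positional slices exclude the stop position

print(by_label, by_position)
print(label_slice)
print(position_slice)
```

Using .loc and .iloc makes the intent explicit, which matters when the index itself consists of integers and s[0] could be read either way.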

In [10]:

# Series.get_value() was removed in pandas 1.0; .at gives the same label-based scalar access
s.at['a']

Out[10]:

1

2.2. DataFrame: a Series of Series

The pandas DataFrame is a 2 dimensional labeled data structure with
columns of potentially different types. Similar to

a spreadsheet
relational database table
a dictionary of series
Creating DataFrames

A pandas DataFrame can be created using various inputs like −

Lists
Dict
Series
Numpy ndarrays
Another DataFrame

2.2.1 Create a DataFrame from Lists

In [14]:

data = [1,2,3,4,5]
df = pd.DataFrame(data)
print (df)

0
0 1
1 2
2 3
3 4
4 5

In [15]:

data = [['Ramesh',10],['Himesh',12],['Kamesh',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
print (df)

Name Age
0 Ramesh 10
1 Himesh 12
2 Kamesh 13
In [16]:

data = [['Ramesh',10],['Himesh',12],['Kamesh',13]]
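# dtype=float would also be applied to the 'Name' strings, which recent pandas rejects; convert only 'Age'
df = pd.DataFrame(data,columns=['Name','Age']).astype({'Age': float})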
print (df)

Name Age
0 Ramesh 10.0
1 Himesh 12.0
2 Kamesh 13.0

2.2.2 Create a DataFrame from Dict of ndarrays / Lists

All the ndarrays must be of the same length. If index is passed, then the
length of the index must equal the length of the arrays.

If no index is passed, then by default, index will be range(n), where n is
the array length.

In [19]:

data = {'Name':['Ramesh', 'Rajesh', 'Nitesh', 'Nilesh'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
print (df)

Age Name
0 28 Ramesh
1 34 Rajesh
2 29 Nitesh
3 42 Nilesh
In [20]:

data = {'Name':['Ramesh', 'Rajesh', 'Nitesh', 'Nilesh'],'Age':[28,34,29,42]}
df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])
print (df)

Age Name
rank1 28 Ramesh
rank2 34 Rajesh
rank3 29 Nitesh
rank4 42 Nilesh

2.2.3 Create a DataFrame from List of Dicts

List of Dictionaries can be passed as input data to create a DataFrame.
The dictionary keys are by default taken as column names.

In [21]:

data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]

df = pd.DataFrame(data)
print (df)

a b c
0 1 2 NaN
1 5 10 20.0

In [23]:

data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]

df = pd.DataFrame(data, index=['first', 'second']) # passing row indices
print (df)

a b c
first 1 2 NaN
second 5 10 20.0
In [25]:

data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]

#With two column indices, values same as dictionary keys

df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])

#With two column indices with one index with other name
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
print (df1)
print (df2)

a b
first 1 2
second 5 10
a b1
first 1 NaN
second 5 NaN

2.2.4 Create a DataFrame from Dict of Series

Dictionary of Series can be passed to form a DataFrame. The resultant
index is the union of all the series indexes passed.

In [58]:

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),

'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print (df)

one two
a 1.0 1
b 2.0 2
c 3.0 3
d NaN 4

2.2.5 Column selection, addition, deletion

In [59]:

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),

'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print (df ['one'])

a 1.0
b 2.0
c 3.0
d NaN
Name: one, dtype: float64
In [60]:

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),

'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)

# Adding a new column to an existing DataFrame object with a column label
# by passing a new Series

print ("Adding a new column by passing as Series:")

df['three']=pd.Series([10,20,30],index=['a','b','c'])
print (df)

print()
print ("Adding a new column using the existing columns in DataFrame:")
df['four']=df['one']+df['three']

print (df)

Adding a new column by passing as Series:

one two three
a 1.0 1 10.0
b 2.0 2 20.0
c 3.0 3 30.0
d NaN 4 NaN

Adding a new column using the existing columns in DataFrame:

one two three four

a 1.0 1 10.0 11.0
b 2.0 2 20.0 22.0
c 3.0 3 30.0 33.0
d NaN 4 NaN NaN
In [31]:

# Using the previous DataFrame, we will delete a column

# using del function

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),

'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
'three' : pd.Series([10,20,30], index=['a','b','c'])}

df = pd.DataFrame(d)
print ("Our dataframe is:")
print (df)

print()
# using del function
print ("Deleting the first column using DEL function:")
del df['one']
print (df)

# using pop function

print ("Deleting another column using POP function:")
df.pop('two')
print (df)

Our dataframe is:

one three two
a 1.0 10.0 1
b 2.0 20.0 2

c 3.0 30.0 3
d NaN NaN 4

Deleting the first column using DEL function:

three two
a 10.0 1
b 20.0 2
c 30.0 3
d NaN 4
Deleting another column using POP function:
three
a 10.0
b 20.0
c 30.0
d NaN
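
Both del and pop() modify the DataFrame in place. As a hedged aside, the sketch below contrasts them with drop(columns=...), which returns a new object instead; the small DataFrame is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3], 'two': [4, 5, 6], 'three': [7, 8, 9]})

# drop() returns a new DataFrame and leaves df unchanged.
without_one = df.drop(columns=['one'])

# pop() removes the column from df in place and returns it as a Series.
popped = df.pop('two')

print(without_one.columns.tolist())
print(df.columns.tolist())
print(popped.tolist())
```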

2.2.6 Row Selection, Addition, and Deletion

Selection by Label

Rows can be selected by passing a row label to the loc indexer.

In [61]:

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),

'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print (df.loc['b'])

one 2.0
two 2.0
Name: b, dtype: float64

Selection by integer location

Rows can be selected by passing an integer position to the iloc indexer.

In [63]:

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),

'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print (df.iloc[2])

one 3.0
two 3.0
Name: c, dtype: float64

Slice Rows

Multiple rows can be selected by slicing with the ‘:’ operator. A plain integer slice on a DataFrame selects rows by position.

In [34]:

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),

'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print (df[2:4])

one two
c 3.0 3
d NaN 4
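
Row slices behave differently depending on whether they use positions or labels. The sketch below builds an equivalent DataFrame directly (an assumption made to keep the example short) and compares the two forms.

```python
import pandas as pd

df = pd.DataFrame({'one': [1.0, 2.0, 3.0, None], 'two': [1, 2, 3, 4]},
                  index=['a', 'b', 'c', 'd'])

by_position = df.iloc[2:4]   # rows at positions 2 and 3, i.e. labels 'c' and 'd'
by_label = df.loc['b':'c']   # label slice, both endpoints included

print(by_position)
print(by_label)
```

df[2:4] from the cell above selects the same rows as df.iloc[2:4]; spelling out iloc or loc avoids having to remember which rule applies.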

Addition of Rows

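Add new rows to a DataFrame with pd.concat, which stacks the rows of the
second DataFrame below the first. (DataFrame.append, used in older
tutorials, was deprecated in pandas 1.4 and removed in 2.0.)

In [62]:

df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])

df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])

# pd.concat replaces the removed df.append(df2); the original row labels are kept
df = pd.concat([df, df2])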
print (df)

a b
0 1 2
1 3 4
0 5 6
1 7 8
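
The output above contains the labels 0 and 1 twice, because each input keeps its own row labels. A minimal sketch of one way to avoid that, assuming unique labels are wanted, is to pass ignore_index=True:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])

stacked = pd.concat([df, df2])                        # keeps labels 0, 1, 0, 1
renumbered = pd.concat([df, df2], ignore_index=True)  # relabels the rows 0..3

# With unique labels, drop(0) removes exactly one row.
remaining = renumbered.drop(0)

print(stacked.index.tolist())
print(renumbered.index.tolist())
print(remaining)
```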

Deletion of Rows

Use the index label to drop rows from a DataFrame. If the label is
duplicated, all rows carrying it are dropped.
In [36]:

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])

df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])

# pd.concat replaces df.append(df2), which was removed in pandas 2.0
df = pd.concat([df, df2])

# Drop rows with label 0

df = df.drop(0)

print (df)

a b
1 3 4
1 7 8

3. Basic Functionality
In [64]:

#Create a Dictionary of series

d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("Our data series is:")
print (df)

Our data series is:

Age Name Rating
0 25 Tom 4.23
1 26 James 3.24
2 25 Ricky 3.98
3 23 Vin 2.56
4 30 Steve 3.20
5 29 Smith 4.60
6 23 Jack 3.80

T (Transpose)

Returns the transpose of the DataFrame. The rows and columns will
interchange.
In [65]:

# Create a Dictionary of series

d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

# Create a DataFrame
df = pd.DataFrame(d)
print ("The transpose of the data series is:")
print (df.T)

The transpose of the data series is:

0 1 2 3 4 5 6
Age 25 26 25 23 30 29 23
Name Tom James Ricky Vin Steve Smith Jack
Rating 4.23 3.24 3.98 2.56 3.2 4.6 3.8

axes

Returns the list of row axis labels and column axis labels.

In [66]:

#Create a Dictionary of series

d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("Row axis labels and column axis labels are:")
print (df.axes)

Row axis labels and column axis labels are:

[RangeIndex(start=0, stop=7, step=1), Index(['Age', 'Name', 'Rating'], dtype='object')]

dtypes
Returns the data type of each column.

In [43]:

#Create a Dictionary of series

d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("The data types of each column are:")
print (df.dtypes)

The data types of each column are:

Age int64
Name object
Rating float64
dtype: object

ndim

Returns the number of dimensions of the object. By definition, a
DataFrame is a 2D object.
In [44]:

#Create a Dictionary of series

d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("Our object is:")
print (df)
print ("The dimension of the object is:")
print (df.ndim)

Our object is:

Age Name Rating
0 25 Tom 4.23
1 26 James 3.24
2 25 Ricky 3.98
3 23 Vin 2.56
4 30 Steve 3.20
5 29 Smith 4.60
6 23 Jack 3.80
The dimension of the object is:
2

shape

Returns a tuple representing the dimensionality of the DataFrame:
(a, b), where a is the number of rows and b is the number of columns.
In [45]:

#Create a Dictionary of series

d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("Our object is:")
print (df)
print ("The shape of the object is:")
print (df.shape)

Our object is:

Age Name Rating
0 25 Tom 4.23
1 26 James 3.24
2 25 Ricky 3.98
3 23 Vin 2.56
4 30 Steve 3.20
5 29 Smith 4.60
6 23 Jack 3.80
The shape of the object is:
(7, 3)

size

Returns the number of elements in the DataFrame.

In [46]:

#Create a Dictionary of series

d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("Our object is:")
print (df)
print ("The total number of elements in our object is:")
print (df.size)

Our object is:

Age Name Rating
0 25 Tom 4.23
1 26 James 3.24
2 25 Ricky 3.98
3 23 Vin 2.56
4 30 Steve 3.20
5 29 Smith 4.60
6 23 Jack 3.80
The total number of elements in our object is:
21
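
The attributes ndim, shape and size are related: size is the product of the entries of shape, and ndim is the length of shape. A short sketch on a smaller, made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'James', 'Ricky'],
                   'Age': [25, 26, 25]})

rows, cols = df.shape
print(df.ndim, df.shape, df.size)   # size equals rows * cols
print(df.T.shape)                   # transposing swaps the two entries
```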

values

Returns the actual data in the DataFrame as a NumPy ndarray.

In [49]:

#Create a Dictionary of series

d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("Our object is:")
print (df)
print()
print ("The actual data in our data frame is:")
print (df.values)

Our object is:

Age Name Rating
0 25 Tom 4.23
1 26 James 3.24
2 25 Ricky 3.98
3 23 Vin 2.56
4 30 Steve 3.20
5 29 Smith 4.60
6 23 Jack 3.80

The actual data in our data frame is:

[[25 'Tom' 4.23]
[26 'James' 3.24]
[25 'Ricky' 3.98]
[23 'Vin' 2.56]
[30 'Steve' 3.2]
[29 'Smith' 4.6]
[23 'Jack' 3.8]]
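
The array above has dtype object because the columns have different types. As a hedged aside, the sketch below uses to_numpy(), which the pandas documentation recommends over .values, to show how the column selection decides the resulting dtype; the two-row DataFrame is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 26], 'Rating': [4.23, 3.24], 'Name': ['Tom', 'James']})

mixed = df.to_numpy()                        # mixed column types give dtype object
numeric = df[['Age', 'Rating']].to_numpy()   # int and float columns give a float array

print(mixed.dtype, mixed.shape)
print(numeric.dtype, numeric.shape)
```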

Head & Tail

To view a small sample of a DataFrame object, use the head() and tail()
methods. head() returns the first n rows (observe the index values). The
default number of elements to display is five, but you may pass a
custom number.
In [52]:

#Create a DataFrame
df = pd.DataFrame(d)
print ("Our data frame is:")
print (df)
print()
print ("The first two rows of the data frame is:")
print (df.head(2))

Our data frame is:

Age Name Rating
0 25 Tom 4.23
1 26 James 3.24
2 25 Ricky 3.98
3 23 Vin 2.56
4 30 Steve 3.20
5 29 Smith 4.60
6 23 Jack 3.80

The first two rows of the data frame is:

Age Name Rating
0 25 Tom 4.23
1 26 James 3.24

tail() returns the last n rows (observe the index values). The default
number of elements to display is five, but you may pass a custom
number.
In [53]:

#Create a Dictionary of series

d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("Our data frame is:")
print (df)
print()
print ("The last two rows of the data frame is:")
print (df.tail(2))

Our data frame is:

Age Name Rating
0 25 Tom 4.23
1 26 James 3.24
2 25 Ricky 3.98
3 23 Vin 2.56
4 30 Steve 3.20
5 29 Smith 4.60
6 23 Jack 3.80

The last two rows of the data frame is:

Age Name Rating
5 29 Smith 4.6
6 23 Jack 3.8

4. Descriptive Statistics
Descriptive statistics summarize the underlying distribution of data
values through measures such as the mean and the variance.

Basic Functions

Function   Description
count      Number of non-null observations
sum        Sum of values
mean       Mean of values
mad        Mean absolute deviation (removed in pandas 2.0)
median     Arithmetic median of values
min        Minimum
max        Maximum
mode       Mode
abs        Absolute value
prod       Product of values
std        Unbiased standard deviation
var        Unbiased variance
skew       Unbiased skewness (3rd moment)
kurt       Unbiased kurtosis (4th moment)
quantile   Sample quantile (value at %)
cumsum     Cumulative sum
cumprod    Cumulative product
cummax     Cumulative maximum
cummin     Cumulative minimum
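
Several of the functions in the table can be requested at once with agg(), which accepts a list of function names. This is a minimal sketch on made-up numbers, not the data set used in the following cells.

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 26, 25, 23],
                   'Rating': [4.0, 3.0, 5.0, 2.0]})

# One row per requested statistic, one column per DataFrame column.
summary = df.agg(['count', 'mean', 'median', 'min', 'max'])

# Cumulative functions return an object of the same length as the input.
running_total = df['Age'].cumsum()

print(summary)
print(running_total)
```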

4.1 sum()
Returns the sum of the values for the requested axis. By default, axis is
index (axis=0).
In [10]:

#Create a Dictionary of series

d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
'Lee','David','Gasper','Betina','Andres']),
'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,
4.10,3.65])}

#Create a DataFrame
df = pd.DataFrame(d)
print(df)

Age Name Rating

0 25 Tom 4.23
1 26 James 3.24
2 25 Ricky 3.98
3 23 Vin 2.56
4 30 Steve 3.20
5 29 Smith 4.60
6 23 Jack 3.80
7 34 Lee 3.78
8 40 David 2.98
9 30 Gasper 4.80
10 51 Betina 4.10
11 46 Andres 3.65

In [11]:

print (df.sum()) # axis = 0

Age 382
Name TomJamesRickyVinSteveSmithJackLeeDavidGasperBe...
Rating 44.92
dtype: object

Each column is summed separately (strings are concatenated).

In [9]:

print (df.sum(axis=1, numeric_only=True)) # axis = 1; numeric_only skips the 'Name' strings

0 29.23
1 29.24
2 28.98
3 25.56
4 33.20
5 33.60
6 26.80
7 37.78
8 42.98
9 34.80
10 55.10
11 49.65
dtype: float64

4.2 mean()
Returns the arithmetic mean of each numeric column.

In [13]:

print (df.mean(numeric_only=True)) # numeric_only skips the 'Name' strings

Age 31.833333
Rating 3.743333
dtype: float64

4.3 std()
Returns the Bessel-corrected (sample) standard deviation of the numerical columns.
In [14]:

print (df.std(numeric_only=True)) # numeric_only skips the 'Name' strings

Age 9.232682
Rating 0.661628
dtype: float64
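
Bessel's correction means the sum of squared deviations is divided by n - 1 rather than n. The sketch below recomputes the Age result by hand with NumPy and compares it with the population version (ddof=0); the variable names are illustrative.

```python
import numpy as np
import pandas as pd

ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46])

sample_std = ages.std()             # pandas default: ddof=1, divide by n - 1
population_std = ages.std(ddof=0)   # divide by n instead

# The same sample standard deviation written out step by step.
deviations = ages.to_numpy() - ages.mean()
manual_sample = np.sqrt((deviations ** 2).sum() / (len(ages) - 1))

print(sample_std, manual_sample, population_std)
```

Note that NumPy's np.std() defaults to ddof=0, so it matches population_std, not the pandas default.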

4.4 Summarizing Data

The describe() function computes a summary of statistics pertaining to
the DataFrame columns.

In [15]:

print (df.describe())

Age Rating
count 12.000000 12.000000
mean 31.833333 3.743333
std 9.232682 0.661628
min 23.000000 2.560000
25% 25.000000 3.230000
50% 29.500000 3.790000
75% 35.500000 4.132500
max 51.000000 4.800000

For numeric columns, describe() reports the count, mean, std, min and max
together with the 25%, 50% and 75% quantiles, from which the IQR (75%
minus 25%) can be derived. By default it excludes the character columns
and summarizes only the numeric columns. The 'include' argument controls
which columns are summarized. It takes a list of dtypes; by default,
numeric columns.

object − Summarizes String columns
number − Summarizes Numeric columns
all − Summarizes all columns together (pass it as the string 'all', not as a list)
In [16]:

print (df.describe(include=['object']))

Name
count 12
unique 12
top Steve
freq 1

In [17]:

print (df.describe(include='all'))

Age Name Rating

count 12.000000 12 12.000000
unique NaN 12 NaN
top NaN Steve NaN
freq NaN 1 NaN
mean 31.833333 NaN 3.743333
std 9.232682 NaN 0.661628
min 23.000000 NaN 2.560000
25% 25.000000 NaN 3.230000
50% 29.500000 NaN 3.790000
75% 35.500000 NaN 4.132500
max 51.000000 NaN 4.800000
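
describe() does not print the IQR itself, but it follows directly from the 25% and 75% rows. A small sketch for the Age column, with the values copied from the data set above:

```python
import pandas as pd

ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46])

summary = ages.describe()

# Interquartile range from the describe() output ...
iqr = summary['75%'] - summary['25%']

# ... and the same value computed directly from the quantiles.
same_iqr = ages.quantile(0.75) - ages.quantile(0.25)

print(iqr, same_iqr)
```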

5. Input/Output Tools
The pandas I/O API is a set of top-level reader functions, accessed like
pd.read_csv(), that generally return a pandas object.

read_csv
read_excel
read_hdf
read_sql
read_json
read_msgpack (experimental; removed in pandas 1.0)
read_html
read_gbq (experimental)
read_stata
read_clipboard
read_pickle

The corresponding writer functions are object methods that are
accessed like df.to_csv():

to_csv
to_excel
to_hdf
to_sql
to_json
to_msgpack (experimental)
to_html
to_gbq (experimental)
to_stata
to_clipboard
to_pickle
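
The reader/writer pairing can be tried without touching the file system. In this sketch an in-memory io.StringIO buffer stands in for a file, and the two-row DataFrame is made up for illustration.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'James'], 'Age': [25, 26]})

# Writer: a DataFrame method. With no path argument, to_csv() returns the CSV text.
csv_text = df.to_csv(index=False)

# Reader: a top-level function. StringIO stands in for a file on disk.
restored = pd.read_csv(io.StringIO(csv_text))

print(csv_text)
print(restored)
```

Passing index=False keeps the row labels out of the file; otherwise read_csv would read them back as an extra unnamed column.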

5.1 Loading the Weather Data from the CSV

In this example we load the weather data from the data directory
("data/weather_data.csv").

In [22]:

#! executes a shell command

!ls data

weather_data.csv
In [24]:

df = pd.read_csv("data/weather_data.csv")
print (df)

Day outlook temperature humidity windy play

0 1 sunny 85 85 False no
1 2 sunny 80 90 True no
2 3 overcast 83 86 False yes
3 4 rainy 70 96 False yes
4 5 rainy 68 80 False yes
5 6 rainy 65 70 True no
6 7 overcast 64 65 True yes

In [25]:

#use help to see the parameters

pd.read_csv?
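
A few of the parameters listed by the help output are used often enough to sketch here. The example inlines the first rows of the weather table and assumes the file is comma-separated with the header shown; the raw file is not reproduced in the slides, so that layout is an assumption.

```python
import io
import pandas as pd

# First rows of the weather table, inlined so the sketch runs
# without the data/weather_data.csv file (assumed comma-separated).
csv_text = """Day,outlook,temperature,humidity,windy,play
1,sunny,85,85,False,no
2,sunny,80,90,True,no
3,overcast,83,86,False,yes
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    index_col='Day',                                    # use the Day column as row labels
    usecols=['Day', 'outlook', 'temperature', 'play'],  # load only these columns
    dtype={'temperature': 'float64'},                   # override the inferred integer type
)

print(df)
print(df.dtypes)
```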
