09_Pandas slides


Content

What is Pandas?
Basic Data Structures: Series and Data Frame
Basic Functions
Input/Output Tools

1. What is Pandas
pandas is a Python package providing fast, flexible, and expressive data
structures designed to make working with “relational” or “labeled” data
both easy and intuitive. It aims to be the fundamental high-level building
block for doing practical, real world data analysis in Python
(http://pandas.pydata.org/)

Pandas builds on top of NumPy to ease managing heterogeneous data sets.

1.1 Data Handled by Pandas


Pandas is well suited for many different kinds of data:

Tabular data with heterogeneously-typed columns (comparable
to Excel, R or relational databases)
Time series data
Matrix data (homogeneously typed or heterogeneous) with row
and column labels
Any other form of observational / statistical data sets.

1.2 Feature Overview


Easy handling of missing data (represented as NaN)
Size mutability: columns can be inserted and deleted from
DataFrame and higher dimensional objects
Automatic and explicit data alignment: objects can be explicitly
aligned to a set of labels, or the user can simply ignore the
labels and let Series, DataFrame, etc. automatically align the
data for you in computations
Powerful, flexible group by functionality to perform
split-apply-combine operations on data sets, for both aggregating and
transforming data
Make it easy to convert ragged, differently-indexed data in
other Python and NumPy data structures into DataFrame
objects
Intelligent label-based slicing, fancy indexing, and subsetting of
large data sets
Intuitive merging and joining data sets
Flexible reshaping and pivoting of data sets
Hierarchical labeling of axes (possible to have multiple labels
per tick)
Robust IO tools for loading and storing data
Time series-specific functionality
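
The split-apply-combine feature listed above can be sketched as follows. This is a minimal illustration, not part of the original slides: the column names and values are made up, and it only shows the difference between aggregating (one result per group) and transforming (one result per original row).

```python
import pandas as pd

# Hypothetical sample data: two teams with a few scores each.
df = pd.DataFrame({
    "team": ["red", "blue", "red", "blue", "red"],
    "score": [10, 20, 30, 60, 50],
})

# Aggregating: split by team, apply mean, combine into one value per group.
team_means = df.groupby("team")["score"].mean()

# Transforming: the group mean is broadcast back to every original row,
# so each score can be centered on the mean of its own team.
df["centered"] = df["score"] - df.groupby("team")["score"].transform("mean")

print(team_means)
print(df)
```

The aggregated result is indexed by group label, while the transformed result keeps the index of the original DataFrame.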

2. Pandas Data Structures


Pandas is built around two data structures:

Series represents one-dimensional labeled data; its values are
stored in a NumPy ndarray (or a pandas extension array)
DataFrame represents two-dimensional data as a collection of
Series (the columns) that share a row index

For all data structures, labels/indices can be defined per row and
column.

Data alignment is intrinsic, i.e. the link between labels and data will not
be broken.
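
A small sketch of what intrinsic alignment means in practice, assuming two made-up Series whose labels overlap only partly and appear in a different order. Arithmetic pairs values by label, not by position, and labels present on only one side yield NaN.

```python
import pandas as pd

# Two hypothetical Series with partly overlapping, differently ordered labels.
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["z", "x", "w"])

# Values are matched by label: x -> 1 + 20, z -> 3 + 10.
# 'y' and 'w' exist on one side only, so their result is NaN.
total = a + b
print(total)
```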
Series:

Homogeneous data
Size Immutable
Values of Data Mutable

Data Frames:

Heterogeneous data
Size Mutable
Data Mutable

2.1. Series
Series is a one-dimensional labeled array capable of holding any data
type (integers, strings, floating point numbers, Python objects, etc.). The
axis labels are collectively referred to as the index. The basic method to
create a Series is to call:
pd.Series(data, index=index)

data may be a dict, a numpy.ndarray or a scalar value.

A series can be created using various inputs like −

Array
Dict
Scalar value or constant

2.1.1 Creating a series from ndarray


In [4]:

#import the pandas library and aliasing as pd


import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data)
print (s)

0 a
1 b
2 c
3 d
dtype: object

In [54]:

data = np.array(['a','b','c','d'])
s = pd.Series(data,index=[100,101,102,103])
print (s)

100 a
101 b
102 c
103 d
dtype: object

2.1.2 Creating a Series from dict

A dict can be passed as input. If no index is specified, the dictionary
keys are used to construct the index; recent pandas versions keep the
keys in insertion order, while older versions sorted them. If index is
passed, the values in data corresponding to the labels in the index will
be pulled out.
In [55]:

data = {'a' : 0., 'b' : 1., 'c' : 2.}


s = pd.Series(data)
print (s)

a 0.0
b 1.0
c 2.0
dtype: float64

Dictionary keys are used to construct index.

In [56]:

data = {'a' : 0., 'b' : 1., 'c' : 2.}


s = pd.Series(data,index=['b','c','d','a'])
print (s)

b 1.0
c 2.0
d NaN
a 0.0
dtype: float64

Index order is persisted and the missing element is filled with NaN (Not
a Number).

2.1.3 Creating a Series from Scalar


If data is a scalar value, an index must be provided. The value will be
repeated to match the length of index
In [57]:

s = pd.Series(5, index=[0, 1, 2, 3])


print (s)

0 5
1 5
2 5
3 5
dtype: int64

In [9]:

#show the index


s.index

Out[9]:

Int64Index([0, 1, 2, 3], dtype='int64')

In [10]:

#show the value


s.values

Out[10]:

array([5, 5, 5, 5])

2.1.4 Series Indexing


Elements of a Series can be accessed either by integer position or by
index label.
In [12]:

s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

#retrieve a single element


s['a']

Out[12]:

1

In [13]:

s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

#retrieve multiple elements


s[['a','c','d']]

Out[13]:

a 1
c 3
d 4
dtype: int64
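
When the index itself consists of integers, as in the earlier Series labeled 100 to 103, label and position can be confused. The sketch below is an illustration added here (not from the original slides) that uses the explicit .loc and .iloc accessors to keep the two apart.

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "d"], index=[100, 101, 102, 103])

first_by_label = s.loc[100]     # label-based: the row labeled 100
first_by_position = s.iloc[0]   # position-based: the first row

# There is no label 0 in this index, so a label lookup for 0 fails.
try:
    s.loc[0]
    label_zero_exists = True
except KeyError:
    label_zero_exists = False

print(first_by_label, first_by_position, label_zero_exists)
```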

In [10]:

# Series.get_value() was removed in pandas 1.0; .at gives label-based scalar access
s.at['a']

Out[10]:

1

2.2. DataFrame: a Series of Series


The pandas DataFrame is a 2 dimensional labeled data structure with
columns of potentially different types. It is similar to:

a spreadsheet
a relational database table
a dictionary of Series

Creating DataFrames

A pandas DataFrame can be created using various inputs like −

Lists
Dict
Series
Numpy ndarrays
Another DataFrame

2.2.1 Create a DataFrame from Lists


In [14]:

data = [1,2,3,4,5]
df = pd.DataFrame(data)
print (df)

0
0 1
1 2
2 3
3 4
4 5

In [15]:

data = [['Ramesh',10],['Himesh',12],['Kamesh',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
print (df)

Name Age
0 Ramesh 10
1 Himesh 12
2 Kamesh 13
In [16]:

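data = [['Ramesh',10],['Himesh',12],['Kamesh',13]]
# dtype=float would also be applied to the 'Name' strings; recent pandas
# versions raise an error for that, so only the 'Age' column is converted
df = pd.DataFrame(data,columns=['Name','Age']).astype({'Age': float})
print (df)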

Name Age
0 Ramesh 10.0
1 Himesh 12.0
2 Kamesh 13.0

2.2.2 Create a DataFrame from Dict of ndarrays / Lists

All the ndarrays must be of the same length. If index is passed, then the
length of the index must equal the length of the arrays.

If no index is passed, then by default, index will be range(n), where n is
the array length.

In [19]:

data = {'Name':['Ramesh', 'Rajesh', 'Nitesh', 'Nilesh'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
print (df)

Age Name
0 28 Ramesh
1 34 Rajesh
2 29 Nitesh
3 42 Nilesh
In [20]:

data = {'Name':['Ramesh', 'Rajesh', 'Nitesh', 'Nilesh'],'Age':[28,34,29,42]}
df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])
print (df)

Age Name
rank1 28 Ramesh
rank2 34 Rajesh
rank3 29 Nitesh
rank4 42 Nilesh

2.2.3 Create a DataFrame from List of Dicts


List of Dictionaries can be passed as input data to create a DataFrame.
The dictionary keys are by default taken as column names.

In [21]:

data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]


df = pd.DataFrame(data)
print (df)

a b c
0 1 2 NaN
1 5 10 20.0

In [23]:

data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]


df = pd.DataFrame(data, index=['first', 'second']) # passing row indices
print (df)

a b c
first 1 2 NaN
second 5 10 20.0
In [25]:

data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]

#With two column indices, values same as dictionary keys


df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])

#With two column indices with one index with other name
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
print (df1)
print (df2)

a b
first 1 2
second 5 10
a b1
first 1 NaN
second 5 NaN

2.2.4 Create a DataFrame from Dict of Series


Dictionary of Series can be passed to form a DataFrame. The resultant
index is the union of all the series indexes passed.

In [58]:

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),


'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print (df)

one two
a 1.0 1
b 2.0 2
c 3.0 3
d NaN 4

2.2.5 Column selection, addition, deletion


In [59]:

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),


'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print (df ['one'])

a 1.0
b 2.0
c 3.0
d NaN
Name: one, dtype: float64
In [60]:

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),


'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)

# Adding a new column to an existing DataFrame object with column label
# by passing new series

print ("Adding a new column by passing as Series:")


df['three']=pd.Series([10,20,30],index=['a','b','c'])
print (df)

print()
print ("Adding a new column using the existing columns in DataFrame:")
df['four']=df['one']+df['three']

print (df)

Adding a new column by passing as Series:


one two three
a 1.0 1 10.0
b 2.0 2 20.0
c 3.0 3 30.0
d NaN 4 NaN

Adding a new column using the existing columns in DataFrame:

one two three four


a 1.0 1 10.0 11.0
b 2.0 2 20.0 22.0
c 3.0 3 30.0 33.0
d NaN 4 NaN NaN
In [31]:

# Using the previous DataFrame, we will delete a column


# using del function

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),


'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
'three' : pd.Series([10,20,30], index=['a','b','c'])}

df = pd.DataFrame(d)
print ("Our dataframe is:")
print (df)

print()
# using del function
print ("Deleting the first column using DEL function:")
del df['one']
print (df)

# using pop function


print ("Deleting another column using POP function:")
df.pop('two')
print (df)

Our dataframe is:


one three two
a 1.0 10.0 1
b 2.0 20.0 2

c 3.0 30.0 3
d NaN NaN 4

Deleting the first column using DEL function:


three two
a 10.0 1
b 20.0 2
c 30.0 3
d NaN 4
Deleting another column using POP function:
three
a 10.0
b 20.0
c 30.0
d NaN
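
Both del and pop modify the DataFrame in place. As a hedged alternative sketch (not part of the original slides), assign() and drop(columns=...) return new objects and leave the original untouched; the column name "total" is made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [1, 2, 3]},
                  index=["a", "b", "c"])

# assign() returns a copy with the extra column; df itself is unchanged.
with_sum = df.assign(total=df["one"] + df["two"])

# drop(columns=...) also returns a new DataFrame instead of mutating.
without_one = with_sum.drop(columns=["one"])

print(df.columns.tolist())
print(without_one)
```

This style is convenient when several steps are chained, because no intermediate step alters its input.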

2.2.6 Row Selection, Addition, and Deletion

Selection by Label

Rows can be selected by passing a row label to the loc indexer.

In [61]:

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),


'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print (df.loc['b'])

one 2.0
two 2.0
Name: b, dtype: float64

Selection by integer location

Rows can be selected by passing an integer position to the iloc indexer.

In [63]:

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),


'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print (df.iloc[2])

one 3.0
two 3.0
Name: c, dtype: float64

Slice Rows

Multiple rows can be selected by position using the ‘:’ slice operator; the
end position is excluded.


In [34]:

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),


'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print (df[2:4])

one two
c 3.0 3
d NaN 4
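
Slices behave differently depending on the accessor. The following sketch, added for illustration with made-up data, contrasts a positional slice (end excluded) with a label slice (end included).

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3, 4]}, index=["a", "b", "c", "d"])

# Positional slice: positions 1 and 2, the end position 3 is excluded.
by_position = df.iloc[1:3]

# Label slice: both endpoints 'b' and 'd' are included.
by_label = df.loc["b":"d"]

print(by_position)
print(by_label)
```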

Addition of Rows

Add new rows to a DataFrame using the pd.concat function (DataFrame.append
was removed in pandas 2.0). The new rows are added at the end.

In [62]:

df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])

df = pd.concat([df, df2])
print (df)

a b
0 1 2
1 3 4
0 5 6
1 7 8
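
The combined frame above keeps the original row labels, so 0 and 1 each appear twice. If unique labels are wanted, one option is to renumber the rows while combining. This is a minimal sketch using pd.concat with ignore_index=True.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=["a", "b"])

# ignore_index=True discards the old row labels and assigns 0..n-1.
combined = pd.concat([df, df2], ignore_index=True)
print(combined)
```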

Deletion of Rows

Use index label to delete or drop rows from a DataFrame. If label is
duplicated, then multiple rows will be dropped.
In [36]:

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])


df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])

df = pd.concat([df, df2])  # DataFrame.append was removed in pandas 2.0

# Drop rows with label 0


df = df.drop(0)

print (df)

a b
1 3 4
1 7 8

3. Basic Functionality
In [64]:

#Create a Dictionary of series


d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("Our data series is:")
print (df)

Our data series is:


Age Name Rating
0 25 Tom 4.23
1 26 James 3.24
2 25 Ricky 3.98
3 23 Vin 2.56
4 30 Steve 3.20
5 29 Smith 4.60
6 23 Jack 3.80

T (Transpose)

Returns the transpose of the DataFrame. The rows and columns will
interchange.
In [65]:

# Create a Dictionary of series


d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

# Create a DataFrame
df = pd.DataFrame(d)
print ("The transpose of the data series is:")
print (df.T)

The transpose of the data series is:


0 1 2 3 4 5 6
Age 25 26 25 23 30 29 23
Name Tom James Ricky Vin Steve Smith Jack
Rating 4.23 3.24 3.98 2.56 3.2 4.6 3.8

axes

Returns the list of row axis labels and column axis labels.

In [66]:

#Create a Dictionary of series


d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("Row axis labels and column axis labels are:")
print (df.axes)

Row axis labels and column axis labels are:


[RangeIndex(start=0, stop=7, step=1), Index(['Age', 'Name', 'Rating'], dtype='object')]

dtypes
Returns the data type of each column.

In [43]:

#Create a Dictionary of series


d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("The data types of each column are:")
print (df.dtypes)

The data types of each column are:


Age int64
Name object
Rating float64
dtype: object

ndim

Returns the number of dimensions of the object. By definition,
DataFrame is a 2D object.
In [44]:

#Create a Dictionary of series


d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("Our object is:")
print (df)
print ("The dimension of the object is:")
print (df.ndim)

Our object is:


Age Name Rating
0 25 Tom 4.23
1 26 James 3.24
2 25 Ricky 3.98
3 23 Vin 2.56
4 30 Steve 3.20
5 29 Smith 4.60
6 23 Jack 3.80
The dimension of the object is:
2

shape

Returns a tuple representing the dimensionality of the DataFrame. Tuple
(a,b), where a represents the number of rows and b represents the
number of columns.
In [45]:

#Create a Dictionary of series


d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("Our object is:")
print (df)
print ("The shape of the object is:")
print (df.shape)

Our object is:


Age Name Rating
0 25 Tom 4.23
1 26 James 3.24
2 25 Ricky 3.98
3 23 Vin 2.56
4 30 Steve 3.20
5 29 Smith 4.60
6 23 Jack 3.80
The shape of the object is:
(7, 3)

size

Returns the number of elements in the DataFrame.


In [46]:

#Create a Dictionary of series


d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("Our object is:")
print (df)
print ("The total number of elements in our object is:")
print (df.size)

Our object is:


Age Name Rating
0 25 Tom 4.23
1 26 James 3.24
2 25 Ricky 3.98
3 23 Vin 2.56
4 30 Steve 3.20
5 29 Smith 4.60
6 23 Jack 3.80
The total number of elements in our object is:
21

values

Returns the actual data in the DataFrame as a NumPy ndarray.


In [49]:

#Create a Dictionary of series


d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("Our object is:")
print (df)
print()
print ("The actual data in our data frame is:")
print (df.values)

Our object is:


Age Name Rating
0 25 Tom 4.23
1 26 James 3.24
2 25 Ricky 3.98
3 23 Vin 2.56
4 30 Steve 3.20
5 29 Smith 4.60
6 23 Jack 3.80

The actual data in our data frame is:


[[25 'Tom' 4.23]
[26 'James' 3.24]
[25 'Ricky' 3.98]
[23 'Vin' 2.56]
[30 'Steve' 3.2]
[29 'Smith' 4.6]
[23 'Jack' 3.8]]
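
The array printed above mixes integers, strings and floats, so NumPy has to store it with the generic object dtype. The sketch below, added as an illustration with a shortened version of the data, shows that selecting only numeric columns gives a numeric array instead. It uses to_numpy(), which returns the same kind of ndarray as the values attribute.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [25, 26],
    "Name": ["Tom", "James"],
    "Rating": [4.23, 3.24],
})

# Mixed column types force a common dtype of object.
mixed = df.to_numpy()

# Integer and float columns share float64 as their common dtype.
numeric = df[["Age", "Rating"]].to_numpy()

print(mixed.dtype, numeric.dtype)
```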

Head & Tail

To view a small sample of a DataFrame object, use the head() and tail()
methods. head() returns the first n rows (observe the index values). The
default number of elements to display is five, but you may pass a
custom number.
In [52]:

#Create a DataFrame
df = pd.DataFrame(d)
print ("Our data frame is:")
print (df)
print()
print ("The first two rows of the data frame is:")
print (df.head(2))

Our data frame is:


Age Name Rating
0 25 Tom 4.23
1 26 James 3.24
2 25 Ricky 3.98
3 23 Vin 2.56
4 30 Steve 3.20
5 29 Smith 4.60
6 23 Jack 3.80

The first two rows of the data frame is:


Age Name Rating
0 25 Tom 4.23
1 26 James 3.24

tail() returns the last n rows (observe the index values). The default
number of elements to display is five, but you may pass a custom
number.
In [53]:

#Create a Dictionary of series


d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("Our data frame is:")
print (df)
print()
print ("The last two rows of the data frame is:")
print (df.tail(2))

Our data frame is:


Age Name Rating
0 25 Tom 4.23
1 26 James 3.24
2 25 Ricky 3.98
3 23 Vin 2.56
4 30 Steve 3.20
5 29 Smith 4.60
6 23 Jack 3.80

The last two rows of the data frame is:


Age Name Rating
5 29 Smith 4.6
6 23 Jack 3.8

4. Descriptive Statistics
Descriptive statistics summarize the underlying distribution of data
values through statistical values like mean, variance etc.

Basic Functions

Function    Description
count       Number of non-null observations
sum         Sum of values
mean        Mean of values
mad         Mean absolute deviation (removed in pandas 2.0)
median      Arithmetic median of values
min         Minimum
max         Maximum
mode        Mode
abs         Absolute value
prod        Product of values
std         Unbiased standard deviation
var         Unbiased variance
skew        Unbiased skewness (3rd moment)
kurt        Unbiased kurtosis (4th moment)
quantile    Sample quantile (value at %)
cumsum      Cumulative sum
cumprod     Cumulative product
cummax      Cumulative maximum
cummin      Cumulative minimum
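
A few of the functions from the table can be tried on a small Series. The values below are made up for this sketch; one entry is missing on purpose to show that the reductions skip NaN by default.

```python
import pandas as pd

# Hypothetical values with one missing entry.
values = pd.Series([1.0, 3.0, None, 2.0, 4.0])

n_valid = values.count()        # counts only non-null observations
total = values.sum()            # NaN is skipped by default
middle = values.median()
q75 = values.quantile(0.75)     # linear interpolation by default

# Cumulative sum over the non-missing values only.
running = values.dropna().cumsum()

print(n_valid, total, middle, q75)
print(running)
```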

4.1 sum()
Returns the sum of the values for the requested axis. By default, axis is
index (axis=0).
In [10]:

#Create a Dictionary of series


d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
'Lee','David','Gasper','Betina','Andres']),
'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])}

#Create a DataFrame
df = pd.DataFrame(d)
print(df)

Age Name Rating


0 25 Tom 4.23
1 26 James 3.24
2 25 Ricky 3.98
3 23 Vin 2.56
4 30 Steve 3.20
5 29 Smith 4.60
6 23 Jack 3.80
7 34 Lee 3.78
8 40 David 2.98
9 30 Gasper 4.80
10 51 Betina 4.10
11 46 Andres 3.65

In [11]:

print (df.sum()) # axis = 0

Age                                                     382
Name      TomJamesRickyVinSteveSmithJackLeeDavidGasperBe...
Rating                                                44.92
dtype: object

Each column is summed separately (strings are concatenated).


In [9]:

print (df.sum(axis=1, numeric_only=True)) # axis = 1; numeric_only skips the string column

0 29.23
1 29.24
2 28.98
3 25.56
4 33.20
5 33.60
6 26.80
7 37.78
8 42.98
9 34.80
10 55.10
11 49.65
dtype: float64

4.2 mean()
Returns the arithmetic mean of each numeric column.

In [13]:

print (df.mean(numeric_only=True)) # numeric_only skips the string column

Age 31.833333
Rating 3.743333
dtype: float64

4.3 std()
Returns the Bessel-corrected (sample) standard deviation of the numerical columns.
In [14]:

print (df.std(numeric_only=True)) # numeric_only skips the string column

Age 9.232682
Rating 0.661628
dtype: float64
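
The Bessel correction divides by n - 1 instead of n. The sketch below reuses the Age values from the example and compares the pandas default (ddof=1) with the population standard deviation (ddof=0), which is what NumPy's np.std computes by default.

```python
import numpy as np
import pandas as pd

ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46])

sample_std = ages.std()              # ddof=1: divide by n - 1
population_std = ages.std(ddof=0)    # ddof=0: divide by n
numpy_std = np.std(ages.to_numpy())  # NumPy defaults to ddof=0

print(sample_std, population_std, numpy_std)
```

The sample value matches the 9.232682 shown for Age above, and the population value is slightly smaller.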

4.4 Summarizing Data


The describe() function computes a summary of statistics pertaining to
the DataFrame columns.

In [15]:

print (df.describe())

Age Rating
count 12.000000 12.000000
mean 31.833333 3.743333
std 9.232682 0.661628
min 23.000000 2.560000
25% 25.000000 3.230000
50% 29.500000 3.790000
75% 35.500000 4.132500
max 51.000000 4.800000

This function reports the count, mean, standard deviation, minimum,
maximum and quartiles (from which the IQR can be derived). By default it
excludes the character columns and summarizes only the numeric columns.
The 'include' argument controls which columns are considered for
summarizing. It takes a list of values; by default, 'number'.

object − Summarizes string columns
number − Summarizes numeric columns
all − Summarizes all columns together (should not be passed as a
list value)
In [16]:

print (df.describe(include=['object']))

Name
count 12
unique 12
top Steve
freq 1

In [17]:

print (df.describe(include='all'))

Age Name Rating


count 12.000000 12 12.000000
unique NaN 12 NaN
top NaN Steve NaN
freq NaN 1 NaN
mean 31.833333 NaN 3.743333
std 9.232682 NaN 0.661628
min 23.000000 NaN 2.560000
25% 25.000000 NaN 3.230000
50% 29.500000 NaN 3.790000
75% 35.500000 NaN 4.132500
max 51.000000 NaN 4.800000
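
The summary is itself a pandas object, so individual statistics can be read from it by label. As an illustrative sketch based on the Age values from the example, the interquartile range follows from the 25% and 75% rows, and the percentiles argument requests other cut points.

```python
import pandas as pd

ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46])

summary = ages.describe()
iqr = summary["75%"] - summary["25%"]   # interquartile range

# Ask for the 10th and 90th percentiles; the median is always included.
custom = ages.describe(percentiles=[0.1, 0.9])

print(iqr)
print(custom)
```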

5. Input/Output Tools
The pandas I/O API is a set of top-level reader functions, accessed like
pd.read_csv(), that generally return a pandas object.

read_csv
read_excel
read_hdf
read_sql
read_json
read_msgpack (experimental; removed in pandas 1.0)
read_html
read_gbq (experimental)
read_stata
read_clipboard
read_pickle

The corresponding writer functions are object methods that are
accessed like df.to_csv():

to_csv
to_excel
to_hdf
to_sql
to_json
to_msgpack (experimental; removed in pandas 1.0)
to_html
to_gbq (experimental)
to_stata
to_clipboard
to_pickle
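
Readers and writers come in pairs, which a round trip makes visible. The sketch below is an illustration that writes a small made-up DataFrame with to_csv and reads it back with read_csv; an in-memory io.StringIO buffer stands in for a file so that nothing is written to disk.

```python
import io

import pandas as pd

df = pd.DataFrame({"Name": ["Tom", "James"], "Age": [25, 26]})

# Write to an in-memory text buffer instead of a file path.
buffer = io.StringIO()
df.to_csv(buffer, index=False)   # index=False omits the row labels

# Rewind the buffer and read the CSV text back into a DataFrame.
buffer.seek(0)
restored = pd.read_csv(buffer)

print(restored)
```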

5.1 Loading the Weather Data from the CSV


In this example we load the weather data from the data directory
("data/weather_data.csv").

In [22]:

#! executes a shell command


!ls data

weather_data.csv
In [24]:

df = pd.read_csv("data/weather_data.csv")
print (df)

Day outlook temperature humidity windy play


0 1 sunny 85 85 False no
1 2 sunny 80 90 True no
2 3 overcast 83 86 False yes
3 4 rainy 70 96 False yes
4 5 rainy 68 80 False yes
5 6 rainy 65 70 True no
6 7 overcast 64 65 True yes

In [25]:

#use help to see the parameters


pd.read_csv?
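
A few commonly used read_csv parameters can be sketched on inline text. The CSV content below is an assumption: it mirrors the first rows of the table printed above and supposes a comma delimiter, since the real file is not shown. index_col, usecols and dtype are standard read_csv parameters.

```python
import io

import pandas as pd

# Assumed CSV text, modeled on the first rows of the weather table.
csv_text = """Day,outlook,temperature,humidity,windy,play
1,sunny,85,85,False,no
2,sunny,80,90,True,no
3,overcast,83,86,False,yes
"""

weather = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["Day", "outlook", "temperature"],  # load only these columns
    index_col="Day",                            # use 'Day' as the row index
    dtype={"temperature": float},               # parse temperature as float
)

print(weather)
```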
