
Chapter 4
Importing/Exporting Data between CSV Files/MySQL and Pandas
4.1 Introduction

DataFrames are capable of storing any type of data in 2D
tabular form.

Most files that we use to store data, such as spreadsheet
files or database tables, also store the data in 2D tabular
formats.

Since a DataFrame holds data in a similar way, you can
transfer data from a dataframe to such data files or from
such files into a dataframe.

In this chapter we will learn how to transfer data among
.CSV files, dataframes and database tables.

(.CSV, short for Comma Separated Values, is a format that
stores data in comma-separated form.)
4.2 Transferring Data between .csv Files and DataFrames


"The acronym CSV is short for Comma-Separated Values.
The CSV format refers to tabular data that has been saved
as plain text where the data is separated by commas."

For example, the data of a table will be stored in CSV
format as shown below.
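(The table itself appeared as an image in the original slides. As a
hypothetical illustration, a table with columns Rollno, First_Name and
Last_Name would be stored in a .csv file as plain text like this:

Rollno,First_Name,Last_Name
1,Sarah,Kapur
2,Aman,Verma
3,Ravi,Singh

The names after the first row are illustrative, not from the source.)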
Advantages of CSV format

A simple, compact and ubiquitous format for data storage.

A common format for data interchange.

It can be opened in popular spreadsheet packages like
MS-Excel, Calc etc.

Nearly all spreadsheets and databases support
import/export to CSV format.
4.2.1 Loading Data From CSV to DataFrames

Python's Pandas library offers two functions:

read_csv( ) - This function helps you bring data from a
CSV file into a dataframe.

to_csv( ) - This function helps you write a dataframe's
data to a CSV file.

We can create a CSV file by saving the data of an MS-Excel
file in CSV format using the Save As command from the File
tab/menu and selecting Save As Type as CSV Format.
4.2.1A Reading From a CSV File to Dataframe
We can use the read_csv( ) function to read data from a
CSV file into a dataframe as per the following syntax:

<DF> = pandas.read_csv( <filepath> )

e.g.
import pandas as pd
df = pd.read_csv("c:\\data\\sample.csv")
print(df)

NOTE: read_csv( ) has taken the first row of the CSV file as the
column names for the dataframe.
4.2.1B Reading CSV File and Specifying Own Column Names
We may have a CSV file that does not have a top row
containing column headers.

Now if you read such a file by just giving the filepath, it
will take the top row as the column headers. But the top
row (1,Sarah,Kapur) is data, not column headings.

In such a situation, we can specify our own column headings
in read_csv( ) using the names argument.

df2 = pd.read_csv("c:\\data\\mydata.csv",
      names=["Rollno", "First_Name", "Last_Name"])

And now when you print df2, Python will show the data under
these column headings.

If we want the first row not to be used as the header, and at the
same time we don't want to specify column headings but rather go
with the default column headings which go like 0, 1, 2, 3...,
then simply give the argument header = None in read_csv( ):

df3 = pd.read_csv("c:\\data\\mydata.csv", header = None)
Now comes a situation where the first row of the CSV file stores
some column headings but you don't want to use them.

For this situation, we need to give two arguments along with the
file path: one for column headings, i.e. names = <column headings
sequence>, and another, skiprows = <n>.

df5 = pd.read_csv("c:\\data\\mydata.csv",
      names=["Rollno", "Name", "Marks"], skiprows = 1)
4.2.1C Reading Specified number of Rows from CSV File
Giving the argument nrows = <n> in read_csv( ) will read the
specified number of rows from the CSV file.

df6 = pd.read_csv("c:\\data\\mydata.csv",
      names=["Rollno", "Name", "Surname"], nrows=3)
print(df6)

Using the nrows argument you can extract top rows, and by
combining it with skiprows or the head( ) and tail( ) functions
you can extract bottom and middle rows too, as sketched below.
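Here is a minimal sketch of this idea, assuming a hypothetical
mydata.csv with at least six data rows:

import pandas as pd

cols = ["Rollno", "Name", "Surname"]

# First 2 data rows of the file
top = pd.read_csv("c:\\data\\mydata.csv", names=cols, nrows=2)

# Skip the first 2 rows, then read the next 2 (the "middle" rows)
middle = pd.read_csv("c:\\data\\mydata.csv", names=cols,
                     skiprows=2, nrows=2)

# Read everything, then keep only the last 2 rows with tail( )
bottom = pd.read_csv("c:\\data\\mydata.csv", names=cols).tail(2)

print(top, middle, bottom, sep="\n")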
4.2.1D Reading from CSV files having Separator Different from Comma
Some CSV files are created so that their separator character is
different from a comma, such as a semicolon (;) or a pipe
symbol (|).

To read data from such CSV files, you need to specify an additional
argument, sep = <separator character>.

If you skip this argument then the default separator character
(comma) is assumed. An example follows below.
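For instance, if a hypothetical file mydata2.csv used the pipe
symbol as its separator (lines like 1|Sarah|Kapur), it could be
read like this:

import pandas as pd

# Read a pipe-separated file by passing sep="|"
df4 = pd.read_csv("c:\\data\\mydata2.csv", sep="|",
                  names=["Rollno", "Name", "Surname"])
print(df4)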
4.2.2 Storing DataFrame's Data to CSV File
Sometimes we have data available in a dataframe and we want to
save that data in a CSV file. For this purpose, Python Pandas
provides the to_csv( ) function, which saves the data of a
dataframe in a CSV file:

<DF>.to_csv( <filepath> )
or <DF>.to_csv( <filepath>, sep = <separator_character> )

The separator character must be a one-character string only.

When no separator is mentioned, to_csv( ) will take the default
separator, comma.

Also, if a file exists with the same name at the given location,
it will be overwritten.

Open the file and you will find the data of dataframe df7 in it.

Let us save the same dataframe's data in another file, namely
new2.csv, but with the '|' character as the separator:

df7.to_csv("c:\\data\\new2.csv", sep = "|")
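Putting this together, here is a minimal sketch; the contents of
df7 are hypothetical, since the original slides showed them only
as an image:

import pandas as pd

# Hypothetical data standing in for the df7 shown in the slides.
df7 = pd.DataFrame({"Rollno": [1, 2, 3],
                    "Name": ["Sarah", "Aman", "Ravi"],
                    "Surname": ["Kapur", "Verma", "Singh"]})

df7.to_csv("c:\\data\\new2.csv", sep="|")  # pipe-separated output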
4.2.2A Handling NaN Values with to_csv( )
Sometimes your dataframe has some missing values. If your
dataframe doesn't have missing values, then execute the following
statements on the dataframe to insert some (np.nan comes from the
numpy library, imported as np):

import numpy as np
df7.loc[3, "Name"] = np.nan
df7.loc[0, "Surname"] = np.nan

Now, if you store this dataframe in a CSV file by giving the
following command:

df7.to_csv("c:\\data\\new3.csv", sep = "|")

then, by default, the missing/NaN values are stored as empty
strings in the CSV file.

You can specify your own string to be written for missing/NaN
values by giving the argument na_rep = <string>.

The following statement will write NULL in place of NaN values in
the CSV file:

df7.to_csv("c:\\data\\new3.csv", sep = "|", na_rep = "NULL")
4.3 Transferring Data between DataFrames and MySQL
An SQL database is a relational database having data in tables
called relations.

It uses a special type of query language, Structured Query
Language (SQL), to query and manipulate data and to communicate
with the database.

There are many SQL databases available, such as MySQL, SQL
Server, SQLite etc.

We will learn to import/export data from a MySQL database to a
dataframe in a Python program and vice versa.

Installing the mysql connector or pymysql packages

In order to connect with MySQL from within a Python program, you
must have the mysql connector package (mysql-connector-python)
or pymysql installed.

For this, we can open the command shell and go to the Python
installation folder with the following command:

C:\WINDOWS\system32>cd <Python folder path>

And then type the following command at the prompt to install the
mysql-connector-python or pymysql package:

C:\<path>>pip install mysql-connector-python

C:\<path>>pip install pymysql

Once you have the mysql-connector-python or pymysql package
installed, you can import/export data from a MySQL database into
your Python program.

4.3.1 Bringing Data from MySQL Database into a DataFrame

There are five main steps that must be followed in order to
create a database connectivity application.

Step 1: Start Python and import the packages required for database
programming.

Step 2: Open a connection to the database.

Step 3: Execute the SQL command and fetch rows into a dataframe.

Step 4: Process as desired.

Step 5: Close the connection.
Step 1: Import Required Libraries
import pandas as pd
import mysql.connector as sqltor

Step 2: Open a Connection to MySQL Database


The connect( ) function establishes a connection to a MySQL
database:

<connection-object> = mysql.connector.connect( host = <host-name>,
      user = <username>, passwd = <password> [, database = <database>] )

e.g.

import mysql.connector as sqltor
mycon = sqltor.connect( host = "localhost", user = "root",
      passwd = "MyPass", database = "test" )

Here mycon is the connection object returned by connect( ); user
and passwd must be the login id and password of your MySQL
installation, and database must be the name of an existing
database.
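It is good practice to check that the connection succeeded before
going further; mysql.connector connection objects provide an
is_connected( ) method for this:

# Verify that the connection to MySQL was established.
if mycon.is_connected():
    print("Successfully connected to the database")
else:
    print("Connection failed")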
Step 3: Execute SQL command and fetch rows into a Dataframe
Once the connection to the SQL database is established, read data
from a table into a dataframe using the read_sql( ) function:

<DF> = pandas.read_sql( "<SQL statement>", <connection object> )

e.g.
df = pd.read_sql("SELECT * FROM Student;", mycon)

Here, make sure that the SQL statement given inside the read_sql( )
function:
(i) must end with a semicolon, and
(ii) should be enclosed in quotes.
The full code for all five steps is sketched below.
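A minimal end-to-end sketch of the five steps, assuming the
database test and table Student used in the examples above:

# Step 1: import the required packages
import pandas as pd
import mysql.connector as sqltor

# Step 2: open a connection to the database
mycon = sqltor.connect(host="localhost", user="root",
                       passwd="MyPass", database="test")

# Step 3: execute an SQL command and fetch rows into a dataframe
df = pd.read_sql("SELECT * FROM Student;", mycon)

# Step 4: process as desired
print(df)

# Step 5: close the connection
mycon.close()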
4.3.2 Framing Flexible SQL Queries with User Data

Sometimes you may need to run queries which are based on some
parameters or values that are provided from outside.

Such queries are called parameterised queries.

To execute parameterised queries in a mysql.connector connection,
you need to form an SQL query string that includes the values of
the parameters.

String Templates with % formatting

In this style, string formatting uses this general form: f % v
f - a template string
v - the value or values to be formatted

If multiple values are to be formatted, v must be a tuple. For
this, you can write the SQL query in a string but use a %s code
in place of each value to be provided as a parameter.

e.g. "select * from student where marks > %s"

The above string is an incomplete string; to complete it, you must
provide a tuple of values prefixed with %.

e.g. If you want to provide the value 70 for the %s placeholder,
then the query will be:

"select * from student where marks > %s" % (70,)

Here the quoted string is the template f and (70,) is the value v.
Now you can store this query string in a variable and then execute
that variable through the read_sql( ) function, e.g.

sname = input("Which student's record do you want to see? Enter name: ")

qrystr = "select * from student where name = '%s';" % (sname,)

df1 = pd.read_sql(qrystr, <connection>)
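A minimal runnable sketch of such a parameterised query, assuming
the connection details and student table from the earlier examples:

import pandas as pd
import mysql.connector as sqltor

mycon = sqltor.connect(host="localhost", user="root",
                       passwd="MyPass", database="test")

# Build the query string from user input using % formatting
sname = input("Which student's record do you want to see? Enter name: ")
qrystr = "select * from student where name = '%s';" % (sname,)

df1 = pd.read_sql(qrystr, mycon)
print(df1)
mycon.close()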


4.4 Exporting a DataFrame's Data as a Table in MySQL Database

For writing onto a MySQL database, we shall connect to the MySQL
database using the pymysql library together with the
create_engine( ) function of the sqlalchemy library. There are
some reasons behind this:

mysql.connector does not support writing onto a MySQL database
using to_sql( ).

For writing onto MySQL version 5.5 or higher, to_sql( ) requires a
connection with strong ORM support, which is created through
sqlalchemy's create_engine( ) function.

The pymysql library ensures that data moves smoothly from Python
to MySQL.

For exporting a dataframe onto a MySQL table, make sure to install
these two libraries using pip install <library name>, and follow
the steps given below:

i) Import the pandas, pymysql and sqlalchemy libraries.
ii) Establish a connection to the MySQL database.
iii) Write the dataframe's data onto a MySQL table.
Step 1: Import required libraries by issuing commands as:

import pandas as pd
import pymysql
from sqlalchemy import create_engine  # only create_engine is needed

Step 2: Establish connection to database using create_engine( )

<db engine> = create_engine("mysql+pymysql://<user>:<password>@localhost/<MySQL database>")
<connection name> = <db engine>.connect( )

e.g.
engine = create_engine("mysql+pymysql://root:MyPass@localhost/School")
con = engine.connect( )

Step 3: Write dataframe's data onto MySQL table using to_sql( )

In this step, you can now write a dataframe in the form of a table
by using to_sql( ):

<df>.to_sql( <tablename>, <connection> [, index = True]
      [, if_exists = "append" | "replace" | "fail"] )
Consider a dataframe tDf.

Let us create a table Topper1 in the MySQL database namely test
using the dataframe tDf, as sketched below.
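A minimal sketch of the export; the contents of tDf are
hypothetical, since the original slides showed them only as an
image:

import pandas as pd
from sqlalchemy import create_engine  # pymysql must also be installed

# Hypothetical data standing in for the tDf shown in the slides.
tDf = pd.DataFrame({"Rollno": [1, 2, 3],
                    "Name": ["Sarah", "Aman", "Ravi"],
                    "Marks": [98, 96, 95]})

engine = create_engine("mysql+pymysql://root:MyPass@localhost/test")
con = engine.connect()

# Create/replace table Topper1 from the dataframe; don't write the index.
tDf.to_sql("Topper1", con, if_exists="replace", index=False)
con.close()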
