Chapter 4 - Import-Export Data
Data between
CSV Files/MySQL and
Pandas
Chapter - 4
4.1 Introduction
●
DataFrames can store any type of data in 2D
tabular form.
●
Most data files that we use to store data, such as
spreadsheet files or database tables, also store the data in
2D tabular format.
●
Since a DataFrame holds data in a similar way, you
can transfer data from a dataframe to such data files, or from
files into dataframes.
●
In this chapter we will learn how to transfer data
among a .CSV file, a dataframe and a database table.
●
.CSV file (.CSV is a format that stores data in comma
separated form: Comma Separated Values)
4.2 Transferring Data between .csv Files and DataFrames
●
“The acronym CSV is short for Comma-Separated
Values. The CSV format refers to tabular data that has
been saved as plaintext where data is separated by
commas.”
For example: the data of a table will be stored in CSV
format as shown below.
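A minimal sketch of what CSV text looks like, assuming a small made-up table (the names and rows other than Sarah Kapur are hypothetical, not from the original slide). Calling to_csv( ) without a file path returns the CSV text as a string, so the comma-separated layout can be printed directly.

```python
import pandas as pd

# Hypothetical sample table used only for illustration
table = pd.DataFrame({
    "Rollno": [1, 2, 3],
    "First_Name": ["Sarah", "Ravi", "Meena"],
    "Last_Name": ["Kapur", "Sharma", "Iyer"],
})

# With no path given, to_csv() returns the CSV text instead of writing a file.
# index=False leaves the row labels (0, 1, 2) out of the text.
csv_text = table.to_csv(index=False)
print(csv_text)
```

The first line of the text holds the column names and every following line holds one row, with the values separated by commas.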
Advantages of CSV format
●
A simple, compact and ubiquitous format for data storage.
●
A common format for data interchange.
●
It can be opened in popular spreadsheet packages like MS-
Excel, Calc etc.
●
Nearly all spreadsheets and databases support
import/export to CSV format.
4.2.1 Loading Data From CSV to DataFrames
NOTE: read_csv( ) takes the first row of the CSV file as the column names for the dataframe.
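A minimal sketch of loading CSV data with read_csv( ), assuming made-up sample data. Here io.StringIO stands in for a file on disk so the example runs anywhere; with a real file you would pass its path instead.

```python
import io
import pandas as pd

# Hypothetical CSV content; the first line holds the column headings
csv_data = io.StringIO(
    "Rollno,Name,Marks\n"
    "1,Sarah,78\n"
    "2,Ravi,64\n"
)

# By default the first row is used as the column names
df = pd.read_csv(csv_data)
print(df)
print(list(df.columns))
```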
4.2.1B Reading CSV File and Specifying Own Column Names
We may have a CSV file that does not have a top row
containing column headers.
●
Now if you read such a file by just giving the filepath, it
will take the top row as the column headers. But the top
row (1,Sarah,Kapur) is data, not column headings.
●
In such a situation, we can specify our own column headings in
read_csv( ) using the names argument.
df2 = pd.read_csv("c:\\data\\mydata.csv", names=["Roll no",
"First_Name", "Last_Name"])
And now when you print df2, Python will show:
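A minimal sketch of the names argument, assuming a headerless file whose first row is 1,Sarah,Kapur (the second row is made up). io.StringIO stands in for the file path.

```python
import io
import pandas as pd

# Hypothetical headerless CSV content: every line is data
csv_data = io.StringIO(
    "1,Sarah,Kapur\n"
    "2,Ravi,Sharma\n"
)

# names supplies the column headings, so the first row stays as data
df2 = pd.read_csv(csv_data, names=["Roll no", "First_Name", "Last_Name"])
print(df2)
```

Because names is given, no row of the file is consumed as a header and both rows appear in the dataframe.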
If we do not want the first row to be used as the header and, at the
same time, do not want to specify column headings but would rather
go with the default column headings 0, 1, 2, 3, ...,
then simply give the argument header = None in read_csv( ).
df3 = pd.read_csv("c:\\data\\mydata.csv", header=None)
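A minimal sketch of header = None, again assuming made-up headerless data with io.StringIO in place of the file path.

```python
import io
import pandas as pd

# Hypothetical headerless CSV content
csv_data = io.StringIO(
    "1,Sarah,Kapur\n"
    "2,Ravi,Sharma\n"
)

# header=None tells read_csv that no row holds column headings,
# so the columns get the default integer labels
df3 = pd.read_csv(csv_data, header=None)
print(df3)
print(list(df3.columns))  # [0, 1, 2]
```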
Now comes a situation where the first row of the CSV file stores
some column headings but you don't want to use them.
For this situation, we need to give two arguments along with the file path:
one for column headings, i.e., names = <column headings sequence>,
and another to skip the existing heading row, skiprows = <n>.
df5 = pd.read_csv("c:\\data\\mydata.csv", names=["Rollno", "Name",
"Marks"], skiprows=1)
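A minimal sketch of replacing the headings stored in a file, assuming made-up data whose first row holds headings we do not want. io.StringIO stands in for the file path.

```python
import io
import pandas as pd

# Hypothetical CSV content whose first row holds unwanted headings
csv_data = io.StringIO(
    "Roll,Student,Score\n"
    "1,Sarah,78\n"
    "2,Ravi,64\n"
)

# skiprows=1 drops the original heading row; names supplies the new headings
df5 = pd.read_csv(csv_data, names=["Rollno", "Name", "Marks"], skiprows=1)
print(df5)
```

Without skiprows = 1, the old heading row would be read as an ordinary data row.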
4.2.1C Reading Specified number of Rows from CSV File
Giving the argument nrows = <n> in read_csv( ) will read only the
specified number of rows from the CSV file.
df6 = pd.read_csv("c:\\data\\mydata.csv", names=["Rollno",
"Name", "Surname"], nrows=3)
print(df6)
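A minimal sketch of nrows, assuming a made-up headerless file with five rows (only the first row comes from the text above). io.StringIO stands in for the file path.

```python
import io
import pandas as pd

# Hypothetical headerless CSV content with five data rows
csv_data = io.StringIO(
    "1,Sarah,Kapur\n"
    "2,Ravi,Sharma\n"
    "3,Meena,Iyer\n"
    "4,Arjun,Verma\n"
    "5,Nisha,Gupta\n"
)

# nrows=3 stops reading after the first three data rows
df6 = pd.read_csv(csv_data, names=["Rollno", "Name", "Surname"], nrows=3)
print(df6)
print(len(df6))  # 3
```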
Here, make sure that the SQL statement given inside the read_sql( )
function:
( i ) ends with a semicolon (the convention followed here; the connector generally accepts the statement without it as well) and
( ii ) is enclosed in quotes.
Now the above full code will give output as shown below
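A minimal sketch of read_sql( ) that runs without a MySQL server. As an assumption made only to keep the example self-contained, Python's built-in sqlite3 module stands in for the MySQL connection, and the student table and its rows are made up. With MySQL, only the connection object would differ.

```python
import sqlite3
import pandas as pd

# Assumption: an in-memory SQLite database stands in for the MySQL database
conn = sqlite3.connect(":memory:")
conn.execute("create table student (rollno integer, name text, marks integer)")
conn.executemany(
    "insert into student values (?, ?, ?)",
    [(1, "Sarah", 78), (2, "Ravi", 64)],
)
conn.commit()

# The SQL statement is a quoted string ending with a semicolon
student_df = pd.read_sql("select * from student;", conn)
print(student_df)

conn.close()
```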
4.3.2 Framing Flexible SQL Queries with User Data
●
Sometimes, you may need to run queries which are based on some
parameters or values that are provided from outside.
●
Such queries are called parameterised queries.
●
To execute parameterised queries in a mysql.connector connection,
you need to form an SQL query string that includes the values of the parameters.
String Templates with % formatting
●
In this style, string formatting uses this general form: f % v
f - is a template string
v - is the value or values to be formatted.
●
If multiple values are to be formatted, v must be a tuple. For this, you
can write the SQL query in a string but use a %s code in place of the
value to be provided as a parameter.
e.g. "select * from student where marks > %s"
The above string is incomplete; to complete it, you must
provide a tuple of values prefixed with %.
e.g. If you want to provide the value 70 for the %s placeholder, then the
query will be:
"select * from student where marks > %s" % (70,)
Here the string is the template f and the tuple (70,) is the value v.
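A minimal sketch of building a query string with % formatting. The second template, with two placeholders, is a made-up illustration of passing several values in one tuple.

```python
# Template f with one %s placeholder
template = "select * from student where marks > %s"

# v is a one-element tuple; note the trailing comma
query = template % (70,)
print(query)  # select * from student where marks > 70

# Hypothetical template with two placeholders; string values need
# their own quotes inside the SQL text
name_template = "select * from student where name = '%s' and marks > %s"
query2 = name_template % ("Sarah", 70)
print(query2)  # select * from student where name = 'Sarah' and marks > 70
```

Because each value is pasted into the SQL text as-is, this approach suits trusted input only.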
Now you can store this query string in a variable and then execute
that variable through the read_sql( ) function, e.g.
1. sname = input("Which student's record do you want to see? Enter Name : ")
import pandas as pd
import pymysql
from sqlalchemy import create_engine  # only create_engine is needed