0% found this document useful (0 votes)
6 views15 pages

Dataquest.io-tutorial Inserting Records and DataFrames Into a SQL Database

This tutorial provides a comprehensive guide on how to insert records and Pandas DataFrames into a SQL database using SQL commands and Python. It covers the basics of SQL insertion, connecting Python to a MySQL database using pymysql, and demonstrates how to insert data both row by row and using the to_sql() function from the SQLAlchemy library. Additionally, it emphasizes the importance of SQL skills for data scientists in managing relational databases.

Uploaded by

Wadia Alfred
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
6 views15 pages

Dataquest.io-tutorial Inserting Records and DataFrames Into a SQL Database

This tutorial provides a comprehensive guide on how to insert records and Pandas DataFrames into a SQL database using SQL commands and Python. It covers the basics of SQL insertion, connecting Python to a MySQL database using pymysql, and demonstrates how to insert data both row by row and using the to_sql() function from the SQLAlchemy library. Additionally, it emphasizes the importance of SQL skills for data scientists in managing relational databases.

Uploaded by

Wadia Alfred
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 15

Tutorial: Inserting Records and DataFrames Into a

SQL Database
dataquest.io/blog/sql-insert-tutorial

August 12, 2019

Learn by watching videos coding!

One of the key roles of a data scientist is to extract patterns and insights from raw data.
Since much of the world’s government and corporate data is organized in relational
databases, it makes sense that data scientists need to know how to work with these
database structures. Writing SQL queries to insert, extract, and filter data in databases
is a key skill for anyone interested in data analytics or data science.

SQL (Structured Query Language) is based on E. F. Codd’s Relational model and


algebra to manage the relational databases. It’s a database query language used to
create, insert, query, and manipulate the relational database and used by a large
number of applications.

Although it has been around for decades, learning SQL is still a critical skill for modern
data scientists because SQL is commonly used in all kinds of relational database
software, including MySQL, SQL Server, Oracle, and PostgreSQL.

In this tutorial, we’ll learn about SQL insertion operations in detail. Here is the list of
topics that we will learn in this tutorial:

SQL Insertion
Inserting records into a database
Inserting Pandas DataFrames into a database using the insert command
Inserting Pandas DataFrames into a database using the to_sql() command
Reading records from a database
Updating records in a database

Want to reach a higher level of SQL skill? Sign up for free and check out Dataquest’s
SQL courses for thorough, interactive lessons on all the SQL skills you’ll need for data
science.

Learn SQL by doing


Go hands-on with SQL right now, using our interactive learning platform.

1/15
Write real queries

Work with real data

Right in your browser!

SQL Insertion
SQL Insertion is an essential operation for data workers to understand. Inserting
missing data or adding new data is a major part of the data cleaning process on most
data science projects.

Insertion is also how most data gets into databases in the first place, so it’s important
anytime you’re collecting data, too. When your company gets new data on a customer,
for example, chances are than a SQL insert will be how that data gets into your existing
customer database.

In fact, whether or not you’re aware of it, data is flowing into databases using SQL
inserts all the time! When you fill out a marketing survey, complete a transaction, file a
government form online, or do any of thousands of other things, your data is likely
being inserted into a database somewhere using SQL.

Let’s dive into how we can actually use SQL to insert data into a database. We can insert
data row by row, or add multiple rows at a time.

Inserting records into a database

In SQL, we use the INSERT command to add records/rows into table data. This
command will not modify the actual structure of the table we’re inserting to, it just adds
data.

Let’s imagine we have a data table like the one below, which is being used to store some
information about a company’s employees.

2/15
Now, let’s imagine we have new employees we need to put into the system.

This employee table could be created using the CREATE TABLE command, so we
could use that command to create an entirely new table. But it would be very inefficient
to create a completely new table every time we want to add data! Instead, let’s use the
INSERT command to add the new data into our existing table.

Here’s the basic syntax for using INSERT in SQL:

We start with the command INSERT INTO followed by the name of table into which
we’d like to insert data. After the table name, we list the columns of new data we’re
inserting column by column, inside parentheses. Then, on the next line, we used the
command VALUES along with the values we want to insert (in sequence inside
parentheses.

So for our employee table, if we were adding a new employee named Kabir, our
INSERT command might look like this:

Inserting Records Into a Database From Python

Since we’re often working with our data in Python when doing data science, let’s insert
data from Python into a MySQL database. This is a common task that has a variety of
applications in data science.

We can send and receive data to a MySQL database by establishing a connection


between Python and MySQL. There are various ways to establish this connection; here,
we will use pymysql for database connectivity.
3/15
Here are the broad steps we’ll need to work through to get pymysql connected, insert
our data, and then extract the data from MySQL:

Let’s walk through this process step by step.

Step 1: Import the pymysql module.

# Import pymysql module


import pymysql

Step 2: Create connection a to the MySQL database

Create a connection using pymysql ‘s connect() function with the parameters host,
user, database name, and password.

(The parameters below are for demonstration purposes only; you’ll need to fill in the
specific access details required for the MySQL database you’re accessing.)

# Connect to the database


connection = pymysql.connect(host='localhost',
user='root',
password='12345',
db='employee')

Step 3: Create a cursor using the cursor() function.

This will allow us to execute the SQL query once we’ve written it.

cursor = connection.cursor()

Step 4: Execute the required SQL query

Commit the changes using the commit() function, and check the inserted records.
Note that we can create a variable called sql , assign our query’s syntax to it, and then
pass sql and the specific data we want to insert as arguments to cursor.execute() .

Then, we’ll commit these changes using commit() .

4/15
# Create a new record
sql = "INSERT INTO `employee` (`EmployeeID`, `Ename`, `DeptID`, `Salary`, `Dname`,
`Dlocation`) VALUES (%s, %s, %s, %s, %s, %s)"

# Execute the query


cursor.execute(sql, (1008,'Kabir',2,5000,'IT','New Delhi'))

# the connection is not autocommited by default. So we must commit to save our changes.
connection.commit()

Let’s do a quick check to see if the record we wanted to insert has actually been inserted.

We can do this by querying the database for the entire contents of employee , and then
fetching and printing those results.

# Create a new query that selects the entire contents of `employee`


sql = "SELECT * FROM `employee`"
cursor.execute(sql)

# Fetch all the records and use a for loop to print them one line at a time
result = cursor.fetchall()
for i in result:
print(i)

(1001, 'John', 2, 4000, 'IT', 'New Delhi')


(1002, 'Anna', 1, 3500, 'HR', 'Mumbai')
(1003, 'James', 1, 2500, 'HR', 'Mumbai')
(1004, 'David', 2, 5000, 'IT', 'New Delhi')
(1005, 'Mark', 2, 3000, 'IT', 'New Delhi')
(1006, 'Steve', 3, 4500, 'Finance', 'Mumbai')
(1007, 'Alice', 3, 3500, 'Finance', 'Mumbai')
(1008, 'Kabir', 2, 5000, 'IT', 'New Delhi')

It worked! Above, we can see the new record has been inserted and is now the final row
in our MySQL database.

Step 5: Close the database connection

Now that we’re done, we should close the database connection using close() method.

# Close the connection


connection.close()

Of course, it would be better to write this code in a way that could better handle
exceptions and errors. We can do this using try to contain the body of our code and
except to print errors if any arise. Then, we can use finally to close the connection
once we’re finished, regardless of whether try succeeded or failed.

Here’s what that looks like all together:

5/15
import pymysql

try:
# Connect to the database
connection = pymysql.connect(host='localhost',
user='root',
password='12345',
db='employee')

cursor=connection.cursor()

# Create a new record


sql = "INSERT INTO `employee` (`EmployeeID`, `Ename`, `DeptID`, `Salary`, `Dname`,
`Dlocation`) VALUES (%s, %s, %s, %s, %s, %s)"
cursor.execute(sql, (1009,'Morgan',1,4000,'HR','Mumbai'))

# connection is not autocommit by default. So we must commit to save our changes.


connection.commit()

# Execute query
sql = "SELECT * FROM `employee`"
cursor.execute(sql)
# Fetch all the records
result = cursor.fetchall()
for i in result:
print(i)

except Error as e:
print(e)

finally:
# close the database connection using close() method.
connection.close()

((1001, 'John', 2, 4000, 'IT', 'New Delhi'), (1002, 'Anna', 1, 3500, 'HR', 'Mumbai'), (1003,
'James', 1, 2500, 'HR', 'Mumbai'), (1004, 'David', 2, 5000, 'IT', 'New Delhi'), (1005, 'Mark', 2,
3000, 'IT', 'New Delhi'), (1006, 'Steve', 3, 4500, 'Finance', 'Mumbai'), (1007, 'Alice', 3, 3500,
'Finance', 'Mumbai'), (1008, 'Kabir', 2, 5000, 'IT', 'New Delhi'), (1009, 'Morgan', 1, 4000, 'HR',
'Mumbai'), (1009, 'Morgan', 1, 4000, 'HR', 'Mumbai'))

Learn SQL by doing


Go hands-on with SQL right now, using our interactive learning platform.

Write real queries

Work with real data

Right in your browser!

6/15
Inserting Pandas DataFrames Into Databases Using INSERT
When working with data in Python, we’re often using pandas , and we’ve often got our
data stored as a pandas DataFrame. Thankfully, we don’t need to do any conversions if
we want to use SQL with our DataFrames; we can directly insert a pandas DataFrame
into a MySQL database using INSERT .

Once again, we’ll take it step-by-step.

Step 1: Create DataFrame using a dictionary

We could also import data from a CSV or create a DataFrame in any number of other
ways, but for the purposes of this example, we’re just going to create a small DataFrame
that saves the titles and prices of some data science texbooks.

# Import pandas
import pandas as pd

# Create dataframe
data = pd.DataFrame({
'book_id':[12345, 12346, 12347],
'title':['Python Programming', 'Learn MySQL', 'Data Science Cookbook'],
'price':[29, 23, 27]
})

data

book_id title price

Python Programming 29

Learn MySQL 23

Data Science Cookbook 27

Step 2: Create a table in our MySQL database

Before inserting data into MySQL, we’re going to to create a book table in MySQL to
hold our data. If such a table already existed, we could skip this step.

We’ll use a CREATE TABLE statement to create our table, follow that with our table
name (in this case, book_details ), and then list each column and its corresponding
datatype.

7/15
Step 3: Create a connection to the database

Once we’ve created that table, we can once again create a connection to the database
from Python using pymysql .

import pymysql

# Connect to the database


connection = pymysql.connect(host='localhost',
user='root',
password='12345',
db='book')

# create cursor
cursor=connection.cursor()

Step 4: Create a column list and insert rows

Next, we’ll create a column list and insert our dataframe rows one by one into the
database by iterating through each row and using INSERT INTO to insert that row’s
values into the database.

(It is also possible to insert the entire DataFrame at once, and we’ll look at a way of
doing that in the next section, but first let’s look at how to do it row-by-row).

# creating column list for insertion


cols = "`,`".join([str(i) for i in data.columns.tolist()])

# Insert DataFrame recrds one by one.


for i,row in data.iterrows():
sql = "INSERT INTO `book_details` (`" +cols + "`) VALUES (" + "%s,"*(len(row)-1) + "%s)"
cursor.execute(sql, tuple(row))

# the connection is not autocommitted by default, so we must commit to save our changes
connection.commit()

Step 5: Query the database to check our work

Again, let’s query the database to make sure that our inserted data has been saved
correctly.

8/15
# Execute query
sql = "SELECT * FROM `book_details`"
cursor.execute(sql)

# Fetch all the records


result = cursor.fetchall()
for i in result:
print(i)

(12345, 'Python Programming', 29)


(12346, 'Learn MySQL', 23)
(12347, 'Data Science Cookbook', 27)

Once we’re satisfied that everything looks right, we can close the connection.

connection.close()

Learn SQL by doing


Go hands-on with SQL right now, using our interactive learning platform.

Write real queries

Work with real data

Right in your browser!

Inserting Pandas DataFrames into a Database Using the to_sql()


Function
Now let’s try to do the same thing — insert a pandas DataFrame into a MySQL database
— using a different technique. This time, we’ll use the module sqlalchemy to create
our connection and the to_sql() function to insert our data.

This approach accomplishes the same end result in a more direct way, and allows us to
add a whole dataframe to a MySQL database all at once.

# Import modules
import pandas as pd

# Create dataframe
data=pd.DataFrame({
'book_id':[12345,12346,12347],
'title':['Python Programming','Learn MySQL','Data Science Cookbook'],
'price':[29,23,27]
})

data

9/15
book_id title price

0 12345 Python Programming 29

1 12346 Learn MySQL 23

2 12347 Data Science Cookbook 27

Import the module sqlalchemy and create an engine with the parameters user,
password, and database name. This is how we connect and log in to the MySQL
database.

# import the module


from sqlalchemy import create_engine

# create sqlalchemy engine


engine = create_engine("mysql+pymysql://{user}:{pw}@localhost/{db}"
.format(user="root",
pw="12345",
db="employee"))

Once we’re connected, we can export the whole DataFrame to MySQL using the
to_sql() function with the parameters table name, engine name, if_exists, and
chunksize.

We’ll take a closer look at what each of these parameters refers to in a moment, but
first, take a look at how much simpler it is to insert a pandas DataFrame into a MySQL
database using this method. We can do it with just a single line of code:

# Insert whole DataFrame into MySQL


data.to_sql('book_details', con = engine, if_exists = 'append', chunksize = 1000)

Now let’s take a closer look at what each of these parameters is doing in our code.

book_details is the name of table into which we want to insert our DataFrame.
con = engine provides the connection details (recall that we created engine
using our authentication details in the previous step).
if_exists = 'append' checks whether the table we specified already exists or not,
and then appends the new data (if it does exist) or creates a new table (if it
doesn’t).
chunksize writes records in batches of a given size at a time. By default, all rows
will be written at once.

Reading Records from a Database


Once we’ve used SQL inserts to get our data into the database, we’ll want to be able to
read it back! So far in this tutorial, we’ve checked our SQL inserts by simply printing the
entire database, but obviously this is not a viable option with larger databases where

10/15
you’d be printing thousands of rows (or more). So let’s take a more in-depth look at how
we can read back the records we’ve created or inserted into our SQL database.

We can read records from a SQL database using the SELECT command. We can select
specific columns, or use * to select everything from a given table. We can also select to
return only records that meet a particular condition using the WHERE command.

Here’s how the syntax for these commands looks:

We start with a SELECT clause, followed by list of columns, or * if we want to select


all columns.
Then we’ll use a FROM clause to name the table we’d like to look at. WHERE can be
used to filter the records and followed by a filter condition, and we can also use ORDER
BY to sort the records. (The WHERE and ORDER BY clauses are optional).

With larger databases, WHERE is useful for returing only the data we want to see. So if,
for example, we’ve just inserted some new data about a particular department, we could
use WHERE to specify the department ID in our query, and it would return only the
records with a department ID that matches the one we specified.

Compare, for example, the results of these two queries using our employee table from
earlier. In the first, we’re returning all the rows. In the second, we’re getting back only
the rows we’ve asked for. This may not make a big difference when our table has seven
rows, but when you’re working with seven thousand rows, or even seven million, using
WHERE to return only the results you want is very important!

11/15
If we want to do this from within Python, we can use the same script we used earlier in
this tutorial to query these records. The only difference is that we’ll tell pymysql to
execute the SELECT command rather than the INSERT command we used earlier.

12/15
# Import module
import pymysql

# create connection
connection = pymysql.connect(host='localhost',
user='root',
password='12345',
db='employee')

# Create cursor
my_cursor = connection.cursor()

# Execute Query
my_cursor.execute("SELECT * from employee")

# Fetch the records


result = my_cursor.fetchall()

for i in result:
print(i)

# Close the connection


connection.close()

(1001, 'John', 2, 4000, 'IT', 'New Delhi')


(1002, 'Anna', 1, 3500, 'HR', 'Mumbai')
(1003, 'James', 1, 2500, 'HR', 'Mumbai')
(1004, 'David', 2, 5000, 'IT', 'New Delhi')
(1005, 'Mark', 2, 3000, 'IT', 'New Delhi')
(1006, 'Steve', 3, 4500, 'Finance', 'Mumbai')
(1007, 'Alice', 3, 3500, 'Finance', 'Mumbai')
(1008, 'Kabir', 2, 5000, 'IT', 'New Delhi')
(1009, 'Morgan', 1, 4000, 'HR', 'Mumbai')
(1009, 'Morgan', 1, 4000, 'HR', 'Mumbai')

Above, we’ve selected and printed the entire database, but if we wanted to use WHERE
to make a more careful, limited selection, the approach is the same:

my_cursor.execute("SELECT * FROM employee WHERE DeptID=2")

Learn SQL by doing


Go hands-on with SQL right now, using our interactive learning platform.

Write real queries

Work with real data

Right in your browser!

13/15
Updating Records in the Database
Often, we’ll need to modify the records in the table after creating them.

For example, imagine that an employee in our employee table got a promotion. We’d
want to update their salary data. The INSERT INTO command won’t help us here,
because we don’t want to add an entirely new row.

To modify existing records in the table, we need to use the UPDATE command.
UPDATE is used to change the contents of existing records. We can specify specific
columns and values to change using SET , and we can also make conditional changes
with WHERE to apply those changes only to rows that meet that condition.

Now, let’s update the records from our employee table and display the results. In this
case, let’s say David got the promotion — we’ll write a query using UPDATE that sets
Salary to 6000 only in columns where the employee ID is 1004 (David’s ID).

Be careful — without the WHERE clause, this query would update every record in the
table, so don’t forget that!

After executing the above query,


the updated table would look like
this:

14/15
Conclusion
In this tutorial, we’ve taken a look at SQL inserts and how to insert data into MySQL
databases from Python. We also learned to insert Pandas DataFrames into SQL
databases using two different methods, including the highly efficient to_sql() method.

Of course, this is just the tip of the iceberg when it comes to SQL queries. If you really
want to become a master of SQL, sign up for free and dive into one of Dataquest’s
interactive SQL courses to get interactive instruction and hands-on experience writing
all the queries you’ll need to do productive, professional data science work.

Learn SQL by doing


Go hands-on with SQL right now, using our interactive learning platform.

Write real queries

Work with real data

Right in your browser!

You may also like

Get started with Dataquest today - for free!

15/15

You might also like