Introduction
In a real world corporate business setting, most data may not be stored in text or Excel files. SQL-based relational databases such as Oracle, SQL Server, PostgreSQL, and MySQL are in wide use, and many alternative databases have become quite popular.
The choice of database is usually dependent on the performance, data integrity, and scalability needs of an application.
How to do it..
In this example we will how to create a sqlite3 database. sqllite is installed by default with python installation and doesn't require any further installations. If you are unsure please try below. We will also import Pandas.
Loading data from SQL into a DataFrame is fairly straightforward, and pandas has some functions to simplify the process.
import sqlite3 import pandas as pd print(f"Output \n {sqlite3.version}")
Output
2.6.0
Output
# connection object conn = sqlite3.connect("example.db") # customers data customers = pd.DataFrame({ "customerID" : ["a1", "b1", "c1", "d1"] , "firstName" : ["Person1", "Person2", "Person3", "Person4"] , "state" : ["VIC", "NSW", "QLD", "WA"] }) print(f"Output \n *** Customers info -\n {customers}")
Output
*** Customers info - customerID firstName state 0 a1 Person1 VIC 1 b1 Person2 NSW 2 c1 Person3 QLD 3 d1 Person4 WA
# orders data orders = pd.DataFrame({ "customerID" : ["a1", "a1", "a1", "d1", "c1", "c1"] , "productName" : ["road bike", "mountain bike", "helmet", "gloves", "road bike", "glasses"] }) print(f"Output \n *** orders info -\n {orders}")
Output
*** orders info - customerID productName 0 a1 road bike 1 a1 mountain bike 2 a1 helmet 3 d1 gloves 4 c1 road bike 5 c1 glasses
# write to the db customers.to_sql("customers", con=conn, if_exists="replace", index=False) orders.to_sql("orders", conn, if_exists="replace", index=False)
Output
# frame an sql to fetch the data. q = """ select orders.customerID, customers.firstName, count(*) as productQuantity from orders left join customers on orders.customerID = customers.customerID group by customers.firstName; """
Output
# run the sql. pd.read_sql_query(q, con=conn)
Example
7. Putting it all together.
import sqlite3 import pandas as pd print(f"Output \n {sqlite3.version}") # connection object conn = sqlite3.connect("example.db") # customers data customers = pd.DataFrame({ "customerID" : ["a1", "b1", "c1", "d1"] , "firstName" : ["Person1", "Person2", "Person3", "Person4"] , "state" : ["VIC", "NSW", "QLD", "WA"] }) print(f"*** Customers info -\n {customers}") # orders data orders = pd.DataFrame({ "customerID" : ["a1", "a1", "a1", "d1", "c1", "c1"] , "productName" : ["road bike", "mountain bike", "helmet", "gloves", "road bike", "glasses"] }) print(f"*** orders info -\n {orders}") # write to the db customers.to_sql("customers", con=conn, if_exists="replace", index=False) orders.to_sql("orders", conn, if_exists="replace", index=False) # frame an sql to fetch the data. q = """ select orders.customerID, customers.firstName, count(*) as productQuantity from orders left join customers on orders.customerID = customers.customerID group by customers.firstName; """ # run the sql. pd.read_sql_query(q, con=conn)
Output
2.6.0 *** Customers info - customerID firstName state 0 a1 Person1 VIC 1 b1 Person2 NSW 2 c1 Person3 QLD 3 d1 Person4 WA *** orders info - customerID productName 0 a1 road bike 1 a1 mountain bike 2 a1 helmet 3 d1 gloves 4 c1 road bike 5 c1 glasses customerID firstName productQuantity ____________________________________ 0 a1 Person1 3 1 c1 Person3 2 2 d1 Person4 1