Data Science Through R

Lesson-3
Accessing RDBMS for Data science

Prof.Dr. A. B. Chowdhury,HOD,CA

Techno India University, West Bengal,India


Reach Me Here::[email protected]

today

Prof.Dr. A. B. Chowdhury,HOD,CA (TIU,W.B.)


Data Science Through R Lesson-3Accessing RDBMS for Data sciencetoday 1 / 31
Basic Concepts on R and ODBC
Open Database Connectivity (ODBC) is an open standard application
programming interface (API) that permits application programmers to access
SQL-based database management systems (DBMS) such as MySQL,
PostgreSQL, Microsoft Access, SQL Server, DB2, Oracle and SQLite.
In information technology, a product or system is said to be open when
its workings are accessible to the public and can be modified or improved
by anyone. An application programming interface (API) is a set of code that
allows two software programs to communicate with each other effectively.
It defines the correct way for a developer to write a program that requests
services from an operating system (OS) or other application. APIs are
implemented by function calls comprising verbs and nouns. The required
syntax is described in the documentation of the application being called.
SQL (Structured Query Language) is a fourth-generation language for
querying and managing data in databases. ODBC originated on
Windows in the early 1990s, but the ODBC driver managers unixODBC and
iODBC are nowadays available on a wide range of platforms (iODBC
is used by macOS, also known as OS X).
The ODBC for R
The connection to the particular RDBMS needs an ODBC driver: these
may come with the RDBMS or the ODBC driver manager or be provided
separately by the RDBMS developers, and there are third-party developers
who can also provide the same.
An ODBC driver uses the Open Database Connectivity (ODBC) interface
by Microsoft, which allows applications to access data in relational database
management systems (RDBMS) using SQL as a standard for accessing the data.
ODBC permits maximum interoperability, which means a single application
can access different RDBMS. Application end users can then add ODBC
database drivers to link the application to their choice of RDBMS.
The ODBC interface for R is a package named RODBC. The first step in setting
up RODBC, or any other ODBC client, is to set up a new DSN. The term DSN
stands for Data Source Name. It is a named connection to a specific database.
We state below how to set up a DSN and use it from R.
The ODBC driver managers have ‘User DSNs’ and ‘System DSNs’: these
differ only in where the information is stored, the first on a per-user basis
and the second for all users of the system.
Setting up Windows environment
Windows has a GUI to set up DSNs, called 'Data Sources (ODBC)' under
'Administrative Tools' in the Control Panel. We can add, remove and edit
('configure') DSNs there. When adding a DSN, we first select the ODBC
driver and then complete the driver-specific dialog box. There will usually be an
option to test the DSN and it is wise to do so. Having created the DSN, we use
the steps stated below for connecting to Oracle.
install.packages("DBI")
install.packages("RODBC")
library(DBI)
library(RODBC)
DBI is a package that defines a common database interface for R. DBI-based
drivers (such as the odbc package) use it to connect to the database, whereas the
RODBC functions used below work on their own. Finally, we set up the connection
in the R environment as shown below. Note that R is case sensitive, so the
connection object must be spelled the same way in every call:
myconn = odbcConnect("mydsn", uid="ABC", pwd="mypass")
STDATA = sqlFetch(myconn, "student")
EMPDATA = sqlQuery(myconn, "select * from employee")
close(myconn)
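The connect, fetch, query, close sequence above is not specific to RODBC. As a rough illustration only, the sketch below replays the same lifecycle in Python (the language used later in this lesson) with the standard-library sqlite3 module. The in-memory database stands in for the DSN, and the student rows are made up for the example.

```python
import sqlite3

# Open a connection; ":memory:" plays the role of the DSN in this sketch.
conn = sqlite3.connect(":memory:")
try:
    # Hypothetical stand-in for the "student" table used above.
    conn.execute("CREATE TABLE student (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO student VALUES (?, ?)",
                     [(1, "Asha"), (2, "Bimal")])
    # Counterpart of sqlFetch(): read the whole table.
    stdata = conn.execute("SELECT * FROM student ORDER BY id").fetchall()
    # Counterpart of sqlQuery(): run an arbitrary SQL statement.
    names = conn.execute("SELECT name FROM student WHERE id = 2").fetchall()
finally:
    # Counterpart of close(): always release the connection.
    conn.close()

print(stdata)
print(names)
```

Wrapping the work in try/finally is one way to make sure the connection is released even if a query fails.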
Primary R functions for ODBC
For our illustration, we have procured data from the student and employee
database tables of our RDBMS.
The primary R functions are given in the table below:
Function                                     Description
odbcConnect(dsn, uid="", pwd="")             Open a connection to an ODBC database
sqlFetch(channel, sqtable)                   Read a table from an ODBC database into a data frame
sqlQuery(channel, "query", errors=, max=,    Submit a query to an ODBC database
         rows_at_time=)                      and return the results
sqlSave(channel, mydf, tablename = sqtable,  Write or update (append = TRUE) a data frame
        append = FALSE)                      to a table in the ODBC database
sqlDrop(channel, sqtable)                    Remove a table from the ODBC database
close(channel)                               Close the connection
sqlTables(channel)                           List the tables available through an RODBC connection
dbListTables(conn, schema="schema-name")     List the tables available through a DBI connection

We can access any RDBMS from R to store data from a data frame
into a table, to display data from tables, or to export data from a table into
CSV files. These ideas are illustrated below.
As many websites and dashboards are backed by MySQL or SQL Server
databases, we shall make use of them. First, we shall
consider MySQL.
The 'RMySQL' package provides a convenient way of setting up a connection from R.
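To make the three ideas above concrete (store rows in a table, read them back, export them to CSV), here is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not the RMySQL workflow itself: an in-memory SQLite database replaces MySQL, the employee table and its values are hypothetical, and the CSV is written to a string buffer rather than a file.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER, salary INTEGER)")

# 1. Store rows (the role played by a data frame in R) into a table.
rows = [(1, 25000), (2, 30000), (3, 35000)]
conn.executemany("INSERT INTO employee VALUES (?, ?)", rows)

# 2. Read the data back from the table.
cur = conn.execute("SELECT id, salary FROM employee ORDER BY id")
header = [col[0] for col in cur.description]  # column names of the result

# 3. Export the result to CSV text.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(header)
writer.writerows(cur)  # a cursor is an iterable of row tuples
csv_text = buffer.getvalue()
conn.close()

print(csv_text)
```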
Starting the use of RMYSQL
We need to install this package first and then load it as a library.
Installation of the package needs to be done only once on a machine,
but the library must be loaded in each session in which we want to work
with the RDBMS. Moreover, a connection to the RDBMS is also required for
each session of work.
The steps are illustrated below.
The following command at the R/RStudio prompt installs the package:
> install.packages("RMySQL")
Next, we load the library for each session as illustrated below:
> library(RMySQL)
The following message indicates the successful loading of the library:
Loading required package: DBI
In MySQL we next create a database with the command:
CREATE DATABASE database_name;
where database_name is a name of our choice.
We next create a table and insert data into the table using normal SQL
commands. Let the name of our database be 'TIU' and the table created
with data be 'students'.
Setting Up Connection with MYSQL
We next set up a connection with MySQL as the user 'root', as stated
below, in the R environment, assuming that our MySQL password is
'tiger':
> mydb = dbConnect(MySQL(), user='root', password='tiger', dbname='TIU', host='localhost')
If the R prompt returns without an error, the connection has been
established. To obtain a list of tables in the specified database, we issue the
following command:
> dbListTables(mydb)
This shows the list of database tables available:
[1] "students"
We may check the list of field names in any of the tables
as shown below:
> dbListFields(mydb, "students")
[1] "ID" "COURSE" "CFEES" "FEES_PAID"
We can now use dbSendQuery() in R to issue any query command and
store the output in a result set object as shown below:
> rs = dbSendQuery(mydb, "select * from students")
R commands for Query Outputs
Now, if we want to display the results on the screen, we first use fetch() to
retrieve them from the result set (n = -1 fetches all remaining rows):
> data = fetch(rs, n=-1)
Now, we issue the R command to show the data:
> print(data)
  ID COURSE CFEES FEES_PAID
1 12    BCA 55000     35000
2 22    MCA 55000     45000
A further query can be done as follows:
> re = dbSendQuery(mydb, "select ID, CFEES-FEES_PAID as DUES from students")
> dat = fetch(re, n=-1)
> print(dat)
  ID  DUES
1 12 20000
2 22 10000
Making tables:
We can create tables in the database from R data frames as shown below:
dbWriteTable(mydb, name='table_name', value=data.frame.name)
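The send-query-then-fetch pattern and the computed DUES column can be checked without a MySQL server. The sketch below is a hedged Python equivalent using the standard-library sqlite3 module. The table and values mirror the 'students' output shown above, while the in-memory database and the column types are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (ID INTEGER, COURSE TEXT, CFEES INTEGER, FEES_PAID INTEGER)"
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?)",
    [(12, "BCA", 55000, 35000), (22, "MCA", 55000, 45000)],
)

# execute() corresponds to dbSendQuery(); fetchall() to fetch(rs, n = -1).
cur = conn.cursor()
cur.execute("SELECT ID, CFEES - FEES_PAID AS DUES FROM students ORDER BY ID")
dues = cur.fetchall()
conn.close()

for student_id, due in dues:
    print(student_id, due)
```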
Connecting to Microsoft SQL SERVER
To connect to SQL Server, we need to know three pieces of information
before we start:
IP address of the SQL Server
User name to connect to SQL Server
Password for the user to connect
We can ask for this information from the team of DBAs that manages the
servers on which the data resides. Once we have this information ready,
we can start setting up our connection to SQL Server from R.
There are many packages that can help us to connect to relational databases
in R. Here, we shall use a package named RODBC. First we need to install
this package as follows:
> install.packages("RODBC")
The following message will be displayed after a successful installation:
package 'RODBC' successfully unpacked and MD5 sums checked.

RODBC for Connecting to relational database
Our next task is to load the library corresponding to "RODBC". This is
done as illustrated below:
> library(RODBC)
RODBC can be used to connect to basically any data source that supports
ODBC connections. The next task is then to create an ODBC Data Source
Name (DSN) that points to the SQL Server hosting the data.
Use of DSN in R
Now, we just have to use this data source (the DSN) from R to connect to it
and run a simple query.
For this purpose, we run the following command in R. This command
will create a connection object pointing to the DSN that we created
earlier. We need to use the same name, abc, that we used for the DSN.
> conn = odbcConnect("abc")
In the next step, we will use the connection object created here
to get data from SQL Server.

sqlQuery with RODBC
Using the connection object created in the previous step, we run the query
shown below to get the list of user databases from the server.
> sqlQuery(conn, "select name from master.sys.sysdatabases where dbid > 4")
The sample output here is:
name
abc
The data source against which this query is run is conn, which we
created in the first step. So this R command uses the connection object
named conn, which points to the DSN, and runs a query through it.
sqldf package
For writing SQL queries, we can also use the sqldf package. It is one of the
most useful packages available for running SQL statements on R data frames.
It uses SQLite (by default) as the underlying database and can be faster than
performing the same manipulations in base R. Besides SQLite, it also supports
the H2 Java database, PostgreSQL, and MySQL.
Using SQL in R
When using SQL in R, we can think of R as the database storage machine. The
process is simple. We load the data set using either read.csv or read.csv.sql and
start querying the data using the steps shown below:
> install.packages("sqldf")
> library(sqldf)
> EMP = data.frame(ID=c(1,2,3), SALARY=c(25000,30000,35000))
> write.csv(EMP, file='empdata', row.names=FALSE)
> read.csv.sql('empdata', sql = "select * from file", dbname = tempfile(), drv = "SQLite")
> y = read.csv('empdata')
> sqldf('select ID, SALARY from y')
We shall now illustrate these practices using the "babynames" data set, which is
readily available and loadable as stated below:
> install.packages("babynames")
> library(babynames)
To see the structure of the 'babynames' data set, we can write:
> str(babynames)
The result that R shows is given below.

The babynames Dataset
Classes 'tbl_df', 'tbl' and 'data.frame': 1924665 obs. of 5 variables:
$ year: num 1880 1880 1880 1880 1880 1880 1880 1880 1880 1880 ...
$ sex : chr "F" "F" "F" "F" ...
$ name: chr "Mary" "Anna" "Emma" "Elizabeth" ...
$ n   : int 7065 2604 2003 1939 1746 1578 1472 1414 1320 1288 ...
$ prop: num 0.0724 0.0267 0.0205 0.0199 0.0179 ...
If we would like to see the records in 'babynames', we can issue the command:
> babynames
# A tibble: 1,924,665 x 5
    year sex   name          n   prop
 1  1880 F     Mary       7065 0.0724
 2  1880 F     Anna       2604 0.0267
 3  1880 F     Emma       2003 0.0205
 4  1880 F     Elizabeth  1939 0.0199
 5  1880 F     Minnie     1746 0.0179
 6  1880 F     Margaret   1578 0.0162
 7  1880 F     Ida        1472 0.0151
 8  1880 F     Alice      1414 0.0145
 9  1880 F     Bertha     1320 0.0135
10  1880 F     Sarah      1288 0.0132
Queries with babynames and DBI
> sqldf('select count(*) from babynames')
> sqldf("select count(*) from babynames where sex='F'")
> sqldf("select count(*) from babynames where sex='M'")
> sqldf("select * from babynames where year >= 1980 and prop < 0.5")
Working with SQL directly using the DBI package
Before we can query a database for information, we have to connect to it. In the
chunk below, we connect to a temporary, in-memory database.
library(DBI)
con = dbConnect(RSQLite::SQLite(), dbname = ":memory:")
The above command creates an empty database with no tables. We can confirm
this with dbListTables() as used below:
dbListTables(con)
## character(0)
Before we can do any useful querying, we need to load data into our database. Let
us load a data.frame into a table:
dbWriteTable(con, "mtcars", mtcars)
dbListTables(con)
## [1] "mtcars"
We can immediately inspect what we just loaded like this:
dbListFields(con, "mtcars")
## [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear" "carb"
Querying in ”mtcars”
dbReadTable(con, "mtcars")
Often, when working with a database, we do not want to see all of the
rows because there are just too many to store in memory. In fact, that is
often the point of using the database in the first place.
The DBI package lets us iterate through the rows in groups of our choosing
so that memory usage stays low:
res = dbSendQuery(con, "SELECT * FROM mtcars WHERE cyl = 4")
while (!dbHasCompleted(res)) {
  chunk = dbFetch(res, n = 5)
  print(nrow(chunk))
}
dbClearResult(res)
Notice the output: 5, 5, 1. The query matches 11 rows and, since we
specified n = 5 in dbFetch, we get them in chunks of at most 5.
Let us now close the connection:
dbDisconnect(con)
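The same chunked iteration exists in Python's DB-API as cursor.fetchmany(). The sketch below is only an illustration with the standard-library sqlite3 module. Instead of the real mtcars data it builds a hypothetical cars table containing 11 four-cylinder rows (the number of cyl = 4 rows in mtcars) plus a few others, so that the chunk sizes can be compared with the R output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (id INTEGER, cyl INTEGER)")
# Hypothetical data: 11 rows with cyl = 4 and 7 rows with cyl = 6.
conn.executemany(
    "INSERT INTO cars VALUES (?, ?)",
    [(i, 4) for i in range(11)] + [(100 + i, 6) for i in range(7)],
)

cur = conn.execute("SELECT * FROM cars WHERE cyl = 4")
chunk_sizes = []
while True:
    chunk = cur.fetchmany(5)   # at most 5 rows are held in memory at a time
    if not chunk:              # an empty list means the result is exhausted
        break
    chunk_sizes.append(len(chunk))
conn.close()

print(chunk_sizes)
```

Only the current chunk is materialised in Python, which is the point of fetching in groups when the full result is large.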

Querying in SQLite database
Let us now load the starwars dataset from the dplyr package into an
in-memory SQLite database, so that we can execute a query against the
table, e.g., listing just the Droids.
library(DBI)
library(dplyr)
con = dbConnect(RSQLite::SQLite(), dbname = ":memory:")
dbWriteTable(con, "starwars", select(starwars, -films, -vehicles, -starships))
'dplyr' is an amazing tool for analyzing data. When we use it, we
usually use it against a data.frame (or something similar). But 'dplyr' also
supports working directly with database tables as if they were regular
data.frames.
All we have to do is use tbl() in place of, for example, a read.csv()
call:
library(dplyr)
library(dbplyr)
library(DBI)
Querying in SQLite database–Contd.
con = dbConnect(RSQLite::SQLite(), dbname = ":memory:")
dbWriteTable(con, "mtcars", mtcars)
# Now the real magic can be seen
mtcars_db = tbl(con, "mtcars")
mtcars_db
mtcars_db %>% select(mpg) %>% show_query()
mpg_query = mtcars_db %>% group_by(cyl) %>% summarize(mean(mpg))
mpg_query %>% show_query()
# SQL
# SELECT `cyl`, AVG(`mpg`) AS `mean(mpg)`
class(mpg_query)
mpg_query %>% collect() %>% class()
Thus, dbplyr automatically converts our dplyr code into SQL queries behind
the scenes and we get to use the usual dplyr functionality.
Working with Microsoft Access databases in R is notoriously tricky. The
convenience function used here is Windows-only and generally needs a 32-bit
setup, because the architecture of R has to match that of the installed
Access ODBC driver. The basic instructions are shown below:
install.packages('RODBC')
library(RODBC)
Working with Microsoft Access databases in R
db = odbcConnectAccess("msAccessDB.db")
We can then send queries with sqlQuery() (odbcQuery() is its low-level counterpart).
We can thus use dplyr against an SQL database for intuitive querying that is fast.
The solution to the earlier task of listing just the Droids is as under:
library(DBI)
library(dplyr)
con = dbConnect(RSQLite::SQLite(), dbname = ":memory:")
dbWriteTable(con, "starwars", select(starwars, -films, -vehicles, -starships))
dbGetQuery(con, "SELECT * FROM starwars WHERE species = 'Droid'")
dbDisconnect(con)

Python and Database Programming


The Python programming language possesses powerful features for database
programming. Python supports various DBMSs like MySQL, Oracle, Sybase,
PostgreSQL, etc. Python supports the DDL as well as the DML of SQL. For database
programming, the Python DB-API is a widely used specification that defines a
standard database Application Programming Interface (API) which driver modules implement.

Accessing Relational Database using Python
Benefits of Python for database programming
Programming in Python is arguably quicker and more productive than in many
other programming languages.
Python is famous for its portability as well as platform independence.
Python supports relational database management systems along with SQL
cursors.
Python database APIs are compatible with various databases, so it is very
easy to migrate and port database application interfaces.
Python takes care of the issues of exceptions and errors with open and
closed connections of the database.
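As a small illustration of the points above about cursors and error handling, the sketch below walks through the basic DB-API pattern: connect, obtain a cursor, execute, fetch, and catch the driver's exception class. It uses the standard-library sqlite3 module rather than a MySQL driver, and the table name t and the deliberately missing table no_such_table are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # DB-API: connect() returns a connection
cursor = conn.cursor()               # DB-API: cursor() returns a cursor object
cursor.execute("CREATE TABLE t (x INTEGER)")
cursor.execute("INSERT INTO t VALUES (1)")

# A failing statement raises an exception derived from the module's Error class.
try:
    cursor.execute("SELECT * FROM no_such_table")
    error_message = None
except sqlite3.Error as exc:
    error_message = str(exc)

# The connection is still usable after the error has been handled.
cursor.execute("SELECT x FROM t")
first_row = cursor.fetchone()
conn.close()

print(error_message)
print(first_row)
```

Because other DB-API drivers expose the same connect/cursor/execute/fetch vocabulary, code written this way tends to port between databases with limited changes.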
PyMySQL and Installation
PyMySQL implements the Python Database API 2.0. Here, we will use it to
connect to a MySQL database server from Python.
The pre-requisites of installing PyMySQL are:

1 Any of the following Python variants:
    - CPython>=2.6 or >=3.3
    - PyPy>=4.0
    - IronPython 2.7

2 Any of the following MySQL variants:
    - MySQL>=4.
    - MariaDB>=5.1

Setting the Python Environment for SQL Query
We run the following command in the command prompt/Anaconda PowerShell
Prompt:
pip install PyMySQL
Now, whenever we would like to connect to MySQL, we need to issue the following
command at the Python shell:
import pymysql
The above command loads the PyMySQL library.
To establish a connection between Python and MySQL, we write a statement like
the one shown below:
mydb = pymysql.connect(
    host="localhost",
    user="root",
    passwd="mypass"
)
Here, mydb is a Python object that holds the connection.
We next call the cursor() method of mydb to obtain a cursor object
as under:
mycursor = mydb.cursor()
Now, we can run any MySQL command from the Python shell prompt by passing the
command as a quoted string to mycursor.execute(), as illustrated below:
Use of MySQL commands at the Python Prompt
Having created a database, our next task is to select it so that we can create
the required tables under it. This is done as under for our presently
created database:
>>> mycursor.execute("USE MYDATABASE;")
Now, we can create tables, insert data into the tables, retrieve any required
data or update the data in the tables of the database by issuing commands
similarly at the Python shell prompt as illustrated below:
>>> stmnt = ("CREATE TABLE students(StudentID int PRIMARY KEY AUTO_INCREMENT, "
...          "Name CHAR(20), course CHAR(20), course_fee int, fees_paid int, Admit_date date);")
>>> mycursor.execute(stmnt)
>>> stmnt1 = ("INSERT INTO students(Name,course,course_fee,fees_paid,Admit_date) "
...           "VALUES('Amitabha Ghosh', 'BCA', 55000, 45000, CURDATE());")
>>> mycursor.execute(stmnt1)
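The INSERT above embeds its values directly in the SQL string. A common alternative is to pass the values separately as parameters, which avoids quoting problems. The sketch below illustrates this with the standard-library sqlite3 module, so the details differ from MySQL: the placeholder is ? (PyMySQL uses %s), the auto-increment syntax is SQLite's, and date('now') stands in for CURDATE(). The column names follow the statement above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    """CREATE TABLE students (
           StudentID  INTEGER PRIMARY KEY AUTOINCREMENT,
           Name       TEXT,
           course     TEXT,
           course_fee INTEGER,
           fees_paid  INTEGER,
           Admit_date TEXT
       )"""
)

# Values travel separately from the SQL text; the driver does the quoting.
new_student = ("Amitabha Ghosh", "BCA", 55000, 45000)
cur.execute(
    "INSERT INTO students (Name, course, course_fee, fees_paid, Admit_date) "
    "VALUES (?, ?, ?, ?, date('now'))",
    new_student,
)
conn.commit()

inserted = cur.execute("SELECT StudentID, Name, course FROM students").fetchall()
conn.close()
print(inserted)
```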

A complete illustrative python script
import pymysql

mydb = pymysql.connect(
    host="localhost",
    user="root",
    passwd="mypass"
)
mycursor = mydb.cursor()
mycursor.execute("use D21;")
stmnt1 = ("INSERT INTO students(Name,course,cfee,fees_paid,Admit_date) "
          "VALUES('Amitabha Ghosh', 'BCA', 55000,45000,CURDATE());")
mycursor.execute(stmnt1)
query = "update students set cfee=cfee*1.1;"
try:
    mycursor.execute(query)
    mydb.commit()        # make the insert and the update permanent
except pymysql.MySQLError:
    mydb.rollback()      # undo the uncommitted changes on failure
try:
    mycursor.execute("Select course,cfee from students;")
    resultset = mycursor.fetchall()  # fetch all records returned by the query
    for record in resultset:
        fees = record[1]
        cs = record[0]
        print("Course:", cs)
        print("Course-Fee:", fees)
except pymysql.MySQLError:
    print("Sorry, we encountered a problem")
mydb.close()
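The commit/rollback logic in the script can be exercised without a MySQL server. The following sketch is an assumption-laden stand-in using the standard-library sqlite3 module: a CHECK constraint (not part of the original table) is added purely to force a failure, so that we can observe that rollback() undoes the earlier, uncommitted update as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is hypothetical; it exists only to trigger an error below.
conn.execute("CREATE TABLE students (Name TEXT, cfee INTEGER CHECK (cfee > 0))")
conn.execute("INSERT INTO students VALUES ('Amitabha Ghosh', 55000)")
conn.commit()

try:
    conn.execute("UPDATE students SET cfee = cfee * 11 / 10")  # succeeds, not yet committed
    conn.execute("UPDATE students SET cfee = -1")              # violates the CHECK constraint
    conn.commit()
except sqlite3.IntegrityError:
    conn.rollback()   # discards both updates of the failed transaction

fee_after = conn.execute("SELECT cfee FROM students").fetchone()[0]
conn.close()
print(fee_after)
```

Catching a specific exception class, rather than using a bare except, keeps unrelated programming errors from being silently swallowed.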

The sqlalchemy package provides full SQL language functionality that can be utilised in Python in a very simple manner. This
can be done together with the pandas library, which includes functions to convert dataframes into SQL tables and perform queries on such tables,
and a sql module for input/output operations on such tables.


Accessing Relational Databases using SQLAlchemy
and pandas
The SQLAlchemy library can connect to a variety of relational sources including
SQLite, MySQL, Oracle, PostgreSQL and MS SQL Server. The steps are straightforward,
as pointed out below:
SQLAlchemy is installed from the Anaconda Prompt window
by using the command:
conda install sqlalchemy
A database engine is created by using the create_engine() function of
sqlalchemy.
The create_engine() function connects to the desired relational DBMS
source.
The dataframe is then written to the database through the engine by using the
to_sql() method of the pandas DataFrame with a desired name for the
relational table.
The read_sql_query() function of pandas can then be used to perform
queries on the named table by using the database engine.
Use of Sqlite3
We use SQLite3 here as our relational database because it is very
lightweight and easy to use for illustrating SQL commands in a Python
environment.
We first illustrate below how to create a CSV file in Python. The CSV file
will then be read into a pandas dataframe. Such a dataframe
can be converted into a relational table easily by using a database engine
generated by the create_engine() function of SQLAlchemy. Let us now focus on
the creation of the CSV file in the following Python script, created in the
Spyder IDE of Anaconda.
# -*- coding: utf-8 -*-
"""
Created on Thu Jan 20 16:21:00 2022
@author: Dr. ABC
"""
import csv
# field names
fields = ['EMP_ID', 'ENAME', 'DEPTT_NO', 'DESIGNATION', 'SALARY']
# data rows of csv file
rows = [['E1', 'Janardhan', 'D1', 'COE', 95000],
        ['E2', 'Janathan', 'D2', 'GM', 85000],
        ['E3', 'Ayesha', 'D1', 'Accountant', 65000],
        ['E4', 'ANIRBAN', 'D2', 'IT HEAD', 80000],
        ['E5', 'ANITA', 'D3', 'Officer', 60000],
        ['E6', 'Rabin', 'D3', 'PRO', 50000]]
# name of csv file
filename = "MYRIADS FAMILY.csv"
Use of Sqlite3–Continued
# writing to csv file (newline='' prevents blank lines between rows on Windows)
with open(filename, 'w', newline='') as csvfile:
    # creating a csv writer object
    csvwriter = csv.writer(csvfile)
    # writing the fields
    csvwriter.writerow(fields)
    # writing the data rows
    csvwriter.writerows(rows)

The following Python script illustrates how the CSV file created above can be read into a pandas dataframe. It also includes the
statement for creating the SQLite3 database engine, which is then used to convert the dataframe into a relational table for
performing desired operations using SQL.
# @author: Dr. ABC
from sqlalchemy import create_engine
import pandas as pd
data = pd.read_csv('c:/users/Anil Bikash/MYRIADS FAMILY.csv')
engine = create_engine('sqlite:///:memory:')
data.to_sql('EMPLOYEE', engine)
The following statements show how the read_sql_query() function of pandas can be used to issue an SQL command with the help of
the created database engine.
data_list1 = pd.read_sql_query('SELECT * FROM EMPLOYEE', engine)
print('Result of Query-1')
print(data_list1)
print('')
The output generated by the preceding four statements is as shown below:
Result of Query-1
   index EMP_ID      ENAME DEPTT_NO DESIGNATION  SALARY
0      0     E1  Janardhan       D1         COE   95000
1      1     E2   Janathan       D2          GM   85000
2      2     E3     Ayesha       D1  Accountant   65000
3      3     E4    ANIRBAN       D2     IT HEAD   80000
4      4     E5      ANITA       D3     Officer   60000
5      5     E6      Rabin       D3         PRO   50000
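For readers without pandas or SQLAlchemy installed, the CSV-to-table step can be approximated with the standard library alone. The sketch below is such an approximation, not the pandas route shown above: it parses a short CSV text (the first three employee rows, held in a string instead of a file) with csv.DictReader and loads it into an in-memory SQLite table through sqlite3. The column types are assumptions.

```python
import csv
import io
import sqlite3

# CSV content held in a string for the sake of a self-contained example.
csv_text = """EMP_ID,ENAME,DEPTT_NO,DESIGNATION,SALARY
E1,Janardhan,D1,COE,95000
E2,Janathan,D2,GM,85000
E3,Ayesha,D1,Accountant,65000
"""

reader = csv.DictReader(io.StringIO(csv_text))
records = [
    (r["EMP_ID"], r["ENAME"], r["DEPTT_NO"], r["DESIGNATION"], int(r["SALARY"]))
    for r in reader
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (EMP_ID TEXT, ENAME TEXT, DEPTT_NO TEXT, "
    "DESIGNATION TEXT, SALARY INTEGER)"
)
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?)", records)

employees = conn.execute(
    "SELECT EMP_ID, SALARY FROM EMPLOYEE ORDER BY EMP_ID"
).fetchall()
conn.close()
print(employees)
```

Unlike to_sql(), this version requires the table schema to be spelled out by hand and does not add an index column.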
SQL Queries Contd.
We now attempt a more complex SQL statement using the GROUP BY clause
as under:
data_list2 = pd.read_sql_query('SELECT DEPTT_NO, sum(SALARY) AS DEPTT_WISE_TOTAL_SALARY FROM EMPLOYEE GROUP BY DEPTT_NO', engine)
print('Result of Query-2')
print(data_list2)
print(' ')

The output generated by the preceding query statement is shown below.

Result of Query-2
  DEPTT_NO  DEPTT_WISE_TOTAL_SALARY
0       D1                   160000
1       D2                   165000
2       D3                   110000
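The department totals above can be cross-checked independently of pandas. The sketch below, offered only as a verification aid, loads the six employee rows from the CSV script into an in-memory SQLite table with the standard-library sqlite3 module and runs the same GROUP BY query. Only the two columns needed for the query are kept, which is a simplification.

```python
import sqlite3

# (DEPTT_NO, SALARY) pairs for employees E1..E6, taken from the rows above.
salaries = [
    ("D1", 95000), ("D2", 85000), ("D1", 65000),
    ("D2", 80000), ("D3", 60000), ("D3", 50000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (DEPTT_NO TEXT, SALARY INTEGER)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)", salaries)

totals = conn.execute(
    "SELECT DEPTT_NO, SUM(SALARY) AS DEPTT_WISE_TOTAL_SALARY "
    "FROM EMPLOYEE GROUP BY DEPTT_NO ORDER BY DEPTT_NO"
).fetchall()
conn.close()

for dept, total in totals:
    print(dept, total)
```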
We next attempt to insert a row in the created relational table. This requires
importing the sql module from pandas.io. Because to_sql() also stored the
dataframe index, one additional value for the index column must be supplied,
as shown below. (In recent pandas releases this helper is deprecated, so the
call may need adapting to the installed version.)
from pandas.io import sql
sql.execute('INSERT INTO EMPLOYEE VALUES(?,?,?,?,?,?)', engine, params=[(7,'E7','Prabin','D1','Operator',25000)])
SQL Queries Contd.
Now, we would like to check whether the insertion of the new row has succeeded. So,
we list all the rows of our table as stated below.
data_list3 = pd.read_sql_query('SELECT * FROM EMPLOYEE', engine)
print('Result of Query-3')
print(data_list3)
The preceding query statement gives us the following result as anticipated:
Result of Query-3
   index EMP_ID      ENAME DEPTT_NO DESIGNATION  SALARY
0      0     E1  Janardhan       D1         COE   95000
1      1     E2   Janathan       D2          GM   85000
2      2     E3     Ayesha       D1  Accountant   65000
3      3     E4    ANIRBAN       D2     IT HEAD   80000
4      4     E5      ANITA       D3     Officer   60000
5      5     E6      Rabin       D3         PRO   50000
6      7     E7     Prabin       D1    Operator   25000

We next perform a deletion with SQL, as stated below,
with specified parameters. Note the trailing comma in ('E6',): without it,
('E6') is just a string and not a one-element tuple.
sql.execute('DELETE FROM EMPLOYEE WHERE EMP_ID = ?', engine, params=[('E6',)])
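A detail that is easy to miss when passing a single parameter is that Python only treats parentheses as a tuple when a comma is present. The short sketch below illustrates the difference and then performs a parameterised delete. It uses the standard-library sqlite3 module and a cut-down, hypothetical EMPLOYEE table with just two columns.

```python
import sqlite3

just_a_string = ('E6')    # parentheses alone: still the string 'E6'
one_tuple = ('E6',)       # trailing comma: a one-element tuple

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EMP_ID TEXT, ENAME TEXT)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?)",
    [("E5", "ANITA"), ("E6", "Rabin"), ("E7", "Prabin")],
)

# One placeholder, so the parameter sequence must contain exactly one value.
conn.execute("DELETE FROM EMPLOYEE WHERE EMP_ID = ?", one_tuple)

remaining = [
    row[0] for row in conn.execute("SELECT EMP_ID FROM EMPLOYEE ORDER BY EMP_ID")
]
conn.close()

print(type(just_a_string).__name__, type(one_tuple).__name__)
print(remaining)
```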
SQL Queries Contd.
We next check the status of the table EMPLOYEE after the execution of
the DELETE statement above.
data_list4 = pd.read_sql_query('SELECT * FROM EMPLOYEE', engine)
print('Result of Query-4')
print(data_list4)
print(' ')

This gives us the following output:

Result of Query-4
   index EMP_ID      ENAME DEPTT_NO DESIGNATION  SALARY
0      0     E1  Janardhan       D1         COE   95000
1      1     E2   Janathan       D2          GM   85000
2      2     E3     Ayesha       D1  Accountant   65000
3      3     E4    ANIRBAN       D2     IT HEAD   80000
4      4     E5      ANITA       D3     Officer   60000
5      7     E7     Prabin       D1    Operator   25000

Finally, we execute an UPDATE operation on the table using the
UPDATE command of SQL; the result after the update is shown
below.
sql.execute('update EMPLOYEE set SALARY=SALARY*1.25', engine)
data_list5 = pd.read_sql_query('SELECT * FROM EMPLOYEE', engine)
SQL Queries Contd.
print('Result of Query-5')
print(data_list5)
print(' ')
Result of Query-5
   index EMP_ID      ENAME DEPTT_NO DESIGNATION  SALARY
0      0     E1  Janardhan       D1         COE  118750
1      1     E2   Janathan       D2          GM  106250
2      2     E3     Ayesha       D1  Accountant   81250
3      3     E4    ANIRBAN       D2     IT HEAD  100000
4      4     E5      ANITA       D3     Officer   75000
5      7     E7     Prabin       D1    Operator   31250

We next aim to create another data table using Python. This time we start
with a dictionary, which is then converted into a pandas dataframe. The
dataframe is then transformed into a relational data table as shown above.
The statements used are listed below in order of execution.
DEPART = {'Dptno': ['D1','D2','D3','D4','D5'],
          'PROJECT': ['Sales','Marketing','Purchase','Audit','Production'],
          'BUDGET': [5000000,7500000,6500000,4500000,8500000]}
df = pd.DataFrame(DEPART)
df.to_sql('DEPTT', engine)
We next list the data in the table as shown below:
data_list6 = pd.read_sql_query('SELECT * FROM DEPTT', engine)
print('Result of Query-6')
print(data_list6)
print(' ')
SQL Queries Contd.
   index Dptno     PROJECT   BUDGET
0      0    D1       Sales  5000000
1      1    D2   Marketing  7500000
2      2    D3    Purchase  6500000
3      3    D4       Audit  4500000
4      4    D5  Production  8500000

Now, we perform an INNER JOIN of the EMPLOYEE and DEPTT tables to
list data from both tables as shown below.
data_list7 = pd.read_sql_query('SELECT EMP_ID, ENAME, PROJECT FROM EMPLOYEE INNER JOIN DEPTT ON DEPTT_NO=Dptno', engine)
print(data_list7)
  EMP_ID      ENAME    PROJECT
0     E1  Janardhan      Sales
1     E2   Janathan  Marketing
2     E3     Ayesha      Sales
3     E4    ANIRBAN  Marketing
4     E5      ANITA   Purchase
5     E7     Prabin      Sales
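The equi-join above pairs each employee with the department row whose Dptno equals the employee's DEPTT_NO. As a hedged, pandas-free illustration, the sketch below rebuilds cut-down versions of the two tables (only the columns needed for the join) with the standard-library sqlite3 module and runs the same INNER JOIN. The data are the rows shown in the outputs above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EMP_ID TEXT, ENAME TEXT, DEPTT_NO TEXT)")
conn.execute("CREATE TABLE DEPTT (Dptno TEXT, PROJECT TEXT)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [("E1", "Janardhan", "D1"), ("E2", "Janathan", "D2"), ("E3", "Ayesha", "D1"),
     ("E4", "ANIRBAN", "D2"), ("E5", "ANITA", "D3"), ("E7", "Prabin", "D1")],
)
conn.executemany(
    "INSERT INTO DEPTT VALUES (?, ?)",
    [("D1", "Sales"), ("D2", "Marketing"), ("D3", "Purchase"),
     ("D4", "Audit"), ("D5", "Production")],
)

joined = conn.execute(
    "SELECT EMP_ID, ENAME, PROJECT "
    "FROM EMPLOYEE INNER JOIN DEPTT ON DEPTT_NO = Dptno "
    "ORDER BY EMP_ID"
).fetchall()
conn.close()

for row in joined:
    print(row)
```

Departments D4 and D5 have no employees, so an inner join leaves them out of the result.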

We next define another relational data table, as done earlier, with a view
to performing a non-equi join. We first show below the creation of the
SALGRADE table and then the listing of the tuples in the table.
SALGRADE = {'MINSAL': [25000,50000,100000], 'MAXSAL': [50000,75000,150000], 'GRADE': ['C','B','A']}
saldf = pd.DataFrame(SALGRADE)
saldf.to_sql('SALGRADE', engine)
data_list8 = pd.read_sql_query('SELECT * FROM SALGRADE', engine)
print('Result of Query-8')
print(data_list8); print(' ')
SQL Queries Contd.

   index  MINSAL  MAXSAL GRADE
0      0   25000   50000     C
1      1   50000   75000     B
2      2  100000  150000     A

We next show below how a non-equi join can be performed with the EMPLOYEE and the SALGRADE tables.
data_list8 = pd.read_sql_query('SELECT EMP_ID, ENAME, SALARY, GRADE FROM EMPLOYEE INNER JOIN SALGRADE ON SALARY>=MINSAL AND SALARY<=MAXSAL', engine)
print('Result of Query-8')
print(data_list8)
print(' ')
Result of Query-8
  EMP_ID      ENAME  SALARY GRADE
0     E1  Janardhan  118750     A
1     E2   Janathan  106250     A
2     E4    ANIRBAN  100000     A
3     E5      ANITA   75000     B
4     E7     Prabin   31250     C
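A non-equi join matches rows on a range condition instead of equality, so a row that falls in no range simply disappears from the result. The sketch below, which uses the standard-library sqlite3 module in place of pandas, reproduces the join with the updated salaries shown earlier and also lists the employees that match no grade. Table layouts are trimmed to the columns needed, which is an assumption of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EMP_ID TEXT, SALARY INTEGER)")
conn.execute("CREATE TABLE SALGRADE (MINSAL INTEGER, MAXSAL INTEGER, GRADE TEXT)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?)",
    [("E1", 118750), ("E2", 106250), ("E3", 81250),
     ("E4", 100000), ("E5", 75000), ("E7", 31250)],
)
conn.executemany(
    "INSERT INTO SALGRADE VALUES (?, ?, ?)",
    [(25000, 50000, "C"), (50000, 75000, "B"), (100000, 150000, "A")],
)

# Range condition instead of equality: this is the non-equi join.
graded = conn.execute(
    "SELECT EMP_ID, GRADE FROM EMPLOYEE INNER JOIN SALGRADE "
    "ON SALARY >= MINSAL AND SALARY <= MAXSAL ORDER BY EMP_ID"
).fetchall()

# Employees whose salary lies in no grade band (here, between 75000 and 100000).
ungraded = [
    row[0]
    for row in conn.execute(
        "SELECT EMP_ID FROM EMPLOYEE WHERE NOT EXISTS ("
        "SELECT 1 FROM SALGRADE WHERE SALARY >= MINSAL AND SALARY <= MAXSAL)"
    )
]
conn.close()

print(graded)
print(ungraded)
```

This explains why E3 (salary 81250) is absent from the result above: the grade bands leave a gap between 75000 and 100000. Note also that the bands share their end points (50000 belongs to both C and B), so a salary exactly on such a boundary would appear twice.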
