Data Science Through R
Lesson-3: Accessing RDBMS for Data Science
Prof. Dr. A. B. Chowdhury, HOD, CA
We can access any RDBMS from R to store data from a data frame in a
table, to display data from tables, or to export data from a table to a
CSV file. These ideas are illustrated below.
Since many websites and dashboards use a MySQL or SQL Server
database, we shall work with these two. First, we consider MySQL.
R provides the 'RMySQL' package for setting up a connection conveniently.
Prof.Dr. A. B. Chowdhury,HOD,CA (TIU,W.B.)
Getting Started with RMySQL
We need to install this package first and then load the corresponding library.
The package needs to be installed only once on a machine, but the library
must be loaded in every session in which we want to work with the RDBMS.
A connection to the RDBMS must also be set up in each session.
The steps are illustrated below.
The following command at the R/RStudio prompt installs the package:
> install.packages("RMySQL")
Next, we load the library for the session:
> library(RMySQL)
The following message indicates that the library has been loaded successfully:
Loading required package: DBI
In MySQL, we next create a database with the command:
CREATE DATABASE database_name;
where database_name is a name of our choice. We then create a table and
insert data into it using ordinary SQL commands. Let the name of our
database be 'TIU' and the table created with data be 'students'.
Setting Up a Connection with MySQL
We next set up a connection with MySQL from the R environment as the
user 'root', assuming that our MySQL password is 'tiger':
> mydb = dbConnect(MySQL(), user='root', password='tiger',
                   dbname='TIU', host='localhost')
If the R prompt returns without an error, the connection has been
established. To list the tables in the specified database, we issue the
following command:
> dbListTables(mydb)
This shows the list of tables available in the database:
[1] "students"
We may check the field names of any table as shown below:
> dbListFields(mydb, "students")
[1] "ID" "COURSE" "CFEES" "FEES_PAID"
We can now use dbSendQuery() in R to issue any query and store the
output in a result set object, as shown below:
> rs = dbSendQuery(mydb, "select * from students")
R Commands for Query Outputs
To display the results on the screen, we first retrieve them from the
result set with fetch(); n=-1 retrieves all pending rows:
> data = fetch(rs, n=-1)
We then issue the R command to show the data:
> print(data)
  ID COURSE CFEES FEES_PAID
1 12    BCA 55000     35000
2 22    MCA 55000     45000
Further queries can be issued in the same way:
> re = dbSendQuery(mydb, "select ID, CFEES-FEES_PAID as DUES from students")
> dat = fetch(re, n=-1)
> print(dat)
  ID  DUES
1 12 20000
2 22 10000
Making tables:
We can create tables in the database from R data frames as shown below:
> dbWriteTable(mydb, name='table_name', value=data.frame.name)
Connecting to Microsoft SQL Server
To connect to SQL Server, we need three pieces of information
before we start:
IP address of the SQL Server
User name to connect to SQL Server
Password for that user
We can ask the team of DBAs that manages the servers on which the data
resides for this information. Once we have it, we can start setting up
our connection to SQL Server from R.
Many packages can help us connect to relational databases in R. Here,
we shall use a package named RODBC. First, we need to install this
package as follows:
> install.packages("RODBC")
A message such as the following is displayed after a successful installation:
package 'RODBC' successfully unpacked and MD5 sums checked
The sqlalchemy package provides full SQL language functionality that can be used in Python in a very simple manner. It
can be combined with the pandas library, which includes functions to convert dataframes into SQL tables and to perform queries on such tables.
The following Python script illustrates how the CSV file created above can be read into a pandas dataframe. It also includes the
statement that creates an SQLite database engine, which is then used to convert the dataframe into a relational table for
performing the desired operations using SQL.
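# @author: Dr. ABC
from sqlalchemy import create_engine
import pandas as pd
# Read the CSV file into a pandas dataframe
data = pd.read_csv('c:/users/Anil Bikash/MYRIADS FAMILY.csv')
# Create an in-memory SQLite engine and store the dataframe as the table EMPLOYEE
engine = create_engine('sqlite:///:memory:')
data.to_sql('EMPLOYEE', engine)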
The following statements show how the read_sql_query() function of pandas can be used to issue an SQL command with the help of
the created database engine.
data_list1 = pd.read_sql_query('SELECT * FROM EMPLOYEE', engine)
print('Result of Query-1')
print(data_list1)
print('')
The output generated by the preceding four statements is shown below:
Result of Query-1
   index  EMP_ID      ENAME  DEPTT_NO  DESIGNATION  SALARY
0      0      E1  Janardhan        D1          COE   95000
1      1      E2   Janathan        D2           GM   85000
2      2      E3     Ayesha        D1   Accountant   65000
3      3      E4    ANIRBAN        D2      IT HEAD   80000
4      4      E5      ANITA        D3      Officer   60000
5      5      E6      Rabin        D3          PRO   50000
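The same load-and-query round trip can be tried without the CSV file. The following is a minimal sketch that uses Python's standard sqlite3 module instead of pandas and sqlalchemy; the rows are typed in from the listing above, and the column types in the CREATE TABLE statement are assumptions, since the original CSV file is not available here.

```python
import sqlite3

# Rows copied by hand from the EMPLOYEE listing (assumed to match the CSV file).
EMPLOYEES = [
    ("E1", "Janardhan", "D1", "COE", 95000),
    ("E2", "Janathan", "D2", "GM", 85000),
    ("E3", "Ayesha", "D1", "Accountant", 65000),
    ("E4", "ANIRBAN", "D2", "IT HEAD", 80000),
    ("E5", "ANITA", "D3", "Officer", 60000),
    ("E6", "Rabin", "D3", "PRO", 50000),
]

# An in-memory SQLite database, comparable to 'sqlite:///:memory:'.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE ("
    "EMP_ID TEXT, ENAME TEXT, DEPTT_NO TEXT, DESIGNATION TEXT, SALARY INTEGER)"
)
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?)", EMPLOYEES)

# Query the table back, one tuple per row.
rows = conn.execute("SELECT * FROM EMPLOYEE ORDER BY EMP_ID").fetchall()
for row in rows:
    print(row)
conn.close()
```

Unlike to_sql(), this sketch does not add an index column; the table holds only the five data columns.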
SQL Queries Contd.
We now attempt a more complex SQL statement using the GROUP BY clause
to compute the total salary of each department:
data_list2 = pd.read_sql_query(
    'SELECT DEPTT_NO, sum(SALARY) AS DEPTT_WISE_TOTAL_SALARY '
    'FROM EMPLOYEE GROUP BY DEPTT_NO', engine)
print('Result of Query-2')
print(data_list2)
print(' ')
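As a cross-check of the grouping, the department-wise totals can be worked out in a small self-contained sketch. It again uses the standard sqlite3 module rather than pandas, and keeps only the two columns the query needs; the reduced table layout is an assumption made for brevity.

```python
import sqlite3

# (DEPTT_NO, SALARY) pairs taken from the EMPLOYEE listing.
SALARIES = [
    ("D1", 95000), ("D2", 85000), ("D1", 65000),
    ("D2", 80000), ("D3", 60000), ("D3", 50000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (DEPTT_NO TEXT, SALARY INTEGER)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)", SALARIES)

# One output row per department, with the salaries summed within each group.
totals = dict(
    conn.execute(
        "SELECT DEPTT_NO, sum(SALARY) AS DEPTT_WISE_TOTAL_SALARY "
        "FROM EMPLOYEE GROUP BY DEPTT_NO"
    ).fetchall()
)
print(totals)  # {'D1': 160000, 'D2': 165000, 'D3': 110000}
conn.close()
```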
We next perform a deletion with SQL, passing the value to match as a
parameter. Here sql refers to the pandas.io.sql module.
from pandas.io import sql
sql.execute('DELETE FROM EMPLOYEE WHERE EMP_ID = (?)', engine, params=[('E6')])
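The idea of a parameterized DELETE can be sketched on its own with the standard sqlite3 module (a swapped-in substitute for the pandas call, not the method used in this lesson). The two-row table is a hypothetical cut-down version of EMPLOYEE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EMP_ID TEXT, ENAME TEXT)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?)",
    [("E5", "ANITA"), ("E6", "Rabin")],
)

# ("E6") is just the string "E6"; the trailing comma makes a one-element tuple.
params = ("E6",)

# The ? placeholder is filled in by the driver, so the value is never
# pasted into the SQL text itself.
cursor = conn.execute("DELETE FROM EMPLOYEE WHERE EMP_ID = ?", params)
deleted = cursor.rowcount
conn.commit()

remaining = [row[0] for row in conn.execute("SELECT EMP_ID FROM EMPLOYEE")]
print(deleted, remaining)  # 1 ['E5']
conn.close()
```

Passing the value separately from the statement also avoids quoting problems when the value comes from user input.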
SQL Queries Contd.
We next check the contents of the table EMPLOYEE after the execution of
the DELETE statement above.
data_list4 = pd.read_sql_query('SELECT * FROM EMPLOYEE', engine)
print('Result of Query-4')
print(data_list4)
print(' ')
We next create another data table using Python. This time we start
with a dictionary, which is converted into a pandas dataframe. The
dataframe is then transformed into a relational data table as shown above.
The statements used are listed below in order of execution.
DEPART = {'Dptno': ['D1', 'D2', 'D3', 'D4', 'D5'],
          'PROJECT': ['Sales', 'Marketing', 'Purchase', 'Audit', 'Production'],
          'BUDGET': [5000000, 7500000, 6500000, 4500000, 8500000]}
df = pd.DataFrame(DEPART)
df.to_sql('DEPTT', engine)
We next list the data in the table as shown below:
data_list6 = pd.read_sql_query('SELECT * FROM DEPTT', engine)
print('Result of Query-6')
print(data_list6)
print(' ')
SQL Queries Contd.
   index  Dptno     PROJECT   BUDGET
0      0     D1       Sales  5000000
1      1     D2   Marketing  7500000
2      2     D3    Purchase  6500000
3      3     D4       Audit  4500000
4      4     D5  Production  8500000
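What the dictionary-to-table conversion amounts to can be sketched without pandas: the dictionary holds one list per column, and a relational table needs one tuple per row. The following minimal sketch uses the standard sqlite3 module; the column types are assumptions.

```python
import sqlite3

DEPART = {
    "Dptno": ["D1", "D2", "D3", "D4", "D5"],
    "PROJECT": ["Sales", "Marketing", "Purchase", "Audit", "Production"],
    "BUDGET": [5000000, 7500000, 6500000, 4500000, 8500000],
}

# zip(*columns) turns the column-wise lists into row-wise tuples.
dept_rows = list(zip(*DEPART.values()))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DEPTT (Dptno TEXT, PROJECT TEXT, BUDGET INTEGER)")
conn.executemany("INSERT INTO DEPTT VALUES (?, ?, ?)", dept_rows)

stored = conn.execute("SELECT * FROM DEPTT ORDER BY Dptno").fetchall()
for row in stored:
    print(row)
conn.close()
```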
We next define another relational data table, as done earlier, with a view
to performing a non-equi join. We first show below the creation of the
SALGRADE table and then the listing of the tuples in the table.
SALGRADE = {'MINSAL': [25000, 50000, 100000],
            'MAXSAL': [50000, 75000, 150000],
            'GRADE': ['C', 'B', 'A']}
saldf = pd.DataFrame(SALGRADE)
saldf.to_sql('SALGRADE', engine)
data_list8 = pd.read_sql_query('SELECT * FROM SALGRADE', engine)
print('Result of Query-8')
print(data_list8)
print(' ')
SQL Queries Contd.
We next show how a non-equi join can be performed on the EMPLOYEE and SALGRADE tables. Each employee is matched with the grade whose salary range contains the employee's salary.
data_list8 = pd.read_sql_query(
    'SELECT EMP_ID, ENAME, SALARY, GRADE FROM EMPLOYEE INNER JOIN SALGRADE '
    'ON SALARY >= MINSAL AND SALARY <= MAXSAL', engine)
print('Result of Query-8')
print(data_list8)
print(' ')
Result of Query-8
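The behaviour of the range condition can be explored in a self-contained sketch. It uses the standard sqlite3 module instead of pandas, and assumes that EMPLOYEE holds rows E1 to E5 (that is, after the deletion of E6) with only the columns the join needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EMP_ID TEXT, ENAME TEXT, SALARY INTEGER)")
conn.execute("CREATE TABLE SALGRADE (MINSAL INTEGER, MAXSAL INTEGER, GRADE TEXT)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [
        ("E1", "Janardhan", 95000),
        ("E2", "Janathan", 85000),
        ("E3", "Ayesha", 65000),
        ("E4", "ANIRBAN", 80000),
        ("E5", "ANITA", 60000),
    ],
)
conn.executemany(
    "INSERT INTO SALGRADE VALUES (?, ?, ?)",
    [(25000, 50000, "C"), (50000, 75000, "B"), (100000, 150000, "A")],
)

# Non-equi join: rows are paired by a range test, not by equality of keys.
graded = conn.execute(
    "SELECT EMP_ID, ENAME, SALARY, GRADE "
    "FROM EMPLOYEE INNER JOIN SALGRADE "
    "ON SALARY >= MINSAL AND SALARY <= MAXSAL "
    "ORDER BY EMP_ID"
).fetchall()
print(graded)  # [('E3', 'Ayesha', 65000, 'B'), ('E5', 'ANITA', 60000, 'B')]
conn.close()
```

With these grade ranges, salaries between 75000 and 100000 fall in no range, so the inner join drops E1, E2 and E4 in this sketch. A salary of exactly 50000 would match both grade C and grade B, because the two ranges share that boundary.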