Lab No.9

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 17

DEPARTMENT OF ELECTRICAL ENGINEERING

Faculty Member: ____________________ Date: ________________

Semester:_____________ Section: ________________

DS-401: Introduction to Data Science

Lab 10: Databases. MySQL-Python Integration

Name Reg. No Lab Lab


Evaluation Report
(5 Marks) (15 Marks)
Objectives

In this lab, we are introduced to more advanced SQL (functions, joins, subqueries) that are

commonly used when querying databases. Then, we will connect to the database and run

queries using Python.

SQL. Functions

SQL functions are used to perform calculations on data. Functions can be aggregate (using

multiple values to calculate a result) or scalar (applied on a single value).

SQL Aggregate Functions

SQL aggregate functions return a single value calculated from multiple values in a column.

 AVG() - returns the average value

 COUNT() - returns the number of rows

 FIRST() - returns the first value

 LAST() - returns the last value

 MAX() - returns the largest value

 MIN() - returns the smallest value

 SUM() - returns the sum of values


-- AVG() --

SELECT AVG(column_name) FROM TABLE_NAME

-- COUNT() --

SELECT COUNT(column_name) FROM TABLE_NAME;

SELECT COUNT(*) FROM TABLE_NAME;

SELECT COUNT(DISTINCT column_name) FROM TABLE_NAME;

-- In MySQL, FIRST() is replaced by: --

SELECT column_name FROM TABLE_NAME

ORDER BY column_name ASC

LIMIT 1;

-- In MySQL, LAST() is replaced by: --

SELECT column_name FROM TABLE_NAME

ORDER BY column_name DESC

LIMIT 1;

-- MAX() --

SELECT MAX(column_name) FROM TABLE_NAME;

-- MIN() --

SELECT MIN(column_name) FROM TABLE_NAME;

-- SUM() --

SELECT SUM(column_name) FROM TABLE_NAME;


The (SELECT) DISTINCT statement is used to return only distinct values. For example, the

following query returns the distinct positions in the employee table, effectively listing all

positions (ids) that are assigned to employees in a simple way.

SELECT DISTINCT position_id FROM employee;

The GROUP BY statement is used in conjunction with the aggregate functions to group the

result-set by one or more columns. For example, the following query returns the number of

employees for each department. The aggregate function is COUNT(), which counts the

employees within each department.

-- Get the number of employees for each department --

SELECT department_id, COUNT(id) AS employee_count FROM employee

GROUP BY department_id;

GROUP BY defines how the aggregate function is applied. What happens if you remove the

GROUP BY statement? What happens if you use GROUP BY with the employee id instead?

The HAVING clause is used instead of WHERE when using aggregate functions. For

example, the following query returns the number of employees for each department having

more than 5 employees.

-- Get the number of employees for each department having more than 5

employees --

SELECT department_id, COUNT(id) AS employee_count FROM employee

GROUP BY department_id

HAVING COUNT(id) > 5;


SQL Scalar Functions

SQL scalar functions return a single value based on the input value.

 UCASE() - converts a field to uppercase

 LCASE() - converts a field to lowercase

 MID() - extracts characters from a text field

 LEN() - returns the length of a text field

 ROUND() - rounds a numeric field to the number of decimals

specified

-- UCASE() --

SELECT UCASE(column_name) FROM TABLE_NAME;

-- LCASE() --

SELECT LCASE(column_name) FROM TABLE_NAME;

-- MID() --

SELECT MID(column_name, START, LENGTH) AS some_name FROM TABLE_NAME;

-- LEN() --

SELECT LEN(column_name) FROM TABLE_NAME;

-- ROUND() --

SELECT ROUND(column_name, decimals) FROM TABLE_NAME;

 
SQL. Querying multiple tables

Relational databases are usually structured to avoid duplication of data. For example, a

department may be associated to a few employees by using a one-to-many relationship

defined by a FOREIGN KEY constraint. While simple queries can be used to retrieve data

from each table, it is often required to combine information from multiple tables, such as

selecting a list of employees with information about the department they belong to, filtering

data using the WHERE clause with information from other related tables and more complex

aggregate functions. For such complex queries, there are two main SQL clauses: using

JOINs or subqueries.

Using JOINS can be more efficient than running multiple queries (and then combining the

data in the application). The DBMS handles the optimal retrieval of data based on indexing

and performs the data transformations internally. This is also useful when requesting data

from a remote location as it requires less data to be transferred between the DBMS and the

application server.
JOINs

JOINs are used to retrieve data from two or more tables based on the relationships between

them. The ON clause is used to match records between tables based on the column names.

Consider the following example to list all employees with the name of their position. A way

to write this in SQL is by using the WHERE clause for matching the position_id FOREIGN KEY

with the id PRIMARY KEY in the one-to-many relationship between employee and position.

This method is the same as using INNER JOIN.

-- List the employees name, and the name of their position --

SELECT emp.name AS employee, pos.name AS POSITION

FROM employee AS emp, POSITION AS pos

WHERE pos.id = emp.position_id;

JOINs have better performance compared to subqueries, although might be more difficult to

write for complex queries.

(INNER) JOIN

INNER JOIN is used to return ONLY the matching rows from both tables based on the

matching condition:

-- List the employees name and the name of their department --

SELECT emp.name AS employee, dept.name AS department FROM employee AS emp


INNER JOIN department AS dept

ON dept.id = emp.department_id;

Multiple related tables can be joined according to the database schema. For example,

consider we want to list all employees along with the name of the department they belong

to and their position. In the company database (example), the query requires an INNER

JOIN between 3 tables: employee (get the employee names), department (get the name of

the department), position (get the name of the position).

-- List the employees name, the name of their department and position --

SELECT emp.name AS employee, dept.name AS department, pos.name AS POSITION

FROM employee AS emp

INNER JOIN department AS dept

ON dept.id = emp.department_id

INNER JOIN POSITION AS pos

ON pos.id = emp.position_id;

LEFT (OUTER) JOIN

LEFT JOIN is used to return rows from both tables based on the matching condition. The

difference from INNER JOIN is that ALL the rows from the first table will be returned and if
for some rows there is no matching record on the second table, the result for that column(s)

will be NULL. For example, consider the following query to list all employees with the name

of their manager. When using INNER JOIN, the general manager will be missing from the

list. With LEFT JOIN, the general manager will be listed, with the name of his manager

(does not exist) set to NULL.

-- List the employees name and the name of their manager --

SELECT emp.name AS employee_name, manager.name AS manager_name FROM employee

AS emp

LEFT JOIN employee AS manager ON emp.manager_id = manager.id

ORDER BY manager_name ASC;

RIGHT (OUTER) JOIN

RIGHT JOIN is used to return rows from both tables based on the matching condition. The

difference from INNER JOIN is that ALL the rows from the second table will be returned and

if for some rows there is no matching record on the first table, the result for that column(s)

will be NULL.
FULL (OUTER) JOIN

FULL JOIN combines LEFT JOIN and RIGHT JOIN. Where rows in the tables do not match,

the result set will have NULL values for every column of the table that lacks a matching row.

CROSS JOIN

CROSS JOIN is a simplest form of JOIN which matches each row from one table to all rows

of another table (like cartesian product in set theory). The result of a CROSS JOIN can be

filtered by using a WHERE clause which may then produce the equivalent of an INNER JOIN.

-- List all the employees names, and all positions --

SELECT emp.name AS employee, pos.name AS POSITION

FROM employee AS emp

CROSS JOIN POSITION AS pos

-- List all the employees names with their positions using CROSS JOIN and

WHERE --

SELECT emp.name AS employee, pos.name AS POSITION

FROM employee AS emp

CROSS JOIN POSITION AS pos

WHERE emp.position_id = pos.id


Python database connection. Data structures.

Databases can be accessed from other applications as well. A MySQL driver: MySQL-

connector-python is required to access the MySQL Server using Python. With the MySQL

driver installed in the Python environment, the main steps in working with SQL from Python

are as follows:

import mysql.connector

# Connect to the database server using the provided credentials

# Note: user="root" for admin account

mydb = mysql.connector.connect(

host="localhost",

port=3306,

user="ewis_student",

passwd="ewis2020",

database="company"

# Open a cursor

mycursor = mydb.cursor(dictionary=True)

# Execute SQL query

mycursor.execute("SELECT * FROM employee")

# Get results from the query using the cursor

myresult = mycursor.fetchall()

# Close the cursor


mycursor.close()

The database can be located on the same machine (use localhost), but it can also be on a

remote machine, in the same network or in the cloud (use IP address/URL). The port is set

to 3306 by default, and this corresponds to the port that the database server is running on.

The (database) cursor is a control structure that enables traversal, retrieval, addition, and

removal of database records in a sequential way. The following steps describe using cursors

for database access:

 Declare a cursor that defines a result set.

 Open the cursor to establish the result set.

 Fetch the data from the cursor into local variables, one row at a time.

 Close the cursor when done.

The data is retrieved by default in the form of tuples (like lists, with the difference that

tuples are immutable). Take for example the result of a query, returning a list of employees

in the company database:

(1, 1, 1, None, 'Big Hoss', Decimal('10000.00'), datetime.datetime(2018, 10,

18, 0, 0)),

(20, 2, 4, 3, 'Captain America', Decimal('10000.00'), datetime.datetime(2019,

04, 26, 0, 0))

]
Using the dictionary=True argument when creating the cursor, the data can be retrieved as

dictionary list such as:

'id':1,

'department_id':1,

'position_id':1,

'manager_id':None,

'name':'Big Hoss',

'salary':Decimal('10000.00'),

'hire_date':datetime.datetime(2018,10,18,0,0)

}, {

'id':20,

'department_id':2,

'position_id':4,

'manager_id':3,

'name':'Captain America',

'salary':Decimal('10000.00'),

'hire_date':datetime.datetime(2019,04,26,0,0)

While the dictionary results are useful for working with data in a Python application, the CSV

format is often used for processing data and saving result files. For this, the conversion

between a list of dictionaries (the results from the cursor) and a CSV list can be done in

Python as follows:
# Function to extract CSV data from a dictionary list

def get_csv_data(result):

csvdata = []

for elem in result:

csvdata.append([elem[k] for k in elem])

return csvdata

The data can then be written into a CSV file from Python:

def write_csv_file(data):

with open("result.csv", "w") as f:

for d in data:

f.write(",".join([str(col) for col in d]) + "\n")

The CSV format and file content will look like this:

1,1,1,None,Big Hoss,10000.00,2018-10-18 00:00:00

20,2,4,3,Captain America,10000.00,2019-04-26 00:00:00

For working with tables, the pandas library can be used which simplifies many operations

such as extracting columns, CRUD operations (creating, reading, updating and deletion of

data), filtering and data analysis. Pandas is often used for Machine Learning in the form of

dataframes: DataFrame, allows to import data from various structures and file formats:

lists, dictionaries, csv files, excel files, etc.

In the following example, data is retrieved from a cursor and added to a pandas Data

Frame:

mycursor = mydb.cursor(dictionary=True)

mycursor.execute("SELECT * FROM employee")


myresult = mycursor.fetchall()

mycursor.close()

res = db.run_query("SELECT * FROM employee")

# create the pandas dataframe with the results from the database (in

dictionary format)

df = pd.DataFrame(res)

# the dataframe can be printed in the console

print(df)

# the dataframe can be converted to list

dlist = df.values.tolist()

print(dlist)

# columns can be extracted from the dataframe

names = df["name"].values

print(names)

# the following way works too

salaries = list(df["salary"])

# then we may want to do some additional data processing, for example

converting to float (real numbers representation in Python)

salaries = [float(sal) for sal in salaries

LAB TASKS

1. Using aggregate functions, write an SQL query to calculate the sum of all salaries in the

company database.

2. Using aggregate functions, write an SQL query to calculate the average salary in the

company database.
3. Write an SQL query to return a list of employees with the following information about

them:

 the name

 the name of their department

 the size of their team (number of employees within the same department)

4. Write an SQL query to return a list of employees with the following information about

them:

 the name

 the name of their position

 the min/average/max salary of the employees on the same position

5. Write an SQL query to return a list of employees with the following information about

them:

 the name in uppercase

 the location of their department

6. Write an SQL query to return a list of employees (name, salary) having a higher salary

than the average salary within the company.

7. Write an SQL query to return a list of employees (name, salary) having a higher salary

than their manager.

8. Write an SQL query to return the employees (name, salary) having the highest salary

within each department.

You might also like