Assignment 4 Final
Normalization
You are given the following file of campaign contribution data, a sample taken from the
2016 presidential campaign in CA. We are interested in the fields for candidate name,
contributor, contribution amount, and date. We are not interested in the cmte_id field or the
last 7 fields.
CREATE TABLE campaign
(
cmte_id varchar(12), -- campaign id
cand_id varchar(12), -- candidate id
cand_nm varchar(50), -- candidate name
contbr_nm varchar(50), -- contributor name
contbr_city varchar(40), -- contributor city
contbr_st varchar(40), -- contributor state
contbr_zip varchar(20), -- contributor zipcode
contbr_employer varchar(60), -- contributor employer
contbr_occupation varchar(40), -- contributor occupation
contb_receipt_amt numeric(8,2), -- contribution amount
contb_receipt_dt varchar(20), -- contribution date
receipt_desc varchar(255),
memo_cd varchar(20),
memo_text varchar(255),
form_tp varchar(20),
file_num varchar(20),
tran_id varchar(20),
election_tp varchar(20)
);
We want to normalize this data by splitting it into 3 tables for candidate, contributor and
contribution.
Run the SQL script file campaign-CA-2016.sql, which creates a campaign database with a
campaign table. Check that there are 18,118 rows in the table by running a count(*) query.
1. Write create statements for the 3 normalized tables candidate, contributor and
contribution. The candidate table should have a primary key of cand_id. The contributor and
contribution tables should each have a surrogate key of int type defined as autoincrement.
The contribution table should have columns for cand_id and contbr_id. Include your create
table statements here.
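One possible shape for these tables is sketched below. It is not the graded answer. The column types are copied from the campaign table above, the surrogate key name contb_id is a hypothetical choice, and contbr_id matches the name used in the foreign key statements in question 3.

```sql
-- Sketch only: one way to split campaign into 3 normalized tables (MySQL).
create table candidate (
  cand_id varchar(12) primary key,        -- natural key from the source data
  cand_nm varchar(50)
);

create table contributor (
  contbr_id int auto_increment primary key,  -- surrogate key
  contbr_nm varchar(50),
  contbr_city varchar(40),
  contbr_st varchar(40),
  contbr_zip varchar(20),
  contbr_employer varchar(60),
  contbr_occupation varchar(40)
);

create table contribution (
  contb_id int auto_increment primary key,   -- surrogate key (hypothetical name)
  cand_id varchar(12),                       -- refers to candidate
  contbr_id int,                             -- refers to contributor
  contb_receipt_amt numeric(8,2),
  contb_receipt_dt varchar(20)
);
```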
CST363 Assignment 4 (2022springA)
3. Alter the contribution table to add foreign key constraints for columns cand_id and
contbr_id. Include your alter table statements here.
alter table contribution add foreign key (contbr_id) references contributor(contbr_id);
alter table contribution add foreign key (cand_id) references candidate(cand_id);
4. Create a view named “vcampaign” that is a join of the 3 normalized tables and has
columns cand_id, cand_nm, contbr_nm, contbr_city, contbr_st,
contbr_zip, contbr_employer, contbr_occupation,
contb_receipt_amt, contb_receipt_dt.
Do a count(*) query using the view and verify that the result is 18,118.
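A view along these lines would satisfy the column list. This is a sketch, assuming the contribution table carries cand_id and contbr_id as described in question 1 and that the contributor key is named contbr_id, as in the foreign key statements above.

```sql
-- Sketch only: join the 3 normalized tables back into the original shape.
create view vcampaign as
select c.cand_id, c.cand_nm,
       b.contbr_nm, b.contbr_city, b.contbr_st, b.contbr_zip,
       b.contbr_employer, b.contbr_occupation,
       t.contb_receipt_amt, t.contb_receipt_dt
from contribution t
join candidate c on c.cand_id = t.cand_id
join contributor b on b.contbr_id = t.contbr_id;

-- The view should return one row per contribution.
select count(*) from vcampaign;
```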
• Set MAX DEGREE = 3. Max Degree is the maximum number of pointers in an internal (non-leaf) node.
The maximum number of values in a node is one less than the max degree. MAX DEGREE is similar to
what we called FAN OUT in lecture. In the simulator we use a small value for MAX DEGREE, but
remember that in real databases the FAN OUT is typically on the order of 100-200.
• Insert the values (one at a time): 10 20 30 40 50 60
• Your diagram should look like this:
[B+ tree diagram]
In the diagram above, the leaf node with 0050 0060 is full, as is the parent node 0040 0050. Other
nodes are not full.
A B+ tree is efficient for key lookups and range queries. However, when entries have to be
inserted into or removed from the index because of SQL insert, update or delete statements, multiple
reads and writes are needed to keep the tree nodes in the correct order and the leaf nodes in the
correct linked list order.
5. Do an insert of key value 12. Draw or embed a screenshot of the updated index.
6. How many nodes were either created or modified for the insert of 12?
8. How many nodes were either created or modified for an insert of 14?
10. How many nodes were either created or modified for an insert of 52?
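Five: 2 nodes are created and 3 are modified. The full leaf 0050 0060 splits, which creates a new
leaf, and its full parent 0040 0050 splits, which creates a new internal node. The original leaf, its
parent, and the root are modified.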
Then examine the query plan by scrolling down the list of icons on the right side of the result panel and
selecting “Execution Plan”.
The query plan depicts how a table is accessed: either by reading the entire table (Full Table Scan,
Red Rectangle) or by using an index (Green Rectangle with the index name below the box). An index is
unique if it is the primary key index or an index on a column that is defined as unique. The
query plan also depicts how joins are done. In the diagram, a scan of the contributor table is done,
and each row is joined first to rows in the contribution table by looking up contbr_id using index fk2,
and then to the row from the candidate table by looking up cand_id using the primary key index. By
default, MySQL creates an index on the primary key column(s) and on each foreign key column(s).
11. Is the new index being used? Explain in your words the execution plan.
The index is being used. The query searches the index on contbr_zip and
finds the entries where the value is 93933, which point to the matching
rows in the contributor table. Only those contributor rows are read,
instead of the whole table. Each of those rows is then joined to its
contribution rows and to the candidate row, and the view columns for
those rows are returned.
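For reference, the kind of index and lookup discussed in question 11 can be sketched as follows. The index name idx_contbr_zip is hypothetical, and the statements assume the normalized contributor table and the vcampaign view already exist.

```sql
-- Sketch only: add a secondary index on the zip column (index name is hypothetical).
create index idx_contbr_zip on contributor (contbr_zip);

-- contbr_zip is a varchar column, so the literal is quoted. Comparing a string
-- column to a numeric literal forces a conversion and can prevent index use.
explain
select * from vcampaign where contbr_zip = '93933';
```

In the text form of the plan, the key column for the contributor row shows which index, if any, the optimizer chose.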
Do a query on vcampaign where contbr_zip is between 93001 and 93599 (the zip codes in LA area).
Change the query to zip between 00001 and 93599 and look at the execution plan.
The zip index is not being used. Why? The MySQL query optimizer realizes that it will be faster to scan
all rows in contributor for zip between 00001 and 93599 than to use the index. An index is used to
search when the result is expected to be a few rows. If many rows are expected, it is faster to just scan
the whole table. How does the optimizer know when to use an index and when to scan? There are
statistics kept about each table and each column: the number of rows, the max and min values for each
column, the number of distinct values for a column. Pretty clever!
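The two range queries can be compared directly with explain. The sketch below assumes an index on contributor.contbr_zip already exists and uses quoted literals because the column is a varchar. The plan that comes back depends on the statistics for your copy of the data.

```sql
-- Narrow range: few rows expected, so an index range scan is likely.
explain
select * from vcampaign where contbr_zip between '93001' and '93599';

-- Wide range: most rows qualify, so a full scan of contributor is likely.
explain
select * from vcampaign where contbr_zip between '00001' and '93599';

-- Refresh the stored statistics, then look at the per-index cardinality estimates.
analyze table contributor;
show index from contributor;
```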
Concurrency Exercises
Exclusive locking
Observe the behavior of exclusive locking when two concurrent transactions attempt to update the
same row.
For this exercise you will need two connections in the workbench that have auto commit turned off.
• Open a connection.
o menu → Query → uncheck the item “Auto Commit Transactions”
• Open a second connection.
o To do this, use the tab with “Home” on it to return to the connection page and
then open the second connection.
o menu → Query → uncheck the item “Auto Commit Transactions”
Instance 1 Instance 2 Comments
use zagimore;
set autocommit = 0;
select * from product where
productid='1X1';
What is the price returned?
use zagimore;
set autocommit = 0;
select * from product where
productid='1X1';
What is the price returned?
select * from product where
productid='1X1';
What is the price returned?
Since the update by Instance 2 has not been committed, Instance 1 does
not see the update and instead sees the previously committed value.
update product set
productprice=productprice+100 where
productid='1X1';
select * from product where
productid='1X1';
Notice the call is Running…
commit;
The call now completes.
select * from product where
productid='1X1';
What is the price returned?
commit;
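The sequence can be sketched as one annotated script. This is an illustration rather than the exact table from the exercise: the assignment of steps to sessions is an assumption, and because a plain select in InnoDB is a non-locking read, the sketch assumes the statement that waits is a second update of the same row.

```sql
-- Session A and Session B are two separate connections; run each step
-- in the session named in its comment.

-- Both sessions:
use zagimore;
set autocommit = 0;

-- Session B: update the row. B now holds an exclusive lock on it.
update product set productprice = productprice + 100 where productid = '1X1';

-- Session A: a plain select does not wait; it returns the last committed price.
select productprice from product where productid = '1X1';

-- Session A: an update of the same row must wait for B's lock (shows Running...).
update product set productprice = productprice + 100 where productid = '1X1';

-- Session B: commit releases the lock, and A's waiting update completes.
commit;

-- Session A: check the price, then commit.
select productprice from product where productid = '1X1';
commit;
```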
Inconsistent Writes
Alice and Bob are both on duty. One of them may go off duty assuming that they first check
that the other is still on duty.
update duty
set status='off'
where name='Alice';
set autocommit=0;
select * from duty;
Bob checks that Alice is on duty. So he updates his status to off duty.
update duty
set status='off'
where name='Bob';
commit;
commit;
What has just happened? Bob and Alice have both gone off duty even though each one
checked that the other was on duty. Isn’t data integrity one of the reasons to use a
database? How does the database allow this to happen? You must understand how a
database system works together with the application to guarantee data integrity.
Databases do exclusive locking on updates to the same row. But in this situation the updates
are to two different rows, and each is based on data read from a different row.
12. Based on lecture material there are 2 ways to fix this problem. Pick one and test it out.
How did you fix the problem?
I added an exclusive lock to the select on the duty table (a locking read such as
select ... for update), so that only one transaction can read and update the data at a
time. The second transaction waits until the first one commits and then sees the
updated status, which prevents both Alice and Bob from going off duty.
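A locking read of this kind can be sketched as follows, assuming the duty table has the name and status columns used above. The steps are interleaved across two connections, and the check on the other person's status is made by the application after the select returns.

```sql
-- Alice's session: read with an exclusive lock, then update.
set autocommit = 0;
select * from duty for update;   -- locks the rows that are read
update duty set status = 'off' where name = 'Alice';
commit;                          -- releases the locks

-- Bob's session, started while Alice's transaction is still open:
set autocommit = 0;
select * from duty for update;   -- waits until Alice commits, then shows her as off
-- The application sees that Alice is off duty, so Bob does not update his status.
commit;
```

Another common approach is to run both transactions at the serializable isolation level (set session transaction isolation level serializable) and retry the transaction that the database rolls back.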
Other Exercises
13. Consider this situation: you try to get cash at an ATM, but the ATM fails after updating
your account and committing, but just before cash is dispensed. As a system designer, how
do you cope with the situation that the money has been debited from the account and
committed but the cash was unable to be dispensed? [ hint: what do you think
“compensating transaction” means? do a google search.]
The transaction would be flagged as an “incomplete transaction” because the money would not
be deducted from the ATM’s own record of currency dispensed, meaning that the update to the
data table within the ATM would not have committed. Because the debit to the account has
already committed, it cannot be rolled back. Instead, the system would look up the withdrawal
in the account’s transaction log and run a compensating transaction that credits the amount
back, returning the balance to what it was before the withdrawal was performed.
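A compensating transaction of this kind might look like the sketch below. The tables account(acct_id, balance) and atm_txn(txn_id, acct_id, amount, status), the status values, and the id 555 are all hypothetical names made up for illustration.

```sql
-- Sketch only: credit back a committed debit whose cash was never dispensed.
start transaction;

-- Credit the account and mark the log row in one statement. The status check
-- makes the compensation safe to retry: a second run matches no rows.
update account a
join atm_txn t on t.acct_id = a.acct_id
set a.balance = a.balance + t.amount,
    t.status = 'compensated'
where t.txn_id = 555
  and t.status = 'debited';

commit;
```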
14. Consider this situation: you try to buy an airline ticket at a web site. The transaction
commits on the server, but crashes just before the message confirming the reservation is
sent to the client. As a system designer, how would you cope with the situation of a
reservation was made and committed in the database, but the confirmation message was
never received by the client?
I would design the application to detect that the confirmation message was not delivered to the client and
use the DBMS log to retrieve the committed transaction. I would then send the committed reservation
data pulled from the log to the client on the web site as a confirmation. If the
confirmation cannot be shown on the web page because the page crashed, I would send the same data
to the email address the client provided while buying the ticket (assuming it
is stored in the database).
Edit this file with your answers to the 14 questions. Submit your answers as a PDF file to the Canvas
assignment.