
CST363 Assignment 4 (2022springA)

Normalization
You are given the following file of campaign contribution data, a sample taken from the CA
contributions to the 2016 presidential campaign. We are interested in the fields for candidate
name, contributor information, contribution amount, and date. We are not interested in the
cmte_id field or the last 7 fields.
CREATE TABLE campaign
(
cmte_id varchar(12),            -- campaign id
cand_id varchar(12),            -- candidate id
cand_nm varchar(50),            -- candidate name
contbr_nm varchar(50),          -- contributor name
contbr_city varchar(40),        -- contributor city
contbr_st varchar(40),          -- contributor state
contbr_zip varchar(20),         -- contributor zipcode
contbr_employer varchar(60),    -- contributor employer
contbr_occupation varchar(40),  -- contributor occupation
contb_receipt_amt numeric(8,2), -- contribution amount
contb_receipt_dt varchar(20),   -- contribution date
receipt_desc varchar(255),
memo_cd varchar(20),
memo_text varchar(255),
form_tp varchar(20),
file_num varchar(20),
tran_id varchar(20),
election_tp varchar(20)
);

We want to normalize this data by splitting it into 3 tables for candidate, contributor and
contribution.
Run the sql script file campaign-CA-2016.sql, which creates a campaign database with a
campaign table. Check that there are 18,118 rows in the table by doing a count(*) query.
1. Code create statements for the 3 normalized tables candidate, contributor and
contribution. Table candidate should have a primary key of cand_id. Contributor and
contribution tables should have a surrogate key of int type defined as autoincrement.
Contribution table should have columns for cand_id and contbr_id. Include your create
table statements here.


create table candidate
(
cand_id varchar(12),
cand_nm varchar(50),
primary key (cand_id)
);
create table contributor
(
contbr_id int auto_increment,
contbr_nm varchar(50),
contbr_city varchar(40),
contbr_st varchar(40),
contbr_zip varchar(20),
contbr_employer varchar(60),
contbr_occupation varchar(40),
primary key (contbr_id)
);
create table contribution
(
contb_id int auto_increment,
contb_receipt_amt numeric(8,2),
contb_receipt_dt varchar(20),
cand_id varchar(12),
contbr_id int,
primary key (contb_id)
);


Create an index on contributor name.


create index contributor_nm on contributor(contbr_nm);
2. Code 3 insert statements using subselect (read “Inserting from a Query”, page 185 in the
textbook) to select data from the campaign table and insert it into the normalized tables.
You should have 22 rows in the candidate table, 14,174 rows in the contributor table, and
18,118 rows in the contribution table. Include your 3 insert statements here.
insert into candidate
select distinct c.cand_id, c.cand_nm
from campaign c;

insert into contributor
select distinct null, contbr_nm, contbr_city, contbr_st, contbr_zip,
       contbr_employer, contbr_occupation
from campaign;

insert into contribution
select null, contb_receipt_amt, contb_receipt_dt, cand_id, contbr_id
from campaign, contributor
where campaign.contbr_nm = contributor.contbr_nm;
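
A quick sanity check of the row counts after the inserts (optional; the expected counts come from the assignment):

select count(*) from candidate;     -- expect 22
select count(*) from contributor;   -- expect 14,174
select count(*) from contribution;  -- expect 18,118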

3. Alter the contribution table to add foreign key constraints for columns cand_id and
contbr_id. Include your alter table statements here.
alter table contribution add foreign key (contbr_id) references contributor(contbr_id);
alter table contribution add foreign key (cand_id) references candidate(cand_id);

4. Create a view named “vcampaign” that is a join of the 3 normalized tables and has columns
cand_id, cand_nm, contbr_nm, contbr_city, contbr_st, contbr_zip, contbr_employer,
contbr_occupation, contb_receipt_amt, contb_receipt_dt.

create view vcampaign as
select candidate.cand_id, cand_nm, contbr_nm, contbr_city, contbr_st,
       contbr_zip, contbr_employer, contbr_occupation,
       contb_receipt_amt, contb_receipt_dt
from candidate, contributor, contribution
where contribution.contbr_id = contributor.contbr_id
  and contribution.cand_id = candidate.cand_id;

Do a count(*) query using the view and verify the result is 18,118.
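
For example:

select count(*) from vcampaign;  -- expect 18,118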


B+ Tree Visualization Exercises


Use the B+ tree simulator at https://www.cs.usfca.edu/~galles/visualization/BPlusTree.html

• Set MAX DEGREE = 3. Max Degree is the maximum number of pointers in an internal (not leaf) node;
the maximum number of values in a node is one less than the max degree. MAX DEGREE is similar to
what we called FAN OUT in lecture. In the simulator we use a small value for MAX DEGREE, but
remember that in real databases the FAN OUT is typically on the order of 100-200.
• Insert the values (one at a time): 10 20 30 40 50 60
• Your diagram should look like

In the diagram above, the leaf node with 0050 0060 is full, as is the parent node 0040 0050. Other
nodes are not full.

A B+ tree is efficient for doing key lookup and range queries. However, when new entries have to be
inserted or removed from the index due to SQL insert, update or delete statements, there are multiple
reads/writes that must be done to maintain the tree nodes in the correct order and the leaf nodes in the
correct linked list order.

5. Do an insert of key value 12. Draw or embed a screenshot of the updated index.

6. How many nodes were either created or modified for the insert of 12?

One node was modified.


7. Now do an insert for a key value 14. Show an updated diagram.

8. How many nodes were either created or modified for an insert of 14?

Three: one node created and two modified.

9. Do an insert of key value 52 and show an updated diagram.

10. How many nodes were either created or modified for an insert of 52?
Five: two nodes created and three modified.

Conclusion: an insert or delete on a B+ tree index may involve several reads/writes.


Query Plan Exercises


Perform the query

select * from vcampaign where contbr_zip = '93933';

Then examine the query plan by scrolling down the list of icons on the right side of the result panel
and selecting “Execution Plan”.

The query plan depicts how a table is accessed: either by reading the entire table (a full table scan,
shown as a red rectangle) or by using an index (a green rectangle with the index name below the box).
An index is unique if it is the primary key index or an index defined on a column that is declared
unique. The query plan also depicts how joins are done. In the diagram, a scan of the contributor
table is done and each row is joined first to rows in the contribution table by looking up contbr_id
using index fk2, and then joined with a row from the candidate table by looking up cand_id using the
primary key index. By default, MySQL creates an index on the primary key column(s) and on each
foreign key column(s).
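
If you prefer a text version of the plan to the Workbench diagram, one option (assuming MySQL 8.0.16 or later, which supports the TREE format) is:

explain format=tree
select * from vcampaign where contbr_zip = '93933';

-- older versions can use the tabular form instead:
explain select * from vcampaign where contbr_zip = '93933';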

Create an index on the contbr_zip column in the contributor table:


create index zip on contributor(contbr_zip);

Redo the query and examine the execution plan.


select * from vcampaign where contbr_zip = '93933';

11. Is the new index being used? Explain the execution plan in your own words.

The index is being used. The query plan shows that the zip index on the contributor table is
searched to find only the rows where contbr_zip = '93933', instead of scanning the whole table.
Each matching contributor row is then joined to the contribution table using the foreign key index
on contbr_id, and to the candidate table using its primary key index, and the view's columns are
output for each matching row.

Do a query on vcampaign where contbr_zip is between 93001 and 93599 (zip codes in the LA area).
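
Written out, the query is:

select * from vcampaign where contbr_zip between '93001' and '93599';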

The query plan shows that a range scan is done using the index on zip.

Change the query to zip between 00001 and 93599 and examine the execution plan again.

This time the zip index is not being used. Why? The MySQL query optimizer realizes that it will be
faster to scan all rows in contributor for zip between 00001 and 93599 than to use the index. An index
is used when the result is expected to be a few rows; if many rows are expected, it is faster to just
scan the whole table. How does the optimizer know when to use an index and when to scan? It keeps
statistics about each table and each column: the number of rows, the max and min values for each
column, and the number of distinct values for a column. Pretty clever!
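
These statistics can be refreshed manually if they become stale; in MySQL, for example:

analyze table contributor;  -- recompute the key distribution statistics used by the optimizer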


Concurrency Exercises
Exclusive locking

Observe the behavior of exclusive locking when two concurrent transactions attempt to update the
same row.

For this exercise you will need two connections in MySQL Workbench with autocommit turned off.

• Open a connection.
  o Menu → Query → uncheck the item “Auto Commit Transactions”.
• Open a second connection.
  o To do this, use the “Home” tab to return to the connection page and then open the
    second connection.
  o Menu → Query → uncheck the item “Auto Commit Transactions”.
Run the following steps in order, switching between the two instances as labeled.

Instance 1:
    use zagimore;
    set autocommit = 0;
    select * from product where productid='1X1';
    What is the price returned?

Instance 2:
    use zagimore;
    set autocommit = 0;
    select * from product where productid='1X1';
    What is the price returned?

Instance 2:
    update product set productprice=productprice+100 where productid='1X1';
    select * from product where productid='1X1';
    What is the price returned?
    Comment: Instance 2 has updated the price but has not committed it. Other clients
    cannot see uncommitted data.

Instance 1:
    select * from product where productid='1X1';
    What is the price returned?
    Comment: Since the update by Instance 2 has not been committed, Instance 1 does not
    see the update and instead sees the previously committed value.

Instance 1:
    update product set productprice=productprice+100 where productid='1X1';
    select * from product where productid='1X1';
    Notice the call is Running…

Instance 2:
    commit;
    Comment: The blocked update in Instance 1 now completes.

Instance 1:
    select * from product where productid='1X1';
    What is the price returned?

Instance 1:
    commit;

Inconsistent Writes

Alice and Bob are both on duty. One of them may go off duty provided they first check
that the other is still on duty.

• Open two connections as in the last problem.


• On both connections: Menu → Query → uncheck the item “Auto Commit Transactions”.
• Create the following table and 2 rows.

create table duty (name char(5) primary key, status char(3));
insert into duty values ('Alice', 'on'), ('Bob', 'on');
commit;

Instance 1 “Alice”:
    set autocommit = 0;
    select * from duty;
    Comment: Alice checks that Bob is on duty, so she updates her status to off duty.
    update duty set status='off' where name='Alice';

Instance 2 “Bob”:
    set autocommit = 0;
    select * from duty;
    Comment: Bob checks that Alice is on duty, so he updates his status to off duty.
    update duty set status='off' where name='Bob';

Instance 1 “Alice”:
    commit;

Instance 2 “Bob”:
    commit;

What has just happened? Bob and Alice have both gone off duty even though each one
checked that the other was on duty. Isn't data integrity one of the reasons to use a database?
How does the database allow this to happen? You must understand how a database system
works together with the application to guarantee data integrity. Databases do exclusive locking
on updates to the same row, but in this situation the two updates are to two different rows, and
each update is based on data read from the other row.
12. Based on lecture material there are 2 ways to fix this problem. Pick one and test it out.
How did you fix the problem?

I used an exclusive lock on the select (a locking read of the duty table) so that only one
connection can read and update the data at a time. This fixed the problem: the second
connection's locking read blocks until the first connection commits, so it then sees the other
person's updated status and does not go off duty.
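
A minimal sketch of the locking-read fix (one way to write it; the exact statements run for the answer above may differ):

-- Instance 1 "Alice"; Instance 2 "Bob" uses the same pattern
set autocommit = 0;
start transaction;
select * from duty for update;  -- locking read: a concurrent locking read blocks until this transaction commits
-- continue only if the other person's status is still 'on'
update duty set status='off' where name='Alice';
commit;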


Other Exercises
13. Consider this situation: you try to get cash at an ATM, but the ATM fails after updating
your account and committing, but just before cash is dispensed. As a system designer, how
do you cope with the situation that the money has been debited from the account and
committed but the cash was unable to be dispensed? [hint: what do you think
“compensating transaction” means? Do a Google search.]

The transaction would be flagged as incomplete because the money was never deducted from the
ATM's record of dispensed cash, meaning the update on the ATM side never committed. The system
would then go back to the transaction log for the account and issue a compensating transaction that
reverses the committed withdrawal, restoring the account to its state before the withdrawal was
attempted.
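
As an illustration only, a compensating transaction might look like the sketch below; the table and column names (account, txn_log, account_id) are hypothetical, not from the assignment:

-- the original debit already committed, so it cannot be rolled back;
-- a new transaction credits the amount back and records the reason
start transaction;
update account set balance = balance + 100       -- hypothetical amount that was never dispensed
where account_id = 12345;                         -- hypothetical account
insert into txn_log (account_id, amount, reason)
values (12345, 100, 'compensating credit: cash not dispensed');
commit;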

14. Consider this situation: you try to buy an airline ticket at a web site. The transaction
commits on the server, but the server crashes just before the message confirming the reservation
is sent to the client. As a system designer, how would you cope with the situation where a
reservation was made and committed in the database, but the confirmation message was never
received by the client?

I would design the system to detect that a committed reservation has no corresponding confirmation
delivered to the client, and have it pull the reservation details back out of the transaction log (or the
reservations table). The committed data would then be re-sent to the client on the web site as a
confirmation. If the confirmation cannot be shown on the web page because the session has crashed,
the same details would be sent to the client's email address provided while buying the ticket (assuming
it is stored in the database).

What to submit for this assignment?

Edit this file with your answers to the 14 questions. Submit your answers as a PDF file to the Canvas
assignment.
