CS5200: Database Management Systems
Lecture 1: Introduction

9/4/2024 CS 5200, Fall 2024 (Cobbe) 1


Agenda
● Introduction and Syllabus
● Break
● Introduction to Databases


Introduction and Syllabus


Introduction and Syllabus
● Course Staff
● Lectures
● Communication
● Topics and Objectives
● Assignments and Grading
● Expectations
○ Generative AI/LLM
○ Academic Integrity
● Other Resources


Course Staff
● Prof. Richard Cobbe
○ 2nd year as full-time faculty at Khoury Seattle
○ PhD in programming language theory, Northeastern (Boston), 2009
○ 13 years of industry experience: Endeca, Oracle, Microsoft
○ Endeca/Oracle work: intersection of PL and databases
● TAs
○ Kaijun Chen
○ Yashvi Garg
○ Wenli Li
○ Zhiyuan Zhang


Lectures
● I'm responsible for 2 sections of CS5200 this fall:
○ Section 12: Wednesdays, 12:30–3:50, 225 Terry room 210
○ Section 13: Thursdays, 12:30–3:50, 225 Terry room 210
● Both lectures have the same content
● Please attend the lecture for the section you're registered for
○ One-time exceptions are probably fine
○ Space is limited; we will prioritize students who are registered for the corresponding section


Lectures
● Each lecture will have a 10-15 minute break
○ This is as much for me as for you; I will not be available for questions during break
○ Likewise, I'm unfortunately not able to stay after class for questions
● Slides will be posted to Canvas after the Thursday lecture
● As a rule, I do not plan to record lectures
● Recordings may be available, if attendance is high or there are extenuating
circumstances


Office Hours
● My office hours
○ Complicated by opening of 310 Terry, currently scheduled for Sept 23
○ Before 310 is open: Mondays, 2–4pm via Teams; use my bookings page to sign up
○ After 310 is open: Thursdays, 9:30–11:30am, faculty space on 2nd floor of 310
■ Ideally, you can just show up without scheduling a slot in advance
■ Layout of the space may make this difficult: how do I know that you're there?
■ We may have to try a few things to find something that works for everyone
● TAs will hold office hours
○ Expect a mix of in-person and virtual
○ Still finalizing schedule
○ Expect more information within the next week or so
● Other options available by arrangement


Communications: Canvas
● Syllabus
● Assignments (including submission)
● Grades
● Supplemental Files
○ Lecture slides
○ Recordings, if applicable
○ etc.
● Class membership tied to registration
○ If you're not a member of the Canvas class, please double-check your registration and email
me if there is still a problem


Canvas and Time Zones
● Canvas defaults to Eastern Time
● Affects how assignment deadlines are displayed
● Be sure your Canvas time zone is set correctly!
○ Click the “Edit Settings” button on the right to adjust.


Communications: Piazza
We have a course Piazza page (signup link; also in Canvas and syllabus).

Primary means of communication outside of class.

● Where appropriate, I encourage public questions!
○ Some of your classmates may be wondering the same thing
○ Discussions between students can be valuable!
○ You may post anonymously if you feel more comfortable doing so
● For other matters, please post a private message on Piazza, addressed to
"Instructors"
○ Goes to me and all the TAs
○ With 5 people reading it, you will probably get a response faster than if you address it to one


Communications: Expectations
● I will do my best to respond to Piazza messages within 24 hours
○ Slightly longer over the weekends and holidays
● Email and Teams messages will have longer response times.


Other Resources
● No textbook for this course
● We will use a few open-source software packages (links to be provided):
○ MySQL
○ MySQL Workbench
○ Eclipse/IntelliJ
● We'll talk about installing and configuring these during lecture at the
appropriate times.


Course Summary


Course Description and Goals
Introductory masters-level course in database management systems.
Major topics:
● What is a database? What is a DBMS?
● Where are databases and DBMSs used?
● How do we design a database? What are some common pitfalls?
● How do we retrieve information of interest from a database?
● How do we add, remove, and update information in a database?
● How do we interact with a database from a program?
● How do we manage concurrent updates to a database?
● How do we implement a DBMS?


Prerequisites
● Designed to be accessible to any Masters student at Khoury
● Requires familiarity with basic Java programming (5004, 5010)
● We will cover all other languages and related topics in lecture, including
○ SQL
○ Relational algebra
○ JDBC
○ HTML, JSP (used for term project)


Coursework Overview
● Homework Assignments
○ Individual work
○ Roughly weekly, mostly first half of the semester
● Midterm
○ In-class, pen-and-paper
○ Oct 23–24, during normal class hours
● Term Project
○ Group work
○ Divided into 4 milestones
● No Final Exam


Term Project: Overview
● Develop a relational database for an application to be described
● Develop a simple web application that interacts with the database
● Work in teams of 5 students
● Split into 4 milestones:
○ Relational model
○ Physical model and sample data
○ Java interoperability (JDBC)
○ Web front-end (JSP)
● In-class presentations after each milestone
○ Everyone should be ready to present each milestone


Term Project
● I will shortly post all 4 term project assignments on Canvas
○ No need to start yet; this is just to give you an idea of what to expect
○ Details of the data that you will be asked to represent forthcoming
● For now: start thinking about forming your teams
○ 80 currently enrolled, so 16 teams of 5
○ Teams should be in place by the week of Sept 16
○ I'll have a page on Canvas where each team can list their members
● The first milestone will be out on Sep 26 and due on Oct 9.


Term Project
Cross-section teams?

● In principle, this should be fine: both sections are doing the same project
● I'm still considering how to handle the presentations in this case. Be aware
that I might ask your team to present in both sections.


Course Schedule
● Week 1: Introduction, Course Overview
● Week 2: Relational Algebra (first HW assigned)
○ Formal, mathematical basis for querying relational databases
● Week 3: Logical Modeling
○ Defining relations, attributes, and relationships used to represent data
● Week 4: Functional Dependencies and Normal Forms (first project assigned)
○ Problems we can encounter in designing a logical model
○ How to avoid them
● Week 5: Physical Modeling
○ Translating the logical model into table definitions for an RDBMS


Course Schedule
● Week 6: SQL
○ Queries and other operations we can perform on an RDBMS
● Week 7: Midterm Review
● Week 8: Midterm (in-class)
● Week 9: Project Setup
○ Installing & configuring the various open-source toolkits we'll use for the term project
○ JDBC (Java DataBase Connectivity): library for executing SQL queries in Java
○ JSP (Java Server Pages): technology for constructing a web application
● Week 10: Database Transactions; Project Debugging
○ How to manage multiple concurrent users in a single database?
○ Time to address any remaining concerns with project setup


Course Schedule
● Week 11: Data Storage; Query Evaluation
○ What data structures do DBMSs use to store data on disk?
○ What algorithms do DBMSs use to evaluate queries?
● Week 12: JSP setup
● Week 13: Thanksgiving Break (no class)
● Week 14: Final Project Presentations
● Week 15: Finals Week (no class)


Policies and Expectations


Grading Policies
● Homeworks: 10% per day penalty for late work in most cases
○ I will not be able to accept late work on specific assignments; details to follow
● Project Milestones: late work is not accepted
○ Presentations scheduled in class immediately after due date
● I will grant extensions in appropriate circumstances
○ Please contact the instructors via Piazza as early as possible
● Grade distribution:
○ individual homework: 30%
○ midterm: 30%
○ term project: 40%


Grading Scale
Final grades computed according to the scale shown here.

● The Canvas course uses this same scale
● Grades on the dividing line receive the higher grade: a final average of exactly 90.00% is an A-.

Final average    Grade
93% or higher    A
90–93%           A-
86–90%           B+
82–86%           B
77–82%           B-
73–77%           C+
69–73%           C
65–69%           C-
below 65%        F


Regrade Policy
If you have questions about your grade, or if you believe you were graded
incorrectly, please submit a message to Instructors on Piazza.

We will respond to all such requests.

However, regrade requests have lower priority than grading current assignments.


Academic Integrity
● All coursework subject to NEU's Academic Integrity Policy (link in syllabus)
● Includes plagiarism: presenting another's code, ideas, designs, words as your
own
○ Cite work you derive from other sources
○ This includes ChatGPT and similar tools!
○ The university library has more information; link in syllabus
● Homework and midterm must be your own work; project must be your team's


Academic Integrity
● I encourage you to discuss assignments and projects with your classmates
● Do not provide solutions to anyone
● Good rule of thumb:
○ Discussions only in a natural language (English, Chinese, Quechua, …) are OK
○ Discussions that involve sharing code or design diagrams are generally not
● If in doubt, please discuss with me!


Generative AI (or, What About ChatGPT?)
● My goal: each of you develops a deep understanding of the topics covered
● Generative AI cannot replace this deep understanding
● Generative AI has the potential to take care of routine, repetitive work, but:
○ At the moment, it has problems with factual correctness
○ I don't yet know how to use generative AI to automate this routine work without putting the
deep understanding at risk, and that's a tradeoff I'm not willing to make yet.
● My position on generative AI is still evolving (but don't expect it to change
much during this semester)


Generative AI
So what does that mean for this class?

● I will not ask you to use generative AI tools on an assignment.
● I will not use AI detection tools: risk of false positives is too high.
● Use of generative AI on coursework carries risks.
● Issues with correctness: you are responsible for any errors.
● Course is structured to test your understanding, not your use of AI.
○ Midterm is pen-and-paper, no devices permitted.
○ Project presentations: "Why did you choose to implement it in this fashion? What tradeoffs did
you consider?"
● You might not have access to GenAI during job interviews.


Final Administrivia


Course Evaluations: TRACE
● Toward the end of the semester, the registrar's office will notify you that
TRACE course evaluations are open
● You can submit these on any laptop or mobile device
● I encourage you to submit feedback:
○ Useful to me as I continue to develop and refine this course
○ Useful to Khoury as we work to develop and refine the curriculum
● Feedback is strictly anonymous
● I read all submitted feedback
● You are welcome to submit feedback directly to me via Email/Piazza, though
obviously not anonymous


Student Accommodations
● If you need accommodations due to a disability, please contact Disability
Access Services (link in syllabus) rather than me directly.
● If you are unable to attend lecture or complete an assignment because of a
religious holiday or other observance, we can provide accommodations;
please contact me via Piazza post.
● In either case: the earlier we know of circumstances that require
accommodations, the better we'll be able to help.


Title IX, Discrimination, and Harassment
● NEU, Khoury, and I want to make sure you have a learning environment free of
discrimination and harassment.
● Submit reports of discrimination or harassment to the Office of University Equity
and Compliance (OUEC)
● Faculty members, including me, are Mandatory Reporters:
○ If I become aware of discrimination or harassment, I am obligated to report it to OUEC
○ OUEC will contact the injured party to offer information about rights and resources
○ This report does not automatically cause a formal investigation; that requires consent of the
injured party


Title IX, Discrimination, and Harassment
● Confidential resources are available to you as well:
○ These are not mandatory reporters
○ Find@Northeastern: 24-hour mental health support (877-233-9477)
○ Sexual Violence Resource Center: support for Northeastern community members who have
experienced any sort of sexual violence
○ Confidential Resource Advisor: support for Northeastern students who have been accused of
sexual violence


Questions?


15-Minute Break


What is a Database?


What is a database?
● Organized collection of data
● “Organized” implies some notion of structure
● Data is usually represented as a collection of records
● Each record represents a particular real-world item or concept
○ Item in a product catalog
○ An individual order in an ecommerce system
○ The relationship between an order and an item included in that order
● Records have attributes describing the underlying item or concept
○ Item name, description, product number
○ Order date, customer, shipping address
○ Quantity of item ordered


What is a Database?
● Records are typically grouped into related collections:
○ All records describing orders
○ All records describing customers
○ All records describing catalog items
● Within a collection, records generally have the same structure but different
values:
○ All orders have order date, but each order may have a different date
○ All customers have a billing address, but each customer has a different address
● Close parallel to classes/structs in C, C++, C#, Java
○ All instances of a Java class have the same fields
○ Different instances of that class (can) have different values in their fields
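The class/record parallel can be sketched in Java. This is a minimal illustration, not how a DBMS stores data: the `Order` type, its fields, and the sample values are all hypothetical. The list of `Order` instances plays the role of one collection of related records.

```java
import java.time.LocalDate;
import java.util.List;

class OrdersCollection {
    // Hypothetical record type: every Order has the same fields (its structure),
    // much as every record in an "orders" collection has the same attributes.
    record Order(int orderNumber, LocalDate orderDate, String customer, String shippingAddress) {}

    // The collection of related records: same structure, different values.
    static final List<Order> ORDERS = List.of(
        new Order(1001, LocalDate.of(2024, 9, 1), "Ada Lovelace", "12 Analytical Way"),
        new Order(1002, LocalDate.of(2024, 9, 3), "Alice Liddell", "1 Rabbit Hole Ln"));

    public static void main(String[] args) {
        for (Order o : ORDERS) {
            // Same attribute names for every record; each record supplies its own values.
            System.out.println(o.orderNumber() + " placed on " + o.orderDate() + " by " + o.customer());
        }
    }
}
```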


Where Do We Use Databases?
● Business & transactional systems:
○ appointments/calendars
○ point-of-sale (orders, product catalogs, payments, …)
○ financial systems: payroll, accounts payable, bookkeeping, …
● Personal data management
○ photo library
○ music catalogs (iTunes, etc)
○ address books
● Application/OS support
○ Windows registry
○ Authentication/authorization
○ Browser bookmark/favorites list
○ Email


Database Management System (DBMS)
DBMS: Software system that manages one or more databases.

Requirements include some or all of the following:

● Storage abstraction
● Programmatic interfaces
○ insertion/creation, deletion, modification
○ retrieval: query language, data processing
● Large scale: millions/billions of records
● Long-term durability


Database Management System (DBMS)
Requirements, continued:

● Data Integrity
● Support for multiple concurrent users
○ Authentication
○ Access control
○ Concurrency
● Transactions: ensure that data is always in a consistent state
● Access to metadata


DBMS Form Factor
DBMSs take several different forms:

● separate program(s) running on the same machine as the application
● program(s) running on one or more dedicated machines
● cloud computing service
● library linked into client application


DBMS Costs
● Storage overhead
○ Redundancy for data integrity
○ Indexes for faster data access
○ Metadata
● Performance overhead
○ Maintaining data integrity
○ Multi-user support: concurrency, synchronization
○ Security: multi-user authentication
○ Communications: marshalling data to/from client application
● Complexity overhead
○ Big DBMSs have a lot of administrative overhead (person-hours)
○ Can have steep learning curves


DBMS Benefits
So, why bother? Why not just create our own file format ("flat files")?

● Using a DBMS lets us leverage many engineer-years of work
○ Solved a lot of really tricky problems so we don't have to
○ Lot of investment into robustness
● Flat files generally tailored to specific access pattern
○ As soon as you need a new one, complexity increases
○ Ex: PoS system originally designed to look up orders by customer.
○ What happens when we need to look up orders by item?
○ DBMSs support ad-hoc queries
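A minimal sketch of the flat-file problem, assuming a hypothetical hand-rolled store that keeps orders grouped by customer (a `Map` stands in for the file layout). Lookup by customer is a single step; lookup by item has to examine every order, because nothing in the layout was built for that question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class FlatFileAccess {
    // Hypothetical order: which customer placed it and which item numbers it contains.
    record Order(String orderNumber, String customer, List<String> items) {}

    // The hand-rolled store is organized by customer, the access pattern it was designed for.
    static final Map<String, List<Order>> BY_CUSTOMER = Map.of(
        "C999", List.of(new Order("1234", "C999", List.of("A123", "A456"))),
        "C100", List.of(new Order("1235", "C100", List.of("A456")),
                        new Order("1236", "C100", List.of("A789"))));

    // Original access pattern: one direct lookup.
    static List<Order> ordersForCustomer(String customer) {
        return BY_CUSTOMER.getOrDefault(customer, List.of());
    }

    // New access pattern: the layout offers no help, so every order of every
    // customer has to be examined (or a second index built and kept in sync by hand).
    static List<String> orderNumbersForItem(String item) {
        List<String> result = new ArrayList<>();
        for (List<Order> orders : BY_CUSTOMER.values()) {
            for (Order o : orders) {
                if (o.items().contains(item)) {
                    result.add(o.orderNumber());
                }
            }
        }
        result.sort(null); // Map.of has no defined iteration order; sort for stable output
        return result;
    }

    public static void main(String[] args) {
        System.out.println(ordersForCustomer("C100").size()); // 2
        System.out.println(orderNumbersForItem("A456"));      // [1234, 1235]
    }
}
```

A DBMS avoids baking a single access pattern into the storage format: the same data can answer either question through an ad-hoc query, and an index can be added later without rewriting the application.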


A Note on Terminology
● DBMS: program(s) that manage data & provide access to it
● Database: collection of related records for a particular application or group of
applications. Examples:
○ University: students, faculty members, courses, grades, registrations, …
○ Point-of-sale: products, customers, orders, shipments, payments, …
○ Blog application: users, posts, comments, …
● Frequently see multiple databases managed by the same DBMS instance.
● Confusingly, people often refer to a DBMS as a “database.”


Database Organization


Database Organization
For now, we'll concentrate on a logical view of data organization: let the DBMS
worry about file formats, etc.

Storage hierarchy:

● Database
● Table (class, entity, record set)
● Record (object instance, row)
● Attribute (column, field)


Database Records
Record: info about a single object or concept that we want to store

Examples:

● vehicle
● student in a university
● item in a product catalog
● financial transaction


Database Attributes
Attributes: properties of object or concept described by a record

Examples:

● vehicle: make, model, year, …
● student: name, GPA, number of credit hours, …
● item in a product catalog: name, description, quantity in stock, price, …
● financial transaction: date, amount, payee, …


Database Tables
Table: collection of all records describing different instances of an item/concept

Traditionally, all records in the same table have the same schema: set of attribute
names and their types.

Tables & records can be thought of as corresponding to classes and instances, or
to a spreadsheet and its rows.


Database Organization
Let's make this concrete with an example:

ID         Last Name  First Name  Degree Program
000627296  Lovelace   Ada         MS Computer Science
000246936  Smith      Adam        Bachelor’s, CSSH
000175892  Liddell    Alice       Law


Records and Keys
Records in a database almost always have some unique identifying tag, called a
primary key. Can take several forms:

● One or more attributes containing real-world information about the record:
○ User in web app: email address
○ Car: VIN
● Identifying value only meaningful within the database:
○ Order number
○ NUID


Records and Keys
Essential properties of keys:

● Every record has a key value
● Each record in a table has a different key value
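The two key properties can be expressed as checks on insert. The sketch below assumes a hypothetical in-memory `PrimaryKeyTable` keyed by the ID attribute from the earlier student example; in a real DBMS you declare a primary key and the system enforces these rules for you.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

class PrimaryKeyTable {
    // Hypothetical row type; id is the primary key attribute.
    record Student(String id, String lastName, String firstName) {}

    // Rows indexed by primary key value.
    private final Map<String, Student> rows = new LinkedHashMap<>();

    // Inserts a row, enforcing both essential key properties.
    void insert(Student s) {
        // Property 1: every record has a key value.
        Objects.requireNonNull(s.id(), "primary key must not be null");
        // Property 2: each record in the table has a different key value.
        // putIfAbsent leaves an existing row untouched and returns it.
        if (rows.putIfAbsent(s.id(), s) != null) {
            throw new IllegalArgumentException("duplicate primary key: " + s.id());
        }
    }

    Student find(String id) {
        return rows.get(id);
    }

    int size() {
        return rows.size();
    }

    public static void main(String[] args) {
        PrimaryKeyTable students = new PrimaryKeyTable();
        students.insert(new Student("000627296", "Lovelace", "Ada"));
        students.insert(new Student("000175892", "Liddell", "Alice"));
        try {
            students.insert(new Student("000627296", "Byron", "Ada"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // duplicate primary key: 000627296
        }
        System.out.println(students.size()); // 2
    }
}
```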


Database Relationships
Database relationship: logical connection between 2 or more database objects:
databases, tables, records

DBMSs generally have tools for establishing & managing these relationships

Examples:

● University registration: Student-Class
● Point of sale: Order-Customer
● Corporate: Employee-Manager


Kinds of Relationships
Several different kinds of relationships

Two most common: has-a, is-a

● Student has one or more related Courses
● Order has a Customer
● Employee has a Manager
● Individual Contributor is an Employee
● Manager is an Employee


Relationships & Supplementary Information
Relationships, particularly has-a relationships, often require additional information:

● Student registered for Course
○ registered in a particular semester
● Student's grade for Course
○ for a particular section
○ in a particular semester
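One common way to hold this supplementary information is to make the relationship itself a record. In this hypothetical sketch, each `Registration` refers to a student and a course by key and carries the semester, section, and grade, which belong to neither the student nor the course alone. The IDs, course codes, and grades are made-up sample data.

```java
import java.util.List;

class Registrations {
    // Hypothetical relationship record: it links a student and a course by their keys
    // and carries the information that only makes sense for the pairing.
    record Registration(String studentId, String courseCode, String semester,
                        int section, String grade) {}

    // Hypothetical sample data.
    static final List<Registration> REGISTRATIONS = List.of(
        new Registration("000627296", "CS5200", "Fall 2024", 12, "A"),
        new Registration("000627296", "CS5004", "Spring 2024", 3, "A-"),
        new Registration("000175892", "CS5200", "Fall 2024", 13, "B+"));

    // Returns the grade a student received in a course in a given semester, or null if none.
    static String gradeFor(String studentId, String courseCode, String semester) {
        for (Registration r : REGISTRATIONS) {
            if (r.studentId().equals(studentId)
                    && r.courseCode().equals(courseCode)
                    && r.semester().equals(semester)) {
                return r.grade();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(gradeFor("000175892", "CS5200", "Fall 2024")); // B+
    }
}
```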


Specialization Relationships
Specialization (is-a) relationships are similar to OO inheritance:

● Student is-a Person
● Instructor is-a Person

Person has set of shared properties: name, ID, home address, …

Student has all of Person's attributes, plus some more: # credit hours, registered
courses, …

Instructor has all of Person's attributes, plus some more: department, courses
taught, committee assignments, …
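Since the slide compares is-a relationships to OO inheritance, here is that analogy in Java as a hypothetical sketch: only a few of the attributes listed above are included, and the names and values are made up. How a relational DBMS actually stores a specialization is a separate design question.

```java
class Specialization {
    // General type: attributes shared by every kind of person.
    static class Person {
        final String id;
        final String name;
        final String homeAddress;

        Person(String id, String name, String homeAddress) {
            this.id = id;
            this.name = name;
            this.homeAddress = homeAddress;
        }
    }

    // Student is-a Person: inherits id, name, homeAddress and adds its own attribute.
    static class Student extends Person {
        final int creditHours;

        Student(String id, String name, String homeAddress, int creditHours) {
            super(id, name, homeAddress);
            this.creditHours = creditHours;
        }
    }

    // Instructor is-a Person, with a different set of additional attributes.
    static class Instructor extends Person {
        final String department;

        Instructor(String id, String name, String homeAddress, String department) {
            super(id, name, homeAddress);
            this.department = department;
        }
    }

    public static void main(String[] args) {
        Person[] people = {
            new Student("000627296", "Ada Lovelace", "12 Analytical Way", 8),
            new Instructor("000000001", "Jane Doe", "34 Faculty Row", "Computer Science")
        };
        for (Person p : people) {
            // Shared attributes are available no matter which subtype p is.
            System.out.println(p.name + " is a " + p.getClass().getSimpleName());
        }
    }
}
```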


Constraints
● Example database:
○ Student: ID, last name, first name, major
○ Faculty: ID, last name, first name, departmentName
○ Department: name, college, chair
● Within table: Student.ID must be unique
● Between tables: Faculty.departmentName must exist in Department.name
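Both kinds of constraint can be checked by hand, as in the hypothetical sketch below, which reuses the example tables (the sample rows are made up). A DBMS lets you declare these rules instead (for example, as a primary key and as a foreign key) and rejects any change that would violate them; the code only shows what each rule means.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ConstraintCheck {
    record Student(String id, String lastName, String firstName, String major) {}
    record Faculty(String id, String lastName, String firstName, String departmentName) {}
    record Department(String name, String college, String chair) {}

    // Within-table constraint: no two students share an ID.
    static boolean studentIdsUnique(List<Student> students) {
        Set<String> seen = new HashSet<>();
        for (Student s : students) {
            if (!seen.add(s.id())) {
                return false; // add() returns false when the ID is already present
            }
        }
        return true;
    }

    // Between-table constraint: every Faculty.departmentName must match some Department.name.
    // Returns the faculty rows that violate it.
    static List<Faculty> danglingFaculty(List<Faculty> faculty, List<Department> departments) {
        Set<String> names = new HashSet<>();
        for (Department d : departments) {
            names.add(d.name());
        }
        List<Faculty> violations = new ArrayList<>();
        for (Faculty f : faculty) {
            if (!names.contains(f.departmentName())) {
                violations.add(f);
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        List<Department> departments = List.of(new Department("Computer Science", "Khoury", "Chair A"));
        List<Faculty> faculty = List.of(
            new Faculty("F1", "Doe", "Jane", "Computer Science"),
            new Faculty("F2", "Roe", "Rick", "Alchemy")); // no such department
        System.out.println(danglingFaculty(faculty, departments)); // only the F2 row
    }
}
```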


Database Models


Database Models
Broadly: different ways of organizing data in a database

Historically common:

● Flat-File
● Hierarchical
● Network
● Relational
● Graph
● Object-oriented
● Semi-Structured


Flat-File
Custom application-defined file format; doesn't use a DBMS

Arguably not really a database, but often discussed in this context

Example: old-style Unix password database:


root:*:0:0:System Administrator:/var/root:/bin/sh
daemon:*:1:1:System Services:/var/root:/usr/bin/false
_uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico
_taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false
_networkd:*:24:24:Network Services:/var/networkd:/usr/bin/false

Format changes require substantial code changes
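To see why, here is a minimal Java parser for the colon-separated format above, assuming the seven-field layout shown (user name, password placeholder, user ID, group ID, comment, home directory, shell). The class and record names are hypothetical. The field positions are hard-coded, so any change to the format means changing this code and every other program that reads the file.

```java
class PasswdParser {
    // One entry of the 7-field, colon-separated format.
    record PasswdEntry(String user, String password, int uid, int gid,
                       String comment, String home, String shell) {}

    // Field positions are hard-coded: adding, removing, or reordering a field
    // in the file format means changing this code (and every program like it).
    static PasswdEntry parse(String line) {
        String[] f = line.split(":", -1); // limit -1 keeps trailing empty fields
        if (f.length != 7) {
            throw new IllegalArgumentException("expected 7 fields, got " + f.length);
        }
        return new PasswdEntry(f[0], f[1], Integer.parseInt(f[2]), Integer.parseInt(f[3]),
                               f[4], f[5], f[6]);
    }

    public static void main(String[] args) {
        PasswdEntry e = parse("_uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico");
        System.out.println(e.user() + " uses shell " + e.shell()); // _uucp uses shell /usr/sbin/uucico
    }
}
```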


Hierarchical
Records arranged in hierarchical structure

Master records (green); detail records (orange)

Traversals must follow this structure: the only way to access a detail record is to
start with its corresponding master

Easy to find all items ordered by a particular customer

Harder to find outstanding orders for a particular catalog item

Examples: XML, Windows Registry


Network
Still have master & detail records

Richer traversals:

● links are 2-way
● details can be contained in multiple masters

Examples: IDMS

Used in '80s & '90s for high-performance, high-transaction systems: airline reservations,
credit card transactions


Relational
No master/detail division; all entities top-level

Relationships not stored as pointers but as normal attributes

Traversals therefore require lookup

Traversals more flexible: don't have to be baked into schema

Examples: Oracle, MS SQL Server, MySQL, PostgreSQL, SQLite, etc.

We'll focus on this model in this course.
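A minimal sketch of a relational-style traversal, using hypothetical `Customer` and `Order` collections with made-up data. The order holds the customer's number as an ordinary attribute rather than a pointer, so getting from an order to its customer is a lookup by value; the same attribute also supports the reverse traversal without any change to the structure.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RelationalLookup {
    record Customer(String customerNumber, String name) {}
    // The relationship to Customer is just an attribute value, not a pointer.
    record Order(String orderNumber, String customerNumber) {}

    static final List<Customer> CUSTOMERS = List.of(
        new Customer("C999", "Ada Lovelace"), new Customer("C100", "Alice Liddell"));
    static final List<Order> ORDERS = List.of(
        new Order("1234", "C999"), new Order("1235", "C100"), new Order("1236", "C999"));

    // Order -> Customer: look up the customer whose key equals the order's attribute value.
    static String customerNameForOrder(String orderNumber) {
        // Lookup table on Customer.customerNumber (rebuilt per call here for simplicity).
        Map<String, Customer> byNumber = new HashMap<>();
        for (Customer c : CUSTOMERS) {
            byNumber.put(c.customerNumber(), c);
        }
        for (Order o : ORDERS) {
            if (o.orderNumber().equals(orderNumber)) {
                Customer c = byNumber.get(o.customerNumber());
                return c == null ? null : c.name();
            }
        }
        return null;
    }

    // Customer -> Orders: the same attribute supports traversal in the other direction.
    static List<String> orderNumbersForCustomer(String customerNumber) {
        return ORDERS.stream()
                .filter(o -> o.customerNumber().equals(customerNumber))
                .map(Order::orderNumber)
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(customerNameForOrder("1235"));    // Alice Liddell
        System.out.println(orderNumbersForCustomer("C999")); // [1234, 1236]
    }
}
```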


Graph Database
Stores nodes and edges as first-class concepts.

Both nodes and edges have attributes.

Support for graph algorithms (shortest path, etc.)

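As an illustration of the kind of algorithm a graph database may provide built in, here is breadth-first search for a fewest-hops path. This is a hypothetical sketch: the node names are made up, edges are plain adjacency lists, and node and edge attributes are omitted.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

class GraphShortestPath {
    // Adjacency list: node -> neighbors. Edges are directed here; an undirected
    // graph would list each edge in both directions.
    static final Map<String, List<String>> EDGES = Map.of(
        "Ada", List.of("Alice", "Adam"),
        "Alice", List.of("Carol"),
        "Adam", List.of("Carol", "Dave"),
        "Carol", List.of("Dave"),
        "Dave", List.of());

    // Breadth-first search: returns a fewest-hops path from start to goal, or an empty list.
    static List<String> shortestPath(String start, String goal) {
        Map<String, String> parent = new HashMap<>(); // also serves as the "visited" set
        Deque<String> queue = new ArrayDeque<>();
        parent.put(start, null);
        queue.add(start);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            if (node.equals(goal)) {
                // Walk parent links back to the start to rebuild the path.
                LinkedList<String> path = new LinkedList<>();
                for (String n = goal; n != null; n = parent.get(n)) {
                    path.addFirst(n);
                }
                return path;
            }
            for (String next : EDGES.getOrDefault(node, List.of())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, node);
                    queue.add(next);
                }
            }
        }
        return List.of();
    }

    public static void main(String[] args) {
        System.out.println(shortestPath("Ada", "Dave")); // [Ada, Adam, Dave]
    }
}
```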

Object-Oriented
Similar to data model in OO languages

Built-in inheritance

Explicit pointers similar to hierarchical/network, but more free-form

DB layout driven by application's object model

Examples: db4o (defunct), Versant, ObjectStore


Semi-Structured (NoSQL)
Rows vary in schema. Sparse data extremely common.

Example: single-table inheritance

● All subtypes together in a single table
● Each subtype has a different set of attributes
● Storage format doesn't consume space for "missing" attributes

Often used in Big Data applications: record counts in billions or trillions

Very common in cloud solutions
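A minimal sketch of single-table inheritance with sparse rows, assuming each row is a `Map` that holds only the attributes that row has. The vehicle subtypes and attribute names are hypothetical; the point is the comparison between values actually stored and the cells a fixed schema would need.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SparseRows {
    // Hypothetical single "vehicles" table holding several subtypes.
    // Each row is a map that stores only the attributes that row actually has.
    static final Map<String, Object> CAR = Map.of("type", "car", "make", "Honda", "doors", 4);
    static final Map<String, Object> TRUCK = Map.of("type", "truck", "make", "Ford", "payloadTons", 2);
    static final Map<String, Object> MOTORCYCLE = Map.of("type", "motorcycle", "make", "Ducati");
    static final List<Map<String, Object>> VEHICLES = List.of(CAR, TRUCK, MOTORCYCLE);

    // Values actually stored: an absent attribute takes no entry at all.
    static int storedValues() {
        int n = 0;
        for (Map<String, Object> row : VEHICLES) {
            n += row.size();
        }
        return n;
    }

    // Cells a fixed schema would need: every row gets a column for every attribute
    // that appears anywhere in the table, many of them empty for a given subtype.
    static int fixedSchemaCells() {
        Set<String> attributes = new HashSet<>();
        for (Map<String, Object> row : VEHICLES) {
            attributes.addAll(row.keySet());
        }
        return VEHICLES.size() * attributes.size();
    }

    public static void main(String[] args) {
        System.out.println(storedValues() + " stored vs " + fixedSchemaCells() + " fixed-schema cells"); // 8 vs 12
        System.out.println(MOTORCYCLE.get("doors")); // null: the attribute is simply absent
    }
}
```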


Semi-Structured: Document Store
Stores primarily text documents (plain text,
HTML, MS Word, etc.), often book-length

Full-text indexes (concordances), both forward and inverted

Fast lookup by position, frequency

Additional features, like phrase search, synonyms, stemming

from Wikipedia
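The inverted index is the core data structure here. The sketch below is a simplified, hypothetical version: it maps each term to its (document, position) occurrences, which supports lookup by position and by frequency, but it uses a naive ASCII-only tokenizer and has no phrase search, synonyms, or stemming.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class InvertedIndex {
    // One occurrence of a term: which document, and which word position within it.
    record Posting(int docId, int position) {}

    // Inverted index: term -> every place it occurs (sorted map, so terms iterate alphabetically).
    private final Map<String, List<Posting>> index = new TreeMap<>();

    // Naive tokenizer: lowercase, then split on any run of non-alphanumeric characters.
    void addDocument(int docId, String text) {
        int position = 0;
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (word.isEmpty()) {
                continue; // a leading delimiter produces one empty token
            }
            index.computeIfAbsent(word, k -> new ArrayList<>()).add(new Posting(docId, position++));
        }
    }

    // Lookup by term: all positions in all documents.
    List<Posting> postings(String term) {
        return index.getOrDefault(term.toLowerCase(Locale.ROOT), List.of());
    }

    // Frequency of a term within one document.
    long frequency(String term, int docId) {
        return postings(term).stream().filter(p -> p.docId() == docId).count();
    }

    public static void main(String[] args) {
        InvertedIndex idx = new InvertedIndex();
        idx.addDocument(1, "The cat sat on the mat.");
        idx.addDocument(2, "A cat and a dog.");
        System.out.println(idx.postings("cat"));     // [Posting[docId=1, position=1], Posting[docId=2, position=1]]
        System.out.println(idx.frequency("the", 1)); // 2
    }
}
```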


Semi-Structured: XML
Rough tree structure

However, siblings may have different structure

May or may not have an associated XML schema

Common query languages: XPath, XSLT, XQuery

<?xml version="1.0" encoding="UTF-8"?>
<BlogApplication>
  <BlogUsers>
    <BlogUser date="2016-04-02">
      <UserName>username1</UserName>
      <FirstName>First1</FirstName>
      <LastName>Last1</LastName>
    </BlogUser>
    ...
  </BlogUsers>
  <BlogPosts>
    <BlogPost>
      <PostId>1</PostId>
      <UserName>username1</UserName>
      <Title>title1</Title>
      <Content>content1</Content>
      <Comments>
        <Comment>
          <CommentId>1</CommentId>
          <UserName>username4</UserName>
          <Content>comment1</Content>
        </Comment>
        <Comment>
          <CommentId>2</CommentId>
          <UserName>username4</UserName>
          <Content>comment2</Content>
        </Comment>
      </Comments>
    </BlogPost>
    ...
  </BlogPosts>
</BlogApplication>

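XPath is one of the query languages mentioned above, and the JDK ships an implementation in `javax.xml.xpath`. Below is a minimal sketch against a trimmed copy of the blog document (the `...` parts are left out); the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class BlogXPath {
    // A trimmed copy of the blog document; the "..." parts are left out.
    static final String XML = """
        <?xml version="1.0" encoding="UTF-8"?>
        <BlogApplication>
          <BlogPosts>
            <BlogPost>
              <PostId>1</PostId>
              <UserName>username1</UserName>
              <Title>title1</Title>
              <Content>content1</Content>
              <Comments>
                <Comment>
                  <CommentId>1</CommentId>
                  <UserName>username4</UserName>
                  <Content>comment1</Content>
                </Comment>
                <Comment>
                  <CommentId>2</CommentId>
                  <UserName>username4</UserName>
                  <Content>comment2</Content>
                </Comment>
              </Comments>
            </BlogPost>
          </BlogPosts>
        </BlogApplication>
        """;

    // Evaluates an XPath expression against the document and returns each match's text.
    static List<String> select(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath query failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Contents of every comment written by username4, at any depth in the tree.
        System.out.println(select("//Comment[UserName='username4']/Content")); // [comment1, comment2]
        // Titles of posts by username1, following the tree from the root.
        System.out.println(select("/BlogApplication/BlogPosts/BlogPost[UserName='username1']/Title")); // [title1]
    }
}
```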

Semi-Structured: Key-Value Pairs
Records are free-form; indexed by a string "key"

Used in very high-performance systems, particularly when data is extremely sparse

High scalability: can distribute data across networks

Often used in cloud systems

Examples: Amazon EC2, MS Azure, Google AppEngine

{
  orderNumber: "1234",
  orderDate: "2013-01-01",
  customerNumber: "C999",
  details: [
    { itemNumber: "A123",
      description: "pencils",
      pencil-hardness: "No2",
      quantity: "50" },
    { itemNumber: "A456",
      description: "paper",
      paper-weight: "24 lb",
      color: "white",
      quantity: "25" }
  ]
}

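Distributing data across a network works because every record is reachable by its key alone. One common technique is hash partitioning: hash the key to pick the node that stores it. The sketch below simulates nodes as in-memory maps and is hypothetical throughout; production systems typically use more elaborate schemes, such as consistent hashing, so that adding a node does not move most of the keys.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PartitionedKeyValueStore {
    // Each simulated "node" is an in-memory map from key to a free-form record.
    private final List<Map<String, Map<String, Object>>> nodes = new ArrayList<>();

    PartitionedKeyValueStore(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
    }

    // Hash partitioning: the key alone decides which node holds the record.
    // floorMod keeps the result non-negative even when hashCode() is negative.
    int nodeFor(String key) {
        return Math.floorMod(key.hashCode(), nodes.size());
    }

    void put(String key, Map<String, Object> value) {
        nodes.get(nodeFor(key)).put(key, value);
    }

    // Only the one node responsible for the key is consulted.
    Map<String, Object> get(String key) {
        return nodes.get(nodeFor(key)).get(key);
    }

    public static void main(String[] args) {
        PartitionedKeyValueStore store = new PartitionedKeyValueStore(3);
        Map<String, Object> order = Map.of(
            "orderDate", "2013-01-01",
            "customerNumber", "C999");
        store.put("1234", order);
        System.out.println("key 1234 lives on node " + store.nodeFor("1234"));
        System.out.println(store.get("1234").get("customerNumber")); // C999
    }
}
```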

Our Approach


Our Approach
● Focus primarily on relational databases in this course
● Cover NoSQL/semi-structured as time permits
● Start with relational algebra: theoretical foundation for operations we can
perform on data in a database: retrieval, data processing
● Move on to designing a database: deciding how to represent data of interest
as tables & relations
○ Might appear backwards: why talk about queries if we don't know what the database looks
like?
○ Hypothesis: rules for designing databases make more sense if we know how we're going to
interact with them


Our Approach, continued
● Next: Structured Query Language (SQL). This is the language that we'll use to
submit queries to an actual DBMS.
● Two streams after that:
○ Project: construct a database for a particular purpose, and write a program that can interact
with the database.
○ Lectures: talk about some of the ideas involved in implementing a DBMS: concurrency,
transactions, performance, optimizations.


For Next Week
● Survey and video introduction!
● No homework yet; first assignment to go out next week.
● Start thinking about your project teams:
○ 5 people per team
○ I will set up a post on Piazza for people to look for teams.
○ Please have these in place by the week of September 16; details to follow.
