Lecture 4 Post-Lecture
Lecture 4
Database and SQL
Jun Fan
Sorting Algorithm
Asymptotic Bounding
Data Abstraction
Tidy Data
Case Study: The Great Quant Meltdown
Data Manipulation
Relational Database
Structured Querying Language
Database Landscape
Case Study: The WSB GameStop short sell frenzy
Why?
• Tidying data is about structuring datasets to facilitate analysis
• Such datasets are easy to manipulate, model and visualize
• The format provides a well-defined, uniform structure across your datasets
• Provides access to a number of tools that work well with tidy datasets
https://vita.had.co.nz/papers/tidy-data.pdf
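As a minimal sketch of what tidying means in practice, the snippet below reshapes a small "wide" table (one column per year) into a tidy "long" one with one observation per row. The tickers, years and prices are made up for illustration, and `tidy` is a hypothetical helper, not a library function.

```python
# Hypothetical wide table: one row per ticker, one column per year.
wide = [
    {"ticker": "AAA", "2022": 10.0, "2023": 12.5},
    {"ticker": "BBB", "2022": 20.0, "2023": 18.0},
]


def tidy(rows, id_col, var_name, value_name):
    """Reshape wide rows into tidy rows: one observation per row."""
    out = []
    for row in rows:
        for col, value in row.items():
            if col == id_col:
                continue  # the identifier is repeated on every tidy row
            out.append({id_col: row[id_col], var_name: col, value_name: value})
    return out


long_rows = tidy(wide, "ticker", "year", "close")
for r in long_rows:
    print(r)
```

Each tidy row now holds one (ticker, year, close) observation, which is the shape that filtering, grouping and plotting tools generally expect.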
Lecture 3 – 2/1/2024 MF810 Advanced Programming – J Fan 6
Grammar for data manipulation
Structured Querying Language
Operations supported by SQL follow the CRUD (create, read, update, delete)
paradigm. We will focus on the "read" aspect.
An ANSI SQL standard exists, but it is both a superset of what vendors typically
implement and insufficient to cover areas such as indexes and file storage.
Commercial vendors have little incentive to standardize fully, since incompatible
dialects allow for vendor lock-in.
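The four CRUD operations can be sketched with Python's built-in `sqlite3` module and an in-memory database. The `prices` table and its rows are hypothetical, and SQLite is only one dialect, so other engines may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# Create: define a table and insert rows.
cur.execute("CREATE TABLE prices (ticker TEXT, close REAL)")
cur.executemany(
    "INSERT INTO prices VALUES (?, ?)",
    [("AAA", 10.0), ("BBB", 20.0)],
)

# Read: the aspect this lecture focuses on.
rows = cur.execute(
    "SELECT ticker, close FROM prices WHERE close > 15 ORDER BY ticker"
).fetchall()

# Update: change an existing row.
cur.execute("UPDATE prices SET close = 11.0 WHERE ticker = 'AAA'")

# Delete: remove a row.
cur.execute("DELETE FROM prices WHERE ticker = 'BBB'")

remaining = cur.execute("SELECT ticker, close FROM prices").fetchall()
conn.commit()
conn.close()

print(rows)
print(remaining)
```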
Database Landscape
Storage Formats
Blocks - Block storage splits files into equal-sized chunks (blocks) held on a raw storage volume.
A block carries no metadata of its own; the operating system allocates storage to different
applications and decides what goes into each block. Often used for databases, email
servers, RAID redundancy and virtual machines.
Objects - Data is stored in isolated containers, each identified by a unique ID or hash. Objects can be
stored locally or remotely and are very amenable to scaling. Often used for big data, web apps and
backups.
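The contrast between the two formats can be illustrated with a toy sketch. `split_into_blocks` and `ObjectStore` are hypothetical names invented here, the block size is arbitrary, and using a SHA-256 digest as the object ID is an assumption; real systems differ considerably.

```python
import hashlib


def split_into_blocks(data, block_size):
    """Split bytes into equal-size chunks; the last chunk may be shorter.

    The blocks carry no metadata: reassembly relies on whoever tracks
    their order (in practice, the file system).
    """
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class ObjectStore:
    """Toy object store: each object is addressed by a hash of its content."""

    def __init__(self):
        self._objects = {}

    def put(self, data):
        object_id = hashlib.sha256(data).hexdigest()  # assumed ID scheme
        self._objects[object_id] = data
        return object_id

    def get(self, object_id):
        return self._objects[object_id]


payload = b"abcdefghij"
blocks = split_into_blocks(payload, 4)

store = ObjectStore()
oid = store.put(payload)

print(blocks)
print(oid[:12], store.get(oid))
```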
Relational Databases
The most common method of scalable storage for predominantly tabular data; it has
existed in some form or another since the 1970s. Traditional relational databases usually
support the ACID properties (atomicity, consistency, isolation and durability). Modern
variants sometimes sacrifice one or more of these properties in exchange for
performance or scalability.
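Atomicity, the "A" in ACID, can be sketched with `sqlite3`: either both halves of a transfer are applied or neither is. The `accounts` table, the names and the amounts are hypothetical, and the `CHECK` constraint is just one convenient way to force a failure mid-transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance REAL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [("alice", 100.0), ("bob", 50.0)],
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money atomically; return False if the transfer was rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30.0)       # both updates applied
failed = transfer(conn, "alice", "bob", 500.0)  # debit violates CHECK
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

In the failed transfer the credit to `bob` runs first and is then undone by the rollback, so no partial state is ever committed.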
NoSQL Databases
NoSQL, sometimes read as "Not only SQL", became popular with the advent of the
Web 2.0 movement. The emphasis is on simpler design and better horizontal scaling
across clusters. NoSQL systems can store unstructured, schema-less data.
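To make "schema-less" concrete, here is a toy in-memory document collection in which every record may carry different fields. The `insert` and `find` helpers and all sample documents are hypothetical; this is a sketch of the idea, not the API of any particular NoSQL product.

```python
import json

collection = {}  # hypothetical collection: document id -> JSON string


def insert(doc_id, document):
    """Store any JSON-serializable document; no schema is enforced."""
    collection[doc_id] = json.dumps(document)


def find(predicate):
    """Return the decoded documents for which predicate(doc) is true."""
    return [doc for doc in map(json.loads, collection.values()) if predicate(doc)]


# Documents with different shapes live side by side.
insert("t1", {"ticker": "AAA", "close": 10.0})
insert("p1", {"user": "u42", "text": "example post", "tags": ["demo"]})
insert("t2", {"ticker": "BBB", "close": 20.0, "volume": 1000})

with_ticker = find(lambda d: "ticker" in d)
print(with_ticker)
```

The flexibility comes at a price: because nothing guarantees a field exists, every query has to check for it.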
SQL: Relational
NoSQL: Non-Relational
Storage Format: Object storage organized as buckets, where each object is identified by a
bucket, key and version ID.
Good for frequently accessed data because it is designed for low latency and high throughput,
which makes it amenable to cloud applications, dynamic websites, content distribution, big
data analytics, etc. Replication provides high fault tolerance and availability of data. As
an object store, it also scales well.
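The (bucket, key, version ID) addressing scheme described above can be modelled in a few lines. `VersionedBucketStore` is a hypothetical class, and the 1-based integer version IDs are an assumption made for readability; real object stores typically issue opaque version identifiers.

```python
class VersionedBucketStore:
    """Toy model of an object store addressed by bucket, key and version ID."""

    def __init__(self):
        self._data = {}  # (bucket, key) -> list of bodies, oldest first

    def put(self, bucket, key, body):
        """Store a new version and return its version ID (1-based, assumed)."""
        versions = self._data.setdefault((bucket, key), [])
        versions.append(body)
        return len(versions)

    def get(self, bucket, key, version_id=None):
        """Return the requested version, or the latest if none is given."""
        versions = self._data.get((bucket, key))
        if not versions:
            raise KeyError((bucket, key))
        if version_id is None:
            return versions[-1]
        return versions[version_id - 1]


store = VersionedBucketStore()
v1 = store.put("reports", "2024/q1.csv", b"draft")
v2 = store.put("reports", "2024/q1.csv", b"final")

print(store.get("reports", "2024/q1.csv"))      # latest version
print(store.get("reports", "2024/q1.csv", v1))  # an earlier version
```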
Case Study: The WSB GameStop short sell frenzy
• Short selling
• Cost of borrowing
• Limited upside (the price can fall only to zero), but unlimited downside risk
• GameStop was being sold short at more than 100% of its free float
• Crowd-sourced strong buying was triggered by users of the subreddit
r/wallstreetbets
• Short sellers realized significant losses
• They were forced to liquidate short positions by buying back shares
• The forced buying raised the stock price further
• GameStop's price reached a maximum of over $500 per share, nearly 30
times the $17.25 price at the beginning of the month.
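The asymmetry in the bullets above can be checked with a little arithmetic. `short_pnl` is a hypothetical helper, the position size of 100 shares is made up, and charging the borrow cost on the entry notional is a simplifying assumption (in practice it is marked to market); only the $17.25 and $500 prices come from the slide.

```python
def short_pnl(entry_price, cover_price, shares,
              annual_borrow_rate=0.0, days_held=0):
    """P&L of a short: sell at entry, buy back at cover, minus borrow cost."""
    # Simplification: borrow fee accrues on the entry notional only.
    borrow_cost = entry_price * shares * annual_borrow_rate * days_held / 365
    return (entry_price - cover_price) * shares - borrow_cost


# Best case for the short seller: the price falls to zero, gain is capped.
best_case = short_pnl(17.25, 0.0, 100)

# Squeeze: forced to buy back at $500 after shorting at $17.25.
squeeze = short_pnl(17.25, 500.0, 100)

# Price multiple quoted on the slide ("nearly 30 times").
multiple = 500.0 / 17.25

print(best_case, squeeze, round(multiple, 2))
```

The gain is bounded by the entry price, while the loss grows without limit as the cover price rises, which is why forced buy-backs feed on themselves in a squeeze.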