0% found this document useful (0 votes)
5 views54 pages

Module 3

The document provides an overview of SQL tables, including their schema, data types, and basic SQL queries. It explains concepts such as keys, joins, aggregation, and handling NULL values, as well as modifications to the database. Additionally, it introduces semi-structured data formats like XML and JSON, highlighting their uses and structures.

Uploaded by

Pratham Choubey
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views54 pages

Module 3

The document provides an overview of SQL tables, including their schema, data types, and basic SQL queries. It explains concepts such as keys, joins, aggregation, and handling NULL values, as well as modifications to the database. Additionally, it introduces semi-structured data formats like XML and JSON, highlighting their uses and structures.

Uploaded by

Pratham Choubey
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 54

Tables in SQL Attribute names

Table name

Product
PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

SingleTouch $149.99 Photography Canon

MultiTouch $203.99 Household Hitachi

Tuples or rows
Tables Explained

● The schema of a table is the table name and its


attributes:
Product(PName, Price, Category, Manufacturer)

● A key is an attribute whose values are unique;


we underline a key
Product(PName, Price, Category, Manufacturer)
Tables Explained
● A tuple = a record
○ Restriction: all attributes are of atomic type

● A table = a set of tuples


○ Like a list…
○ …but it is unordered:
no first(), no next(), no last().
Data Types in SQL
● Atomic types:
○ Characters: CHAR(20), VARCHAR(50)
○ Numbers: INT, BIGINT, SMALLINT, FLOAT
○ Others: MONEY, DATETIME, …

● Every attribute must have an atomic type


SQL Query

Basic form: (plus many many more bells and whistles)

SELECT <attributes>
FROM <one or more relations>
WHERE <conditions>
Simple SQL Query

Product PName Price Category Manufacturer


Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
SELECT *
FROM Product
WHERE category=‘Gadgets’

“selection” PName Price Category Manufacturer


Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
Simple SQL Query

Product PName Price Category Manufacturer


Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi

SELECT PName, Price, Manufacturer


FROM Product
WHERE Price > 100

PName Price Manufacturer


“selection” and SingleTouch $149.99 Canon
“projection” MultiTouch $203.99 Hitachi
Notation Input
Schema

Product(PName, Price, Category, Manufacturer)

SELECT PName, Price, Manufacturer


FROM Product
WHERE Price > 100

Answer(PName, Price, Manufacturer)

Output
Schema
Details
● Case insensitive:
○ Same: SELECT Select select
○ Same: Product product
○ Different: ‘Seattle’ ‘seattle’

● Constants:
○ 'abc’ - yes
○ "abc” - no
The LIKE operator

SELECT *
FROM Product
WHERE PName LIKE ‘%gizmo%’

● s LIKE p: pattern matching on strings


● p may contain two special symbols:
○ % = any sequence of characters
○ _ = any single character
Eliminating Duplicates
Category
SELECT DISTINCT category Gadgets
FROM Product
Photography
Household

Compare to: Category

Gadgets
SELECT category Gadgets
FROM Product
Photography
Household
Ordering the Results

SELECT pname, price, manufacturer


FROM Product
WHERE price > 10
ORDER BY manufacturer DESC, pname

Ties are broken by the second attribute


on the ORDER BY list, etc.

Ordering is ascending, unless you


specify the DESC keyword.
PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi

SELECT DISTINCT category

?
FROM Product
ORDER BY category

?
SELECT Category
FROM Product
ORDER BY PName

?
SELECT DISTINCT category
FROM Product
ORDER BY PName
Keys and Foreign Keys
Company

CName StockPrice Country

Key GizmoWorks 25 USA

Canon 65 Japan

Hitachi 15 Japan

Product
PName Price Category Manufacturer
Foreign
Gizmo $19.99 Gadgets GizmoWorks key

Powergizmo $29.99 Gadgets GizmoWorks


SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
Joins
● Introduction to Joins
● What are joins
● Why join matters?
Joins Company
CName StockPrice Country
Product (pname, price, category, manufacturer)
Company (cname, stockPrice, country) GizmoWorks 25 USA

Canon 65 Japan
Find all products under $200 manufactured in
Japan; return their names and prices. Hitachi 15 Japan

Product
SELECT PName, Price
PName Price Category Manufacturer
FROM Product, Company
WHERE Manufacturer=CName Gizmo $19.99 Gadgets GizmoWorks
AND Country=‘Japan’ Powergizmo $29.99 Gadgets GizmoWorks
AND Price <= 200
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
Joins
Product Company
PName Price Category Manufacturer Cname StockPrice Country
Gizmo $19.99 Gadgets GizmoWorks
GizmoWorks 25 USA
Powergizmo $29.99 Gadgets GizmoWorks Canon 65 Japan
SingleTouch $149.99 Photography Canon
Hitachi 15 Japan
MultiTouch $203.99 Household Hitachi

SELECT PName, Price


FROM Product, Company PName Price
WHERE Manufacturer=CName AND Country=‘Japan’ SingleTouch $149.99
AND Price <= 200
A Subtlety about Joins Company

CName StockPrice Country


Product (pname, price, category, manufacturer)
Company (cname, stockPrice, country) GizmoWorks 25 USA
Find all countries that manufacture some product in Canon 65 Japan
the ‘Gadgets’ category.
Hitachi 15 Japan

Product

PName Price Category Manufacturer


SELECT Country
FROM Product, Company Gizmo $19.99 Gadgets GizmoWorks
WHERE Manufacturer=CName AND
Category=‘Gadgets’ Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi

Unexpected duplicates
A Subtlety about Joins
Product Company
Name Price Category Manufacturer Cname StockPrice Country
Gizmo $19.99 Gadgets GizmoWorks
GizmoWorks 25 USA
Powergizmo $29.99 Gadgets GizmoWorks Canon 65 Japan
SingleTouch $149.99 Photography Canon
Hitachi 15 Japan
MultiTouch $203.99 Household Hitachi

SELECT Country
FROM Product, Company
WHERE Manufacturer=CName AND
Category=‘Gadgets’

Country
What is ??
the problem ? ??
What’s the
solution ?
Join
Aggregation
SELECT avg(price) SELECT count(*)
FROM Product FROM Product
WHERE maker=“Toyota” WHERE year > 1995

SQL supports several aggregation operations:

sum, count, min, max, avg

Except count, all aggregations apply to a


single attribute
Aggregation: Count

COUNT applies to duplicates, unless otherwise stated:

SELECT Count(category) same as Count(*)


FROM Product
WHERE year > 1995

We probably want:

SELECT Count(DISTINCT category)


FROM Product
WHERE year > 1995
Simple Aggregations
Purchase

Product Date Price Quantity

Bagel 10/21 1 20

Banana 10/3 0.5 10

Banana 10/10 1 10

Bagel 10/25 1.50 20

SELECT Sum(price * quantity)


FROM Purchase ?
WHERE product = ‘bagel’
Simple Aggregations
Purchase

Product Date Price Quantity

Bagel 10/21 1 20

Banana 10/3 0.5 10

Banana 10/10 1 10

Bagel 10/25 1.50 20

SELECT Sum(price * quantity)


FROM Purchase
WHERE product = ‘bagel’ 50 (= 20+30)
Grouping and Aggregation
Purchase(product, date, price, quantity)
Product Date Price Quantity
Find total sales after 10/1/2005 per product. Bagel 10/21 1 20
Bagel 10/25 1.50 20
Banana 10/3 0.5 10

SELECT product, Sum(price*quantity) AS TotalSales Banana 10/10 1 10


FROM Purchase
WHERE date > ‘10/1/2005’
GROUP BY product

Let’s see what this means…


Grouping and Aggregation

Product Date Price Quantity Product TotalSales


Bagel 10/21 1 20
Bagel 10/25 1.50 20 Bagel 50
Banana 10/3 0.5 10
Banana 15
Banana 10/10 1 10

SELECT product, Sum(price*quantity) AS TotalSales


FROM Purchase
WHERE date > ‘10/1/2005’
GROUP BY product
HAVING Clause

Same query, except that we consider only products that had


at least 30 buyers.

SELECT product, Sum(price * quantity)


FROM Purchase
WHERE date > ‘10/1/2005’
GROUP BY product
HAVING Sum(quantity) > 30

HAVING clause contains conditions on aggregates.


NULLS in SQL
● Whenever we don’t have a value, we can put
a NULL
● Can mean many things:
○ Value does not exists
○ Value exists but is unknown
○ Value not applicable
○ Etc.
● The schema specifies for each attribute if can
be null (nullable attribute) or not
● How does SQL cope with tables that have
NULLs ?
Null Values
● If x= NULL then 4*(3-x)/7 is still NULL
● If x= NULL then x=“Joe” is UNKNOWN
● In SQL there are three boolean values:
FALSE = 0
UNKNOWN = 0.5
TRUE = 1
Null Values

Can test for NULL explicitly:


○ x IS NULL
○ x IS NOT NULL

SELECT *
FROM Person
WHERE age < 25 OR age >= 25 OR age IS NULL

Now it includes all Persons


Modifying the Database
Three kinds of modifications
● Insertions
● Deletions
● Updates

Sometimes they are all called “updates”


Insertions
General form:

INSERT INTO R(A1,…., An) VALUES (v1,…., vn)

Example: Insert a new purchase to the database:

INSERT INTO Purchase(buyer, seller, product, store)


VALUES (‘Joe’, ‘Fred’, ‘wakeup-clock-espresso-machine’,
‘The Sharper Image’)

Missing attribute → NULL.


May drop attribute names if give them in order.
Deletions
Example:

DELETE FROM PURCHASE

WHERE seller = ‘Joe’ AND


product = ‘Brooklyn Bridge’

Factoid about SQL: there is no way to delete only a single


occurrence of a tuple that appears twice
in a relation.
Updates

Example:

UPDATE PRODUCT
SET price = price/2
WHERE Product.name IN
(SELECT product
FROM Purchase
WHERE Date =‘Oct, 25, 1999’);
Background: Semi Structured Data
● XML and JSON are two standard, textual data formats for
representing arbitrary data
○ XML stands for “eXtensible Markup Language”
○ JSON stands for “JavaScript Object Notation”
● Both are commonly used in practice. XML came first
● JSON, which uses JavaScript syntax, became popular for
representing data in web applications and services
○ If you’re using JavaScript, JSON is an obvious choice
Background: Semi Structured Data

● Both formats are reasonable


choices, although some people
have strong biases
● Most programming languages
have libraries for parsing and
generating both XML and JSON
● You should be familiar with both
Introduction to XML

● XML: Extensible Markup Language


● Defined by the WWW Consortium (W3C)
● Documents have tags giving extra
information about sections of the document
○ E.g. <title> XML </title> <slide> Introduction …</slide>

● Extensible, unlike HTML


○ Users can add new tags
Introduction to XML
● The ability to specify new tags,
and to create nested tag
structures make XML a great way
to exchange data, not just
documents.

○ Much of the use of XML has been


in data exchange applications,
not as a replacement for HTML
Introduction to XML
Tags make data (relatively) self-documenting
○ E.g.
<?xml version = "1.0"?>
<bank>
<account>
<account_number> A-101 </account_number>
<branch_name> Downtown </branch_name>
<balance> 500 </balance>
</account>
<depositor>
<account_number> A-101 </account_number>
<customer_name> Johnson </customer_name>
</depositor>
</bank>
Introduction to XML
Data interchange is critical in today’s networked
world
○ Examples:
■ Banking: funds transfer
■ Order processing (especially inter-
company orders)
■ Scientific data
● Chemistry, Genetics

○ Paper flow of information between


organizations is being replaced by
electronic flow of information
Comparison with Structured Data
● Inefficient: tags, which in effect represent
schema information, are repeated
● Better than relational tuples as a data-exchange
format
○ Unlike relational tuples, XML data is self-
documenting due to presence of tags
○ Non-rigid format: tags can be added
○ Allows nested structures
○ Wide acceptance, not only in database
systems, but also in browsers, tools, and
applications
Structure of XML Data
● Tag: label for a section of data
● Element: section of data beginning with
<tagname> and ending with matching
</tagname>
Structure of XML Data
● Elements must be properly nested
○ Proper nesting
■ <account> … <balance> …. </balance> </account>
○ Improper nesting
■ <account> … <balance> …. </account> </balance>
○ Formally: every start tag must have a unique matching
end tag, that is in the context of the same parent
element.
● Every document must have a single top-level element
Example of Nested Elements
<?xml version = "1.0"?>
<bank-1>
<customer>
<customer_name> Hayes </customer_name>
<customer_street> Main </customer_street>
<customer_city> Harrison </customer_city>
<account>
<account_number> A-102 </account_number>
<branch_name> Perryridge </branch_name>
<balance> 400 </balance>
</account>
<account>

</account>
</customer>
.
.
</bank-1>
Structure of XML Data
Mixture of text with sub-elements is legal in XML.
○ Example:
<account>
This account is seldom used any more.
<account_number> A-102</account_number>
<branch_name> Perryridge</branch_name>
<balance>400 </balance>
</account>
○ Useful for document markup, but discouraged for
data representation
Attributes
● Elements can have attributes
<account acct-type = “checking” >
<account_number> A-102 </account_number>
<branch_name> Perryridge </branch_name>
<balance> 400 </balance>
</account>
● Attributes are specified by name=value pairs inside the starting tag of
an element
● An element may have several attributes, but each attribute name can
only occur once
<account acct-type = “checking” monthly-fee=“5”>
More on XML Syntax
Distinction between subelement
and attribute

In the context of documents,


attributes are part of markup, while
subelement contents are part of the
basic document contents
More on XML Syntax
Distinction between subelement and attribute
○In the context of data representation, the difference is unclear
and may be confusing
■Same information can be represented in two ways
<account account_number = “A-101”> …. </account>

<account>
<account_number>A-101</account_number> …
</account>
○Suggestion: use attributes for identifiers of elements, and use
subelements for contents
More on XML Syntax
Elements without subelements or text content can be abbreviated
by ending the start tag with a /> and deleting the end tag
○ <account number=“A-101” branch=“Perryridge” balance=“200 />

To store string data that may contain tags, without the tags being
interpreted as subelements, use CDATA as below
○ <![CDATA[<account> … </account>]]>
Here, <account> and </account> are treated as just strings
CDATA stands for “character data”, text that will NOT be parsed by a parser
JSON Data – A name and a value
• A name/value pair consists of a field name (in double quotes), followed by a
colon, followed by a value
• Unordered sets of name/value pairs
• Begins with { (left brace)
• Ends with } (right brace)
• Each name is followed by : (colon)
• Name/value pairs are separated by , (comma)

{
"employee_id": 1234567,
"name": "Jeff Fox",
"hire_date": "1/1/2013",
"location": "Norwalk, CT",
"consultant": false
}
JSON Data – A name and a value
• In JSON, values must be one of the following data types:
• a string
• a number
• an object (JSON object)
• an array
• a boolean
• null

{
"employee_id": 1234567,
"name": "Jeff Fox",
"hire_date": "1/1/2013",
"location": "Norwalk, CT",
"consultant": false
}
JSON Data – A name and a value
• Strings in JSON must be written in double quotes.
{ "name":"John" }

• Numbers in JSON must be an integer or a floating point.


{ "age":30 }

• Values in JSON can be objects.


{
"employee":{ "name":"John", "age":30, "city":"New York" }
}

• Values in JSON can be arrays.


{
"employees":[ "John", "Anna", "Peter" ]
}
Another Example: XML vs. JSON
<?xml version="1.0"?>
<employees>
<employee>
<firstName>John</firstName> <lastName>Doe</lastName>
</employee>
<employee>
<firstName>Anna</firstName> <lastName>Smith</lastName>
</employee>
<employee>
<firstName>Peter</firstName> <lastName>Jones</lastName>
</employee>
</employees>

{"employees":[
{ "firstName":"John", "lastName":"Doe" },
{ "firstName":"Anna", "lastName":"Smith" },
{ "firstName":"Peter", "lastName":"Jones" }
]}
XML vs. JSON
•JSON is Like XML Because The biggest difference is: XML
• Both JSON and XML are "self describing" has to be parsed with XML parser. JSON
(human readable) can be parsed by standard JavaScript
• Both JSON and XML are hierarchical function.
(values within values)
• Both JSON and XML can be parsed and
used by lots of programming languages

•JSON is Unlike XML Because


• JSON doesn't use end tag
• JSON is shorter
• JSON is quicker to read and write
• JSON can use arrays
• JSON has a better fit for OO systems than
XML

You might also like