17 Olap

The document discusses OLAP (Online Analytical Processing) and data warehousing. It describes how OLAP is used for analysis of historical data in data warehouses, which are optimized for read performance rather than transactions. The document outlines the extract, transform, load (ETL) process used to migrate data to data warehouses, including extracting data from source systems, transforming it by cleaning and reformatting the data, and loading it into the warehouse. Dimensional data modeling is used to structure warehouse data for optimal query performance.

Uploaded by

Soi Bark

OLAP

Textbook Chapter 33
March 19, 2019
Today

• OLTP vs OLAP
• Data Warehouses
• Extract, Transform, Load
• Multidimensional Data Models
• SQL:
• GROUPING SETS
• Window Functions

CMPT 355: Theory and Application of Databases 2


Textbook Chapter 31
Data Warehouses

• A Data Warehouse is a type of database that is typically geared toward
analysis of historical data.
• This analysis is called OLAP (Online Analytical Processing)
• Requires highly temporal data that might not be required in a normal
database
• A traditional database supports transaction processing
• This is called OLTP (Online Transactional Processing)
• Traditional databases are not equipped for handling major analytics
• Some amount of temporal data is often required in these as well

OLTP vs. OLAP
https://en.wikipedia.org/wiki/Online_transaction_processing
https://en.wikipedia.org/wiki/Online_analytical_processing
Textbook Section 31.1

OLTP: Online transaction processing
• What we’ve been doing this entire course.
• Implemented with relational DBMS

System Design
• Fully normalized
• Built for data entry

Intended Usage
• Support day-to-day activities and operational processing
• CRUD operations

OLAP: Online analytical processing
• Usually querying a purpose-built “data warehouse”
• Often a copy of the data

System Design
• Optimized for read performance
• May or may not be normalized
• Might not support data entry

Intended Usage
• Analytical, read-only queries
• Business intelligence or reporting
• Data mining


Textbook Table 31.1
OLTP vs. OLAP (data warehouses)

Characteristic | OLTP Systems | Data Warehousing Systems
Main purpose | Support operational processing. | Support analytical processing.
Data age | Current, real-time data. | Historic. Sometimes current data also included.
Data latency | Real-time. | Updated on a fixed schedule. Some systems might be real-time.
Data granularity | Detailed data. | Detailed data along with summarized data.
Data processing | Predictable CRUD operations. High throughput of transactions. | Less predictable. Low to medium throughput of transactions.
Reporting | Predictable, one-dimensional, relatively static fixed reporting. | Unpredictable, multidimensional, dynamic reporting.
Users | Serves large number of operational users. | Serves lower number of users. Possibly supports analytical requirements of operational users.

Why?

• In this course, we’ve worked with relatively small data sets.


• ~100 rows in Assignment 2
• One table had 11,203 rows
• <2000 rows in Assignment 3
• <4000 rows in Assignment 4
• A single user (you) was querying the DB.
• In these cases, a data warehouse is overkill.
• But what about real-world databases?


https://data.stackexchange.com/
Why Data Warehouses?

• Consider Stack Overflow

https://meta.stackexchange.com/questions/250396/database-diagram-of-stack-exchange-model/250439

Stack Overflow DB

SELECT
    Posts.Id,
    Users.Id
FROM Posts
JOIN Users
    ON Users.Id = Posts.OwnerUserId

Why Create a Data Warehouse?

• You don’t want to take resources away from OLTP database.


• When extracting data from OLTP database, you can transform it in
various useful ways.
• Clean out bad or invalid data
• Perform calculations
• Standardize data formats
• You can structure the data in a “flatter” way that removes need for
complex JOINs, improves read performance, or is more accessible to
end-users of the OLAP system.
• The StackOverflow DB isn’t bad for this.
From Posts, you can get to any other entity with no more than 2 JOINs.

Why Create a Data Warehouse?

• Cleaning, calculating, and standardizing the data as it leaves the OLTP
database is ETL (Extract, Transform, Load).
• Structuring the data in a “flatter” way is Dimensional Modelling.

Extract, Transform, Load

• A process of migrating data often used in data warehousing


Extract, Transform, Load

• Data Extraction
• Take data from one (or many) source(s).
• Data Transformation
• Modify/transform the data into the format required.
• Almost always requires cleaning of data to:
• Remove bad data/characters
• Make lists of values consistent
• Standardize data formatting
• Data Load
• Load the transformed data into your new system.
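
The three steps above can be sketched as three small functions. This is only a minimal illustration: the raw rows, the `sale` table, and its columns are made up for the example, and a real source would be a file or another database rather than a list.

```python
import sqlite3

# Hypothetical raw rows standing in for data pulled from a source system.
RAW_ROWS = [
    ("  Alice ", "19.99"),
    ("Bob", "5.00"),
    ("", "7.50"),  # bad row: missing customer name
]

def extract():
    """Extract: return rows from the source (here, an in-memory list)."""
    return list(RAW_ROWS)

def transform(rows):
    """Transform: trim whitespace, convert amounts, drop invalid rows."""
    cleaned = []
    for name, amount in rows:
        name = name.strip()
        if not name:
            continue  # remove bad data
        cleaned.append((name, float(amount)))
    return cleaned

def load(conn, rows):
    """Load: insert the cleaned rows into the target table."""
    conn.execute("CREATE TABLE sale (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sale VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
loaded = conn.execute(
    "SELECT customer, amount FROM sale ORDER BY customer"
).fetchall()
```

Keeping the steps separate makes it easy to swap the source or the target without touching the cleaning rules.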


Extract, Transform, Load

• This is your opportunity to clean up the data!
• If your “province” column looks like this,
how can you look at sales per province?
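
One way to make such a column consistent is a lookup from every spelling you find to a single standard code. The sketch below assumes a hypothetical set of messy values; the mapping would have to be built from whatever actually appears in your data.

```python
# Hypothetical lookup from messy spellings to a standard two-letter code.
PROVINCE_CODES = {
    "sk": "SK", "sask": "SK", "saskatchewan": "SK",
    "ab": "AB", "alta": "AB", "alberta": "AB",
}

def standardize_province(raw):
    """Map a free-text province value to a standard code, or None if unknown."""
    key = raw.strip().lower().rstrip(".")
    return PROVINCE_CODES.get(key)

messy = ["SK", "Sask.", " saskatchewan", "Alberta", "AB ", "??"]
cleaned = [standardize_province(value) for value in messy]
```

Values that map to None can be reviewed by hand instead of being silently guessed.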


ETL for Pokemon DB: Extraction

I did this for you in the following steps:


• The data was extracted from web pages
• e.g., List of Pokemon from Wikipedia:
https://en.wikipedia.org/wiki/List_of_generation_I_Pok%C3%A9mon
• I copied and pasted into Excel.
• Removed all the mega-evolutions.
• Removed all the columns that we weren’t
going to import.
• Cleaned up header row


ETL for Pokemon DB: Extraction

• I split all of the merged cells for single-type Pokemon so that there
would be a blank second cell.
• I exported to CSV to eliminate formatting and the links.
• And this left me with something I could safely import into a SQLite DB.

ETL for Pokemon DB: Transform

After importing into SQLite, we had to transform the data.
• Find the numeric TypeID from named type
• Switch from “EvolvesTo” to “EvolvesFrom”

This is also where we had to clean up the data.
• NULLs instead of blank strings

ETL for Pokemon DB: Transform

For other tables, we had to do some extra cleaning up.
• Remove unneeded whitespace
• Make ‘?’ and ‘-’ NULL

I had you do all this in SQL, but you can do it outside the DB as well.
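
As a rough sketch of these cleanup rules, the snippet below runs them from Python against an in-memory SQLite database. The `staging` table, its columns, and the sample values are hypothetical stand-ins for a freshly imported CSV.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical staging table standing in for a freshly imported CSV.
conn.execute("CREATE TABLE staging (name TEXT, secondary_type TEXT, height TEXT)")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?, ?)",
    [("  Bulbasaur ", "Poison", "0.7"), ("Charmander", "", "?"), ("Pikachu", " ", "-")],
)

# TRIM removes surrounding whitespace; NULLIF turns a placeholder into NULL.
conn.execute("""
    UPDATE staging
    SET name = TRIM(name),
        secondary_type = NULLIF(TRIM(secondary_type), ''),
        height = NULLIF(NULLIF(TRIM(height), '?'), '-')
""")
rows = conn.execute(
    "SELECT name, secondary_type, height FROM staging ORDER BY name"
).fetchall()
```

The same rules could be applied in plain Python before the insert; doing it in SQL keeps the cleanup next to the data.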


ETL for Pokemon DB: Load

• And then finally, to load, we had to write an INSERT INTO … SELECT … query.

INSERT INTO Pokemon (
    PokedexNumber, Name, PrimaryTypeID,
    SecondaryTypeID, EvolvesFromNumber
)
SELECT
    PT1.PokemonNumber,
    PT1.Name,
    T1.TypeID AS PrimaryTypeID,
    T2.TypeID AS SecondaryTypeID,
    -- PT1.EvolvesTo is not selected: the target list has only five columns
    PT2.PokemonNumber AS EvolvesFrom
FROM pokemon_temp AS PT1
JOIN Types AS T1
    ON T1.Name = TRIM(PT1.PrimaryType)
LEFT JOIN Types AS T2
    ON T2.Name = TRIM(PT1.SecondaryType)
LEFT JOIN pokemon_temp PT2
    ON PT2.EvolvesTo = PT1.PokemonNumber

ETL for Assignment 4

• For Assignment 4, I had you take unnormalized, extracted data and then
Load it based on your design and the 3NF version of the data.
• To Extract that data, I
• Created my own DB schema for a hypothetical “old” database.
• Generated mock data.
• Wrote a query to create an unnormalized view of the data.

ETL for Assignment 4

SELECT

FROM Thread
JOIN Category
ON Category.CategoryID = Thread.CategoryID
JOIN Post
ON Post.ThreadID = Thread.ThreadID
JOIN User AS PostAuthor
ON Post.AuthorUserID = PostAuthor.UserID
JOIN Rating
ON Rating.PostID = Post.PostID
LEFT JOIN ThreadTag
ON ThreadTag.ThreadID = Thread.ThreadID
-- LEFT JOIN here too, so threads without tags are not filtered out again
LEFT JOIN Tag
ON Tag.TagID = ThreadTag.TagID


ETL for Assignment 4
• And then based on that, you wrote INSERT INTO … SELECT queries to insert
the data.
• Not a lot of transforming required (just making use of default values
and NULLs).
• This is almost the reverse of what you would do for a data warehouse.

INSERT INTO Thread (ThreadID, ThreadName, CreatedOn, CategoryID, AuthorUserID)
SELECT DISTINCT topic_id, topic, topic_date, category_id, user_id
FROM temp
GROUP BY topic_id HAVING MIN(post_id);

INSERT INTO Post (PostID, PostContent, ThreadID, CreatedOn, AuthorUserID)
SELECT DISTINCT post_id, post, topic_id, post_date, user_id
FROM temp;

INSERT INTO Rating (PostID, RatedByUserID, Rating)
SELECT DISTINCT post_id, user_id_rating, rating
FROM temp;

ETL Tools

• There are purpose-built ETL tools available for performing similar
migration tasks.
• You can always create your own scripts, e.g., using your Python skills.

ETL Tools
https://pokemondb.net/type

• For example, the Pokemon TypeChart would require a complicated
transform if we were to do it purely in SQL.
• Using Python would not be so difficult.

ETL for TypeChart
Initial import and result of converting to CSV


ETL for TypeChart
Cleaning up values and column names


ETL for TypeChart
Importing into SQLite DB


ETL for TypeChart

• I want to load the values into this table

CREATE TABLE TypeChart (
    AttackingType TEXT NOT NULL,
    DefendingType TEXT NOT NULL,
    Multiplier NUMERIC,
    PRIMARY KEY (
        AttackingType,
        DefendingType
    )
)

ETL for TypeChart
import sqlite3

conn = sqlite3.connect("typechart.db")
cur = conn.cursor()

def fetch_types():
    # The "Attacking" column of tmp lists every type name once.
    cur.execute("SELECT Attacking FROM tmp")
    results = cur.fetchall()
    return [r[0] for r in results]

def transform():
    types = fetch_types()
    for typeAtk in types:
        for typeDef in types:
            # Assumes tmp has one row per attacking type and one column per
            # defending type, so the row is typeAtk and the column is typeDef.
            cur.execute('SELECT "' + typeDef + '" FROM tmp WHERE Attacking = ?',
                        (typeAtk,))
            multiplier = cur.fetchone()[0]
            cur.execute("""INSERT INTO TypeChart (AttackingType, DefendingType, Multiplier)
                           VALUES (?, ?, ?)""", (typeAtk, typeDef, multiplier))
    # Commit once, after every pair has been inserted.
    conn.commit()

transform()

ETL for TypeChart
The loaded data

• I get 324 rows as expected. (18 types x 18 types)
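
The same wide-to-long transform can also be done before the data ever reaches the database. The sketch below is an illustration only: it uses a three-type excerpt in place of the full 18 x 18 chart and assumes, as above, that each row is an attacking type and each remaining column a defending type.

```python
import csv
import io

# A three-type excerpt standing in for the full chart exported to CSV.
WIDE_CSV = """Attacking,Fire,Water,Grass
Fire,0.5,0.5,2
Water,2,0.5,0.5
Grass,0.5,2,0.5
"""

def unpivot(wide_csv):
    """Turn one row per attacking type into one row per (attack, defend) pair."""
    reader = csv.DictReader(io.StringIO(wide_csv))
    long_rows = []
    for row in reader:
        attacking = row["Attacking"]
        for defending, multiplier in row.items():
            if defending == "Attacking":
                continue  # skip the row-label column
            long_rows.append((attacking, defending, float(multiplier)))
    return long_rows

long_rows = unpivot(WIDE_CSV)
```

With n types the result has n x n rows, which is a quick sanity check on the load (9 here, 324 for the full chart).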


Textbook Section 32.4
Dimensional Modelling

• You can structure the data in a “flatter” way that removes need for
complex JOINs or is more accessible to end-users of the OLAP system
• Every dimensional model is composed of
• One table with many foreign keys making up a composite PK, called the fact
table.
• Many tables, each with a simple surrogate PK, called dimension tables.
• This forms a “star-like” structure, and so is called a star schema.
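
A tiny star schema might look like the sketch below. The table names, columns, and rows are invented for illustration and are not the textbook's or Chinook's actual design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: one simple surrogate key each.
    CREATE TABLE dim_genre   (genre_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_country (country_id INTEGER PRIMARY KEY, name TEXT);

    -- Fact table: the foreign keys together form the composite PK.
    CREATE TABLE fact_sales (
        genre_id   INTEGER REFERENCES dim_genre (genre_id),
        country_id INTEGER REFERENCES dim_country (country_id),
        quantity   INTEGER,
        PRIMARY KEY (genre_id, country_id)
    );

    INSERT INTO dim_genre VALUES (1, 'Rock'), (2, 'Jazz');
    INSERT INTO dim_country VALUES (1, 'Canada'), (2, 'USA');
    INSERT INTO fact_sales VALUES (1, 1, 10), (1, 2, 25), (2, 1, 4);
""")

# Any single dimension is exactly one JOIN away from the fact table.
by_genre = conn.execute("""
    SELECT g.name, SUM(f.quantity)
    FROM fact_sales AS f
    JOIN dim_genre AS g ON g.genre_id = f.genre_id
    GROUP BY g.name
    ORDER BY g.name
""").fetchall()
```

Swapping `dim_genre` for `dim_country` in the query gives the per-country totals with the same single JOIN.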


“DreamHome” database logical design from
the textbook (Figure 17.9)


Star schema (dimensional
model) for property sales
(Textbook Figure 32.2)

• Unnormalized
• One join to get any
dimension we might be
interested in


Chinook Database
Logical DB Diagram


Chinook Database
Star Schema for Track Sales


Textbook Section 33.1
OLAP Analyses

• Regardless of whether a data warehouse is used, “OLAP” refers to
specific types of analytical queries.
• “The dynamic synthesis, analysis, and consolidation of large volumes of
multidimensional data.” - Textbook, Section 33.1
• These queries involve multi-dimensional views of aggregate data.
• Not just basic navigation and browsing (“slicing and dicing”), but more
complex analyses involving time or multiple dimensions.
• E.g., “how many tracks were sold of each genre, by region, since 2009.”

OLAP Cubes

• Traditionally, in a 2D table you would have two axes, and then values.
• OLAP cubes extend this to many more dimensions

• This is best demonstrated through a pivot table.


• For demonstration purposes, I’m just going to work with a table of unnormalized
data.


Pivot Table Data
Chinook Database

CREATE VIEW "Sales" AS
SELECT
II."UnitPrice" AS "UnitPrice",
II."Quantity" AS "Quantity",
II."UnitPrice" * II."Quantity" AS "ExtendedPrice",
G."Name" AS "GenreName",
MT."Name" AS "MediaType",
AL."Title" AS "AlbumTitle",
AR."Name" AS "ArtistName",
C."City", C."State", C."Country",
"InvoiceDate"
FROM "InvoiceLine" AS II
JOIN "Invoice" AS I ON I."InvoiceId" = II."InvoiceId"
JOIN "Track" AS T ON T."TrackId" = II."TrackId"
JOIN "Album" AS AL ON AL."AlbumId" = T."AlbumId"
JOIN "Artist" AS AR ON AR."ArtistId" = AL."ArtistId"
JOIN "Genre" AS G ON G."GenreId" = T."GenreId"
JOIN "MediaType" AS MT ON MT."MediaTypeId" = T."MediaTypeId"
JOIN "Customer" AS C ON C."CustomerId" = I."CustomerId"


Pivot Table Data


Pivot Table


Accomplishing this with SQL

• Goal: Write a query that performs the same calculations and
groupings as in the example.
• Using:
“Year” for the columns dimension,
“Country” for the rows dimension,
“Quantity” for the measure
• We can do this with multiple GROUP BYs and UNIONs

Accomplishing this with SQL

SELECT
"Country",
date_part('year', "InvoiceDate") AS "Year",
SUM("Quantity")
FROM "Sales"
GROUP BY "Country", "Year"

We still need:
• Subtotals for country, year
• Grand total

Accomplishing this with SQL
Adding subtotals and grand total

-- Country, Year combination
SELECT
    "Country",
    date_part('year', "InvoiceDate") AS "Year",
    SUM("Quantity")
FROM "Sales"
GROUP BY "Country", "Year"

UNION

-- Sub-total for each Country
SELECT
    "Country",
    NULL,
    SUM("Quantity")
FROM "Sales"
GROUP BY "Country"

UNION

-- Sub-total for each Year
SELECT
    NULL,
    date_part('year', "InvoiceDate") AS "Year",
    SUM("Quantity")
FROM "Sales"
GROUP BY "Year"

UNION

-- Grand total
SELECT
    NULL,
    NULL,
    SUM("Quantity")
FROM "Sales"

GROUPING SETS (SQL-99)

• That was tedious.
• Wouldn’t it be nice if we could do that in one query?
• Just use grouping sets!
• In general…

SELECT "column1", "column2"
FROM "table"
GROUP BY
GROUPING SETS (
    ("column1"),
    ("column1", "column2")
)

Is shorthand for…

SELECT "column1", "column2"
FROM "table"
GROUP BY "column1", "column2"

UNION

SELECT "column1", NULL
FROM "table"
GROUP BY "column1"

GROUPING SETS (SQL-99)

• So we can write a much simpler query…

SELECT
"Country",
date_part('year', "InvoiceDate") AS "Year",
SUM("Quantity")
FROM "Sales"
GROUP BY GROUPING SETS (
(),
("Country"),
("Year"),
("Country", "Year")
)
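
To see what the database does with those four grouping sets, here is a small plain-Python imitation. The sales rows are made up, and the function is a teaching sketch rather than how any DBMS implements GROUPING SETS.

```python
from collections import defaultdict

# Hypothetical sales rows: (country, year, quantity).
SALES = [
    ("Canada", 2009, 3), ("Canada", 2010, 5),
    ("USA", 2009, 7), ("USA", 2010, 2),
]
COLUMNS = ("country", "year")

def grouping_sets(rows, sets):
    """Sum quantity once per grouping set; ungrouped columns become None."""
    result = []
    for group_cols in sets:
        totals = defaultdict(int)
        for country, year, quantity in rows:
            record = {"country": country, "year": year}
            key = tuple(record[c] if c in group_cols else None for c in COLUMNS)
            totals[key] += quantity
        result.extend((*key, total) for key, total in totals.items())
    return result

report = grouping_sets(SALES, [(), ("country",), ("year",), ("country", "year")])
```

Each grouping set contributes its own block of rows, and the None values play the role of the NULLs in the UNION version.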

CUBE

• This type of OLAP Cube analysis is so common that there’s a CUBE
shorthand for GROUPING SETS

SELECT
    "Country",
    date_part('year', "InvoiceDate") AS "Year",
    SUM("Quantity")
FROM "Sales"
GROUP BY GROUPING SETS (
    (),
    ("Country"),
    ("Year"),
    ("Country", "Year")
)

can be written as

SELECT
    "Country",
    date_part('year', "InvoiceDate") AS "Year",
    SUM("Quantity")
FROM "Sales"
GROUP BY CUBE ("Country", "Year")

CUBE
https://www.postgresql.org/docs/10/queries-table-expressions.html#QUERIES-GROUPING-SETS

• In general…

CUBE ( a, b, c )

Is equivalent to

GROUPING SETS (
    ( a, b, c ),
    ( a, b ),
    ( a, c ),
    ( a ),
    ( b, c ),
    ( b ),
    ( c ),
    ( )
)

ROLLUP
https://www.postgresql.org/docs/10/queries-table-expressions.html#QUERIES-GROUPING-SETS

• Another shorthand notation for GROUPING SETS is ROLLUP

ROLLUP ( e1, e2, e3, ... )

Is equivalent to

GROUPING SETS (
    ( e1, e2, e3, ... ),
    ...
    ( e1, e2 ),
    ( e1 ),
    ( )
)
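
The two expansions are easy to generate yourself, which can help when checking how many result groups to expect. In this sketch the function names `cube` and `rollup` are just illustrative helpers: CUBE yields every subset of the columns, ROLLUP yields every prefix.

```python
from itertools import combinations

def cube(columns):
    """All subsets of the columns, largest first: what CUBE expands to."""
    return [
        subset
        for size in range(len(columns), -1, -1)
        for subset in combinations(columns, size)
    ]

def rollup(columns):
    """Every prefix of the columns, longest first: what ROLLUP expands to."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]

cube_sets = cube(("a", "b", "c"))
rollup_sets = rollup(("a", "b", "c"))
```

For n columns, CUBE produces 2^n grouping sets while ROLLUP produces only n + 1, so ROLLUP is the cheaper choice when the columns form a hierarchy.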


Window Functions (SQL:2003)

• Window functions are another OLAP-related feature in SQL.
• Window functions allow you to show the results of calculations over a
set of related rows at the row level, without collapsing those rows the
way GROUP BY does.

Window Functions (SQL:2003)
http://www.postgresqltutorial.com/postgresql-window-function/

• Window functions can be categorized into:
• Value window functions
• FIRST_VALUE, LAST_VALUE
• LAG, LEAD
• Ranking window functions
• ROW_NUMBER, RANK, DENSE_RANK
• You can also use the aggregate functions as window functions
• AVG, COUNT, MAX, MIN, SUM
• They use the syntax… (PARTITION BY is optional)
window_function(arg1, arg2,..) OVER (
    PARTITION BY expression
    ORDER BY expression
)
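
As a quick illustration of this syntax, the sketch below uses SUM as a window function to get a running total per country. The `sales` table and its rows are made up, and the example assumes the SQLite bundled with your Python is version 3.25 or newer, which is when SQLite gained window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (country TEXT, year INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Canada", 2009, 30), ("Canada", 2010, 50), ("USA", 2009, 70), ("USA", 2010, 20)],
)

# Every input row stays in the output; the window adds a per-country running total.
running = conn.execute("""
    SELECT country, year, amount,
           SUM(amount) OVER (PARTITION BY country ORDER BY year) AS running_total
    FROM sales
    ORDER BY country, year
""").fetchall()
```

Note that four rows go in and four rows come out, unlike a GROUP BY on country, which would return two.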


Window Functions (SQL:2003)
http://www.postgresqltutorial.com/postgresql-window-function/
http://www.sqlitetutorial.net/sqlite-window-functions/

Name | Description
DENSE_RANK | Compute the rank for a row in an ordered set of rows with no gaps in rank values.
RANK | Assign a rank to each row within the partition of the result set.
FIRST_VALUE | Get the value of the first row in a specified window frame.
LAG | Provide access to a row at a given physical offset that comes before the current row.
LAST_VALUE | Get the value of the last row in a specified window frame.
LEAD | Provide access to a row at a given physical offset that follows the current row.
ROW_NUMBER | Assign a sequential integer starting from one to each row within the current partition.
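
The difference between the three ranking functions only shows up when there are ties. The sketch below uses a made-up `score` table with one tie, again assuming a SQLite build with window function support (3.25 or newer).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE score (player TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO score VALUES (?, ?)",
    [("Ann", 90), ("Ben", 90), ("Cat", 80), ("Dan", 70)],
)

# Ann and Ben tie on points, so RANK skips a value and DENSE_RANK does not.
ranks = conn.execute("""
    SELECT player,
           ROW_NUMBER() OVER (ORDER BY points DESC, player) AS rn,
           RANK()       OVER (ORDER BY points DESC)         AS rnk,
           DENSE_RANK() OVER (ORDER BY points DESC)         AS dense
    FROM score
    ORDER BY points DESC, player
""").fetchall()
```

ROW_NUMBER needs the extra `player` tiebreaker here to give a repeatable order; without it, the numbering of tied rows is up to the database.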


Example
Top Ranked Country by Total Sales

• It’s trivial to create a query to list the countries in order by sales


SELECT
"Country",
SUM("ExtendedPrice") AS "Sales"
FROM "Sales"
GROUP BY "Country"
ORDER BY "Sales" DESC

• But what if we wanted to rank them?


• USA #1
• Canada #2
• etc.


Example
Top Ranked Country by Total Sales

• Use the RANK() window function.


SELECT
"Country",
SUM("ExtendedPrice") AS "Sales",
RANK() OVER (ORDER BY SUM("ExtendedPrice") DESC)
FROM "Sales"
GROUP BY "Country"
ORDER BY "Sales" DESC


Example
Country rank by year

• Let’s say our query was a bit more complicated.


• Each country’s sales by year
SELECT
"Country",
SUM("ExtendedPrice") AS "Sales",
DATE_PART('year', "InvoiceDate") AS "Year"
FROM "Sales"
GROUP BY "Country", "Year"
ORDER BY "Sales" DESC

• We now want to know the country’s rank over each year


Example
Country rank by year

• Now we need PARTITION BY


SELECT
"Country",
SUM("ExtendedPrice") AS "Sales",
DATE_PART('year', "InvoiceDate")
AS "Year",
RANK() OVER (
PARTITION BY
DATE_PART('year', "InvoiceDate")
ORDER BY SUM("ExtendedPrice") DESC
)
FROM "Sales"
GROUP BY "Country", "Year"
ORDER BY "Sales" DESC
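
If you want to try the same idea without a PostgreSQL server, a rough SQLite equivalent is sketched below. The `sale` table and its rows are hypothetical, and the aggregation is done in a subquery first so that the outer query only has to rank the per-year totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (country TEXT, year INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?)",
    [
        ("Canada", 2009, 30), ("Canada", 2009, 10), ("USA", 2009, 70),
        ("Canada", 2010, 50), ("USA", 2010, 20),
    ],
)

# Inner query: total sales per country and year.
# Outer query: rank the countries separately within each year.
yearly_ranks = conn.execute("""
    SELECT country, year, sales,
           RANK() OVER (PARTITION BY year ORDER BY sales DESC) AS year_rank
    FROM (
        SELECT country, year, SUM(amount) AS sales
        FROM sale
        GROUP BY country, year
    )
    ORDER BY year, year_rank
""").fetchall()
```

The ranking restarts at 1 for every year because the year is the partition.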


Example
Rank by country, year, subtotals

• We can combine the CUBE with RANK to get even more…


SELECT
"Country",
SUM("ExtendedPrice") AS "Sales",
DATE_PART('year', "InvoiceDate")
AS "Year",
RANK() OVER (
PARTITION BY
DATE_PART('year', "InvoiceDate")
ORDER BY SUM("ExtendedPrice") DESC
)
FROM "Sales"
GROUP BY CUBE("Country", "Year")
ORDER BY "Sales" DESC


Example: Distance from X,Y,Z
From Assignment 2’s database

• In Assignment 2, there was a “movement” table

• I am interested in the total distance travelled for each logNavigationID
• From one row, we need to access the row after (or before)
• Calculate the distance from one row to another
• Then calculate the total over the logNavigationID


Example: Distance from X,Y,Z
From Assignment 2’s database

• Let’s start with accessing the adjacent row…

SELECT
movement.lognavigationid,
movement.posx,
movement.posy,
movement.posz
FROM movement

I’ll use the LAG function to get the data from the previous row…

LAG (expression [, offset]) OVER (PARTITION BY ___ ORDER BY ___)


Example: Distance from X,Y,Z
From Assignment 2’s database

SELECT
movement.lognavigationid,
movement.posx,
LAG(movement.posx, 1) OVER (
PARTITION BY lognavigationid ORDER BY logmovementid ASC) AS prev_posx,
movement.posy,
LAG(movement.posy, 1) OVER (
PARTITION BY lognavigationid ORDER BY logmovementid ASC) AS prev_posy,
movement.posz,
LAG(movement.posz, 1) OVER (
PARTITION BY lognavigationid ORDER BY logmovementid ASC) AS prev_posz
FROM movement


Example: Distance from X,Y,Z
From Assignment 2’s database

SELECT
lognavigationid,
SQRT((m.posx-m.prev_posx)^2 + (m.posy-m.prev_posy)^2 + (m.posz-m.prev_posz)^2) AS distance
FROM
(
SELECT
movement.logmovementid,
movement.lognavigationid,
movement.posx,
LAG(movement.posx, 1)
OVER (PARTITION BY lognavigationid ORDER BY logmovementid ASC) AS prev_posx,
movement.posy,
LAG(movement.posy, 1)
OVER (PARTITION BY lognavigationid ORDER BY logmovementid ASC) AS prev_posy,
movement.posz,
LAG(movement.posz, 1)
OVER (PARTITION BY lognavigationid ORDER BY logmovementid ASC) AS prev_posz
FROM movement
) AS m


Example: Distance from X,Y,Z

Now calculate SUM for each logNavigationID

SELECT
lognavigationid,
SUM(SQRT(
(m.posx-m.prev_posx)^2
+ (m.posy-m.prev_posy)^2
+ (m.posz-m.prev_posz)^2)) AS distance
FROM
(

) AS m
GROUP BY lognavigationid
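
The whole calculation can be checked on a few hand-made points. The sketch below is only an approximation of the query above: the `movement` rows are invented, LAG runs in SQLite (3.25 or newer), and the square root and sum are done in Python with `math.dist` (Python 3.8+) instead of in SQL.

```python
import math
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE movement (
    logmovementid INTEGER PRIMARY KEY, lognavigationid INTEGER,
    posx REAL, posy REAL, posz REAL)""")
conn.executemany("INSERT INTO movement VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 0, 0, 0), (2, 1, 3, 4, 0), (3, 1, 3, 4, 12),
    (4, 2, 1, 1, 1), (5, 2, 1, 1, 3),
])

# LAG fetches the previous position within the same lognavigationid.
rows = conn.execute("""
    SELECT lognavigationid, posx, posy, posz,
           LAG(posx) OVER w, LAG(posy) OVER w, LAG(posz) OVER w
    FROM movement
    WINDOW w AS (PARTITION BY lognavigationid ORDER BY logmovementid)
""").fetchall()

totals = defaultdict(float)
for nav_id, x, y, z, px, py, pz in rows:
    if px is None:
        continue  # first row of each partition has no previous position
    totals[nav_id] += math.dist((x, y, z), (px, py, pz))
```

Skipping the rows where the previous position is None mirrors what SUM does in SQL, where the NULL distance of each first row is simply ignored.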


Example: Distance from X,Y,Z

What I was really interested in was between-group conditions

SELECT
    P."condition",
    SUM(SQRT(
        (m.posx-m.prev_posx)^2
        + (m.posy-m.prev_posy)^2
        + (m.posz-m.prev_posz)^2)) AS distance
FROM
(
    ...
) AS M
JOIN log_navigation LN ON LN.lognavigationid = M.lognavigationid
JOIN participant P ON P.participantid = LN.participantid
GROUP BY P."condition"

Group 0 travelled a greater distance
(more error in completing the navigation task)

Summary

• OLTP is what we’ve been doing in this course.
• OLAP is all about analytical, read-only queries
• OLAP can be supported by the creation of a dedicated data warehouse.
• Data warehouses are populated using ETL: Extract, Transform, Load
• The structure of a data warehouse can be modelled by a “Star Schema,” a
single fact table with many dimension tables
• In SQL, OLAP is facilitated by two features
• GROUPING SETS
• Window functions