LookML Foundations Training

The document provides an overview of LookML concepts including: - LookML informs Looker's abstraction of SQL to define modeling layers, views, dimensions, measures, explores, and projects. - Dimensions are always in the GROUP BY and measures are always aggregated. Views correspond to tables or derived tables joined in explores. - The document outlines how to create a new LookML project, set up Git version control, and switch between development and production modes. - It describes how Looker writes SQL based on dimensions, measures, filters, and joins defined in LookML and how to reference objects between files. Dimension types like string, number, yesno and tier are also covered.

LookML Foundations Training
Looker Hosted Webinar
Agenda

1. Introduction to Looker

2. Creating Dimensions and Measures

3. Model Files

4. Caching and Datagroups

5. Derived Tables

6. Best Practices
Introduction to Looker

Defining Terms

LookML

● Informs Looker in abstracting SQL
● Creates the modeling layer between the database and the user
● Defines:
○ Join logic between tables (Views)
○ Custom tables (Derived Tables) defined by Looker
○ Fields taken directly from the database
○ Custom fields defined in Looker

Dimensions & Measures

● Dimensions
○ Always in the GROUP BY part of the query
○ Automatically created for all fields in a table when Looker generates a view from it
● Measures
○ Always part of an aggregate function, or
○ Defined as a function of fields that have already been aggregated

View

● View files correspond to either
○ Tables in the database (Standard View Files),
○ Virtual tables defined in Looker (Derived Tables), or
○ Tables defined in Looker that are physically written (materialized view) to the database (Persistent Derived Tables or PDTs)
● Explores are made with one or more view files joined together
● Views become headers in Explores
● Dimensions and Measures are defined in View files

Explore

● Explores are the starting point for analysis
● Should be clearly organized around business themes (e.g., users, orders, inventory) to minimize confusion for the end user

Model

● Contains data connection information and Explore definitions
● Can be used for the following purposes:
○ Restrict user access to certain Explores
○ Separate and organize Explores by business area

Project

● Highest level Looker object
● An almost independent Looker instance
● Usually used for completely different data sources
● View files cannot be shared between different projects (unless a project import is used)

Creating a New Project

LookML Projects

● Modeling in Looker is based around LookML projects. Typically a project consists of the following files:
○ One or more model files that define the project’s Explore options and joins
○ Multiple view files, each corresponding to a database table or derived table
○ One or more dashboard files which define the data and layouts for dashboards, if you choose to use LookML Dashboards in addition to User Defined Dashboards

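
As a rough illustration of how these files fit together, a minimal model file might look like the sketch below. The file name, connection name, folder layout, and the `orders` and `users` views are assumptions made for the example, not part of the training material.

```lookml
# ecommerce.model.lkml (hypothetical file name)
connection: "my_warehouse"        # assumed name of a connection configured in Looker

include: "/views/*.view.lkml"     # assumed folder layout; pulls in the view files

# The Explore and its joins are defined in the model file;
# the dimensions and measures live in the included view files.
explore: orders {
  join: users {
    sql_on: ${orders.user_id} = ${users.id} ;;
    relationship: many_to_one
  }
}
```
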
Creating a New LookML Project

From the “Manage Project” screen, click on the button at the bottom of the page.

1. Project Name
2. Click to run the Looker model generator to create a Model file and default Explores and view files
3. Specify the database connection
4. Specify All Tables or just a Single Table from the database
5. Underlying Data Schema
6. Prefixes to ignore (for example, “_” to ignore underscores)

Setting Up Git

Configure Git

● Hit the “Configure Git” button
● Enter the URL for your Git repository
○ GitHub: git@github.com:myorganization/myproject.git
○ GitHub Enterprise: [email protected]:myorganization/myproject.git
○ GitLab: git@gitlab.com:myorganization/myproject.git
○ Bitbucket: git@bitbucket.org:myorganization/myproject.git
○ Phabricator: ssh://[email protected]/diffusion/MYCALLSIGN/myproject.git

Project

● If Looker does not successfully detect your Git provider, it will ask you to choose from a dropdown.
● Copy and paste the generated SSH key as a deploy key in your Git host. In GitHub, remember to check the Allow Write Access checkbox.
● Click Continue Setup
● Commit and push to make your code show up on your new repo!

Looker Development Environment

Development Mode vs. Production Mode

A Looker data model can exist in two states: production mode and development mode.

PRODUCTION MODE: Users typically explore data in Looker in production mode. The data model is shared across all users, and the LookML files are treated as read-only.

DEVELOPMENT MODE: Developers must be in development mode when making a change to the LookML. This mode accesses a completely separate version of the data model that only the developers can see and edit. (In Git terms, development mode is handled by a separate branch.)

Development mode allows developers to make and test LookML changes without affecting other users.

Switching In and Out of Development Mode

Developers can switch development mode on and off by clicking the Development Mode
ON/OFF button within the Develop Menu in Looker. There is also a keyboard shortcut
Ctrl+Shift+D.

While in development mode:
● LookML and Explore menus will be populated by the development version of the model
● There will be a purple development mode bar at the top of the browser window

Git Integration for Version Control

Looker’s IDE is integrated with Git for version control. When LookML changes are ready
to be pushed to Production, the Git menu can be used to Commit and Deploy.

Looker automatically manages the Git workflows for committing, pulling, and pushing
changes.
How Looker Writes SQL

● Dimensions and measures in the base view. Note:
○ Dimensions are in the GROUP BY
○ Measures are in aggregating functions
● Dimensions and measures in a joined view. Note:
○ Base view is still joined in
○ Unnecessary Views are not joined in
● Filters on Dimensions and Measures
○ Dimension filter in the WHERE clause
○ Measure filter in the HAVING clause
● Row limit in the LIMIT
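
The mapping above can be sketched with a small view and the approximate SQL Looker would write for it. The view, table, and field names are hypothetical, and the exact SQL text varies by database dialect, so the commented query is only an approximation of the shape.

```lookml
view: order_items {
  sql_table_name: public.order_items ;;   # assumed table name

  dimension: status {
    type: string
    sql: ${TABLE}.status ;;
  }

  measure: total_sale_price {
    type: sum
    sql: ${TABLE}.sale_price ;;
  }
}

# Selecting Status and Total Sale Price, filtering Status to "Complete" and
# Total Sale Price to "> 100", with a row limit of 500, yields SQL roughly like:
#
#   SELECT order_items.status AS "order_items.status",        -- dimension
#          SUM(order_items.sale_price) AS "order_items.total_sale_price"  -- measure
#   FROM public.order_items AS order_items
#   WHERE order_items.status = 'Complete'                      -- dimension filter
#   GROUP BY 1                                                 -- dimension
#   HAVING SUM(order_items.sale_price) > 100                   -- measure filter
#   LIMIT 500                                                  -- row limit
```
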

Creating Dimensions & Measures

Referencing Objects

Referencing a database object in Looker:
● ${TABLE} references the table defined in the View
● Looker automatically creates dimensions for every field in the view

Referencing another Looker object:
● ${field_name} references the Looker object

dimension: sale_price {
  type: number
  sql: ${TABLE}.sale_price ;;
}

measure: total_revenue {
  type: sum
  sql: ${sale_price} ;;
}

measure: average_sale_price {
  type: average
  sql: ${sale_price} ;;
}

Dimension Types

● Dimension type string:
○ Used for text dimensions
○ Default type
● Dimension type number:
○ Used for numeric dimensions
○ Commonly used with date difference calculations or basic row-level math across fields

dimension: full_name {
  type: string
  sql: ${first_name} || ${last_name} ;;
}

dimension: days_since_signup {
  type: number
  sql: DATEDIFF(day, ${created_date}, current_date) ;;
}

● Dimension type yesno:
○ Define logical condition in sql parameter
○ Field becomes either yes or no
● Dimension type tier:
○ Buckets a dimension using case statements
○ Style: classic, interval, integer, and relational

dimension: is_new_customer {
  type: yesno
  sql: ${days_since_signup} <= 90 ;;
}

dimension: days_since_signup_tier {
  type: tier
  tiers: [0, 30, 90, 180, 360, 720]
  sql: ${days_since_signup} ;;
  style: integer
}

Time Dimension Groups

● Looker can cast a date or timestamp into several different forms of time
● The timeframes parameter is used to specify the specific date and time parts that are required
● Number of dimension fields created is dependent upon the number of timeframes listed
● Fields can be referenced by appending the date or time part desired to the name of the dimension group with an underscore

dimension_group: created {
  type: time
  timeframes: [raw, time, date, hour, hour_of_day, day_of_week,
    day_of_week_index, time_of_day, week, month_num, month, year,
    quarter, quarter_of_year]
  sql: ${TABLE}.created_at ;;
}

These fields are then referenced as: ${created_time}, ${created_date}, ${created_hour_of_day}, etc.

Measure Types

● Three most common Measure types:
○ Sum
○ Average
○ Count
● There are two types of counts:
○ type: count (Counts rows in that table and does NOT require a SQL parameter)
○ type: count_distinct (Computes a distinct count of the field in the SQL parameter)

dimension: cost {
  type: number
  sql: ${TABLE}.cost ;;
}

measure: total_cost {
  type: sum
  sql: ${cost} ;;
}

measure: average_cost {
  type: average
  sql: ${cost} ;;
}

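
The two count types are not shown in the example above, so here is a short sketch of each. The `user_id` dimension is an assumption and would need to exist in the same view.

```lookml
measure: count {
  type: count              # counts rows in this view; no sql parameter needed
}

measure: count_distinct_users {
  type: count_distinct     # distinct count of the expression in sql
  sql: ${user_id} ;;       # assumes a user_id dimension is defined in this view
}
```
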
Filtered Measures

● YesNo dimensions can be used as filters
● Looker transfers the logic of the yesno dimension into the case statement that produces the measure
● Other dimension types can also be used as filters

measure: count_female_users {
  type: count
  filters: {
    field: gender
    value: "Female"
  }
}

measure: total_sales_new_users {
  type: sum
  sql: ${sale_price} ;;
  filters: {
    field: users.is_new_user
    value: "Yes"
  }
}

Measures Defining Other Measures

● Measures can be used within other measures for more complex calculations
● Measures that use other measures in their SQL definitions should have a type of number (nested aggregations will cause errors)
● The value_format_name parameter can be utilized to format the final output

measure: percentage_female_users {
  type: number
  value_format_name: percent_1
  sql: 1.0*${count_female_users}/NULLIF(${count}, 0) ;;
}

Referencing Fields in Other View Files

● Fields in another view can be referenced if the two views have been joined together within an Explore
● Fields that reference other views will only be valid within an Explore in which joins are defined for both required views
● If a field is selected in the UI that references multiple views, Looker needs to know how to execute the joins necessary for completing the calculation

dimension: profit {
  type: number
  value_format_name: usd
  sql: ${sale_price} - ${inventory_items.cost} ;;
}

Helpful Field Parameters

● hidden : hides a field from the user interface while still allowing it to be available for
modeling (great for fields like primary keys that are not meaningful to users)

● label : changes how a field name will appear in the Field Picker

● description : displays additional information about a field to users upon hovering

● value_format_name : formats Looker cells using built-in or custom format names

● drill_fields : controls what fields are shown to a user when they click on
the value of a table cell to “drill” into the data while Exploring

● group_label : combines fields into custom groups within a view in the Field Picker
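
A brief sketch of these parameters in use follows. The field names, labels, and descriptions are made up for illustration, and the measure assumes a `sale_price` dimension exists in the same view.

```lookml
dimension: id {
  primary_key: yes
  hidden: yes                        # available for joins and modeling, not shown to users
  type: number
  sql: ${TABLE}.id ;;
}

measure: total_revenue {
  label: "Total Revenue (USD)"       # name shown in the Field Picker
  description: "Sum of sale price across all order items."
  group_label: "Revenue Metrics"     # groups this field with related ones
  type: sum
  sql: ${sale_price} ;;
  value_format_name: usd             # built-in format name
  drill_fields: [id, sale_price]     # fields shown when a user drills into a value
}
```
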

Working with Model Files

Explores and Join Logic

Building Explores

Key Join Parameters

1. join : the name of the join, which is typically the name of the view being joined
2. type : the type of join that should occur (left_outer join by default)
3. sql_on : the fields that should be used within the ON clause of the SQL query in order to join the two tables together
4. relationship : how the two tables relate to each other

explore: inventory_items {
  join: products {
    type: left_outer
    sql_on: ${inventory_items.product_id} = ${products.id} ;;
    relationship: many_to_one
  }
}

Types of Joins
1. The name of the Explore.
2. Base View: The one View that is always joined in.
3. Standard join
4. Joins renaming the view such that the same view can be joined twice
5. Indirect join

Note: Defining a primary key in each View and properly defining the relationship parameter is very important.
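
The five numbered items can be sketched in a single Explore. All view and field names below are hypothetical and chosen only to show where each item appears.

```lookml
explore: order_items {                 # 1. Explore name; 2. order_items is the base view
  join: orders {                       # 3. standard join to the base view
    sql_on: ${order_items.order_id} = ${orders.id} ;;
    relationship: many_to_one
  }

  join: billing_address {              # 4. same view joined twice under different names
    from: addresses
    sql_on: ${orders.billing_address_id} = ${billing_address.id} ;;
    relationship: many_to_one
  }

  join: shipping_address {             # 4. second alias of the addresses view
    from: addresses
    sql_on: ${orders.shipping_address_id} = ${shipping_address.id} ;;
    relationship: many_to_one
  }

  join: users {                        # 5. indirect join: reached through orders, not the base view
    sql_on: ${orders.user_id} = ${users.id} ;;
    relationship: many_to_one
  }
}
```
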
Building SQL from Explores

Typical SQL statement:

SELECT column_a, column_b, …, column_n
FROM table_a
JOIN table_b ON table_a.field_1 = table_b.field_1
JOIN table_c ON table_a.field_2 = table_c.field_2
WHERE some condition equals some value

SELECT = User chosen dimensions and measures
FROM clause plus any JOINs = Explore
WHERE clause = User-added filters

Translating SQL to an Explore

SELECT
  flights.destination AS "flights.destination",
  carriers.name AS "carriers.name",
  aircraft.name AS "aircraft.name",
  aircraft_origin.city AS "aircraft_origin.city",
  COUNT(*) AS "flights.1_count"
FROM flights AS flights
LEFT JOIN public.carriers AS carriers
  ON flights.carrier = carriers.code
LEFT JOIN public.aircraft AS aircraft
  ON flights.tail_num = aircraft.tail_num
LEFT JOIN public.airports AS aircraft_origin
  ON flights.origin = aircraft_origin.code
WHERE (flights.cancelled = 'N') AND (aircraft_origin.state = 'CA')
GROUP BY 1,2,3,4

explore: flights {
  join: carriers {
    sql_on: ${flights.carrier} = ${carriers.code} ;;
    relationship: many_to_one
  }

  join: aircraft {
    sql_on: ${flights.tail_num} = ${aircraft.tail_num} ;;
    relationship: many_to_one
  }

  join: aircraft_origin {
    from: airports
    sql_on: ${flights.origin} = ${aircraft_origin.code} ;;
    relationship: many_to_one
    fields: [full_name, city, state, code]
  }
}

Helpful Explore Parameters

● label : changes how an Explore name will appear

● description : displays additional information about an Explore upon hovering over the information icon within the Explore dropdown menu

● view_label : changes the label of the view within the field picker in the Explore

● group_label : combines Explores into custom groups within the Explore dropdown menu

● fields : limits the scope of fields that are available within an Explore or view
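
A short sketch of these Explore parameters in use follows. The labels and the `order_items.internal_notes` field are invented for the example.

```lookml
explore: order_items {
  label: "Orders and Items"            # name shown in the Explore menu
  description: "Line-item level order data for sales analysis."
  group_label: "E-Commerce"            # heading this Explore is grouped under
  view_label: "Order Items"            # heading for the base view in the field picker

  # Expose everything except one hypothetical internal field.
  fields: [ALL_FIELDS*, -order_items.internal_notes]

  join: users {
    view_label: "Customers"            # relabel the joined view for end users
    sql_on: ${order_items.user_id} = ${users.id} ;;
    relationship: many_to_one
  }
}
```
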
Symmetric Aggregation

What is the fanout problem?

Consider two tables individually, a customer table (with a visits column) and an order table (with an amount column):
● count(*) produces a count of customers and orders, and
● sum(visits) and sum(amount) give total visits and total revenue

Consider joining these two tables on customer_id, like we would in the following Explore:
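
A minimal sketch of such an Explore is shown here. The view names `customers` and `orders` are assumptions; only the `customer_id` join key comes from the slide.

```lookml
explore: customers {
  join: orders {
    type: left_outer
    sql_on: ${customers.customer_id} = ${orders.customer_id} ;;
    relationship: one_to_many   # one customer can have many orders
  }
}
```
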

This join “fans out” the customer table:
● count(*) and sum(amount) still work on the order-table fields, but
● count(*) and sum(visits) produce incorrect results on the customer-table fields
● Looker can handle this situation with Symmetric Aggregates!

Using Symmetric Aggregates

To leverage Symmetric Aggregation, two things MUST be done:

1. Specify primary keys in the view files. (This means a field that uniquely identifies
each row. If none exists, we can make one by concatenating fields together.)
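
Both cases can be sketched as follows. The view and column names are hypothetical, and the `||` concatenation operator depends on the database dialect.

```lookml
view: customers {
  dimension: customer_id {
    primary_key: yes                   # a single column uniquely identifies each row
    type: number
    sql: ${TABLE}.customer_id ;;
  }
}

view: order_lines {
  dimension: pk {
    primary_key: yes                   # no single unique column, so build one
    hidden: yes
    type: string
    sql: ${TABLE}.order_id || '-' || ${TABLE}.line_number ;;
  }
}
```
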

2. Correctly specify the “relationship” parameter. The possible values are one_to_one, one_to_many, many_to_one, or many_to_many.

Left side: the view joined from (other view used in “sql_on:”)
Right side: the view joined to (the name next to “join:”)

Identifying the Correct Join Relationship

Test each possible relationship in a sentence. For example:

✖ one customer can relate to only one order
✔ one customer can relate to many orders
✖ many customers can relate to only one order
✖ many customers can relate to many (possibly same) orders

Results of Missing Primary Keys

If we do not specify a primary key in the view files:
1. Measures in joined-in Views don’t come through to the Explore
2. Measures do work in the Base view but will result in errors if joining another table

Results of Incorrect Join Relationships

What happens when we incorrectly specify the relationship: parameter?

The Measures from the Orders table are correct, while the Measures from the fanned out Customers table are not.

Using Symmetric Aggregates

If we identify Primary Keys and correctly specify the relationship: parameter, all measures in all Views are calculated correctly.

How Does It Work?

Counts are simple: Looker does a count distinct of the primary keys.

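Sums and averages are a bit more complex, but basically function in the same way:

SUM(DISTINCT (visits + MD5(customer_id))) - SUM(DISTINCT (MD5(customer_id)))

Adding a hash of the primary key makes each customer's value unique, so DISTINCT collapses only the rows duplicated by the join. Subtracting the sum of the distinct hashes then leaves the sum of visits, with each customer counted once.
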
Explore Filters
Filtering Explores: Learning Objectives

● Understand the most commonly utilized options for applying default filters to an
Explore
○ sql_always_where and sql_always_having
○ always_filter
○ conditionally_filter

● Recognize common use cases for adding Explore filters

Note: This training will cover the most common Explore filter options. Check out Looker docs to see additional Explore
filter options.

sql_always_where and sql_always_having

WHAT: A filter within an Explore that cannot be changed by users


● No indicator in the UI (unless user looks at generated SQL)
● Applies to:
○ Users’ queries
○ Looks and Dashboards
○ Scheduled Content
○ Embeds
● Written in SQL and can use LookML substitutions: ${field_name}
WHY: Certain values should be filtered out of the Explore for ALL USERS (such as test
users, internal orders, etc.)

Example: sql_always_having
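
A minimal sketch of both parameters on one Explore is shown below. The Explore, the `status` dimension, and the `count` measure are assumptions chosen for illustration.

```lookml
explore: order_items {
  # Injected into the WHERE clause of every query; users cannot remove it.
  sql_always_where: ${order_items.status} <> 'test' ;;

  # Injected into the HAVING clause of every query; references a measure.
  sql_always_having: ${order_items.count} >= 10 ;;
}
```
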
always_filter

WHAT: Required filter fields that are automatically added to the Explore
● Filter value can be changed but the filter itself cannot be removed
● Default values are written as Looker Filter Expressions
WHY: Prompts users to leverage appropriate filters when querying data
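
A sketch of always_filter follows, using a hypothetical `created_date` field. It uses the list form of `filters`; the block form with `field:` and `value:` shown for filtered measures in this training expresses the same thing.

```lookml
explore: order_items {
  always_filter: {
    # Users can change "last 7 days" to another value,
    # but cannot remove the filter on created_date.
    filters: [order_items.created_date: "last 7 days"]
  }
}
```
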

conditionally_filter

WHAT: A default filter that can be removed if at least one of the specified alternative filter
fields is selected
WHY: Typically used to prevent users from accidentally creating very large queries that
may be too expensive to run on your database
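
A sketch of conditionally_filter with hypothetical field names: the default date filter applies unless the user filters on one of the fields listed under `unless`.

```lookml
explore: order_items {
  conditionally_filter: {
    # Default filter that keeps queries small.
    filters: [order_items.created_date: "last 30 days"]
    # The default can be removed once the user filters on either of these fields.
    unless: [order_items.id, order_items.user_id]
  }
}
```
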

Caching & Datagroups

Caching in Looker

● Using cached results of prior queries helps to reduce database load
● Caching policies can be set up in Looker using datagroups
● These caching policies can then be applied to various Looker objects:
○ Use persist_with parameters at the model or explore level to specify which
explores use each policy for clearing the query cache
○ Use datagroup_trigger in a PDT definition to specify which policy to use in
rebuilding PDTs
○ Build schedules that trigger based on datagroups to cause Looks or
Dashboards to run and send immediately after the cache has been invalidated,
thus warming the cache with the latest results

Using Cached Queries

● A query is run by a user and cached (cache results are stored in an encrypted file on
the Looker instance)
● For any new queries, the cache is checked to see if the same query was previously
run before running the query against the database
○ If the query is not found, Looker runs the query against the database and
caches the new result
○ If the query is found and the results are still valid then Looker uses the cached
results
○ If the query is found and the results are no longer valid, Looker runs the query
against the database and caches the new result

Datagroups

WHAT: Named caching policies within Looker that can be applied to Models, Explores, or
Persistent Derived Tables
WHY: Integrate Looker more closely with ETL processes or guarantee a refreshed cache
● Define one or more datagroup parameters at the model level
● Different caching policies require separate datagroup definitions

Configuring Datagroups

Caching policy parameters:


● sql_trigger parameter
○ Should be SQL query that returns one row with one column
○ Typically will query a field that serves as a good indicator that the underlying
data has been updated, such as a max(date) or will return a specific time of day
● max_cache_age to indicate the longest amount of time in which a query should be
cached before being invalidated
Only one of these parameters is required
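
A datagroup using both parameters might look like the sketch below. The datagroup name and the `etl_jobs` log table are assumptions.

```lookml
datagroup: orders_etl {
  # Returns one row with one column; the datagroup triggers when the value changes.
  sql_trigger: SELECT MAX(completed_at) FROM etl_jobs ;;
  # Cached results older than this are invalidated even if the trigger has not fired.
  max_cache_age: "24 hours"
}
```
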

Applying Datagroups to Query Results

A datagroup’s caching policy can be applied to one, some or all Explores in a Model.
● As a default for all explores in a model: use the persist_with parameter at the
model level and specify the name of the datagroup
● For a specific explore: use the persist_with parameter in that Explore’s definition
and specify the name of the datagroup
● For a group of explores: use the persist_with parameter in each of those Explores’
definitions and specify the name of the same datagroup
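
These options can be sketched in one model file. The datagroup and Explore names are hypothetical.

```lookml
datagroup: orders_etl {
  max_cache_age: "24 hours"
}

datagroup: inventory_hourly {
  max_cache_age: "1 hour"
}

persist_with: orders_etl            # model-level default for all Explores

explore: order_items {}             # uses the model default, orders_etl

explore: inventory_items {
  persist_with: inventory_hourly    # overrides the default for this Explore
}
```
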

Note: Datagroups can also be used to add persistence to derived tables, which will be covered in the
next section.
Derived Tables
Derived Tables

WHAT: Tables defined within Looker that do not already exist in the database
● Two types of derived tables
○ Ephemeral: built at query time
○ Persisted: stored in the database
● Defined within the LookML
● Referenced in the LookML just like any other table
WHY: Expand the sophistication of analyses
● Aggregate data to a different level of granularity (example: aggregate fact data)
● Speed up performance (example: precompute joins)
● Write custom SQL for advanced use cases (example: utilize window functions)

Building SQL Derived Tables

SQL Derived Tables

1. Build and test a SQL query within SQL Runner.
2. Within the Gear Menu, select “Add to Project”.
3. Select the project in which the Derived Table should be added and input a descriptive name for the table.
4. Looker creates a new View with the SQL Runner query and automatically writes Dimensions for every field as well as a count measure.
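
The resulting view file would look roughly like the sketch below. The query, view name, and column names are hypothetical, and the `primary_key` line is something a developer would typically add by hand afterwards.

```lookml
view: user_order_facts {
  derived_table: {
    sql:
      SELECT
        user_id,
        COUNT(*) AS lifetime_orders,
        SUM(sale_price) AS lifetime_revenue
      FROM order_items
      GROUP BY user_id ;;
  }

  measure: count {
    type: count
  }

  dimension: user_id {
    primary_key: yes                 # added manually; one row per user
    type: number
    sql: ${TABLE}.user_id ;;
  }

  dimension: lifetime_orders {
    type: number
    sql: ${TABLE}.lifetime_orders ;;
  }

  dimension: lifetime_revenue {
    type: number
    sql: ${TABLE}.lifetime_revenue ;;
  }
}
```
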
Persisting Derived Tables

Two parameters should be added to a derived table when persisting it


● Table refresh logic for table rebuilding
○ datagroup_trigger: a caching policy definition triggered by a change in the
underlying data and/or a maximum time to use the cached values
○ sql_trigger_value: triggered by a change in the underlying data
○ persist_for: a set time period
● Add indexes

Derived Table Refresh Logic

● Use a datagroup_trigger in the PDT’s definition to use a datagroup’s caching policy to trigger rebuilding a PDT**

● Use a sql_trigger_value in the PDT’s definition to rebuild the PDT based on a change in the value returned by that query

● Use persist_for to set the length of time the derived table should be stored before it is dropped from the database

**Recommended Approach
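
A persisted derived table using the recommended option might be sketched as follows. The `orders_etl` datagroup is assumed to be defined in the model file, and the query is illustrative only.

```lookml
view: user_order_facts {
  derived_table: {
    datagroup_trigger: orders_etl    # rebuild whenever this datagroup triggers
    # Alternatives (choose one persistence strategy per table):
    #   sql_trigger_value with a query such as SELECT CURRENT_DATE
    #   persist_for: "24 hours"
    sql:
      SELECT user_id, COUNT(*) AS lifetime_orders
      FROM order_items
      GROUP BY user_id ;;
  }
}
```
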
Indexing Derived Tables

● indexes: [field1, field2]
● Redshift
○ distribution: field1
■ Sets the distribution key, which controls how the data is distributed across the nodes
■ Joining on distribution keys is most efficient
○ sortkeys: [field1, field2]
■ Controls the order in which the data is written to disk
■ Filtering on sort keys is most efficient
○ indexes: [field1, field2]
■ This creates Interleaved Sort Keys
■ Works best for very large datasets
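
The two variants can be sketched as below, assuming a datagroup named `orders_etl` and hypothetical column names. Which parameters apply depends on the database dialect.

```lookml
# Most dialects that support indexes
view: user_order_facts {
  derived_table: {
    datagroup_trigger: orders_etl
    indexes: ["user_id"]
    sql: SELECT user_id, MIN(created_at) AS first_order_at
         FROM orders GROUP BY user_id ;;
  }
}

# Redshift: distribution key plus sort keys
view: user_order_facts_redshift {
  derived_table: {
    datagroup_trigger: orders_etl
    distribution: "user_id"          # join on this column for best performance
    sortkeys: ["first_order_at"]     # filter on this column for best performance
    sql: SELECT user_id, MIN(created_at) AS first_order_at
         FROM orders GROUP BY user_id ;;
  }
}
```
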
Ephemeral vs. Persistent Derived Tables

Ephemeral derived tables are built at runtime as a temporary table (MySQL) or via a SQL common table expression.

Persistent derived tables are stored as physical tables within the database once built. Looker then simply queries those physical tables as needed.

Best Practices
Naming Conventions

● Name measures with aggregate function or common terms: total_[FIELD] for sum, count_[FIELD], avg_[FIELD], etc.
● Name ratios descriptively. For example, “orders per purchasing customer” is clearer than “orders percent.”
● Name yesno fields clearly: “Orders Is Returned” instead of “Returned”.
● Avoid the words “date” or “time” in a dimension group because Looker appends each timeframe to the end of the dimension name: “created_date” becomes “created_date_date”.

Model Organization

● Joining many to one from the most granular level typically provides the best query performance.
● Use the fewest Explores that still allow users to easily get to the answers they need.
● Organize Explores using the group_label parameter to help the end user find the correct Explore as easily as possible.

Explore Design

● Don’t join in extraneous views.
● Use the fields parameter to limit fields surfaced to the end user.
● Comment out or delete extraneous autogenerated explores in the model file to reduce clutter when developing.

Join Design

● Use ${date_raw} when joining on a date.
● Always define a relationship using the relationship parameter to ensure correct aggregates are produced (and always declare a primary key in the View file).
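
A sketch of a join on the raw timeframe follows. It assumes two hypothetical views whose dimension groups `created` and `target` both include the `raw` timeframe and sit on columns of the same date type.

```lookml
explore: orders {
  join: daily_targets {
    # The raw timeframe is the underlying column value without casting or
    # time zone conversion, so the join compares the columns directly.
    sql_on: ${orders.created_raw} = ${daily_targets.target_raw} ;;
    relationship: many_to_one
  }
}
```
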

PDT Usage

● Choose the parameter sql_trigger_value over persist_for when you want to have
data ready the first time someone runs an explore or on a schedule.
● Evaluate your sql_trigger_value schedules such that tables are not building during
business hours/replication processes/peak usage points. Trigger the tables late in
the night or early in the morning, after ETL is expected to be completed.
● Always define indexes/distkeys/sortkeys to improve query performance. Generally
speaking, indexes should be applied to primary keys and date or time columns.

Questions?
