LookML Foundations Training
Looker Hosted Webinar
Agenda
1. Introduction to Looker
3. Model Files
5. Derived Tables
6. Best Practices
Introduction to Looker
Defining Terms
LookML
Dimensions & Measures
● Dimensions
○ Always in the GROUP BY part of the query
○ Automatically created for all fields in a table
● Measures
○ Always part of an aggregate function
○ Defined as a function of fields that have already been aggregated
View
Model
Project
Creating a New Project
LookML Projects
○ One or more dashboard files, which define the data and layouts for dashboards, if you choose to use LookML Dashboards in addition to User Defined Dashboards
Creating a New LookML Project
1. Project Name
Setting Up Git
Configure Git
Project
Looker Development Environment
Development Mode vs. Production Mode
A Looker data model can exist in two states: production mode and development mode.
PRODUCTION MODE: Users typically explore data in Looker in production mode. The data model is shared across all users, and the LookML files are treated as read-only.
DEVELOPMENT MODE: Developers must be in development mode to change the LookML. This mode accesses a completely separate version of the data model that only the developers can see and edit. (In Git terms, development mode is handled by a separate branch.)
Development mode allows developers to make and test LookML changes without affecting other users.
Switching In and Out of Development Mode
Developers can switch development mode on and off by clicking the Development Mode ON/OFF button within the Develop menu in Looker, or by using the keyboard shortcut Ctrl+Shift+D.
Looker’s IDE is integrated with Git for version control. When LookML changes are ready
to be pushed to Production, the Git menu can be used to Commit and Deploy.
Looker automatically manages the Git workflows for committing, pulling, and pushing
changes.
How Looker Writes SQL
Creating Dimensions & Measures
Referencing Objects
Referencing a database object in Looker:
● ${TABLE} references the table defined in the View
● Looker automatically creates dimensions for every field in the view
Referencing another Looker object:
● ${field_name} references the Looker object

dimension: sale_price {
  type: number
  sql: ${TABLE}.sale_price ;;
}

measure: total_revenue {
  type: sum
  sql: ${sale_price} ;;
}

measure: average_sale_price {
  type: average
  sql: ${sale_price} ;;
}
Dimension Types
Time Dimension Groups
● YesNo dimensions can be used as filters
● Looker transfers the logic of the yesno dimension into the CASE statement that produces the measure
● Other dimension types can also be used as filters

measure: count_female_users {
  type: count
  filters: {
    field: gender
    value: "Female"
  }
}

measure: total_sales_new_users {
  type: sum
  sql: ${sale_price} ;;
  filters: {
    field: users.is_new_user
    value: "Yes"
  }
}
Measures Defining Other Measures
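As a minimal sketch of one measure built from other measures: the view name, the count measure, and average_revenue_per_item below are hypothetical, while total_revenue and sale_price follow the earlier example. A measure of type: number operates on values that are already aggregated.

```lookml
# Hypothetical view used only for illustration.
view: order_items {
  dimension: sale_price {
    type: number
    sql: ${TABLE}.sale_price ;;
  }

  measure: total_revenue {
    type: sum
    sql: ${sale_price} ;;
  }

  measure: count {
    type: count
  }

  # type: number performs arithmetic on measures that are already aggregated.
  # NULLIF guards against division by zero when no rows match.
  measure: average_revenue_per_item {
    type: number
    sql: 1.0 * ${total_revenue} / NULLIF(${count}, 0) ;;
  }
}
```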
Referencing Fields in Other View Files
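A field in another view can be referenced with the ${view_name.field_name} syntax. The sketch below assumes a hypothetical inventory_items view with a cost dimension; it only resolves in Explores where both views are joined.

```lookml
# Hypothetical views and fields, for illustration only.
view: order_items {
  dimension: sale_price {
    type: number
    sql: ${TABLE}.sale_price ;;
  }

  # ${inventory_items.cost} points at the cost dimension of another view.
  # inventory_items must be joined in any Explore that uses this field.
  dimension: profit {
    type: number
    sql: ${sale_price} - ${inventory_items.cost} ;;
  }
}
```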
Helpful Field Parameters
● hidden : hides a field from the user interface while still allowing it to be available for modeling (great for fields like primary keys that are not meaningful to users)
● label : changes how a field name will appear in the Field Picker
● drill_fields : controls which fields are shown to users when they click the value of a table cell to “drill” into the data while Exploring
● group_label : combines fields into custom groups within a view in the Field Picker
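The four parameters above could be combined roughly as follows. The users view and its columns are assumptions made for the example, not part of the training material.

```lookml
# Hypothetical view illustrating hidden, label, group_label, and drill_fields.
view: users {
  # Hidden from the Field Picker but still usable in joins and other fields.
  dimension: id {
    primary_key: yes
    hidden: yes
    type: number
    sql: ${TABLE}.id ;;
  }

  # label changes the displayed name; group_label groups related fields.
  dimension: zip {
    label: "ZIP Code"
    group_label: "Location"
    type: zipcode
    sql: ${TABLE}.zip ;;
  }

  dimension: city {
    group_label: "Location"
    type: string
    sql: ${TABLE}.city ;;
  }

  # Clicking a count value in an Explore drills into these fields.
  measure: count {
    type: count
    drill_fields: [id, city, zip]
  }
}
```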
Working with Model Files
Explores and Join Logic
Building Explores
Types of Joins
1. The name of the Explore.
2. Base View: The one View that is always joined in.
3. Standard join
4. Joins renaming the view such that the same view can be joined twice
5. Indirect join
Translating SQL to an Explore
SELECT
  flights.destination AS "flights.destination",
  carriers.name AS "carriers.name",
  aircraft.name AS "aircraft.name",
  aircraft_origin.city AS "aircraft_origin.city",
  COUNT(*) AS "flights.1_count"
FROM flights AS flights
LEFT JOIN public.carriers AS carriers
  ON flights.carrier = carriers.code
LEFT JOIN public.aircraft AS aircraft
  ON flights.tail_num = aircraft.tail_num
LEFT JOIN public.airports AS aircraft_origin
  ON flights.origin = aircraft_origin.code
WHERE (flights.cancelled = 'N') AND (aircraft_origin.state = 'CA')
GROUP BY 1,2,3,4

explore: flights {
  join: carriers {
    sql_on: ${flights.carrier} = ${carriers.code} ;;
    relationship: many_to_one
  }

  join: aircraft {
    sql_on: ${flights.tail_num} = ${aircraft.tail_num} ;;
    relationship: many_to_one
  }

  join: aircraft_origin {
    from: airports
    sql_on: ${flights.origin} = ${aircraft_origin.code} ;;
    relationship: many_to_one
    fields: [full_name, city, state, code]
  }
}
Helpful Explore Parameters
● view_label : changes the label of the view within the Field Picker in the Explore
● group_label : combines Explores into custom groups within the Explore dropdown menu
● fields : limits the scope of fields that are available within an Explore or view
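A rough sketch of these three parameters on one Explore. The view names, the "E-Commerce" group, and the users.email field are hypothetical.

```lookml
# Hypothetical Explore illustrating group_label, fields, and view_label.
explore: order_items {
  # Lists this Explore under "E-Commerce" in the Explore menu.
  group_label: "E-Commerce"

  # Exposes every field except users.email.
  fields: [ALL_FIELDS*, -users.email]

  join: users {
    # Fields from the users view appear under "Customers" in the Field Picker.
    view_label: "Customers"
    sql_on: ${order_items.user_id} = ${users.id} ;;
    relationship: many_to_one
  }
}
```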
Symmetric Aggregation
What is the fanout problem?
Consider joining these two tables on customer_id, like we would in the following Explore:
Using Symmetric Aggregates
1. Specify primary keys in the view files. (This means a field that uniquely identifies
each row. If none exists, we can make one by concatenating fields together.)
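The step above might look like the following. The view names and columns are hypothetical, and the CONCAT expression is an assumption that depends on the SQL dialect (some databases need explicit casts to string).

```lookml
# Case 1: the table already has a unique column.
view: orders {
  dimension: id {
    primary_key: yes
    type: number
    sql: ${TABLE}.id ;;
  }
}

# Case 2: no single unique column, so one is built by concatenating fields.
view: order_line_items {
  dimension: pk {
    primary_key: yes
    hidden: yes
    type: string
    sql: CONCAT(${TABLE}.order_id, '-', ${TABLE}.line_number) ;;
  }
}
```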
Left side: the view joined from (other view used in “sql_on:”)
Right side: the view joined to (the name next to “join:”)
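To make the left and right sides concrete, here is a small sketch with hypothetical customers and orders views. The relationship value is read from the left side to the right side.

```lookml
explore: customers {
  # Left side: customers (the view joined from).
  # Right side: orders (the name next to join:).
  join: orders {
    sql_on: ${customers.id} = ${orders.customer_id} ;;
    # One customer can have many orders, so left-to-right is one_to_many.
    relationship: one_to_many
  }
}
```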
Identifying the Correct Join Relationship
Results of Missing Primary Keys
Results of Incorrect Join Relationships
The Measures from the Orders table are correct, while the Measures from the fanned out
Customers table are not.
How Does It Work?
Counts are simple: Looker does a count distinct of the primary keys.
Sums and averages are a bit more complex, but basically function in the same way:
Explore Filters
Filtering Explores: Learning Objectives
● Understand the most commonly used options for applying default filters to an Explore
○ sql_always_where and sql_always_having
○ always_filter
○ conditionally_filter
Note: This training covers the most common Explore filter options. See the Looker docs for additional Explore filter options.
sql_always_where and sql_always_having
Example: sql_always_having
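As a rough sketch of both parameters (the Explore and field names are hypothetical, not taken from the original slide): sql_always_where adds a condition to the WHERE clause of every query, and sql_always_having adds one to the HAVING clause. Users cannot change or remove either.

```lookml
explore: order_items {
  # Applied to the WHERE clause of every query on this Explore.
  sql_always_where: ${order_items.status} <> 'cancelled' ;;

  # Applied to the HAVING clause, so it can reference a measure.
  sql_always_having: ${order_items.count} > 0 ;;
}
```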
always_filter
WHAT: Required filter fields that are automatically added to the Explore
● Filter value can be changed but the filter itself cannot be removed
● Default values are written as Looker Filter Expressions
WHY: Prompts users to leverage appropriate filters when querying data
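A minimal sketch, assuming a hypothetical order_items.created_date field. The default value "last 30 days" is a Looker filter expression that users can change, although they cannot remove the filter itself.

```lookml
explore: order_items {
  always_filter: {
    # Required filter with a default value.
    filters: [order_items.created_date: "last 30 days"]
  }
}
```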
conditionally_filter
WHAT: A default filter that can be removed if at least one of the specified alternative filter
fields is selected
WHY: Typically used to prevent users from accidentally creating very large queries that
may be too expensive to run on your database
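A sketch under assumed field names: the default date filter stays in place unless the user filters on one of the fields listed in unless.

```lookml
explore: order_items {
  conditionally_filter: {
    # Default filter that keeps queries small.
    filters: [order_items.created_date: "last 7 days"]
    # Filtering on either of these fields lets the user drop the default.
    unless: [order_items.id, order_items.user_id]
  }
}
```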
Caching & Datagroups
Caching in Looker
Using Cached Queries
● A query is run by a user and cached (cache results are stored in an encrypted file on the Looker instance)
● For any new queries, the cache is checked to see if the same query was previously run before running the query against the database
○ If the query is not found, Looker runs the query against the database and caches the new result
○ If the query is found and the results are still valid, then Looker uses the cached results
○ If the query is found and the results are no longer valid, Looker runs the query against the database and caches the new result
Datagroups
WHAT: Named caching policies within Looker that can be applied to Models, Explores, or Persistent Derived Tables
WHY: Integrate Looker more closely with ETL processes or guarantee a refreshed cache
● Define one or more datagroup parameters at the model level
● Different caching policies require separate datagroup definitions
Configuring Datagroups
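A datagroup definition in a model file might look like the sketch below. The datagroup name and the etl_log table are hypothetical. sql_trigger is a query that Looker reruns periodically, invalidating the cache when its result changes, while max_cache_age caps how long results are used regardless.

```lookml
# Hypothetical datagroup defined at the model level.
datagroup: orders_etl {
  # When this value changes, cached results are treated as stale.
  sql_trigger: SELECT MAX(id) FROM etl_log ;;
  # Upper bound on cache age even if the trigger value has not changed.
  max_cache_age: "24 hours"
}
```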
Applying Datagroups to Query Results
A datagroup’s caching policy can be applied to one, some, or all Explores in a Model.
● As a default for all Explores in a model: use the persist_with parameter at the model level and specify the name of the datagroup
● For a specific Explore: use the persist_with parameter in that Explore’s definition and specify the name of the datagroup
● For a group of Explores: use the persist_with parameter in each of those Explores’ definitions and specify the name of the same datagroup
Note: Datagroups can also be used to add persistence to derived tables, which will be covered in the
next section.
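The three cases could be sketched in one model file as follows. All datagroup, table, and Explore names are hypothetical.

```lookml
datagroup: default_policy {
  max_cache_age: "1 hour"
}

datagroup: orders_etl {
  sql_trigger: SELECT MAX(id) FROM etl_log ;;
  max_cache_age: "24 hours"
}

# Model-level default for every Explore in this model.
persist_with: default_policy

# Uses the model-level default.
explore: users {}

# A group of Explores sharing the same, more specific policy.
explore: orders {
  persist_with: orders_etl
}

explore: order_items {
  persist_with: orders_etl
}
```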
Derived Tables
WHAT: Tables defined within Looker that do not already exist in the database
● Two types of derived tables
○ Ephemeral: built at query time
○ Persisted: stored in the database
● Defined within the LookML
● Referenced in the LookML just like any other table
WHY: Expand the sophistication of analyses
● Aggregate data to a different level of granularity (example: aggregate fact data)
● Speed up performance (example: precompute joins)
● Write custom SQL for advanced use cases (example: utilize window functions)
Building SQL Derived Tables
SQL Derived Tables
Select the project in which the Derived Table should be added and input a descriptive
name for the table:
Looker creates a new View with the SQL Runner query and automatically writes
Dimensions for every field as well as a count measure:
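The resulting view might look roughly like this sketch. The query, view name, and columns are hypothetical, and the view Looker actually generates will depend on the SQL Runner query.

```lookml
# Hypothetical view created from a SQL Runner query.
view: user_order_facts {
  derived_table: {
    sql:
      SELECT
        user_id,
        COUNT(*) AS lifetime_orders,
        MIN(created_at) AS first_order_at
      FROM orders
      GROUP BY user_id ;;
  }

  # One dimension per column in the query result, plus a count measure.
  measure: count {
    type: count
  }

  dimension: user_id {
    type: number
    sql: ${TABLE}.user_id ;;
  }

  dimension: lifetime_orders {
    type: number
    sql: ${TABLE}.lifetime_orders ;;
  }

  dimension_group: first_order_at {
    type: time
    sql: ${TABLE}.first_order_at ;;
  }
}
```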
Persisting Derived Tables
Derived Table Refresh Logic
● Use persist_for to set the length of time the derived table should be stored
before it is dropped from the database
**Recommended Approach
Indexing Derived Tables
Ephemeral vs. Persistent Derived Tables
Ephemeral derived tables are built at runtime, either as a temporary table (MySQL) or via a SQL common table expression.
Persistent derived tables are stored as physical tables within the database once built. Looker then simply queries those physical tables as needed.
Best Practices
Naming Conventions
● Name measures with the aggregate function or common terms: total_[FIELD] for sum, count_[FIELD], avg_[FIELD], etc.
● Name ratios descriptively. For example, “orders per purchasing customers” is clearer than “orders percent.”
● Name yesno fields clearly: “Orders Is Returned” instead of “Returned”.
● Avoid the words “date” or “time” in a dimension group because Looker appends each timeframe to the end of the dimension name: “created_date” becomes “created_date_date”.
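To illustrate the dimension group naming point, here is a small sketch with a hypothetical orders view and created_at column. Naming the group created yields clean field names for each timeframe.

```lookml
view: orders {
  # Produces created_raw, created_date, created_week, created_month, created_year.
  # Naming the group created_date would instead produce created_date_date, etc.
  dimension_group: created {
    type: time
    timeframes: [raw, date, week, month, year]
    sql: ${TABLE}.created_at ;;
  }
}
```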
Model Organization
● Joining many-to-one from the most granular level typically provides the best query performance.
● Use the fewest Explores that still let users easily get the answers they need.
● Organize Explores using the group_label parameter to help end users find the correct Explore as easily as possible.
Explore Design
Join Design
PDT Usage
● Choose sql_trigger_value over persist_for when you want data to be ready the first time someone runs an Explore, or when tables should rebuild on a schedule.
● Schedule sql_trigger_value rebuilds so that tables are not building during business hours, replication processes, or peak usage. Trigger the tables late at night or early in the morning, after ETL is expected to be completed.
● Always define indexes/distkeys/sortkeys to improve query performance. Generally speaking, indexes should be applied to primary keys and date or time columns.
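These recommendations might translate into a persistent derived table like the sketch below. The view, query, and columns are hypothetical. SELECT CURRENT_DATE is only one simple trigger, firing when the database date changes; a dialect-specific expression would be needed to shift the rebuild to a later hour after ETL.

```lookml
# Hypothetical persistent derived table.
view: user_order_facts {
  derived_table: {
    sql:
      SELECT
        user_id,
        COUNT(*) AS lifetime_orders
      FROM orders
      GROUP BY user_id ;;

    # Rebuilds when the result of this query changes (here, once per day).
    sql_trigger_value: SELECT CURRENT_DATE ;;

    # Index on the key used for joins. On databases that use distribution
    # and sort keys instead of indexes, the corresponding parameters apply.
    indexes: ["user_id"]
  }

  dimension: user_id {
    primary_key: yes
    type: number
    sql: ${TABLE}.user_id ;;
  }

  dimension: lifetime_orders {
    type: number
    sql: ${TABLE}.lifetime_orders ;;
  }
}
```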
82
Questions?