LookML Foundations Training

This training gives an overview of core LookML concepts. LookML informs how Looker abstracts SQL and defines the modeling layer: views, dimensions, measures, Explores, models, and projects. Dimensions appear in the GROUP BY clause and measures are aggregated; views correspond to database tables or derived tables and are joined together in Explores. The training also covers creating a new LookML project, setting up Git version control, switching between development and production modes, how Looker writes SQL from the dimensions, measures, filters, and joins defined in LookML, how to reference objects between files, and dimension types such as string, number, yesno, and tier.

Looker Hosted Webinar
Agenda

1. Introduction to Looker

2. Creating Dimensions and Measures

3. Model Files

4. Caching and Datagroups

5. Derived Tables

6. Best Practices
Introduction to Looker

Defining Terms

LookML

● Informs Looker in abstracting SQL
● Creates the modeling layer between the database and the user
● Defines:
○ Join logic between tables (Views)
○ Custom tables (Derived Tables) defined by Looker
○ Fields taken directly from the database
○ Custom fields defined in Looker

Dimensions & Measures

● Dimensions
○ Always in the GROUP BY part of the query
○ Automatically created for all fields in a table

● Measures
○ Always part of an aggregate function (such as SUM or COUNT), or
○ Defined as a function of fields that have already been aggregated (that is, other measures)

View

● View files correspond to either:
○ Tables in the database (Standard View Files),
○ Virtual tables defined in Looker (Derived Tables), or
○ Tables defined in Looker that are physically written (materialized) to the database (Persistent Derived Tables or PDTs)
● Explores are made with one or more view files joined together
● Views become headers in Explores
● Dimensions and Measures are defined in View files

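As a minimal sketch of a standard view file, assuming a hypothetical table public.order_items with id and sale_price columns (the view, table, and field names are illustrative and not taken from the training):

```lookml
# Hypothetical view file: order_items.view.lkml
view: order_items {
  # Assumed schema and table name; point this at a real table.
  sql_table_name: public.order_items ;;

  dimension: id {
    primary_key: yes
    type: number
    sql: ${TABLE}.id ;;
  }

  dimension: sale_price {
    type: number
    sql: ${TABLE}.sale_price ;;
  }

  measure: count {
    type: count
  }
}
```
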
Explore

● Explores are the starting point for analysis
● Should be clearly organized around business themes (e.g., users, orders, inventory) to minimize confusion for the end user

Model

● Contains data connection information and Explore definitions
● Can be used to:
○ Restrict user access to certain Explores
○ Separate and organize Explores by business area

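A model file along these lines might look like the sketch below. The connection name, the include pattern, and the view and field names are assumptions made for illustration:

```lookml
# Hypothetical model file: ecommerce.model.lkml
connection: "my_warehouse"   # assumed connection name

include: "*.view.lkml"       # make the project's view files available (path is an assumption)

explore: order_items {
  join: users {
    type: left_outer
    sql_on: ${order_items.user_id} = ${users.id} ;;
    relationship: many_to_one
  }
}
```
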
Project

● Highest-level Looker object
● An almost independent Looker instance
● Usually used for completely different data sources
● View files cannot be shared between different projects (unless a project import is used)

Creating a New Project

LookML Projects

● Modeling in Looker is based around LookML projects. Typically a project consists of the following files:
○ One or more model files that define the project’s Explore options and joins
○ Multiple view files, each corresponding to a database table or derived table
○ One or more dashboard files that define the data and layouts for dashboards, if you choose to use LookML Dashboards in addition to User Defined Dashboards

Creating a New LookML Project

From the “Manage Project” screen, click on the button at the bottom of the page:

1. Project Name
2. Click to run the Looker model generator to create the Model file and default Explores and view files
3. Specify the database connection
4. Specify All Tables or just a Single Table from the database
5. Underlying Data Schema
6. Prefixes to ignore (for example “_” to ignore underscores)

Setting Up Git

Configure Git

● Hit the “Configure Git” button
● Enter the URL for your Git repository
○ GitHub: git@github.com:myorganization/myproject.git
○ GitHub Enterprise: git@example.com:myorganization/myproject.git
○ GitLab: git@gitlab.com:myorganization/myproject.git
○ Bitbucket: git@bitbucket.org:myorganization/myproject.git
○ Phabricator: ssh://git@example.com/diffusion/MYCALLSIGN/myproject.git

Project

● If Looker does not successfully detect your Git provider, it will ask you to choose from a dropdown.
● Copy and paste the generated SSH key as a deploy key in your Git host. In GitHub, remember to check the Allow Write Access checkbox.
● Click Continue Setup
● Commit and push to make your code show up in your new repo!

Looker Development Environment

Development Mode vs. Production Mode

A Looker data model can exist in two states: production mode and development mode.

PRODUCTION MODE: Users typically explore data in Looker in production mode. The data model is shared across all users, and the LookML files are treated as read-only.

DEVELOPMENT MODE: Developers must be in development mode when making a change to the LookML. This mode accesses a completely separate version of the data model that only the developers can see and edit. (In Git terms, development mode is handled by a separate branch.)

Development mode allows developers to make and test LookML changes without affecting other users.

Switching In and Out of Development Mode

Developers can switch development mode on and off by clicking the Development Mode ON/OFF button within the Develop menu in Looker. There is also a keyboard shortcut: Ctrl+Shift+D.

While in development mode:

● LookML and Explore menus will be populated by the development version of the model
● There will be a purple development mode bar at the top of the browser window

Git Integration for Version Control

Looker’s IDE is integrated with Git for version control. When LookML changes are ready to be pushed to production, the Git menu can be used to commit and deploy.

Looker automatically manages the Git workflows for committing, pulling, and pushing changes.

How Looker Writes SQL

● Dimensions and measures in the base view. Note:
○ Dimensions are in the GROUP BY
○ Measures are in aggregate functions
● Dimensions and measures in a joined view. Note:
○ The base view is still joined in
○ Unnecessary views are not joined in
● Filters on dimensions and measures:
○ A dimension filter goes in the WHERE clause
○ A measure filter goes in the HAVING clause
● The row limit goes in the LIMIT clause

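As a rough illustration of these points, the sketch below pairs a small hypothetical Explore with the approximate SQL Looker might generate for one query against it, shown as comments. The view and field names are assumptions, and the exact SQL varies by database dialect:

```lookml
# Hypothetical query in an Explore whose base view is order_items:
#   dimension selected:  users.state        (from a joined view)
#   measure selected:    order_items.total_revenue
#   dimension filter:    order_items.status = "complete"
#   measure filter:      order_items.total_revenue > 1000
#   row limit:           500
#
# Approximate generated SQL (a sketch; dialects differ):
#   SELECT
#     users.state AS "users.state",
#     COALESCE(SUM(order_items.sale_price), 0) AS "order_items.total_revenue"
#   FROM public.order_items AS order_items     -- base view is always joined in
#   LEFT JOIN public.users AS users            -- joined only because a users field is used
#     ON order_items.user_id = users.id
#   WHERE order_items.status = 'complete'      -- dimension filter
#   GROUP BY 1                                 -- dimensions
#   HAVING COALESCE(SUM(order_items.sale_price), 0) > 1000   -- measure filter
#   LIMIT 500                                  -- row limit

explore: order_items {
  join: users {
    type: left_outer
    sql_on: ${order_items.user_id} = ${users.id} ;;
    relationship: many_to_one
  }
}
```
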
Creating Dimensions & Measures

Referencing Objects

Referencing a database object in Looker:
● ${TABLE} references the table defined in the View
● Looker automatically creates dimensions for every field in the view

Referencing another Looker object:
● ${field_name} references the Looker object

dimension: sale_price {
  type: number
  sql: ${TABLE}.sale_price ;;
}

measure: total_revenue {
  type: sum
  sql: ${sale_price} ;;
}

measure: average_sale_price {
  type: average
  sql: ${sale_price} ;;
}

Dimension Types

● Dimension type string:
○ Used for text dimensions
○ Default type

dimension: full_name {
  type: string
  sql: ${first_name} || ' ' || ${last_name} ;;
}

● Dimension type number:
○ Used for numeric dimensions
○ Commonly used with date difference calculations or basic row-level math across fields

dimension: days_since_signup {
  type: number
  sql: DATEDIFF(day, ${created_date}, current_date) ;;
}

Dimension Types

● Dimension type yesno:
○ Define a logical condition in the sql parameter
○ Field becomes either yes or no

dimension: is_new_customer {
  type: yesno
  sql: ${days_since_signup} <= 90 ;;
}

● Dimension type tier:
○ Buckets a dimension using CASE statements
○ Style: classic, interval, integer, or relational

dimension: days_since_signup_tier {
  type: tier
  tiers: [0, 30, 90, 180, 360, 720]
  sql: ${days_since_signup} ;;
  style: integer
}

Time Dimension Groups

● Looker can cast a date or timestamp into several different forms of time
● The timeframes parameter is used to specify the specific date and time parts that are required
● The number of dimension fields created depends on the number of timeframes listed
● Fields can be referenced by appending the desired date or time part to the name of the dimension group with an underscore

dimension_group: created {
  type: time
  timeframes: [
    raw, time, date, hour, hour_of_day, day_of_week, day_of_week_index,
    time_of_day, week, month_num, month, year, quarter, quarter_of_year
  ]
  sql: ${TABLE}.created_at ;;
}

These fields are then referenced as: ${created_time}, ${created_date}, ${created_hour_of_day}, etc.

Measure Types

● Three most common Measure types:
○ Sum
○ Average
○ Count
● There are two types of counts:
○ type: count (counts rows in that table and does NOT require a sql parameter)
○ type: count_distinct (computes a distinct count of the field in the sql parameter)

dimension: cost {
  type: number
  sql: ${TABLE}.cost ;;
}

measure: total_cost {
  type: sum
  sql: ${cost} ;;
}

measure: average_cost {
  type: average
  sql: ${cost} ;;
}

Filtered Measures

● YesNo dimensions can be used as filters
● Looker transfers the logic of the yesno dimension into the CASE statement that produces the measure
● Other dimension types can also be used as filters

measure: count_female_users {
  type: count
  filters: {
    field: gender
    value: "Female"
  }
}

measure: total_sales_new_users {
  type: sum
  sql: ${sale_price} ;;
  filters: {
    field: users.is_new_user
    value: "Yes"
  }
}

Measures Defining Other Measures

● Measures can be used within other measures for more complex calculations
● Measures that use other measures in their SQL definitions should have a type of number (nested aggregations will cause errors)
● The value_format_name parameter can be used to format the final output

measure: percentage_female_users {
  type: number
  value_format_name: percent_1
  sql: 1.0 * ${count_female_users} / NULLIF(${count}, 0) ;;
}

Referencing Fields in Other View Files

● Fields in another view can be referenced if the two views have been joined together within an Explore
● Fields that reference other views are only valid within an Explore in which joins are defined for both required views
● If a field that references multiple views is selected in the UI, Looker needs to know how to execute the joins necessary for completing the calculation

dimension: profit {
  type: number
  value_format_name: usd
  sql: ${sale_price} - ${inventory_items.cost} ;;
}

Helpful Field Parameters

● hidden : hides a field from the user interface while still allowing it to be used for modeling (great for fields like primary keys that are not meaningful to users)
● label : changes how a field name appears in the Field Picker
● description : displays additional information about a field to users upon hovering
● value_format_name : formats Looker cells using built-in or custom format names
● drill_fields : controls which fields are shown when a user clicks the value of a table cell to “drill” into the data while Exploring
● group_label : combines fields into custom groups within a view in the Field Picker

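The parameters above could be combined as in the following sketch. The view, table, and field names and the label text are hypothetical:

```lookml
view: order_items {
  sql_table_name: public.order_items ;;   # assumed table name

  dimension: id {
    primary_key: yes
    hidden: yes                      # usable in joins, not shown to users
    type: number
    sql: ${TABLE}.id ;;
  }

  dimension: status {
    label: "Order Status"
    description: "Current fulfillment status of the order item"
    group_label: "Order Details"
    type: string
    sql: ${TABLE}.status ;;
  }

  measure: total_revenue {
    type: sum
    sql: ${TABLE}.sale_price ;;
    value_format_name: usd           # built-in currency format
    drill_fields: [id, status]       # fields shown when a user drills in
  }
}
```
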
Working with Model Files

Explores and Join Logic

Building Explores

Key Join Parameters

1. join : the name of the join, which is typically the name of the view being joined
2. type : the type of join that should occur (left_outer by default)
3. sql_on : the fields that should be used within the ON clause of the SQL query in order to join the two tables together
4. relationship : how the two tables relate to each other

explore: inventory_items {
  join: products {
    type: left_outer
    sql_on: ${inventory_items.product_id} = ${products.id} ;;
    relationship: many_to_one
  }
}

Types of Joins
1. The name of the Explore.
2. Base View: The one View that is
always joined in.
3. Standard join
4. Joins renaming the view such that
the same view can be joined twice
5. Indirect join

Note: Defining a primary key in each View and properly defining the relationship parameter is very important.

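A hypothetical Explore annotated with the same five numbers might look like the sketch below. All view and field names are assumptions chosen for illustration:

```lookml
explore: order_items {                 # 1. the name of the Explore
                                       # 2. base view: order_items (same name by default)
  join: users {                        # 3. standard join
    type: left_outer
    sql_on: ${order_items.user_id} = ${users.id} ;;
    relationship: many_to_one
  }

  join: inventory_items {              # 3. another standard join
    type: left_outer
    sql_on: ${order_items.inventory_item_id} = ${inventory_items.id} ;;
    relationship: many_to_one
  }

  join: referring_users {              # 4. users view joined a second time under a new name
    from: users
    type: left_outer
    sql_on: ${users.referrer_id} = ${referring_users.id} ;;
    relationship: many_to_one
  }

  join: products {                     # 5. indirect join: reaches the base view through inventory_items
    type: left_outer
    sql_on: ${inventory_items.product_id} = ${products.id} ;;
    relationship: many_to_one
  }
}
```
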
Building SQL from Explores

Typical SQL statement:

SELECT column_a, column_b, …, column_n
FROM table_a
JOIN table_b ON table_a.field_1 = table_b.field_1
JOIN table_c ON table_a.field_2 = table_c.field_2
WHERE some condition equals some value

SELECT = User-chosen dimensions and measures
FROM clause plus any JOINs = Explore
WHERE clause = User-added filters

Translating SQL to an Explore

SELECT
  flights.destination AS "flights.destination",
  carriers.name AS "carriers.name",
  aircraft.name AS "aircraft.name",
  aircraft_origin.city AS "aircraft_origin.city",
  COUNT(*) AS "flights.1_count"
FROM flights AS flights
LEFT JOIN public.carriers AS carriers
  ON flights.carrier = carriers.code
LEFT JOIN public.aircraft AS aircraft
  ON flights.tail_num = aircraft.tail_num
LEFT JOIN public.airports AS aircraft_origin
  ON flights.origin = aircraft_origin.code
WHERE (flights.cancelled = 'N') AND (aircraft_origin.state = 'CA')
GROUP BY 1,2,3,4

explore: flights {
  join: carriers {
    sql_on: ${flights.carrier} = ${carriers.code} ;;
    relationship: many_to_one
  }

  join: aircraft {
    sql_on: ${flights.tail_num} = ${aircraft.tail_num} ;;
    relationship: many_to_one
  }

  join: aircraft_origin {
    from: airports
    sql_on: ${flights.origin} = ${aircraft_origin.code} ;;
    relationship: many_to_one
    fields: [full_name, city, state, code]
  }
}

Helpful Explore Parameters

● label : changes how an Explore name will appear
● description : displays additional information about an Explore upon hovering over the information icon within the Explore dropdown menu
● view_label : changes the label of the view within the field picker in the Explore
● group_label : combines Explores into custom groups within the Explore dropdown menu
● fields : limits the scope of fields that are available within an Explore or view

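A sketch of these parameters on a hypothetical Explore follows. The labels and the view and field names (including users.email) are assumptions:

```lookml
explore: order_items {
  label: "Orders"
  description: "Order line items with customer details"
  group_label: "E-Commerce"            # groups Explores in the Explore menu
  view_label: "Order Items"            # heading for the base view in the field picker
  fields: [ALL_FIELDS*, -users.email]  # expose everything except users.email

  join: users {
    view_label: "Customers"
    type: left_outer
    sql_on: ${order_items.user_id} = ${users.id} ;;
    relationship: many_to_one
  }
}
```
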
Symmetric Aggregation

What is the fanout problem?

Consider two tables, customers and orders, individually:

● count(*) produces a count of customers and orders, and
● sum(visits) and sum(amount) give total visits and total revenue

Now consider joining these two tables on customer_id, as we would in an Explore. This join “fans out” the customer table:

● count(*) and sum(amount) still work on the order-table fields, but
● count(*) and sum(visits) produce incorrect results on the customer-table fields
● Looker can handle this situation with Symmetric Aggregates!

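An Explore for this join could be sketched as follows. The sample rows in the comments are assumed values, used only to show how the fanout inflates a customer-level sum:

```lookml
# Assumed data, for illustration only:
#   customers: (customer_id 1, visits 3), (customer_id 2, visits 5)
#   orders:    two orders for customer 1, one order for customer 2
# After the join, customer 1 appears on two rows, so a plain
# SUM(visits) returns 3 + 3 + 5 = 11 instead of the true total of 8.

explore: customers {
  join: orders {
    type: left_outer
    sql_on: ${customers.customer_id} = ${orders.customer_id} ;;
    relationship: one_to_many   # one customer can have many orders
  }
}
```
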
Using Symmetric Aggregates

To leverage Symmetric Aggregation, two things MUST be done:

1. Specify primary keys in the view files. (This means a field that uniquely identifies each row. If none exists, we can make one by concatenating fields together.)

2. Correctly specify the “relationship” parameter. The possible values are one_to_one, one_to_many, many_to_one, or many_to_many.
○ Left side: the view joined from (the other view used in “sql_on:”)
○ Right side: the view joined to (the name next to “join:”)

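Both kinds of primary key might be declared as in this sketch. The tables and columns are hypothetical, and the CAST syntax in the concatenated key depends on the database dialect:

```lookml
view: customers {
  sql_table_name: public.customers ;;   # assumed table name

  dimension: customer_id {
    primary_key: yes
    type: number
    sql: ${TABLE}.customer_id ;;
  }
}

view: order_lines {
  sql_table_name: public.order_lines ;;   # assumed table with no single-column key

  dimension: pk {
    primary_key: yes
    hidden: yes
    type: string
    # Concatenate columns that together identify a row; the separator avoids
    # collisions such as (1, 23) versus (12, 3). CAST syntax varies by dialect.
    sql: CAST(${TABLE}.order_id AS VARCHAR) || '-' || CAST(${TABLE}.line_number AS VARCHAR) ;;
  }
}
```
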
Identifying the Correct Join Relationship

Test each possible relationship in a sentence. For example:

✖ one customer can relate to only one order

✔ one customer can relate to many orders
✖ many customers can relate to only one order
✖ many customers can relate to many (possibly same) orders

46
Results of Missing Primary Keys

If we do not specify a primary key in the view files:

1. Measures in joined-in Views don’t come through to the Explore
2. Measures do work in the Base view but will result in errors if joining another table

Results of Incorrect Join Relationships

What happens when we incorrectly specify the relationship: parameter?

The Measures from the Orders table are correct, while the Measures from the fanned-out Customers table are not.

Using Symmetric Aggregates

If we identify Primary Keys and correctly specify the relationship: parameter, all measures in all Views are calculated correctly.

How Does It Work?

Counts are simple: Looker does a count distinct of the primary keys.

Sums and averages are a bit more complex, but basically function in the same way:

SUM(DISTINCT (visits + MD5(customer_id))) - SUM(DISTINCT (MD5(customer_id)))

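As a worked illustration with assumed values: suppose the fanned-out join returns three rows with visits of 3, 3, and 5, where the first two rows belong to the same customer. Writing h_1 and h_2 for the numeric hashes of the two customer_id values, the duplicated row collapses under DISTINCT and the hashes cancel:

```latex
\[
\underbrace{(3 + h_1) + (5 + h_2)}_{\text{SUM(DISTINCT visits + hash)}}
\;-\;
\underbrace{(h_1 + h_2)}_{\text{SUM(DISTINCT hash)}}
\;=\; 8
\]
```

A plain SUM(visits) over the same three rows would return 11.
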
Explore Filters
Filtering Explores: Learning Objectives

● Understand the most commonly used options for applying default filters to an Explore
○ sql_always_where and sql_always_having
○ always_filter
○ conditionally_filter
● Recognize common use cases for adding Explore filters

Note: This training covers the most common Explore filter options. Check out the Looker docs for additional Explore filter options.

sql_always_where and sql_always_having

WHAT: A filter within an Explore that cannot be changed by users
● No indicator in the UI (unless the user looks at the generated SQL)
● Applies to:
○ Users’ queries
○ Looks and Dashboards
○ Scheduled Content
○ Embeds
● Written in SQL and can use LookML substitutions: ${field_name}

WHY: Certain values should be filtered out of the Explore for ALL USERS (such as test users, internal orders, etc.)


Example: sql_always_having
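
A minimal sketch of both parameters, using hypothetical view and field names (users.is_test_user is assumed to be a string dimension holding 'Y' or 'N'):

```lookml
explore: order_items {
  # Always exclude test users; end users cannot see or remove this condition.
  sql_always_where: ${users.is_test_user} = 'N' ;;

  # Only return groups with more than zero total revenue (applied in HAVING).
  sql_always_having: ${order_items.total_revenue} > 0 ;;

  join: users {
    type: left_outer
    sql_on: ${order_items.user_id} = ${users.id} ;;
    relationship: many_to_one
  }
}
```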

always_filter

WHAT: Required filter fields that are automatically added to the Explore
● Filter value can be changed but the filter itself cannot be removed
● Default values are written as Looker Filter Expressions
WHY: Prompts users to leverage appropriate filters when querying data
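
A sketch of always_filter on a hypothetical Explore. The field name and default value are assumptions; the bracket form of filters shown here is equivalent to the older field/value block form used in the measure examples of this training:

```lookml
explore: order_items {
  always_filter: {
    # Users can change "last 90 days" but cannot remove the filter.
    filters: [order_items.created_date: "last 90 days"]
  }
}
```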

conditionally_filter

WHAT: A default filter that can be removed if at least one of the specified alternative filter fields is selected

WHY: Typically used to prevent users from accidentally creating very large queries that may be too expensive to run on your database

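A sketch of conditionally_filter, again with hypothetical field names:

```lookml
explore: order_items {
  conditionally_filter: {
    # Default filter applied to every new query...
    filters: [order_items.created_date: "last 30 days"]
    # ...unless the user filters on at least one of these fields instead.
    unless: [order_items.id, order_items.user_id]
  }
}
```
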
Caching & Datagroups

Caching in Looker

● Using cached results of prior queries helps to reduce database load
● Caching policies can be set up in Looker using datagroups
● These caching policies can then be applied to various Looker objects:
○ Use persist_with parameters at the model or Explore level to specify which Explores use each policy for clearing the query cache
○ Use datagroup_trigger in a PDT definition to specify which policy to use in rebuilding PDTs
○ Build schedules that trigger based on datagroups to cause Looks or Dashboards to run and send immediately after the cache has been invalidated, thus warming the cache with the latest results

Using Cached Queries

● A query is run by a user and cached (cached results are stored in an encrypted file on the Looker instance)
● For any new query, the cache is checked to see whether the same query was previously run before running the query against the database
○ If the query is not found, Looker runs the query against the database and caches the new result
○ If the query is found and the results are still valid, Looker uses the cached results
○ If the query is found and the results are no longer valid, Looker runs the query against the database and caches the new result

Datagroups

WHAT: Named caching policies within Looker that can be applied to Models, Explores, or Persistent Derived Tables

WHY: Integrate Looker more closely with ETL processes or guarantee a refreshed cache
● Define one or more datagroup parameters at the model level
● Different caching policies require separate datagroup definitions

Configuring Datagroups

Caching policy parameters:

● sql_trigger parameter
○ Should be a SQL query that returns one row with one column
○ Typically queries a field that serves as a good indicator that the underlying data has been updated, such as a max(date), or returns a specific time of day
● max_cache_age indicates the longest amount of time for which a query should be cached before being invalidated

Only one of these parameters is required.

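A datagroup using both parameters might be sketched like this. The datagroup name and the etl_log table with its completed_at column are assumptions:

```lookml
# In the model file. The datagroup name and the etl_log table are assumptions.
datagroup: nightly_etl {
  # Returns one row with one column; the cache is invalidated when the value changes.
  sql_trigger: SELECT MAX(completed_at) FROM etl_log ;;
  # Cached results older than this are never reused, even if the trigger is unchanged.
  max_cache_age: "24 hours"
}
```
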
Applying Datagroups to Query Results

A datagroup’s caching policy can be applied to one, some, or all Explores in a Model.
● As a default for all Explores in a model: use the persist_with parameter at the model level and specify the name of the datagroup
● For a specific Explore: use the persist_with parameter in that Explore’s definition and specify the name of the datagroup
● For a group of Explores: use the persist_with parameter in each of those Explores’ definitions and specify the name of the same datagroup

Note: Datagroups can also be used to add persistence to derived tables, which will be covered in the next section.

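The model-level default and a per-Explore override could be sketched as follows. Both datagroup names and both Explores are hypothetical:

```lookml
datagroup: nightly_etl {
  max_cache_age: "24 hours"
}

datagroup: hourly_refresh {
  max_cache_age: "1 hour"
}

# Model-level default for every Explore in this model.
persist_with: nightly_etl

explore: order_items {}          # inherits nightly_etl

explore: inventory_items {
  persist_with: hourly_refresh   # overrides the model default for this Explore
}
```
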
Derived Tables
Derived Tables

WHAT: Tables defined within Looker that do not already exist in the database
● Two types of derived tables
○ Ephemeral: built at query time
○ Persisted: stored in the database
● Defined within the LookML
● Referenced in the LookML just like any other table
WHY: Expand the sophistication of analyses
● Aggregate data to a different level of granularity (example: aggregate fact data)
● Speed up performance (example: precompute joins)
● Write custom SQL for advanced use cases (example: utilize window functions)

Building SQL Derived Tables

SQL Derived Tables

1. Build and test a SQL query within SQL Runner.
2. Within the gear menu, select “Add to Project”.
3. Select the project to which the derived table should be added and enter a descriptive name for the table.
4. Looker creates a new View with the SQL Runner query and automatically writes Dimensions for every field as well as a count measure.

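The resulting view might resemble the sketch below, assuming a hypothetical query that aggregates public.order_items by user:

```lookml
# Hypothetical view generated from a SQL Runner query.
view: user_order_facts {
  derived_table: {
    sql:
      SELECT
        user_id,
        COUNT(*) AS lifetime_orders,
        SUM(sale_price) AS lifetime_revenue
      FROM public.order_items
      GROUP BY user_id ;;
  }

  dimension: user_id {
    primary_key: yes
    type: number
    sql: ${TABLE}.user_id ;;
  }

  dimension: lifetime_orders {
    type: number
    sql: ${TABLE}.lifetime_orders ;;
  }

  dimension: lifetime_revenue {
    type: number
    sql: ${TABLE}.lifetime_revenue ;;
  }

  measure: count {
    type: count
  }
}
```
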
Persisting Derived Tables

Two things should be added to a derived table when persisting it:

● Refresh logic that controls when the table is rebuilt, using one of:
○ datagroup_trigger: a caching policy definition triggered by a change in the underlying data and/or a maximum time to use the cached values
○ sql_trigger_value: triggered by a change in the underlying data
○ persist_for: a set time period
● Indexes

Derived Table Refresh Logic

● Use a datagroup_trigger in the PDT’s definition to use a datagroup’s caching policy to trigger rebuilding a PDT**
● Use a sql_trigger_value in the PDT’s definition to rebuild the PDT based on a change in the value returned by that query
● Use persist_for to set the length of time the derived table should be stored before it is dropped from the database

**Recommended Approach

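A persistent derived table using the recommended datagroup_trigger might look like this sketch. The nightly_etl datagroup, the table, and the column names are assumptions:

```lookml
view: user_order_facts {
  derived_table: {
    sql:
      SELECT user_id, COUNT(*) AS lifetime_orders
      FROM public.order_items
      GROUP BY user_id ;;
    # Rebuild whenever the (assumed) nightly_etl datagroup is triggered.
    datagroup_trigger: nightly_etl
    # Alternatives (use only one persistence strategy per table):
    #   sql_trigger_value: SELECT CURRENT_DATE ;;
    #   persist_for: "24 hours"
    indexes: ["user_id"]
  }

  dimension: user_id {
    primary_key: yes
    type: number
    sql: ${TABLE}.user_id ;;
  }
}
```
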
Indexing Derived Tables

● indexes: [field1, field2]
● Redshift
○ distribution: field1
■ Sets the distribution key, which controls how the data is distributed across the nodes
■ Joining on distribution keys is most efficient
○ sortkeys: [field1, field2]
■ Controls the order in which the data is written to disk
■ Filtering on sort keys is most efficient
○ indexes: [field1, field2]
■ On Redshift, this creates interleaved sort keys
■ Works best for very large datasets

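On Redshift, the distribution and sort key parameters might be applied as in this sketch. The datagroup, table, and column names are hypothetical:

```lookml
view: order_item_facts {
  derived_table: {
    sql: SELECT id, user_id, created_at, sale_price FROM public.order_items ;;
    datagroup_trigger: nightly_etl     # assumed datagroup
    distribution: "user_id"            # Redshift distribution key
    sortkeys: ["created_at"]           # Redshift sort key
  }

  dimension: id {
    primary_key: yes
    type: number
    sql: ${TABLE}.id ;;
  }
}
```
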
Ephemeral vs. Persistent Derived Tables

Ephemeral derived tables are built at runtime as a temporary table (MySQL) or via a SQL common table expression.

Persistent derived tables are stored as physical tables within the database once built. Looker then simply queries those physical tables as needed.

Best Practices
Naming Conventions

● Name measures with the aggregate function or common terms: total_[FIELD] for sum, count_[FIELD], avg_[FIELD], etc.
● Name ratios descriptively. For example, “orders per purchasing customer” is clearer than “orders percent.”
● Name yesno fields clearly: “Orders Is Returned” instead of “Returned”.
● Avoid the words “date” or “time” in a dimension group name, because Looker appends each timeframe to the end of the dimension group name: “created_date” becomes “created_date_date”.

Model Organization

● Joining many-to-one from the most granular level typically provides the best query performance.
● Use the fewest Explores possible that still allow users to easily get the answers they need.
● Organize Explores using the group_label parameter to help the end user find the correct Explore as easily as possible.

Explore Design

● Don’t join in extraneous views.
● Use the fields parameter to limit the fields surfaced to the end user.
● Comment out or delete extraneous autogenerated Explores in the model file to reduce clutter when developing.

Join Design

● Use ${date_raw} when joining on a date.
● Always define a relationship using the relationship parameter to ensure correct aggregates are produced (and always declare a primary key in the View file).

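A join on the raw timeframe could be sketched as follows, assuming hypothetical dimension groups named created (in orders) and target (in daily_targets) that both include raw in their timeframes:

```lookml
explore: orders {
  join: daily_targets {
    type: left_outer
    # The raw timeframe references the underlying column without the casting and
    # time zone conversion applied to other timeframes.
    sql_on: ${orders.created_raw} = ${daily_targets.target_raw} ;;
    relationship: many_to_one
  }
}
```
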
PDT Usage

● Choose sql_trigger_value over persist_for when you want data to be ready the first time someone runs an Explore, or on a schedule.
● Schedule your sql_trigger_value rebuilds so that tables are not building during business hours, replication processes, or peak usage. Trigger the tables late at night or early in the morning, after ETL is expected to be completed.
● Always define indexes, distribution keys, or sort keys to improve query performance. Generally speaking, indexes should be applied to primary keys and date or time columns.

Questions?

No ratings yet
Network Connectivity: Introduction To Computing Chapter 1: Computing Essentials Lesson 5 - Network Connectivity
3 pages
RAH Infotech Profile
No ratings yet
RAH Infotech Profile
2 pages
Connected Components Workbench: Software
No ratings yet
Connected Components Workbench: Software
7 pages
Indusoft Web Studio V8.1: Install and Run Iotview On Raspberry Pi
No ratings yet
Indusoft Web Studio V8.1: Install and Run Iotview On Raspberry Pi
6 pages
Data Veiling Using Residue Number System Based Encryption in Cybersecurity
No ratings yet
Data Veiling Using Residue Number System Based Encryption in Cybersecurity
6 pages
Security Issues, E-Commerce Threats: Part-1
No ratings yet
Security Issues, E-Commerce Threats: Part-1
23 pages
Trevaughn Banton I.T. 2011 Paper
No ratings yet
Trevaughn Banton I.T. 2011 Paper
5 pages
Netapp Ontap Flexgroup Volumes: Best Practices and Implementation Guide
No ratings yet
Netapp Ontap Flexgroup Volumes: Best Practices and Implementation Guide
213 pages
