Databricks - Data Analyst
Professional Certificates

10/17/22, 11:23 PM OneNote

Practice Exam - Real Databricks

Q) Flow of users in a website - what kind of visualization should be used?
Choropleth map is a good option; not Sankey. Cohorts is the answer.

Q) Where is the cache stored? / How to check whether a query is taking data from the cache?
1. From endpoints or warehouse
2. From the query history

Databricks SQL UI caching: per-user caching of all query and dashboard results in the Databricks SQL UI.
During Public Preview, the default behavior is that query results are cached forever and are located within the Databricks filesystem in your account. You can delete query results by re-running the query that you no longer want stored; once re-run, the old query results are removed from the cache.
Query results caching: per-cluster caching of query results for all queries through SQL warehouses.
To disable query result caching, run SET use_cached_result = false in the SQL editor.

If "Query profile is not available" is displayed, no profile exists for this query. A query profile is not available for queries that are served from the query cache. To circumvent the query cache, make a trivial change to the query, such as changing or removing the LIMIT.

Q) By default, which visualization is selected?

Q) Transfer ownership of a dashboard


If a dashboard’s owner is removed from a workspace, the dashboard no longer
has an owner, and only an admin user can manage the dashboard’s
permissions.
An admin user can transfer ownership of any dashboard, including one without an owner, to a different user.
• Does that mean non-admins cannot?
To transfer ownership by using the Databricks SQL UI:
1. Open the dashboard.
2. Click Share.
3. Click Assign new owner.
4. Select the new user you’d like to make the owner from the dropdown and
click Confirm.

If the dashboard previously had an owner, that user no longer has the Can Manage permission on the dashboard. The user you granted the Can Manage permission to is now the owner.

https://onedrive.live.com/redir?resid=E1CDE1DB50DF2737%21130&page=Edit&wd=target%28Corporate thoughts.one%7C21308d76-3c50-4740-96db-b025f0563c6e%2FTuring SQLTest%7C5d1a938… 7/11



Q) A gold-layer table is there; one table is added, or some transformation needs to be done. What is this called?
Last-mile ETL? Ad hoc improvement? Last-mile dashboarding?

Q) GROUP BY / PARTITION BY syntax - is PERCENT_RANK there or not?

Q) CREATE VIEW syntax - USING or AS?

CREATE TEMPORARY VIEW subscribed_movies AS
SELECT mo.member_id, mb.full_name, mo.movie_title
FROM movies AS mo
INNER JOIN members AS mb
  ON mo.member_id = mb.id;

Q) DROP TABLE syntax:

DROP TABLE userdb.employeetable;


percent_rank ranking window function (Databricks SQL) | Databricks on AWS

As the values started from 0, PERCENT_RANK is likely the answer (PERCENT_RANK returns (rank - 1) / (rows - 1), so the first value in an ordered partition is always 0).
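As a sanity check on that answer, here is a minimal plain-Python sketch of PERCENT_RANK semantics (illustrative only, not Databricks SQL): PERCENT_RANK computes (rank - 1) / (rows - 1), so the first value in an ordered partition is always 0 and the last is 1.

```python
def percent_rank(values):
    """Mimic SQL PERCENT_RANK(): (rank - 1) / (rows - 1); first value is 0.0."""
    n = len(values)
    out = []
    for v in values:
        rank = 1 + sum(1 for u in values if u < v)  # ties share the lowest rank
        out.append((rank - 1) / (n - 1) if n > 1 else 0.0)
    return out

print(percent_rank([10, 20, 30, 40]))  # [0.0, 0.333..., 0.666..., 1.0]
```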

Q) Left semi join vs. left anti join - what is the difference? Does Databricks support these? Yes.

• [ LEFT ] SEMI
  Returns values from the left side of the relation that have a match with the right. It is also referred to as a left semi join.
• [ LEFT ] ANTI
  Returns values from the left relation that have no match with the right. It is also referred to as a left anti join.
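A minimal sketch of the two join semantics, using Python's sqlite3 as a stand-in engine. SQLite has no LEFT SEMI / LEFT ANTI syntax, so the standard-SQL EXISTS / NOT EXISTS equivalents are used; the members/movies table names echo the CREATE VIEW example earlier and are purely illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE members(id INTEGER, full_name TEXT);
    CREATE TABLE movies(member_id INTEGER, movie_title TEXT);
    INSERT INTO members VALUES (1, 'Ana'), (2, 'Bo'), (3, 'Cy');
    INSERT INTO movies VALUES (1, 'Dune'), (1, 'Up'), (3, 'Her');
""")

# LEFT SEMI JOIN semantics: left rows with at least one match, each at most once
semi = con.execute("""
    SELECT id, full_name FROM members m
    WHERE EXISTS (SELECT 1 FROM movies v WHERE v.member_id = m.id)
    ORDER BY id
""").fetchall()

# LEFT ANTI JOIN semantics: left rows with no match on the right
anti = con.execute("""
    SELECT id, full_name FROM members m
    WHERE NOT EXISTS (SELECT 1 FROM movies v WHERE v.member_id = m.id)
    ORDER BY id
""").fetchall()

print(semi)  # [(1, 'Ana'), (3, 'Cy')]
print(anti)  # [(2, 'Bo')]
```

Note how 'Ana' appears once in the semi-join result even though she has two matching movies; that is the key difference from a plain inner join.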

Q) Databricks SQL supports ANSI SQL. What is the advantage?

1. Faster
2. More customisation

SQL is used for a variety of tasks, such as querying data, controlling access to the database and its objects, guaranteeing database consistency, updating rows in a table, and creating, replacing, altering and dropping objects. SQL lets users work with data at the logical level.

Q) Dashboard refresh interval?
1 min - 1 week by default

Q) Dashboards do not support which of the following options?
1. Borders
2. Customize tooltips
3. Customize labels

Edit widgets




Advanced: the report will be emailed to subscribers every time it is updated.

Q) Adding a query parameter to a dashboard - how will it impact?

1. All the dashboards
2. Only that visual

https://docs.databricks.com/sql/user/queries/query-parameters.html

Q) Who uses Databricks SQL as a secondary use?
1. Business intelligence analyst
2. Business analyst
3. Data analyst
4. Data engineer

Q) A query is scheduled at a 4-hour interval, but the endpoint is taking time to start. What should be done to manage costs?
1. Increase the cluster size
2. Decrease the cluster size
3. Scale down

Top 5 Databricks Performance Tips - How to Speed Up Your Workloads - The Databricks Blog

1. Use larger clusters. It may sound obvious, but this is the number one
problem we see. It’s actually not any more expensive to use a large cluster
for a workload than it is to use a smaller one. It’s just faster. If there’s
anything you should take away from this article, it’s this. Read section 1.
Really.
2. Use Photon, Databricks’ new, super-fast execution engine. Read section 2
to learn more. You won’t regret it.
3. Clean out your configurations. Configurations carried from one Apache
Spark™ version to the next can cause massive problems. Clean up! Read
section 3 to learn more.
4. Use Delta Caching. There’s a good chance you’re not using caching
correctly, if at all. See Section 4 to learn more.
5. Be aware of lazy evaluation. If this doesn’t mean anything to you and
you’re writing Spark code, jump to section 5.
6. Bonus tip! Table design is super important. We’ll go into this in a future
blog, but for now, check out the guide on Delta Lake best practices.
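Tip 5 (lazy evaluation) can be illustrated with a plain-Python generator analogy (not Spark code): transformations only build a plan, and no work happens until an action consumes the result.

```python
log = []

def double(rows):
    # Generator: like a Spark transformation, it runs only when consumed.
    for r in rows:
        log.append(r)
        yield r * 2

pipeline = double(range(3))  # "transformation": nothing has executed yet
assert log == []             # no rows processed so far
result = list(pipeline)      # "action": forces the whole pipeline to run
print(result, log)           # [0, 2, 4] [0, 1, 2]
```

This is why, in Spark, chaining many transformations is cheap but each action may re-run the whole chain unless the intermediate result is cached.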

Q) Data refreshes every minute from a streaming dataset. What should the analyst raise as a concern?
1. Streaming dataset doesn't support fault tolerance
2. It will be costly
3.

Q) INSERT INTO syntax:
1. Wrong syntax - the syntax was correct
2. Appends the data, including duplicates

INSERT { OVERWRITE | INTO } [ TABLE ] table_name
    [ PARTITION clause ]
    [ ( column_name [, ...] ) ]
    query

> INSERT INTO students TABLE visiting_students;
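The "appends the data, including duplicates" behavior of a plain INSERT INTO can be sketched with sqlite3 as a stand-in engine (INSERT OVERWRITE, which replaces the existing rows, is Databricks-specific and not shown here):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE students(name TEXT)")

# Plain INSERT INTO appends rows; re-running the same insert duplicates them.
con.execute("INSERT INTO students VALUES ('Ana')")
con.execute("INSERT INTO students VALUES ('Ana')")

rows = con.execute("SELECT name FROM students").fetchall()
print(rows)  # [('Ana',), ('Ana',)]
```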

Q) Fivetran connect with databricks

Fivetran automated data integration adapts as schemas and APIs change, ensuring reliable data access and simplified analysis with ready-to-query schemas.
You can integrate your Databricks SQL warehouses (formerly Databricks SQL
endpoints) and Databricks clusters with Fivetran.
The Fivetran integration with Databricks helps you centralize data from
disparate data sources into Delta Lake.
Note
Partner Connect does not integrate Fivetran with Databricks clusters. To
integrate a cluster with Fivetran, connect to Fivetran manually.



