SQL Left Join

Uploaded by

conzydesigns

Attempter Specifications
CONFIDENTIAL INFORMATION
This document contains confidential and proprietary information intended solely for the use of the individual or entity to whom it
is disclosed.

Project Overview
Video Sessions
Task Attempt Workflow
Task flow
Step 1: Review the Natural Language question and the corresponding SQL query
Step 2: Verify if the SQL query executes, returns at least one result, and uses a left join
statement in an intuitive way.
Inappropriate Left Join Usage
Appropriate Left Join Usage
Step 3: Modify and paste the results of the modified query
Step 4: Determine whether or not the modified SQL query aligns with the provided natural
language question
Troubleshooting SQL query
Question alignment rubric
Appendix
1. Good vs Bad - NL questions + SQL Code Alignments
2. 😱 Sample task walkthroughs
FAQs

Updates and Change Log


Check for periodic updates and review any changes to stay successful on this task.
🟢 Changes successfully implemented.
🟡 Changes must be implemented immediately.
🔴 Changes under review.
Change Log

Date: Oct 28, 2024
Summary of changes: Release date.
Changes requested by (link to reference): Internal

Date: Nov 1, 2024
Summary of changes:
 Added onboarding videos.
 Updated `Important Note About Rows Limitation`.
Changes requested by (link to reference): Internal

Date: Nov 4, 2024
Summary of changes:
 Added `Important Note About Rewritten Natural Language Questions`.
 Updated prompt for the first example of the 1. Good vs Bad - NL questions + SQL Code Alignments table. (Before, it was: Get the bigquery-public-data.ethereum_blockchain.logs for bigquery-public-data.ethereum_blockchain.transactions that failed.)
Changes requested by (link to reference): Customer

Date: Nov 6, 2024
Summary of changes:
 Added a very important new requirement (item 3) for generating rewritten natural language questions in `Important Note About Rewritten Natural Language Questions`.
 Added new notes ‘About SQL Query Execution’ and ‘About SQL Syntax’ in Step 2: Verify if the SQL query executes, returns at least one result, and uses a left join statement in an intuitive way.
Changes requested by (link to reference): Customer

Date: Nov 13, 2024
Summary of changes:
 Added an important note about when to change the NLQ (item 1) in `Important Note About Rewritten Natural Language Questions`.
 Added FAQs.
Changes requested by (link to reference): Customer

Date: Nov 20, 2024
Summary of changes:
 The use of limit 1000 is not allowed. See “Important Note About Rows Limitation” in Step 2.
 Added notes about SELECT * and `p.d.t` format in “About SQL Syntax, Aliases, SELECT *” in Step 2.
Changes requested by (link to reference): Customer

Project Overview
The goal of this project is to assist in the research and training of large language models (LLMs) to
improve their functionality working with SQL. Specifically, you will analyze a SQL query and its
Natural Language question and decide what changes need to be made to the SQL code so that it
executes correctly, returns at least one row, and uses a left join intuitively. Also, if the Natural
Language question is not aligned with the SQL query, you will rewrite it so that the two align.

Video Sessions
 Onboarding Course: Onboarding SQL Left Join Course.mp4
 Onboarding Session in English: SQL Left Join Tasks - Onboarding 10_30_24 (EN).mp4
 Onboarding Session in Spanish: SQL Left Join Tasks - Onboarding 10_30_24 - LCs.mp4
 How to check if the LEFT JOIN is ok:
https://www.loom.com/share/7a00c51b499e403aa0ba792cc3ccd6b7?sid=9c03d040-0640-4310-bd08-d4ef2e4cfb4f

Task Attempt Workflow


High-Resolution Workflow: SQL Left Join Workflow

Task flow
⚠️Important Note:
 SQL Code Requirements: Ensure each SQL question results in a solution using a LEFT
JOIN intuitively. Use the table below to distinguish between intuitive and non-intuitive LEFT
JOIN usage.
 Identify Alignment: Determine whether the SQL query accurately aligns with the intent of
the natural language question.
 Check if the SQL code is executable: As a preliminary check, be sure the code runs
correctly and returns at least one row.
 Provide Justification for Misalignment: If the SQL query does not fully meet the
question’s intent:

 Provide a detailed justification for the discrepancies.


 Specify any missing or misinterpreted elements in the SQL query that prevent it
from fulfilling the question’s requirements.

 Alternative Justification for Question Misalignment: If the SQL code is correct but the
question does not match the SQL code requirements:

 Explain how the question fails to represent the SQL query accurately.
 Highlight areas in the question that are ambiguous, incomplete, or misaligned with
the SQL code’s intent.

Step 1: Review the Natural Language question and the corresponding SQL
query

You will be given a natural language question paired with a SQL query. Analyze this question
alongside the SQL code, which may reference multiple tables, and ensure that the SQL code aligns
accurately with the intent of the question.
Once you fix the SQL code in step 2, you will need to make sure that the code aligns with the intent of
the natural language question. So consider where the natural language question and the query may
diverge.

Step 2: Verify if the SQL query executes, returns at least one result, and uses a
left join statement in an intuitive way.
⚠️About SQL Query Execution:
 Avoid using execution results to compare INNER JOIN and LEFT JOIN to determine the
appropriate join type. However, always ensure that the SQL queries are executable and
return non-empty results.

⚠️Important Note About Rows Limitation:


 Do NOT add LIMIT 1000 for the current TRAIN batch. The TRAIN batch doesn't need a hard limit.

1. If you are testing the query in the IDE Sphere Engine, use this limit for testing
purposes only, and remember to write the modified SQL query without it.

 If the query defines a limit, the limit should be meaningful. For example, you cannot ask for
LIMIT 70 if you are asking for all the US states (50 in total).

 This part of the task should be completed within the IDE environment in the task.
 Check to see if the SQL executes and returns at least one row.
 If not, there are many potential problems. Please visit the troubleshooting SQL query section
of these instructions.
 Once the SQL executes and returns at least one row, the next step is to ensure that it uses left
join functionality in an intuitive way. Please reference the table below to understand what is
an intuitive use of left join functionality and what is a non-intuitive use.
 You can reference the table taxonomies in this sheet

⚠️About SQL Syntax, Aliases, SELECT *:


 Avoid LEFT JOIN (SELECT your_table_name) AS T2; instead, use a CTE (Common
Table Expression) format with descriptive aliases.
 Do not use aliases such as T1, T2, a, or b, or any single character standing in for a
table name. Always choose meaningful, clear, and descriptive names.
 Use SELECT * if there are >= 8 columns selected. For joined tables, do not write
SELECT table1.*, table2.*; just write SELECT *.
 Check if the SQL query includes the correct project ID and table names in the format:
`p.d.t`. If not, add escape characters around the project ID and table names (i.e., use
backticks: SELECT ... FROM `p`.`d`.`t`).

1. For example:

SELECT column_name FROM `project_name`.`dataset_name`.`table_name`;

2. Even when the original SQL is correct, you must write a modified one with three
pairs of backticks, one pair each around the project, dataset, and table names, if necessary.
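
As a rough illustration of the CTE guidance above, the sketch below runs both styles against an in-memory SQLite database and checks that they return the same rows. SQLite is only a stand-in for BigQuery here, and the `customers` and `orders` tables, their columns, and the `order_totals` CTE name are hypothetical, chosen purely for the example.

```python
import sqlite3

# Hypothetical toy tables; SQLite stands in for BigQuery in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, customer_name TEXT);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0);
""")

# Discouraged style: a subquery inside the join with an opaque alias (T2).
subquery_style = """
SELECT customers.customer_name, T2.total_amount
FROM customers
LEFT JOIN (
    SELECT customer_id, SUM(amount) AS total_amount
    FROM orders
    GROUP BY customer_id
) AS T2
ON customers.customer_id = T2.customer_id
ORDER BY customers.customer_id;
"""

# Preferred style: the same logic as a CTE with a descriptive name.
cte_style = """
WITH order_totals AS (
    SELECT customer_id, SUM(amount) AS total_amount
    FROM orders
    GROUP BY customer_id
)
SELECT customers.customer_name, order_totals.total_amount
FROM customers
LEFT JOIN order_totals
ON customers.customer_id = order_totals.customer_id
ORDER BY customers.customer_id;
"""

subquery_rows = conn.execute(subquery_style).fetchall()
cte_rows = conn.execute(cte_style).fetchall()

# Both styles return the same rows; the customer without orders is kept.
print(cte_rows)
```

The two queries are equivalent; the CTE version simply reads more clearly because the intermediate result has a meaningful name.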

Inappropriate Left Join Usage

Left Join Usage: When an INNER JOIN is more appropriate.
Explanation: An INNER JOIN is more appropriate than a LEFT JOIN when you only want to see the
data where there is a match in both tables based on the join condition. Avoid these use
cases as you modify the SQL.

Left Join Usage: Avoid using a LEFT JOIN on a table if you do not reference any columns from
that table in your query.
Explanation: In such cases, the LEFT JOIN is unnecessary and should be removed to simplify the
query and improve performance.
Example below:
-- Incorrect:
```sql
SELECT
TableA.column1,
TableA.column2
FROM
TableA
LEFT JOIN
TableB
ON
TableA.id = TableB.id;
```
TableB is joined, but none of its columns are
used in the SELECT statement. This makes
the LEFT JOIN redundant. Removing it
produces the same result without the extra join,
provided each TableA row matches at most one
TableB row (otherwise the join would also
duplicate rows).
-- Correct:
```sql
SELECT
TableA.column1,
TableA.column2
FROM
TableA;
```
Since no columns from TableB are needed,
the join is removed, streamlining the query.
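
The redundancy can be checked on toy data. The following is a minimal sketch using an in-memory SQLite database as a stand-in for BigQuery; the column `note` on TableB and all row values are hypothetical. It also shows why the claim depends on the join key being unique in TableB.

```python
import sqlite3

# Toy versions of TableA and TableB; values are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (id INTEGER, column1 TEXT, column2 TEXT);
CREATE TABLE TableB (id INTEGER, note TEXT);
INSERT INTO TableA VALUES (1, 'a1', 'b1'), (2, 'a2', 'b2');
INSERT INTO TableB VALUES (1, 'only match');
""")

JOINED_QUERY = """
SELECT TableA.column1, TableA.column2
FROM TableA
LEFT JOIN TableB ON TableA.id = TableB.id
ORDER BY TableA.id;
"""

SIMPLE_QUERY = """
SELECT TableA.column1, TableA.column2
FROM TableA
ORDER BY TableA.id;
"""

with_join = conn.execute(JOINED_QUERY).fetchall()
without_join = conn.execute(SIMPLE_QUERY).fetchall()

# Caveat: once TableB holds two rows for the same id, the unused join
# starts duplicating TableA rows, so the two queries no longer agree.
conn.execute("INSERT INTO TableB VALUES (1, 'second match')")
with_duplicate_matches = conn.execute(JOINED_QUERY).fetchall()

print(with_join == without_join)
print(len(with_duplicate_matches))
```

When the join key is unique, dropping the unused join changes nothing; when it is not, dropping it also removes accidental duplicates, which is usually what the question intends.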

Appropriate Left Join Usage

Left Join Usage: Retrieving all rows from the "left" table, regardless of matches.
Explanation: This is the core purpose of a LEFT JOIN. You want everything from the first
table, and optionally the matching information from the second.

Left Join Usage: Prioritizing the left table's data.
Explanation: When the left table holds the primary information, and the right table provides
supplementary details.
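
To make the contrast concrete, here is a small sketch that runs the same question with a LEFT JOIN and an INNER JOIN. It uses an in-memory SQLite database in place of BigQuery, and the `countries` and `debt_descriptions` tables and their contents are hypothetical.

```python
import sqlite3

# Hypothetical data: Chile has no debt descriptions at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (country_code TEXT, country_name TEXT);
CREATE TABLE debt_descriptions (country_code TEXT, description TEXT);
INSERT INTO countries VALUES
    ('AR', 'Argentina'), ('BR', 'Brazil'), ('CL', 'Chile');
INSERT INTO debt_descriptions VALUES
    ('AR', 'bond'), ('AR', 'loan'), ('BR', 'bond');
""")

QUERY_TEMPLATE = """
SELECT
    countries.country_name,
    COUNT(DISTINCT debt_descriptions.description) AS unique_descriptions
FROM countries
{join_type} JOIN debt_descriptions
    ON countries.country_code = debt_descriptions.country_code
GROUP BY countries.country_name
ORDER BY unique_descriptions DESC, countries.country_name;
"""

# LEFT JOIN keeps every country; unmatched ones get a count of 0.
left_rows = conn.execute(QUERY_TEMPLATE.format(join_type="LEFT")).fetchall()

# INNER JOIN silently drops countries without any matching description.
inner_rows = conn.execute(QUERY_TEMPLATE.format(join_type="INNER")).fetchall()

print(left_rows)
print(inner_rows)
```

The left table (`countries`) carries the primary information, so the LEFT JOIN is the intuitive choice when the question asks about all countries.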

How to use the embedded IDE:

Step 1: Open the IDE


Step 2: Open main.py when the engine loads

Step 3: Edit the query in the QUERY_TO_RUN variable. DO NOT CHANGE ANY OTHER CODE.
Step 4: Run and view output. Remember to limit results to 1K rows while testing only; do not keep this limit in the modified query you submit.
Step 5: Before closing, be sure to copy your (potentially edited) SQL query so you can paste it back
into the task

Step 3: Paste the modified SQL query into the task

 Modify the SQL code to meet the natural language question. At this step, remember to
apply the previous table to distinguish between intuitive and non-intuitive LEFT JOIN usage.
 Copy the modified query into the provided space.

Step 4: Determine whether or not the modified SQL query aligns with the
provided natural language question

⚠️Important Note About Rewritten Natural Language Questions:


1. DO NOT change the essential meaning of the NLQ unless it is really
necessary. Keep the essence of the original NLQ, and do not introduce major changes
except when:

1. The NLQ can be answered by a single SQL query using, for example, only one
table, or there is no need to use a Left Join to answer it.
2. You have 2 tables and a valid Left Join code structure, but the columns of your
tables do not match.
3. You have one or more tables with empty data, generating an empty output when
you run the SQL query.

2. Ask in natural language to retrieve all rows from the "left" table, regardless of matches.
Example: “Retrieve the list of all countries along with the count of unique debt
descriptions, ordered by the unique description count in descending order” instead
of “Retrieve the list of countries along with the count of unique debt descriptions for each
country. Ensure that all countries are included in the results, even those that may not have
any associated descriptions, and order the results by the unique description count in
descending order”.
3. We aim to write natural language questions that feel authentic, as if from a real user, rather
than detailed, step-by-step SQL instructions. Describe the task naturally and
conversationally, without emphasizing technical SQL terms like `LEFT JOIN` vs `INNER
JOIN`. There's no need to specify "include all records even though..."; a `LEFT JOIN` will
be implied unless the question explicitly says to exclude records with `NULL` values or
similar constraints.
4. If the natural language question includes technical SQL terms, rewrite it to sound more
conversational and user-friendly. Replace technical words like `LEFT JOIN` or `NULL` with
everyday language that describes the task without SQL jargon. Focus on what the user
wants to achieve, rather than specifying how to do it in SQL.
5. Avoid using execution results to decide whether INNER JOIN or LEFT JOIN is more
appropriate in the SQL query. The natural language question does not need to include
phrases like "include all even if..." because real users would not typically phrase requests
this way. By default, assume that LEFT JOIN is appropriate unless the question explicitly
implies INNER JOIN. Ensure that the NL question is clear and does not unintentionally
suggest the need for an INNER JOIN.

 Questions may have issues such as:

 Referencing fields that are NOT in the Query.
 Not referencing fields that ARE in the Query.
 Not reflecting the provided schema.
 A full question alignment rubric is provided below.
 If the question does not align with the SQL query, provide a rewrite that uses a left join
intuitively.

 Examples of intuitive vs unintuitive left-join usage are provided in the explanation
in step 2.

 This justification should detail why the SQL query does not meet the question's requirements,
highlighting any missing or misinterpreted elements. Alternatively, if the SQL code is correct
and the question itself does not meet the requirements of the code, explain how the question
falls short in representing the SQL query accurately.

Troubleshooting SQL query:


If the SQL does not immediately execute and return at least one row of results, please check the
following:

 Syntax errors:

 If the query does not execute due to a syntax error, then check the provided error log
and correct any issues present.

 Incorrect column names:

 Refer to the provided public table taxonomies below to determine the most likely
column name that the SQL query should be referencing.

 Correct syntax but no results:

 In the case where the SQL executes but there are no results returned, check that the
data returned in the filtered or joined columns matches the format of the filters or
joins.
 Example: the following query would return no results.

SELECT
top_rising_terms.term,
top_rising_terms.score
FROM
`bigquery-public-data`.`google_trends`.`international_top_rising_terms` AS top_rising_terms
LEFT JOIN
`bigquery-public-data`.`google_trends`.`international_top_terms` AS top_terms
ON top_rising_terms.term = top_terms.term
WHERE
top_terms.country_code = 'il'
AND top_rising_terms.score IS NOT NULL
ORDER BY
top_rising_terms.score DESC;

 This is because top_terms.country_code needs to be in capital letters.

 You can confirm this by running a quick check such as: SELECT DISTINCT
country_code FROM `bigquery-public-data`.`google_trends`.`international_top_rising_terms`
LIMIT 5, which will return a sample of what the data in that column looks like.
 You can replicate this for any of the tables in the database by
utilizing SELECT DISTINCT [col_name] FROM
`bigquery-public-data`.`[dataset_name]`.`[table_name]` to find out the distinct values in
[table_name].[col_name] and filter appropriately.
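
The check described above can be rehearsed locally. This is a minimal sketch on an in-memory SQLite database standing in for BigQuery; the toy `international_top_terms` table only mimics the two relevant columns, and its rows are invented for illustration.

```python
import sqlite3

# Toy stand-in for the real table; only term and country_code are modeled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE international_top_terms (term TEXT, country_code TEXT);
INSERT INTO international_top_terms VALUES
    ('weather', 'IL'), ('news', 'IL'), ('football', 'GB');
""")

# The filter value is lowercase, but the stored codes are uppercase,
# so the string comparison matches nothing.
lowercase_rows = conn.execute(
    "SELECT term FROM international_top_terms WHERE country_code = 'il'"
).fetchall()

# Quick diagnostic: look at a sample of the distinct values in the column.
distinct_codes = conn.execute(
    "SELECT DISTINCT country_code FROM international_top_terms "
    "ORDER BY country_code LIMIT 5"
).fetchall()

# After matching the stored format, the filter returns rows.
uppercase_rows = conn.execute(
    "SELECT term FROM international_top_terms "
    "WHERE country_code = 'IL' ORDER BY term"
).fetchall()

print(lowercase_rows)
print(distinct_codes)
print(uppercase_rows)
```

The same pattern applies to any filtered or joined column: inspect a few distinct values first, then write the filter in the format the data actually uses.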

Public Table Taxonomies:

PLEASE SEE HERE FOR ALL OF THE PUBLIC TABLE TAXONOMIES

Question alignment rubric:


 There are multiple reasons why a question may need to be rewritten.
 If any of these criteria are met, then the question should be rewritten.

 Note/Exception: If a filter/ordering is heavily implied by the prompt but not overly
explicit, then it is acceptable to mark the prompt as “perfect” and not edit it. E.g. if
there are dates ordered in ascending order, then it is acceptable to say, “ordered by
date” as opposed to “ordered by dates in ascending order”.

 IMPORTANT: Rewrites that improve the quality of the prompt are extremely
valuable. Marking bad prompts as “Perfect” to avoid rewrites will result in removal from
the project.
 Be sure to assess whether the date range in the query is properly referenced in the
prompt: use this resource to determine how to refer to different date ranges.

Issue: Prompt does not reflect the output columns
Description: The exact return columns defined in the query need to be reflected in the
question. Questions are bad if they:
 Reference too many fields
 Reference too few fields
Example:
Query:
SELECT t0._TagName_, SUM(t0._Count_) FROM t0 WHERE t0._WikiPostId_ > 102937 AND
t0._TagName_ = 'nnt' GROUP BY t0._TagName_ HAVING MAX(t0._Count_) < 100
“Has Issues” Question: Which tag has a count less than 100 and a WikiPostId greater than
102937?
Corrected (“Perfect”) Question: Provide the total count for tag names that are “nnt” and have
fewer than 100 WikiPostIds that are greater than 102937.
Example Explanation: One of the issues with this is that the Prompt does not reference that
the sum of the counts of the tag names will also be included in the results.

Issue: Prompt references incorrect fields
Description: The question references information that is contained in another field from the
schema or references information that is not contained anywhere in the schema.
Example:
Query:
SELECT t0._AboutMe_ FROM t0 WHERE t0._CreationDate_ = '2010-12-31' GROUP BY
t0._AboutMe_ HAVING AVG(t0._DownVotes_) > 10
“Has Issues” Question: Which users have an average of more than 10 downvotes on their posts
created on December 31, 2010?
Corrected (“Perfect”) Question: What are the distinct “about me” sections that are associated
with more than 10 downvotes and were created on December 31, 2010?
Example Explanation: One of the issues with this is that the SQL will return the _AboutMe_
column, but the Prompt asks about which users. The question of “which users” is better
answered using the _DisplayName_ column from the users table.

Issue: Limits, ordering, filters, or aggregation are present in the query but are not present
in the Prompt
Description: Any inclusion of ordering, limits, filtering, or aggregation functions needs to be
reflected in the Prompt.
Example:
Query:
SELECT SUM(t0._l_partkey_), t0._l_shipdate_, t0._l_returnflag_ FROM t0 WHERE
t0._l_suppkey_ = 6966 GROUP BY t0._l_shipdate_, t0._l_returnflag_ LIMIT 20
“Has Issues” Question: What is the total sum of line items grouped by return flag for supplier
6966?
Corrected (“Perfect”) Question: Can you show me the sum of partkeys for 20 distinct
combinations of shipdate and returnflag that are associated with suppkey 6966?
Example Explanation: One of the issues with this query/prompt combination is that the SQL
only returns 20 rows but this is not acknowledged in the Prompt.

New Question Alignment Feedback:


The following table is for guidance and shows the cases where the client preferred one question type
instead of Please us
Note on date usage:
IMPORTANT: THE BETWEEN FILTER FOR DATES IN SQL IS INCLUSIVE
UPDATE: Refer to “in the trailing year” or “in the last year” as the trailing 12 months and “last
year” as the entirety of the previous year. You can extend this to all types of dates
Please use this resource to determine how to refer to different date ranges. Some samples of queries
with issues around date filtering are below.

Query:
SELECT Product_line, AVG(cogs) from t0 WHERE Date BETWEEN
DATE_SUB(DATE_TRUNC(CURRENT_DATE(), MONTH), INTERVAL 2 MONTH) AND
DATE_SUB(DATE_SUB(DATE_TRUNC(CURRENT_DATE(), MONTH), INTERVAL 1 MONTH), INTERVAL 1
DAY) GROUP BY Product_line;
Prompt: What is the average cogs for each product line in the last two months?
Issue Explanation: This date filter looks at the entirety of 2 months ago, not “the last two
months”, which implies that the previous month should be included. This can be rewritten as
“two months ago”.

Query:
SELECT AVG(cogs) from t0 WHERE Date BETWEEN
DATE_SUB(DATE_TRUNC(CURRENT_DATE(), YEAR), INTERVAL 1 DAY) AND
DATE_ADD(DATE_TRUNC(CURRENT_DATE(), YEAR), INTERVAL 2 YEAR);
Prompt: What is the average cogs for the past two years?
Issue Explanation: The date filter does not look at the past two years. This query looks from
the last day of the previous year (inclusive) to the first day of the year after next
(inclusive). It can be referred to as, “the last day of last year to the first day of the year
after next” in the rewrite.

Query:
SELECT AVG(Total) from t0 WHERE Date BETWEEN
DATE_SUB(DATE_TRUNC(CURRENT_DATE(), WEEK), INTERVAL 1 WEEK) AND
DATE_SUB(DATE_TRUNC(CURRENT_DATE(), WEEK), INTERVAL 1 DAY);
Prompt: What is the average total for the last week excluding today?
Issue Explanation: The date range in the prompt should be referred to as “last week” as
opposed to “the last week excluding today”. Stating excluding today is
unnecessary/irrelevant and that wording implies the last seven days excluding today.
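
When a date filter is hard to read, it can help to compute its bounds for a fixed day. The sketch below mirrors the month and week filters from the samples above in plain Python; the helper names and the reference date are hypothetical, and the week logic assumes that BigQuery's `WEEK` truncation starts weeks on Sunday.

```python
from datetime import date, timedelta


def first_of_month(day: date) -> date:
    """Mimic DATE_TRUNC(day, MONTH)."""
    return day.replace(day=1)


def shift_months(first_day: date, months: int) -> date:
    """Move a first-of-month date by a whole number of months."""
    index = first_day.year * 12 + (first_day.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def two_months_ago_range(today: date) -> tuple:
    """Inclusive BETWEEN bounds of the first sample query."""
    this_month = first_of_month(today)
    start = shift_months(this_month, -2)
    end = shift_months(this_month, -1) - timedelta(days=1)
    return start, end


def last_week_range(today: date) -> tuple:
    """Inclusive BETWEEN bounds of the third sample query."""
    # Python's weekday() uses Monday=0, so Sunday needs a small shift.
    days_since_sunday = (today.weekday() + 1) % 7
    this_week = today - timedelta(days=days_since_sunday)
    return this_week - timedelta(weeks=1), this_week - timedelta(days=1)


reference_day = date(2024, 11, 20)  # arbitrary fixed day for illustration
print(two_months_ago_range(reference_day))
print(last_week_range(reference_day))
```

For the reference day, the first filter covers exactly September 2024 (hence “two months ago”), and the third covers the full previous Sunday-to-Saturday week (hence “last week”).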

Appendix
1. Good vs Bad - NL questions + SQL Code Alignments

Question: Retrieve the logs for transactions that did not succeed in the Ethereum blockchain
dataset.

Good SQL Code:
```sql
SELECT
transactions.transaction_hash,
logs.data
FROM
`bigquery-public-data`.`ethereum_blockchain`.`transactions` AS transactions
LEFT JOIN
`bigquery-public-data`.`ethereum_blockchain`.`logs` AS logs
ON
transactions.hash = logs.transaction_hash
WHERE
transactions.receipt_status = 0;
```

Bad SQL Code:
```sql
SELECT
T1.transaction_hash,
T2.data
FROM
`bigquery-public-data.ethereum_blockchain.transactions` AS T1
LEFT JOIN
`bigquery-public-data.ethereum_blockchain.logs` AS T2
ON
T1.hash = T2.transaction_hash
WHERE
T1.receipt_status = 1;
```

Justification: The first query correctly follows the intent by using a LEFT JOIN to retrieve
data from logs even if there are no corresponding entries for each transaction.
The WHERE clause filters for failed transactions (where receipt_status = 0), ensuring that only
failed transactions are included in the result. This query is well-aligned with the question's
requirements.
The second query does not meet the question’s intent because it filters for successful
transactions (receipt_status = 1) instead of failed ones (receipt_status = 0). This misalignment
results in a dataset that includes only successful transactions, contradicting the requirement
to retrieve data specifically for failed transactions.

Question: What is the total number of orders associated with each traffic source?

Good SQL Code:
```sql
SELECT
events_data.traffic_source,
COUNT(orders_data.order_id) AS total_orders
FROM
`bigquery-public-data`.`thelook_ecommerce`.`orders` AS orders_data
LEFT JOIN
`bigquery-public-data`.`thelook_ecommerce`.`events` AS events_data
ON
orders_data.user_id = events_data.user_id
GROUP BY
events_data.traffic_source;
```

Bad SQL Code:
```sql
SELECT
events_data.traffic_source,
COUNT(DISTINCT orders_data.user_id) AS total_orders
FROM
`bigquery-public-data.thelook_ecommerce.orders` AS orders_data
LEFT JOIN
`bigquery-public-data.thelook_ecommerce.events` AS events_data
ON
orders_data.user_id = events_data.user_id
GROUP BY
events_data.traffic_source;
```

Justification: The first query meets the question's requirements. It uses a LEFT JOIN to
combine the orders and events tables, ensuring that all orders are counted even if no traffic
source is associated with some of them. By grouping on events_data.traffic_source, it
calculates the total number of orders per traffic source, as intended by the question.
The second query does not align with the question’s intent because it counts
distinct user_id values instead of order_id values. This leads to a count of unique users
associated with each traffic source rather than the total number of orders. This
misinterpretation results in an inaccurate total order count, failing to meet the original
question's requirements.
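
The difference between counting orders and counting distinct users can be seen on a few rows. This is a minimal sketch on an in-memory SQLite database standing in for BigQuery; the toy `orders` and `events` tables and their rows are hypothetical, and each user is given at most one event so that the join does not multiply orders (the real events table may hold many events per user).

```python
import sqlite3

# Hypothetical data: user 100 placed two orders, user 300 has no events.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, user_id INTEGER);
CREATE TABLE events (user_id INTEGER, traffic_source TEXT);
INSERT INTO orders VALUES (1, 100), (2, 100), (3, 200), (4, 300);
INSERT INTO events VALUES (100, 'Email'), (200, 'Search');
""")

QUERY_TEMPLATE = """
SELECT
    events_data.traffic_source,
    {count_expression} AS total_orders
FROM orders AS orders_data
LEFT JOIN events AS events_data
    ON orders_data.user_id = events_data.user_id
GROUP BY events_data.traffic_source;
"""

# Aligned with the question: count orders per traffic source.
order_counts = dict(conn.execute(
    QUERY_TEMPLATE.format(count_expression="COUNT(orders_data.order_id)")
).fetchall())

# Misaligned: this counts distinct users, not orders.
user_counts = dict(conn.execute(
    QUERY_TEMPLATE.format(
        count_expression="COUNT(DISTINCT orders_data.user_id)")
).fetchall())

print(order_counts)
print(user_counts)
```

The order without any matching event is kept under a `None` traffic source in both results, which is the LEFT JOIN behavior the justification relies on; only the Email count differs between the two queries.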

2. Sample task walkthroughs

Example Task One - NL question does not match with the SQL code - SQL is not
executable:
SQL Query:
SELECT
INTERNATIONAL_TOP_RISING_TERMS.term, INTERNATIONAL_TOP_RISING_TERMS.score
FROM
`bigquery-public-data`.`google_trends`.`international_top_rising_terms`
LEFT JOIN
`bigquery-public-data`.`google_trends`.`international_top_terms`
ON
INTERNATIONAL_TOP_RISING_TERMS.term = INTERNATIONAL_TOP_TERMS.term
WHERE
INTERNATIONAL_TOP_TERMS.country_code = 'US'

NL Question:
Top international rising terms for a specific country

Step 1: Review the Natural Language question and the corresponding SQL query.

 This query references:
bigquery-public-data.google_trends.international_top_rising_terms and
bigquery-public-data.google_trends.international_top_terms, which are distinct tables, so the
query references multiple tables
 Consider whether or not the filter conditions, return columns, etc. align between the query
and the response
 REMINDER: No specific action is needed at this step

Step 2: Verify if the SQL query executes, returns at least one result, and uses a left
join statement in an intuitive way.

 The original query does not return any data when executed

 Using the tips provided in the troubleshooting SQL query section of the instructions, we can
modify the statement to be:

SELECT
top_rising_terms.term,
top_rising_terms.score
FROM
`bigquery-public-data`.`google_trends`.`international_top_rising_terms` AS top_rising_terms
LEFT JOIN
`bigquery-public-data`.`google_trends`.`international_top_terms` AS top_terms
ON top_rising_terms.term = top_terms.term
WHERE
top_terms.country_code = 'IL'
AND top_rising_terms.score IS NOT NULL
ORDER BY
top_rising_terms.score DESC
LIMIT 1000;
 Specific changes employed:

 Checked other country codes for data present and changed the filter
to top_terms.country_code = 'IL'

Step 3: Modify and paste the results of the modified query

 Copy the results and paste them into the provided space (the following image shows the
last 6 rows as an extract of the complete result).

Step 4: Determine whether or not the modified SQL query aligns with the provided
natural language question
 The original question was, “Top international rising terms for a specific country”.
 Remember: This question needs to relate to the MODIFIED SQL query produced in step 2.
 The issues with this question are:

 It does not reference which country is being asked for.


 It does not reference the ordering of the results.
 The question should be rewritten as: Find the top international rising term for a
specific country like 'IL' based on the highest score. Limit the results to 1000.

Example Task Two - NL question does not match with the SQL code - SQL is not
executable:
SQL Query:
SELECT
TOP_TERMS.term,
INTERNATIONAL_TOP_TERMS.region_name
FROM
bigquery-public-data.google_trends.top_terms
LEFT JOIN
bigquery-public-data.google_trends.international_top_terms
ON
TOP_TERMS.term = INTERNATIONAL_TOP_TERMS.term
WHERE
TOP_TERMS.week = '2023-03-05'
LIMIT 1000;

NL Question:
The most popular search terms in a given week, along with their region

Step 1: Review the Natural Language question and the corresponding SQL query.

 Consider whether or not the filter conditions, return columns, etc. align between the query
and the response.
 REMINDER: No specific action is needed at this step.

Step 2: Verify if the SQL query executes, returns at least one result, and uses a left
join statement in an intuitive way.

 The original query executes but it does not rank terms by popularity within each region (the
following image shows the last 6 rows as an extract of the complete result).
 Using the tips provided in the troubleshooting SQL query section of the instructions, we can
modify the statement to be:

SELECT
x.term,
x.region_name,
COUNT(*) AS popular
FROM (
SELECT
top_terms.term,
international_top_terms.region_name
FROM
`bigquery-public-data`.`google_trends`.`top_terms`
LEFT JOIN
`bigquery-public-data`.`google_trends`.`international_top_terms`
ON
top_terms.term = international_top_terms.term
WHERE
top_terms.week = '2023-03-05'
) AS x
GROUP BY
x.term,
x.region_name
ORDER BY
popular DESC
LIMIT 100;

 The corrected query introduces two main changes so that it not only retrieves data but
also ranks terms by popularity within each region:

 Aggregation with COUNT: In the corrected query, a subquery (x) is used to
retrieve term and region_name, followed by an outer query that aggregates
these results using COUNT(*) AS popular. This counts occurrences of each term-
region pair, giving a measure of popularity, which aligns with the intent to find the
"most popular" search terms.
 Ordering and Limiting Results: The corrected query includes an ORDER BY
popular DESC clause to sort terms by popularity, with the LIMIT 100 clause
ensuring that only the top 100 most popular search terms are returned. This step
provides the "most popular search terms" as specified by the natural language
question.

Step 3: Modify and paste the results of the modified query

 Copy the results and paste them into the provided space (the following image shows the
last 5 rows as an extract of the complete result).
Step 4: Determine whether or not the modified SQL query aligns with the provided
natural language question

 The original question was, “The most popular search terms in a given week, along with
their region”
 Remember: This question needs to relate to the MODIFIED SQL query produced in step 2.
 The issues with this question are:

 "in a given week" is not clear.


 “The most popular search terms” is not defined by a limit clause.
 The question should be rewritten as: The most popular search terms in the week of
'2023-03-05', along with their region. Return the top 100 rows.
FAQs

A quick-reference guide addressing common concerns and issues reviewers or recorders may face,
with clear instructions on how to resolve them.

Question: Do I need to select all the columns of a table as follows if I want to show all of
them in the results?

SELECT
assignment.rf_id,
assignment.file_id,
assignment.cname,
assignment.caddress_1,
assignment.caddress_2,
assignment.caddress_3,
assignment.caddress_4,
assignment.reel_no,
assignment.frame_no,
assignment.convey_text,
assignment.record_dt,
assignment.last_update_dt,
assignment.page_count,
assignment.purge_in,
conveyance.convey_ty,
conveyance.employer_assign
FROM
bigquery-public-data.uspto_oce_assignment.assignment AS assignment
LEFT JOIN
bigquery-public-data.uspto_oce_assignment.assignment_conveyance AS conveyance
ON
assignment.rf_id = conveyance.rf_id
LIMIT 10;

Answer: No. Please use "SELECT *" instead, as follows. This is critical on training data
especially. Teaching the model to list too many columns might cause truncation of results.

SELECT
assignment.*,
conveyance.convey_ty,
conveyance.employer_assign
FROM
`bigquery-public-data`.`uspto_oce_assignment`.`assignment` AS assignment
LEFT JOIN
`bigquery-public-data`.`uspto_oce_assignment`.`assignment_conveyance` AS conveyance
ON
assignment.rf_id = conveyance.rf_id
LIMIT 10;
