
SQL

The document discusses various SQL queries and concepts: 1. It covers SQL sorting, ordering, and functions like ROUND, FLOOR, and CEILING. 2. Set operations like UNION, INTERSECT, and MINUS are explained along with examples of combining queries. 3. Different types of subqueries like correlated and noncorrelated are defined along with examples. 4. Methods for inserting, updating, deleting and altering tables in Hive SQL are provided including using CTAS and replacing columns.

Uploaded by

Neha Khatri
Copyright: © All Rights Reserved

SQL

Link to prepare SQL queries (top 20 frequently asked queries):
https://artoftesting.com/interviewSection/sql-queries-for-interview.html

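• Sorting in SQL & Hive

• SQL: SQL has one sorting clause, ORDER BY column_name [ASC | DESC]; there is no SORT BY.
• Hive: Hive has ORDER BY, SORT BY, DISTRIBUTE BY and CLUSTER BY. ORDER BY produces a total ordering (all rows pass through a single reducer); SORT BY sorts rows only within each reducer; DISTRIBUTE BY sends all rows with the same key to the same reducer without sorting them; CLUSTER BY col is shorthand for DISTRIBUTE BY col SORT BY col.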
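The difference between the four Hive clauses is easiest to see by simulating reducers. The Python sketch below is a toy model, not Hive's implementation: the two reducers, the round-robin placement used for SORT BY, and the character-sum partitioner used for DISTRIBUTE BY are all assumptions made for illustration.

```python
# Toy simulation of Hive's ORDER BY, SORT BY, DISTRIBUTE BY and CLUSTER BY.
# Assumptions (not Hive's real internals): 2 reducers, round-robin placement
# for SORT BY, and a simple character-sum partitioner for DISTRIBUTE BY.

NUM_REDUCERS = 2

# Hypothetical (key, value) rows.
rows = [("b", 3), ("a", 5), ("c", 1), ("a", 2), ("b", 4), ("c", 6)]


def partition(key):
    """Choose a reducer for a key (stand-in for Hive's hash partitioner)."""
    return sum(map(ord, key)) % NUM_REDUCERS


def order_by(rows, sort_key):
    """ORDER BY: one reducer receives every row, so the output is totally ordered."""
    return [sorted(rows, key=sort_key)]


def sort_by(rows, sort_key):
    """SORT BY: each reducer sorts only its own rows; there is no global order."""
    reducers = [[] for _ in range(NUM_REDUCERS)]
    for i, row in enumerate(rows):
        reducers[i % NUM_REDUCERS].append(row)
    return [sorted(r, key=sort_key) for r in reducers]


def distribute_by(rows, col):
    """DISTRIBUTE BY: equal values of col go to the same reducer, unsorted."""
    reducers = [[] for _ in range(NUM_REDUCERS)]
    for row in rows:
        reducers[partition(row[col])].append(row)
    return reducers


def cluster_by(rows, col):
    """CLUSTER BY col: DISTRIBUTE BY col, then SORT BY col inside each reducer."""
    return [sorted(r, key=lambda row: row[col]) for r in distribute_by(rows, col)]


if __name__ == "__main__":
    by_value = lambda row: row[1]
    print("ORDER BY value:   ", order_by(rows, by_value))
    print("SORT BY value:    ", sort_by(rows, by_value))
    print("DISTRIBUTE BY key:", distribute_by(rows, 0))
    print("CLUSTER BY key:   ", cluster_by(rows, 0))
```

Concatenating the SORT BY output gives the values 1, 3, 4, 2, 5, 6: sorted inside each reducer but not globally, which is why a total ORDER BY needs a single reducer.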
EXCEPT is the same as MINUS: both return the rows from one query that do not exist in another query. Oracle uses the keyword MINUS, while PostgreSQL, SQL Server and SQLite use EXCEPT.

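INTERSECT on the rows of two tables gives the same result as an INNER JOIN on all of the selected columns followed by DISTINCT, with one difference: INTERSECT treats two NULLs as equal, while a join condition such as NULL = NULL does not match.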
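A small check of EXCEPT, INTERSECT and the inner-join equivalent, run against an in-memory SQLite database from Python. The tables t1 and t2 and their rows are hypothetical; SQLite supports EXCEPT but not the MINUS keyword.

```python
import sqlite3

# Hypothetical tables t1 and t2; SQLite supports EXCEPT but not the MINUS keyword.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a INTEGER, b TEXT);
    CREATE TABLE t2 (a INTEGER, b TEXT);
    INSERT INTO t1 VALUES (1, 'x'), (2, 'y'), (3, 'z'), (3, 'z'), (NULL, 'n');
    INSERT INTO t2 VALUES (2, 'y'), (3, 'z'), (4, 'w'), (NULL, 'n');
""")


def result_set(sql):
    """Run a query and return its rows as a set (row order is not guaranteed)."""
    return set(conn.execute(sql).fetchall())


# EXCEPT (MINUS): rows of t1 that do not appear in t2, duplicates removed.
except_rows = result_set("SELECT a, b FROM t1 EXCEPT SELECT a, b FROM t2")

# INTERSECT: rows present in both tables; NULLs compare as equal here.
intersect_rows = result_set("SELECT a, b FROM t1 INTERSECT SELECT a, b FROM t2")

# Inner join on every selected column + DISTINCT: same idea, but NULL = NULL
# is not true in a join condition, so the (NULL, 'n') row drops out.
join_rows = result_set("""
    SELECT DISTINCT t1.a, t1.b
    FROM t1 JOIN t2 ON t1.a = t2.a AND t1.b = t2.b
""")

print(except_rows)
print(intersect_rows - join_rows)
```

Without NULLs in the data, intersect_rows and join_rows would be identical.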
MINUS: Finding Results That Are Missing
Working with Two or More Set Operators

If we wanted to see all of the names in all tables, we could write a query like this:
SELECT first_name, last_name FROM customer
UNION
SELECT first_name, last_name FROM employee
UNION
SELECT first_name, last_name FROM referrer
ORDER BY first_name, last_name;
If we wanted to see all records in the customer and employee tables that weren't in the referrer table, we could write this query:

SELECT first_name, last_name FROM customer
UNION ALL
SELECT first_name, last_name FROM employee
MINUS
SELECT first_name, last_name FROM referrer
ORDER BY first_name, last_name;

Now, you might be wondering how this works. The first query is run, then a UNION ALL is performed with the second query, and then a MINUS is performed between those results and the third query.
Unlike mathematical operators (such as addition and multiplication), set operators in Oracle have no order of operations: they all have equal precedence and run in order from the start of the query to the end. Other databases can differ: the SQL standard, PostgreSQL and SQL Server give INTERSECT higher precedence than UNION and EXCEPT, so use parentheses when mixing set operators.

https://www.databasestar.com/sql-set-operators/#c7
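The left-to-right evaluation of chained set operators can be reproduced in SQLite, which also gives all compound operators equal precedence and evaluates them left to right (using EXCEPT in place of MINUS). The rows below are hypothetical; the second query forces the other grouping with a subquery to show that the grouping changes the result.

```python
import sqlite3

# Hypothetical customer / employee / referrer rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (first_name TEXT, last_name TEXT);
    CREATE TABLE employee (first_name TEXT, last_name TEXT);
    CREATE TABLE referrer (first_name TEXT, last_name TEXT);
    INSERT INTO customer VALUES ('Ann', 'Lee'), ('Bob', 'Ray');
    INSERT INTO employee VALUES ('Bob', 'Ray'), ('Cat', 'Kim');
    INSERT INTO referrer VALUES ('Ann', 'Lee');
""")

# Evaluated left to right: (customer UNION ALL employee) EXCEPT referrer.
left_to_right = conn.execute("""
    SELECT first_name, last_name FROM customer
    UNION ALL
    SELECT first_name, last_name FROM employee
    EXCEPT
    SELECT first_name, last_name FROM referrer
    ORDER BY first_name, last_name
""").fetchall()

# Forcing the other grouping with a subquery: customer UNION ALL (employee EXCEPT referrer).
grouped = conn.execute("""
    SELECT first_name, last_name FROM customer
    UNION ALL
    SELECT first_name, last_name FROM (
        SELECT first_name, last_name FROM employee
        EXCEPT
        SELECT first_name, last_name FROM referrer
    )
    ORDER BY first_name, last_name
""").fetchall()

print(left_to_right)  # EXCEPT also removes the duplicate 'Bob Ray'
print(grouped)
```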
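•ROUND(X, D) rounds X to D decimal places: if the digit after the D-th decimal place is >= 5, the number is rounded up (away from zero), otherwise it is rounded down.
•FLOOR() returns the largest integer not greater than the number, i.e. it always rounds down, toward negative infinity (so FLOOR(-1.5) = -2).
•CEILING() returns the smallest integer not less than the number, i.e. it always rounds up, toward positive infinity (so CEILING(-1.5) = -1).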
mysql> SELECT ROUND(1.415,2), FLOOR(1.415), CEILING(1.415);
+----------------+--------------+----------------+
| ROUND(1.415,2) | FLOOR(1.415) | CEILING(1.415) |
+----------------+--------------+----------------+
|           1.42 |            1 |              2 |
+----------------+--------------+----------------+
1 row in set (0.00 sec)
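To check these rules, including the negative-number case the MySQL example does not show, here is a Python sketch using the standard decimal and math modules. sql_round is a hypothetical helper that assumes MySQL's round-half-away-from-zero rule for exact-value numbers; Python's built-in round() on floats rounds ties to even on the binary value and can give a different answer.

```python
import math
from decimal import Decimal, ROUND_HALF_UP


def sql_round(value, places):
    """Round half away from zero to `places` decimals (MySQL-style ROUND on exact values)."""
    quantum = Decimal(1).scaleb(-places)  # places=2 -> 1E-2, i.e. 0.01
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


# Values are passed as strings so they stay exact decimals (no binary float error).
for value in ("1.415", "-1.415"):
    print(value, sql_round(value, 2), math.floor(Decimal(value)), math.ceil(Decimal(value)))
```

For -1.415 this gives ROUND = -1.42, FLOOR = -2 and CEILING = -1: FLOOR and CEILING move toward negative and positive infinity, not toward or away from zero.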
Lookup:
Lookup is used to compare data between two tables, but it returns only the first row among the matching rows.
It ignores duplicate records in the reference data and is meant for a small reference dataset.
It requires only one input and produces these outputs:
1. Lookup match output
2. Lookup no match output
3. Error output
In Lookup, the input records do not need to be sorted.
Example: there are two tables, Source1 and Source2.
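This description appears to match an ETL lookup transformation (such as the Lookup transformation in SQL Server Integration Services). The original Source1/Source2 data is not included, so the Python sketch below uses made-up rows; the lookup function is hypothetical, and its error rule (a source row without the key column) is an assumption chosen only to show the third output.

```python
def lookup(source, reference, key):
    """Split source rows into (match, no_match, error) outputs.

    Only the FIRST reference row for each key is used, so duplicate keys in the
    reference data are ignored, and neither input needs to be sorted.
    """
    first_row_for_key = {}
    for ref_row in reference:
        first_row_for_key.setdefault(ref_row[key], ref_row)  # keep the first match only

    match, no_match, error = [], [], []
    for row in source:
        if key not in row:
            error.append(row)  # hypothetical error rule: the key column is missing
        elif row[key] in first_row_for_key:
            match.append({**row, **first_row_for_key[row[key]]})
        else:
            no_match.append(row)
    return match, no_match, error


# Hypothetical data standing in for Source1 (input) and Source2 (reference).
source1 = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}, {"name": "NoId"}]
source2 = [{"id": 1, "city": "Pune"}, {"id": 1, "city": "Delhi"}, {"id": 3, "city": "Goa"}]

match, no_match, error = lookup(source1, source2, "id")
print(match)     # Ann joined with the first id=1 reference row only (Pune)
print(no_match)
print(error)
```

Building a dictionary of the first row per key is what makes sorting unnecessary, unlike a merge join, which walks two sorted inputs.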
There are mainly two types of nested queries:

•Independent nested queries: query execution starts from the innermost query and proceeds to the outermost query. The inner query runs independently of the outer query, but its result is used when executing the outer query. Operators such as IN, NOT IN, ANY and ALL are used to write independent nested queries.

•A noncorrelated (simple) subquery obtains its results independently of its containing (outer) statement. A correlated subquery requires values from its outer query in order to execute.
CORRELATED UPDATE:
UPDATE table1 alias1 SET column =
    (SELECT expression FROM table2 alias2 WHERE alias1.column = alias2.column);

Correlated subqueries are used for row-by-row processing. A correlated subquery is evaluated once for each row processed by the parent statement, which can be a SELECT, UPDATE, or DELETE statement.
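A runnable sketch of a correlated SELECT and a correlated UPDATE in SQLite. The emp and dept_bonus tables are hypothetical, and the UPDATE refers to the table name emp rather than an alias, which is how it is written here for SQLite; the pattern is the same as the UPDATE table1 alias1 form above.

```python
import sqlite3

# Hypothetical emp and dept_bonus tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (id INTEGER PRIMARY KEY, dept TEXT, salary INTEGER);
    INSERT INTO emp VALUES (1, 'IT', 100), (2, 'IT', 300), (3, 'HR', 200), (4, 'HR', 250);
    CREATE TABLE dept_bonus (dept TEXT, bonus INTEGER);
    INSERT INTO dept_bonus VALUES ('IT', 10), ('HR', 20);
""")

# Correlated SELECT: the subquery uses e.dept from the current outer row,
# so it is evaluated per employee (each employee's own department average).
above_dept_avg = conn.execute("""
    SELECT e.id FROM emp e
    WHERE e.salary > (SELECT AVG(i.salary) FROM emp i WHERE i.dept = e.dept)
    ORDER BY e.id
""").fetchall()

# Correlated UPDATE: for each emp row, look up the bonus of that row's department.
conn.execute("""
    UPDATE emp
    SET salary = salary + (SELECT b.bonus FROM dept_bonus b WHERE b.dept = emp.dept)
""")
conn.commit()

print(above_dept_avg)
print(conn.execute("SELECT id, salary FROM emp ORDER BY id").fetchall())
```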
Noncorrelated Subqueries
A noncorrelated subquery executes independently of the outer query. The subquery executes first, and then passes its results to the outer query. For example:
=> SELECT name, street, city, state FROM addresses WHERE state IN (SELECT state FROM states);
Vertica executes this query as follows:
1. Executes the subquery SELECT state FROM states.
2.Passes the subquery results to the outer query.
A query's WHERE and HAVING clauses can specify noncorrelated subqueries if the subquery resolves to a single row, as shown below:
In WHERE clause
=> SELECT COUNT(*) FROM SubQ1 WHERE SubQ1.a = (SELECT y from SubQ2);
In HAVING clause
=> SELECT COUNT(*) FROM SubQ1 GROUP BY SubQ1.a HAVING SubQ1.a = (SubQ1.a & (SELECT y from SubQ2));
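The same kinds of noncorrelated subqueries can be tried in SQLite from Python. The rows in addresses, states, SubQ1 and SubQ2 are hypothetical. One difference to keep in mind: if a scalar subquery returns more than one row, SQLite silently uses the first row, while many other databases raise an error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE states (state TEXT);
    INSERT INTO states VALUES ('MA'), ('NY');
    CREATE TABLE addresses (name TEXT, street TEXT, city TEXT, state TEXT);
    INSERT INTO addresses VALUES
        ('Ann', '1 Main St', 'Boston', 'MA'),
        ('Bob', '2 Oak Ave', 'Austin', 'TX'),
        ('Cat', '3 Elm Rd', 'Albany', 'NY');
    CREATE TABLE SubQ1 (a INTEGER);
    CREATE TABLE SubQ2 (y INTEGER);
    INSERT INTO SubQ1 VALUES (1), (2), (2), (3);
    INSERT INTO SubQ2 VALUES (2);
""")

# IN with a noncorrelated subquery: SELECT state FROM states does not depend
# on the outer row, so its result can be computed once and reused.
in_rows = conn.execute("""
    SELECT name, street, city, state FROM addresses
    WHERE state IN (SELECT state FROM states)
    ORDER BY name
""").fetchall()

# Single-row (scalar) noncorrelated subquery in the WHERE clause.
count = conn.execute(
    "SELECT COUNT(*) FROM SubQ1 WHERE SubQ1.a = (SELECT y FROM SubQ2)"
).fetchone()[0]

print(in_rows)
print(count)
```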
SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[HAVING having_condition]
[ORDER BY col_list]
[LIMIT number];
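A query that uses every clause of the syntax above, in that order, run in SQLite. The sales table and its rows are hypothetical; the comments show the difference between WHERE (filters rows before grouping) and HAVING (filters groups after aggregation).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('north', 10), ('north', 30), ('south', 5),
        ('south', 5), ('east', 50), ('west', 1);
""")

# Clause order: SELECT ... FROM ... WHERE ... GROUP BY ... HAVING ... ORDER BY ... LIMIT
result = conn.execute("""
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE amount > 1               -- filters rows before grouping (drops 'west')
    GROUP BY region
    HAVING SUM(amount) >= 10       -- filters groups after aggregation
    ORDER BY total DESC
    LIMIT 2
""").fetchall()

print(result)
```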

In Hive, to insert the entire data of table2 into table1, use the query below:

INSERT INTO TABLE table1 SELECT * FROM table2;

INSERT INTO TABLE testlog SELECT * FROM table1 WHERE some condition;
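The INSERT ... SELECT pattern can be tried in SQLite, which writes it without the TABLE keyword that Hive uses. The tables and the status = 'error' condition are hypothetical. Note that SELECT * relies on both tables having the same columns in the same order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, status TEXT);
    CREATE TABLE table2 (id INTEGER, status TEXT);
    CREATE TABLE testlog (id INTEGER, status TEXT);
    INSERT INTO table2 VALUES (1, 'ok'), (2, 'error'), (3, 'ok');
""")

# Copy every row of table2 into table1 (Hive: INSERT INTO TABLE table1 SELECT * FROM table2).
conn.execute("INSERT INTO table1 SELECT * FROM table2")

# Copy only the rows matching a condition (hypothetical condition: status = 'error').
conn.execute("INSERT INTO testlog SELECT * FROM table1 WHERE status = 'error'")
conn.commit()

print(conn.execute("SELECT * FROM testlog").fetchall())
```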
1. Using the Create Table As Select (CTAS) option, we can copy data from one table to another in Hive.

Copy the data from one table into a new table:

CREATE TABLE <New_Table_Name> AS --worked
SELECT * FROM <Old_Table_Name>

Copy the data from one table into a new table with selected columns:

CREATE TABLE <New_Table_Name> AS
SELECT col1,col2 FROM <Old_Table_Name> --worked
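Both CTAS forms also work in SQLite, which makes them easy to try locally. The table names old_table, new_table and new_table_subset are hypothetical stand-ins for the placeholders above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE old_table (col1 INTEGER, col2 TEXT, col3 REAL);
    INSERT INTO old_table VALUES (1, 'a', 0.5), (2, 'b', 1.5);

    -- Copy all columns and all rows into a new table.
    CREATE TABLE new_table AS SELECT * FROM old_table;

    -- Copy only selected columns into a new table.
    CREATE TABLE new_table_subset AS SELECT col1, col2 FROM old_table;
""")

print(conn.execute("SELECT * FROM new_table_subset").fetchall())
# PRAGMA table_info lists one row per column; index 1 is the column name.
print([row[1] for row in conn.execute("PRAGMA table_info(new_table_subset)")])
```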

https://stackoverflow.com/questions/17810537/how-to-delete-and-update-a-record-in-hive

https://www.revisitclass.com/hadoop/copy-the-data-or-table-structure-from-one-table-to-another-in-hive/

https://stackoverflow.com/questions/34198114/alter-hive-table-add-or-drop-column/34198776
2. DELETE syntax: delete rows

DELETE FROM tablename [WHERE expression] -- did not run for me: no privilege

3. UPDATE syntax: update rows

UPDATE tablename SET column = value [, column = value ...] [WHERE expression] -- did not run for me: the transaction manager does not support it
4. Insert into a Hive table: inserting a row in Hive

insert into table tablename (quote,ticker) values ('04/06/2021','ABCD'); -- worked: inserted values for all columns of the schema in one row.

insert into table tablename (ticker) values ('ABCDEF'); -- worked great: inserted a value for only one or two columns of the row; only the listed columns are populated and the rest are NULL.
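The "unlisted columns become NULL" behavior can be reproduced in SQLite (which writes INSERT INTO without the TABLE keyword). The stock table is hypothetical and mirrors the two columns used in the Hive example.

```python
import sqlite3

# Hypothetical table with the same two columns as the Hive example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (quote TEXT, ticker TEXT)")

# Values for every column in the row.
conn.execute("INSERT INTO stock (quote, ticker) VALUES ('04/06/2021', 'ABCD')")
# Value for only one column: the column not listed is filled with NULL (None in Python).
conn.execute("INSERT INTO stock (ticker) VALUES ('ABCDEF')")
conn.commit()

print(conn.execute("SELECT quote, ticker FROM stock ORDER BY ticker").fetchall())
```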

As of Hive version 0.14.0: INSERT...VALUES, UPDATE, and DELETE are now available with
full ACID support.

INSERT ... VALUES syntax:

INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES values_row [, values_row ...]

where values_row is ( value [, value ...] ), and a value is either NULL or any valid SQL literal.
5. Add a column to a Hive table

ALTER TABLE dbname.table_name ADD COLUMNS (ticker5 string) CASCADE; -- for a partitioned table: adds the new column; list only the new column in this command. CASCADE also applies the change to the metadata of all existing partitions.

DESCRIBE on the table shows 3 columns after this (quotedate, ticker, ticker5).

ALTER TABLE dbname.table_name ADD COLUMNS (ticker5 string); -- for a non-partitioned table: adds the new column; list only the new column in this command. --worked
6. Drop a column using the REPLACE COLUMNS command

You cannot drop a column directly from a table with a command like ALTER TABLE table_name DROP col_name;

The only way to drop a column is the REPLACE COLUMNS command. Say I have a table emp with id, name and dept columns, and I want to drop the id column. In the REPLACE COLUMNS clause, list all the columns you want to keep in the table. The command below drops the id column from the emp table.

ALTER TABLE emp REPLACE COLUMNS (name string, dept string); -- this table had 3 columns (id, name, dept); after dropping id it has (name, dept). --worked
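ADD COLUMNS and REPLACE COLUMNS change only the table's metadata; Hive then reads the existing data files with the new column list. The toy Python model below assumes a comma-delimited text table whose fields are matched to columns by position (columnar formats such as Parquet may match by name instead); the data lines and the location column are hypothetical. Under that assumption, dropping the first column shifts the values, so it is worth checking the data after REPLACE COLUMNS, not only DESCRIBE.

```python
# Toy model of Hive schema-on-read for a comma-delimited text table.
# Assumption: each data line's fields map to the table's columns BY POSITION,
# and ALTER TABLE changes only this column list (metadata), never the files.

data_lines = ["1,Ann,IT", "2,Bob,HR"]  # hypothetical file contents of table emp


def read_table(columns, lines, sep=","):
    """Read lines with the given column list; a missing field reads as None (NULL)."""
    rows = []
    for line in lines:
        fields = line.split(sep)
        rows.append({col: fields[i] if i < len(fields) else None
                     for i, col in enumerate(columns)})
    return rows


# Original schema: id, name, dept.
original = read_table(["id", "name", "dept"], data_lines)

# ADD COLUMNS (location string): old lines have no 4th field, so it reads as NULL.
after_add = read_table(["id", "name", "dept", "location"], data_lines)

# REPLACE COLUMNS (name string, dept string): only metadata changes, so with
# positional mapping 'name' now reads the first field (the old id values).
after_replace = read_table(["name", "dept"], data_lines)

print(original[0])
print(after_add[0])
print(after_replace[0])
```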
SQL: create a new table from an existing table

SELECT [Column Name] INTO [New Table] FROM [Source Table]

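You cannot insert a whole list as a single value into SQL Server; you need to insert one row for each entry. This means calling the INSERT statement once per entry (or listing several rows in one multi-row INSERT ... VALUES statement).

import java.util.HashMap;
import java.util.Map;

Map<String, Integer> team1 = new HashMap<>();
team1.put("foo", 1);
team1.put("bar", 2);

This stores 1 with key "foo" and 2 with key "bar". To iterate over all the keys (HashMap does not guarantee any order):

for (String key : team1.keySet()) {
    System.out.println(key);
}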
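Inserting one row per map entry can be sketched in Python with SQLite standing in for SQL Server; the team table and its columns are hypothetical. executemany runs the same parameterized INSERT once per entry, and the ? placeholders avoid building SQL strings by hand.

```python
import sqlite3

team1 = {"foo": 1, "bar": 2}  # the same entries as the Java map above

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE team (name TEXT, score INTEGER)")  # hypothetical table

# executemany runs the parameterized INSERT once per (name, score) pair,
# i.e. one row per map entry.
conn.executemany("INSERT INTO team (name, score) VALUES (?, ?)", team1.items())
conn.commit()

print(conn.execute("SELECT name, score FROM team ORDER BY name").fetchall())
```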
NodeManager == TaskTracker
ApplicationMaster == JobTracker (half of it): it takes over the execution-engine half of the MRv1 JobTracker's responsibilities, scheduling the job, taking updates from the NodeManagers and asking the ResourceManager for resources; it sits in between.

The resource-allocation part of the MRv1 JobTracker is now assigned to the ResourceManager in MRv2 (YARN).
If the schema evolves very frequently and you need to prioritize scalability and fault tolerance over strong consistency (an RDBMS feature, whereas eventual consistency is a NoSQL feature), choose NoSQL instead of SQL, based on the type of data, such as JSON documents, graphs or key-value pairs.
RDBMS is best suited for OLTP (transactional) workloads rather than OLAP.
Federate queries and query data where it lives - data
lakes, lakehouses, and more
Presto can query relational & NoSQL databases, data
warehouses, data lakes and more and has dozens of
connectors available today. It also allows querying
data where it lives and a single Presto query can
combine data from multiple sources, allowing for
analytics across your entire organization.

Blazing fast analytics


Presto is an in-memory distributed SQL engine, faster
than other compute engines in the disaggregated stack

The previous iterations of data lakes

Many companies have experienced at least three phases of data lakes so far:
•Using a data warehouse as a data lake, including modern
cloud data warehouses
•Trying Hadoop (this is declining in use)
•Implementing a cloud data lakehouse that combines the data
lake and warehouse
•Creating a modern (cloud) data lake that is just storage
I would label Delta Lake as the most modern version of the Hadoop-based
data lake.

Delta Lake was created to make sure you never lost data during ETL and
other data processing even if Spark jobs failed. While Delta Lake turned into
more than just a staging area, it’s not a true data lake. Its name says it all; it’s
a “delta lake”. It’s still mostly used to guarantee that all the “deltas” from
spark jobs are never lost. It helps guarantee that the end data loaded into a
data warehouse is correct.

Delta Lake is an open source project that enables building a Lakehouse architecture on top of data lakes. Delta Lake provides ACID transactions, scalable metadata handling, and unifies streaming and batch data processing on top of existing data lakes, such as S3, ADLS, GCS, and HDFS.
4.6 Data Governance
Data governance is the process of managing the internal data standards and policies that control data and govern data processes, security and availability in enterprise data platforms.
Is Cassandra a document database?

No. Apache Cassandra is a NoSQL database, but it is not a document database.
Apache Cassandra is a column-oriented (wide-column) NoSQL database. A NoSQL database is a non-relational database capable of handling structured, semi-structured and unstructured data.
What is a data model?
Data models are visual representations of an enterprise's data elements and the connections between them. By helping to define and structure data in the context of relevant business processes, models support the development of effective information systems.

What are the types of data models?
The three primary data model types are relational, dimensional, and entity-relationship (E-R). There are also several others that are not in general use, including hierarchical, network, object-oriented, and multi-value.
Entity examples: Customers, Orders, Products, etc.
Attribute: attributes give a way of structuring and organizing the data.
Relationship: a relationship among entities explains how one entity is connected to another entity.
Data Warehouse vs Data Lake vs Data Lakehouse

• Storage data type
  - Data Warehouse: works well with structured data
  - Data Lake: works well with semi-structured and unstructured data
  - Data Lakehouse: can handle structured, semi-structured, and unstructured data
• Purpose
  - Data Warehouse: optimal for data analytics and business intelligence (BI) use cases
  - Data Lake: suitable for machine learning (ML) and artificial intelligence (AI) workloads
  - Data Lakehouse: suitable for both data analytics and machine learning workloads
• Cost
  - Data Warehouse: storage is costly and time-consuming
  - Data Lake: storage is cost-effective, fast, and flexible
  - Data Lakehouse: storage is cost-effective, fast, and flexible
• ACID compliance
  - Data Warehouse: records data in an ACID-compliant manner to ensure the highest levels of integrity
  - Data Lake: non-ACID compliant; updates and deletes are complex operations
  - Data Lakehouse: ACID-compliant, ensuring consistency as multiple parties concurrently read or write data
3 ways to run queries on Hive: Hive CLI, web interface, Thrift server.
Thrift server means that any application can run queries on Hive through JDBC/ODBC connections.
Execution engine: MapReduce
Metastore: a database that stores the schema, number of columns, and serialization/deserialization (SerDe) info; in short, metadata.
ODBC is a standard Microsoft Windows® interface that
enables communication between database management
systems and applications typically written in C or C++.
JDBC is a standard interface that enables communication between database management systems and applications written in Java.

Hive metastore (HMS) is a service that stores metadata related to Apache Hive and other services in a backend RDBMS, such as MySQL or PostgreSQL.

The default metastore database is Derby, but only one instance (one active session) can use it at a time, so MySQL or another robust RDBMS is recommended for production environments.
