SQL
If we wanted to see all of the names in all tables, we could write a query like this:
SELECT first_name, last_name FROM customer
UNION
SELECT first_name, last_name FROM employee
UNION
SELECT first_name, last_name FROM referrer
ORDER BY first_name, last_name;
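How does this work? The set operators run in the order they appear: the first query is run, a UNION is performed with the second query, and then a UNION is performed between those results and the third query. The same rule applies when operators are mixed. In a query of the form A UNION ALL B MINUS C, the UNION ALL of A and B is performed first, and then the MINUS with C is applied to those results.
Unlike the mathematical operators (such as addition and multiplication), Oracle has no order of operations for set operators. They are treated equally and run in order in the query from start to finish. This is not universal: standard SQL, PostgreSQL, and SQL Server give INTERSECT higher precedence than UNION and EXCEPT, so use parentheses when mixing operators.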
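Left-to-right evaluation with mixed operators can be checked with a small experiment. This is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. SQLite also groups compound SELECTs from left to right and spells MINUS as EXCEPT. The sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (first_name TEXT, last_name TEXT);
    CREATE TABLE employee (first_name TEXT, last_name TEXT);
    CREATE TABLE referrer (first_name TEXT, last_name TEXT);
    INSERT INTO customer VALUES ('Ann', 'Lee'), ('Bob', 'Ray');
    INSERT INTO employee VALUES ('Ann', 'Lee'), ('Cy', 'Fox');
    INSERT INTO referrer VALUES ('Bob', 'Ray');
""")

# Evaluated as (customer UNION ALL employee) EXCEPT referrer.
rows = conn.execute("""
    SELECT first_name, last_name FROM customer
    UNION ALL
    SELECT first_name, last_name FROM employee
    EXCEPT
    SELECT first_name, last_name FROM referrer
    ORDER BY first_name, last_name
""").fetchall()

print(rows)
```

If EXCEPT were applied to employee and referrer first, 'Bob Ray' from customer would survive in the result. With left-to-right evaluation it is removed.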
https://www.databasestar.com/sql-set-operators/#c7
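•ROUND(X, D) rounds the number X to D decimal places. It rounds up or down depending on the digit after the D-th decimal place (for exact-value numbers, a digit of 5 or more rounds away from zero).
•FLOOR() returns the largest integer not greater than the number, so it always rounds down (toward negative infinity, not toward zero): FLOOR(-1.415) is -2.
•CEILING() returns the smallest integer not less than the number, so it always rounds up (toward positive infinity): CEILING(-1.415) is -1.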
mysql> Select ROUND(1.415,2),FLOOR(1.415),CEILING(1.415);
+----------------+--------------+----------------+
| ROUND(1.415,2) | FLOOR(1.415) | CEILING(1.415) |
+----------------+--------------+----------------+
|           1.42 |            1 |              2 |
+----------------+--------------+----------------+
1 row in set (0.00 sec)
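The behaviour for negative numbers is easy to get wrong, so here is a small Python sketch of the same three rounding rules. It is not MySQL itself: round_half_up is a hypothetical helper that uses the decimal module to imitate rounding of exact-value numbers, and math.floor and math.ceil stand in for FLOOR() and CEILING().

```python
import math
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value: str, places: int) -> Decimal:
    """Round an exact decimal string to `places` digits, ties away from zero."""
    quantum = Decimal(10) ** -places  # places=2 gives Decimal('0.01')
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


positive = (round_half_up("1.415", 2), math.floor(1.415), math.ceil(1.415))
negative = (round_half_up("-1.415", 2), math.floor(-1.415), math.ceil(-1.415))

print(positive)
print(negative)
```

For the negative input, the floor moves further from zero and the ceiling moves closer to it.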
Lookup
Lookup is used to compare data between two tables: each input row is matched against a reference table. When several reference rows match, it returns only the first of them.
It does not keep duplicate records from the reference data, and it is meant for a small reference dataset.
It takes a single input and produces three outputs:
1.Lookup match output
2.Lookup no match output
3.Error output
The input records do not need to be sorted.
Example:
There are two tables, Source1 and Source2.
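The matching behaviour described above can be sketched in plain Python. This is an illustration only, not the tool's implementation: the lookup function, the column names, and the rows in source1 and source2 are all hypothetical, and the error output is left out.

```python
def lookup(source_rows, reference_rows, key):
    """Split source rows into (matched, unmatched) against a reference set."""
    # setdefault keeps only the first reference row seen for each key,
    # so later duplicates in the reference data are ignored.
    index = {}
    for ref in reference_rows:
        index.setdefault(ref[key], ref)

    matched, unmatched = [], []
    for row in source_rows:  # input order does not matter, no sort needed
        ref = index.get(row[key])
        if ref is None:
            unmatched.append(row)            # "no match" output
        else:
            matched.append({**row, **ref})   # "match" output
    return matched, unmatched


source1 = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}, {"id": 3, "name": "Cy"}]
source2 = [{"id": 1, "city": "Oslo"}, {"id": 1, "city": "Rome"}, {"id": 2, "city": "Lima"}]

matched, unmatched = lookup(source1, source2, "id")
print(matched)
print(unmatched)
```

Row 1 has two candidates in source2, and only the first one ('Oslo') is used.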
There are two main types of nested queries: correlated and noncorrelated subqueries.
Correlated Subqueries
A correlated subquery refers to columns of the outer query, so it is used for row-by-row processing: it is evaluated once for each row processed by the parent statement. The parent statement can be a SELECT, UPDATE, or DELETE statement.
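As an illustration, here is a minimal correlated subquery run through Python's sqlite3 module. The emp table, its salary column, and the rows are assumptions made up for this sketch. The inner query refers to o.dept from the outer row, which is what makes it correlated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO emp VALUES
        ('Ann', 'eng', 120), ('Bob', 'eng', 100),
        ('Cy',  'ops',  80), ('Dee', 'ops',  90);
""")

# For each outer row o, the subquery computes the average salary
# of that row's own department.
rows = conn.execute("""
    SELECT o.name, o.dept, o.salary
    FROM emp AS o
    WHERE o.salary > (SELECT AVG(i.salary)
                      FROM emp AS i
                      WHERE i.dept = o.dept)
    ORDER BY o.name
""").fetchall()

print(rows)
```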
Noncorrelated Subqueries
A noncorrelated subquery executes independently of the outer query. The subquery executes first, and then passes its results to the outer query. For example:
=> SELECT name, street, city, state FROM addresses WHERE state IN (SELECT state FROM states);
Vertica executes this query as follows:
1.Executes the subquery SELECT state FROM states.
2.Passes the subquery results to the outer query.
A query's WHERE and HAVING clauses can specify noncorrelated subqueries if the subquery resolves to a single row, as shown below:
In WHERE clause
=> SELECT COUNT(*) FROM SubQ1 WHERE SubQ1.a = (SELECT y from SubQ2);
In HAVING clause
=> SELECT COUNT(*) FROM SubQ1 GROUP BY SubQ1.a HAVING SubQ1.a = (SubQ1.a & (SELECT y from SubQ2));
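The two-step execution of a noncorrelated subquery can be reproduced with Python's sqlite3 module. This is a sketch under assumptions: SQLite stands in for Vertica, and the sample rows are invented. The subquery is run on its own first to show that it does not depend on the outer query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE states (state TEXT);
    CREATE TABLE addresses (name TEXT, street TEXT, city TEXT, state TEXT);
    INSERT INTO states VALUES ('MA'), ('NY');
    INSERT INTO addresses VALUES
        ('Ann', '1 Main St', 'Boston', 'MA'),
        ('Bob', '2 Oak Ave', 'Austin', 'TX'),
        ('Cy',  '3 Elm Rd',  'Albany', 'NY');
""")

# Step 1: the subquery runs by itself, with no reference to addresses.
subquery_result = [r[0] for r in conn.execute("SELECT state FROM states ORDER BY state")]

# Step 2: the outer query filters on the subquery's result.
rows = conn.execute("""
    SELECT name, street, city, state FROM addresses
    WHERE state IN (SELECT state FROM states)
    ORDER BY name
""").fetchall()

print(subquery_result)
print(rows)
```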
SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[HAVING having_condition]
[ORDER BY col_list]
[LIMIT number];
INSERT INTO TABLE testlog SELECT * FROM table1 WHERE some condition;
1. Using the Create Table As Select (CTAS) option, we can copy data from one table to another in Hive:
•copy the data from one table into a new table.
•copy the data from one table into a new table with only selected columns.
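Both CTAS variants can be tried without a Hive cluster. The sketch below uses Python's sqlite3 module, which accepts the same basic CREATE TABLE ... AS SELECT form. The table names table1_copy and table1_names and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT, dept TEXT);
    INSERT INTO table1 VALUES (1, 'Ann', 'eng'), (2, 'Bob', 'ops');

    -- copy all columns and rows into a new table
    CREATE TABLE table1_copy AS SELECT * FROM table1;

    -- copy only selected columns into a new table
    CREATE TABLE table1_names AS SELECT name, dept FROM table1;
""")

full_copy = conn.execute("SELECT * FROM table1_copy ORDER BY id").fetchall()
partial_copy = conn.execute("SELECT * FROM table1_names ORDER BY name").fetchall()

print(full_copy)
print(partial_copy)
```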
https://stackoverflow.com/questions/17810537/how-to-delete-and-update-a-record-in-hive
https://www.revisitclass.com/hadoop/copy-the-data-or-table-structure-from-one-table-to-another-in-hive/
https://stackoverflow.com/questions/34198114/alter-hive-table-add-or-drop-column/34198776
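2. DELETE syntax: deletes rows
DELETE FROM tablename [WHERE expression]
3. UPDATE syntax: updates rows
UPDATE tablename SET column = value [, column = value ...] [WHERE expression]
-- neither works when the configured transaction manager does not support these operations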
4. INSERT into a Hive table: inserting a row
insert into table tablename (ticker) values ('ABCDEF'); -- worked: values were inserted for only one or two columns of a row; only the listed columns are populated and the remaining columns are NULL.
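The NULL-filling behaviour of a partial-column insert can be seen with Python's sqlite3 module. This is a sketch, not Hive: SQLite writes INSERT INTO tablename where Hive writes INSERT INTO TABLE tablename, and the price and volume columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (ticker TEXT, price REAL, volume INTEGER)")

# Only ticker is listed, so price and volume are left as NULL.
conn.execute("INSERT INTO tablename (ticker) VALUES ('ABCDEF')")

row = conn.execute("SELECT ticker, price, volume FROM tablename").fetchone()
print(row)
```

SQL NULL comes back as Python's None.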
As of Hive version 0.14.0: INSERT...VALUES, UPDATE, and DELETE are now available with
full ACID support.
5. Add a column
ALTER TABLE dbname.table_name ADD COLUMNS (ticker5 string) CASCADE; -- for a partitioned table: adds the new column. List only the new column in this command.
ALTER TABLE dbname.table_name ADD COLUMNS (ticker5 string); -- for a non-partitioned table: adds the new column. List only the new column in this command. Worked.
6. Drop a column using the REPLACE COLUMNS command
You cannot drop a column directly from a table with a command such as ALTER TABLE table_name DROP col_name;
The only way to drop a column is the REPLACE COLUMNS command. Say I have a table emp with id, name, and dept columns, and I want to drop the id column. List every column that should remain part of the table in the REPLACE COLUMNS clause. The command below drops the id column from the emp table.
ALTER TABLE emp REPLACE COLUMNS (name string, dept string); -- the table had three columns (id, name, dept); after dropping id it has name and dept. Worked.
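Because every surviving column has to be listed by hand, it can help to generate the statement from the known schema. The helper below is a hypothetical sketch in Python, not part of Hive. It only builds the statement text, and the int type for id is an assumption.

```python
def build_replace_columns(table, columns, drop):
    """Return an ALTER TABLE ... REPLACE COLUMNS statement that omits `drop`.

    `columns` maps column name to Hive type, in table order.
    """
    if drop not in columns:
        raise ValueError(f"column {drop!r} is not in table {table!r}")
    kept = ", ".join(
        f"{name} {ctype}" for name, ctype in columns.items() if name != drop
    )
    return f"ALTER TABLE {table} REPLACE COLUMNS ({kept});"


emp_columns = {"id": "int", "name": "string", "dept": "string"}
statement = build_replace_columns("emp", emp_columns, "id")
print(statement)
```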
You cannot insert a list as a single value into SQL Server; each entry needs its own row. This means the INSERT statement has to be executed once per entry.
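One INSERT per list entry can be sketched as follows. This uses Python's sqlite3 module as a stand-in for SQL Server. The entries table and the names list are hypothetical. executemany runs the same parameterized INSERT once for each entry.

```python
import sqlite3

names = ["foo", "bar", "baz"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (name TEXT)")

# One parameterized INSERT is executed per list entry.
conn.executemany("INSERT INTO entries (name) VALUES (?)", [(n,) for n in names])
conn.commit()

stored = [r[0] for r in conn.execute("SELECT name FROM entries ORDER BY rowid")]
print(stored)
```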
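team1.put("foo", 1);
team1.put("bar", 2);
will store 1 with key "foo" and 2 with key "bar". To iterate over all the keys (assuming team1 is a Map<String, Integer>, such as a HashMap):
for (String key : team1.keySet())
    System.out.println(key + " = " + team1.get(key));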
Delta Lake was created to make sure you never lose data during ETL and
other data processing, even if Spark jobs fail. While Delta Lake has turned into
more than just a staging area, it’s not a true data lake. Its name says it all: it’s
a “delta lake”. It’s still mostly used to guarantee that the “deltas” from
Spark jobs are never lost, which helps guarantee that the end data loaded into a
data warehouse is correct.
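+-------------------+----------------------------------------+----------------------------------------+----------------------------------------+
| Feature           | Data warehouse                         | Data lake                              | Data lakehouse                         |
+-------------------+----------------------------------------+----------------------------------------+----------------------------------------+
| Storage Data Type | Works well with structured data        | Works well with semi-structured and    | Can handle structured,                 |
|                   |                                        | unstructured data                      | semi-structured, and unstructured data |
+-------------------+----------------------------------------+----------------------------------------+----------------------------------------+
| Purpose           | Optimal for data analytics and         | Suitable for machine learning (ML) and | Suitable for both data analytics and   |
|                   | business intelligence (BI) use-cases   | artificial intelligence (AI) workloads | machine learning workloads             |
+-------------------+----------------------------------------+----------------------------------------+----------------------------------------+
| Cost              | Storage is costly and time-consuming   | Storage is cost-effective, fast, and   | Storage is cost-effective, fast, and   |
|                   |                                        | flexible                               | flexible                               |
+-------------------+----------------------------------------+----------------------------------------+----------------------------------------+
| ACID Compliance   | Records data in an ACID-compliant      | Non-ACID compliance: updates and       | ACID-compliant to ensure consistency   |
|                   | manner to ensure the highest levels of | deletes are complex operations         | as multiple parties concurrently read  |
|                   | integrity                              |                                        | or write data                          |
+-------------------+----------------------------------------+----------------------------------------+----------------------------------------+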
There are 3 ways to run a query on Hive: the Hive CLI, the web interface, and the Thrift server.
Thrift server: any application can run queries on Hive through a JDBC/ODBC connection.
Execution engine: MapReduce
Metastore: a database that stores the schema, number of columns, and serialization and deserialization (SerDe) info; in short, the metadata.
ODBC is a standard Microsoft Windows® interface that
enables communication between database management
systems and applications typically written in C or C++.
JDBC is a standard interface that enables communication
between database management systems and applications
written in Oracle Java.