SQL DM1
SQL DM1
How would you predict who someone may want to send a Snapchat or
Gmail to?
For each user, assign a score of how likely someone would send an email to
the rest is feature engineering:
-Number of past emails
-How many responses
-The last time they exchanged an email
-Whether the last email ends with a question mark
-Features about the other users, etc.
-People who someone sent emails the most in the past conditioning on time
decay
SQL
A CTE is a named temporary result set that you can reference within a
SELECT, INSERT, UPDATE, or DELETE statement.
CTEs make complex queries more readable and maintainable.
Use CTEs when you need to break down a complex query into smaller, more
understandable parts
-Subqueries are nested queries within another query and are used to retrieve
data for further processing.
-JOINs combine rows from two or more tables based on a related column.
-Use subqueries when you need to retrieve a single value or a small set of
values, and use JOINs when you need to combine data from multiple tables.
Ans.
Order of execution for an SQL query
1) FROM, including JOINs
2) WHERE
3) GROUP BY
4) HAVING
5) WINDOW Functions
6) SELECT
7) DISTINCT
8) UNION
9) ORDER BY
10) LIMIT AND OFFSET
4.Write a SQL query to find all the student names Nitin in a table
select name
from student
where lower like ‘%nitin%’
Now the trick is to make sure you convert the name in lower for the complete
column
wrong output
5.Write a query to get all the student with name length 10, starting with K
and ending with z.
select name
from student
where length=10 and lower like ‘k%z’
7.ACID Properties
1.What is PostgreSQL?
Hive is optimized for query throughput, while Presto is optimized for latency.
Presto has a limitation on the maximum amount of memory that each task in a
query can store, so if a query requires a large amount of memory, the query
simply fails
5.Define HDFS
HDFS stands for Hadoop Distributed File System. The Hadoop Distributed
File System (HDFS) is the primary data storage system used by Hadoop
applications. HDFS employs a NameNode and DataNode architecture to
implement a distributed file system that provides high-performance access to
data across highly scalable Hadoop clusters.
With HDFS, data is written on the server once, and read and reused
numerous times after that. HDFS has a primary NameNode, which keeps
track of where file data is kept in the cluster.
Instagram Post - Click and
Follow