LT Mindtree
SQL conditions are used to filter data based on specified criteria. Common
conditions include WHERE, AND, OR, IN, BETWEEN, LIKE, etc.
Examples: WHERE salary > 50000, AND department = 'IT', OR age < 30
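For instance, a minimal PySpark sketch (the employees table and its columns are hypothetical) that applies these conditions through spark.sql:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sql-conditions").getOrCreate()

    # Hypothetical employees table registered as a temporary view
    df = spark.createDataFrame(
        [(60000, "IT", 28), (45000, "HR", 35)],
        ["salary", "department", "age"],
    )
    df.createOrReplaceTempView("employees")

    # WHERE combined with AND / OR to filter rows
    result = spark.sql("""
        SELECT * FROM employees
        WHERE (salary > 50000 AND department = 'IT') OR age < 30
    """)
    result.show()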
Handle missing data in a PySpark DataFrame by using functions like dropna(),
fillna(), or replace().
Use the dropna() function to remove rows with missing data
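A short sketch of these three calls on a toy DataFrame (the column names are made up for illustration):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("missing-data").getOrCreate()

    # Toy DataFrame with nulls in hypothetical columns
    df = spark.createDataFrame(
        [("alice", None), (None, 30), ("bob", 25)],
        "name string, age int",
    )

    df.dropna().show()                                    # drop rows containing any null
    df.fillna({"name": "unknown", "age": 0}).show()       # fill nulls per column
    df.replace("bob", "robert", subset=["name"]).show()   # replace a specific value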
The tasks are then scheduled to run on the available worker nodes in the cluster.
The worker nodes execute the tasks and return the results to the driver.
The driver aggregates the results and presents them to the user.
Optimizations such as reducing data shuffling and caching intermediate results
may be applied during the execution process.
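A minimal sketch of that flow, assuming a local SparkSession: the driver builds a lazy plan, cache() marks the result for in-memory reuse, and the action triggers tasks on the workers:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("execution-flow").getOrCreate()

    # The driver only defines the plan here; nothing runs on the workers yet
    df = spark.range(1_000_000).withColumn("squared", F.col("id") ** 2)

    # cache() marks the result for in-memory reuse across actions
    df.cache()

    # The action schedules tasks on the worker nodes; the aggregated
    # result is returned to the driver
    total = df.agg(F.sum("squared")).collect()[0][0]
    print(total)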
It reduces the time taken to retrieve data by minimizing disk I/O and utilizing in-
memory processing.
Examples include using columnar storage formats like Parquet or optimizing join
operations.
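As an illustration (the path and table contents are hypothetical), a sketch that writes Parquet and hints a broadcast join so the larger table is not shuffled:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("optimizations").getOrCreate()

    orders = spark.createDataFrame([(1, 100), (2, 200)], ["customer_id", "amount"])
    customers = spark.createDataFrame([(1, "alice"), (2, "bob")], ["customer_id", "name"])

    # Columnar storage: Parquet reads only the columns a query needs,
    # cutting disk I/O
    orders.write.mode("overwrite").parquet("/tmp/orders.parquet")

    # Broadcasting the small table avoids shuffling the large one
    joined = orders.join(broadcast(customers), "customer_id")
    joined.show()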
Consider using a temporary table to store the unique records before deleting the
duplicates
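A sketch of that approach using a Spark SQL temporary view (table and column names are hypothetical):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("dedup").getOrCreate()

    df = spark.createDataFrame(
        [(1, "a"), (1, "a"), (2, "b")],
        ["id", "value"],
    )
    df.createOrReplaceTempView("source_table")

    # Stage the unique records in a temporary view, then read the
    # deduplicated result back
    spark.sql("""
        CREATE OR REPLACE TEMPORARY VIEW unique_records AS
        SELECT DISTINCT * FROM source_table
    """)
    spark.sql("SELECT * FROM unique_records").show()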
Q6. How do we create a duplicate table? What are window functions? What are the
types of joins? Explain each join.
Ans. To duplicate a table, use CREATE TABLE AS or INSERT INTO SELECT. Window
functions are used for calculations across a set of table rows. Types of joins
include INNER, LEFT, RIGHT, and FULL OUTER joins.
Explain each join: INNER - returns rows only when there is a match in both
tables,
LEFT - returns all rows from the left table and the matched rows from the right
table,
RIGHT - returns all rows from the right table and the matched rows from the left
table,
FULL OUTER - returns all rows from both tables, with NULLs where there is no match
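A combined Spark SQL sketch (the employees table and its columns are hypothetical) covering table duplication, a window function, and a LEFT join:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("joins-windows").getOrCreate()

    emp = spark.createDataFrame(
        [(1, "alice", "IT", 60000), (2, "bob", "HR", 45000)],
        ["id", "name", "dept", "salary"],
    )
    emp.createOrReplaceTempView("employees")

    # Duplicate a table in the CREATE TABLE AS style; a temp view stands in here
    spark.sql("CREATE OR REPLACE TEMPORARY VIEW employees_copy AS SELECT * FROM employees")

    # Window function: rank salaries within each department
    spark.sql("""
        SELECT name, dept, salary,
               RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS salary_rank
        FROM employees
    """).show()

    # LEFT join keeps every employee even without a matching row on the right
    spark.sql("""
        SELECT e.name, c.dept
        FROM employees e
        LEFT JOIN employees_copy c ON e.id = c.id
    """).show()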
Use the data connectors or APIs of dashboard B to transfer the filtered data
from dashboard A to dashboard B.
I have experience with data ingestion, processing, and storage using these tools.
I have also worked with NoSQL databases like Cassandra and MongoDB.
I am familiar with data warehousing concepts and have worked with tools like
Redshift and Snowflake.
Q10. Describe the SSO process between Snowflake and Azure Active Directory.
Ans. The SSO process between Snowflake and Azure Active Directory involves
configuring SAML-based authentication.
Configure Snowflake to use SAML authentication with Azure AD as the identity
provider
SSO eliminates the need for separate logins and passwords for Snowflake and Azure
AD
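A hedged sketch of the Snowflake side using the snowflake-connector-python package; the account credentials, tenant ID, and certificate are placeholders you would take from the Azure AD enterprise application:

    import snowflake.connector

    # Connect as an account administrator (placeholder credentials)
    conn = snowflake.connector.connect(
        account="my_account",
        user="admin_user",
        password="***",
        role="ACCOUNTADMIN",
    )

    # Register Azure AD as the SAML2 identity provider for Snowflake SSO
    conn.cursor().execute("""
        CREATE SECURITY INTEGRATION azure_ad_sso
          TYPE = SAML2
          ENABLED = TRUE
          SAML2_ISSUER = 'https://sts.windows.net/<tenant-id>/'
          SAML2_SSO_URL = 'https://login.microsoftonline.com/<tenant-id>/saml2'
          SAML2_PROVIDER = 'CUSTOM'
          SAML2_X509_CERT = '<base64-encoded-certificate>'
    """)
    conn.close()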