LT Mindtree

The document outlines various SQL conditions, methods for handling missing data in PySpark, and the backend processes involved when submitting a Spark job in Databricks. It also discusses query acceleration techniques, ways to delete duplicate records, performance optimization in Spark, data transfer between dashboards, and hands-on experience with big data tools. Additionally, it describes the SSO process between Snowflake and Azure Active Directory, emphasizing SAML-based authentication and trust relationships.

1) What conditions are used in SQL?

Ans. SQL conditions are used to filter data based on specified criteria.
Common conditions include WHERE, AND, OR, IN, BETWEEN, LIKE, etc.

Conditions restrict which rows a query returns or affects.

Examples: WHERE salary > 50000, AND department = 'IT', OR age < 30
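For illustration, a minimal PySpark sketch (the employees data, column names, and
values are hypothetical) showing these conditions in a query:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data, for illustration only
employees = spark.createDataFrame(
    [("Asha", "IT", 60000, 28), ("Ravi", "HR", 45000, 35)],
    ["name", "department", "salary", "age"],
)
employees.createOrReplaceTempView("employees")

# WHERE combined with AND/OR
spark.sql("""
    SELECT name, department, salary
    FROM employees
    WHERE (salary > 50000 AND department = 'IT')
       OR age < 30
""").show()

# IN and BETWEEN conditions
spark.sql("""
    SELECT name
    FROM employees
    WHERE department IN ('IT', 'HR')
      AND salary BETWEEN 40000 AND 70000
""").show()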

2) How do you handle missing data in a PySpark DataFrame?

Ans. Handle missing data in a PySpark DataFrame using functions like dropna(),
fillna(), or replace().
Use dropna() to remove rows that contain missing values

Use fillna() to fill missing values with a specified value

Use replace() to swap specific values such as NaN or sentinel values (nulls
themselves are handled by dropna() or fillna())
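A minimal sketch of these three functions, assuming a small hypothetical DataFrame
with nulls:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical DataFrame with missing values, for illustration
df = spark.createDataFrame(
    [("Asha", 28), (None, 35), ("Ravi", None)],
    ["name", "age"],
)

df.dropna().show()                                  # drop rows with any null
df.fillna({"name": "unknown", "age": 0}).show()     # fill nulls per column
df.replace("Ravi", "R. Kumar", "name").show()       # swap a specific value (not nulls)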

3) In Databricks, when a Spark job is submitted, what happens at the backend? Explain
the flow.

Ans. When a Spark job is submitted in Databricks, several backend processes are
triggered to execute it.
The submitted Spark job is divided into tasks by the Spark driver.

The tasks are then scheduled to run on the available worker nodes in the cluster.

The worker nodes execute the tasks and return the results to the driver.

The driver aggregates the results and presents them to the user.

Various optimizations such as data shuffling and caching may be applied during the
execution process.
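A generic (not Databricks-specific) sketch of what triggers this flow: transformations
stay lazy, and the action at the end is what makes the driver build the DAG, split it
into stages and tasks, and schedule them on executors:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.range(1_000_000)                        # lazily defined, nothing runs yet
agg = (df.withColumn("bucket", F.col("id") % 10)
         .groupBy("bucket").count())               # still lazy; adds a shuffle boundary

# The action below makes the driver build the DAG, split it into stages at the
# shuffle, schedule tasks on the executors, and collect the results back.
agg.show()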

4) How does query acceleration speed up query processing?


Ans. Query acceleration speeds up query processing by optimizing query execution
and reducing the time taken to retrieve data.
Query acceleration uses techniques like indexing, partitioning, and caching to
optimize query execution.

It reduces the time taken to retrieve data by minimizing disk I/O and utilizing in-
memory processing.

Examples include using columnar storage formats like Parquet or optimizing join
operations.
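A hedged sketch of partitioned Parquet storage plus caching in PySpark (the data,
output path, and column names are hypothetical):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical events data
events = spark.createDataFrame(
    [("2024-01-01", "click", 3), ("2024-01-02", "view", 7)],
    ["event_date", "event_type", "value"],
)

# Columnar, partitioned storage: queries filtering on event_date can skip whole
# partitions and read only the columns they need.
events.write.mode("overwrite").partitionBy("event_date").parquet("/tmp/events_parquet")

fast = spark.read.parquet("/tmp/events_parquet")
fast.cache()                                        # keep hot data in memory
fast.filter("event_date = '2024-01-01'").count()    # partition pruning + in-memory reuse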

Q5. How would you delete duplicate records from a table?


Ans. To delete duplicate records from a table, you can use the DELETE statement
with a self-join or subquery.
Identify the duplicate records using a self-join or subquery

Use the DELETE statement to remove the duplicate records

Consider using a temporary table to store the unique records before deleting the
duplicates
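As a hedged illustration (a DELETE with a subquery needs an engine that supports it,
e.g. a Delta table in Spark SQL), the sketch below instead keeps only the unique rows,
which mirrors the temporary-table approach:

from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical customers data with duplicate emails
customers = spark.createDataFrame(
    [(1, "a@x.com"), (2, "a@x.com"), (3, "b@x.com")],
    ["id", "email"],
)

# Rank rows within each email and keep only the first one
w = Window.partitionBy("email").orderBy("id")
deduped = (customers
           .withColumn("rn", F.row_number().over(w))
           .filter("rn = 1")
           .drop("rn"))
deduped.show()

# Equivalent idea in one call when any representative row will do
customers.dropDuplicates(["email"]).show()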

Q6. How do you create a duplicate table? What are window functions? What are the
types of joins? Explain each join.
Ans. To duplicate a table, use CREATE TABLE AS or INSERT INTO SELECT. Window
functions are used for calculations across a set of table rows. Types of joins
include INNER, LEFT, RIGHT, and FULL OUTER joins.
To duplicate a table, use CREATE TABLE AS or INSERT INTO SELECT

Window functions are used for calculations across a set of table rows

Types of joins include INNER, LEFT, RIGHT, and FULL OUTER joins

Explain each join: INNER - returns rows only when there is a match in both tables;
LEFT - returns all rows from the left table plus the matched rows from the right
table;
RIGHT - returns all rows from the right table plus the matched rows from the left
table;
FULL OUTER - returns all rows from both tables, with NULLs where there is no match
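A minimal PySpark sketch covering all three ideas, with hypothetical emp/dept data:

from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

emp = spark.createDataFrame(
    [(1, "Asha", 10, 60000), (2, "Ravi", 20, 45000), (3, "Mia", 10, 70000)],
    ["emp_id", "name", "dept_id", "salary"],
)
dept = spark.createDataFrame([(10, "IT"), (30, "Finance")], ["dept_id", "dept_name"])
emp.createOrReplaceTempView("emp")

# Duplicate a table with CREATE TABLE AS (CTAS)
spark.sql("CREATE TABLE emp_copy USING parquet AS SELECT * FROM emp")

# Window function: rank employees by salary within each department
w = Window.partitionBy("dept_id").orderBy(F.desc("salary"))
emp.withColumn("salary_rank", F.rank().over(w)).show()

# Joins: inner keeps matches only, left/right keep one side, full outer keeps both
emp.join(dept, "dept_id", "inner").show()
emp.join(dept, "dept_id", "left").show()
emp.join(dept, "dept_id", "right").show()
emp.join(dept, "dept_id", "full_outer").show()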

Q7. How do you do performance optimization in Spark?


Ans. Performance optimization in Spark involves tuning configurations, optimizing
code, and utilizing caching.
Tune Spark configurations such as executor memory, cores, and parallelism

Optimize code by reducing unnecessary shuffles, using efficient transformations,


and avoiding unnecessary data movements

Utilize caching to store intermediate results in memory for faster access
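A short sketch of each lever; the configuration values are illustrative, not
recommendations:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Tune configurations at session build time (illustrative values)
spark = (SparkSession.builder
         .config("spark.executor.memory", "4g")
         .config("spark.executor.cores", "4")
         .config("spark.sql.shuffle.partitions", "200")
         .getOrCreate())

large = spark.range(10_000_000).withColumn("key", F.col("id") % 100)
small = spark.range(100).withColumnRenamed("id", "key")

# Avoid an unnecessary shuffle by broadcasting the small side of the join
joined = large.join(F.broadcast(small), "key")

# Cache an intermediate result that is reused several times
joined.cache()
joined.count()
joined.groupBy("key").count().show()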

Q8. How do you filter and transfer data from dashboard A to dashboard B?


Ans. Use data connectors or APIs to extract and transfer data from one dashboard to
another.
Utilize the data connectors or APIs provided by the dashboard platform to extract
data from dashboard A.

Transform the data as needed to match the format expected by dashboard B.

Use the data connectors or APIs of dashboard B to load the filtered data.
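A purely illustrative sketch with hypothetical REST endpoints, tokens, and field
names (real dashboard platforms each expose their own connectors and APIs):

import requests

# Hypothetical endpoints and auth token, for illustration only
SOURCE_URL = "https://dashboard-a.example.com/api/data"
TARGET_URL = "https://dashboard-b.example.com/api/data"
HEADERS = {"Authorization": "Bearer <token>"}

# 1. Extract from dashboard A
rows = requests.get(SOURCE_URL, headers=HEADERS, timeout=30).json()

# 2. Filter/transform to match dashboard B's expected format
filtered = [
    {"metric": r["name"], "value": r["value"]}
    for r in rows
    if r.get("region") == "EMEA"
]

# 3. Load into dashboard B
requests.post(TARGET_URL, json=filtered, headers=HEADERS, timeout=30)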

Q9. Do you have hands-on experience with big data tools?


Ans. Yes, I have hands-on experience with big data tools.
I have worked extensively with Hadoop, Spark, and Kafka.

I have experience with data ingestion, processing, and storage using these tools.

I have also worked with NoSQL databases like Cassandra and MongoDB.

I am familiar with data warehousing concepts and have worked with tools like
Redshift and Snowflake.

Q10. Describe the SSO process between Snowflake and Azure Active Directory.
Ans. The SSO process between Snowflake and Azure Active Directory involves
configuring SAML-based authentication.
Configure Snowflake to use SAML authentication with Azure AD as the identity
provider

Set up a trust relationship between Snowflake and Azure AD


Users authenticate through Azure AD and are granted access to Snowflake resources

SSO eliminates the need for separate logins and passwords for Snowflake and Azure
AD
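As a hedged sketch of the Snowflake side only, a SAML2 security integration can point
at Azure AD as the identity provider; every value below is a placeholder that would
come from the Azure AD enterprise application, and the statement is shown here
executed through the snowflake-connector-python package:

import snowflake.connector

# Connection details are placeholders
conn = snowflake.connector.connect(
    account="<account>", user="<admin_user>", password="<password>", role="ACCOUNTADMIN"
)

# SAML2 security integration pointing at Azure AD (all values are placeholders)
conn.cursor().execute("""
    CREATE SECURITY INTEGRATION azure_ad_sso
      TYPE = SAML2
      ENABLED = TRUE
      SAML2_ISSUER = 'https://sts.windows.net/<tenant-id>/'
      SAML2_SSO_URL = 'https://login.microsoftonline.com/<tenant-id>/saml2'
      SAML2_PROVIDER = 'CUSTOM'
      SAML2_X509_CERT = '<base64-encoded-certificate>'
      SAML2_SP_INITIATED_LOGIN_PAGE_LABEL = 'Azure AD SSO'
      SAML2_ENABLE_SP_INITIATED = TRUE
""")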
