
EY Mock

The document outlines a set of interview questions for data engineers with 3-4 years of experience, covering topics such as SQL, Python, PySpark, Azure Data Engineering, data modeling, and behavioral scenarios. It includes specific technical questions about SQL functions, data manipulation, and big data concepts, as well as practical scenarios related to data pipeline management and optimization. The questions aim to assess both technical skills and problem-solving abilities in real-world situations.

Uploaded by

gupta.ayushi2425

𝗘𝗬 𝗗𝗮𝘁𝗮 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿 𝗜𝗻𝘁𝗲𝗿𝘃𝗶𝗲𝘄 𝗤𝘂𝗲𝘀𝘁𝗶𝗼𝗻𝘀 (𝟯–𝟰 𝗬𝗲𝗮𝗿𝘀 𝗘𝘅𝗽𝗲𝗿𝗶𝗲𝗻𝗰𝗲)


🔹 𝗦𝗤𝗟 & 𝗗𝗮𝘁𝗮 𝗠𝗮𝗻𝗶𝗽𝘂𝗹𝗮𝘁𝗶𝗼𝗻


What is the difference between RANK(), DENSE_RANK(), and ROW_NUMBER()?
How would you find the second highest salary from an employee table?
Write a SQL query to find duplicate records in a table.
What is a CTE? When would you use it over subqueries?
Explain the different types of SQL joins with real-world examples.
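
As a starting point for the second-highest-salary and duplicate-record questions above, here is a minimal sketch that runs both queries against an in-memory SQLite database. The `employee` table layout, the sample rows, and the helper names are assumptions made for illustration; an interviewer may expect a window-function variant (for example DENSE_RANK()) instead of the subquery shown here.

```python
import sqlite3


def build_demo_db():
    """Create a hypothetical employee table with one exact duplicate row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?)",
        [
            (1, "Asha", 90000),
            (2, "Ben", 120000),
            (3, "Chen", 120000),  # ties with Ben for the top salary
            (4, "Dev", 75000),
            (4, "Dev", 75000),    # exact duplicate of the previous row
        ],
    )
    return conn


# Second highest DISTINCT salary: the largest value below the overall maximum.
SECOND_HIGHEST = """
SELECT MAX(salary)
FROM employee
WHERE salary < (SELECT MAX(salary) FROM employee)
"""

# Duplicates: group on every column and keep groups that occur more than once.
DUPLICATES = """
SELECT id, name, salary, COUNT(*) AS copies
FROM employee
GROUP BY id, name, salary
HAVING COUNT(*) > 1
"""


def second_highest_salary(conn):
    return conn.execute(SECOND_HIGHEST).fetchone()[0]


def duplicate_rows(conn):
    return conn.execute(DUPLICATES).fetchall()
```

With the sample rows, the tie at the top salary is treated as a single rank, so the subquery approach behaves like DENSE_RANK() = 2 rather than ROW_NUMBER() = 2.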

𝗣𝘆𝘁𝗵𝗼𝗻 𝗳𝗼𝗿 𝗗𝗮𝘁𝗮 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝗶𝗻𝗴

How would you handle missing values in a large dataset using Python?
Explain the difference between list, tuple, set, and dictionary.
What are Python generators? How are they useful in data pipelines?
How would you optimize a large data transformation using Pandas?
Explain multithreading vs multiprocessing in Python.
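
For the generator question above, one possible illustration is a two-stage lazy pipeline: one generator parses lines and another groups them into batches, so only one batch is held in memory at a time. The function names `read_records` and `batched` and the comma-separated input format are hypothetical, chosen only for this sketch.

```python
def read_records(lines):
    """Lazily parse comma-separated lines, skipping blank ones."""
    for line in lines:
        line = line.strip()
        if line:
            yield line.split(",")


def batched(records, size):
    """Group an iterable of records into lists of at most `size` items."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch
```

Because both stages are generators, `batched(read_records(open_file), 1000)` would pull lines from the file only as each batch is consumed, which is the usual argument for generators in data pipelines.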

𝗣𝘆𝗦𝗽𝗮𝗿𝗸 & 𝗕𝗶𝗴 𝗗𝗮𝘁𝗮

What is the difference between RDD, DataFrame, and Dataset in PySpark?
How do you handle skewed data in Spark?
What are broadcast variables and accumulators?
Explain the Spark execution flow (Job → Stage → Task).
How do you optimize a PySpark job?

𝗔𝘇𝘂𝗿𝗲 𝗗𝗮𝘁𝗮 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝗶𝗻𝗴

What are the key components of Azure Data Factory?
What is the difference between Azure Blob Storage and Azure Data Lake?
How does Azure Databricks integrate with ADF?
What are triggers and pipelines in Azure Data Factory?
Explain Delta Lake. Why is it used?

𝗗𝗮𝘁𝗮 𝗠𝗼𝗱𝗲𝗹𝗶𝗻𝗴 & 𝗪𝗮𝗿𝗲𝗵𝗼𝘂𝘀𝗶𝗻𝗴

What is the difference between OLTP and OLAP?
Explain Star Schema vs Snowflake Schema.
What is data partitioning and bucketing?
What is a Slowly Changing Dimension (SCD) Type 2?
How would you design a data warehouse for a retail chain?
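
To make the SCD Type 2 question above concrete, the following is a minimal in-memory sketch of the usual pattern: when a tracked attribute changes, the current row is closed and a new current row is appended, so history is preserved. The column names (`valid_from`, `valid_to`, `is_current`) and the function `apply_scd2` are assumptions for illustration; a real warehouse would typically do this with a MERGE statement or an equivalent upsert.

```python
from datetime import date


def apply_scd2(dimension, key, new_attrs, effective):
    """Apply one incoming record to a list-of-dicts dimension table.

    If the current row for `key` has the same attributes, nothing changes.
    Otherwise the current row is expired and a new current row is appended.
    """
    for row in dimension:
        if row["key"] == key and row["is_current"]:
            if row["attrs"] == new_attrs:
                return dimension  # no change, keep the existing version
            # Expire the old version as of the new record's effective date.
            row["valid_to"] = effective
            row["is_current"] = False
            break
    dimension.append(
        {
            "key": key,
            "attrs": dict(new_attrs),
            "valid_from": effective,
            "valid_to": None,  # open-ended until a newer version arrives
            "is_current": True,
        }
    )
    return dimension
```

For example, loading a customer with city "Pune" and later with city "Delhi" leaves two rows for that key: the expired "Pune" version and the current "Delhi" version.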

𝗦𝗰𝗲𝗻𝗮𝗿𝗶𝗼-𝗕𝗮𝘀𝗲𝗱 / 𝗕𝗲𝗵𝗮𝘃𝗶𝗼𝗿𝗮𝗹

How do you handle a failed pipeline in production?
Describe a time you optimized a data job and improved performance.
Have you ever dealt with data duplication issues? How did you fix it?
How do you ensure data quality in your ETL processes?
What is your approach to version control and deployment of data pipelines?
