BASF Interview QA
1. How do you build a data pipeline end to end?
To build a data pipeline, I start by understanding the business requirements and identifying the data sources.
Then, I ingest the data using tools like Azure Data Factory or Databricks Auto Loader. I store the data in
Delta Lake for reliability and ACID compliance. Transformation is handled using PySpark in Databricks,
including cleaning, standardizing, and joining data. I schedule the pipeline using Databricks Workflows and
monitor it via Azure Monitor. I apply best practices like modular coding, schema validation, and error handling
throughout.
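As a rough sketch of the ingestion step (assuming a Databricks runtime where spark and Auto Loader are available; the paths and the bronze.orders table below are placeholders):

raw_path = "abfss://raw@mystorageaccount.dfs.core.windows.net/orders/"
checkpoint_path = "abfss://meta@mystorageaccount.dfs.core.windows.net/checkpoints/orders/"

# Auto Loader incrementally picks up new files and lands them in a Delta table.
(spark.readStream
    .format("cloudFiles")                                # Databricks Auto Loader
    .option("cloudFiles.format", "json")                 # format of the incoming raw files
    .option("cloudFiles.schemaLocation", checkpoint_path)
    .load(raw_path)
    .writeStream
    .format("delta")
    .option("checkpointLocation", checkpoint_path)
    .trigger(availableNow=True)                          # process available files, then stop
    .toTable("bronze.orders"))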
2. How do you handle schema drift or changes in source data?
I implement schema validation checks at the ingestion layer to detect changes early. For optional or new
columns, I apply dynamic schema inference where appropriate. If schema drift is a recurring issue, I work
with upstream teams to enforce data contracts. I also log and alert deviations using built-in monitoring tools.
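A minimal version of such an ingestion-time check (the expected_schema, landing path, and column names are illustrative):

from pyspark.sql.types import StructType, StructField, StringType, IntegerType, DateType

expected_schema = StructType([
    StructField("order_id", StringType(), False),
    StructField("supplier_id", StringType(), True),
    StructField("quantity", IntegerType(), True),
    StructField("delivery_date", DateType(), True),
])

incoming = spark.read.json("/mnt/raw/orders/")        # hypothetical landing path

# Compare column sets so new or missing columns are caught and alerted on early.
expected_cols = {f.name for f in expected_schema.fields}
incoming_cols = set(incoming.columns)
new_cols = incoming_cols - expected_cols
missing_cols = expected_cols - incoming_cols
if new_cols or missing_cols:
    raise ValueError(f"Schema drift detected: new={new_cols}, missing={missing_cols}")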
3. Why do you use Delta Lake, and what benefits does it provide?
Delta Lake provides ACID transactions, schema enforcement, time travel, and versioning, which makes it
ideal for production data pipelines. It also allows for efficient updates and deletes, which traditional data lakes
struggle with. This improves data quality, reliability, and pipeline resilience.
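For example, an upsert and a time-travel read in Delta might look like this (table and column names are placeholders; assumes the delta-spark package on Databricks):

from delta.tables import DeltaTable

updates_df = spark.table("bronze.shipment_updates")   # hypothetical incoming changes
target = DeltaTable.forName(spark, "silver.shipments")

# MERGE gives efficient updates/inserts that plain Parquet data lakes struggle with.
(target.alias("t")
    .merge(updates_df.alias("u"), "t.shipment_id = u.shipment_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())

# Time travel: query the table as of an earlier version for audits or rollback.
previous = spark.sql("SELECT * FROM silver.shipments VERSION AS OF 3")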
4. How do you optimize performance when working with large datasets in Spark?
I optimize performance by using partitioning, caching intermediate datasets, minimizing shuffles, and tuning
the cluster size. I also ensure data types are appropriate and avoid wide transformations when possible.
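A few of these moves in code (table names are illustrative; suppliers is assumed to be a small dimension table):

from pyspark.sql import functions as F

orders = spark.table("silver.orders")
suppliers = spark.table("silver.suppliers")

# Broadcast the small side to avoid a shuffle-heavy sort-merge join.
enriched = orders.join(F.broadcast(suppliers), "supplier_id")

# Cache an intermediate result that is reused by several downstream steps.
enriched.cache()

# Partition the output by a commonly filtered column so reads can prune files.
(enriched.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("order_date")
    .saveAsTable("gold.orders_enriched"))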
5. What components of Azure have you used for data engineering or ML workflows?
I've worked with Azure Data Factory for ETL orchestration, Azure Blob Storage for raw data, Azure SQL DB
for structured outputs, Azure Key Vault for credentials, and Azure Databricks for transformation and ML workloads.
6. How do you orchestrate and schedule your data pipelines?
I use Azure Data Factory pipelines or Databricks Workflows depending on the use case. For ADF, I define
linked services, datasets, and activities for each step. For Databricks Workflows, I configure jobs and
clusters, and link notebooks with dependencies. Monitoring and alerting are set up via Azure Monitor.
7. How do you handle security and access control in your data platform?
I implement role-based access control (RBAC) using Azure Active Directory. I restrict workspace and cluster
access by role. I use secrets stored in Azure Key Vault and referenced securely in notebooks. I also enable
audit logging so that access and changes can be traced.
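For example, pulling a secret inside a notebook (the scope, key, server, and table names are placeholders; the secret scope is assumed to be backed by Key Vault):

# Databricks secret scope backed by Azure Key Vault; nothing sensitive is hard-coded.
jdbc_password = dbutils.secrets.get(scope="kv-backed-scope", key="sql-db-password")

jdbc_url = "jdbc:sqlserver://myserver.database.windows.net:1433;database=analytics"
df = (spark.read
    .format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.shipments")
    .option("user", "etl_user")
    .option("password", jdbc_password)
    .load())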
8. Explain how you would deploy a data pipeline that pulls from SAP and writes into a data lake.
I'd use Azure Data Factory to connect to SAP (e.g., via OData or SAP connector) and extract the required
data. I'd perform preliminary validation, land it in Azure Blob Storage, and use Databricks for further
transformation. Final outputs would be stored in Delta format in ADLS, and monitored with alerts for any
ingestion failures.
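The Databricks step after ADF lands the SAP extract could look roughly like this (container, path, and column names are placeholders):

from pyspark.sql import functions as F

raw = spark.read.parquet("wasbs://landing@mystorage.blob.core.windows.net/sap/shipments/")

# Basic validation and an audit column before promoting the data.
validated = (raw
    .filter(F.col("shipment_id").isNotNull())
    .withColumn("load_ts", F.current_timestamp()))

(validated.write
    .format("delta")
    .mode("append")
    .save("abfss://curated@mydatalake.dfs.core.windows.net/delta/sap_shipments/"))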
9. Write a SQL query to find the top 5 suppliers by total delay in shipments.
-- Assumes shipment_data has supplier_id and a delay_days column.
SELECT supplier_id, SUM(delay_days) AS total_delay
FROM shipment_data
GROUP BY supplier_id
ORDER BY total_delay DESC
LIMIT 5;
10. How do you write efficient and maintainable Python code for data processing?
I use vectorized operations with Pandas or PySpark, avoid unnecessary loops, and modularize code into
reusable functions. I also log key steps and handle exceptions gracefully. For large data, I prefer PySpark.
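A small illustration of vectorized versus loop-based Pandas (the sample data is made up):

import pandas as pd

df = pd.DataFrame({"qty": [10, 5, 8], "unit_price": [2.5, 4.0, 1.2]})

# Vectorized: one expression over whole columns, no explicit row loop.
df["line_total"] = df["qty"] * df["unit_price"]

# Loop-based equivalent (slower and more error-prone on large frames):
# totals = [row.qty * row.unit_price for row in df.itertuples()]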
11. Have you used Pandas vs PySpark? When do you use each?
Yes. I use Pandas for small to medium datasets that fit in memory, typically for local prototyping. I use
PySpark for large datasets that require distributed processing, especially when running in Databricks or on
big data clusters.
12. How do you handle null values and data validation in Python?
I use functions like .fillna(), .dropna(), or custom imputation logic. For validation, I use assertions, schema
checks, and row-level filtering. In PySpark, I define schemas explicitly and use filters or when/otherwise logic
to clean data.
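For illustration, both styles side by side (column names and sample values are made up):

import pandas as pd
from pyspark.sql import functions as F

# Pandas: impute a default and drop rows missing a required field.
pdf = pd.DataFrame({"qty": [10, None, 8], "status": ["OK", None, "OK"]})
pdf["qty"] = pdf["qty"].fillna(0)
pdf = pdf.dropna(subset=["status"])

# PySpark: when/otherwise to standardize values, plus a filter to drop bad rows.
sdf = spark.createDataFrame([(10, "OK"), (None, None), (8, "OK")], ["qty", "status"])
sdf = (sdf
    .withColumn("qty", F.when(F.col("qty").isNull(), 0).otherwise(F.col("qty")))
    .filter(F.col("status").isNotNull()))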
13. How would you prepare supply chain data to be used in a GenAI application?
I'd clean and structure the data into context-rich formats. For example, convert supplier delivery logs into
natural language summaries. I'd use semantic tagging to label entities and relationships (e.g., vendor ->
product -> delay_reason). I'd also consider embedding structured summaries for retrieval-augmented
generation (RAG).
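A toy example of turning one structured delivery record into a context-rich sentence that could later be embedded (the fields and values are made up):

record = {
    "supplier": "Acme Chemicals",
    "product": "Solvent X",
    "delay_days": 4,
    "delay_reason": "port congestion",
}

# Natural-language summary suitable for embedding and retrieval.
summary = (
    f"Supplier {record['supplier']} delivered {record['product']} "
    f"{record['delay_days']} days late due to {record['delay_reason']}."
)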
14. What's your understanding of semantic layers and ontologies in data platforms?
A semantic layer abstracts raw data into business-friendly terms (e.g., "Total Delivered Quantity" instead of
qty_delivd). Ontologies define the relationships between data entities, like how a vendor relates to a shipment
or invoice. These are crucial for GenAI and self-serve BI tools to interpret data meaningfully.
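A very small sketch of a semantic mapping layered over raw columns (the raw names and the shipments table are illustrative):

semantic_map = {
    "qty_delivd": "total_delivered_quantity",
    "vend_id": "vendor",
    "dlv_dt": "delivery_date",
}

shipments = spark.table("silver.shipments")
for raw_name, business_name in semantic_map.items():
    if raw_name in shipments.columns:
        shipments = shipments.withColumnRenamed(raw_name, business_name)

# Expose the business-friendly names as a view for BI or GenAI consumers.
shipments.createOrReplaceTempView("v_shipments_semantic")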
15. What KPIs would you track in a supply chain analytics dashboard?
On-time delivery rate, supplier reliability, inventory turnover, stockout rate, average shipment delay, demand
forecast accuracy, and fill rate. I'd visualize these by supplier, region, and product category.
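Two of these KPIs computed per supplier, as a sketch (the gold.shipments table and delay_days column are assumptions):

from pyspark.sql import functions as F

shipments = spark.table("gold.shipments")

kpis = (shipments
    .groupBy("supplier_id")
    .agg(
        # Share of shipments with zero or negative delay = on-time delivery rate.
        F.avg(F.when(F.col("delay_days") <= 0, 1).otherwise(0)).alias("on_time_delivery_rate"),
        F.avg("delay_days").alias("avg_shipment_delay_days"),
    ))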
16. How would you build a forecasting model for inventory demand?
I'd aggregate historical sales/order data, perform time series decomposition, and apply models like ARIMA,
Prophet, or LSTM based on seasonality and volume. I'd also include external factors like lead times and holidays.
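A minimal Prophet sketch for weekly demand (assumes the prophet package and a history file with ds/y columns; both are placeholders):

import pandas as pd
from prophet import Prophet

history = pd.read_csv("demand_history.csv", parse_dates=["ds"])   # hypothetical input

model = Prophet(weekly_seasonality=True, yearly_seasonality=True)
model.add_country_holidays(country_name="DE")      # holidays as an external factor
model.fit(history)

future = model.make_future_dataframe(periods=12, freq="W")
forecast = model.predict(future)[["ds", "yhat", "yhat_lower", "yhat_upper"]]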
17. Have you ever helped solve a supply chain issue using data?
Yes, at PETRONAS I worked on a project that monitored upstream tank levels in real time. Our data solution
helped flag abnormal consumption patterns, leading to faster restocking decisions and preventing potential
shutdowns.
18. How do you work in agile teams with product owners and data scientists?
I attend sprint planning and daily stand-ups, contribute to story grooming, and ensure my deliverables (e.g.,
pipelines, transformations) align with the product vision. I often collaborate with data scientists to shape
the data and features their models need.
19. Have you used Azure DevOps boards or GitHub Actions for CI/CD?
Yes, I've used Azure DevOps Boards for tracking tasks and GitHub Actions for automating testing and
deployment of notebooks and pipelines. I've also worked with Git branching strategies for collaborative
development.
20. Tell us about a time you had to deliver something under pressure or with limited data.
Once I was asked to build a backfill pipeline within 2 days for missing EPH-OPU data. With incomplete
documentation, I reverse-engineered available logs, validated key fields with SMEs, and delivered an
automated solution that filled a 3-week gap successfully. This reduced reporting delays, and the approach
was later reused for similar gaps.