
Coding Test for Senior Data Engineer

Objective:
To evaluate the candidate's proficiency in data engineering,
including data pipelines, ETL processes, data storage
solutions, and distributed systems.

Duration:
2-3 hours

Submission:
Candidates should submit their code via a GitHub repository.

Task 1: Data Ingestion and ETL

Scenario:
You are working for a Data Product company that needs to
process user interaction data. The data is in CSV format
and needs to be cleaned, transformed, and loaded into a
data warehouse for analysis.

Steps:

1. Data Ingestion:

- Write a Python script to ingest the interaction data from a given CSV file.
- The CSV file will have columns: `interaction_id`, `user_id`, `product_id`, `action`, `timestamp`.

2. Data Cleaning:

- Handle missing values by either filling them with a default value or removing the rows.
- Ensure that the data types are correct (e.g., `timestamp` should be a datetime type).

3. Data Transformation:

- Calculate the number of interactions per user and per product.
- Add a new column `interaction_count` to the data for each user and product.

4. Data Loading:

- Load the cleaned and transformed data into a SQL database (e.g., SQLite or PostgreSQL). A minimal end-to-end sketch of steps 1-4 follows this list.

5. Documentation:

- Document your ETL process clearly in a README file.
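For orientation only, here is a minimal sketch of one way steps 1-4 could fit together, using pandas and SQLite. The file name `interactions.csv`, the database name `interactions.db`, the table name `user_interactions`, the default fill value for `action`, and the split of `interaction_count` into per-user and per-product columns are illustrative assumptions, not requirements of the test.

```python
# etl_sketch.py - illustrative only; file, table, and column-handling
# choices here are assumptions, not part of the test specification.
import sqlite3

import pandas as pd


def run_etl(csv_path: str = "interactions.csv",
            db_path: str = "interactions.db") -> None:
    # 1. Ingestion: read the raw CSV into a DataFrame.
    df = pd.read_csv(csv_path)

    # 2. Cleaning: drop rows missing key identifiers, fill a default
    #    value for `action`, and coerce `timestamp` to datetime.
    df = df.dropna(subset=["interaction_id", "user_id", "product_id"])
    df["action"] = df["action"].fillna("unknown")
    df["timestamp"] = pd.to_datetime(df["timestamp"], errors="coerce")
    df = df.dropna(subset=["timestamp"])

    # 3. Transformation: per-user and per-product interaction counts
    #    (one interpretation of the `interaction_count` requirement).
    df["user_interaction_count"] = (
        df.groupby("user_id")["interaction_id"].transform("count"))
    df["product_interaction_count"] = (
        df.groupby("product_id")["interaction_id"].transform("count"))

    # 4. Loading: write the result into a SQLite table.
    with sqlite3.connect(db_path) as conn:
        df.to_sql("user_interactions", conn, if_exists="replace", index=False)


if __name__ == "__main__":
    run_etl()
```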

Deliverables:

- Python script(s) for data ingestion, cleaning, transformation, and loading.
- SQL database file with the loaded data.
- README file with instructions on how to run the scripts and any dependencies required.

Task 2: Data Pipeline with Apache Airflow

Scenario:
Automate the ETL process from Task 1 using Apache Airflow.

Steps:

1. Set Up Airflow:

- Install Apache Airflow and set up a local environment.

2. Create an Airflow DAG:

- Define a DAG that includes tasks for data ingestion, cleaning, transformation, and loading as defined in Task 1.

3. Task Scheduling:

- Schedule the DAG to run daily.

4. Error Handling:

- Implement error handling and logging within the DAG to manage and record any issues that occur during the ETL process (a minimal DAG sketch follows this list).
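As a point of reference, a minimal sketch of such a DAG is shown below, assuming Airflow 2.x and that the Task 1 logic has been factored into four functions (`ingest`, `clean`, `transform`, `load`) in a hypothetical module `etl_tasks`; those names, and how data is handed between tasks, are assumptions rather than part of the test.

```python
# dag_sketch.py - illustrative only, assuming Airflow 2.x; module and
# function names below are hypothetical stand-ins for the Task 1 code.
import logging
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical module: assumes the Task 1 logic was split into four
# functions, each reading/writing an intermediate staging artifact.
from etl_tasks import ingest, clean, transform, load

log = logging.getLogger(__name__)


def notify_failure(context):
    # Basic error handling: log which task failed; Airflow keeps full task logs.
    log.error("Task %s failed in DAG %s",
              context["task_instance"].task_id, context["dag"].dag_id)


default_args = {
    "owner": "data-engineering",
    "retries": 2,                           # retry transient failures
    "retry_delay": timedelta(minutes=5),
    "on_failure_callback": notify_failure,  # applied to every task
}

with DAG(
    dag_id="user_interactions_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",             # run the pipeline once per day
    catchup=False,
    default_args=default_args,
) as dag:
    ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
    clean_task = PythonOperator(task_id="clean", python_callable=clean)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    ingest_task >> clean_task >> transform_task >> load_task
```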

Deliverables:

- Airflow DAG file(s).
- Instructions on how to set up and run the Airflow environment.

Task 3: Data Storage and Retrieval

Scenario:
Implement a solution to store and retrieve user interaction data efficiently.

Steps:

1. Data Storage:

- Design a schema for storing user interaction data in a SQL database (e.g., PostgreSQL).
- Implement the schema in the chosen database.

2. Data Retrieval:

- Write SQL queries to answer the following questions:

1. Total number of interactions per day.
2. Top 5 users by the number of interactions.
3. Most interacted products based on the number of interactions.

3. Optimization:

- Discuss any optimization techniques you used to improve the performance of your queries (a schema-and-query sketch follows this list).
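A minimal sketch of one possible schema and the three retrieval queries is shown below, written against SQLite for portability; the table, column, and index names are assumptions, and the DDL would need minor type adjustments (e.g., a `TIMESTAMP` column) for PostgreSQL.

```python
# storage_sketch.py - illustrative only; table, column, and index names
# are assumptions, not requirements of the test.
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS user_interactions (
    interaction_id TEXT PRIMARY KEY,
    user_id        TEXT NOT NULL,
    product_id     TEXT NOT NULL,
    action         TEXT NOT NULL,
    timestamp      TEXT NOT NULL        -- ISO-8601 string; TIMESTAMP in PostgreSQL
);
-- Indexes to speed up the per-user, per-product, and per-day aggregations.
CREATE INDEX IF NOT EXISTS idx_interactions_user    ON user_interactions (user_id);
CREATE INDEX IF NOT EXISTS idx_interactions_product ON user_interactions (product_id);
CREATE INDEX IF NOT EXISTS idx_interactions_ts      ON user_interactions (timestamp);
"""

QUERIES = {
    # 1. Total number of interactions per day.
    "interactions_per_day": """
        SELECT date(timestamp) AS day, COUNT(*) AS interactions
        FROM user_interactions
        GROUP BY day
        ORDER BY day;
    """,
    # 2. Top 5 users by number of interactions.
    "top_5_users": """
        SELECT user_id, COUNT(*) AS interactions
        FROM user_interactions
        GROUP BY user_id
        ORDER BY interactions DESC
        LIMIT 5;
    """,
    # 3. Most interacted products by number of interactions.
    "most_interacted_products": """
        SELECT product_id, COUNT(*) AS interactions
        FROM user_interactions
        GROUP BY product_id
        ORDER BY interactions DESC;
    """,
}

if __name__ == "__main__":
    with sqlite3.connect("interactions.db") as conn:
        conn.executescript(SCHEMA)
        for name, sql in QUERIES.items():
            print(name, conn.execute(sql).fetchall())
```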

Deliverables:

- SQL scripts for schema creation and data insertion.
- SQL queries for data retrieval.
- A document explaining the optimization techniques used.


Instructions for Submission:

1. Create a GitHub repository and commit all your code, scripts, and documentation.

2. Ensure that your repository is well-structured and includes a README file with clear instructions on how to set up and run your solutions.

3. Share the GitHub repository link for review.

Evaluation Criteria:

1. Code Quality: Readability, organization, and adherence to best practices.

2. Functionality: Correctness and completeness of the ETL process, Airflow DAG, and SQL queries.

3. Documentation: Clarity and thoroughness of the provided documentation.

4. Efficiency: Performance and optimization of the data retrieval queries.

5. Error Handling: Robustness of the ETL process and error handling in Airflow.
