Daily Activity
After logging in, I start by checking emails and messages to review important
updates, notifications, and communications from the previous day.
I prioritize responding to urgent emails or messages that require immediate
attention.
I review my calendar for scheduled meetings, deadlines, and events for the
day.
I then check the ticketing system for any new tickets assigned to me or updates
on existing tickets.
I attend the daily stand-up meeting with the Scrum Master (Team Lead),
Solution Architect, Senior Software Consultant, Associate members, and
Trainee Software Engineer.
During the meeting, I discuss current tasks, share progress updates, and
address any blockers or issues. We align on goals and priorities for the day.
I start working on assigned tickets.
If I receive a new ticket, I review and acknowledge it, ensuring the status is
updated to "Acknowledged."
I begin analyzing the ticket by thoroughly reading the Business Requirement
Document (BRD) and any data files provided.
If the business requirements are unclear, I raise a clarification request with the
Business Analyst (BA) and connect with them to resolve any ambiguities.
During the analysis, I document the technical requirements, potential technical
approaches, and the impact of the requirement.
Once the analysis is complete, I start the development work, which typically
involves coding with PySpark and orchestrating workflows in Azure Data
Factory (ADF).
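For illustration, below is a minimal sketch of the kind of PySpark transformation this development work involves. The paths, column names, and the clean_orders() rule are hypothetical stand-ins, not the actual project code:-

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-etl-sketch").getOrCreate()

def clean_orders(df):
    # Hypothetical business rule: trim whitespace in the key column,
    # drop rows missing it, and derive a line total.
    return (
        df.withColumn("customer_id", F.trim(F.col("customer_id")))
          .filter(F.col("customer_id").isNotNull())
          .withColumn("total", F.col("quantity") * F.col("unit_price"))
    )

# Read the raw data, apply the transformation, and write the curated output.
raw = spark.read.option("header", True).csv("/mnt/raw/orders.csv", inferSchema=True)
clean_orders(raw).write.mode("overwrite").parquet("/mnt/curated/orders")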
After completing the development work, I test it on my side before pushing it to the QA environment: first unit testing, then integration testing, and finally sanity testing.
Testing Part:-
1) Unit Testing:-
I use Python libraries like pytest or unittest to test individual functions
and components in my notebooks.
I run each cell with sample inputs and then execute the complete
notebook to validate the final output; a sketch of such a test is shown
after the cluster configuration below.
Configuration:-
Cluster Mode: Standard
Databricks Runtime Version: 13.x with Apache Spark 3.3.2
Node Type: Standard_DS3_v2 (Driver and Worker)
Number of Workers: 2-4 (with auto-scaling enabled)
Spot Instances: Enabled
Auto-Termination: 60 minutes
Libraries: pandas, numpy, pyarrow, requests
Cluster Tags: Environment: Development
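Below is a minimal pytest sketch of such a unit test. It assumes the hypothetical clean_orders() function from the development sketch above; the module path and test data are illustrative only:-

import pytest
from pyspark.sql import SparkSession

from etl.transforms import clean_orders  # hypothetical module path

@pytest.fixture(scope="session")
def spark():
    # Small local session for unit tests, separate from the cluster config above.
    return SparkSession.builder.master("local[2]").appName("unit-tests").getOrCreate()

def test_clean_orders_derives_total_and_drops_null_keys(spark):
    rows = [(" C1 ", 2, 5.0), (None, 1, 3.0)]
    df = spark.createDataFrame(rows, ["customer_id", "quantity", "unit_price"])

    result = clean_orders(df).collect()

    assert len(result) == 1                  # row with null customer_id dropped
    assert result[0]["customer_id"] == "C1"  # whitespace trimmed
    assert result[0]["total"] == 10.0        # quantity * unit_price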
2) Integration Testing:-
I verify the interaction between different components of the data
pipeline to ensure they work together as expected.
This involves running the pipeline end-to-end and checking the outputs
against expected results.
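A minimal sketch of such an end-to-end check is shown below; run_pipeline() and the fixture paths are hypothetical stand-ins for the real pipeline entry point:-

from pyspark.sql import SparkSession

from etl.pipeline import run_pipeline  # hypothetical entry point

def test_pipeline_end_to_end(tmp_path):
    spark = SparkSession.builder.master("local[2]").appName("integration-test").getOrCreate()
    out_dir = str(tmp_path / "curated")

    # Execute the whole pipeline against a small fixture input.
    run_pipeline(spark, input_path="tests/fixtures/orders.csv", output_path=out_dir)

    actual = spark.read.parquet(out_dir)
    expected = spark.read.parquet("tests/fixtures/expected_orders")

    # Row-level comparison: the symmetric difference should be empty.
    assert actual.exceptAll(expected).count() == 0
    assert expected.exceptAll(actual).count() == 0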
3) Sanity Testing:-
As a precaution, I perform sanity testing to verify that basic data
transformations, data loading, and processing steps are functioning
correctly.
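Below is a minimal sketch of the kind of sanity check I mean, assuming an active Databricks notebook session (the spark variable); the table and column names are hypothetical:-

from pyspark.sql import functions as F

# Quick post-load checks on the curated output table (hypothetical name).
df = spark.table("curated.orders")

assert df.count() > 0, "output table is empty"

expected_cols = {"customer_id", "quantity", "unit_price", "total"}
assert expected_cols.issubset(set(df.columns)), "missing expected columns"

null_keys = df.filter(F.col("customer_id").isNull()).count()
assert null_keys == 0, f"{null_keys} rows have a null customer_id"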
After testing, I submit my code for peer review to ensure that it adheres
to coding standards and best practices.
Once the code passes review, I push it to the QA environment for further
testing and validation.