Intern Report
On
Data Intern
At
eXtenso Data
Submitted To
Swastik College
Chardobato, Bhaktapur
Submitted By
Kebal Khadka
May, 2025
Supervisor’s Recommendation
I hereby recommend that this report, prepared under my supervision by Kebal Khadka (TU
Roll No. 26861/077), be accepted as partial fulfillment of the requirements for the degree
of Bachelor of Science in Computer Science and Information Technology. To the best of
my knowledge, this is an original work in Computer Science and Information Technology.
........................................
Swastik College
Letter of Approval
This is to certify that this report, prepared by Kebal Khadka (TU Roll No. 26861/077) in
partial fulfillment of the requirements for the degree of Bachelor of Science in Computer
Science and Information Technology, has been thoroughly studied. In our opinion, it is
satisfactory in scope and quality as a project for the required degree.
Acknowledgement
I sincerely thank eXtenso Data for giving me the chance to work as a data intern. I am
deeply appreciative of the lasting influence this experience has had on my personal and
professional development; it has been a cornerstone of my career path.
I am incredibly grateful to Mr. Suresh Gautam, CEO of eXtenso Data, for giving me this
internship opportunity and allowing me to learn a great deal about a variety of industries.
His mentorship and continuous support during my internship have been instrumental in
shaping my professional development, and I deeply value his guidance and
encouragement.
I am sincerely thankful to my supervisor, Ms. Sristi Khatiwada, for her exceptional
guidance, unwavering support, and inspiring encouragement throughout my internship.
Her thoughtful criticism and advice have greatly strengthened my abilities, particularly
in report writing.
Lastly, I extend my sincerest regards and heartfelt gratitude to my esteemed colleagues
and to everyone else who has supported me throughout this period.
Abstract
Data engineering involves designing and building robust systems that facilitate the
collection, transformation, and management of large-scale data to support strategic
decision-making. This report summarizes my internship as a Data Engineering Intern at
eXtenso Data, a Big Data Analytics company dedicated to enhancing operational
efficiency, optimizing costs, and uncovering new business opportunities through
data-driven insights. During my time at eXtenso Data, I was actively involved in developing
and maintaining data pipelines using Python and SQL, and I worked extensively with
MySQL for data storage and querying. Additionally, I gained hands-on experience with
big data tools such as Apache Spark for large-scale data processing and Apache Airflow
for orchestrating complex data workflows.
This internship deepened my understanding of the complete data engineering lifecycle—
from data ingestion and transformation to scheduling and automation—and provided me
with valuable experience in building scalable data solutions in a real-world business
setting.
Keywords: Data Engineering, Big Data, Python, SQL, MySQL, Apache Spark, Apache
Airflow, Data Pipelines, Data Ingestion, Data Transformation.
Table of Contents
Supervisor’s Recommendation..........................................................................................i
Letter of Approval..............................................................................................................ii
Acknowledgement.............................................................................................................iii
Abstract..............................................................................................................................iv
List of Tables.....................................................................................................................vii
Chapter 1: Introduction.....................................................................................................1
1.1 Introduction..........................................................................................................1
1.3 Objectives.............................................................................................................2
1.4 Scopes..................................................................................................................2
1.5 Limitations...........................................................................................................2
4.1 Conclusion................................................................................................................13
Annex.................................................................................................................................16
List of Tables
Table 2.1 Organization Details.........................................................................................12
Table 2.2 Internship Period Details.................................................................................14
Table 3.1 Weekly Log.........................................................................................17
Chapter 1: Introduction
1.1 Introduction
Data engineering is a crucial field that focuses on designing, building, and managing the
infrastructure and tools needed to collect, store, process, and analyze large volumes of
data. It plays a vital role in enabling organizations to make data-driven decisions and gain
valuable insights from their data. During my ongoing internship, I am building a strong
foundation in data engineering by working on data collection, transformation, and
pipeline development. I am actively involved in creating scalable data workflows,
managing databases, and ensuring data quality across various stages of the pipeline.
Over the course of the internship so far, I have focused on building end-to-end data
pipelines to support reliable and scalable data workflows. I started by developing ETL
scripts in Python to collect and transform data from various sources, using tools like
SeleniumBase for web automation and Pandas for data cleaning and transformation.
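To make this concrete, below is a minimal sketch of the kind of extraction-and-cleaning script described above. The URL is a hypothetical placeholder, and the SeleniumBase calls reflect my understanding of its SB() context manager; the project's actual scripts differed in their sources and cleaning rules.

```python
# Minimal sketch: render a page in a headless browser, read its first HTML
# table, and apply basic cleaning. The URL is a hypothetical placeholder.
from io import StringIO

import pandas as pd
from seleniumbase import SB

SOURCE_URL = "https://example.org/sanctions"  # stand-in for a real source page

def extract_table() -> pd.DataFrame:
    """Open the page in a headless browser and read its first HTML table."""
    with SB(headless=True) as sb:
        sb.open(SOURCE_URL)
        html = sb.get_page_source()
    # pandas.read_html returns one DataFrame per <table> element
    return pd.read_html(StringIO(html))[0]

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Normalize column names, trim whitespace, and drop duplicate rows."""
    df = df.rename(columns=lambda c: str(c).strip().lower().replace(" ", "_"))
    for col in df.select_dtypes(include="object"):
        df[col] = df[col].str.strip()
    return df.drop_duplicates()

if __name__ == "__main__":
    print(clean(extract_table()).head())
```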
I gained hands-on experience with SQL, which I used extensively for querying and
transforming data from structured databases. This laid a strong foundation in data
wrangling, joins, aggregations, and subqueries—essential operations in any data
engineering role.
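As an illustration, the snippet below runs the kind of join, aggregation, and subquery this paragraph refers to from Python. The table names, columns, and connection string are hypothetical placeholders, assuming a SQLAlchemy engine backed by MySQL.

```python
# Hypothetical join + aggregation + subquery of the kind practiced during the
# internship; table names, columns, and credentials are placeholders.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mysql+pymysql://user:password@localhost/sanctions_db")

QUERY = """
SELECT s.source_name,
       COUNT(*) AS total_entries,
       COUNT(DISTINCT e.country) AS countries_covered
FROM entries AS e
JOIN sources AS s ON s.id = e.source_id
WHERE e.listed_on >= (SELECT MIN(run_date) FROM pipeline_runs)  -- subquery
GROUP BY s.source_name
ORDER BY total_entries DESC;
"""

# pandas.read_sql returns the result set as a DataFrame for further analysis
df = pd.read_sql(QUERY, engine)
print(df)
```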
As the internship progressed, I was introduced to modern big data tools such as Apache
Airflow for scheduling and orchestrating complex data workflows, and Apache Spark
for distributed processing of large datasets. These technologies allowed me to scale data
processing tasks beyond traditional scripting and move toward production-ready
pipelines.
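A minimal Airflow DAG along these lines is sketched below. The dag_id and task callables are illustrative placeholders; in practice, a Spark job would typically be triggered from a task like one of these.

```python
# Minimal Airflow 2.x DAG sketch: three Python tasks chained into an ETL flow.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    ...  # download raw source files

def transform():
    ...  # clean and standardize with pandas

def load():
    ...  # write the unified data to MySQL

with DAG(
    dag_id="sanctions_etl",                 # illustrative name
    start_date=datetime(2025, 1, 1),
    schedule_interval="@daily",             # newer Airflow versions use `schedule`
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> transform_task >> load_task  # run order: E -> T -> L
```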
1.3 Objectives
• To develop and implement automated ETL (Extract, Transform, Load) pipelines
using Python and SQL to efficiently ingest and process structured and
unstructured data from multiple sources.
• To gain practical experience with modern data engineering tools and frameworks,
including Apache Airflow for workflow orchestration and Apache Spark for
distributed big data processing.
• To ensure data quality and integrity through effective data cleaning,
transformation, and validation processes, enabling reliable storage and
downstream use by analytics and reporting systems.
1.4 Scopes
• To build and manage ETL pipelines using Python and SQL for transforming raw
sanction list data into structured, analyzable formats.
• To work with tools like Apache Airflow and Apache Spark to understand scalable
data processing and workflow automation in a big data environment (see the
sketch after this list).
• To ensure data consistency and integrity by applying data cleaning techniques,
handling missing values, and standardizing formats using Pandas.
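The following PySpark sketch shows the kind of distributed aggregation this scope refers to; the file path and column names are hypothetical placeholders rather than the project's actual data.

```python
# Minimal PySpark sketch of a distributed aggregation; path and column
# names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("sanctions-aggregation")
         .getOrCreate())

# Read a large CSV extract; Spark splits the work across partitions.
df = spark.read.csv("data/sanctions_consolidated.csv",
                    header=True, inferSchema=True)

# Example transformation: number of entries per country, highest first.
summary = (df.groupBy("country")
             .agg(F.count("*").alias("entries"))
             .orderBy(F.desc("entries")))

summary.show(10)
spark.stop()
```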
1.5 Limitations
eXtensoData, a prominent business vertical of F1Soft Group, was founded in 2018 and is
led by CEO Suresh Gautam. It is a Big Data Analytics company focused on helping
businesses harness the power of their data to improve operational efficiency, optimize
costs, and uncover new opportunities. With a mission to turn raw data into actionable
intelligence, eXtensoData provides a broad suite of advanced data services tailored to
modern business needs.
The company’s key areas of expertise include Data Engineering, Process Automation,
Business Analysis, Forecasting, Process Optimization, and Big Data Consulting. Its data
engineering services are designed to transform complex organizational data into
intelligent, timely insights, enabling data-driven decisions. Through process automation,
eXtensoData streamlines repetitive business tasks and eliminates inefficiencies by
leveraging enterprise data and building robust automation platforms.
In addition, the company offers business analysis support at both operational and strategic
levels, enhancing daily performance and delivering insights aligned with emerging
business trends. Its forecasting solutions empower clients with technology-driven
financial foresight, seamlessly integrating predictive models with operational strategies.
Email: [email protected]
i. Data Engineering:
We offer data engineering services that transform organizational data into meaningful,
intelligent insights. Our comprehensive data solutions are designed to address diverse
business challenges, enabling our clients to make timely and informed decisions.
iv. Forecasting:
Forecasting is a key component of effective business planning. Our technologies
automate the forecasting process, making it easier for organizations to align financial
projections with operational strategies for sustained success.
Table 3.1 Weekly Log
Week  Task
1     1. Introduction to SQL and relational database concepts.
      2. Learning basic to advanced SQL queries (joins, subqueries, window functions).
      3. Hands-on practice with SQL on sample datasets.
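The window functions practiced in week 1 can be illustrated with a self-contained snippet. The schema and rows below are made up, and SQLite's in-memory engine (which supports window functions from version 3.25) stands in for a real database server so the example runs anywhere.

```python
# Self-contained window-function example using an in-memory SQLite database;
# the schema and sample rows are made up for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE salaries (dept TEXT, name TEXT, amount INTEGER);
INSERT INTO salaries VALUES
  ('eng', 'a', 90), ('eng', 'b', 120), ('ops', 'c', 70), ('ops', 'd', 85);
""")

# RANK() OVER a partition: highest-paid first within each department.
rows = conn.execute("""
SELECT dept, name, amount,
       RANK() OVER (PARTITION BY dept ORDER BY amount DESC) AS dept_rank
FROM salaries
ORDER BY dept, dept_rank;
""").fetchall()

for row in rows:
    print(row)
```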
2. Data Collection: Developing scripts to extract sanctions data from at least five
official international sources, each available in different formats (CSV, XML,
HTML, JSON) and structures; a format-dispatch sketch follows this list.
3. Data Cleaning and Processing: Parsing, standardizing, and transforming the data
into a unified tabular format while resolving inconsistencies, missing fields, and
schema mismatches.
4. Database Integration: Storing the cleaned and structured data into a MySQL
relational database designed for easy querying, analysis, and compliance checks.
5. Reporting: Exporting the entire consolidated dataset using mysqldump into a .sql
file for backup, archival, and integration into compliance systems.
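Referring back to the data collection step above, the sketch below shows one way to dispatch on source format. The URLs are hypothetical stand-ins for the official sanctions lists, and real feeds need source-specific parsing.

```python
# Format-dispatch sketch for multi-source collection; URLs are placeholders.
import csv
import io
import json
import xml.etree.ElementTree as ET

import requests

SOURCES = {
    "source_a": ("https://example.org/list.csv", "csv"),
    "source_b": ("https://example.org/list.json", "json"),
    "source_c": ("https://example.org/list.xml", "xml"),
}

def fetch_records(url: str, fmt: str) -> list[dict]:
    """Download one source and normalize it to a list of row dictionaries."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(resp.text)))
    if fmt == "json":
        return json.loads(resp.text)
    if fmt == "xml":
        root = ET.fromstring(resp.content)
        # Assumes one child element per record; real feeds vary widely.
        return [{field.tag: field.text for field in record} for record in root]
    raise ValueError(f"unsupported format: {fmt}")

all_records = {name: fetch_records(url, fmt)
               for name, (url, fmt) in SOURCES.items()}
```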
• Developed Python scripts to download and extract data from at least five
different official sanctions sources.
• Handled different data structures and formats using libraries such as requests,
xml.etree, and json.
• Used Python libraries such as Pandas to parse and standardize data fields
across all sanctions lists.
• Resolved inconsistencies in naming conventions, removed duplicates, and
structured the data into a uniform format.
• Ensured all records followed a unified schema to allow smooth integration into
the database (a standardize-and-load sketch follows this list).
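A condensed sketch of that standardize-and-load step follows. The column mapping, sample rows, table name, and connection string are all hypothetical placeholders for the real unified schema.

```python
# Sketch of the standardize-and-load step; the column mapping, sample data,
# table name, and connection string are hypothetical placeholders.
import pandas as pd
from sqlalchemy import create_engine

# Map each source's column names onto one unified schema.
COLUMN_MAP = {"Full Name": "name", "entity_name": "name",
              "Country": "country", "nationality": "country"}

def standardize(df: pd.DataFrame, source: str) -> pd.DataFrame:
    """Rename columns, enforce one naming convention, drop duplicates."""
    df = df.rename(columns=COLUMN_MAP)
    df["name"] = df["name"].str.strip().str.upper()
    df["source"] = source
    return df.drop_duplicates(subset=["name", "country", "source"])

# Tiny stand-in for the frames produced by the collection step.
raw_frames = {
    "source_a": pd.DataFrame({"Full Name": [" Jane Doe "], "Country": ["NP"]}),
    "source_b": pd.DataFrame({"entity_name": ["JANE DOE"], "nationality": ["NP"]}),
}

unified = pd.concat(
    [standardize(df, src) for src, df in raw_frames.items()],
    ignore_index=True,
)

# Append the unified records into the MySQL table used for compliance checks.
engine = create_engine("mysql+pymysql://user:password@localhost/sanctions_db")
unified.to_sql("sanctions", engine, if_exists="append", index=False)
```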
4. Database Integration:
4.1 Conclusion
My time as an intern at eXtenso Data has been an ongoing journey of growth and
learning. Working on a challenging project that involves data extraction, transformation,
and loading from global sanctions sources has allowed me to enhance my technical skills
in Python and MySQL while gaining valuable insights into real-world data engineering
workflows.
Although the internship is still in progress, I have already gained hands-on experience in
addressing real business needs through designing an ETL pipeline and dealing with
diverse data formats. Collaborating with the technical team and receiving mentorship has
improved my communication and problem-solving skills, while also deepening my
interest in the fields of data engineering and compliance analytics.
I look forward to completing the internship and continuing to apply what I’ve learned to
the remaining phases of the project. This experience is shaping a strong foundation for
my future academic and professional aspirations, and I’m grateful for the opportunity to
contribute meaningfully while continuing to learn.