
Data Engineer Internship - Python Web Scraping & Data Pipelines (Chennai)

Earthmetry Decision Systems

Earthmetry brings together datasets for India from areas such as electricity, oil & gas, air quality
monitoring, climate reanalysis, and climate models into one platform. We connect this data to
visualizations, models, downstream systems, and dashboards via both no-code and coding
applications.

The Role

We are seeking a Data Engineer Intern to build data pipelines for our business. You will
design, develop, and maintain pipelines that collect, transform, and load data from various
websites using web scraping techniques, storing the extracted data in Google Cloud Storage,
BigQuery, and PostgreSQL databases.
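
As a rough illustration of the kind of pipeline this role involves, the sketch below scrapes a
simple HTML table and appends it to a BigQuery table. The source URL, table ID, and page
layout are hypothetical placeholders for illustration, not Earthmetry's actual sources or schema.

    import requests
    import pandas as pd
    from bs4 import BeautifulSoup
    from google.cloud import bigquery

    def extract(url: str) -> pd.DataFrame:
        # Fetch the page and parse the first HTML table into a DataFrame,
        # treating the first row as the header.
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        rows = [
            [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
            for tr in soup.find("table").find_all("tr")
        ]
        return pd.DataFrame(rows[1:], columns=rows[0])

    def load(df: pd.DataFrame, table_id: str) -> None:
        # Append the scraped rows to a BigQuery table and wait for the job.
        client = bigquery.Client()
        client.load_table_from_dataframe(df, table_id).result()

    if __name__ == "__main__":
        df = extract("https://example.com/electricity-data")  # hypothetical source
        load(df, "my-project.energy.electricity_readings")    # hypothetical table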

Responsibilities:

● Develop data pipelines using Python: Design and implement data pipelines to
extract, transform, and load data from websites using libraries like Beautiful Soup,
Scrapy, and Selenium.
● Handle challenging scraping scenarios: Implement strategies to overcome obstacles
such as cookies and CAPTCHAs, and reverse-engineer APIs to access necessary data.
● Write well-documented and maintainable code: Follow best practices for code style,
documentation, and testing to ensure maintainability and reusability.
● Develop unit and integration tests: Write unit and integration tests to ensure code
quality and functionality (a minimal sketch follows this list).
● Ensure data quality and consistency: Develop and implement data quality checks
and validation processes to ensure data accuracy and completeness.
● Debug and resolve data issues: Analyze large datasets, identify inconsistencies, and
implement solutions to ensure data integrity and consistency.
● Learn technologies: Continuously research and learn new data engineering tools and
technologies.
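
To make the testing and data-quality points concrete, here is a minimal pytest sketch. The
parse_row helper is a hypothetical example written for illustration, not code from an actual
Earthmetry pipeline.

    import pytest

    def parse_row(raw: dict) -> dict:
        # Normalize one scraped record: strip whitespace and require a
        # non-negative numeric reading (a simple data quality check).
        value = float(raw["value"])
        if value < 0:
            raise ValueError("reading cannot be negative")
        return {"station": raw["station"].strip(), "value": value}

    def test_parse_row_normalizes_fields():
        assert parse_row({"station": "  Chennai-01 ", "value": "42.5"}) == {
            "station": "Chennai-01",
            "value": 42.5,
        }

    def test_parse_row_rejects_negative_readings():
        with pytest.raises(ValueError):
            parse_row({"station": "Chennai-01", "value": "-1"})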

This is a paid internship opportunity, leading to full-time offers based on performance. The
best candidates will be strong in the following areas:

Primary:
Python Programming, Web Scraping, Databases
Bonus:
Linux, Google Cloud, Docker, Apache Airflow, git

To Apply:
Please submit your resume to [email protected]
