JD Data Engineer Intern
Earthmetry brings together datasets for India spanning electricity, oil & gas, air quality
monitoring, climate reanalysis, and climate models into one platform. We connect this data to
visualizations, models, dashboards, and downstream systems through both no-code and
code-based applications.
The Role
We are seeking a Data Engineer Intern to build data pipelines for our business. You will be
responsible for designing, developing, and maintaining pipelines that collect, transform,
and load data from a variety of websites using web scraping techniques. The extracted data
will be stored in Google Cloud Storage, BigQuery, and PostgreSQL databases.
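For a flavour of the work, here is a minimal, purely illustrative sketch of such a pipeline using requests, Beautiful Soup, and psycopg2. The URL, table name, and connection string are placeholders rather than Earthmetry systems, and a production pipeline would add retries, logging, and proper schema handling.

# Illustrative only: a toy extract-and-load flow of the kind described above.
# The URL, table, and connection string are placeholders, not Earthmetry systems.
import requests
from bs4 import BeautifulSoup
import psycopg2

def extract(url: str) -> list[dict]:
    """Fetch a page and pull rows out of the first HTML table."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    rows = []
    for tr in soup.select("table tr")[1:]:  # skip the header row
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) >= 2:
            rows.append({"station": cells[0], "value": cells[1]})
    return rows

def load(rows: list[dict], dsn: str) -> None:
    """Insert the extracted rows into a PostgreSQL table."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.executemany(
            "INSERT INTO readings (station, value) VALUES (%(station)s, %(value)s)",
            rows,
        )

if __name__ == "__main__":
    data = extract("https://example.com/air-quality")  # placeholder URL
    load(data, "dbname=scratch user=postgres")          # placeholder DSN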
Responsibilities:
● Develop data pipelines in Python: Design and implement pipelines that extract,
transform, and load data from websites, using libraries such as Beautiful Soup,
Scrapy, and Selenium.
● Handle challenging scraping scenarios: Implement strategies to work around obstacles
such as cookies and CAPTCHAs, and reverse-engineer APIs to access the data we need.
● Write well-documented and maintainable code: Follow best practices for code style,
documentation, and testing to ensure maintainability and reusability.
● Develop unit and integration tests: Cover your code with unit and integration tests to
ensure quality and correct functionality.
● Ensure data quality and consistency: Develop and implement data quality checks
and validation processes to ensure data accuracy and completeness (see the sketch after this list).
● Debug and resolve data issues: Analyze large datasets, identify inconsistencies, and
implement solutions to ensure data integrity and consistency.
● Learn technologies: Continuously research and learn new data engineering tools and
technologies.
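As a small illustration of the testing and data-quality work mentioned above, the sketch below shows a pytest-style unit test around a hypothetical validate_rows() helper; neither the helper nor the sample records come from Earthmetry's codebase.

# Illustrative only: a hypothetical data-quality helper and a pytest-style test for it.
def validate_rows(rows: list[dict]) -> list[dict]:
    """Keep only rows that have the expected fields and a non-negative numeric value."""
    clean = []
    for row in rows:
        if set(row) == {"station", "value"} and float(row["value"]) >= 0:
            clean.append(row)
    return clean

def test_validate_rows_drops_bad_records():
    # Hypothetical sample records; real checks would mirror the actual schema.
    rows = [
        {"station": "DEL01", "value": "42.5"},  # valid reading, kept
        {"station": "DEL02", "value": "-1"},    # negative reading, dropped
        {"station": "DEL03"},                   # missing field, dropped
    ]
    assert validate_rows(rows) == [{"station": "DEL01", "value": "42.5"}]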
This is a paid internship opportunity, which can lead to a full-time offer based on performance.
The strongest candidates will be proficient in the following areas:
Primary:
Python Programming, Web Scraping, Databases
Bonus:
Linux, Google Cloud, Docker, Apache Airflow, git
To Apply:
Please submit your resume to [email protected]