Data Engineering
Corporate data engineering will make you a millionaire in 10 years, and this is how I think
today's computer science students could achieve the same.
𝗦𝘁𝗲𝗽 𝟭: 𝗦𝗤𝗟
- Basic SQL Syntax
- DDL, DML, DCL
- Joins & Subqueries
- Views & Indexes
- CTEs & Window Functions
𝗦𝘁𝗲𝗽 𝟮: 𝗣𝘆𝘁𝗵𝗼𝗻
- Fundamentals
- Numpy
- Pandas
𝗦𝘁𝗲𝗽 𝟯: 𝗣𝘆𝘀𝗽𝗮𝗿𝗸
- RDD
- DataFrames
- Datasets
- Spark Streaming
- Optimization techniques
• SQL - https://lnkd.in/gV_5EFtE
• Python - https://lnkd.in/dt_-2-Uj
• Pyspark - https://lnkd.in/gtCdub-V
• Airflow - https://lnkd.in/guebuHJ7
• Kafka - https://lnkd.in/gVZUT52s
• Azure Cloud - https://lnkd.in/gwc3By9h
• Google Cloud - https://lnkd.in/gV_5EFtE
• AWS - https://lnkd.in/gJeUGfjS
• Projects - https://lnkd.in/gcpsNtnw
5+ years of experience in software engineering, with a focus on platform engineering or
cloud-native application development.
Core Requirements
• Deep knowledge of the Hadoop ecosystem, such as HDFS, Hive, MapReduce, and Presto.
• Advanced knowledge of complex software design, distributed system design, design patterns,
data structures and algorithms.
• Excellent data analytics skills and ability to explore and identify data issues.
• Ability to explain complex subjects in layman’s terms.
• Experience with a distributed version control system such as Git.
• Familiarity with continuous integration/deployment processes and tools such as Jenkins and
Maven.
• Familiarity with public cloud technologies on Google Cloud Platform, especially BigQuery, GCS,
and Dataproc.
• Experience with ETL pipelines.
• Experience in the advertising domain.
• Familiarity with workflow management systems such as Airflow or Oozie.
• Experience with enterprise monitoring and alerting solutions such as Prometheus, Graphite,
Alertmanager, and Splunk.
𝗦𝗤𝗟
- How would you write a query to calculate a cumulative sum or running total within a specific
partition in SQL?
- How do window functions differ from aggregate functions, and when would you use them?
- How do you identify and remove duplicate records in SQL without using temporary tables?
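A minimal sketch of the first and third questions, runnable with Python's built-in sqlite3 (window functions need SQLite 3.25+); the orders table and its columns are illustrative:

import sqlite3

# Hypothetical orders table with a deliberate duplicate row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INT, order_date TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, '2024-01-01', 100), (1, '2024-01-05', 50),
        (2, '2024-01-02', 75),  (2, '2024-01-02', 75);
""")

# Q1: cumulative sum within a partition via a window function.
for row in conn.execute("""
    SELECT customer_id, order_date, amount,
           SUM(amount) OVER (PARTITION BY customer_id
                             ORDER BY order_date) AS running_total
    FROM orders
"""):
    print(row)

# Q3: dedupe without a temp table -- ROW_NUMBER() keeps the first
# row of each duplicate group and deletes the rest.
conn.execute("""
    DELETE FROM orders WHERE rowid IN (
        SELECT rowid FROM (
            SELECT rowid,
                   ROW_NUMBER() OVER (
                       PARTITION BY customer_id, order_date, amount
                       ORDER BY rowid) AS rn
            FROM orders)
        WHERE rn > 1)
""")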
𝗣𝘆𝘁𝗵𝗼𝗻
- How do you manage memory efficiently when processing large files in Python?
- What are Python decorators, and how would you use them to optimize reusable code in ETL
processes?
- How do you use Python’s built-in logging module to capture detailed error and audit logs?
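A compact sketch touching all three questions: iterating a file object streams one line at a time (flat memory on huge files), a decorator wraps any ETL step with reusable timing, and the built-in logging module records the audit trail. Function and path names are hypothetical.

import logging
import time
from functools import wraps

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(name)s: %(message)s")
log = logging.getLogger("etl")

def timed(func):
    # Reusable decorator: logs each call's duration without touching its body.
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            log.info("%s took %.3fs", func.__name__,
                     time.perf_counter() - start)
    return wrapper

@timed
def count_lines(path):
    # Lazy iteration reads one line at a time, so memory stays flat
    # even for multi-GB inputs.
    total = 0
    with open(path, encoding="utf-8") as fh:
        for _ in fh:
            total += 1
    return total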
𝗣𝘆𝘀𝗽𝗮𝗿𝗸
- How would you handle skewed data in a Spark job to prevent performance issues?
- What is the difference between SparkSession and SparkContext? When should each be
used?
- How do you handle backpressure in Spark Streaming applications to manage load
effectively?
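For the skew question, the classic remedy is key salting. A hedged sketch, assuming a running SparkSession (which, per the second question, is the unified entry point that wraps the older SparkContext as spark.sparkContext); the events/users data is made up:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("skew-demo").getOrCreate()

N = 8  # number of salt buckets; tune to the observed skew

# One hot key (user_id 1) to simulate skew.
events = spark.createDataFrame([(1, "click")] * 100 + [(2, "view")],
                               ["user_id", "action"])
users = spark.createDataFrame([(1, "alice"), (2, "bob")],
                              ["user_id", "name"])

# Salt the skewed side: a random suffix spreads the hot key over N partitions.
salted_events = events.withColumn(
    "salted_key",
    F.concat_ws("_", "user_id", (F.rand() * N).cast("int")))

# Explode the small side across every salt value so all salted keys match.
salts = spark.range(N).withColumnRenamed("id", "salt")
salted_users = (users.crossJoin(salts)
                     .withColumn("salted_key",
                                 F.concat_ws("_", "user_id", "salt"))
                     .drop("user_id"))

salted_events.join(salted_users, "salted_key") \
             .groupBy("name").count().show()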
𝗔𝘇𝘂𝗿𝗲 𝗗𝗮𝘁𝗮𝗯𝗿𝗶𝗰𝗸𝘀
- How do you configure cluster autoscaling in Databricks, and when should it be used?
- How do you implement data versioning in Delta Lake tables within Databricks?
- How would you monitor and optimize Databricks job performance metrics?
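For the Delta Lake versioning question: every write to a Delta table creates a new version automatically, which you can query back (time travel). A minimal sketch, assuming a Spark runtime with Delta Lake available (on Databricks, spark already exists) and a hypothetical table path:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
path = "/tmp/delta/customers"  # hypothetical location

# Two writes produce versions 0 and 1.
spark.createDataFrame([(1, "alice")], ["id", "name"]) \
     .write.format("delta").mode("overwrite").save(path)
spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"]) \
     .write.format("delta").mode("overwrite").save(path)

# Time travel: read an earlier version by number (or use timestampAsOf).
spark.read.format("delta").option("versionAsOf", 0).load(path).show()

# Audit the full version history.
spark.sql(f"DESCRIBE HISTORY delta.`{path}`").show(truncate=False)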
𝗖𝗜/𝗖𝗗
- What are blue-green deployments, and how would you use them for ETL jobs?
- How do you implement rollback mechanisms in CI/CD pipelines for data integration
processes?
- What strategies do you use to handle schema evolution in data pipelines as part of CI/CD?
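For the schema-evolution question, one common CI gate is a compatibility check that fails the build on breaking changes before a pipeline deploys. A minimal sketch with illustrative schemas: additions pass, drops and type changes fail.

# Hypothetical current vs. proposed schemas (column name -> type).
CURRENT = {"id": "bigint", "email": "string", "created_at": "timestamp"}
PROPOSED = {"id": "bigint", "email": "string",
            "created_at": "timestamp", "country": "string"}

def breaking_changes(current: dict, proposed: dict) -> list:
    problems = []
    for col, dtype in current.items():
        if col not in proposed:
            problems.append(f"column dropped: {col}")
        elif proposed[col] != dtype:
            problems.append(f"type changed: {col} {dtype} -> {proposed[col]}")
    return problems

issues = breaking_changes(CURRENT, PROPOSED)
if issues:
    raise SystemExit("schema check failed: " + "; ".join(issues))
print("schema change is backward-compatible")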
𝗗𝗮𝘁𝗮 𝗪𝗮𝗿𝗲𝗵𝗼𝘂𝘀𝗶𝗻𝗴
- How do you optimize join operations in a data warehouse to improve query performance?
- What is a slowly changing dimension (SCD), and what are different ways to implement it in a
data warehouse?
- How do surrogate keys benefit data warehouse design over natural keys?
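For the SCD question, a hedged sketch of SCD Type 2 using Delta Lake's MERGE INTO; all table and column names (dim_customer, stg_customer, is_current, and so on) are assumptions, and a real pipeline would compare more than a single attribute:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Step 1: expire the current row for any customer whose address changed.
spark.sql("""
    MERGE INTO dim_customer AS d
    USING stg_customer AS s
      ON d.customer_id = s.customer_id AND d.is_current = true
    WHEN MATCHED AND d.address <> s.address THEN
      UPDATE SET is_current = false, end_date = current_date()
""")

# Step 2: insert a fresh 'current' row for new and just-expired customers.
spark.sql("""
    INSERT INTO dim_customer
    SELECT s.customer_id, s.address, current_date() AS start_date,
           NULL AS end_date, true AS is_current
    FROM stg_customer s
    LEFT JOIN dim_customer d
      ON d.customer_id = s.customer_id AND d.is_current = true
    WHERE d.customer_id IS NULL
""")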
𝗗𝗮𝘁𝗮 𝗠𝗼𝗱𝗲𝗹𝗶𝗻𝗴
- How do you decide between a star schema and a snowflake schema for a data warehouse?
Provide examples of scenarios where each is ideal.
- What is dimensional modeling, and how does it differ from entity-relationship modeling in
terms of use cases?
- How do you handle one-to-many relationships in a dimensional model to ensure efficient
querying?
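To make the star-vs-snowflake trade-off concrete, a minimal star schema in SQLite DDL (all names illustrative): dimensions carry surrogate keys, the natural key survives only as an attribute, and category stays inline on dim_product; a snowflake design would normalize it out into its own dim_category table.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (
        date_key    INTEGER PRIMARY KEY,  -- surrogate key
        full_date   TEXT,
        year        INTEGER,
        month       INTEGER
    );
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,  -- surrogate key
        product_id  TEXT,                 -- natural/business key kept as attribute
        name        TEXT,
        category    TEXT                  -- inline = star; own table = snowflake
    );
    CREATE TABLE fact_sales (
        date_key    INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity    INTEGER,
        revenue     REAL
    );
""")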
Experience creating solutions that incorporate machine learning algorithms and models, using
Python with data engineering libraries and tools
You have developed server-side Java and Python applications using mainstream
libraries and frameworks, including the Spring framework, Pandas, SciPy, PySpark, and
Pydantic
Current cloud technology experience with AWS
Experience integrating with async messaging, logging, or queues, such as Kafka,
RabbitMQ, SQS, NATS
You collaborate as a hands-on team member developing a significant commercial
software project in Java and Python
Software development experience building and testing applications following secure
coding practices. Preferred additional experience includes building systems for
financial services or other tightly regulated businesses and working with security
and privacy compliance regimes (GDPR, CCPA, ISO 27001, PCI, HIPAA, etc.)
Responsibilities
Design business-critical data models that power business decisions. Ensure
data quality, consistency, and accuracy.
Design, build, and maintain scalable, robust and reliable data pipelines for internal
stakeholders and customers.
Deliver data products that our customers can use, including data warehouse sharing
and embedded analytics.
Help develop a mature product analytics capability within the company and empower
data-driven decisions.
Contribute to the broader Data Analytics community at Zip to influence tooling and
standards to improve culture and productivity.
Qualifications
15+ years of experience in software development, focusing on big data processing, real-time
serving, and distributed low-latency systems
Expert in multiple distributed technologies (e.g., Spark/Storm, Kafka, key-value stores,
caching, Solr, Druid)
Proficient in Scala or Java and in full-stack application development.
Deep knowledge of the Hadoop ecosystem, such as HDFS, Hive, MapReduce, and Presto.
Advanced knowledge of complex software design, distributed system design, design
patterns, data structures and algorithms.
Experience working as a machine learning engineer, collaborating closely with data
scientists.
Experience working with ML frameworks like TensorFlow and ML feature engineering.
Experience in one or more public cloud technologies such as GCP or Azure.
Excellent debugging and problem-solving capability.
Experience in working in large teams using CI/CD and agile methodologies.
Nice to Haves
Domain expertise in Ad Tech systems.
Experience working with financial applications.