Data and Analytics - TechM PDF
P0 (Rest API Services) P0 (Rest API Services) P0 (Rest API Services) P0 (Rest API Services) P0 (Rest API Services)
Agile for Developers Python-Fundamentals Python Coding Python-File Handling
Scope If-
Challenge
SDLC Python-Orientation
Else While Python-Modules Read Files
Scrum SQL
Objects
Python Syntax
Ceremonies
OOP Concepts
Git Fundamentals Comments
Inheritance
OS-Introduction Variables and
Datatypes Iterators
OS: Fundamentals
Week1-Python SQL Operators Python-Exception
Initializing A Sets
Repository Binary Type
Git Exercises
Week2-Python SQL (SQL) P0 (Rest API Services) P0 (Rest API Services) P0 (Rest API Services) P0 (Rest API Services) P0 (Rest API Services)
QC Audit SQL Coding SQL Joins coding test Python Coding Review Topics
Challenge Challenge
MySQL
Sub Languages Inner Join Advanced-SQL
MON TUE WED THU FRI ENVIRONMENT
What Is A Join
Defining Schema
Week-3-Hadoop (Hadoop, Hive, Spark) P0 (Rest API Services) P0 (Rest API Services) P1 (Data Science) P1 (Data Science) P1 (Data Science)
Cloud Computing Spark Review Topics
QC Audit
Introduction To
Cloud Introduction Spark-Fundamentals Project Presentation
Hadoop Mapreduce
Cloud Computing Introduction To
Big Data Introduction Hadoop Vs
Model Types Spark
Mapreduce Vs Spark
Big Data Hive -Introduction Cloud Computing Spark Ecosystem
Fundamentals Service Types
Hadoop Vs Spark
Introduction To
Components Of Cloud Computing
Hive Spark Setup
Big Data Definition
Basic Hive GCP Introduction Local Vs Cluster
Architecture
Queries Mode
Benefits Challenges
Google Cloud
Data Loading And
Data lifecycle
stages- Generation, Platform Overview Saving Through Rdds
collection,
GCP Regions and
processing, storage,
Zones
management,
analysis, IAM Basics
visualization,
Pricing and Billing
interpretation
Hadoop Introduction Google Compute
Engine
Big Data Fresher
Google Cloud
Hadoop Architecture Storage
Hadoop
ecosystem
Components of
Hadoop
Introduction Hdfs
Evolution Of
Hadoop
Hdfs Commands
Yarn Overview
Joins
UDFs
Spark caching / Persistence (All storage levels)
Week5-Data Warehouse (Big Query) P1 (Data Science) P1 (Data Science) P1 (Data Science) P1 (Data Science) P1 (Data Science)
QC Audit Big Query Datasets Big Query Analyze Data Warehousing Review Topics
Test
Data Warehousing Creating Datasets Introduction to Delta Lake
Big Query Routines
BigQuery Analysis Schema Evolution
Public Datasets
DataWarehousing- Run a Query Manage Routines Delta Lake Time
Dataset Properties
Introduction Travel
Write Query User-Defined
Create and Query
Data Store Results Functions Delta Lake
Clustered Tables
Vendors Performance
GoogleSQL ANSI Table Functions
Create and Query optimizations
OLAP,OLTP standard
External Tables SQL Stored
Systems
Big Query Tables Querying with Procedures
DWH Vs. Data Arrays Big Query
Lake,DWH Vs. Data Create and Use Connections
Querying JSON
Virtualization Tables
data
Introduction to
DWH Architecture Table Schemas
Querying using Connections
Operational Data Create, Manage, Sketches
GCP GCS
Store/Staging Area and Query
Multi Statement Connections
Data Mart,Data Partitioned Tables
Queries Manage
Cleansing Connections
Recursive CTEs
Load/Transform/Export Data
Table Sampling
Conceptual/Logical/
Physical Multi Statement
Transactions Creating a Search
Dimensional
Index
Modeling Running
Parameterized Manage Search
Star Schema &
Queries Indexes
Snowflake Schema
Creating and Transfer GCS data
Slowly Changing
Running Saved
Dimensions Schedule
Queries
of Data with GCS
DWH Vendors, Transfers
Optimize Queries
Cloud Vs. On- Load Avro,
Premises Query External Parquet, CSV, JSON,
Tables and ORC batch data
Big Query
Introduction Logical Views Load externally
partitioned data
Introduction to Materialized
BigQuery Views Load data into
partitioned tables
Using The
BigQuery sandbox Transforming with
DML and GoogleSQL
BigQuery Dry
Runs Transforming data
in Partitioned tables
gsutil and
common bq Work with
commands Change History
Export Data to a file
Export Data to
GCS
P2 (ETL Pipeline) P2 (ETL Pipeline) P2 (ETL Pipeline) P2 (ETL Pipeline) P2 (ETL Pipeline)
Week7-GCP P2 (ETL Pipeline) P2 (ETL Pipeline) P2 (ETL Pipeline) P2 (ETL Pipeline) P2 (ETL Pipeline)
Professional Data
Engineer Review GCP Data Ingesting and Storing the Data Maintaining and Review
(Topic Review) Engineering Review Processing the Data Automating Data
Selecting storage AI-Tooling
Designing Data Workloads
Planning the data systems
Processing Systems AI-Tooling-
pipelines Optimizing
Identity and Choosing Orientation
resources
Defining data managed services
AI Tooling
Access Management sources and sinks (e.g., Bigtable, Designing
Overview
Data security Defining data Spanner, Cloud SQL, automation and
Cloud Storage, repeatability AI Pair
Privacy transformation logic Firestore, Programming
Networking Memorystore) Organizing
Regional Overview
workloads based on
fundamentals
considerations Planning for business requirements Codeium
Data encryption storage costs and Overview
Legal and Monitoring and
performance
regulatory Building the troubleshooting Using Copilot,
compliance pipelines Lifecycle processes Codeium, Code
management of data Whisperer (TBD which
Preparing and Data cleansing Maintaining
one)
cleaning data (e.g., Planning for using awareness of failures
Identifying the
Dataprep, Dataflow, a data warehouse and mitigating impact Integration with
services (e.g.,
and Cloud Data IDE
Dataflow, Apache Using a data lake AI-Orientation
Fusion) AI-Tooling-Code-
Beam, Dataproc,
Designing for a ML Introduction Generation
Monitoring and Cloud Data Fusion,
data mesh
orchestration of data BigQuery, Pub/Sub, AI Introduction
Preparing and Using Use Cases and
Apache Spark,
pipelines Hadoop ecosystem, Data for Analysis GenAI Overview Best Practices for
and Apache Kafka) GenAI Code
Disaster recovery Preparing data for LLM-Overview Generation
and fault tolerance Transformations visualization
LLMs (GPT, BERT, Using GenAI for
Making decisions Data acquisition Sharing data Claude, Llama, Code Generation
related to ACID and import Copilot, Codeium)
Exploring and AI-Tooling-UnitTest-
compliance and
Integrating with analyzing data Generation
availability Use cases for LLM
new data sources
Data validation LLM best practices Use Cases and
Job automation
Security Best Practices for
Mapping current and orchestration
GenAI Unit Tests
and future business (e.g., Cloud considerations
requirements to the Composer and Using GenAI for
Hallucinations
architecture Workflows) Testing
AI Review AI-Tooling-
Designing for CI/CD
Documentation-
data and application Prompt-Engineering
Generation
portability
Prompt
Data staging, Engineering Use Cases and
cataloging, and Introduction Best Practices for
discovery GenAI
Zero-shot
Documentation
Designing data Prompting
migrations Using GenAI for
Few-shot
Documentation
prompting
AI-Tooling-Code-
Constraints Analysis
Fine-tuning and
Use Cases and
Conditioning
Best Practices for
Interaction and GenAI Code Analysis
Dialog State
Using GenAI for
Instructions and Code Analysis
Guidelines AI-Tooling-Code-
Optimization
Hallucinations
Responsible Uses
Overview
Searching
Codebases with
GenAI
Assessing
Generated Content
Quality
AI-Tooling-Security
Overview of
Security
Benefits/Risks with
GenAI
GenAI Security
Analysis
Common Security
Problems/Solutions
with GenAI
GenAI Security
Best Practices
AI-Tooling Review
P2 (ETL Pipeline) P2 (ETL Pipeline) P2 (ETL Pipeline) P2 (ETL Pipeline) P2 (ETL Pipeline)
QC Audit
PROJECT TECHNOLOGIES
PySpark, BigQuery, Hadoop, Spark-SQL
P1 (Data Science)
Recap