Data and Analytics - TechM PDF

The document outlines a comprehensive training schedule for Data and Analytics at TechM, covering various topics including Python, SQL, Hadoop, Spark, and Data Warehousing over several weeks. Each week includes specific learning objectives, coding challenges, and review topics to ensure a thorough understanding of the material. The schedule emphasizes practical applications and project presentations to reinforce learning outcomes.

Uploaded by

PSIEBEL

Data and Analytics - TechM

MON TUE WED THU FRI ENVIRONMENT

P0( Rest API Services ) P0( Rest API Services ) P0( Rest API Services ) P0( Rest API Services ) P0( Rest API Services )
Agile for Developers Python-Fundamentals Python Coding Python-File Handling
Scope If-
Challenge
SDLC Python-Orientation
Else While Python-Modules Read Files

Introduction To Interpreter vs For Write Create Files


SDLC Compiler Math Logging
Function Delete Files
Waterfall What is Python JSON Regex
Lambda File Handling
Why Python numpy pandas
Agile
Arrays Review Topics
Full Stack pip and install pip
Agile Vs Waterfall
Overview Classes and
Story Pointing Python-Basics pylint Connect

Scrum SQL
Objects
Python Syntax
Ceremonies
OOP Concepts
Git Fundamentals Comments
Inheritance
OS-Introduction Variables and
Data Types Iterators
OS: Fundamentals
Week1-Python SQL Operators Python-Exception

(Agile for Developers, Unix/Linux: Demo Handling


Git - Fundamentals,
User Input and
Moving and Deleting
Python-Fundamentals) Output Error
Files (Using Git Bash)
Python-DataTypes
Exception
Unix/Linux: Demo
Handling
File Commands Strings
(Using Git Bash) try-except
Casting
Git Introduction Module
Boolean
Source Control
Lists
Management(git,vcs,
cvcs,dvcs) Tuples

Git Fundamentals Range

Initializing A Sets
Repository Binary Type

Pushing To A NoneType


Remote Repository
Dictionaries
Git Commit,
Numbers
Branch, Merge, Push,
Pull Namespaces

Git Exercises

Week2-Python SQL (SQL) P0( Rest API Services ) P0( Rest API Services ) P0( Rest API Services ) P0( Rest API Services ) P0( Rest API Services )
QC Audit SQL Coding Challenge SQL Joins coding test Python Coding Challenge Review Topics
MySQL
Sub Languages Inner Join Advanced-SQL


SQL -Introduction Overview Of Left And Right Scalar Functions


Sublanguages Joins
What Is A Sequence
Database DDL Outer Join
Trigger
What Is SQL DML Cross Join
Views
Consistency DQL Equi And Theta
Window Functions
Joins
Introduction To DCL (ROW_NUMBER,
RDBMS Aliases RANK,
TCL
Structure DENSE_RANK,
Transaction
DDL LEAD, LAG, etc.)
Schema What Is A
CREATE DROP CASE statement
Transaction
Table Structure TRUNCATE
COALESCE
ACID Properties
SQL Data Types Constraints
What Is A Stored
Transaction
Normalization Auto Incrementing Procedure
Properties
Multiplicity CHECK What Is A User
CRUD Operations
Defined Function
Data Modeling DEFAULT
Transaction
And ERD Indexes
CASCADE Commit Rollback
Primary Key Isolation Levels Performance
DML
Tuning
Composite Key
INSERT
Data
Foreign Key
UPDATE Manipulation
Unique Key
DELETE Dynamic SQL
Secondary
DQL Advanced Data
Alternate Key
Types (JSON, XML,
Queries
Referential etc)
Integrity Aggregate
connecting to DB
Functions
Using Python
Clauses
Hierarchical
What Is A Querying
Subquery

What Is A Join

Defining Schema

Week-3-Hadoop (Hadoop, Hive, Spark) P0( Rest API Services ) P0( Rest API Services ) P1(Data Science ) P1(Data Science ) P1(Data Science )
Cloud Computing Spark Review Topics
QC Audit
Introduction To
Cloud Introduction Spark-Fundamentals Project Presentation
Hadoop Mapreduce
Cloud Computing Introduction To
Big Data Introduction Hadoop Vs
Model Types Spark
Mapreduce Vs Spark
Big Data Hive -Introduction Cloud Computing Spark Ecosystem
Fundamentals Service Types
Hadoop Vs Spark
Introduction To
Components Of Cloud Computing
Hive Spark Setup
Big Data Definition
Basic Hive GCP Introduction Local Vs Cluster
Architecture
Queries Mode
Benefits Challenges
Google Cloud
Data Loading And
Data lifecycle
stages- Generation, Platform Overview Saving Through Rdds
collection,
GCP Regions and
processing, storage,
Zones
management,
analysis, IAM Basics
visualization,
Pricing and Billing
interpretation
Hadoop Introduction Google Compute
Engine
Big Data Fresher
Google Cloud
Hadoop Architecture Storage

Hadoop
ecosystem
Components of
Hadoop

Introduction Hdfs

Evolution Of
Hadoop

Hdfs Commands

Yarn Overview

Week-4-PySpark (Cloud Computing, Spark Fundamentals) P1(Data Science ) P1(Data Science ) P1(Data Science ) P1(Data Science ) P1(Data Science )
QC Audit SQL Coding Challenge Spark Coding Test Python Coding Challenge Review Topics
Spark-Operations- Spark-SQL Spark- General Interview
Spark-GCP
Pyspark Sorting And Preparation
SQL Concepts
Partitioning
Introduction To Cluster Modes
Introduction To
Rdd Cluster Step Execution Working With
Spark Sql
Running Spark Json Datasets
Basic Rdd
Introduction To
Operations Job on Dataproc Working With
Dataframes
Spark- Advanced Parquet Files
Introduction To Spark-Streaming-
Working On
Pyspark Spark-Advanced Introduction
Dataframes
Concepts
Entrypoint
Narrow & Wide
Sparksession Introduction To
Executors Transformations
Streaming
Shared Variables
Spark Caching Selecting,
Spark Streaming
Actions Overview Renaming, Adding,
Dropping columns Spark Engine
Transformations Spark Jobs
Troubleshooting Filter, dropping Processing Data
rows Stream Using Spark
Configure
Streaming
Memory Driver And Using Dataframe
Executors Aggregate Functions Tuning &
Configuration
Driver Class Expressions
Configuration Spark
Sorting
Optimization
Null handling

Joins

UDF's
Spark caching /
Persistence (All storage levels)
Week5-Data Warehouse (Big Query) P1(Data Science ) P1(Data Science ) P1(Data Science ) P1(Data Science ) P1(Data Science )
QC Audit Big Query Datasets Big Query Analyze Data Warehousing Test Review Topics
Data Warehousing Creating Datasets Introduction to Delta Lake
Big Query Routines
BigQuery Analysis Schema Evolution
Public Datasets
DataWarehousing- Run a Query Manage Routines Delta Lake Time
Dataset Properties
Introduction Travel
Write Query User-Defined
Create and Query
Data Store Results Functions Delta Lake
Clustered Tables
Vendors Performance
GoogleSQL ANSI Table Functions
Create and Query optimizations
OLAP,OLTP standard
External Tables SQL Stored
Systems
Big Query Tables Querying with Procedures
DWH Vs. Data Arrays Big Query
Lake,DWH Vs. Data Create and Use Connections
Querying JSON
Virtualization Tables
data
Introduction to
DWH Architecture Table Schemas
Querying using Connections
Operational Data Create, Manage, Sketches
GCP GCS
Store/Staging Area and Query
Multi Statement Connections
Data Mart,Data Partitioned Tables
Queries Manage
Cleansing Connections
Recursive CTEs
Load/Transform/Export Data
Table Sampling
Conceptual/Logical/
Physical Multi Statement
Transactions Creating a Search
Dimensional
Index
Modeling Running
Parameterized Manage Search
Star Schema &
Queries Indexes
Snowflake Schema
Creating and Transfer GCS data
Slowly Changing
Running Saved
Dimensions Schedule
Queries
of Data with GCS
DWH Vendors, Transfers
Optimize Queries
Cloud Vs. On- Load Avro,
Premises Query External Parquet, CSV, JSON,
Tables and ORC batch data
Big Query
Introduction Logical Views Load externally
partitioned data
Introduction to Materialized
BigQuery Views Load data into
partitioned tables
Using The
BigQuery sandbox Transforming with
DML and GoogleSQL
BigQuery Dry
Runs Transforming data
in Partitioned tables
gsutil and
common bq Work with
commands Change History

Export Data to a file

Export Data to
GCS

P2(ETL Pipeline ) P2(ETL Pipeline ) P2(ETL Pipeline ) P2(ETL Pipeline ) P2(ETL Pipeline )

QC Audit Cloud PubSub Dataflow Apache Airflow Review

RDBMS Introduction to Introduction to Introduction to Project Presentation


PubSub Dataflow Airflow
Google Cloud
SQL Cloud PubSub Dataflow ML Creating DAG
with Python
Spanner Dataflow SQL Data Loss Prevention
Cloud PubSub API
NoSQL Creating Pipelines
with Gcloud
Introduction to
NoSQL Overview Apache Beam Data Fusion
Week6-GCP DLP
Professional Data
Firestore Introduction to Data Catalog
Engineer Review Introduction to
Data Fusion
(Technologies) Datastore Apache Beam
Introduction to
MemoryStore Data Pipeline Data Catalog
using Beam Data Analytics & ML
Introduction to
MemoryStore Apache Beam
ML Basics
BigTable Transformations
Data Preparation
Introduction to with DataPrep
BigTable
BigQuery ML
Creating an
Datastudio
instance

Week7-GCP P2(ETL Pipeline ) P2(ETL Pipeline ) P2(ETL Pipeline ) P2(ETL Pipeline ) P2(ETL Pipeline )
Professional Data
Engineer Review GCP Data Ingesting and Storing the Data Maintaining and Review
(Topic Review) Engineering Review Processing the Data Automating Data
Selecting storage AI-Tooling
Designing Data Workloads
Planning the data systems
Processing Systems AI-Tooling-
pipelines Optimizing
Identity and Choosing Orientation
resources
Defining data managed services
AI Tooling
Access Management sources and sinks (e.g., Bigtable, Designing
Overview
Data security Defining data Spanner, Cloud SQL, automation and
Cloud Storage, repeatability AI Pair
Privacy transformation logic Firestore, Programming
Networking Memorystore) Organizing
Regional Overview
workloads based on
fundamentals
considerations Planning for business requirements Codeium
Data encryption storage costs and Overview
Legal and Monitoring and
performance
regulatory Building the troubleshooting Using Copilot,
compliance pipelines Lifecycle processes Codeium, Code
management of data Whisperer (TBD which
Preparing and Data cleansing Maintaining
one)
cleaning data (e.g., Planning for using awareness of failures
Identifying the
Dataprep, Dataflow, a data warehouse and mitigating impact Integration with
services (e.g.,
and Cloud Data IDE
Dataflow, Apache Using a data lake AI- Orientation
Fusion) AI-Tooling-Code-
Beam, Dataproc,
Designing for a ML Introduction Generation
Monitoring and Cloud Data Fusion,
data mesh
orchestration of data BigQuery, Pub/Sub, AI Introduction
Preparing and Using Use Cases and
Apache Spark,

pipelines Hadoop ecosystem, Data for Analysis GenAI Overview Best Practices for
and Apache Kafka) GenAI Code
Disaster recovery Preparing data for LLM-Overview Generation
and fault tolerance Transformations visualization
LLMs (GPT, BERT, Using GenAI for
Making decisions Data acquisition Sharing data Claude, Llama, Code Generation
related to ACID and import Copilot, Codeium)
Exploring and AI-Tooling-UnitTest-
compliance and
Integrating with analyzing data Generation
availability Use cases for LLM
new data sources
Data validation LLM best practices Use Cases and
Job automation
Security Best Practices for
Mapping current and orchestration
GenAI Unit Tests
and future business (e.g., Cloud considerations
requirements to the Composer and Using GenAI for
Hallucinations
architecture Workflows) Testing
AI Review AI-Tooling-
Designing for CI/CD
Documentation-
data and application Prompt-Engineering
Generation
portability
Prompt
Data staging, Engineering Use Cases and
cataloging, and Introduction Best Practices for
discovery GenAI
Zero-shot
Documentation
Designing data Prompting
migrations Using GenAI for
Few-shot
Documentation
prompting
AI-Tooling-Code-
Constraints Analysis

Fine-tuning and
Use Cases and
Conditioning
Best Practices for
Interaction and GenAI Code Analysis
Dialog State
Using GenAI for
Instructions and Code Analysis
Guidelines AI-Tooling-Code-
Optimization
Hallucinations

Responsible Use Cases and


Usage Best Practices for
GenAI Code
Security
Optimization
Prompt Engineering
Using GenAI for
Review
Code Optimization
AI-Tooling-
Responsible-Use

Responsible Uses
Overview

AI Tools for Code


Review

Searching
Codebases with
GenAI

Assessing
Generated Content

Quality

AI-Tooling-Security

Overview of
Security
Benefits/Risks with
GenAI

GenAI Security
Analysis

Common Security
Problems/Solutions
with GenAI

Gen AI Security
Best Practices
AI-Tooling Review

P2(ETL Pipeline ) P2(ETL Pipeline ) P2(ETL Pipeline ) P2(ETL Pipeline ) P2(ETL Pipeline )

Week8 Recap Recap Recap Recap Recap

QC Audit

PROJECT TECHNOLOGIES
P1(Data Science ) PySpark, BigQuery, Hadoop, Spark-SQL

Recap

P0( Rest API Services ) Python, SQL, REST, Git

P2(ETL Pipeline ) GCP, Data Visualization, Apache Airflow, BigQuery ML
