Python and Pyspark With Databricks, With Azure Project

The document outlines a comprehensive syllabus for a course on Python, PySpark, and Databricks, covering fundamental concepts, data management, computation management, and advanced topics. It includes detailed sections on Python programming, Databricks architecture and components, and PySpark functionalities, along with practical projects involving Delta Lake and Azure. The course aims to equip learners with the necessary skills for data engineering and data science using these technologies.

Uploaded by

ganeshdane1
Copyright
© All Rights Reserved

Python + PySpark with Databricks

Syllabus:

Python Syllabus

1. Introduction to Python
Python Introduction, History of Python, Introduction to the Python
interpreter and program execution, Python installation process on
Windows and Linux, Python IDEs, Introduction to Anaconda, Python
variable declaration, Keywords, Indentation in Python,
Python input/output operations

2. Python’s Operators
Arithmetic Operators, Comparison Operators, Assignment Operators,
Logical Operators, Bitwise Operators, Membership Operators, Identity
Operators, Ternary Operator, Operator precedence.

3. Python’s Built-in Data Types
String, List, Tuple, Set, Dictionary (characteristics and methods)

4. Conditional Statements & Loops
Conditional statements (if, if-else, if-elif-else, nested if etc.) and loop
control statements (for, while, nested loops, break, continue, pass
statements)

5. Functions in Python
Introduction to functions, Function definition and calling, Function
parameters, Default-argument functions, Variable-argument
functions, Built-in functions in Python, Scope of variables in Python

6. File Processing
Concept of files, Opening a file in various modes and closing a file,
Reading from a file, Writing to a file, Some important file-handling
functions, e.g. open(), close(), read(), readline() etc.

7. Modules
Concept of modularization, Importance of modules in Python,
Importing modules, Built-in and third-party modules (e.g. NumPy)

Databricks Concepts

1) Databricks Introduction

A. Databricks Architecture

B. Databricks Components overview

C. Benefits for data engineers and data scientists

2) Databricks concepts

A. Workspace – creating and managing workspaces.

B. Notebook – creating notebooks, calling and managing different notebooks.

C. Library – installing libraries, managing libraries

3) Data Management

A. Databricks File System (DBFS) – DBFS commands; copying and managing files using DBFS.

B. Database – creating databases and tables; managing databases and tables.

C. Table – creating tables, dropping tables, loading data.

D. Metastore – managing metadata; creating and managing Delta tables.

E. Unity Catalog configuration and creation

4) Computation Management

A. Cluster – creating clusters, managing clusters

B. Pool – creating pools and using pools for autoscaling.

C. Databricks Runtime – understanding and using Databricks runtimes based on requirements.

D. Jobs – creating jobs from notebooks and assigning types of clusters for jobs.

E. Workload – monitoring jobs and managing loads.

F. Execution Context – understanding the execution context.

5) Databricks Advanced topics.

A. Databricks Workflows

B. Workflow task

C. Implementing parallel and sequential tasks

D. Scheduling workflows in Databricks

E. Calling one notebook from another notebook

F. Parameterization in notebooks

G. How to implement parallelism in notebook execution.

H. Mounting Azure Blob Storage and Data Lake Storage accounts.

I. Repos integration in Databricks

J. Volumes in Databricks

K. Costing and Performance monitoring

L. Databricks Unity Catalog

M. Databricks Delta Live Tables

N. Databricks Change Data Feed


PySpark Content

1 PySpark Introduction
2 PySpark Features and Advantages
3 PySpark RDD Computation
4 PySpark Transformations and Actions
5 PySpark Fault-Tolerance mechanism
6 PySpark RDD persistence
7 Different persistence options
8 Test
9 ON Lambda filter and map functions
10 PySpark RDD built-in Transformations
11 PySpark key-value Transformations
12 PySpark built-in Actions
13 PySpark built-in actions and increasing part
14 PySpark Filtering operations and word count
15 PySpark Groupings and Aggregations
16 PySpark installation within Jupyter Notebook
17 PySpark SQL and Creating DataFrames
18 PySpark SQL DataFrame functions
19 PySpark various DataFrame functions
20 PySpark SQL DataFrame functions
21 PySpark different types of joins
22 PySpark working with SQL statements
23 PySpark working with CSV and JSON data
24 MultiLine JSON and PySpark integration with
25 PySpark Column Transformations
26 NoSQL Introduction
27 NoSQL HBase Introduction
28 NoSQL HBase CRUD operations
29 Importing data from RDBMS to HBase table
30 MySQL and HBase
31 Various PySpark functions
32 Filtering and Replacing column values
33 PySpark Jupyter and PySpark pandas and cal
34 PySpark Date and Timestamp functions
35 Stages and Tasks, Narrow and Wide Transformations
36 Test
37 NiFi Lecture 1
38 NiFi Lecture 2
39 Kafka Lecture 1
40 Kafka Lecture 2
41 Streaming Lecture 1
43 Streaming Lecture 2
44 Streaming Lecture 3

45 PySpark Project(s)

1) Delta Lake usage in Databricks.

A. Delta Lake Architecture

B. Delta Lake Storage Understanding

C. Delta lake table creation and API options

D. Delta Lake DML Operations usage.

E. Delta Lake partitions

F. Delta Lake Schema Enforcement

G. Delta Lake Schema Evolution

H. Delta Lake Versions

I. Delta Lake Time Travel

J. Delta Lake Vacuum

K. Delta Lake Merge (SCD Type 1 and SCD Type 2)

B. Understand storage account keys.

C. Understand shared access signatures.

D. Understand transport-level encryption with HTTPS.

E. Understand Advanced Threat Protection.

F. Control network access.

SPARK SQL:

1) Introduction to Spark SQL.

2) Spark SQL Create database

3) Drop databases

4) Create internal table

5) Create external table

6) Create partitioned table


7) Create partitioned table with bucketing

8) Spark SQL DML insert, update, delete and merge operations

9) Spark SQL DRL Select queries with different clauses

10) Spark SQL MERGE With SCD Type 1 and SCD Type 2

11) Spark SQL WHERE Clause, Group By Clause and Having Clauses

12) Spark SQL Order by, Sort By clauses

13) Spark SQL join types, Window, Pivot, Limit and Like

14) Spark SQL Grouping Sets, Rollup and Cube

15) Spark SQL Cluster By and Distribute By

16) Spark SQL Case, With and Take sample

AZURE

1) Overview of the Microsoft Azure Platform
A. Introduction to Azure
B. Basics of Cloud computing
C. Azure Infrastructure
D. Walkthrough of Azure Portal
E. Overview of Azure Services

2) Azure Data Architecture

A. Traditional RDBMS workloads.
B. Data Warehousing Approach
C. Big data architectures.
D. Transferring data to and from Azure

3) Blob Storage
A. Azure Blob Resources
B. Azure storage account data objects
C. Azure storage account types and Options
D. Replications in distribution
E. Secure access to an application's data
F. Azure Import/Export service
G. Storage Explorer
H. Practical section on Blob Storage

PROJECT AZURE
Streaming Project Using
1. NiFi
2. Kafka
3. PySpark
4. Azure
