DE Python

Uploaded by

subrahmanya02_203915

Data Engineering and Machine Learning Using Python

Module 1: Introduction to Machine Learning

▪ Introduction To Machine Learning

▪ Life Cycle of Machine Learning
▪ Skills required for Machine Learning
▪ Career Paths in Machine Learning
▪ Applications of Machine Learning

Module 3: Python for Machine Learning

▪ Python programming:
▪ Environment Setup
▪ Jupyter Notebook Overview
▪ Data types: Numbers, Strings, Printing, Lists, Dictionaries, Booleans, Tuples, Sets
▪ Comparison Operators
▪ if, elif, else Statements
▪ Loops: for Loops, while Loops
▪ range()
▪ list comprehension
▪ functions
▪ lambda expressions
▪ map and filter
▪ methods
▪ Programming Exercises.
▪ Object Oriented Programming
▪ Modules and packages
▪ Errors and Exception Handling
▪ Python Decorators
▪ Python generators
▪ Collections
▪ Regular Expression
▪ Python for Exploratory Data Analysis:
▪ NumPy:
▪ Installing numpy
▪ Using numpy
▪ NumPy arrays
▪ Creating NumPy arrays from Python lists
▪ Creating arrays using built-in methods: arange(), zeros(), ones(), linspace(), eye(), rand(), etc.
▪ Array attributes: shape, dtype
▪ Array methods: reshape(), min(), max(), argmax(), argmin(), etc.
▪ Pandas:
▪ Introduction to Pandas
▪ Series
▪ DataFrames
▪ Missing Data
▪ GroupBy
▪ Merging, Joining and Concatenating
▪ Operations
▪ Data Input and Output
▪ Python for Data Visualization:
▪ Matplotlib:
▪ Installing Matplotlib, Basic Matplotlib commands
▪ Creating multiple plots on the same canvas
▪ Object Oriented Method: figure(), plot(), add_axes(), subplots(), etc.
▪ Matplotlib Exercise
▪ Seaborn:
▪ Categorical plot
▪ Distribution plot
▪ Regression plot
▪ Seaborn Exercise
▪ Pandas built-in visualization:
▪ Scatter plot
▪ Histograms
▪ Box plot
▪ CAPSTONE PROJECT FOR DATA ANALYSIS
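
Several of the core language topics listed in this module (list comprehensions, lambda expressions, map and filter, generators, decorators) can be illustrated with a minimal sketch. The function names and sample values below are hypothetical and chosen only for illustration; they are not taken from the course material.

```python
from functools import wraps

# List comprehension: squares of the even numbers in range(10)
even_squares = [n * n for n in range(10) if n % 2 == 0]

# lambda with filter and map: the same result built in two steps
evens = filter(lambda n: n % 2 == 0, range(10))
even_squares_mapped = list(map(lambda n: n * n, evens))


def countdown(start):
    """Generator: lazily yields start, start - 1, ..., 1."""
    while start > 0:
        yield start
        start -= 1


def log_calls(func):
    """Decorator: records the name of each call in a list on the wrapper."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls.append(func.__name__)
        return func(*args, **kwargs)
    wrapper.calls = []
    return wrapper


@log_calls
def add(a, b):
    """Hypothetical example function wrapped by the decorator."""
    return a + b
```

Calling `add(2, 3)` returns the sum as usual, and the decorator appends the function name to `add.calls` as a side effect.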

Module 4: Deep dive into Machine Learning

▪ Introduction To Machine Learning:

▪ Relationship between Data Science and Machine Learning
▪ Supervised Learning
▪ Unsupervised Learning

Supervised Learning (Regression AND Classification Algorithms):

▪ Linear Regression
▪ Ridge Regression
▪ Lasso Regression
▪ Polynomial Regression
▪ Support vector regression
▪ Decision Tree Regression
▪ Random Forest Regression
▪ Logistic Regression
▪ Support Vector Machines
▪ Kernel SVM
▪ Decision Trees and Random Forest
▪ Ensemble Of Decision Trees
▪ Model Evaluation and Improvement
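
As a rough illustration of the first and last items in the list above, the sketch below fits a one-feature linear regression by ordinary least squares and evaluates it with mean squared error. It uses plain Python rather than a machine learning library, and the data points are hypothetical values placed exactly on the line y = 2x + 1.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(ys, preds):
    """Average squared difference between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Hypothetical training data lying exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
preds = [slope * x + intercept for x in xs]
error = mean_squared_error(ys, preds)
```

Because the sample data is noise-free, the fitted line recovers the generating slope and intercept and the error is zero; real data would leave a nonzero residual.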

Unsupervised Learning:

▪ Challenges in Unsupervised Learning

▪ Preprocessing AND Scaling
▪ Dimensionality Reduction, Feature Extraction
▪ Principal Component Analysis (PCA)
▪ Clustering
▪ K-Means
▪ Model evaluation and improvement
▪ Cross validation, Grid search, Evaluation metrics and scoring
▪ Working with text data
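
The cross validation item above can be sketched as a simple k-fold index split. This is a minimal plain-Python version that assumes contiguous, unshuffled folds; the helper name `k_fold_indices` is hypothetical, and library implementations typically offer shuffling and stratification as well.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Split five samples into two folds
folds = list(k_fold_indices(5, 2))
```

Each sample index appears in exactly one test fold, so every sample is used for evaluation once and for training in the remaining rounds.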

Module 5: NLP & Recommender Systems:

▪ Corpus
▪ Text preprocessing using Bag of words technique
▪ TF(Term Frequency)
▪ IDF(Inverse Document Frequency)
▪ Normalization
▪ Vectorization
▪ NLP with Python
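
A minimal sketch of how TF and IDF combine into a vector for each document, written in plain Python. It assumes whitespace tokenization, tf = term count / document length, and idf = log(N / document frequency); libraries often use smoothed or normalized variants, so the exact numbers will differ from theirs. The two-sentence corpus is hypothetical.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: weight} dict per document in the corpus."""
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical two-document corpus
corpus = ["spark is fast", "hadoop is reliable"]
weights = tf_idf(corpus)
```

A term that occurs in every document, such as "is" here, gets an IDF of zero under this formula, which is the intended effect: words common to the whole corpus carry no distinguishing weight.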

Hadoop Developer Course

During this course you will learn:

• Linux (Ubuntu/CentOS) - Tips and Tricks

• Basic Java Programming – Core Java OOP Concepts
• Introduction to Big Data and Hadoop
• Hadoop ecosystem concepts
• Hadoop MapReduce concepts and features
• Developing MapReduce applications
• Pig concepts
• Hive concepts
• Impala
• Oozie workflow concepts
• Sqoop Data Ingestion
• Flume Agents
• Tableau Visualization
• HBase concepts
• Real Time tools like Hue, Putty, FileZilla, Cloudera Manager
• Real Time Projects

Linux (Ubuntu/CentOS) - Tips and Tricks

Basic (Core) Java Programming Concepts – OOP

Introduction to Big Data and Hadoop

• What is Big Data?
• What are the challenges for processing big data?
• What is Hadoop?
• Why Hadoop?
• History of Hadoop
• Hadoop ecosystem
• HDFS
• MapReduce

Understanding the Cluster

• Hadoop 2.x Architecture
• Typical workflow
• HDFS Commands
• Writing files to HDFS
• Reading files from HDFS
• Rack awareness
• Hadoop daemons

Let's talk MapReduce

• Before MapReduce
• MapReduce overview
• Word count problem
• Word count flow and solution
• MapReduce flow
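
The word count flow listed above can be sketched in plain Python as three phases: map, shuffle, and reduce. This is only an in-memory illustration of the idea, not Hadoop code; the function names and the sample input lines are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.lower().split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence; keys are sorted before reducing."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


# Hypothetical input split into two lines
counts = word_count(["big data big cluster", "data pipeline"])
```

In a real cluster the map and reduce phases run in parallel on different nodes and the shuffle moves data across the network; the sketch keeps everything in one process to show only the data flow.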

Developing the MapReduce Application

• Data Types
• File Formats
• Explain the Driver, Mapper and Reducer code
• Configuring development environment - Eclipse
• Writing unit test
• Running locally
• Running on cluster
• Hands on exercises

How MapReduce Works

• Anatomy of MapReduce job run
• Job submission
• Job initialization
• Task assignment
• Job completion
• Job scheduling
• Job failures
• Shuffle and sort
• Hands on exercises

MapReduce Types and Formats

• File Formats – Sequence Files
• Compression Techniques
• Input Formats - Input splits & records, text input, binary input
• Output Formats - text output, binary output, lazy output
• Hands on exercises

MapReduce Features

• Counters
• Side data distribution
• MapReduce combiner
• MapReduce partitioner
• MapReduce distributed cache
• Hands-on exercises
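
To make the combiner and partitioner items above more concrete, here is a small plain-Python sketch of what each one does for a word count job. It is not Hadoop API code: the function names are hypothetical, and CRC32 is used only as a stand-in for a stable hash of the key.

```python
import zlib
from collections import Counter


def combine(pairs):
    """Combiner: pre-aggregate (word, count) pairs on the map side."""
    combined = Counter()
    for word, count in pairs:
        combined[word] += count
    return list(combined.items())


def partition(key, num_reducers):
    """Partitioner: pick a reducer index from a stable hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


# Hypothetical output of a single mapper
map_output = [("big", 1), ("data", 1), ("big", 1)]
combined = combine(map_output)

# Every occurrence of the same key is routed to the same reducer
reducer_for_big = partition("big", 3)
```

The combiner reduces the number of pairs sent over the network during the shuffle, while the partitioner guarantees that all values for one key end up at the same reducer.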

Hive
• Hive Architecture
• Types of Metastore
• Hive Data Types
• HiveQL
• File Formats – Parquet, ORC, Sequence and Avro Files Comparison
• Partitioning & Bucketing
• Hive JDBC Client
• Hive UDFs
• Hive SerDes
• Hive on Tez
• Hands-on exercises
• Integration with Tableau

Pig
• Pig Architecture
• Pig Data Types
• Load/Store Functions
• PigLatin
• Pig UDFs

HBase

• HBase architecture and concepts

• HBase Data Model
• HBase Shell Interface
• HBase Java API

Sqoop
• Sqoop Architecture
• Sqoop Import Command Arguments, Incremental Import
• Sqoop Export
• Sqoop Jobs
• Hands-on exercises

Flume
• Flume Architecture
• Flume Agent Setup
• Types of sources, channels, sinks
• Multi Agent Flow
• Hands-on exercises

Oozie
• Oozie Fundamentals
• Oozie workflow creations
• Oozie Job submission, monitoring, debugging
• Concepts on Coordinators and Bundles
• Hands-on exercises

Case Studies Discussions

Any one of the Four Projects

• Log File Analysis covering Flume, HDFS, MR/Pig, Hive, Tableau
• Crime Data Analysis covering Oozie, Sqoop, HDFS, Hive, HBase, RESTful Client
• Hadoop Use Cases in Insurance Domain
• Hadoop Use Cases in Retail Domain

Scala or Python, Spark
➢ Understand the difference between Apache Spark and Hadoop
➢ Learn Scala and its programming implementation

✓ Why Scala or Python

✓ Scala Installation
✓ Get deep insights into the functioning of Scala
✓ Execute Pattern Matching in Scala
✓ Functional Programming in Scala – Closures, Currying, Expressions,
Anonymous Functions
✓ Know the concepts of classes in Scala
✓ Object Orientation in Scala – Primary, Auxiliary Constructors, Singleton &
Companion Objects
✓ Traits and Abstract classes in Scala
✓ Scala Simple Build Tool – SBT
✓ Building with Maven

➢ Spark Basics

✓ What is Apache Spark?

✓ Spark Installation
✓ Spark Configuration
✓ Spark Context
✓ Using Spark Shell
✓ Resilient Distributed Datasets (RDDs) – Features, Partitions, Tuning Parallelism
✓ Functional Programming with Spark

➢ Working with RDDs

✓ RDD Operations - Transformations and Actions
✓ Types of RDDs
✓ Key-Value Pair RDDs – Transformations and Actions
✓ MapReduce and Pair RDD Operations
✓ Serialization

➢ Spark on a cluster

✓ Overview
✓ A Spark Standalone Cluster
✓ The Spark Standalone Web UI
✓ Executors & Cluster Manager
✓ Spark on YARN Framework

➢ Writing Spark Applications

✓ Spark Applications vs. Spark Shell

✓ Creating the SparkContext
✓ Configuring Spark Properties
✓ Building and Running a Spark Application
✓ Logging
✓ Spark Job Anatomy

➢ Caching and Persistence

✓ RDD Lineage
✓ Caching Overview
✓ Distributed Persistence

➢ Improving Spark Performance

✓ Shared Variables: Broadcast Variables

✓ Shared Variables: Accumulators
✓ Per Partition Processing
✓ Common Performance Issues

➢ Spark API for different File Formats & Compression Codecs

✓ Text
✓ CSV
✓ Sequence
✓ Parquet
✓ ORC
✓ Compression Techniques – Snappy, Zlib, Gzip

➢ Spark SQL
✓ Spark SQL Overview
✓ HiveContext
✓ SQL Datatypes
✓ Dataframes vs RDDs
✓ Operations on DFs
✓ Parquet Files with Spark SQL – Read, Write, Partitioning, Merging Schema
✓ ORC Files
✓ JSON Files
✓ Inferring Schema programmatically
✓ Custom Case Classes
✓ Temp Tables vs Persistent Tables
✓ Writing UDFs
✓ Hive Support
✓ JDBC Support - Examples
✓ HBase Support - Examples

➢ Spark Streaming

✓ Spark Streaming Overview

✓ Example: Streaming Word Count
✓ Other Streaming Operations
✓ Sliding Window Operations
✓ Developing Spark Streaming Applications – Integration with Kafka and HBase
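
The streaming word count and sliding window items above can be approximated with a plain-Python simulation. This is not the Spark Streaming API: it assumes the stream arrives as a list of micro-batches, that the window slides by one batch at a time, and the name `windowed_word_counts` is hypothetical.

```python
from collections import Counter, deque


def windowed_word_counts(batches, window_length):
    """Yield word counts over the last `window_length` batches, sliding by one batch."""
    # A bounded deque drops the oldest batch automatically when a new one arrives.
    window = deque(maxlen=window_length)
    for batch in batches:
        window.append(batch)
        counts = Counter()
        for lines in window:
            for line in lines:
                counts.update(line.lower().split())
        yield dict(counts)


# Hypothetical stream of three micro-batches, each a list of text lines
batches = [["spark streaming"], ["spark sql"], ["kafka"]]
results = list(windowed_word_counts(batches, window_length=2))
```

Each result covers only the most recent two batches, so words from the first batch disappear from the counts once the third batch arrives.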

Complementary Course: AWS
