CERTIFICATE PROGRAM
At CloudxLab, we are building one of the best gamified learning
environments to make learning technology fun and lifelong. More than
50,000 users across the world have benefited from our signature
courses on Machine Learning and Big Data. Our vision is to upskill people
in high-end technologies such as Deep Learning, Machine Learning and Big
Data, and to make them employable.
Every domain of computing, such as data analysis, software engineering
and artificial intelligence, is going to be impacted by Data Science.
Therefore, every engineer, researcher, manager or scientist will be
expected to know Data Science.
So naturally, you are excited about Data Science and would love to dive
into it. This specialization is designed for those who want to gain
hands-on experience in solving real-life problems using Big Data,
Machine Learning and Deep Learning. After finishing this specialization,
you will find creative ways to apply what you have learned to your work,
such as building a robot that can recognize faces or change its path
when it detects obstacles.
Sandeep Giri
Founder at CloudxLab
E&ICT Academy, IIT Roorkee provides training programs with an
emphasis on hands-on learning in basic and advanced topics and emerging
technologies. The project is sponsored by the Ministry of Electronics and
Information Technology, Govt. of India. E&ICT Academy courses are on par
with the Quality Improvement Program (QIP) for recognition/credits.
The programs are conducted by well-known industry partners,
researchers and experts from leading academic and renowned R&D
organizations. For this, the Academy has signed MoUs with industry and R&D
partners in different domains, who collaborate with the Academy in
conducting the training programs. The Academy also facilitates
interaction between beneficiaries and industry experts to foster
collaboration and help parent institutions find opportunities.
Sanjeev Manhas
Associate Professor, IIT Roorkee
● Earn a certificate from E&ICT Academy, IIT Roorkee.
● Learn Data Science from industry experts and become an expert in the Data Science domain.
● Online cloud lab for hands-on, real-world experience.
● Best-in-class support throughout your learning journey.
● Work on real-world projects.
● Lifetime course access.
● Interact with the international community of peers via the discussion forum.
Sandeep Giri, Course Developer
Founder at CloudxLab
Past: Amazon, InMobi, D.E.Shaw
Sanjeev Manhas, Course Advisor
Associate Professor, IIT Roorkee
R. Balasubramanian, Course Developer
Professor, IIT Roorkee
Partha Pratim Roy, Course Developer
Assistant Professor, IIT Roorkee
Abhinav Singh, Course Developer
Co-Founder at CloudxLab
Past: Byjus
1. Introduction to Linux
2. Introduction to Python
3. Hands-on using Jupyter on CloudxLab
4. Overview of Linear Algebra
5. Introduction to NumPy & Pandas
6. Quizzes, gamified assessments & projects
1. Introduction
● Big Data Introduction
● Distributed systems
● Big Data Use Cases
● Various Solutions
● Overview of Hadoop Ecosystem
● Spark Ecosystem Walkthrough
● Quiz
2. Foundation & Environment
● Understanding CloudxLab
● CloudxLab Hands-on
● Hadoop & Spark Hands-on
● Quiz and Assessment
● Basics of Linux - Quick Hands-on
● Understanding Regular Expressions
● Quiz and Assessment
● Setting up VM (optional)
3. ZooKeeper
● ZooKeeper - Race Condition
● ZooKeeper - Deadlock
● Hands-On
● Quiz & Assessment
● How does election happen? - Paxos Algorithm
● Use cases
● When not to use
● Quiz & Assessment
4. HDFS
● Why HDFS? Why not existing file systems?
● HDFS - NameNode & DataNodes
● Quiz
● Advanced HDFS Concepts (HA, Federation)
● Quiz
● Hands-on with HDFS (Upload, Download, SetRep)
● Quiz & Assessment
● Data Locality (Rack Awareness)
5. YARN
● YARN - Why not existing tools?
● YARN - Evolution from MapReduce 1.0
● Resource Management: YARN Architecture
● Advanced Concepts - Speculative Execution
● Quiz
6. MapReduce Basics
● MapReduce - Understanding Sorting
● MapReduce - Overview & Quiz
● Example 0 - Word Frequency Problem - Without MR
● Example 1 - Only Mapper - Image Resizing
● Example 2 - Word Frequency Problem
● Example 3 - Temperature Problem
● Example 4 - Multiple Reducers
● Example 5 - Java MapReduce Walkthrough & Quiz
7. MapReduce Advanced
● Writing MapReduce Code Using Java
● Building MapReduce project using Apache Ant
● Concept - Associative & Commutative
● Quiz
● Example 8 - Combiner
● Example 9 - Hadoop Streaming
● Example 10 - Adv. Problem Solving - Anagrams
● Example 11 - Adv. Problem Solving - Same DNA
● Example 12 - Adv. Problem Solving - Similar DNA
● Example 13 - Joins - Voting
● Limitations of MapReduce
● Quiz
8. Analyzing Data with Pig
● Pig - Introduction
● Pig - Modes
● Getting Started
● Example - NYSE Stock Exchange
● Concept - Lazy Evaluation
9. Processing Data with Hive
● Hive - Introduction
● Hive - Data Types
● Getting Started
● Loading Data in Hive (Tables)
● Example: Movielens Data Processing
● Advanced Concepts: Views
● Connecting Tableau and HiveServer 2
● Connecting Microsoft Excel and HiveServer 2
● Project: Sentiment Analysis of Twitter Data
● Advanced - Partition Tables
● Understanding HCatalog & Impala
● Quiz
10. NoSQL and HBase
● NoSQL - Scaling Out / Up
● NoSQL - ACID Properties and RDBMS Story
● CAP Theorem
● HBase Architecture - Region Servers, etc.
● HBase Data Model - Column-Family Orientation
● Getting Started - Create Table, Adding Data
● Advanced Example - Google Links Storage
● Concept - Bloom Filter
● Comparison of NoSQL Databases
● Quiz
11. Importing Data with Sqoop, Flume and Oozie
● Sqoop - Introduction
● Sqoop Import - MySQL to HDFS
● Exporting to MySQL from HDFS
● Concept - Unbounded Dataset Processing or Stream Processing
● Flume Overview: Agents - Source, Sink, Channel
● Example 1 - Data from Local network service into HDFS
● Example 2 - Extracting Twitter Data
● Quiz
● Example 3 - Creating workflow with Oozie
12. Live Session Recordings
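The word-frequency and combiner examples in the MapReduce modules follow a map, shuffle/sort, reduce pattern. The sketch below is a minimal single-machine simulation in plain Python, assuming in-memory input lines where each line stands in for one input split; it does not use the Hadoop or Hadoop Streaming APIs. The optional combiner step shows why a function must be associative and commutative (as summing is) to be safely applied to partial results.

```python
from collections import defaultdict
from itertools import chain


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def combiner(pairs):
    """Optional local aggregation of one mapper's output.

    Summing is associative and commutative, so partial sums computed here
    give the same final counts as summing everything in the reducer.
    """
    partial = defaultdict(int)
    for word, count in pairs:
        partial[word] += count
    return list(partial.items())


def shuffle(pairs):
    """Shuffle/sort phase: group values by key and return keys in sorted order."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return sorted(groups.items())


def reducer(word, counts):
    """Reduce phase: add up all counts for one word."""
    return word, sum(counts)


def word_frequency(lines, use_combiner=False):
    # Each line plays the role of one input split handled by one mapper.
    mapped = [list(mapper(line)) for line in lines]
    if use_combiner:
        mapped = [combiner(pairs) for pairs in mapped]
    grouped = shuffle(chain.from_iterable(mapped))
    return dict(reducer(word, counts) for word, counts in grouped)


if __name__ == "__main__":
    lines = ["the cat sat", "the cat ran", "a dog ran"]
    print(word_frequency(lines))
    # {'a': 1, 'cat': 2, 'dog': 1, 'ran': 2, 'sat': 1, 'the': 2}
    print(word_frequency(lines, use_combiner=True))  # same result
```

With or without the combiner the counts match; the combiner only reduces how many pairs would have to cross the network in a real cluster.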
1. Introduction
● Apache Spark ecosystem walkthrough
● Spark Introduction - Why Spark?
● Quiz
2. Scala Basics
● Scala - Quick Introduction - Access Scala on CloudxLab
● Scala - Quick Introduction - Variables and Methods
● Getting Started: Interactive, Compilation, SBT
● Types, Variables & Values
● Functions
● Collections
● Classes
● Parameters
● More Features
● Quiz and Assessment
3. Spark Basics
● Apache Spark ecosystem walkthrough
● Spark Introduction - Why Spark?
● Using the Spark Shell on CloudxLab
● Example 1 - Performing Word Count
● Understanding Spark Cluster Modes on YARN
● RDDs (Resilient Distributed Datasets)
● General RDD Operations: Transformations & Actions
● RDD lineage
● RDD Persistence Overview
● Distributed Persistence
4. Writing and Deploying Spark Applications
● Creating the SparkContext
● Building a Spark Application (Scala, Java, Python)
● The Spark Application Web UI
● Configuring Spark Properties
● Running Spark on Cluster
● RDD Partitions
● Executing Parallel Operations
● Stages and Tasks
5. Common Patterns in Spark Data Processing
● Common Spark Use Cases
● Example 1 - Data Cleaning (Movielens)
● Example 2 - Understanding Spark Streaming
● Understanding Kafka
● Example 3 - Spark Streaming from Kafka
● Iterative Algorithms in Spark
● Project: Real-time analytics of orders in an e-commerce company
6. Data Formats and Management
● InputFormat and InputSplit
● JSON
● XML
● AVRO
● How to store many small files - SequenceFile?
● Parquet
● Protocol Buffers
● Comparing Compressions
● Understanding Row-Oriented and Column-Oriented Formats - RCFile?
7. DataFrames and Spark SQL
● Spark SQL - Introduction
● Spark SQL - Dataframe Introduction
● Transforming and Querying DataFrames
● Saving DataFrames
● DataFrames and RDDs
● Comparing Spark SQL, Impala, and Hive-on-Spark
8. Machine Learning with Spark
● Machine Learning Introduction
● Applications Of Machine Learning
● MLlib Example: k-means
● SparkR Example
9. Live Session Recordings
1. Introduction to Statistics
Statistical Inference, Types of Variables, Probability Distribution,
Normality, Measures of Central Tendency, Normal Distribution
2. Machine Learning Applications & Landscape
Introduction to Machine Learning, Machine Learning Applications,
Introduction to AI, Different types of Machine Learning - Supervised,
Unsupervised, Reinforcement
3. Building end-to-end Machine Learning Project
Machine Learning Project Checklist, Frame the problem and look at the
big picture, Get the data, Explore the data to gain insights, Prepare the
data for Machine Learning algorithms, Explore many different models
and short-list the best ones, Fine-tune the model, Present the solution,
Launch, monitor, and maintain the system
4. Classification
Training a Binary Classifier, Performance Measures, Confusion Matrix,
Precision and Recall, Precision/Recall Tradeoff, The ROC Curve, Multiclass
Classification, Multilabel Classification, Multioutput Classification
5. Training Models
Linear Regression, Gradient Descent, Polynomial Regression, Learning
Curves, Regularized Linear Models, Logistic Regression
6. Support Vector Machines
Linear SVM Classification, Nonlinear SVM Classification, SVM Regression
7. Decision Trees
Training and Visualizing a Decision Tree, Making Predictions, Estimating
Class Probabilities, The CART Training Algorithm, Gini Impurity or
Entropy, Regularization Hyperparameters, Regression, Instability
8. Ensemble Learning and Random Forests
Voting Classifiers, Bagging and Pasting, Random Patches and Random
Subspaces, Random Forests, Boosting, Stacking
9. Dimensionality Reduction
The Curse of Dimensionality, Main Approaches for Dimensionality
Reduction, PCA, Kernel PCA, LLE, Other Dimensionality Reduction
Techniques
10. Quizzes, gamified assessments & projects
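The Decision Trees module lists the CART training algorithm and Gini impurity. As a worked example, the Gini impurity of a node is G = 1 − Σ p_k², where p_k is the fraction of class-k instances in the node, and CART scores a split by the size-weighted impurity of the two children. The plain-Python sketch below assumes class counts of 50/50/50, matching the full IRIS dataset described in the projects; the helper names are illustrative, not from any library.

```python
def gini(class_counts):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in class_counts)


def weighted_gini(left_counts, right_counts):
    """CART cost of a split: child impurities weighted by child size."""
    n_left, n_right = sum(left_counts), sum(right_counts)
    n = n_left + n_right
    return (n_left / n) * gini(left_counts) + (n_right / n) * gini(right_counts)


if __name__ == "__main__":
    root = [50, 50, 50]  # setosa, versicolor, virginica
    print(round(gini(root), 3))  # 0.667
    # A split that isolates setosa from the other two classes:
    print(round(weighted_gini([50, 0, 0], [0, 50, 50]), 3))  # 0.333
```

The pure setosa child has impurity 0 and the mixed child 0.5, so the split lowers the weighted impurity from 2/3 at the root to 1/3, which is the kind of reduction CART searches for at each node.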
1. Introduction to Deep Learning
Deep Learning Applications, Artificial Neural Network, TensorFlow Demo,
Deep Learning Frameworks
2. Up and Running with TensorFlow
Installation, Creating Your First Graph and Running It in a Session,
Managing Graphs, Lifecycle of a Node Value, Linear Regression with
TensorFlow, Implementing Gradient Descent, Feeding Data to the
Training Algorithm, Saving and Restoring Models, Visualizing the Graph
and Training Curves Using TensorBoard, Name Scopes, Modularity,
Sharing Variables
3. Introduction to Artificial Neural Networks
From Biological to Artificial Neurons, Training an MLP with TensorFlow’s
High-Level API, Training a DNN Using Plain TensorFlow, Fine-Tuning
Neural Network Hyperparameters
4. Training Deep Neural Nets
Vanishing / Exploding Gradients Problems, Reusing Pretrained Layers,
Faster Optimizers, Avoiding Overfitting Through Regularization, Practical
Guidelines
5. Convolutional Neural Networks
The Architecture of the Visual Cortex, Convolutional Layer, Pooling Layer,
CNN Architectures
6. Recurrent Neural Networks
Recurrent Neurons, Basic RNNs in TensorFlow, Training RNNs, Deep
RNNs, LSTM Cell, GRU Cell, Natural Language Processing
7. Autoencoders
Efficient Data Representations, Performing PCA with an Undercomplete
Linear Autoencoder, Stacked Autoencoders, Unsupervised Pretraining
Using Stacked Autoencoders, Denoising Autoencoders, Sparse
Autoencoders, Variational Autoencoders
8. Reinforcement Learning
Learning to Optimize Rewards, Policy Search, Introduction to OpenAI
Gym, Neural Network Policies, Evaluating Actions: The Credit Assignment
Problem, Policy Gradients, Markov Decision Processes, Temporal
Difference Learning and Q-Learning, Learning to Play Ms. Pac-Man Using
Deep Q-Learning
9. Quizzes, gamified assessments & projects
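Gradient descent appears both in the Machine Learning module "Training Models" and in the TensorFlow module ("Linear Regression with TensorFlow, Implementing Gradient Descent"). The sketch below shows the same idea in plain Python rather than TensorFlow, assuming a tiny noiseless dataset generated from y = 2x + 1: it minimizes the mean squared error MSE = (1/n) Σ (w·x + b − y)² by repeatedly stepping against its gradient. The learning rate and epoch count are illustrative choices, not values from the course.

```python
def gradient_descent(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y ~ w * x + b by batch gradient descent on the mean squared error."""
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        # Prediction errors for the current parameters.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error ** 2).
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [2.0 * x + 1.0 for x in xs]  # noiseless data from y = 2x + 1
    w, b = gradient_descent(xs, ys)
    print(f"w = {w:.3f}, b = {b:.3f}")  # w = 2.000, b = 1.000
```

Batch gradient descent uses every example in each step; too large a learning rate makes the updates diverge, while too small a rate needs many more epochs to converge.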
1. Analyze Emails
Analyze the email activity of individuals in an open-source
project development team.
2. Predict the median housing prices in California
The Machine Learning course starts with this end-to-end project. Learn
data manipulation, visualization and cleaning techniques using
Python libraries such as Pandas, Scikit-Learn and Matplotlib.
3. Classify handwritten digits in MNIST dataset
The MNIST dataset is considered the "Hello World!" of Machine Learning.
Write your first classifier. Starting with binary classification,
learn multiclass, multilabel and multioutput classification, along with
different error analysis techniques.
4. Noise removal from the images
Build a model that takes a noisy image as an input and outputs the
clean image.
5. Predict the class of flower in IRIS dataset
The IRIS dataset contains 3 classes of 50 instances each, where each class
refers to a type of iris plant. The three classes in this dataset are Setosa,
Versicolor and Virginica. Learn Decision Trees, the CART algorithm and
ensemble methods. Then use a Random Forest classifier to make
predictions.
6. Predict which passengers survived in the Titanic shipwreck
The sinking of the RMS Titanic is one of the most infamous shipwrecks in
history. In this project, you build a model to predict which passengers
survived the tragedy.
7. Predict bike rental demand
Build a model to predict bike rental demand from past data.
8. Build a spam classifier
Build a model to classify email as spam or ham. First, download
examples of spam and ham from Apache SpamAssassin’s public
datasets and then train a model to classify email.
9. Build cats classifier using neural network
In this project, you will build a basic neural network to classify whether a
given image is of a cat or not.
10. Classify large images using Inception v3
Download images of various animals and then download the latest
pretrained Inception v3 model. Run the model to classify downloaded
images and display the top five predictions for each image, along with
the estimated probability.
11. Classify clothes using TensorFlow
Build a model to classify clothes into various categories in the Fashion
MNIST dataset.
12. Predict the hourly rain gauge total
This is a time series prediction task: you are given snapshots of
polarimetric radar values and asked to predict the hourly rain gauge
total.
13. Sentiment analysis
Sentiment analysis of the movie "Iron Man 3" using Hive, and visualizing the
sentiment data using BI tools such as Tableau
14. Process the NSE
Process the NSE (National Stock Exchange) data using Hive to derive various
insights
15. MovieLens Project
Analyze MovieLens data using Hive
16. Spark MLlib
Generate movie recommendations using Spark MLlib
17. Spark GraphX
Derive the importance of various handles at Twitter using Spark GraphX
18. Churn the logs
Process the logs of the NASA Kennedy Space Center WWW server using
Spark to find useful business and DevOps metrics
19. Spark application
Write an end-to-end Spark application, from writing code on your
local machine to deploying it on the cluster
20. Analytics Dashboard
Real-time analytics dashboard for an e-commerce company using
Apache Spark, Kafka, Spark Streaming, Node.js, Socket.IO and
Highcharts
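Projects such as the spam classifier are evaluated with the precision and recall measures from the Classification module: precision = TP / (TP + FP) and recall = TP / (TP + FN), counted from the confusion matrix for the positive class. The plain-Python sketch below assumes string labels "spam" and "ham" and uses a hypothetical five-email toy example, not data from the SpamAssassin datasets.

```python
def precision_recall(y_true, y_pred, positive="spam"):
    """Return (precision, recall) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts for the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when nothing is predicted/labelled positive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical toy labels for five emails.
    y_true = ["spam", "spam", "ham", "ham", "spam"]
    y_pred = ["spam", "ham", "ham", "ham", "spam"]
    p, r = precision_recall(y_true, y_pred)
    print(f"precision = {p:.3f}, recall = {r:.3f}")  # precision = 1.000, recall = 0.667
```

Here every email flagged as spam really is spam (precision 1.0), but one spam email slipped through (recall 2/3); raising or lowering a classifier's decision threshold trades one of these off against the other.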
Please find more information about the course and fees here:
https://cloudxlab.com/course/73/data-science-specialization-eict-iitr
Online Self-Paced Learning
Contact us at or + or contact:
Aswath Madhu, Program Director
Prakhar Katiyar, Chief Admissions Counsellor
For corporate training and bulk enrollments, write to us at
Headquarters - United States
2035, Sunset Lake Road Suite B-2, 19702
Newark, New Castle
Delaware, United States
R&D Center - India
Issimo Technology Private Limited
#215, Arcade, Brigade Metropolis,
Mahadevpura, Bangalore, India - 560 048