IIT Kharagpur Data Science PDF

The document provides information about a certificate program in data science offered by CloudxLab and E&ICT Academy, IIT Roorkee. The program aims to provide hands-on training in data science and related technologies like machine learning, deep learning, and big data. It will be taught by industry experts and researchers. Upon completion, students will earn a certificate and have access to cloud labs and industry mentors to work on real-world projects. The program curriculum covers topics ranging from Linux, Python, and data analysis to Spark, Hadoop, machine learning algorithms and building end-to-end machine learning projects.

Uploaded by

Rintu Dey

CERTIFICATE PROGRAM

At CloudxLab, we are building one of the best gamified learning
environments to make technology learning fun and for life. More than
50,000 users across the world have benefited from our signature
courses on Machine Learning and Big Data. Our vision is to upskill people
in high-end technologies like Deep Learning, Machine Learning, and Big
Data and make them employable.

Every domain of computing, such as data analysis, software engineering,
and artificial intelligence, is going to be impacted by Data Science.
Therefore, every engineer, researcher, manager or scientist will be
expected to know Data Science.

So naturally, you are excited about Data Science and would love to dive
into it. This specialization is designed for those who want to gain
hands-on experience in solving real-life problems using big data,
machine learning and deep learning. After finishing this specialization,
you will find creative ways to apply your learning to your work, like
building a robot that can recognize faces or change its path after
detecting obstacles.

Sandeep Giri
Founder at CloudxLab

E&ICT Academy, IIT Roorkee provides training programs with an
emphasis on hands-on learning in basic/advanced topics and emerging
technologies. The project is sponsored by the Ministry of Electronics and
Information Technology, Govt. of India. E&ICT Academy courses are on par
with the Quality Improvement Program (QIP) for recognition/credits.

The programs are conducted by well-known industry partners,
researchers and experts from leading academic and renowned R&D
organizations. For this, the Academy has signed MoUs with industry/R&D
partners in different domains, who collaborate with the Academy in
conducting the training programs. The Academy also facilitates
interaction between beneficiaries and industry experts to enable
collaboration and to find opportunities for parent institutions.

Sanjeev Manhas
Associate Professor, IIT Roorkee

● Earn a certificate from E&ICT Academy, IIT Roorkee.
● Learn Data Science from industry experts and become an expert in the Data Science domain.
● Online cloud lab for hands-on, real-world experience.
● Best-in-class support throughout your learning journey.
● Work on real-world projects.
● Lifetime course access.
● Interact with the international community of peers via the discussion forum.

Sandeep Giri, Course Developer
Founder at CloudxLab
Past: Amazon, InMobi, D.E.Shaw

Sanjeev Manhas, Course Advisor
Associate Professor, IIT Roorkee

R. Balasubramanian, Course Developer
Professor, IIT Roorkee

Partha Pratim Roy, Course Developer
Assistant Professor, IIT Roorkee

Abhinav Singh, Course Developer
Co-Founder at CloudxLab
Past: Byjus

1. Introduction to Linux

2. Introduction to Python

3. Hands-on using Jupyter on CloudxLab

4. Overview of Linear Algebra

5. Introduction to NumPy & Pandas

6. Quizzes, gamified assessments & projects

1. Introduction

● Big Data Introduction
● Distributed systems
● Big Data Use Cases
● Various Solutions
● Overview of Hadoop Ecosystem
● Spark Ecosystem Walkthrough
● Quiz

2. Foundation & Environment

● Understanding the CloudxLab
● CloudxLab Hands-on
● Hadoop & Spark Hands-on
● Quiz and Assessment
● Basics of Linux - Quick Hands-on
● Understanding Regular Expressions
● Quiz and Assessment
● Setting up VM (optional)

3. ZooKeeper

● ZooKeeper - Race Condition
● ZooKeeper - Deadlock
● Hands-On
● Quiz & Assessment
● How does election happen - Paxos Algorithm?
● Use cases
● When not to use
● Quiz & Assessment

4. HDFS

● Why HDFS or Why not existing file systems?
● HDFS - NameNode & DataNodes
● Quiz
● Advanced HDFS Concepts (HA, Federation)
● Quiz
● Hands-on with HDFS (Upload, Download, SetRep)
● Quiz & Assessment
● Data Locality (Rack Awareness)

5. YARN

● YARN - Why not existing tools?
● YARN - Evolution from MapReduce 1.0
● Resource Management: YARN Architecture
● Advanced Concepts - Speculative Execution
● Quiz

6. MapReduce Basics

● MapReduce - Understanding Sorting
● MapReduce - Overview & Quiz
● Example 0 - Word Frequency Problem - Without MR
● Example 1 - Only Mapper - Image Resizing
● Example 2 - Word Frequency Problem
● Example 3 - Temperature Problem
● Example 4 - Multiple Reducer
● Example 5 - Java MapReduce Walkthrough & Quiz

7. MapReduce Advanced

● Writing MapReduce Code Using Java
● Building MapReduce project using Apache Ant
● Concept - Associative & Commutative
● Quiz
● Example 8 - Combiner
● Example 9 - Hadoop Streaming
● Example 10 - Adv. Problem Solving - Anagrams
● Example 11 - Adv. Problem Solving - Same DNA
● Example 12 - Adv. Problem Solving - Similar DNA
● Example 12 - Joins - Voting
● Limitations of MapReduce
● Quiz

8. Analyzing Data with Pig

● Pig - Introduction
● Pig - Modes
● Getting Started
● Example - NYSE Stock Exchange
● Concept - Lazy Evaluation

9. Processing Data with Hive

● Hive - Introduction
● Hive - Data Types
● Getting Started
● Loading Data in Hive (Tables)
● Example: Movielens Data Processing
● Advanced Concepts: Views
● Connecting Tableau and HiveServer2
● Connecting Microsoft Excel and HiveServer2
● Project: Sentiment Analysis of Twitter Data
● Advanced - Partition Tables
● Understanding HCatalog & Impala
● Quiz

10. NoSQL and HBase

● NoSQL - Scaling Out / Up
● NoSQL - ACID Properties and RDBMS Story
● CAP Theorem
● HBase Architecture - Region Servers etc.
● HBase Data Model - Column Family Orientedness
● Getting Started - Create table, Adding Data
● Adv Example - Google Links Storage
● Concept - Bloom Filter
● Comparison of NoSQL Databases
● Quiz

11. Importing Data with Sqoop and Flume, Oozie

● Sqoop - Introduction
● Sqoop Import - MySQL to HDFS
● Exporting to MySQL from HDFS
● Concept - Unbounded Dataset Processing or Stream Processing
● Flume Overview: Agents - Source, Sink, Channel
● Example 1 - Data from Local network service into HDFS
● Example 2 - Extracting Twitter Data
● Quiz
● Example 3 - Creating workflow with Oozie

12. Live Session Recordings


1. Introduction

● Apache Spark ecosystem walkthrough
● Spark Introduction - Why Spark?
● Quiz

2. Scala Basics

● Scala - Quick Introduction - Access Scala on CloudxLab
● Scala - Quick Introduction - Variables and Methods
● Getting Started: Interactive, Compilation, SBT
● Types, Variables & Values
● Functions
● Collections
● Classes
● Parameters
● More Features
● Quiz and Assessment

3. Spark Basics

● Apache Spark ecosystem walkthrough
● Spark Introduction - Why Spark?
● Using the Spark Shell on CloudxLab
● Example 1 - Performing Word Count
● Understanding Spark Cluster Modes on YARN
● RDDs (Resilient Distributed Datasets)
● General RDD Operations: Transformations & Actions
● RDD lineage
● RDD Persistence Overview
● Distributed Persistence

4. Writing and Deploying Spark Applications

● Creating the SparkContext
● Building a Spark Application (Scala, Java, Python)
● The Spark Application Web UI
● Configuring Spark Properties
● Running Spark on Cluster
● RDD Partitions
● Executing Parallel Operations
● Stages and Tasks

5. Common Patterns in Spark Data Processing

● Common Spark Use Cases
● Example 1 - Data Cleaning (Movielens)
● Example 2 - Understanding Spark Streaming
● Understanding Kafka
● Example 3 - Spark Streaming from Kafka
● Iterative Algorithms in Spark
● Project: Real-time analytics of orders in an e-commerce company

6. Data Formats and Management

● InputFormat and InputSplit
● JSON
● XML
● AVRO
● How to store many small files - SequenceFile?
● Parquet
● Protocol Buffers
● Comparing Compressions
● Understanding Row Oriented and Column Oriented Formats - RCFile?

7. DataFrames and Spark SQL

● Spark SQL - Introduction
● Spark SQL - Dataframe Introduction
● Transforming and Querying DataFrames
● Saving DataFrames
● DataFrames and RDDs
● Comparing Spark SQL, Impala, and Hive-on-Spark

8. Machine Learning with Spark

● Machine Learning Introduction
● Applications Of Machine Learning
● MLlib Example: k-means
● SparkR Example

9. Live Session Recordings


1. Introduction to Statistics
Statistical Inference, Types of Variables, Probability Distribution,
Normality, Measures of Central Tendency, Normal Distribution

2. Machine Learning Applications & Landscape
Introduction to Machine Learning, Machine Learning Applications,
Introduction to AI, Different types of Machine Learning - Supervised,
Unsupervised, Reinforcement

3. Building end-to-end Machine Learning Project
Machine Learning Projects Checklist, Frame the problem and look at the
big picture, Get the data, Explore the data to gain insights, Prepare the
data for Machine Learning algorithms, Explore many different models
and short-list the best ones, Fine-tune model, Present the solution,
Launch, monitor, and maintain the system

4. Classification
Training a Binary Classifier, Performance Measures, Confusion Matrix,
Precision and Recall, Precision/Recall Tradeoff, The ROC Curve, Multiclass
Classification, Multilabel Classification, Multioutput Classification

5. Training Models
Linear Regression, Gradient Descent, Polynomial Regression, Learning
Curves, Regularized Linear Models, Logistic Regression

6. Support Vector Machines
Linear SVM Classification, Nonlinear SVM Classification, SVM Regression

7. Decision Trees
Training and Visualizing a Decision Tree, Making Predictions, Estimating
Class Probabilities, The CART Training Algorithm, Gini Impurity or
Entropy, Regularization Hyperparameters, Regression, Instability

8. Ensemble Learning and Random Forests
Voting Classifiers, Bagging and Pasting, Random Patches and Random
Subspaces, Random Forests, Boosting, Stacking

9. Dimensionality Reduction
The Curse of Dimensionality, Main Approaches for Dimensionality
Reduction, PCA, Kernel PCA, LLE, Other Dimensionality Reduction
Techniques

10. Quizzes, gamified assessments & projects


1. Introduction to Deep Learning
Deep Learning Applications, Artificial Neural Network, TensorFlow Demo,
Deep Learning Frameworks

2. Up and Running with TensorFlow
Installation, Creating Your First Graph and Running It in a Session,
Managing Graphs, Lifecycle of a Node Value, Linear Regression with
TensorFlow, Implementing Gradient Descent, Feeding Data to the
Training Algorithm, Saving and Restoring Models, Visualizing the Graph
and Training Curves Using TensorBoard, Name Scopes, Modularity,
Sharing Variables

3. Introduction to Artificial Neural Networks
From Biological to Artificial Neurons, Training an MLP with TensorFlow’s
High-Level API, Training a DNN Using Plain TensorFlow, Fine-Tuning
Neural Network Hyperparameters

4. Training Deep Neural Nets
Vanishing / Exploding Gradients Problems, Reusing Pretrained Layers,
Faster Optimizers, Avoiding Overfitting Through Regularization, Practical
Guidelines

5. Convolutional Neural Networks
The Architecture of the Visual Cortex, Convolutional Layer, Pooling Layer,
CNN Architectures

6. Recurrent Neural Networks
Recurrent Neurons, Basic RNNs in TensorFlow, Training RNNs, Deep
RNNs, LSTM Cell, GRU Cell, Natural Language Processing

7. Autoencoders
Efficient Data Representations, Performing PCA with an Undercomplete
Linear Autoencoder, Stacked Autoencoders, Unsupervised Pretraining
Using Stacked Autoencoders, Denoising Autoencoders, Sparse
Autoencoders, Variational Autoencoders

8. Reinforcement Learning
Learning to Optimize Rewards, Policy Search, Introduction to OpenAI
Gym, Neural Network Policies, Evaluating Actions: The Credit Assignment
Problem, Policy Gradients, Markov Decision Processes, Temporal
Difference Learning and Q-Learning, Learning to Play Ms. Pac-Man Using
Deep Q-Learning

9. Quizzes, gamified assessments & projects


1. Analyze Emails
Analyze the mail activity of various individuals in an open source
project development team.

2. Predict the median housing prices in California
We start the Machine Learning course with this end-to-end project. Learn
various data manipulation, visualization and cleaning techniques using
Python libraries like Pandas, Scikit-Learn and Matplotlib.

3. Classify handwritten digits in MNIST dataset
The MNIST dataset is considered the "Hello World!" of Machine Learning.
Write your first classification logic. Starting with Binary Classification,
learn Multiclass, Multilabel, Multi-output classification and different error
analysis techniques.

4. Noise removal from the images
Build a model that takes a noisy image as an input and outputs the
clean image.

5. Predict the class of flower in IRIS dataset
The IRIS dataset contains 3 classes of 50 instances each, where each class
refers to a type of iris plant. The three classes in this dataset are Setosa,
Versicolor, and Virginica. Learn Decision Trees, the CART algorithm and
Ensemble methods. Then use a Random Forest classifier to make
predictions.

6. Predict which passengers survived in the Titanic shipwreck
The sinking of the RMS Titanic is one of the most infamous shipwrecks in
history. In this project, you build a model to predict which passengers
survived the tragedy.

7. Predict bike rental demand
Build a model to predict bike rental demand given past data.

8. Build a spam classifier
Build a model to classify email as spam or ham. First, download
examples of spam and ham from Apache SpamAssassin’s public
datasets and then train a model to classify email.

9. Build a cat classifier using a neural network
In this project, you will build a basic neural network to classify whether a
given image is of a cat or not.

10. Classify large images using Inception v3
Download images of various animals and then download the latest
pretrained Inception v3 model. Run the model to classify downloaded
images and display the top five predictions for each image, along with
the estimated probability.

11. Classify clothes using TensorFlow
Build a model to classify clothes into various categories in Fashion
MNIST dataset.

12. Predict the hourly rain gauge total
This is a time series prediction task: you are given snapshots of
polarimetric radar values and asked to predict the hourly rain gauge
total.

13. Sentiment analysis
Sentiment analysis of "Iron Man 3" movie using Hive and visualizing the
sentiment data using BI tools such as Tableau

14. Process the NSE
Process the NSE (National Stock Exchange) data using Hive for various
insights

15. MovieLens Project
Analyze MovieLens data using Hive

16. Spark MLlib
Generate movie recommendations using Spark MLlib

17. Spark GraphX
Derive the importance of various handles at Twitter using Spark GraphX

18. Analyze the logs
Analyze the logs of the NASA Kennedy Space Center WWW server using
Spark to extract useful business and DevOps metrics

19. Spark application
Write an end-to-end Spark application, from writing code on your
local machine to deploying it to the cluster

20. Analytics Dashboard
Real-time analytics dashboard for an e-commerce company using
Apache Spark, Kafka, Spark Streaming, Node.js, Socket.IO and
Highcharts

Please find more information about the course and fees here:
https://cloudxlab.com/course/73/data-science-specialization-eict-iitr

Online Self-Paced Learning

Contact us at or + or contact:

Aswath Madhu, Program Director
Prakhar Katiyar, Chief Admissions Counsellor

For corporate training and bulk enrollments, write to us at

Headquarters - United States
2035, Sunset Lake Road Suite B-2, 19702
Newark, New Castle
Delaware, United States

R&D Center - India
Issimo Technology Private Limited
#215, Arcade, Brigade Metropolis,
Mahadevpura, Bangalore, India - 560 048
