Data Science C

Data Science

Uploaded by

hr.scratchnest
Course roadmap:
Beginning of the course → Basics (Git, Linux, Python, SQL) → ML Libraries and Mathematical Concepts → Machine Learning → Deep Learning → Reinforcement Learning → Large Language Models → Big Data Engineering → Real-World Projects → Business Case Study and Interview Preparation → Get Certified

Course
Curriculum
1: Foundations
● Linux for Data Science/ Machine Learning
● Getting Started with Git
● Python Foundations
● Machine Learning Prerequisites (including NumPy, Pandas, and Linear Algebra)
● Getting Started with SQL
● Statistics Foundations

Course on Machine Learning

1: Machine Learning Applications & Landscape

● Introduction to Machine Learning


● Machine Learning Application
● Introduction to AI
● Different types of Machine Learning - Supervised, Unsupervised

2: Building an End-to-End Machine Learning Project

● Machine Learning Projects Checklist


● Get the data
● Explore the data to gain insights
● Prepare the data for Machine Learning algorithms
● Explore many different models and short-list the best ones
● Fine-tune model
● Launch, monitor, and maintain the system
3: Training Models

● Linear Regression
● Gradient Descent
● Model Evaluation and Metrics
● Polynomial Regression
● Overfitting and Underfitting
● Regularized Linear Models
● Logistic Regression

4: Classification

● Training a Binary Classifier
● Multiclass, Multilabel, and Multioutput Classification
● Performance Metrics
● Confusion Matrix
● Precision and Recall
● Precision/Recall Tradeoff
● The ROC Curve

5: Support Vector Machines

● Introduction to Support Vector Machines


● SVM for Classification
● SVM for Regression
● Hyperparameter Tuning

6: Decision Trees
● Training and Visualizing a Decision Tree
● Making Predictions
● The CART Training Algorithm
● Hyperparameter Tuning
● Handling Overfitting
7: Ensemble Learning

● Introduction to Ensemble Learning


● Demonstrating Why Multiple Models Attain Superior Accuracy
● Types of Ensemble Learning methods

8: Dimensionality Reduction

● The Curse of Dimensionality


● Main Approaches for Dimensionality Reduction
● Principal Component Analysis (PCA)

Course on Deep Learning and Reinforcement Learning

1: Introduction to Artificial Neural Network

● From Biological to Artificial Neurons


● Backpropagation from Scratch
● Activation Functions
● Implementing MLPs using Keras with TensorFlow Backend
● Fine-Tuning Neural Network Hyperparameters

2: Convolutional Neural Networks and Computer Vision

● The Architecture of the Visual Cortex


● Convolutional Layer
● Pooling Layer
● CNN Architectures
● Classification with Keras
● Transfer Learning with Keras
● Object Detection
● YOLO
3: Stable Diffusion

● Introduction to Stable Diffusion


● Stable Diffusion Components
● Diffusion Model
● Stable Diffusion Architecture and Training

4: Recurrent Neural Networks

● Recurrent Neurons and Layers


● Basic RNNs in TensorFlow
● Training RNNs
● Deep RNNs
● Forecasting a Time Series
● LSTM Cell

5: Natural Language Processing

● Introduction to Natural Language Processing


● Word Embeddings
● Creating a Quiz Using TextBlob
● Finding Related Posts with scikit-learn
● Sentiment Analysis
● Encoder-Decoder Network for Neural Machine Translation

6: Training Deep Neural Networks

● The Vanishing / Exploding Gradients Problems


● Reusing Pretrained Layers
● Faster Optimizers
● Avoiding Overfitting Through Regularization
● Practical Guidelines to Train Deep Neural Networks
7: Custom Models and Training with TensorFlow

● A Quick Tour of TensorFlow


● Customizing Models and Training Algorithms

8: Autoencoders and GANs

● Efficient Data Representations


● Performing PCA with an Undercomplete Linear Autoencoder
● Stacked Autoencoders
● Unsupervised Pretraining Using Stacked Autoencoders
● Denoising Autoencoders
● Sparse Autoencoders
● Variational Autoencoders
● Generative Adversarial Networks

9: Reinforcement Learning

● Learning to Optimize Rewards


● Policy Search
● Introduction to OpenAI Gym
● Neural Network Policies
● Evaluating Actions: The Credit Assignment Problem
● Policy Gradients
● Markov Decision Processes
● Temporal Difference Learning and Q-Learning
● Deep Q-Learning Variants
● The TF-Agents Library
Course on Large Language Models
1: Transformers
● Attention Mechanism
● Transformer Architecture and components
● Transfer Learning
● Transformer Variants

2: OpenAI’s ChatGPT
● Introduction to ChatGPT
● Architecture of GPT
● ChatGPT Architecture and Training

3: Vector Databases
● Introduction to Vector Databases
● Architecture of Vector Databases
● Indexing Techniques
● Distance Metrics and Similarity Measures
● Nearest Neighbor Search
● Open-Source Vector Databases: Chroma and Milvus

4: LangChain
● Introduction to LangChain
● The building blocks of LangChain: Prompts, Chains, Retrievers, Parsers, Memory, and Agents

5: Creating LLM-Powered Apps with LangChain

● Demonstration: Building a personalized chatbot using LangChain


Large-Scale System Design (Data Engineering)
1: Introduction to Hadoop
● Introduction
● Distributed systems
● Big Data Use Cases
● Various Solutions
● Overview of Hadoop Ecosystem
● Spark Ecosystem Walkthrough

2: Foundation & Environment


● Understanding the CloudxLab Environment
● Getting Started - Hands on
● Hadoop & Spark Hands-on
● Understanding Regular Expressions
● Setting up VM

3: ZooKeeper
● ZooKeeper - Race Condition
● ZooKeeper - Deadlock
● How does election happen - Paxos Algorithm?
● Use cases
● When not to use

4: HDFS
● Why HDFS?
● NameNode & DataNodes
● Advanced HDFS Concepts (HA, Federation)
● Hands-on with HDFS (Upload, Download, SetRep)
● Data Locality (Rack Awareness)
5: YARN
● Why YARN?
● Evolution from MapReduce 1.0
● Resource Management: YARN Architecture
● Advanced Concepts - Speculative Execution

6: MapReduce Basics
● Understanding Sorting
● MapReduce - Overview
● Word Frequency Problem - Without MR
● Only Mapper - Image Resizing
● Temperature Problem
● Multiple Reducer
● Java MapReduce

7: MapReduce Advanced
● Writing MapReduce Code Using Java
● Apache Ant
● Concept - Associative & Commutative
● Combiner
● Hadoop Streaming
● Adv. Problem Solving - Anagrams
● Adv. Problem Solving - Same DNA
● Adv. Problem Solving - Similar DNA
● Joins - Voting
● Limitations of MapReduce
8: Analyzing Data with Pig
● Pig - Introduction
● Pig - Modes
● Example - NYSE Stock Exchange
● Concept - Lazy Evaluation

9: Processing Data with Hive


● Hive - Introduction
● Hive - Data Types
● Loading Data in Hive (Tables)
● Movielens Data Processing
● Connecting Tableau and HiveServer 2
● Connecting Microsoft Excel and HiveServer 2
● Project: Sentiment Analysis of Twitter Data
● Advanced - Partition Tables
● Understanding HCatalog & Impala

10: NoSQL and HBase

● NoSQL - Scaling Out / Up


● ACID Properties and RDBMS Story
● CAP Theorem
● HBase Architecture - Region Servers etc
● HBase Data Model - Column Family Orientedness
● Getting Started - Create table, Adding Data
● Adv Example - Google Links Storage
● Concept - Bloom Filter
● Comparison of NoSQL Databases
11: Importing Data with Sqoop, Flume, and Oozie
● Sqoop - Introduction
● Sqoop Import - MySQL to HDFS
● Exporting to MySQL from HDFS
● Concept - Unbounded Dataset Processing or Stream Processing
● Flume Overview: Agents - Source, Sink, Channel
● Data from Local network service into HDFS
● Example - Extracting Twitter Data
● Example - Creating a workflow with Oozie

12: Introduction to Spark


● Apache Spark ecosystem walkthrough
● Spark Introduction - Why Spark?

13: Scala Basics


● Introduction, Access Scala on CloudxLab
● Variables and Methods
● Interactive, Compilation, SBT
● Types, Variables & Values
● Functions
● Collections
● Classes
● Parameters

14: Spark Basics


● Apache Spark ecosystem
● Why Spark?
● Using the Spark Shell on CloudxLab
● Example 1 - Performing Word Count
● Understanding Spark Cluster Modes on YARN
● RDDs (Resilient Distributed Datasets)
● General RDD Operations: Transformations & Actions
● RDD lineage
● RDD Persistence Overview
● Distributed Persistence

15: Writing and Deploying Spark Applications


● Creating the SparkContext
● Building a Spark Application (Scala, Java, Python)
● The Spark Application Web UI
● Configuring Spark Properties
● Running Spark on Cluster
● RDD Partitions
● Executing Parallel Operations
● Stages and Tasks

16: Common Patterns in Spark Data Processing


● Common Spark Use Cases
● Understanding Kafka
● Iterative Algorithms in Spark
● Project: Real-time analytics of orders in an e-commerce company

17: Data Formats & Management


● XML
● AVRO
● How to store many small files - SequenceFile?
● Parquet
● Protocol Buffers
● Comparing Compressions
● Understanding Row Oriented and Column Oriented Formats - RCFile?
18: DataFrames and Spark SQL
● Spark SQL - Introduction
● Spark SQL - DataFrame Introduction
● Transforming and Querying DataFrames
● Saving DataFrames
● DataFrames and RDDs
● Comparing Spark SQL, Impala, and Hive-on-Spark

19: Machine Learning with Spark


● Machine Learning Introduction
● Applications Of Machine Learning
● MLlib Example: k-means
● SparkR Example
Projects

1. Churn Email Inbox with Python: Churn through the mail activity of various individuals in an
open-source project development team.

2. Predicting the median housing prices in California: In this project, we will build a machine
learning model to predict housing prices. We will learn various data manipulation, visualization,
and cleaning techniques using Python libraries such as Pandas, Scikit-Learn, and Matplotlib.

3. Noise removal from images: Build a model that takes a noisy image as input and outputs the
clean image.

4. Build a spam classifier: Build a model to classify email as spam or ham. First, download
examples of spam and ham from Apache SpamAssassin's public datasets, and then train a model
to classify email.

5. Predict which passengers survived the Titanic shipwreck: The sinking of the RMS Titanic is
one of the most infamous shipwrecks in history. In this project, you build a model to predict
which passengers survived the tragedy.

6. Predicting Noisy Images using KNN Classifier: We will learn how to predict images from their
noisy versions, using the MNIST dataset. First, we will load the dataset and explore it, and
then we will learn how to introduce noise to an image. Next, we will train a KNN Classifier to
predict the original image from its noisy version.

7. Credit Card Fraud Detection using Machine Learning: Learn how to over-sample a dataset
with imbalanced classes using the SMOTE technique, and how to use the resulting data to
build a fraudulent-transaction classifier.

8. Building Cat vs Non-Cat Image Classifier using NumPy: Use Python and NumPy to build a
Logistic Regression classifier from scratch, and apply it to predict the class of an input image,
that is, whether it is a cat or a non-cat.

9. Iris Flowers Classification using Deep Learning and Keras: Use Python and TensorFlow 2 Keras
to build a dense deep neural network classifier to predict the classes of flowers in the Iris
dataset.
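
The over-sampling step named in Project 7 can be illustrated with a small, dependency-free sketch. This is only an approximation of the interpolation idea behind SMOTE, not the project's actual code: the function name `smote_like_oversample`, its parameters, and the toy `fraud` data are assumptions made for this example, and a real project would more likely rely on an existing library implementation.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the core idea of SMOTE)."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        # Squared Euclidean distance between two feature tuples.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest neighbours of `base` among the other minority samples.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: three minority-class (fraud) samples, two features each.
fraud = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2)]
new_points = smote_like_oversample(fraud, n_new=5)
print(len(fraud) + len(new_points), "minority samples after over-sampling")
```

Each synthetic point lies on a line segment between two real minority samples, so the minority class grows without simply duplicating rows.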

10. Classify Clothes from Fashion MNIST Dataset: Build a model to classify clothes into the
various categories of the Fashion MNIST dataset.

11. Building personalized chatbot using LangChain: Create an end-to-end personalized chatbot
with the help of LangChain.

12. Creating a Text summarization model using LangChain: Create an end-to-end model that,
given a text, summarizes it in a given number of words.

13. Sentiment Analysis: Sentiment analysis of the movie "Iron Man 3" using Hive, visualizing the
sentiment data with BI tools such as Tableau.

14. Spark application: Write an end-to-end Spark application, from writing code on your local
machine to deploying it to the cluster.

15. Parse Apache Access Logs using Spark: The logs of a web server are a gold mine for
gaining insights into user behavior. Learn to parse the text data stored in the logs of a web
server using Apache Spark.

16. Analytics Dashboard: Build a real-time analytics dashboard for an e-commerce company using
Apache Spark, Kafka, Spark Streaming, Node.js, Socket.IO, and Highcharts.

17. MovieLens Project: Analyze MovieLens data using Hive.

18. Spark MLlib: Generate movie recommendations using Spark MLlib.

19. Predict bike rental demand: Build a model to predict bike rental demand given past data.

20. Process the NYSE: Process NYSE (New York Stock Exchange) data using Hive to derive
various insights.
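
The log-parsing step named in Project 15 can be sketched in plain Python before any cluster is involved. The snippet below is an illustration under stated assumptions, not the project's code: it assumes the Apache Common Log Format, and the names `LOG_PATTERN`, `parse_line`, and the sample lines are made up for this example. In a Spark job, a function like `parse_line` would typically be applied to each line of the log file with a `map` followed by a `filter`.

```python
import re
from collections import Counter

# Assumed layout: Apache Common Log Format, e.g.
# host ident user [time] "METHOD path protocol" status size
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return a dict of fields for one access-log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # Apache writes "-" when no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Made-up sample lines for illustration only.
sample_logs = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '127.0.0.1 - - [10/Oct/2000:13:56:01 -0700] "GET /missing.html HTTP/1.0" 404 -',
    'not a log line',
]

records = [r for r in map(parse_line, sample_logs) if r is not None]
status_counts = Counter(r["status"] for r in records)
print(status_counts)
```

Returning `None` for malformed lines keeps the parser tolerant of noise, which matters because real log files usually contain some entries that do not follow the expected format.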
Issimo Technology Private Limited
1665, 27th Main, 19th Cross Rd, Sector 2,
HSR Layout, Bengaluru, Karnataka 560102