Data Science C
Large Language Models
Big Data Engineering
Real-World Projects
● Linear Regression
● Gradient Descent
● Model Evaluation and Metrics
● Polynomial Regression
● Overfitting and Underfitting
● Regularized Linear Models
● Logistic Regression
Classification
6: Decision Trees
● Training and Visualizing a Decision Tree
● Making Predictions
● The CART Training Algorithm
● Hyperparameter Tuning
● Handling Overfitting
Course
Curriculum
7: Ensemble Learning
8: Dimensionality Reduction
9: Reinforcement Learning
2: OpenAI’s ChatGPT
● Introduction to ChatGPT
● Architecture of GPT
● ChatGPT Architecture and Training
3: Vector Databases
● Introduction to Vector Databases
● Architecture of Vector Databases
● Indexing Techniques
● Distance Metrics and Similarity Measures
● Nearest Neighbor Search
● Open-Source Vector Databases: Chroma and Milvus
4: LangChain
● Introduction to LangChain
● The building blocks of LangChain: Prompts, Chains, Retrievers,
Parsers, Memory, and Agents
3: ZooKeeper
● ZooKeeper - Race Condition
● ZooKeeper - Deadlock
● How does election happen - Paxos Algorithm?
● Use cases
● When not to use
4: HDFS
● Why HDFS?
● NameNode & DataNodes
● Advanced HDFS Concepts (HA, Federation)
● Hands-on with HDFS (Upload, Download, SetRep)
● Data Locality (Rack Awareness)
5: YARN
● Why YARN?
● Evolution from MapReduce 1.0
● Resource Management: YARN Architecture
● Advanced Concepts - Speculative Execution
6: MapReduce Basics
● Understanding Sorting
● MapReduce - Overview
● Word Frequency Problem - Without MR
● Only Mapper - Image Resizing
● Temperature Problem
● Multiple Reducer
● Java MapReduce
7: MapReduce Advanced
● Writing MapReduce Code Using Java
● Apache Ant
● Concept - Associative & Commutative
● Combiner
● Hadoop Streaming
● Adv. Problem Solving - Anagrams
● Adv. Problem Solving - Same DNA
● Adv. Problem Solving - Similar DNA
● Joins - Voting
● Limitations of MapReduce
8: Analyzing Data with Pig
● Pig - Introduction
● Pig - Modes
● Example - NYSE Stock Exchange
● Concept - Lazy Evaluation
1. Churn Email Inbox with Python. Churn the mail activity from various individuals in an
open-source project development team.
2. Predicting the median housing prices in California. In this project, we will build a machine
learning model to predict housing prices. We will learn data manipulation, visualization, and
cleaning techniques using Python libraries such as pandas, scikit-learn, and Matplotlib.
3. Noise removal from images. Build a model that takes a noisy image as input and outputs the
clean image.
4. Build a spam classifier. Build a model to classify emails as spam or ham. First, download
examples of spam and ham from Apache SpamAssassin’s public datasets, then train a model to
classify emails.
5. Predict which passengers survived the Titanic shipwreck. The sinking of the RMS Titanic is
one of the most infamous shipwrecks in history. In this project, you will build a model to predict
which passengers survived the tragedy.
6. Predicting Noisy Images using KNN Classifier. We will learn how to predict images from their
noisy versions, using the MNIST dataset. First, we will load the dataset and explore it, and then
we will learn how to introduce noise to an image. Next, we will train a KNN classifier to predict
the original image from its noisy version.
7. Credit Card Fraud Detection using Machine Learning. Learn how to oversample a dataset with
imbalanced classes using the SMOTE technique, and how to use the resulting data to build a
fraudulent-transaction classifier.
8. Building Cat vs Non-Cat Image Classifier using NumPy. Use Python and NumPy to build a
logistic regression classifier from scratch, and apply it to predict the class of an input image:
whether it is a cat or a non-cat.
9. Iris Flowers Classification using Deep Learning and Keras. Use Python and TensorFlow 2 with
Keras to build a dense deep neural network classifier that predicts the classes of flowers in the
Iris dataset.
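Project 8 above builds a logistic regression classifier from scratch. As a rough, dependency-free sketch of the idea, batch gradient descent on the log loss could look like the following. Plain Python lists stand in for NumPy arrays here, and the toy features, learning rate, epoch count, and function names are all assumptions made for illustration, not the course's own code.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, learning_rate=0.1, epochs=500):
    # Batch gradient descent on the mean log loss.
    # learning_rate and epochs are illustrative defaults (assumptions).
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in zip(X, y):
            z = sum(w * x for w, x in zip(weights, features)) + bias
            error = sigmoid(z) - label  # derivative of the log loss w.r.t. z
            for j in range(n_features):
                grad_w[j] += error * features[j]
            grad_b += error
        weights = [w - learning_rate * g / n_samples
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict(X, weights, bias):
    # Label 1 when the predicted probability is at least 0.5.
    return [
        1 if sigmoid(sum(w * x for w, x in zip(weights, f)) + bias) >= 0.5 else 0
        for f in X
    ]


# Hypothetical, mean-centred toy features (two per "image"); 1 = cat, 0 = non-cat.
X_train = [[1.0, 0.8], [0.8, 1.0], [-1.0, -0.8], [-0.8, -1.0]]
y_train = [1, 1, 0, 0]

weights, bias = train_logistic_regression(X_train, y_train)
predictions = predict(X_train, weights, bias)
```

With real images, each row of `X_train` would instead be a flattened, scaled pixel vector, and the inner loops would typically be replaced by vectorized NumPy operations.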
Projects
10. Classify Clothes from Fashion MNIST Dataset. Build a model to classify clothes into various
categories in the Fashion MNIST dataset.
11. Building a personalized chatbot using LangChain. Create an end-to-end personalized chatbot
with the help of LangChain.
12. Creating a text summarization model using LangChain. Create an end-to-end model that,
given a text, summarizes it in a given number of words.
13. Sentiment Analysis. Analyze sentiment about the movie "Iron Man 3" using Hive, and visualize
the sentiment data using BI tools such as Tableau.
14. Spark application. Write an end-to-end Spark application, from writing code on your local
machine to deploying it to the cluster.
15. Parse Apache Access Logs using Spark. The logs of a web server are a gold mine for
gaining insights into user behavior. Learn to parse the text data stored in web server logs
using Apache Spark.
16. Analytics Dashboard. Build a real-time analytics dashboard for an e-commerce company using
Apache Spark, Kafka, Spark Streaming, Node.js, Socket.IO, and Highcharts.
19. Predict bike rental demand. Build a model to predict bike rental demand from past data.
20. Process the NYSE. Process NYSE (New York Stock Exchange) data using Hive to extract various
insights.
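For the access-log project (15) above, the core step is turning each raw log line into structured fields. Here is a minimal sketch of that step in plain Python. It assumes the logs follow the Apache Common Log Format, and the pattern, function name, and sample lines are illustrative assumptions rather than the course's own material.

```python
import re
from collections import Counter

# Assumed layout (Common Log Format):
# host ident user [timestamp] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    # Return a dict of fields, or None if the line does not match the pattern.
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # Apache writes "-" when no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Hypothetical sample lines, including one malformed entry.
sample_lines = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.168.0.7 - - [10/Oct/2000:13:56:01 -0700] "GET /missing.png HTTP/1.0" 404 -',
    'not a log line',
]

records = [r for r in map(parse_line, sample_lines) if r is not None]
status_counts = Counter(r["status"] for r in records)
```

In a Spark job, a function like `parse_line` would typically be applied to every line of an RDD with `map`, with unparseable lines dropped by a `filter`, before aggregating by status code, host, or path.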
Issimo Technology Private Limited
1665, 27th Main, 19th Cross Rd, Sector 2,
HSR Layout, Bengaluru, Karnataka 560102