Lec 01
Lecture - 01
Arpit Rana
2nd January 2025
Course Logistics
Syllabus and Evaluation Scheme
Lab
Thursday, 14:00 – 16:00
[LT-02]
How to Fail
Skip lectures; avoid private study; cram just before the exam;
expect the exam to be a memory test; copy project assignments;
be inactive on the course stream.
Week-2 [6 Jan 2025]
Lecture: Introduction to Neural Networks: Neurons, Activation, Layers, Architecture, and Examples
Lab: No lab (group formation and domain finalization through Google form)
Student Groups
● The course project will be allocated to groups of three or four members. Each group will
also take part in 100-minute ML development challenges.
● 45% of your marks will be based on team efforts, so choose your members wisely.
● Teams will remain unchanged throughout the semester once registered. No requests for
changes will be entertained.
● Team registration will open in the second week after classes start, and you must register
your team via a Google form within three days of the announcement.
● During lab hours, there will be three machine-learning development challenges. The
three most proficient teams shall be acknowledged with a bonus of up to 3% on their
respective scores.
● Every team member must understand the concepts, code, and claims they submit, as
any member may be asked questions about their project.
Course Policy
Course Project
● There is only one course project: an End-to-End ML application.
● Student groups must select a thematic domain: Finance, E-commerce, Healthcare,
Pharma, Sports, Entertainment, Renewable Energy, Oil & Gas, Automobile, Agriculture,
FMCG, Security, Social Media, Supply Chain, or any other exciting and valuable domain.
● Each group will define the problem in their selected domain and collect dataset(s) from
reliable sources, including publicly available ones (no two groups can work on the same
dataset and not more than two groups can work on the same domain). You are
encouraged to gather additional data to enhance your dataset and better address the
problem.
● Each group must develop a multimodal machine-learning application and select a
dataset with all the necessary modalities. You must add a novel contribution to your
project and compare yours with the existing baselines.
● Three progress checks are scheduled to ensure incremental progress, not a last-week
effort.
● The project guideline document will provide information on domain allocation, general
instructions, evaluation criteria, and other protocols.
Submission
● One group member will submit the project report and the code on Google Classroom.
Submission instructions will be provided in the project guideline document.
● Evaluation will primarily be conducted online and will involve a review of your code.
Any group member may be asked questions about anything in the assignment.
● Late submissions (up to 24 hours) will incur a 20% penalty.
● Plagiarism includes:
○ Copying any segment of code from any source.
○ Submitting code not written by you personally.
● Suspected plagiarism will result in a ZERO for the assignment.
Introduction
Definition and Tasks
What is Data (Knowledge) Mining?
Extracting interesting patterns (novel, actionable, useful) from databases, flat files,
data warehouses, and so on, using intelligent methods.
Types of data:
● Textual data, e.g. text, blogs
● Multimedia data, e.g. image, video
● Sequential data, e.g. gene sequence
● Spatial data, e.g. maps
● and so on.
Process: Data → Data Preprocessing → Data Mining → Post-processing → Knowledge
● Data Preprocessing: Cleaning, Reduction, Transformation, Discretization, Selection, etc.
● Post-processing: Evaluation w.r.t. Interestingness, Completeness, Optimality
Descriptive
Find human-interpretable patterns that describe the data.
Predictive
Use some variables to predict future or unknown values of other variables.
Machine Learning is
– Aurelien Geron, Google
[Figure: ML, DL, Gen-AI]
Machine Learning: Example
A Spam Filter,
● a Machine Learning Program, given
○ examples of “spam” emails (e.g. flagged by
users), and
○ examples of “ham” (i.e. regular) emails
● can learn to flag spam
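The spam filter above can be sketched in a few lines. This is only an illustrative, naive-Bayes-style word scorer (it ignores class priors), not a production filter; the function names `tokenize`, `train`, and `is_spam` and the tiny example emails are assumptions made up for this sketch.

```python
from collections import Counter
import math


def tokenize(text):
    # Very crude tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train(spam_examples, ham_examples):
    """Learn a per-word score: positive means 'more typical of spam'."""
    spam_counts = Counter(w for t in spam_examples for w in tokenize(t))
    ham_counts = Counter(w for t in ham_examples for w in tokenize(t))
    vocab = set(spam_counts) | set(ham_counts)
    spam_total = sum(spam_counts.values())
    ham_total = sum(ham_counts.values())
    # Log-odds of each word under spam vs. ham, with add-one smoothing.
    return {
        w: math.log((spam_counts[w] + 1) / (spam_total + len(vocab)))
        - math.log((ham_counts[w] + 1) / (ham_total + len(vocab)))
        for w in vocab
    }


def is_spam(scores, text):
    # Words never seen in training contribute nothing to the total.
    return sum(scores.get(w, 0.0) for w in tokenize(text)) > 0


# Hypothetical training data: emails flagged by users vs. regular emails.
spam = ["win a free prize now", "free money click now"]
ham = ["meeting moved to monday", "see you at lunch"]
scores = train(spam, ham)

print(is_spam(scores, "claim your free prize"))
print(is_spam(scores, "see you on monday"))
```

The program is never told which words indicate spam; it derives the word scores from the labelled examples.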
Machine Learning: A New Programming Paradigm
Traditional Programming (Symbolic AI) → Answers
● A long list of complex (hard-coded) rules
● Keep writing new rules as the new phrases are introduced by spammers
Machine Learning → Rules
● Automatically learns which words or phrases are good predictors of spam
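A minimal sketch of the contrast between the two paradigms, assuming a toy spam setting. The phrase list, the helper names, and the example emails are all hypothetical; the "learning" step here is deliberately crude (words seen only in spam) and stands in for any real learning algorithm.

```python
# Traditional programming: a person writes the rules by hand.
HARD_CODED_SPAM_WORDS = {"free", "winner"}


def rule_based_is_spam(text):
    return any(w in HARD_CODED_SPAM_WORDS for w in text.lower().split())


# Machine learning: the rules (spam words) are derived from labelled data.
def learn_spam_words(labelled_emails):
    """Return the words that occur in spam emails but never in ham emails."""
    spam_words, ham_words = set(), set()
    for text, label in labelled_emails:
        target = spam_words if label == "spam" else ham_words
        target.update(text.lower().split())
    return spam_words - ham_words


def learned_is_spam(spam_words, text):
    return any(w in spam_words for w in text.lower().split())


emails = [
    ("you are the winner", "spam"),
    ("free gift inside", "spam"),
    ("crypto giveaway today", "spam"),  # a newer phrase used by spammers
    ("lunch today at noon", "ham"),
    ("you are invited to the review", "ham"),
]
spam_words = learn_spam_words(emails)
new_email = "huge crypto giveaway"

print(rule_based_is_spam(new_email))           # hand-written rules miss it
print(learned_is_spam(spam_words, new_email))  # learned rules catch it
```

To handle the new phrase, the rule-based filter needs someone to edit `HARD_CODED_SPAM_WORDS`, whereas the learned filter only needs retraining on newly flagged examples.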
Machine Learning: Definition Revisited
Machine Learning is the training of a model from data that generalises a decision against a
performance measure.
Model
Learning = Representation + Evaluation + Optimization
Representation
Choosing a representation of the learner: the hypothesis
space or the model class, i.e. the set of models that it can
possibly learn.
Evaluation
Choosing an evaluation function (also called objective
function, utility function, loss function, or scoring
function), which is needed to distinguish good classifiers from
bad ones.
Optimization
Choosing a method to search among the models in the
hypothesis space for the highest-scoring one.
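The three components can be made concrete with a deliberately tiny learner. This is a sketch under stated assumptions: the hypothesis space is the set of one-dimensional threshold classifiers, the evaluation function is training accuracy, and the optimizer is an exhaustive search over candidate thresholds. The names `make_hypothesis`, `accuracy`, and `fit` and the data are made up for illustration.

```python
# Representation: the hypothesis space is every rule of the form
# "predict 1 if x >= threshold, else 0".
def make_hypothesis(threshold):
    return lambda x: 1 if x >= threshold else 0


# Evaluation: fraction of examples the hypothesis labels correctly.
def accuracy(hypothesis, xs, ys):
    return sum(hypothesis(x) == y for x, y in zip(xs, ys)) / len(xs)


# Optimization: try every observed value as a threshold and keep the
# highest-scoring one (exhaustive search over a finite candidate set).
def fit(xs, ys):
    candidates = sorted(set(xs))
    best = max(candidates, key=lambda t: accuracy(make_hypothesis(t), xs, ys))
    return best, make_hypothesis(best)


# Hypothetical data: x = number of suspicious words, y = 1 for spam.
xs = [0, 1, 2, 5, 6, 8]
ys = [0, 0, 0, 1, 1, 1]
threshold, model = fit(xs, ys)

print(threshold, accuracy(model, xs, ys))
```

Swapping any one component (for example, a linear model for the representation, or gradient descent for the optimizer) gives a different learning algorithm while the overall recipe stays the same.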
Business Case Studies
Fakespot, GoKwik, and Intello Labs
Case Study - I: Fakespot
Problem Identified
● Nearly 93% of consumers read reviews before making a purchasing decision.
● Of these, around 91% of 18–34-year-olds trust reviews as much as a
recommendation from a friend!
● Over 30% of reviews are found to be fake.
Target Audience
All e-commerce businesses that allow users to write
reviews.
Data-driven Solution
Fakespot reports provide an Adjusted Rating that
weights reviews based on authenticity and then
recalculates the rating.
Courtesy: Fakespot
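One way to picture an authenticity-weighted rating is a weighted mean of star ratings. This is only a sketch: the slide does not describe how Fakespot actually computes its Adjusted Rating, so the formula, the `adjusted_rating` function, and the authenticity weights below are assumptions for illustration.

```python
def adjusted_rating(reviews):
    """Weighted mean of star ratings.

    `reviews` is a list of (stars, authenticity) pairs, where authenticity
    is a hypothetical score in [0, 1] (1 = almost certainly genuine).
    """
    total_weight = sum(weight for _, weight in reviews)
    if total_weight == 0:
        return None  # no trustworthy reviews to base a rating on
    return sum(stars * weight for stars, weight in reviews) / total_weight


# Hypothetical product: two suspicious 5-star reviews inflate the raw mean.
reviews = [(5, 0.1), (5, 0.2), (4, 0.9), (2, 1.0), (3, 0.8)]
raw = sum(stars for stars, _ in reviews) / len(reviews)
adjusted = adjusted_rating(reviews)

print(round(raw, 2), round(adjusted, 2))
```

Reviews judged likely to be fake contribute little to the result, so the adjusted value moves toward the ratings given by reviews judged authentic.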
Case Study - II: GoKwik
Problem Identified
● In Indian e-commerce, more than 30% of orders are returned to origin (RTO), i.e. shipped
back to the warehouse.
Target Audience
All e-commerce businesses
Data-driven Solution
● It is mostly cash-on-delivery (CoD) orders that
end up as RTO.
● So, analyze customer behavioural
patterns and disable the CoD option for
customers showing high-risk RTO
behaviour.
Courtesy: GoKwik
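The idea of switching off CoD for high-risk customers can be sketched as a simple rule on top of a risk score. This is not GoKwik's method: the risk score here is just the customer's past RTO rate, and the `rto` field, the function names, and the 0.5 threshold are hypothetical choices for illustration. In practice the score would come from a model trained on behavioural data.

```python
def rto_rate(past_orders):
    """Fraction of a customer's past orders that were returned to origin."""
    if not past_orders:
        return 0.0  # assumption: new customers are treated as low risk
    return sum(1 for order in past_orders if order["rto"]) / len(past_orders)


def payment_options(past_orders, threshold=0.5):
    """Offer CoD only when the customer's RTO risk is below the threshold."""
    options = ["prepaid"]
    if rto_rate(past_orders) < threshold:
        options.append("cod")
    return options


risky_customer = [{"rto": True}, {"rto": True}, {"rto": False}]
reliable_customer = [{"rto": False}, {"rto": False}, {"rto": False}]

print(payment_options(risky_customer))
print(payment_options(reliable_customer))
```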
Case Study - III: Intello Labs
Problem Identified
● One-third of the food produced in the world for human consumption every year gets
lost or wasted.
● Mainly (in some countries) at the early stages of the food value chain.
Target Audience
From growers to packers, from
exporters to food services
Data-driven Solution
Smart, scalable solutions that use AI, ML,
and Computer Vision technology to digitize
food quality, achieve fair pricing, and
reduce food wastage.