Datascience

This document presents two case studies: one on predicting malicious URLs using machine learning and another on building a movie recommender system within a MySQL database. The first study achieves 97% accuracy in detecting malicious sites by utilizing sparse data representation and online learning, while the second study employs Locality-Sensitive Hashing and Hamming Distance to recommend movies based on rental history efficiently. Additionally, the document outlines various deep learning algorithms, including CNNs, RNNs, and GANs, highlighting their applications in different domains.

Uploaded by

akshiva005

By Rajeshwari S
2022IT35
III B.Sc. IT

Case study 1: Predicting Malicious URLs

1. Summary :

The internet is widely used for various purposes, but some websites are malicious and pose
security threats. This case study focuses on predicting malicious URLs using machine learning
while handling large datasets efficiently.

2. Explanation :

1. Defining the Research Goal

• The goal is to determine whether a URL is safe or malicious, using a large dataset while staying within memory limits.

2. Acquiring the URL Data

• The dataset is downloaded in SVMLight format from a research project.
• Each record has about 3.2 million features and is labeled +1 (malicious) or -1 (safe).
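To make the SVMLight format concrete, here is a minimal sketch of parsing one record into a sparse dictionary. The sample line and the helper name `parse_svmlight_line` are made up for illustration; a real project would more likely use a library loader.

```python
def parse_svmlight_line(line):
    """Parse one 'label index:value ...' record into (label, {index: value})."""
    line = line.split("#", 1)[0].strip()  # drop an optional trailing comment
    parts = line.split()
    label = int(float(parts[0]))
    features = {}
    for token in parts[1:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)  # only listed (non-zero) features are kept
    return label, features


# Hypothetical record: label -1 with three non-zero features.
label, features = parse_svmlight_line("-1 4:0.5 17:1 3231961:0.25")
```

Only the listed features are stored, so a record with millions of possible features still takes a few dictionary entries.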

3. Handling Memory Constraints

• Problem: A single file is too large to fit in memory, causing an out-of-memory error.
• Solution:
   - Use a sparse representation (store only non-zero values).
   - Process compressed files instead of uncompressed data.
   - Use an online learning algorithm that processes data in smaller chunks.
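The second and third points can be sketched together: read a gzip-compressed stream directly and hand it on in small chunks. This is only an illustration; the helper name `iter_chunks`, the chunk size, and the in-memory toy data are assumptions, not part of the case study.

```python
import gzip
import io


def iter_chunks(binary_stream, chunk_size):
    """Yield lists of at most chunk_size text lines from a gzip-compressed stream."""
    with gzip.open(binary_stream, mode="rt", encoding="utf-8") as handle:
        chunk = []
        for line in handle:
            chunk.append(line.rstrip("\n"))
            if len(chunk) == chunk_size:
                yield chunk
                chunk = []
        if chunk:  # emit the final, possibly shorter, chunk
            yield chunk


# Toy data: five made-up records, compressed in memory instead of on disk.
raw = "\n".join(f"1 {i}:1" for i in range(1, 6)).encode("utf-8")
compressed = io.BytesIO(gzip.compress(raw))
chunks = list(iter_chunks(compressed, chunk_size=2))
```

Because the generator holds one chunk at a time, memory use depends on the chunk size rather than on the file size.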

4. Data Exploration

• Checking the dataset confirms that most values are zeros (sparse data).
• Storing only non-zero values saves memory.
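A small sketch of that check, on toy rows rather than the real dataset: convert dense rows to a sparse form and measure what fraction of cells is non-zero. The function names and numbers are illustrative only.

```python
def to_sparse(dense_row):
    """Keep only the non-zero entries of a dense row as {index: value}."""
    return {i: v for i, v in enumerate(dense_row) if v != 0}


def density(sparse_rows, n_features):
    """Fraction of cells that hold a non-zero value."""
    nonzero = sum(len(row) for row in sparse_rows)
    return nonzero / (len(sparse_rows) * n_features)


# Toy data: 3 rows x 20 features, with only 3 non-zero cells in total.
dense = [[0] * 20 for _ in range(3)]
dense[0][4], dense[0][17], dense[1][2] = 0.5, 1.0, 1.0
sparse = [to_sparse(row) for row in dense]
d = density(sparse, n_features=20)
```

Here the sparse form stores 3 values where the dense form stores 60.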

5. Model Building

• A Stochastic Gradient Descent (SGD) classifier is used.
• Instead of loading the entire dataset, files are read one by one, and the model is updated using partial fitting.
• Results:
   - 97% accuracy in detecting malicious sites.
   - Only 3% false negatives and 6% false positives.
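The idea of partial fitting can be sketched with a hand-written stand-in. This is not the classifier used in the case study: the class `OnlineLogisticSGD`, the logistic loss, the learning rate, and the toy chunks are all assumptions chosen to keep the example self-contained.

```python
import math


class OnlineLogisticSGD:
    """Logistic regression trained by SGD on sparse {index: value} rows."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        self.weights = {}  # weights exist only for features seen so far
        self.bias = 0.0

    def _score(self, row):
        return self.bias + sum(self.weights.get(i, 0.0) * v for i, v in row.items())

    def partial_fit(self, rows, labels):
        """Update the model from one chunk; labels are +1 or -1."""
        for row, label in zip(rows, labels):
            target = 1.0 if label == 1 else 0.0
            z = max(-30.0, min(30.0, self._score(row)))  # clamp to avoid overflow
            error = 1.0 / (1.0 + math.exp(-z)) - target
            self.bias -= self.learning_rate * error
            for i, v in row.items():
                self.weights[i] = self.weights.get(i, 0.0) - self.learning_rate * error * v

    def predict(self, row):
        return 1 if self._score(row) > 0 else -1


# Toy chunks: feature 1 marks the +1 class, feature 2 marks the -1 class.
chunks = [
    ([{1: 1.0}, {2: 1.0}], [1, -1]),
    ([{1: 1.0, 3: 1.0}, {2: 1.0, 3: 1.0}], [1, -1]),
]
model = OnlineLogisticSGD()
for _ in range(20):  # several passes over the chunks
    for rows, labels in chunks:
        model.partial_fit(rows, labels)
```

The model never sees more than one chunk at a time, which is what keeps memory use bounded.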

Conclusion:

By using sparse representation, compressed data, and online learning, the model efficiently
classifies URLs without exceeding memory limits.

3. Flowchart:

[Flowchart]

Case study 2: Building a recommender system inside a database

1. Summary:
This case study explains how to create a recommender system that suggests movies to
customers based on their rental history. The system uses a MySQL database and Python to
process large datasets efficiently. It applies Locality-Sensitive Hashing (LSH) and Hamming
Distance techniques to find customers with similar preferences and recommend movies they
haven't seen yet. The goal is to make the system memory-friendly and optimize data processing
inside the database itself.

2. Explanation :

1. Research Question

• The task is to recommend movies to customers based on their previous rentals. The manager asks whether movies can be suggested by analyzing the rental history stored in a MySQL database.

2. Tools and Techniques Needed

• MySQL Database – Stores customer rental data.
• Python Libraries – MySQLdb, SQLAlchemy, and Pandas for connecting to the database and manipulating data.
• Hash Functions – Group similar customers into buckets.
• Hamming Distance – Measures the similarity between customers' rental patterns.

3. Data Preparation

• The dataset shows which movies each customer has rented (1 for rented, 0 for not rented).
• The data is stored in MySQL using Python's Pandas library.
• Binary rental data is compressed into bit strings for faster processing.
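The compression step can be sketched in plain Python: a row of 0/1 rental flags is packed into a single integer whose bits are the flags. The helper name `pack_bits` and the eight-movie row are made up for illustration.

```python
def pack_bits(flags):
    """Pack a sequence of 0/1 flags into one integer, first flag as the highest bit."""
    value = 0
    for flag in flags:
        value = (value << 1) | (1 if flag else 0)
    return value


# Hypothetical customer: rented movies 0, 2, 3 and 7 out of eight.
rentals = [1, 0, 1, 1, 0, 0, 0, 1]
packed = pack_bits(rentals)
bit_string = format(packed, "08b")
```

Eight separate columns become one small number, which is cheaper to store and compare.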

4. Hash Functions

• Three hash functions each select a group of three movies.
• Customers with the same movie combinations are placed in the same bucket.
• This reduces the amount of data that must be compared directly.
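A minimal sketch of the bucketing idea follows. The movie positions in `HASH_COLUMNS`, the customer data, and the rule that candidates share at least one bucket are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical choice: each hash function reads three movie positions.
HASH_COLUMNS = [(0, 1, 2), (3, 4, 5), (6, 7, 8)]


def bucket_keys(rentals):
    """Return one bucket key per hash function by joining the selected flags."""
    return tuple("".join(str(rentals[c]) for c in cols) for cols in HASH_COLUMNS)


customers = {
    "A": [1, 0, 1, 0, 0, 1, 1, 1, 0],
    "B": [1, 0, 1, 0, 0, 1, 0, 1, 0],
    "C": [0, 1, 0, 1, 1, 0, 0, 0, 1],
}

# Index customers by (hash function number, key).
buckets = defaultdict(set)
for name, rentals in customers.items():
    for h, key in enumerate(bucket_keys(rentals)):
        buckets[(h, key)].add(name)


def candidates(name):
    """Customers sharing at least one bucket with the given customer."""
    shared = set()
    for h, key in enumerate(bucket_keys(customers[name])):
        shared |= buckets[(h, key)]
    shared.discard(name)
    return shared
```

Only customers returned by `candidates` need a full comparison; everyone else is skipped.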

5. Model Building

• The Hamming distance measures how similar two customers are by counting the positions where their rental patterns differ.
• The system first selects customers from the same bucket and then compares them using the distance function.
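On bit-packed rental patterns, the Hamming distance is the number of 1 bits in the XOR of the two values. A short sketch with made-up patterns:

```python
def hamming_distance(a, b):
    """Count the bit positions where two non-negative integers differ."""
    return bin(a ^ b).count("1")


# Two hypothetical customers encoded as 8-bit rental patterns.
customer_a = 0b10110001
customer_b = 0b10010011
distance = hamming_distance(customer_a, customer_b)
```

A smaller distance means more similar rental histories; a distance of 0 means identical patterns.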

6. Recommendations

• The system recommends movies that similar customers have watched but the target customer hasn't.
• This process is automatic and memory-friendly.
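The recommendation rule can be sketched as follows. Ranking by how many similar customers rented a movie is an assumption added for the example, as are the titles and data.

```python
from collections import Counter


def recommend(target, neighbours, titles):
    """List movies rented by neighbours but not by the target, most common first."""
    counts = Counter()
    for rentals in neighbours:
        for i, rented in enumerate(rentals):
            if rented and not target[i]:
                counts[titles[i]] += 1
    return [title for title, _ in counts.most_common()]


titles = ["M0", "M1", "M2", "M3"]          # hypothetical movie titles
target = [1, 0, 0, 0]                      # target customer rented only M0
neighbours = [[1, 1, 0, 1], [1, 0, 0, 1]]  # similar customers found earlier
suggestions = recommend(target, neighbours, titles)
```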

Conclusion:

This case study shows how to build a recommender system inside a relational database using
hashing techniques and distance measures. The system is fast, memory-efficient, and suitable
for large datasets.

3. Flowchart:

[Flowchart]

Deep Learning Algorithms

1. Convolutional Neural Networks (CNN)

Used in image processing and computer vision tasks like image classification and object
detection. It extracts spatial features using convolution layers.
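As a rough illustration of what a convolution layer computes, the sketch below slides a small kernel over a toy image without padding. The image and the edge-detecting kernel are made up, and real layers learn their kernel values during training.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding) and sum the elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [[-1, 1]]  # responds where brightness increases from left to right
feature_map = conv2d_valid(image, kernel)
```

The output is non-zero only at the column where the edge sits, which is the kind of spatial feature the layer extracts.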

2. Recurrent Neural Networks (RNN)

Designed for sequential data processing, commonly used in speech recognition and language
modeling due to its ability to retain past information.
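A one-unit sketch of the recurrence shows how earlier inputs leave a trace in the hidden state. The weights here are fixed, made-up values; a trained network would learn them.

```python
import math


def rnn_forward(inputs, w_x=1.0, w_h=0.5, b=0.0):
    """Run a single-unit RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)."""
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


silent = rnn_forward([0, 0, 0])
pulsed = rnn_forward([1, 0, 0])  # a single early input
```

The last two inputs are identical in both sequences, yet the final state of `pulsed` is still positive because the first input is carried forward through the hidden state.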

3. Long Short-Term Memory (LSTM)

A type of RNN that uses gated memory cells to mitigate the vanishing gradient problem, making it effective for time-series forecasting, chatbots, and text generation.

4. Gated Recurrent Unit (GRU)

A simplified version of LSTM with fewer parameters, used for text processing and sequential
data applications like speech recognition.

5. Transformer

A deep learning model that relies on attention mechanisms, widely used in NLP tasks like machine translation. It is the architecture behind models such as GPT and BERT.

6. Generative Adversarial Networks (GAN)

Consists of a generator and discriminator competing to create realistic data, applied in deepfake
generation and image synthesis.

7. Autoencoders

Used for data compression, anomaly detection, and noise reduction by encoding input data
into a lower-dimensional representation and reconstructing it.

8. Deep Belief Networks (DBN)

A stack of Restricted Boltzmann Machines used for feature learning, image recognition, and
dimensionality reduction.

9. Restricted Boltzmann Machines (RBM)

A two-layer neural network primarily used for collaborative filtering, dimensionality
reduction, and feature learning.

10. Self-Organizing Maps (SOM)

An unsupervised learning algorithm that maps high-dimensional data to a lower-dimensional space, useful for clustering and pattern recognition.

11. Capsule Networks (CapsNet)

An alternative to CNNs that captures spatial hierarchies between features, with the aim of improving image classification and object detection.

12. Deep Q-Networks (DQN)

A reinforcement learning algorithm that combines deep learning with Q-learning, used in game
playing and autonomous decision-making.
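The Q-learning update that a DQN builds on can be sketched with a small table; a DQN replaces the table with a neural network that estimates the same values. The states, actions, and numbers below are made up for illustration.

```python
def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply Q(s,a) += alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


actions = ["left", "right"]
q = {
    ("s0", "left"): 0.0, ("s0", "right"): 0.0,
    ("s1", "left"): 1.0, ("s1", "right"): 2.0,
}
new_value = q_update(q, "s0", "right", reward=1.0, next_state="s1", actions=actions)
```

With these numbers the target is 1.0 + 0.9 * 2.0 = 2.8, and the estimate moves halfway from 0.0 toward it.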

13. Variational Autoencoders (VAE)

A probabilistic model for generating new data similar to the training set, used in image
generation and data augmentation.

14. Spiking Neural Networks (SNN)

A biologically inspired neural network whose neurons communicate through discrete spikes over time, which can make computation energy-efficient; applied in neuromorphic computing.

15. Attention Mechanism

Enhances deep learning models by focusing on important input parts, crucial for NLP, speech
recognition, and computer vision.
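One common form, scaled dot-product attention, can be sketched in plain Python. The query, keys, and values are toy vectors chosen for illustration; real models compute them from learned projections.

```python
import math


def softmax(scores):
    """Turn scores into positive weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Weight each value by how well its key matches the query."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, output


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
weights, output = attention(query, keys, values)
```

The first key matches the query, so the first value receives the larger weight and dominates the output.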

THANK YOU
