
INTEL AI PROJECT

Tejeshwini R
22BTDS94
“B” Sec
Data-Centric Approach

Logical planning is about the sequence of operations and steps needed to achieve a goal.
Steps:
1. Data Loading
The movie dataset was loaded from a CSV file.
2. Data Cleaning
We handled missing and overly short descriptions in the description column of the dataset.
3. Text Preprocessing
We converted the text to lowercase, removed URLs and special characters, tokenized it, removed stopwords, and lemmatized the remaining words.
4. Genre Processing
The genre strings were split and standardized into lists.
5. Feature Extraction
We used CountVectorizer for the text and added the description length as an extra feature.
6. Label Binarization
We converted the genre lists into a binary indicator format using MultiLabelBinarizer.
7. Train-Test Split
The data was divided into training and testing sets.
8. Model Training
We trained a RandomForestClassifier on the training data.
9. Evaluation
We generated predictions on the test set and calculated the accuracy.
10. Prediction Function
Given a new description provided by the user, the model can predict its genre.
Algorithm used:
We are using a classification algorithm (multi-label, since a movie can belong to several genres).
Model used:
We have used a Random Forest Classifier. A minimal code sketch of this data-centric pipeline is given below.
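
For reference, here is a minimal, self-contained sketch of the data-centric pipeline described in the steps above. The column names "description" and "genre", the comma-separated genre format, and the simplified preprocessing (no URL removal, stopword filtering via CountVectorizer only, no lemmatization) are assumptions for illustration, not the exact project code.

import pandas as pd
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Data loading and basic cleaning
# (column names 'description' and 'genre' are assumed for illustration)
df = pd.read_csv("movie_genre_classifier_dataset.csv")
df = df.dropna(subset=["description", "genre"])

# Genre processing: split comma-separated genre strings into standardized lists
df["genre_list"] = df["genre"].str.split(",").apply(
    lambda gs: [g.strip().lower() for g in gs])

# Feature extraction: bag-of-words counts plus description length as an extra feature
vectorizer = CountVectorizer(lowercase=True, stop_words="english")
X_text = vectorizer.fit_transform(df["description"])
desc_len = csr_matrix(df["description"].str.len().values.reshape(-1, 1))
X = hstack([X_text, desc_len]).tocsr()

# Label binarization: genre lists -> binary indicator matrix
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(df["genre_list"])

# Train-test split, model training, and evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))

RandomForestClassifier handles the multi-label indicator matrix directly, which is why no extra wrapper is needed around the model.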
Model-Centric Approach

Python Code (Model-Centric Movie Genre Classifier)

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load dataset (ensure the CSV is in your working directory)
df = pd.read_csv("movie_genre_classifier_dataset.csv")

# Combine title and plot into one feature
df['text'] = df['Title'] + " " + df['Plot']

# Features and labels
X = df['text']
y = df['Genre']

# Vectorize text using TF-IDF
vectorizer = TfidfVectorizer()
X_vectorized = vectorizer.fit_transform(X)

# Split into training and testing data
X_train, X_test, y_train, y_test = train_test_split(X_vectorized, y, test_size=0.2,
                                                    random_state=42)

# Train Logistic Regression model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print("Model Accuracy:", accuracy)

Train the Model & Enable Prediction from User Input


import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load dataset
df = pd.read_csv("movie_genre_classifier_dataset.csv")

# Combine title and plot into one feature
df['text'] = df['Title'] + " " + df['Plot']

# Features and labels
X = df['text']
y = df['Genre']

# Vectorize text using TF-IDF
vectorizer = TfidfVectorizer()
X_vectorized = vectorizer.fit_transform(X)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X_vectorized, y, test_size=0.2,
                                                    random_state=42)

# Train the model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate accuracy
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Model Accuracy:", accuracy)

Data-Centric Enhancements
import re  # needed for the regex used in clean_text below

# Note: these enhancement steps assume the dataframe has 'movie_name',
# 'description', and 'genre' columns.

# 4a. Remove duplicates
df.drop_duplicates(subset=["description"], inplace=True)

# 4b. Drop rows with missing description or genre
df.dropna(subset=["description", "genre"], inplace=True)

# 4c. Normalize genre labels
df['genre'] = df['genre'].str.strip().str.lower()

# 4d. Combine movie name and description into one text field
df['text'] = df['movie_name'].astype(str) + " " + df['description'].astype(str)

# 4e. Preprocess text: lowercase and remove non-alphanumeric characters
def clean_text(text):
    return re.sub(r"[^a-zA-Z0-9\s]", "", text.lower())

df['text'] = df['text'].apply(clean_text)
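
Because these enhancements rewrite df['text'] after the earlier model was already fitted, the vectorizer and model should be re-fitted on the cleaned text before predicting from user input. A minimal sketch follows, assuming 'genre' is the label column as in the enhancement steps above; it reuses the imports loaded at the top of the script.

# Re-fit the TF-IDF vectorizer and Logistic Regression model on the cleaned text
# ('genre' is assumed to be the label column, matching the enhancement steps above)
vectorizer = TfidfVectorizer()
X_clean = vectorizer.fit_transform(df['text'])
y_clean = df['genre']

X_train, X_test, y_train, y_test = train_test_split(X_clean, y_clean,
                                                    test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print("Accuracy after cleaning:", accuracy_score(y_test, model.predict(X_test)))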

Predict Genre from User Input

# User input: movie description
user_input = input("Enter movie plot or description: ")

# Vectorize the input
user_vector = vectorizer.transform([user_input])

# Predict genre
predicted_genre = model.predict(user_vector)
print("Predicted Genre:", predicted_genre[0])

RESULT:
Enter movie plot or description: A spaceship crew lands on an alien planet and discovers a hidden danger.
Predicted Genre: Sci-Fi

COLAB LINK: https://colab.research.google.com/drive/1mbl6sXsu6pGQ_LUq0QGJDFzXA7W1AXrR?usp=sharing

DATASET:

Aspect       | Model-Centric AI                          | Data-Centric AI
Focus        | Improving the model                       | Improving the quality of the data
Techniques   | Try different ML models, hyperparameters  | Clean descriptions, correct genre labels
Your Example | Logistic Regression + TF-IDF              | User input genre prediction

Algorithm: Logistic Regression
Model: Machine Learning (Text Classification)
Planning Type: Logical Planning
Steps in Planning: Preprocessing, Vectorization, Training, Evaluation, Prediction
