Intel AI Project
Tejeshwini R
22BTDS94
“B” Sec
Data-Centric Approach
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Load dataset
df = pd.read_csv("movie_genre_classifier_dataset.csv")  # Ensure the CSV is in your working directory
# Combine title and plot into one feature
df['text'] = df['Title'] + " " + df['Plot']
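# Vectorize the text with TF-IDF (the "Genre" label column name is an assumption)
vectorizer = TfidfVectorizer()
X_vectorized = vectorizer.fit_transform(df['text'])
y = df['Genre']
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X_vectorized, y, test_size=0.2, random_state=42)
# Train the classifier
model = LogisticRegression()
model.fit(X_train, y_train)
# Evaluate accuracy
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Model Accuracy:", accuracy)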
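One caveat with the title/plot concatenation in the listing above: in pandas, adding a string to a missing value yields NaN, so any row lacking a Title or Plot ends up with a missing `text` value, which TfidfVectorizer rejects. A minimal sketch of a guard, assuming the same `Title` and `Plot` column names; the sample rows are made up for illustration and are not taken from the project's dataset:

```python
import pandas as pd

# Hypothetical sample rows; the real data comes from the project's CSV.
df = pd.DataFrame({
    "Title": ["Star Voyage", None, "Love in Paris"],
    "Plot": ["A crew explores deep space.", "A detective hunts a killer.", None],
})

# Replace missing values with empty strings before concatenating,
# then strip the stray space left when one side was empty.
df["text"] = (df["Title"].fillna("") + " " + df["Plot"].fillna("")).str.strip()

print(df["text"].tolist())
```

Whether to keep such rows or drop them entirely is a separate data-quality decision; filling with empty strings simply keeps every row usable.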
Data-Centric Enhancements
# 4a. Remove duplicates (rows repeating the same plot text)
df = df.drop_duplicates(subset=["Plot"])
# Clean the combined text (clean_text must be defined beforehand)
df['text'] = df['text'].apply(clean_text)
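The listing calls `clean_text` without showing its definition. A minimal sketch of what such a helper might look like, assuming lowercasing, punctuation removal, and whitespace collapsing; the cleaning rules actually used in the project are not given, so everything inside the function is an assumption:

```python
import re

def clean_text(text):
    """Hypothetical cleaner: lowercase, drop punctuation, collapse whitespace."""
    text = str(text).lower()
    # Replace anything that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Collapse runs of whitespace into a single space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()

print(clean_text("  A Spaceship crew lands... on an ALIEN planet!  "))
```

The same function should also be applied to user input at prediction time, so that training and prediction text are normalized identically.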
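# Predict genre for a user-supplied plot
user_input = input("Enter movie plot or description: ")
user_vector = vectorizer.transform([user_input])  # vectorizer: the fitted TfidfVectorizer
predicted_genre = model.predict(user_vector)
print("Predicted Genre:", predicted_genre[0])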
RESULT:
Enter movie plot or description: A spaceship crew lands on an alien planet and discovers a hidden danger.
Predicted Genre: Sci-Fi
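At prediction time the user's text has to pass through the same fitted TF-IDF vocabulary as the training data. One way to keep the two steps together is scikit-learn's `Pipeline`, sketched below on a handful of made-up plots and genre labels (not the project's dataset). The helper name `predict_genre` is hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Made-up toy examples for illustration only.
plots = [
    "A spaceship crew explores an alien planet",
    "Astronauts battle robots on a distant space station",
    "Two strangers fall in love during a summer in Paris",
    "A wedding planner falls in love with the groom",
]
genres = ["Sci-Fi", "Sci-Fi", "Romance", "Romance"]

# Bundle vectorizer and classifier so raw text goes in and a label comes out.
genre_pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression()),
])
genre_pipeline.fit(plots, genres)

def predict_genre(description):
    """Return the predicted genre label for one raw text description."""
    return genre_pipeline.predict([description])[0]

print("Predicted Genre:", predict_genre("A crew lands on an alien planet"))
```

With a pipeline, the vectorizer is fitted only on the data passed to `fit`, so there is no separate `user_vector` step to forget or get out of sync.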
COLAB LINK: https://colab.research.google.com/drive/1mbl6sXsu6pGQ_LUq0QGJDFzXA7W1AXrR?usp=sharing
DATASET: