
Practical 2

This document provides a step-by-step guide for building a natural language processing model for sentiment analysis using TensorFlow and the IMDB dataset. It includes installation instructions, Python code for data preprocessing, model creation, training, and evaluation, as well as visualizing training history. The model utilizes an LSTM layer for sequential data processing and is trained for 5 epochs with a batch size of 64.

Uploaded by Tania Jamdar
Copyright: © All Rights Reserved

Practical 2: Building a natural language processing (NLP) model for sentiment analysis or text classification.

This practical walks through a step-by-step Python implementation of a Natural Language Processing (NLP)
model for sentiment analysis using TensorFlow. We use the IMDB dataset for training and testing. The
implementation covers text preprocessing, model creation, training, and evaluation.

Step 1: Install Dependencies

Run the following command in your terminal to install the required libraries:

pip install tensorflow matplotlib
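To confirm that both packages are importable before running the main script, you can use a quick check like the one below. It is a minimal sketch using only the standard library (importlib.util.find_spec). The helper name check_packages is hypothetical and is not part of the original practical.

```python
import importlib.util


def check_packages(names):
    """Return {package_name: True/False} depending on whether it can be imported."""
    # find_spec returns None when a top-level package is not installed
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages(["tensorflow", "matplotlib"])
for name, ok in status.items():
    print(f"{name}: {'installed' if ok else 'missing'}")
```

If either package shows as missing, re-run the pip command in the same Python environment you will use to run the script.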

Step 2: Python Code for Sentiment Analysis

Save the following code in a file named sentiment_analysis.py.

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.datasets import imdb
import matplotlib.pyplot as plt

# Step 1: Load the IMDB dataset
num_words = 10000  # Use the top 10,000 words in the vocabulary
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=num_words)

# Step 2: Explore the dataset
print(f"Number of training samples: {len(x_train)}")
print(f"Number of test samples: {len(x_test)}")
print(f"Sample review (tokenized): {x_train[0]}")
print(f"Label (0 = negative, 1 = positive): {y_train[0]}")

# Step 3: Decode a sample review
# Indices in the data are offset by 3 because 0, 1 and 2 are reserved for
# padding, start-of-sequence and unknown words; reserved indices decode as "?".
word_index = imdb.get_word_index()
reverse_word_index = {value: key for key, value in word_index.items()}
decoded_review = " ".join(reverse_word_index.get(i - 3, "?") for i in x_train[0])
print(f"Decoded review: {decoded_review}")

# Step 4: Pad sequences
maxlen = 200  # Pad or truncate each review to exactly 200 tokens
x_train = pad_sequences(x_train, maxlen=maxlen)
x_test = pad_sequences(x_test, maxlen=maxlen)

# Step 5: Define the model
model = models.Sequential([
    layers.Input(shape=(maxlen,)),  # Each review is a sequence of maxlen word indices
    layers.Embedding(input_dim=num_words, output_dim=32),  # Word index -> 32-d vector
    layers.LSTM(32),  # LSTM layer for capturing sequential dependencies
    layers.Dense(1, activation='sigmoid')  # Output layer for binary classification
])

# Step 6: Compile the model
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Step 7: Display the model architecture
model.summary()

# Step 8: Train the model
history = model.fit(x_train, y_train, epochs=5, batch_size=64, validation_split=0.2)

# Step 9: Evaluate the model
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
print(f"Test Accuracy: {test_acc:.4f}")

# Step 10: Plot training history
plt.figure(figsize=(12, 4))

# Accuracy plot
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.title('Model Accuracy')

# Loss plot
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.title('Model Loss')

plt.tight_layout()
plt.show()
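As a cross-check on Step 7, the parameter counts that model.summary() reports can be reproduced by hand. The sketch below uses the standard formulas for these three Keras layers:

- Embedding: input_dim * output_dim.
- LSTM: four gates, each with an input kernel, a recurrent kernel and one bias vector.
- Dense: weights plus bias.

The helper function names are hypothetical. The totals should match the summary once the model's 200-token input shape is known.

```python
def embedding_params(input_dim, output_dim):
    # One trainable vector of size output_dim per vocabulary entry
    return input_dim * output_dim


def lstm_params(input_dim, units):
    # 4 gates x (input kernel + recurrent kernel + bias)
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # Weight matrix plus one bias per output unit
    return input_dim * units + units


num_words, embed_dim, lstm_units = 10000, 32, 32
counts = {
    "embedding": embedding_params(num_words, embed_dim),
    "lstm": lstm_params(embed_dim, lstm_units),
    "dense": dense_params(lstm_units, 1),
}
total = sum(counts.values())
print(counts, "total:", total)
```

Almost all of the model's weights sit in the embedding table (320,000 of 328,353). That is why num_words has a much larger effect on model size than the number of LSTM units.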

Step 3: Run the Code

1. Save the script as sentiment_analysis.py.

2. Open a terminal (for example, the integrated terminal in VS Code) in the folder that contains the script.

3. Run the script:


python sentiment_analysis.py

How the Code Works

1. Dataset: The IMDB dataset ships with Keras (tf.keras.datasets.imdb) and contains 50,000 movie
reviews labeled as positive or negative, split into 25,000 training and 25,000 test reviews.

2. Preprocessing:

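o The reviews come already tokenized as sequences of integer word indices ranked by word frequency; num_words=10000 keeps only the 10,000 most frequent words.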

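o pad_sequences pads or truncates every sequence to exactly maxlen = 200 tokens. By default both padding and truncation happen at the start of a sequence, so long reviews keep their last 200 tokens.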

3. Model Architecture:

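o Embedding Layer: Maps each word index to a trainable 32-dimensional dense vector.

o LSTM Layer: Processes the embedded sequence token by token with 32 units and passes its final hidden state to the next layer, capturing sequential dependencies in the text.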

o Dense Layer: A single output node with a sigmoid activation for binary classification.

4. Training:

o The model is trained for 5 epochs with a batch size of 64.

o The last 20% of the training data (5,000 of the 25,000 reviews) is held out for validation.

5. Evaluation:

o The test accuracy is printed.

o Training and validation accuracy/loss curves are plotted.
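To make the preprocessing step concrete, here is a pure-Python sketch of what pad_sequences does with its default settings (padding='pre', truncating='pre', value=0). The function name pad_pre is hypothetical. The real Keras function also handles dtype conversion and returns a NumPy array rather than a list of lists.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad short sequences and keep the last maxlen tokens of long ones."""
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            # truncating='pre': drop tokens from the start, keep the end
            seq = seq[len(seq) - maxlen:]
        # padding='pre': fill with `value` on the left up to maxlen
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded


print(pad_pre([[1, 2, 3], [4, 5, 6, 7, 8]], maxlen=4))
```

Pre-padding places each review's real tokens at the end of the sequence. This suits a model that only reads the LSTM's final hidden state.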
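For reference, the LSTM layer updates its state at each time step t using the standard LSTM equations. They are shown here with Keras's default activations: sigmoid (σ) for the gates and tanh for the cell. The symbols are as follows:

- x_t is the 32-dimensional embedding of token t.
- h_t and c_t are the 32-dimensional hidden and cell states (LSTM(32)).
- ⊙ is element-wise multiplication.

Because return_sequences defaults to False, the Dense layer receives only the last hidden state h_T, with T = 200. The Embedding layer here does not set mask_zero, so padding zeros are processed like ordinary tokens.

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$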
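The Dense layer's sigmoid output, the binary_crossentropy loss and the accuracy metric can be illustrated in a few lines of plain Python. This is a sketch of the underlying formulas only, not Keras's implementation. The logits and labels below are made-up illustrative values, and the helper names are hypothetical. Accuracy treats a probability above 0.5 as a positive prediction.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples where (p > threshold) matches the label."""
    correct = sum(int((p > threshold) == bool(y)) for y, p in zip(y_true, y_prob))
    return correct / len(y_true)


logits = [2.0, -1.0, 0.5, -3.0]  # illustrative pre-sigmoid outputs
labels = [1, 0, 0, 0]
probs = [sigmoid(z) for z in logits]
loss = binary_crossentropy(labels, probs)
print(f"loss={loss:.4f}, accuracy={binary_accuracy(labels, probs):.2f}")
```

The third example is misclassified (probability about 0.62 for a negative review), so it contributes most of the loss and lowers accuracy to 0.75.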
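The training settings translate into the following per-epoch workload. This is simple arithmetic assuming the 25,000-review training set and the values passed to model.fit. Keras takes the validation samples from the end of the arrays before shuffling, and a final partial batch still counts as one step.

```python
import math

n_train = 25_000         # size of the IMDB training split
validation_split = 0.2   # as passed to model.fit
batch_size = 64
epochs = 5

n_val = int(n_train * validation_split)               # held-out validation reviews
n_fit = n_train - n_val                               # reviews used for gradient updates
steps_per_epoch = math.ceil(n_fit / batch_size)       # last batch is partial (32 reviews)
total_steps = steps_per_epoch * epochs

print(n_fit, n_val, steps_per_epoch, total_steps)
```

On this basis, the progress bar during training should count up to 313 steps per epoch.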

Dataset

The IMDB loader is built into Keras (bundled with TensorFlow), so no manual download is required:
imdb.load_data() downloads the data on first use and caches it locally. For more information, see
the TensorFlow documentation on IMDB.
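The index convention that Step 3 relies on can be shown with a small round trip. By default, imdb.load_data reserves 0 for padding, 1 for the start marker and 2 for out-of-vocabulary words. It shifts every word's frequency rank by index_from=3, and replaces any index at or above num_words with 2. This is a minimal sketch under those assumptions:

- toy_index is a hypothetical stand-in for imdb.get_word_index().
- Splitting on whitespace is an assumption; the real dataset ships pre-tokenized.
- encode and decode are hypothetical helper names.

```python
# Reserved indices matching imdb.load_data defaults (start_char=1, oov_char=2, index_from=3)
PAD, START, OOV, INDEX_FROM = 0, 1, 2, 3


def encode(text, word_index, num_words):
    """Text -> [START, shifted word ranks...], with unknown or rare words mapped to OOV."""
    ids = [START]
    for word in text.lower().split():  # whitespace tokenization is an assumption
        rank = word_index.get(word)
        idx = rank + INDEX_FROM if rank is not None else OOV
        ids.append(idx if idx < num_words else OOV)
    return ids


def decode(ids, word_index):
    """Inverse mapping, as in Step 3: reserved indices come back as '?'."""
    reverse = {v: k for k, v in word_index.items()}
    return " ".join(reverse.get(i - INDEX_FROM, "?") for i in ids)


toy_index = {"the": 1, "movie": 2, "was": 3, "great": 4}  # hypothetical ranks
ids = encode("The movie was great fun", toy_index, num_words=7)
print(ids, "->", decode(ids, toy_index))
```

Here "great" (shifted index 7) falls outside num_words=7 and "fun" is unknown, so both become OOV. This capping also keeps every index below num_words, which matches Embedding(input_dim=num_words).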
