Deep Learning Record


322103383011 B. Gayathri

WEEK 1
AIM: Build a deep neural network model, starting with linear regression with a single
variable.
DESCRIPTION:
Linear regression with a single variable is a statistical method used to model the
relationship between one independent variable x and a dependent variable y by
fitting a straight line to the data.
1. Model Equation
The mathematical equation for a simple linear regression model is:
𝑦 = 𝑚𝑥 + 𝑐 + 𝜖
Where:
• y: Dependent variable (output).
• x: Independent variable (input).
• m: Slope of the regression line (change in y for a unit change in x).
• c: Intercept (value of y when x=0).
• ϵ: Error term accounting for the deviation of actual data points from the
regression line.
2. Objective
The goal of linear regression is to minimize the error between the actual values
(yᵢ) and the predicted values (ŷᵢ). The predicted value is given by:
ŷ = m·x + c
3. Error Measurement
The most commonly used error metric is the Residual Sum of Squares (RSS),
defined as:
RSS = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² = Σᵢ₌₁ⁿ (yᵢ − (m·xᵢ + c))²

Where:
• yᵢ: Actual value of the dependent variable for the i-th data point.
• ŷᵢ = m·xᵢ + c: Predicted value for the i-th data point.


• n: Number of data points.


4. Optimization
To find the best-fit line, we solve for m and c by minimizing RSS. The solution
can be derived using calculus:
Slope (m):
m = (n·Σxᵢyᵢ − Σxᵢ·Σyᵢ) / (n·Σxᵢ² − (Σxᵢ)²)
Intercept (c):
c = (Σyᵢ − m·Σxᵢ) / n
Where:
• Σxᵢ: Sum of all x values.
• Σyᵢ: Sum of all y values.
• Σxᵢyᵢ: Sum of the products of the x and y values.
• Σxᵢ²: Sum of the squares of the x values.
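As an illustration separate from the Keras code below, a minimal NumPy sketch of these closed-form estimates (the toy arrays are placeholders):

import numpy as np

# Hypothetical toy data; any two equal-length 1-D arrays work here.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([5.2, 7.1, 8.9, 11.2, 12.8])

n = len(x)
# Closed-form least-squares estimates from the formulas above.
m = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x ** 2) - np.sum(x) ** 2)
c = (np.sum(y) - m * np.sum(x)) / n

y_hat = m * x + c
rss = np.sum((y - y_hat) ** 2)                # Residual Sum of Squares
r2 = 1 - rss / np.sum((y - np.mean(y)) ** 2)  # R² mentioned in the Interpretation section
print(f"m = {m:.3f}, c = {c:.3f}, RSS = {rss:.3f}, R^2 = {r2:.3f}")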
5. Interpretation
The resulting regression line ŷ = m·x + c:
• Predicts y for a given x.
• Explains the linear relationship between x and y.
• Helps in determining the strength of this relationship using metrics like
R², which measures the proportion of variance in y explained by x.

CODE:
(Randomly Generated Data)
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
np.random.seed(42)
X = np.linspace(0, 10, 100)

y = 2 * X + 3 + np.random.normal(0, 1, size=X.shape)
plt.scatter(X, y, label="Data")
plt.xlabel("Input (X)")
plt.ylabel("Target (y)")
plt.title("Sample Data")
plt.legend()
plt.show()
from tensorflow.keras import Input
from tensorflow.keras.layers import Dense
linear_model = tf.keras.Sequential([
    Input(shape=(1,)),
    Dense(1)
])
linear_model.compile(optimizer='sgd', loss='mse')
history_linear = linear_model.fit(X, y, epochs=100, verbose=0)
plt.plot(history_linear.history['loss'], label="Linear Model Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Loss Curve (Linear Regression)")
plt.legend()
plt.show()
y_pred_linear = linear_model.predict(X)
plt.scatter(X, y, label="Data")
plt.plot(X, y_pred_linear, color='red', label="Linear Regression Prediction")
plt.xlabel("Input (X)")
plt.ylabel("Target (y)")
plt.title("Linear Regression Model")


plt.legend()
plt.show()
dnn_model = tf.keras.Sequential([
    Dense(10, activation='relu', input_shape=(1,)),
    Dense(10, activation='relu'),
    Dense(1)
])
dnn_model.compile(optimizer='adam', loss='mse')
history_dnn = dnn_model.fit(X, y, epochs=100, verbose=0)
plt.plot(history_dnn.history['loss'], label="DNN Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Loss Curve (DNN)")
plt.legend()
plt.show()
y_pred_dnn = dnn_model.predict(X)
plt.scatter(X, y, label="Data")
plt.plot(X, y_pred_linear, color='red', label="Linear Regression Prediction")
plt.plot(X, y_pred_dnn, color='green', label="DNN Prediction")
plt.xlabel("Input (X)")
plt.ylabel("Target (y)")
plt.title("Linear Regression vs DNN")
plt.legend()
plt.show()
OUTPUT:

(Screenshots of plots: sample data scatter, linear-model loss curve, linear regression fit, DNN loss curve, and linear regression vs. DNN comparison.)

CODE:
(Data from a CSV):
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
data = pd.read_csv('data1.csv')
X = data[['X']].values
y = data[['y']].values
scaler_X = MinMaxScaler()
scaler_y = MinMaxScaler()
X = scaler_X.fit_transform(X)
y = scaler_y.fit_transform(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(units=1, input_shape=[1])
])
model.compile(optimizer='adam', loss='mse')
model.fit(X_train, y_train, epochs=100, verbose=0)
loss = model.evaluate(X_test, y_test)
weights, bias = model.layers[0].get_weights()
print("Weights:", weights, "Bias:", bias)
y_pred = model.predict(X_test)
print("\nPredicted House Prices:")
print(y_pred)
print(f"Test Loss: {loss}")

plt.plot(y_test, color='green', label='Actual Values', linewidth=2)


plt.plot(y_pred, color='blue', label='Predicted Values', linestyle='--', linewidth=2)
plt.title('Actual vs Predicted Values')
plt.xlabel('Samples')
plt.ylabel('Values')
plt.legend()
plt.show()

OUTPUT:

Predicted House Prices:


[[0.09018803]
[0.18801826]
[0.37478504]
[0.11242217]]
Test Loss: 0.03589074686


WEEK 2
AIM: Build a deep neural network model, starting with linear regression with
multiple variables.
DESCRIPTION:
Linear regression with multiple variables (also called multiple linear regression)
models the relationship between one dependent variable y and multiple
independent variables x₁, x₂, …, xₚ. The goal is to fit a
hyperplane to the data that minimizes the prediction error.
1. Model Equation:
The general equation for multiple linear regression is:
y = β₀ + β₁x₁ + β₂x₂ + ⋯ + βₚxₚ + ϵ
Where:
• y: Dependent variable (output).
• x₁, x₂, …, xₚ: Independent variables (inputs or predictors).
• β₀: Intercept (value of y when all xᵢ = 0).
• β₁, β₂, …, βₚ: Coefficients (weights representing the effect of each
independent variable on y).
• ϵ: Error term accounting for the deviation of actual values from the
predicted values.
2. Matrix Representation
Using matrix notation, the equation can be written as:
y = Xβ + ϵ
Where:
• y: n × 1 vector of dependent variable values.
• X: n × (p+1) matrix of independent variable values, where the
first column is all ones to account for the intercept β₀.
• β: (p+1) × 1 vector of coefficients (β₀, β₁, …, βₚ).
• ϵ: n × 1 vector of error terms.
3. Objective


The objective is to minimize the sum of squared residuals (SSR):


SSR = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²

4. Optimization (Normal Equation)


The coefficients β that minimize SSR can be calculated using:
β = (XᵀX)⁻¹ Xᵀy
Here:
• XᵀX: (p+1) × (p+1) matrix.
• (XᵀX)⁻¹: Inverse of the matrix XᵀX.
• Xᵀy: (p+1) × 1 vector.
5. Interpretation of Coefficients
• β₀: Expected value of y when all xᵢ = 0.
• βⱼ: The expected change in y for a one-unit increase in xⱼ, holding all other
variables constant.
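As an illustration separate from the Keras code below, a minimal NumPy sketch of the normal equation on randomly generated data (the data-generating line mirrors this week's code and is only a placeholder):

import numpy as np

rng = np.random.default_rng(42)
n, p = 100, 3
X = rng.random((n, p))                                   # hypothetical predictors
y = 2 * X[:, 0] + 3 * X[:, 1] + 4 * X[:, 2] + rng.normal(0, 1, size=n)

# Prepend a column of ones so that beta[0] plays the role of the intercept β₀.
X_design = np.hstack([np.ones((n, 1)), X])

# Normal equation: beta = (XᵀX)⁻¹ Xᵀy
beta = np.linalg.inv(X_design.T @ X_design) @ X_design.T @ y
print("Intercept:", beta[0])
print("Coefficients:", beta[1:])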
CODE:
(Using randomly generated data):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow.keras import Input
from tensorflow.keras.layers import Dense
from sklearn.metrics import mean_squared_error,r2_score
from tensorflow.keras.models import Sequential
np.random.seed(42)
samples=100


features=3
X=np.random.rand(samples,features)
y=2*X[:,0]+3*X[:,1]+4*X[:,2]+np.random.normal(0,1,size=samples)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = Sequential()
model.add(Dense(64, input_dim=X_train.shape[1], activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(X_train,y_train,epochs=100,batch_size=10,verbose=0)
y_pred=model.predict(X_test)
mse=mean_squared_error(y_test,y_pred)
print(f"Mean Squared Error:{mse}")
r2=r2_score(y_test,y_pred)
print(f"R2 Score:{r2}")
sorted_indices = np.argsort(y_test)
y_test_sorted = y_test[sorted_indices]
y_pred_sorted = y_pred.flatten()[sorted_indices]
plt.plot(y_test_sorted, color='green', label='Actual Values', linewidth=2)
plt.plot(y_pred_sorted, color='blue', label='Predicted Values', linestyle='--',
linewidth=2)
plt.title('Actual vs Predicted Values')
plt.xlabel('Samples')
plt.ylabel('Values')
plt.legend()
plt.show()


OUTPUT:

Mean Squared Error:2.011931696635534


R2 Score:0.5736873030241454

CODE:
(Data from a CSV):
import pandas as pd
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense


data = pd.read_csv('house_prices.csv')
X = data[['SquareFootage', 'NumRooms', 'AgeOfHouse']].values
y = data['Price'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)
model = Sequential()
model.add(Dense(64, input_dim=X_train.shape[1], activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error')
history = model.fit(X_train, y_train, epochs=50, batch_size=10, verbose=0,
validation_data=(X_test, y_test))
y_pred = model.predict(X_test)
print("\nPredicted House Prices:")
print(y_pred)
loss = model.evaluate(X_test, y_test)
print(f"Test Loss: {loss}")
plt.plot(y_test, color='green', label='Actual Values', linewidth=2)
plt.plot(y_pred, color='blue', label='Predicted Values', linestyle='--', linewidth=2)
plt.title('Actual vs Predicted Values')
plt.xlabel('Samples')
plt.ylabel('Values')
plt.legend()
plt.show()
OUTPUT:
Predicted House Prices:


[[4636.989 ]
[4436.9272]
[5037.114 ]
[4029.194 ]
[4634.084 ]
[3227.5212]]
Test Loss: 1470.2730712890625


WEEK 3
AIM: Write a program to convert speech into text.
DESCRIPTION:
Converting speech to text involves transforming spoken language into written
text using techniques in audio processing and natural language processing (NLP).
Here’s how it generally works:
1. Audio Input
• A microphone or recording device captures the speech.
• The audio signal is usually stored in a digital format, such as WAV or MP3.
2. Preprocessing
• The audio is cleaned to remove background noise and enhance the clarity
of speech.
• The signal is segmented into smaller frames (e.g., 20-40 milliseconds) to
capture the characteristics of speech over time.
3. Feature Extraction
• Features like Mel-frequency cepstral coefficients (MFCCs), spectrograms,
or log-Mel features are extracted from the audio. These features represent
the sound in a format suitable for machine learning models (a short extraction sketch is given after this list).
4. Acoustic Model
• The acoustic model maps the audio features to phonemes (basic units of
sound in a language). This model is trained on a large dataset of audio and
corresponding phonemes.
5. Language Model
• The language model predicts the sequence of words based on the phonemes
and the context. It uses linguistic rules and probabilities to ensure the
output is coherent and grammatically correct.
6. Decoding
• A decoding algorithm combines the acoustic and language models to
generate the most probable text representation of the speech.
• Techniques like beam search or Viterbi decoding are often used to find the
best match.


7. Postprocessing
• The raw text is refined for accuracy, punctuation, and formatting.
• Named entities (e.g., names, places) and domain-specific terms are
corrected or added.
8. Output
• The final text is displayed to the user, saved to a file, or sent to another
application for further processing.
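To make the feature-extraction step above concrete, a minimal sketch using librosa (librosa is not used in the record's code below; the file name and parameters are placeholders):

import librosa

# Hypothetical audio file; any mono speech recording works here.
signal, sample_rate = librosa.load("speech_sample.wav", sr=16000)

# 13 MFCCs per analysis frame; the result has shape (n_mfcc, n_frames).
mfccs = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=13)
print("MFCC matrix shape:", mfccs.shape)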
CODE:
From audio file to text:
# importing libraries
import speech_recognition as sr
import os
from pydub import AudioSegment
from pydub.silence import split_on_silence
r = sr.Recognizer()

def transcribe_audio(path):
    # use the audio file as the audio source
    with sr.AudioFile(path) as source:
        audio_listened = r.record(source)
        # try converting it to text
        text = r.recognize_google(audio_listened)
    return text

def get_large_audio_transcription_on_silence(path):
    """Splitting the large audio file into chunks
    and applying speech recognition on each of these chunks"""
    sound = AudioSegment.from_file(path)
    chunks = split_on_silence(sound,
        # experiment with this value for your target audio file
        min_silence_len=500,
        # adjust this per requirement
        silence_thresh=sound.dBFS - 14,
        # keep 500 ms of silence at the chunk boundaries, adjustable as well
        keep_silence=500,
    )
    folder_name = "audio-chunks"
    # create a directory to store the audio chunks
    if not os.path.isdir(folder_name):
        os.mkdir(folder_name)
    whole_text = ""
    # process each chunk
    for i, audio_chunk in enumerate(chunks, start=1):
        # export the audio chunk and save it in the `folder_name` directory
        chunk_filename = os.path.join(folder_name, f"chunk{i}.wav")
        audio_chunk.export(chunk_filename, format="wav")
        # recognize the chunk
        try:
            text = transcribe_audio(chunk_filename)
        except sr.UnknownValueError as e:
            print("Error:", str(e))
        else:
            text = f"{text.capitalize()}. "
            print(chunk_filename, ":", text)
            whole_text += text
    # return the text for all chunks detected
    return whole_text

path = "C:/Users/VAMSI/Desktop/6th sem csd/DL/16-122828-0002.wav"
print("\nFull text:", get_large_audio_transcription_on_silence(path))

OUTPUT:
audio-chunks\chunk1.wav : I believe you are just talking non.

Full text: I believe you are just talking non.


(base) PS C:\Users\VAMSI\Desktop\6th sem csd\DL>

From microphone input to text:
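The microphone version is not reproduced in the record; a minimal sketch using the same speech_recognition library (a working microphone and the PyAudio backend are assumed):

import speech_recognition as sr

r = sr.Recognizer()
# Listen once from the default microphone and transcribe with the free Google Web Speech API.
with sr.Microphone() as source:
    r.adjust_for_ambient_noise(source, duration=1)  # calibrate for background noise
    print("Speak now...")
    audio = r.listen(source)
try:
    print("You said:", r.recognize_google(audio))
except sr.UnknownValueError:
    print("Could not understand the audio.")
except sr.RequestError as e:
    print(f"Could not request results; {e}")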


WEEK 4
AIM: Write a program to convert text into speech.
DESCRIPTION:
Converting text to speech (TTS) involves transforming written text into spoken
words using techniques from natural language processing (NLP) and audio
synthesis.
1. Text Input
• A string of text is provided as input, either typed by the user or retrieved
from a file or database.
2. Text Preprocessing
The input text is normalized to prepare it for processing:
• Expanding abbreviations (e.g., "Dr." to "Doctor").
• Converting numbers to words (e.g., "123" to "one hundred twenty-
three").
• Handling special symbols and formatting (e.g., dates and currency).
3. Linguistic Analysis
• The system performs syntactic and semantic analysis:
• Identifying word boundaries and sentences.

• Assigning part-of-speech tags (e.g., noun, verb).

• Determining intonation and emphasis based on context and punctuation.

4. Phoneme Conversion
• The text is converted into a sequence of phonemes, which are the smallest
units of sound in a language.

• A pronunciation dictionary or rules-based algorithm determines how each
word is pronounced.

5. Prosody Generation
Prosody defines the rhythm, pitch, and stress of speech:
o Assigning pitch contours for intonation (e.g., rising pitch for questions).
o Determining the duration of each phoneme for natural timing.
o Adjusting stress on syllables for emphasis.


6. Audio Synthesis
The phonemes and prosody information are used to generate speech:
• Concatenative TTS: Pre-recorded human speech segments (phonemes or
words) are stitched together to form sentences.
• Parametric TTS: Speech is generated from parameters like pitch, tone,
and speed using a mathematical model.
• Neural TTS: Deep learning models (e.g., Tacotron, WaveNet) produce
highly natural and human-like speech by generating waveforms directly.

7. Audio Postprocessing
• The synthesized speech is refined to remove artifacts, enhance quality,
and adjust volume.
• Additional effects like echo, reverb, or emotional tone can be added if
needed.
8. Output
• The final audio is played back through speakers or headphones.

• The speech can also be saved as a digital audio file (e.g., WAV, MP3).
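As a small extension of the output step above (not part of the record's code below), pyttsx3 can also write the synthesized speech to a file; a minimal sketch with a placeholder filename:

import pyttsx3

engine = pyttsx3.init()
engine.setProperty('rate', 150)                             # speaking rate (words per minute)
engine.save_to_file("Hello, good morning", "greeting.wav")  # hypothetical output file
engine.runAndWait()                                         # blocks until the file is written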

CODE:
import speech_recognition as sr
import pyttsx3

r = sr.Recognizer()

def SpeakText(command):
    engine = pyttsx3.init()
    engine.say(command)
    engine.runAndWait()

while True:
    try:
        print("Enter some text (To exit enter only stop or exit)")
        text = input().strip()
        SpeakText(text)
        if text.lower() == "stop" or text.lower() == "exit":
            print("Exiting program.")
            SpeakText("Goodbye!")
            break

    except sr.RequestError as e:
        print(f"Could not request results; {e}")

    except sr.UnknownValueError:
        print("Unknown error occurred or unable to recognize speech.")

OUTPUT:
Enter some text (To exit enter only stop or exit)
Hello Good morning
Enter some text (To exit enter only stop or exit)
stop
Exiting program.
Audio corresponding to the entered text is played through the speakers.


WEEK 5
AIM: Write a program to convert video into frames.
DESCRIPTION:
Converting a video to frames or extracting frames from a video is a process where
each individual frame (image) from a video is separated and saved as an
independent image file. This is a common step in video processing, computer
vision, and machine learning tasks.
Why Extract Frames?
1. Analysis: To analyze specific parts of a video, such as detecting objects or
facial expressions.
2. Training Data: For machine learning models, extracted frames are used as
input data.
3. Editing: To work on specific frames for image processing tasks.
4. Visualization: To inspect individual frames for quality or study content.
Steps to Extract Video Frames
1. Input Video: Load the video file you want to process.
2. Choose a Frame Rate:
o Extract all frames or specific ones (e.g., every 5th frame).
o Set the desired frames-per-second (fps) rate for extraction.
3. Save Frames:
o Save each extracted frame as an image in a specific format (JPEG,
PNG, etc.).
o Name frames systematically, like frame_001.png, frame_002.png.
Tools and Libraries
1. OpenCV (Python): A popular library for image and video processing.
2. FFmpeg (Command-Line): A robust tool for video and audio
manipulation.
3. MATLAB: Provides built-in functions for frame extraction.
4. Custom Scripts: You can write your own scripts in various programming
languages.


Frames can also undergo optional preprocessing, such as resizing, cropping, or
grayscale conversion, depending on the application. For scenarios where fewer
frames are needed, one might extract every nth frame, reducing the computational
load. Applications of frame extraction include machine learning tasks (e.g.,
training models for object detection or action recognition), video editing (e.g.,
creating thumbnails), and scientific analysis (e.g., studying motion over time). By
converting a video into frames, the dynamic content of a video is transformed
into discrete, analyzable units, enabling a wide range of computational and visual
tasks.
CODE:
import cv2
import os
def extract_frames(video_file):
    cap = cv2.VideoCapture(video_file)
    frame_rate = 2  # frames to save per second of video
    frame_count = 0
    video_name = os.path.splitext(os.path.basename(video_file))[0]
    output_directory = f"{video_name}_frames"
    os.makedirs(output_directory, exist_ok=True)
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        frame_count += 1
        # cap.get(5) returns the video's frames per second (CAP_PROP_FPS)
        if frame_count % int(cap.get(5) / frame_rate) == 0:
            output_file = f"{output_directory}/frame_{frame_count}.jpg"
            cv2.imwrite(output_file, frame)
            print(f"Frame {frame_count} has been extracted and saved as {output_file}")
    cap.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    video_file = r"video.mp4"
    extract_frames(video_file)
OUTPUT:

(Screenshot of console output: "Frame N has been extracted and saved as ..." messages for each saved frame.)

WEEK 6
AIM: Write a Program to perform Image Classification using CNN

DESCRIPTION:
1. Image Processing
• Convolution Operation:
o Convolution is a mathematical operation used to extract features
(e.g., edges, textures) from an image.
o The Sobel kernel defined in the code detects edges by emphasizing
differences in pixel intensity.
o Padding ensures that the output feature map retains sufficient
dimensions for analysis.
o The result of the convolution is a feature map, highlighting
specific patterns in the image.
• Activation Function (ReLU):
o ReLU (Rectified Linear Unit) is applied to introduce non-linearity
by replacing negative values in the feature map with zero. This
makes the model capable of learning complex features.
• Max Pooling:
o A down-sampling technique that reduces the spatial dimensions of
the feature map while retaining the most prominent features.
o This helps to reduce computational complexity and makes the
features more robust to small variations in the input.

2. Neural Network Basics


• Flattening:
o The 2D feature map is converted into a 1D vector (flattened) to
serve as input to a dense (fully connected) neural network.
• Dense Layer:
o Fully connected layers learn patterns by mapping inputs to outputs
through weighted connections.

o Layers with ReLU activation capture non-linear patterns, while the
softmax activation in the output layer normalizes predictions into
probabilities for multi-class classification.
• Categorical Encoding:
o Labels are one-hot encoded using to_categorical to represent each
digit as a binary vector, enabling the model to predict probabilities
for each class.

3. Model Compilation and Training


• Loss Function:
o Categorical Cross-Entropy is used as the loss function for multi-
class classification. It measures the distance between the predicted
probabilities and the true labels.
• Optimizer:
o Adam (Adaptive Moment Estimation) optimizer is used for
efficient gradient descent. It adapts learning rates for different
parameters, accelerating convergence.
• Metrics:
o Accuracy is used to evaluate the model's performance during
training and testing.
• Training and Testing:
o The model is trained on flattened images from the MNIST dataset,
and its performance is evaluated on unseen test data.

4. Fully Connected Neural Network (FCNN)


• The second part of the code trains an FCNN directly on the raw pixel
values of the MNIST dataset:
o Input: Flattened 28x28 grayscale images (784 features).
o Hidden Layers: Two dense layers with 128 and 64 neurons, using
ReLU activation.


o Output: A dense layer with 10 neurons (one for each digit class)
and softmax activation.

5. Data Normalization
• Pixel values of the MNIST images are normalized to a range of [0, 1] by
dividing by 255. This ensures faster convergence during training by
keeping inputs on a similar scale.
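For comparison with the manual convolution and fully connected network used in the code below, a minimal sketch of an end-to-end convolutional Keras model on MNIST (a supplementary illustration, not the record's code):

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Add a channel dimension and scale pixels to [0, 1].
x_train = x_train[..., None] / 255.0
x_test = x_test[..., None] / 255.0

cnn = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3), activation='relu'),   # learned convolution filters
    layers.MaxPooling2D((2, 2)),                    # down-sampling
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])
cnn.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
cnn.fit(x_train, y_train, epochs=3, batch_size=64, validation_split=0.1)
print("Test accuracy:", cnn.evaluate(x_test, y_test, verbose=0)[1])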

MNIST DATA SET:

The MNIST dataset (Modified National Institute of Standards and
Technology) is a widely used benchmark dataset in machine learning and
computer vision. It contains 70,000 grayscale images of handwritten
digits (0–9), each with dimensions 28x28 pixels.
• Training set: 60,000 images.
• Test set: 10,000 images.
• Labels: Each image corresponds to one of 10 classes (digits 0–9).
The dataset is simple, balanced, and ideal for:
• Learning and prototyping basic machine learning and deep learning
models.
• Benchmarking algorithms for classification tasks.
Images are often normalized, flattened, and labels are one-hot encoded
for training. While it's great for beginners, its simplicity makes it less
suitable for modern, advanced models. Extensions like Fashion-MNIST
and EMNIST address some of these limitations.
MNIST is easily accessible through libraries like TensorFlow and remains
a foundational tool for understanding image classification.
The dataset can be loaded directly using machine learning libraries like
TensorFlow, PyTorch, or scikit-learn; for example, with TensorFlow:
from tensorflow.keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()


Sample MNIST Data:

CODE:
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import mnist
from scipy.signal import convolve2d
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
(x_train, y_train), (x_test, y_test) = mnist.load_data()
plt.imshow(x_train[0], cmap='gray')
plt.title("Sample MNIST Image")


plt.axis('off')
plt.show()
kernel = np.array([
[1, 0, -1],
[2, 0, -2],
[1, 0, -1]
])
image = x_train[0]
pad_width = kernel.shape[0] // 2
padded_image = np.pad(image, pad_width=pad_width, mode='constant',
constant_values=0)
plt.imshow(padded_image, cmap='gray')
plt.title("Padded Image")
plt.axis('off')
plt.show()
convolved_image = convolve2d(padded_image, kernel, mode='valid')
plt.imshow(convolved_image, cmap='gray')
plt.title("Feature Map after Convolution")
plt.axis('off')
plt.show()
relu_feature_map = np.maximum(convolved_image, 0)
plt.imshow(relu_feature_map, cmap='gray')
plt.title("Feature Map after ReLU")
plt.axis('off')
plt.show()
def max_pooling(feature_map, size=2, stride=2):
    output_shape = (
        (feature_map.shape[0] - size) // stride + 1,
        (feature_map.shape[1] - size) // stride + 1
    )
    pooled_output = np.zeros(output_shape)
    for i in range(output_shape[0]):
        for j in range(output_shape[1]):
            pooled_output[i, j] = np.max(
                feature_map[
                    i*stride:i*stride+size,
                    j*stride:j*stride+size
                ]
            )
    return pooled_output
pooled_feature_map = max_pooling(relu_feature_map)
plt.imshow(pooled_feature_map, cmap='gray')
plt.title("Feature Map after Max Pooling")
plt.axis('off')
plt.show()
flattened_feature_map = pooled_feature_map.flatten()
X = flattened_feature_map.reshape(1, -1)
y = np.array([y_train[0]])
y = to_categorical(y, num_classes=10)
model = Sequential()
model.add(Dense(128, input_dim=X.shape[1], activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))


model.compile(optimizer=Adam(), loss='categorical_crossentropy',
metrics=['accuracy'])
model.fit(X, y, epochs=3, batch_size=1)
prediction = model.predict(X)
predicted_class = np.argmax(prediction)
print(f"The predicted class for the image is: {predicted_class}")
x_train = x_train / 255.0
x_test = x_test / 255.0
x_train_flat = x_train.reshape(x_train.shape[0], -1)
x_test_flat = x_test.reshape(x_test.shape[0], -1)
y_train_encoded = to_categorical(y_train, num_classes=10)
y_test_encoded = to_categorical(y_test, num_classes=10)
model = Sequential()
model.add(Dense(128, input_dim=x_train_flat.shape[1], activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer=Adam(), loss='categorical_crossentropy',
metrics=['accuracy'])
model.fit(x_train_flat, y_train_encoded, epochs=5, batch_size=32)
test_loss, test_accuracy = model.evaluate(x_test_flat, y_test_encoded)
print(f"Test accuracy: {test_accuracy:.4f}")


Output:

(Screenshots: sample MNIST image, padded image, and feature maps after convolution, ReLU, and max pooling; console output with the predicted class and the test accuracy.)

WEEK 7

AIM : Write a program to predict the next word in the sentence using RNN
model.
DESCRIPTION:

1. Data Preparation:

• The Tokenizer converts text into numerical indices, creating a word index.
• The text is split into input sequences, each predicting the next word.
• Padding ensures all sequences have the same length.

2. Model Architecture:

• Embedding Layer maps word indices to dense vectors.


• SimpleRNN Layer captures sequential dependencies, using 150 units.
• Dense Layer with softmax activation predicts the next word.

3. Training:

• The model is trained using categorical cross-entropy loss and the Adam
optimizer.

4. Text Generation:

• generate_text function uses the trained model to predict and generate new
words one by one based on a seed text.

5. RNN Theory:

• RNNs handle sequential data, maintaining a hidden state that updates
with each word, and use it to predict the next word.
• They may suffer from the vanishing gradient problem, especially for long
sequences.
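For reference, the textbook simple-RNN recurrence, where hₜ is the hidden state, xₜ the current input embedding, and W, U, V, b, c are learned parameters:

hₜ = tanh(W·xₜ + U·hₜ₋₁ + b)
ŷₜ = softmax(V·hₜ + c)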


Code:
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense, Embedding

# Step 1: Prepare the Data
data = "This is a simple example for text generation using RNN in Python."

# Tokenize the text
tokenizer = Tokenizer()
tokenizer.fit_on_texts([data])
word_index = tokenizer.word_index
total_words = len(word_index) + 1

# Create input sequences (each n-gram predicts its last word)
input_sequences = []
for i in range(1, len(tokenizer.texts_to_sequences([data])[0])):
    input_sequences.append(tokenizer.texts_to_sequences([data])[0][:i+1])

# Pad sequences
max_sequence_len = max(len(seq) for seq in input_sequences)
input_sequences = np.array(
    tf.keras.preprocessing.sequence.pad_sequences(
        input_sequences, maxlen=max_sequence_len, padding='pre'
    )
)

# Create predictors (X) and labels (y)
X, y = input_sequences[:, :-1], input_sequences[:, -1]
y = tf.keras.utils.to_categorical(y, num_classes=total_words)

# Step 2: Build the RNN Model
model = Sequential()
model.add(Embedding(total_words, 100, input_length=max_sequence_len - 1))
model.add(SimpleRNN(150, return_sequences=False))
model.add(Dense(total_words, activation='softmax'))

# Compile the model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Step 3: Train the Model
model.fit(X, y, epochs=10, verbose=2)

# Step 4: Generate Text
def generate_text(seed_text, next_words, max_sequence_len):
    for _ in range(next_words):
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = tf.keras.preprocessing.sequence.pad_sequences(
            [token_list], maxlen=max_sequence_len - 1, padding='pre'
        )
        predicted = model.predict(token_list, verbose=0)
        predicted_word = tokenizer.index_word[np.argmax(predicted)]
        seed_text += " " + predicted_word
    return seed_text

# Generate new text
seed_text = "hi hello"
print(generate_text(seed_text, 10, max_sequence_len))

Output:

(Screenshot: training log and the generated text for the seed "hi hello".)

WEEK 8
AIM: Write a program to predict the next word in the sentence using
LSTM model.
DESCRIPTION:

LSTM:

• Sequential Data: LSTMs are well-suited for tasks involving sequential data, like text
or time-series data, because they are designed to retain long-term dependencies between
elements in the sequence.
• Cell States: LSTMs have cell states that carry information across time steps. They
maintain this memory through gates:
o Forget Gate: Decides what information from the previous cell state should be
discarded.
o Input Gate: Determines what new information should be added to the cell state.
o Output Gate: Decides what part of the cell state should be output at each time
step.
• Avoiding Vanishing Gradients: Unlike standard RNNs, LSTMs prevent the
vanishing gradient problem by allowing gradients to flow through the network more
effectively, making them capable of learning long-term dependencies.
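For reference, the textbook LSTM gate equations (σ is the sigmoid function and ⊙ denotes element-wise multiplication):

fₜ = σ(W_f·xₜ + U_f·hₜ₋₁ + b_f)      (forget gate)
iₜ = σ(W_i·xₜ + U_i·hₜ₋₁ + b_i)      (input gate)
oₜ = σ(W_o·xₜ + U_o·hₜ₋₁ + b_o)      (output gate)
c̃ₜ = tanh(W_c·xₜ + U_c·hₜ₋₁ + b_c)   (candidate cell state)
cₜ = fₜ ⊙ cₜ₋₁ + iₜ ⊙ c̃ₜ             (cell state update)
hₜ = oₜ ⊙ tanh(cₜ)                   (hidden state / output)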

Process:

Tokenization and Sequence Preparation:

• Tokenizer: The text is tokenized into words, with each word assigned a unique index.
The Tokenizer converts the input text into sequences of indices corresponding to words
in the vocabulary.
• Input Sequences: The code generates n-gram sequences (sub-sequences) from the
input text to create training data. For example, from the sentence "I spent 20 long years",
it generates sequences like ["I spent"], ["I spent 20"], ["I spent 20 long"], etc.
• Padding: Sequences are padded to ensure that all inputs have the same length when
fed into the model.

LSTM Model:

• Embedding Layer: Converts word indices into dense vector representations. Each
word is mapped to a 10-dimensional vector.
• LSTM Layers: LSTM is a type of RNN designed to capture long-term dependencies
in sequential data. The model uses two LSTM layers with 100 units each:
o The first LSTM layer returns sequences to allow the next LSTM layer to process
the full sequence.
o The second LSTM layer outputs only the final state, capturing the most
important features of the entire sequence.


• Dense Layers: After the LSTM layers, the model has two fully connected layers. The
first dense layer uses a ReLU activation, while the final dense layer uses softmax to
output a probability distribution over the vocabulary, predicting the next word.

Training:

• The model is trained on the input sequences using categorical cross-entropy as the
loss function and Adam optimizer.

Prediction:

• The model predicts the next word given a seed text. It converts the seed text into a
sequence of word indices, pads the sequence to the correct length, and feeds it into the
trained model.
• The model then outputs the predicted probabilities for each word in the vocabulary, and
the word with the highest probability is selected as the predicted next word.

CODE:

import tensorflow as tf

from tensorflow.keras.preprocessing.text import Tokenizer

from tensorflow.keras.preprocessing.sequence import pad_sequences

import numpy as np

# Sample text data (ideally, you should have a much larger dataset)
text = ("I spent 20 long years working for the under-privileged kids in Spain. "
        "I then moved to Africa. I can speak fluent ____ now.")

# Tokenization
tokenizer = Tokenizer()
tokenizer.fit_on_texts([text])
total_words = len(tokenizer.word_index) + 1

# Convert text to n-gram sequences
input_sequences = []
words = text.split()
for i in range(1, len(words)):
    n_gram_sequence = words[:i+1]
    input_sequences.append(tokenizer.texts_to_sequences([" ".join(n_gram_sequence)])[0])

# Padding sequences
max_sequence_length = max([len(seq) for seq in input_sequences])
input_sequences = pad_sequences(input_sequences, maxlen=max_sequence_length, padding='pre')

# Splitting into input and output
X, y = input_sequences[:, :-1], input_sequences[:, -1]
y = tf.keras.utils.to_categorical(y, num_classes=total_words)

# Build LSTM Model
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(total_words, 10, input_length=max_sequence_length-1),
    tf.keras.layers.LSTM(100, return_sequences=True),
    tf.keras.layers.LSTM(100),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(total_words, activation='softmax')
])

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model
model.fit(X, y, epochs=150, verbose=1)

# Predict next word
def predict_next_word(seed_text, tokenizer, model, max_sequence_length):
    token_list = tokenizer.texts_to_sequences([seed_text])[0]
    token_list = pad_sequences([token_list], maxlen=max_sequence_length-1, padding='pre')
    predicted_probs = model.predict(token_list, verbose=0)
    predicted_index = np.argmax(predicted_probs)
    for word, index in tokenizer.word_index.items():
        if index == predicted_index:
            return word
    return ""


# Example Prediction

seed_text = "I can speak fluent"

predicted_word = predict_next_word(seed_text, tokenizer, model, max_sequence_length)

print("Predicted word:", predicted_word)

OUTPUT:

Context-based next-word prediction is observed in the above code, which uses an LSTM model to predict the next word.
