Text classification with Transformer

Author: Apoorv Nandan
Date created: 2020/05/10
Last modified: 2024/01/18
Description: Implement a Transformer block as a Keras layer and use it for text classification.

ⓘ This example uses Keras 3

View in Colab • GitHub source

Setup
import keras
from keras import ops
from keras import layers
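
Optional aside (not part of the original setup): Keras 3 is multi-backend, and the backend can be chosen with the KERAS_BACKEND environment variable, which must be set before keras is first imported.

import os

# Run this before the first `import keras` in a fresh session.
os.environ["KERAS_BACKEND"] = "tensorflow"  # or "jax" / "torch"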

Implement a Transformer block as a layer

class TransformerBlock(layers.Layer):
    def __init__(self, embed_dim, num_heads, ff_dim, rate=0.1):
        super().__init__()
        self.att = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
        self.ffn = keras.Sequential(
            [layers.Dense(ff_dim, activation="relu"), layers.Dense(embed_dim),]
        )
        self.layernorm1 = layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = layers.LayerNormalization(epsilon=1e-6)
        self.dropout1 = layers.Dropout(rate)
        self.dropout2 = layers.Dropout(rate)

    def call(self, inputs):
        attn_output = self.att(inputs, inputs)
        attn_output = self.dropout1(attn_output)
        out1 = self.layernorm1(inputs + attn_output)
        ffn_output = self.ffn(out1)
        ffn_output = self.dropout2(ffn_output)
        return self.layernorm2(out1 + ffn_output)
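
As a quick sanity check (illustrative sizes, not part of the original example), the block maps a (batch, sequence_length, embed_dim) input to an output of the same shape:

block = TransformerBlock(embed_dim=32, num_heads=2, ff_dim=32)
dummy = ops.ones((4, 10, 32))  # (batch, sequence_length, embed_dim)
print(block(dummy).shape)  # -> (4, 10, 32)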

Implement embedding layer

Two separate embedding layers: one for the tokens and one for the token positions (indices).

class TokenAndPositionEmbedding(layers.Layer):
    def __init__(self, maxlen, vocab_size, embed_dim):
        super().__init__()
        self.token_emb = layers.Embedding(input_dim=vocab_size, output_dim=embed_dim)
        self.pos_emb = layers.Embedding(input_dim=maxlen, output_dim=embed_dim)

    def call(self, x):
        maxlen = ops.shape(x)[-1]
        positions = ops.arange(start=0, stop=maxlen, step=1)
        positions = self.pos_emb(positions)
        x = self.token_emb(x)
        return x + positions
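
A quick illustrative check (made-up sizes, not from the original example): the positional embeddings have shape (maxlen, embed_dim) and broadcast across the batch when added to the token embeddings.

emb = TokenAndPositionEmbedding(maxlen=10, vocab_size=100, embed_dim=32)
token_ids = ops.ones((4, 10), dtype="int32")  # a batch of 4 sequences of token IDs
print(emb(token_ids).shape)  # -> (4, 10, 32)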
Download and prepare dataset

vocab_size = 20000  # Only consider the top 20k words
maxlen = 200  # Only consider the first 200 words of each movie review

(x_train, y_train), (x_val, y_val) = keras.datasets.imdb.load_data(num_words=vocab_size)
print(len(x_train), "Training sequences")
print(len(x_val), "Validation sequences")
x_train = keras.utils.pad_sequences(x_train, maxlen=maxlen)
x_val = keras.utils.pad_sequences(x_val, maxlen=maxlen)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
17465344/17464789 [==============================] - 0s 0us/step
25000 Training sequences
25000 Validation sequences
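
As an optional aside (not in the original example), the integer sequences can be decoded back to words with keras.datasets.imdb.get_word_index(); indices 0-2 are reserved for padding/start/unknown tokens, so the word index is offset by 3.

word_index = keras.datasets.imdb.get_word_index()
inverted = {i + 3: w for w, i in word_index.items()}  # indices 0-2 are reserved tokens
decoded = " ".join(inverted.get(i, "?") for i in x_train[0] if i > 2)
print(decoded[:200])  # first part of the first (padded) training review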

Create classifier model using transformer layer

Transformer layer outputs one vector for each time step of our input sequence. Here, we take the mean across all time steps and use a feed forward network on top of it to classify text.

embed_dim = 32  # Embedding size for each token
num_heads = 2  # Number of attention heads
ff_dim = 32  # Hidden layer size in feed forward network inside transformer

inputs = layers.Input(shape=(maxlen,))
embedding_layer = TokenAndPositionEmbedding(maxlen, vocab_size, embed_dim)
x = embedding_layer(inputs)
transformer_block = TransformerBlock(embed_dim, num_heads, ff_dim)
x = transformer_block(x)
x = layers.GlobalAveragePooling1D()(x)
x = layers.Dropout(0.1)(x)
x = layers.Dense(20, activation="relu")(x)
x = layers.Dropout(0.1)(x)
outputs = layers.Dense(2, activation="softmax")(x)

model = keras.Model(inputs=inputs, outputs=outputs)
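
To sanity-check the wiring before training, it can help to print a summary of the layers and parameter counts (the exact formatting depends on your Keras version):

# Inspect the architecture and parameter counts before training.
model.summary()

Note on the head: this example uses Dense(2) with softmax together with sparse_categorical_crossentropy; an equivalent binary formulation would be Dense(1) with a sigmoid and binary_crossentropy.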

Train and Evaluate

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
history = model.fit(
    x_train, y_train, batch_size=32, epochs=2, validation_data=(x_val, y_val)
)

Epoch 1/2
782/782 [==============================] - 15s 18ms/step - loss: 0.5112 - accuracy: 0.7070 - val_loss: 0.3598 - val_accuracy: 0.8444
Epoch 2/2
782/782 [==============================] - 13s 17ms/step - loss: 0.1942 - accuracy: 0.9297 - val_loss: 0.2977 - val_accuracy: 0.8745
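
As an optional follow-up (not part of the original example), the trained model can be evaluated and used for inference directly; x_val and y_val are the padded validation arrays prepared above, and the IMDB labels are 0 for negative and 1 for positive.

# Re-check validation metrics explicitly.
loss, acc = model.evaluate(x_val, y_val, verbose=0)
print(f"Validation accuracy: {acc:.4f}")

# Predict on a few held-out reviews; the softmax output has shape (n, 2).
probs = model.predict(x_val[:5], verbose=0)
print(probs.argmax(axis=-1), y_val[:5])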