Handwriting Text Generation

1. Handwritten Text Generation from Visual Archetypes

1. Style Representation: The model takes as input a set of style samples, which are handwritten text images in the desired style. These style samples are processed by a pre-trained convolutional feature extractor, which extracts high-level features from the images and generates a sequence of style vectors.

2. Content Representation: Instead of using traditional one-hot vectors to represent the content of the text, the model uses visual archetypes. These visual archetypes are binary images that represent geometrically-related characters. By leveraging the similarities between these archetypes, the model is able to generate characters that are not directly observed in the style samples.

3. Transformer Encoder: The style vectors obtained from the style samples
are fed into a Transformer encoder. The encoder uses self-attention
mechanisms to capture long-range dependencies and enriches the style
vectors with contextual information.

4. Transformer Decoder: The content strings, represented as sequences of visual archetypes, are input to the Transformer decoder. The decoder performs cross-attention between the style vectors and the content strings, allowing the model to capture the entanglement between content and style. This helps in capturing local writing style patterns.

5. Convolutional Decoder: The output of the Transformer decoder, which represents the entangled content-style representation, is then fed into a convolutional decoder. The decoder generates the final handwritten text images conditioned on both the content and the style.

By combining these components, the model is able to generate handwritten text images that mimic the desired style while preserving the content of the input text.

Summary: I see this work less as writer-specific style generation and more as calligraphy-specific style generation. The style is extracted as follows: first, a pre-trained ResNet-18 extracts feature vectors from the available writer images, of which there are only a few samples. It extracts P feature maps from P images, flattens them, and sends them to the Transformer encoder, which captures long-range dependencies. The encoded style vectors are then passed to the Transformer decoder, whose inputs are the styled vectors from the encoder together with the image encodings/embeddings of the visual archetypes; these embeddings decide which content to generate. The Transformer output is finally passed to a CNN decoder to generate the image. The model also uses a writer classification loss. A rough sketch of this pipeline is given below.
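The following is a minimal PyTorch sketch of that pipeline, not the authors' implementation: the tensor shapes, layer sizes, the ImageNet-pretrained ResNet-18, the plain embedding table standing in for the binary archetype images, and the 500-writer classification head are all illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class HandwritingGeneratorSketch(nn.Module):
    """Minimal sketch of the pipeline summarized above; shapes, layer sizes,
    and the embedding table standing in for archetype images are assumptions."""
    def __init__(self, d_model=512, vocab_size=100, num_writers=500):
        super().__init__()
        # pre-trained ResNet-18 as the convolutional style feature extractor
        backbone = resnet18(weights="IMAGENET1K_V1")
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])  # drop avgpool and fc
        self.proj = nn.Conv2d(512, d_model, kernel_size=1)
        # Transformer encoder enriches the style vectors via self-attention
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True), num_layers=3)
        # Transformer decoder cross-attends content queries to the style memory
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True), num_layers=3)
        # content embeddings; in the paper these come from binary archetype images,
        # a plain embedding table stands in for that step here
        self.content_embed = nn.Embedding(vocab_size, d_model)
        # convolutional decoder renders the image from the fused representation
        self.cnn_decoder = nn.Sequential(
            nn.ConvTranspose2d(d_model, 256, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(256, 1, 4, 2, 1), nn.Tanh())
        # auxiliary writer-classification head (for the writer classification loss)
        self.writer_head = nn.Linear(d_model, num_writers)

    def forward(self, style_imgs, content_ids):
        # style_imgs: (B, P, 3, H, W) -> flatten the P style samples into one sequence
        B, P = style_imgs.shape[:2]
        feats = self.proj(self.cnn(style_imgs.flatten(0, 1)))   # (B*P, d, h, w)
        feats = feats.flatten(2).transpose(1, 2)                # (B*P, h*w, d)
        feats = feats.reshape(B, -1, feats.size(-1))            # (B, P*h*w, d)
        style = self.encoder(feats)                             # enriched style vectors
        content = self.content_embed(content_ids)               # (B, T, d) content queries
        fused = self.decoder(tgt=content, memory=style)         # content-style entanglement
        fmap = fused.transpose(1, 2).unsqueeze(-2)              # (B, d, 1, T) coarse feature map
        img = self.cnn_decoder(fmap)                            # generated handwriting image
        writer_logits = self.writer_head(style.mean(dim=1))     # for the writer classification loss
        return img, writer_logits
```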

2. Disentangling Writer and Character Styles for Handwriting Generation

To extract the style, two more losses are proposed:

Writer-wise Contrastive Learning:

Features from the same writer are pulled together (their distance is minimized) while features from different writers are pushed apart (their distance is maximized). For example, if each example in a mini-batch comes from a different writer, then the distance between every pair of examples, other than an example and its own view, is maximized (see the sketch after the next subsection).

3.2.2 Character-wise Contrastive Learning


In the proposed approach, the character-wise style is learned by maximizing the mutual information between diverse views of a character, thereby enforcing the glyph head to learn the character-wise style. This is achieved with a contrastive learning framework, where positive pairs are independently sampled within a character and negative samples are drawn from other characters.
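A minimal sketch of a contrastive objective of the kind described in these two subsections, where the grouping key is either the writer ID or the character ID; the NT-Xent-style formulation, the temperature, and the function name are assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def grouped_contrastive_loss(features, group_ids, temperature=0.1):
    """NT-Xent-style loss: samples sharing a group id (same writer for the writer-wise
    loss, same character for the character-wise loss) are pulled together, while all
    other samples in the batch act as negatives. The temperature value is an assumption."""
    z = F.normalize(features, dim=1)                       # (N, D) unit-norm embeddings
    sim = z @ z.t() / temperature                          # (N, N) scaled cosine similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))        # never contrast a sample with itself
    pos_mask = (group_ids.unsqueeze(0) == group_ids.unsqueeze(1)) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # average log-probability of the positives for every anchor that has at least one positive
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    per_anchor = -log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts
    return per_anchor[pos_mask.any(dim=1)].mean()

# writer-wise: group_ids are writer labels (e.g. several samples per writer in the batch)
# character-wise: group_ids are character labels, positives sampled within the same character
```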

3. Handwriting Transformers

This work also uses a Transformer for handwritten text generation. The generator consists of a Transformer encoder and decoder. The encoder extracts features from the input handwritten images: a ResNet-18 provides the feature maps, and self-attention is then applied to capture long-range dependencies. The decoder applies multi-head cross-attention in which the keys and values come from the encoder's feature maps and the queries come from the decoder's content input. The decoder output is passed to convolutional layers to generate the image. The framework also uses a cycle-consistency loss, which looks similar to an MSE loss.
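A small sketch of the cross-attention step and a cycle-consistency-style term described above; the 512-dimensional embeddings, the batch-first layout, and the style_encoder used to re-encode the generated image are assumptions.

```python
import torch
import torch.nn as nn

# cross-attention: keys/values from the encoder feature map, queries from the content input
cross_attn = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)

def decode_step(content_queries, encoder_memory):
    # content_queries: (B, T, 512); encoder_memory: (B, S, 512)
    out, _ = cross_attn(query=content_queries, key=encoder_memory, value=encoder_memory)
    return out

def cycle_consistency_loss(style_vec, generated_img, style_encoder):
    # re-encode the generated image and require the original style code back (MSE-like)
    return torch.mean((style_encoder(generated_img) - style_vec) ** 2)
```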

4. An Approach Based on Transformer and Deformable Convolution for Realistic Handwriting Samples Generation

The paper uses the same Transformer encoder-decoder architecture for style generation. Unlike the other works, this paper uses a focal frequency loss to preserve the style.
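A minimal sketch of a focal frequency loss in the spirit described above (compare spectra of the generated and real images and up-weight the hardest frequencies); the normalization scheme and the alpha exponent are assumptions rather than the paper's exact formulation.

```python
import torch

def focal_frequency_loss(pred, target, alpha=1.0):
    """Compare generated and real images in the frequency domain and up-weight the
    frequencies with the largest errors. pred, target: (B, C, H, W)."""
    pf = torch.fft.fft2(pred, norm="ortho")
    tf = torch.fft.fft2(target, norm="ortho")
    dist = (pf.real - tf.real) ** 2 + (pf.imag - tf.imag) ** 2     # per-frequency squared error
    weight = dist.detach().sqrt() ** alpha                         # focal weight from the error itself
    weight = weight / (weight.amax(dim=(-2, -1), keepdim=True) + 1e-8)  # normalize to [0, 1]
    return (weight * dist).mean()
```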
5. GANwriting: Content-Conditioned Generation of Styled Handwritten Word
Images

Get input style images X_i, e.g. 15 word images from a single writer.
Pass each image through a CNN backbone (like VGG19) to extract feature maps. For example, for an image of size 32x128, the VGG19 convolution layers extract 32 feature maps of size 8x32; these capture stylistic information like strokes, shapes, slant, etc.
Aggregate features across all input images: resize all feature maps to the same (height, width) and concatenate along the channel dimension. E.g. if there are 15 images, the aggregated map is (32, 8, 32, 15).
Pass the aggregated maps through additional convolution layers to reduce dimensions and compute statistics; this outputs style features Fs, e.g. of size (256, 4, 8).
Add small random noise to Fs to allow natural variations in style: Fs' = Fs + N(0, 0.05).
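A rough sketch of this style branch, assuming torchvision's VGG19, an adaptive pooling step for the resize, and the channel counts shown; the layer cut, the pooled (8, 32) size, and the reduction convolutions are assumptions chosen to mirror the shapes in the notes, not the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg19

class StyleAggregator(nn.Module):
    """Rough sketch of the style-feature aggregation described above."""
    def __init__(self, num_style_imgs=15, feat_ch=512):
        super().__init__()
        self.backbone = vgg19(weights="IMAGENET1K_V1").features[:21]  # truncated conv stack
        self.reduce = nn.Sequential(                    # extra convs reduce the aggregated map
            nn.Conv2d(feat_ch * num_style_imgs, 256, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1))

    def forward(self, style_imgs):
        # style_imgs: (B, K, 3, 32, 128), e.g. K = 15 word images from a single writer
        B, K = style_imgs.shape[:2]
        feats = self.backbone(style_imgs.flatten(0, 1))            # (B*K, C, h, w) feature maps
        feats = F.adaptive_avg_pool2d(feats, (8, 32))              # resize to a common (h, w)
        feats = feats.reshape(B, K * feats.size(1), 8, 32)         # concatenate along channels
        fs = self.reduce(feats)                                    # style features Fs
        return fs + 0.05 * torch.randn_like(fs)                    # Fs' = Fs + N(0, 0.05)
```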

Letter-level Online Writer Identification:

Multi-Branch Encoder:
Uses multiple independent encoding branches to capture different representations of a letter
trajectory, with each branch specialized to extract a different prototype writing style.

LSA (Letters and Styles Adapter):


Applies distribution normalization and letter-specific feature selection to reduce discrepancies
between representations of different writing styles for the same letter, addressing intra-writer
variations.
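A minimal sketch of a multi-branch encoder over an online letter trajectory as described above; the (x, y, pen-state) input format, the GRU branches, and the sizes are assumptions.

```python
import torch
import torch.nn as nn

class MultiBranchEncoder(nn.Module):
    """Each independent branch is intended to specialize in a different prototype writing style."""
    def __init__(self, in_dim=3, hidden=128, num_branches=4):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.GRU(in_dim, hidden, batch_first=True) for _ in range(num_branches)])

    def forward(self, traj):
        # traj: (B, T, 3) sequence of pen points; keep each branch's final hidden state
        outs = [branch(traj)[1][-1] for branch in self.branches]   # each (B, hidden)
        return torch.cat(outs, dim=1)                              # (B, num_branches * hidden)
```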

6. HiGAN+: Handwriting Imitation GAN with Disentangled Representations

SLOGAN: Handwriting Style Synthesis for Arbitrary-Length and Out-of-Vocabulary Text
HiGAN: Handwriting Imitation Conditioned on Arbitrary-Length Texts and
Disentangled Styles

SmartPatch: Improving Handwritten Word Imitation with Patch Discriminators

Content and Style Aware Generation of Text-Line Images for Handwriting Recognition

ScrabbleGAN: Semi-Supervised Varying Length Handwritten Text Generation


How to Choose Pretrained Handwriting Recognition Models for Single Writer
Fine-Tuning

Here are the common points regarding how writer styles are extracted:

[Shape notes, referring to Fig. 1 of [1]: 100 : (1, 100, 512); style features over 15 images: (15, 100, 1024); mean of writer style: (1, 100, 1024); content example "Hello" [ _ _ _ _ _ ]: (1, 10, 4096).]


1. Style Representation: The papers utilize style samples or writer-specific images to extract the
desired style information. These style samples are processed by pre-trained models or feature
extractors to obtain style vectors or feature maps.

2. Transformer Encoder: Many papers employ Transformer encoders to capture long-range dependencies and contextual information from the style vectors or feature maps. The Transformer encoder enriches the style representation by using self-attention mechanisms.

3. Transformer Decoder: The content strings or input sequences, which represent the desired
text or characters, are input to the Transformer decoder. The decoder performs cross-attention
between the style vectors or feature maps obtained from the encoder and the content strings.
This allows the model to capture the entanglement between content and style, enabling the
generation of locally styled text.

4. Convolutional Decoder: After the Transformer decoder, some papers utilize convolutional
decoders to generate the final handwritten text images conditioned on both the content and
style information. The convolutional decoder takes the entangled content-style representation
and produces the output images.

5. Contrastive Learning: In some papers, additional techniques such as writer-wise contrastive learning or character-wise contrastive learning are employed. These techniques aim to disentangle writer-specific styles or character-specific styles by minimizing the distance between features of the same writer or character while maximizing the distance between different writers or characters.

6. Loss Functions: Different loss functions are used to train the models, such as cyclic
consistency loss, focal frequency loss, and adversarial loss (in the context of GANs).
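As a sketch of how such terms are often combined into a single generator objective; the hinge-style adversarial form, the MSE-style cycle term, and the lambda weights are assumptions, not any specific paper's formulation.

```python
import torch
import torch.nn.functional as F

def generator_loss(d_fake_logits, recon_style, target_style, writer_logits, writer_ids,
                   lambda_cyc=1.0, lambda_cls=1.0):
    adv = -d_fake_logits.mean()                              # adversarial term (hinge-style G loss)
    cyc = torch.mean((recon_style - target_style) ** 2)      # cycle-consistency term (MSE-like)
    cls = F.cross_entropy(writer_logits, writer_ids)         # writer classification term
    return adv + lambda_cyc * cyc + lambda_cls * cls
```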

Overall, these papers combine various techniques, including pre-trained models, Transformer
encoders and decoders, convolutional decoders, and contrastive learning, to extract and
represent writer-specific styles for generating handwritten text images.

[1] Bhunia, Ankan Kumar, et al. "Handwriting Transformers." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
DS-Fusion: Artistic Typography via Discriminated and Stylized Diffusion

https://github.com/tmaham/DS-Fusion/tree/main

Improving Handwritten OCR with Training Samples Generated by Glyph Conditional Denoising Diffusion
Probabilistic Model

GlyphDiffusion: Text Generation Is Also Image Generation

GlyphControl: Glyph Conditional Control for Visual Text Generation

RenderDiffusion: Text Generation Is Also Image Generation

full: 33
5skips: 26.51
10skips: 26.99

Improving Handwritten OCR with Training Samples Generated by Glyph Conditional Denoising Diffusion
Probabilistic Model
