[BUG] - Bleu of machine translation #2485

Open
AparnaAgrawal02 opened this issue Jun 23, 2023 · 2 comments

AparnaAgrawal02 commented Jun 23, 2023

Add Link

https://pytorch.org/tutorials/beginner/translation_transformer.html

Describe the bug

  • Expected BLEU score of about 30-35

  • Got a BLEU score near 0.6, tested on the test set obtained from Multi30k, attached here (format: de|en)

  • Same code as in the tutorial

  • Code used to calculate the BLEU score (a corpus-level alternative is sketched after this block):

    from nltk.translate.bleu_score import sentence_bleu
    import numpy as np

    def bleu4(candidate, reference):
        score = sentence_bleu([reference], candidate, weights=(0.25, 0.25, 0.25, 0.25))
        # score = sentence_bleu([reference], candidate)
        return score

  • Modified block:

    from timeit import default_timer as timer
    NUM_EPOCHS = 100
    
    for epoch in range(1, NUM_EPOCHS+1):
        start_time = timer()
        train_loss = train_epoch(transformer, optimizer)
        end_time = timer()
        val_loss = evaluate(transformer)
        print((f"Epoch: {epoch}, Train loss: {train_loss:.3f}, Val loss: {val_loss:.3f}, "f"Epoch time = {(end_time - start_time):.3f}s"))
    
        if epoch % 10 == 0:
            transformer.eval()
            i=0
            total_bleu = 0
            test = open("test.txt", "r").readlines()
            for r in test:
                data = r.strip().split('|')
                reference = data[1].split()
                candidate = translate(transformer, data[0]).split()
                total_bleu += bleu4(candidate, reference)
                print ("----------------- START -----------------")
                print ("GT: ", reference)
                print ("OUT: ", candidate)
                print ("BLEU for this example: ", bleu4(candidate, reference))
                print ("Average BLEU: ", total_bleu/(i+1))
    
                print ("----------------- END -----------------")
                i+=1
            print ("------FINAL BLEU: ", total_bleu/(i))
            with open("bleu_ours.txt", "a") as f:
                f.write("Epoch: " + str(epoch) + " BLEU: " + str(total_bleu/(i)) + "\n")
            if epoch % 50 == 0:
                torch.save(transformer.state_dict(), "transformer_ours"+str(epoch)+".pt")
    

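    For comparison, here is a minimal sketch (not part of the original report) that scores the same test.txt file with NLTK's corpus_bleu, which pools n-gram counts over the whole test set instead of averaging per-sentence sentence_bleu scores. It assumes the translate function and transformer model from the snippet above are in scope, and that the file uses the de|en format described earlier.

    # Sketch: corpus-level BLEU over the same test set
    from nltk.translate.bleu_score import corpus_bleu

    references = []   # one list of reference token lists per sentence
    hypotheses = []   # one candidate token list per sentence
    with open("test.txt", "r") as f:
        for line in f:
            src, tgt = line.strip().split('|')          # de|en
            references.append([tgt.split()])             # reference tokens (English)
            hypotheses.append(translate(transformer, src).split())  # model output tokens

    # n-gram statistics are accumulated across all sentences before the score is computed
    print("Corpus BLEU:", corpus_bleu(references, hypotheses,
                                      weights=(0.25, 0.25, 0.25, 0.25)))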

Describe your environment

Linux
Name: torch
Version: 2.0.1

cc @pytorch/team-text-core @Nayef211

nirajkamal (Contributor) commented May 29, 2025

Hi @AparnaAgrawal02, this tutorial appears to be deprecated. Can you specify which model you used, or any other specifics, so that I can check whether this issue exists in any of the current transformer tutorials?

AparnaAgrawal02 (Author)

Hi @nirajkamal, it's been two years and I no longer have any details on this issue, so feel free to close it. Thanks for your attention.
