Natural Language Processing GPT-2
In [2]:
import nltk
nltk.download('reuters')
nltk.download('punkt')
In [47]:
from collections import defaultdict

tri_model = defaultdict(lambda: defaultdict(lambda: 0))
for i in most_expected:
    print(i[0], i[1])
used 0.035736290819470114
paid 0.03203943314849045
a 0.031423290203327174
the 0.025261860751694395
made 0.02341343191620456
in 0.021565003080714726
able 0.020948860135551448
set 0.012322858903265557
held 0.01170671595810228
no 0.011090573012939002
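The cell above shows only the model's initialisation and the ranked output. As a minimal sketch of how such a ranking can be produced, the code below counts trigrams, normalises the counts into conditional probabilities, and lists the most probable next words for a two-word context. The toy corpus, the context `("will", "be")`, and the helper names `pad`, `build_trigram_model`, and `top_k` are assumptions made for illustration; the notebook trains on the Reuters corpus instead.

```python
from collections import defaultdict

# Toy corpus standing in for the Reuters sentences (an assumption for illustration).
toy_sentences = [
    ["the", "shares", "will", "be", "paid"],
    ["the", "dividend", "will", "be", "paid"],
    ["the", "funds", "will", "be", "used"],
]

def pad(sentence):
    """Pad with None so sentence starts and ends become trigram contexts."""
    return [None, None] + sentence + [None, None]

def build_trigram_model(sentences):
    """Return model[(w1, w2)][w3] = P(w3 | w1, w2) estimated from counts."""
    model = defaultdict(lambda: defaultdict(lambda: 0))
    # Count how often w3 follows the pair (w1, w2).
    for sentence in sentences:
        padded = pad(sentence)
        for w1, w2, w3 in zip(padded, padded[1:], padded[2:]):
            model[(w1, w2)][w3] += 1
    # Divide each count by the total for its context.
    for context in model:
        total = float(sum(model[context].values()))
        for w3 in model[context]:
            model[context][w3] /= total
    return model

def top_k(model, context, k=10):
    """Most probable next words for a two-word context, best first."""
    return sorted(model[context].items(), key=lambda kv: kv[1], reverse=True)[:k]

toy_model = build_trigram_model(toy_sentences)
for word, prob in top_k(toy_model, ("will", "be")):
    print(word, prob)
```

In this toy corpus "paid" follows "will be" twice and "used" once, so "paid" ranks first.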
In [27]:
# Text generation with Trigram model
I am astonished that the relief would undermine international support for development of
airlines during the 1981 tax cut for 1988 , analyst for Salomon Brothers .
There is now just over 500 mln stg while bankers ' acceptance rates of inflation will
reach 25 . 56 mln tonnes have traded between 151 and 153 yen after the christian democrats
and independents failed to stimulate activity .
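The code for this generation cell is not shown, so the following is only a sketch of one common way to sample text from a trigram model: start from two `None` padding tokens, repeatedly draw the next word in proportion to how often it followed the last two words, and stop once two `None` end markers appear. The toy corpus and the names `build_counts` and `generate_sentence` are hypothetical, and `random.choices` is used here as a convenient sampler rather than as the notebook's own method.

```python
import random
from collections import defaultdict

# Toy corpus (an assumption); the notebook itself uses the Reuters sentences.
toy_sentences = [
    ["prices", "rose", "sharply", "today"],
    ["prices", "fell", "sharply", "today"],
    ["prices", "rose", "slightly", "today"],
]

def build_counts(sentences):
    """counts[(w1, w2)][w3] = number of times w3 follows the pair (w1, w2)."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        padded = [None, None] + sentence + [None, None]
        for w1, w2, w3 in zip(padded, padded[1:], padded[2:]):
            counts[(w1, w2)][w3] += 1
    return counts

def generate_sentence(counts, rng, max_words=50):
    """Sample tokens until two end markers appear; at most max_words tokens are drawn."""
    text = [None, None]  # two start-of-sentence markers
    while len(text) < max_words + 2:
        followers = counts[(text[-2], text[-1])]
        words = list(followers)
        weights = [followers[w] for w in words]
        # Draw the next word with probability proportional to its count.
        text.append(rng.choices(words, weights=weights)[0])
        if text[-2:] == [None, None]:
            break
    return " ".join(w for w in text if w is not None)

rng = random.Random(0)  # fixed seed so repeated runs give the same sentence
print(generate_sentence(build_counts(toy_sentences), rng))
```

Sampling by raw counts is equivalent to sampling by normalised probabilities, because `random.choices` rescales the weights internally.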
In [38]:
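from collections import defaultdict
from nltk import bigrams
from nltk.corpus import reuters
br_model = defaultdict(lambda: defaultdict(lambda: 0))
for sentence in reuters.sents():  # count bigrams; None pads sentence boundaries
    for p1, p2 in bigrams(sentence, pad_left=True, pad_right=True):
        br_model[p1][p2] += 1
for p1 in br_model:  # normalise counts into probabilities P(p2 | p1)
    total_count = float(sum(br_model[p1].values()))
    for p2 in br_model[p1]:
        br_model[p1][p2] /= total_count
for i in top_expected:
    print(i[0], i[1])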
expected 0.06490765171503958
a 0.05633245382585752
not 0.045646437994722955
the 0.0420844327176781
to 0.025725593667546173
likely 0.021372031662269128
subject 0.02071240105540897
also 0.01912928759894459
still 0.0183377308707124
in 0.017941952506596307
In [28]:
# Text Generation with Bigram
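import random

i = 10  # number of sentences to generate
while i > 0:
    txt = [None]  # None marks the sentence boundary
    sentence_finished = False
    while not sentence_finished:
        # Sample the next word from P(word | previous word)
        candidates = br_model[txt[-1]]
        txt.append(random.choices(list(candidates), weights=list(candidates.values()))[0])
        if txt[-1] is None:
            sentence_finished = True
    print(' '.join(t for t in txt if t))
    i -= 1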
Rain reached after years , or lease about reports today at the accord was rising
international protocol to 2 , 179 days of state court on a merger talks .
U .
To the Economics Ministry of textiles and Emery a minimum five or its plan at between the
Commerce Department said it difficult last year was underlined the end of crop report ,
grains , March 26 . 1 mln hectares ) rather vague optimism about 20 pct of preferred the
Department said .
But that based on the Fed Chairman of calculating ICO COUNCIL ALLOWED APPEAL ON COSTS
Diplomat Electronics Ltd is payable April 29 mln Interest rates alone representing 98 dlrs
Net includes gain of 1 .
U .
HARNISCHFEGER INDUSTRIES SELLS JORDAN - 15 pct interest rate of these argue the
Wallenberg company said .
In [48]:
# Tokenisation
from transformers import GPT2Tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
In [94]:
sample_text = "Today, computers are small enough to fit into a single "
indexed_tokens = tokenizer.encode(sample_text)
In [95]:
import torch
token_tensor = torch.tensor([indexed_tokens])
In [96]:
from transformers import GPT2LMHeadModel
gpt_2 = GPT2LMHeadModel.from_pretrained('gpt2')
In [98]:
with torch.no_grad():
    outputs = gpt_2(token_tensor)
    predictions = outputs[0]  # logits with shape (batch, sequence length, vocabulary size)
In [103]:
predicted_index = torch.argmax(predictions[0, -1, :]).item()
In [104]:
predicted_text = tokenizer.decode(indexed_tokens + [predicted_index])
print(predicted_text)
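The slice `predictions[0, -1, :]` takes the scores for the first (and only) sequence in the batch at its last position, and `argmax` picks the vocabulary entry with the highest score. The sketch below reproduces that greedy selection in plain Python on nested lists. The five-word vocabulary and all score values are made up for illustration and do not come from GPT-2; the `softmax` helper is added only to show how raw scores relate to probabilities.

```python
import math

# Hypothetical five-word vocabulary and made-up scores, for illustration only.
vocab = ["room", "chip", "unit", "box", "device"]

# Shape [batch][sequence position][vocabulary], like the model's logits.
logits = [[
    [0.1, 0.3, 0.2, 0.0, 0.4],   # scores after the first input token
    [1.2, 2.5, 0.3, 0.1, 1.9],   # scores after the last input token
]]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

last_scores = logits[0][-1]  # same slice as predictions[0, -1, :]
# Greedy choice: the index of the largest score.
predicted_index = max(range(len(last_scores)), key=last_scores.__getitem__)
probs = softmax(last_scores)

print(vocab[predicted_index], round(probs[predicted_index], 3))
```

Because softmax preserves the ordering of the scores, taking the argmax of the raw logits selects the same token as taking the argmax of the probabilities.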
In [105]:
!git clone https://github.com/graykode/gpt-2-Pytorch
%cd gpt-2-Pytorch
!curl --output gpt2-pytorch_model.bin https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-pytorch_model.bin
!pip install -r requirements.txt
"The New York Times' report about the alleged abuse of a teenage girl by a "bastard" is a
piece of flattery, not a story.
"The Times' report about the alleged abuse of a teenage girl by a "bastard" is a piece of
flattery, not a story." — Donald J. Trump Jr.
"The only reason that the New York Times is so critical of the Trump campaign and Russia
is because the New York Times is