
LSTM Ref05

phamdinhkhanh - Data Science - LSTM networks, part 05

Uploaded by

tuong.nguyen
Source: https://phamdinhkhanh.github.io/2019/04/22/Ly_thuyet_ve_mang_LSTM.html

    n_chars = len(raw_text)
    n_vocab = len(chars_new)
    print('Total characters: ', n_chars)
    print('Total Vocab: ', n_vocab)

    Total characters:  163693
    Total Vocab:  33

So after normalization our text contains 163693 characters in total, drawn from a vocabulary of 33 distinct characters. Next is a function that converts a sentence into a vector of character indices.

    def _encode_sen(text):
        text = text.lower()
        sen_vec = []
        for let in text:
            # the last vocabulary entry is 'unk', so it is excluded from the lookup
            if let in chars_new[:-1]:
                idx = chars_to_int[let]
            else:
                # any character outside the vocabulary maps to 'unk'
                idx = chars_to_int['unk']
            sen_vec.append(idx)
        return sen_vec

    x_test = _encode_sen('Alice is a wonderful story. #')
    print(x_test)

    [0, 11, 8, 2, 4, 29, 8, 18, 29, 0, 29, 22, 14, 13, 3, 4, 17, 5, 20, 11, 29, 18, 19, 14, 17, 2... (output truncated in the source)

    def _decode_sen(vec):
        text = []
        for i in vec:
            let = int_to_chars[i]
            text.append(let)
        text = ''.join(text)
        return text

    _decode_sen(x_test)

    'alice is a wonderful story. unk'

To give the model inputs of uniform length, we need to build character sequences (input windows) of length 100. Our goal is to predict the next character from the 100 input characters. At each prediction step the input window is shifted forward by 1 character, which yields consecutive predicted characters that can then be joined into a complete sentence.

    # prepare the dataset of input to output pairs encoded as integers
    seq_length = 100
    dataX = []
    dataY = []
    for i in range(0, n_chars - seq_length, 1):
        # take the 100 preceding characters
        seq_in = raw_text[i:i + seq_length]
        # take the character that follows those 100 characters
        seq_out = raw_text[i + seq_length]
        dataX.append(_encode_sen(seq_in))
        dataY.append(_encode_sen(seq_out)[0])
    n_patterns = len(dataX)
    print("Total Patterns: ", n_patterns)

    Total Patterns:  163593

Before it can be fed to the LSTM model, the input X must be reshaped into a 3-dimensional array [samples, time steps, features], where:

1. samples: the number of input samples (that is, the number of windows of length 100).
2. time steps: the length of the window, which equals the number of loop iterations when the network is unrolled as in Figure 2 (the unrolled structure of the neural network). In this model time steps = 100.
3. features: the number of dimensions used to encode each input element. In an LSTM model, each word or character (depending on which level we work at) is usually encoded in one of two common ways:
   - one-hot encoding, so that a word (in this exercise, a character) is represented by a one-hot vector;
   - a vector taken from a pretrained word embedding model, for example word2vec, fasttext, ELMo, BERT...

The number of dimensions at character level is usually smaller than at word level. In this article, to keep things simple, we do not use an embedding layer at the input to embed each element into a vector. Instead we feed the character index directly as the input value, so features = 1.

    # assumes `import numpy` and `from keras.utils import np_utils` earlier in the notebook
    # reshape X to be [samples, time steps, features]
    X_train = numpy.reshape(dataX, (n_patterns, seq_length, 1))
    # normalize
    X_train = X_train / float(n_vocab)
    # one hot encode the output variable
    y_train = np_utils.to_categorical(dataY)
    print("X [samples, time steps, features] shape: ", X_train.shape)
    print("Y shape: ", y_train.shape)

    X [samples, time steps, features] shape:  (163593, 100, 1)
    Y shape:  (163593, 33)

    print(type(X_train))
    print(type(y_train))

Count the number of characters by group.
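As a compact recap of the windowing and reshaping steps described above, here is a minimal sketch on toy data. It is not the article's code: the sample sentence, the helper names `encode` and `make_windows`, the shortened window of 10 characters, and the use of `np.eye` for one-hot encoding are all assumptions made for illustration. It contrasts the index-based input used in the article (features = 1) with the one-hot alternative mentioned in the list (features = vocabulary size).

```python
import numpy as np

# Toy stand-ins for the article's raw_text and vocabulary (hypothetical data).
raw_text = "alice was beginning to get very tired of sitting by her sister"
chars = sorted(set(raw_text)) + ["unk"]   # 'unk' kept as the last entry
chars_to_int = {c: i for i, c in enumerate(chars)}
n_vocab = len(chars)


def encode(text):
    """Map each character to its index, falling back to 'unk'."""
    unk = chars_to_int["unk"]
    return [chars_to_int.get(c, unk) for c in text.lower()]


def make_windows(text, seq_length):
    """Slide a window of seq_length characters forward one step at a time."""
    X, y = [], []
    for i in range(len(text) - seq_length):
        X.append(encode(text[i:i + seq_length]))   # the window itself
        y.append(encode(text[i + seq_length])[0])  # the character right after it
    return np.array(X), np.array(y)


seq_length = 10  # the article uses 100; shortened for the toy text
X_idx, y_idx = make_windows(raw_text, seq_length)
n_patterns = len(X_idx)

# Option used in the article: one feature per time step (the scaled index).
X_index = X_idx.reshape(n_patterns, seq_length, 1) / float(n_vocab)

# Alternative from the list: one-hot inputs, n_vocab features per time step.
X_onehot = np.eye(n_vocab)[X_idx]

# One-hot targets: one row of length n_vocab per window.
y_onehot = np.eye(n_vocab)[y_idx]

print("index input shape:  ", X_index.shape)
print("one-hot input shape:", X_onehot.shape)
print("target shape:       ", y_onehot.shape)
```

The index-based input keeps the array small, at the cost of implying an ordering between characters that the one-hot form avoids.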
