Murenei - Natural Language Processing With Python and NLTK
import nltk
porter = nltk.PorterStemmer()  # Initialise stemmer
[porter.stem(t) for t in words]  # Create list of stems
WNL = nltk.WordNetLemmatizer()  # Initialise WordNet lemmatizer
[WNL.lemmatize(t) for t in words]  # Use the lemmatizer
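A minimal sketch of the stemming entries in use. The word list is a made-up sample and not part of the original sheet. The lemmatizer is left out here because it additionally needs the WordNet corpus, typically fetched with nltk.download('wordnet').

```python
import nltk

# Hypothetical sample tokens; any list of strings works here.
words = ["connected", "connecting", "connection", "connections"]

# The Porter stemmer is purely rule-based and needs no corpus download.
porter = nltk.PorterStemmer()
stems = [porter.stem(t) for t in words]

print(stems)
```

All four inflected forms should collapse to the same stem, which is the point of stemming before counting or matching terms.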
sents = nltk.corpus.treebank.tagged_sents()  # POS-tagged sentences from the Treebank sample corpus
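import pandas as pd
df = pd.DataFrame(time_sents, columns=['text'])  # One row per string in time_sents
df['text'].str.split().str.len()  # Number of whitespace-separated tokens per row
df['text'].str.contains('word')  # True where the row contains 'word'
df['text'].str.count(r'\d')  # Number of digits per row
df['text'].str.findall(r'\d')  # List of all digits per row
df['text'].str.replace(r'\w+day\b', '???', regex=True)  # Replace words ending in 'day'; regex=True is required in recent pandas
df['text'].str.extract(r'(\d?\d):(\d\d)')  # Hour and minute of the first time in each row
df['text'].str.extractall(r'((\d?\d):(\d\d) ?([ap]m))')  # Every time: full match, hour, minute, am/pm
df['text'].str.extractall(r'(?P<digits>\d)')  # Every digit, in a column named 'digits'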
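The pandas string methods above can be tried on a small frame. This is a sketch under assumptions: the two sentences in time_sents are invented sample data, since the sheet does not show what time_sents contains.

```python
import pandas as pd

# Hypothetical sample data standing in for time_sents.
time_sents = [
    "Monday: The doctor's appointment is at 2:45pm.",
    "Tuesday: The dentist's appointment is at 11:30 am.",
]
df = pd.DataFrame(time_sents, columns=["text"])

# Tokens per row (split on whitespace) and digits per row.
word_counts = df["text"].str.split().str.len()
digit_counts = df["text"].str.count(r"\d")

# Mask weekday names; regex=True makes the pattern a regular expression.
masked = df["text"].str.replace(r"\w+day\b", "???", regex=True)

# extract keeps only the first match per row, one column per group.
first_time = df["text"].str.extract(r"(\d?\d):(\d\d)")

# extractall returns one row per match, indexed by (row, match number).
all_times = df["text"].str.extractall(r"((\d?\d):(\d\d) ?([ap]m))")

print(word_counts.tolist(), digit_counts.tolist())
print(masked.tolist())
print(first_time)
print(all_times)
```

extract is the simpler choice when each row holds at most one value of interest, while extractall keeps every match at the cost of a two-level index.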