1.2 - Handling Text in Python
• Capitalized words
>>> [w for w in text2 if w.istitle()]
['Ethics', 'United', 'Nations']
• s.startswith(t)
• s.endswith(t)
• t in s
• s.isupper(); s.islower(); s.istitle()
• s.isalpha(); s.isdigit(); s.isalnum()
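The comparison methods listed above can be tried on a small word list. This is a minimal sketch: the `words` list is an assumed sample chosen to match the capitalized-words output shown earlier, and the variable names are illustrative only.

```python
# Assumed sample word list (not taken from the slides verbatim).
words = ['Ethics', 'are', 'built', 'right', 'into', 'the', 'ideals',
         'and', 'objectives', 'of', 'the', 'United', 'Nations']

# s.istitle(): first letter uppercase, the rest lowercase
capitalized = [w for w in words if w.istitle()]

# s.endswith(t): words ending in 's'
ends_in_s = [w for w in words if w.endswith('s')]

# s.startswith(t): words beginning with 'i'
starts_with_i = [w for w in words if w.startswith('i')]

# t in s: substring test
contains_ti = [w for w in words if 'ti' in w]

# Character-class checks apply to the whole string
mixed = 'abc123'
mixed_checks = (mixed.isalpha(), mixed.isdigit(), mixed.isalnum())

print(capitalized)
print(ends_in_s)
print(starts_with_i)
print(contains_ti)
print(mixed_checks)
```

Each method returns a boolean, so it can be used directly as a filter condition in a list comprehension.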
String Operations
Cleaning Text
>>> text8 = '  \t A quick brown fox jumped over the lazy dog. '
>>> text8.split(' ')
['', '', '\t', 'A', 'quick', 'brown', 'fox', 'jumped', 'over',
'the', 'lazy', 'dog.', '']
>>> text9 = text8.strip()
>>> text9.split(' ')
['A', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy',
'dog.']
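As a complementary sketch, `split()` with no argument splits on any run of whitespace, so it avoids the empty strings that `split(' ')` produces. The string `raw` below is an assumed example with two leading spaces, a tab, and one trailing space.

```python
# Assumed sample: two leading spaces, a tab, a space, then the sentence,
# followed by one trailing space.
raw = '  \t A quick brown fox jumped over the lazy dog. '

# Splitting on a single space keeps empty strings and the tab as tokens.
by_space = raw.split(' ')

# strip() removes leading/trailing whitespace first (spaces, tabs, newlines).
cleaned = raw.strip().split(' ')

# split() with no argument splits on runs of whitespace and drops empties.
by_whitespace = raw.split()

print(by_space)
print(cleaned)
print(by_whitespace)
```

Here `strip()` followed by `split(' ')` and a bare `split()` give the same tokens, but only because the words are separated by single spaces; with repeated inner spaces, only the bare `split()` stays free of empty strings.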
APPLIED TEXT MINING
IN PYTHON
Changing Text
• Find and replace
>>> text9
'A quick brown fox jumped over the lazy dog.'
>>> text9.find('o')
10
>>> text9.rfind('o')
40
>>> text9.replace('o', 'O')
'A quick brOwn fOx jumped Over the lazy dOg.'
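A few details of find and replace are easy to miss, so here is a short sketch. `find_all` is a hypothetical helper written for this example, not a built-in string method.

```python
text = 'A quick brown fox jumped over the lazy dog.'

def find_all(s, sub):
    """Return the start index of every occurrence of sub in s (hypothetical helper)."""
    positions = []
    start = s.find(sub)
    while start != -1:              # find() returns -1 when nothing is found
        positions.append(start)
        start = s.find(sub, start + 1)
    return positions

all_o = find_all(text, 'o')         # every index where 'o' occurs
missing = text.find('cat')          # -1: substring not present

# Strings are immutable: replace() returns a new string, text is unchanged.
replaced_all = text.replace('o', 'O')
replaced_first = text.replace('o', 'O', 1)   # optional count limits replacements

print(all_o, missing)
print(replaced_all)
print(replaced_first)
print(text)
```

The first and last entries of `all_o` match what `find('o')` and `rfind('o')` return.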
File Operations
• f = open(filename, mode)
• f.readline(); f.read(); f.read(n)
• for line in f: doSomething(line)
• f.seek(n)
• f.write(message)
• f.close()
• f.closed
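The file operations above can be exercised end to end as follows. This is a sketch under assumptions: the file name `sample.txt` and its contents are made up, and a temporary directory is used so that no existing file is touched. A `with` block is used in place of an explicit `f.close()`.

```python
import os
import tempfile

# Hypothetical file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')

# f.write(message): mode 'w' creates or overwrites the file.
with open(path, 'w') as f:
    f.write('first line\nsecond line\nthird line\n')

with open(path, 'r') as f:
    first = f.readline()            # one line, including its trailing '\n'
    next_five = f.read(5)           # the next 5 characters
    f.seek(0)                       # jump back to the start of the file
    lines = [line for line in f]    # iterate over the file line by line
    rest = f.read()                 # nothing left: empty string
    open_inside = f.closed          # False while inside the with block

closed_after = f.closed             # True: the with block closed the file

print(repr(first), repr(next_five))
print(lines)
print(open_inside, closed_after)
```

Using `with` means the file is closed even if an exception is raised while reading, which is why it is generally preferred over calling `f.close()` by hand.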
– Stripping trailing whitespace (e.g. s.rstrip()) also works for DOS newlines (^M), which show up as '\r' or '\r\n'
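A short sketch of removing line endings, assuming the point above refers to `str.rstrip()`. The sample strings are invented for illustration.

```python
# Assumed sample lines with Unix, DOS, and bare carriage-return endings.
lines = ['Unix line\n', 'DOS line\r\n', 'CR-only line\r', 'no newline']

# rstrip() with no argument removes all trailing whitespace,
# including '\n', '\r', and '\r\n'.
stripped = [line.rstrip() for line in lines]

# rstrip('\r\n') removes only newline characters and keeps trailing spaces.
keep_spaces = 'trailing space \r\n'.rstrip('\r\n')

# splitlines() splits on all three newline conventions at once.
parts = 'a\r\nb\nc\r'.splitlines()

print(stripped)
print(repr(keep_spaces))
print(parts)
```

Passing `'\r\n'` to `rstrip()` is the safer choice when trailing spaces or tabs are meaningful and only the line ending should go.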