12-Files Parsing
CS106AP Lecture 12
Roadmap (figure): Programming Basics, The Console, Images, Day 1!, Object-Oriented Programming, Everyday Python, Command Line
Today's topics
1. Review
2. What is Parsing?
   ○ Useful String Functions
   ○ How to Parse
3. What's next?
Review: Command Line & Arguments
PyCharm Terminal == Terminal/Command Prompt
Definition: Command Line/Terminal
Definition: Python Console/Interpreter
(Figure annotation on a terminal command: using Python, run this script's main() function.)
What’s up with $?
Our convention is to let "$" represent the terminal prompt.
e.g., in the Python console the prompt is >>> instead:
>>> 3 * 6
18
Think/Pair/Share:
Line-by-line, what's happening in the following code?
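    import sys


    # print_processed_text() is defined elsewhere in the lecture's script
    def main():
        args = sys.argv[1:]  # skip sys.argv[0], the script name
        if len(args) == 1:  # just the text: use the default chars 'aei'
            print_processed_text(args[0], 'aei')
        if len(args) == 3 and args[0] == '-chars':  # -chars <chars> <text>
            print_processed_text(args[2], args[1])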
(Figure annotation on a terminal command: using Python, run this script with all of these arguments!)
Takeaways on arguments
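● We can use sys.argv to get a list of strings that correspond to the command line arguments!
○ sys.argv[0] is the script's name, so the arguments the user typed start at index 1 (hence sys.argv[1:]).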
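To try the argument logic without a terminal, one option is to move main()'s branching into a helper that takes the argument list as a parameter. The sketch below uses a hypothetical helper, choose_text_and_chars(), that returns the (text, chars) pair main() would pass to print_processed_text(), or None when the arguments match neither form.

```python
import sys


def choose_text_and_chars(args):
    """Hypothetical helper mirroring main()'s branching.
    Returns (text, chars), or None if args match neither form."""
    if len(args) == 1:
        return args[0], 'aei'          # just the text: default chars
    if len(args) == 3 and args[0] == '-chars':
        return args[2], args[1]        # -chars <chars> <text>
    return None


if __name__ == '__main__':
    # sys.argv[0] is the script name; the user's arguments start at index 1.
    print(choose_text_and_chars(sys.argv[1:]))
```

For example, $ python3 args_demo.py -chars xyz hello (args_demo.py is a hypothetical filename) should print ('hello', 'xyz').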
What’s in a text file?
0 The suns are able to fall and rise:\n
1 When that brief light has fallen for us,\n
2 we must sleep a never ending night.
● No bold/italics!
● Each line is ended by the ‘\n’ newline character!
○ Except for the last line, which doesn’t have a ‘\n’.
File Reading – catullus.txt
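A minimal sketch of reading catullus.txt line by line. To stay self-contained, it first writes the three lines shown above to catullus.txt in the current directory (an assumption; in the lecture the file already exists). read_lines() is a hypothetical helper name.

```python
# Write the poem so this sketch runs on its own.
# (Assumption: in the lecture, catullus.txt already exists.)
poem = ('The suns are able to fall and rise:\n'
        'When that brief light has fallen for us,\n'
        'we must sleep a never ending night.')
with open('catullus.txt', 'w') as f:
    f.write(poem)


def read_lines(filename):
    """Return a list of the file's lines, each still ending in a newline
    (except the last line, which has none in this file)."""
    lines = []
    with open(filename) as f:
        for line in f:   # iterating over a file yields one line at a time
            lines.append(line)
    return lines


lines = read_lines('catullus.txt')
for line in lines:
    print(repr(line))    # repr() makes each trailing '\n' visible
```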
$GPGSA,M,3,09,23,07,16,30,03,27,,,,,,2.3,1.3,1.9*38
$GPRMC,005328.000,A,3726.1389,N,12210.2515,W,0.00,256.18,221217,,,D*78
$GPGGA,005329.000,3726.1389,N,12210.2515,W,2,07,1.3,22.5,M,-25.7,M,2.0,0000*71
$GPGSA,M,3,09,23,07,16,30,03,27,,,,,,2.3,1.3,1.9*38
$GPRMC,005329.000,A,3726.1389,N,12210.2515,W,0.00,256.18,221217,,,D*79
$GPGGA,005330.000,3726.1389,N,12210.2515,W,2,07,1.3,22.5,M,-25.7,M,3.0,0000*78
$GPGSA,M,3,09,23,07,16,30,03,27,,,,,,2.3,1.3,1.9*38
Definition
Parsing
The act of reading “raw” text and converting it
into a more useful format stored in memory.
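As a small parsing example, here is a sketch that pulls one field out of a line of the GPS log above using only startswith(), find(), and slicing. It assumes fields are separated by commas; first_field() is a hypothetical name.

```python
def first_field(line, prefix):
    """Return the text between the first and second commas of line,
    or None if line doesn't start with prefix or has no comma."""
    if not line.startswith(prefix):
        return None
    comma = line.find(',')
    if comma == -1:
        return None
    start = comma + 1                # index just past the first comma
    end = line.find(',', start)      # next comma, searching from start
    if end == -1:                    # no second comma: take the rest
        end = len(line)
    return line[start:end]


line = '$GPRMC,005328.000,A,3726.1389,N,12210.2515,W,0.00,256.18,221217,,,D*78'
print(first_field(line, '$GPRMC'))   # 005328.000
```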
Components of Parsing
● File Reading
● String Manipulation
String Manipulation - Useful Functions
s.isalpha()
s.isdigit()
s.isspace()
Each returns True only if s is non-empty and every character is a letter, a digit, or whitespace, respectively.
s.isspace() applies to spaces, tabs, and newlines. Tabs are written '\t'. Newlines are '\n'.
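A quick sketch comparing the three checks on a few sample strings (the samples are arbitrary). Note that a mix like 'abc123' fails all three, and the empty string returns False for each.

```python
samples = ['hello', '2024', ' \t\n', 'abc123', '']
results = {}
for s in samples:
    # (isalpha, isdigit, isspace) for each sample
    results[s] = (s.isalpha(), s.isdigit(), s.isspace())
    print(repr(s), results[s])
```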
String Manipulation - Useful Functions
s.startswith(substr)
s.endswith(substr)
These functions return booleans!
>>> 'Sonja'.startswith('Son')
True
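A sketch of using these checks to filter data; the filenames and sentence prefixes are made-up samples. Both checks are case-sensitive.

```python
# Hypothetical sample data: a few filenames and GPS log sentence prefixes.
filenames = ['catullus.txt', 'gps_log.nmea', 'notes.TXT']
sentences = ['$GPGSA,M,3', '$GPRMC,005328.000,A', '$GPGGA,005329.000']

# endswith() is case-sensitive, so 'notes.TXT' does not match '.txt'.
txt_files = [name for name in filenames if name.endswith('.txt')]
# startswith() picks out one kind of GPS sentence.
rmc_lines = [s for s in sentences if s.startswith('$GPRMC')]

print(txt_files)   # ['catullus.txt']
print(rmc_lines)   # ['$GPRMC,005328.000,A']
```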
String Manipulation - Useful Functions
>>> s = 'computer'
>>> 'put' in s
True
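A short sketch showing that in matches only a contiguous, case-sensitive substring:

```python
s = 'computer'
# 'in' checks for a contiguous, case-sensitive substring.
checks = {sub: sub in s for sub in ['put', 'comp', 'Put', 'cmp']}
print(checks)  # {'put': True, 'comp': True, 'Put': False, 'cmp': False}
```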
String Manipulation - Useful Functions
find() returns the index of the first occurrence of the substring you pass in.
>>> s = 'hello!'
>>> s.find('!')
5
>>> s.find('l')
2
String Manipulation - Useful Functions
>>> s = 'hello!'
>>> s.find('w')
-1
If the string doesn't contain the substring, find() returns -1.
You can optionally pass in a start index (and, after it, an end index). The format is:
s.find(substr, start_index, end_index)
>>> s.find('l', 3)
3
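The start index makes it possible to find every occurrence, not just the first: search again starting one past the previous match, and stop when find() returns -1. A sketch, with find_all() as a hypothetical helper name:

```python
def find_all(s, substr):
    """Return a list of every index where substr occurs in s."""
    indices = []
    i = s.find(substr)
    while i != -1:                  # -1 means no more occurrences
        indices.append(i)
        i = s.find(substr, i + 1)   # resume searching just past the last match
    return indices


print(find_all('hello!', 'l'))      # [2, 3]
```

Because each search restarts at i + 1, overlapping matches are found too, e.g. 'ana' occurs at 1 and 3 in 'banana'.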
Think/Pair/Share:
Find the first ‘@’ in s. Return the
substring made of 0 or more alpha
characters following the ‘@’.
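One possible approach, sketched below: use find() to locate the '@', then walk forward while characters pass isalpha(). The function name at_word() is hypothetical, and returning '' when s has no '@' is an assumption, since the exercise doesn't say.

```python
def at_word(s):
    """Return the run of alphabetic chars right after the first '@' in s.
    Assumption: if s has no '@', return the empty string."""
    at = s.find('@')
    if at == -1:
        return ''
    end = at + 1
    # Advance end while it still points at an alphabetic character.
    while end < len(s) and s[end].isalpha():
        end += 1
    return s[at + 1:end]


print(at_word('xx @abc123'))   # abc
```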
String Manipulation - Useful Functions
>>> s = ' hello world! '
>>> s.strip()
'hello world!'
strip() removes whitespace from both ends of the string (not from the middle); it works on newlines and tabs as well as spaces.
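Recall (output):
The suns are able to fall and rise:\n\n
When that brief light has fallen for us,\n\n
we must sleep a never ending night.
Each line read from the file already ends in '\n', and print() adds another, so every line is followed by an extra blank line. How can we avoid the extra output line?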
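One fix, sketched under the assumption that the lines are the three read from catullus.txt: strip() each line before printing, so print() supplies the only newline.

```python
# The three lines as they come out of catullus.txt (each but the last ends in '\n').
lines = ['The suns are able to fall and rise:\n',
         'When that brief light has fallen for us,\n',
         'we must sleep a never ending night.']

cleaned = []
for line in lines:
    line = line.strip()   # drop the trailing '\n' (and any surrounding spaces)
    cleaned.append(line)
    print(line)           # print() now adds the only newline
```

If leading spaces matter, line.rstrip('\n') removes only the newline; alternatively print(line, end='') leaves the line untouched and tells print() not to add its own newline.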
How do we represent strings?
● Google "omega uppercase unicode"
  ○ '03A9'
  ○ hexadecimal notation (base-16) = 0-9 plus letters A-F
>>> s = '\u03A9'
>>> s
'Ω'
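A sketch connecting the hex code 03A9 to the character, using Python's built-in ord() and chr(), which convert between a character and its integer code point:

```python
s = '\u03A9'                              # Ω via its hex code point
code_point = ord(s)                       # ord() gives the code point as an int
print(code_point)                         # 937
print(hex(code_point))                    # 0x3a9
print(int('03A9', 16) == code_point)      # True: 03A9 in base 16 is 937
print(chr(937) == s)                      # True: chr() goes back the other way
print('\N{GREEK CAPITAL LETTER OMEGA}' == s)  # True: same character by name
```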
Components of Parsing
● File Reading
● String Manipulation