Week 03 Structure
2
In previous episodes …
◎ Representation of numbers and characters on stone,
clay, paper and in computer memory
◎ Printing and Typewriters and all the terminology we
inherited from there
◎ Bits, bytes and ASCII encoding
3
Structure of
text
We are so used to texts, we rarely
talk about the basics of their
structure
4
5
[
  {
    "postId": 2,
    "userId": 1,
    "title": "Cat explains quantum physics",
    "videoUrl": "https://via.placeholder.com/600/deb887",
    "thumbnailUrl": "https://via.placeholder.com/150/deb887"
  },
  {
    "postId": 3,
    "userId": 1,
    "title": "The great pancake flip-off 2024",
    "videoUrl": "https://via.placeholder.com/600/dc143c",
    "thumbnailUrl": "https://via.placeholder.com/150/dc143c"
  },
  {
    "postId": 1,
    "userId": 1,
    "title": "Dancing queen takes over the grocery aisle",
    "videoUrl": "https://via.placeholder.com/600/8a2be2",
    "thumbnailUrl": "https://via.placeholder.com/150/8a2be2"
  }
]
7
What does text consist of?
◎ Lines (newline)
◎ Spaces: whitespace & indentation
◎ Paragraphs
◎ Why all this structure? What is it for?
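A minimal Python sketch of how these building blocks can be picked apart in practice. The sample text and the helper name `indentation` are made up for illustration; it assumes lines end with "\n" and paragraphs are separated by one blank line.

```python
# Sample text: two blank-line-separated breaks, one indented line.
text = (
    "Structure of text\n"
    "\n"
    "We are so used to texts,\n"
    "    we rarely talk about their structure.\n"
    "\n"
    "Second paragraph.\n"
)

# Lines are separated by the newline character.
lines = text.splitlines()

# Paragraphs are separated by an empty line (two newlines in a row).
paragraphs = [p for p in text.split("\n\n") if p.strip()]


def indentation(line: str) -> int:
    """Count the leading spaces of a line."""
    return len(line) - len(line.lstrip(" "))


indents = [indentation(line) for line in lines]

print(len(lines), "lines,", len(paragraphs), "paragraphs")
print("indentation per line:", indents)
```

The structure is carried entirely by characters we normally do not see: newlines and spaces.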
8
[
  {
    "postId": 2,
    "userId": 1,
    "title": "Cat explains quantum physics",
    "videoUrl": "https://via.placeholder.com/600/deb887",
    "thumbnailUrl": "https://via.placeholder.com/150/deb887"
  },
  {
    "postId": 3,
    "userId": 1,
    "title": "The great pancake flip-off 2024",
    "videoUrl": "https://via.placeholder.com/600/dc143c",
    "thumbnailUrl": "https://via.placeholder.com/150/dc143c"
  },
  {
    "postId": 1,
    "userId": 1,
    "title": "Dancing queen takes over the grocery aisle",
    "videoUrl": "https://via.placeholder.com/600/8a2be2",
    "thumbnailUrl": "https://via.placeholder.com/150/8a2be2"
  }
]
9
[{"postId": 2, "userId": 1, "title": "Cat explains quantum physics", "videoUrl":
"https://via.placeholder.com/600/deb887", "thumbnailUrl":
"https://via.placeholder.com/150/deb887"}, {"postId": 3, "userId": 1, "title": "The
great pancake flip-off 2024", "videoUrl": "https://via.placeholder.com/600/dc143c",
"thumbnailUrl": "https://via.placeholder.com/150/dc143c"}, {"postId": 1, "userId":
1, "title": "Dancing queen takes over the grocery aisle", "videoUrl":
"https://via.placeholder.com/600/8a2be2", "thumbnailUrl":
"https://via.placeholder.com/150/8a2be2"}]
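The two layouts above hold the same data; only the whitespace differs. A short sketch with Python's standard `json` module shows this. The shortened `posts` list is an illustrative assumption, not the full data set from the slides.

```python
import json

# A shortened version of the post list (illustrative only).
posts = [
    {"postId": 2, "userId": 1, "title": "Cat explains quantum physics"},
    {"postId": 3, "userId": 1, "title": "The great pancake flip-off 2024"},
]

# Human-friendly: one field per line, nested levels indented by two spaces.
pretty = json.dumps(posts, indent=2)

# Machine-friendly: no newlines and no spaces between tokens.
compact = json.dumps(posts, separators=(",", ":"))

print(pretty)
print(compact)

# Parsing either version gives back exactly the same data.
same_data = json.loads(pretty) == json.loads(compact) == posts
print("same data:", same_data)
```

For the parser the indentation is optional; it exists for the human reader.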
10
Non-alphanumeric characters
11
Non-alphanumeric characters
General Punctuation
Char Hex Name
, 0x2C comma
. 0x2E full stop, period, decimal point
: 0x3A colon
; 0x3B semicolon
? 0x3F question mark
! 0x21 exclamation point, bang
- 0x2D hyphen, minus sign
" 0x22 double quote mark
' 0x27 right (closing) quotation mark, apostrophe
` 0x60 left (opening) single quotation mark, backtick
12
Non-alphanumeric characters
Brackets
Char Hex Name
( 0x28 left parenthesis
) 0x29 right parenthesis
[ 0x5B left (square) bracket
] 0x5D right (square) bracket
{ 0x7B left (opening) brace
} 0x7D right (closing) brace
< 0x3C left (opening) angle bracket, less-than sign
> 0x3E right (closing) angle bracket, greater-than sign
13
Non-alphanumeric characters
Mathematical Symbols
Char Hex Name
+ 0x2B plus sign
* 0x2A asterisk, multiplication sign
/ 0x2F slash, division sign
= 0x3D equal sign
% 0x25 percent sign
14
Non-alphanumeric characters
Miscellaneous
Char Hex Name
# 0x23 number, hash or pound sign
$ 0x24 dollar sign
& 0x26 ampersand, and-sign
@ 0x40 commercial at-sign
\ 0x5C backslash
^ 0x5E caret, hat
_ 0x5F underscore
| 0x7C vertical bar, pipe
~ 0x7E tilde
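The hex codes in these tables can be looked up directly in Python, which is a handy way to double-check a table like this. The helper name `hex_code` is made up for this sketch.

```python
def hex_code(ch: str) -> str:
    """Return the code of a single character as two-digit hex, e.g. 0x2C."""
    return f"0x{ord(ch):02X}"


# Print the code of each character from the "Miscellaneous" table.
for ch in "#$&@\\^_|~":
    print(ch, hex_code(ch))

# chr() goes the other way: from a code back to the character.
print(chr(0x40))
```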
15
Symbols with multiple
meanings and names
◎ Hyphen (dash) as in “8-bit number”
or
minus as in 8 - 3 = 5
◎ Angled bracket like in <br>
or
less-than symbol as in 3 < 5
16
Character names worth
remembering
◎ _ underscore – (ex: file_name)
◎ ^ caret – (ex: 2^n)
◎ ~ tilde – (ex: cd ~)
◎ \ backslash – (ex: \n)
◎ ` backtick, backquote
◎ | pipe, vertical line
17
Markup
Languages
18
Markup before computers
Reporter
Editor
19
Proofreading
Markup
20
Computer Markup Language
◎ Provide instructions on how to display text
21
MarkDown
Rendered view
Plain text view
22
MarkDown
◎ Very simple markup language
◎ Optimized for readability in plain text format
◎ Good for quick documentation
◎ Very popular in the last 10 years
◎ Try it in the browser:
https://markdownlivepreview.com/
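To make the idea of "instructions on how to display text" concrete, here is a toy sketch that turns one line of MarkDown into HTML. It is not a real MarkDown implementation: it assumes only `#`-style titles and plain paragraphs exist, and the function name `render_line` is hypothetical.

```python
import html


def render_line(line: str) -> str:
    """Render one line of a tiny MarkDown subset as HTML."""
    stripped = line.lstrip("#")
    level = len(line) - len(stripped)  # number of leading # characters
    # One to six # characters followed by a space mark a title.
    if 1 <= level <= 6 and stripped.startswith(" "):
        return f"<h{level}>{html.escape(stripped.strip())}</h{level}>"
    # Everything else is treated as an ordinary paragraph.
    return f"<p>{html.escape(line)}</p>"


for line in ["# Week 03", "## Structure of text", "Just a sentence."]:
    print(render_line(line))
```

The plain-text view is what we type; the rendered view is what a program like this produces from it.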
23
Popular Markup Languages
◎ HTML – HyperText ML
◎ XML – eXtensible ML
◎ YAML – originally Yet Another ML, now officially “YAML Ain't Markup Language”
◎ MarkDown
◎ Wikitext / Wiki Markup – used to edit Wikipedia
24
<html>
25
HTML and the World Wide Web
were proposed by British
scientist Tim Berners-Lee in
1989 while working at CERN
https://home.web.cern.ch/science/computing/birth-web
https://en.wikipedia.org/wiki/Tim_Berners-Lee
26
HTML and the Browsers
◎ HTML is rendered by the browser
◎ Extremely flexible
◎ The browsers are some of the most complex projects on the
planet
◎ Browsers adhere to many standards maintained and
developed by the W3C and some other organizations
◎ Thanks to strict standards all browsers can (mostly) display all
web pages
Even the very first page can still be displayed:
https://info.cern.ch/hypertext/WWW/TheProject.html
27
Structured
data
Structured data in text
format
JSON and XML
◎ A spreadsheet table feels more
structured than a regular text
document
◎ Columns often have names and
very different formats, for example
a calendar date or an amount in
dollars
◎ Sometimes we want to store data
that is a lot like a spreadsheet
29
[
{
"Region": "Central",
"Rep": "Smith",
"Item": "Desk",
"Units": 2,
"Unit Cost": 125.00,
"Total": 250.00
},
{
"Region": "Central",
"Rep": "Kevin",
"Item": "Desk",
"Units": 5,
"Unit Cost": 125.00,
"Total": 625.00
},
{
"Region": "Central",
"Rep": "Gill",
"Item": "Pencil",
"Units": 7,
"Unit Cost": 1.29,
"Total": 9.03
},
...
30
<?xml version="1.0" encoding="UTF-8" ?>
<root>
<sales>
<record>
<Region>Central</Region>
<Rep>Smith</Rep>
<Item>Desk</Item>
<Units>2</Units>
<UnitCost>125.00</UnitCost>
<Total>250.00</Total>
</record>
<record>
<Region>Central</Region>
<Rep>Kevin</Rep>
<Item>Desk</Item>
<Units>5</Units>
<UnitCost>125.00</UnitCost>
<Total>625.00</Total>
</record>
</sales>
</root>
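Both formats can be read by a program. As a rough sketch, Python's standard library can parse XML like the above and print it as JSON. The sample is shortened to one record, and the conversion rule (one dictionary per `<record>`) is an assumption made for this example.

```python
import json
import xml.etree.ElementTree as ET

# One record from the sales example, shortened for illustration.
xml_text = """
<sales>
  <record>
    <Region>Central</Region>
    <Rep>Smith</Rep>
    <Item>Desk</Item>
    <Units>2</Units>
    <UnitCost>125.00</UnitCost>
    <Total>250.00</Total>
  </record>
</sales>
"""

root = ET.fromstring(xml_text)

# Each <record> becomes a dictionary: tag name -> text inside the tag.
records = [
    {field.tag: field.text for field in record}
    for record in root.iter("record")
]

print(json.dumps(records, indent=2))
```

Note that every value comes out as a string ("2", "125.00"): plain XML does not distinguish numbers from text, while JSON does.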
31
JSON and XML
32
Human vs Machine readable
formats
◎ In the barcode below, the bars are
very difficult for a human to read but
easy for the computer
◎ The number printed below it contains
the same data, in a form humans can read
◎ Text formats are usually designed to
balance human and machine
readability
33
Special
characters and
“escaping”
Characters with special meaning
MarkDown assigns special meaning to some characters, such as #, which
marks titles (headings).
These are called special or reserved characters.
35
Example of markdown with escaped
# sign
36
Escaping other special
characters
◎ The backslash method is used in most languages:
◎ In MarkDown \# means print a literal #
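The backslash method can be sketched as a small Python function. The set of special characters below is an assumption modeled on the characters MarkDown commonly allows to be backslash-escaped, and the name `escape_markdown` is hypothetical.

```python
# Characters treated as special in this sketch (an assumed set).
MARKDOWN_SPECIAL = "\\`*_{}[]()#+-.!"


def escape_markdown(text: str) -> str:
    """Put a backslash in front of every special character."""
    return "".join(
        "\\" + ch if ch in MARKDOWN_SPECIAL else ch
        for ch in text
    )


print(escape_markdown("# not a title"))
print(escape_markdown("file_name"))
```

The backslash itself is in the set, because it is special too: to show a literal backslash it has to be escaped with another backslash.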
37
◎ How do you ask MarkDown to
display literal \#?
38
39
Escaping quotation marks in
code
◎ s = "Don't go"
print(s)
◎ s = 'Don't go'
◎ s = 'Don\'t go'
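The three variants above can be tried out in Python. In this sketch the broken version is kept inside a string and handed to the built-in `compile`, so the rest of the example still runs; the variable names are made up for illustration.

```python
# Double quotes outside: the apostrophe inside needs no escaping.
a = "Don't go"

# Single quotes outside: the apostrophe must be escaped with a backslash.
b = 'Don\'t go'

print(a)
print(a == b)  # both spellings produce the same string

# The unescaped version is not valid Python: the string ends after "Don".
broken_source = "s = 'Don't go'"
try:
    compile(broken_source, "<example>", "exec")
    is_broken = False
except SyntaxError as err:
    is_broken = True
    print("SyntaxError:", err.msg)
```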
40
Escaping in HTML
◎ In HTML the angled brackets are used for tags
<p> Something </p>
41
Escaping in HTML
◎ To display < and > we can use &lt; and &gt;
○ lt and gt stand for “less-than” and “greater-than”
○ These are called HTML character references
◎ Lots of other &something; references
○ &amp; to display &
◎ See Character Reference on MDN
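Python's standard `html` module can do this escaping for us. A short sketch, with a made-up sample string:

```python
import html

raw = "3 < 5 & 5 > 3"

# Replace the characters that are special in HTML with character references.
escaped = html.escape(raw)
print(escaped)

# unescape() turns the references back into the original characters.
restored = html.unescape(escaped)
print(restored == raw)
```

The & must be escaped as well, because it is the character that starts every character reference.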
42
MDN and W3C – the reference for HTML, CSS, JS
○ w3.org
○ developer.mozilla.org
43
Any questions?
44
Links
• The OLDEST websites EVER (short)
45