Week 03 Structure

The document outlines the structure of text files, discussing elements such as punctuation, whitespace, line breaks, and markup languages like Markdown and HTML. It also covers structured data formats like JSON and XML, emphasizing the importance of special characters and escaping in text representation. Additionally, it highlights the differences between human-readable and machine-readable formats, providing examples of how to handle special characters in various contexts.

Uploaded by

Yash Soni

COMP 1238

Lecture 3 – Structure of text


Agenda
◎ Structure of a text file
◎ Punctuation, whitespace, line breaks
◎ Markup and markup languages, raw vs rendered view.
MarkDown, HTML
◎ Structured data, JSON and XML formats
◎ Special characters and “escaping”

2
In previous episodes …
◎ Representation of numbers and characters on stone,
clay, paper and in computer memory
◎ Printing and Typewriters and all the terminology we
inherited from there
◎ Bits, bytes and ASCII encoding

3
Structure of
text
We are so used to texts, we rarely
talk about the basics of their
structure

4
5
[
  {
    "postId": 2,
    "userId": 1,
    "title": "Cat explains quantum physics",
    "videoUrl": "https://via.placeholder.com/600/deb887",
    "thumbnailUrl": "https://via.placeholder.com/150/deb887"
  },
  {
    "postId": 3,
    "userId": 1,
    "title": "The great pancake flip-off 2024",
    "videoUrl": "https://via.placeholder.com/600/dc143c",
    "thumbnailUrl": "https://via.placeholder.com/150/dc143c"
  },
  {
    "postId": 1,
    "userId": 1,
    "title": "Dancing queen takes over the grocery aisle",
    "videoUrl": "https://via.placeholder.com/600/8a2be2",
    "thumbnailUrl": "https://via.placeholder.com/150/8a2be2"
  }
]
7
What does text consist of?
◎ Lines (newline)
◎ Spaces: whitespace & indentation
◎ Paragraphs
◎ Why all this structure? What is it for?

8
[
  {
    "postId": 2,
    "userId": 1,
    "title": "Cat explains quantum physics",
    "videoUrl": "https://via.placeholder.com/600/deb887",
    "thumbnailUrl": "https://via.placeholder.com/150/deb887"
  },
  {
    "postId": 3,
    "userId": 1,
    "title": "The great pancake flip-off 2024",
    "videoUrl": "https://via.placeholder.com/600/dc143c",
    "thumbnailUrl": "https://via.placeholder.com/150/dc143c"
  },
  {
    "postId": 1,
    "userId": 1,
    "title": "Dancing queen takes over the grocery aisle",
    "videoUrl": "https://via.placeholder.com/600/8a2be2",
    "thumbnailUrl": "https://via.placeholder.com/150/8a2be2"
  }
]
9
[{"postId": 2, "userId": 1, "title": "Cat explains quantum physics", "videoUrl":
"https://via.placeholder.com/600/deb887", "thumbnailUrl":
"https://via.placeholder.com/150/deb887"}, {"postId": 3, "userId": 1, "title": "The
great pancake flip-off 2024", "videoUrl": "https://via.placeholder.com/600/dc143c",
"thumbnailUrl": "https://via.placeholder.com/150/dc143c"}, {"postId": 1, "userId":
1, "title": "Dancing queen takes over the grocery aisle", "videoUrl":
"https://via.placeholder.com/600/8a2be2", "thumbnailUrl":
"https://via.placeholder.com/150/8a2be2"}]

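The indented and single-line versions of the post list carry the same data. As a minimal sketch (the `posts` list is a shortened, hypothetical copy of the slide data, and the variable names are made up for illustration), Python's standard `json` module can produce both layouts and confirm that they parse back to the same value:

```python
import json

# Hypothetical sample: a shortened copy of the posts shown on the slides.
posts = [
    {"postId": 2, "userId": 1, "title": "Cat explains quantum physics"},
    {"postId": 3, "userId": 1, "title": "The great pancake flip-off 2024"},
]

# Same data, two layouts: indented for people, compact for storage or transfer.
pretty = json.dumps(posts, indent=2)
compact = json.dumps(posts, separators=(",", ":"))

print(pretty)
print(compact)

# Whitespace between JSON tokens does not change what a parser reads back.
same_data = json.loads(pretty) == json.loads(compact) == posts
print(same_data)
```

The line breaks and indentation are there for the human reader only; the parser ignores them.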
10
Non-alphanumeric characters

◎ Alphanumeric characters – letters and digits,
like A, b, c … and 0 to 9
◎ Non-alphanumeric characters are everything else,
sometimes shortened to non-alpha
Examples: {[(+)]};-)
◎ How many are there in ASCII?
◎ A brief look at them all …
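One way to answer the counting question is to let Python do it. This is a small sketch, assuming "ASCII" here means the printable range 0x20 to 0x7E (control characters are left out); the variable names are illustrative:

```python
import string

# Printable ASCII runs from 0x20 (space) to 0x7E (tilde).
printable = [chr(code) for code in range(0x20, 0x7F)]

alphanumeric = [c for c in printable if c.isalnum()]
non_alpha = [c for c in printable if not c.isalnum()]

print(len(printable))     # 95
print(len(alphanumeric))  # 62: 26 upper-case + 26 lower-case + 10 digits
print(len(non_alpha))     # 33, counting the space character
print("".join(non_alpha))

# Without the space, the set matches Python's own list of ASCII punctuation.
matches_stdlib = set(non_alpha) - {" "} == set(string.punctuation)
print(matches_stdlib)
```

So there are 32 punctuation and symbol characters, or 33 if the space is counted.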

11
Non-alphanumeric characters
General Punctuation
Char Hex Name
, 0x2C comma
. 0x2E full stop, period, decimal point
: 0x3A colon
; 0x3B semicolon
? 0x3F question mark
! 0x21 exclamation point, bang
- 0x2D hyphen, minus sign
" 0x22 double quote mark
' 0x27 apostrophe, single quote mark
` 0x60 backtick, grave accent
12
Non-alphanumeric characters
Brackets
Char Hex Name
( 0x28 left parenthesis
) 0x29 right parenthesis
[ 0x5B left (square) bracket
] 0x5D right (square) bracket
{ 0x7B left (opening) brace
} 0x7D right (closing) brace
< 0x3C left (opening) angle bracket, less-than sign
> 0x3E right (closing) angle bracket, greater-than sign

13
Non-alphanumeric characters
Mathematical Symbols
Char Hex Name
+ 0x2B plus sign
* 0x2A asterisk, multiplication sign
/ 0x2F slash, division sign
= 0x3D equal sign
% 0x25 percent sign

14
Non-alphanumeric characters
Miscellaneous
Char Hex Name
# 0x23 number, hash or pound sign
$ 0x24 dollar sign
& 0x26 ampersand, and-sign
@ 0x40 commercial at-sign
\ 0x5C backslash
^ 0x5E caret, hat
_ 0x5F underscore
| 0x7C vertical bar, pipe
~ 0x7E tilde
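The hex codes in tables like these can be checked directly. A minimal sketch, where `hex_code` is a hypothetical helper name built on Python's built-in `ord`:

```python
def hex_code(char: str) -> str:
    """Return the code of a single character in the 0xNN style used in the tables."""
    return f"0x{ord(char):02X}"

# Print the code of each character from the Miscellaneous table.
for char in "#$&@\\^_|~":
    print(char, hex_code(char))

# chr() goes the other way, from a code back to the character.
print(chr(0x5C))  # the backslash
```

Note that the backslash has to be written as `\\` inside the Python string, which is itself an example of escaping.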
15
Symbols with multiple
meanings and names
◎ Hyphen (dash) as in “8-bit number”
or minus sign as in 8 - 3 = 5
◎ Angle bracket as in <br>
or less-than sign as in 3 < 5

16
Character names worth
remembering
◎ _ underscore – (ex: file_name)
◎ ^ caret – (ex: 2^n)
◎ ~ tilde – (ex: cd ~)
◎ \ backslash – (ex: \n)
◎ ` backtick, backquote
◎ | pipe, vertical line

17
Markup
Languages

18
Markup before computers
[Image labels: Reporter, Editor]
19
Proofreading
Markup

20
Computer Markup Language
◎ Provide instructions on how to display text

21
MarkDown
Rendered view / Plain text view

22
MarkDown
◎ Very simple markup language
◎ Optimized for readability in plain text format
◎ Good for quick documentation
◎ Very popular in the last 10 years
◎ Try it in the browser:
https://markdownlivepreview.com/
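To make the raw vs rendered distinction concrete, here is a toy sketch of what a renderer does. It is not a real MarkDown implementation: the hypothetical `render_line` function handles only `#` headings and `**bold**`, and the HTML it emits is one reasonable choice among several.

```python
import re

def render_line(line: str) -> str:
    """Toy converter: turn one line of MarkDown-like text into HTML."""
    heading = re.match(r"^(#{1,6}) (.*)$", line)
    if heading:
        # The number of leading # characters is the heading level.
        level = len(heading.group(1))
        return f"<h{level}>{heading.group(2)}</h{level}>"
    # Anything else becomes a paragraph; **text** becomes bold.
    body = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", line)
    return f"<p>{body}</p>"

raw = "# Week 3\nText has **structure**."
rendered = "\n".join(render_line(line) for line in raw.splitlines())

print(raw)       # plain text view
print(rendered)  # what a browser would then display as formatted text
```

The plain text stays readable on its own, which is the property MarkDown is optimized for.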

23
Popular Markup Languages
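◎ HTML – HyperText Markup Language
◎ XML – eXtensible Markup Language
◎ YAML – originally “Yet Another Markup Language”, now “YAML Ain't Markup Language”; in practice a data format rather than a markup language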
◎ MarkDown
◎ Wikitext / Wiki Markup – used to edit Wikipedia

24
<html>

25
The World Wide Web was proposed by British
scientist Tim Berners-Lee in 1989 while
working at CERN; HTML was created soon
after as its page format

https://home.web.cern.ch/science/computing/birth-web
https://en.wikipedia.org/wiki/Tim_Berners-Lee
26
HTML and the Browsers
◎ HTML is rendered by the browser
◎ Extremely flexible
◎ The browsers are some of the most complex projects on the
planet
◎ Browsers adhere to many standards maintained and
developed by the W3C and some other organizations
◎ Thanks to strict standards all browsers can (mostly) display all
web pages
Even the very first page can still be displayed:
https://info.cern.ch/hypertext/WWW/TheProject.html

27
Structured
data
Structured data in text
format
JSON and XML
◎ A spreadsheet table feels more
structured than a regular text
document
◎ Columns often have names and
very different formats, for example
a calendar date or an amount in
dollars
◎ Sometimes we want to store data
that is a lot like the spreadsheet
29
[
  {
    "Region": "Central",
    "Rep": "Smith",
    "Item": "Desk",
    "Units": 2,
    "Unit Cost": 125.00,
    "Total": 250.00
  },
  {
    "Region": "Central",
    "Rep": "Kevin",
    "Item": "Desk",
    "Units": 5,
    "Unit Cost": 125.00,
    "Total": 625.00
  },
  {
    "Region": "Central",
    "Rep": "Gill",
    "Item": "Pencil",
    "Units": 7,
    "Unit Cost": 1.29,
    "Total": 9.03
  },
  ...
30
<?xml version="1.0" encoding="UTF-8" ?>
<root>
  <sales>
    <record>
      <Region>Central</Region>
      <Rep>Smith</Rep>
      <Item>Desk</Item>
      <Units>2</Units>
      <UnitCost>125.00</UnitCost>
      <Total>250.00</Total>
    </record>
    <record>
      <Region>Central</Region>
      <Rep>Kevin</Rep>
      <Item>Desk</Item>
      <Units>5</Units>
      <UnitCost>125.00</UnitCost>
      <Total>625.00</Total>
    </record>
  </sales>
</root>

31
JSON and XML

◎ JSON stands for JavaScript Object Notation


◎ XML stands for eXtensible Markup Language
◎ JSON is newer and taking over, but XML is still used a
lot
◎ Both are text-based formats used to serialize
structured data. They are intended to be read by
computer software, yet remain human-readable
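As an illustration of "read by computer software", the sketch below loads one sales record from each format using Python's standard library. The two text snippets are shortened copies of the slide data and the variable names are made up for this example.

```python
import json
import xml.etree.ElementTree as ET

# One record from the slides, in each format (shortened for the example).
json_text = (
    '[{"Region": "Central", "Rep": "Smith", "Item": "Desk", '
    '"Units": 2, "Unit Cost": 125.00, "Total": 250.00}]'
)
xml_text = """<root><sales><record>
  <Region>Central</Region><Rep>Smith</Rep><Item>Desk</Item>
  <Units>2</Units><UnitCost>125.00</UnitCost><Total>250.00</Total>
</record></sales></root>"""

json_record = json.loads(json_text)[0]
xml_record = ET.fromstring(xml_text).find("./sales/record")

# JSON distinguishes numbers from strings, so the values arrive as numbers.
json_total = json_record["Units"] * json_record["Unit Cost"]

# XML element content is always text, so it has to be converted explicitly.
xml_total = int(xml_record.findtext("Units")) * float(xml_record.findtext("UnitCost"))

print(json_total, xml_total)
```

One visible difference: a JSON key such as "Unit Cost" may contain a space, while an XML element name may not, which is why the XML version uses UnitCost.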

32
Human vs Machine readable
formats
◎ In the barcode below, the bars are
very difficult for a human to read but
easy for the computer
◎ The number printed below the bars contains the
same data, but in a form humans can read
◎ Text formats are usually designed to
balance human and machine
readability

33
Special
characters and
“escaping”
Characters with special meaning
MarkDown assigns special meaning to characters like #, which is
used to render titles.
Such characters are called special or reserved characters

What if we want to ask MarkDown to treat some # character as if it’s a
regular character with no special meaning?

35
Example of markdown with escaped
# sign

36
Escaping other special
characters
◎ The backslash method is used in most languages:
◎ In MarkDown \# means print a literal #

37
◎ How do you ask MarkDown to
display literal \#?
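The rule behind the question can be sketched in a few lines: a backslash makes the next character literal, and that includes another backslash. The `unescape` function below is a hypothetical illustration of this rule, not MarkDown's actual parser (real MarkDown only treats a backslash as an escape before punctuation characters).

```python
def unescape(text: str) -> str:
    """Sketch of backslash escaping: a backslash makes the next character literal."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            # Take the following character as-is; a lone trailing backslash is kept.
            out.append(next(chars, "\\"))
        else:
            out.append(ch)
    return "".join(out)

# Raw strings (r"...") keep the backslashes exactly as typed.
print(unescape(r"\# not a heading"))  # # not a heading
print(unescape(r"\\\#"))              # \#
```

So to display a literal \# the source needs three backslashes: two for the backslash itself and one for the # sign.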

38
39
Escaping quotation marks in
code
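◎ s = "Don't go"
print(s)
◎ s = 'Don't go'   # SyntaxError: the apostrophe closes the string early
◎ s = 'Don\'t go'  # works: the backslash escapes the apostrophe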
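A runnable version of the working variants, as a small sketch with illustrative variable names. It shows that the choice of quotes and the escape only affect how the string is written in code, not the string itself:

```python
# Double quotes outside, so the apostrophe needs no escaping.
double_quoted = "Don't go"

# Single quotes outside, so the apostrophe is escaped with a backslash.
escaped = 'Don\'t go'

# When both kinds of quote appear, at least one kind must be escaped.
both = "She said \"Don't go\""

print(double_quoted)
print(escaped)
print(both)

# The backslash is not part of the resulting string.
print(double_quoted == escaped, len(escaped))
```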

40
Escaping in HTML
◎ In HTML the angled brackets are used for tags
<p> Something </p>

41
Escaping in HTML
◎ To display < and > we can use &lt; and &gt;
○ lt and gt stand for “less-than” and “greater-than”
○ These are called “HTML character references”
◎ There are many other &something; references (they are not tags)
○ &amp; to display &
◎ See Character Reference on MDN
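In code, this escaping is rarely done by hand. As a short sketch, Python's standard `html` module converts the special characters to character references and back; the `snippet` text is a made-up example:

```python
import html

snippet = "<p> 3 < 5 & 5 > 3 </p>"

# Replace &, < and > with character references so a browser shows them as text.
# quote=False leaves quotation marks alone, which is enough for text outside attributes.
escaped = html.escape(snippet, quote=False)
print(escaped)

# unescape() turns the references back into the original characters.
restored = html.unescape(escaped)
print(restored == snippet)
```

The & itself has to be escaped as well, otherwise the browser could not tell a literal &lt; apart from an escaped <.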

42
MDN and W3C – the reference for HTML, CSS, JS
○ w3.org
○ developer.mozilla.org

43
Any questions?

44
Links
• The OLDEST websites EVER (short)

45