Markup languages - XML

Markup languages
2021 - 2022

Joan Puigcerver
❏ XML
❏ Elements
❏ Comments
❏ Relations and structures
❏ Attributes
❏ XML declaration
❏ Entity references
❏ Well-structured XML
❏ Namespaces
XML (eXtensible Markup Language)
❏ XML (eXtensible Markup Language) is a language developed by the W3C (World Wide
Web Consortium) and based on SGML.
❏ XML is used for storing and exchanging structured data between
different platforms.
❏ XML is a metalanguage: it can be used to define other languages,
called XML dialects. Some languages based on XML are:
❏ GML (Geography Markup Language).
❏ MathML (Mathematical Markup Language).
❏ RSS (Really Simple Syndication).
❏ SVG (Scalable Vector Graphics).
❏ XHTML (eXtensible HyperText Markup Language).
Elements
XML documents are plain text documents (with no formatting) that contain marks
(tags) defined by the developer. Tags are delimited by the less-than “<” and
greater-than “>” characters. A closing tag adds a slash “/” right after the “<”.

Example: We can store the name “Mireia” in an XML document with the following syntax:
<name>Mireia</name>
ELEMENTS: Syntax
<tagName>value</tagName>

● tagName: It is the tag identifier. The opening and closing name must be the same.
● value: It is the value we want to store.

Example <name>Mireia</name>

→ A “name” tag has been opened with <name> and then closed with </name>. The
value of the tag has been specified between the opening and closing tag, “Mireia” in this
case.
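The opening tag, value and closing tag described above can be inspected programmatically. The following is a minimal sketch that assumes Python and its standard xml.etree.ElementTree module, neither of which is part of the original slides.

```python
import xml.etree.ElementTree as ET

# Parse the one-element document used in the example above.
element = ET.fromstring("<name>Mireia</name>")

tag_name = element.tag   # the tag identifier
value = element.text     # the text between the opening and closing tags

print(tag_name, value)
```

The parser only accepts the text because the opening and closing names are identical.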
ELEMENTS: Empty elements
XML documents support empty elements, which don’t contain any value. In that case,
we can write:
<tag></tag>

Or also:
<tag/>

Example: We can specify the “name” tag without a value with:


<name></name> or <name/>
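As a quick check that the two notations are equivalent, this sketch (again assuming Python's standard xml.etree.ElementTree, which the slides do not use) parses both forms and compares the results.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<name></name>")
short_form = ET.fromstring("<name/>")

# Both forms produce an element with the same tag, no text and no children.
same_tag = long_form.tag == short_form.tag
both_without_text = long_form.text is None and short_form.text is None
both_without_children = len(long_form) == 0 and len(short_form) == 0

print(same_tag, both_without_text, both_without_children)
```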
ELEMENTS: Naming rules
Tag and attribute names must follow these rules:
❏ They may contain letters (a..z, A..Z), digits (0..9), dots ".", dashes "-" and underscores "_".
Non-English letters (á, Á, ñ, Ñ...) are allowed; however, it's recommended not to use them in
order to avoid possible encoding problems.
❏ The first character must be a letter or an underscore "_"; it can’t be a digit, a dot or a dash.
❏ The colon ":" is also allowed, but its use is reserved for namespaces.
❏ Names are case sensitive (lowercase and uppercase letters are distinguished).
❏ Names can’t contain spaces or line breaks.
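The rules above can be approximated with a regular expression. This is a simplified sketch in Python, not the full XML grammar: it is an assumption of this example that only ASCII letters are accepted and that the colon is rejected, and the function name is_simple_xml_name is hypothetical.

```python
import re

# ASCII-only simplification of the naming rules (assumption: the real XML
# grammar also accepts many non-English letters; the colon is left out here
# because it is reserved for namespaces).
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def is_simple_xml_name(name: str) -> bool:
    """Return True if the name follows the simplified rules listed above."""
    return NAME_PATTERN.fullmatch(name) is not None

for candidate in ["birth_date", "_id", "2colors", "favorite color"]:
    print(candidate, is_simple_xml_name(candidate))
```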


Exercise 1
Correct the following tag names:

<country/>
<day>21</dáy>
<month>10<month/>
<city>Barcelona</endcity>
<City>Barcelona</city>
<2colors>Orange and green</2colors>
< red>
< Hobbies >Cinema, Dancing, Swimming</ Hobbies >
<person><name>Maria</person></name>
<favorite color>white</favorite colo>
ELEMENTS: Content
The content of an element is everything between its opening and
closing tags. It can be just text or other elements.

Example: The element <person> contains two more elements, <name> and <surname>,
which in turn contain values:

<person>
<name>Pedro</name>
<surname>Martí</surname>
</person>
ELEMENTS: Mixed content
The two previous cases can also be combined, so that the content of an element is
made up of both text and elements. These are called mixed content elements.
<person>
Pere Martí
<position>Director</position>
</person>

There are very few restrictions on the content of an XML element:

❏ We can use any character, except that “<” and “&” must be written as entity references.
❏ The content can be as long as we need.
❏ It can be written in any language.
❏ White space and blank lines are allowed.
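To see how text and child elements coexist in a mixed content element, the sketch below reads the <person> example with Python's standard xml.etree.ElementTree (an assumption of this example, not something the slides prescribe).

```python
import xml.etree.ElementTree as ET

person = ET.fromstring(
    "<person>\n"
    "  Pere Martí\n"
    "  <position>Director</position>\n"
    "</person>"
)

# Text that appears before the first child is stored on the parent element.
leading_text = person.text.strip()
# Child elements are reached by iterating or with find().
position = person.find("position").text

print(leading_text, "/", position)
```

The surrounding white space and line breaks are kept by the parser, which is why the example calls strip().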
COMMENTS
Comments start with <!-- and end with -->. They can be placed anywhere in
the document except inside a tag. The text of a comment can't contain two consecutive dashes "--".

<!-- This would be a comment and it won’t be interpreted as
an element -->
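The following sketch illustrates both statements: a comment between elements is skipped, and a comment inside a tag makes the document unparseable. It assumes Python's standard xml.etree.ElementTree with its default parser settings; the helper name parses is hypothetical.

```python
import xml.etree.ElementTree as ET

document = """<person>
  <!-- This is a comment and it won't be interpreted as an element -->
  <name>Mireia</name>
</person>"""

# The default parser discards comments, so only <name> is a child.
root = ET.fromstring(document)
child_tags = [child.tag for child in root]

def parses(text):
    """Return True if the text can be parsed as XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# A comment placed inside a tag is rejected.
comment_inside_tag = parses("<name <!-- not allowed here -->>Mireia</name>")

print(child_tags, comment_inside_tag)
```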
RELATIONS
An element (parent) can contain more elements (children).

Example: The "person" element contains four elements (children): "name", "woman",
"birth_date" and "city". Also, the "birth_date" element contains three more elements
(children): "day", "month" and "year". The “woman” element is an empty element.

<person>
  <name>Maria</name>
  <woman/>
  <birth_date>
    <day>21</day>
    <month>10</month>
    <year>1998</year>
  </birth_date>
  <city>Bilbao</city>
</person>
STRUCTURE
Every well-formed XML document organizes its information hierarchically.

Elements contained in other elements are called children. Following the
naming of human relationships, the children of child elements are called
grandchildren.

Every XML document must have a single root element from which all others descend.
Structure: Inverted tree
The structure of any XML document can be represented as an inverted tree of
elements. The tree starts with the root element and each branch corresponds to one of its
child elements, and so on. Elements with no children are called leaves. The
element names give semantic information about the document.

<person>
  <name>Maria</name>
  <woman/>
  <birth_date>
    <day>21</day>
    <month>10</month>
    <year>1998</year>
  </birth_date>
  <city>Bilbao</city>
</person>

person
├── name
├── woman
├── birth_date
│   ├── day
│   ├── month
│   └── year
└── city
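The inverted tree can also be walked with a short recursive function. This is a sketch assuming Python's standard xml.etree.ElementTree; the function names outline and leaves are hypothetical helpers written for this example.

```python
import xml.etree.ElementTree as ET

def outline(element, depth=0):
    """Return one indented line per element, parents before children."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

def leaves(element):
    """Return the tags of all elements that have no children."""
    if len(element) == 0:
        return [element.tag]
    found = []
    for child in element:
        found.extend(leaves(child))
    return found

person = ET.fromstring(
    "<person><name>Maria</name><woman/>"
    "<birth_date><day>21</day><month>10</month><year>1998</year></birth_date>"
    "<city>Bilbao</city></person>"
)

print("\n".join(outline(person)))
print(leaves(person))
```

Each level of indentation in the printed outline corresponds to one level of the tree.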
Exercise 2
Draw the tree that corresponds to the following XML document:
<company>
<employee>
<surname>Perez</surname>
<name>Juan</name>
<employee_id>1234567890</employee_id>
<email>[email protected]</email>
<phone_number>666 555 444</phone_number>
<address>
<road>Carrer de Pau Claris, 121</road>
<city>Barcelona</city>
<postal_code>08009</postal_code>
</address>
</employee>
</company>
ATTRIBUTES
Attributes can be defined inside the opening tag of an element.
They provide additional information about the element.

<tagName attributeName="attributeValue">elementValue</tagName>

● tagName: Name of the element.
● attributeName: Name of the attribute. The same rules as for tagName apply.
● attributeValue: Value of the attribute. It must be enclosed in double quotes (") or single quotes (').

Example: The same information stored as a child element and as an attribute:

<person>
  <name>Mireia</name>
</person>

<person name="Mireia">
</person>
ATTRIBUTES
An element can have as many attributes as needed. Attributes
must be separated from each other by white space.

● The order of the attributes is not significant.
● Attribute names must follow the same syntactical rules as tag names.
● Attribute names cannot be repeated inside the same element. They must be
unique in each element.
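A small sketch of how a parser exposes attributes and enforces uniqueness, assuming Python's standard xml.etree.ElementTree. The city attribute and its value are made up for this illustration, and is_rejected is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# "city" and its value are hypothetical, added only to show two attributes.
person = ET.fromstring('<person name="Mireia" city="Girona"></person>')

attributes = person.attrib      # dictionary of attribute name -> value
name = person.get("name")       # read a single attribute

def is_rejected(text):
    """Return True if the parser refuses the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return True
    return False

# Repeating an attribute name in the same element is an error.
duplicate_rejected = is_rejected('<person name="A" name="B"/>')

print(attributes, name, duplicate_rejected)
```

Because the attributes are exposed as a dictionary, a program should not rely on the order in which they were written.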
Exercise 3
Modify the XML in the previous exercise so that postal_code is an attribute of city and
employee_id is an attribute of employee.

Should road be an attribute of address? Should city be? Justify your answer.
Exercise 4
Write the XML code that corresponds to the following tree:
ATTRIBUTE xml:lang
The xml:lang attribute is a predefined attribute in XML. Its purpose is
to declare the language of the content of a specific element.

Example
<message xml:lang="ca">Bon dia</message>
<message xml:lang="en">Good morning</message>
The xml:lang value is inherited by all the descendants of an element, unless one of them sets its own xml:lang.
Example
<message xml:lang="ca">
<greeting>Bon dia</greeting>
<goodbye>Adéu</goodbye>
</message>
The greeting and goodbye elements are therefore also marked as Catalan/Valencian (language code "ca").
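The inheritance of xml:lang is a rule of the XML specification; a generic parser usually only stores the attribute where it was written. The sketch below, which assumes Python's standard xml.etree.ElementTree and a hypothetical helper named languages, computes the effective language of each element. The English goodbye is an invented variation of the example above.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def languages(element, inherited=None):
    """Map each element tag to its effective language (own or inherited)."""
    lang = element.get(XML_LANG, inherited)
    result = {element.tag: lang}
    for child in element:
        result.update(languages(child, lang))
    return result

message = ET.fromstring(
    '<message xml:lang="ca">'
    "<greeting>Bon dia</greeting>"
    '<goodbye xml:lang="en">Goodbye</goodbye>'
    "</message>"
)
effective = languages(message)
print(effective)
```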
Exercise 5
Fix the following XML code snippets:
<runner position=1>Manel Roure</runner>
<student delegate>Jaume Ravent</delegate>
<car fuel type=”gasoline”>Seat Ibiza</car>
<politician position="mayor" position="minister">
Diana Morant
</politician>

Write an equivalent XML document with only one tag:

<car>
<brand>Seat</brand>
<color>red</color>
<plate>B3456L</plate>
<driver>Jorge</driver>
<driver>Maria</driver>
<rented/>
</car>
XML DECLARATION
The XML declaration is a special instruction that gives an XML processor the details it needs to
parse the XML document. It is optional, but if present it must be at the very beginning of
the document.

<?xml version="versionNumber" encoding="encodingDeclaration" standalone="standaloneStatus"?>
Example
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
XML declaration rules
● If the XML declaration is present, it must be the first thing in the document, with nothing before it (not even white space or a comment).
● If the XML declaration is included, it must contain the version attribute.
● The XML declaration has no closing tag (there is no </xml>).
XML DECLARATION
Attribute     Values                              Description

version*      1.0, 1.1                            XML version.

encoding      UTF-8, UTF-16, ISO-10646-UCS-2,     Character encoding in which the
              ISO-10646-UCS-4, ISO-8859-1, etc.   document text is written.

standalone    yes, no                             standalone="yes" indicates that the
                                                  document is independent of others,
                                                  such as an external Document Type
                                                  Definition (DTD). Otherwise, the
                                                  document is not independent.

* Required whenever the declaration is present.
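The placement and version rules can be checked with a parser. This sketch assumes Python's standard xml.etree.ElementTree; the helper name parses is hypothetical, and the documents are passed as bytes so that the parser itself applies the declared encoding.

```python
import xml.etree.ElementTree as ET

def parses(data):
    """Return True if the bytes can be parsed as an XML document."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True

# Declaration at the very beginning: accepted.
at_start = parses(b'<?xml version="1.0" encoding="UTF-8"?>\n<dog></dog>')
# A blank line before the declaration: rejected.
after_blank_line = parses(b'\n<?xml version="1.0" encoding="UTF-8"?>\n<dog></dog>')
# Declaration without the version attribute: rejected.
missing_version = parses(b'<?xml encoding="UTF-8"?>\n<dog></dog>')

print(at_start, after_blank_line, missing_version)
```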
Exercise 6
Fix the following XML code snippets:
<?xml encoding="UTF-8" standalone="yes"?>
<dog></dog>

-------------------------------------------
<?xml standalone="si"?>
</?xml>

-------------------------------------------
<cat></cat>
<?xml version="1.0" encoding="UTF-8"?>
<dog></dog>
Reserved characters
In XML some characters have a special meaning. To write
them as data in an XML document, use the entity references shown in the
following table:

Character             Entity    Entity reference

< (less than)         lt        &lt;
> (greater than)      gt        &gt;
" (quotation mark)    quot      &quot;
' (apostrophe)        apos      &apos;
& (ampersand)         amp       &amp;


EXAMPLE
<?xml version="1.0" encoding="UTF-8"?>
<entities>
<less_than>&lt;</less_than>
<greater_than>&gt;</greater_than>
<quotation>&quot;</quotation>
<apostrophe>&apos;</apostrophe>
<ampersand>&amp;</ampersand>
</entities>
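When XML is generated by a program, the replacements in the table are usually applied by a helper function rather than by hand. The sketch below assumes Python's standard xml.sax.saxutils module; the sample sentence is invented for this illustration.

```python
from xml.sax.saxutils import escape, unescape

raw = 'Tom & Jerry say "1 < 2"'

# escape() always replaces &, < and >; the quotes are handled through the
# optional mapping of extra replacements.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})

# unescape() reverses the process with the inverse mapping.
restored = unescape(escaped, {"&quot;": '"', "&apos;": "'"})

print(escaped)
print(restored)
```

The ampersand has to be replaced before the other characters, otherwise the "&" of an entity reference that was just produced would be escaped a second time.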
CDATA
A CDATA section begins with <![CDATA[ and ends with ]]> . The XML
processor does not interpret its content as markup, but as text. So, if special characters
(< & " ') appear in a CDATA section, the XML processor does not treat them as the start of
markup, but as ordinary characters.
It is often used in documents where these special characters appear very often, since
using entity references (&lt; and &amp;) makes the document difficult to read.
EXAMPLE
Not well-formed, because < and & appear unescaped:
<?xml version="1.0" encoding="UTF-8"?>
<test>
<text>The characters < and & cannot be written
as an element value </text>
</test>

-------------------------------------------------------
Well-formed, using entity references:
<?xml version="1.0" encoding="UTF-8"?>
<test>
<text>The characters &lt; and &amp; cannot
be written as an element value </text>
</test>

-------------------------------------------------------
Well-formed, using a CDATA section:
<?xml version="1.0" encoding="UTF-8"?>
<test>
<text><![CDATA["The characters < and & cannot
be written as an element value"]]>
</text></test>
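From the point of view of a program that reads the document, a CDATA section and the equivalent entity references yield the same text. A minimal sketch, assuming Python's standard xml.etree.ElementTree and a hypothetical helper named parses:

```python
import xml.etree.ElementTree as ET

with_cdata = ET.fromstring("<text><![CDATA[1 < 2 & 3 > 2]]></text>")
with_entities = ET.fromstring("<text>1 &lt; 2 &amp; 3 &gt; 2</text>")

# Both notations produce exactly the same character data.
same_text = with_cdata.text == with_entities.text

def parses(text):
    """Return True if the text can be parsed as XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# Without CDATA or entity references the same content is rejected.
unescaped_accepted = parses("<text>1 < 2 & 3 > 2</text>")

print(with_cdata.text, same_text, unescaped_accepted)
```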
Exercise 7
Fix the following XML code snippets and explain why they are invalid and how you resolve them.

<person>
<name>Pere Garcia</name>
</person>
<dog>
<name>Rufy</name>
</dog>
--------------------------------------
<person>
<name>Pere</name>
<surname> Pérez </surname>
</person>
<person>
<surname2>Garcia</surname2>
<age>22</age>
</person>
Well-structured XML
Checking that an XML document is well-structured (well-formed) means checking that it follows the XML
syntax rules seen previously:

❏ There is only one root element, which contains all the other elements.
❏ Tag names follow the naming rules.
❏ Opening, closing and empty tags are properly nested (they do not overlap) and no
opening or closing tag is missing or left over.
❏ Closing tag names match their opening tags exactly (including uppercase and
lowercase letters).
❏ Closing tags do not contain attributes.
❏ Attribute values are enclosed in quotation marks (single or double) and every
attribute has a value.
❏ No tag has two attributes with the same name.
Well-structured XML
The easiest way to check whether an XML document is well-formed is to open it in a browser, which shows an error message if the document can't be parsed.

There are also libraries and command line tools that can check XML: libxml2 (GNOME project, includes the xmllint command),
Apache Xerces (Java), MSXML (Microsoft).

You can also use online tools such as “XML formatter” to check and format your
XML.
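Any XML parser can serve as a well-formedness checker, because it refuses documents that break the rules. The following sketch assumes Python's standard xml.etree.ElementTree; the function name is_well_formed and the sample labels are hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return (True, None) if the text parses, else (False, error message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as error:
        return False, str(error)
    return True, None

samples = {
    "single root": "<person><name>Pere</name></person>",
    "two roots": "<person/><dog/>",
    "case mismatch": "<City>Barcelona</city>",
    "overlapping tags": "<person><name>Maria</person></name>",
    "unquoted attribute": "<runner position=1>Manel</runner>",
}

results = {label: is_well_formed(text)[0] for label, text in samples.items()}
for label, ok in results.items():
    print(label, "->", "well-formed" if ok else "not well-formed")
```

The error message returned in the second position includes the line and column where the parser stopped, which helps to locate the problem.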
XML Editors
An XML document is a plain text file, so any text editor can be used to create and edit
XML files.

❏ Plain text editors
Notepad, gedit, jEdit, Vim, Sublime Text 3, Visual Studio Code (with plugins)
❏ Specific XML editor
XML Copy Editor
❏ Diagram maker
draw.io
Exercise 8
Write an XML document that stores the following information (one XML document for each item)

1. The student Carles Femenia was born on February 23, 2001 in Barcelona.

2. Establishments: “Bar Pasqual” is a tapas bar on 232 Diagonal Avenue owned by Joan Alcover.
“The Meravelles Restaurant” is a menu restaurant on Meridiana 111 owned by Maria Garcia. The
“Racó de l'Àvia” restaurant is a menu restaurant on Av. Rome 221 owned by Alba Puig.

3. List of artworks in a museum: Blue on white is an abstract painting by Pere Àguila from 2014
made in oil. Dark Gray is a 2010 abstract Pere Àguila painting made in watercolor. Camino largo is a
1981 marble sculpture by Marta Lambert.
Exercise 9: Recipes
Create an XML document that stores cooking recipe information:

● There must be at least two recipes.
● Use at least one attribute.
● There must be at least 4 levels of nested tags.
Exercise 10: Writing a book (I)
We want to write a book using XML.
The book is structured in:
● Sections (at least 2)
● Chapters (at least 2)
● Paragraphs (at least 2)

[Figure: layout of the book, with a legend marking the sections (Parte) and the chapters (Capitulo)]

1. The book must include the list of authors (with last name and
first name).
2. All items must have a title, except the paragraph, which contains
only text.
3. Propose an XML structure for the document shown in the image
(with 2 authors, 2 sections, 2 chapters for each section and 2
paragraphs for the first chapter).
4. Check that it is a valid XML document.
Warning: Don’t use attributes.
Exercise 10: Writing a book (II)
1. Attributes
We want to complete the structure of the XML document from the previous exercise with the attributes of the
authors' first and last names and the title of the book, the sections and the chapters.
Analyze the structure of the new document. Are there possible simplifications?
Check, with the help of the editor, that your document is valid.

2. Tree
Draw the structure tree of this previous XML document, including the attributes.
NAMESPACES
Namespaces are the XML mechanism for avoiding name collisions between
element and attribute names that come from different vocabularies.

XML allows you to mix documents, so there may be tags with the same name that
represent different things.

Example: x3dom is an XML-based language for defining 3D environments.

https://www.x3dom.org/

This language contains the <box> element.

NAMESPACES
Imagine that we want to define our own language to save a list of products for
e-commerce and we want to add a 3D representation of the product.

<product>
  <name>Laptop</name>
  <price>1000</price>
  <box>cardboard</box>
  <representation3D>
    <x3d width='500px' height='400px'>
      <scene>
        <shape>
          <appearance>
            <material diffuseColor='1 0 0'></material>
          </appearance>
          <box></box>
        </shape>
      </scene>
    </x3d>
  </representation3D>
</product>

The box tag appears both in our document and in the language defined
by x3dom. Collision! For us, the box tag refers to a packaging box, but for
the x3d language it means a geometric figure.
NAMESPACES
To avoid this problem, a unique string of characters, a URL, is added to the name of
each tag. (This code is not valid XML; it is shown only at a conceptual level.)
<http://store.com:product>
  <http://store.com:name>Laptop</http://store.com:name>
  <http://store.com:price>1000</http://store.com:price>
  <http://store.com:box>cardboard</http://store.com:box>
  <http://store.com:representation3D>
    <https://www.x3dom.org:x3d width='500px' height='400px'>
      <https://www.x3dom.org:scene>
        <https://www.x3dom.org:shape>
          <https://www.x3dom.org:appearance>
            <https://www.x3dom.org:material diffuseColor='1 0 0'/>
          </https://www.x3dom.org:appearance>
          <https://www.x3dom.org:box></https://www.x3dom.org:box>
        </https://www.x3dom.org:shape>
      </https://www.x3dom.org:scene>
    </https://www.x3dom.org:x3d>
  </http://store.com:representation3D>
</http://store.com:product>
NAMESPACES
To make this easier to read, we can define an alias (prefix) for each namespace with the xmlns attribute.
<store:product xmlns:store="http://store.com" xmlns:x3dom="https://www.x3dom.org">
  <store:name>Laptop</store:name>
  <store:price>1000</store:price>
  <store:box>cardboard</store:box>
  <store:representation3D>
    <x3dom:x3d width='500px' height='400px'>
      <x3dom:scene>
        <x3dom:shape>
          <x3dom:appearance>
            <x3dom:material diffuseColor='1 0 0'></x3dom:material>
          </x3dom:appearance>
          <x3dom:box></x3dom:box>
        </x3dom:shape>
      </x3dom:scene>
    </x3dom:x3d>
  </store:representation3D>
</store:product>
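A namespace-aware parser resolves each prefix to its URI, which is what actually distinguishes the two box elements. The sketch below assumes Python's standard xml.etree.ElementTree, which writes qualified names as {URI}localname; the document is a shortened, hypothetical version of the product example.

```python
import xml.etree.ElementTree as ET

document = """<store:product xmlns:store="http://store.com"
                             xmlns:x3dom="https://www.x3dom.org">
  <store:box>cardboard</store:box>
  <x3dom:box></x3dom:box>
</store:product>"""

root = ET.fromstring(document)

# The prefix is replaced by the namespace URI in each tag name.
tags = [child.tag for child in root]

# Searching requires a prefix-to-URI mapping supplied by the caller.
packaging = root.find("store:box", {"store": "http://store.com"}).text

print(tags)
print(packaging)
```

The prefix itself is only an abbreviation: two documents may use different prefixes for the same URI and their elements are still considered the same.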
NAMESPACES
❏ xmlns:prefix="URI" (namespace bound to a prefix)
❏ xmlns="URI" (default namespace)
❏ xmlns="" (no namespace)
Example
<e1:example xmlns:e1="http://www.myspace.com/example1"
xmlns:e2="http://www.myspace.com/example2" >
We can declare a namespace in any element.

● xmlns is a reserved attribute used to declare namespaces.
● prefix: alias that we’ll use to identify the namespace and use it in the other tags.
● URI (Uniform Resource Identifier): unique string that identifies the namespace. It doesn't need to point to an existing resource.
NAMESPACES
To make it even easier to read, we can define a default namespace and take
advantage of inheritance: unprefixed descendants belong to the default namespace of their closest ancestor that declares one.
A prefix, in contrast, applies only to the names that carry it, so here the x3d element declares its own default namespace for its descendants.
<product xmlns="http://store.com">
  <name>Laptop</name>
  <price>1000</price>
  <box>cardboard</box>
  <representation3D>
    <x3d xmlns="https://www.x3dom.org" width='500px' height='400px'>
      <scene>
        <shape>
          <appearance>
            <material diffuseColor='1 0 0'></material>
          </appearance>
          <box></box>
        </shape>
      </scene>
    </x3d>
  </representation3D>
</product>
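The difference between a default namespace and a prefix is easy to get wrong: a default namespace is inherited by unprefixed descendants, while a prefix on a parent does not affect its children. The sketch below, which assumes Python's standard xml.etree.ElementTree and uses a hypothetical prefix p, shows both cases side by side.

```python
import xml.etree.ElementTree as ET

document = """<product xmlns="http://store.com">
  <box>cardboard</box>
  <x3d xmlns="https://www.x3dom.org">
    <box></box>
  </x3d>
  <p:x3d xmlns:p="https://www.x3dom.org">
    <box></box>
  </p:x3d>
</product>"""

root = ET.fromstring(document)

store_box = root[0].tag              # default namespace of <product>
inherited_box = root[1][0].tag       # default namespace redeclared by <x3d>
not_inherited_box = root[2][0].tag   # prefix on the parent is not inherited

print(store_box)
print(inherited_box)
print(not_inherited_box)
```

In the third case the inner box falls back to the store namespace, so the collision would still be present.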
NAMESPACES: More examples
Let's imagine two XML documents with the lists of teachers and students of a subject:

<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<teachers>
  <name>Pilar Ruiz Pérez</name>
  <name>Tomás Rodríguez Hernández</name>
</teachers>

<?xml version="1.0" encoding="iso-8859-1" standalone="yes" ?>
<students>
  <name>Fernando Fernández González</name>
  <name>Isabel González Fernández</name>
  <name>Ricardo Martínez López</name>
</students>

If we made a single document with the members of the ASIX course, the
teachers could not be distinguished from the students. To solve this, we
define a namespace for each context.
NAMESPACES: More examples
To avoid collisions, we define a namespace for each context.

<?xml version="1.0" encoding="iso-8859-1" standalone="yes" ?>
<!DOCTYPE attendees>
<attendees xmlns:students="http://ASIX/alumnos" xmlns:teachers="http://ASIX/profesores">
  <students:name>Fernando Fernández González</students:name>
  <students:name>Isabel González Fernández</students:name>
  <students:name>Ricardo Martínez López</students:name>
  <teachers:name>Pilar Ruiz Pérez</teachers:name>
  <teachers:name>Tomás Rodríguez Hernández</teachers:name>
</attendees>
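Once each context has its own namespace, a program can select the students or the teachers separately even though the local tag name is the same. A sketch assuming Python's standard xml.etree.ElementTree; the shortened document and the query prefixes s and t are choices made for this example.

```python
import xml.etree.ElementTree as ET

document = """<attendees xmlns:students="http://ASIX/alumnos"
                         xmlns:teachers="http://ASIX/profesores">
  <students:name>Isabel González Fernández</students:name>
  <teachers:name>Pilar Ruiz Pérez</teachers:name>
  <teachers:name>Tomás Rodríguez Hernández</teachers:name>
</attendees>"""

root = ET.fromstring(document)

# The prefixes in this mapping are local to the query; only the URIs
# have to match the ones declared in the document.
namespaces = {"s": "http://ASIX/alumnos", "t": "http://ASIX/profesores"}

students = [e.text for e in root.findall("s:name", namespaces)]
teachers = [e.text for e in root.findall("t:name", namespaces)]

print(students)
print(teachers)
```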
Example
Avoid the name collision in the following XML fragment, keeping the tag names and applying
the namespace rules seen above.

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>


<inversion>
<pais nombre="Francia">
<capital>
Paris
</capital>
<capital>
1200M€
</capital>
</pais>
</inversion>
Example - correction
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<inversion xmlns="http://dinero.com" xmlns:geografia="http://geografia.com">
<pais nombre="Francia">
<geografia:capital>
Paris
</geografia:capital>
<capital>
1200M€
</capital>
</pais>
</inversion>

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>


<inversion xmlns="http://dinero.com">
<pais nombre="Francia">
<geografia:capital xmlns:geografia="http://geografia.com">
Paris
</geografia:capital>
<capital>
1200M€
</capital>
</pais>
</inversion>
References
XML reference (W3Schools): https://www.w3schools.com/xml/default.asp

XML tutorial: https://jorgesanchez.net/manuales/xml/introduccion-lenguajes-de-marcas.html

XML tutorial: https://www.tutorialspoint.com/es/xml/

Namespaces: https://www.abrirllave.com/xml/espacios-de-nombres.php

Material by Alícia Vázquez
