30.09.2019 Parsing HTML in PHP using Native Classes | CoralNodes
Parsing HTML in PHP using Native Classes
Disclosure: Affiliate links used
Updated on March 19, 2019 by Abhinav

As you might already know, PHP is a popular backend language that powers many popular CMSs, including WordPress. If you are stepping into WordPress or PHP development, you will find this article helpful.
If you have ever dealt with DOM (Document Object Model) manipulation on the front end, you probably already know how to parse HTML using JavaScript or jQuery.
https://www.coralnodes.com/parsing-html-in-php/
Since JavaScript runs on the client side, it can interact with the browser DOM.

But what if we want to process HTML data on the server? In this post, let us look at some useful PHP classes that let us process HTML on the server side.
Table of Contents
1. What is Parsing & What are its Uses?
2. Important DOM classes in PHP
3. DOMDocument, Nodes & Elements
4. Practical Examples
4.1. Selecting by ID
4.2. Selecting a Tag by Its Name
4.3. Find elements with a particular class
4.4. Extract links from a page
4.5. Modifying & Saving HTML
4.5.1. Inserting new HTML element into the document
4.5.2. Deleting an element from the document
4.6. Manipulating Attributes
5. Conclusion
What is Parsing & What are its Uses?
“Parsing (in this case) is the process of extracting or modifying useful information from an HTML or XML string. A parser gives us easy ways to query raw data instead of using regex.”
Suppose you want to get all the links on a web page. PHP's DOM parsing classes can help you do that.

The Table of Contents you see above is another simple application of PHP DOM parsing classes. The plugin that generates it extracts all the headings from the page, sorts them, creates a new element, and inserts it back into the page content.
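The idea behind that plugin can be sketched in a few lines of PHP. The function below is a minimal, hypothetical illustration and not the plugin's code. addTableOfContents() and the generated toc-N ids are names made up for this example. The sketch assumes only h2 headings matter and keeps them in document order rather than sorting them. It builds a ul list of links and inserts it before the first heading.

```php
<?php
// Minimal sketch of a table-of-contents builder (hypothetical, not the plugin's code).
// Assumptions: only <h2> headings are used, in document order, and headings
// without an id get a generated one ("toc-1", "toc-2", ...).
function addTableOfContents(string $html): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // collect warnings about imperfect HTML quietly
    $dom->loadHTML($html);
    libxml_clear_errors();

    // Copy the live DOMNodeList into a plain array before changing the document
    $headings = iterator_to_array($dom->getElementsByTagName('h2'));
    if (count($headings) === 0) {
        return $html;                    // nothing to list
    }

    $list = $dom->createElement('ul');
    $list->setAttribute('class', 'toc');
    foreach ($headings as $i => $heading) {
        $id = $heading->getAttribute('id');
        if ($id === '') {
            $id = 'toc-' . ($i + 1);
            $heading->setAttribute('id', $id);   // give the heading an anchor target
        }
        $link = $dom->createElement('a');
        $link->setAttribute('href', '#' . $id);
        $link->appendChild($dom->createTextNode($heading->textContent));

        $item = $dom->createElement('li');
        $item->appendChild($link);
        $list->appendChild($item);
    }

    // Put the finished list right before the first heading
    $firstHeading = $headings[0];
    $firstHeading->parentNode->insertBefore($list, $firstHeading);

    return $dom->saveHTML();
}

echo addTableOfContents('<p>Intro</p><h2>First</h2><p>A</p><h2 id="second">Second</h2>');
```

Headings that already have an id keep it. Only the missing ones are generated, so existing links to those headings keep working.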
Important DOM classes in PHP
There are around nineteen DOM-related classes in PHP. Some of the important ones are:
- DOMDocument (extends the DOMNode class)
- DOMNode
- DOMNodeList
- DOMXPath
- DOMElement (extends the DOMNode class)
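A quick way to see how the classes listed above relate to each other is to check the objects PHP hands back. This is a small sketch on made-up markup. The instanceof checks mirror the inheritance described in the list.

```php
<?php
// Sketch: check how the DOM classes listed above relate to each other.
// The markup is made up for this example.
$dom = new DOMDocument();
$dom->loadHTML('<p>Hello <b>world</b></p>');

$paragraphs = $dom->getElementsByTagName('p');   // a DOMNodeList
$p = $paragraphs->item(0);                        // a DOMElement
$xpath = new DOMXPath($dom);                      // a helper object, not a node

var_dump($dom instanceof DOMNode);                // bool(true): DOMDocument extends DOMNode
var_dump($paragraphs instanceof DOMNodeList);     // bool(true)
var_dump($p instanceof DOMElement);               // bool(true)
var_dump(get_parent_class($p));                   // string(7) "DOMNode"
var_dump($xpath instanceof DOMNode);              // bool(false)
```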
DOMDocument, Nodes & Elements
DOMDocument is the first class to mention here. You create a DOMDocument object and load HTML into it. The object then gives you access to the DOM elements. It can load HTML or XML from either a string or a file. The class defines several methods, such as getElementById(), that resemble their JavaScript counterparts.
$dom = new DOMDocument();

// Examples: methods to load HTML (use one of them)
$dom->loadHTML($html_string);               // from a string
$dom->loadHTMLFile('path/to/file.html');    // from a file (placeholder path)

// Methods to load XML (use one of them)
$dom->load('path/to/file.xml');             // from a file (placeholder path)
$dom->loadXML($xml_string);                 // from a string

$documentElement = $dom->documentElement;
// the root element, an object of DOMElement
In this post, we will focus mainly on HTML rather than XML.
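One practical difference between the two loaders is shown in the small sketch below, which uses made-up markup. The HTML parser repairs a fragment into a full document by adding html and body elements. The XML parser keeps the fragment's own root.

```php
<?php
// Sketch: the same markup loaded as HTML and as XML gives different trees.
$markup = '<p>Hello</p>';

$htmlDom = new DOMDocument();
$htmlDom->loadHTML($markup);   // the HTML parser adds the missing <html> and <body> wrappers
echo $htmlDom->documentElement->nodeName, PHP_EOL;   // html

$xmlDom = new DOMDocument();
$xmlDom->loadXML($markup);     // the XML parser keeps the markup exactly as given
echo $xmlDom->documentElement->nodeName, PHP_EOL;    // p
```

This wrapping is why saveHTML() on a document loaded from a fragment returns a complete page, not just the fragment.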
Nodes
The DOM built from HTML is a tree-like structure made up of individual nodes. These nodes can be of different types: elements, text, comments, attributes, and so on. DOMNode is the base class from which all the node classes inherit.
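To see these node types in practice, the sketch below walks a small tree and reports the PHP class of every node it meets. The helper describeNodes() and the sample markup are hypothetical, written only for this illustration.

```php
<?php
// Sketch: walk a DOM tree recursively and report each node's class and name.
// describeNodes() is a hypothetical helper written for this illustration.
function describeNodes(DOMNode $node, int $depth = 0): array
{
    $indent = str_repeat('  ', $depth);
    $lines = [$indent . get_class($node) . ' ' . $node->nodeName];

    // Only elements carry attributes; each one is a DOMAttr node
    if ($node->hasAttributes()) {
        foreach ($node->attributes as $attr) {
            $lines[] = $indent . '  ' . get_class($attr) . ' ' . $attr->nodeName;
        }
    }
    // Recurse into child nodes: elements, text, comments, ...
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $child) {
            $lines = array_merge($lines, describeNodes($child, $depth + 1));
        }
    }
    return $lines;
}

$dom = new DOMDocument();
$dom->loadHTML('<div class="box">Text<!-- a comment --></div>');
$div = $dom->getElementsByTagName('div')->item(0);
echo implode(PHP_EOL, describeNodes($div)), PHP_EOL;
```

For the sample div, this lists the element itself, its class attribute (a DOMAttr), the text node (DOMText) and the comment (DOMComment). Each of these is a subclass of DOMNode.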
Elements
The DOMElement class extends DOMNode and represents the elements in your HTML markup. A DOMElement object can be any element, such as an image, div, span, or table.
Practical Examples
Without going further into theory, let us dive into some practical examples. First of all, we need some HTML data. For that, let us use one of the posts on this blog about image optimization.
We will do the following with our sample HTML:
- Select an element by ID
- Get elements by tag name
- Find elements by class
- Find all links on a page
- Insert an HTML element
- Delete an element
- Work with attributes
Here is the curl request:
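header('Content-Type: text/html; charset=utf-8'); // assumed; the original value is truncated
$url = "https://www.coralnodes.com/..."; // the image optimization post (full path truncated in the source)
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);             // the page to fetch
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the response instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // assumed: follow redirects
$res = curl_exec($ch);
curl_close($ch);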
The variable $res now contains the complete HTML of the web page.
Selecting by ID
If you look at our sample page, you can see that it contains two tables. Suppose I want to find the number of rows in the first table. Using Chrome DevTools, I found that this table has the ID #tablepress-3.
$dom = new DOMDocument();
@$dom->loadHTML($res);   // @ hides warnings about imperfect real-world HTML
$table = $dom->getElementById('tablepress-3');   // the id without the leading "#"
if ($table !== null) {
    $child_elements = $table->getElementsByTagName('tr');   // every row, header rows included
    $row_count = $child_elements->length;
    echo "No. of rows in the table: " . $row_count;
}
The above code gives
the output:
No. of rows in the t
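Here is a self-contained variant of the same idea that you can try without fetching the sample page. The markup is a made-up stand-in for the TablePress table, and countRowsById() is a hypothetical helper. It returns null when getElementById() finds no element with the given ID.

```php
<?php
// Sketch: count the rows of a table selected by its id.
// countRowsById() is a hypothetical helper; the markup is a made-up stand-in.
function countRowsById(string $html, string $id): ?int
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // keep parser warnings out of the output
    $dom->loadHTML($html);
    libxml_clear_errors();

    $table = $dom->getElementById($id);  // pass the bare id, without the leading "#"
    if ($table === null) {
        return null;                     // no element carries that id
    }
    // Every <tr> inside the table, header rows included
    return $table->getElementsByTagName('tr')->length;
}

$html = '<table id="tablepress-3">'
      . '<thead><tr><th>Plugin</th></tr></thead>'
      . '<tbody><tr><td>A</td></tr><tr><td>B</td></tr></tbody>'
      . '</table>';

var_dump(countRowsById($html, 'tablepress-3'));   // int(3)
var_dump(countRowsById($html, 'missing'));        // NULL
```

The count includes the header row, because every tr element is matched. To count only body rows, search inside the tbody element instead.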
Selecting a Tag by Its Name
Both the DOMDocument and DOMElement classes have a getElementsByTagName() method, which selects elements by their tag name. For example, to get all the h2 headings on a page, we can use this method.
$dom = new DOMDocument();
@$dom->loadHTML($res);
$h2s = $dom->getElementsByTagName('h2');
foreach ($h2s as $h2) {
    echo $h2->textContent . '<br>';   // textContent is the heading's plain text
}
The result:
Test Images
Results after Compre
ShortPixel
reSmush.it
Imagify
TinyPNG Compress JPE
Kraken.IO
EWWW Image Optimizer
WP Smush
Do you actually need
Consclusion
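Because getElementsByTagName() exists on both classes, the object you call it on decides how much of the page is searched. The sketch below uses made-up markup, with a div#content standing in for the post body. The document-level call finds every h2, while the element-level call finds only the ones inside #content.

```php
<?php
// Sketch: the same method called on DOMDocument and on DOMElement.
// The markup is made up; id="content" stands in for the post body.
$dom = new DOMDocument();
$dom->loadHTML(
    '<div id="sidebar"><h2>Topics</h2></div>'
  . '<div id="content"><h2>Test Images</h2><h2>ShortPixel</h2></div>'
);

// On the document: every <h2> on the page
$all = [];
foreach ($dom->getElementsByTagName('h2') as $h2) {
    $all[] = $h2->textContent;
}

// On an element: only the <h2> elements inside that element
$content = $dom->getElementById('content');
$inContent = [];
foreach ($content->getElementsByTagName('h2') as $h2) {
    $inContent[] = $h2->textContent;
}

echo implode(', ', $all), PHP_EOL;        // Topics, Test Images, ShortPixel
echo implode(', ', $inContent), PHP_EOL;  // Test Images, ShortPixel
```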
Find elements with a particular class
In JavaScript, the querySelectorAll() method makes it easy to select elements using a CSS selector. In PHP, it is not that straightforward. Instead, we have to use the DOMXPath class to query and traverse the DOM tree.
Example: Select all the
tables with the class
tablepress.
$dom = new DOMDocument();
@$dom->loadHTML($res);
$xpath = new DOMXPath($dom);
// Tables whose class attribute contains the whole word "tablepress"
$tables = $xpath->query("//table[contains(concat(' ', normalize-space(@class), ' '), ' tablepress ')]");
$count = $tables->length;
echo "No. of tables " . $count;
Just like getElementsByTagName(), the query() method of DOMXPath returns a DOMNodeList. It takes an XPath expression as its argument, and XPath expressions are versatile enough to express almost any type of query.
If you are new to XPath, this cheatsheet from Devhints.io lists a wide range of CSS and JS selectors alongside their corresponding XPath expressions. It will help you find the right expression for the query you want to perform.
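Matching a class with XPath has a pitfall, because the class attribute is a space-separated list. The sketch below uses made-up markup to compare an exact match, a substring match and a whole-word match. Only the last one behaves like the CSS selector table.tablepress.

```php
<?php
// Sketch: three ways to match class="tablepress" with XPath.
$dom = new DOMDocument();
$dom->loadHTML(
    '<table class="tablepress"></table>'
  . '<table class="tablepress tablepress-id-3"></table>'
  . '<table class="tablepress-old"></table>'
);
$xpath = new DOMXPath($dom);

// 1. Exact attribute match: misses elements that have more than one class
$exact = $xpath->query("//table[@class='tablepress']")->length;

// 2. Substring match: also catches unrelated classes such as "tablepress-old"
$substring = $xpath->query("//table[contains(@class, 'tablepress')]")->length;

// 3. Whole-word match: pad the normalized class list with spaces, then look for " tablepress "
$token = $xpath->query(
    "//table[contains(concat(' ', normalize-space(@class), ' '), ' tablepress ')]"
)->length;

echo "$exact $substring $token", PHP_EOL;   // 1 3 2
```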
Extract links from a page
Parsing opens up a number of opportunities. Extracting the links from a web page is one of them; that is how crawlers crawl the World Wide Web.
Suppose I want to find all the external links to a particular website on a web page. On our sample page, I'd like to find all the outbound links from the blog post to the wordpress.org website. Here is how I did it:
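$dom = new DOMDocument();
@$dom->loadHTML($res);
$links = $dom->getElementsByTagName('a');
$urls = [];
foreach ($links as $link) {
    $url = $link->getAttribute('href');
    $parsed_url = parse_url($url);   // splits the URL into scheme, host, path, ...
    if (isset($parsed_url['host']) && $parsed_url['host'] === 'wordpress.org') {
        $urls[] = $url;
    }
}
var_dump($urls);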
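The sketch below is a slightly more forgiving version of the link filter. It assumes that subdomains such as de.wordpress.org should count as links to wordpress.org. The helper name linksToHost() and the sample links are hypothetical. The function also skips relative links, for which parse_url() reports no host, and removes duplicates.

```php
<?php
// Sketch: collect the links that point to a given site or one of its subdomains.
// linksToHost() and the sample links are hypothetical; counting subdomains
// (e.g. de.wordpress.org) as the same site is an assumption.
function linksToHost(string $html, string $host): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $host = strtolower($host);
    $suffix = '.' . $host;
    $urls = [];
    foreach ($dom->getElementsByTagName('a') as $link) {
        $href = $link->getAttribute('href');              // '' when there is no href
        $linkHost = parse_url($href, PHP_URL_HOST);       // null for relative links, false if malformed
        if (!is_string($linkHost)) {
            continue;
        }
        $linkHost = strtolower($linkHost);
        $isSubdomain = substr($linkHost, -strlen($suffix)) === $suffix;
        if ($linkHost === $host || $isSubdomain) {
            $urls[] = $href;
        }
    }
    return array_values(array_unique($urls));   // drop duplicates, reindex from 0
}

$html = '<a href="https://wordpress.org/plugins/">Plugins</a>'
      . '<a href="https://de.wordpress.org/">German site</a>'
      . '<a href="/about/">About</a>'
      . '<a href="https://notwordpress.org/">Look-alike</a>'
      . '<a href="https://wordpress.org/plugins/">Plugins again</a>';

print_r(linksToHost($html, 'wordpress.org'));
```

Comparing against ".wordpress.org" rather than "wordpress.org" keeps look-alike hosts such as notwordpress.org out of the result.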
Modifying & Saving HTML
So far we have seen how to extract or select the required data from HTML. Now let us see how to modify it by adding or deleting elements and attributes.
Inserting new HTML element into the document
In this example, we will add an image with a link after the first paragraph. This is how you can insert banner ads into posts.
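$dom = new DOMDocument();
@$dom->loadHTML($html);
$ps = $dom->getElementsByTagName('p');
$first_para = $ps->item(0);

// Hypothetical banner markup; the original snippet is truncated in the source
$html_to_add = '<div><a href="https://example.com/"><img src="banner.png" alt="Banner"></a></div>';
$dom_to_add = new DOMDocument();
@$dom_to_add->loadHTML($html_to_add);
$new_element = $dom_to_add->getElementsByTagName('div')->item(0);

// A node must be imported before it can be inserted into another document
$imported_element = $dom->importNode($new_element, true);
// "After the first paragraph" = before its next sibling (appends if there is none)
$first_para->parentNode->insertBefore($imported_element, $first_para->nextSibling);

$output = @$dom->saveHTML();
echo $output;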
Note that the saveHTML() method returns the manipulated HTML as a string.
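The same steps can be wrapped in a reusable helper. This is a hedged sketch: insertHtmlAfterFirstParagraph() is a hypothetical name and the sample markup is made up. The fragment is wrapped in a temporary div, and only the div's children are copied. This sketch makes that choice so that a fragment with several top-level nodes can be inserted without leaving an extra wrapper behind.

```php
<?php
// Sketch: insert an HTML fragment right after the first <p> of a document.
// insertHtmlAfterFirstParagraph() is a hypothetical helper name.
function insertHtmlAfterFirstParagraph(string $html, string $fragment): string
{
    libxml_use_internal_errors(true);   // keep parser warnings out of the output

    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $first = $dom->getElementsByTagName('p')->item(0);
    if ($first === null) {
        libxml_clear_errors();
        return $html;                   // no paragraph to anchor to
    }

    // Parse the fragment in its own document inside a temporary wrapper
    $fragmentDom = new DOMDocument();
    $fragmentDom->loadHTML('<div>' . $fragment . '</div>');
    libxml_clear_errors();
    $wrapper = $fragmentDom->getElementsByTagName('div')->item(0);

    // Import each child into $dom and place it after the paragraph.
    // insertBefore() with a null reference appends to the end of the parent.
    $reference = $first->nextSibling;
    foreach (iterator_to_array($wrapper->childNodes) as $child) {
        $imported = $dom->importNode($child, true);
        $first->parentNode->insertBefore($imported, $reference);
    }
    return $dom->saveHTML();
}

echo insertHtmlAfterFirstParagraph(
    '<p>First</p><p>Second</p>',
    '<a href="https://example.com/"><img src="banner.png" alt="Ad"></a>'
);
```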
Deleting an element from the document
To delete an element from our HTML, we can use the removeChild() method, which DOMElement inherits from DOMNode. It is called on the parent of the node we want to remove.
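$html = '<p>This is our first paragraph</p>
<div class="del">Delete this</div>
<p>This is our second paragraph</p>
<p>This is our third paragraph</p>
<div class="del">Delete this too</div>';

$dom = new DOMDocument();
@$dom->loadHTML($html);
$documentElement = $dom->documentElement;
echo $dom->saveHTML();

$xpath = new DOMXPath($dom);
// Every element whose class list contains "del"
$elems = $xpath->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' del ')]");
foreach ($elems as $elem) {
    // removeChild() is called on the parent of the node being removed
    $elem->parentNode->removeChild($elem);
}

echo '<br><br>-------after deletion-------<br><br>';
echo $dom->saveHTML();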
Here we perform an XPath query to find all the elements with the class del. Then we remove each node from the document by iterating over the DOMNodeList object with a foreach loop. This produces the following output:
This is our first pa
Delete this
This is our second p
This is our third pa
Delete this too
-------after deletio
This is our first pa
This is our second p
This is our third pa
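The foreach loop above can remove nodes safely because the DOMNodeList returned by DOMXPath::query() holds a fixed set of matches. The list returned by getElementsByTagName() is live, so removing items while looping forward over it can skip elements. A common workaround is to walk the live list backwards (or to copy it into an array first with iterator_to_array()). It is sketched here with a hypothetical removeDivs() helper and made-up markup.

```php
<?php
// Sketch: remove every <div> from a document using a live DOMNodeList.
// removeDivs() is a hypothetical helper; the markup below is made up.
function removeDivs(string $html): string
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Live list: it shrinks as soon as a <div> is removed from the document
    $divs = $dom->getElementsByTagName('div');

    // Walking backwards means removing item $i only shifts items we already visited
    for ($i = $divs->length - 1; $i >= 0; $i--) {
        $div = $divs->item($i);
        $div->parentNode->removeChild($div);
    }
    return $dom->saveHTML();
}

echo removeDivs('<div>a</div><p>keep</p><div>b</div><div>c</div>');
```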
Manipulating Attributes
Classes and IDs are not the only attributes we can access with PHP DOM. The DOMElement class has several methods that get, set, or remove an element's attributes. These methods look similar to their JavaScript counterparts, so you will find them easy to understand:
- getAttribute($attribute_name) – gets the value of an attribute
- setAttribute($attribute_name, $attribute_value) – sets the value of an attribute
- hasAttribute($attribute_name) – checks whether an element has a certain attribute and returns true or false
- removeAttribute($attribute_name) – removes an attribute
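$html = '<span class="old-class">Sample text</span>';   // hypothetical markup; the original is truncated
$dom = new DOMDocument();
@$dom->loadHTML($html);
$elem = $dom->getElementsByTagName('span')->item(0);
if ($elem->hasAttribute('class')) {
    echo 'attribute value: ' . $elem->getAttribute('class');
    $elem->setAttribute('class', 'new-class');   // hypothetical new value
    echo '<br>updated value: ' . $elem->getAttribute('class');
}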
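Here is a short sketch that puts these methods together on made-up markup. It also shows removeAttribute(). It also shows that getAttribute() returns an empty string for an attribute that does not exist, so use hasAttribute() when you need to tell "missing" from "empty".

```php
<?php
// Sketch: read, change and remove attributes with DOMElement methods.
// The <span> markup is made up for this example.
$dom = new DOMDocument();
$dom->loadHTML('<span class="old" title="hint">Text</span>');
$span = $dom->getElementsByTagName('span')->item(0);

var_dump($span->hasAttribute('class'));      // bool(true)
echo $span->getAttribute('class'), PHP_EOL;  // old

$span->setAttribute('class', 'new');         // overwrites the existing value
$span->setAttribute('data-id', '42');        // adds an attribute that did not exist
$span->removeAttribute('title');             // deletes an attribute

var_dump($span->hasAttribute('title'));      // bool(false)
var_dump($span->getAttribute('missing'));    // string(0) ""
echo $dom->saveHTML($span), PHP_EOL;         // the updated <span> markup
```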
Conclusion
So far, we have looked at some of the important DOM APIs in PHP. I hope this helps you get started with parsing HTML and XML data. If any point is unclear, do ask in the comments.
About the author
Abhinav R (Vishnu) is a blogger with a keen interest in learning about web trends and exploring the world of WordPress. Apart from that, he also has a passion for nature photography and travel.
Posted in Guides & Tips. Tagged Web Development.
Copyright © 2011-2019 CoralNodes.Com Hosted with Cloudways on DigitalOcean