PHP Simple HTML DOM Parser Manual
Quick Start
How to create HTML DOM object?
How to find HTML elements?
How to access the HTML element's attributes?
How to traverse the DOM tree?
How to dump contents of DOM object?
How to customize the parsing behavior?
API Reference
Camel naming conventions
Quick Start
Get HTML elements
// Find the (N)th anchor; returns the element object, or null if not found (zero-based)
$ret = $html->find('a', 0);
// Find the last anchor; returns the element object, or null if not found (a negative index counts from the end)
$ret = $html->find('a', -1);
Advanced
Descendant selectors
Nested selectors
Attribute Filters
Supports these operators in attribute selectors:
Filter               Description
[attribute]          Matches elements that have the specified attribute.
[!attribute]         Matches elements that don't have the specified attribute.
[attribute=value]    Matches elements that have the specified attribute with a certain value.
[attribute!=value]   Matches elements that don't have the specified attribute with a certain value.
[attribute^=value]   Matches elements that have the specified attribute and it starts with a certain value.
[attribute$=value]   Matches elements that have the specified attribute and it ends with a certain value.
[attribute*=value]   Matches elements that have the specified attribute and it contains a certain value.
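The filters above can be tried against a small inline document. The following is a minimal sketch, assuming simple_html_dom.php sits next to the script; the include path and the sample markup are assumptions made for illustration only.

```php
<?php
// Assumption: the library file is in the same directory as this script.
require_once 'simple_html_dom.php';

// Made-up sample markup: two anchors with href, one without.
$html = str_get_html(
    '<ul>'
    . '<li><a id="home" href="index.html">Home</a></li>'
    . '<li><a id="feed" href="https://example.com/feed.xml">Feed</a></li>'
    . '<li><a id="top" name="top">Top</a></li>'
    . '</ul>'
);

$withHref    = $html->find('a[href]');            // has the attribute
$withoutHref = $html->find('a[!href]');           // lacks the attribute
$exact       = $html->find('a[href=index.html]'); // exact value
$htmlPages   = $html->find('a[href$=.html]');     // value ends with ".html"
$secure      = $html->find('a[href^=https]');     // value starts with "https"
$onExample   = $html->find('a[href*=example]');   // value contains "example"

// Print the id of every anchor matched by each filter.
foreach (array('withHref', 'withoutHref', 'exact', 'htmlPages', 'secure', 'onExample') as $name) {
    $ids = array();
    foreach ($$name as $a) {
        $ids[] = $a->id;
    }
    echo $name . ': ' . implode(', ', $ids) . "\n";
}
```

Each call returns an array because no index is passed; adding a second argument such as 0 would return a single element or null instead.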
// Get an attribute (if the attribute has no value, e.g. checked, selected..., it returns true or false)
$value = $e->href;
// Set an attribute (if the attribute has no value, e.g. checked, selected..., set its value to true or false)
$e->href = 'my link';
Magic attributes
// Example
$html = str_get_html("<div>foo <b>bar</b></div>");
$e = $html->find("div", 0);
// Wrap an element
$e->outertext = '<div class="wrap">' . $e->outertext . '</div>';
// Append an element
$e->outertext = $e->outertext . '<div>foo</div>';
// Insert an element
$e->outertext = '<div>foo</div>' . $e->outertext;
// If you are not so familiar with HTML DOM, check this link to learn more...
// Example
echo $html->find("#div1", 0)->children(1)->children(1)->children(2)->id;
// or
echo $html->getElementById("div1")->childNodes(1)->childNodes(1)->childNodes(2)->getAttribute('id');
// Print it!
echo $html;
Object-oriented way
API Reference
Helper functions
Name Description
object str_get_html ( string $content ) Creates a DOM object from a string.
object file_get_html ( string $filename ) Creates a DOM object from a file or a URL.
Name                                             Description
void __construct ( [string $filename] )          Constructor. If the filename parameter is set, the contents are loaded automatically, from either text or a file/URL.
string plaintext                                 Returns the contents extracted from HTML.
void clear ()                                    Cleans up memory.
void load ( string $content )                    Loads contents from a string.
string save ( [string $filename] )               Dumps the internal DOM tree back into a string. If $filename is set, the result string is also saved to that file.
void load_file ( string $filename )              Loads contents from a file or a URL.
void set_callback ( string $function_name )      Sets a callback function.
mixed find ( string $selector [, int $index] )   Finds elements by the CSS selector. Returns the Nth element object if index is set, otherwise returns an array of objects.
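To show how these methods fit together, here is a minimal sketch of one load, edit, save, clear cycle. The include path and the sample markup are assumptions; only calls listed in the table above are used.

```php
<?php
// Assumption: the library file is in the same directory as this script.
require_once 'simple_html_dom.php';

// Create an empty DOM object, then load markup from a string.
$html = new simple_html_dom();
$html->load('<html><body><h1>Old title</h1></body></html>');

// find() with an index returns a single element (or null if nothing matches).
$heading = $html->find('h1', 0);
if ($heading !== null) {
    $heading->innertext = 'New title';
}

// save() without a filename returns the document as a string.
$output = $html->save();
echo $output . "\n";

// Release the memory held by the DOM tree once it is no longer needed.
$html->clear();
unset($html);
```

Calling clear() matters most in long-running loops that parse many pages, where unreleased DOM trees would otherwise accumulate.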
Name                                             Description
string [attribute]                               Read or write the element's attribute value.
string tag                                       Read or write the tag name of the element.
string outertext                                 Read or write the outer HTML text of the element.
string innertext                                 Read or write the inner HTML text of the element.
string plaintext                                 Read or write the plain text of the element.
mixed find ( string $selector [, int $index] )   Finds children by the CSS selector. Returns the Nth element object if index is set, otherwise returns an array of objects.
DOM traversing
Name                                    Description
mixed $e->children ( [int $index] )     Returns the Nth child object if index is set, otherwise return an array of children.
element $e->parent ()                   Returns the parent of element.
element $e->first_child ()              Returns the first child of element, or null if not found.
element $e->last_child ()               Returns the last child of element, or null if not found.
element $e->next_sibling ()             Returns the next sibling of element, or null if not found.
element $e->prev_sibling ()             Returns the previous sibling of element, or null if not found.
Method                                                 Mapping
array $e->getAllAttributes ()                          array $e->attr
string $e->getAttribute ( $name )                      string $e->attribute
void $e->setAttribute ( $name, $value )                void $e->attribute = $value
bool $e->hasAttribute ( $name )                        bool isset($e->attribute)
void $e->removeAttribute ( $name )                     void $e->attribute = null
element $e->getElementById ( $id )                     mixed $e->find ( "#$id", 0 )
mixed $e->getElementsById ( $id [, $index] )           mixed $e->find ( "#$id" [, int $index] )
element $e->getElementByTagName ( $name )              mixed $e->find ( $name, 0 )
mixed $e->getElementsByTagName ( $name [, $index] )    mixed $e->find ( $name [, int $index] )
element $e->parentNode ()                              element $e->parent ()
mixed $e->childNodes ( [$index] )                      mixed $e->children ( [int $index] )
element $e->firstChild ()                              element $e->first_child ()
element $e->lastChild ()                               element $e->last_child ()
element $e->nextSibling ()                             element $e->next_sibling ()
element $e->previousSibling ()                         element $e->prev_sibling ()
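The mapping can be checked by calling both spellings on the same element. This is a small sketch under the assumption that simple_html_dom.php is in the script's directory; the sample markup is made up for illustration.

```php
<?php
// Assumption: the library file is in the same directory as this script.
require_once 'simple_html_dom.php';

$html = str_get_html('<div id="box"><p class="intro">Hi</p><p>Bye</p></div>');

// Camel-case calls on the left, their mapped equivalents on the right.
$boxA = $html->getElementById('box');
$boxB = $html->find('#box', 0);

$firstA = $boxA->firstChild();
$firstB = $boxB->first_child();

$classA = $firstA->getAttribute('class');
$classB = $firstB->class;

$hasIdA = $firstA->hasAttribute('id');
$hasIdB = isset($firstB->id);

$nextA = $firstA->nextSibling();
$nextB = $firstB->next_sibling();

// Each pair should report the same value.
var_dump($boxA->tag === $boxB->tag);
var_dump($classA === $classB);
var_dump($hasIdA === $hasIdB);
var_dump($nextA->innertext === $nextB->innertext);
```

Which spelling to use is a matter of taste; the camel-case names mainly help code that was written against the W3C DOM naming style.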
Examples
// Find all images and print their outer HTML (the "<>" tags included)
foreach($html->find('img') as $e)
    echo $e->outertext . '<br>';
Like I said earlier, this library is a dream for finding elements, much as the early JavaScript
frameworks and selector engines were. Armed with the ability to pick content out of
DOM nodes with PHP, it's time to analyze websites for changes.
The Script
The following script checks two websites for changes:
// Settings on top
$sitesToCheck = array(
    // "selector" identifies the element to watch on each page
    array("url" => "http://www.arsenal.com/first-team/players", "selector" => "#squad"),
    array("url" => "http://www.liverpoolfc.tv/news", "selector" => "ul[style='height:400px;']")
);
$savePath = "cachedPages/";
$emailContent = "";
// If different, notify!
if($oldContent && $currentContent != $oldContent) {
    // Here's where we can do a whoooooooooooooole lotta stuff
    // We could tweet to an address
    // We can send a simple email
    // We can text ourselves
}
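The fragments above show the settings and the comparison, but not how the old content gets cached. A minimal sketch of that step follows; the function name contentChanged and the MD5-based file naming are assumptions made for illustration, not taken from the script.

```php
<?php
/**
 * Compare freshly extracted content with the cached copy for a URL,
 * then overwrite the cache with the new content.
 * Returns true only when a previous copy existed and differs.
 * (Hypothetical helper; the name and file layout are assumptions.)
 */
function contentChanged(string $url, string $currentContent, string $savePath): bool
{
    // Create the cache directory on first use.
    if (!is_dir($savePath)) {
        mkdir($savePath, 0777, true);
    }

    // Assumed naming scheme: one cache file per URL, keyed by its MD5 hash.
    $cacheFile = rtrim($savePath, '/') . '/' . md5($url);

    // Read the previous snapshot, if there is one.
    $oldContent = is_file($cacheFile) ? file_get_contents($cacheFile) : '';

    // Always store the latest snapshot for the next run.
    file_put_contents($cacheFile, $currentContent);

    return $oldContent !== '' && $oldContent !== $currentContent;
}

// Possible use with the parser (assumes simple_html_dom.php is included):
// $html = file_get_html($site["url"]);
// $node = $html->find($site["selector"], 0);
// if ($node && contentChanged($site["url"], $node->plaintext, $savePath)) {
//     // notify: email, tweet, text...
// }
```

The first run for a URL never reports a change, because there is nothing to compare against yet; it only seeds the cache.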
This solution isn't specific to spying on footy; you could use this type of script on any
number of sites. The script is simplistic, however. If you wanted to spy on a website with
extremely dynamic markup (e.g. a timestamp embedded in the code), you would want to
create a regular expression that isolates the content to just the block you're looking
for. Since each website is constructed differently, I'll leave it up to you to create
page-specific isolators. Have fun spying on websites though...and be sure to let me
know if you hear a good, reliable footy rumor!
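As an illustration of such an isolator, the sketch below strips timestamp-like fragments from extracted text before two snapshots are compared. The function name normalizeContent, the patterns, and the sample strings are all assumptions; a real page will need patterns written for its own markup.

```php
<?php
/**
 * Remove volatile fragments from extracted text so that two snapshots
 * differing only in those fragments compare as equal.
 * (Hypothetical helper; the patterns are illustrative, not exhaustive.)
 */
function normalizeContent(string $content): string
{
    $patterns = array(
        // Date-and-time stamps such as 2011-03-14 09:26:53
        '/\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}(:\d{2})?/',
        // Bare clock times such as 14:05 or 14:05:59
        '/\b\d{1,2}:\d{2}(:\d{2})?\b/',
    );
    $content = preg_replace($patterns, '', $content);

    // Collapse whitespace runs left behind by the removals.
    return trim(preg_replace('/\s+/', ' ', $content));
}

// Made-up snapshots that differ only in their timestamp.
$a = normalizeContent("Squad updated 2011-03-14 09:26:53\nPlayer A, Player B");
$b = normalizeContent("Squad updated 2011-03-15 18:02:10\nPlayer A, Player B");

var_dump($a === $b);
```

Running the normalizer on both the cached and the current content before comparing them keeps a changing timestamp from triggering a notification on every check.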