Creating an HTML Document – Create or Load HTML in C#

This article explains how to create or load an HTML document. The Aspose.HTML for .NET API provides the HTMLDocument class, which is the root of the HTML hierarchy and holds the entire content.

HTML Document

The HTMLDocument class is the starting point for the Aspose.HTML for .NET library. Use it to load an HTML page into the Document Object Model (DOM), and then programmatically read and modify the document tree: add and remove nodes and change node properties as described in the official specifications.

The HTMLDocument class provides an in-memory representation of an HTML DOM and is entirely based on the W3C DOM and WHATWG DOM specifications, which are supported by many modern browsers. If you are familiar with the WHATWG DOM, WHATWG HTML, and JavaScript standards, you will find Aspose.HTML for .NET easy to use. Otherwise, visit www.w3schools.com, which offers many examples and tutorials on working with HTML documents.

HTML documents can be created from scratch as an empty document with the basic HTML structure, from a string or a memory stream, or loaded from a file or a URL. The HTMLDocument class has several overloaded constructors for creating or loading HTML documents.

Create an Empty HTML Document

An empty document can be filled with HTML elements after it is created. The following code snippet uses the default HTMLDocument() constructor to create an empty HTML document and save it to a file.

// Create an empty HTML document using C#
using System.IO;
using Aspose.Html;

// Prepare an output path for saving the document
// (OutputDir is assumed to hold the path to an output folder)
string documentPath = Path.Combine(OutputDir, "create-empty-document.html");

// Initialize an empty HTML document
using (HTMLDocument document = new HTMLDocument())
{
    // Work with the document

    // Save the document to a file
    document.Save(documentPath);
}

After the code runs, the file create-empty-document.html contains the initial document structure: even an empty document includes the <html>, <head>, and <body> elements. For more details about saving HTML files, see the Save HTML Document article.
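To check which elements an empty document starts with, you can print the markup of its root element instead of saving it. A minimal sketch, assuming the Aspose.HTML for .NET package is referenced and C# top-level statements are used:

```csharp
using System;
using Aspose.Html;

// Create an empty document; no file or URL is involved
using var document = new HTMLDocument();

// DocumentElement is the root <html> element; OuterHTML serializes it together with its children
string skeleton = document.DocumentElement.OuterHTML;
Console.WriteLine(skeleton);

// Head and Body expose the two sections that are created automatically
Console.WriteLine($"Has <head>: {document.Head != null}, has <body>: {document.Body != null}");
```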

Create a New HTML Document

To generate a document programmatically from scratch, use the constructor without parameters and then add content, as shown in the following code snippet:

// Create an HTML document using C#
using System.IO;
using Aspose.Html;
using Aspose.Html.Dom;

// Prepare an output path for saving the document
string documentPath = Path.Combine(OutputDir, "create-new-document.html");

// Initialize an empty HTML document
using (HTMLDocument document = new HTMLDocument())
{
    // Create a text node and add it to the document body
    Text text = document.CreateTextNode("Hello, World!");
    document.Body.AppendChild(text);

    // Save the document to disk
    document.Save(documentPath);
}

The example creates a text node containing the specified string with the CreateTextNode() method and appends it to the <body> element with the AppendChild() method.
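Text nodes are not the only nodes you can append. The sketch below uses the standard DOM methods CreateElement() and SetAttribute() to build a <p> element with a class attribute before appending it to <body>. The file name create-paragraph.html and the temp folder standing in for OutputDir are illustrative choices, not part of the original example.

```csharp
using System;
using System.IO;
using Aspose.Html;
using Aspose.Html.Dom;

// Illustrative output location: the temp folder stands in for OutputDir
string documentPath = Path.Combine(Path.GetTempPath(), "create-paragraph.html");
string bodyHtml;

using (HTMLDocument document = new HTMLDocument())
{
    // Create a <p> element; it exists only in memory until it is appended
    Element paragraph = document.CreateElement("p");
    paragraph.SetAttribute("class", "greeting");
    paragraph.TextContent = "Hello, World!";

    // Attach the element to <body>
    document.Body.AppendChild(paragraph);

    bodyHtml = document.Body.InnerHTML;
    Console.WriteLine(bodyHtml);

    document.Save(documentPath);
}
```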

How to edit an HTML file is described in detail in the Edit HTML Document article.

Load from a File

The following code snippet shows how to load an HTMLDocument from an existing file:

// Load HTML from a file using C#
using System;
using System.IO;
using Aspose.Html;

string htmlFile = Path.Combine(OutputDir, "load-from-file.html");

// Prepare a load-from-file.html document
File.WriteAllText(htmlFile, "Hello, World!");

// Load the document from load-from-file.html
using (HTMLDocument document = new HTMLDocument(htmlFile))
{
    // Write the document content to the console
    Console.WriteLine(document.DocumentElement.OuterHTML);
}

In the example above, the HTML document is loaded from a file with the HTMLDocument(string) constructor. To load an existing HTML file from disk, work with it, and save it, use the following code snippet:

// Load an HTML document from a file using C#
using System.IO;
using Aspose.Html;

// Prepare a file path (DataDir is assumed to hold the path to the input data folder)
string documentPath = Path.Combine(DataDir, "sprite.html");

// Initialize an HTML document from the file
using (HTMLDocument document = new HTMLDocument(documentPath))
{
    // Work with the document

    // Save the document to disk
    document.Save(Path.Combine(OutputDir, "sprite_out.html"));
}
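The "Work with the document" step in the example above is where DOM editing happens. A minimal sketch, assuming a hypothetical input file that the code writes itself so it can run on its own: it finds the first <h1> with the CSS-selector method QuerySelector(), changes its text, and appends a new paragraph before saving.

```csharp
using System;
using System.IO;
using Aspose.Html;
using Aspose.Html.Dom;

// Hypothetical input and output files in the temp folder
string inputPath = Path.Combine(Path.GetTempPath(), "edit-input.html");
string outputPath = Path.Combine(Path.GetTempPath(), "edit-output.html");
File.WriteAllText(inputPath, "<html><body><h1>Original heading</h1></body></html>");

using (HTMLDocument document = new HTMLDocument(inputPath))
{
    // QuerySelector returns the first element that matches the CSS selector, or null
    Element heading = document.QuerySelector("h1");
    if (heading != null)
    {
        heading.TextContent = "Edited heading";
    }

    // Append a new paragraph at the end of <body>
    Element note = document.CreateElement("p");
    note.TextContent = "Added programmatically.";
    document.Body.AppendChild(note);

    document.Save(outputPath);
}

string savedHtml = File.ReadAllText(outputPath);
Console.WriteLine(savedHtml);
```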

Load from a URL

Loading and working with web pages is one of the most common tasks on the Internet. The next code snippet shows how to load a web page into an HTMLDocument.

If you pass a URL that is wrong or cannot be reached at the moment, the library throws a DOMException with the specialized code 'NetworkError' to indicate that the requested resource cannot be found.

// Load HTML from a URL using C#
using System;
using Aspose.Html;

// Load a document from the 'https://docs.aspose.com/html/files/document.html' web page
using (HTMLDocument document = new HTMLDocument("https://docs.aspose.com/html/files/document.html"))
{
    string html = document.DocumentElement.OuterHTML;

    // Write the document content to the console
    Console.WriteLine(html);
}

In the example above, the document.html file is loaded from the specified URL.
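Because an unreachable URL surfaces as a DOMException, network loads are usually wrapped in try/catch. A sketch, assuming the exception type is DOMException from the Aspose.Html.Dom namespace; the host example.invalid is used because the .invalid top-level domain is reserved and never resolves, so it stands in for an unreachable URL.

```csharp
using System;
using Aspose.Html;
using Aspose.Html.Dom;

// The .invalid top-level domain is reserved and never resolves
string url = "https://example.invalid/document.html";
bool loadFailed = false;

try
{
    using HTMLDocument document = new HTMLDocument(url);
    Console.WriteLine(document.DocumentElement.OuterHTML);
}
catch (DOMException ex) // assumption: Aspose.Html.Dom.DOMException is the type thrown
{
    // Expected here: the resource cannot be reached (the NetworkError case)
    loadFailed = true;
    Console.WriteLine($"Could not load {url}: {ex.Message}");
}
```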

Load from HTML Code

If your HTML code is already in memory as a System.String or System.IO.Stream object, you don't need to save it to a file first; pass it directly to the specialized constructors.

If your HTML code has linked resources (styles, scripts, images, etc.), pass a valid baseUrl parameter to the document constructor. The library uses it to resolve the locations of those resources while the document loads.
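For example, markup that links a stylesheet by a relative path can only pick up those styles if the loader knows which folder to look in. The sketch below writes a hypothetical styles.css into a temp folder and passes that folder as baseUrl. Ending the folder path with a directory separator is a precaution: under standard URL resolution, a base without the trailing separator is treated as a file name, and relative paths resolve against its parent folder.

```csharp
using System;
using System.IO;
using Aspose.Html;

// Hypothetical resource folder containing the stylesheet that the markup links to
string resourceDir = Path.Combine(Path.GetTempPath(), "html-resources") + Path.DirectorySeparatorChar;
Directory.CreateDirectory(resourceDir);
File.WriteAllText(Path.Combine(resourceDir, "styles.css"), "p { color: red; }");

// The <link> element refers to styles.css by a relative path
string htmlCode = "<link rel='stylesheet' href='styles.css'><p>Styled text</p>";

// baseUrl tells the loader where relative references such as 'styles.css' are located
using HTMLDocument document = new HTMLDocument(htmlCode, resourceDir);

string outputPath = Path.Combine(Path.GetTempPath(), "with-base-url.html");
document.Save(outputPath);
Console.WriteLine($"Saved to {outputPath}");
```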

Load from a String

You can create a document from string content with the HTMLDocument(string, string) constructor, where the first argument is the HTML code and the second is the base URL. The following example creates an HTML document containing the text "Hello, World!" directly from a string in code and saves it to a file.

// Create HTML from a string using C#
using System.IO;
using Aspose.Html;

// Prepare HTML code
string htmlCode = "<p>Hello, World!</p>";

// Initialize a document from the string variable;
// "." (the current folder) is passed as the base URL for relative resources
using (HTMLDocument document = new HTMLDocument(htmlCode, "."))
{
    // Save the document to disk
    document.Save(Path.Combine(OutputDir, "create-from-string.html"));
}
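The string in the example above is only a fragment, yet the result is a complete document: following the WHATWG parsing rules, the parser creates the missing <html>, <head>, and <body> elements, as browsers do. A quick way to observe this, as a sketch:

```csharp
using System;
using Aspose.Html;

using HTMLDocument document = new HTMLDocument("<p>Hello, World!</p>", ".");

// The parser wraps the fragment in a full <html>/<head>/<body> structure
string fullMarkup = document.DocumentElement.OuterHTML;
string bodyText = document.Body.TextContent;

Console.WriteLine(fullMarkup);
Console.WriteLine(bodyText);
```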

Load from a Stream

To create an HTML document from a stream, use the HTMLDocument(Stream, string) constructor:

// Load HTML from a stream using C#
using System.IO;
using Aspose.Html;

// Create a memory stream object
using (MemoryStream mem = new MemoryStream())
using (StreamWriter sw = new StreamWriter(mem))
{
    // Write the HTML code into the memory stream
    sw.Write("<p>Hello, World! I love HTML!</p>");

    // Flush the writer and rewind the stream: HTMLDocument starts reading
    // exactly from the current position within the stream
    sw.Flush();
    mem.Seek(0, SeekOrigin.Begin);

    // Initialize a document from the memory stream
    using (HTMLDocument document = new HTMLDocument(mem, "."))
    {
        // Save the document to disk
        document.Save(Path.Combine(OutputDir, "load-from-stream.html"));
    }
}
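The Seek() call in the example above is easy to forget. The following sketch uses only the .NET standard library (no Aspose.HTML types) to show why it matters: after writing, the stream position sits at the end of the data, so any reader that starts from the current position sees nothing until the stream is rewound. The helper ReadFromCurrentPosition is hypothetical, introduced only for this illustration.

```csharp
using System;
using System.IO;
using System.Text;

using MemoryStream mem = new MemoryStream();
// leaveOpen: true keeps the stream usable after the writer is disposed
using StreamWriter writer = new StreamWriter(mem, new UTF8Encoding(false), 1024, leaveOpen: true);
writer.Write("<p>Hello, World!</p>");
writer.Flush();

// Reads from the current position, which is now the end of the data
string beforeSeek = ReadFromCurrentPosition(mem);

// Rewind, then read again
mem.Seek(0, SeekOrigin.Begin);
string afterSeek = ReadFromCurrentPosition(mem);

// A stream built from a byte array starts at position 0, so no Seek() is needed
using MemoryStream fromBytes = new MemoryStream(Encoding.UTF8.GetBytes("<p>Hello, World!</p>"));
string fromBytesText = ReadFromCurrentPosition(fromBytes);

Console.WriteLine($"before Seek: '{beforeSeek}'");    // prints: before Seek: ''
Console.WriteLine($"after Seek: '{afterSeek}'");      // prints: after Seek: '<p>Hello, World!</p>'
Console.WriteLine($"from bytes: '{fromBytesText}'");  // prints: from bytes: '<p>Hello, World!</p>'

// Hypothetical helper: reads everything from the stream's current position without closing it
static string ReadFromCurrentPosition(Stream stream)
{
    using StreamReader reader = new StreamReader(stream, Encoding.UTF8, false, 1024, leaveOpen: true);
    return reader.ReadToEnd();
}
```

When the HTML is already a string, building the stream with new MemoryStream(Encoding.UTF8.GetBytes(html)) avoids both the StreamWriter and the Seek() call.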

SVG Document

Because Scalable Vector Graphics (SVG) is part of the W3C standards and can be embedded in an HTMLDocument, the library also implements the SVGDocument class with its full functionality. The implementation is based on the official SVG 2 specification, so you can load, read, and manipulate SVG documents as the specification describes.

Since SVGDocument and HTMLDocument are based on the same WHATWG DOM standard, operations such as loading, reading, editing, converting, and saving work the same way for both. All examples that manipulate an HTMLDocument therefore apply to an SVGDocument as well.

You can create a document from string content with the SVGDocument(string, string) constructor. To load an SVG document from an in-memory System.String variable without saving it to a file, use the following example:

// Load SVG from a string using C#
using System;
using Aspose.Html.Dom.Svg;

// Initialize an SVG document from a string object
using (SVGDocument document = new SVGDocument("<svg xmlns='http://www.w3.org/2000/svg'><circle cx='50' cy='50' r='40'/></svg>", "."))
{
    // Write the document content to the console
    Console.WriteLine(document.DocumentElement.OuterHTML);
}

In the example above, we have produced an SVG document that contains a circle with a radius of 40 pixels. You can learn more about working with SVG documents from the How to work with Aspose.SVG API chapter.
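Because SVGDocument shares the DOM API, the element-level methods used with HTMLDocument also apply to SVG content. A sketch, assuming QuerySelector() is available on SVGDocument as it is on other Aspose.HTML documents: it reads the circle's r attribute and adds a fill color.

```csharp
using System;
using Aspose.Html.Dom;
using Aspose.Html.Dom.Svg;

string svg = "<svg xmlns='http://www.w3.org/2000/svg'><circle cx='50' cy='50' r='40'/></svg>";

using SVGDocument document = new SVGDocument(svg, ".");

// Find the circle with a CSS selector (assumed available on SVGDocument) and read its radius
Element circle = document.QuerySelector("circle");
string radius = circle.GetAttribute("r");

// Attributes can be changed in place, just as on HTML elements
circle.SetAttribute("fill", "orange");

Console.WriteLine($"r = {radius}");
Console.WriteLine(document.DocumentElement.OuterHTML);
```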

MHTML Document

MHTML stands for MIME encapsulation of aggregate HTML documents. An MHTML file is an archive containing all the content of a web page: it stores the HTML of the page together with its related resources, which can include CSS, JavaScript, images, and audio files. Web developers primarily use MHTML files to save the current state of a web page for archiving purposes. The Aspose.HTML for .NET library supports this format with some limitations: only rendering operations from MHTML to the supported output formats are available. For more details, please read the Converting Between Formats article.

EPUB Document

EPUB, an electronic publication format, is supported by most eReaders and is compatible with most reading devices, such as smartphones, tablets, and computers. EPUB has the same limitation as MHTML: only rendering operations from EPUB to the supported output formats are available. For more details, please read the Converting Between Formats article.

Asynchronous Operations

Loading a document can be resource-intensive, because it requires loading not only the document itself but also all linked resources, and processing all scripts. The following code snippets show how to use asynchronous operations to load an HTMLDocument without blocking the main thread:

// Load HTML asynchronously using C#
using System;
using System.Threading;
using Aspose.Html;

// Initialize an AutoResetEvent that signals when loading is complete
AutoResetEvent resetEvent = new AutoResetEvent(false);

// Create an instance of an HTML document
HTMLDocument document = new HTMLDocument();

// Create a string variable for reading the OuterHTML property
string outerHTML = string.Empty;

// Subscribe to the ReadyStateChange event
// This event is fired each time the document's ready state changes during loading
document.OnReadyStateChange += (sender, @event) =>
{
    // Check the value of the ReadyState property, which represents the status of the document.
    // For details, see https://www.w3schools.com/jsref/prop_doc_readystate.asp
    if (document.ReadyState == "complete")
    {
        // Fill the outerHTML variable with the markup of the loaded document
        outerHTML = document.DocumentElement.OuterHTML;
        resetEvent.Set();
    }
};

// Navigate asynchronously to the specified URI
document.Navigate("https://docs.aspose.com/html/files/document.html");

// At this point, outerHTML is most likely still empty
Console.WriteLine("outerHTML = {0}", outerHTML);

// Wait up to 5 seconds for the document to load
bool loaded = resetEvent.WaitOne(TimeSpan.FromSeconds(5));

// If loading finished in time, outerHTML is now filled
Console.WriteLine(loaded ? $"outerHTML = {outerHTML}" : "The document did not load within 5 seconds.");

document.Dispose();

ReadyStateChange is not the only event that can be used to handle an asynchronous loading operation; you can also subscribe to the Load event, as follows:

// Handle an HTML document load using C#
using System;
using System.Threading;
using Aspose.Html;

// Initialize an AutoResetEvent that signals when loading is complete
AutoResetEvent resetEvent = new AutoResetEvent(false);

// Initialize an HTML document
HTMLDocument document = new HTMLDocument();
bool isLoaded = false;

// Subscribe to the OnLoad event
// This event is fired once the document is fully loaded
document.OnLoad += (sender, @event) =>
{
    isLoaded = true;
    resetEvent.Set();
};

// Navigate asynchronously to the specified URI
document.Navigate("https://docs.aspose.com/html/files/document.html");

// Block until OnLoad fires or 5 seconds pass, then read the loaded markup
resetEvent.WaitOne(TimeSpan.FromSeconds(5));
if (isLoaded)
{
    Console.WriteLine("outerHTML = {0}", document.DocumentElement.OuterHTML);
}

document.Dispose();
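In code that already uses async/await, blocking a thread on an AutoResetEvent can be replaced by a TaskCompletionSource that the OnLoad handler completes. A sketch, assuming Navigate() returns before loading finishes, as described above; the helper name LoadOuterHtmlAsync and the 10-second timeout are hypothetical choices for illustration.

```csharp
using System;
using System.Threading.Tasks;
using Aspose.Html;

string html = await LoadOuterHtmlAsync("https://docs.aspose.com/html/files/document.html", TimeSpan.FromSeconds(10));
Console.WriteLine(html);

// Hypothetical helper: exposes Navigate + OnLoad as an awaitable Task
static async Task<string> LoadOuterHtmlAsync(string url, TimeSpan timeout)
{
    using HTMLDocument document = new HTMLDocument();
    var loaded = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);

    // Subscribe before navigating so the event cannot be missed
    document.OnLoad += (sender, e) => loaded.TrySetResult(true);
    document.Navigate(url);

    // Wait for whichever finishes first: the load or the timeout
    Task finished = await Task.WhenAny(loaded.Task, Task.Delay(timeout));
    if (finished != loaded.Task)
    {
        throw new TimeoutException($"{url} did not load within {timeout.TotalSeconds} seconds.");
    }

    return document.DocumentElement.OuterHTML;
}
```

RunContinuationsAsynchronously keeps the code after await from running inline on whatever thread raises OnLoad, and the timeout turns a stalled load into an exception instead of an indefinite wait.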

You can download the complete examples and data files from GitHub.
