How to Load HTML in C#

Loading HTML is a fundamental operation in many web tasks, including web development, page rendering, data extraction, content management, document processing, and testing. The Aspose.HTML for .NET library provides the HTMLDocument class with a set of HTMLDocument() constructors that load HTML and initialize an HTMLDocument object for further manipulation. HTML documents can be loaded from a file or URL, or created from a string or memory stream. So, let’s look at the ways to load HTML!

First, make sure you have Aspose.HTML for .NET installed in your project. You can install it from the NuGet Package Manager Console using the following command:


Install Aspose.HTML for .NET

Install-Package Aspose.HTML

Load HTML from a File

Loading HTML from a file is a good starting point for working with existing HTML files, templates, or data stored in HTML format. If you need to load an existing HTML file, work on it, and save it, follow these steps:

  1. Load an HTML document using the HTMLDocument(address) constructor, where address is the local document path.
  2. Save the HTML file using the Save() method.

The Aspose.HTML for .NET library offers a set of constructors that allow you to load an HTML document from a file. For example, the HTMLDocument(address, configuration) constructor loads an HTML document from an address with the specified environment configuration settings. For more information, please see the API Reference HTMLDocument chapter.


C# code to load HTML from a file

using System.IO;
using Aspose.Html;
...

	// Prepare a file path
	string documentPath = Path.Combine(DataDir, "sprite.html");

	// Initialize an HTML document from the file
	using (var document = new HTMLDocument(documentPath))
	{
		// Work with the document

		// Save the document to a disk
		document.Save(Path.Combine(OutputDir, "sprite_out.html"));
	}
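
When the default environment is not suitable, the HTMLDocument(address, configuration) overload can be used instead. The following is a minimal sketch rather than a definitive implementation: the LoadWithConfiguration class, the Run method, and its path parameters are hypothetical names, and restricting scripts through the Security property is only one illustrative setting, taken here as an assumption from the library's environment configuration documentation.

```csharp
using System.IO;
using Aspose.Html;

// Hypothetical helper class, used only for illustration
public static class LoadWithConfiguration
{
    public static void Run(string inputPath, string outputPath)
    {
        // Create an environment configuration object
        using (var configuration = new Configuration())
        {
            // Assumption: treat scripts as an untrusted resource while loading
            configuration.Security |= Sandbox.Scripts;

            // Initialize an HTML document from the file with the configuration
            using (var document = new HTMLDocument(inputPath, configuration))
            {
                // Save the document to a disk
                document.Save(outputPath);
            }
        }
    }
}
```

Calling LoadWithConfiguration.Run(Path.Combine(DataDir, "sprite.html"), Path.Combine(OutputDir, "sprite_out.html")) would mirror the example above, with the configuration applied while the document loads.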

Load HTML from a URL

Loading HTML from a URL can be useful when you need to extract information from a web page. You can load HTML directly from a URL:

  1. Load an HTML document from a URL using the HTMLDocument(Url) constructor. Use the HTMLDocument(Url, configuration) constructor if you want to load HTML from a URL with specified environment configuration settings.
  2. Use the OuterHTML property of the document element to get the complete HTML content of the document, including the <html> element itself.

If you need to save the HTML document on your local drive, use the Save() method.


C# code to load HTML from a URL

using System;
using System.IO;
using Aspose.Html;
...

	// Load a document from 'https://docs.aspose.com/html/net/creating-a-document/' web page
	using (var document = new HTMLDocument("https://docs.aspose.com/html/net/creating-a-document/"))
	{
		var html = document.DocumentElement.OuterHTML;

		// Write the document content to the output stream
		Console.WriteLine(html);
	}
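
For data extraction, you usually need specific elements rather than the whole markup. The sketch below is one hedged way to do that: it wraps the address in the library's Url class and counts the anchor elements with the standard DOM method GetElementsByTagName(). The LinkCounter class and CountLinks method are hypothetical names, and the printed number depends on the live page at the time of the request.

```csharp
using System;
using Aspose.Html;

// Hypothetical helper class, used only for illustration
public static class LinkCounter
{
    public static int CountLinks(string address)
    {
        // Wrap the address string in a Url object
        var url = new Url(address);

        // Load the document from the URL
        using (var document = new HTMLDocument(url))
        {
            // Collect all <a> elements and return how many were found
            return document.GetElementsByTagName("a").Length;
        }
    }

    public static void Main()
    {
        var count = CountLinks("https://docs.aspose.com/html/net/creating-a-document/");
        Console.WriteLine("Links found: " + count);
    }
}
```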

Load HTML from a String

Loading HTML from a string lets you turn raw HTML markup into a structured document that you can parse, manipulate, or display:

  1. Prepare the HTML code as a string.
  2. Use the HTMLDocument(content, baseUri) constructor to initialize an HTML document from the string content with the specified baseUri.
  3. Save the HTML file using the Save() method.

C# code to load HTML from a string

using System.IO;
using Aspose.Html;
...

    // Prepare HTML code
    var htmlCode = "<p>Learn how to load HTML</p>";

    // Initialize a document from the string variable
    using (var document = new HTMLDocument(htmlCode, "."))
    {
        // Save the document to a disk
        document.Save(Path.Combine(OutputDir, "load-html-from-string.html"));
    }
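
To illustrate what "structured document" means in practice, here is a small sketch that loads the same kind of string and then reads the paragraph back through the DOM instead of saving it. It relies on the standard DOM members QuerySelector() and TextContent. The ParsedStringInspector class and ParagraphText method are hypothetical names introduced for this example.

```csharp
using System;
using Aspose.Html;

// Hypothetical helper class, used only for illustration
public static class ParsedStringInspector
{
    public static string ParagraphText(string htmlCode)
    {
        // Initialize a document from the string, using "." as the base URI
        using (var document = new HTMLDocument(htmlCode, "."))
        {
            // Find the first <p> element in the parsed document tree
            var paragraph = document.QuerySelector("p");

            // Return its text, or null if the markup has no paragraph
            return paragraph == null ? null : paragraph.TextContent;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ParagraphText("<p>Learn how to load HTML</p>"));
    }
}
```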

Load HTML from a Memory Stream

Loading HTML from a stream is useful when the markup is already held in memory and you do not want to write it to a temporary file first. The following steps show how to load HTML from a MemoryStream and save it to a file using Aspose.HTML for .NET:

  1. Create MemoryStream and StreamWriter objects. The StreamWriter is used to write the HTML code into the MemoryStream.
  2. Write the HTML code to the MemoryStream using the Write() method.
  3. Call Flush() to ensure that any buffered data is written to the stream, and use Seek(0, SeekOrigin.Begin) to set the position of the stream to the beginning. This is important because an HTMLDocument reads the content from the current position within the stream.
  4. Initialize the HTMLDocument using the HTMLDocument(content, baseUri) constructor, passing the MemoryStream object and the baseUri as parameters.
  5. Save the HTML file to a local drive using the Save() method.

C# code to load HTML from a memory stream

using System.IO;
using Aspose.Html;
...

	// Create a memory stream object
	using (var mem = new MemoryStream())
	using (var sw = new StreamWriter(mem))
	{
		// Write the HTML code into the memory object
		sw.Write("<p>Load HTML from a memory stream</p>");

		// It is important to set the position to the beginning, since HTMLDocument starts the reading exactly from the current position within the stream
		sw.Flush();
		mem.Seek(0, SeekOrigin.Begin);

		// Initialize a document from the memory stream
		using (var document = new HTMLDocument(mem, "."))
		{
			// Save the document to a local disk
			document.Save(Path.Combine(OutputDir, "load-html-from-stream.html"));
		}
	}
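
To see why the Flush() and Seek() calls matter, the effect can be reproduced with plain .NET streams and no Aspose.HTML dependency. In this sketch a StreamReader stands in for any consumer that reads from the current stream position, as HTMLDocument does. The StreamPositionDemo class and ReadFromCurrentPosition method are hypothetical names.

```csharp
using System;
using System.IO;

// Hypothetical helper class, used only for illustration
public static class StreamPositionDemo
{
    public static string ReadFromCurrentPosition(bool rewind)
    {
        using (var mem = new MemoryStream())
        using (var sw = new StreamWriter(mem))
        {
            // Write the HTML code and push buffered data into the stream
            sw.Write("<p>Load HTML from a memory stream</p>");
            sw.Flush();

            // Optionally move the position back to the beginning
            if (rewind)
            {
                mem.Seek(0, SeekOrigin.Begin);
            }

            // Read everything from the current position to the end
            var reader = new StreamReader(mem);
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        Console.WriteLine("Without Seek: '" + ReadFromCurrentPosition(false) + "'");
        Console.WriteLine("With Seek:    '" + ReadFromCurrentPosition(true) + "'");
    }
}
```

Without the rewind, the position sits at the end of the stream after writing, so the reader gets an empty string. With the rewind, it gets the full markup.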

To learn more about the Aspose.HTML API, please visit our documentation guide. In the Create HTML Document article, you will find information on how to load a document from a file, URL, or stream, or create it from scratch. The Edit HTML Document article gives you basic information on how to read or edit the Document Object Model using the Aspose.HTML for .NET API. You will explore how to create HTML elements and how to work with them: modifying the document by inserting new nodes, or by removing or editing the content of existing nodes.