What types of data can I extract with Aspose.HTML for Python via .NET?

The library allows you to work with various types of web resources: embedded HTML page elements, files accessible directly via URLs, and dynamically generated content. Whether the data comes from a web page or a separate link, it can be accessed and processed programmatically.

Do I need to load the entire web page to get a file?

Not always. If the file is available via a direct URL, you can download and save it immediately. Loading the HTML document is only required if the data is part of the page structure.

Do I need external libraries or browser engines to extract data?

No. Aspose.HTML for Python via .NET is entirely self-contained. All parsing, rendering, and data extraction occur within the library, without the need for third-party tools.


Save File from URL in Python

Use Aspose.HTML for Python via .NET to automate file downloading from online sources.


How to Download Files from URL

The ability to download a file from a URL is important for applications such as web scraping and content analysis. Aspose.HTML for Python via .NET simplifies this process by giving developers a set of tools to navigate HTML documents and gather information from them. Let’s explore how to save a file from a URL using Python.

Save File from URL Using Python

The following Python code demonstrates how to download a file (such as an image, PDF, or any other resource) from a specified URL using Aspose.HTML for Python via .NET. The code creates an empty HTML document solely to gain access to the network context, sends a request for the file at the URL, and downloads the resource if the response is successful. The retrieved content is then saved in a local output directory under the original file name:

Python code to download file from URL

import os
import aspose.html as ah
import aspose.html.net as ahnet

# Define output directory
output_dir = "output/"
os.makedirs(output_dir, exist_ok=True)

# Create a blank document
doc = ah.HTMLDocument()

# Create a URL with the path to the resource you want to save
url = ah.Url("https://docs.aspose.com/html/images/handlers/message-handlers.png")

# Create a file request message
request = ahnet.RequestMessage(url)

# Extract file from URL
response = doc.context.network.send(request)

# Check whether the response is successful
if response.is_success:
    # Save the file to a local file system
    file_path = os.path.join(output_dir, os.path.basename(url.pathname))
    with open(file_path, "wb") as file:
        file.write(response.content.read_as_byte_array())

Steps to Save File from URL

Use the HTMLDocument() constructor to create an empty instance of the HTMLDocument class. This step is required to enable network access within the context of the document.
Create an instance of the Url class with the path to the resource you want to save.
Create a RequestMessage object using the Url instance. This object represents the HTTP request used to fetch the remote file.
Send the request and receive the response from the specified URL. Check the is_success property of the response to ensure that the file was retrieved successfully.
Use os.path.basename(url.pathname) to extract the file name from the URL, and define the output path.
Save the file to a local file system by opening a binary file stream and writing the content using response.content.read_as_byte_array().
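
The steps above can be wrapped in a reusable function. The following is a minimal sketch, not part of the library: the names file_name_from_url and download_file and the fallback name "download.bin" are hypothetical, and the Aspose.HTML calls are the same ones used in the example above. The file name is taken from the URL string with Python's standard urllib.parse module, so query strings and fragments are ignored and a URL without a file name still produces a usable output path.

```python
import os
from urllib.parse import unquote, urlsplit


def file_name_from_url(url_string, default="download.bin"):
    """Return the last path segment of a URL, or `default` if there is none."""
    path = urlsplit(url_string).path        # drops the query string and fragment
    name = os.path.basename(unquote(path))  # decodes escapes such as %20
    return name or default


def download_file(url_string, output_dir="output/"):
    """Download one resource; return the saved path, or None on failure."""
    # Imported here so the helper above works without the library installed.
    import aspose.html as ah
    import aspose.html.net as ahnet

    os.makedirs(output_dir, exist_ok=True)

    # A blank document is created only to reach its network context
    doc = ah.HTMLDocument()
    request = ahnet.RequestMessage(ah.Url(url_string))
    response = doc.context.network.send(request)

    if not response.is_success:
        return None

    file_path = os.path.join(output_dir, file_name_from_url(url_string))
    with open(file_path, "wb") as file:
        file.write(response.content.read_as_byte_array())
    return file_path
```

Calling download_file("https://docs.aspose.com/html/images/handlers/message-handlers.png") would save the image as output/message-handlers.png if the request succeeds. Returning None on failure lets the caller decide whether to retry, log the error, or skip the resource.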

Downloading files from URLs can be helpful for offline access when your internet connection is limited, for collaboration and sharing content, for archiving and backing up to prevent data loss, or simply for storing essential resources, such as documents, images, videos, or audio files, for future use. It is also a useful method for remote access to educational materials, allowing you to study anytime and anywhere – even while commuting or traveling.

To learn more about how to programmatically download files from URLs using Python, refer to the documentation article Save File from URL in Python.

Note: It is important to respect copyright laws and obtain the proper permissions or licenses before using saved files for commercial purposes. We do not support the extraction and use of other people’s files for commercial purposes without their consent.

Get Started with Python API

If you want to parse, manipulate, and manage HTML documents, install our flexible, high-speed Aspose.HTML for Python via .NET API. The easiest way to download and install it is with pip. To do this, run the following command:

Install Aspose.HTML for Python via .NET

pip install aspose-html-net

For more details about installing the Python library and its system requirements, please refer to the Aspose.HTML Documentation.

Other Supported Features

Use the Aspose.HTML for Python via .NET library to parse and manipulate HTML-based documents. Clear, safe and simple!

Extract images from web page

Extract SVG from website

Extract tables from website

How to add color in HTML

How to change text color