How to Download Files from URL

The ability to download a file from a URL is important for applications such as web scraping and content analysis. Aspose.HTML for Python via .NET is a library that simplifies this process by giving developers a set of tools to navigate HTML documents and gather information from them. Let’s explore how to save a file from a URL using Python.


Save File from URL Using Python

The following Python code demonstrates how to download a file (such as an image, a PDF, or any other resource) from a specified URL using Aspose.HTML for Python via .NET. The code creates an empty HTML document solely to gain access to the network context, sends a request for the file at that URL, and reads the content if the response is successful. The retrieved content is then saved in a local output directory under the original file name:


Python code to download file from URL

import os
import aspose.html as ah
import aspose.html.net as ahnet

# Define output directory
output_dir = "output/"
os.makedirs(output_dir, exist_ok=True)

# Create a blank document
doc = ah.HTMLDocument()

# Create a URL with the path to the resource you want to save
url = ah.Url("https://docs.aspose.com/html/images/handlers/message-handlers.png")

# Create a file request message
request = ahnet.RequestMessage(url)

# Send the request through the document's network context and receive the response
response = doc.context.network.send(request)

# Check whether the response is successful
if response.is_success:
    # Save the file to a local file system
    file_path = os.path.join(output_dir, os.path.basename(url.pathname))
    with open(file_path, "wb") as file:
        file.write(response.content.read_as_byte_array())


Steps to Save File from URL

  1. Use the HTMLDocument() constructor to create an empty instance of the HTMLDocument class. This step is required to enable network access within the context of the document.
  2. Create an instance of the Url class with the path to the resource you want to save.
  3. Create a RequestMessage object using the Url instance. This object represents the HTTP request used to fetch the remote file.
  4. Send the request and receive the response from the specified URL. Check the is_success property of the response to ensure that the file was retrieved successfully.
  5. Use os.path.basename(url.pathname) to extract the file name from the URL, and define the output path.
  6. Save the file to a local file system by opening a binary file stream and writing the content using response.content.read_as_byte_array().
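Step 5 assumes that the URL path ends in a file name. The following is a minimal sketch, using only the Python standard library, of one way to derive a file name when that assumption may not hold. The helper name filename_from_url and the fallback name download.bin are hypothetical and not part of Aspose.HTML; the sketch takes a plain URL string rather than a Url object.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url: str, default: str = "download.bin") -> str:
    """Return the last path segment of a URL, or `default` if there is none.

    Hypothetical helper for illustration; not part of Aspose.HTML.
    """
    # urlsplit separates the path from the query string and the fragment
    path = urlsplit(url).path
    # Decode percent-escapes first, so an encoded slash cannot smuggle
    # a directory component into the resulting file name
    name = posixpath.basename(unquote(path))
    # A path that ends in "/" has an empty last segment
    return name or default


print(filename_from_url("https://docs.aspose.com/html/images/handlers/message-handlers.png"))
print(filename_from_url("https://example.com/files/"))
```

The result can be passed to os.path.join(output_dir, ...) in the same way as the value of os.path.basename(url.pathname) in the code above.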

Downloading files from URLs can be helpful for offline access when your internet connection is limited, for collaboration and sharing content, for archiving and backing up to prevent data loss, or simply for storing essential resources, such as documents, images, videos, or audio files, for future use. It is also a useful method for remote access to educational materials, allowing you to study anytime and anywhere – even while commuting or traveling.

To learn more about how to programmatically download files from URLs using Python, refer to the documentation article Save File from URL in Python.

Note: It is important to respect copyright laws and obtain the proper permissions or licenses before using saved files for commercial purposes. We do not support the extraction and use of other people’s files for commercial purposes without their consent.



Get Started with Python API

If you want to parse, manipulate, and manage HTML documents, install our flexible, high-speed Aspose.HTML for Python via .NET API. pip is the easiest way to download and install Aspose.HTML for Python via .NET. To do this, run the following command:

pip install aspose-html-net

For more details about Python library installation and system requirements, please refer to Aspose.HTML Documentation.
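To confirm that the installation succeeded, you can query the installed package metadata. This is a small sketch that assumes Python 3.8 or later and uses only the standard library; the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# "aspose-html-net" is the distribution name used in the pip command above
found = installed_version("aspose-html-net")
print(found if found else "aspose-html-net is not installed")
```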

Other Supported Features

Use the Aspose.HTML for Python via .NET library to parse and manipulate HTML-based documents. Clear, safe and simple!