How to Extract Images from a Web Page

The ability to extract images from HTML is useful in many applications, including web scraping and content analysis. Aspose.HTML for Python via .NET simplifies this task by giving developers tools to navigate HTML documents and gather information from them. It suits anyone who needs to collect images for analysis, archiving, or content creation, and it removes the need to save each image by hand. Let’s explore how to download images from web pages.


Extract Images Using Python

Using Aspose.HTML for Python via .NET, you can easily create your own application, as our API provides a robust set of tools for parsing and extracting information from HTML documents. If you want to use HTML data parsing features in your product or programmatically extract data from HTML, see the code example below.


Python code to download images from web page

import os
import aspose.html as ah
import aspose.html.net as ahnet

# Prepare output directory
output_dir = "output/"
os.makedirs(output_dir, exist_ok=True)

# Open HTML document from URL
with ah.HTMLDocument("https://docs.aspose.com/svg/net/drawing-basics/svg-color/") as doc:
    # Collect all <img> elements
    images = doc.get_elements_by_tag_name("img")

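    # Get distinct image URLs, skipping missing/empty src values and inline data: URIs
    urls = set(
        src for src in (img.get_attribute("src") for img in images)
        if src and not src.startswith("data:")
    )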

    # Create absolute image URLs
    abs_urls = [ah.Url(url, doc.base_uri) for url in urls]

    for url in abs_urls:
        # Create a network request
        request = ahnet.RequestMessage(url.href)

        # Send request
        response = doc.context.network.send(request)

        # Check if successful
        if response.is_success:
            # Extract file name from the URL path; skip paths ending in "/" (no file name)
            file_name = os.path.basename(url.pathname)
            if not file_name:
                continue

            # Save image locally
            with open(os.path.join(output_dir, file_name), "wb") as f:
                f.write(response.content.read_as_byte_array())
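In the example, os.path.basename(url.pathname) returns an empty string for a URL whose path ends in "/", and two images stored under the same name in different folders (say /a/logo.png and /b/logo.png) overwrite each other in output_dir. The sketch below is one way to guard against both. The helper unique_file_name is hypothetical, not part of Aspose.HTML, and it assumes url.pathname behaves like a plain path string such as "/images/logo.png".

```python
import os
import posixpath
from urllib.parse import unquote


def unique_file_name(pathname: str, taken: set, default: str = "image") -> str:
    """Turn a URL path into a file name that is non-empty and not already in `taken`."""
    # Decode percent-escapes such as %20, then keep only the last path segment
    name = posixpath.basename(unquote(pathname))
    # A path ending in "/" (or a bare "." / "..") gives no usable name
    if name in ("", ".", ".."):
        name = default
    stem, ext = os.path.splitext(name)
    candidate = name
    counter = 1
    # Append _1, _2, ... until the name has not been used yet
    while candidate in taken:
        candidate = f"{stem}_{counter}{ext}"
        counter += 1
    taken.add(candidate)
    return candidate


# Usage inside the download loop (sketch): keep one set per output directory
# taken = set()
# file_name = unique_file_name(url.pathname, taken)
```

Tracking names in a set keeps the check in memory; if output_dir may already contain files from an earlier run, you could seed the set with os.listdir(output_dir).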


Steps to Extract Images from Web Page

  1. Open the target web page with the HTMLDocument class. This document is the source from which images will be extracted.
  2. Call the get_elements_by_tag_name("img") method of the HTMLDocument object to collect all <img> elements in the document.
  3. Iterate over the collection of <img> elements and read each element’s src attribute with the get_attribute("src") method. Store the values in a set so that each URL appears only once.
  4. Create absolute image URLs by passing each relative or incomplete URL, together with the document’s base_uri, to the Url constructor. This turns every src value into a complete URL that can be requested over the network.
  5. For each absolute image URL, create a RequestMessage object that represents the HTTP request for the image data.
  6. Send the request with doc.context.network.send(request) and receive a response. Check whether the request succeeded with the response’s is_success property.
  7. Extract the file name by calling os.path.basename() on the URL’s pathname, then write the binary response content (response.content.read_as_byte_array()) to a file in the output directory.
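Steps 2 to 4 boil down to ordinary HTML and URL handling. To experiment with that logic without the .NET runtime, here is a minimal sketch that swaps Aspose.HTML’s DOM for the standard library’s html.parser.HTMLParser and urllib.parse.urljoin. It is not how Aspose.HTML works internally: the class and function names are hypothetical, it reads only the raw markup rather than building a full document, and it skips inline data: images because there is nothing to download for them. It also honors the first <base href> element, as the HTML standard does when computing a document’s base URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Record every <img> src value and the first <base href> in the markup."""

    def __init__(self):
        super().__init__()
        self.sources = []
        self.base_href = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names and unescapes values
        values = dict(attrs)
        if tag == "base" and self.base_href is None and values.get("href"):
            self.base_href = values["href"].strip()
        elif tag == "img":
            src = (values.get("src") or "").strip()
            if src:  # ignore <img> elements with a missing or empty src
                self.sources.append(src)


def collect_image_urls(html: str, page_url: str) -> list:
    """Return unique absolute image URLs in document order (steps 2 to 4)."""
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    # Document base URL: the page URL, adjusted by <base href> if present
    base = urljoin(page_url, parser.base_href) if parser.base_href else page_url
    seen = set()
    result = []
    for src in parser.sources:
        if src.lower().startswith("data:"):
            continue  # inline image data, nothing to request
        absolute = urljoin(base, src)  # resolve relative paths
        if absolute not in seen:  # de-duplicate, keeping the first occurrence
            seen.add(absolute)
            result.append(absolute)
    return result


if __name__ == "__main__":
    page = '<img src="img/a.png"><img src="img/a.png"><img src="/b.jpg">'
    for url in collect_image_urls(page, "https://example.com/docs/page.html"):
        print(url)
```

A list plus a "seen" set is used instead of a bare set so the URLs come back in the order they appear on the page, which makes the output easier to compare against the source.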

To learn more about how to programmatically extract various types of images from a website using Python, refer to the documentation article Extracting Images from a Website in Python.

Note: Always respect copyright laws. Make sure you have the appropriate rights, permissions, or licenses before using the extracted images for commercial purposes. We do not endorse or support the unauthorized use of copyrighted content.



Get Started with Python API

To parse, manipulate, and manage HTML documents, install the Aspose.HTML for Python via .NET API. The easiest way to download and install it is with pip. Run the following command:

pip install aspose-html-net
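After installing, you may want to confirm that the package is visible to the interpreter you plan to run the example with (a common pitfall when several Python environments are present). A minimal standard-library sketch: the helper is_module_available is hypothetical, and it only checks that the aspose.html module (the import name used in the example above) can be located, not that the .NET runtime it relies on loads correctly.

```python
import importlib.util


def is_module_available(name: str) -> bool:
    """Return True if the module `name` can be found by the current interpreter."""
    try:
        # For a dotted name, find_spec imports the parent package first
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (here, "aspose") is not installed at all
        return False


if __name__ == "__main__":
    print("aspose.html available:", is_module_available("aspose.html"))
```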

For more details about Python library installation and system requirements, please refer to Aspose.HTML Documentation.
