How to Extract SVG from a Website
The ability to extract images from HTML is crucial for various applications, including web scraping and content analysis. Aspose.HTML for Python via .NET is a robust library that simplifies this process by offering developers a set of tools to navigate and gather information from HTML documents seamlessly. This powerful solution is ideal for anyone who needs to collect SVGs for analysis, archiving, or content creation – eliminating the need for manual work. Let’s explore how to download SVG images from web pages.
Extract SVG Using Python
SVG images in HTML documents come in two forms: inline SVG and external SVG. The following Python code demonstrates how to automate the extraction of external SVG images (SVG files stored separately from the HTML and referenced via <img> tags) from a web page using the Aspose.HTML for Python via .NET library:
Python code to download SVGs from a web page
import os
import aspose.html as ah
import aspose.html.net as ahnet

# Define the output directory
output_dir = "output/svg/"
os.makedirs(output_dir, exist_ok=True)

# Open the document you want to extract external SVGs from
document = ah.HTMLDocument("https://products.aspose.com/html/python-net/")

# Collect all <img> elements
images = document.get_elements_by_tag_name("img")

# Create a distinct collection of relative image URLs
urls = set(img.get_attribute("src") for img in images)

# Filter only SVG images (skip <img> elements that have no src attribute)
svg_urls = [url for url in urls if url and url.endswith(".svg")]

# Convert relative URLs to absolute using Url from aspose.html
abs_urls = [ah.Url(url, document.base_uri) for url in svg_urls]

for url in abs_urls:
    # Create a network request for the SVG
    request = ahnet.RequestMessage(url.href)

    # Send request to fetch the SVG
    response = document.context.network.send(request)

    # Check if request succeeded
    if response.is_success:
        # Determine local file path
        file_path = os.path.join(output_dir, os.path.basename(url.pathname))

        # Save SVG to a local file system
        with open(file_path, "wb") as f:
            f.write(response.content.read_as_byte_array())
Steps to Extract SVG from a Website
- Use the HTMLDocument(Url) constructor to create an instance of the HTMLDocument class and pass it the URL of the website from which you want to extract external SVG images.
- Use the get_elements_by_tag_name("img") method to collect all <img> elements from the HTML document. This method returns a list of image elements embedded in the page.
- Iterate through the collected <img> elements and use the get_attribute("src") method to extract the src attribute from each element. Store these values in a set to eliminate duplicates.
- Filter the extracted URLs by checking if they end with the “.svg” extension to isolate only the external SVG images.
- Use the Url class and the document’s base_uri to create absolute SVG image URLs.
- Create a RequestMessage instance for each absolute SVG URL to prepare an HTTP request for retrieving the image.
- Send the request and check the is_success property to ensure the response was successful.
- Use os.path.basename(url.pathname) to get the file name from the SVG URL, then save the image to the local file system by writing the binary content to the output directory.
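To make the URL-handling steps (collecting src values, removing duplicates, filtering for “.svg”, resolving absolute URLs, and deriving file names) easier to follow in isolation, here is a minimal sketch that uses only the Python standard library instead of Aspose.HTML. The names ImgSrcCollector and external_svg_targets, the sample markup, and the example.com base URL are hypothetical and exist only for illustration. Unlike the code above, this sketch checks the extension on the URL path rather than on the raw src value, so a link such as arrow.svg?v=2 is still recognized; treat that as one possible design choice, not as the library's behavior.

```python
from html.parser import HTMLParser
from posixpath import basename
from urllib.parse import urljoin, urlsplit


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = set()  # a set drops duplicate URLs

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # skip <img> tags without a src
                self.sources.add(src)


def external_svg_targets(html, base_url):
    """Map each absolute SVG URL referenced via <img> to a local file name."""
    collector = ImgSrcCollector()
    collector.feed(html)
    targets = {}
    for src in collector.sources:
        # Resolve the (possibly relative) src against the page's base URL
        absolute = urljoin(base_url, src)
        # Look at the path only, so query strings do not hide the extension
        path = urlsplit(absolute).path
        if path.lower().endswith(".svg"):
            targets[absolute] = basename(path)
    return targets


# Hypothetical page markup and base URL, for illustration only
page = """
<img src="/img/logo.svg">
<img src="icons/arrow.svg?v=2">
<img src="/img/logo.svg">
<img src="photo.png">
<img alt="no source">
"""
targets = external_svg_targets(page, "https://example.com/docs/")
for url, name in sorted(targets.items()):
    print(url, "->", name)
```

The resulting dictionary pairs each absolute SVG URL with the file name it would be saved under. Downloading the content itself is left to the Aspose.HTML network calls shown earlier.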
To learn more about how to programmatically extract inline and external SVGs from a website, refer to the documentation article Extracting SVG from a Website in Python.
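Inline SVG, the second form, is written directly into the page markup as <svg> elements rather than referenced through <img>. As a rough, library-independent illustration of the idea (not the approach taken in the documentation article), the sketch below pulls inline <svg> blocks out of an HTML string with a regular expression. The function name inline_svgs, the pattern, and the sample markup are assumptions made for this example, and the pattern assumes that <svg> elements are not nested and that attribute values contain no “>” character. A real DOM parser is the safer choice for arbitrary pages.

```python
import re

# Matches one <svg ...>...</svg> element; assumes SVGs are not nested
SVG_PATTERN = re.compile(r"<svg\b[^>]*>.*?</svg\s*>", re.DOTALL | re.IGNORECASE)


def inline_svgs(html):
    """Return the markup of every inline <svg> element found in html."""
    return SVG_PATTERN.findall(html)


# Hypothetical markup, for illustration only
page = (
    "<p>Logo:</p>"
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 10 10">'
    '<circle cx="5" cy="5" r="4"/>'
    "</svg>"
    "<p>End</p>"
)
svgs = inline_svgs(page)
for index, markup in enumerate(svgs, start=1):
    # Each match could be written to its own .svg file
    print(f"inline_{index}.svg:", markup)
```

Each returned string is a complete <svg> element that could be written to a separate file in text mode.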
Note: Always respect copyright laws. Make sure you have the appropriate rights, permissions, or licenses before using the extracted images for commercial purposes. We do not endorse or support the unauthorized use of copyrighted content.
Get Started with Python API
If you want to parse, manipulate, and manage HTML documents, install our flexible, high-speed Aspose.HTML for Python via .NET API. pip is the easiest way to download and install Aspose.HTML for Python via .NET. To do this, run the following command:
pip install aspose-html-net
For more details about Python library installation and system requirements, please refer to Aspose.HTML Documentation.
Other Supported Features
Use the Aspose.HTML for Python via .NET library to parse and manipulate HTML-based documents. Clear, safe and simple!