Programmatically Extract SVG from Websites
The ability to extract images from HTML is important for various applications such as web scraping and content analysis. Aspose.HTML for Java is a robust library that simplifies this process by offering developers a set of tools to navigate and gather information from HTML documents seamlessly. Let’s explore how to extract external SVG images from a website.
Extract SVGs from HTML Using Java
With the Aspose.HTML library for Java, you can quickly build your own application using its tools for parsing and extracting data from HTML documents. The example below shows how to extract all external SVG images referenced by <img> elements in an HTML document and save them to the local file system.
Java code to extract SVG from website
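import com.aspose.html.HTMLDocument;
import com.aspose.html.Url;
import com.aspose.html.collections.HTMLCollection;
import com.aspose.html.dom.Element;
import com.aspose.html.net.RequestMessage;
import com.aspose.html.net.ResponseMessage;

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Open a document you want to download external SVGs from
final HTMLDocument document = new HTMLDocument("https://products.aspose.com/html/net/");
try {
    // Collect all <img> elements
    HTMLCollection images = document.getElementsByTagName("img");
    // Create a distinct collection of relative image URLs, skipping <img> elements without a src
    Set<String> urls = new HashSet<>();
    for (Element element : images) {
        String src = element.getAttribute("src");
        if (src != null && !src.isEmpty()) {
            urls.add(src);
        }
    }
    // Filter out non-SVG images (case-insensitive extension check)
    List<String> svgUrls = new ArrayList<>();
    for (String url : urls) {
        if (url.toLowerCase(Locale.ROOT).endsWith(".svg")) {
            svgUrls.add(url);
        }
    }
    // Create absolute SVG image URLs by resolving each src against the document base URI
    List<Url> absUrls = svgUrls.stream()
            .map(src -> new Url(src, document.getBaseURI()))
            .collect(Collectors.toList());
    for (Url url : absUrls) {
        // Create a downloading request
        RequestMessage request = new RequestMessage(url);
        // Download the SVG image through the document's network service
        ResponseMessage response = document.getContext().getNetwork().send(request);
        // Check whether the response is successful
        if (response.isSuccess()) {
            // Use the last path segment as the local file name
            String[] split = url.getPathname().split("/");
            String path = split[split.length - 1];
            // Save file to a local file system
            FileHelper.writeAllBytes(path, response.getContent().readAsByteArray());
        }
    }
} finally {
    // Release the resources held by the document
    document.dispose();
}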
Steps to Extract SVGs from HTML
- Use the HTMLDocument(url) constructor to create an instance of the HTMLDocument class, passing the URL of the website from which you want to extract external SVG images.
- Use the getElementsByTagName("img") method to collect all <img> elements.
- Extract the src attribute from each image element using the getAttribute("src") method and create a distinct collection of relative image URLs.
- Keep only SVG image URLs by checking whether each URL ends with .svg, and add them to a new list.
- Create absolute image URLs using the Url class and the getBaseURI() method of the HTMLDocument class.
- For each absolute URL, create a request using the RequestMessage(url) constructor. Send each request with document.getContext().getNetwork().send() and check whether the response was successful.
- If the response was successful, use FileHelper.writeAllBytes() to save the SVG content to the local file system.
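The filtering, URL-resolution, and file-naming steps above do not depend on any Aspose.HTML API. The following plain-JDK code is a minimal sketch of them. It uses java.net.URI.resolve in place of the library's Url class. The class name SvgUrlFilter, its methods, and the example URLs are hypothetical. The sketch checks the extension on the resolved URL path rather than on the raw src string, so a src such as /img/logo.svg?v=2 is still recognized.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical JDK-only helper; it is not part of Aspose.HTML for Java.
public class SvgUrlFilter {

    // Resolves each src against baseUri and keeps the distinct URLs whose path ends with ".svg".
    // The query string and fragment are ignored when checking the extension.
    public static List<URI> svgUrls(String baseUri, List<String> srcValues) {
        URI base = URI.create(baseUri);
        Set<URI> result = new LinkedHashSet<>(); // drops duplicates, keeps first-seen order
        for (String src : srcValues) {
            if (src == null || src.isBlank()) {
                continue; // <img> without a usable src attribute
            }
            URI absolute;
            try {
                absolute = base.resolve(src.trim());
            } catch (IllegalArgumentException e) {
                continue; // skip a malformed URL instead of failing the whole run
            }
            String path = absolute.getPath(); // null for opaque URIs such as data: URLs
            if (path != null && path.toLowerCase(Locale.ROOT).endsWith(".svg")) {
                result.add(absolute);
            }
        }
        return new ArrayList<>(result);
    }

    // Uses the last path segment as the local file name, like split("/") in the example above.
    public static String fileName(URI url) {
        String path = url.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        List<URI> urls = svgUrls("https://example.com/html/net/",
                List.of("img/logo.svg", "/img/logo.svg?v=2", "photo.png", "img/logo.svg", ""));
        for (URI u : urls) {
            System.out.println(u + " -> " + fileName(u));
        }
    }
}
```

In this usage example, both resolved URLs map to the same local file name, logo.svg, so saving them by last path segment would overwrite one file with the other. If that matters for your site, derive unique names, for example by adding a counter.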
With Aspose.HTML for Java, you can easily create a tool that parses a web page, identifies SVG image sources, and downloads the SVG files. This is useful when you need to collect SVGs for analysis, archiving, or content creation without doing it manually. To learn how to programmatically extract both inline and external SVGs from a website using Java, refer to the documentation article Extract SVG From Website in Java.
Note: It is important to respect copyright laws and obtain the proper permissions or licenses before using saved images for commercial purposes. We do not support the extraction and use of other people’s files for commercial purposes without their consent.
Get Started with Aspose.HTML for Java Library
Aspose.HTML for Java is an advanced web scraping and HTML parsing library. With it, you can create and edit documents, navigate through nodes, extract data, and convert HTML, XHTML, and MHTML files to PDF, images, and other formats. It also supports CSS, HTML Canvas, SVG, XPath, and JavaScript out of the box. It is a standalone API and does not require any additional software to be installed.
You can download its latest version directly from the Aspose Maven Repository and install it in your Maven-based project by adding the following configuration to pom.xml.
Repository
<repository>
    <id>AsposeJavaAPI</id>
    <name>Aspose Java API</name>
    <url>https://repository.aspose.com/repo/</url>
</repository>
Dependency
<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose-html</artifactId>
    <version>version of aspose-html API</version>
    <classifier>jdk17</classifier>
</dependency>