Extract Images from Website

A quick and easy way to programmatically search and extract images from any website.

How to Extract Images from a Website

The ability to extract images from HTML is important for various applications such as web scraping and content analysis. Aspose.HTML for Java is a robust library that simplifies this process by offering developers a set of tools to navigate and gather information from HTML documents seamlessly. Let’s explore how to extract images from HTML documents.

Extract Images from HTML Using Java

Using the Aspose.HTML library for Java, you can easily create your own application, as our API provides a robust set of tools for parsing and extracting information from HTML documents. If you want to use HTML data parsing features in your product or programmatically extract data from HTML, see the code example below.

Java code to extract images from website

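import com.aspose.html.HTMLDocument;
import com.aspose.html.Url;
import com.aspose.html.collections.HTMLCollection;
import com.aspose.html.dom.Element;
import com.aspose.html.net.RequestMessage;
import com.aspose.html.net.ResponseMessage;
import java.util.HashSet;
import java.util.stream.Collectors;
// FileHelper, used when saving below, also needs an import; its package is not shown in this example

// Open a document you want to download images from
final HTMLDocument document = new HTMLDocument("https://docs.aspose.com/svg/net/drawing-basics/svg-shapes/");

// Collect all <img> elements
HTMLCollection images = document.getElementsByTagName("img");

// Collect the distinct src values, which may be relative or absolute URLs
java.util.Set<String> urls = new HashSet<>();
for (Element e : images) {
    String src = e.getAttribute("src");
    // Skip <img> elements that have no src attribute
    if (src != null && !src.isEmpty()) {
        urls.add(src);
    }
}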

// Create absolute image URLs
java.util.List<Url> absUrls = urls.stream()
    .map(src -> new Url(src, document.getBaseURI()))
    .collect(Collectors.toList());

for (Url url : absUrls) {
    // Create an image request message
    final RequestMessage request = new RequestMessage(url);

    // Extract image
    final ResponseMessage response = document.getContext().getNetwork().send(request);

    // Check whether a response is successful
    if (response.isSuccess()) {
        String[] split = url.getPathname().split("/");
        String path = split[split.length - 1];

        // Save file to a local file system
        FileHelper.writeAllBytes(path, response.getContent().readAsByteArray());
    }
}
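
The example above saves each image into the working directory under its last path segment, so two images that share a file name (for instance, logo.png from two different folders) would overwrite each other. The following is a minimal sketch of one way to avoid that using only standard Java I/O. The class name ImageSaver, the method saveUnique, and the "-1", "-2" suffix scheme are hypothetical and not part of Aspose.HTML; the sketch also assumes single-threaded use, since the existence check and the write are not atomic.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Aspose.HTML.
class ImageSaver {

    // Write content into dir under fileName. If that name is already taken,
    // try "name-1.ext", "name-2.ext", ... until a free name is found.
    static Path saveUnique(Path dir, String fileName, byte[] content) throws IOException {
        Files.createDirectories(dir);

        // Split "logo.png" into "logo" and ".png"; names without a dot keep an empty extension
        int dot = fileName.lastIndexOf('.');
        String stem = dot > 0 ? fileName.substring(0, dot) : fileName;
        String ext = dot > 0 ? fileName.substring(dot) : "";

        Path target = dir.resolve(fileName);
        for (int i = 1; Files.exists(target); i++) {
            target = dir.resolve(stem + "-" + i + ext);
        }
        return Files.write(target, content);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("images");
        // Saving the same name twice produces two distinct files
        System.out.println(saveUnique(dir, "logo.png", new byte[] {1, 2, 3}));
        System.out.println(saveUnique(dir, "logo.png", new byte[] {4, 5, 6}));
    }
}
```

Inside the download loop, the bytes returned by response.getContent().readAsByteArray() could be passed as the content argument in place of the FileHelper.writeAllBytes() call.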

Steps to Extract Images from Website

1. Use the HTMLDocument constructor to load the HTML document from the page URL.
2. Use the getElementsByTagName("img") method to collect all <img> elements in the document. It returns an HTMLCollection of the <img> elements on the web page.
3. Iterate through the <img> elements and read each one's src attribute with getAttribute("src"), collecting the values in a set so that duplicates are dropped.
4. Create absolute image URLs with the Url class, resolving each src value against the document's base URI (getBaseURI()).
5. For each absolute image URL, create a request with the RequestMessage(url) constructor and send it through document.getContext().getNetwork().send(request). Check the response with isSuccess().
6. If the response was successful, read the image bytes with response.getContent().readAsByteArray() and save them to the local file system using FileHelper.writeAllBytes().
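
The URL-handling steps above (reading src values, making them absolute, and choosing a local file name) can be tried in isolation. The sketch below uses the standard java.net.URI class rather than the Aspose Url class, so it only illustrates the idea. The class name ImageUrlSketch, the methods resolveAll and fileNameOf, the fallback name "image", and the example.com addresses are all hypothetical. It also assumes the src values are already valid URI strings; a value containing spaces or other illegal characters would make URI.resolve throw an IllegalArgumentException.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Aspose.HTML.
class ImageUrlSketch {

    // Resolve raw src values against the page's base URI. Null or blank values
    // and inline "data:" URIs are skipped; duplicates are dropped while the
    // first-seen order is kept.
    static Set<URI> resolveAll(String baseUri, List<String> srcValues) {
        URI base = URI.create(baseUri);
        Set<URI> resolved = new LinkedHashSet<>();
        for (String raw : srcValues) {
            if (raw == null) {
                continue;
            }
            String src = raw.trim();
            if (src.isEmpty() || src.startsWith("data:")) {
                continue;
            }
            resolved.add(base.resolve(src));
        }
        return resolved;
    }

    // Use the last path segment as the local file name, falling back to
    // "image" when the path is missing, empty, or ends in "/".
    static String fileNameOf(URI uri) {
        String path = uri.getPath();
        if (path == null || path.isEmpty() || path.endsWith("/")) {
            return "image";
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        List<String> srcValues = Arrays.asList(
            "images/a.png", "/img/b.svg", "images/a.png", "data:image/png;base64,AAAA");
        for (URI uri : resolveAll("https://example.com/docs/page/", srcValues)) {
            System.out.println(uri + " -> " + fileNameOf(uri));
        }
    }
}
```

Skipping data: URIs is a design choice for this sketch: such images are embedded in the page itself, so there is nothing to request over the network.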

With Aspose.HTML for Java, you can easily create a tool that parses an HTML page, identifies image sources, and downloads those images. It is a powerful solution for those who need to collect images for analysis, archiving, or content creation without the hassle of doing it manually. To learn more about how to programmatically extract different types of images from a website using Java, refer to the documentation article Extract Images From Website in Java.

Note: It is essential to comply with copyright laws and obtain appropriate permissions or licenses before using saved images commercially. We do not support the extraction and use of other people’s files for commercial purposes without their consent.

Get Started with Aspose.HTML for Java Library

Aspose.HTML for Java is an advanced web scraping and HTML parsing library. One can create, edit, navigate through nodes, extract data and convert HTML, XHTML, and MHTML files to PDF, Images, and other formats. Moreover, it also handles CSS, HTML Canvas, SVG, XPath, and JavaScript out-of-the-box to extend manipulation tasks. It’s a standalone API and does not require any software installation.
You can download its latest version directly from Aspose Maven Repository and install it within your Maven-based project by adding the following configurations to the pom.xml.

Repository

<repository>
    <id>AsposeJavaAPI</id>
    <name>Aspose Java API</name>
    <url>https://repository.aspose.com/repo/</url>
</repository>

Dependency

<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose-html</artifactId>
    <version>version of aspose-html API</version>
    <classifier>jdk17</classifier>
</dependency>