How to Extract Images from a Website
The ability to extract images from HTML is important for various applications such as web scraping and content analysis. Aspose.HTML for Java is a robust library that simplifies this process by offering developers a set of tools to navigate and gather information from HTML documents seamlessly. Let’s explore how to extract images from HTML documents.
Extract Images from HTML Using Java
Using the Aspose.HTML library for Java, you can easily create your own application, as our API provides a robust set of tools for parsing and extracting information from HTML documents. If you want to use HTML data parsing features in your product or programmatically extract data from HTML, see the code example below.
Java code to extract images from website
// Open a document you want to download images from
final HTMLDocument document = new HTMLDocument("https://docs.aspose.com/svg/net/drawing-basics/svg-shapes/");

// Collect all <img> elements
HTMLCollection images = document.getElementsByTagName("img");

// Create a distinct collection of image src values (these may be relative URLs)
java.util.Set<String> urls = new HashSet<>();
for (Element e : images) {
    urls.add(e.getAttribute("src"));
}

// Create absolute image URLs by resolving each src against the document's base URI
java.util.List<Url> absUrls = urls.stream()
        .map(src -> new Url(src, document.getBaseURI()))
        .collect(Collectors.toList());

for (Url url : absUrls) {
    // Create an image request message
    final RequestMessage request = new RequestMessage(url);

    // Download the image through the document's network service
    final ResponseMessage response = document.getContext().getNetwork().send(request);

    // Check whether the response is successful
    if (response.isSuccess()) {
        // Use the last segment of the URL path as the local file name
        String[] split = url.getPathname().split("/");
        String path = split[split.length - 1];

        // Save the file to the local file system
        FileHelper.writeAllBytes(path, response.getContent().readAsByteArray());
    }
}
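The listing above takes the local file name from the last segment of each image URL's path. The following is a minimal sketch of that one step using only the JDK's java.net.URI class rather than the Aspose.HTML Url class. The class name ImageFileNames, the helper fileNameFor, and the fallback name are hypothetical and not part of the library; the sketch only illustrates how a URL with a query string or with no file-name segment might be handled.

```java
import java.net.URI;

public class ImageFileNames {

    // Hypothetical helper: derives a local file name from an absolute image URL.
    // Returns the fallback when the URL path has no usable last segment.
    static String fileNameFor(URI imageUrl, String fallback) {
        // getPath() excludes the query string and fragment; it is null for opaque URIs
        String path = imageUrl.getPath();
        if (path == null || path.isEmpty()) {
            return fallback;
        }
        // Keep everything after the last '/'
        String name = path.substring(path.lastIndexOf('/') + 1);
        return name.isEmpty() ? fallback : name;
    }

    public static void main(String[] args) {
        // prints "logo.png" (the query string is not part of the path)
        System.out.println(fileNameFor(URI.create("https://example.com/images/logo.png?v=2"), "image"));
        // prints "image" (the path ends with '/', so there is no file-name segment)
        System.out.println(fileNameFor(URI.create("https://example.com/images/"), "image"));
    }
}
```

Two images on different paths can share the same last segment, so a real tool would probably also need to make the names unique before saving.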
Steps to Extract Images from Website
- Use the HTMLDocument(Url) constructor to initialize an HTML document.
- Use the getElementsByTagName("img") method to collect all <img> elements present on the web page. The method returns them as a collection.
- Iterate through the <img> elements and use the getAttribute("src") method to extract the src attribute of each <img> element.
- Create absolute image URLs using the Url class and the getBaseURI() method of the HTMLDocument class.
- For each absolute image URL, create a request using the RequestMessage(url) constructor and send it with document.getContext().getNetwork().send(request). Check that the response was successful.
- If the response was successful, extract the image data and save it to your local file system using FileHelper.writeAllBytes().
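The step that turns src values into absolute URLs can be illustrated on its own. The sketch below is not the Aspose.HTML API: it substitutes the JDK's java.net.URI.resolve for the Url class and the document's base URI, and the class ImageUrlResolver and the helper resolveAll are hypothetical names. It assumes that missing src values and inline data: URIs should be skipped, which the original listing does not do.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ImageUrlResolver {

    // Hypothetical helper: resolves raw src values against a base URI,
    // skipping missing values and inline data: URIs, and dropping duplicates.
    static List<URI> resolveAll(URI base, List<String> srcValues) {
        Set<URI> unique = new LinkedHashSet<>(); // keeps first-seen order
        for (String src : srcValues) {
            if (src == null) {
                continue; // <img> element without a src attribute
            }
            String trimmed = src.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("data:")) {
                continue; // nothing to download
            }
            try {
                // Absolute src values are returned unchanged; relative ones are
                // resolved against the base URI
                unique.add(base.resolve(trimmed));
            } catch (IllegalArgumentException e) {
                // src is not a valid URI reference; skip it
            }
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        URI base = URI.create("https://example.com/docs/page/index.html");
        List<String> srcValues = Arrays.asList(
                "img/a.png",                     // relative to the page
                "/static/b.png",                 // relative to the site root
                "https://cdn.example.org/c.png", // already absolute
                "img/a.png",                     // duplicate
                null,                            // missing src
                "data:image/png;base64,AAAA");   // inline image
        for (URI url : resolveAll(base, srcValues)) {
            System.out.println(url);
        }
    }
}
```

Deduplicating after resolution, rather than before, means that two different relative paths pointing at the same absolute URL are requested only once.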
With Aspose.HTML for Java, you can build a tool that parses an HTML page, identifies image sources, and downloads those images. This is useful when you need to collect images for analysis, archiving, or content creation without gathering them manually. To learn more about how to programmatically extract different types of images from a website using Java, refer to the documentation article Extract Images From Website in Java.
Note: It is essential to comply with copyright laws and obtain appropriate permissions or licenses before using saved images commercially. We do not support the extraction and use of other people’s files for commercial purposes without their consent.
Get Started with Aspose.HTML for Java Library
Aspose.HTML for Java is an advanced web scraping and HTML parsing library. One can create, edit, navigate through nodes, extract data and convert HTML, XHTML, and MHTML files to PDF, Images, and other formats. Moreover, it also handles CSS, HTML Canvas, SVG, XPath, and JavaScript out-of-the-box to extend manipulation tasks. It’s a standalone API and does not require any software installation.
You can download its latest version directly from the Aspose Maven Repository and install it within your Maven-based project by adding the following configurations to the pom.xml.
Repository
<repository>
    <id>AsposeJavaAPI</id>
    <name>Aspose Java API</name>
    <url>https://repository.aspose.com/repo/</url>
</repository>
Dependency
<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose-html</artifactId>
    <version>version of aspose-html API</version>
    <classifier>jdk17</classifier>
</dependency>
Other Supported Features
Use the Aspose.HTML for Java library to parse and manipulate HTML-based documents. Clear, safe and simple!