Search HTML Formats in Java

Native, high-performance HTML file search using Java APIs, without requiring any Microsoft or Adobe software to be installed.

How to Search HTML Files Using Java

To search an HTML file, we’ll use the Aspose.Words for Java API, a feature-rich, powerful, and easy-to-use search API for the Java platform. You can download its latest version directly from Maven and install it in your Maven-based project by adding the following configuration to the pom.xml.

Repository


<repository>
<id>AsposeJavaAPI</id>
<name>Aspose Java API</name>
<url>https://repository.aspose.com/repo/</url>
</repository>

Dependency

<dependency>
<groupId>com.aspose</groupId>
<artifactId>aspose-words</artifactId>
<version>version of aspose-words API</version>
<classifier>jdk17</classifier>
</dependency>

Steps to Search HTML Files in Java

Developers can integrate the search with just a few lines of code by following these steps.

  • Load the HTML file by instantiating a Document object.
  • Instantiate FindReplaceOptions.
  • Use the Pattern.compile() method to define a regex pattern.
  • Use the getRange().replace() method to find and replace the matches.
  • Save the HTML file.

System Requirements

Before integrating the code, make sure that you have the following prerequisites.

  • Microsoft Windows or a compatible OS with a Java Runtime Environment, for JSP/JSF and desktop applications.
  • The latest version of Aspose.Words for Java, available directly from Maven.
 

Search HTML Files - Java

import com.aspose.words.Document;
import com.aspose.words.FindReplaceOptions;
import java.util.regex.Pattern;

// Load the HTML file
Document html = new Document("sourceFile.html");
// Find and replace words matching the pattern ("Bad", "Sad", or "Mad")
FindReplaceOptions options = new FindReplaceOptions();
html.getRange().replace(Pattern.compile("[BSM]ad"), "[replaced]", options);
// Save the HTML file
html.save("output.html");
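
Before running a replacement against a real document, it can help to preview what a pattern matches on plain text. The following is a minimal sketch that uses only java.util.regex from the standard library, not Aspose.Words; the class name PatternPreview, the helper findMatches, and the sample text are hypothetical and chosen purely for illustration. It also shows that inside a character class the | character is a literal rather than an alternation, so [B|S|M]ad matches one string more than [BSM]ad does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Aspose.Words: previews regex matches on plain text.
public class PatternPreview {

    // Collect every substring of `text` that `pattern` matches, in order of appearance.
    static List<String> findMatches(Pattern pattern, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String sample = "Bad Sad Mad |ad Dad"; // made-up sample text

        // Inside [...], '|' is a literal character, so "|ad" is matched too.
        System.out.println(findMatches(Pattern.compile("[B|S|M]ad"), sample)); // [Bad, Sad, Mad, |ad]

        // Listing only the letters restricts matches to Bad, Sad, and Mad.
        System.out.println(findMatches(Pattern.compile("[BSM]ad"), sample));   // [Bad, Sad, Mad]
    }
}
```

Checking a pattern this way is cheap, and the same compiled Pattern object can then be handed to getRange().replace() once it matches only the intended words.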
 
  • Aspose.Words for Java can be used to load, view, and convert Microsoft Word and OpenDocument formats such as DOC, DOCX, and ODT to PDF, XPS, HTML, and various other formats. You can also create new documents from scratch and save them in the supported formats. It is a standalone API suited to server-side and backend systems where high performance is required. It does not depend on any software such as Microsoft Office or OpenOffice.

    What is HTML File Format

    HTML (Hyper Text Markup Language) is the file format for web pages created for display in browsers. Known as the language of the web, HTML has evolved as new kinds of information needed to be displayed in web pages. The latest variant, HTML 5, offers a lot of flexibility for working with the language. HTML pages are either received from the server where they are hosted or loaded from the local system. Each HTML page is made up of HTML elements such as forms, text, images, animations, and links. These elements are represented by tags such as img, a, p, and several others, and most tags have a start and an end. A page can also embed scripts written in languages such as JavaScript, as well as style sheets (CSS) that control the overall layout.

    Other Supported Search Documents

    Using Java, you can also search other file formats, including:

    DOC (Microsoft Word Binary Format)
    DOCX (Office 2007+ Word Document)
    MHTML (Web Page Archive Format)
    ODT (OpenDocument Text File Format)
    OTT (OpenDocument Text Template)
    RTF (Rich Text Format)
    TXT (Text Document)
    XHTML (XML Text Based Markup)