Some applications need to work with document formats other than PDF even though the source data is only available as PDF. Such applications have two options: add PDF parsing to the solution itself, or add PDF conversion so the data can be handled in a format they already support. For the second option, converting PDF to Word, Excel, HTML, images, or any other required format from PHP code through the Java-based library is simple. A few cases are covered here so programmers can adapt these conversion snippets to their requirements.
PDF to Microsoft Word Conversion
// Include the required libraries
require_once ("java/Java.inc");
require_once ("lib/aspose.pdf.php");
// Import the necessary classes from the Aspose.PDF for Java library
use com\aspose\pdf\License;
use com\aspose\pdf\Document;
use com\aspose\pdf\DocSaveOptions;
use com\aspose\pdf\DocSaveOptions_DocFormat;
use com\aspose\pdf\DocSaveOptions_RecognitionMode;
// Set the license file for Aspose.PDF for Java
$license = "Aspose.PDF.PHPviaJava.lic";
$licenceObject = new License();
$licenceObject->setLicense($license);
// Set the input and output file paths
$dataDir = getcwd() . DIRECTORY_SEPARATOR . "samples";
$inputFile = $dataDir . DIRECTORY_SEPARATOR . "sample.pdf";
$outputFile = $dataDir . DIRECTORY_SEPARATOR . 'result-pdf-to-docx.docx';
// Load the PDF document
$document = new Document($inputFile);
// Create the save options for converting to DOCX format
$saveOption = new DocSaveOptions();
$saveOption->setMode(DocSaveOptions_RecognitionMode::$EnhancedFlow);
$saveOption->setFormat(DocSaveOptions_DocFormat::$DocX);
// Save the document in DOCX format
$document->save($outputFile, $saveOption);
The Aspose.PDF for PHP library supports all PDF to Word conversions. For a plain conversion without any special settings, load the PDF file with the Document class and call the save method with the output Word document path and a SaveFormat value as parameters. For special cases that need control over line distance, image resolution, and other settings, the API provides the DocSaveOptions class, which exposes all such settings.
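As a rough sketch of the special-settings case, the helper below applies image resolution and line distance settings to a DocSaveOptions object before saving. The function name applyWordLayoutOptions is hypothetical. The setters setImageResolutionX, setImageResolutionY, and setMaxDistanceBetweenTextLines come from the Aspose.PDF for Java API and are assumed to be exposed unchanged through the PHP bridge, so check the API reference for the version in use.

```php
<?php
// Hypothetical helper: tunes a DocSaveOptions-like object before conversion.
// The setter names are assumed to match the Aspose.PDF for Java API.
function applyWordLayoutOptions(object $saveOption, int $dpi = 300, float $maxLineDistance = 1.5): object
{
    // Resolution (in DPI) for images written to the Word document
    $saveOption->setImageResolutionX($dpi);
    $saveOption->setImageResolutionY($dpi);
    // Threshold used when grouping nearby text lines (assumed meaning; the
    // default value here is illustrative, not a library default)
    $saveOption->setMaxDistanceBetweenTextLines($maxLineDistance);
    return $saveOption;
}
```

With the snippet above, this could be called as applyWordLayoutOptions($saveOption); just before $document->save($outputFile, $saveOption);.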
Save PDF as Excel Files
// Include the required libraries
require_once ("java/Java.inc");
require_once ("lib/aspose.pdf.php");
// Import the necessary classes from the Aspose.PDF for Java library
use com\aspose\pdf\Document;
use com\aspose\pdf\ExcelSaveOptions;
use com\aspose\pdf\ExcelSaveOptions_ExcelFormat;
use com\aspose\pdf\License;
// Set the path to the Aspose.PDF license file
$license = "Aspose.PDF.PHPviaJava.lic";
// Create a new License object and set the license file
$licenceObject = new License();
$licenceObject->setLicense($license);
// Set the path to the input PDF file
$dataDir = getcwd() . DIRECTORY_SEPARATOR . "samples";
$inputFile = $dataDir . DIRECTORY_SEPARATOR . "sample.pdf";
// Set the path to the output Excel file
$outputFile = $dataDir . DIRECTORY_SEPARATOR . 'sample.xlsx';
// Create a new Document object and load the input PDF file
$document = new Document($inputFile);
// Create a new ExcelSaveOptions object
$saveOption = new ExcelSaveOptions();
// Set the output format to XLSX
$saveOption->setFormat(ExcelSaveOptions_ExcelFormat::$XLSX);
// Save the document as an Excel file using the specified save options
$document->save($outputFile, $saveOption);
A dedicated SaveFormat.Excel enumeration value is available for saving PDF to Microsoft Excel output (XLS or XLSX). In addition, the PHP/Java PDF library has a specific ExcelSaveOptions class that not only handles saving to Excel formats but also provides functions and properties for settings such as the exact output format, minimizing the number of worksheets, and more.
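The worksheet-related settings might be applied as in the following sketch. The helper name applyExcelOptions is hypothetical. setMinimizeTheNumberOfWorksheets and setInsertBlankColumnAtFirst are setters from the Aspose.PDF for Java ExcelSaveOptions class and are assumed to be available unchanged through the PHP bridge.

```php
<?php
// Hypothetical helper: configures an ExcelSaveOptions-like object.
// The setter names are assumed to match the Aspose.PDF for Java API.
function applyExcelOptions(object $saveOption, bool $minimizeWorksheets = true, bool $blankFirstColumn = false): object
{
    // Ask the converter to use as few worksheets as possible
    $saveOption->setMinimizeTheNumberOfWorksheets($minimizeWorksheets);
    // Control whether an empty column is inserted as the first column
    $saveOption->setInsertBlankColumnAtFirst($blankFirstColumn);
    return $saveOption;
}
```

In the snippet above, the call would go between setFormat and save, for example applyExcelOptions($saveOption);.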
Convert PDF to PowerPoint Presentations
// Include the required Java and Aspose.PDF for PHP libraries
require_once ("java/Java.inc");
require_once ("lib/aspose.pdf.php");
// Import the necessary classes from the Aspose.PDF for PHP library
use com\aspose\pdf\Document;
use com\aspose\pdf\PptxSaveOptions;
use com\aspose\pdf\License;
// Set the path to the Aspose.PDF license file
$license = "Aspose.PDF.PHPviaJava.lic";
// Create a new License object and set the license file
$licenceObject = new License();
$licenceObject->setLicense($license);
// Set the path to the input PDF file
$dataDir = getcwd() . DIRECTORY_SEPARATOR . "samples";
$inputFile = $dataDir . DIRECTORY_SEPARATOR . "sample.pdf";
// Set the path to the output PPTX file
$outputFile = $dataDir . DIRECTORY_SEPARATOR . "results" . DIRECTORY_SEPARATOR . 'sample.pptx';
// Load the input PDF document
$document = new Document($inputFile);
// Create an instance of PptxSaveOptions
$saveOption = new PptxSaveOptions();
// Save the PDF document as a PPTX file
$document->save($outputFile, $saveOption);
The PHP API supports converting PDF pages to PowerPoint slides with selectable text, or as images by rendering each slide as an image. The pattern for saving a PDF as PowerPoint is almost the same as for the other formats: load the file with the Document class, then call the save method with the output file path and a SaveFormat value as parameters. For rendering with special presentation options, programmers can create a PptxSaveOptions object, set the relevant options, and pass it to the save method instead.
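Switching between selectable text and image-only slides could look like the sketch below. The helper name applySlideRendering is hypothetical, and setSlidesAsImages is the PptxSaveOptions setter from the Aspose.PDF for Java API, assumed to be reachable through the PHP bridge under the same name.

```php
<?php
// Hypothetical helper: chooses how slides are produced from PDF pages.
// The setter name is assumed to match the Aspose.PDF for Java API.
function applySlideRendering(object $saveOption, bool $asImages): object
{
    // true: each slide is rendered as a picture; false: text stays selectable
    $saveOption->setSlidesAsImages($asImages);
    return $saveOption;
}
```

For image-only slides, the snippet above would call applySlideRendering($saveOption, true); before $document->save($outputFile, $saveOption);.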
PDF to HTML Conversion
// Include the required libraries
require_once ("java/Java.inc");
require_once ("lib/aspose.pdf.php");
// Import the necessary classes from the Aspose.PDF library
use com\aspose\pdf\Document;
use com\aspose\pdf\HtmlSaveOptions;
use com\aspose\pdf\License;
// Set the path to the license file
$licensePath = "Aspose.PDF.PHPviaJava.lic";
// Create a new License object and set the license using the provided file path
$license = new License();
$license->setLicense($licensePath);
// Set the path to the input PDF file
$dataDir = getcwd() . DIRECTORY_SEPARATOR . "samples";
$inputFile = $dataDir . DIRECTORY_SEPARATOR . "sample.pdf";
// Set the path to the output HTML file
$outputFile = $dataDir . DIRECTORY_SEPARATOR . 'pdf-to-html.html';
// Create a new Document object and load the input PDF file
$document = new Document($inputFile);
// Create a new HtmlSaveOptions object for saving the document as HTML
$saveOption = new HtmlSaveOptions();
// Save the document as HTML using the specified save options
$document->save($outputFile, $saveOption);
The PDF parsing library supports saving PDF to HTML as a whole as well as with embedded resources, including images. For generic cases the procedure is the same as for the other formats: load the source document and call the save method with the output HTML file path and SaveFormat.Html as parameters. For saving with embedded resources, the HtmlSaveOptions class offers multiple options, such as saving images to a specific folder during the conversion, splitting the resulting HTML into multiple pages, and more.
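The two options mentioned here, an image folder and page splitting, might be set as in this sketch. The helper name applyHtmlOptions and the folder path in the usage note are hypothetical. setSpecialFolderForAllImages and setSplitIntoPages are HtmlSaveOptions setters from the Aspose.PDF for Java API, assumed to work the same way through the PHP bridge.

```php
<?php
// Hypothetical helper: configures an HtmlSaveOptions-like object.
// The setter names are assumed to match the Aspose.PDF for Java API.
function applyHtmlOptions(object $saveOption, string $imageFolder, bool $splitIntoPages = true): object
{
    // Write all extracted images into one dedicated folder
    $saveOption->setSpecialFolderForAllImages($imageFolder);
    // Produce one HTML file per PDF page instead of a single file
    $saveOption->setSplitIntoPages($splitIntoPages);
    return $saveOption;
}
```

A call such as applyHtmlOptions($saveOption, $dataDir . DIRECTORY_SEPARATOR . 'images'); would sit just before the save call in the snippet above.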
Convert PDF to Images
// Include the required libraries
require_once ("java/Java.inc");
require_once ("lib/aspose.pdf.php");
// Import the necessary classes from the Aspose.PDF for PHP via Java library
use com\aspose\pdf\Document;
use com\aspose\pdf\devices_Resolution;
use com\aspose\pdf\devices_JpegDevice;
use com\aspose\pdf\License;
// Create a License object and set the license file
$licenceObject = new License();
$licenceObject->setLicense("Aspose.PDF.PHPviaJava.lic");
// Set the path to the input PDF file
$dataDir = getcwd() . DIRECTORY_SEPARATOR . "samples";
$inputFile = $dataDir . DIRECTORY_SEPARATOR . "sample.pdf";
// Set the path and template for the output JPEG files
$imageFileNameTemplate = $dataDir . DIRECTORY_SEPARATOR . 'pdf-to-jpeg-';
// Open the target document
$document = new Document($inputFile);
$pages = $document->getPages();
$count = $pages->size();
// Create a Resolution object with a resolution of 300 dpi
$resolution = new devices_Resolution(300);
// Create a JpegDevice object with the specified resolution
$imageDevice = new devices_JpegDevice($resolution);
// Loop through each page of the document (page numbering starts at 1)
for ($pageCount = 1; $pageCount <= $count; $pageCount++) {
    // Convert a particular page and save the image to a file
    $imageFileName = $imageFileNameTemplate . $pageCount . '.jpg';
    $page = $pages->get_Item($pageCount);
    $imageDevice->process($page, $imageFileName);
}
Converting PDF pages into images, including PNG, JPEG, TIFF, and BMP, is easy within PHP applications using the Java-based library, as the code snippet above shows. After loading the file, developers can loop through the PDF pages and convert them page by page to the required image format. The horizontal and vertical resolution of the images can be set using the Resolution class.
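The page-by-page loop can be made independent of the image format with a small wrapper, sketched below. The function name renderPagesToImages is hypothetical; it relies only on the size, get_Item, and process calls already used in the JPEG snippet and returns the list of file names it wrote.

```php
<?php
// Hypothetical helper: renders every page of a document with the given device.
// $pages is expected to offer size() and get_Item(), and $device to offer
// process(), as in the JPEG snippet above.
function renderPagesToImages(object $pages, object $device, string $fileNameTemplate, string $extension): array
{
    $written = [];
    $count = $pages->size();
    // Page numbering starts at 1
    for ($pageNumber = 1; $pageNumber <= $count; $pageNumber++) {
        $imageFileName = $fileNameTemplate . $pageNumber . '.' . $extension;
        $device->process($pages->get_Item($pageNumber), $imageFileName);
        $written[] = $imageFileName;
    }
    return $written;
}
```

For the JPEG case above, the call would be renderPagesToImages($document->getPages(), $imageDevice, $imageFileNameTemplate, 'jpg');. Other formats would pass a different device object, for example a devices_PngDevice, which is assumed to exist by analogy with devices_JpegDevice.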