PDF Format Converter for Python via Java

Export PDF to Word, Excel, PowerPoint, Images, HTML and fixed-layout formats using Python via Java

Overview

Are you looking for a way to convert PDF files to other formats from Python? Aspose.PDF for Python via Java converts PDF documents to Word, Excel, PowerPoint, HTML, and image formats. Python is an object-oriented programming language that is commonly used for software prototypes, web applications, and data processing. In this article, we'll show you how to run each of these conversions using Python via Java.

PDF files can contain a variety of content, including text, images, clickable buttons, hyperlinks, embedded fonts, signatures, and stamps. Users often convert a PDF file to a different format because they want to edit that content. With Aspose.PDF for Python via Java, you can convert your PDF documents to the most popular formats and vice versa, and the library is designed to carry the content over accurately.

Convert PDF to Word

Example: Python via Java for PDF to Word Conversion

    from asposepdf import Api

    DIR_INPUT = "testdata/"
    DIR_OUTPUT = "testout/"

    input_pdf = DIR_INPUT + "Hello.pdf"
    output_pdf = DIR_OUTPUT + "convert_pdf_to_doc_with_options.docx"
    # Open PDF document
    document = Api.Document(input_pdf)

    save_options = Api.DocSaveOptions()
    save_options.format = Api.DocSaveOptions.DocFormat.Docx
    # Set the recognition mode as Flow
    save_options.mode = Api.DocSaveOptions.RecognitionMode.Flow
    # Set the relative horizontal proximity to 2.5
    save_options.relative_horizontal_proximity = 2.5
    # Recognize bullets during the conversion process
    save_options.recognize_bullets = True

    # Save the file into MS Word document format
    document.save(output_pdf, save_options)

With Aspose.PDF for Python via Java, you can read PDF documents and convert them to DOCX format. DOCX is a widely used file format for Microsoft Word documents. Its structure combines XML and binary files, unlike the plain binary format used by its predecessor, DOC. DOCX files can be opened with Word 2007 and later versions, but earlier versions of MS Word, which support only the DOC format, cannot open them. The code above opens a PDF, sets the DOCX output format and the Flow recognition mode, and saves the result as a Word document.

Convert PDF to Excel Files

Example: Python via Java for PDF to Excel Conversion

    from asposepdf import Api

    # Open PDF document
    documentName = "testdata/source.pdf"
    doc = Api.Document(documentName)
    # Save the document in Excel format
    documentOutName = "testout/result2.xls"
    doc.save(documentOutName, Api.SaveFormat.Excel)

Aspose.PDF for Python via Java can convert PDF files to Excel and CSV formats. This lets you extract tabular data from PDF files and use it in Excel or in other applications that support CSV files. The library can also render PDF files as Excel workbooks (XLSX files). Each page of the PDF file is converted to a separate Excel worksheet, which makes the data easier to work with and analyze in Excel.

Convert PDF to PowerPoint Presentations

Example: Python via Java PDF to PowerPoint Conversion

    from asposepdf import Api

    DIR_INPUT = "testdata/"
    DIR_OUTPUT = "testout/"

    input_pdf = DIR_INPUT + "Hello.pdf"
    output_pdf = DIR_OUTPUT + "convert_pdf_to_pptx_with_options.pptx"
    # Open PDF document
    document = Api.Document(input_pdf)

    save_options = Api.PptxSaveOptions()
    save_options._ImageResolution = 300
    save_options._SeparateImages = True
    save_options._OptimizeTextBoxes = True

    # Save the file into MS PowerPoint presentation format
    document.save(output_pdf, save_options)

With Aspose.PDF for Python via Java, you can also track the progress of a PDF to PPTX conversion. This can be helpful when working with large or complex PDF files that take some time to convert. During the conversion, each page of the PDF file becomes a separate slide in the PPTX file, which makes the presentation easy to work with and edit. For creating and manipulating PPT/PPTX presentations, we also offer the Aspose.Slides API, which includes a feature for converting PPT/PPTX files to PDF format.

Convert PDF to HTML Files

Example: Python via Java for PDF to HTML Conversion

    from asposepdf import Api

    documentName = "../../testdata/source.pdf"
    documentOutName = "../../testout/result.html"
    # Open PDF document
    document = Api.Document(documentName)

    # Save document in HTML format
    save_options = Api.HtmlSaveOptions()
    document.save(documentOutName, save_options)

Aspose.PDF for Python via Java converts various file formats to PDF documents and converts PDF files to different output formats. This section shows how to convert a PDF file to HTML, which can be useful if you want to publish the content on a website or in an online forum. The conversion takes only a few lines of Python code: open the document, create an HtmlSaveOptions object, and save. Because the process is scripted, it can be automated to handle either a single PDF file or a large batch of files.
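The batch scenario mentioned above could be sketched roughly as follows. This is a minimal sketch rather than a feature of the library: `convert_batch` and its `convert_one` parameter are hypothetical names, and the conversion itself is passed in as a callable so that the loop does not depend on any particular output format.

```python
from pathlib import Path
from typing import Callable, List


def convert_batch(input_dir: str, output_dir: str, extension: str,
                  convert_one: Callable[[str, str], None]) -> List[Path]:
    """Run convert_one on every *.pdf in input_dir and return the output paths.

    convert_one is a hypothetical callback taking (source_path, target_path).
    """
    out_root = Path(output_dir)
    # Create the output folder if it does not exist yet
    out_root.mkdir(parents=True, exist_ok=True)

    written = []
    # sorted() gives a stable, repeatable processing order
    for src in sorted(Path(input_dir).glob("*.pdf")):
        # Same base name as the source file, new extension
        dst = out_root / (src.stem + extension)
        convert_one(str(src), str(dst))
        written.append(dst)
    return written


# Assumed usage, wrapping the calls from the HTML example above:
#
#     from asposepdf import Api
#
#     def pdf_to_html(src, dst):
#         Api.Document(src).save(dst, Api.HtmlSaveOptions())
#
#     convert_batch("testdata", "testout", ".html", pdf_to_html)
```

Note that the `*.pdf` pattern is case-sensitive on some platforms, so files named `.PDF` may need a separate pattern.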

Convert PDF to Images

Example: Python via Java for PDF to Images Conversion

    from asposepdf import Api, Device

    DIR_INPUT = "../../testdata/"
    DIR_OUTPUT = "../../testout/"

    input_pdf = DIR_INPUT + "source.pdf"
    output_pdf = DIR_OUTPUT + "image"
    # Open PDF document
    document = Api.Document(input_pdf)

    # Create Resolution object
    resolution = Device.Resolution(300)
    device = Device.JpegDevice(resolution)

    for i in range(0, document.getPages.size):
        # Build the output file name for this page
        imageFileName = output_pdf + "_page_" + str(i + 1) + "_out.jpeg"
        # Convert a particular page and save the image to file
        device.process(document.getPages.getPage(i + 1), outputFileName=imageFileName)

Aspose.PDF for Python via Java provides different methods to convert PDF documents to images. Two common approaches are the Device approach and the SaveOption approach, and both can be used to convert PDF to popular image formats such as BMP, JPEG, GIF, PNG, EMF, TIFF, and SVG. The example above uses the Device approach: the library contains classes that act as a virtual device and render pages as images. The DocumentDevice class is designed to convert the entire document, while the ImageDevice class is intended for a specific page.
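The loop in the example above counts from 0 but addresses pages from 1, which is easy to get wrong when adapting it. As a small illustration, the file naming can be separated from the rendering. The helper below is a hypothetical sketch that uses only the standard library; `page_image_names` is not part of Aspose.PDF, and the zero-padding is an added convenience so that the files sort in page order.

```python
from typing import List, Tuple


def page_image_names(prefix: str, page_count: int,
                     extension: str = ".jpeg") -> List[Tuple[int, str]]:
    """Return (page_number, file_name) pairs, with page numbers starting at 1."""
    # Pad page numbers to the same width so names sort in page order
    width = len(str(page_count))
    return [
        (page, f"{prefix}_page_{page:0{width}d}_out{extension}")
        for page in range(1, page_count + 1)
    ]


for page, name in page_image_names("testout/image", 3):
    print(page, name)
```

Each pair could then be handed to the device call from the example, with the page number used to fetch the page and the name used as the output file.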