Extract Text and Images from PPTX Presentation using Python
Build your own Python apps that extract text, images, video, and audio from PowerPoint files using server-side APIs.
Extract Text from PPTX Presentation via Python
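To extract the text from a whole presentation, use the static GetAllTextFrames method exposed by the SlideUtil class, which is called get_all_text_frames in Python. The code below reads the text and its formatting (font height and font name) from every slide in a presentation; passing True as the second argument also includes the master slides.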
Extracting Text from PPTX Presentation using Python
import aspose.slides as slides

# Instantiate the Presentation class that represents a PPTX file
with slides.Presentation("pres.pptx") as pptxPresentation:
    # Get the text frames from all slides in the PPTX (True also includes master slides)
    textFramesPPTX = slides.util.SlideUtil.get_all_text_frames(pptxPresentation, True)
    # Loop through the text frames
    for i in range(len(textFramesPPTX)):
        # Loop through the paragraphs in the current text frame
        for para in textFramesPPTX[i].paragraphs:
            # Loop through the portions in the current paragraph
            for port in para.portions:
                # Display the text of the current portion
                print(port.text)
                # Display the font height of the text
                print(port.portion_format.font_height)
                # Display the font name of the text, if a Latin font is set
                if port.portion_format.latin_font is not None:
                    print(port.portion_format.latin_font.font_name)
How to Extract Text from PPTX via Python
Follow these steps to extract text from a PPTX file.
Load PPTX with an instance of Presentation
Get an Array of TextFrame objects from all slides in the PPTX
Loop through the Array of TextFrames
Loop through paragraphs in current TextFrame
Loop through portions in the current Paragraph
Get text in the current portion
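The steps above can also be wrapped in a small helper that returns the extracted values instead of printing them. The following is only a sketch: `collect_portions` and `_portion` are hypothetical names, not part of Aspose.Slides, and the helper assumes the text frames can be iterated directly and expose the same `paragraphs`, `portions`, `text`, and `portion_format` attributes used in the listing above. Stand-in objects are used so the sketch runs without the library installed.

```python
from types import SimpleNamespace


def collect_portions(text_frames):
    """Return one (text, font_height, font_name) tuple per portion.

    Hypothetical helper: it relies only on the attribute names used in the
    listing above and assumes text_frames is iterable.
    """
    records = []
    for frame in text_frames:              # each text frame
        for para in frame.paragraphs:      # each paragraph in the frame
            for port in para.portions:     # each portion in the paragraph
                fmt = port.portion_format
                font = fmt.latin_font
                # font_name is None when no Latin font is set on the portion
                font_name = font.font_name if font is not None else None
                records.append((port.text, fmt.font_height, font_name))
    return records


def _portion(text, height, font_name=None):
    """Build a stand-in portion object for the demo (not a library type)."""
    font = SimpleNamespace(font_name=font_name) if font_name else None
    fmt = SimpleNamespace(font_height=height, latin_font=font)
    return SimpleNamespace(text=text, portion_format=fmt)


# Stand-in data shaped like: frame -> paragraphs -> portions
demo_frames = [
    SimpleNamespace(paragraphs=[
        SimpleNamespace(portions=[
            _portion("Title", 44.0, "Arial"),
            _portion(" text", 18.0),
        ])
    ])
]

demo_records = collect_portions(demo_frames)
for record in demo_records:
    print(record)
```

With the library installed, the same helper could be called inside the `with slides.Presentation("pres.pptx")` block, passing it the result of `slides.util.SlideUtil.get_all_text_frames(pptxPresentation, True)`. Returning tuples rather than printing makes it easier to write the text to a file or filter it by font later.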