public class PDFTextStripperByArea extends PDFTextStripper
Fields inherited from class PDFTextStripper: charactersByArticle, document, LINE_SEPARATOR, output

| Constructor | Description |
|---|---|
| PDFTextStripperByArea() | Constructor. |

| Modifier and Type | Method | Description |
|---|---|---|
| void | addRegion(java.lang.String regionName, java.awt.geom.Rectangle2D rect) | Add a new region to group text by. |
| void | extractRegions(PDPage page) | Process the page to extract the region text. |
| java.util.List<java.lang.String> | getRegions() | Get the list of regions that have been set up. |
| java.lang.String | getTextForRegion(java.lang.String regionName) | Get the text for the region; this should be called after extractRegions(). |
| protected void | processTextPosition(TextPosition text) | This will process a TextPosition object and add the text to the list of characters on a page. |
| void | removeRegion(java.lang.String regionName) | Delete a region to group text by. |
| void | setShouldSeparateByBeads(boolean aShouldSeparateByBeads) | This method does nothing in this derived class, because beads and regions are incompatible. |
| protected void | showGlyph(Matrix textRenderingMatrix, PDFont font, int code, java.lang.String unicode, Vector displacement) | This method was originally written by Ben Litchfield for PDFStreamEngine. |
| protected void | writePage() | This will print the processed page text to the output stream. |
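
The method summary above implies a call order: register named rectangles with addRegion, process a page with extractRegions, then read each region with getTextForRegion. As a rough illustration of the grouping idea only, the sketch below uses a hypothetical stand-in class, RegionGroupingSketch, built on java.awt.geom.Rectangle2D. It is not the library's implementation: processGlyph is an invented placeholder for the per-glyph work done while a page is processed, and the rule that a glyph is added to every region containing its position is an assumption of this sketch.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration; NOT part of the documented library.
final class RegionGroupingSketch {
    // Region name -> rectangle to collect text from (insertion order is kept).
    private final Map<String, Rectangle2D> regionArea = new LinkedHashMap<>();
    // Region name -> text collected so far for that region.
    private final Map<String, StringBuilder> regionText = new LinkedHashMap<>();

    // Register a named rectangle and start with empty text for it.
    void addRegion(String regionName, Rectangle2D rect) {
        regionArea.put(regionName, rect);
        regionText.put(regionName, new StringBuilder());
    }

    // Forget a region and any text collected for it.
    void removeRegion(String regionName) {
        regionArea.remove(regionName);
        regionText.remove(regionName);
    }

    // Names of the regions currently registered.
    List<String> getRegions() {
        return new ArrayList<>(regionArea.keySet());
    }

    // Assumed placeholder for per-glyph processing: append the glyph's text
    // to every region whose rectangle contains the point (x, y).
    void processGlyph(double x, double y, String unicode) {
        for (Map.Entry<String, Rectangle2D> entry : regionArea.entrySet()) {
            if (entry.getValue().contains(x, y)) {
                regionText.get(entry.getKey()).append(unicode);
            }
        }
    }

    // Collected text, or null for an unknown region (a choice of this sketch).
    String getTextForRegion(String regionName) {
        StringBuilder text = regionText.get(regionName);
        return text == null ? null : text.toString();
    }

    public static void main(String[] args) {
        RegionGroupingSketch sketch = new RegionGroupingSketch();
        sketch.addRegion("header", new Rectangle2D.Double(0, 0, 612, 72));
        sketch.addRegion("body", new Rectangle2D.Double(0, 72, 612, 648));
        sketch.processGlyph(50, 30, "T");   // falls inside "header"
        sketch.processGlyph(50, 100, "b");  // falls inside "body"
        sketch.processGlyph(700, 30, "x");  // outside both regions, so dropped
        System.out.println(sketch.getRegions());
        System.out.println(sketch.getTextForRegion("header"));
        System.out.println(sketch.getTextForRegion("body"));
    }
}
```

The sketch keeps region names in insertion order by using a LinkedHashMap; that ordering, like the null return for an unknown region name, is a design choice made here and should not be read as documented behavior of PDFTextStripperByArea.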

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from class PDFStreamEngine: addOperator, applyTextAdjustment, beginText, endText, getAppearance, getCurrentPage, getGraphicsStackSize, getGraphicsState, getInitialMatrix, getResources, getTextLineMatrix, getTextMatrix, operatorException, processAnnotation, processChildStream, processOperator, processOperator, processSoftMask, processTilingPattern, processTilingPattern, processTransparencyGroup, processType3Stream, registerOperatorProcessor, restoreGraphicsStack, restoreGraphicsState, saveGraphicsStack, saveGraphicsState, setLineDashPattern, setTextLineMatrix, setTextMatrix, showAnnotation, showFontGlyph, showForm, showText, showTextString, showTextStrings, showTransparencyGroup, showType3Glyph, transformedPoint, transformWidth, unsupportedOperator

Methods inherited from class PDFTextStripper: endArticle, endDocument, endPage, getAddMoreFormatting, getArticleEnd, getArticleStart, getAverageCharTolerance, getCharactersByArticle, getCurrentPageNo, getDropThreshold, getEndBookmark, getEndPage, getIndentThreshold, getLineSeparator, getListItemPatterns, getOutput, getPageEnd, getPageStart, getParagraphEnd, getParagraphStart, getSeparateByBeads, getSortByPosition, getSpacingTolerance, getStartBookmark, getStartPage, getSuppressDuplicateOverlappingText, getText, getWordSeparator, matchPattern, processPage, processPages, setAddMoreFormatting, setArticleEnd, setArticleStart, setAverageCharTolerance, setDropThreshold, setEndBookmark, setEndPage, setIndentThreshold, setLineSeparator, setListItemPatterns, setPageEnd, setPageStart, setParagraphEnd, setParagraphStart, setSortByPosition, setSpacingTolerance, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, startArticle, startArticle, startDocument, startPage, writeCharacters, writeLineSeparator, writePageEnd, writePageStart, writeParagraphEnd, writeParagraphSeparator, writeParagraphStart, writeString, writeString, writeText, writeWordSeparator

public PDFTextStripperByArea()
throws java.io.IOException
Throws:
java.io.IOException - If there is an error loading properties.

public final void setShouldSeparateByBeads(boolean aShouldSeparateByBeads)
Overrides:
setShouldSeparateByBeads in class PDFTextStripper
Parameters:
aShouldSeparateByBeads - The new grouping of beads.

public void addRegion(java.lang.String regionName, java.awt.geom.Rectangle2D rect)
Parameters:
regionName - The name of the region.
rect - The rectangle area to retrieve the text from.

public void removeRegion(java.lang.String regionName)
Parameters:
regionName - The name of the region to delete.

public java.util.List<java.lang.String> getRegions()

public java.lang.String getTextForRegion(java.lang.String regionName)
Parameters:
regionName - The name of the region to get the text from.

public void extractRegions(PDPage page) throws java.io.IOException
Parameters:
page - The page to extract the regions from.
Throws:
java.io.IOException - If there is an error while extracting text.

protected void processTextPosition(TextPosition text)
Overrides:
processTextPosition in class PDFTextStripper
Parameters:
text - The text to process.

protected void writePage() throws java.io.IOException
Overrides:
writePage in class PDFTextStripper
Throws:
java.io.IOException - If there is an error writing the text.

protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, java.lang.String unicode, Vector displacement) throws java.io.IOException
Overrides:
showGlyph in class PDFStreamEngine
Parameters:
textRenderingMatrix - the current text rendering matrix, Trm
font - the current font
code - internal PDF character code for the glyph
unicode - the Unicode text for this glyph, or null if the PDF does not provide it
displacement - the displacement (i.e. advance) of the glyph in text space
Throws:
java.io.IOException - if the glyph cannot be processed

Copyright © 2002–2018. All rights reserved.