public class PDFTextStripperByArea extends PDFTextStripper
Fields inherited from class PDFTextStripper: charactersByArticle, document, output, outputEncoding, systemLineSeparator

| Constructor | Description |
|---|---|
| PDFTextStripperByArea() | Constructor. |
| PDFTextStripperByArea(java.util.Properties props) | Instantiate a new PDFTextStripperByArea object. |
| PDFTextStripperByArea(java.lang.String encoding) | Instantiate a new PDFTextStripperByArea object. |

| Modifier and Type | Method | Description |
|---|---|---|
| void | addRegion(java.lang.String regionName, java.awt.geom.Rectangle2D rect) | Add a new region to group text by. |
| void | extractRegions(PDPage page) | Process the page to extract the region text. |
| java.util.List<java.lang.String> | getRegions() | Get the list of regions that have been set up. |
| java.lang.String | getTextForRegion(java.lang.String regionName) | Get the text for the region. This should be called after extractRegions(). |
| protected void | processTextPosition(TextPosition text) | Process a TextPosition object and add the text to the list of characters on a page. |
| void | removeRegion(java.lang.String regionName) | Delete a region to group text by. |
| protected void | writePage() | Print the processed page text to the output stream. |

Methods inherited from class PDFTextStripper: endArticle, endDocument, endPage, getAddMoreFormatting, getArticleEnd, getArticleStart, getAverageCharTolerance, getCharactersByArticle, getCurrentPageNo, getDropThreshold, getEndBookmark, getEndPage, getIndentThreshold, getLineSeparator, getListItemPatterns, getOutput, getPageEnd, getPageSeparator, getPageStart, getParagraphEnd, getParagraphStart, getSeparateByBeads, getSortByPosition, getSpacingTolerance, getStartBookmark, getStartPage, getSuppressDuplicateOverlappingText, getText, getText, getWordSeparator, handleLineSeparation, inspectFontEncoding, isParagraphSeparation, matchListItemPattern, matchPattern, processPage, processPages, resetEngine, setAddMoreFormatting, setArticleEnd, setArticleStart, setAverageCharTolerance, setDropThreshold, setEndBookmark, setEndPage, setIndentThreshold, setLineSeparator, setListItemPatterns, setPageEnd, setPageSeparator, setPageStart, setParagraphEnd, setParagraphStart, setShouldSeparateByBeads, setSortByPosition, setSpacingTolerance, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, startArticle, startArticle, startDocument, startPage, writeCharacters, writeLineSeparator, writePageEnd, writePageSeperator, writePageStart, writeParagraphEnd, writeParagraphSeparator, writeParagraphStart, writeString, writeString, writeText, writeText, writeWordSeparator

Methods inherited from class PDFStreamEngine: getColorSpaces, getCurrentPage, getFonts, getGraphicsStack, getGraphicsState, getGraphicsStates, getResources, getTextLineMatrix, getTextMatrix, getTotalCharCnt, getValidCharCnt, getXObjects, isForceParsing, processEncodedText, processOperator, processOperator, processStream, processSubStream, registerOperatorProcessor, setColorSpaces, setFonts, setForceParsing, setGraphicsStack, setGraphicsState, setGraphicsStates, setTextLineMatrix, setTextMatrix

public PDFTextStripperByArea() throws java.io.IOException
Throws: java.io.IOException - If there is an error loading properties.

public PDFTextStripperByArea(java.util.Properties props) throws java.io.IOException
Parameters: props - The properties containing the mapping of operators to PDFOperator classes.
Throws: java.io.IOException - If there is an error reading the properties.

public PDFTextStripperByArea(java.lang.String encoding) throws java.io.IOException
Parameters: encoding - The encoding that the output will be written in.
Throws: java.io.IOException - If there is an error reading the properties.

public void addRegion(java.lang.String regionName, java.awt.geom.Rectangle2D rect)
Parameters: regionName - The name of the region.
Parameters: rect - The rectangle area to retrieve the text from.

public void removeRegion(java.lang.String regionName)
Parameters: regionName - The name of the region to delete.

public java.util.List<java.lang.String> getRegions()

public java.lang.String getTextForRegion(java.lang.String regionName)
Parameters: regionName - The name of the region to get the text from.

public void extractRegions(PDPage page) throws java.io.IOException
Parameters: page - The page to extract the regions from.
Throws: java.io.IOException - If there is an error while extracting text.

protected void processTextPosition(TextPosition text)
Overrides: processTextPosition in class PDFTextStripper
Parameters: text - The text to process.

protected void writePage() throws java.io.IOException
Overrides: writePage in class PDFTextStripper
Throws: java.io.IOException - If there is an error writing the text.
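
The idea of grouping text by named rectangular regions can be pictured with a small sketch. The class below is a hypothetical stand-in written against the JDK only; it is not PDFBox code. It assumes that a piece of text belongs to a region when the region's rectangle contains the text's (x, y) position, and its processTextPosition takes a plain x, y, and string rather than a TextPosition. The 612 by 792 page size and the region names are illustrative assumptions.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is not the PDFBox implementation.
class RegionGroupingSketch {
    // Region name -> rectangle, kept in insertion order.
    private final Map<String, Rectangle2D> regions = new LinkedHashMap<>();
    // Region name -> text collected for that region so far.
    private final Map<String, StringBuilder> regionText = new LinkedHashMap<>();

    void addRegion(String regionName, Rectangle2D rect) {
        regions.put(regionName, rect);
        regionText.put(regionName, new StringBuilder());
    }

    void removeRegion(String regionName) {
        regions.remove(regionName);
        regionText.remove(regionName);
    }

    List<String> getRegions() {
        return new ArrayList<>(regions.keySet());
    }

    // Assumed rule: text is added to every region whose rectangle contains (x, y).
    void processTextPosition(float x, float y, String text) {
        for (Map.Entry<String, Rectangle2D> entry : regions.entrySet()) {
            if (entry.getValue().contains(x, y)) {
                regionText.get(entry.getKey()).append(text);
            }
        }
    }

    // Returns null for a region name that has not been added.
    String getTextForRegion(String regionName) {
        StringBuilder collected = regionText.get(regionName);
        return collected == null ? null : collected.toString();
    }

    public static void main(String[] args) {
        RegionGroupingSketch sketch = new RegionGroupingSketch();
        // Assumed page of 612 x 792 units, split into a header band and a body band.
        sketch.addRegion("header", new Rectangle2D.Float(0, 0, 612, 100));
        sketch.addRegion("body", new Rectangle2D.Float(0, 100, 612, 692));
        sketch.processTextPosition(72, 50, "Title");
        sketch.processTextPosition(72, 300, "Body text");
        System.out.println(sketch.getTextForRegion("header"));
        System.out.println(sketch.getTextForRegion("body"));
    }
}
```

With the real class, the descriptions above suggest the corresponding order of calls: addRegion for each region, then extractRegions(page), then getTextForRegion for each region name.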