public class PDFHighlighter extends PDFTextStripper

Fields inherited from class PDFTextStripper: charactersByArticle, document, output, outputEncoding, systemLineSeparator

| Constructor and Description |
|---|
| PDFHighlighter(): Default constructor. |

| Modifier and Type | Method and Description |
|---|---|
| protected void | endPage(PDPage pdPage): End a page. |
| void | generateXMLHighlight(PDDocument pdDocument, java.lang.String[] sWords, java.io.Writer xmlOutput): Generate an XML highlight string based on the PDF. |
| void | generateXMLHighlight(PDDocument pdDocument, java.lang.String highlightWord, java.io.Writer xmlOutput): Generate an XML highlight string based on the PDF. |
| static void | main(java.lang.String[] args): Command line application. |

Methods inherited from class PDFTextStripper: endArticle, endDocument, getAddMoreFormatting, getArticleEnd, getArticleStart, getAverageCharTolerance, getCharactersByArticle, getCurrentPageNo, getDropThreshold, getEndBookmark, getEndPage, getIndentThreshold, getLineSeparator, getListItemPatterns, getOutput, getPageEnd, getPageSeparator, getPageStart, getParagraphEnd, getParagraphStart, getSeparateByBeads, getSortByPosition, getSpacingTolerance, getStartBookmark, getStartPage, getSuppressDuplicateOverlappingText, getText, getText, getWordSeparator, handleLineSeparation, inspectFontEncoding, isParagraphSeparation, matchListItemPattern, matchPattern, processPage, processPages, processTextPosition, resetEngine, setAddMoreFormatting, setArticleEnd, setArticleStart, setAverageCharTolerance, setDropThreshold, setEndBookmark, setEndPage, setIndentThreshold, setLineSeparator, setListItemPatterns, setPageEnd, setPageSeparator, setPageStart, setParagraphEnd, setParagraphStart, setShouldSeparateByBeads, setSortByPosition, setSpacingTolerance, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, startArticle, startArticle, startDocument, startPage, writeCharacters, writeLineSeparator, writePage, writePageEnd, writePageSeperator, writePageStart, writeParagraphEnd, writeParagraphSeparator, writeParagraphStart, writeString, writeString, writeText, writeText, writeWordSeparator

Methods inherited from class PDFStreamEngine: getColorSpaces, getCurrentPage, getFonts, getGraphicsStack, getGraphicsState, getGraphicsStates, getResources, getTextLineMatrix, getTextMatrix, getTotalCharCnt, getValidCharCnt, getXObjects, isForceParsing, processEncodedText, processOperator, processOperator, processStream, processSubStream, registerOperatorProcessor, setColorSpaces, setFonts, setForceParsing, setGraphicsStack, setGraphicsState, setGraphicsStates, setTextLineMatrix, setTextMatrix

public PDFHighlighter() throws java.io.IOException

Throws:
- java.io.IOException - If there is an error constructing this class.

public void generateXMLHighlight(PDDocument pdDocument, java.lang.String highlightWord, java.io.Writer xmlOutput) throws java.io.IOException

Parameters:
- pdDocument - The PDF to find words in.
- highlightWord - The word to search for.
- xmlOutput - The resulting output xml file.

Throws:
- java.io.IOException - If there is an error reading from the PDF, or writing to the XML.

public void generateXMLHighlight(PDDocument pdDocument, java.lang.String[] sWords, java.io.Writer xmlOutput) throws java.io.IOException

Parameters:
- pdDocument - The PDF to find words in.
- sWords - The words to search for.
- xmlOutput - The resulting output xml file.

Throws:
- java.io.IOException - If there is an error reading from the PDF, or writing to the XML.

protected void endPage(PDPage pdPage) throws java.io.IOException

Overrides:
- endPage in class PDFTextStripper

Parameters:
- pdPage - The page we are about to process.

Throws:
- java.io.IOException - If there is any error writing to the stream.

public static void main(java.lang.String[] args) throws java.io.IOException

Parameters:
- args - The command line arguments to the application.

Throws:
- java.io.IOException - If there is an error generating the highlight file.
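
The two generateXMLHighlight overloads share one idea: search page text for one or more words and write the results to a java.io.Writer. The following is a conceptual sketch of that idea using only the Java standard library. It is not the PDFBox implementation: the class name HighlightSketch, the method writeMatches, the case-insensitive literal matching, and the `<loc .../>` record layout are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Conceptual sketch only; hypothetical names, not the PDFBox implementation. */
final class HighlightSketch {

    private HighlightSketch() {
    }

    /**
     * Writes one hypothetical location record per case-insensitive match of
     * each word in pageText and returns the number of records written.
     */
    static int writeMatches(int pageIndex, String pageText, String[] words, Writer out)
            throws IOException {
        int count = 0;
        for (String word : words) {
            // Pattern.quote treats the word as literal text (an assumption of this sketch).
            Pattern pattern = Pattern.compile(Pattern.quote(word), Pattern.CASE_INSENSITIVE);
            Matcher matcher = pattern.matcher(pageText);
            while (matcher.find()) {
                // The record layout below is illustrative, not a documented format.
                out.write("<loc pg=\"" + pageIndex + "\" pos=\"" + matcher.start()
                        + "\" len=\"" + (matcher.end() - matcher.start()) + "\"/>\n");
                count++;
            }
        }
        return count;
    }

    /** Single-word convenience overload that delegates to the array version. */
    static int writeMatches(int pageIndex, String pageText, String word, Writer out)
            throws IOException {
        return writeMatches(pageIndex, pageText, new String[] {word}, out);
    }

    public static void main(String[] args) throws IOException {
        StringWriter xml = new StringWriter();
        int found = writeMatches(0, "PDF text; more pdf text.", new String[] {"pdf", "text"}, xml);
        System.out.print(xml);
        System.out.println(found + " match(es)");
    }
}
```

Taking a Writer rather than a file path lets the caller decide where the output goes (a file, a string buffer, a network stream), which is also why every method in the sketch declares java.io.IOException.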