Package org.apache.pdfbox.pdfparser
Class ConformingPDFParser
- java.lang.Object
  - org.apache.pdfbox.pdfparser.BaseParser
    - org.apache.pdfbox.pdfparser.ConformingPDFParser
public class ConformingPDFParser extends BaseParser
- Author:
- Adam Nichols
-
-
Field Summary
Fields
- protected RandomAccess inputFile
Fields inherited from class org.apache.pdfbox.pdfparser.BaseParser
DEF, document, ENDOBJ, ENDSTREAM, forceParsing, pdfSource, PROP_PUSHBACK_SIZE
-
-
Constructor Summary
Constructors
- ConformingPDFParser(java.io.File inputFile): Constructor.
-
Method Summary
Methods
- protected byte consumeWhitespace(): This will read all bytes until a non-whitespace character is found.
- protected byte consumeWhitespaceBackwards(): This will read all bytes (backwards) until a non-whitespace character is found.
- COSDocument getDocument(): This will get the document that was parsed.
- COSBase getObject(long objectNumber, long generation)
- PDDocument getPDDocument(): This will get the PD document that was parsed.
- boolean isRecursivlyRead()
- void parse(): This will parse the stream and populate the COSDocument object.
- protected COSNumber parseNumber(java.lang.String number)
- protected long parseTrailerInformation()
- protected COSBase processCosObject(java.lang.String string)
- protected java.lang.String readBackwardUntilWhitespace()
- protected byte readByte()
- protected byte readByteBackwards()
- protected COSDictionary readDictionaryBackwards()
- protected int readInt(): This will read an integer from the stream.
- protected java.lang.String readLine(): This will read a line starting with the byte at offset and going forward until it finds a newline.
- protected java.lang.String readLineBackwards(): This will read a line starting with the byte at offset and going backwards until it finds a newline.
- protected long readLongBackwards(): This will consume any whitespace, read in bytes until whitespace is found again and then parse the characters which have been read as a long.
- protected COSName readNameBackwards()
- protected COSNumber readNumber(): This will read in a number and return the COS version of the number (be it a COSInteger or a COSFloat).
- protected COSBase readObject(): This actually reads the object data.
- COSBase readObject(long objectNumber, long generation): This will read an object from the inputFile at whatever our currentOffset is.
- protected COSBase readObjectBackwards()
- protected java.lang.String readString(): This will read the next string from the stream.
- protected java.lang.String readWord()
- void setRecursivlyRead(boolean recursivlyRead)
Methods inherited from class org.apache.pdfbox.pdfparser.BaseParser
clearResources, isClosing, isClosing, isEndOfName, isEOL, isEOL, isWhitespace, isWhitespace, parseBoolean, parseCOSArray, parseCOSDictionary, parseCOSName, parseCOSStream, parseCOSString, parseCOSString, parseDirObject, readExpectedString, readGenerationNumber, readLong, readObjectNumber, readString, readStringNumber, readUntilEndStream, setDocument, skipSpaces
-
-
-
-
Field Detail
-
inputFile
protected RandomAccess inputFile
-
-
Method Detail
-
parse
public void parse() throws java.io.IOException
This will parse the stream and populate the COSDocument object. This will close the stream when it is done parsing.
- Throws:
java.io.IOException - If there is an error reading from the stream or corrupt data is found.
-
getDocument
public COSDocument getDocument() throws java.io.IOException
This will get the document that was parsed. parse() must be called before this method. When you are done with this document you must call close() on it to release resources.
- Returns:
- The document that was parsed.
- Throws:
java.io.IOException - If there is an error getting the document.
-
getPDDocument
public PDDocument getPDDocument() throws java.io.IOException
This will get the PD document that was parsed. When you are done with this document you must call close() on it to release resources.
- Returns:
- The document at the PD layer.
- Throws:
java.io.IOException - If there is an error getting the document.
-
parseTrailerInformation
protected long parseTrailerInformation() throws java.io.IOException, java.lang.NumberFormatException
- Throws:
java.io.IOException
java.lang.NumberFormatException
-
readByteBackwards
protected byte readByteBackwards() throws java.io.IOException
- Throws:
java.io.IOException
-
readByte
protected byte readByte() throws java.io.IOException
- Throws:
java.io.IOException
-
readBackwardUntilWhitespace
protected java.lang.String readBackwardUntilWhitespace() throws java.io.IOException
- Throws:
java.io.IOException
-
consumeWhitespaceBackwards
protected byte consumeWhitespaceBackwards() throws java.io.IOException
This will read all bytes (backwards) until a non-whitespace character is found. To save you an extra read, the non-whitespace character is returned. If the current character is not whitespace, this method will just return the current character.
- Returns:
- the first non-whitespace character found
- Throws:
java.io.IOException - if there is an error reading from the file
-
consumeWhitespace
protected byte consumeWhitespace() throws java.io.IOException
This will read all bytes until a non-whitespace character is found. To save you an extra read, the non-whitespace character is returned. If the current character is not whitespace, this method will just return the current character.
- Returns:
- the first non-whitespace character found
- Throws:
java.io.IOException - if there is an error reading from the file
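
The behaviour described for consumeWhitespace() and consumeWhitespaceBackwards() can be illustrated with a minimal, self-contained sketch. This is not the PDFBox implementation: the class WhitespaceCursor and its in-memory byte array are hypothetical stand-ins for the parser and its inputFile, the whitespace set is the six whitespace bytes of the PDF specification, and leaving the offset on the returned byte is an assumption.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical cursor over an in-memory buffer; not part of PDFBox. */
final class WhitespaceCursor {
    private final byte[] data;
    private int offset;

    WhitespaceCursor(byte[] data, int startOffset) {
        this.data = data;
        this.offset = startOffset;
    }

    int offset() {
        return offset;
    }

    /** NUL, TAB, LF, FF, CR and SPACE: the whitespace bytes of the PDF specification. */
    static boolean isWhitespace(int b) {
        return b == 0 || b == 9 || b == 10 || b == 12 || b == 13 || b == 32;
    }

    /** Moves forward to the first non-whitespace byte and returns it. */
    byte consumeWhitespace() throws IOException {
        while (offset < data.length && isWhitespace(data[offset])) {
            offset++;
        }
        if (offset >= data.length) {
            throw new EOFException("only whitespace up to the end of the data");
        }
        // Assumption: the offset stays on the byte that is returned.
        return data[offset];
    }

    /** Moves backward to the first non-whitespace byte and returns it. */
    byte consumeWhitespaceBackwards() throws IOException {
        while (offset >= 0 && isWhitespace(data[offset])) {
            offset--;
        }
        if (offset < 0) {
            throw new EOFException("only whitespace back to the start of the data");
        }
        return data[offset];
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "xref \n  trailer".getBytes(StandardCharsets.US_ASCII);
        // Start on the space after "xref" and scan forward.
        System.out.println((char) new WhitespaceCursor(data, 4).consumeWhitespace());          // prints t
        // Start on the space before "trailer" and scan backward.
        System.out.println((char) new WhitespaceCursor(data, 7).consumeWhitespaceBackwards()); // prints f
    }
}
```

If the starting byte is already non-whitespace, neither loop runs and the byte at the starting offset is returned, which matches the last sentence of the description.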
-
readLongBackwards
protected long readLongBackwards() throws java.io.IOException, java.lang.NumberFormatException
This will consume any whitespace, read in bytes until whitespace is found again and then parse the characters which have been read as a long. The current offset will then point at the first whitespace character which precedes the number.
- Returns:
- the parsed number
- Throws:
java.io.IOException - if there is an error reading from the file
java.lang.NumberFormatException - if the bytes read cannot be converted to a number
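
The three steps in the description of readLongBackwards() (skip whitespace, collect bytes until whitespace, parse as a long) can be sketched as follows. This is an illustration under stated assumptions, not the PDFBox code: BackwardLongReader is a hypothetical class, and a byte array replaces the parser's inputFile.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical backward reader over an in-memory buffer; not part of PDFBox. */
final class BackwardLongReader {
    private final byte[] data;
    private int offset;

    BackwardLongReader(byte[] data, int startOffset) {
        this.data = data;
        this.offset = startOffset;
    }

    int offset() {
        return offset;
    }

    static boolean isWhitespace(int b) {
        return b == 0 || b == 9 || b == 10 || b == 12 || b == 13 || b == 32;
    }

    /** Throws NumberFormatException if the collected bytes are not a valid long. */
    long readLongBackwards() {
        // 1. Skip any whitespace between the starting position and the number.
        while (offset >= 0 && isWhitespace(data[offset])) {
            offset--;
        }
        // 2. Collect bytes until whitespace (or the start of the data) is reached.
        StringBuilder reversed = new StringBuilder();
        while (offset >= 0 && !isWhitespace(data[offset])) {
            reversed.append((char) data[offset]);
            offset--;
        }
        // 3. The bytes were collected last-to-first, so reverse them before parsing.
        return Long.parseLong(reversed.reverse().toString());
    }

    public static void main(String[] args) {
        byte[] tail = "startxref\n116\n".getBytes(StandardCharsets.US_ASCII);
        BackwardLongReader reader = new BackwardLongReader(tail, tail.length - 1);
        System.out.println(reader.readLongBackwards()); // prints 116
        System.out.println(reader.offset());            // prints 9, the newline before "116"
    }
}
```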
-
readInt
protected int readInt() throws java.io.IOException
Description copied from class: BaseParser
This will read an integer from the stream.
- Overrides:
readInt in class BaseParser
- Returns:
- The integer that was read from the stream.
- Throws:
java.io.IOException - If there is an error reading from the stream.
-
readNumber
protected COSNumber readNumber() throws java.io.IOException
This will read in a number and return the COS version of the number (be it a COSInteger or a COSFloat).
- Returns:
- the COSNumber which was read/parsed
- Throws:
java.io.IOException
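
The integer-or-float decision mentioned above can be sketched without PDFBox on the classpath. In this hypothetical example, Long and Double stand in for COSInteger and COSFloat, and the token is validated against PDF numeric syntax (optional sign, digits, at most one period, no exponent). How ConformingPDFParser itself makes the decision is not stated in this page, so treat the rule below as an assumption.

```java
import java.util.regex.Pattern;

/** Hypothetical number classifier; Long and Double stand in for COSInteger and COSFloat. */
final class PdfNumberSketch {
    // Optional sign, then digits with an optional period, or a period followed by digits.
    private static final Pattern PDF_NUMBER =
            Pattern.compile("[+-]?(?:\\d+\\.?\\d*|\\.\\d+)");

    static Number parseNumber(String token) {
        if (!PDF_NUMBER.matcher(token).matches()) {
            throw new NumberFormatException("not a PDF number: " + token);
        }
        // A period marks a real number; everything else is an integer.
        if (token.indexOf('.') >= 0) {
            return Double.valueOf(token);
        }
        return Long.valueOf(token);
    }

    public static void main(String[] args) {
        System.out.println(parseNumber("42"));    // prints 42
        System.out.println(parseNumber("-.002")); // prints -0.002
        System.out.println(parseNumber("4."));    // prints 4.0
    }
}
```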
-
parseNumber
protected COSNumber parseNumber(java.lang.String number) throws java.io.IOException
- Throws:
java.io.IOException
-
processCosObject
protected COSBase processCosObject(java.lang.String string) throws java.io.IOException
- Throws:
java.io.IOException
-
readObjectBackwards
protected COSBase readObjectBackwards() throws java.io.IOException
- Throws:
java.io.IOException
-
readNameBackwards
protected COSName readNameBackwards() throws java.io.IOException
- Throws:
java.io.IOException
-
getObject
public COSBase getObject(long objectNumber, long generation) throws java.io.IOException
- Throws:
java.io.IOException
-
readObject
public COSBase readObject(long objectNumber, long generation) throws java.io.IOException
This will read an object from the inputFile at the current offset (currentOffset). If the object number and generation are not the expected values and this parser is set to throw an exception for non-conforming documents, then an exception will be thrown.
- Parameters:
objectNumber - the object number you expect to read
generation - the generation you expect this object to be
- Returns:
- the object being read.
- Throws:
java.io.IOException
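
The expected-versus-actual check described above can be sketched on an object header line of the form "12 0 obj". Everything here is hypothetical: the class ObjectHeaderCheck, the method checkHeader and the throwOnMismatch flag (standing in for the parser's setting for non-conforming documents) are illustrative names, not PDFBox API.

```java
import java.io.IOException;

/** Hypothetical header check; not part of PDFBox. */
final class ObjectHeaderCheck {

    /**
     * Returns true when the header names the expected object and generation.
     * On a mismatch, throws if throwOnMismatch is set, otherwise returns false.
     */
    static boolean checkHeader(String header, long expectedObject,
                               long expectedGeneration, boolean throwOnMismatch)
            throws IOException {
        String[] parts = header.trim().split("\\s+");
        if (parts.length != 3 || !"obj".equals(parts[2])) {
            throw new IOException("not an object header: " + header);
        }
        long objectNumber = Long.parseLong(parts[0]);
        long generation = Long.parseLong(parts[1]);
        boolean matches = objectNumber == expectedObject && generation == expectedGeneration;
        if (!matches && throwOnMismatch) {
            throw new IOException("expected object " + expectedObject + " " + expectedGeneration
                    + " but found " + objectNumber + " " + generation);
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(checkHeader("12 0 obj", 12, 0, true));  // prints true
        System.out.println(checkHeader("12 0 obj", 13, 0, false)); // prints false
    }
}
```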
-
readObject
protected COSBase readObject() throws java.io.IOException
This actually reads the object data.
- Returns:
- the object which is read
- Throws:
java.io.IOException
-
readString
protected java.lang.String readString() throws java.io.IOException
This will read the next string from the stream.
- Overrides:
readString in class BaseParser
- Returns:
- The string that was read from the stream.
- Throws:
java.io.IOException - If there is an error reading from the stream.
-
readDictionaryBackwards
protected COSDictionary readDictionaryBackwards() throws java.io.IOException
- Throws:
java.io.IOException
-
readLineBackwards
protected java.lang.String readLineBackwards() throws java.io.IOException
This will read a line starting with the byte at offset and going backwards until it finds a newline. This should only be used if we are certain that the data will only be text, and not binary data.
- Returns:
- the string which was read
- Throws:
java.io.IOException - if there was an error reading data from the file
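
A backward line read of this kind can be sketched as below. The class BackwardLineReader is hypothetical and works on a byte array rather than the parser's inputFile. Treating both LF and CR as the newline, and leaving the offset on that newline, are assumptions of the sketch. The bytes are decoded as ISO-8859-1, which is only meaningful for text data, in line with the warning above.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical backward line reader over an in-memory buffer; not part of PDFBox. */
final class BackwardLineReader {
    private final byte[] data;
    private int offset;

    BackwardLineReader(byte[] data, int startOffset) {
        this.data = data;
        this.offset = startOffset;
    }

    int offset() {
        return offset;
    }

    String readLineBackwards() {
        int end = offset; // last byte of the line, inclusive
        while (offset >= 0 && data[offset] != '\n' && data[offset] != '\r') {
            offset--;
        }
        // The line occupies (offset, end]; offset now rests on the newline, or is -1.
        return new String(data, offset + 1, end - offset, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] tail = "116\n%%EOF".getBytes(StandardCharsets.US_ASCII);
        BackwardLineReader reader = new BackwardLineReader(tail, tail.length - 1);
        System.out.println(reader.readLineBackwards()); // prints %%EOF
        // Step over the newline to read the preceding line.
        BackwardLineReader previous = new BackwardLineReader(tail, reader.offset() - 1);
        System.out.println(previous.readLineBackwards()); // prints 116
    }
}
```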
-
readLine
protected java.lang.String readLine() throws java.io.IOException
This will read a line starting with the byte at offset and going forward until it finds a newline. This should only be used if we are certain that the data will only be text, and not binary data.
- Overrides:
readLine in class BaseParser
- Returns:
- the string which was read
- Throws:
java.io.IOException - if there was an error reading data from the file
-
readWord
protected java.lang.String readWord() throws java.io.IOException
- Throws:
java.io.IOException
-
isRecursivlyRead
public boolean isRecursivlyRead()
- Returns:
- the recursivlyRead
-
setRecursivlyRead
public void setRecursivlyRead(boolean recursivlyRead)
- Parameters:
recursivlyRead - the recursivlyRead to set
-
-