Package org.biojavax.bio.seq.io
Class INSDseqFormat
java.lang.Object
  org.biojavax.bio.seq.io.RichSequenceFormat.BasicFormat
    org.biojavax.bio.seq.io.INSDseqFormat
- All Implemented Interfaces:
- SequenceFormat, RichSequenceFormat
public class INSDseqFormat extends RichSequenceFormat.BasicFormat
Format reader for INSDseq files. This version of INSDseq format will generate and write RichSequence objects. Loosely based on code from the old, deprecated org.biojava.bio.seq.io.GenbankXmlFormat object. Understands http://www.ebi.ac.uk/embl/Documentation/DTD/INSDC_V1.4.dtd.txt.
Does NOT understand the "sites" keyword in INSDReference_position; it is interpreted as an empty location instead, because there is no obvious way of representing the "sites" keyword in BioSQL.
Note also that the INSDInterval tags and their associated elements are not read, as they duplicate the information in the INSDFeature_location tag, which is already fully parsed. They are, however, written on output, although there is no guarantee that the INSDInterval tags will exactly match the INSDFeature_location tag, as it is not possible to reflect its contents exactly using them.
- Since:
- 1.5
- Author:
- Alan Li (code based on his work), Richard Holland, George Waldon
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description):
static class INSDseqFormat.Terms - Implements some INSDseq-specific terms.
Nested classes/interfaces inherited from interface org.biojavax.bio.seq.io.RichSequenceFormat
RichSequenceFormat.BasicFormat, RichSequenceFormat.HeaderlessFormat
-
-
Field Summary
Fields (Modifier and Type, Field, Description):
protected static java.lang.String ACC_VERSION_TAG
protected static java.lang.String ACCESSION_TAG
protected static java.lang.String AUTHOR_TAG
protected static java.lang.String AUTHORS_GROUP_TAG
protected static java.lang.String COMMENT_TAG
protected static java.lang.String CONSORTIUM_TAG
protected static java.lang.String CONTIG_TAG
protected static java.lang.String CREATE_DATE_TAG
protected static java.lang.String CREATE_REL_TAG
protected static java.lang.String DATABASE_XREF_TAG
protected static java.util.regex.Pattern dbxp
protected static java.lang.String DEFINITION_TAG
protected static java.lang.String DIVISION_TAG
protected static java.lang.String FEATURE_ACCESSION_TAG
protected static java.lang.String FEATURE_FROM_TAG
protected static java.lang.String FEATURE_INTERBP_TAG
protected static java.lang.String FEATURE_INTERVAL_TAG
protected static java.lang.String FEATURE_INTERVALS_GROUP_TAG
protected static java.lang.String FEATURE_ISCOMP_TAG
protected static java.lang.String FEATURE_KEY_TAG
protected static java.lang.String FEATURE_LOC_TAG
protected static java.lang.String FEATURE_OPERATOR_TAG
protected static java.lang.String FEATURE_PARTIAL3_TAG
protected static java.lang.String FEATURE_PARTIAL5_TAG
protected static java.lang.String FEATURE_POINT_TAG
protected static java.lang.String FEATURE_TAG
protected static java.lang.String FEATURE_TO_TAG
protected static java.lang.String FEATUREQUAL_NAME_TAG
protected static java.lang.String FEATUREQUAL_TAG
protected static java.lang.String FEATUREQUAL_VALUE_TAG
protected static java.lang.String FEATUREQUALS_GROUP_TAG
protected static java.lang.String FEATURES_GROUP_TAG
static java.lang.String INSDSEQ_FORMAT - The name of this format.
protected static java.lang.String INSDSEQ_TAG
protected static java.lang.String INSDSEQS_GROUP_TAG
protected static java.lang.String JOURNAL_TAG
protected static java.lang.String KEYWORD_TAG
protected static java.lang.String KEYWORDS_GROUP_TAG
protected static java.lang.String LENGTH_TAG
protected static java.lang.String LOCUS_TAG
protected static java.lang.String MOLTYPE_TAG
protected static java.lang.String ORGANISM_TAG
protected static java.lang.String OTHER_SEQID_TAG
protected static java.lang.String OTHER_SEQIDS_GROUP_TAG
protected static java.lang.String PUBMED_TAG
protected static java.lang.String REFERENCE_LOCATION_TAG
protected static java.lang.String REFERENCE_POSITION_TAG
protected static java.lang.String REFERENCE_TAG
protected static java.lang.String REFERENCES_GROUP_TAG
protected static java.lang.String REMARK_TAG
protected static java.lang.String SECONDARY_ACCESSION_TAG
protected static java.lang.String SECONDARY_ACCESSIONS_GROUP_TAG
protected static java.lang.String SEQUENCE_TAG
protected static java.lang.String SOURCE_TAG
protected static java.lang.String STRANDED_TAG
protected static java.lang.String TAXONOMY_TAG
protected static java.lang.String TITLE_TAG
protected static java.lang.String TOPOLOGY_TAG
protected static java.lang.String UPDATE_DATE_TAG
protected static java.lang.String UPDATE_REL_TAG
protected static java.util.regex.Pattern xmlSchema
protected static java.lang.String XREF_DBNAME_TAG
protected static java.lang.String XREF_ID_TAG
protected static java.lang.String XREF_TAG
-
Constructor Summary
Constructors (Constructor, Description):
INSDseqFormat()
-
Method Summary
All Methods, Instance Methods, Concrete Methods (Modifier and Type, Method, Description):
void beginWriting() - Informs the writer that we want to start writing.
boolean canRead(java.io.BufferedInputStream stream) - Check to see if a given stream is in our format.
boolean canRead(java.io.File file) - Check to see if a given file is in our format.
void finishWriting() - Informs the writer that we are done writing.
java.lang.String getDefaultFormat() - getDefaultFormat returns the String identifier for the default sub-format written by a SequenceFormat implementation.
SymbolTokenization guessSymbolTokenization(java.io.BufferedInputStream stream) - On the assumption that the stream is readable by this format (not checked), attempt to guess which symbol tokenization we should use to read it.
SymbolTokenization guessSymbolTokenization(java.io.File file) - On the assumption that the file is readable by this format (not checked), attempt to guess which symbol tokenization we should use to read it.
boolean readRichSequence(java.io.BufferedReader reader, SymbolTokenization symParser, RichSeqIOListener rlistener, Namespace ns) - Reads a sequence from the given buffered reader using the given tokenizer to parse sequence symbols.
boolean readSequence(java.io.BufferedReader reader, SymbolTokenization symParser, SeqIOListener listener) - Read a sequence and pass data on to a SeqIOListener.
void writeSequence(Sequence seq, java.io.PrintStream os) - writeSequence writes a sequence to the specified PrintStream, using the default format.
void writeSequence(Sequence seq, java.lang.String format, java.io.PrintStream os) - writeSequence writes a sequence to the specified PrintStream, using the specified format.
void writeSequence(Sequence seq, Namespace ns) - Writes a sequence out to the output stream given by beginWriting() using the default format of the implementing class.
Methods inherited from class org.biojavax.bio.seq.io.RichSequenceFormat.BasicFormat
getElideComments, getElideFeatures, getElideReferences, getElideSymbols, getLineWidth, getPrintStream, setElideComments, setElideFeatures, setElideReferences, setElideSymbols, setLineWidth, setPrintStream
-
-
-
-
Field Detail
-
INSDSEQ_FORMAT
public static final java.lang.String INSDSEQ_FORMAT
The name of this format.
- See Also:
- Constant Field Values
-
INSDSEQS_GROUP_TAG
protected static final java.lang.String INSDSEQS_GROUP_TAG
- See Also:
- Constant Field Values
-
INSDSEQ_TAG
protected static final java.lang.String INSDSEQ_TAG
- See Also:
- Constant Field Values
-
LOCUS_TAG
protected static final java.lang.String LOCUS_TAG
- See Also:
- Constant Field Values
-
LENGTH_TAG
protected static final java.lang.String LENGTH_TAG
- See Also:
- Constant Field Values
-
TOPOLOGY_TAG
protected static final java.lang.String TOPOLOGY_TAG
- See Also:
- Constant Field Values
-
STRANDED_TAG
protected static final java.lang.String STRANDED_TAG
- See Also:
- Constant Field Values
-
MOLTYPE_TAG
protected static final java.lang.String MOLTYPE_TAG
- See Also:
- Constant Field Values
-
DIVISION_TAG
protected static final java.lang.String DIVISION_TAG
- See Also:
- Constant Field Values
-
UPDATE_DATE_TAG
protected static final java.lang.String UPDATE_DATE_TAG
- See Also:
- Constant Field Values
-
CREATE_DATE_TAG
protected static final java.lang.String CREATE_DATE_TAG
- See Also:
- Constant Field Values
-
UPDATE_REL_TAG
protected static final java.lang.String UPDATE_REL_TAG
- See Also:
- Constant Field Values
-
CREATE_REL_TAG
protected static final java.lang.String CREATE_REL_TAG
- See Also:
- Constant Field Values
-
DEFINITION_TAG
protected static final java.lang.String DEFINITION_TAG
- See Also:
- Constant Field Values
-
DATABASE_XREF_TAG
protected static final java.lang.String DATABASE_XREF_TAG
- See Also:
- Constant Field Values
-
XREF_TAG
protected static final java.lang.String XREF_TAG
- See Also:
- Constant Field Values
-
ACCESSION_TAG
protected static final java.lang.String ACCESSION_TAG
- See Also:
- Constant Field Values
-
ACC_VERSION_TAG
protected static final java.lang.String ACC_VERSION_TAG
- See Also:
- Constant Field Values
-
SECONDARY_ACCESSIONS_GROUP_TAG
protected static final java.lang.String SECONDARY_ACCESSIONS_GROUP_TAG
- See Also:
- Constant Field Values
-
SECONDARY_ACCESSION_TAG
protected static final java.lang.String SECONDARY_ACCESSION_TAG
- See Also:
- Constant Field Values
-
OTHER_SEQIDS_GROUP_TAG
protected static final java.lang.String OTHER_SEQIDS_GROUP_TAG
- See Also:
- Constant Field Values
-
OTHER_SEQID_TAG
protected static final java.lang.String OTHER_SEQID_TAG
- See Also:
- Constant Field Values
-
KEYWORDS_GROUP_TAG
protected static final java.lang.String KEYWORDS_GROUP_TAG
- See Also:
- Constant Field Values
-
KEYWORD_TAG
protected static final java.lang.String KEYWORD_TAG
- See Also:
- Constant Field Values
-
SOURCE_TAG
protected static final java.lang.String SOURCE_TAG
- See Also:
- Constant Field Values
-
ORGANISM_TAG
protected static final java.lang.String ORGANISM_TAG
- See Also:
- Constant Field Values
-
TAXONOMY_TAG
protected static final java.lang.String TAXONOMY_TAG
- See Also:
- Constant Field Values
-
REFERENCES_GROUP_TAG
protected static final java.lang.String REFERENCES_GROUP_TAG
- See Also:
- Constant Field Values
-
REFERENCE_TAG
protected static final java.lang.String REFERENCE_TAG
- See Also:
- Constant Field Values
-
REFERENCE_LOCATION_TAG
protected static final java.lang.String REFERENCE_LOCATION_TAG
- See Also:
- Constant Field Values
-
REFERENCE_POSITION_TAG
protected static final java.lang.String REFERENCE_POSITION_TAG
- See Also:
- Constant Field Values
-
TITLE_TAG
protected static final java.lang.String TITLE_TAG
- See Also:
- Constant Field Values
-
JOURNAL_TAG
protected static final java.lang.String JOURNAL_TAG
- See Also:
- Constant Field Values
-
PUBMED_TAG
protected static final java.lang.String PUBMED_TAG
- See Also:
- Constant Field Values
-
XREF_DBNAME_TAG
protected static final java.lang.String XREF_DBNAME_TAG
- See Also:
- Constant Field Values
-
XREF_ID_TAG
protected static final java.lang.String XREF_ID_TAG
- See Also:
- Constant Field Values
-
REMARK_TAG
protected static final java.lang.String REMARK_TAG
- See Also:
- Constant Field Values
-
AUTHORS_GROUP_TAG
protected static final java.lang.String AUTHORS_GROUP_TAG
- See Also:
- Constant Field Values
-
AUTHOR_TAG
protected static final java.lang.String AUTHOR_TAG
- See Also:
- Constant Field Values
-
CONSORTIUM_TAG
protected static final java.lang.String CONSORTIUM_TAG
- See Also:
- Constant Field Values
-
COMMENT_TAG
protected static final java.lang.String COMMENT_TAG
- See Also:
- Constant Field Values
-
FEATURES_GROUP_TAG
protected static final java.lang.String FEATURES_GROUP_TAG
- See Also:
- Constant Field Values
-
FEATURE_TAG
protected static final java.lang.String FEATURE_TAG
- See Also:
- Constant Field Values
-
FEATURE_KEY_TAG
protected static final java.lang.String FEATURE_KEY_TAG
- See Also:
- Constant Field Values
-
FEATURE_LOC_TAG
protected static final java.lang.String FEATURE_LOC_TAG
- See Also:
- Constant Field Values
-
FEATURE_INTERVALS_GROUP_TAG
protected static final java.lang.String FEATURE_INTERVALS_GROUP_TAG
- See Also:
- Constant Field Values
-
FEATURE_INTERVAL_TAG
protected static final java.lang.String FEATURE_INTERVAL_TAG
- See Also:
- Constant Field Values
-
FEATURE_FROM_TAG
protected static final java.lang.String FEATURE_FROM_TAG
- See Also:
- Constant Field Values
-
FEATURE_TO_TAG
protected static final java.lang.String FEATURE_TO_TAG
- See Also:
- Constant Field Values
-
FEATURE_POINT_TAG
protected static final java.lang.String FEATURE_POINT_TAG
- See Also:
- Constant Field Values
-
FEATURE_ISCOMP_TAG
protected static final java.lang.String FEATURE_ISCOMP_TAG
- See Also:
- Constant Field Values
-
FEATURE_INTERBP_TAG
protected static final java.lang.String FEATURE_INTERBP_TAG
- See Also:
- Constant Field Values
-
FEATURE_ACCESSION_TAG
protected static final java.lang.String FEATURE_ACCESSION_TAG
- See Also:
- Constant Field Values
-
FEATURE_OPERATOR_TAG
protected static final java.lang.String FEATURE_OPERATOR_TAG
- See Also:
- Constant Field Values
-
FEATURE_PARTIAL5_TAG
protected static final java.lang.String FEATURE_PARTIAL5_TAG
- See Also:
- Constant Field Values
-
FEATURE_PARTIAL3_TAG
protected static final java.lang.String FEATURE_PARTIAL3_TAG
- See Also:
- Constant Field Values
-
FEATUREQUALS_GROUP_TAG
protected static final java.lang.String FEATUREQUALS_GROUP_TAG
- See Also:
- Constant Field Values
-
FEATUREQUAL_TAG
protected static final java.lang.String FEATUREQUAL_TAG
- See Also:
- Constant Field Values
-
FEATUREQUAL_NAME_TAG
protected static final java.lang.String FEATUREQUAL_NAME_TAG
- See Also:
- Constant Field Values
-
FEATUREQUAL_VALUE_TAG
protected static final java.lang.String FEATUREQUAL_VALUE_TAG
- See Also:
- Constant Field Values
-
SEQUENCE_TAG
protected static final java.lang.String SEQUENCE_TAG
- See Also:
- Constant Field Values
-
CONTIG_TAG
protected static final java.lang.String CONTIG_TAG
- See Also:
- Constant Field Values
-
dbxp
protected static final java.util.regex.Pattern dbxp
-
xmlSchema
protected static final java.util.regex.Pattern xmlSchema
-
-
Method Detail
-
canRead
public boolean canRead(java.io.File file) throws java.io.IOException
Check to see if a given file is in our format. Some formats may be able to determine this by filename, whilst others may have to open the file and read it to see what format it is in. A file is in INSDseq format if the second XML line contains the phrase "http://www.ebi.ac.uk/dtd/INSD_INSDSeq.dtd".
- Specified by:
- canRead in interface RichSequenceFormat
- Overrides:
- canRead in class RichSequenceFormat.BasicFormat
- Parameters:
- file - the File to check.
- Returns:
- true if the file is readable by this format, false if not.
- Throws:
- java.io.IOException - in case the file is inaccessible.
-
guessSymbolTokenization
public SymbolTokenization guessSymbolTokenization(java.io.File file) throws java.io.IOException
On the assumption that the file is readable by this format (not checked), attempt to guess which symbol tokenization we should use to read it. For formats that only accept one tokenization, just return it without checking the file. For formats that accept multiple tokenizations, it's up to you how you do it. This implementation always returns a DNA tokenizer.
- Specified by:
- guessSymbolTokenization in interface RichSequenceFormat
- Overrides:
- guessSymbolTokenization in class RichSequenceFormat.BasicFormat
- Parameters:
- file - the File object to guess the format of.
- Returns:
- a SymbolTokenization to read the file with.
- Throws:
- java.io.IOException - if the file is unrecognisable or inaccessible.
-
canRead
public boolean canRead(java.io.BufferedInputStream stream) throws java.io.IOException
Check to see if a given stream is in our format. A stream is in INSDseq format if the second XML line contains the phrase "http://www.ebi.ac.uk/dtd/INSD_INSDSeq.dtd".
- Parameters:
- stream - the BufferedInputStream to check.
- Returns:
- true if the stream is readable by this format, false if not.
- Throws:
- java.io.IOException - in case the stream is inaccessible.
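The second-line check described for canRead can be illustrated with a small standalone sketch. This is not the BioJava implementation: the class name SecondLineSniffer, the look-ahead limit of 2000 bytes, and the sample document are all assumptions made for illustration. It only shows one way to peek at the start of a BufferedInputStream with mark/reset so that the stream is left untouched for a later parser.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of BioJava.
public class SecondLineSniffer {
    // Phrase quoted in the canRead documentation.
    static final String MARKER = "http://www.ebi.ac.uk/dtd/INSD_INSDSeq.dtd";
    // Assumed look-ahead limit in bytes; the real value is not documented here.
    static final int LOOKAHEAD = 2000;

    // Returns true if the second line of the stream contains MARKER.
    // The stream is rewound to where it started before returning.
    static boolean secondLineHasMarker(BufferedInputStream stream) throws IOException {
        stream.mark(LOOKAHEAD);
        try {
            byte[] buf = new byte[LOOKAHEAD];
            int total = 0;
            // Read at most LOOKAHEAD bytes so the mark stays valid.
            while (total < LOOKAHEAD) {
                int n = stream.read(buf, total, LOOKAHEAD - total);
                if (n < 0) break; // end of stream
                total += n;
            }
            String head = new String(buf, 0, total, StandardCharsets.UTF_8);
            String[] lines = head.split("\r?\n", 3);
            return lines.length >= 2 && lines[1].contains(MARKER);
        } finally {
            stream.reset();
        }
    }

    // Convenience wrapper for in-memory text, used by the demo below.
    static boolean sniff(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        try {
            return secondLineHasMarker(new BufferedInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample input; only the marker phrase comes from the documentation.
        String doc = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE x SYSTEM \"" + MARKER + "\">\n"
                + "<x/>\n";
        System.out.println(sniff(doc));
        System.out.println(sniff("just one line"));
    }
}
```

Because the check looks only at the second line, a file whose DOCTYPE declaration sits on a different line would not be recognised by a test of this kind.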
-
guessSymbolTokenization
public SymbolTokenization guessSymbolTokenization(java.io.BufferedInputStream stream) throws java.io.IOException
On the assumption that the stream is readable by this format (not checked), attempt to guess which symbol tokenization we should use to read it. For formats that only accept one tokenization, just return it without checking the stream. For formats that accept multiple tokenizations, it's up to you how you do it. This implementation always returns a DNA tokenizer.
- Parameters:
- stream - the BufferedInputStream object to guess the format of.
- Returns:
- a SymbolTokenization to read the stream with.
- Throws:
- java.io.IOException - if the stream is unrecognisable or inaccessible.
-
readSequence
public boolean readSequence(java.io.BufferedReader reader, SymbolTokenization symParser, SeqIOListener listener) throws IllegalSymbolException, java.io.IOException, ParseException
Read a sequence and pass data on to a SeqIOListener.
- Parameters:
- reader - The stream of data to parse.
- symParser - A SymbolParser defining a mapping from character data to Symbols.
- listener - A listener to notify when data is extracted from the stream.
- Returns:
- a boolean indicating whether or not the stream contains any more sequences.
- Throws:
- IllegalSymbolException - if it is not possible to translate character data from the stream into valid BioJava symbols.
- java.io.IOException - if an error occurs while reading from the stream.
- ParseException
-
readRichSequence
public boolean readRichSequence(java.io.BufferedReader reader, SymbolTokenization symParser, RichSeqIOListener rlistener, Namespace ns) throws IllegalSymbolException, java.io.IOException, ParseException
Reads a sequence from the given buffered reader using the given tokenizer to parse sequence symbols. Events are passed to the listener, and the namespace used for sequences read is the one given. If the namespace is null, then the default namespace for the parser is used, which may depend on individual implementations of this interface.
- Parameters:
- reader - the input source
- symParser - the tokenizer which understands the sequence being read
- rlistener - the listener to send sequence events to
- ns - the namespace to read sequences into.
- Returns:
- true if there is more to read after this, false otherwise.
- Throws:
- IllegalSymbolException - if the tokenizer couldn't understand one of the sequence symbols in the file.
- java.io.IOException - if there was a read error.
- ParseException
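Both read methods return a boolean saying whether more sequences remain, which suggests a calling loop that repeats until the return value is false. The sketch below shows that loop shape only. RecordReader and its line-per-record implementation are hypothetical stand-ins and do not use the BioJava classes; a real caller would pass a tokenizer, a listener and (for readRichSequence) a namespace.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the "returns true while more remain" contract.
public class ReadLoopSketch {
    // Stand-in for a format's read method: consumes one record from the
    // reader, hands it to the sink, and reports whether more input remains.
    interface RecordReader {
        boolean readOne(BufferedReader reader, List<String> sink) throws IOException;
    }

    // Toy format: each line is one record.
    static final RecordReader LINE_PER_RECORD = (reader, sink) -> {
        String line = reader.readLine();
        if (line != null) sink.add(line);
        // Peek one character ahead to see whether anything is left.
        reader.mark(1);
        int next = reader.read();
        if (next < 0) return false;
        reader.reset();
        return true;
    };

    // Calls the reader repeatedly until it reports that nothing is left.
    static List<String> readAll(String text) {
        List<String> records = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            boolean more;
            do {
                more = LINE_PER_RECORD.readOne(reader, records);
            } while (more);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        System.out.println(readAll("first\nsecond\nthird\n"));
    }
}
```

A do-while loop calls the reader at least once, so the toy reader has to tolerate empty input; whether the real methods accept empty input is not stated in this documentation.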
-
beginWriting
public void beginWriting() throws java.io.IOException
Informs the writer that we want to start writing. This will do any initialisation required, such as writing the opening tags of an XML file that groups sequences together.
- Throws:
- java.io.IOException - if writing fails.
-
finishWriting
public void finishWriting() throws java.io.IOException
Informs the writer that we are done writing. This will do any finalisation required, such as writing the closing tags of an XML file that groups sequences together.
- Throws:
- java.io.IOException - if writing fails.
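The beginWriting and finishWriting descriptions imply a three-step lifecycle: open the grouping element, write each sequence, then close the grouping element. The following minimal sketch illustrates that ordering with a hypothetical writer class. The element names "set" and "entry" are placeholders rather than the real INSDseq tags, and the class does not use any BioJava types.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.Arrays;
import java.util.List;

// Hypothetical writer showing the begin / write / finish ordering only.
public class GroupedXmlWriterSketch {
    private final PrintStream out;
    private boolean open = false;

    GroupedXmlWriterSketch(PrintStream out) {
        this.out = out;
    }

    // Writes the XML declaration and the opening tag of the grouping element.
    void beginWriting() {
        out.println("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        out.println("<set>");
        open = true;
    }

    // Writes one entry; only valid between beginWriting() and finishWriting().
    // The id is assumed to need no XML escaping in this sketch.
    void writeEntry(String id) {
        if (!open) {
            throw new IllegalStateException("beginWriting() has not been called");
        }
        out.println("  <entry>" + id + "</entry>");
    }

    // Writes the closing tag of the grouping element and flushes the stream.
    void finishWriting() {
        out.println("</set>");
        out.flush();
        open = false;
    }

    // Runs the whole lifecycle against an in-memory stream.
    static String render(List<String> ids) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        GroupedXmlWriterSketch writer = new GroupedXmlWriterSketch(new PrintStream(bytes));
        writer.beginWriting();
        for (String id : ids) {
            writer.writeEntry(id);
        }
        writer.finishWriting();
        return bytes.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(Arrays.asList("A", "B")));
    }
}
```

Skipping the first or last step would leave the output without its opening or closing grouping tag, which is why the sketch rejects entries written before beginWriting().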
-
writeSequence
public void writeSequence(Sequence seq, java.io.PrintStream os) throws java.io.IOException
writeSequence writes a sequence to the specified PrintStream, using the default format.
- Parameters:
- seq - the sequence to write out.
- os - the printstream to write to.
- Throws:
- java.io.IOException
-
writeSequence
public void writeSequence(Sequence seq, java.lang.String format, java.io.PrintStream os) throws java.io.IOException
writeSequence writes a sequence to the specified PrintStream, using the specified format.
- Parameters:
- seq - a Sequence to write out.
- format - a String indicating which sub-format of those available from a particular SequenceFormat implementation to use when writing.
- os - a PrintStream object.
- Throws:
- java.io.IOException - if an error occurs.
-
writeSequence
public void writeSequence(Sequence seq, Namespace ns) throws java.io.IOException
Writes a sequence out to the output stream given by beginWriting() using the default format of the implementing class. If a namespace is given, sequences will be written with that namespace; otherwise they will be written with the default namespace of the implementing class (which is usually the namespace of the sequence itself). If you pass this method a sequence which is not a RichSequence, it will attempt to convert it using RichSequence.Tools.enrich(). This does not guarantee a perfect conversion, so it is better to use RichSequences to start with. In this implementation the namespace is ignored, as INSDseq has no concept of it.
- Parameters:
- seq - the sequence to write
- ns - the namespace to write it with
- Throws:
- java.io.IOException - in case it couldn't write something
-
getDefaultFormat
public java.lang.String getDefaultFormat()
getDefaultFormat returns the String identifier for the default sub-format written by a SequenceFormat implementation.
- Returns:
- a String.
-
-