Package org.biojavax.bio.seq.io
Class EMBLxmlFormat
- java.lang.Object
-
- org.biojavax.bio.seq.io.RichSequenceFormat.BasicFormat
-
- org.biojavax.bio.seq.io.EMBLxmlFormat
-
- All Implemented Interfaces:
SequenceFormat,RichSequenceFormat
public class EMBLxmlFormat extends RichSequenceFormat.BasicFormat
Format reader for EMBLxml files. This version of EMBLxml format will generate and write RichSequence objects. Loosely based on code from the old, deprecated org.biojava.bio.seq.io.GenbankXmlFormat object. Understands http://www.ebi.ac.uk/embl/dtd/EMBL_Services_V1.1.dtd
- Since:
- 1.5
- Author:
- Alan Li (code based on his work), Richard Holland, Mark Schreiber
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description)
static class EMBLxmlFormat.Terms - Implements some EMBLxml-specific terms.
-
Nested classes/interfaces inherited from interface org.biojavax.bio.seq.io.RichSequenceFormat
RichSequenceFormat.BasicFormat, RichSequenceFormat.HeaderlessFormat
-
-
Field Summary
Fields (Modifier and Type, Field, Description)
protected static java.lang.String AUTHOR_TAG
protected static java.lang.String BASEPOSITION_TAG
protected static java.lang.String BASEPOSITION_TYPE_ATTR
protected static java.lang.String CITATION_DATE_ATTR
protected static java.lang.String CITATION_FIRST_ATTR
protected static java.lang.String CITATION_ID_ATTR
protected static java.lang.String CITATION_INSTITUTE_ATTR
protected static java.lang.String CITATION_ISSUE_ATTR
protected static java.lang.String CITATION_LAST_ATTR
protected static java.lang.String CITATION_LOCATION_TAG
protected static java.lang.String CITATION_NAME_ATTR
protected static java.lang.String CITATION_PATENT_ATTR
protected static java.lang.String CITATION_PUB_ATTR
protected static java.lang.String CITATION_TAG
protected static java.lang.String CITATION_TYPE_ATTR
protected static java.lang.String CITATION_VOL_ATTR
protected static java.lang.String CITATION_YEAR_ATTR
protected static java.lang.String COMMENT_TAG
protected static java.lang.String COMNAME_TAG
protected static java.lang.String CONSORTIUM_TAG
protected static java.lang.String CONTIG_TAG
protected static java.lang.String DBREF_DB_ATTR
protected static java.lang.String DBREF_PRIMARY_ATTR
protected static java.lang.String DBREF_SEC_ATTR
protected static java.lang.String DBREFERENCE_TAG
protected static java.lang.String DESC_TAG
protected static java.lang.String EDITOR_TAG
static java.lang.String EMBLXML_FORMAT - The name of this format
protected static java.lang.String ENTRY_ACCESSION_ATTR
protected static java.lang.String ENTRY_CREATED_ATTR
protected static java.lang.String ENTRY_DATACLASS_ATTR
protected static java.lang.String ENTRY_GROUP_TAG
protected static java.lang.String ENTRY_RELCREATED_ATTR
protected static java.lang.String ENTRY_RELUPDATED_ATTR
protected static java.lang.String ENTRY_STATUS_ATTR
protected static java.lang.String ENTRY_STATUS_DATE_ATTR
protected static java.lang.String ENTRY_SUBACC_ATTR
protected static java.lang.String ENTRY_SUBVER_ATTR
protected static java.lang.String ENTRY_SUBWGSVER_ATTR
protected static java.lang.String ENTRY_TAG
protected static java.lang.String ENTRY_TAX_DIVISION_ATTR
protected static java.lang.String ENTRY_UPDATED_ATTR
protected static java.lang.String ENTRY_VER_ATTR
protected static java.lang.String FEATURE_NAME_ATTR
protected static java.lang.String FEATURE_TAG
protected static java.lang.String KEYWORD_TAG
protected static java.lang.String LINEAGE_TAG
protected static java.lang.String LOC_ELEMENT_ACC_ATTR
protected static java.lang.String LOC_ELEMENT_COMPL_ATTR
protected static java.lang.String LOC_ELEMENT_TYPE_ATTR
protected static java.lang.String LOC_ELEMENT_VER_ATTR
protected static java.lang.String LOCATION_COMPL_ATTR
protected static java.lang.String LOCATION_ELEMENT_TAG
protected static java.lang.String LOCATION_TAG
protected static java.lang.String LOCATION_TYPE_ATTR
protected static java.lang.String LOCATOR_TAG
protected static java.lang.String ORGANELLE_TAG
protected static java.lang.String ORGANISM_TAG
protected static java.lang.String PATENT_TAG
protected static java.lang.String PROJ_ACC_TAG
protected static java.lang.String QUALIFIER_NAME_ATTR
protected static java.lang.String QUALIFIER_TAG
protected static java.lang.String REF_POS_BEGIN_ATTR
protected static java.lang.String REF_POS_END_ATTR
protected static java.lang.String REFERENCE_TAG
protected static java.lang.String SCINAME_TAG
protected static java.lang.String SEC_ACC_TAG
protected static java.lang.String SEQUENCE_LENGTH_ATTR
protected static java.lang.String SEQUENCE_TAG
protected static java.lang.String SEQUENCE_TOPOLOGY_ATTR
protected static java.lang.String SEQUENCE_TYPE_ATTR
protected static java.lang.String SEQUENCE_VER_ATTR
protected static java.lang.String TAXID_TAG
protected static java.lang.String TAXON_TAG
protected static java.lang.String TITLE_TAG
protected static java.util.regex.Pattern xmlSchema
-
Constructor Summary
Constructors (Constructor, Description)
EMBLxmlFormat()
-
Method Summary
All Methods, Instance Methods, Concrete Methods (Modifier and Type, Method, Description)
void beginWriting() - Informs the writer that we want to start writing.
boolean canRead(java.io.BufferedInputStream stream) - Check to see if a given stream is in our format.
boolean canRead(java.io.File file) - Check to see if a given file is in our format.
void finishWriting() - Informs the writer that we are done writing.
java.lang.String getDefaultFormat() - getDefaultFormat returns the String identifier for the default sub-format written by a SequenceFormat implementation.
SymbolTokenization guessSymbolTokenization(java.io.BufferedInputStream stream) - On the assumption that the stream is readable by this format (not checked), attempt to guess which symbol tokenization we should use to read it.
SymbolTokenization guessSymbolTokenization(java.io.File file) - On the assumption that the file is readable by this format (not checked), attempt to guess which symbol tokenization we should use to read it.
boolean readRichSequence(java.io.BufferedReader reader, SymbolTokenization symParser, RichSeqIOListener rlistener, Namespace ns) - Reads a sequence from the given buffered reader using the given tokenizer to parse sequence symbols.
boolean readSequence(java.io.BufferedReader reader, SymbolTokenization symParser, SeqIOListener listener) - Read a sequence and pass data on to a SeqIOListener.
void writeSequence(Sequence seq, java.io.PrintStream os) - writeSequence writes a sequence to the specified PrintStream, using the default format.
void writeSequence(Sequence seq, java.lang.String format, java.io.PrintStream os) - writeSequence writes a sequence to the specified PrintStream, using the specified format.
void writeSequence(Sequence seq, Namespace ns) - Writes a sequence out to the outputstream given by beginWriting() using the default format of the implementing class.
-
Methods inherited from class org.biojavax.bio.seq.io.RichSequenceFormat.BasicFormat
getElideComments, getElideFeatures, getElideReferences, getElideSymbols, getLineWidth, getPrintStream, setElideComments, setElideFeatures, setElideReferences, setElideSymbols, setLineWidth, setPrintStream
-
-
-
-
Field Detail
-
EMBLXML_FORMAT
public static final java.lang.String EMBLXML_FORMAT
The name of this format.
- See Also:
- Constant Field Values
-
ENTRY_GROUP_TAG
protected static final java.lang.String ENTRY_GROUP_TAG
- See Also:
- Constant Field Values
-
ENTRY_TAG
protected static final java.lang.String ENTRY_TAG
- See Also:
- Constant Field Values
-
ENTRY_ACCESSION_ATTR
protected static final java.lang.String ENTRY_ACCESSION_ATTR
- See Also:
- Constant Field Values
-
ENTRY_TAX_DIVISION_ATTR
protected static final java.lang.String ENTRY_TAX_DIVISION_ATTR
- See Also:
- Constant Field Values
-
ENTRY_DATACLASS_ATTR
protected static final java.lang.String ENTRY_DATACLASS_ATTR
- See Also:
- Constant Field Values
-
ENTRY_CREATED_ATTR
protected static final java.lang.String ENTRY_CREATED_ATTR
- See Also:
- Constant Field Values
-
ENTRY_RELCREATED_ATTR
protected static final java.lang.String ENTRY_RELCREATED_ATTR
- See Also:
- Constant Field Values
-
ENTRY_UPDATED_ATTR
protected static final java.lang.String ENTRY_UPDATED_ATTR
- See Also:
- Constant Field Values
-
ENTRY_RELUPDATED_ATTR
protected static final java.lang.String ENTRY_RELUPDATED_ATTR
- See Also:
- Constant Field Values
-
ENTRY_VER_ATTR
protected static final java.lang.String ENTRY_VER_ATTR
- See Also:
- Constant Field Values
-
ENTRY_SUBACC_ATTR
protected static final java.lang.String ENTRY_SUBACC_ATTR
- See Also:
- Constant Field Values
-
ENTRY_SUBVER_ATTR
protected static final java.lang.String ENTRY_SUBVER_ATTR
- See Also:
- Constant Field Values
-
ENTRY_SUBWGSVER_ATTR
protected static final java.lang.String ENTRY_SUBWGSVER_ATTR
- See Also:
- Constant Field Values
-
ENTRY_STATUS_ATTR
protected static final java.lang.String ENTRY_STATUS_ATTR
- See Also:
- Constant Field Values
-
ENTRY_STATUS_DATE_ATTR
protected static final java.lang.String ENTRY_STATUS_DATE_ATTR
- See Also:
- Constant Field Values
-
SEC_ACC_TAG
protected static final java.lang.String SEC_ACC_TAG
- See Also:
- Constant Field Values
-
PROJ_ACC_TAG
protected static final java.lang.String PROJ_ACC_TAG
- See Also:
- Constant Field Values
-
DESC_TAG
protected static final java.lang.String DESC_TAG
- See Also:
- Constant Field Values
-
KEYWORD_TAG
protected static final java.lang.String KEYWORD_TAG
- See Also:
- Constant Field Values
-
REFERENCE_TAG
protected static final java.lang.String REFERENCE_TAG
- See Also:
- Constant Field Values
-
CITATION_TAG
protected static final java.lang.String CITATION_TAG
- See Also:
- Constant Field Values
-
CITATION_ID_ATTR
protected static final java.lang.String CITATION_ID_ATTR
- See Also:
- Constant Field Values
-
CITATION_TYPE_ATTR
protected static final java.lang.String CITATION_TYPE_ATTR
- See Also:
- Constant Field Values
-
CITATION_DATE_ATTR
protected static final java.lang.String CITATION_DATE_ATTR
- See Also:
- Constant Field Values
-
CITATION_NAME_ATTR
protected static final java.lang.String CITATION_NAME_ATTR
- See Also:
- Constant Field Values
-
CITATION_VOL_ATTR
protected static final java.lang.String CITATION_VOL_ATTR
- See Also:
- Constant Field Values
-
CITATION_ISSUE_ATTR
protected static final java.lang.String CITATION_ISSUE_ATTR
- See Also:
- Constant Field Values
-
CITATION_FIRST_ATTR
protected static final java.lang.String CITATION_FIRST_ATTR
- See Also:
- Constant Field Values
-
CITATION_LAST_ATTR
protected static final java.lang.String CITATION_LAST_ATTR
- See Also:
- Constant Field Values
-
CITATION_PUB_ATTR
protected static final java.lang.String CITATION_PUB_ATTR
- See Also:
- Constant Field Values
-
CITATION_PATENT_ATTR
protected static final java.lang.String CITATION_PATENT_ATTR
- See Also:
- Constant Field Values
-
CITATION_INSTITUTE_ATTR
protected static final java.lang.String CITATION_INSTITUTE_ATTR
- See Also:
- Constant Field Values
-
CITATION_YEAR_ATTR
protected static final java.lang.String CITATION_YEAR_ATTR
- See Also:
- Constant Field Values
-
DBREFERENCE_TAG
protected static final java.lang.String DBREFERENCE_TAG
- See Also:
- Constant Field Values
-
DBREF_DB_ATTR
protected static final java.lang.String DBREF_DB_ATTR
- See Also:
- Constant Field Values
-
DBREF_PRIMARY_ATTR
protected static final java.lang.String DBREF_PRIMARY_ATTR
- See Also:
- Constant Field Values
-
DBREF_SEC_ATTR
protected static final java.lang.String DBREF_SEC_ATTR
- See Also:
- Constant Field Values
-
CONSORTIUM_TAG
protected static final java.lang.String CONSORTIUM_TAG
- See Also:
- Constant Field Values
-
TITLE_TAG
protected static final java.lang.String TITLE_TAG
- See Also:
- Constant Field Values
-
EDITOR_TAG
protected static final java.lang.String EDITOR_TAG
- See Also:
- Constant Field Values
-
AUTHOR_TAG
protected static final java.lang.String AUTHOR_TAG
- See Also:
- Constant Field Values
-
PATENT_TAG
protected static final java.lang.String PATENT_TAG
- See Also:
- Constant Field Values
-
LOCATOR_TAG
protected static final java.lang.String LOCATOR_TAG
- See Also:
- Constant Field Values
-
CITATION_LOCATION_TAG
protected static final java.lang.String CITATION_LOCATION_TAG
- See Also:
- Constant Field Values
-
REF_POS_BEGIN_ATTR
protected static final java.lang.String REF_POS_BEGIN_ATTR
- See Also:
- Constant Field Values
-
REF_POS_END_ATTR
protected static final java.lang.String REF_POS_END_ATTR
- See Also:
- Constant Field Values
-
COMMENT_TAG
protected static final java.lang.String COMMENT_TAG
- See Also:
- Constant Field Values
-
FEATURE_TAG
protected static final java.lang.String FEATURE_TAG
- See Also:
- Constant Field Values
-
FEATURE_NAME_ATTR
protected static final java.lang.String FEATURE_NAME_ATTR
- See Also:
- Constant Field Values
-
ORGANISM_TAG
protected static final java.lang.String ORGANISM_TAG
- See Also:
- Constant Field Values
-
SCINAME_TAG
protected static final java.lang.String SCINAME_TAG
- See Also:
- Constant Field Values
-
COMNAME_TAG
protected static final java.lang.String COMNAME_TAG
- See Also:
- Constant Field Values
-
TAXID_TAG
protected static final java.lang.String TAXID_TAG
- See Also:
- Constant Field Values
-
LINEAGE_TAG
protected static final java.lang.String LINEAGE_TAG
- See Also:
- Constant Field Values
-
TAXON_TAG
protected static final java.lang.String TAXON_TAG
- See Also:
- Constant Field Values
-
ORGANELLE_TAG
protected static final java.lang.String ORGANELLE_TAG
- See Also:
- Constant Field Values
-
QUALIFIER_TAG
protected static final java.lang.String QUALIFIER_TAG
- See Also:
- Constant Field Values
-
QUALIFIER_NAME_ATTR
protected static final java.lang.String QUALIFIER_NAME_ATTR
- See Also:
- Constant Field Values
-
LOCATION_TAG
protected static final java.lang.String LOCATION_TAG
- See Also:
- Constant Field Values
-
LOCATION_TYPE_ATTR
protected static final java.lang.String LOCATION_TYPE_ATTR
- See Also:
- Constant Field Values
-
LOCATION_COMPL_ATTR
protected static final java.lang.String LOCATION_COMPL_ATTR
- See Also:
- Constant Field Values
-
LOCATION_ELEMENT_TAG
protected static final java.lang.String LOCATION_ELEMENT_TAG
- See Also:
- Constant Field Values
-
LOC_ELEMENT_TYPE_ATTR
protected static final java.lang.String LOC_ELEMENT_TYPE_ATTR
- See Also:
- Constant Field Values
-
LOC_ELEMENT_ACC_ATTR
protected static final java.lang.String LOC_ELEMENT_ACC_ATTR
- See Also:
- Constant Field Values
-
LOC_ELEMENT_VER_ATTR
protected static final java.lang.String LOC_ELEMENT_VER_ATTR
- See Also:
- Constant Field Values
-
LOC_ELEMENT_COMPL_ATTR
protected static final java.lang.String LOC_ELEMENT_COMPL_ATTR
- See Also:
- Constant Field Values
-
BASEPOSITION_TAG
protected static final java.lang.String BASEPOSITION_TAG
- See Also:
- Constant Field Values
-
BASEPOSITION_TYPE_ATTR
protected static final java.lang.String BASEPOSITION_TYPE_ATTR
- See Also:
- Constant Field Values
-
CONTIG_TAG
protected static final java.lang.String CONTIG_TAG
- See Also:
- Constant Field Values
-
SEQUENCE_TAG
protected static final java.lang.String SEQUENCE_TAG
- See Also:
- Constant Field Values
-
SEQUENCE_TYPE_ATTR
protected static final java.lang.String SEQUENCE_TYPE_ATTR
- See Also:
- Constant Field Values
-
SEQUENCE_LENGTH_ATTR
protected static final java.lang.String SEQUENCE_LENGTH_ATTR
- See Also:
- Constant Field Values
-
SEQUENCE_TOPOLOGY_ATTR
protected static final java.lang.String SEQUENCE_TOPOLOGY_ATTR
- See Also:
- Constant Field Values
-
SEQUENCE_VER_ATTR
protected static final java.lang.String SEQUENCE_VER_ATTR
- See Also:
- Constant Field Values
-
xmlSchema
protected static final java.util.regex.Pattern xmlSchema
-
-
Method Detail
-
canRead
public boolean canRead(java.io.File file) throws java.io.IOException
Check to see if a given file is in our format. Some formats may be able to determine this by filename, whilst others may have to open the file and read it to see what format it is in. A file is in EMBLxml format if the second XML line contains the phrase "http://www.ebi.ac.uk/schema/EMBL_schema.xsd".
- Specified by:
- canRead in interface RichSequenceFormat
- Overrides:
- canRead in class RichSequenceFormat.BasicFormat
- Parameters:
- file - the File to check.
- Returns:
- true if the file is readable by this format, false if not.
- Throws:
- java.io.IOException - in case the file is inaccessible.
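The "second line contains the schema phrase" rule can be illustrated with plain JDK classes. The following is a minimal sketch of that rule only, not the BioJava implementation: the class name EmblXmlFileSniffer, the method name looksLikeEmblXml and the choice of UTF-8 are assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

/** Hypothetical helper for illustration; it is not part of BioJava. */
final class EmblXmlFileSniffer {
    // The phrase quoted in the canRead(File) description.
    static final String SCHEMA_PHRASE = "http://www.ebi.ac.uk/schema/EMBL_schema.xsd";

    /**
     * Returns true if the second line of the file contains the schema phrase.
     * Assumes the file is UTF-8 text; malformed input surfaces as an IOException.
     */
    static boolean looksLikeEmblXml(File file) throws IOException {
        try (BufferedReader reader =
                 Files.newBufferedReader(file.toPath(), StandardCharsets.UTF_8)) {
            reader.readLine();                  // skip the first line
            String second = reader.readLine();  // null if there is no second line
            return second != null && second.contains(SCHEMA_PHRASE);
        }
    }
}
```

A file with fewer than two lines is reported as not readable rather than raising an error, which matches the "false if not" wording of the return value.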
-
guessSymbolTokenization
public SymbolTokenization guessSymbolTokenization(java.io.File file) throws java.io.IOException
On the assumption that the file is readable by this format (not checked), attempt to guess which symbol tokenization we should use to read it. For formats that only accept one tokenization, just return it without checking the file. For formats that accept multiple tokenizations, it's up to you how you do it. This implementation always returns a DNA tokenizer.
- Specified by:
- guessSymbolTokenization in interface RichSequenceFormat
- Overrides:
- guessSymbolTokenization in class RichSequenceFormat.BasicFormat
- Parameters:
- file - the File object to guess the format of.
- Returns:
- a SymbolTokenization to read the file with.
- Throws:
- java.io.IOException - if the file is unrecognisable or inaccessible.
-
canRead
public boolean canRead(java.io.BufferedInputStream stream) throws java.io.IOException
Check to see if a given stream is in our format. A stream is in EMBLxml format if the second XML line contains the phrase "http://www.ebi.ac.uk/schema/EMBL_schema.xsd".
- Parameters:
- stream - the BufferedInputStream to check.
- Returns:
- true if the stream is readable by this format, false if not.
- Throws:
- java.io.IOException - in case the stream is inaccessible.
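A format check on a stream is most useful when it leaves the stream positioned where it started, so that the same stream can then be parsed. The sketch below shows one way to do that with mark() and reset() from the JDK. It is an illustration under stated assumptions, not the BioJava code: the class name EmblXmlStreamSniffer, the method name and the 2000-byte look-ahead limit are all invented for the example, and this page does not say whether canRead itself rewinds the stream.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; it is not part of BioJava. */
final class EmblXmlStreamSniffer {
    // The phrase quoted in the canRead(BufferedInputStream) description.
    static final String SCHEMA_PHRASE = "http://www.ebi.ac.uk/schema/EMBL_schema.xsd";
    // Assumed look-ahead budget; the first two lines must fit within it.
    static final int READ_AHEAD_LIMIT = 2000;

    /** Peeks at the first two lines, then rewinds the stream to where it was. */
    static boolean looksLikeEmblXml(BufferedInputStream stream) throws IOException {
        stream.mark(READ_AHEAD_LIMIT);
        try {
            // Read raw bytes and never exceed the mark limit, so reset() stays valid.
            byte[] head = new byte[READ_AHEAD_LIMIT];
            int filled = 0;
            while (filled < head.length) {
                int n = stream.read(head, filled, head.length - filled);
                if (n < 0) {
                    break; // end of stream
                }
                filled += n;
            }
            // ISO-8859-1 maps bytes to chars one-to-one, which is enough to find an ASCII phrase.
            String text = new String(head, 0, filled, StandardCharsets.ISO_8859_1);
            String[] lines = text.split("\r?\n", 3);
            return lines.length >= 2 && lines[1].contains(SCHEMA_PHRASE);
        } finally {
            stream.reset(); // rewind so a later parser sees the stream from the marked position
        }
    }
}
```

Reading bytes directly, rather than wrapping the stream in a reader, keeps the amount consumed under the mark limit; a buffering reader may pull more than that from the stream and invalidate the mark.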
-
guessSymbolTokenization
public SymbolTokenization guessSymbolTokenization(java.io.BufferedInputStream stream) throws java.io.IOException
On the assumption that the stream is readable by this format (not checked), attempt to guess which symbol tokenization we should use to read it. For formats that only accept one tokenization, just return it without checking the stream. For formats that accept multiple tokenizations, it's up to you how you do it. This implementation always returns a DNA tokenizer.
- Parameters:
- stream - the BufferedInputStream object to guess the format of.
- Returns:
- a SymbolTokenization to read the stream with.
- Throws:
- java.io.IOException - if the stream is unrecognisable or inaccessible.
-
readSequence
public boolean readSequence(java.io.BufferedReader reader, SymbolTokenization symParser, SeqIOListener listener) throws IllegalSymbolException, java.io.IOException, ParseException
Read a sequence and pass data on to a SeqIOListener.
- Parameters:
- reader - The stream of data to parse.
- symParser - A SymbolParser defining a mapping from character data to Symbols.
- listener - A listener to notify when data is extracted from the stream.
- Returns:
- a boolean indicating whether or not the stream contains any more sequences.
- Throws:
- IllegalSymbolException - if it is not possible to translate character data from the stream into valid BioJava symbols.
- java.io.IOException - if an error occurs while reading from the stream.
- ParseException
-
readRichSequence
public boolean readRichSequence(java.io.BufferedReader reader, SymbolTokenization symParser, RichSeqIOListener rlistener, Namespace ns) throws IllegalSymbolException, java.io.IOException, ParseException
Reads a sequence from the given buffered reader using the given tokenizer to parse sequence symbols. Events are passed to the listener, and the namespace used for sequences read is the one given. If the namespace is null, then the default namespace for the parser is used, which may depend on individual implementations of this interface.
- Parameters:
- reader - the input source
- symParser - the tokenizer which understands the sequence being read
- rlistener - the listener to send sequence events to
- ns - the namespace to read sequences into.
- Returns:
- true if there is more to read after this, false otherwise.
- Throws:
- IllegalSymbolException - if the tokenizer couldn't understand one of the sequence symbols in the file.
- java.io.IOException - if there was a read error.
- ParseException
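Both read methods return a boolean meaning "there is more to read", which suggests driving them from a do/while loop: read one record, then continue only while the previous call returned true. The sketch below shows that calling pattern with a stand-in reader built on the JDK alone. ReadLoopSketch, readOne and readAll are hypothetical names, and treating one text line as one record is an assumption made purely to keep the example self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a format reader; it is not part of BioJava. */
final class ReadLoopSketch {
    /**
     * Consumes one record (here simply one line), hands it to the sink,
     * and returns true if more input remains after it.
     */
    static boolean readOne(BufferedReader reader, List<String> sink) throws IOException {
        String line = reader.readLine();
        if (line == null) {
            return false; // nothing was read and nothing remains
        }
        sink.add(line);
        // Peek one character ahead to learn whether another record follows.
        // A limit of 2 leaves room for a line feed skipped after a carriage return.
        reader.mark(2);
        if (reader.read() < 0) {
            return false;
        }
        reader.reset();
        return true;
    }

    /** Drives readOne until it reports that no input is left. */
    static List<String> readAll(BufferedReader reader) throws IOException {
        List<String> records = new ArrayList<>();
        boolean more;
        do {
            more = readOne(reader, records);
        } while (more);
        return records;
    }
}
```

A do/while fits because the caller cannot know whether input remains until at least one read has been attempted.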
-
beginWriting
public void beginWriting() throws java.io.IOException
Informs the writer that we want to start writing. This will do any initialisation required, such as writing the opening tags of an XML file that groups sequences together.
- Throws:
- java.io.IOException - if writing fails.
-
finishWriting
public void finishWriting() throws java.io.IOException
Informs the writer that we are done writing. This will do any finalisation required, such as writing the closing tags of an XML file that groups sequences together.
- Throws:
- java.io.IOException - if writing fails.
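The begin/write/finish lifecycle exists because a grouped XML file needs one opening tag before the first entry and one closing tag after the last. The sketch below illustrates that ordering with a toy writer built on the JDK alone. It is not the BioJava writer: GroupedXmlWriter and writeEntry are hypothetical, and the element names "entries" and "entry" are placeholders, not the values of ENTRY_GROUP_TAG or ENTRY_TAG.

```java
import java.io.PrintStream;

/** Hypothetical toy writer for illustration; it is not part of BioJava. */
final class GroupedXmlWriter {
    // Placeholder element names, not the real EMBLxml tag values.
    private static final String GROUP_TAG = "entries";
    private static final String ENTRY_TAG = "entry";

    private final PrintStream out;
    private boolean open;

    GroupedXmlWriter(PrintStream out) {
        this.out = out;
    }

    /** Writes the opening tag that groups all entries together. */
    void beginWriting() {
        out.println("<" + GROUP_TAG + ">");
        open = true;
    }

    /** Writes one entry; only valid between beginWriting() and finishWriting(). */
    void writeEntry(String accession) {
        if (!open) {
            throw new IllegalStateException("call beginWriting() before writing entries");
        }
        // No escaping is done here: the accession is assumed to be XML-safe.
        out.println("  <" + ENTRY_TAG + " accession=\"" + accession + "\"/>");
    }

    /** Writes the closing tag and flushes the stream. */
    void finishWriting() {
        out.println("</" + GROUP_TAG + ">");
        out.flush();
        open = false;
    }
}
```

Calling beginWriting() once, writeEntry() for each record, and finishWriting() once yields a single well-formed document; skipping the first or last call would leave the group element unopened or unclosed.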
-
writeSequence
public void writeSequence(Sequence seq, java.io.PrintStream os) throws java.io.IOException
writeSequence writes a sequence to the specified PrintStream, using the default format.
- Parameters:
- seq - the sequence to write out.
- os - the printstream to write to.
- Throws:
- java.io.IOException
-
writeSequence
public void writeSequence(Sequence seq, java.lang.String format, java.io.PrintStream os) throws java.io.IOException
writeSequence writes a sequence to the specified PrintStream, using the specified format.
- Parameters:
- seq - a Sequence to write out.
- format - a String indicating which sub-format of those available from a particular SequenceFormat implementation to use when writing.
- os - a PrintStream object.
- Throws:
- java.io.IOException - if an error occurs.
-
writeSequence
public void writeSequence(Sequence seq, Namespace ns) throws java.io.IOException
Writes a sequence out to the output stream given by beginWriting() using the default format of the implementing class. If a namespace is given, sequences will be written with that namespace; otherwise they will be written with the default namespace of the implementing class (which is usually the namespace of the sequence itself). If you pass this method a sequence which is not a RichSequence, it will attempt to convert it using RichSequence.Tools.enrich(). This does not guarantee a perfect conversion, so it is better to use RichSequences to start with. In this implementation the namespace is ignored, as EMBLxml has no concept of it.
- Parameters:
- seq - the sequence to write
- ns - the namespace to write it with
- Throws:
- java.io.IOException - in case it couldn't write something
-
getDefaultFormat
public java.lang.String getDefaultFormat()
getDefaultFormat returns the String identifier for the default sub-format written by a SequenceFormat implementation.
- Returns:
- a String.
-
-