public final class CzechAnalyzer extends Analyzer
Analyzer for the Czech language.
Supports an external list of stopwords (words that will not be indexed at all). A default set of stopwords is used unless an alternative list is specified.
NOTE: This class uses the same Version-dependent settings as StandardAnalyzer.
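The following plain-Java sketch shows what "will not be indexed at all" means in practice. It does not use Lucene. It loosely imitates the documented chain: split into tokens, lowercase them (like LowerCaseFilter), and drop any token found in the stop set (like StopFilter). The splitting rule (runs of letters or digits) is an assumption and is much cruder than StandardTokenizer. StandardFilter is omitted. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; this is not Lucene's CzechAnalyzer implementation.
public class StopChainSketch {

    // Splits on any character that is not a letter or digit (a crude stand-in
    // for StandardTokenizer), lowercases each token (like LowerCaseFilter) and
    // drops tokens contained in the stop set (like StopFilter).
    public static List<String> analyze(String text, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i <= text.length(); i++) {
            // A trailing space acts as a sentinel that flushes the last token.
            char c = i < text.length() ? text.charAt(i) : ' ';
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                String token = current.toString().toLowerCase(Locale.ROOT);
                if (!stopWords.contains(token)) {
                    out.add(token);
                }
                current.setLength(0);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Tiny hypothetical stop list; the analyzer's real default is CZECH_STOP_WORDS.
        Set<String> stop = new HashSet<>(Arrays.asList("a", "je", "v"));
        // ASCII-only input keeps this source file encoding-neutral.
        System.out.println(analyze("Praha je velke mesto a Brno taky", stop));
        // prints [praha, velke, mesto, brno, taky]
    }
}
```

Lowercasing happens before the stop-set lookup, so "JE" and "je" are both removed. The stop set therefore only needs lowercase entries.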
| Modifier and Type | Field and Description |
|---|---|
| static java.lang.String[] | CZECH_STOP_WORDS: List of typical stopwords. |
Fields inherited from class Analyzer: overridesTokenStreamMethod

| Constructor and Description |
|---|
| CzechAnalyzer(): Deprecated. Use CzechAnalyzer(Version) instead. |
| CzechAnalyzer(java.io.File stopwords): Deprecated. Use CzechAnalyzer(Version, File) instead. |
| CzechAnalyzer(java.util.HashSet stopwords): Deprecated. Use CzechAnalyzer(Version, HashSet) instead. |
| CzechAnalyzer(java.lang.String[] stopwords): Deprecated. Use CzechAnalyzer(Version, String[]) instead. |
| CzechAnalyzer(Version matchVersion): Builds an analyzer with the default stop words (CZECH_STOP_WORDS). |
| CzechAnalyzer(Version matchVersion, java.io.File stopwords): Builds an analyzer with the given stop words. |
| CzechAnalyzer(Version matchVersion, java.util.HashSet stopwords) |
| CzechAnalyzer(Version matchVersion, java.lang.String[] stopwords): Builds an analyzer with the given stop words. |
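The stop-word constructors accept the list as a String[], a HashSet, or a File. The plain-Java sketch below shows one way to normalize the array and collection forms into a single lookup set. It does not use Lucene, and the names StopSetSketch and toStopSet are hypothetical. The sketch also lowercases each entry. The documented chain runs LowerCaseFilter before StopFilter, so a capitalized stop word would presumably never match a token. Treat that normalization as an assumption of the sketch, not as documented CzechAnalyzer behavior.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of Lucene): turns a stop list given as an
// array or as a collection into one lowercased, read-only set.
public class StopSetSketch {

    // Array / varargs form, mirroring the String[] constructors.
    public static Set<String> toStopSet(String... words) {
        return toStopSet(Arrays.asList(words));
    }

    // Collection form, mirroring the HashSet constructors.
    public static Set<String> toStopSet(Collection<String> words) {
        Set<String> set = new HashSet<>();
        for (String w : words) {
            String trimmed = w.trim();
            // Skip blank entries; lowercase so entries line up with lowercased tokens.
            if (!trimmed.isEmpty()) {
                set.add(trimmed.toLowerCase(Locale.ROOT));
            }
        }
        return Collections.unmodifiableSet(set);
    }

    public static void main(String[] args) {
        Set<String> fromArray = toStopSet("A", "je", " v ");
        Set<String> fromSet = toStopSet(new HashSet<>(Arrays.asList("a", "JE", "v")));
        System.out.println(fromArray.equals(fromSet)); // prints true
    }
}
```

Returning an unmodifiable set is a design choice of the sketch. It keeps the stop list from being altered after the analyzer is built.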
| Modifier and Type | Method and Description |
|---|---|
| void | loadStopWords(java.io.InputStream wordfile, java.lang.String encoding): Loads the stop word set from a resource stream (file, database, ...). |
| TokenStream | reusableTokenStream(java.lang.String fieldName, java.io.Reader reader): Returns a (possibly reused) TokenStream which tokenizes all the text in the provided Reader. |
| TokenStream | tokenStream(java.lang.String fieldName, java.io.Reader reader): Creates a TokenStream which tokenizes all the text in the provided Reader. |
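loadStopWords takes a byte stream plus an encoding name, and null selects the platform default. The plain-Java sketch below illustrates that contract. It is not Lucene's code, and the class and method names are hypothetical. It assumes one stop word per line and skips blank lines. The Javadoc's "win-1250" presumably refers to the charset Java calls "windows-1250".

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the documented loadStopWords contract; not Lucene code.
public class StopWordLoaderSketch {

    // Reads one stop word per line (an assumption of this sketch),
    // trimming whitespace and skipping blank lines.
    public static Set<String> readStopWords(InputStream wordfile, String encoding) {
        // A null encoding falls back to the platform default charset.
        Charset cs = (encoding == null) ? Charset.defaultCharset() : Charset.forName(encoding);
        Set<String> words = new HashSet<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(wordfile, cs))) {
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        } catch (IOException e) {
            // The documented signature declares no checked exception, so this
            // sketch rethrows I/O failures unchecked.
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        // "a", "je" and "ze-with-caron" encoded in ISO-8859-2, where byte 0xBE is U+017E.
        byte[] bytes = {'a', '\n', 'j', 'e', '\n', (byte) 0xBE, 'e', '\n'};
        Set<String> words = readStopWords(new ByteArrayInputStream(bytes), "ISO-8859-2");
        System.out.println(words.contains("\u017Ee")); // prints true
    }
}
```

The sketch closes the stream when it is done. Whether CzechAnalyzer.loadStopWords also closes the caller's stream is not stated on this page.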
Methods inherited from class Analyzer: close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, setOverridesTokenStreamMethod, setPreviousTokenStream

public static final java.lang.String[] CZECH_STOP_WORDS
List of typical stopwords.
public CzechAnalyzer()
Deprecated. Use CzechAnalyzer(Version) instead.
Builds an analyzer with the default stop words (CZECH_STOP_WORDS).

public CzechAnalyzer(Version matchVersion)
Builds an analyzer with the default stop words (CZECH_STOP_WORDS).

public CzechAnalyzer(java.lang.String[] stopwords)
Deprecated. Use CzechAnalyzer(Version, String[]) instead.

public CzechAnalyzer(Version matchVersion, java.lang.String[] stopwords)
Builds an analyzer with the given stop words.

public CzechAnalyzer(java.util.HashSet stopwords)
Deprecated. Use CzechAnalyzer(Version, HashSet) instead.

public CzechAnalyzer(Version matchVersion, java.util.HashSet stopwords)

public CzechAnalyzer(java.io.File stopwords) throws java.io.IOException
Deprecated. Use CzechAnalyzer(Version, File) instead.
Throws: java.io.IOException

public CzechAnalyzer(Version matchVersion, java.io.File stopwords) throws java.io.IOException
Builds an analyzer with the given stop words.
Throws: java.io.IOException

public void loadStopWords(java.io.InputStream wordfile, java.lang.String encoding)
Loads the stop word set from a resource stream (file, database, ...).
Parameters:
wordfile - stream containing the word list
encoding - encoding used (win-1250, iso-8859-2, ...), or null for the default system encoding

public final TokenStream tokenStream(java.lang.String fieldName, java.io.Reader reader)
Creates a TokenStream which tokenizes all the text in the provided Reader.
Specified by: tokenStream in class Analyzer
Returns: a TokenStream built from a StandardTokenizer filtered with StandardFilter, LowerCaseFilter, and StopFilter

public TokenStream reusableTokenStream(java.lang.String fieldName, java.io.Reader reader) throws java.io.IOException
Returns a (possibly reused) TokenStream which tokenizes all the text in the provided Reader.
Overrides: reusableTokenStream in class Analyzer
Returns: a TokenStream built from a StandardTokenizer filtered with StandardFilter, LowerCaseFilter, and StopFilter
Throws: java.io.IOException

Copyright © 2000-2016 Apache Software Foundation. All Rights Reserved.