public class ArabicLetterTokenizer
extends org.apache.lucene.analysis.LetterTokenizer
The standard LetterTokenizer fails on diacritics: it treats combining marks as token boundaries, so vocalized text is split apart. Similar handling is needed for Indic scripts, Hebrew, Thaana, and other scripts that use combining marks.
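To make the diacritics problem concrete, the sketch below checks how the JDK classifies an Arabic letter and an Arabic diacritic. It uses only `java.lang.Character`; the class name `DiacriticCategoryDemo` and its helper methods are hypothetical and are not part of Lucene.

```java
public class DiacriticCategoryDemo {

    // Letter-only check, the kind of test a purely letter-based tokenizer relies on.
    static boolean isLetterOnly(char c) {
        return Character.isLetter(c);
    }

    // True for characters in the Unicode NonspacingMark (Mn) category.
    static boolean isNonspacingMark(char c) {
        return Character.getType(c) == Character.NON_SPACING_MARK;
    }

    public static void main(String[] args) {
        char beh = '\u0628';   // U+0628 ARABIC LETTER BEH
        char fatha = '\u064E'; // U+064E ARABIC FATHA (a vowel diacritic)

        System.out.println("beh is a letter: " + isLetterOnly(beh));
        System.out.println("fatha is a letter: " + isLetterOnly(fatha));
        System.out.println("fatha is a nonspacing mark: " + isNonspacingMark(fatha));
    }
}
```

Because the diacritic is not a letter, a letter-only test ends the current token at every vowel mark.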
| Constructor and Description |
|---|
| ArabicLetterTokenizer(org.apache.lucene.util.AttributeSource.AttributeFactory factory, java.io.Reader in) |
| ArabicLetterTokenizer(org.apache.lucene.util.AttributeSource source, java.io.Reader in) |
| ArabicLetterTokenizer(java.io.Reader in) |
| Modifier and Type | Method and Description |
|---|---|
| protected boolean | isTokenChar(char c): Allows for Letter category or NonspacingMark category |
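The rule described for `isTokenChar` can be sketched without Lucene on the classpath. The example below is an illustration under stated assumptions, not the library's implementation: the class `LetterOrMarkSketch` and its `tokenize` helper are hypothetical, and the predicate is simply "letter or nonspacing mark" as the description states.

```java
import java.util.ArrayList;
import java.util.List;

public class LetterOrMarkSketch {

    // Assumed predicate: accept Letter-category or NonspacingMark-category characters.
    static boolean isTokenChar(char c) {
        return Character.isLetter(c)
                || Character.getType(c) == Character.NON_SPACING_MARK;
    }

    // Hypothetical helper: split text into maximal runs of accepted characters.
    // With allowMarks == false it falls back to a letter-only test for comparison.
    static List<String> tokenize(String text, boolean allowMarks) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean keep = allowMarks ? isTokenChar(c) : Character.isLetter(c);
            if (keep) {
                current.append(c);
            } else if (current.length() > 0) {
                // A rejected character closes the token being built.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // KAF, FATHA, TEH, FATHA, BEH, FATHA: one fully vocalized word.
        String word = "\u0643\u064E\u062A\u064E\u0628\u064E";
        System.out.println("letter-only tokens: " + tokenize(word, false).size());
        System.out.println("letter-or-mark tokens: " + tokenize(word, true).size());
    }
}
```

With the letter-only test the vocalized word breaks into one token per consonant; accepting nonspacing marks keeps it as a single token.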
Inherited methods:

end, incrementToken, next, next, normalize, reset

getOnlyUseNewAPI, reset, setOnlyUseNewAPI

addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, restoreState, toString

public ArabicLetterTokenizer(java.io.Reader in)
public ArabicLetterTokenizer(org.apache.lucene.util.AttributeSource source,
java.io.Reader in)
public ArabicLetterTokenizer(org.apache.lucene.util.AttributeSource.AttributeFactory factory,
java.io.Reader in)
Copyright © 2000-2016 Apache Software Foundation. All Rights Reserved.