public final class NGramTokenFilter extends TokenFilter
If you were using this TokenFilter to perform partial highlighting, that no longer works, because this filter does not update offsets. Instead, modify your analysis chain to use NGramTokenizer, and potentially override NGramTokenizer.isTokenChar(int) to perform pre-tokenization.
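The difference the note above describes can be sketched without Lucene on the classpath. The class below is a hypothetical, JDK-only illustration (the names `PreTokenizedNGrams`, `Gram`, and `grams` are not part of any library): it splits the text into runs of token characters using a caller-supplied predicate, which plays the role that `isTokenChar(int)` plays in the note, and records start and end offsets for every gram, which is the information partial highlighting depends on. For brevity it works on `char` values rather than code points.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical illustration only; this is not Lucene code. */
final class PreTokenizedNGrams {

    /** One gram plus the character offsets it covers in the original text. */
    static final class Gram {
        final String text;
        final int start; // inclusive
        final int end;   // exclusive

        Gram(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + "[" + start + "," + end + ")";
        }
    }

    /**
     * Splits text into runs of characters accepted by isTokenChar, then emits
     * every gram of minGram..maxGram characters inside each run, with offsets.
     * Simplification: operates on chars, so it assumes no surrogate pairs.
     */
    static List<Gram> grams(String text, int minGram, int maxGram, IntPredicate isTokenChar) {
        List<Gram> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (!isTokenChar.test(text.charAt(i))) {
                i++; // skip separators; grams never span them
                continue;
            }
            int runStart = i;
            while (i < text.length() && isTokenChar.test(text.charAt(i))) {
                i++;
            }
            int runEnd = i;
            for (int pos = runStart; pos < runEnd; pos++) {
                for (int n = minGram; n <= maxGram && pos + n <= runEnd; n++) {
                    out.add(new Gram(text.substring(pos, pos + n), pos, pos + n));
                }
            }
        }
        return out;
    }
}
```

Calling `PreTokenizedNGrams.grams("ab cd", 2, 2, Character::isLetterOrDigit)` yields two grams, `ab` covering offsets 0 to 2 and `cd` covering offsets 3 to 5; no gram crosses the space.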
AttributeSource.State
Modifier and Type | Field and Description |
---|---|
private CharacterUtils | charUtils |
private int | curCodePointCount |
private int | curGramSize |
private int | curPos |
private int | curPosInc |
private int | curPosLen |
private char[] | curTermBuffer |
private int | curTermLength |
static int | DEFAULT_MAX_NGRAM_SIZE |
static int | DEFAULT_MIN_NGRAM_SIZE |
private boolean | hasIllegalOffsets |
private int | maxGram |
private int | minGram |
private OffsetAttribute | offsetAtt |
private PositionIncrementAttribute | posIncAtt |
private PositionLengthAttribute | posLenAtt |
private CharTermAttribute | termAtt |
private int | tokEnd |
private int | tokStart |
input
DEFAULT_TOKEN_ATTRIBUTE_FACTORY
Constructor and Description |
---|
NGramTokenFilter(TokenStream input): Creates NGramTokenFilter with default min and max n-grams. |
NGramTokenFilter(TokenStream input, int minGram, int maxGram): Creates NGramTokenFilter with given min and max n-grams. |
Modifier and Type | Method and Description |
---|---|
boolean | incrementToken(): Advances the stream to the next token; returns true if a token was produced and false at end of stream. |
void | reset(): This method is called by a consumer before it begins consumption using TokenStream.incrementToken(). |
close, end
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
public static final int DEFAULT_MIN_NGRAM_SIZE
public static final int DEFAULT_MAX_NGRAM_SIZE
private final int minGram
private final int maxGram
private char[] curTermBuffer
private int curTermLength
private int curCodePointCount
private int curGramSize
private int curPos
private int curPosInc
private int curPosLen
private int tokStart
private int tokEnd
private boolean hasIllegalOffsets
private final CharacterUtils charUtils
private final CharTermAttribute termAtt
private final PositionIncrementAttribute posIncAtt
private final PositionLengthAttribute posLenAtt
private final OffsetAttribute offsetAtt
public NGramTokenFilter(TokenStream input, int minGram, int maxGram)
Parameters:
input - TokenStream holding the input to be tokenized
minGram - the smallest n-gram to generate
maxGram - the largest n-gram to generate
public NGramTokenFilter(TokenStream input)
Parameters:
input - TokenStream holding the input to be tokenized
public final boolean incrementToken() throws java.io.IOException
Specified by: incrementToken in class TokenStream
Throws: java.io.IOException
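As a rough picture of what repeated calls to incrementToken() produce for one input token, the following JDK-only sketch enumerates the grams of a single term for a given minGram and maxGram. It is not the Lucene implementation: the class name `NGramFilterSketch` is hypothetical, the emission order (by start position, then by increasing length) and the argument checks are assumptions about the filter's behavior, and attributes such as position increment and offsets are left out. Gram boundaries are computed in code points, in the spirit of the curCodePointCount field.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; this is not Lucene code. */
final class NGramFilterSketch {

    /**
     * Returns every gram of minGram..maxGram code points in term,
     * ordered by start position and then by length (assumed order).
     */
    static List<String> grams(String term, int minGram, int maxGram) {
        if (minGram < 1) {
            throw new IllegalArgumentException("minGram must be greater than zero");
        }
        if (minGram > maxGram) {
            throw new IllegalArgumentException("minGram must not be greater than maxGram");
        }
        List<String> out = new ArrayList<>();
        int codePoints = term.codePointCount(0, term.length());
        for (int pos = 0; pos < codePoints; pos++) {
            // char index at which the pos-th code point starts
            int from = term.offsetByCodePoints(0, pos);
            for (int n = minGram; n <= maxGram && pos + n <= codePoints; n++) {
                // char index just past n code points starting at 'from'
                int to = term.offsetByCodePoints(from, n);
                out.add(term.substring(from, to));
            }
        }
        return out;
    }
}
```

With a minimum of 1 and a maximum of 2, `NGramFilterSketch.grams("abc", 1, 2)` returns `[a, ab, b, bc, c]`. Counting in code points means a supplementary character encoded as a surrogate pair is never split across two grams.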
public void reset() throws java.io.IOException
Description copied from class: TokenFilter
This method is called by a consumer before it begins consumption using TokenStream.incrementToken().
Resets this stream to a clean state. Stateful implementations must implement this method so that they can be reused, just as if they had been created fresh.
If you override this method, always call super.reset(), otherwise some internal state will not be correctly reset (e.g., Tokenizer will throw IllegalStateException on further usage).
NOTE: The default implementation chains the call to the input TokenStream, so be sure to call super.reset() when overriding this method.
Overrides: reset in class TokenFilter
Throws: java.io.IOException
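The reset contract above can be illustrated with a small JDK-only model. All three classes below are hypothetical stand-ins, not Lucene types: `SketchStream` models an input stream, `SketchFilter` models a filter whose default reset() chains the call to its input, and `StatefulFilter` models a filter that keeps per-token state (here a single `curPos` counter) and therefore overrides reset(). The point of the sketch is only the call order: the override calls super.reset() so the input is reset too, then clears its own state.

```java
/** Hypothetical illustration only; these are not Lucene classes. */
final class ResetChainSketch {

    /** Stand-in for an input stream; counts how often it was reset. */
    static class SketchStream {
        int resets;

        void reset() {
            resets++;
        }
    }

    /** Stand-in for a filter: the default reset() chains to the input. */
    static class SketchFilter extends SketchStream {
        protected final SketchStream input;

        SketchFilter(SketchStream input) {
            this.input = input;
        }

        @Override
        void reset() {
            input.reset();
        }
    }

    /** A filter with state of its own that must be cleared before reuse. */
    static class StatefulFilter extends SketchFilter {
        int curPos = 7; // pretend this is left over from a previous use

        StatefulFilter(SketchStream input) {
            super(input);
        }

        @Override
        void reset() {
            super.reset(); // without this call the input would never be reset
            curPos = 0;    // then clear this filter's own state
        }
    }
}
```

After `new ResetChainSketch.StatefulFilter(source).reset()`, the source has been reset exactly once and `curPos` is 0, so the chain can be consumed again as if freshly created.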