public class FuzzyTermsEnum extends TermsEnum
Term enumerations are always ordered by BytesRef.compareTo(org.apache.lucene.util.BytesRef). Each term in the enumeration is greater than all that precede it.
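BytesRef.compareTo compares the UTF-8 bytes of two terms as unsigned values, so "each term is greater than all that precede it" can be checked one adjacent pair at a time. The sketch below is a stdlib-only illustration, not Lucene code: TermOrder is a hypothetical name, Java strings stand in for BytesRef, and Java 9+ is assumed for Arrays.compareUnsigned.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: models unsigned byte-wise term order on UTF-8 encoded strings.
final class TermOrder {

    // Compares the UTF-8 encodings of a and b byte by byte, treating bytes as unsigned.
    static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        return Arrays.compareUnsigned(x, y);
    }

    // True if every term is strictly greater than the one before it;
    // checking adjacent pairs suffices because the order is transitive.
    static boolean isStrictlyIncreasing(List<String> terms) {
        for (int i = 1; i < terms.size(); i++) {
            if (compareUtf8(terms.get(i - 1), terms.get(i)) >= 0) {
                return false;
            }
        }
        return true;
    }
}
```

Comparing bytes as unsigned matters for non-ASCII text: the lead byte of "\u00e9" (0xC3) sorts after "z" (0x7A), whereas a signed comparison would put it first.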
Modifier and Type | Class and Description |
---|---|
private class | FuzzyTermsEnum.AutomatonFuzzyTermsEnum: Implement fuzzy enumeration with Terms.intersect. |
static interface | FuzzyTermsEnum.LevenshteinAutomataAttribute: Reuses compiled automata across different segments, because they are independent of the index. |
static class | FuzzyTermsEnum.LevenshteinAutomataAttributeImpl: Stores compiled automata as a list (indexed by edit distance). |
Nested classes/interfaces inherited from class TermsEnum: TermsEnum.SeekStatus
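The two LevenshteinAutomata nested types exist so that automata compiled for one segment can be reused for the next: they depend only on the query term, and the implementation keeps them in a list indexed by edit distance. A minimal sketch of that caching idea, assuming a hypothetical DistanceCache class and a caller-supplied builder function in place of real automaton compilation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical per-query cache; index i holds the "automaton" built for i edits.
final class DistanceCache<A> {
    private final List<A> byDistance = new ArrayList<>();
    private final IntFunction<A> builder;   // builds the automaton for one edit distance
    int builds = 0;                         // counts builder calls, for illustration only

    DistanceCache(IntFunction<A> builder) {
        this.builder = builder;
    }

    // Returns entries for distances 0..maxDistance, building only those not cached yet.
    List<A> upTo(int maxDistance) {
        for (int d = byDistance.size(); d <= maxDistance; d++) {
            byDistance.add(builder.apply(d));
            builds++;
        }
        // Defensive copy so callers are unaffected if the cache grows later.
        return new ArrayList<>(byDistance.subList(0, maxDistance + 1));
    }
}
```

Calling upTo(2) once per segment builds three entries for the first segment and none for later ones, which is the saving the attribute is meant to provide.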
Modifier and Type | Field and Description |
---|---|
private BoostAttribute | actualBoostAtt |
private TermsEnum | actualEnum |
private BoostAttribute | boostAtt |
private float | bottom |
private BytesRef | bottomTerm |
private FuzzyTermsEnum.LevenshteinAutomataAttribute | dfaAtt |
private MaxNonCompetitiveBoostAttribute | maxBoostAtt |
protected int | maxEdits |
protected float | minSimilarity |
private BytesRef | queuedBottom |
protected boolean | raw |
protected int | realPrefixLength |
protected float | scale_factor |
private Term | term |
protected int | termLength |
protected Terms | terms |
protected int[] | termText |
private boolean | transpositions |
Constructor and Description |
---|
FuzzyTermsEnum(Terms terms, AttributeSource atts, Term term, float minSimilarity, int prefixLength, boolean transpositions): Constructor for enumeration of all terms from specified reader which share a prefix of length prefixLength with term and which have a fuzzy similarity > minSimilarity. |
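The constructor accepts a term when it shares the first prefixLength characters with the query term and is similar enough. The brute-force sketch below spells that rule out over an in-memory list. It is not how FuzzyTermsEnum works internally (the class intersects Levenshtein automata with the term dictionary), and several details are assumptions: an integer minSimilarity of at least 1 is read as a maximum edit count; a fraction is converted to floor((1 - minSimilarity) * termLength) edits; with transpositions, swapping two adjacent characters costs one edit (optimal string alignment); and similarity is 1 - edits / min(candidateLength, termLength). FuzzyReference and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical brute-force reference filter (not Lucene's implementation).
final class FuzzyReference {

    // Optimal string alignment distance over code points: insert, delete, substitute,
    // and (if enabled) swap two adjacent code points, each costing one edit.
    static int distance(int[] a, int[] b, boolean transpositions) {
        int[][] d = new int[a.length + 1][b.length + 1];
        for (int i = 0; i <= a.length; i++) d[i][0] = i;
        for (int j = 0; j <= b.length; j++) d[0][j] = j;
        for (int i = 1; i <= a.length; i++) {
            for (int j = 1; j <= b.length; j++) {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + cost);
                if (transpositions && i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length][b.length];
    }

    // Assumed reading of minSimilarity: >= 1 is a raw edit count, a fraction is converted.
    static int maxEdits(float minSimilarity, int termLength) {
        return minSimilarity >= 1f ? (int) minSimilarity : (int) ((1.0 - minSimilarity) * termLength);
    }

    static List<String> matches(List<String> candidates, String term, float minSimilarity,
                                int prefixLength, boolean transpositions) {
        int[] query = term.codePoints().toArray();
        int prefix = Math.min(prefixLength, query.length);      // prefix cannot exceed the term
        float effectiveMin = minSimilarity >= 1f ? 0f : minSimilarity; // raw mode: edits only
        int maxEdits = maxEdits(minSimilarity, query.length);
        List<String> out = new ArrayList<>();
        for (String candidate : candidates) {
            int[] cand = candidate.codePoints().toArray();
            if (cand.length < prefix || !Arrays.equals(cand, 0, prefix, query, 0, prefix)) {
                continue;                                        // required prefix not shared
            }
            // The shared prefix costs no edits, so only the suffixes are compared.
            int ed = distance(Arrays.copyOfRange(query, prefix, query.length),
                              Arrays.copyOfRange(cand, prefix, cand.length), transpositions);
            if (ed > maxEdits) continue;
            if (ed == 0) { out.add(candidate); continue; }       // exact match
            int denom = Math.min(cand.length, query.length);
            if (denom == 0) continue;
            float similarity = 1f - (float) ed / denom;
            if (similarity > effectiveMin) out.add(candidate);
        }
        return out;
    }
}
```

For the term "lucene" with one edit allowed, "lcuene" is accepted only when transpositions is true, and a prefixLength of 2 rejects it because "lc" differs from "lu".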
Modifier and Type | Method and Description |
---|---|
private void | bottomChanged(BytesRef lastTerm, boolean init): Fired when the max non-competitive boost has changed. |
private float | calculateMaxBoost(int nEdits) |
int | docFreq(): Returns the number of documents containing the current term. |
protected TermsEnum | getAutomatonEnum(int editDistance, BytesRef lastTerm): Return an automata-based enum for matching up to editDistance from lastTerm, if possible. |
float | getMinSimilarity() |
float | getScaleFactor() |
private java.util.List<CompiledAutomaton> | initAutomata(int maxDistance): Initialize Levenshtein DFAs up to maxDistance, if possible. |
private int | initialMaxDistance(float minimumSimilarity, int termLen) |
protected void | maxEditDistanceChanged(BytesRef lastTerm, int maxEdits, boolean init) |
BytesRef | next(): Increments the iteration to the next BytesRef in the iterator. |
long | ord(): Returns ordinal position for current term. |
PostingsEnum | postings(PostingsEnum reuse, int flags): Get PostingsEnum for the current term, with control over whether freqs, positions, offsets or payloads are required. |
TermsEnum.SeekStatus | seekCeil(BytesRef text): Seeks to the specified term, if it exists, or to the next (ceiling) term. |
boolean | seekExact(BytesRef text): Attempts to seek to the exact term, returning true if the term is found. |
void | seekExact(BytesRef term, TermState state): Expert: Seeks a specific position by TermState previously obtained from TermsEnum.termState(). |
void | seekExact(long ord): Seeks to the specified term by ordinal (position) as previously returned by TermsEnum.ord(). |
protected void | setEnum(TermsEnum actualEnum): Swap in a new actual enum to proxy to. |
BytesRef | term(): Returns current term. |
TermState | termState(): Expert: Returns the TermsEnum's internal state to position the TermsEnum without re-seeking the term dictionary. |
long | totalTermFreq(): Returns the total number of occurrences of this term across all documents (the sum of the freq() for each doc that has this term). |
Methods inherited from class TermsEnum: attributes, postings
private TermsEnum actualEnum
private BoostAttribute actualBoostAtt
private final BoostAttribute boostAtt
private final MaxNonCompetitiveBoostAttribute maxBoostAtt
private final FuzzyTermsEnum.LevenshteinAutomataAttribute dfaAtt
private float bottom
private BytesRef bottomTerm
protected final float minSimilarity
protected final float scale_factor
protected final int termLength
protected int maxEdits
protected final boolean raw
protected final Terms terms
private final Term term
protected final int[] termText
protected final int realPrefixLength
private final boolean transpositions
private BytesRef queuedBottom
public FuzzyTermsEnum(Terms terms, AttributeSource atts, Term term, float minSimilarity, int prefixLength, boolean transpositions) throws java.io.IOException
Constructor for enumeration of all terms from the specified reader which share a prefix of length prefixLength with term and which have a fuzzy similarity > minSimilarity.
After calling the constructor the enumeration is already pointing to the first valid term if such a term exists.
Parameters:
terms
- Delivers terms.
atts
- AttributeSource created by the rewrite method of MultiTermQuery that contains information about competitive boosts during rewrite. It is also used to cache DFAs between segment transitions.
term
- Pattern term.
minSimilarity
- Minimum required similarity for terms from the reader. Pass an integer value representing edit distance. Passing a fraction is deprecated.
prefixLength
- Length of required common prefix. Default value is 0.
Throws:
java.io.IOException
- if there is a low-level IO error

protected TermsEnum getAutomatonEnum(int editDistance, BytesRef lastTerm) throws java.io.IOException
Return an automata-based enum for matching up to editDistance from lastTerm, if possible.
Throws:
java.io.IOException
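getAutomatonEnum yields terms within editDistance of the query, starting from lastTerm so that a rebuilt enum can continue where the previous one stopped. Since the nested AutomatonFuzzyTermsEnum is implemented with Terms.intersect, the sketch below assumes the start term is exclusive (only terms strictly after lastTerm are returned). A sorted TreeSet and a plain Levenshtein check stand in for the term dictionary and the compiled automaton; ResumableFuzzy is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model of "terms within editDistance, resuming after lastTerm".
final class ResumableFuzzy {

    // Plain Levenshtein distance (no transpositions) using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(prev[j] + 1, cur[j - 1] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // Terms within editDistance of query, strictly after lastTerm (null = from the start).
    // String order stands in for BytesRef order; the two agree for ASCII terms.
    static List<String> enumFrom(TreeSet<String> dictionary, String query, int editDistance, String lastTerm) {
        Iterable<String> tail = lastTerm == null ? dictionary : dictionary.tailSet(lastTerm, false);
        List<String> out = new ArrayList<>();
        for (String t : tail) {
            if (levenshtein(query, t) <= editDistance) out.add(t);
        }
        return out;
    }
}
```

Resuming after "cot" with one edit allowed skips "cat" and "cot" and returns only "cut"; widening to two edits after "cut" also admits "kit".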
private java.util.List<CompiledAutomaton> initAutomata(int maxDistance)
Initialize Levenshtein DFAs up to maxDistance, if possible.
protected void setEnum(TermsEnum actualEnum)
Swap in a new actual enum to proxy to.
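setEnum replaces the underlying enum that FuzzyTermsEnum forwards calls to, so the object callers hold stays the same while the source of terms changes (presumably when the edit budget changes). The sketch below shows that delegation pattern with java.util.Iterator; SwappableIterator is a hypothetical name and the sketch ignores attributes and seeking.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical proxy whose underlying iterator can be replaced mid-iteration.
final class SwappableIterator<T> implements Iterator<T> {
    private Iterator<T> actual;

    SwappableIterator(Iterator<T> initial) {
        this.actual = initial;
    }

    // Swap in a new iterator to proxy to; all later calls go to it.
    void setDelegate(Iterator<T> replacement) {
        this.actual = replacement;
    }

    @Override
    public boolean hasNext() {
        return actual.hasNext();
    }

    @Override
    public T next() {
        if (!actual.hasNext()) throw new NoSuchElementException();
        return actual.next();
    }
}
```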
private void bottomChanged(BytesRef lastTerm, boolean init) throws java.io.IOException
Fired when the max non-competitive boost has changed.
Throws:
java.io.IOException
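When the max non-competitive boost (the "bottom" of the collector's queue) rises, terms that need many edits can no longer compete, so the enum can lower maxEdits. The sketch below is an illustrative reconstruction, not quoted API. It assumes the best boost for n edits is (1 - n / termLength - minSimilarity) * scale_factor with scale_factor = 1 / (1 - minSimilarity), mirroring the calculateMaxBoost and scale_factor members. It also simplifies the comparison to >= regardless of where lastTerm falls relative to the bottom term. EditBudget is hypothetical.

```java
// Hypothetical reconstruction of how a rising bottom boost can shrink the edit budget.
final class EditBudget {

    // Best boost a term with nEdits edits could receive (assumed scoring formula).
    static float maxBoost(int nEdits, int termLength, float minSimilarity) {
        float scaleFactor = 1.0f / (1.0f - minSimilarity);
        float similarity = 1.0f - ((float) nEdits / termLength);
        return (similarity - minSimilarity) * scaleFactor;
    }

    // Drop edits while even the best term at that distance could not beat the bottom.
    static int shrink(int maxEdits, float bottom, int termLength, float minSimilarity) {
        while (maxEdits > 0 && bottom >= maxBoost(maxEdits, termLength, minSimilarity)) {
            maxEdits--;
        }
        return maxEdits;
    }
}
```

With a 4-character term and minSimilarity 0, two edits can reach at most 0.5 and one edit 0.75, so a bottom of 0.6 cuts the budget from 2 to 1.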
protected void maxEditDistanceChanged(BytesRef lastTerm, int maxEdits, boolean init) throws java.io.IOException
Throws:
java.io.IOException
private int initialMaxDistance(float minimumSimilarity, int termLen)
private float calculateMaxBoost(int nEdits)
public BytesRef next() throws java.io.IOException
Description copied from interface: BytesRefIterator
Increments the iteration to the next BytesRef in the iterator. Returns the resulting BytesRef or null if the end of the iterator is reached. The returned BytesRef may be re-used across calls to next. After this method returns null, do not call it again: the results are undefined.
Returns:
the next BytesRef in the iterator or null if the end of the iterator is reached.
Throws:
java.io.IOException
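- If there is a low-level I/O error.

public int docFreq() throws java.io.IOException
Description copied from class: TermsEnum
Returns the number of documents containing the current term. Do not call this when the enum is unpositioned (see TermsEnum.SeekStatus.END).

public long totalTermFreq() throws java.io.IOException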
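Description copied from class: TermsEnum
Returns the total number of occurrences of this term across all documents (the sum of the freq() for each doc that has this term).
Specified by:
totalTermFreq in class TermsEnum
Throws:
java.io.IOException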
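The next() contract (the returned buffer may be reused, null marks the end, and next() is not called again afterwards) and the docFreq/totalTermFreq statistics can be exercised together on a toy in-memory enum. ToyTermsEnum, its shared byte buffer, and the term-to-postings map are hypothetical stand-ins. A caller that keeps terms copies the bytes, which is what BytesRef.deepCopyOf is for in Lucene.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy enum over term -> (docId -> freq) postings.
final class ToyTermsEnum {
    private final Iterator<Map.Entry<String, Map<Integer, Integer>>> it;
    private byte[] buffer = new byte[16];   // shared across next() calls, like a reused BytesRef
    private int length;
    private Map<Integer, Integer> currentPostings;

    ToyTermsEnum(TreeMap<String, Map<Integer, Integer>> index) {
        this.it = index.entrySet().iterator();
    }

    // Returns the shared buffer holding the next term, or null at the end.
    byte[] next() {
        if (!it.hasNext()) return null;
        Map.Entry<String, Map<Integer, Integer>> e = it.next();
        byte[] utf8 = e.getKey().getBytes(StandardCharsets.UTF_8);
        if (utf8.length > buffer.length) buffer = new byte[utf8.length]; // grow when needed
        System.arraycopy(utf8, 0, buffer, 0, utf8.length);               // overwrite in place
        length = utf8.length;
        currentPostings = e.getValue();
        return buffer;
    }

    int length() { return length; }

    // Number of documents containing the current term.
    int docFreq() { return currentPostings.size(); }

    // Sum of the per-document frequencies of the current term.
    long totalTermFreq() {
        long sum = 0;
        for (int freq : currentPostings.values()) sum += freq;
        return sum;
    }

    // Consumes the enum, copying each term because the buffer is overwritten by the next call.
    static List<String> drain(ToyTermsEnum terms) {
        List<String> out = new ArrayList<>();
        for (byte[] t = terms.next(); t != null; t = terms.next()) {
            byte[] copy = Arrays.copyOf(t, terms.length());
            out.add(new String(copy, StandardCharsets.UTF_8) + ":" + terms.docFreq() + "/" + terms.totalTermFreq());
        }
        return out;
    }
}
```

A term that occurs twice in document 0 and once in document 3 has docFreq 2 and totalTermFreq 3.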
public PostingsEnum postings(PostingsEnum reuse, int flags) throws java.io.IOException
Description copied from class: TermsEnum
Get PostingsEnum for the current term, with control over whether freqs, positions, offsets or payloads are required. Do not call this when the enum is unpositioned. This method may return null if the postings information required is not available from the index.
NOTE: the returned iterator may return deleted documents, so deleted documents have to be checked on top of the PostingsEnum.
Specified by:
postings in class TermsEnum
Parameters:
reuse
- pass a prior PostingsEnum for possible reuse
flags
- specifies which optional per-document values you require; see PostingsEnum.FREQS
Throws:
java.io.IOException
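Because the postings iterator may still return deleted documents, the consumer filters them itself. A minimal sketch, assuming postings arrive as an int array of document IDs and live documents are a java.util.BitSet (Lucene exposes them through its own Bits interface); a null live set is treated as "no deletions", mirroring the usual Lucene convention. LiveDocsFilter is a hypothetical name.

```java
import java.util.BitSet;
import java.util.stream.IntStream;

// Hypothetical helper: keep only documents that are still live.
final class LiveDocsFilter {

    // liveDocs == null is treated as "no deletions" (assumed convention).
    static int[] liveOnly(int[] postings, BitSet liveDocs) {
        if (liveDocs == null) return postings.clone();
        return IntStream.of(postings).filter(liveDocs::get).toArray();
    }
}
```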
public void seekExact(BytesRef term, TermState state) throws java.io.IOException
Description copied from class: TermsEnum
Expert: Seeks a specific position by TermState previously obtained from TermsEnum.termState(). Callers should maintain the TermState to use this method. Low-level implementations may position the TermsEnum without re-seeking the term dictionary.
Seeking by TermState should only be used iff the state was obtained from the same TermsEnum instance.
NOTE: Using this method with an incompatible TermState might leave this TermsEnum in an undefined state. On a segment level TermState instances are compatible only iff the source and the target TermsEnum operate on the same field. If operating on segment level, TermState instances must not be used across segments.
NOTE: A seek by TermState might not restore the AttributeSource's state. AttributeSource states must be maintained separately if this method is used.
public TermState termState() throws java.io.IOException
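Description copied from class: TermsEnum
Expert: Returns the TermsEnum's internal state to position the TermsEnum without re-seeking the term dictionary.
NOTE: A seek by TermState might not capture the AttributeSource's state. Callers must maintain the AttributeSource states separately.
Specified by:
termState in class TermsEnum
Throws:
java.io.IOException
See Also:
TermState, TermsEnum.seekExact(BytesRef, TermState)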
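termState() and seekExact(BytesRef, TermState) form a save/restore pair: the saved state lets the enum reposition without another dictionary lookup, and it is only valid for the instance that produced it. The sketch below models that over a sorted array. ToyStateEnum and its Position state are hypothetical, and the explicit rejection of foreign states stands in for the undefined behavior the documentation warns about.

```java
import java.util.Arrays;

// Hypothetical enum over a sorted term array with an opaque, instance-bound state.
final class ToyStateEnum {

    // Opaque state: remembers which enum produced it and the position there.
    static final class Position {
        private final ToyStateEnum owner;
        private final int index;

        private Position(ToyStateEnum owner, int index) {
            this.owner = owner;
            this.index = index;
        }
    }

    private final String[] terms;   // sorted ascending
    private int index = -1;         // -1 means unpositioned
    int dictionaryLookups = 0;      // counts binary searches, for illustration only

    ToyStateEnum(String... sortedTerms) {
        this.terms = sortedTerms;
    }

    // Regular seek: performs a dictionary lookup.
    boolean seekExact(String text) {
        dictionaryLookups++;
        int i = Arrays.binarySearch(terms, text);
        index = i >= 0 ? i : -1;    // unpositioned when the term is missing
        return i >= 0;
    }

    String term() { return terms[index]; }

    Position termState() { return new Position(this, index); }

    // Restores a saved position without another dictionary lookup.
    void seekExact(String text, Position state) {
        if (state.owner != this) {
            throw new IllegalArgumentException("state was obtained from a different enum");
        }
        if (state.index < 0 || !terms[state.index].equals(text)) {
            throw new IllegalArgumentException("state does not match the given term");
        }
        index = state.index;
    }
}
```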
public long ord() throws java.io.IOException
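Description copied from class: TermsEnum
Returns ordinal position for current term. This is an optional method (the codec may throw UnsupportedOperationException). Do not call this when the enum is unpositioned.

public boolean seekExact(BytesRef text) throws java.io.IOException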
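Description copied from class: TermsEnum
Attempts to seek to the exact term, returning true if the term is found. For some codecs, seekExact may be substantially faster than TermsEnum.seekCeil(org.apache.lucene.util.BytesRef).

public TermsEnum.SeekStatus seekCeil(BytesRef text) throws java.io.IOException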
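Description copied from class: TermsEnum
Seeks to the specified term, if it exists, or to the next (ceiling) term.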
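seekCeil positions the enum on the requested term or, if it is absent, on the smallest term greater than it. Over a sorted array that is a binary search whose insertion point gives the ceiling. The sketch below reports the outcome with a local enum whose constants are named after TermsEnum.SeekStatus (FOUND, NOT_FOUND, END); the CeilSeeker class itself is hypothetical, and its seekExact simply checks for FOUND and treats a miss as unpositioned.

```java
import java.util.Arrays;

// Hypothetical ceiling seek over a sorted term array.
final class CeilSeeker {
    enum Status { FOUND, NOT_FOUND, END }   // named after TermsEnum.SeekStatus constants

    private final String[] terms;           // sorted ascending
    private int index = -1;                 // -1 means unpositioned

    CeilSeeker(String... sortedTerms) {
        this.terms = sortedTerms;
    }

    Status seekCeil(String text) {
        int i = Arrays.binarySearch(terms, text);
        if (i >= 0) {
            index = i;
            return Status.FOUND;
        }
        int insertion = -i - 1;             // index of the first term greater than text
        if (insertion == terms.length) {
            index = -1;                     // no ceiling: past the last term
            return Status.END;
        }
        index = insertion;
        return Status.NOT_FOUND;
    }

    boolean seekExact(String text) {
        boolean found = seekCeil(text) == Status.FOUND;
        if (!found) index = -1;             // treat a miss as unpositioned
        return found;
    }

    String term() { return terms[index]; }
}
```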
public void seekExact(long ord) throws java.io.IOException
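Description copied from class: TermsEnum
Seeks to the specified term by ordinal (position) as previously returned by TermsEnum.ord(). The target ord may be before or after the current ord, and must be within bounds.

public BytesRef term() throws java.io.IOException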
Description copied from class: TermsEnum
Returns current term.
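ord() and seekExact(long) are inverse operations on enums that support ordinals: the ordinal is the term's position in sort order, and seeking to a previously returned ordinal, backwards or forwards and within bounds, restores that term. A minimal sketch over a sorted array; OrdinalEnum is hypothetical, and its UnsupportedOperationException branch models a codec without ordinal support.

```java
// Hypothetical enum with optional ordinal support over a sorted term array.
final class OrdinalEnum {
    private final String[] terms;          // sorted ascending
    private final boolean supportsOrds;    // models codecs where ord() is unsupported
    private int index = -1;                // -1 means unpositioned

    OrdinalEnum(boolean supportsOrds, String... sortedTerms) {
        this.supportsOrds = supportsOrds;
        this.terms = sortedTerms;
    }

    // Advances to the next term, or returns null once the end is reached.
    String next() {
        if (index + 1 >= terms.length) {
            index = terms.length;
            return null;
        }
        return terms[++index];
    }

    long ord() {
        if (!supportsOrds) throw new UnsupportedOperationException("ordinals not supported");
        return index;
    }

    // Seeks by a previously returned ordinal; it may lie before or after the current one.
    void seekExact(long ord) {
        if (!supportsOrds) throw new UnsupportedOperationException("ordinals not supported");
        if (ord < 0 || ord >= terms.length) throw new IndexOutOfBoundsException("ord out of bounds: " + ord);
        index = (int) ord;
    }

    String term() { return terms[index]; }
}
```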
public float getMinSimilarity()
public float getScaleFactor()