private class SlowFuzzyTermsEnum.LinearFuzzyTermsEnum extends FilteredTermsEnum
FilteredTermsEnum.AcceptStatus
TermsEnum.SeekStatus
Modifier and Type | Field and Description |
---|---|
private BoostAttribute | boostAtt |
private int[] | d |
private int[] | p |
private BytesRef | prefixBytesRef |
private int[] | text |
private IntsRefBuilder | utf32 |
actualTerm, tenum
Constructor and Description |
---|
LinearFuzzyTermsEnum() Constructor for enumeration of all terms from the specified reader which share a prefix of length prefixLength with term and which have a fuzzy similarity > minSimilarity. |
Modifier and Type | Method and Description |
---|---|
protected FilteredTermsEnum.AcceptStatus | accept(BytesRef term) The termCompare method in FuzzyTermEnum uses Levenshtein distance to calculate the distance between the given term and the comparing term. |
private int | calcDistance(int[] target, int offset, int length) calcDistance returns the Levenshtein distance between the query term and the target term. |
private float | calcSimilarity(int edits, int m, int n) |
private int | calculateMaxDistance(int m) The max distance is the maximum Levenshtein distance for the text compared to some other value that results in a score that is better than the minimum similarity. |
attributes, docFreq, next, nextSeekTerm, ord, postings, seekCeil, seekExact, seekExact, seekExact, setInitialSeekTerm, term, termState, totalTermFreq
private int[] d
private int[] p
private final int[] text
private final BoostAttribute boostAtt
private final BytesRef prefixBytesRef
private final IntsRefBuilder utf32
public LinearFuzzyTermsEnum() throws java.io.IOException
Constructor for enumeration of all terms from the specified reader which share a prefix of length prefixLength with term and which have a fuzzy similarity > minSimilarity.
After calling the constructor the enumeration is already pointing to the first valid term if such a term exists.
java.io.IOException - If there is a low-level I/O error.
protected final FilteredTermsEnum.AcceptStatus accept(BytesRef term)
The termCompare method in FuzzyTermEnum uses Levenshtein distance to calculate the distance between the given term and the comparing term.
If the minSimilarity is >= 1.0, this uses the maxEdits as the comparison. Otherwise, this method uses the following logic to calculate similarity.
similarity = 1 - ((float)distance / (float) (prefixLength + Math.min(textlen, targetlen)));
where distance is the Levenshtein distance for the two words.
Specified by: accept in class FilteredTermsEnum
private final int calcDistance(int[] target, int offset, int length)
calcDistance returns the Levenshtein distance between the query term and the target term.
Embedded within this algorithm is a fail-fast Levenshtein distance algorithm. The fail-fast algorithm differs from the standard Levenshtein distance algorithm in that it aborts as soon as it discovers that the minimum distance between the words is greater than some threshold.
Levenshtein distance (also known as edit distance) is a measure of similarity between two strings where the distance is measured as the number of character deletions, insertions or substitutions required to transform one string to the other string.
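The fail-fast idea described above can be sketched as follows. This is a minimal illustration, not the code of calcDistance itself: the class name `FailFastLevenshtein`, the method `distance`, and the `maxDistance` parameter are hypothetical, and the use of `Integer.MAX_VALUE` as the "too far" sentinel is an assumption. The sketch relies on the fact that the smallest value in a row of the dynamic-programming table can never decrease in later rows, so once every entry of a row exceeds the threshold the final distance must exceed it too.

```java
final class FailFastLevenshtein {
    /**
     * Returns the Levenshtein distance between a and b (arrays of code points)
     * if it is at most maxDistance, otherwise Integer.MAX_VALUE (assumed sentinel).
     */
    static int distance(int[] a, int[] b, int maxDistance) {
        int m = a.length;
        int n = b.length;
        // The length difference is a lower bound on the distance.
        if (Math.abs(m - n) > maxDistance) {
            return Integer.MAX_VALUE;
        }
        int[] prev = new int[m + 1]; // previous row of the DP table
        int[] curr = new int[m + 1]; // row currently being filled
        for (int i = 0; i <= m; i++) {
            prev[i] = i;
        }
        for (int j = 1; j <= n; j++) {
            curr[0] = j;
            int best = curr[0]; // smallest value seen in this row
            for (int i = 1; i <= m; i++) {
                int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
                curr[i] = Math.min(Math.min(prev[i] + 1, curr[i - 1] + 1), prev[i - 1] + cost);
                best = Math.min(best, curr[i]);
            }
            // Fail fast: no later row can contain a value below 'best'.
            if (best > maxDistance) {
                return Integer.MAX_VALUE;
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m] > maxDistance ? Integer.MAX_VALUE : prev[m];
    }
}
```

For example, `FailFastLevenshtein.distance("kitten".codePoints().toArray(), "sitting".codePoints().toArray(), 3)` returns 3, while the same call with a threshold of 2 returns the sentinel.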
target - the target word or phrase
offset - the offset at which to start the comparison
length - the length of what's left of the string to compare
private float calcSimilarity(int edits, int m, int n)
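No description is given for calcSimilarity. Assuming it applies the similarity formula stated under accept, with m and n being the lengths of the two compared strings, the calculation might look like the sketch below. The class name `SimilaritySketch` and the explicit `prefixLength` parameter are hypothetical; in the real class the prefix length is presumably held in a field rather than passed in.

```java
final class SimilaritySketch {
    /**
     * similarity = 1 - distance / (prefixLength + min(m, n)),
     * following the formula documented for accept.
     * Assumes prefixLength + min(m, n) is greater than zero.
     */
    static float similarity(int edits, int prefixLength, int m, int n) {
        return 1.0f - ((float) edits / (float) (prefixLength + Math.min(m, n)));
    }
}
```

Note that, by this formula, the result is negative whenever the number of edits exceeds the prefix length plus the length of the shorter string, for example `similarity(5, 0, 2, 9)`.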
private int calculateMaxDistance(int m)
m - the length of the "other value"
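One way to obtain such a bound is to invert the similarity formula documented for accept: similarity > minSimilarity holds only when distance < (1 - minSimilarity) * (prefixLength + min(textLength, m)). The sketch below truncates that product to an int. It is an illustration under assumptions, not the method's actual body: `MaxDistanceSketch` and its parameters are hypothetical names, and any additional cap such as maxEdits is left out.

```java
final class MaxDistanceSketch {
    /**
     * Upper bound on the edit distance that can still score above minSimilarity,
     * derived from similarity = 1 - distance / (prefixLength + min(textLength, m)).
     */
    static int maxDistance(float minSimilarity, int prefixLength, int textLength, int m) {
        int denominator = prefixLength + Math.min(textLength, m);
        // Truncation keeps the largest whole number of edits not above the bound.
        return (int) ((1.0f - minSimilarity) * denominator);
    }
}
```

With minSimilarity 0.75, a prefix length of 2, a text length of 4 and m = 10, the product is 0.25 * 6 = 1.5, so the sketch yields 1. When the product is a whole number, the returned distance gives a similarity exactly equal to minSimilarity, so the bound is slightly loose and the caller still needs the final similarity check.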