public class TokenStreamToAutomaton
extends java.lang.Object

Consumes a TokenStream and creates an Automaton where the transition labels are UTF8 bytes (or Unicode code points if unicodeArcs is true) from the TermToBytesRefAttribute. Between tokens we insert POS_SEP and for holes we insert HOLE.

Modifier and Type | Class and Description |
---|---|
private static class | TokenStreamToAutomaton.Position |
private static class | TokenStreamToAutomaton.Positions |
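The class description can be illustrated for the simplest case, a linear stream in which every token has position length 1. The sketch below is a minimal, self-contained model, not Lucene's implementation: it lists the labels such a stream would spell out, inserting POS_SEP between adjacent tokens and, for each skipped position, a HOLE arc followed by a separator. The label values 0x1f and 0x1e, the `LinearLabelSketch` and `Token` names, and the exact arc order around a hole are assumptions made for illustration. The `preservePositionIncrements` parameter mimics the setter of the same name by collapsing gaps when it is false.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LinearLabelSketch {
    // Assumed label values, for illustration only; they are not given on this page.
    static final int POS_SEP = 0x1f;
    static final int HOLE = 0x1e;

    /** A term plus its position increment relative to the previous token. */
    static final class Token {
        final String term;
        final int posInc;
        Token(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }
    }

    /**
     * Label sequence for a linear stream: every token has position length 1 and a
     * position increment of at least 1. A simplified model, not Lucene's code.
     */
    static List<Integer> labels(List<Token> tokens, boolean preservePositionIncrements) {
        List<Integer> out = new ArrayList<>();
        boolean first = true;
        for (Token t : tokens) {
            if (t.posInc < 1) {
                throw new IllegalArgumentException("this sketch needs posInc >= 1");
            }
            // Without preserved increments, a gap collapses to an ordinary step.
            int posInc = preservePositionIncrements ? t.posInc : 1;
            if (!first) {
                out.add(POS_SEP); // separator after the previous token
            }
            // Each skipped position becomes a HOLE arc followed by a separator.
            for (int i = 1; i < posInc; i++) {
                out.add(HOLE);
                out.add(POS_SEP);
            }
            for (byte b : t.term.getBytes(StandardCharsets.UTF_8)) {
                out.add(b & 0xff); // each UTF-8 byte as an unsigned label
            }
            first = false;
        }
        return out;
    }

    public static void main(String[] args) {
        // "cat in hat" after a stop filter drops "in": "hat" has position increment 2.
        List<Token> tokens = List.of(new Token("cat", 1), new Token("hat", 2));
        System.out.println(labels(tokens, true));  // [99, 97, 116, 31, 30, 31, 104, 97, 116]
        System.out.println(labels(tokens, false)); // [99, 97, 116, 31, 104, 97, 116]
    }
}
```

With holes preserved, a matching path has to cross a HOLE arc where the stop word used to be; with them collapsed, "cat" and "hat" are treated as adjacent.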

Modifier and Type | Field and Description |
---|---|
static int | HOLE: We add this arc to represent a hole. |
static int | POS_SEP: We create this transition between two adjacent tokens. |
private boolean | preservePositionIncrements |
private boolean | unicodeArcs |

Constructor and Description |
---|
TokenStreamToAutomaton(): Sole constructor. |

Modifier and Type | Method and Description |
---|---|
private static void | addHoles(Automaton.Builder builder, RollingBuffer<TokenStreamToAutomaton.Position> positions, int pos) |
protected BytesRef | changeToken(BytesRef in): Subclass and implement this if you need to change the token (such as escaping certain bytes) before it's turned into a graph. |
void | setPreservePositionIncrements(boolean enablePositionIncrements): Whether to generate holes in the automaton for missing positions, true by default. |
void | setUnicodeArcs(boolean unicodeArcs): Whether to make transition labels Unicode code points instead of UTF8 bytes, false by default. |
Automaton | toAutomaton(TokenStream in): Pulls the graph (including PositionLengthAttribute) from the provided TokenStream, and creates the corresponding automaton where arcs are bytes (or Unicode code points if unicodeArcs = true) from each term. |
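The difference `setUnicodeArcs` makes shows up on a single term. The hypothetical helper below (`ArcLabelSketch.termLabels`, not part of Lucene) computes the labels one term would contribute in each mode: one arc per UTF-8 byte by default, or one arc per Unicode code point when the flag is true. It is a sketch of the labeling only and builds no automaton.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ArcLabelSketch {
    /** Labels one term would contribute: UTF-8 bytes by default, code points if unicodeArcs. */
    static int[] termLabels(String term, boolean unicodeArcs) {
        if (unicodeArcs) {
            return term.codePoints().toArray(); // one label per Unicode code point
        }
        byte[] utf8 = term.getBytes(StandardCharsets.UTF_8);
        int[] labels = new int[utf8.length];
        for (int i = 0; i < utf8.length; i++) {
            labels[i] = utf8[i] & 0xff; // one label per byte, read as unsigned
        }
        return labels;
    }

    public static void main(String[] args) {
        String term = "n\u00e9"; // "n" followed by U+00E9 (e with acute accent)
        System.out.println(Arrays.toString(termLabels(term, false))); // [110, 195, 169]
        System.out.println(Arrays.toString(termLabels(term, true)));  // [110, 233]
    }
}
```

A character such as é (U+00E9) costs two byte arcs but a single code-point arc; a character outside the Basic Multilingual Plane costs four byte arcs and still one code-point arc.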
private boolean preservePositionIncrements
private boolean unicodeArcs
public static final int POS_SEP
public static final int HOLE
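
public void setPreservePositionIncrements(boolean enablePositionIncrements)
Whether to generate holes in the automaton for missing positions, true by default.

public void setUnicodeArcs(boolean unicodeArcs)
Whether to make transition labels Unicode code points instead of UTF8 bytes, false by default.

protected BytesRef changeToken(BytesRef in)
Subclass and implement this if you need to change the token (such as escaping certain bytes) before it's turned into a graph.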
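As one illustration of what a `changeToken` override might do, the sketch below escapes term bytes that could be mistaken for the separator or hole labels. It works on a plain `byte[]` rather than `BytesRef` so it runs without Lucene. The label values, the `ESCAPE` byte, and the `EscapingSketch` name are assumptions, and the prefix-escape scheme is one possible choice, not the one any particular Lucene subclass uses.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class EscapingSketch {
    // Assumed label values, for illustration only; they are not given on this page.
    static final int POS_SEP = 0x1f;
    static final int HOLE = 0x1e;
    // Hypothetical escape byte chosen for this sketch.
    static final int ESCAPE = 0x1d;

    /**
     * Stand-in for an overridden changeToken, on byte[] instead of BytesRef:
     * every byte equal to POS_SEP, HOLE, or ESCAPE is prefixed with ESCAPE.
     */
    static byte[] changeToken(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
        for (byte b : in) {
            int v = b & 0xff;
            if (v == POS_SEP || v == HOLE || v == ESCAPE) {
                out.write(ESCAPE); // mark the next byte as literal term data
            }
            out.write(v);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] term = {'a', 0x1f, 'b'}; // a term that happens to contain the POS_SEP byte
        System.out.println(Arrays.toString(changeToken(term))); // [97, 29, 31, 98]
    }
}
```

Escaping the escape byte itself keeps the mapping unambiguous, so the original term bytes could be recovered from the labels.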

public Automaton toAutomaton(TokenStream in) throws java.io.IOException
Pulls the graph (including PositionLengthAttribute) from the provided TokenStream, and creates the corresponding automaton where arcs are bytes (or Unicode code points if unicodeArcs = true) from each term.
Throws: java.io.IOException

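The graph that `toAutomaton` pulls from the stream can be sketched without Lucene. The model below is a simplified, hypothetical reconstruction: each position has an arriving and a leaving state joined by a POS_SEP arc, each token spells its UTF-8 bytes from its start position's leaving state to the arriving state at start plus position length, and tokens ending at the same position share that arriving state. It rejects holes, empty terms, and increments above 1, and the `GraphSketch`, `Token`, and `Position` names here are illustrative only. A small NFA walk (`accepts`) checks which label paths the graph spells.

```java
import java.nio.charset.StandardCharsets;
import java.util.*;

public class GraphSketch {
    static final int POS_SEP = 0x1f; // assumed separator label, for illustration only

    /** A term with its position increment and position length. */
    static final class Token {
        final String term;
        final int posInc, posLen;
        Token(String term, int posInc, int posLen) { this.term = term; this.posInc = posInc; this.posLen = posLen; }
    }

    /** Hypothetical per-position record: the state tokens arrive at and the state they leave from. */
    static final class Position { int arriving = -1, leaving = -1; }

    final List<int[]> transitions = new ArrayList<>(); // each entry is {from, label, to}
    final Set<Integer> accept = new HashSet<>();
    int numStates = 1; // state 0 is the start state

    int newState() { return numStates++; }

    /** Builds the graph; this sketch rejects holes, empty terms, and increments above 1. */
    static GraphSketch build(List<Token> tokens) {
        GraphSketch g = new GraphSketch();
        Map<Integer, Position> positions = new HashMap<>();
        int pos = -1, maxPos = -1;
        for (Token t : tokens) {
            boolean badInc = t.posInc < 0 || t.posInc > 1 || (pos == -1 && t.posInc != 1);
            if (badInc || t.posLen < 1 || t.term.isEmpty()) {
                throw new IllegalArgumentException("unsupported token: " + t.term);
            }
            pos += t.posInc;
            Position start = positions.computeIfAbsent(pos, k -> new Position());
            if (t.posInc > 0) {
                if (pos == 0) {
                    start.leaving = 0; // the first position leaves from the start state
                } else if (start.arriving == -1) {
                    throw new IllegalArgumentException("hole at position " + pos);
                } else {
                    // POS_SEP joins the state where earlier tokens arrived to this position
                    start.leaving = g.newState();
                    g.transitions.add(new int[] {start.arriving, POS_SEP, start.leaving});
                }
            }
            int endPos = pos + t.posLen;
            maxPos = Math.max(maxPos, endPos);
            Position end = positions.computeIfAbsent(endPos, k -> new Position());
            if (end.arriving == -1) {
                end.arriving = g.newState(); // shared by every token that ends here
            }
            // Spell the term's UTF-8 bytes from start.leaving to end.arriving.
            byte[] bytes = t.term.getBytes(StandardCharsets.UTF_8);
            int state = start.leaving;
            for (int i = 0; i < bytes.length; i++) {
                int next = (i == bytes.length - 1) ? end.arriving : g.newState();
                g.transitions.add(new int[] {state, bytes[i] & 0xff, next});
                state = next;
            }
        }
        // Positions reached only after the last token start are accepting.
        for (int p = pos + 1; p <= maxPos; p++) {
            Position last = positions.get(p);
            if (last != null && last.arriving != -1) {
                g.accept.add(last.arriving);
            }
        }
        return g;
    }

    /** NFA walk: true if some path labeled exactly `labels` ends in an accepting state. */
    boolean accepts(int[] labels) {
        Set<Integer> current = Set.of(0);
        for (int label : labels) {
            Set<Integer> next = new HashSet<>();
            for (int[] tr : transitions) {
                if (current.contains(tr[0]) && tr[1] == label) next.add(tr[2]);
            }
            current = next;
        }
        return current.stream().anyMatch(accept::contains);
    }

    /** Labels for terms joined by POS_SEP, e.g. path("wi", "fi"). */
    static int[] path(String... terms) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            if (i > 0) out.add(POS_SEP);
            for (byte b : terms[i].getBytes(StandardCharsets.UTF_8)) out.add(b & 0xff);
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        // "wifi" (position length 2) is a synonym spanning "wi" and "fi".
        GraphSketch g = build(List.of(new Token("wi", 1, 1), new Token("wifi", 0, 2), new Token("fi", 1, 1)));
        System.out.println(g.accepts(path("wifi")));     // true
        System.out.println(g.accepts(path("wi", "fi"))); // true
        System.out.println(g.accepts(path("wi")));       // false
    }
}
```

Because the multi-token synonym `wifi` and the two-token path `wi fi` end at the same shared arriving state, both spellings reach the accepting state, which is what carrying the position length into the graph buys.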
private static void addHoles(Automaton.Builder builder, RollingBuffer<TokenStreamToAutomaton.Position> positions, int pos)