public final class BlockTreeTermsWriter extends FieldsConsumer
Writes the terms dictionary and index, block-encoding (column stride) each term's metadata for each set of terms between two index terms.
If minItemsInAutoPrefix is not zero, then for IndexOptions.DOCS fields we detect prefixes that match "enough" terms and insert auto-prefix terms into the index, which are used by Terms.intersect(org.apache.lucene.util.automaton.CompiledAutomaton, org.apache.lucene.util.BytesRef) at search time to speed up prefix and range queries. Apart from Terms.intersect(org.apache.lucene.util.automaton.CompiledAutomaton, org.apache.lucene.util.BytesRef), these auto-prefix terms are invisible to all other APIs (they don't change term statistics, don't show up in normal TermsEnums, etc.).
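As a rough illustration of what matching "enough" terms can mean, the sketch below counts how many terms share each prefix of a fixed length and keeps the prefixes whose count falls inside a min/max window. It is a simplified model under stated assumptions, not the algorithm this class uses: the class AutoPrefixSketch, the method candidatePrefixes, and the fixed prefixLength parameter are all hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: a simplified model of auto-prefix detection.
final class AutoPrefixSketch {

    /**
     * Returns every prefix of the given length that is shared by at least
     * minItems and at most maxItems of the supplied terms.
     */
    static List<String> candidatePrefixes(List<String> terms, int prefixLength,
                                          int minItems, int maxItems) {
        // LinkedHashMap keeps prefixes in first-seen order.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String term : terms) {
            // Assumption: only terms strictly longer than the prefix count as matches.
            if (term.length() > prefixLength) {
                counts.merge(term.substring(0, prefixLength), 1, Integer::sum);
            }
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minItems && e.getValue() <= maxItems) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

With the terms foo1, foo2, foo3, bar1 and baz, a prefix length of 3 and a window of 2 to 10, only the prefix foo qualifies in this model.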
Files:
The .tim file contains the list of terms in each field along with per-term statistics (such as docfreq) and per-term metadata (typically pointers to the postings list for that term in the inverted index).
The .tim file is arranged in blocks, each containing a variable number of entries (by default 25-48), where each entry is either a term or a reference to a sub-block.
NOTE: The term dictionary can plug into different postings implementations: the postings writer/reader are actually responsible for encoding and decoding the Postings Metadata and Term Metadata sections.
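The idea that a block entry is either a term or a reference to a sub-block can be sketched as follows. This is a simplified, hypothetical model (the names BlockSketch, Entry and buildBlock are not Lucene APIs, and it ignores maximum block size and on-disk encoding): terms that share the next character after the block's prefix are collapsed into one sub-block reference when there are at least minItems of them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not Lucene code.
final class BlockSketch {

    /** One block entry: either a plain term or a reference to a sub-block. */
    static final class Entry {
        final String text;        // the full term, or the shared prefix of the sub-block
        final boolean isSubBlock; // true when this entry points at a sub-block
        Entry(String text, boolean isSubBlock) {
            this.text = text;
            this.isSubBlock = isSubBlock;
        }
        @Override public String toString() {
            return isSubBlock ? text + "*" : text;
        }
    }

    /**
     * Builds the entries of one block from sorted terms that all start with prefix.
     * A run of at least minItems terms agreeing on the next character becomes a
     * single sub-block reference; every other term stays a term entry.
     */
    static List<Entry> buildBlock(List<String> sortedTerms, String prefix, int minItems) {
        List<Entry> entries = new ArrayList<>();
        int i = 0;
        while (i < sortedTerms.size()) {
            String term = sortedTerms.get(i);
            if (term.length() == prefix.length()) {
                // The prefix itself is a term; it cannot be extended.
                entries.add(new Entry(term, false));
                i++;
                continue;
            }
            String next = term.substring(0, prefix.length() + 1);
            int j = i;
            while (j < sortedTerms.size() && sortedTerms.get(j).startsWith(next)) {
                j++;
            }
            if (j - i >= minItems) {
                entries.add(new Entry(next, true));
            } else {
                for (int k = i; k < j; k++) {
                    entries.add(new Entry(sortedTerms.get(k), false));
                }
            }
            i = j;
        }
        return entries;
    }
}
```

For the sorted terms a, ab, ba, bb, bc, c with an empty prefix and minItems of 3, this model yields the entries a, ab, a sub-block reference for b, and c.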
CodecHeader
Uint64
VInt length followed by the byte[]
VInt
VLong
CodecFooter
Notes:
CodecHeader storing the version information for the BlockTree implementation.
FieldInfos (.fnm)
The .tip file contains an index into the term dictionary, so that it can be accessed randomly. The index is also used to determine when a given term cannot exist on disk (in the .tim file), saving a disk seek.
CodecHeader
Uint64
VLong
FST<byte[]>
CodecFooter
Notes:
BlockTreeTermsReader
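The VInt entries in the file layouts above are variable-length integers: each byte carries seven bits of the value, low-order bits first, and the high bit signals that another byte follows. The sketch below shows that scheme on plain byte arrays; the class VIntSketch and its methods are hypothetical stand-ins for illustration, not the Lucene I/O classes. VLong follows the same scheme for 64-bit values.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical stand-alone illustration of the variable-length integer scheme.
final class VIntSketch {

    /** Encodes value using 7 data bits per byte, low-order group first. */
    static byte[] writeVInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            // More bits remain: emit the low 7 bits with the continuation bit set.
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value); // final byte, continuation bit clear
        return out.toByteArray();
    }

    /** Decodes a value produced by writeVInt, starting at the first byte. */
    static int readVInt(byte[] bytes) {
        int value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // continuation bit clear: this was the last byte
            }
            shift += 7;
        }
        return value;
    }
}
```

Small values stay compact under this encoding: 127 takes one byte, while 128 takes two.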
Modifier and Type | Class and Description |
---|---|
private static class | BlockTreeTermsWriter.FieldMetaData |
private static class | BlockTreeTermsWriter.PendingBlock |
private static class | BlockTreeTermsWriter.PendingEntry |
private static class | BlockTreeTermsWriter.PendingTerm |
(package private) class | BlockTreeTermsWriter.TermsWriter |
Modifier and Type | Field and Description |
---|---|
private boolean | closed |
static int | DEFAULT_MAX_BLOCK_SIZE: Suggested default value for the maxItemsInBlock parameter to BlockTreeTermsWriter(SegmentWriteState,PostingsWriterBase,int,int). |
static int | DEFAULT_MIN_BLOCK_SIZE: Suggested default value for the minItemsInBlock parameter to BlockTreeTermsWriter(SegmentWriteState,PostingsWriterBase,int,int). |
(package private) static BytesRef | EMPTY_BYTES_REF |
(package private) FieldInfos | fieldInfos |
private java.util.List<BlockTreeTermsWriter.FieldMetaData> | fields |
private IndexOutput | indexOut |
(package private) int | maxDoc |
(package private) int | maxItemsInAutoPrefix |
(package private) int | maxItemsInBlock |
(package private) int | minItemsInAutoPrefix |
(package private) int | minItemsInBlock |
(package private) PostingsWriterBase | postingsWriter |
(package private) FixedBitSet | prefixDocs |
private PostingsEnum | prefixDocsEnum: Reused in getAutoPrefixTermsEnum. |
(package private) BitSetTermsEnum | prefixFixedBitsTermsEnum: Reused in getAutoPrefixTermsEnum. |
private TermsEnum | prefixTermsEnum: Reused in getAutoPrefixTermsEnum. |
private RAMOutputStream | scratchBytes |
private IntsRefBuilder | scratchIntsRef |
private IndexOutput | termsOut |
Constructor and Description |
---|
BlockTreeTermsWriter(SegmentWriteState state, PostingsWriterBase postingsWriter, int minItemsInBlock, int maxItemsInBlock): Create a new writer, using default values for auto-prefix terms. |
BlockTreeTermsWriter(SegmentWriteState state, PostingsWriterBase postingsWriter, int minItemsInBlock, int maxItemsInBlock, int minItemsInAutoPrefix, int maxItemsInAutoPrefix): Create a new writer. |
Modifier and Type | Method and Description |
---|---|
(package private) static java.lang.String | brToString(byte[] b) |
(package private) static java.lang.String | brToString(BytesRef b) |
void | close() |
(package private) static long | encodeOutput(long fp, boolean hasTerms, boolean isFloor) |
private TermsEnum | getAutoPrefixTermsEnum(Terms terms, AutoPrefixTermsWriter.PrefixTerm prefix) |
static void | validateAutoPrefixSettings(int minItemsInAutoPrefix, int maxItemsInAutoPrefix): Throws IllegalArgumentException if any of these settings is invalid. |
static void | validateSettings(int minItemsInBlock, int maxItemsInBlock): Throws IllegalArgumentException if any of these settings is invalid. |
void | write(Fields fields): Write all fields, terms and postings. |
private static void | writeBytesRef(IndexOutput out, BytesRef bytes) |
private void | writeIndexTrailer(IndexOutput indexOut, long dirStart): Writes the index file trailer. |
private void | writeTrailer(IndexOutput out, long dirStart): Writes the terms file trailer. |
Methods inherited from class FieldsConsumer: merge
public static final int DEFAULT_MIN_BLOCK_SIZE
Suggested default value for the minItemsInBlock parameter to BlockTreeTermsWriter(SegmentWriteState,PostingsWriterBase,int,int).
public static final int DEFAULT_MAX_BLOCK_SIZE
Suggested default value for the maxItemsInBlock parameter to BlockTreeTermsWriter(SegmentWriteState,PostingsWriterBase,int,int).
private final IndexOutput termsOut
private final IndexOutput indexOut
final int maxDoc
final int minItemsInBlock
final int maxItemsInBlock
final int minItemsInAutoPrefix
final int maxItemsInAutoPrefix
final PostingsWriterBase postingsWriter
final FieldInfos fieldInfos
private final java.util.List<BlockTreeTermsWriter.FieldMetaData> fields
final FixedBitSet prefixDocs
final BitSetTermsEnum prefixFixedBitsTermsEnum
private TermsEnum prefixTermsEnum
private PostingsEnum prefixDocsEnum
private final RAMOutputStream scratchBytes
private final IntsRefBuilder scratchIntsRef
static final BytesRef EMPTY_BYTES_REF
private boolean closed
public BlockTreeTermsWriter(SegmentWriteState state, PostingsWriterBase postingsWriter, int minItemsInBlock, int maxItemsInBlock) throws java.io.IOException
Create a new writer, using default values for auto-prefix terms.
Throws: java.io.IOException
public BlockTreeTermsWriter(SegmentWriteState state, PostingsWriterBase postingsWriter, int minItemsInBlock, int maxItemsInBlock, int minItemsInAutoPrefix, int maxItemsInAutoPrefix) throws java.io.IOException
Create a new writer. Auto-prefix terms are inserted for prefixes that match at least minItemsInAutoPrefix other terms or prefixes, and at most maxItemsInAutoPrefix other terms or prefixes. Set minItemsInAutoPrefix to 0 to disable auto-prefix terms.
Throws: java.io.IOException
private void writeTrailer(IndexOutput out, long dirStart) throws java.io.IOException
Writes the terms file trailer.
Throws: java.io.IOException
private void writeIndexTrailer(IndexOutput indexOut, long dirStart) throws java.io.IOException
Writes the index file trailer.
Throws: java.io.IOException
public static void validateSettings(int minItemsInBlock, int maxItemsInBlock)
Throws IllegalArgumentException if any of these settings is invalid.
public static void validateAutoPrefixSettings(int minItemsInAutoPrefix, int maxItemsInAutoPrefix)
Throws IllegalArgumentException if any of these settings is invalid.
public void write(Fields fields) throws java.io.IOException
Write all fields, terms and postings.
Specified by: write in class FieldsConsumer
Throws: java.io.IOException
private TermsEnum getAutoPrefixTermsEnum(Terms terms, AutoPrefixTermsWriter.PrefixTerm prefix) throws java.io.IOException
Throws: java.io.IOException
static long encodeOutput(long fp, boolean hasTerms, boolean isFloor)
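The signature above takes a file pointer and two boolean flags and returns a single long. One common way to do that, shown here purely as an assumption about the general technique and not as a statement of what encodeOutput does, is to shift the pointer left and store the flags in the freed low bits. All names and the bit layout in this sketch (OutputPackingSketch, FLAG_BITS, the two flag constants) are hypothetical.

```java
// Hypothetical illustration of packing a file pointer and two flags into one long.
final class OutputPackingSketch {
    static final int FLAG_BITS = 2;          // assumed: two low bits reserved for flags
    static final long FLAG_IS_FLOOR = 0x1L;  // assumed flag position
    static final long FLAG_HAS_TERMS = 0x2L; // assumed flag position

    /** Packs fp into the high 62 bits and the two flags into the low 2 bits. */
    static long pack(long fp, boolean hasTerms, boolean isFloor) {
        if (fp < 0 || fp >= (1L << 62)) {
            throw new IllegalArgumentException("fp does not fit in 62 bits: " + fp);
        }
        return (fp << FLAG_BITS)
            | (hasTerms ? FLAG_HAS_TERMS : 0L)
            | (isFloor ? FLAG_IS_FLOOR : 0L);
    }

    /** Recovers the file pointer from a packed value. */
    static long filePointer(long packed) {
        return packed >>> FLAG_BITS;
    }

    static boolean hasTerms(long packed) {
        return (packed & FLAG_HAS_TERMS) != 0;
    }

    static boolean isFloor(long packed) {
        return (packed & FLAG_IS_FLOOR) != 0;
    }
}
```

Packing keeps each index output to a single number, at the cost of limiting the pointer to 62 bits in this layout.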
static java.lang.String brToString(BytesRef b)
static java.lang.String brToString(byte[] b)
public void close() throws java.io.IOException
Specified by: close in interface java.io.Closeable
Specified by: close in interface java.lang.AutoCloseable
Specified by: close in class FieldsConsumer
Throws: java.io.IOException
private static void writeBytesRef(IndexOutput out, BytesRef bytes) throws java.io.IOException
Throws: java.io.IOException