@Deprecated public class DocTermOrds extends java.lang.Object implements Accountable
To get a term's bytes, obtain a TermsEnum from the getOrdTermsEnum(org.apache.lucene.index.LeafReader) method, and then seek by ord.
While term ords are normally of type long, in this API they are int, because the internal representation cannot address more than MAX_INT (Integer.MAX_VALUE) unique terms. Also, this class is typically used on fields with relatively few unique terms compared with the number of documents. In addition, there is an internal limit (16 MB) on how many bytes each chunk of documents may consume; if you exceed this limit, you'll hit an IllegalStateException.
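The two limits above can be illustrated with a small, self-contained sketch. OrdLimits, toIntOrd and checkChunkBytes are hypothetical names, not part of DocTermOrds, and the sketch assumes that "16 MB" means 16 * 2^20 bytes; the real class's internal checks may differ.

```java
// Hypothetical helpers illustrating the documented limits; not DocTermOrds code.
final class OrdLimits {
    // Assumption: "16 MB" is read as 16 * 2^20 bytes.
    static final int MAX_CHUNK_BYTES = 16 * 1024 * 1024;

    // Narrows a long term ord to the int used by this API.
    // Math.toIntExact throws ArithmeticException if the ord does not fit.
    static int toIntOrd(long ord) {
        return Math.toIntExact(ord);
    }

    // Mirrors the documented behavior: exceeding the per-chunk byte budget
    // is reported with an IllegalStateException.
    static void checkChunkBytes(long bytes) {
        if (bytes > MAX_CHUNK_BYTES) {
            throw new IllegalStateException(
                "chunk uses " + bytes + " bytes, limit is " + MAX_CHUNK_BYTES);
        }
    }
}
```

Math.toIntExact is the standard JDK way to narrow a long and fail loudly on overflow instead of silently wrapping.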
Deleted documents are skipped during uninversion, and if
you look them up you'll get 0 ords.
The returned per-document ords do not retain their original order in the document. Instead they are returned in sorted order (by ord, i.e., by the term's BytesRef comparator). They are also de-duplicated: if a document contains the same term more than once in this field, you'll get that ord back only once.
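A minimal sketch of the per-document ord semantics just described, using plain int arrays instead of Lucene types (PerDocOrds and normalize are hypothetical names): whatever order and repetition the terms had in the document, the caller sees ascending, distinct ords.

```java
import java.util.Arrays;

// Hypothetical illustration of the "sorted and de-duplicated" contract.
final class PerDocOrds {
    static int[] normalize(int[] ordsInDocumentOrder) {
        // distinct() drops repeated ords; sorted() orders them ascending,
        // which matches term order because ords are assigned in term order.
        return Arrays.stream(ordsInDocumentOrder).distinct().sorted().toArray();
    }

    public static void main(String[] args) {
        int[] seen = {5, 2, 5, 0, 2};
        System.out.println(Arrays.toString(normalize(seen))); // prints [0, 2, 5]
    }
}
```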
This class creates its own term index internally, which makes it possible to create a wrapped TermsEnum that supports ord. The getOrdTermsEnum(org.apache.lucene.index.LeafReader) method provides this wrapped enum.
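The wrapped enum relies on a sampled term index. The sketch below shows the general idea under stated assumptions: terms are Strings in a sorted array standing in for the field's terms dictionary, every (1 << indexIntervalBits)-th term is kept in memory, and a lookup seeks to the nearest kept term and then steps forward one term at a time, like next() on a TermsEnum. SampledTermIndex is a hypothetical class, not the internal DocTermOrds implementation.

```java
import java.util.Arrays;

// Hypothetical sampled term index supporting lookup by ord.
final class SampledTermIndex {
    private final String[] sortedTerms;  // stand-in for the terms dictionary
    private final String[] indexedTerms; // every (1 << bits)-th term
    private final int bits;
    private final int mask;

    SampledTermIndex(String[] sortedTerms, int indexIntervalBits) {
        this.sortedTerms = sortedTerms;
        this.bits = indexIntervalBits;
        int interval = 1 << indexIntervalBits; // e.g. 7 bits -> every 128th term
        this.mask = interval - 1;
        int n = (sortedTerms.length + interval - 1) / interval;
        this.indexedTerms = new String[n];
        for (int i = 0; i < n; i++) {
            indexedTerms[i] = sortedTerms[i << indexIntervalBits];
        }
    }

    // Seek to the indexed term at or below ord, then step forward
    // (ord & mask) times to reach the requested term.
    String lookupTerm(int ord) {
        if (ord < 0 || ord >= sortedTerms.length) {
            throw new IllegalArgumentException("ord out of range: " + ord);
        }
        String anchor = indexedTerms[ord >>> bits];
        // "Seek": binary search stands in for a terms-dictionary seek.
        int pos = Arrays.binarySearch(sortedTerms, anchor);
        for (int steps = ord & mask; steps > 0; steps--) {
            pos++; // "next()"
        }
        return sortedTerms[pos];
    }
}
```

With the default interval of 128 terms (indexIntervalBits = 7), only one term in 128 is held in RAM, and a lookup needs at most 127 forward steps after the seek; a smaller interval trades RAM for faster lookups.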
The RAM consumption of this class can be high!

Modifier and Type | Class | Description |
---|---|---|
private class | DocTermOrds.Iterator | Deprecated. |
private class | DocTermOrds.OrdWrappedTermsEnum | Deprecated. "Wrap" our own terms index around the original IndexReader. |
Modifier and Type | Field | Description |
---|---|---|
protected boolean | checkForDocValues | Deprecated. If true, check and throw an exception if the field has docValues enabled. |
static int | DEFAULT_INDEX_INTERVAL_BITS | Deprecated. Every 128th term is indexed, by default. |
protected java.lang.String | field | Deprecated. Field we are uninverting. |
protected int[] | index | Deprecated. Holds the per-document ords or a pointer to the ords. |
protected BytesRef[] | indexedTermsArray | Deprecated. Holds the indexed (by default every 128th) terms. |
private int | indexInterval | Deprecated. |
private int | indexIntervalBits | Deprecated. |
private int | indexIntervalMask | Deprecated. |
protected int | maxTermDocFreq | Deprecated. Don't uninvert terms that exceed this count. |
private long | memsz | Deprecated. |
protected int | numTermsInField | Deprecated. Number of terms in the field. |
protected int | ordBase | Deprecated. Ordinal of the first term in the field, or 0 if the PostingsFormat does not implement TermsEnum.ord(). |
protected int | phase1_time | Deprecated. Time for phase1 of the uninvert process. |
protected PostingsEnum | postingsEnum | Deprecated. Used while uninverting. |
protected BytesRef | prefix | Deprecated. If non-null, only terms matching this prefix were indexed. |
protected long | sizeOfIndexedStrings | Deprecated. Total bytes (sum of term lengths) for all indexed terms. |
protected long | termInstances | Deprecated. Total number of references to term numbers. |
private static int | TNUM_OFFSET | Deprecated. |
protected byte[][] | tnums | Deprecated. Holds term ords for documents. |
protected int | total_time | Deprecated. Total time to uninvert the field. |
Modifier | Constructor | Description |
---|---|---|
public | DocTermOrds(LeafReader reader, Bits liveDocs, java.lang.String field) | Deprecated. Inverts all terms. |
public | DocTermOrds(LeafReader reader, Bits liveDocs, java.lang.String field, BytesRef termPrefix) | Deprecated. Inverts only terms starting with termPrefix. |
public | DocTermOrds(LeafReader reader, Bits liveDocs, java.lang.String field, BytesRef termPrefix, int maxTermDocFreq) | Deprecated. Inverts only terms starting with termPrefix, and only terms whose docFreq (not taking deletions into account) is <= maxTermDocFreq. |
public | DocTermOrds(LeafReader reader, Bits liveDocs, java.lang.String field, BytesRef termPrefix, int maxTermDocFreq, int indexIntervalBits) | Deprecated. Inverts only terms starting with termPrefix, and only terms whose docFreq (not taking deletions into account) is <= maxTermDocFreq, with a custom indexing interval (default is every 128th term). |
protected | DocTermOrds(java.lang.String field, int maxTermDocFreq, int indexIntervalBits) | Deprecated. Subclasses initialize with this constructor; be sure to then call uninvert exactly once. |
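The constructor parameters combine as follows. This is a simplified, hypothetical model of uninversion (UninvertSketch is not Lucene code and ignores the real packed byte layout): postings are a sorted map from term to docIDs, terms outside termPrefix are ignored, terms whose docFreq exceeds maxTermDocFreq are not uninverted, deleted documents get no ords, and each document's ords come out sorted and de-duplicated because terms are visited in sorted order. It assumes ASCII terms (so String order matches BytesRef order) and that every prefix-matching term consumes an ord even when it is skipped for exceeding maxTermDocFreq.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model of uninversion; not the DocTermOrds code.
final class UninvertSketch {
    // postings: term -> sorted, duplicate-free docIDs (like Lucene postings).
    // liveDocs[doc] == false marks a deleted document.
    static int[][] uninvert(TreeMap<String, int[]> postings, boolean[] liveDocs,
                            String termPrefix, int maxTermDocFreq) {
        List<List<Integer>> perDoc = new ArrayList<>();
        for (int d = 0; d < liveDocs.length; d++) {
            perDoc.add(new ArrayList<>());
        }

        int ord = 0;
        // TreeMap iterates terms in sorted order, so ords follow term order.
        for (Map.Entry<String, int[]> e : postings.entrySet()) {
            if (!e.getKey().startsWith(termPrefix)) {
                continue; // outside the prefix: never uninverted
            }
            int[] docs = e.getValue();
            // docFreq counts every posting, deleted documents included.
            if (docs.length <= maxTermDocFreq) {
                for (int doc : docs) {
                    if (liveDocs[doc]) {
                        perDoc.get(doc).add(ord); // appended in ascending ord order
                    }
                }
            }
            ord++; // assumption: skipped "big" terms still consume an ord
        }

        int[][] result = new int[liveDocs.length][];
        for (int d = 0; d < liveDocs.length; d++) {
            result[d] = perDoc.get(d).stream().mapToInt(Integer::intValue).toArray();
        }
        return result;
    }
}
```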
Modifier and Type | Method | Description |
---|---|---|
TermsEnum | getOrdTermsEnum(LeafReader reader) | Deprecated. Returns a TermsEnum that implements ord, or null if there are no terms in the field. |
boolean | isEmpty() | Deprecated. Returns true if no terms were indexed. |
SortedSetDocValues | iterator(LeafReader reader) | Deprecated. Returns a SortedSetDocValues view of this instance. |
BytesRef | lookupTerm(TermsEnum termsEnum, int ord) | Deprecated. Returns the term (BytesRef) corresponding to the provided ordinal. |
int | numTerms() | Deprecated. Returns the number of terms in this field. |
long | ramBytesUsed() | Deprecated. Returns total bytes used. |
protected void | setActualDocFreq(int termNum, int df) | Deprecated. Invoked during uninvert(org.apache.lucene.index.LeafReader,Bits,BytesRef) to record the document frequency for each uninverted term. |
protected void | uninvert(LeafReader reader, Bits liveDocs, BytesRef termPrefix) | Deprecated. Call this only once (if you subclass!). |
private static int | vIntSize(int x) | Deprecated. Number of bytes to represent an unsigned int as a vint. |
protected void | visitTerm(TermsEnum te, int termNum) | Deprecated. Subclasses can override this. |
private static int | writeInt(int x, byte[] arr, int pos) | Deprecated. |
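vIntSize is described as the number of bytes needed to represent an unsigned int as a vint. Assuming the usual vint layout of 7 payload bits per byte (the high bit marks that another byte follows), a sketch of that computation is shown below; VIntSizeSketch is a hypothetical class, not the private method itself.

```java
// Sketch of vIntSize under the assumed 7-bits-per-byte vint layout.
final class VIntSizeSketch {
    static int vIntSize(int x) {
        int size = 1;
        // Each extra byte carries 7 more bits; keep going while any bit
        // above the low 7 is still set.
        while ((x & ~0x7F) != 0) {
            x >>>= 7; // unsigned shift: negative ints are treated as unsigned
            size++;
        }
        return size;
    }
}
```

Under that assumption, values 0 to 127 take one byte, 128 to 16383 take two bytes, and any negative int (read as unsigned, so at least 2^31) takes five bytes.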
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Accountable: getChildResources
private static final int TNUM_OFFSET
public static final int DEFAULT_INDEX_INTERVAL_BITS
private int indexIntervalBits
private int indexIntervalMask
private int indexInterval
protected final int maxTermDocFreq
protected final java.lang.String field
protected int numTermsInField
protected long termInstances
private long memsz
protected int total_time
protected int phase1_time
protected int[] index
protected byte[][] tnums
protected long sizeOfIndexedStrings
protected BytesRef[] indexedTermsArray
protected BytesRef prefix
protected int ordBase
Ordinal of the first term in the field, or 0 if the PostingsFormat does not implement TermsEnum.ord().
protected PostingsEnum postingsEnum
protected boolean checkForDocValues
public DocTermOrds(LeafReader reader, Bits liveDocs, java.lang.String field) throws java.io.IOException
Throws: java.io.IOException
public DocTermOrds(LeafReader reader, Bits liveDocs, java.lang.String field, BytesRef termPrefix) throws java.io.IOException
Throws: java.io.IOException
public DocTermOrds(LeafReader reader, Bits liveDocs, java.lang.String field, BytesRef termPrefix, int maxTermDocFreq) throws java.io.IOException
Throws: java.io.IOException
public DocTermOrds(LeafReader reader, Bits liveDocs, java.lang.String field, BytesRef termPrefix, int maxTermDocFreq, int indexIntervalBits) throws java.io.IOException
Throws: java.io.IOException
protected DocTermOrds(java.lang.String field, int maxTermDocFreq, int indexIntervalBits)
public long ramBytesUsed()
Specified by: ramBytesUsed in interface Accountable
public TermsEnum getOrdTermsEnum(LeafReader reader) throws java.io.IOException
Returns a TermsEnum that implements ord. A "private" terms index is built internally (WARNING: consumes RAM) and used to implement ord. This also enables ord on top of a composite reader. The returned TermsEnum is unpositioned. This method returns null if there are no terms.
NOTE: you must pass the same reader that was used when creating this instance.
Throws: java.io.IOException
public int numTerms()
Returns the number of terms in this field.
public boolean isEmpty()
Returns true if no terms were indexed.
protected void visitTerm(TermsEnum te, int termNum) throws java.io.IOException
Subclasses can override this.
Throws: java.io.IOException
protected void setActualDocFreq(int termNum, int df) throws java.io.IOException
Invoked during uninvert(org.apache.lucene.index.LeafReader,Bits,BytesRef) to record the document frequency for each uninverted term.
Throws: java.io.IOException
protected void uninvert(LeafReader reader, Bits liveDocs, BytesRef termPrefix) throws java.io.IOException
Call this only once (if you subclass!).
Throws: java.io.IOException
private static int vIntSize(int x)
private static int writeInt(int x, byte[] arr, int pos)
public BytesRef lookupTerm(TermsEnum termsEnum, int ord) throws java.io.IOException
Returns the term (BytesRef) corresponding to the provided ordinal.
Throws: java.io.IOException
public SortedSetDocValues iterator(LeafReader reader) throws java.io.IOException
Returns a SortedSetDocValues view of this instance.
Throws: java.io.IOException
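Callers of iterator(LeafReader) typically walk one document's ords in ascending order and resolve each ord to its term. The exact SortedSetDocValues methods vary between Lucene versions, so the sketch below uses a hypothetical, self-contained OrdsView class with a -1 end marker instead of the real API; the per-document ords it serves are assumed to be already sorted and de-duplicated, as this class guarantees.

```java
// Hypothetical stand-in for a SortedSetDocValues-style view; not Lucene's API.
final class OrdsView {
    // End-of-ords marker used by this sketch.
    static final long NO_MORE_ORDS = -1;

    private final int[][] ordsPerDoc; // sorted, de-duplicated ords per doc
    private final String[] terms;     // terms[ord] is the term for that ord
    private int[] current = new int[0];
    private int upto;

    OrdsView(int[][] ordsPerDoc, String[] terms) {
        this.ordsPerDoc = ordsPerDoc;
        this.terms = terms;
    }

    // Positions the view on a document.
    void setDocument(int doc) {
        current = ordsPerDoc[doc];
        upto = 0;
    }

    // Returns the next ord for the current document, or NO_MORE_ORDS.
    long nextOrd() {
        return upto < current.length ? current[upto++] : NO_MORE_ORDS;
    }

    // Resolves an ord to its term.
    String lookupOrd(long ord) {
        return terms[(int) ord];
    }

    public static void main(String[] args) {
        OrdsView view = new OrdsView(new int[][] {{0, 2}, {}},
                                     new String[] {"apple", "apricot", "banana"});
        view.setDocument(0);
        for (long ord = view.nextOrd(); ord != NO_MORE_ORDS; ord = view.nextOrd()) {
            System.out.println(ord + " -> " + view.lookupOrd(ord));
        }
    }
}
```

Running main prints "0 -> apple" and then "2 -> banana"; document 1 has no ords (for example, because it was deleted), so nextOrd() immediately returns the end marker.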