Interface RowGroupPredicate
All Known Implementing Classes:
RowGroupPredicate.And, RowGroupPredicate.ByteRange
A predicate over row groups, used to select which row groups a reader scans.
Distinct from FilterPredicate, which expresses constraints on column values and is
checked against per-column statistics. A RowGroupPredicate expresses constraints on the
row group itself (its byte position in the file, its ordinal index, the row range it
spans). A reader takes both as independent inputs and AND-combines them: a row group is read if and only if it
passes both.
Granularity: filtering happens at row-group resolution, not row resolution. A row group
passes byteRange(long, long) when its midpoint falls in the given byte range — every
row in that row group is then read, even rows whose data extends outside the range. This
is the standard Hadoop-input-format split convention.
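The midpoint rule above can be sketched in a few lines. This is only an illustration of the convention as described here, not this library's implementation; `MidpointSketch`, `RowGroupSpan`, and its fields are hypothetical names introduced for the example.

```java
// Hypothetical sketch: RowGroupSpan and its fields are illustrative names,
// not types from this library.
final class MidpointSketch {
    /** Byte extent of one row group: start of its first column chunk and its compressed size. */
    record RowGroupSpan(long firstChunkOffset, long compressedSize) {
        long midpoint() {
            return firstChunkOffset + compressedSize / 2;
        }
    }

    /** True when the row group's midpoint lies in [startInclusive, endExclusive). */
    static boolean passes(RowGroupSpan rg, long startInclusive, long endExclusive) {
        long mid = rg.midpoint();
        return mid >= startInclusive && mid < endExclusive;
    }
}
```

In this sketch a row group starting at byte 100 with 200 compressed bytes has midpoint 200, so it passes for the range [150, 400) and fails for [0, 150), even though its data begins inside the latter.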
Usage:
// Single split: read row groups whose midpoint is in [splitStart, splitEnd).
ColumnReader r = file.buildColumnReader("price")
    .filter(RowGroupPredicate.byteRange(splitStart, splitEnd))
    .build();

// Stacks with column-stats predicates: both apply, AND-combined.
ColumnReader filtered = file.buildColumnReader("price")
    .filter(FilterPredicate.gt("price", 100))
    .filter(RowGroupPredicate.byteRange(splitStart, splitEnd))
    .build();
Nested Class Summary
Nested Classes
Modifier and Type | Interface | Description
static final record | RowGroupPredicate.And | Conjunction of row-group predicates: a row group passes if and only if every child passes.
static final record | RowGroupPredicate.ByteRange | Keep row groups whose midpoint falls in [startInclusive, endExclusive).
Method Summary
Static Methods
Modifier and Type | Method | Description
static RowGroupPredicate | and(RowGroupPredicate... children) | Keep row groups that match every child predicate (intersection).
static RowGroupPredicate | byteRange(long startInclusive, long endExclusive) | Keep row groups whose data midpoint (start of the first column chunk plus half of the on-disk compressed size) falls in [startInclusive, endExclusive).
Method Details
byteRange
Keep row groups whose data midpoint (start of the first column chunk plus half of the on-disk compressed size) falls in
[startInclusive, endExclusive). This is the standard split convention: when the file is partitioned into contiguous, non-overlapping byte ranges, every row group lands in exactly one range, regardless of where a boundary falls inside it.
endExclusive < startInclusive is treated as an empty range. This matches callers that pass splitStart + splitLength and tolerate long overflow on tail splits.
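The exactly-one-range property and the empty-range rule can be checked with a small model. This is a sketch under stated assumptions rather than the library's code: `SplitPartitionSketch`, `RowGroupSpan`, `inRange`, and `matchCount` are hypothetical names, and the split boundaries are assumed to be sorted and contiguous.

```java
// Hypothetical sketch of the split convention; none of these names come from the library.
final class SplitPartitionSketch {
    record RowGroupSpan(long firstChunkOffset, long compressedSize) {
        long midpoint() {
            return firstChunkOffset + compressedSize / 2;
        }
    }

    /** Midpoint test for [startInclusive, endExclusive); an inverted range selects nothing. */
    static boolean inRange(long mid, long startInclusive, long endExclusive) {
        if (endExclusive < startInclusive) {
            return false; // e.g. splitStart + splitLength overflowed past Long.MAX_VALUE
        }
        return mid >= startInclusive && mid < endExclusive;
    }

    /**
     * Counts how many consecutive ranges [boundaries[i], boundaries[i + 1]) select the row group.
     * Assumes boundaries are sorted ascending and cover the row group's midpoint.
     */
    static int matchCount(RowGroupSpan rg, long[] boundaries) {
        int count = 0;
        for (int i = 0; i + 1 < boundaries.length; i++) {
            if (inRange(rg.midpoint(), boundaries[i], boundaries[i + 1])) {
                count++;
            }
        }
        return count;
    }
}
```

With boundaries {0, 128, 256, 384}, a row group occupying bytes 104 to 204 straddles the boundary at 128, yet its midpoint (154) places it in the second range only, so `matchCount` returns 1. A call such as `inRange(2000L, 1000L, 1000L + Long.MAX_VALUE)` returns false in this sketch because the sum wraps to a negative value.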
and
Keep row groups that match every child predicate (intersection).
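As an illustration of the intersection semantics, the sketch below models a conjunction over simplified predicates. It assumes nothing about how RowGroupPredicate.And is implemented; `ConjunctionSketch`, `RowGroupInfo`, and `Pred` are hypothetical stand-ins, and the nested `ByteRange` and `And` records only mirror the documented behavior.

```java
import java.util.List;

// Hypothetical model of a predicate conjunction; not the library's types.
final class ConjunctionSketch {
    /** Minimal stand-in for the row-group metadata a predicate inspects. */
    record RowGroupInfo(int ordinal, long midpoint) {}

    interface Pred {
        boolean test(RowGroupInfo rg);
    }

    record ByteRange(long startInclusive, long endExclusive) implements Pred {
        public boolean test(RowGroupInfo rg) {
            return rg.midpoint() >= startInclusive && rg.midpoint() < endExclusive;
        }
    }

    /** Passes only when every child passes. */
    record And(List<Pred> children) implements Pred {
        public boolean test(RowGroupInfo rg) {
            return children.stream().allMatch(child -> child.test(rg));
        }
    }

    static Pred and(Pred... children) {
        return new And(List.of(children));
    }
}
```

Combining `new ByteRange(0, 100)` with `new ByteRange(50, 200)` through `and` keeps only row groups whose midpoint lies in the overlap [50, 100). In this sketch an `and` with no children passes every row group, because `allMatch` on an empty stream is true; whether the library behaves the same way is not stated in this page.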