All Known Implementing Classes: RowGroupPredicate.And, RowGroupPredicate.ByteRange

public sealed interface RowGroupPredicate permits RowGroupPredicate.ByteRange, RowGroupPredicate.And

A predicate over row groups, used to select which row groups a reader scans.

Distinct from FilterPredicate, which expresses constraints on column values and is checked against per-column statistics. A RowGroupPredicate expresses constraints on the row group itself (its byte position in the file, its ordinal index, the row range it spans). The two are sibling, AND-combined inputs to a reader: a row group is read if and only if it passes both.

Granularity: filtering happens at row-group resolution, not row resolution. A row group passes byteRange(long, long) when its midpoint falls in the given byte range; every row in that row group is then read, including rows whose data lies outside the range. This is the standard Hadoop input-format split convention.
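The midpoint rule can be sketched in a few lines. This is a minimal illustration, not the library's implementation: MidpointRuleSketch, RowGroupInfo, and inRange are hypothetical names, and the midpoint formula (offset of the first column chunk plus half the compressed size) is taken from the byteRange description on this page.

```java
// Hypothetical model for illustration; none of these names belong to the library.
final class MidpointRuleSketch {

    /** Assumed row-group metadata: offset of the first column chunk and on-disk compressed size. */
    record RowGroupInfo(long firstChunkOffset, long compressedSize) {
        long midpoint() {
            return firstChunkOffset + compressedSize / 2;
        }
    }

    /** True when the row group's midpoint lies in [startInclusive, endExclusive). */
    static boolean inRange(RowGroupInfo rg, long startInclusive, long endExclusive) {
        long mid = rg.midpoint();
        return mid >= startInclusive && mid < endExclusive;
    }

    public static void main(String[] args) {
        // The row group occupies bytes [100, 300), so its midpoint is 200.
        RowGroupInfo rg = new RowGroupInfo(100, 200);
        System.out.println(inRange(rg, 0, 150));   // false: the group starts in this range, but its midpoint does not
        System.out.println(inRange(rg, 150, 400)); // true: this range owns the whole group, bytes 100-149 included
    }
}
```

The second call shows the granularity point: the range [150, 400) selects the entire row group even though the group's first 50 bytes lie before the range start.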

Usage:

// Single split: read row groups whose midpoint is in [start, end).
ColumnReader r = file.buildColumnReader("price")
        .filter(RowGroupPredicate.byteRange(splitStart, splitEnd))
        .build();

// Stacks with column-stats predicates: both apply, AND-combined.
ColumnReader filtered = file.buildColumnReader("price")
        .filter(FilterPredicate.gt("price", 100))
        .filter(RowGroupPredicate.byteRange(splitStart, splitEnd))
        .build();
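The usage snippets leave splitStart and splitEnd undefined. In a Hadoop-style job they would typically come from the input split's start and length; the sketch below shows one way to derive them by hand, assuming fixed-size splits over a known file length. SplitBoundsSketch, Split, and splits are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the library.
final class SplitBoundsSketch {

    /** A half-open byte range [start, end). */
    record Split(long start, long end) {}

    /** Cuts [0, fileLength) into consecutive half-open ranges of at most splitSize bytes. */
    static List<Split> splits(long fileLength, long splitSize) {
        if (fileLength < 0 || splitSize <= 0) {
            throw new IllegalArgumentException("fileLength must be >= 0 and splitSize > 0");
        }
        List<Split> out = new ArrayList<>();
        long start = 0;
        while (start < fileLength) {
            // Compare against the remaining bytes so start + splitSize cannot overflow.
            long remaining = fileLength - start;
            long end = remaining <= splitSize ? fileLength : start + splitSize;
            out.add(new Split(start, end));
            start = end;
        }
        return out;
    }

    public static void main(String[] args) {
        // A 1000-byte file with 400-byte splits yields [0,400), [400,800), [800,1000).
        for (Split s : splits(1000, 400)) {
            System.out.println(s.start() + " .. " + s.end());
        }
    }
}
```

Each Split's start and end would then be passed as splitStart and splitEnd to RowGroupPredicate.byteRange, one reader per split.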

Nested Class Summary

Nested Classes

| Modifier and Type | Interface | Description |
| --- | --- | --- |
| static final record | RowGroupPredicate.And | Conjunction of row-group predicates: a row group passes if and only if every child passes. |
| static final record | RowGroupPredicate.ByteRange | Keep row groups whose midpoint falls in [startInclusive, endExclusive). |

Method Summary

Static Methods

| Modifier and Type | Method | Description |
| --- | --- | --- |
| static RowGroupPredicate | and(RowGroupPredicate... children) | Keep row groups that match every child predicate (intersection). |
| static RowGroupPredicate | byteRange(long startInclusive, long endExclusive) | Keep row groups whose data midpoint (start of the first column chunk plus half of the on-disk compressed size) falls in [startInclusive, endExclusive). |

Method Details
- byteRange
  
  static RowGroupPredicate byteRange(long startInclusive, long endExclusive)
  
  Keep row groups whose data midpoint (start of the first column chunk plus half of the on-disk compressed size) falls in [startInclusive, endExclusive).
  
  This is the standard split convention: when a set of byte ranges partitions the file, each row group's midpoint lies in exactly one range, so the row group is read by exactly one split regardless of where a boundary falls inside it.
  
  If endExclusive < startInclusive, the range is treated as empty and no row group matches. This accommodates callers that pass splitStart + splitLength as the end and tolerate long overflow on tail splits.
- and
  
  static RowGroupPredicate and(RowGroupPredicate... children)
  
  Keep row groups that match every child predicate (intersection).
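The semantics described for byteRange and and can be modelled with a small sealed hierarchy. This is a sketch under stated assumptions, not the library's source: Pred, ByteRange, And, and test below are hypothetical stand-ins that only mirror the documented shape, and the real interface may evaluate predicates differently.

```java
import java.util.List;

// Hypothetical stand-ins for illustration; not the library's types.
final class RowGroupPredicateSketch {

    sealed interface Pred permits ByteRange, And {}

    record ByteRange(long startInclusive, long endExclusive) implements Pred {}

    record And(List<Pred> children) implements Pred {}

    /** Evaluates a predicate against a row group's midpoint. */
    static boolean test(Pred p, long midpoint) {
        if (p instanceof ByteRange r) {
            // If endExclusive < startInclusive, both conditions cannot hold: the range is empty.
            return midpoint >= r.startInclusive() && midpoint < r.endExclusive();
        }
        if (p instanceof And a) {
            // Intersection: every child must accept the row group.
            return a.children().stream().allMatch(c -> test(c, midpoint));
        }
        throw new IllegalStateException("unreachable: Pred is sealed");
    }

    public static void main(String[] args) {
        // Two ranges that partition [0, 1000): a midpoint of 499 belongs to exactly one.
        System.out.println(test(new ByteRange(0, 500), 499));    // true
        System.out.println(test(new ByteRange(500, 1000), 499)); // false

        // splitStart + splitLength overflows long and wraps negative, giving an empty range.
        long splitStart = Long.MAX_VALUE - 10;
        long splitEnd = splitStart + 100;
        System.out.println(test(new ByteRange(splitStart, splitEnd), Long.MAX_VALUE - 5)); // false

        // and(...) keeps only midpoints inside every child range, here [400, 600).
        Pred both = new And(List.of(new ByteRange(0, 600), new ByteRange(400, 1000)));
        System.out.println(test(both, 500)); // true
        System.out.println(test(both, 300)); // false
    }
}
```

Because the ranges are half-open, adjacent splits never both claim a row group whose midpoint sits exactly on a boundary: the boundary offset belongs to the later split.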
