Interface RowGroupPredicate

All Known Implementing Classes:
RowGroupPredicate.And, RowGroupPredicate.ByteRange

public sealed interface RowGroupPredicate permits RowGroupPredicate.ByteRange, RowGroupPredicate.And

A predicate over row groups, used to select which row groups a reader scans.

Distinct from FilterPredicate, which expresses constraints on column values and is checked against per-column statistics. A RowGroupPredicate expresses constraints on the row group itself (its byte position in the file, its ordinal index, the row range it spans). The two are sibling, AND-combined inputs to a reader: a row group is read if and only if it passes both.

Granularity: filtering happens at row-group resolution, not row resolution. A row group passes byteRange(long, long) when its midpoint falls in the given byte range; every row in that row group is then read, including rows whose data lies outside the range. This is the standard Hadoop-input-format split convention.

Usage:

// Single split: read row groups whose midpoint is in [start, end).
ColumnReader r = file.buildColumnReader("price")
        .filter(RowGroupPredicate.byteRange(splitStart, splitEnd))
        .build();

// Stacks with column-stats predicates: both apply, AND-combined.
// (Distinct variable name so both snippets compile in one scope.)
ColumnReader filtered = file.buildColumnReader("price")
        .filter(FilterPredicate.gt("price", 100))
        .filter(RowGroupPredicate.byteRange(splitStart, splitEnd))
        .build();
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Interface
    Description
    static final record 
    RowGroupPredicate.And
    Conjunction of row-group predicates — a row group passes if and only if every child passes.
    static final record 
    RowGroupPredicate.ByteRange
    Keep row groups whose midpoint falls in [startInclusive, endExclusive).
  • Method Summary

    Static Methods
    Modifier and Type
    Method
    Description
    static RowGroupPredicate
    and(RowGroupPredicate... children)
    Keep row groups that match every child predicate (intersection).
    static RowGroupPredicate
    byteRange(long startInclusive, long endExclusive)
    Keep row groups whose data midpoint — start of the first column chunk plus half of the on-disk compressed size — falls in [startInclusive, endExclusive).
  • Method Details

    • byteRange

      static RowGroupPredicate byteRange(long startInclusive, long endExclusive)

      Keep row groups whose data midpoint — start of the first column chunk plus half of the on-disk compressed size — falls in [startInclusive, endExclusive).

      This is the standard split convention: when the file is partitioned into contiguous, non-overlapping byte ranges, each row group's midpoint falls in exactly one of them, so each row group is read by exactly one split regardless of where a boundary falls inside it.
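The midpoint rule can be illustrated with a small standalone sketch. It does not use the library's classes: the RowGroup record, inRange, and assign are hypothetical stand-ins, and the use of integer division for "half of the compressed size" is an assumption. The example shows a row group that straddles a split boundary still being claimed by exactly one split.

```java
import java.util.ArrayList;
import java.util.List;

public class MidpointSplitSketch {
    /** Hypothetical stand-in for row-group metadata (not the library's type). */
    public record RowGroup(long firstChunkStart, long compressedSize) {
        /** Midpoint as described above; integer halving is an assumption. */
        public long midpoint() {
            return firstChunkStart + compressedSize / 2;
        }
    }

    /** Half-open test on the midpoint: start is inclusive, end is exclusive. */
    public static boolean inRange(RowGroup rg, long startInclusive, long endExclusive) {
        long mid = rg.midpoint();
        return mid >= startInclusive && mid < endExclusive;
    }

    /**
     * For each row group, returns the index of the split that claims it.
     * boundaries[i]..boundaries[i + 1] is split i; the splits tile the file.
     */
    public static List<Integer> assign(List<RowGroup> groups, long[] boundaries) {
        List<Integer> owners = new ArrayList<>();
        for (RowGroup rg : groups) {
            int owner = -1;
            int matches = 0;
            for (int i = 0; i + 1 < boundaries.length; i++) {
                if (inRange(rg, boundaries[i], boundaries[i + 1])) {
                    owner = i;
                    matches++;
                }
            }
            if (matches != 1) {
                throw new IllegalStateException("row group claimed by " + matches + " splits");
            }
            owners.add(owner);
        }
        return owners;
    }

    public static void main(String[] args) {
        // The second row group occupies bytes 100..220 and straddles the
        // boundary at 128, but its midpoint (160) lies only in split 1.
        List<RowGroup> groups = List.of(
                new RowGroup(4, 96),     // midpoint 52
                new RowGroup(100, 120),  // midpoint 160
                new RowGroup(220, 80));  // midpoint 260
        long[] boundaries = {0, 128, 256, 300};
        System.out.println(assign(groups, boundaries)); // prints [0, 1, 2]
    }
}
```

Because the ranges are half-open and contiguous, a midpoint that lands exactly on a boundary belongs to the split that starts there, never to both neighbours.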

      A range with endExclusive < startInclusive is treated as empty and matches no row groups. This accommodates callers that compute the end as splitStart + splitLength and tolerate long overflow on tail splits.
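The overflow case can be reproduced with plain long arithmetic. The sketch below is standalone and hypothetical: naiveEnd, saturatingEnd, and inRange are illustrative helpers, not part of this API. It shows how splitStart + splitLength wraps to a negative value and why a half-open test then matches nothing. The clamped variant is one option for callers who would rather have the tail split extend to the end of the file; it assumes non-negative inputs.

```java
public class TailSplitSketch {
    /** Unchecked sum; wraps around on overflow, as Java long arithmetic does. */
    public static long naiveEnd(long splitStart, long splitLength) {
        return splitStart + splitLength;
    }

    /**
     * Hypothetical caller-side alternative: clamp to Long.MAX_VALUE on overflow.
     * Assumes splitStart >= 0 and splitLength >= 0.
     */
    public static long saturatingEnd(long splitStart, long splitLength) {
        long end = splitStart + splitLength;
        return end < splitStart ? Long.MAX_VALUE : end;
    }

    /** Half-open range test; any range with end < start contains nothing. */
    public static boolean inRange(long midpoint, long startInclusive, long endExclusive) {
        return midpoint >= startInclusive && midpoint < endExclusive;
    }

    public static void main(String[] args) {
        long splitStart = 1_000L;
        long splitLength = Long.MAX_VALUE; // a "read to the end" style length

        long wrapped = naiveEnd(splitStart, splitLength);
        System.out.println(wrapped < splitStart);                  // prints true
        System.out.println(inRange(5_000L, splitStart, wrapped));  // prints false

        long clamped = saturatingEnd(splitStart, splitLength);
        System.out.println(inRange(5_000L, splitStart, clamped));  // prints true
    }
}
```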

    • and

      static RowGroupPredicate and(RowGroupPredicate... children)
      Keep row groups that match every child predicate (intersection).
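Conjunction semantics can be modelled with a small standalone sketch. The Pred, ByteRange, And, and test names below are hypothetical and only mirror the shape of the sealed interface described on this page; they are not the library's implementation. Note that in this sketch an And with no children passes everything, which is a property of allMatch and not something this page documents.

```java
import java.util.List;

public class ConjunctionSketch {
    /** Hypothetical model of a sealed predicate hierarchy. */
    public sealed interface Pred permits ByteRange, And {}

    /** Passes when the midpoint lies in [startInclusive, endExclusive). */
    public record ByteRange(long startInclusive, long endExclusive) implements Pred {}

    /** Passes when every child passes. */
    public record And(List<Pred> children) implements Pred {}

    /** Varargs factory, mirroring the shape of and(RowGroupPredicate...). */
    public static Pred and(Pred... children) {
        return new And(List.of(children));
    }

    /** Evaluates a predicate against a row group's midpoint. */
    public static boolean test(Pred p, long midpoint) {
        if (p instanceof ByteRange r) {
            return midpoint >= r.startInclusive() && midpoint < r.endExclusive();
        }
        if (p instanceof And a) {
            // allMatch is true for an empty child list (sketch behavior only).
            return a.children().stream().allMatch(c -> test(c, midpoint));
        }
        throw new IllegalStateException("unknown predicate: " + p);
    }

    public static void main(String[] args) {
        // The intersection of [0, 200) and [100, 300) is [100, 200).
        Pred both = and(new ByteRange(0, 200), new ByteRange(100, 300));
        System.out.println(test(both, 150)); // prints true
        System.out.println(test(both, 50));  // prints false
        System.out.println(test(both, 250)); // prints false
    }
}
```

For byte ranges alone, a conjunction is equivalent to a single range over the overlap; the combinator matters once other kinds of row-group constraints are mixed in.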