java.lang.Object

dev.hardwood.reader.ParquetFileReader.ColumnReadersBuilder

Enclosing class: ParquetFileReader

public static final class ParquetFileReader.ColumnReadersBuilder extends Object

Builds a ColumnReaders collection for batch-oriented access to a projection of columns.

Obtained from ParquetFileReader.buildColumnReaders(ColumnProjection). Works for both single- and multi-file readers; the underlying iterator handles cross-file prefetch transparently.

try (ColumnReaders cols = file.buildColumnReaders(ColumnProjection.columns("a", "b"))
        .filter(FilterPredicate.eq("a", 7))
        .build()) {
    ColumnReader a = cols.getColumnReader("a");
    // ...
}
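
The row-aligned behaviour of a filtered read can be pictured with plain arrays. The sketch below is only an illustration of the semantics, not the library's implementation: `RowAlignedFilterSketch`, `matchingRows`, and `take` are hypothetical names, and in-memory arrays stand in for decoded column data. The predicate is evaluated on column `a`, which is deliberately left out of the "projection" (`b` and `c`), and the same row selection is applied to every projected column.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;
import java.util.stream.IntStream;

// Hypothetical illustration; none of these names belong to the library.
public class RowAlignedFilterSketch {

    /** Returns the indices of rows whose predicate-column value passes the test. */
    public static int[] matchingRows(int[] predicateColumn, IntPredicate test) {
        return IntStream.range(0, predicateColumn.length)
                .filter(i -> test.test(predicateColumn[i]))
                .toArray();
    }

    /** Gathers the selected rows of an int column, preserving row order. */
    public static int[] take(int[] column, int[] rows) {
        return Arrays.stream(rows).map(i -> column[i]).toArray();
    }

    /** Gathers the selected rows of a String column, preserving row order. */
    public static String[] take(String[] column, int[] rows) {
        return Arrays.stream(rows).mapToObj(i -> column[i]).toArray(String[]::new);
    }

    public static void main(String[] args) {
        int[] a = {7, 3, 7, 9, 7};               // predicate column, not projected
        String[] b = {"p", "q", "r", "s", "t"};  // projected column
        int[] c = {10, 20, 30, 40, 50};          // projected column

        // One selection, computed once, drives every projected column.
        int[] rows = matchingRows(a, v -> v == 7);

        System.out.println(Arrays.toString(take(b, rows))); // [p, r, t]
        System.out.println(Arrays.toString(take(c, rows))); // [10, 30, 50]
    }
}
```

Because both projected columns are gathered with the same index array, element `i` of one output always belongs to the same source row as element `i` of the other.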

Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| ParquetFileReader.ColumnReadersBuilder | batchSize(int batchSize) | Set the maximum number of records to return in each batch for all columns. |
| ColumnReaders | build() | |
| ParquetFileReader.ColumnReadersBuilder | filter(FilterPredicate filter) | Apply a filter predicate. |
| ParquetFileReader.ColumnReadersBuilder | filter(RowGroupPredicate rowGroupFilter) | Apply a row-group selection predicate (e.g. byte-range, for split-aware reading). |

Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Method Details
- filter
  
  public ParquetFileReader.ColumnReadersBuilder filter(FilterPredicate filter)
  
  Apply a filter predicate. Every column in the projection returns only the rows that match filter. The result is exact and row-aligned across columns, so no residual client-side filtering is needed. Row groups and pages that statistics prove cannot match are skipped; the surviving rows are then filtered exactly. The predicate may reference a projected column or a column outside the projection. Default: no filter.
- filter
  
  public ParquetFileReader.ColumnReadersBuilder filter(RowGroupPredicate rowGroupFilter)
  
  Apply a row-group selection predicate (e.g. byte-range, for split-aware reading). Default: read every row group. Combines with filter(FilterPredicate) via intersection: a row group is read if and only if it passes both.
- batchSize
  
  public ParquetFileReader.ColumnReadersBuilder batchSize(int batchSize)
  
  Set the maximum number of records to return in each batch for all columns.
  
  When unset, the batch size is chosen adaptively from the physical widths of the projected columns so that the per-batch arrays stay within the CPU cache. This is the same byte-budgeted sizing the RowReader path uses, in place of a fixed record count. Set this explicitly to override.
- build
  
  public ColumnReaders build()
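
The intersection rule described for filter(RowGroupPredicate) can be sketched with ordinary `java.util.function.Predicate` values. This is a hedged illustration only: `RowGroupInfo`, `byteRange`, `mayContain`, and `select` are hypothetical names, not library API, and the sketch assumes a row group belongs to the split that contains its first byte. The statistics check stands in for what an equality filter could prove from min/max values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration; none of these names belong to the library.
public class RowGroupSelectionSketch {

    /** Stand-in for row-group metadata: position in the file plus min/max of one column. */
    public static final class RowGroupInfo {
        final int index;
        final long startOffset;
        final long min;
        final long max;

        public RowGroupInfo(int index, long startOffset, long min, long max) {
            this.index = index;
            this.startOffset = startOffset;
            this.min = min;
            this.max = max;
        }
    }

    /** Split-aware selection (assumed rule): keep groups starting in [from, to). */
    public static Predicate<RowGroupInfo> byteRange(long from, long to) {
        return rg -> rg.startOffset >= from && rg.startOffset < to;
    }

    /** Statistics check for an equality filter: value must lie within [min, max]. */
    public static Predicate<RowGroupInfo> mayContain(long value) {
        return rg -> rg.min <= value && value <= rg.max;
    }

    /** A row group is read if and only if it passes both predicates. */
    public static List<Integer> select(List<RowGroupInfo> groups,
                                       Predicate<RowGroupInfo> rowGroupFilter,
                                       Predicate<RowGroupInfo> statsFilter) {
        List<Integer> kept = new ArrayList<>();
        for (RowGroupInfo rg : groups) {
            if (rowGroupFilter.test(rg) && statsFilter.test(rg)) {
                kept.add(rg.index);
            }
        }
        return kept;
    }

    /** Made-up metadata for four row groups. */
    public static List<RowGroupInfo> sampleGroups() {
        return List.of(
                new RowGroupInfo(0, 4L, 0, 5),
                new RowGroupInfo(1, 1000L, 3, 9),
                new RowGroupInfo(2, 2000L, 6, 12),
                new RowGroupInfo(3, 3000L, 10, 20));
    }

    public static void main(String[] args) {
        // Split covers bytes [1000, 3000); the filter is "column == 7".
        System.out.println(select(sampleGroups(), byteRange(1000, 3000), mayContain(7))); // [1, 2]
    }
}
```

Row group 0 fails both checks and row group 3 starts outside the split, so only groups 1 and 2 are read.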
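
The byte-budgeted default described for batchSize can be approximated as "budget divided by bytes per row". The sketch below is an assumption-laden illustration, not the library's actual sizing logic: the 256 KiB budget, the clamp bounds, and the names `AdaptiveBatchSizeSketch`, `adaptiveBatchSize`, and `resolve` are all hypothetical.

```java
// Hypothetical illustration; constants and names are assumptions, not library values.
public class AdaptiveBatchSizeSketch {

    static final long BUDGET_BYTES = 256 * 1024; // assumed per-batch cache budget
    static final int MIN_BATCH = 1024;           // assumed lower clamp
    static final int MAX_BATCH = 65536;          // assumed upper clamp

    /** Picks a record count so that one batch of all columns fits the byte budget. */
    public static int adaptiveBatchSize(int[] columnWidthsBytes) {
        long bytesPerRow = 0;
        for (int width : columnWidthsBytes) {
            bytesPerRow += width;
        }
        if (bytesPerRow <= 0) {
            return MAX_BATCH; // nothing to size against
        }
        long rows = BUDGET_BYTES / bytesPerRow;
        return (int) Math.max(MIN_BATCH, Math.min(MAX_BATCH, rows));
    }

    /** An explicit setting overrides the adaptive choice. */
    public static int resolve(Integer explicitBatchSize, int[] columnWidthsBytes) {
        return explicitBatchSize != null ? explicitBatchSize : adaptiveBatchSize(columnWidthsBytes);
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, new int[]{4, 8}));       // 21845
        System.out.println(resolve(null, new int[]{8, 8, 8, 8})); // 8192
        System.out.println(resolve(500, new int[]{4, 8}));        // 500
    }
}
```

Wider projections get smaller batches and narrow ones get larger batches, which is the practical difference from a fixed record count.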
