Class ParquetFileReader.ColumnReadersBuilder

java.lang.Object
dev.hardwood.reader.ParquetFileReader.ColumnReadersBuilder
Enclosing class:
ParquetFileReader

public static final class ParquetFileReader.ColumnReadersBuilder extends Object

Builds a ColumnReaders collection for batch-oriented access to a projection of columns.

Obtained from ParquetFileReader.buildColumnReaders(ColumnProjection). Works for both single- and multi-file readers; the underlying iterator handles cross-file prefetch transparently.

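// file is an open ParquetFileReader (single- or multi-file)
try (ColumnReaders cols = file.buildColumnReaders(ColumnProjection.columns("a", "b"))
        .filter(FilterPredicate.eq("a", 7))   // every projected column yields only rows where a == 7
        .build()) {
    ColumnReader a = cols.getColumnReader("a");
    // ...
}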
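The cross-file prefetch mentioned above can be pictured as an iterator that starts loading file i+1 in the background while the caller consumes file i. The following is a minimal, self-contained sketch of that idea under stated assumptions, not this library's implementation: PrefetchingIterator and its loader function are hypothetical names, and the prefetch depth of exactly one file is an assumption.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical sketch: yields one loaded result per file, keeping the next file's
// load in flight on a background thread while the caller works on the current one.
class PrefetchingIterator<F, R> implements Iterator<R>, AutoCloseable {
    private final Iterator<F> files;
    private final Function<F, R> loader; // hypothetical: opens and decodes one file
    private final ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "prefetch");
        t.setDaemon(true); // never keeps the JVM alive on its own
        return t;
    });
    private CompletableFuture<R> pending; // load of the next file, or null when exhausted

    PrefetchingIterator(List<F> files, Function<F, R> loader) {
        this.files = files.iterator();
        this.loader = loader;
        scheduleNext(); // start loading the first file immediately
    }

    private void scheduleNext() {
        if (files.hasNext()) {
            F file = files.next();
            pending = CompletableFuture.supplyAsync(() -> loader.apply(file), pool);
        } else {
            pending = null;
        }
    }

    @Override
    public boolean hasNext() {
        return pending != null;
    }

    @Override
    public R next() {
        if (pending == null) throw new NoSuchElementException();
        R current = pending.join(); // block only if the prefetch has not finished yet
        scheduleNext();             // begin loading the following file before returning
        return current;
    }

    @Override
    public void close() {
        pool.shutdownNow();
    }
}
```

A depth of one keeps memory bounded to roughly two files' worth of decoded state while still overlapping I/O with consumption; a real reader might prefetch at row-group rather than file granularity.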
  • Method Details

    • filter

      Apply a filter predicate. Every column in the projection returns only the rows that match filter. Filtering is exact and row-aligned across columns, so no client-side residual filtering is needed. Row groups and pages that statistics prove cannot match are skipped, and the surviving rows are then filtered exactly. The predicate may reference a projected column or a column that is not part of the projection. Default: no filter.
    • filter

      Apply a row-group selection predicate (e.g. a byte-range predicate, for split-aware reading). Default: read every row group. Combines with filter(FilterPredicate) by intersection: a row group is read if and only if it passes both predicates.
    • batchSize

      public ParquetFileReader.ColumnReadersBuilder batchSize(int batchSize)

      Set the maximum number of records to return in each batch for all columns.

      When unset, the batch size is chosen adaptively from the projected columns' physical widths so that the per-batch arrays stay within the CPU cache, rather than being a fixed record count. This is the same byte-budgeted sizing that the RowReader path uses. Set this explicitly to override.

    • build

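      public ColumnReaders build()

      Build the ColumnReaders for the configured projection, filters, and batch size. The returned ColumnReaders should be closed when no longer needed, for example with try-with-resources as in the class-level example.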
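
The two filter methods above can be modeled with a small in-memory sketch. It is not this library's implementation: RowGroup, FilterSketch, the single min/max statistic per row group, and the midpoint rule for assigning a row group to a byte-range split (a common convention in split-based readers) are all assumptions made for illustration, and page-level pruning is left out. It shows a row group being read only if it passes both the statistics test for a == value and the split test, followed by exact per-row filtering whose row indices drive every column, which keeps the output row-aligned.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one row group: byte extent, min/max stats for column "a", and its rows.
record RowGroup(long offset, long length, int minA, int maxA, int[] a, String[] b) {}

class FilterSketch {
    // Statistics test for a == value: false means the row group provably has no match.
    static boolean statsMayMatch(RowGroup rg, int value) {
        return rg.minA() <= value && value <= rg.maxA();
    }

    // Assumed split convention: a row group belongs to the split containing its midpoint,
    // so each row group is read by exactly one split.
    static boolean inSplit(RowGroup rg, long splitStart, long splitEnd) {
        long mid = rg.offset() + rg.length() / 2;
        return splitStart <= mid && mid < splitEnd;
    }

    // Returns [column a, column b] for rows with a == value inside one byte-range split.
    static List<List<Object>> read(List<RowGroup> groups, int value, long splitStart, long splitEnd) {
        List<Object> outA = new ArrayList<>();
        List<Object> outB = new ArrayList<>();
        for (RowGroup rg : groups) {
            // Intersection: the row group is read only if it passes BOTH predicates.
            if (!inSplit(rg, splitStart, splitEnd) || !statsMayMatch(rg, value)) continue;
            // Exact filtering of surviving rows: the same row index feeds every column.
            for (int i = 0; i < rg.a().length; i++) {
                if (rg.a()[i] == value) {
                    outA.add(rg.a()[i]);
                    outB.add(rg.b()[i]);
                }
            }
        }
        return List.of(outA, outB);
    }
}
```

Because the predicate is evaluated on column a but the matching indices select rows from every column, dropping a from the output would model filtering on a column that is not part of the projection.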
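
The adaptive default described under batchSize can be approximated as a byte budget divided by the summed physical widths of the projected columns. In this sketch the BatchSizing class, the 256 KiB budget, and the fixed per-column widths are hypothetical; the library's actual budget and its handling of variable-width columns such as BYTE_ARRAY are not documented here.

```java
// Hypothetical sketch of byte-budgeted batch sizing.
class BatchSizing {
    // Assumed budget for the per-batch column arrays, chosen to fit in a CPU cache.
    static final long CACHE_BUDGET_BYTES = 256L * 1024;

    // Largest record count such that sum(widths) * batchSize <= budget, but at least 1.
    static int adaptiveBatchSize(int[] physicalWidthsBytes, long budgetBytes) {
        long bytesPerRecord = 0;
        for (int w : physicalWidthsBytes) bytesPerRecord += w;
        if (bytesPerRecord <= 0) throw new IllegalArgumentException("no fixed-width columns");
        long n = budgetBytes / bytesPerRecord;
        return (int) Math.max(1, Math.min(n, Integer.MAX_VALUE));
    }

    static int adaptiveBatchSize(int[] physicalWidthsBytes) {
        return adaptiveBatchSize(physicalWidthsBytes, CACHE_BUDGET_BYTES);
    }
}
```

For example, an INT64, a DOUBLE, and an INT32 column occupy 8 + 8 + 4 = 20 bytes per record, so a 262,144-byte budget yields 262144 / 20 = 13107 records per batch (integer division). Wider projections therefore get smaller batches, which is the behavior a fixed record count cannot provide.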