Class ParquetFileReader.ColumnReaderBuilder

java.lang.Object
dev.hardwood.reader.ParquetFileReader.ColumnReaderBuilder
Enclosing class:
ParquetFileReader

public static final class ParquetFileReader.ColumnReaderBuilder extends Object

Builds a single-column ColumnReader with an optional filter.

Obtained from ParquetFileReader.buildColumnReader(String) or ParquetFileReader.buildColumnReader(int). This builder is available on single-file readers only; multi-file readers must use ParquetFileReader.buildColumnReaders(ColumnProjection) instead.

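// file is an open single-file ParquetFileReader.
// Read column "id", keeping only the rows where id < 1000.
ColumnReader col = file.buildColumnReader("id")
        .filter(FilterPredicate.lt("id", 1000L))
        .build();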
  • Method Details

    • filter

      Apply a filter predicate. The built reader returns only the rows that match the predicate. Filtering is exact, with no residual left for the client to apply, so an aggregate computed directly over the output is correct. Row groups and pages that statistics prove cannot match are skipped; the surviving rows are then filtered exactly. The predicate may reference this column, another column, or a column that is not otherwise read. Default: no filter.
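The two stages described above (skip by statistics, then filter exactly) can be illustrated with a small standalone sketch. It does not use the library: the RowGroup record, mightMatchLessThan and sumLessThan are hypothetical names invented here, and the only statistic modelled is a per-row-group minimum and maximum.

```java
import java.util.List;

final class TwoPhaseFilterSketch {
    // Hypothetical stand-in for a row group: min/max statistics plus the stored values.
    record RowGroup(long min, long max, long[] values) {}

    // Stage 1: if the smallest value is already >= bound, no row can satisfy "value < bound".
    static boolean mightMatchLessThan(RowGroup group, long bound) {
        return group.min() < bound;
    }

    // Number of row groups that survive statistics-based skipping.
    static int scannedGroups(List<RowGroup> groups, long bound) {
        int scanned = 0;
        for (RowGroup group : groups) {
            if (mightMatchLessThan(group, bound)) {
                scanned++;
            }
        }
        return scanned;
    }

    // Stage 2: surviving groups are filtered row by row, so the sum needs no further correction.
    static long sumLessThan(List<RowGroup> groups, long bound) {
        long sum = 0;
        for (RowGroup group : groups) {
            if (!mightMatchLessThan(group, bound)) {
                continue; // skipped without touching its values
            }
            for (long value : group.values()) {
                if (value < bound) {
                    sum += value;
                }
            }
        }
        return sum;
    }
}
```

In this sketch a row group whose statistics straddle the bound is still scanned, and the per-row check is what makes the aggregate exact.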
    • filter

      Apply a row-group selection predicate (e.g. byte-range, for split-aware reading). Default: read every row group. Combines with filter(FilterPredicate) via intersection: a row group is read if and only if it passes both.
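The intersection rule can be sketched in plain Java, independent of the library. Everything named here is hypothetical (RowGroupInfo, inByteRange, mayContainIdLessThan, selectedIndexes). The byte-range test assigns a row group to the range that contains its midpoint, which is one common convention for split-aware reading; whether this library uses the same convention is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class RowGroupIntersectionSketch {
    // Hypothetical row-group metadata: position in the file plus a minimum statistic for "id".
    record RowGroupInfo(int index, long startOffset, long compressedSize, long minId) {}

    // Selection predicate: keep a row group whose midpoint lies in [start, end). Assumed convention.
    static Predicate<RowGroupInfo> inByteRange(long start, long end) {
        return group -> {
            long midpoint = group.startOffset() + group.compressedSize() / 2;
            return midpoint >= start && midpoint < end;
        };
    }

    // Statistics check standing in for a filter such as id < bound.
    static Predicate<RowGroupInfo> mayContainIdLessThan(long bound) {
        return group -> group.minId() < bound;
    }

    // A row group is read if and only if it passes both predicates.
    static List<Integer> selectedIndexes(List<RowGroupInfo> groups,
                                         Predicate<RowGroupInfo> selection,
                                         Predicate<RowGroupInfo> statsFilter) {
        Predicate<RowGroupInfo> both = selection.and(statsFilter);
        List<Integer> selected = new ArrayList<>();
        for (RowGroupInfo group : groups) {
            if (both.test(group)) {
                selected.add(group.index());
            }
        }
        return selected;
    }
}
```

Because the two conditions are combined with a logical AND, the order in which they are evaluated does not change which row groups are read.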
    • batchSize

      public ParquetFileReader.ColumnReaderBuilder batchSize(int batchSize)

      Set the maximum number of records to return in each batch.

      When unset, the batch size is not a fixed record count. It is chosen adaptively from the column's physical width so that the per-batch arrays stay within the CPU cache. This is the same byte-budgeted sizing that the RowReader path uses. Set this explicitly to override.
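As a rough illustration of byte-budgeted sizing, the sketch below divides a byte budget by the column's physical width. The 32 KiB budget, the class name and the adaptiveBatchSize helper are assumptions made for this example; the library's actual budget and rounding rules are not stated in this documentation.

```java
final class AdaptiveBatchSizeSketch {
    // Assumed budget for illustration only; not a value taken from the library.
    static final int ASSUMED_BUDGET_BYTES = 32 * 1024;

    // Records per batch such that records * width stays within the budget (at least 1).
    static int adaptiveBatchSize(int physicalWidthBytes, int budgetBytes) {
        if (physicalWidthBytes <= 0 || budgetBytes <= 0) {
            throw new IllegalArgumentException("width and budget must be positive");
        }
        return Math.max(1, budgetBytes / physicalWidthBytes);
    }
}
```

Under these assumptions a 4-byte column gets twice as many records per batch as an 8-byte column. To bypass adaptive sizing, call batchSize(int) on the builder before build(), for example file.buildColumnReader("id").batchSize(4096).build().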

    • build

      public ColumnReader build()