Class ParquetFileReader.ColumnReadersBuilder
java.lang.Object
dev.hardwood.reader.ParquetFileReader.ColumnReadersBuilder
Enclosing class:
ParquetFileReader
Builds a ColumnReaders collection for batch-oriented access to a
projection of columns.
Obtained from ParquetFileReader.buildColumnReaders(ColumnProjection).
Works for both single- and multi-file readers; the underlying iterator
handles cross-file prefetch transparently.
try (ColumnReaders cols = file.buildColumnReaders(ColumnProjection.columns("a", "b"))
        .filter(FilterPredicate.eq("a", 7))
        .build()) {
    // file is an open ParquetFileReader; every column in cols yields only rows where a == 7
    ColumnReader a = cols.getColumnReader("a");
    // ...
}
Method Summary
- batchSize(int batchSize): Set the maximum number of records to return in each batch for all columns.
- build(): Build the ColumnReaders collection.
- filter(FilterPredicate filter): Apply a filter predicate.
- filter(RowGroupPredicate rowGroupFilter): Apply a row-group selection predicate (e.g. byte-range, for split-aware reading).
Method Details
filter(FilterPredicate filter)
Apply a filter predicate. Every column in the projection returns only the rows matching filter. The result is exact and row-aligned across columns, with no residual filtering left to the caller. Row groups and pages that statistics prove cannot match are skipped; the surviving rows are then filtered exactly. The predicate may reference either a projected column or a column that is not part of the projection. Default: no filter.
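The two-phase behaviour described above (skip what statistics rule out, then filter the survivors exactly) can be sketched as follows. This is a simplified, hypothetical model, not Hardwood's implementation: RowGroup, EqFilterSketch and matchingRows are illustrative names, only an equality predicate on a single int column is handled, and pruning happens per row group only (the reader described above also skips pages).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model (not Hardwood's API): one int column split into row groups,
// each carrying the min/max statistics a Parquet footer records for the column.
record RowGroup(int min, int max, int[] values) {}

final class EqFilterSketch {

    // Statistics check: if target lies outside [min, max], no row in the group
    // can match, so the group is skipped without decoding its values.
    static boolean mightContain(RowGroup rg, int target) {
        return target >= rg.min() && target <= rg.max();
    }

    // Returns the global indices of rows whose value equals target. Surviving
    // groups are tested row by row, so the result has no false positives, and the
    // same index list can select rows from every projected column.
    static List<Long> matchingRows(List<RowGroup> groups, int target) {
        List<Long> rows = new ArrayList<>();
        long base = 0; // global index of the first row in the current group
        for (RowGroup rg : groups) {
            int[] vals = rg.values();
            if (mightContain(rg, target)) {
                for (int i = 0; i < vals.length; i++) {
                    if (vals[i] == target) {
                        rows.add(base + i);
                    }
                }
            }
            base += vals.length; // skipped groups still advance the row position
        }
        return rows;
    }
}
```

Returning row positions rather than values is what keeps the result row-aligned: the same positions can be used to cut every projected column down to the same rows. Note that statistics only prove absence; a group whose range contains the target must still be checked row by row.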
filter(RowGroupPredicate rowGroupFilter)
Apply a row-group selection predicate (e.g. byte-range, for split-aware reading). Default: read every row group. Combines with filter(FilterPredicate) via intersection: a row group is read if and only if it passes both.
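The intersection rule can be sketched with plain java.util.function.Predicate values. Everything here is hypothetical: GroupMeta and SplitSelectionSketch are illustrative names rather than Hardwood types, and assigning a row group to the split that contains its midpoint byte is one common convention for split-aware reading, assumed here rather than taken from this API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical row-group metadata: byte offset and length in the file,
// plus min/max statistics for one int column.
record GroupMeta(long offset, long length, int min, int max) {}

final class SplitSelectionSketch {

    // Byte-range selection (assumed midpoint convention): a group belongs to the
    // split [start, end) containing its midpoint, so each group lands in exactly one split.
    static Predicate<GroupMeta> inSplit(long start, long end) {
        return g -> {
            long mid = g.offset() + g.length() / 2;
            return mid >= start && mid < end;
        };
    }

    // Statistics-based selection for an equality predicate on the column.
    static Predicate<GroupMeta> mayEqual(int target) {
        return g -> target >= g.min() && target <= g.max();
    }

    // Intersection: a group is read if and only if it passes both predicates.
    static List<Integer> selectedGroups(List<GroupMeta> groups,
                                        Predicate<GroupMeta> rowGroupFilter,
                                        Predicate<GroupMeta> statsFilter) {
        Predicate<GroupMeta> both = rowGroupFilter.and(statsFilter);
        List<Integer> selected = new ArrayList<>();
        for (int i = 0; i < groups.size(); i++) {
            if (both.test(groups.get(i))) {
                selected.add(i);
            }
        }
        return selected;
    }
}
```

Because the byte-range predicate alone partitions the row groups across splits, adding the statistics predicate can only shrink each split's share; no row group is ever read by two splits.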
batchSize(int batchSize)
Set the maximum number of records to return in each batch for all columns.
When unset, the batch size is chosen adaptively from the projected columns' physical widths so that the per-batch arrays stay within the CPU cache (the same byte-budgeted sizing the RowReader path uses) rather than being a fixed record count. Set this explicitly to override.
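A minimal sketch of byte-budgeted batch sizing, assuming fixed-width columns and an illustrative 256 KiB budget. Neither the budget nor the helper names (BatchSizeSketch, adaptiveBatchSize, effectiveBatchSize) come from Hardwood, and variable-width columns such as strings are ignored.

```java
import java.util.List;
import java.util.OptionalInt;

final class BatchSizeSketch {

    // Illustrative per-batch budget only (assumption); the real budget is not stated here.
    static final int CACHE_BUDGET_BYTES = 256 * 1024;

    // widths: bytes per value for each projected fixed-width column
    // (e.g. 4 for INT32/FLOAT, 8 for INT64/DOUBLE).
    static int adaptiveBatchSize(List<Integer> widths) {
        int bytesPerRecord = 0;
        for (int w : widths) {
            bytesPerRecord += w;
        }
        // Fit one batch of every projected column into the budget, and always
        // return at least one record even if a single record exceeds it.
        return Math.max(1, CACHE_BUDGET_BYTES / Math.max(1, bytesPerRecord));
    }

    // An explicitly configured batch size overrides the adaptive choice.
    static int effectiveBatchSize(OptionalInt explicit, List<Integer> widths) {
        return explicit.isPresent() ? explicit.getAsInt() : adaptiveBatchSize(widths);
    }
}
```

With an INT32 and an INT64 column (4 + 8 = 12 bytes per record), this gives 262144 / 12 = 21845 records per batch; projecting fewer or narrower columns raises the count, which is the point of budgeting by bytes instead of by records.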
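build()
Build the ColumnReaders collection. The class example above opens it in a try-with-resources block so that it is closed after use.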