Class ParquetFileReader.ColumnReaderBuilder
java.lang.Object
dev.hardwood.reader.ParquetFileReader.ColumnReaderBuilder
Enclosing class:
ParquetFileReader
Builds a single-column ColumnReader with an optional filter.
Obtained from ParquetFileReader.buildColumnReader(String) or
ParquetFileReader.buildColumnReader(int). Single-file only —
multi-file readers must use ParquetFileReader.buildColumnReaders(ColumnProjection)
with a projection.
ColumnReader col = file.buildColumnReader("id")
        .filter(FilterPredicate.lt("id", 1000L))
        .build();
Method Summary
Method                                     Description
batchSize(int batchSize)                   Set the maximum number of records to return in each batch.
build()
filter(FilterPredicate filter)             Apply a filter predicate.
filter(RowGroupPredicate rowGroupFilter)   Apply a row-group selection predicate (e.g. byte-range, for split-aware reading).
Method Details
filter
Apply a filter predicate. The built reader returns only the rows matching filter. The result is exact, with no client-side residual, so a direct aggregate over the output is correct. Row groups and pages proven non-matching by statistics are skipped; the surviving rows are then filtered exactly. The predicate may reference this column, another column, or a column that is not otherwise read. Default: no filter.
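The two phases described above (skip by statistics, then filter exactly) can be illustrated with a minimal, self-contained sketch. It does not use this library: TwoPhaseFilterSketch, RowGroup, and lessThan are hypothetical names invented for the illustration, and the statistics are modelled as a simple per-group minimum and maximum.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: none of these names belong to the library.
class TwoPhaseFilterSketch {

    /** A toy row group: its values plus the min/max statistics recorded for them. */
    static final class RowGroup {
        final long min;
        final long max;
        final long[] values;

        private RowGroup(long min, long max, long[] values) {
            this.min = min;
            this.max = max;
            this.values = values;
        }

        /** Builds a row group and derives its statistics from the values. */
        static RowGroup of(long... values) {
            long min = Long.MAX_VALUE;
            long max = Long.MIN_VALUE;
            for (long v : values) {
                min = Math.min(min, v);
                max = Math.max(max, v);
            }
            return new RowGroup(min, max, values);
        }
    }

    /** Returns every value strictly below bound, in row-group order. */
    static List<Long> lessThan(List<RowGroup> groups, long bound) {
        List<Long> out = new ArrayList<>();
        for (RowGroup g : groups) {
            // Phase 1: if even the minimum is >= bound, no row can match, so skip the group.
            if (g.min >= bound) {
                continue;
            }
            // Phase 2: a surviving group may still hold non-matching rows, so filter exactly.
            for (long v : g.values) {
                if (v < bound) {
                    out.add(v);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<RowGroup> groups = List.of(
                RowGroup.of(1, 500, 999),      // fully matching
                RowGroup.of(900, 1000, 1500),  // partially matching
                RowGroup.of(2000, 3000));      // skipped by statistics
        System.out.println(lessThan(groups, 1000L)); // prints [1, 500, 999, 900]
    }
}
```

Because the second phase removes every non-matching row from partially matching groups, a caller can sum or count the output directly without re-applying the predicate.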
filter
Apply a row-group selection predicate (e.g. byte-range, for split-aware reading). Default: read every row group. Combines with filter(FilterPredicate) via intersection: a row group is read if and only if it passes both.
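The intersection rule can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the library's implementation: RowGroupSelectionSketch and selectRowGroups are hypothetical names, the statistics verdict is supplied as a precomputed boolean per row group, and a row group is assumed to belong to the byte range that contains its starting offset.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: these names are not part of the library.
class RowGroupSelectionSketch {

    /**
     * Returns the indices of the row groups to read.
     *
     * @param startOffsets starting byte offset of each row group
     * @param mayMatch     per row group, whether statistics leave a match possible
     * @param splitStart   inclusive start of the byte range
     * @param splitEnd     exclusive end of the byte range
     */
    static List<Integer> selectRowGroups(long[] startOffsets, boolean[] mayMatch,
                                         long splitStart, long splitEnd) {
        if (startOffsets.length != mayMatch.length) {
            throw new IllegalArgumentException("one statistics verdict per row group is required");
        }
        List<Integer> selected = new ArrayList<>();
        for (int i = 0; i < startOffsets.length; i++) {
            // Assumption: a row group is owned by the range containing its first byte.
            boolean inSplit = startOffsets[i] >= splitStart && startOffsets[i] < splitEnd;
            // Intersection: read the row group only if it passes both checks.
            if (inSplit && mayMatch[i]) {
                selected.add(i);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        long[] offsets = {4L, 1000L, 2000L, 3000L};
        boolean[] mayMatch = {true, false, true, true};
        System.out.println(selectRowGroups(offsets, mayMatch, 0L, 2500L));    // prints [0, 2]
        System.out.println(selectRowGroups(offsets, mayMatch, 2500L, 4000L)); // prints [3]
    }
}
```

With non-overlapping byte ranges, each row group is considered by exactly one range, so readers working on different splits of the same file do not read the same row group twice.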
batchSize
Set the maximum number of records to return in each batch.
When unset, the batch size is chosen adaptively from the column's physical width so that the per-batch arrays stay within the CPU cache. This is the same byte-budgeted sizing the RowReader path uses, rather than a fixed record count. Set this explicitly to override.
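Byte-budgeted sizing of this kind can be sketched as a division of a fixed byte budget by the width of one value. The sketch below is an assumption-laden illustration, not the library's code: AdaptiveBatchSizeSketch and batchSizeFor are hypothetical names, and the 256 KiB budget is an arbitrary example value rather than the budget the reader actually uses.

```java
// Hypothetical illustration only: the names and the budget are assumptions.
class AdaptiveBatchSizeSketch {

    /** Example budget for one batch array; the real value is not documented here. */
    static final int EXAMPLE_BUDGET_BYTES = 256 * 1024;

    /** Returns how many values of the given width fit in the budget (at least one). */
    static int batchSizeFor(int valueWidthBytes, int budgetBytes) {
        if (valueWidthBytes <= 0 || budgetBytes <= 0) {
            throw new IllegalArgumentException("width and budget must be positive");
        }
        // Wider values yield fewer records per batch, keeping the array size roughly constant.
        return Math.max(1, budgetBytes / valueWidthBytes);
    }

    public static void main(String[] args) {
        System.out.println(batchSizeFor(4, EXAMPLE_BUDGET_BYTES));  // prints 65536
        System.out.println(batchSizeFor(8, EXAMPLE_BUDGET_BYTES));  // prints 32768
        System.out.println(batchSizeFor(16, EXAMPLE_BUDGET_BYTES)); // prints 16384
    }
}
```

Under this scheme a 4-byte column gets twice as many records per batch as an 8-byte column, while both batches occupy the same number of bytes.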
build