Class ParquetFileReader.RowReaderBuilder
Enclosing class: ParquetFileReader
Builds a RowReader with optional projection, filter, and head/tail
row limit.
Obtained from ParquetFileReader.buildRowReader(). Each setter returns
the builder for chaining; build() consumes the configuration and
creates the reader. The builder is not reusable after build().
    RowReader reader = file.buildRowReader()
        .projection(ColumnProjection.columns("id", "name"))
        .filter(FilterPredicate.eq("status", "active"))
        .head(1000)
        .build();
Method Summary
Method and Description
build()
filter(FilterPredicate filter)
    Apply a column-statistics / record-level filter predicate.
filter(RowGroupPredicate rowGroupFilter)
    Apply a row-group selection predicate (e.g. byte-range, for split-aware reading).
head(long maxRows)
    Limit to the first maxRows rows.
projection(ColumnProjection projection)
    Restrict reading to the given columns.
skip(long skip)
    Skip leading rows before reading (SQL OFFSET).
tail(long tailRows)
    Limit to the last tailRows rows.
Method Details
projection
Restrict reading to the given columns. Default: all columns.
filter
Apply a column-statistics / record-level filter predicate. Default: no filter.
filter
Apply a row-group selection predicate (e.g. byte-range, for split-aware reading). Default: read every row group. Combines with filter(FilterPredicate) via intersection: a row group is read if and only if it passes both.
Composes with head(long) and skip(long) over the filtered row-group sequence: skip(N) skips N rows of the kept set, and head(N) caps at N rows of the kept set. Mutually exclusive with tail(long) (tail mode requires a known total row count, which row-group filtering invalidates).
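The composition rules above can be modeled without the library. What follows is a minimal sketch in plain Java, not the reader's implementation: the class RowGroupSelectionSketch, its methods, and the boolean arrays standing in for each predicate's per-row-group verdict are all hypothetical, and the row count assumes no record-level filtering inside the kept groups.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the selection rules; not part of the ParquetFileReader API.
final class RowGroupSelectionSketch {

    // A row group is read only if it passes both predicates (intersection).
    static List<Integer> keptGroups(boolean[] passesFilter, boolean[] passesRowGroupFilter) {
        List<Integer> kept = new ArrayList<>();
        for (int g = 0; g < passesFilter.length; g++) {
            if (passesFilter[g] && passesRowGroupFilter[g]) {
                kept.add(g);
            }
        }
        return kept;
    }

    // Rows produced by skip(skip).head(head) over the kept row groups.
    // Assumption: no record-level filtering, so every row of a kept group counts.
    static long rowsReturned(long[] rowCounts, List<Integer> kept, long skip, long head) {
        long keptRows = 0;
        for (int g : kept) {
            keptRows += rowCounts[g];
        }
        long afterSkip = Math.max(0, keptRows - skip);
        return Math.min(afterSkip, head);
    }

    public static void main(String[] args) {
        boolean[] passesFilter = {true, false, true, true};
        boolean[] passesRowGroupFilter = {true, true, false, true};
        long[] rowCounts = {100, 100, 100, 50};
        List<Integer> kept = keptGroups(passesFilter, passesRowGroupFilter);
        System.out.println("kept groups: " + kept);
        System.out.println("rows returned: " + rowsReturned(rowCounts, kept, 120, 1000));
    }
}
```

In this model, skip and head count rows of the kept groups only; rows in rejected groups never contribute to either number.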
head
Limit to the first maxRows rows. When combined with filter(FilterPredicate), the cap is on the number of matching rows, not the number scanned: the reader keeps scanning until maxRows rows satisfy the predicate or the input is exhausted (SQL LIMIT over the filtered relation). Mutually exclusive with tail(long).
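The difference between rows matched and rows scanned can be illustrated with a small loop. This is a sketch in plain Java under stated assumptions, not the reader's code: HeadWithFilterSketch, Result, and the int array standing in for the input rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical model of head(maxRows) combined with a record-level filter.
final class HeadWithFilterSketch {

    // Matching rows collected, plus how many input rows were examined to find them.
    static final class Result {
        final List<Integer> rows;
        final int scanned;

        Result(List<Integer> rows, int scanned) {
            this.rows = rows;
            this.scanned = scanned;
        }
    }

    // Keeps scanning until maxRows rows match or the input is exhausted.
    static Result head(int[] input, IntPredicate filter, int maxRows) {
        List<Integer> rows = new ArrayList<>();
        int scanned = 0;
        for (int value : input) {
            if (rows.size() >= maxRows) {
                break; // cap reached: stop before examining another row
            }
            scanned++;
            if (filter.test(value)) {
                rows.add(value);
            }
        }
        return new Result(rows, scanned);
    }

    public static void main(String[] args) {
        int[] input = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        Result r = head(input, v -> v % 2 == 0, 3);
        System.out.println("rows=" + r.rows + " scanned=" + r.scanned);
    }
}
```

The cap applies to the size of the result, so the scanned count can exceed maxRows whenever some rows fail the predicate.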
tail
Limit to the last tailRows rows. Row groups that do not overlap the tail are skipped entirely, so pages for earlier row groups are never fetched or decoded, which is useful on remote backends. Mutually exclusive with head(long), filter, and skip. Single-file only.
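One way to picture which row groups overlap the tail is to work from the total row count. The sketch below is plain Java under stated assumptions, not the reader's implementation: TailSketch, tailStart, and the rowCounts array (rows per row group, in file order) are hypothetical.

```java
// Hypothetical model of tail(tailRows): find where the tail begins.
final class TailSketch {

    // Returns {firstGroup, rowsToDrop}: the index of the first row group that
    // overlaps the tail, and how many leading rows of that group fall before it.
    // Returns {rowCounts.length, 0} when the tail is empty.
    static long[] tailStart(long[] rowCounts, long tailRows) {
        long total = 0;
        for (long count : rowCounts) {
            total += count; // tail mode needs the total row count up front
        }
        long firstTailRow = Math.max(0, total - tailRows);
        long groupStart = 0;
        for (int g = 0; g < rowCounts.length; g++) {
            if (firstTailRow < groupStart + rowCounts[g]) {
                return new long[] {g, firstTailRow - groupStart};
            }
            groupStart += rowCounts[g];
        }
        return new long[] {rowCounts.length, 0};
    }

    public static void main(String[] args) {
        long[] start = tailStart(new long[] {100, 100, 100}, 150);
        System.out.println("first group=" + start[0] + " rows to drop=" + start[1]);
    }
}
```

Every group before firstGroup lies entirely outside the tail, which is why a reader working this way would never need to fetch its pages.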
skip
Skip leading rows before reading (SQL OFFSET). Its meaning depends on whether a filter(FilterPredicate) is present:
- Without a filter: a physical absolute row index. Earlier row groups are not opened, giving an O(1 row-group) seek on remote backends (the leading residue within the target row group is still decoded). skip >= totalRows yields an empty reader. Indexes into the first file's rows for multi-file readers.
- With a filter: a logical offset over the matched rows. Discards the first n rows matching the predicate, symmetric with head(long) as LIMIT. The O(1) seek does not apply: the reader decodes earlier groups to count matches (groups proven non-matching by statistics are still pruned). A skip past the match count yields an empty reader, counting across all files in order.
skip == 0 is the no-op default. Mutually exclusive with tail(long); composes with head(long) (skip(n).head(k) is OFFSET n LIMIT k) and with filter(RowGroupPredicate) over the kept row-group sequence.
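The two meanings of skip can be contrasted in a short sketch. It is plain Java under stated assumptions, not the reader's implementation: SkipSketch, seek, offsetLimit, and the in-memory list standing in for the rows are hypothetical.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical model of skip(n) with and without a record-level filter.
final class SkipSketch {

    // Without a filter: skip is an absolute row index. Returns {group, residue},
    // where residue is the number of leading rows of the target group that are
    // decoded and discarded. Returns an empty array when skip >= totalRows.
    static long[] seek(long[] rowCounts, long skip) {
        long groupStart = 0;
        for (int g = 0; g < rowCounts.length; g++) {
            if (skip < groupStart + rowCounts[g]) {
                return new long[] {g, skip - groupStart};
            }
            groupStart += rowCounts[g];
        }
        return new long[0];
    }

    // With a filter: skip is a logical offset over matching rows, so
    // skip(n).head(k) behaves like OFFSET n LIMIT k on the filtered relation.
    static <T> List<T> offsetLimit(List<T> rows, Predicate<T> filter, long skip, long head) {
        return rows.stream()
                .filter(filter)
                .skip(skip)
                .limit(head)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        long[] target = seek(new long[] {100, 100, 100}, 250);
        System.out.println("group=" + target[0] + " residue=" + target[1]);
        List<Integer> rows = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(offsetLimit(rows, v -> v % 2 == 0, 1, 2));
    }
}
```

The unfiltered case only needs the per-group row counts to locate its starting point, whereas the filtered case has to evaluate the predicate on rows in order to know which ones to discard.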
build