dev.hardwood.reader.ParquetFileReader.RowReaderBuilder

Enclosing class: ParquetFileReader

public static final class ParquetFileReader.RowReaderBuilder extends Object

Builds a RowReader with optional projection, filter, and head/tail row limit.

Obtained from ParquetFileReader.buildRowReader(). Each setter returns the builder for chaining; build() consumes the configuration and creates the reader. The builder is not reusable after build().

RowReader reader = file.buildRowReader()
        .projection(ColumnProjection.columns("id", "name"))
        .filter(FilterPredicate.eq("status", "active"))
        .head(1000)
        .build();

Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| RowReader | build() | |
| ParquetFileReader.RowReaderBuilder | filter(FilterPredicate filter) | Apply a column-statistics / record-level filter predicate. |
| ParquetFileReader.RowReaderBuilder | filter(RowGroupPredicate rowGroupFilter) | Apply a row-group selection predicate (e.g. byte-range, for split-aware reading). |
| ParquetFileReader.RowReaderBuilder | head(long maxRows) | Limit to the first maxRows rows. |
| ParquetFileReader.RowReaderBuilder | projection(ColumnProjection projection) | Restrict reading to the given columns. |
| ParquetFileReader.RowReaderBuilder | skip(long skip) | Skip leading rows before reading (SQL OFFSET). |
| ParquetFileReader.RowReaderBuilder | tail(long tailRows) | Limit to the last tailRows rows. |

Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Method Details
- projection
  
  public ParquetFileReader.RowReaderBuilder projection(ColumnProjection projection)
  
  Restrict reading to the given columns. Default: all columns.
- filter
  
  public ParquetFileReader.RowReaderBuilder filter(FilterPredicate filter)
  
  Apply a column-statistics / record-level filter predicate. Default: no filter.
- filter
  
  public ParquetFileReader.RowReaderBuilder filter(RowGroupPredicate rowGroupFilter)
  
  Apply a row-group selection predicate (e.g. byte-range, for split-aware reading). Default: read every row group. Combines with filter(FilterPredicate) via intersection: a row group is read if and only if it passes both.
  
  Composes with head(long) and skip(long) over the filtered row-group sequence — skip(N) skips N rows of the kept set, head(N) caps at N rows of the kept set. Mutually exclusive with tail(long) (tail mode requires a known total row count, which row-group filtering invalidates).
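
  The intersection rule can be pictured with a small stand-alone model. This is a sketch only and does not call the library: `RowGroup`, `midpointInRange`, `statsMayContainId`, and `keptGroups` are hypothetical names, and the midpoint rule for assigning a row group to a byte range is an assumed convention, not something this page specifies.

  ```java
  import java.util.ArrayList;
  import java.util.List;
  import java.util.function.Predicate;

  // Stand-alone model; none of these names come from the library.
  class RowGroupSelectionSketch {

      // Minimal stand-in for row-group metadata: file position plus min/max of an "id" column.
      static final class RowGroup {
          final int index;
          final long startByte;
          final long byteLength;
          final long minId;
          final long maxId;

          RowGroup(int index, long startByte, long byteLength, long minId, long maxId) {
              this.index = index;
              this.startByte = startByte;
              this.byteLength = byteLength;
              this.minId = minId;
              this.maxId = maxId;
          }
      }

      // Byte-range selection (assumed convention): keep a group whose midpoint lies in [start, end).
      static Predicate<RowGroup> midpointInRange(long start, long end) {
          return g -> {
              long mid = g.startByte + g.byteLength / 2;
              return mid >= start && mid < end;
          };
      }

      // Statistics check for "id == value": the group may match only if min <= value <= max.
      static Predicate<RowGroup> statsMayContainId(long value) {
          return g -> g.minId <= value && value <= g.maxId;
      }

      // Intersection: a group is kept if and only if it passes both predicates.
      static List<Integer> keptGroups(List<RowGroup> groups,
                                      Predicate<RowGroup> rowGroupFilter,
                                      Predicate<RowGroup> statsFilter) {
          List<Integer> kept = new ArrayList<>();
          for (RowGroup g : groups) {
              if (rowGroupFilter.test(g) && statsFilter.test(g)) {
                  kept.add(g.index);
              }
          }
          return kept;
      }

      // Four 100-byte groups laid out back to back.
      static List<RowGroup> sample() {
          return List.of(
                  new RowGroup(0, 0, 100, 0, 99),
                  new RowGroup(1, 100, 100, 100, 199),
                  new RowGroup(2, 200, 100, 200, 299),
                  new RowGroup(3, 300, 100, 150, 350));
      }

      public static void main(String[] args) {
          // Split covering bytes [100, 400) combined with the statistics check for id == 160.
          System.out.println(keptGroups(sample(), midpointInRange(100, 400), statsMayContainId(160)));
      }
  }
  ```

  In this model the byte range keeps groups 1, 2, and 3, the statistics check keeps groups 1 and 3, and only groups 1 and 3 are read. Any skip or head count would then apply to the rows of those kept groups.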
- head
  
  public ParquetFileReader.RowReaderBuilder head(long maxRows)
  
  Limit to the first maxRows rows. When combined with filter(FilterPredicate), the cap is on the number of matching rows, not the number scanned — the reader keeps scanning until maxRows rows satisfy the predicate or the input is exhausted (SQL LIMIT over the filtered relation). Mutually exclusive with tail(long).
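
  The difference between capping matched rows and capping scanned rows is easy to see in a stand-alone model. The sketch below is illustrative and does not use the library; `HeadOverFilterSketch`, `Result`, and `head` are hypothetical names, and rows are modelled as plain `long` values.

  ```java
  import java.util.ArrayList;
  import java.util.List;
  import java.util.function.LongPredicate;

  // Stand-alone model of "LIMIT over the filtered relation"; not the library's code.
  class HeadOverFilterSketch {

      // Rows returned, plus how many input rows had to be examined to find them.
      static final class Result {
          final List<Long> rows;
          final long scanned;

          Result(List<Long> rows, long scanned) {
              this.rows = rows;
              this.scanned = scanned;
          }
      }

      // Keep scanning until maxRows rows match or the input is exhausted.
      static Result head(long[] input, LongPredicate filter, long maxRows) {
          List<Long> out = new ArrayList<>();
          long scanned = 0;
          for (long row : input) {
              if (out.size() >= maxRows) {
                  break; // cap reached: stop before touching further rows
              }
              scanned++;
              if (filter.test(row)) {
                  out.add(row);
              }
          }
          return new Result(out, scanned);
      }

      public static void main(String[] args) {
          long[] input = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
          Result r = head(input, v -> v % 2 == 0, 3);
          System.out.println(r.rows + " after scanning " + r.scanned + " rows");
      }
  }
  ```

  With the even-number predicate and a cap of 3, the model returns 2, 4, and 6 after examining six input rows; a cap on scanned rows would have stopped after the third row and returned only one match.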
- tail
  
  public ParquetFileReader.RowReaderBuilder tail(long tailRows)
  
  Limit to the last tailRows rows. Row groups that do not overlap the tail are skipped entirely, so pages for earlier row groups are never fetched or decoded — useful on remote backends. Mutually exclusive with head(long), filter, and skip. Single-file only.
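
  Why tail mode needs a known total row count can be shown with a short calculation over per-row-group row counts. This is a sketch under assumptions, not the library's implementation: `TailPlanSketch` and `plan` are hypothetical names, and how the reader discards the leading rows of the first overlapping group is not stated on this page.

  ```java
  // Stand-alone model of choosing which row groups overlap the tail; not the library's code.
  class TailPlanSketch {

      // Returns {firstGroupToRead, rowsToDropAtItsStart}.
      // Returns {rowsPerGroup.length, 0} when no group overlaps the tail (tailRows == 0 or no rows).
      static long[] plan(long[] rowsPerGroup, long tailRows) {
          long total = 0;
          for (long n : rowsPerGroup) {
              total += n;
          }
          // Absolute index of the first row that belongs to the tail.
          long firstTailRow = Math.max(0, total - tailRows);

          long groupStart = 0;
          for (int i = 0; i < rowsPerGroup.length; i++) {
              long groupEnd = groupStart + rowsPerGroup[i];
              if (firstTailRow < groupEnd) {
                  // Every earlier group ends before the tail starts and can be skipped entirely.
                  return new long[] {i, firstTailRow - groupStart};
              }
              groupStart = groupEnd;
          }
          return new long[] {rowsPerGroup.length, 0};
      }

      public static void main(String[] args) {
          long[] plan = plan(new long[] {100, 100, 50}, 60);
          System.out.println("start at group " + plan[0] + ", drop " + plan[1] + " leading rows");
      }
  }
  ```

  For groups of 100, 100, and 50 rows and a tail of 60, the tail starts at absolute row 190, so group 0 is never touched and reading starts in group 1 after its first 90 rows. The calculation depends on the total of 250 rows, which is the quantity that row-group filtering would change.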
- skip
  
  public ParquetFileReader.RowReaderBuilder skip(long skip)
  
  Skip leading rows before reading — SQL OFFSET. Its meaning depends on whether a filter(FilterPredicate) is present:
  
  Without a filter: skip is a physical, absolute row index. Row groups before the target are not opened, so on remote backends the seek touches a single row group (the leading residue within the target row group is still decoded). skip >= totalRows yields an empty reader. For multi-file readers, the index counts into the first file's rows.
  
  With a filter: a logical offset over the matched rows — discards the first n rows matching the predicate, symmetric with head(long) as LIMIT. The O(1) seek does not apply: the reader decodes earlier groups to count matches (groups proven non-matching by statistics are still pruned). A skip past the match count yields an empty reader, counting across all files in order.
  
  skip == 0 is the no-op default. Mutually exclusive with tail(long); composes with head(long) (skip(n).head(k) is OFFSET n LIMIT k) and with filter(RowGroupPredicate) over the kept row-group sequence.
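
  The two meanings of skip can be contrasted in a stand-alone model. The sketch below does not use the library; `SkipSemanticsSketch`, `seek`, and `offsetLimit` are hypothetical names, row groups are modelled by their row counts, and rows by plain `long` values.

  ```java
  import java.util.ArrayList;
  import java.util.List;
  import java.util.function.LongPredicate;

  // Stand-alone model of the two skip modes; not the library's code.
  class SkipSemanticsSketch {

      // Without a filter: find the row group holding absolute row index `skip`.
      // Returns {groupIndex, residueRowsToDiscardInThatGroup}, or {-1, 0} when skip >= totalRows.
      static long[] seek(long[] rowsPerGroup, long skip) {
          long groupStart = 0;
          for (int i = 0; i < rowsPerGroup.length; i++) {
              long groupEnd = groupStart + rowsPerGroup[i];
              if (skip < groupEnd) {
                  return new long[] {i, skip - groupStart};
              }
              groupStart = groupEnd;
          }
          return new long[] {-1, 0};
      }

      // With a filter: OFFSET skip LIMIT head, both counted over matching rows only.
      static List<Long> offsetLimit(long[] rows, LongPredicate filter, long skip, long head) {
          List<Long> out = new ArrayList<>();
          long matched = 0;
          for (long row : rows) {
              if (out.size() >= head) {
                  break;
              }
              if (!filter.test(row)) {
                  continue; // non-matching rows count toward neither skip nor head
              }
              if (matched++ < skip) {
                  continue; // one of the first `skip` matches: discard
              }
              out.add(row);
          }
          return out;
      }

      public static void main(String[] args) {
          long[] target = seek(new long[] {100, 100, 50}, 150);
          System.out.println("group " + target[0] + ", residue " + target[1]);
          System.out.println(offsetLimit(new long[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, v -> v % 2 == 0, 1, 2));
      }
  }
  ```

  In the unfiltered case, skip 150 over groups of 100, 100, and 50 rows lands in group 1 with 50 residue rows, found from row counts alone. In the filtered case every earlier row has to be tested, and skip 1 with head 2 over the even numbers yields 4 and 6.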
- build
  
  public RowReader build()
  
  Consumes the configuration and creates the RowReader. The builder is not reusable after this call.
