Class ParquetFileReader.RowReaderBuilder

java.lang.Object
dev.hardwood.reader.ParquetFileReader.RowReaderBuilder
Enclosing class:
ParquetFileReader

public static final class ParquetFileReader.RowReaderBuilder extends Object

Builds a RowReader with optional projection, filters, leading-row skip, and head/tail row limit.

Obtained from ParquetFileReader.buildRowReader(). Each setter returns the builder for chaining; build() consumes the configuration and creates the reader. The builder is not reusable after build().

RowReader reader = file.buildRowReader()
        .projection(ColumnProjection.columns("id", "name"))
        .filter(FilterPredicate.eq("status", "active"))
        .head(1000)
        .build();
  • Method Details

    • projection

      public ParquetFileReader.RowReaderBuilder projection(ColumnProjection projection)
      Restrict reading to the given columns. Default: all columns.
    • filter

      Apply a FilterPredicate, evaluated against column statistics and at record level. Default: no filter.
    • filter

      public ParquetFileReader.RowReaderBuilder filter(RowGroupPredicate rowGroupFilter)

      Apply a row-group selection predicate (e.g. byte-range, for split-aware reading). Default: read every row group. Combines with filter(FilterPredicate) via intersection: a row group is read if and only if it passes both.

      Composes with head(long) and skip(long) over the filtered row-group sequence — skip(N) skips N rows of the kept set, head(N) caps at N rows of the kept set. Mutually exclusive with tail(long) (tail mode requires a known total row count, which row-group filtering invalidates).
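
      The intersection rule can be sketched in plain Java. This is a model under stated assumptions, not Hardwood's implementation: RowGroupInfo, byteRange, mayContainId, and keptGroups are hypothetical names, and deciding byte-range membership by a row group's start offset is only one possible convention.

```java
import java.util.List;
import java.util.function.Predicate;

final class RowGroupSelectionSketch {

    // Hypothetical model of one row group's metadata (not a Hardwood type).
    record RowGroupInfo(int index, long startOffset, long minId, long maxId) {}

    // Row-group selection by byte range: keep groups whose start offset lies in [from, to).
    // Assumption: the start offset decides membership; a real reader may use another rule.
    static Predicate<RowGroupInfo> byteRange(long from, long to) {
        return rg -> rg.startOffset() >= from && rg.startOffset() < to;
    }

    // Statistics check for "id == value": a group can match only if min <= value <= max.
    static Predicate<RowGroupInfo> mayContainId(long value) {
        return rg -> rg.minId() <= value && value <= rg.maxId();
    }

    // Intersection: a group is read if and only if it passes both predicates.
    static List<Integer> keptGroups(List<RowGroupInfo> groups,
                                    Predicate<RowGroupInfo> selection,
                                    Predicate<RowGroupInfo> statistics) {
        return groups.stream()
                .filter(selection.and(statistics))
                .map(RowGroupInfo::index)
                .toList();
    }
}
```

      In this model, three groups starting at offsets 4, 1000, and 2000 with id ranges 0..99, 100..199, and 200..299, selected with byteRange(0, 2000) and mayContainId(150), leave only group 1 in the kept set.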

    • head

      public ParquetFileReader.RowReaderBuilder head(long maxRows)
      Limit to the first maxRows rows. When combined with filter(FilterPredicate), the cap is on the number of matching rows, not the number scanned — the reader keeps scanning until maxRows rows satisfy the predicate or the input is exhausted (SQL LIMIT over the filtered relation). Mutually exclusive with tail(long).
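
      The difference between capping matched rows and capping scanned rows can be illustrated with java.util.stream. This is a model only: rows are plain long values, the LongPredicate stands in for a FilterPredicate, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.function.LongPredicate;
import java.util.stream.LongStream;

final class HeadWithFilterSketch {

    // Semantics described above: filter first, then cap the matching rows (SQL LIMIT).
    static List<Long> filterThenHead(long[] rows, LongPredicate predicate, long maxRows) {
        return LongStream.of(rows)
                .filter(predicate)
                .limit(maxRows)
                .boxed()
                .toList();
    }

    // For contrast only: capping the scanned rows first can return fewer matches.
    static List<Long> headThenFilter(long[] rows, LongPredicate predicate, long maxRows) {
        return LongStream.of(rows)
                .limit(maxRows)
                .filter(predicate)
                .boxed()
                .toList();
    }
}
```

      For rows 1 through 8, an "is even" predicate, and maxRows = 3, filterThenHead yields [2, 4, 6], whereas headThenFilter yields only [2].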
    • tail

      public ParquetFileReader.RowReaderBuilder tail(long tailRows)
      Limit to the last tailRows rows. Row groups that do not overlap the tail are skipped entirely, so pages for earlier row groups are never fetched or decoded, which is useful on remote backends. Mutually exclusive with head(long), filter, and skip(long). Single-file only.
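
      One way to decide which row groups overlap the tail, given per-group row counts from the file metadata, is sketched below. It is an illustration under assumptions rather than the reader's actual code: the class, the locate method, and the long[] result layout are all hypothetical.

```java
final class TailRowGroupsSketch {

    // Returns {groupIndex, rowsToDropInThatGroup} for the first row of the tail.
    // Groups before groupIndex never need to be fetched. If the tail is empty,
    // groupIndex equals rowCounts.length (nothing to read).
    static long[] locate(long[] rowCounts, long tailRows) {
        long total = 0;
        for (long count : rowCounts) {
            total += count;
        }
        // Absolute index of the first row that belongs to the tail.
        long firstTailRow = Math.max(0, total - tailRows);

        long groupStart = 0;
        for (int i = 0; i < rowCounts.length; i++) {
            if (firstTailRow < groupStart + rowCounts[i]) {
                return new long[] {i, firstTailRow - groupStart};
            }
            groupStart += rowCounts[i];
        }
        return new long[] {rowCounts.length, 0};
    }
}
```

      With row counts {100, 100, 50} and tailRows = 120, locate returns {1, 30}: group 0 is skipped, and the first 30 rows of group 1 are dropped. The computation needs the total row count up front, which fits the note that tail mode requires a known total.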
    • skip

      public ParquetFileReader.RowReaderBuilder skip(long skip)

      Skip leading rows before reading — SQL OFFSET. Its meaning depends on whether a filter(FilterPredicate) is present:

      • Without a filter: a physical, absolute row index. Row groups before the target are not opened, so the seek costs O(1) row groups on remote backends (the leading residue within the target row group is still decoded). skip >= totalRows yields an empty reader. For multi-file readers, the index refers to the first file's rows.
      • With a filter: a logical offset over the matched rows. It discards the first n rows matching the predicate, symmetric with head(long) as LIMIT. The O(1) seek does not apply: the reader decodes earlier groups to count matches (groups proven non-matching by statistics are still pruned). Matches are counted across all files in order, and a skip past the total match count yields an empty reader.

      skip == 0 is the no-op default. Mutually exclusive with tail(long); composes with head(long) (skip(n).head(k) is OFFSET n LIMIT k) and with filter(RowGroupPredicate) over the kept row-group sequence.
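
      The two meanings of skip can be modelled with java.util.stream. This sketch captures the observable row sequence only, not the row-group seek or pruning; rows are plain long values, the LongPredicate stands in for a FilterPredicate, and all names are hypothetical.

```java
import java.util.List;
import java.util.function.LongPredicate;
import java.util.stream.LongStream;

final class SkipSemanticsSketch {

    // Without a filter: skip is an absolute row index, then head caps the result.
    static List<Long> skipThenHead(long[] rows, long skip, long head) {
        return LongStream.of(rows)
                .skip(skip)
                .limit(head)
                .boxed()
                .toList();
    }

    // With a filter: skip counts matching rows only (OFFSET n LIMIT k over the matches).
    static List<Long> filterSkipThenHead(long[] rows, LongPredicate predicate,
                                         long skip, long head) {
        return LongStream.of(rows)
                .filter(predicate)
                .skip(skip)
                .limit(head)
                .boxed()
                .toList();
    }
}
```

      For rows 0 through 9, skipThenHead(rows, 3, 2) yields [3, 4], while filterSkipThenHead with an "is even" predicate and the same skip and head yields [6, 8]. A skip of 10 or more in the unfiltered case yields an empty list.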

    • build

      public RowReader build()
      Consumes the configuration and creates the RowReader. The builder is not reusable after this call.