Class ParquetFileReader.RowReaderBuilder

java.lang.Object
dev.hardwood.reader.ParquetFileReader.RowReaderBuilder
Enclosing class:
ParquetFileReader

public static final class ParquetFileReader.RowReaderBuilder extends Object

Builds a RowReader with optional projection, filter, and head/tail row limit.

Obtained from ParquetFileReader.buildRowReader(). Each setter returns the builder for chaining; build() consumes the configuration and creates the reader. The builder is not reusable after build().

RowReader reader = file.buildRowReader()
        .projection(ColumnProjection.columns("id", "name"))
        .filter(FilterPredicate.eq("status", "active"))
        .head(1000)
        .build();
  • Method Details

    • projection

      public ParquetFileReader.RowReaderBuilder projection(ColumnProjection projection)
      Restrict reading to the given columns. Default: all columns.
    • filter

      Apply a row-group / record-level filter predicate. Default: no filter.
    • head

      public ParquetFileReader.RowReaderBuilder head(long maxRows)
      Limit to the first maxRows rows. Mutually exclusive with tail(long).
    • tail

      public ParquetFileReader.RowReaderBuilder tail(long tailRows)
      Limit to the last tailRows rows. Row groups that do not overlap the tail are skipped entirely, so pages for earlier row groups are never fetched or decoded, which is useful on remote backends. Mutually exclusive with head(long), filter, and firstRow(long). Single-file only.
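The row-group skipping described for tail can be illustrated with a small, self-contained sketch. It is not the library's implementation: TailPlanSketch and rowGroupsOverlappingTail are hypothetical names, and the per-row-group row counts are assumed to be known up front (for example from the file footer).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of the library: picks the row groups that overlap a tail window. */
final class TailPlanSketch {

    /**
     * Returns the indexes of the row groups that contain at least one of the
     * last tailRows rows, given each row group's row count in file order.
     */
    static List<Integer> rowGroupsOverlappingTail(long[] rowGroupRowCounts, long tailRows) {
        long totalRows = 0;
        for (long count : rowGroupRowCounts) {
            totalRows += count;
        }
        // First absolute row index that belongs to the tail, clamped at 0.
        long tailStart = Math.max(0, totalRows - tailRows);

        List<Integer> selected = new ArrayList<>();
        long groupStart = 0;
        for (int i = 0; i < rowGroupRowCounts.length; i++) {
            long groupEnd = groupStart + rowGroupRowCounts[i]; // exclusive
            // A non-empty group overlaps the tail when it ends after tailStart.
            if (rowGroupRowCounts[i] > 0 && groupEnd > tailStart) {
                selected.add(i);
            }
            groupStart = groupEnd;
        }
        return selected;
    }
}
```

In this model, a file with row groups of 1000, 1000, 1000, and 500 rows and a tail of 600 rows selects only the last two row groups; the first two would never be opened.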
    • firstRow

      public ParquetFileReader.RowReaderBuilder firstRow(long firstRow)

      Begin reading from the given absolute row index. Earlier row groups are not opened, so their pages are neither fetched nor decoded. On remote backends this makes the seek O(1 RG): only the target row group is touched, in contrast to walking next() from row 0.

      Cost within the target row group: the reader still starts at the row group's first row, then walks next() firstRow - rowGroupFirstRow times to discard the leading residue. Those residue rows are decoded, so a firstRow near the end of a 1 M-row group walks ~1 M decoded next() calls. Page-level skip via OffsetIndex is tracked as #381.
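The residue arithmetic above can be made concrete with a minimal sketch. FirstRowSeekSketch, SeekPlan, and plan are hypothetical names used only for illustration, not library API; range validation is deliberately left out.

```java
/** Hypothetical helper, not part of the library: models the seek cost described above. */
final class FirstRowSeekSketch {

    /** Target row group plus the number of leading rows that are decoded and discarded. */
    static final class SeekPlan {
        final int rowGroupIndex;
        final long residueRows;

        SeekPlan(int rowGroupIndex, long residueRows) {
            this.rowGroupIndex = rowGroupIndex;
            this.residueRows = residueRows;
        }
    }

    /** Finds the row group containing firstRow, given each row group's row count in file order. */
    static SeekPlan plan(long[] rowGroupRowCounts, long firstRow) {
        long rowGroupFirstRow = 0;
        for (int i = 0; i < rowGroupRowCounts.length; i++) {
            long nextGroupFirstRow = rowGroupFirstRow + rowGroupRowCounts[i];
            if (firstRow < nextGroupFirstRow) {
                // Residue = firstRow - rowGroupFirstRow rows to walk with next().
                return new SeekPlan(i, firstRow - rowGroupFirstRow);
            }
            rowGroupFirstRow = nextGroupFirstRow;
        }
        // firstRow is at or past the end: no row group to open in this sketch.
        return new SeekPlan(rowGroupRowCounts.length, 0);
    }
}
```

With two row groups of 1,000,000 rows each, firstRow = 1,999,999 lands in the second row group with a residue of 999,999 rows, while firstRow = 1,000,000 lands on the same row group with no residue at all.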

      firstRow == 0 is the no-op default. firstRow == totalRows produces an empty reader. firstRow > totalRows throws IllegalArgumentException at build() time. For multi-file readers, firstRow indexes into the first file's rows; cross-file firstRow is out of scope. Mutually exclusive with tail(long); composes with head(long) for a bounded [firstRow, firstRow + maxRows) window.
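The boundary rules and the firstRow/head window can be modelled as follows. This is a sketch under stated assumptions, not the library's code: RowWindowSketch and resolve are hypothetical names, a negative maxRows stands for "no head limit" only within this sketch, and both the rejection of negative firstRow values and the clamping of the window to totalRows are assumptions.

```java
/** Hypothetical helper, not part of the library: resolves a firstRow/head pair to a row window. */
final class RowWindowSketch {

    /**
     * Returns {startInclusive, endExclusive} as absolute row indexes.
     * A negative maxRows means "no head limit" in this sketch only.
     */
    static long[] resolve(long totalRows, long firstRow, long maxRows) {
        // firstRow > totalRows is rejected, as documented; rejecting negatives is an assumption.
        if (firstRow < 0 || firstRow > totalRows) {
            throw new IllegalArgumentException(
                    "firstRow " + firstRow + " is outside [0, " + totalRows + "]");
        }
        long remaining = totalRows - firstRow;
        // Assumption: the window is clamped to the rows that actually exist.
        long length = maxRows < 0 ? remaining : Math.min(maxRows, remaining);
        return new long[] {firstRow, firstRow + length};
    }
}
```

For a 100-row file, firstRow = 20 with head(30) gives the window [20, 50), firstRow = 100 gives the empty window [100, 100), and firstRow = 101 throws IllegalArgumentException.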

    • build

      public RowReader build()