Class ColumnReaders

java.lang.Object
dev.hardwood.reader.ColumnReaders
All Implemented Interfaces:
AutoCloseable

public class ColumnReaders extends Object implements AutoCloseable

Holds multiple ColumnReader instances backed by a shared RowGroupIterator for batch-oriented projection reads. Works for both single- and multi-file ParquetFileReader inputs; the iterator transparently handles cross-file prefetching when more than one file is involved.

Use nextBatch() to advance every underlying reader in lockstep. This is the structurally safe path for multi-column consumption: a single call drives every reader, returns false when any reader is exhausted, and validates that the readers report matching record counts.

try (ParquetFileReader parquet = ParquetFileReader.openAll(files);
     ColumnReaders columns = parquet.buildColumnReaders(
             ColumnProjection.columns("passenger_count", "trip_distance", "fare_amount"))
             .build()) {

    while (columns.nextBatch()) {
        int count = columns.getRecordCount();
        double[] v0 = columns.getColumnReader(0).getDoubles();
        double[] v1 = columns.getColumnReader(1).getDoubles();
        double[] v2 = columns.getColumnReader(2).getDoubles();
        // ...
    }
}
  • Method Details

    • getColumnCount

      public int getColumnCount()
      Get the number of projected columns.
    • getColumnReader

      public ColumnReader getColumnReader(String columnName)
      Get the ColumnReader for a named column. For nested columns, use the dot-separated field path (e.g. "address.zip").
      Parameters:
      columnName - the column name or dot-separated field path (must have been requested in the projection)
      Returns:
      the ColumnReader for the column
      Throws:
      IllegalArgumentException - if the column was not requested
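The lookup contract described above (exact name or dot-separated path, rejected with IllegalArgumentException when the column was not part of the projection) can be modelled with a small name-to-position map. This is only a sketch: ProjectionIndexSketch and its methods are hypothetical names invented for illustration, and it assumes that positions follow the order in which columns were requested. It does not show how ColumnReaders is actually implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of name-based lookup over a projection; not part of the library. */
public class ProjectionIndexSketch {

    // Maps each requested column name (or dot-separated path) to its 0-based position.
    private final Map<String, Integer> indexByName = new LinkedHashMap<>();

    public ProjectionIndexSketch(String... requestedColumns) {
        for (String name : requestedColumns) {
            // Assumption: positions follow request order; duplicates keep their first position.
            indexByName.putIfAbsent(name, indexByName.size());
        }
    }

    /** Returns the position of a requested column, or throws if it was not requested. */
    public int indexOf(String columnName) {
        Integer index = indexByName.get(columnName);
        if (index == null) {
            throw new IllegalArgumentException("Column was not requested: " + columnName);
        }
        return index;
    }

    public int size() {
        return indexByName.size();
    }

    public static void main(String[] args) {
        ProjectionIndexSketch projection =
                new ProjectionIndexSketch("passenger_count", "address.zip");
        // The full dot-separated path is the key; the leaf name "zip" alone would be rejected.
        System.out.println(projection.indexOf("address.zip"));
    }
}
```

In this model a nested column is addressed only by its full path, which mirrors the "address.zip" example in the method description.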
    • getColumnReader

      public ColumnReader getColumnReader(int index)
      Get the ColumnReader at the given position among the requested columns.
      Parameters:
      index - 0-based position of the column within the requested column names
      Returns:
      the ColumnReader at the given index
    • nextBatch

      public boolean nextBatch()

      Advance every underlying ColumnReader to its next batch in lockstep.

      All readers share the same RowGroupIterator, so they always publish batches at the same row boundaries. This method drives every reader once and returns:

      • true when every reader produced a new batch — callers can then read values via the per-column accessors. The aligned record count is exposed through getRecordCount().
      • false when any reader is exhausted — partial advancement is impossible because all readers consume from the shared iterator, so once one is done they all are.

      As a defensive guard, a mismatch between the readers' published record counts throws IllegalStateException. Under correct internal behavior this can't happen — the guard exists to detect future regressions in the per-column drain workers, not to be triggered in production.

      Single-column consumers, or consumers that need fine-grained control over the per-reader cadence, can still call ColumnReader.nextBatch() directly on the readers returned by getColumnReader(int) / getColumnReader(String).

      Returns:
      true if a new aligned batch is available across all readers, false if exhausted
      Throws:
      IllegalStateException - if the readers report mismatched record counts
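The lockstep contract can be illustrated with a self-contained model. The sketch below is not the library's implementation: LockstepSketch, BatchSource and FixedSource are hypothetical names, and FixedSource simply replays a fixed list of batch sizes in place of real column data. It shows the three documented outcomes: true when every source produced a batch, false once a source is exhausted, and IllegalStateException when record counts disagree or when the record count is requested before any batch exists.

```java
/** Hypothetical model of lockstep batch advancement; not the library's ColumnReaders. */
public class LockstepSketch {

    /** Hypothetical stand-in for a per-column reader. */
    public interface BatchSource {
        boolean nextBatch();
        int getRecordCount();
    }

    /** Hypothetical source that replays a fixed sequence of batch sizes. */
    public static final class FixedSource implements BatchSource {
        private final int[] sizes;
        private int position = -1;

        public FixedSource(int... sizes) {
            this.sizes = sizes.clone();
        }

        @Override
        public boolean nextBatch() {
            if (position + 1 >= sizes.length) {
                return false;
            }
            position++;
            return true;
        }

        @Override
        public int getRecordCount() {
            return sizes[position];
        }
    }

    private final BatchSource[] sources;
    private int recordCount = -1; // -1 means no batch is currently available

    public LockstepSketch(BatchSource... sources) {
        this.sources = sources.clone();
    }

    /** Advances every source once; false as soon as one of them is exhausted. */
    public boolean nextBatch() {
        recordCount = -1;
        if (sources.length == 0) {
            return false;
        }
        int expected = -1;
        for (BatchSource source : sources) {
            if (!source.nextBatch()) {
                // In the documented design all readers share one iterator, so they end together.
                return false;
            }
            int count = source.getRecordCount();
            if (expected < 0) {
                expected = count;
            } else if (count != expected) {
                // Defensive guard: aligned sources should never reach this branch.
                throw new IllegalStateException(
                        "Mismatched record counts: " + expected + " vs " + count);
            }
        }
        recordCount = expected;
        return true;
    }

    public int getRecordCount() {
        if (recordCount < 0) {
            throw new IllegalStateException("No batch available; call nextBatch() first");
        }
        return recordCount;
    }

    public static void main(String[] args) {
        LockstepSketch columns =
                new LockstepSketch(new FixedSource(3, 2), new FixedSource(3, 2));
        long total = 0;
        while (columns.nextBatch()) {
            total += columns.getRecordCount();
        }
        System.out.println("records read: " + total);
    }
}
```

Unlike the real readers, the independent sources in this model could in principle run out at different times; the shared RowGroupIterator described above is what rules that out in the actual class.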
    • getRecordCount

      public int getRecordCount()

      Number of records in the most recently published batch.

      Equal to every underlying reader's ColumnReader.getRecordCount() — alignment is validated by nextBatch().

      Throws:
      IllegalStateException - if no batch is currently available — call nextBatch() first
    • close

      public void close()
      Specified by:
      close in interface AutoCloseable