dev.hardwood.reader.ColumnReaders

All Implemented Interfaces: AutoCloseable

public class ColumnReaders extends Object implements AutoCloseable

Holds multiple ColumnReader instances backed by a shared RowGroupIterator for batch-oriented projection reads. Works for both single- and multi-file ParquetFileReader inputs; the iterator transparently handles cross-file prefetching when more than one file is involved.

Use nextBatch() to advance every underlying reader in lockstep. This is the structurally safe path for multi-column consumption: a single call drives every reader, returns false when any reader is exhausted, and validates that the readers report matching record counts.

try (ParquetFileReader parquet = ParquetFileReader.openAll(files);
     ColumnReaders columns = parquet.buildColumnReaders(
             ColumnProjection.columns("passenger_count", "trip_distance", "fare_amount"))
             .build()) {

    while (columns.nextBatch()) {
        int count = columns.getRecordCount();
        double[] v0 = columns.getColumnReader(0).getDoubles();
        double[] v1 = columns.getColumnReader(1).getDoubles();
        double[] v2 = columns.getColumnReader(2).getDoubles();
        // ...
    }
}
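
The lockstep contract described above (one call drives every reader, false as soon as any reader is exhausted, IllegalStateException on mismatched record counts) can be modelled in plain Java. The sketch below is not Hardwood's implementation: BatchSource, FixedBatches, and LockstepSketch are hypothetical stand-ins invented for illustration, and it assumes each source is advanced exactly once per call.

```java
import java.util.List;

/** Hypothetical stand-in for a per-column reader; not the Hardwood ColumnReader API. */
interface BatchSource {
    boolean nextBatch();   // advance to the next batch; false when exhausted
    int getRecordCount();  // number of records in the current batch
}

/** Stand-in source that replays a fixed sequence of batch sizes. */
final class FixedBatches implements BatchSource {
    private final int[] sizes;
    private int position = -1;

    FixedBatches(int... sizes) { this.sizes = sizes; }

    @Override public boolean nextBatch() {
        if (position + 1 >= sizes.length) return false;
        position++;
        return true;
    }

    @Override public int getRecordCount() { return sizes[position]; }
}

/** Illustrative model of lockstep advancement over several sources. */
final class LockstepSketch {
    private final List<BatchSource> sources;
    private int recordCount = -1; // -1 means "no batch currently available"

    LockstepSketch(List<? extends BatchSource> sources) {
        if (sources.isEmpty()) throw new IllegalArgumentException("at least one source required");
        this.sources = List.copyOf(sources);
    }

    boolean nextBatch() {
        boolean allAdvanced = true;
        for (BatchSource source : sources) {
            // Non-short-circuit '&=' so every source is driven once per call (an assumption).
            allAdvanced &= source.nextBatch();
        }
        if (!allAdvanced) {
            recordCount = -1;
            return false;
        }
        int expected = sources.get(0).getRecordCount();
        for (BatchSource source : sources) {
            if (source.getRecordCount() != expected) {
                throw new IllegalStateException("mismatched record counts: "
                        + expected + " vs " + source.getRecordCount());
            }
        }
        recordCount = expected;
        return true;
    }

    int getRecordCount() {
        if (recordCount < 0) throw new IllegalStateException("no batch available; call nextBatch() first");
        return recordCount;
    }

    public static void main(String[] args) {
        LockstepSketch readers = new LockstepSketch(
                List.of(new FixedBatches(3, 2), new FixedBatches(3, 2)));
        int total = 0;
        while (readers.nextBatch()) {
            total += readers.getRecordCount();
        }
        System.out.println(total); // prints 5
    }
}
```

In this model a caller never has to reconcile per-column state: either every column has a fresh batch of the same size, or the loop ends.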

Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| void | close() | |
| int | getColumnCount() | Get the number of projected columns. |
| ColumnReader | getColumnReader(int index) | Get the ColumnReader by index within the requested columns. |
| ColumnReader | getColumnReader(String columnName) | Get the ColumnReader for a named column. |
| int | getRecordCount() | Number of records in the most recently published batch. |
| boolean | nextBatch() | Advance every underlying ColumnReader to its next batch in lockstep. |

Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Method Details
- getColumnCount
  
  public int getColumnCount()
  
  Get the number of projected columns.
- getColumnReader
  
  public ColumnReader getColumnReader(String columnName)
  
  Get the ColumnReader for a named column. For nested columns, use the dot-separated field path (e.g. "address.zip").
  
  Parameters:
  
  columnName - the column name or dot-separated field path (must have been requested in the projection)
  
  Returns:
  
  the ColumnReader for the column
  
  Throws:
  
  IllegalArgumentException - if the column was not requested
- getColumnReader
  
  public ColumnReader getColumnReader(int index)
  
  Get the ColumnReader by index within the requested columns.
  
  Parameters:
  
  index - index within the requested column names (0-based)
  
  Returns:
  
  the ColumnReader at the given index
- nextBatch
  
  public boolean nextBatch()
  
  Advance every underlying ColumnReader to its next batch in lockstep.
  
  All readers share the same RowGroupIterator, so they always publish batches at the same row boundaries. This method drives every reader once and returns:
  
  - true when every reader produced a new batch. Callers can then read values via the per-column accessors; the aligned record count is exposed through getRecordCount().
  - false when any reader is exhausted. Partial advancement is impossible because all readers consume from the shared iterator, so once one is done they all are.
  
  As a defensive guard, a mismatch between the readers' published record counts throws IllegalStateException. Under correct internal behavior this can't happen — the guard exists to detect future regressions in the per-column drain workers, not to be triggered in production.
  
  Single-column consumers, or consumers that need fine-grained control over the per-reader cadence, can still call ColumnReader.nextBatch() directly on the readers returned by getColumnReader(int) / getColumnReader(String).
  
  Returns:
  
  true if a new aligned batch is available across all readers, false if exhausted
  
  Throws:
  
  IllegalStateException - if the readers report mismatched record counts
- getRecordCount
  
  public int getRecordCount()
  
  Number of records in the most recently published batch.
  
  Equal to every underlying reader's ColumnReader.getRecordCount() — alignment is validated by nextBatch().
  
  Throws:
  
  IllegalStateException - if no batch is currently available — call nextBatch() first
- close
  
  public void close()
  
  Specified by:
  
  close in interface AutoCloseable
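
The two getColumnReader overloads documented above address the same projected columns, once by 0-based position and once by name or dot-separated path. A minimal sketch of how such a lookup could relate the two, including the IllegalArgumentException for a column that was not requested, follows. ProjectionLookupSketch and indexOf are hypothetical names used only for illustration; the sketch assumes the requested names are distinct and says nothing about how Hardwood resolves columns internally.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative name-to-index lookup over a projection; not part of the Hardwood API. */
final class ProjectionLookupSketch {
    private final Map<String, Integer> indexByName = new HashMap<>();
    private final int columnCount;

    ProjectionLookupSketch(List<String> requestedColumns) {
        // Position in the requested list becomes the 0-based column index.
        // Assumes the requested names are distinct.
        for (int i = 0; i < requestedColumns.size(); i++) {
            indexByName.put(requestedColumns.get(i), i);
        }
        this.columnCount = requestedColumns.size();
    }

    int getColumnCount() {
        return columnCount;
    }

    /** Resolves a column name or dot-separated field path to its projection index. */
    int indexOf(String columnName) {
        Integer index = indexByName.get(columnName);
        if (index == null) {
            throw new IllegalArgumentException("Column not in projection: " + columnName);
        }
        return index;
    }

    public static void main(String[] args) {
        ProjectionLookupSketch projection = new ProjectionLookupSketch(
                List.of("passenger_count", "trip_distance", "address.zip"));
        System.out.println(projection.indexOf("address.zip")); // prints 2
        try {
            projection.indexOf("fare_amount"); // not requested in this sketch
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints Column not in projection: fare_amount
        }
    }
}
```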
