Class ColumnReaders
- All Implemented Interfaces:
AutoCloseable
Holds multiple ColumnReader instances backed by a shared
RowGroupIterator for batch-oriented projection reads. Works for both
single- and multi-file ParquetFileReader inputs; the iterator
transparently handles cross-file prefetching when more than one file is
involved.
Use nextBatch() to advance every underlying reader in lockstep. This is
the structurally safe path for multi-column consumption: a single call drives
every reader, returns false when any is exhausted, and validates that the
readers report matching record counts.
try (ParquetFileReader parquet = ParquetFileReader.openAll(files);
     ColumnReaders columns = parquet.buildColumnReaders(
             ColumnProjection.columns("passenger_count", "trip_distance", "fare_amount"))
         .build()) {
    while (columns.nextBatch()) {
        int count = columns.getRecordCount();
        double[] v0 = columns.getColumnReader(0).getDoubles();
        double[] v1 = columns.getColumnReader(1).getDoubles();
        double[] v2 = columns.getColumnReader(2).getDoubles();
        // ...
    }
}
-
Method Summary
Modifier and Type | Method | Description
void | close() |
int | getColumnCount() | Get the number of projected columns.
ColumnReader | getColumnReader(int index) | Get the ColumnReader by index within the requested columns.
ColumnReader | getColumnReader(String columnName) | Get the ColumnReader for a named column.
int | getRecordCount() | Number of records in the most recently published batch.
boolean | nextBatch() | Advance every underlying ColumnReader to its next batch in lockstep.
-
Method Details
-
getColumnCount
public int getColumnCount()
Get the number of projected columns.
-
getColumnReader
public ColumnReader getColumnReader(String columnName)
Get the ColumnReader for a named column. For nested columns, use the dot-separated field path (e.g. "address.zip").
- Parameters:
columnName - the column name or dot-separated field path (must have been requested in the projection)
- Returns:
the ColumnReader for the column
- Throws:
IllegalArgumentException - if the column was not requested
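The relationship between name-based and index-based lookup can be illustrated with a small, self-contained sketch. It is not the library's implementation: `ProjectionIndexSketch` and its methods are hypothetical names, and the assumption that a column's index is its position in the requested projection is inferred from the description of getColumnReader(int).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: maps requested column names (or dot-separated paths) to 0-based indexes. */
final class ProjectionIndexSketch {
    private final Map<String, Integer> indexByName = new LinkedHashMap<>();

    ProjectionIndexSketch(List<String> requestedColumns) {
        // Assumption: the index of a column is its position in the projection request.
        for (int i = 0; i < requestedColumns.size(); i++) {
            indexByName.putIfAbsent(requestedColumns.get(i), i);
        }
    }

    /** Returns the index of a requested column, or throws if it was not part of the projection. */
    int indexOf(String columnName) {
        Integer index = indexByName.get(columnName);
        if (index == null) {
            throw new IllegalArgumentException("Column was not requested: " + columnName);
        }
        return index;
    }

    /** Number of distinct requested columns. */
    int size() {
        return indexByName.size();
    }
}
```

With a projection of "passenger_count", "trip_distance" and "address.zip", `indexOf("address.zip")` in this sketch resolves to the same position that an index-based lookup would use, and an unrequested name fails fast with IllegalArgumentException.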
-
getColumnReader
public ColumnReader getColumnReader(int index)
Get the ColumnReader by index within the requested columns.
- Parameters:
index - index within the requested column names (0-based)
- Returns:
the ColumnReader at the given index
-
nextBatch
public boolean nextBatch()
Advance every underlying ColumnReader to its next batch in lockstep.
All readers share the same RowGroupIterator, so they always publish batches at the same row boundaries. This method drives every reader once and returns:
- true when every reader produced a new batch. Callers can then read values via the per-column accessors. The aligned record count is exposed through getRecordCount().
- false when any reader is exhausted. Partial advancement is impossible because all readers consume from the shared iterator, so once one is done they all are.
As a defensive guard, a mismatch between the readers' published record counts throws IllegalStateException. Under correct internal behavior this cannot happen; the guard exists to detect future regressions in the per-column drain workers, not to be triggered in production.
Single-column consumers, or consumers that need fine-grained control over the per-reader cadence, can still call ColumnReader.nextBatch() directly on the readers returned by getColumnReader(int) / getColumnReader(String).
- Returns:
true if a new aligned batch is available across all readers, false if exhausted
- Throws:
IllegalStateException - if the readers report mismatched record counts
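The lockstep contract described above can be modeled with a minimal sketch. It is only an illustration under stated assumptions, not the library's code: `BatchSource`, `FixedBatches` and `LockstepSketch` are hypothetical stand-ins, and the real readers share a RowGroupIterator instead of holding independent batch lists.

```java
import java.util.List;

/** Hypothetical stand-in for a per-column reader; not the library's ColumnReader. */
interface BatchSource {
    boolean nextBatch();   // true if a new batch was published
    int getRecordCount();  // size of the most recently published batch
}

/** Hypothetical test source that publishes batches of fixed, predefined sizes. */
final class FixedBatches implements BatchSource {
    private final int[] sizes;
    private int next = 0;
    private int current = -1;

    FixedBatches(int... sizes) {
        this.sizes = sizes.clone();
    }

    @Override
    public boolean nextBatch() {
        if (next >= sizes.length) {
            current = -1;
            return false;
        }
        current = sizes[next++];
        return true;
    }

    @Override
    public int getRecordCount() {
        return current;
    }
}

/** Sketch of a driver that advances every source once per call and checks alignment. */
final class LockstepSketch {
    private final List<BatchSource> sources;
    private int recordCount = -1; // -1 means no batch is currently available

    LockstepSketch(List<BatchSource> sources) {
        this.sources = List.copyOf(sources);
    }

    boolean nextBatch() {
        recordCount = -1;
        boolean allAdvanced = !sources.isEmpty();
        // Drive every source exactly once, even if an earlier one is exhausted.
        for (BatchSource source : sources) {
            if (!source.nextBatch()) {
                allAdvanced = false;
            }
        }
        if (!allAdvanced) {
            return false;
        }
        // Defensive guard: all sources must report the same record count.
        int expected = sources.get(0).getRecordCount();
        for (BatchSource source : sources) {
            if (source.getRecordCount() != expected) {
                throw new IllegalStateException("Mismatched record counts: expected "
                        + expected + " but got " + source.getRecordCount());
            }
        }
        recordCount = expected;
        return true;
    }

    int getRecordCount() {
        if (recordCount < 0) {
            throw new IllegalStateException("No batch available; call nextBatch() first");
        }
        return recordCount;
    }
}
```

In this sketch, two `FixedBatches(3, 2)` sources yield aligned batches of 3 and then 2 records before `nextBatch()` returns false, while sources with differing sizes trip the guard on the first call. Checking alignment in one place is what lets callers treat a single record count as valid for every column.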
-
getRecordCount
public int getRecordCount()
Number of records in the most recently published batch.
Equal to every underlying reader's ColumnReader.getRecordCount(); alignment is validated by nextBatch().
- Throws:
IllegalStateException - if no batch is currently available; call nextBatch() first
-
close
public void close()
- Specified by:
close in interface AutoCloseable
-