Class ParquetFileReader

java.lang.Object
dev.hardwood.reader.ParquetFileReader
All Implemented Interfaces:
AutoCloseable

public class ParquetFileReader extends Object implements AutoCloseable

Reader for one or more Parquet files.

A reader opened over a list of files exposes the schema of the first file and reads rows or column batches across all files in list order. Cross-file prefetching is handled by the underlying iterator.

// Single file
try (ParquetFileReader reader = ParquetFileReader.open(InputFile.of(path))) {
    RowReader rows = reader.rowReader();
    // ...
}

// Multiple files (use Hardwood for a shared thread pool)
try (Hardwood hardwood = Hardwood.create();
     ParquetFileReader reader = hardwood.openAll(files)) {
    try (ColumnReaders cols = reader.columnReaders(
                ColumnProjection.columns("a", "b"))) {
        // ...
    }
}

Limitation: When using the default memory-mapped InputFile, each individual file must be at most Integer.MAX_VALUE bytes (one byte short of 2 GiB). Split larger datasets across multiple files.
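
The size limit above can be checked before any file is opened. The following is a minimal sketch using only java.nio.file. The class MappedSizeCheck and its methods are hypothetical helpers written for illustration and are not part of the Hardwood API. The cap used here is the Integer.MAX_VALUE figure stated above, which is also the largest region that FileChannel.map accepts in a single call.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hardwood: pre-flight check for the
// per-file size limit of the memory-mapped InputFile.
final class MappedSizeCheck {

    /** Largest file size accepted, as stated in the limitation above. */
    static final long MAX_MAPPED_BYTES = Integer.MAX_VALUE;

    private MappedSizeCheck() {
    }

    /** Returns true if a file of the given size is within the limit. */
    static boolean fitsMemoryMapLimit(long sizeInBytes) {
        return sizeInBytes >= 0 && sizeInBytes <= MAX_MAPPED_BYTES;
    }

    /** Returns the files from the input list that exceed the limit. */
    static List<Path> oversized(List<Path> files) throws IOException {
        List<Path> tooLarge = new ArrayList<>();
        for (Path file : files) {
            // Files.size reports the size in bytes as a long, so files
            // beyond the int range are measured correctly.
            if (!fitsMemoryMapLimit(Files.size(file))) {
                tooLarge.add(file);
            }
        }
        return tooLarge;
    }
}
```

Calling MappedSizeCheck.oversized(paths) on the candidate paths before opening a reader, and failing fast if the result is non-empty, surfaces the problem with a clear message instead of an error during reading.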