Class MultiFileParquetReader

java.lang.Object
dev.hardwood.reader.MultiFileParquetReader
All Implemented Interfaces:
AutoCloseable

public class MultiFileParquetReader extends Object implements AutoCloseable

Entry point for reading multiple Parquet files with cross-file prefetching.

This is the multi-file equivalent of ParquetFileReader. It opens the first file, reads the schema, and lets you choose between row-oriented and column-oriented access with a specific column projection.

Usage:

try (Hardwood hardwood = Hardwood.create();
     MultiFileParquetReader reader = hardwood.openAll(files)) {

    FileSchema schema = reader.getFileSchema();

    // Row-oriented access:
    try (MultiFileRowReader rows = reader.createRowReader(
            ColumnProjection.columns("col1", "col2"))) { ... }

    // Column-oriented access:
    try (MultiFileColumnReaders columns = reader.createColumnReaders(
            ColumnProjection.columns("col1", "col2"))) { ... }
}
  • Constructor Details

    • MultiFileParquetReader

      public MultiFileParquetReader(List<InputFile> inputFiles, dev.hardwood.internal.reader.HardwoodContextImpl context) throws IOException

      Creates a MultiFileParquetReader for the given InputFile instances.

      The files will be opened automatically as needed. Closing this reader closes all the files.

      Parameters:
      inputFiles - the input files to read (must not be empty)
      context - the shared context
      Throws:
      IOException - if the first file cannot be opened or read
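
The lifecycle described above (first file opened up front, later files opened as needed, one close() releasing everything) can be illustrated with a minimal, self-contained sketch. It does not use Hardwood and is not its implementation: LazyMultiFileSketch, Opener, get and openCount are hypothetical names introduced only for this illustration, and opening files strictly in order is an assumption.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a multi-file holder; not part of the Hardwood API.
public class LazyMultiFileSketch implements AutoCloseable {

    // Hypothetical stand-in for whatever opens a single input file.
    interface Opener {
        Closeable open(String name) throws IOException;
    }

    private final List<String> names;
    private final Opener opener;
    private final List<Closeable> opened = new ArrayList<>();

    LazyMultiFileSketch(List<String> names, Opener opener) throws IOException {
        if (names.isEmpty()) {
            throw new IllegalArgumentException("names must not be empty");
        }
        this.names = List.copyOf(names);
        this.opener = opener;
        get(0); // the first file is opened eagerly, e.g. so a schema could be read
    }

    // Opens every not-yet-opened file up to and including the given index.
    Closeable get(int index) throws IOException {
        while (opened.size() <= index) {
            opened.add(opener.open(names.get(opened.size())));
        }
        return opened.get(index);
    }

    int openCount() {
        return opened.size();
    }

    // Closes every file opened so far; the first failure is rethrown unchecked,
    // later failures are attached to it as suppressed exceptions.
    @Override
    public void close() {
        IOException first = null;
        for (Closeable c : opened) {
            try {
                c.close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        opened.clear();
        if (first != null) {
            throw new UncheckedIOException(first);
        }
    }
}
```

In this sketch a failure on the first file surfaces from the constructor, while a failure on a later file surfaces only when that file is first requested.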
  • Method Details

    • getFileSchema

      public FileSchema getFileSchema()
      Get the file schema (common across all files).
    • createRowReader

      public MultiFileRowReader createRowReader()
      Create a row reader that iterates over all rows in all files.
    • createRowReader

      public MultiFileRowReader createRowReader(FilterPredicate filter)
      Create a row reader that iterates over all columns, but only over row groups that match the filter.
      Parameters:
      filter - predicate for row group filtering based on statistics
    • createRowReader

      public MultiFileRowReader createRowReader(ColumnProjection projection)
      Create a row reader that iterates over selected columns in all files.
      Parameters:
      projection - specifies which columns to read
    • createRowReader

      public MultiFileRowReader createRowReader(ColumnProjection projection, FilterPredicate filter)
      Create a row reader that iterates over selected columns in only matching row groups.
      Parameters:
      projection - specifies which columns to read
      filter - predicate for row group and record-level filtering
    • createColumnReaders

      public MultiFileColumnReaders createColumnReaders(ColumnProjection projection)
      Create column readers for batch-oriented access to the requested columns.
      Parameters:
      projection - specifies which columns to read
    • createColumnReaders

      public MultiFileColumnReaders createColumnReaders(ColumnProjection projection, FilterPredicate filter)
      Create column readers for batch-oriented access to the requested columns, skipping row groups that don't match the filter.
      Parameters:
      projection - specifies which columns to read
      filter - predicate for row group filtering based on statistics
    • close

      public void close()
      Closes this reader and all the files it has opened.
      Specified by:
      close in interface AutoCloseable
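
Several filter parameters above are described as predicates for row group filtering based on statistics. As a rough illustration of that general Parquet technique, the sketch below prunes row groups using per-column min/max values. It is self-contained and does not use Hardwood: RowGroupStats, mightContainGreaterThan and prune are hypothetical names, and the only predicate modelled is "value > threshold" on a long column.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of statistics-based row group pruning; not the Hardwood API.
public class RowGroupPruningSketch {

    // Stand-in for the min/max statistics stored for one column of one row group.
    record RowGroupStats(String file, int rowGroup, long min, long max) {}

    // A "value > threshold" predicate can only match if the row group's max exceeds the threshold.
    static boolean mightContainGreaterThan(RowGroupStats stats, long threshold) {
        return stats.max() > threshold;
    }

    // Keeps the row groups that cannot be ruled out; the others would be skipped unread.
    static List<RowGroupStats> prune(List<RowGroupStats> groups, long threshold) {
        List<RowGroupStats> kept = new ArrayList<>();
        for (RowGroupStats g : groups) {
            if (mightContainGreaterThan(g, threshold)) {
                kept.add(g);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<RowGroupStats> groups = List.of(
                new RowGroupStats("a.parquet", 0, 1, 50),
                new RowGroupStats("a.parquet", 1, 51, 100),
                new RowGroupStats("b.parquet", 0, 90, 200));
        // With threshold 95, the first row group (max 50) is ruled out.
        for (RowGroupStats g : prune(groups, 95)) {
            System.out.println(g.file() + " row group " + g.rowGroup());
        }
    }
}
```

Pruning of this kind is coarse: a row group that survives may still contain rows that fail the predicate, so a caller that needs exact results has to re-check individual rows unless the reader also filters at record level.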