Class MultiFileParquetReader
java.lang.Object
dev.hardwood.reader.MultiFileParquetReader
- All Implemented Interfaces:
AutoCloseable
Entry point for reading multiple Parquet files with cross-file prefetching.
This is the multi-file equivalent of ParquetFileReader. It opens the
first file, reads the schema, and lets you choose between row-oriented or
column-oriented access with a specific column projection.
Usage:
try (Hardwood hardwood = Hardwood.create();
     MultiFileParquetReader reader = hardwood.openAll(files)) {
    FileSchema schema = reader.getFileSchema();

    // Row-oriented access:
    try (MultiFileRowReader rows = reader.createRowReader(
            ColumnProjection.columns("col1", "col2"))) { ... }

    // Column-oriented access:
    try (MultiFileColumnReaders columns = reader.createColumnReaders(
            ColumnProjection.columns("col1", "col2"))) { ... }
}
Constructor Summary
Constructor and Description
MultiFileParquetReader(List<InputFile> inputFiles, dev.hardwood.internal.reader.HardwoodContextImpl context)
    Creates a MultiFileParquetReader for the given InputFile instances.
Method Summary
Modifier and Type, Method, and Description
void close()
MultiFileColumnReaders createColumnReaders(ColumnProjection projection)
    Create column readers for batch-oriented access to the requested columns.
MultiFileColumnReaders createColumnReaders(ColumnProjection projection, FilterPredicate filter)
    Create column readers for batch-oriented access to the requested columns, skipping row groups that don't match the filter.
MultiFileRowReader createRowReader()
    Create a row reader that iterates over all rows in all files.
MultiFileRowReader createRowReader(FilterPredicate filter)
    Create a row reader with a filter, iterating over all columns but only matching row groups.
MultiFileRowReader createRowReader(ColumnProjection projection)
    Create a row reader that iterates over selected columns in all files.
MultiFileRowReader createRowReader(ColumnProjection projection, FilterPredicate filter)
    Create a row reader that iterates over selected columns in only matching row groups.
FileSchema getFileSchema()
    Get the file schema (common across all files).
Constructor Details
MultiFileParquetReader
public MultiFileParquetReader(List<InputFile> inputFiles, dev.hardwood.internal.reader.HardwoodContextImpl context) throws IOException
Creates a MultiFileParquetReader for the given InputFile instances.
The files will be opened automatically as needed. Closing this reader closes all the files.
- Parameters:
inputFiles - the input files to read (must not be empty)
context - the shared context
- Throws:
IOException - if the first file cannot be opened or read
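The lifecycle described above (first file opened up front, later files opened as needed, everything closed together) can be pictured with a small stand-in. This is a minimal sketch under stated assumptions, not Hardwood's implementation: `TrackedFile` and `LazyFileSet` are hypothetical names, the in-order opening strategy is assumed, and the exception thrown for an empty list is a guess made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an opened file; not a Hardwood type.
final class TrackedFile implements AutoCloseable {
    final String name;
    boolean open = true;

    TrackedFile(String name) {
        this.name = name;
    }

    @Override
    public void close() {
        open = false;
    }
}

// Hypothetical illustration of "open as needed, close all on close()".
final class LazyFileSet implements AutoCloseable {
    private final List<String> names;
    private final List<TrackedFile> opened = new ArrayList<>();

    LazyFileSet(List<String> names) {
        if (names.isEmpty()) {
            // Assumption: the real reader's behavior for an empty list is not documented here.
            throw new IllegalArgumentException("at least one file is required");
        }
        this.names = List.copyOf(names);
        file(0); // the first file is opened eagerly, e.g. to read the schema
    }

    // Returns file i, opening every not-yet-opened file up to and including i.
    TrackedFile file(int i) {
        while (opened.size() <= i) {
            opened.add(new TrackedFile(names.get(opened.size())));
        }
        return opened.get(i);
    }

    int openedCount() {
        return opened.size();
    }

    // Closes every file that was opened so far.
    @Override
    public void close() {
        for (TrackedFile f : opened) {
            f.close();
        }
    }
}
```

Used in a try-with-resources statement, a holder like this releases all opened files with a single close() call, which matches the documented contract that closing the reader closes all the files.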
Method Details
getFileSchema
Get the file schema (common across all files).
createRowReader
Create a row reader that iterates over all rows in all files.
createRowReader
Create a row reader with a filter, iterating over all columns but only matching row groups.
- Parameters:
filter - predicate for row group filtering based on statistics
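Row group filtering "based on statistics" generally means comparing the predicate against the per-row-group minimum and maximum values recorded in the Parquet metadata and skipping groups that cannot contain a match. The sketch below illustrates that idea for a single "greater than" condition. It is not Hardwood's implementation and does not use FilterPredicate; `RowGroupStats` and `RowGroupPruningSketch` are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical min/max statistics for one integer column of one row group; not a Hardwood type.
final class RowGroupStats {
    final String file;
    final int index;
    final long min;
    final long max;

    RowGroupStats(String file, int index, long min, long max) {
        this.file = file;
        this.index = index;
        this.min = min;
        this.max = max;
    }
}

final class RowGroupPruningSketch {
    // Keeps only the row groups that might contain a value greater than the threshold.
    // A group whose max is <= threshold cannot match, so it can be skipped without being read.
    static List<RowGroupStats> mayContainGreaterThan(List<RowGroupStats> groups, long threshold) {
        List<RowGroupStats> kept = new ArrayList<>();
        for (RowGroupStats g : groups) {
            if (g.max > threshold) {
                kept.add(g);
            }
        }
        return kept;
    }
}
```

Pruning of this kind is conservative: a kept row group may still hold rows that fail the condition, because min/max values only bound the data. Callers that need exact results typically re-check each row.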
createRowReader
Create a row reader that iterates over selected columns in all files.
- Parameters:
projection - specifies which columns to read
createRowReader
Create a row reader that iterates over selected columns in only matching row groups.
- Parameters:
projection - specifies which columns to read
filter - predicate for row group and record-level filtering
createColumnReaders
Create column readers for batch-oriented access to the requested columns.
- Parameters:
projection - specifies which columns to read
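The difference between batch-oriented (column) access and row-oriented access can be shown on a tiny in-memory batch. This is only a conceptual sketch: `ColumnBatchSketch` and `RowVisitor` are hypothetical names, and the two columns (`ids`, `prices`) are invented for illustration. It does not reflect the actual shape of MultiFileColumnReaders or MultiFileRowReader.

```java
// Hypothetical callback that receives one logical row at a time.
interface RowVisitor {
    void visit(long id, double price);
}

// Hypothetical in-memory batch holding two columns of equal length; not a Hardwood type.
final class ColumnBatchSketch {
    private final long[] ids;
    private final double[] prices;

    ColumnBatchSketch(long[] ids, double[] prices) {
        if (ids.length != prices.length) {
            throw new IllegalArgumentException("column lengths differ");
        }
        this.ids = ids;
        this.prices = prices;
    }

    // Column-oriented: one tight loop over a single primitive array.
    double sumPrices() {
        double sum = 0;
        for (double p : prices) {
            sum += p;
        }
        return sum;
    }

    // Row-oriented: hand every column value of each row to the visitor, one row at a time.
    void forEachRow(RowVisitor visitor) {
        for (int row = 0; row < ids.length; row++) {
            visitor.visit(ids[row], prices[row]);
        }
    }
}
```

Column access tends to suit aggregations over one or two columns, while row access is more convenient when each record is processed as a unit.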
createColumnReaders
public MultiFileColumnReaders createColumnReaders(ColumnProjection projection, FilterPredicate filter)
Create column readers for batch-oriented access to the requested columns, skipping row groups that don't match the filter.
- Parameters:
projection - specifies which columns to read
filter - predicate for row group filtering based on statistics
close
public void close()
- Specified by:
close in interface AutoCloseable