Class ParquetFileReader
All Implemented Interfaces: AutoCloseable
Reader for one or more Parquet files.
A reader opened over a list of files exposes the schema of the first file and reads rows or column batches across all files in order, with cross-file prefetching handled by the underlying iterator.
// Single file
try (ParquetFileReader reader = ParquetFileReader.open(InputFile.of(path))) {
    RowReader rows = reader.rowReader();
    // ...
}

// Multiple files (use Hardwood for a shared thread pool)
try (Hardwood hardwood = Hardwood.create();
     ParquetFileReader reader = hardwood.openAll(files)) {
    try (ColumnReaders cols = reader.columnReaders(
            ColumnProjection.columns("a", "b"))) {
        // ...
    }
}
Limitation: When using the default memory-mapped InputFile,
individual files must be at most 2 GB (Integer.MAX_VALUE bytes).
Larger datasets should be split across multiple files.
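The size limit above can be checked before any file is opened. The following is a minimal sketch that uses only java.nio.file.Files.size; MappedSizeCheck, fitsLimit and oversized are hypothetical helper names, not part of this API, and the check assumes the limit is exactly Integer.MAX_VALUE bytes as stated above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the library: pre-flight size check.
final class MappedSizeCheck {
    /** Largest file size in bytes allowed by the documented limitation. */
    static final long MAX_MAPPED_BYTES = Integer.MAX_VALUE;

    /** True when a file of the given size fits within the limit. */
    static boolean fitsLimit(long sizeInBytes) {
        return sizeInBytes >= 0 && sizeInBytes <= MAX_MAPPED_BYTES;
    }

    /** Returns the paths whose on-disk size exceeds the limit. */
    static List<Path> oversized(List<Path> paths) throws IOException {
        List<Path> tooLarge = new ArrayList<>();
        for (Path p : paths) {
            if (!fitsLimit(Files.size(p))) {
                tooLarge.add(p);
            }
        }
        return tooLarge;
    }
}
```

Running oversized over the candidate paths first gives a list of files that would need to be split before they are passed to a reader.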
-
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
static final class
    Builds a single-column ColumnReader with an optional filter.
static final class
    Builds a ColumnReaders collection for batch-oriented access to a projection of columns.
static final class
    Builds a RowReader with optional projection, filter, and head/tail row limit.
-
Method Summary
Modifier and Type, Method, Description
buildColumnReader(int columnIndex)
    Begin configuring a single-column ColumnReader by column index.
buildColumnReader(String columnName)
    Begin configuring a single-column ColumnReader.
buildColumnReaders(ColumnProjection projection)
    Begin configuring a ColumnReaders collection for batch-oriented access to a column projection.
buildRowReader()
    Begin configuring a RowReader with optional projection, filter, and head/tail limit.
void close()
columnReader(int columnIndex)
    Shortcut for buildColumnReader(int).build(): reads every row group of the column at the given index with no filter.
columnReader(String columnName)
    Shortcut for buildColumnReader(String).build(): reads every row group of the named column with no filter.
columnReaders(ColumnProjection projection)
    Shortcut for buildColumnReaders(ColumnProjection).build(): every row group, no filter.
getFileMetaData()
    File metadata of the first input file.
getFileSchema()
boolean isMultiFile()
    true when this reader was opened over more than one input file.
static ParquetFileReader open(InputFile inputFile)
    Open a single Parquet file with a dedicated context.
static ParquetFileReader open(InputFile inputFile, HardwoodContext context)
    Open a single Parquet file with a shared context.
static ParquetFileReader openAll(List<InputFile> inputFiles)
    Open multiple Parquet files with a dedicated context.
static ParquetFileReader openAll(List<InputFile> inputFiles, HardwoodContext context)
    Open multiple Parquet files with a shared context.
rowReader()
    Shortcut for buildRowReader().build(): reads every row of every column with no filter.
-
Method Details
-
open
Open a single Parquet file with a dedicated context.
Calls InputFile.open() and takes ownership of the file; it is closed when this reader is closed.
Throws:
IOException
-
open
public static ParquetFileReader open(InputFile inputFile, HardwoodContext context) throws IOException
Open a single Parquet file with a shared context.
Calls InputFile.open() and takes ownership of the file; it is closed when this reader is closed. The caller retains ownership of the context.
Throws:
IOException
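The ownership split described for the shared-context overload (the reader owns the file, the caller owns the context) can be illustrated with stand-in types. This is a sketch only: SharedContext, OpenedFile and Reader are hypothetical classes written for this example and are not the library's HardwoodContext, InputFile or ParquetFileReader.

```java
// Hypothetical stand-ins that model the documented ownership rule.
final class OwnershipSketch {
    /** Stand-in for a shared context that the caller creates and closes. */
    static final class SharedContext implements AutoCloseable {
        boolean closed;
        @Override public void close() { closed = true; }
    }

    /** Stand-in for an opened input file. */
    static final class OpenedFile implements AutoCloseable {
        boolean closed;
        @Override public void close() { closed = true; }
    }

    /** Stand-in reader: owns the file, only borrows the context. */
    static final class Reader implements AutoCloseable {
        private final OpenedFile file;
        private final SharedContext context;

        Reader(OpenedFile file, SharedContext context) {
            this.file = file;
            this.context = context;
        }

        @Override public void close() {
            file.close(); // owned: closed together with the reader
            // The context is deliberately left open; the caller owns it.
        }
    }

    /** Closes a reader and reports {fileClosed, contextClosed}. */
    static boolean[] closeReaderOnly() {
        SharedContext context = new SharedContext();
        OpenedFile file = new OpenedFile();
        try (Reader reader = new Reader(file, context)) {
            // Reading would happen here.
        }
        return new boolean[] { file.closed, context.closed };
    }
}
```

In this model, closing the reader closes the file but leaves the context usable for further readers, so the caller remains responsible for closing the context once all readers are done.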
-
openAll
Open multiple Parquet files with a dedicated context. The schema is read from the first file and is assumed to be common across all files. Files are opened on demand by the iterator; the first file is opened eagerly so any I/O or metadata error surfaces immediately.
Throws:
IOException
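The open-on-demand behaviour with an eagerly opened first file can be sketched as follows. This is an illustration under assumptions, not the library's iterator: FileIterator is a hypothetical class, "opening" a file only records its name in a log, and cross-file prefetching is not modelled.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical model of "first file eager, remaining files on demand".
final class LazyOpenSketch {
    static final class FileIterator implements Iterator<String> {
        private final List<String> names;
        private final List<String> openLog;
        private String pending; // first file, opened eagerly in the constructor
        private int nextIndex;  // index of the next file to open on demand

        FileIterator(List<String> names, List<String> openLog) {
            if (names.isEmpty()) {
                throw new IllegalArgumentException("at least one file is required");
            }
            this.names = List.copyOf(names);
            this.openLog = openLog;
            this.pending = open(0); // eager: a bad first file would fail here
            this.nextIndex = 1;
        }

        private String open(int index) {
            String name = names.get(index);
            openLog.add(name); // stands in for real I/O and metadata parsing
            return name;
        }

        @Override public boolean hasNext() {
            return pending != null || nextIndex < names.size();
        }

        @Override public String next() {
            if (pending != null) {
                String first = pending;
                pending = null;
                return first;
            }
            if (nextIndex >= names.size()) {
                throw new NoSuchElementException();
            }
            return open(nextIndex++);
        }
    }

    /** Returns the names opened after construction plus one call to next(). */
    static List<String> openedAfterFirstNext(List<String> names) {
        List<String> log = new ArrayList<>();
        FileIterator it = new FileIterator(names, log);
        it.next();
        return log;
    }
}
```

Because only the first file is touched at construction time, a problem in a later file would surface in this model only when iteration reaches it.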
-
openAll
public static ParquetFileReader openAll(List<InputFile> inputFiles, HardwoodContext context) throws IOException
Open multiple Parquet files with a shared context.
Throws:
IOException
-
getFileMetaData
File metadata of the first input file. For multi-file readers, per-file metadata for files beyond the first is not exposed; open those files individually to inspect their metadata.
-
getFileSchema
-
isMultiFile
public boolean isMultiFile()
true when this reader was opened over more than one input file.
-
rowReader
Shortcut for buildRowReader().build(): reads every row of every column with no filter.
-
buildRowReader
Begin configuring a RowReader with optional projection, filter, and head/tail limit.
-
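The relationship between a shortcut such as rowReader() and its builder counterpart buildRowReader() can be sketched with a small stand-in. Everything here is hypothetical: RowSource, Builder, buildRows(), rows() and head(long) are names invented for this example, and the builder methods of the real nested builder classes are not shown on this page.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of "shortcut == builder with all defaults".
final class BuilderShortcutSketch {
    static final class RowSource {
        private final List<Integer> rows;

        RowSource(List<Integer> rows) {
            this.rows = List.copyOf(rows);
        }

        /** Begin configuring a read; nothing is read until build(). */
        Builder buildRows() {
            return new Builder(rows);
        }

        /** Shortcut for buildRows().build(): every row, no limit. */
        List<Integer> rows() {
            return buildRows().build();
        }
    }

    static final class Builder {
        private final List<Integer> rows;
        private long head = Long.MAX_VALUE; // default: no row limit

        Builder(List<Integer> rows) {
            this.rows = rows;
        }

        /** Optional setting: keep only the first n rows. */
        Builder head(long n) {
            if (n < 0) {
                throw new IllegalArgumentException("n must be non-negative");
            }
            this.head = n;
            return this;
        }

        List<Integer> build() {
            return rows.stream().limit(head).collect(Collectors.toList());
        }
    }
}
```

The shortcut keeps the common case to a single call, while the builder leaves room for optional settings without adding overloads for every combination.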
columnReader
Shortcut for buildColumnReader(String).build(): reads every row group of the named column with no filter. Single-file only.
-
columnReader
Shortcut for buildColumnReader(int).build(): reads every row group of the column at the given index with no filter. Single-file only.
-
buildColumnReader
Begin configuring a single-column ColumnReader. Single-file only.
-
buildColumnReader
Begin configuring a single-column ColumnReader by column index. Single-file only.
-
columnReaders
Shortcut for buildColumnReaders(ColumnProjection).build(): every row group, no filter. Works for single- and multi-file.
-
buildColumnReaders
Begin configuring a ColumnReaders collection for batch-oriented access to a column projection. Works for single- and multi-file.
-
close
Specified by:
close in interface AutoCloseable
Throws:
IOException
-