Class ParquetFileReader
java.lang.Object
dev.hardwood.reader.ParquetFileReader
- All Implemented Interfaces:
AutoCloseable
Reader for individual Parquet files.
For single-file usage:
try (ParquetFileReader reader = ParquetFileReader.open(InputFile.of(path))) {
    RowReader rows = reader.createRowReader();
    // ...
}
For multi-file usage with a shared thread pool, use Hardwood.
Limitation: When using the default memory-mapped InputFile,
individual files must be at most 2 GB (Integer.MAX_VALUE bytes).
Larger datasets should be split across multiple files and read via
MultiFileParquetReader.
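The limit equals the largest region that a single `FileChannel.map` call can map in Java, which is presumably where it comes from. A minimal pre-flight check using only the JDK might look like the sketch below. The names `MappedSizeCheck`, `fitsInSingleMapping`, and `fileFitsInSingleMapping` are illustrative and not part of Hardwood.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative helper; not part of Hardwood.
class MappedSizeCheck {

    // Limit quoted in the documentation above: Integer.MAX_VALUE bytes.
    static final long MAX_MAPPED_BYTES = Integer.MAX_VALUE;

    // True if a file of this size fits under the documented limit.
    static boolean fitsInSingleMapping(long sizeInBytes) {
        return sizeInBytes >= 0 && sizeInBytes <= MAX_MAPPED_BYTES;
    }

    // Same check for a file on disk, using its current size.
    static boolean fileFitsInSingleMapping(Path file) throws IOException {
        return fitsInSingleMapping(Files.size(file));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: MappedSizeCheck <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        if (fileFitsInSingleMapping(file)) {
            System.out.println(file + " fits within the limit");
        } else {
            System.out.println(file + " is too large; consider splitting it");
        }
    }
}
```

Running such a check before opening a file lets a caller decide early whether to split the dataset across multiple files.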
-
Method Summary
Modifier and Type / Method / Description
void close()
createColumnReader(int columnIndex): Create a ColumnReader for a column by index, spanning all row groups.
createColumnReader(int columnIndex, FilterPredicate filter): Create a ColumnReader for a column by index, spanning only row groups that match the filter.
createColumnReader(String columnName): Create a ColumnReader for a named column, spanning all row groups.
createColumnReader(String columnName, FilterPredicate filter): Create a ColumnReader for a named column, spanning only row groups that match the filter.
createRowReader(): Create a RowReader that iterates over all rows in all row groups.
createRowReader(long maxRows): Create a RowReader that returns at most maxRows rows.
createRowReader(FilterPredicate filter): Create a RowReader with a filter, iterating over all columns but only matching row groups.
createRowReader(FilterPredicate filter, long maxRows): Create a RowReader with a filter that returns at most maxRows rows.
createRowReader(ColumnProjection projection): Create a RowReader that iterates over selected columns in all row groups.
createRowReader(ColumnProjection projection, long maxRows): Create a RowReader with column projection that returns at most maxRows rows.
createRowReader(ColumnProjection projection, FilterPredicate filter): Create a RowReader that iterates over selected columns in only matching row groups.
createRowReader(ColumnProjection projection, FilterPredicate filter, long maxRows): Create a RowReader with column projection and filter that returns at most maxRows rows.
static ParquetFileReader open(InputFile inputFile): Open a Parquet file from an InputFile with a dedicated context.
static ParquetFileReader open(InputFile inputFile, HardwoodContext context): Open a Parquet file from an InputFile with a shared context.
-
Method Details
-
open
Open a Parquet file from an InputFile with a dedicated context.
This method calls InputFile.open() and takes ownership of the file; it will be closed when this reader is closed.
- Throws:
IOException
-
open
public static ParquetFileReader open(InputFile inputFile, HardwoodContext context) throws IOException
Open a Parquet file from an InputFile with a shared context.
This method calls InputFile.open() and takes ownership of the file; it will be closed when this reader is closed. The caller retains ownership of the context.
- Throws:
IOException
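The ownership split described for the shared-context overload can be illustrated without Hardwood itself. The sketch below uses hypothetical stand-in classes (`StubResource`, `OwningReader`) rather than the real InputFile, HardwoodContext, or ParquetFileReader. It shows only the pattern: closing the reader closes the file it owns, while the caller-supplied context stays open until the caller closes it.

```java
// Hypothetical stand-ins for illustration; none of these are Hardwood classes.
class StubResource implements AutoCloseable {
    private boolean closed;

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;
    }
}

class OwningReader implements AutoCloseable {
    private final StubResource file;     // owned: closed together with the reader
    private final StubResource context;  // borrowed: left open for the caller

    OwningReader(StubResource file, StubResource context) {
        this.file = file;
        this.context = context;
    }

    @Override
    public void close() {
        // Only the owned resource is released here; the context is untouched.
        file.close();
    }
}

class OwnershipDemo {
    public static void main(String[] args) {
        StubResource file = new StubResource();
        // The caller owns the context, so the caller's try block closes it.
        try (StubResource context = new StubResource()) {
            try (OwningReader reader = new OwningReader(file, context)) {
                // ... read from the file ...
            }
            System.out.println("file closed: " + file.isClosed());       // file closed: true
            System.out.println("context closed: " + context.isClosed()); // context closed: false
        }
    }
}
```

With this arrangement one context can serve several readers in sequence, and the caller closes it once after the last reader is done.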
-
getFileMetaData
-
getFileSchema
-
createColumnReader
Create a ColumnReader for a named column, spanning all row groups.
-
createColumnReader
Create a ColumnReader for a named column, spanning only row groups that match the filter.
- Parameters:
columnName - the column to read
filter - predicate for row group filtering based on statistics
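Row group filtering based on statistics generally means comparing a predicate against per-row-group minimum and maximum values and skipping groups that cannot contain a match. The sketch below illustrates that general technique for a "greater than" predicate. The types `RowGroupStats` and `StatsPruning` are hypothetical and do not reflect how Hardwood's FilterPredicate is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical types for illustration; not Hardwood's API.
class RowGroupStats {
    final int index;  // position of the row group in the file
    final long min;   // smallest value of the column in this row group
    final long max;   // largest value of the column in this row group

    RowGroupStats(int index, long min, long max) {
        this.index = index;
        this.min = min;
        this.max = max;
    }
}

class StatsPruning {

    // A row group can hold a value > threshold only if its max exceeds the threshold.
    static boolean mightContainGreaterThan(RowGroupStats stats, long threshold) {
        return stats.max > threshold;
    }

    // Returns the indexes of row groups that cannot be ruled out by statistics.
    static List<Integer> selectRowGroups(List<RowGroupStats> groups, long threshold) {
        List<Integer> selected = new ArrayList<>();
        for (RowGroupStats group : groups) {
            if (mightContainGreaterThan(group, threshold)) {
                selected.add(group.index);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<RowGroupStats> groups = Arrays.asList(
                new RowGroupStats(0, 1, 100),
                new RowGroupStats(1, 101, 200),
                new RowGroupStats(2, 150, 300));
        System.out.println(selectRowGroups(groups, 180)); // prints [1, 2]
    }
}
```

In this sketch, row group 1 is kept even though most of its values are below 180. Pruning at row-group granularity only excludes groups that cannot match, so a selected group can still contain values that fail the predicate.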
-
createColumnReader
Create a ColumnReader for a column by index, spanning all row groups.
-
createColumnReader
Create a ColumnReader for a column by index, spanning only row groups that match the filter.
- Parameters:
columnIndex - the column index to read
filter - predicate for row group filtering based on statistics
-
createRowReader
Create a RowReader that iterates over all rows in all row groups.
-
createRowReader
Create a RowReader with a filter, iterating over all columns but only matching row groups.
- Parameters:
filter - predicate for row group filtering based on statistics
-
createRowReader
Create a RowReader that iterates over selected columns in all row groups.
- Parameters:
projection - specifies which columns to read
- Returns:
a RowReader for the selected columns
-
createRowReader
Create a RowReader that iterates over selected columns in only matching row groups.
- Parameters:
projection - specifies which columns to read
filter - predicate for row group filtering based on statistics
-
createRowReader
Create a RowReader that returns at most maxRows rows.
- Parameters:
maxRows - maximum number of rows to return (must be > 0)
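The effect of a row cap can be pictured as an iterator that stops after a fixed number of elements. The sketch below uses a hypothetical `LimitedIterator` class that is not part of Hardwood. Rejecting non-positive limits with IllegalArgumentException is an assumption made for this example; the documentation above only states that maxRows must be > 0.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical wrapper for illustration; not a Hardwood class.
class LimitedIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private long remaining;

    LimitedIterator(Iterator<T> source, long maxRows) {
        // Assumption: a non-positive limit is rejected up front.
        if (maxRows <= 0) {
            throw new IllegalArgumentException("maxRows must be > 0");
        }
        this.source = source;
        this.remaining = maxRows;
    }

    @Override
    public boolean hasNext() {
        // Stop when the cap is reached or the source runs out, whichever is first.
        return remaining > 0 && source.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        remaining--;
        return source.next();
    }
}

class LimitDemo {
    public static void main(String[] args) {
        Iterator<String> rows = new LimitedIterator<>(
                Arrays.asList("a", "b", "c", "d").iterator(), 2);
        while (rows.hasNext()) {
            System.out.println(rows.next()); // prints "a" then "b"
        }
    }
}
```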
-
createRowReader
Create a RowReader with column projection that returns at most maxRows rows.
- Parameters:
projection - specifies which columns to read
maxRows - maximum number of rows to return (must be > 0)
-
createRowReader
Create a RowReader with a filter that returns at most maxRows rows.
- Parameters:
filter - predicate for row group filtering based on statistics
maxRows - maximum number of rows to return (must be > 0)
-
createRowReader
Create a RowReader with column projection and filter that returns at most maxRows rows.
- Parameters:
projection - specifies which columns to read
filter - predicate for row group filtering based on statistics
maxRows - maximum number of rows to return (must be > 0)
-
close
- Specified by:
close in interface AutoCloseable
- Throws:
IOException
-