Class ParquetFileReader

java.lang.Object
dev.hardwood.reader.ParquetFileReader
All Implemented Interfaces:
AutoCloseable

public class ParquetFileReader extends Object implements AutoCloseable

Reader for individual Parquet files.

For single-file usage:

try (ParquetFileReader reader = ParquetFileReader.open(InputFile.of(path))) {
    RowReader rows = reader.createRowReader();
    // ...
}

For multi-file usage with a shared thread pool, use Hardwood.

Limitation: When using the default memory-mapped InputFile, each file must be at most Integer.MAX_VALUE bytes (2^31 - 1, just under 2 GiB). Larger datasets should be split across multiple files and read via MultiFileParquetReader.
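
The size limit above can be checked before a file is opened. The following is a minimal sketch using only the JDK; the class name MappedSizeCheck and its methods are hypothetical helpers and not part of this library. It assumes the limit applies to the total file size as reported by the file system.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class MappedSizeCheck {

    /** True if a file of this size stays within Integer.MAX_VALUE bytes. */
    static boolean fitsSingleMapping(long sizeInBytes) {
        return sizeInBytes >= 0 && sizeInBytes <= Integer.MAX_VALUE;
    }

    /** Reads the size from the file system and applies the same check. */
    static boolean fitsSingleMapping(Path path) throws IOException {
        return fitsSingleMapping(Files.size(path));
    }

    public static void main(String[] args) throws IOException {
        // A tiny temporary file stands in for a real Parquet file.
        Path path = Files.createTempFile("sample", ".parquet");
        try {
            Files.write(path, new byte[] {'P', 'A', 'R', '1'});
            System.out.println(fitsSingleMapping(path)); // prints true
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

A caller could run such a check first and fall back to splitting the dataset when it returns false.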

  • Method Details

    • open

      public static ParquetFileReader open(InputFile inputFile) throws IOException

      Open a Parquet file from an InputFile with a dedicated context.

      This method calls InputFile.open() and takes ownership of the file; it will be closed when this reader is closed.

      Throws:
      IOException
    • open

      public static ParquetFileReader open(InputFile inputFile, HardwoodContext context) throws IOException

      Open a Parquet file from an InputFile with a shared context.

      This method calls InputFile.open() and takes ownership of the file; it will be closed when this reader is closed. The caller retains ownership of the context.

      Throws:
      IOException
    • getFileMetaData

      public FileMetaData getFileMetaData()
    • getFileSchema

      public FileSchema getFileSchema()
    • createColumnReader

      public ColumnReader createColumnReader(String columnName)
      Create a ColumnReader for a named column, spanning all row groups.
    • createColumnReader

      public ColumnReader createColumnReader(String columnName, FilterPredicate filter)
      Create a ColumnReader for a named column, spanning only row groups that match the filter.
      Parameters:
      columnName - the column to read
      filter - predicate for row group filtering based on statistics
    • createColumnReader

      public ColumnReader createColumnReader(int columnIndex)
      Create a ColumnReader for a column by index, spanning all row groups.
    • createColumnReader

      public ColumnReader createColumnReader(int columnIndex, FilterPredicate filter)
      Create a ColumnReader for a column by index, spanning only row groups that match the filter.
      Parameters:
      columnIndex - the column index to read
      filter - predicate for row group filtering based on statistics
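
The filter overloads above select row groups based on statistics. As a rough illustration of that idea, the sketch below prunes row groups for a "value greater than threshold" condition using per-group min/max values. All types here (RowGroupPruningSketch, RowGroupStats) are hypothetical stand-ins and do not reflect how FilterPredicate is built or evaluated in this library.

```java
import java.util.ArrayList;
import java.util.List;

final class RowGroupPruningSketch {

    /** Hypothetical stand-in for the min/max statistics of one column in one row group. */
    static final class RowGroupStats {
        final int index;
        final long min;
        final long max;

        RowGroupStats(int index, long min, long max) {
            this.index = index;
            this.min = min;
            this.max = max;
        }
    }

    /** A row group can hold a value greater than threshold only if its max exceeds it. */
    static boolean mayContainGreaterThan(RowGroupStats stats, long threshold) {
        return stats.max > threshold;
    }

    /** Returns the indexes of row groups that the statistics cannot rule out. */
    static List<Integer> selectRowGroups(List<RowGroupStats> groups, long threshold) {
        List<Integer> selected = new ArrayList<>();
        for (RowGroupStats stats : groups) {
            if (mayContainGreaterThan(stats, threshold)) {
                selected.add(stats.index);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<RowGroupStats> groups = List.of(
                new RowGroupStats(0, 1, 50),
                new RowGroupStats(1, 40, 120),
                new RowGroupStats(2, 200, 300));
        System.out.println(selectRowGroups(groups, 100)); // prints [1, 2]
    }
}
```

Pruning of this kind is conservative: a selected row group may still contain values that do not satisfy the condition (group 1 above holds values from 40 to 120). Whether the reader also filters individual values is not stated in this documentation.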
    • createRowReader

      public RowReader createRowReader()
      Create a RowReader that iterates over all rows in all row groups.
    • createRowReader

      public RowReader createRowReader(FilterPredicate filter)
      Create a RowReader with a filter, iterating over all columns but only over row groups that match the filter.
      Parameters:
      filter - predicate for row group filtering based on statistics
    • createRowReader

      public RowReader createRowReader(ColumnProjection projection)
      Create a RowReader that iterates over selected columns in all row groups.
      Parameters:
      projection - specifies which columns to read
      Returns:
      a RowReader for the selected columns
    • createRowReader

      public RowReader createRowReader(ColumnProjection projection, FilterPredicate filter)
      Create a RowReader that iterates over the selected columns, restricted to row groups that match the filter.
      Parameters:
      projection - specifies which columns to read
      filter - predicate for row group filtering based on statistics
    • createRowReader

      public RowReader createRowReader(long maxRows)
      Create a RowReader that returns at most maxRows rows.
      Parameters:
      maxRows - maximum number of rows to return (must be > 0)
    • createRowReader

      public RowReader createRowReader(ColumnProjection projection, long maxRows)
      Create a RowReader with column projection that returns at most maxRows rows.
      Parameters:
      projection - specifies which columns to read
      maxRows - maximum number of rows to return (must be > 0)
    • createRowReader

      public RowReader createRowReader(FilterPredicate filter, long maxRows)
      Create a RowReader with a filter that returns at most maxRows rows.
      Parameters:
      filter - predicate for row group filtering based on statistics
      maxRows - maximum number of rows to return (must be > 0)
    • createRowReader

      public RowReader createRowReader(ColumnProjection projection, FilterPredicate filter, long maxRows)
      Create a RowReader with column projection and filter that returns at most maxRows rows.
      Parameters:
      projection - specifies which columns to read
      filter - predicate for row group filtering based on statistics
      maxRows - maximum number of rows to return (must be > 0)
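
The maxRows overloads above cap the number of rows returned. The sketch below shows the same contract on a plain java.util.Iterator. LimitedRowsSketch and limit are hypothetical names, and throwing IllegalArgumentException for a non-positive limit is an assumption; this documentation only states that maxRows must be > 0.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class LimitedRowsSketch {

    /** Wraps an iterator so that it yields at most maxRows elements. */
    static <T> Iterator<T> limit(Iterator<T> rows, long maxRows) {
        if (maxRows <= 0) {
            // Assumption: the "must be > 0" rule is enforced with an exception.
            throw new IllegalArgumentException("maxRows must be > 0, got " + maxRows);
        }
        return new Iterator<T>() {
            private long remaining = maxRows;

            @Override
            public boolean hasNext() {
                return remaining > 0 && rows.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                remaining--;
                return rows.next();
            }
        };
    }

    public static void main(String[] args) {
        Iterator<String> limited = limit(List.of("a", "b", "c").iterator(), 2);
        while (limited.hasNext()) {
            System.out.println(limited.next()); // prints a, then b
        }
    }
}
```

If the source holds fewer elements than maxRows, the wrapper simply ends when the source does.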
    • close

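      public void close() throws IOException
      Close this reader together with the file it took ownership of in open. A shared HardwoodContext passed to open remains owned by the caller.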
      Specified by:
      close in interface AutoCloseable
      Throws:
      IOException
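
The ownership rules described for open and close can be pictured with a small stand-alone sketch: the reader closes the file it owns and leaves a shared context to the caller. TrackedResource and SketchReader are hypothetical stand-ins, not classes from this library, and treating the context as an AutoCloseable is an assumption made for the illustration.

```java
final class OwnershipSketch {

    /** Hypothetical resource that records whether it has been closed. */
    static final class TrackedResource implements AutoCloseable {
        private boolean closed;

        @Override
        public void close() {
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    /** Hypothetical reader that owns its file but only borrows the context. */
    static final class SketchReader implements AutoCloseable {
        private final TrackedResource file;
        private final TrackedResource sharedContext;

        SketchReader(TrackedResource file, TrackedResource sharedContext) {
            this.file = file;                   // owned: closed with the reader
            this.sharedContext = sharedContext; // borrowed: left to the caller
        }

        @Override
        public void close() {
            file.close(); // the shared context is deliberately not closed here
        }
    }

    public static void main(String[] args) {
        TrackedResource context = new TrackedResource();
        TrackedResource file = new TrackedResource();
        try (SketchReader reader = new SketchReader(file, context)) {
            // read from the file here
        }
        System.out.println(file.isClosed());    // prints true
        System.out.println(context.isClosed()); // prints false
        context.close(); // the caller releases the shared context itself
    }
}
```

With this split, one context can outlive many readers, which matches the statement that the caller retains ownership of the context.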