dev.hardwood.reader (Hardwood Parent 1.0.0.Final API)

package dev.hardwood.reader

Parquet file readers with row-oriented and column-oriented APIs.

ParquetFileReader opens one or more files and exposes their metadata and schema. From a ParquetFileReader you can create a RowReader for row-at-a-time access through typed getters, a ColumnReader for batch-oriented access to a single column, or a ColumnReaders instance for batch-oriented access to a projection of several columns. FilterPredicate enables predicate pushdown at both the row-group and page level.
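Row-group-level pushdown generally works by comparing the filter literal against each row group's min/max column statistics and skipping row groups that cannot contain a match. The sketch below is self-contained and illustrates only that general technique. The types `Stats` and `Op` and the method `mightMatch` are hypothetical and are not Hardwood's FilterPredicate API. The sketch also ignores nulls, missing statistics, and page-level filtering.

```java
import java.util.List;

/** Sketch of min/max row-group pruning. All names here are hypothetical, not Hardwood API. */
public class RowGroupPruningSketch {

    /** Comparison operators a filter might support (hypothetical). */
    enum Op { EQ, NOT_EQ, LT, LT_EQ, GT, GT_EQ }

    /** Per-row-group min/max statistics for one INT64 column (hypothetical shape). */
    record Stats(long min, long max) {}

    /**
     * Returns true if some value in [min, max] could satisfy "value op literal".
     * A false result means the whole row group can be skipped without decoding it.
     */
    static boolean mightMatch(Stats s, Op op, long literal) {
        return switch (op) {
            case EQ -> literal >= s.min() && literal <= s.max();
            // Provably no match only if every value equals the literal.
            case NOT_EQ -> !(s.min() == literal && s.max() == literal);
            case LT -> s.min() < literal;
            case LT_EQ -> s.min() <= literal;
            case GT -> s.max() > literal;
            case GT_EQ -> s.max() >= literal;
        };
    }

    public static void main(String[] args) {
        List<Stats> rowGroups = List.of(new Stats(0, 99), new Stats(100, 199), new Stats(200, 299));
        // Decide, per row group, whether rows with id > 150 may be present.
        for (int i = 0; i < rowGroups.size(); i++) {
            boolean keep = mightMatch(rowGroups.get(i), Op.GT, 150);
            System.out.println("row group " + i + (keep ? ": scan" : ": skip"));
        }
    }
}
```

Only the first row group (max 99) is skipped here. The other two may contain matching rows, so they still have to be decoded and filtered row by row.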

For reading multiple files as a single dataset with cross-file prefetching, use Hardwood to share a thread pool across readers.
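This page does not describe how Hardwood implements cross-file prefetching. As a rough illustration of the general idea, the sketch below uses a single `ExecutorService` for every file in the dataset and starts loading file *i + 1* before the rows of file *i* are consumed. The method names `loadFile` and `readAll` are hypothetical, and `loadFile` only stands in for real I/O and decoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Sketch: one shared pool, prefetching the next file while the current one is consumed. Hypothetical names. */
public class SharedPoolPrefetchSketch {

    /** Stand-in for reading and decoding one file; a real reader would do I/O here. */
    static List<String> loadFile(String name) {
        return List.of(name + ":row0", name + ":row1");
    }

    /** Returns all rows of all files, in file order, using the caller-supplied shared pool. */
    static List<String> readAll(List<String> files, ExecutorService pool) {
        List<String> out = new ArrayList<>();
        if (files.isEmpty()) {
            return out;
        }
        // Start loading the first file on the shared pool.
        CompletableFuture<List<String>> current =
                CompletableFuture.supplyAsync(() -> loadFile(files.get(0)), pool);
        for (int i = 0; i < files.size(); i++) {
            // Kick off the next file before consuming the current one (cross-file prefetch).
            CompletableFuture<List<String>> next = null;
            if (i + 1 < files.size()) {
                String nextName = files.get(i + 1);
                next = CompletableFuture.supplyAsync(() -> loadFile(nextName), pool);
            }
            out.addAll(current.join()); // block only on the file being consumed now
            current = next;
        }
        return out;
    }

    public static void main(String[] args) {
        // One pool for the whole dataset instead of one pool per reader.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            System.out.println(readAll(List.of("a.parquet", "b.parquet"), pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

Sharing one pool bounds the total number of decoding threads no matter how many files the dataset contains. Prefetching hides the latency of opening the next file behind the processing of the current one.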

Related Packages

Package

Description

dev.hardwood

Core entry points for reading Parquet files.

Class

Description

ColumnReader

Batch-oriented column reader for reading a single column across all row groups.

ColumnReaders

Holds multiple ColumnReader instances backed by a shared RowGroupIterator for batch-oriented projection reads.

FilterPredicate

A predicate for filtering row groups based on column statistics.

FilterPredicate.And

FilterPredicate.BinaryColumnPredicate

FilterPredicate.BinaryInPredicate

FilterPredicate.BooleanColumnPredicate

FilterPredicate.DateColumnPredicate

Predicate for DATE columns.

FilterPredicate.DecimalColumnPredicate

Predicate for DECIMAL columns.

FilterPredicate.DoubleColumnPredicate

FilterPredicate.FloatColumnPredicate

FilterPredicate.InstantColumnPredicate

Predicate for TIMESTAMP columns.

FilterPredicate.IntColumnPredicate

FilterPredicate.IntersectsPredicate

Predicate for spatial bounding-box intersection.

FilterPredicate.IntInPredicate

FilterPredicate.IsNotNullPredicate

Predicate that matches rows where the column value is not null.

FilterPredicate.IsNullPredicate

Predicate that matches rows where the column value is null.

FilterPredicate.LongColumnPredicate

FilterPredicate.LongInPredicate

FilterPredicate.Not

FilterPredicate.Operator

FilterPredicate.Or

FilterPredicate.SignedBinaryColumnPredicate

Predicate for DECIMAL columns stored as FIXED_LEN_BYTE_ARRAY, which require signed (two's complement) comparison.

FilterPredicate.TimeColumnPredicate

Predicate for TIME columns.

FilterPredicate.UUIDColumnPredicate

LayerKind

Classifies a ColumnReader layer between root and leaf.

ParquetFileReader

Reader for one or more Parquet files.

ParquetFileReader.ColumnReaderBuilder

Builds a single-column ColumnReader with an optional filter.

ParquetFileReader.ColumnReadersBuilder

Builds a ColumnReaders collection for batch-oriented access to a projection of columns.

ParquetFileReader.RowReaderBuilder

Builds a RowReader with optional projection, filter, and head/tail row limit.

RowGroupPredicate

A predicate over row groups, used to select which row groups a reader scans.

RowGroupPredicate.And

Conjunction of row-group predicates: a row group passes if and only if every child passes.

RowGroupPredicate.ByteRange

Keeps row groups whose midpoint falls in the byte range [startInclusive, endExclusive).

RowReader

Provides row-oriented iteration over a Parquet file.

SchemaIncompatibleException

Thrown when a file's schema is incompatible with the reference schema during multi-file reading.

Validity

Per-item null bitmap at a ColumnReader scope (a STRUCT / REPEATED layer or the leaf).
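The midpoint rule in RowGroupPredicate.ByteRange is a common way to split one file across parallel workers. If each worker receives an adjacent, non-overlapping byte range, a row group that straddles a boundary is still claimed by exactly one worker. The following self-contained sketch shows that rule. The `RowGroupExtent` record and the use of "start offset plus total length" to compute the midpoint are assumptions for illustration, because this page does not say which offsets Hardwood uses.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of midpoint-based row-group selection for byte-range splits. Names are hypothetical. */
public class ByteRangeSketch {

    /** Hypothetical extent of a row group in the file: start offset and total byte length. */
    record RowGroupExtent(long offset, long length) {
        long midpoint() {
            return offset + length / 2;
        }
    }

    /** Keep a row group iff its midpoint falls in [startInclusive, endExclusive). */
    static boolean keep(RowGroupExtent rg, long startInclusive, long endExclusive) {
        long mid = rg.midpoint();
        return mid >= startInclusive && mid < endExclusive;
    }

    /** Returns the indices of the row groups selected for one split. */
    static List<Integer> select(List<RowGroupExtent> rowGroups, long start, long end) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < rowGroups.size(); i++) {
            if (keep(rowGroups.get(i), start, end)) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Three row groups with midpoints 54, 154 and 304.
        List<RowGroupExtent> rgs = List.of(
                new RowGroupExtent(4, 100), new RowGroupExtent(104, 100), new RowGroupExtent(204, 200));
        // Two adjacent splits covering the file; each row group lands in exactly one.
        System.out.println(select(rgs, 0, 150));   // [0]
        System.out.println(select(rgs, 150, 404)); // [1, 2]
    }
}
```

Row group 1 starts inside the first split but its midpoint (154) is in the second, so only the second split reads it.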

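FilterPredicate.SignedBinaryColumnPredicate exists because a DECIMAL stored as FIXED_LEN_BYTE_ARRAY holds a big-endian two's-complement unscaled value. Plain unsigned byte-wise ordering therefore places every negative value after every positive one. The following self-contained sketch shows one way to compare such values correctly: the leading byte, which carries the sign bit, is compared as signed and the remaining bytes are compared as unsigned. The class and method names are hypothetical, and this is not Hardwood's implementation.

```java
import java.util.Arrays;

/** Sketch of signed (two's complement) comparison for fixed-length big-endian byte arrays. */
public class SignedBinaryCompareSketch {

    /**
     * Compares two equal-length big-endian two's-complement values.
     * The first byte is compared as signed (it holds the sign bit); the rest as unsigned.
     */
    static int compareSigned(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("FIXED_LEN_BYTE_ARRAY values must have equal length");
        }
        if (a.length == 0) {
            return 0;
        }
        int c = Byte.compare(a[0], b[0]);
        if (c != 0) {
            return c;
        }
        for (int i = 1; i < a.length; i++) {
            c = Integer.compare(Byte.toUnsignedInt(a[i]), Byte.toUnsignedInt(b[i]));
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    /** Plain unsigned lexicographic order: wrong for negative decimals. */
    static int compareUnsigned(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        byte[] minusOne = {(byte) 0xFF, (byte) 0xFF}; // unscaled -1
        byte[] plusOne = {0x00, 0x01};                 // unscaled +1
        System.out.println(compareSigned(minusOne, plusOne) < 0);   // true: -1 < 1
        System.out.println(compareUnsigned(minusOne, plusOne) < 0); // false: unsigned order is wrong
    }
}
```

With unsigned ordering, a row group whose decimal statistics span negative values could be pruned incorrectly. This is why min/max checks on these columns need the signed comparison.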