How-to Guides

Read Parquet files with Hardwood — pick the guide that matches what you need:

Read Row by Row — RowReader, typed accessors, nested structs / lists / maps.
Read Column by Column — ColumnReader and ColumnReaders, the layer model, hot-loop patterns.
Filter, Project, Limit, and Split — predicate pushdown, column projection, row limits, split-aware reading. Apply to both reader types.
Read Multiple Files as One Dataset — Hardwood.openAll(...) with cross-file prefetching and a shared thread pool.
Read into Avro GenericRecords — AvroRowReader and schema conversion (hardwood-avro module).
Read from S3 — object-store reading without Hadoop (hardwood-s3 module).
Read with the parquet-java API — drop-in org.apache.parquet.* replacement (experimental).
Read Variant Columns — getVariant and the PqVariant API.
Read Geospatial Columns — GEOMETRY / GEOGRAPHY columns, bounding-box filter pushdown.
Inspect File Metadata — file metadata, row groups, column chunks, schema introspection.

For detailed class-level documentation, see the JavaDoc.

Choosing a Reader

Hardwood provides two reader APIs:

RowReader — row-oriented access with typed getters, including nested structs, lists, and maps. Best for general-purpose reading where you process one row at a time.
ColumnReader — batch-oriented columnar access with typed primitive arrays. Best for analytical workloads where you process columns independently (e.g. summing a column, computing statistics).
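The difference between the two access styles can be sketched in plain Java. This is an illustration of the access patterns only, not Hardwood's API: the `Trip` record, the `fare` column, and both method names are hypothetical, and the in-memory list and array stand in for data that a reader would normally supply.

```java
import java.util.List;

public class AccessPatternSketch {

    // Hypothetical row type (not a Hardwood class): one object per record.
    record Trip(long id, double fare) {}

    // Row-oriented style: visit one record at a time and read the field you need.
    static double sumFaresByRow(List<Trip> rows) {
        double total = 0.0;
        for (Trip row : rows) {
            total += row.fare();
        }
        return total;
    }

    // Column-oriented style: the values of one column arrive as a primitive
    // array (a batch), so the loop touches only that column's data.
    static double sumFaresByColumn(double[] fareBatch, int valueCount) {
        double total = 0.0;
        for (int i = 0; i < valueCount; i++) {
            total += fareBatch[i];
        }
        return total;
    }

    public static void main(String[] args) {
        List<Trip> rows = List.of(new Trip(1, 10.5), new Trip(2, 4.0), new Trip(3, 7.25));
        double[] fares = {10.5, 4.0, 7.25};

        System.out.println(sumFaresByRow(rows));                    // prints 21.75
        System.out.println(sumFaresByColumn(fares, fares.length));  // prints 21.75
    }
}
```

Both loops compute the same total. The row form is the more convenient one when each record is handled as a unit, while the array form keeps the hot loop over a single primitive array, which is the shape that analytical aggregations tend to favor.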

For the reasoning behind the two APIs and the ergonomics-versus-throughput trade-off, see RowReader vs. ColumnReader.

Both support column projection and predicate pushdown. Each reader has a no-arg shortcut for default reads and a builder form for filtered or limited reads:

| Reader | Shortcut | Builder |
| --- | --- | --- |
| `RowReader` | `reader.rowReader()` | `reader.buildRowReader().…build()` |
| `ColumnReader` (single) | `reader.columnReader("id")` | `reader.buildColumnReader("id").…build()` |
| `ColumnReaders` (multiple) | `reader.columnReaders(projection)` | `reader.buildColumnReaders(projection).…build()` |

To read multiple files as a single dataset with cross-file prefetching, open the ParquetFileReader over a list of InputFiles via the Hardwood class (Hardwood.openAll(...)). See Reading Multiple Files.

For the exceptions the readers can throw, and the conditions under which they throw them, see Error Handling.