How-to Guides
Read Parquet files with Hardwood — pick the guide that matches what you need:
- Read Row by Row: `RowReader`, typed accessors, nested structs / lists / maps.
- Read Column by Column: `ColumnReader` and `ColumnReaders`, the layer model, hot-loop patterns.
- Filter, Project, Limit, and Split: predicate pushdown, column projection, row limits, split-aware reading. Apply to both reader types.
- Read Multiple Files as One Dataset: `Hardwood.openAll(...)` with cross-file prefetching and a shared thread pool.
- Read into Avro GenericRecords: `AvroRowReader` and schema conversion (`hardwood-avro` module).
- Read from S3: object-store reading without Hadoop (`hardwood-s3` module).
- Read with the parquet-java API: drop-in `org.apache.parquet.*` replacement (experimental).
- Read Variant Columns: `getVariant` and the `PqVariant` API.
- Read Geospatial Columns: GEOMETRY / GEOGRAPHY columns, bounding-box filter pushdown.
- Inspect File Metadata: file metadata, row groups, column chunks, schema introspection.
For detailed class-level documentation, see the JavaDoc.
Choosing a Reader
Hardwood provides two reader APIs:
- `RowReader`: row-oriented access with typed getters, including nested structs, lists, and maps. Best for general-purpose reading where you process one row at a time.
- `ColumnReader`: batch-oriented columnar access with typed primitive arrays. Best for analytical workloads where you process columns independently (e.g. summing a column, computing statistics).
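The difference between the two access styles can be illustrated without any Hardwood classes. The sketch below is library-independent plain Java: the `Trip` record, the `AccessPatterns` class, and both method names are hypothetical stand-ins, not part of the Hardwood API. It sums one column twice, once by walking row objects and once by looping over a primitive array of the kind a columnar batch would expose.

```java
import java.util.List;

public class AccessPatterns {
    // Hypothetical row type used only for illustration; not a Hardwood class.
    record Trip(long id, double fare) {}

    // Row-oriented style: visit one row object at a time and pick a field from it.
    static double sumRowWise(List<Trip> rows) {
        double total = 0.0;
        for (Trip row : rows) {
            total += row.fare();
        }
        return total;
    }

    // Column-oriented style: a tight loop over one primitive array.
    // `count` is the number of valid values in the batch, which may be
    // smaller than the array length.
    static double sumColumnWise(double[] fares, int count) {
        double total = 0.0;
        for (int i = 0; i < count; i++) {
            total += fares[i];
        }
        return total;
    }

    public static void main(String[] args) {
        List<Trip> rows = List.of(new Trip(1, 10.0), new Trip(2, 2.5), new Trip(3, 4.0));
        double[] fares = {10.0, 2.5, 4.0};

        System.out.println(sumRowWise(rows));                   // 16.5
        System.out.println(sumColumnWise(fares, fares.length)); // 16.5
    }
}
```

Both methods return the same total. The row-wise loop reads naturally when several fields of a row are needed together, while the column-wise loop touches only the values it needs, which is why it tends to suit aggregations over a single column.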
For the reasoning behind the two APIs and the ergonomics-versus-throughput trade-off, see RowReader vs. ColumnReader.
Both support column projection and predicate pushdown. Each reader has a no-arg shortcut for default reads and a builder form for filtered or limited reads:
| Reader | Shortcut | Builder |
|---|---|---|
| `RowReader` | `reader.rowReader()` | `reader.buildRowReader().…build()` |
| `ColumnReader` (single) | `reader.columnReader("id")` | `reader.buildColumnReader("id").…build()` |
| `ColumnReaders` (multiple) | `reader.columnReaders(projection)` | `reader.buildColumnReaders(projection).…build()` |
To read multiple files as a single dataset with cross-file prefetching, open the `ParquetFileReader` with a list of `InputFile`s via the `Hardwood` class. See Reading Multiple Files.
For the exceptions the readers can throw and when, see Error Handling.