How-to Guides

Read Parquet files with Hardwood. Pick the guide that matches what you need.

For detailed class-level documentation, see the JavaDoc.

Choosing a Reader

Hardwood provides two reader APIs:

  • RowReader — row-oriented access with typed getters, including nested structs, lists, and maps. Best for general-purpose reading where you process one row at a time.
  • ColumnReader — batch-oriented columnar access with typed primitive arrays. Best for analytical workloads where you process columns independently (e.g. summing a column, computing statistics).

For the reasoning behind the two APIs and the ergonomics-versus-throughput trade-off, see RowReader vs. ColumnReader.
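The difference between the two access styles can be sketched in plain Java. This is a minimal illustration, not Hardwood code: the `AccessStyles` class, the `Row` record, and both method names are hypothetical stand-ins, and the data is held in memory rather than read from a Parquet file. It shows why a column-wise aggregate (summing one column) fits a primitive-array batch, while per-record logic fits row objects with typed accessors.

```java
import java.util.List;

// Illustrative stand-ins only: none of these types are Hardwood classes.
final class AccessStyles {

    // Hypothetical row shape: one object per record, with typed accessors.
    record Row(long id, double amount) {}

    // Row-oriented: visit one record at a time and read a field from each.
    static double sumByRows(List<Row> rows) {
        double total = 0.0;
        for (Row row : rows) {
            total += row.amount();
        }
        return total;
    }

    // Column-oriented: the column arrives as one primitive array (a batch),
    // so the loop touches only the values it needs, with no per-row objects.
    static double sumByColumn(double[] amountBatch) {
        double total = 0.0;
        for (double value : amountBatch) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row(1, 10.0), new Row(2, 20.5), new Row(3, 30.0));
        double[] amounts = {10.0, 20.5, 30.0};

        // Both styles compute the same total; they differ in how data is presented.
        System.out.println(sumByRows(rows));
        System.out.println(sumByColumn(amounts));
    }
}
```

Both methods return the same result. The row form is easier to extend when each record needs several fields at once, while the array form keeps a tight loop over a single column.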

Both support column projection and predicate pushdown. Each reader has a shortcut method for default reads and a builder form for filtered or limited reads:

Reader                      Shortcut                            Builder
RowReader                   reader.rowReader()                  reader.buildRowReader().…build()
ColumnReader (single)       reader.columnReader("id")           reader.buildColumnReader("id").…build()
ColumnReaders (multiple)    reader.columnReaders(projection)    reader.buildColumnReaders(projection).…build()
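The shortcut-plus-builder shape in the table can be sketched generically. The following is a self-contained illustration of the pattern only: `DemoSource`, `RowsBuilder`, and the `filter` and `limit` methods are hypothetical names chosen for this sketch and are not Hardwood's API (see the JavaDoc for the actual builder options). The assumption shown is that the shortcut is equivalent to a builder with nothing configured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

// Hypothetical stand-in, not a Hardwood class: a shortcut next to a builder.
final class DemoSource {
    private final long[] ids;

    DemoSource(long... ids) {
        this.ids = ids.clone();
    }

    // Shortcut: a default read, here defined as a builder with no options set.
    List<Long> rows() {
        return buildRows().build();
    }

    // Builder entry point for filtered or limited reads.
    RowsBuilder buildRows() {
        return new RowsBuilder(ids);
    }

    static final class RowsBuilder {
        private final long[] ids;
        private LongPredicate filter = id -> true; // default: keep every row
        private long limit = Long.MAX_VALUE;       // default: no row limit

        private RowsBuilder(long[] ids) {
            this.ids = ids;
        }

        RowsBuilder filter(LongPredicate filter) {
            this.filter = filter;
            return this;
        }

        RowsBuilder limit(long limit) {
            this.limit = limit;
            return this;
        }

        // Applies the filter first, then stops once the limit is reached.
        List<Long> build() {
            List<Long> out = new ArrayList<>();
            for (long id : ids) {
                if (out.size() >= limit) {
                    break;
                }
                if (filter.test(id)) {
                    out.add(id);
                }
            }
            return out;
        }
    }
}
```

With this sketch, `new DemoSource(1, 2, 3, 4, 5, 6).rows()` yields every value, while `buildRows().filter(id -> id % 2 == 0).limit(2).build()` yields only the first two even values. Routing the shortcut through the builder keeps the defaults in one place.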

To read multiple files as a single dataset with cross-file prefetching, open the ParquetFileReader with a list of InputFiles via the Hardwood class — see Reading Multiple Files.

For the exceptions the readers can throw and when, see Error Handling.