# Avro Support
If your application already works with Avro records — for instance in a Kafka or Spark pipeline — you can read Parquet files directly into GenericRecord instances instead of using Hardwood's own row API. The hardwood-avro module handles the schema conversion and record materialization, matching the behavior of parquet-java's AvroReadSupport. Add it to your build alongside hardwood-core.
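The exact coordinates depend on how the project publishes its releases; assuming the Maven group follows the package naming (`dev.hardwood`), a dependency declaration might look like this:

```xml
<!-- Hypothetical coordinates inferred from the package names; check the
     project's published releases for the actual group, artifact, and version. -->
<dependency>
    <groupId>dev.hardwood</groupId>
    <artifactId>hardwood-avro</artifactId>
    <version>${hardwood.version}</version>
</dependency>
```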
Read rows as `GenericRecord`:

```java
import java.util.List;

import dev.hardwood.avro.AvroReaders;
import dev.hardwood.avro.AvroRowReader;
import dev.hardwood.reader.ParquetFileReader;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;

try (ParquetFileReader fileReader = ParquetFileReader.open(InputFile.of(path));
     AvroRowReader reader = AvroReaders.createRowReader(fileReader)) {
    Schema avroSchema = reader.getSchema();
    while (reader.hasNext()) {
        GenericRecord record = reader.next();

        // Access fields by name
        long id = (Long) record.get("id");
        String name = (String) record.get("name");

        // Nested structs are nested GenericRecords
        GenericRecord address = (GenericRecord) record.get("address");
        if (address != null) {
            String city = (String) address.get("city");
        }

        // Lists and maps use standard Java collections
        @SuppressWarnings("unchecked")
        List<String> tags = (List<String>) record.get("tags");
    }
}
```
`AvroReaders` supports the same reader options as the row API: column projection, predicate pushdown, and the two combined:
```java
// With a filter
AvroRowReader reader = AvroReaders.createRowReader(fileReader,
    FilterPredicate.gt("id", 1000L));

// With a projection
AvroRowReader reader = AvroReaders.createRowReader(fileReader,
    ColumnProjection.columns("id", "name"));

// With both
AvroRowReader reader = AvroReaders.createRowReader(fileReader,
    ColumnProjection.columns("id", "name"),
    FilterPredicate.gt("id", 1000L));
```
Values use Avro's standard representations: timestamps as `Long` (milliseconds or microseconds since the epoch, depending on the column's logical type), dates as `Integer` (days since the epoch), decimals as `ByteBuffer` holding the two's-complement big-endian unscaled value, and binary data as `ByteBuffer`. This matches the behavior of parquet-java's AvroReadSupport.
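As a quick illustration, these representations decode with only the JDK. This is a standalone sketch, not Hardwood API: the literal values are made up, and in a real reader the decimal's scale comes from the Avro schema rather than being hard-coded:

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.time.Instant;
import java.time.LocalDate;

public class AvroValueDecoding {
    public static void main(String[] args) {
        // timestamp-millis logical type: Long, milliseconds since the epoch
        long tsMillis = 1_700_000_000_000L;
        Instant ts = Instant.ofEpochMilli(tsMillis);

        // date logical type: Integer, days since the epoch
        int epochDays = 19_000;
        LocalDate date = LocalDate.ofEpochDay(epochDays);

        // decimal logical type: ByteBuffer holding the two's-complement,
        // big-endian unscaled value; the scale is defined by the schema
        ByteBuffer decimalBuf = ByteBuffer.wrap(new BigInteger("123456").toByteArray());
        byte[] unscaled = new byte[decimalBuf.remaining()];
        decimalBuf.get(unscaled);
        BigDecimal decimal = new BigDecimal(new BigInteger(unscaled), 2); // scale 2 -> 1234.56

        System.out.println(ts + " " + date + " " + decimal);
    }
}
```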