Parquet-Java Compatibility¶
If you have existing code that uses Apache parquet-java's ParquetReader<Group> API and want to switch to Hardwood without rewriting it, the hardwood-parquet-java-compat module provides a drop-in replacement. It implements the same org.apache.parquet.* interfaces backed by Hardwood's reader, so you get Hardwood's performance with minimal code changes.
Features:
- Provides
org.apache.parquet.*namespace classes compatible with parquet-java - Includes Hadoop shims (
Path,Configuration) — no Hadoop dependency required - Supports S3 reading via
HadoopInputFilewith the samefs.s3a.*configuration properties - Supports filter predicate pushdown with the standard
FilterApi/FilterCompatclasses
Reading Local Files¶
import org.apache.hadoop.fs.Path;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.hadoop.GroupReadSupport;
import org.apache.parquet.hadoop.ParquetReader;
Path path = new Path("data.parquet");
try (ParquetReader<Group> reader = ParquetReader.builder(new GroupReadSupport(), path).build()) {
Group record;
while ((record = reader.read()) != null) {
long id = record.getLong("id", 0);
String name = record.getString("name", 0);
Group address = record.getGroup("address", 0);
String city = address.getString("city", 0);
}
}
Reading from S3¶
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.util.HadoopInputFile;
Configuration conf = new Configuration();
conf.set("fs.s3a.access.key", "...");
conf.set("fs.s3a.secret.key", "...");
conf.set("fs.s3a.endpoint", "https://s3.us-east-1.amazonaws.com");
Path path = new Path("s3a://my-bucket/data.parquet");
try (ParquetReader<Group> reader = ParquetReader.builder(new GroupReadSupport(), path)
.withConf(conf)
.build()) {
// read as usual
}
S3 support requires hardwood-s3 on the classpath. The compat layer loads it via reflection — if missing, a clear error message indicates which dependency to add.
Filter Pushdown¶
import static org.apache.parquet.filter2.predicate.FilterApi.*;
import org.apache.parquet.filter2.compat.FilterCompat;
FilterPredicate pred = and(
gtEq(longColumn("id"), 100L),
lt(doubleColumn("amount"), 500.0)
);
try (ParquetReader<Group> reader = ParquetReader.builder(new GroupReadSupport(), path)
.withFilter(FilterCompat.get(pred))
.build()) {
// only rows from matching row groups are returned
}
Warning
This module provides its own type shims in the org.apache.parquet.* namespace. It cannot be used alongside parquet-java on the same classpath.