Interface FilterPredicate

All Known Implementing Classes:
FilterPredicate.And, FilterPredicate.BinaryColumnPredicate, FilterPredicate.BinaryInPredicate, FilterPredicate.BooleanColumnPredicate, FilterPredicate.DateColumnPredicate, FilterPredicate.DecimalColumnPredicate, FilterPredicate.DoubleColumnPredicate, FilterPredicate.FloatColumnPredicate, FilterPredicate.InstantColumnPredicate, FilterPredicate.IntColumnPredicate, FilterPredicate.IntersectsPredicate, FilterPredicate.IntInPredicate, FilterPredicate.IsNotNullPredicate, FilterPredicate.IsNullPredicate, FilterPredicate.LongColumnPredicate, FilterPredicate.LongInPredicate, FilterPredicate.Not, FilterPredicate.Or, FilterPredicate.SignedBinaryColumnPredicate, FilterPredicate.TimeColumnPredicate, FilterPredicate.UUIDColumnPredicate

A predicate for filtering row groups based on column statistics.

Filter predicates enable predicate push-down: row groups whose statistics prove that no rows can match the predicate are skipped entirely, avoiding unnecessary I/O and decoding.
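The pruning decision described above can be modeled in a few lines. This is a minimal sketch, not Hardwood's implementation: the class RowGroupPruningSketch, the Stats holder, and the helper names are all hypothetical, and it only covers a range check and(gt(col, lo), lt(col, hi)) on a single integer column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of statistics-based row group pruning.
final class RowGroupPruningSketch {

    // Min/max statistics of one column in one row group (assumed shape).
    static final class Stats {
        final long min;
        final long max;

        Stats(long min, long max) {
            this.min = min;
            this.max = max;
        }
    }

    // gt(col, v) cannot match any row when the largest value is <= v.
    static boolean canSkipGt(Stats s, long v) {
        return s.max <= v;
    }

    // lt(col, v) cannot match any row when the smallest value is >= v.
    static boolean canSkipLt(Stats s, long v) {
        return s.min >= v;
    }

    // Indices of row groups that must still be read for and(gt(col, lo), lt(col, hi)).
    // An AND can be skipped as soon as either operand can be skipped.
    static List<Integer> rowGroupsToRead(List<Stats> groups, long lo, long hi) {
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < groups.size(); i++) {
            Stats s = groups.get(i);
            if (!(canSkipGt(s, lo) || canSkipLt(s, hi))) {
                keep.add(i);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        List<Stats> groups = Arrays.asList(
            new Stats(0, 10), new Stats(11, 20), new Stats(21, 30));
        System.out.println(rowGroupsToRead(groups, 12, 18)); // [1]
    }
}
```

Statistics can only prove that a row group has no matches. A row group that survives pruning may still contain non-matching rows.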

Usage examples:

// Simple comparison
FilterPredicate adult = FilterPredicate.gt("age", 21);

// Compound predicate
FilterPredicate filter = FilterPredicate.and(
    FilterPredicate.gtEq("salary", 50000L),
    FilterPredicate.lt("age", 65)
);

// Use with reader
try (ColumnReader reader = fileReader.buildColumnReader("salary").filter(filter).build()) {
    while (reader.nextBatch()) { ... }
}

Null handling

All comparison predicates (eq, notEq, lt, ltEq, gt, gtEq, in, inStrings) follow SQL three-valued logic: comparing a null column value against any operand yields UNKNOWN, and rows whose predicate is UNKNOWN are not returned. In practice this means rows with a null in the tested column are never returned by a comparison predicate. Use isNull / isNotNull for explicit null checks, or or(...) to include null rows alongside a comparison — e.g. or(gt("age", 30), isNull("age")).
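The rule above can be illustrated with a small standalone model of three-valued logic. This is a sketch under stated assumptions: the Tri enum and every method below are hypothetical stand-ins that mirror the described semantics, not Hardwood classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of SQL three-valued logic for a nullable int column.
final class NullSemanticsSketch {

    enum Tri { TRUE, FALSE, UNKNOWN }

    // A comparison against a null column value yields UNKNOWN.
    static Tri gt(Integer columnValue, int operand) {
        if (columnValue == null) {
            return Tri.UNKNOWN;
        }
        return columnValue > operand ? Tri.TRUE : Tri.FALSE;
    }

    // An explicit null check is never UNKNOWN.
    static Tri isNull(Integer columnValue) {
        return columnValue == null ? Tri.TRUE : Tri.FALSE;
    }

    // SQL OR: TRUE dominates, then UNKNOWN, then FALSE.
    static Tri or(Tri a, Tri b) {
        if (a == Tri.TRUE || b == Tri.TRUE) {
            return Tri.TRUE;
        }
        if (a == Tri.UNKNOWN || b == Tri.UNKNOWN) {
            return Tri.UNKNOWN;
        }
        return Tri.FALSE;
    }

    // Rows are returned only when the predicate is TRUE.
    static List<Integer> keepGt(List<Integer> values, int operand) {
        List<Integer> out = new ArrayList<>();
        for (Integer v : values) {
            if (gt(v, operand) == Tri.TRUE) {
                out.add(v);
            }
        }
        return out;
    }

    // Models or(gt(col, operand), isNull(col)).
    static List<Integer> keepGtOrNull(List<Integer> values, int operand) {
        List<Integer> out = new ArrayList<>();
        for (Integer v : values) {
            if (or(gt(v, operand), isNull(v)) == Tri.TRUE) {
                out.add(v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> ages = Arrays.asList(25, null, 40);
        System.out.println(keepGt(ages, 30));       // [40]
        System.out.println(keepGtOrNull(ages, 30)); // [null, 40]
    }
}
```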

not(p) preserves this behavior: rows where p is UNKNOWN stay UNKNOWN under negation and are dropped. The SQL identity not(gt(x, v)) ≡ ltEq(x, v) therefore holds on all rows, including null ones: on a null row both sides evaluate to UNKNOWN, so both drop it.

This matches the SQL semantics of WHERE predicates and differs from parquet-java's notEq, which treats null <> v as true and therefore includes null rows. To reproduce parquet-java's behavior in Hardwood, write the null-inclusion explicitly: or(notEq("x", v), isNull("x")).
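The negation rule and the explicit null-inclusion rewrite can be checked with a small standalone model. The sketch below is illustrative only: Tri and all methods are hypothetical and merely mirror the semantics described in this section. Running main prints, for each sample row, the value of both sides of the identity and of the two notEq variants.

```java
// Hypothetical model of not(...) and notEq(...) under three-valued logic.
final class NegationSketch {

    enum Tri { TRUE, FALSE, UNKNOWN }

    static Tri of(boolean b) {
        return b ? Tri.TRUE : Tri.FALSE;
    }

    // Comparisons return UNKNOWN when the column value is null.
    static Tri gt(Integer x, int v)    { return x == null ? Tri.UNKNOWN : of(x > v); }
    static Tri ltEq(Integer x, int v)  { return x == null ? Tri.UNKNOWN : of(x <= v); }
    static Tri notEq(Integer x, int v) { return x == null ? Tri.UNKNOWN : of(x != v); }
    static Tri isNull(Integer x)       { return of(x == null); }

    // Negation flips TRUE and FALSE and leaves UNKNOWN unchanged.
    static Tri not(Tri t) {
        if (t == Tri.UNKNOWN) {
            return Tri.UNKNOWN;
        }
        return t == Tri.TRUE ? Tri.FALSE : Tri.TRUE;
    }

    // SQL OR: TRUE dominates, then UNKNOWN, then FALSE.
    static Tri or(Tri a, Tri b) {
        if (a == Tri.TRUE || b == Tri.TRUE) {
            return Tri.TRUE;
        }
        if (a == Tri.UNKNOWN || b == Tri.UNKNOWN) {
            return Tri.UNKNOWN;
        }
        return Tri.FALSE;
    }

    public static void main(String[] args) {
        Integer[] rows = {5, 10, 15, null};
        for (Integer x : rows) {
            System.out.println(x
                + ": not(gt)=" + not(gt(x, 10))
                + " ltEq=" + ltEq(x, 10)
                + " notEq=" + notEq(x, 10)
                + " or(notEq, isNull)=" + or(notEq(x, 10), isNull(x)));
        }
    }
}
```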

Float and double comparisons

Predicates on float and double columns use the Float.compare(float, float) / Double.compare(double, double) total order, not the IEEE 754 comparison operators (==, <, >). Two consequences matter in practice:

  • -0.0 is strictly less than +0.0. eq(0.0) matches only +0.0 values; to match either zero, use or(eq(0.0), eq(-0.0)).
  • NaN sorts above every other value, including +Infinity. eq(NaN) matches only NaN (whereas IEEE NaN == anything is always false). lt and ltEq against any non-NaN value never match NaN rows; gt and gtEq against any non-NaN value always include NaN rows.
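Both consequences follow directly from the JDK's Double.compare, which the snippet below exercises. The eq, lt and gt helpers are hypothetical and only mirror the comparison rule described above; they are not Hardwood methods.

```java
// Demonstrates the Double.compare total order versus the IEEE 754 operators.
final class FloatOrderSketch {

    // Hypothetical helpers that compare a column value against an operand
    // using the total order instead of ==, < and >.
    static boolean eq(double columnValue, double operand) {
        return Double.compare(columnValue, operand) == 0;
    }

    static boolean lt(double columnValue, double operand) {
        return Double.compare(columnValue, operand) < 0;
    }

    static boolean gt(double columnValue, double operand) {
        return Double.compare(columnValue, operand) > 0;
    }

    public static void main(String[] args) {
        // Signed zeros: equal under IEEE 754, distinct under the total order.
        System.out.println(-0.0 == 0.0);     // true
        System.out.println(eq(-0.0, 0.0));   // false
        System.out.println(lt(-0.0, 0.0));   // true

        // NaN: never equal under IEEE 754, equal to itself under the total order.
        System.out.println(Double.NaN == Double.NaN);    // false
        System.out.println(eq(Double.NaN, Double.NaN));  // true

        // NaN sorts above everything else, including +Infinity.
        System.out.println(gt(Double.NaN, Double.POSITIVE_INFINITY)); // true
        System.out.println(lt(Double.NaN, Double.POSITIVE_INFINITY)); // false
    }
}
```

Float.compare behaves the same way for float values.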

Predicate pushdown is defensive against non-conformant writers: if a column's statistics carry NaN as min or max (which the Parquet spec forbids, but older or buggy writers produce), that bound is treated as absent and pruning is skipped on that side, so matching rows are never dropped.
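A defensive bound check of this kind might look like the sketch below. It is an illustration under assumptions, not Hardwood's code: canSkipLt is a hypothetical helper, it models only lt(column, operand) against the min statistic, and it assumes a non-NaN operand. It also widens a zero min to -0.0, following the Parquet format's reader guidance that a min of +0 may hide -0 values.

```java
// Hypothetical model of NaN-tolerant pruning for lt(column, operand).
final class NaNBoundSketch {

    // Returns true only when the min statistic proves no row can match.
    static boolean canSkipLt(double statsMin, double operand) {
        if (Double.isNaN(statsMin)) {
            // Untrusted bound: treat it as "no lower bound" and do not prune.
            return false;
        }
        // A zero min may stand for either signed zero, so use the smaller one.
        double lower = (statsMin == 0.0) ? -0.0 : statsMin;
        // No value can sort below the operand when the lower bound is >= operand.
        return Double.compare(lower, operand) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(canSkipLt(Double.NaN, 5.0)); // false: bound ignored
        System.out.println(canSkipLt(10.0, 5.0));       // true: all values >= 10
        System.out.println(canSkipLt(1.0, 5.0));        // false: values below 5 possible
        System.out.println(canSkipLt(0.0, 0.0));        // false: -0.0 rows may exist
    }
}
```

Returning false on an untrusted bound only costs extra reads. It never changes the query result.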