Typed Accessors
RowReader — and the nested PqStruct / PqList / PqMap flyweights — decode each column to its
logical-type Java representation through typed accessor methods. This page is the full
correspondence between accessor, Parquet type, and Java type, together with the null- and
type-mismatch contracts every accessor obeys. For the task-oriented walkthrough, see
Read Row by Row.
Accessor type mapping
All accessors are available in two forms — name-based (getInt("column_name")) and index-based
(getInt(columnIndex)); see Index-based access.
| Method | Physical Type | Logical Type | Java Type |
|---|---|---|---|
| getBoolean | BOOLEAN | | boolean |
| getInt | INT32 | | int |
| getLong | INT64 | | long |
| getFloat | FLOAT, or FIXED_LEN_BYTE_ARRAY(2) | none (FLOAT) or FLOAT16 (FIXED_LEN_BYTE_ARRAY(2)) | float |
| getDouble | DOUBLE | | double |
| getBinary | BYTE_ARRAY | none, or BSON | byte[] |
| getString | BYTE_ARRAY | STRING or JSON | String |
| getDate | INT32 | DATE | LocalDate |
| getTime | INT32 or INT64 | TIME | LocalTime |
| getTimestamp | INT64, or legacy INT96 | TIMESTAMP (isAdjustedToUTC = true) | Instant |
| getLocalTimestamp | INT64 | TIMESTAMP (isAdjustedToUTC = false) | LocalDateTime |
| getDecimal | INT32, INT64, or FIXED_LEN_BYTE_ARRAY | DECIMAL | BigDecimal |
| getUuid | FIXED_LEN_BYTE_ARRAY(16) | UUID | UUID |
| getInterval | FIXED_LEN_BYTE_ARRAY(12) | INTERVAL | PqInterval |
| getStruct | group | | PqStruct |
| getList | group | LIST | PqList |
| getMap | group | MAP | PqMap |
| getVariant | BYTE_ARRAY pair | VARIANT | PqVariant |
| isNull | Any | Any | boolean |
All methods are available as both method(name) and method(index).
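The DECIMAL and UUID rows are easier to follow when you can see how the raw stored values map onto the listed Java types. The following minimal sketch is based on the Parquet format specification, not on Hardwood's source. The class RawDecode and its methods are hypothetical, and only the JDK calls are real. In practice, the scale would come from the column's DECIMAL annotation.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helpers (not Hardwood's API) showing how raw Parquet values
// become the Java types listed for getDecimal and getUuid.
final class RawDecode {

    // DECIMAL on INT32/INT64: the stored integer is the unscaled value.
    static BigDecimal decimal(long unscaled, int scale) {
        return BigDecimal.valueOf(unscaled, scale);
    }

    // DECIMAL on FIXED_LEN_BYTE_ARRAY: big-endian two's-complement unscaled value.
    static BigDecimal decimal(byte[] unscaledBigEndian, int scale) {
        return new BigDecimal(new BigInteger(unscaledBigEndian), scale);
    }

    // UUID on FIXED_LEN_BYTE_ARRAY(16): big-endian, most significant half first.
    static UUID uuid(byte[] sixteen) {
        ByteBuffer buf = ByteBuffer.wrap(sixteen); // ByteBuffer defaults to big-endian
        return new UUID(buf.getLong(), buf.getLong());
    }

    public static void main(String[] args) {
        System.out.println(decimal(12345L, 2));                                // 123.45
        System.out.println(decimal(new byte[] {(byte) 0xFF, (byte) 0x85}, 1)); // -12.3
        byte[] raw = new byte[16];
        for (int i = 0; i < 16; i++) {
            raw[i] = (byte) (i * 0x11);                                        // 00 11 22 ... ff
        }
        System.out.println(uuid(raw)); // 00112233-4455-6677-8899-aabbccddeeff
    }
}
```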
Null handling
Primitive accessors (getInt, getLong, getFloat, getDouble, getBoolean) throw
NullPointerException if the field is null — always check isNull() first. Object accessors
(getString, getDate, getTimestamp, getLocalTimestamp, getDecimal, getUuid,
getInterval, getStruct, getList, getMap) return null for null fields.
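In practice, the contract means two habits: guard every primitive accessor with isNull(), and null-check the result of object accessors. The following sketch is illustrative only. Row is a hypothetical stand-in that exposes just isNull, getInt, and getString with the semantics described above. MapRow is a hypothetical in-memory implementation that lets the example run without a Parquet file. The column names "age" and "name" are made up.

```java
import java.util.HashMap;
import java.util.Map;

final class NullHandling {
    // Guard primitive accessors with isNull(); null-check object accessors after the call.
    static String describe(Row row) {
        String age = row.isNull("age") ? "unknown" : Integer.toString(row.getInt("age"));
        String name = row.getString("name");
        return (name == null ? "<no name>" : name) + ", age " + age;
    }

    public static void main(String[] args) {
        System.out.println(describe(new MapRow().put("name", "Ada").put("age", 36))); // Ada, age 36
        System.out.println(describe(new MapRow().put("age", null)));                 // <no name>, age unknown
    }
}

// Hypothetical stand-in for the RowReader accessors used above (NOT Hardwood's interface).
interface Row {
    boolean isNull(String name);
    int getInt(String name);        // primitive accessor: throws NullPointerException on null
    String getString(String name);  // object accessor: returns null on null
}

// Hypothetical in-memory Row. Simplification: a missing key is treated the same as a null value.
final class MapRow implements Row {
    private final Map<String, Object> values = new HashMap<>();

    MapRow put(String name, Object value) {
        values.put(name, value);
        return this;
    }

    @Override
    public boolean isNull(String name) {
        return values.get(name) == null;
    }

    @Override
    public int getInt(String name) {
        Object value = values.get(name);
        if (value == null) {
            throw new NullPointerException("column " + name + " is null");
        }
        return (Integer) value;
    }

    @Override
    public String getString(String name) {
        return (String) values.get(name);
    }
}
```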
Type mismatches
Requesting the wrong type for a column (e.g. getInt on an INT64 column, getDate on a STRING
column) is a programming error; the call fails at runtime with an unchecked exception. The
specific exception type is unspecified and may change between releases — do not catch it as part of
normal control flow. If the column type isn't known statically, check it up front via
reader.getFileSchema().getColumn(name) and inspect the returned ColumnSchema's type() /
logicalType() — see Inspect File Metadata.
The getTimestamp / getLocalTimestamp pair is split along the column's isAdjustedToUTC flag:
getTimestamp requires isAdjustedToUTC = true and returns Instant; getLocalTimestamp
requires isAdjustedToUTC = false and returns LocalDateTime. Calling the wrong one for a column
throws IllegalStateException naming the column and the actual flag value. If the kind isn't known
statically, branch on ((LogicalType.TimestampType) column.logicalType()).isAdjustedToUTC() before
the accessor call, or use the generic getValue accessor, which returns Instant or
LocalDateTime per the column's flag. For why the pair is split, see
Timestamp Semantics.
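The split is easiest to see at the level of raw values. The following sketch assumes a TIMESTAMP(MICROS) column stored as INT64. It reads the same integer either as an absolute Instant (isAdjustedToUTC = true) or as a zone-less LocalDateTime (isAdjustedToUTC = false). TimestampKinds and its methods are hypothetical illustrations of the semantics, not Hardwood's decoding code.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Sketch: one INT64 TIMESTAMP(MICROS) value read two ways, selected by isAdjustedToUTC.
// Hypothetical names; not Hardwood's API.
final class TimestampKinds {

    // isAdjustedToUTC = true: micros since the Unix epoch, an absolute point in time.
    static Instant toInstant(long micros) {
        return Instant.EPOCH.plus(micros, ChronoUnit.MICROS);
    }

    // isAdjustedToUTC = false: the same arithmetic, read as a wall-clock value with
    // no zone attached. UTC serves only as the arithmetic frame here.
    static LocalDateTime toLocal(long micros) {
        return LocalDateTime.ofEpochSecond(
                Math.floorDiv(micros, 1_000_000L),
                (int) Math.floorMod(micros, 1_000_000L) * 1_000, // leftover micros -> nanos
                ZoneOffset.UTC);
    }

    // Mirrors the "branch on the flag" advice: one result type per flag value.
    static Object decode(long micros, boolean isAdjustedToUTC) {
        return isAdjustedToUTC ? toInstant(micros) : toLocal(micros);
    }

    public static void main(String[] args) {
        long micros = 1_700_000_000_123_456L;
        System.out.println(decode(micros, true));  // 2023-11-14T22:13:20.123456Z
        System.out.println(decode(micros, false)); // 2023-11-14T22:13:20.123456
    }
}
```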
The TIME logical type also carries an isAdjustedToUTC flag, but LocalTime has no zone of its
own, so getTime returns LocalTime either way and the flag is informational — inspect
((LogicalType.TimeType) column.logicalType()).isAdjustedToUTC() if the distinction matters to
your application.
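The flag cannot change the result because the Parquet spec stores TIME as a count of units since midnight. Decoding is therefore only a unit conversion into LocalTime. The following rough sketch uses TimeDecode and its methods, which are hypothetical names rather than Hardwood's API.

```java
import java.time.LocalTime;

// Sketch of TIME decoding per the Parquet spec: the stored integer counts units
// since midnight. Hypothetical helper, not Hardwood's implementation.
final class TimeDecode {

    // TIME(MILLIS) is stored as INT32.
    static LocalTime fromMillis(int millis) {
        return LocalTime.ofNanoOfDay(millis * 1_000_000L);
    }

    // TIME(MICROS) is stored as INT64.
    static LocalTime fromMicros(long micros) {
        return LocalTime.ofNanoOfDay(micros * 1_000L);
    }

    // TIME(NANOS) is stored as INT64.
    static LocalTime fromNanos(long nanos) {
        return LocalTime.ofNanoOfDay(nanos);
    }

    public static void main(String[] args) {
        // isAdjustedToUTC never enters the computation: LocalTime carries no zone.
        System.out.println(fromMillis(45_296_789));     // 12:34:56.789
        System.out.println(fromMicros(45_296_789_012L)); // 12:34:56.789012
    }
}
```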
FLOAT16 columns
getFloat accepts FLOAT16 columns (FIXED_LEN_BYTE_ARRAY(2) annotated with the FLOAT16 logical
type) and decodes the 2-byte IEEE 754 half-precision payload to a single-precision float. The
widening is lossless — half-precision NaN, ±Infinity, and signed zero round-trip cleanly, and the
original NaN bit pattern is preserved (the Parquet spec does not canonicalize NaNs on write). Use
Float.isNaN(value) for NaN checks rather than equality. As with all primitive accessors,
isNull() must be checked before getFloat() since FLOAT16 columns can be optional.
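The following minimal sketch shows the half-to-single widening. It is written against the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits with bias 15, 10 mantissa bits) and assumes the little-endian byte order the Parquet spec prescribes for FLOAT16. Float16Decode is a hypothetical helper, not Hardwood's implementation. On Java 20 and later, Float.float16ToFloat(short) offers a built-in conversion.

```java
// Sketch of IEEE 754 binary16 -> binary32 widening. Hypothetical helper, not Hardwood's code.
final class Float16Decode {

    // FLOAT16 payload: 2 bytes, little-endian.
    static float decode(byte[] twoBytes) {
        int bits = (twoBytes[0] & 0xFF) | ((twoBytes[1] & 0xFF) << 8);
        return halfBitsToFloat(bits);
    }

    static float halfBitsToFloat(int h) {
        int sign = (h & 0x8000) << 16; // move the sign bit to bit 31
        int exp = (h >>> 10) & 0x1F;   // 5-bit biased exponent
        int mant = h & 0x03FF;         // 10-bit mantissa

        if (exp == 0x1F) {
            // Infinity (mant == 0) or NaN (mant != 0); the payload moves into the top mantissa bits.
            return Float.intBitsToFloat(sign | 0x7F800000 | (mant << 13));
        }
        if (exp == 0) {
            if (mant == 0) {
                return Float.intBitsToFloat(sign); // +0.0f or -0.0f
            }
            // Subnormal half: value = mant * 2^-24, exactly representable as a float.
            float magnitude = mant * 0x1p-24f;
            return sign != 0 ? -magnitude : magnitude;
        }
        // Normal: rebias the exponent from 15 to 127 and widen the mantissa by 13 bits.
        return Float.intBitsToFloat(sign | ((exp + 112) << 23) | (mant << 13));
    }

    public static void main(String[] args) {
        System.out.println(halfBitsToFloat(0x3C00));              // 1.0
        System.out.println(halfBitsToFloat(0xC000));              // -2.0
        System.out.println(halfBitsToFloat(0x7C00));              // Infinity
        System.out.println(Float.isNaN(halfBitsToFloat(0x7E00))); // true
        System.out.println(decode(new byte[] {0x00, 0x3C}));      // 1.0
    }
}
```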
Legacy INT96 timestamps
Parquet files written by older versions of Apache Spark and Hive store timestamps in the deprecated
INT96 physical type without a TIMESTAMP logical type annotation. getTimestamp detects INT96
automatically and decodes it to an Instant; no caller-side handling is required.
Legacy converted-type annotations
Writers predating the modern logical-type union (older parquet-mr / Hive / Impala / Spark) annotate
primitive columns with only a legacy converted_type and no logicalType. Hardwood promotes each
one to its logical type, so the column decodes through the normal typed accessor with no
caller-side opt-in:
| converted_type | Accessor | Java type |
|---|---|---|
| UTF8 | getString | String |
| JSON | getString | String |
| ENUM, BSON | getBinary | byte[] |
| DATE | getDate | LocalDate |
| DECIMAL | getDecimal | BigDecimal |
| TIME_MILLIS, TIME_MICROS | getTime | LocalTime |
| TIMESTAMP_MILLIS, TIMESTAMP_MICROS | getTimestamp | Instant |
| INT_8, INT_16, INT_32, INT_64 | getValue | Byte / Short / Integer / Long |
| UINT_8, UINT_16, UINT_32, UINT_64 | getValue | Integer / Long (raw two's-complement bit pattern) |
| INTERVAL | getInterval | PqInterval |
TIME_* columns decode to a UTC-normalized LocalTime time-of-day and TIMESTAMP_* columns to a
UTC-normalized Instant, matching the parquet-format backward-compatibility rule for these
annotations. Unsigned columns preserve the stored bit pattern — reinterpret with
Integer.toUnsignedLong / Long.toUnsignedString for the unsigned magnitude. When a file carries
both a converted_type and a modern logicalType, the logicalType takes precedence.
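For example, a UINT_32 column whose stored bits are all ones comes back from getValue as the Integer -1. The following short sketch shows the reinterpretation. It assumes, per the table above, that UINT_32 surfaces as an Integer and UINT_64 as a Long. UnsignedValues and its methods are hypothetical wrappers around the JDK calls named above.

```java
// Sketch: recovering unsigned magnitudes from raw two's-complement values.
// Hypothetical wrapper names; the JDK methods are standard.
final class UnsignedValues {

    // UINT_32: an Integer carrying the stored bit pattern; widen without sign extension.
    static long uint32(Integer raw) {
        return Integer.toUnsignedLong(raw);
    }

    // UINT_64: the magnitude may exceed Long.MAX_VALUE, so render it as text
    // (or convert to BigInteger) instead of widening.
    static String uint64(Long raw) {
        return Long.toUnsignedString(raw);
    }

    public static void main(String[] args) {
        System.out.println(uint32(-1));         // 4294967295
        System.out.println(uint64(-1L));        // 18446744073709551615
        System.out.println(uint32(0x7FFFFFFF)); // 2147483647
    }
}
```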
The MAP group annotation has a legacy form too: some older parquet-mr / Hive / Impala files annotate
only the inner repeated key_value group with MAP_KEY_VALUE and leave the outer group unannotated.
Hardwood recognizes this form as a map, so getMap returns a PqMap and the group's
SchemaNode.GroupNode.isMap() reports true (with logicalType() returning a MapType), exactly as
for the modern MAP annotation.