Skip to content

Typed Accessors

RowReader — and the nested PqStruct / PqList / PqMap flyweights — decode each column to its logical-type Java representation through typed accessor methods. This page is the full correspondence between accessor, Parquet type, and Java type, together with the null- and type-mismatch contracts every accessor obeys. For the task-oriented walkthrough, see Read Row by Row.

Accessor type mapping

All accessors are available in two forms — name-based (getInt("column_name")) and index-based (getInt(columnIndex)); see Index-based access.

Method Physical Type Logical Type Java Type
getBoolean BOOLEAN boolean
getInt INT32 int
getLong INT64 long
getFloat FLOAT, or FIXED_LEN_BYTE_ARRAY(2) FLOAT16 (optional) float
getDouble DOUBLE double
getBinary BYTE_ARRAY BSON (optional) byte[]
getString BYTE_ARRAY STRING or JSON String
getDate INT32 DATE LocalDate
getTime INT32 or INT64 TIME LocalTime
getTimestamp INT64, or legacy INT96 TIMESTAMP (isAdjustedToUTC = true) Instant
getLocalTimestamp INT64 TIMESTAMP (isAdjustedToUTC = false) LocalDateTime
getDecimal INT32, INT64, or FIXED_LEN_BYTE_ARRAY DECIMAL BigDecimal
getUuid FIXED_LEN_BYTE_ARRAY UUID UUID
getInterval FIXED_LEN_BYTE_ARRAY(12) INTERVAL PqInterval
getStruct PqStruct
getList LIST PqList
getMap MAP PqMap
getVariant BYTE_ARRAY pair VARIANT PqVariant
isNull Any Any boolean

All methods are available as both method(name) and method(index).

Null handling

Primitive accessors (getInt, getLong, getFloat, getDouble, getBoolean) throw NullPointerException if the field is null — always check isNull() first. Object accessors (getString, getDate, getTimestamp, getLocalTimestamp, getDecimal, getUuid, getInterval, getStruct, getList, getMap) return null for null fields.

Type mismatches

Requesting the wrong type for a column (e.g. getInt on a LONG column, getDate on a STRING column) is a programming error; the call fails at runtime with an unchecked exception. The specific exception type is unspecified and may change between releases — do not catch it as part of normal control flow. If the column type isn't known statically, check it up front via reader.getFileSchema().getColumn(name) and inspect the returned ColumnSchema's type() / logicalType() — see Inspect File Metadata.

The getTimestamp / getLocalTimestamp pair is split along the column's isAdjustedToUTC flag: getTimestamp requires isAdjustedToUTC = true and returns Instant; getLocalTimestamp requires isAdjustedToUTC = false and returns LocalDateTime. Calling the wrong one for a column throws IllegalStateException naming the column and the actual flag value. If the kind isn't known statically, branch on ((LogicalType.TimestampType) column.logicalType()).isAdjustedToUTC() before the accessor call, or use the generic getValue accessor, which returns Instant or LocalDateTime per the column's flag. For why the pair is split, see Timestamp Semantics.

The TIME logical type also carries an isAdjustedToUTC flag, but LocalTime has no zone of its own, so getTime returns LocalTime either way and the flag is informational — inspect ((LogicalType.TimeType) column.logicalType()).isAdjustedToUTC() if the distinction matters to your application.

FLOAT16 columns

getFloat accepts FLOAT16 columns (FIXED_LEN_BYTE_ARRAY(2) annotated with the FLOAT16 logical type) and decodes the 2-byte IEEE 754 half-precision payload to a single-precision float. The widening is lossless — half-precision NaN, ±Infinity, and signed zero round-trip cleanly, and the original NaN bit pattern is preserved (the Parquet spec does not canonicalize NaNs on write). Use Float.isNaN(value) for NaN checks rather than equality. As with all primitive accessors, isNull() must be checked before getFloat() since FLOAT16 columns can be optional.

Legacy INT96 timestamps

Parquet files written by older versions of Apache Spark and Hive store timestamps in the deprecated INT96 physical type without a TIMESTAMP logical type annotation. getTimestamp detects INT96 automatically and decodes it to an Instant; no caller-side handling is required.

Legacy converted-type annotations

Writers predating the modern logical-type union (older parquet-mr / Hive / Impala / Spark) annotate primitive columns with only a legacy converted_type and no logicalType. Hardwood promotes each one to its logical type, so the column decodes through the normal typed accessor with no caller-side opt-in:

converted_type Accessor Java type
UTF8 getString String
JSON getString String
ENUM, BSON getBinary byte[]
DATE getDate LocalDate
DECIMAL getDecimal BigDecimal
TIME_MILLIS, TIME_MICROS getTime LocalTime
TIMESTAMP_MILLIS, TIMESTAMP_MICROS getTimestamp Instant
INT_8, INT_16, INT_32, INT_64 getValue Byte / Short / Integer / Long
UINT_8, UINT_16, UINT_32, UINT_64 getValue Integer / Long (raw two's-complement bit pattern)
INTERVAL getInterval PqInterval

TIME_* columns decode to a UTC-normalized LocalTime time-of-day and TIMESTAMP_* columns to a UTC-normalized Instant, matching the parquet-format backward-compatibility rule for these annotations. Unsigned columns preserve the stored bit pattern — reinterpret with Integer.toUnsignedLong / Long.toUnsignedString for the unsigned magnitude. When a file carries both a converted_type and a modern logicalType, the logicalType takes precedence.

The MAP group annotation has a legacy form too: some older parquet-mr / Hive / Impala files annotate only the inner repeated key_value group with MAP_KEY_VALUE and leave the outer group unannotated. Hardwood recognizes this form as a map, so getMap returns a PqMap and the group's SchemaNode.GroupNode.isMap() reports true (with logicalType() returning a MapType), exactly as for the modern MAP annotation.