Class ColumnReader

java.lang.Object
dev.hardwood.reader.ColumnReader
All Implemented Interfaces:
AutoCloseable

public class ColumnReader extends Object implements AutoCloseable
Batch-oriented column reader for reading a single column across all row groups.

Provides typed primitive arrays for zero-boxing access. For nested/repeated columns, multi-level offsets and per-level null bitmaps enable efficient traversal without per-row virtual dispatch.

Flat column usage:

double sum = 0.0; // accumulator for non-null values
try (ColumnReader reader = fileReader.createColumnReader("fare_amount")) {
    while (reader.nextBatch()) {
        int count = reader.getRecordCount();
        double[] values = reader.getDoubles();
        BitSet nulls = reader.getElementNulls();
        for (int i = 0; i < count; i++) {
            if (nulls == null || !nulls.get(i)) sum += values[i];
        }
    }
}

Simple list usage (nestingDepth=1):

double sum = 0.0; // accumulator for non-null leaf values
try (ColumnReader reader = fileReader.createColumnReader("fare_components")) {
    while (reader.nextBatch()) {
        int recordCount = reader.getRecordCount();
        int valueCount = reader.getValueCount();
        double[] values = reader.getDoubles();
        int[] offsets = reader.getOffsets(0);
        BitSet recordNulls = reader.getLevelNulls(0);
        BitSet elementNulls = reader.getElementNulls();
        for (int r = 0; r < recordCount; r++) {
            if (recordNulls != null && recordNulls.get(r)) continue;
            int start = offsets[r];
            int end = (r + 1 < recordCount) ? offsets[r + 1] : valueCount;
            for (int i = start; i < end; i++) {
                if (elementNulls == null || !elementNulls.get(i)) sum += values[i];
            }
        }
    }
}
  • Method Details

    • nextBatch

      public boolean nextBatch()
      Advance to the next batch.
      Returns:
      true if a batch is available, false if exhausted
    • getRecordCount

      public int getRecordCount()
      Number of top-level records in the current batch.
    • getValueCount

      public int getValueCount()
      Total number of leaf values in the current batch. For flat columns, this equals getRecordCount().
    • getInts

      public int[] getInts()
    • getLongs

      public long[] getLongs()
    • getFloats

      public float[] getFloats()
    • getDoubles

      public double[] getDoubles()
    • getBooleans

      public boolean[] getBooleans()
    • getBinaries

      public byte[][] getBinaries()
    • getStrings

      public String[] getStrings()
      String values for STRING/JSON/BSON logical type columns. Converts the underlying byte arrays to UTF-8 strings. Null values are represented as null entries in the array.
      Returns:
      String array with converted values
      Throws:
      IllegalStateException - if the column is not a BYTE_ARRAY type
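The conversion described for getStrings() can be pictured with the following self-contained sketch. It is not the library's implementation: the class and method names are hypothetical, and it assumes that null values appear as null entries in the underlying byte[][] array.

```java
import java.nio.charset.StandardCharsets;

final class StringConversionSketch {

    /**
     * Decodes each byte array as UTF-8, keeping null entries as null.
     * Assumption: null values are represented as null entries in the input.
     */
    static String[] toStrings(byte[][] binaries, int valueCount) {
        String[] out = new String[valueCount];
        for (int i = 0; i < valueCount; i++) {
            byte[] raw = binaries[i];
            out[i] = (raw == null) ? null : new String(raw, StandardCharsets.UTF_8);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[][] binaries = {
            "cash".getBytes(StandardCharsets.UTF_8),
            null,
            "card".getBytes(StandardCharsets.UTF_8)
        };
        for (String s : toStrings(binaries, binaries.length)) {
            System.out.println(s);
        }
    }
}
```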
    • getElementNulls

      public BitSet getElementNulls()
      Null bitmap over leaf values. For flat columns this doubles as record-level nulls.
      Returns:
      BitSet where set bits indicate null values, or null if all elements are required
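Because the bitmap may itself be null, callers need to handle both cases. The sketch below shows one way to do that with plain arrays standing in for a batch; the class and method names are hypothetical, and only standard java.util.BitSet calls are used.

```java
import java.util.BitSet;

final class ElementNullsSketch {

    /** Counts non-null values; a null bitmap means every value is present. */
    static int countNonNull(BitSet nulls, int valueCount) {
        if (nulls == null) return valueCount;
        // Restrict to [0, valueCount) in case bits beyond the batch are set.
        return valueCount - nulls.get(0, valueCount).cardinality();
    }

    /** Sums non-null values by jumping from one clear bit to the next. */
    static double sumNonNull(double[] values, BitSet nulls, int valueCount) {
        double sum = 0.0;
        if (nulls == null) {
            for (int i = 0; i < valueCount; i++) sum += values[i];
            return sum;
        }
        for (int i = nulls.nextClearBit(0); i < valueCount; i = nulls.nextClearBit(i + 1)) {
            sum += values[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] values = {1.5, 0.0, 2.5, 0.0};
        BitSet nulls = new BitSet();
        nulls.set(1); // value 1 is null
        nulls.set(3); // value 3 is null
        System.out.println(countNonNull(nulls, values.length));
        System.out.println(sumNonNull(values, nulls, values.length));
    }
}
```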
    • getLevelNulls

      public BitSet getLevelNulls(int level)
      Null bitmap at a given nesting level. Only valid for nested columns (0 <= level < getNestingDepth()).
      Parameters:
      level - the nesting level (0 = outermost group)
      Returns:
      BitSet where set bits indicate null groups, or null if that level is required
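One practical use of a level bitmap is telling a null list apart from an empty one. The sketch below does this with plain arrays in place of a ColumnReader. It assumes the layout used in the simple list example above (offsets[r] is the start of record r, and the last record ends at the value count); the class and method names are hypothetical.

```java
import java.util.BitSet;

final class LevelNullsSketch {

    /** Labels each record as "null", "empty", or by its element count. */
    static String[] describe(int[] offsets, BitSet recordNulls, int valueCount) {
        int recordCount = offsets.length;
        String[] out = new String[recordCount];
        for (int r = 0; r < recordCount; r++) {
            // A set bit marks the whole list as null, regardless of offsets.
            if (recordNulls != null && recordNulls.get(r)) {
                out[r] = "null";
                continue;
            }
            int start = offsets[r];
            int end = (r + 1 < recordCount) ? offsets[r + 1] : valueCount;
            out[r] = (start == end) ? "empty" : (end - start) + " element(s)";
        }
        return out;
    }

    public static void main(String[] args) {
        int[] offsets = {0, 2, 2, 2}; // four records over three leaf values
        BitSet recordNulls = new BitSet();
        recordNulls.set(1);           // record 1 is a null list
        for (String s : describe(offsets, recordNulls, 3)) {
            System.out.println(s);
        }
    }
}
```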
    • getNestingDepth

      public int getNestingDepth()
      Nesting depth: 0 for flat columns, maxRepetitionLevel for nested columns.
    • getOffsets

      public int[] getOffsets(int level)
      Offset array for a given nesting level. Maps items at level k to positions in the next level (or leaf values for the innermost level).
      Parameters:
      level - the nesting level (0-indexed)
      Returns:
      offset array for the given level
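To illustrate how offsets chain across levels, here is a self-contained sketch for a column of nesting depth 2, using plain arrays in place of a ColumnReader. It assumes the layout implied by the simple list example above, extended to two levels: each offset array has one entry per item at its level, an item ends where the next item begins, and the last item ends at the size of the next level. The class name, method name, and sample data are hypothetical, and null bitmaps are left out for brevity.

```java
import java.util.Arrays;

final class NestedOffsetsSketch {

    /**
     * Sums the leaf values under each top-level record of a depth-2 column.
     *
     * level0[r] = index of the first inner list of record r
     * level1[j] = index of the first leaf value of inner list j
     */
    static double[] sumPerRecord(int[] level0, int[] level1, double[] values, int valueCount) {
        int recordCount = level0.length;
        int innerCount = level1.length;
        double[] sums = new double[recordCount];
        for (int r = 0; r < recordCount; r++) {
            int innerStart = level0[r];
            int innerEnd = (r + 1 < recordCount) ? level0[r + 1] : innerCount;
            for (int j = innerStart; j < innerEnd; j++) {
                int start = level1[j];
                int end = (j + 1 < innerCount) ? level1[j + 1] : valueCount;
                for (int i = start; i < end; i++) {
                    sums[r] += values[i];
                }
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        // Three records: [[1, 2], [3]], [], [[4, 5, 6]]
        int[] level0 = {0, 2, 2};
        int[] level1 = {0, 2, 3};
        double[] values = {1, 2, 3, 4, 5, 6};
        System.out.println(Arrays.toString(sumPerRecord(level0, level1, values, values.length)));
        // prints [6.0, 0.0, 15.0]
    }
}
```

With a real reader, level0 and level1 would correspond to getOffsets(0) and getOffsets(1), and the per-level null bitmaps would be checked before descending into each item.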
    • getColumnSchema

      public ColumnSchema getColumnSchema()
    • close

      public void close()
      Specified by:
      close in interface AutoCloseable