Class ProjectedSchema

java.lang.Object
dev.hardwood.schema.ProjectedSchema

public final class ProjectedSchema extends Object
Represents a projected view of a Parquet schema containing only selected columns.

This class handles the mapping between projected column indices (dense, 0-based) and original column indices, allowing the reader to skip I/O, decoding, and memory allocation for non-projected columns.
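
The index mapping described above can be pictured with two plain arrays. The following is a minimal sketch and not the Hardwood implementation: the class name IndexMappingSketch and its constructor are hypothetical, and it assumes that the selected indices are distinct and that projected columns keep their original relative order.

```java
import java.util.Arrays;

// Hypothetical illustration of a dense projected <-> original index mapping.
// This is NOT the Hardwood implementation; all names here are assumptions.
final class IndexMappingSketch {
    private final int[] projectedToOriginal; // dense, 0-based projected index -> original index
    private final int[] originalToProjected; // original index -> projected index, -1 if not projected

    IndexMappingSketch(int originalColumnCount, int... selectedOriginalIndices) {
        projectedToOriginal = selectedOriginalIndices.clone();
        Arrays.sort(projectedToOriginal); // assumption: keep the original column order
        originalToProjected = new int[originalColumnCount];
        Arrays.fill(originalToProjected, -1);
        for (int p = 0; p < projectedToOriginal.length; p++) {
            originalToProjected[projectedToOriginal[p]] = p;
        }
    }

    int toOriginalIndex(int projectedIndex) {
        if (projectedIndex < 0 || projectedIndex >= projectedToOriginal.length) {
            throw new IndexOutOfBoundsException("projectedIndex: " + projectedIndex);
        }
        return projectedToOriginal[projectedIndex];
    }

    int toProjectedIndex(int originalIndex) {
        return originalToProjected[originalIndex];
    }

    public static void main(String[] args) {
        // Five original columns, of which columns 1 and 3 are projected.
        IndexMappingSketch mapping = new IndexMappingSketch(5, 3, 1);
        System.out.println(mapping.toOriginalIndex(0));
        System.out.println(mapping.toProjectedIndex(3));
        System.out.println(mapping.toProjectedIndex(0));
    }
}
```

With a mapping of this shape, a reader can loop over the projected indices alone and never touch the original columns that map to -1.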

For nested schemas, projecting a parent group includes all its child columns. For example, if "address" is a struct containing "city" and "street", projecting "address" includes both child columns.
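
The parent-group rule can be sketched as a prefix match over leaf column paths. This is an illustration under stated assumptions, not the library's code: the class NestedProjectionSketch, the expand method, and the representation of leaf columns as dot-separated paths are all hypothetical. The exception for an unknown name mirrors the behavior documented for create.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of expanding projected names to leaf columns.
// Assumes leaf columns are identified by dot-separated paths, which may not
// match how Hardwood represents them internally.
final class NestedProjectionSketch {

    // Returns the original indices of all leaf columns selected by the given names.
    static List<Integer> expand(List<String> leafPaths, List<String> projectedNames) {
        boolean[] selected = new boolean[leafPaths.size()];
        for (String name : projectedNames) {
            boolean matched = false;
            for (int i = 0; i < leafPaths.size(); i++) {
                String path = leafPaths.get(i);
                // A name selects the leaf itself or every leaf beneath a group of that name.
                if (path.equals(name) || path.startsWith(name + ".")) {
                    selected[i] = true;
                    matched = true;
                }
            }
            if (!matched) {
                throw new IllegalArgumentException("Column not found in schema: " + name);
            }
        }
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < selected.length; i++) {
            if (selected[i]) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> leaves = List.of("id", "address.city", "address.street", "name");
        // Projecting the "address" group selects both of its child columns.
        System.out.println(expand(leaves, List.of("address")));
    }
}
```

Matching on name + "." rather than on the bare prefix keeps a projection of "address" from also selecting an unrelated column such as "address2".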

  • Method Details

    • create

      public static ProjectedSchema create(FileSchema schema, ColumnProjection projection)
      Creates a projected schema from the given full schema and projection.
      Parameters:
      schema - the original file schema
      projection - the column projection specifying which columns to include
      Returns:
      a projected schema containing only the selected columns
      Throws:
      IllegalArgumentException - if a projected column name is not found in the schema
    • getOriginalSchema

      public FileSchema getOriginalSchema()
      Returns the original file schema.
    • getProjectedColumnCount

      public int getProjectedColumnCount()
      Returns the number of projected columns.
    • toOriginalIndex

      public int toOriginalIndex(int projectedIndex)
      Converts a projected column index to the original column index.
      Parameters:
      projectedIndex - the index in the projected schema (0-based)
      Returns:
      the corresponding index in the original schema
      Throws:
      IndexOutOfBoundsException - if projectedIndex is negative or not less than getProjectedColumnCount()
    • toProjectedIndex

      public int toProjectedIndex(int originalIndex)
      Converts an original column index to the projected column index.
      Parameters:
      originalIndex - the index in the original schema
      Returns:
      the corresponding index in the projected schema, or -1 if the column is not projected
    • getProjectedColumns

      public List<ColumnSchema> getProjectedColumns()
      Returns the list of projected columns.
    • getProjectedColumn

      public ColumnSchema getProjectedColumn(int projectedIndex)
      Returns the projected column at the given projected index.
    • getProjectedFieldIndices

      public int[] getProjectedFieldIndices()
      Returns the indices, within the root node's children, of the top-level fields that are projected. NestedBatchDataView uses these indices to build a sparse record structure.
    • projectsAll

      public boolean projectsAll()
      Returns true if all columns are projected.