Vela

Dataframes & Data Science

Current native dataframe helpers plus the planned typed Arrow-backed dataframe design.

Stage-0 includes a native DataFrame value with CSV input and dataframe helper functions. The broader Arrow-backed, row-typed dataframe engine is a planned compiler/runtime design, not fully implemented in the seed.

1. Current Stage-0 API

The source-backed guide lists these helpers:

df = read_csv("examples/data/grades.csv")

df_nrows(df)
df_cols(df)
df_col(df, "score")
df_rows(df)
df_from([{name: "Ava", score: 96}])
df_select(df, ["student", "score"])
df_sort(df, "score")
df_group_sum(df, "subject", "score")
df_describe(df)
df_join(left, right, "region")

CSV input infers types cell by cell: integer-looking cells become Int, decimal-looking cells become Float, and everything else stays String.
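The inference rule can be modeled outside Vela to make its consequences concrete. The following is a minimal Python sketch, not Vela's implementation: the function name `infer_cell` and the exact patterns (optional sign, plain digits, a single decimal point) are assumptions, since the guide does not say how signs, exponents, whitespace, or empty cells are handled.

```python
import re

# Hypothetical patterns; the guide only says "integer-looking" and "decimal".
INT_RE = re.compile(r"[+-]?[0-9]+")
FLOAT_RE = re.compile(r"[+-]?[0-9]+\.[0-9]+")


def infer_cell(text):
    """Model the stated rule: Int first, then Float, otherwise keep the String."""
    if INT_RE.fullmatch(text):
        return int(text)
    if FLOAT_RE.fullmatch(text):
        return float(text)
    return text


# Each cell is inferred independently of its neighbors.
column = [infer_cell(cell) for cell in ["90", "90.5", "n/a"]]
```

If inference really is per cell as described, one column can end up holding Int, Float, and String values side by side, as `column` does here. That is worth checking before doing arithmetic on a column read from CSV.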

2. Filter rows and rebuild a frame

df_rows exposes rows as records, so ordinary filter and map compose with df_from.

# Source: .sources/vela/examples/codex_df_grades.vela
df = read_csv("examples/data/grades.csv")
print(df_describe(df))

top = df_rows(df) |> filter(r => r.score >= 90) |> df_from
print(df_sort(top, "score"))

by_subject = df_group_sum(df, "subject", "score")
print(df_sort(by_subject, "sum_score"))
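To show what the pipeline above computes, here is a rough Python model of the same steps over a list of records. It is a sketch under assumptions, not Vela code: the sample rows are invented (they are not the contents of grades.csv), `group_sum` is a hypothetical stand-in for df_group_sum, the `sum_` prefix on the output column is inferred from the `sum_score` name used in the example, and ascending order is assumed for the sort.

```python
# Invented sample rows; the real grades.csv is not shown in this guide.
rows = [
    {"student": "Ava", "subject": "math", "score": 96},
    {"student": "Ben", "subject": "math", "score": 84},
    {"student": "Ava", "subject": "art", "score": 91},
]


def group_sum(records, key, value):
    """Sum `value` per distinct `key`, returning one record per group."""
    totals = {}
    for r in records:
        totals[r[key]] = totals.get(r[key], 0) + r[value]
    # Assumed naming: the summed column is called "sum_" + value.
    return [{key: k, "sum_" + value: total} for k, total in totals.items()]


# Filter rows, then treat the surviving records as the new frame.
top = [r for r in rows if r["score"] >= 90]

# Group, then sort by the aggregated column (ascending is an assumption).
by_subject = sorted(group_sum(rows, "subject", "score"),
                    key=lambda r: r["sum_score"])
```

The point of the model is the shape of the data: filtering keeps every original column, while grouping produces a new frame with only the key column and the aggregated column.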

3. Derived columns

Use record literals to create a new row shape, then convert back with df_from.

# Source: .sources/vela/docs/DATAFRAMES.md, derived from codex_df_weather.vela
with_f = df_rows(df) |> map(r => {
    date: r.date,
    city: r.city,
    temp_c: r.temp_c,
    temp_f: (r.temp_c * 9.0 / 5.0) + 32.0,
    rain_mm: r.rain_mm
}) |> df_from

Use float literals such as 9.0 and 5.0 when you need real-number arithmetic. Integer division truncates, so dividing one Int by another drops the fractional part.
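The effect of truncation on the Celsius-to-Fahrenheit formula can be seen in a short Python illustration. This is only a model: Python's `//` operator stands in for Vela's Int division, which matches truncation for the non-negative values used here. How Vela rounds negative quotients is not stated in this guide.

```python
temp_c = 21  # an integer-looking CSV cell would be read as Int

# Int-only arithmetic, left to right: 21 * 9 = 189, then 189 // 5 = 37.
int_result = temp_c * 9 // 5 + 32

# Grouping the constants first is worse: 9 // 5 is 1, so the scale is lost.
grouped_result = temp_c * (9 // 5) + 32

# Float literals keep the fractional part (about 69.8).
float_result = temp_c * 9.0 / 5.0 + 32.0
```

Here `int_result` is 69 and `grouped_result` is 53, while the float version gives roughly 69.8. Because CSV cells are typed individually, writing the constants as floats keeps the result consistent whether a given temperature was read as Int or Float.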

4. Planned typed dataframe model

design/DATAFRAME.md describes the target design:

Column<T> is planned to use Apache Arrow array layout.
DataFrame<{age: Int, income: F64?, region: String}> carries a schema in the type system.
Bare column names inside dataframe verbs are planned to resolve as typed column references.
Joins, selections, and mutations are planned to compute output row types.
Query optimization, operator fusion, SIMD lowering, and zero-copy Arrow interop are planned compiler/runtime work.
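The idea that verbs compute output row types can be sketched with schemas as plain dictionaries. This Python model is purely illustrative: design/DATAFRAME.md is not reproduced here, so the function names, the `manager` column, and the rules for unknown or colliding columns are all assumptions rather than the planned Vela semantics.

```python
def select_schema(schema, columns):
    """Output row type of a selection: the chosen columns, in the given order."""
    missing = [c for c in columns if c not in schema]
    if missing:
        raise KeyError(f"unknown columns: {missing}")
    return {c: schema[c] for c in columns}


def mutate_schema(schema, name, type_name):
    """Output row type of a mutation: the input columns plus one new column."""
    return {**schema, name: type_name}


def join_schema(left, right, key):
    """Output row type of a join on `key` (the collision rule is an assumption)."""
    if left[key] != right[key]:
        raise TypeError(f"key {key!r} has mismatched types")
    out = dict(left)
    for name, type_name in right.items():
        if name == key:
            continue
        if name in out:
            raise ValueError(f"duplicate column: {name}")
        out[name] = type_name
    return out


# Schema taken from the DataFrame<...> example above; `managers` is hypothetical.
people = {"age": "Int", "income": "F64?", "region": "String"}
managers = {"region": "String", "manager": "String"}

selected = select_schema(people, ["region", "age"])
mutated = mutate_schema(people, "age_months", "Int")
joined = join_schema(people, managers, "region")
```

In the planned design this bookkeeping would happen in the type checker, so a reference to a missing column or a mismatched join key would be reported at compile time rather than when the query runs.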

Treat this as roadmap material. If a page shows read_csv, df_rows, df_sort, or df_group_sum, it describes current Stage-0 behavior. If it mentions static row types, Arrow-native columns, or compile-time query planning, it describes the planned compiler.