Dataframes & Data Science
Current native dataframe helpers plus the planned typed Arrow-backed dataframe design.
Stage-0 includes a native DataFrame value with CSV input and dataframe helper functions. The broader
Arrow-backed, row-typed dataframe engine is a planned compiler/runtime design, not fully implemented in the seed.
1. Current Stage-0 API
The source-backed guide lists these helpers:
df = read_csv("examples/data/grades.csv")
df_nrows(df)
df_cols(df)
df_col(df, "score")
df_rows(df)
df_from([{name: "Ava", score: 96}])
df_select(df, ["student", "score"])
df_sort(df, "score")
df_group_sum(df, "subject", "score")
df_describe(df)
df_join(left, right, "region")
CSV input does per-cell type inference: integer-looking cells become Int, decimal cells become Float, and the
rest remain String.
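The inference rule above can be modeled in a few lines of Python. This is only an illustrative sketch, not Vela's implementation: the function name `infer_cell` and the two regular expressions are assumptions, since the guide does not spell out exactly which cells count as "integer-looking" or "decimal".

```python
import re

# Assumed patterns: optional sign plus digits, with a dot for decimals.
# The real Stage-0 rules may accept more (or fewer) spellings.
INT_PATTERN = re.compile(r"[+-]?[0-9]+")
FLOAT_PATTERN = re.compile(r"[+-]?[0-9]+\.[0-9]+")


def infer_cell(text):
    """Return the cell as int, float, or the original string."""
    if INT_PATTERN.fullmatch(text):
        return int(text)
    if FLOAT_PATTERN.fullmatch(text):
        return float(text)
    return text  # everything else stays a string


# Each cell is inferred on its own, independent of its column.
row = [infer_cell(cell) for cell in ["Ava", "96", "88.5", "2024-01-05"]]
```

Because inference is per cell, a column that mixes `96` and `88.5` would hold both integer and float values under this model.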
2. Filter rows and rebuild a frame
df_rows exposes rows as records, so ordinary filter and map compose with df_from.
# Source: .sources/vela/examples/codex_df_grades.vela
df = read_csv("examples/data/grades.csv")
print(df_describe(df))
top = df_rows(df) |> filter(r => r.score >= 90) |> df_from
print(df_sort(top, "score"))
by_subject = df_group_sum(df, "subject", "score")
print(df_sort(by_subject, "sum_score"))
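For readers who want to see what the filter and group-sum steps compute, here is a rough Python model over a list of row dictionaries. It is a sketch under assumptions: the sample rows are made up, `group_sum` is a hypothetical stand-in for `df_group_sum`, and the `sum_` prefix on the output column is inferred only from the `sum_score` name used in the example above.

```python
def group_sum(rows, key, value):
    """Sum `value` per distinct `key`, returning one row per group."""
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + row[value]
    # Output column name follows the assumed "sum_<column>" convention.
    return [{key: k, "sum_" + value: total} for k, total in totals.items()]


# Hypothetical stand-in for the rows of grades.csv.
rows = [
    {"student": "Ava", "subject": "math", "score": 96},
    {"student": "Ben", "subject": "math", "score": 81},
    {"student": "Ava", "subject": "art", "score": 90},
]

# Same shape as: df_rows(df) |> filter(r => r.score >= 90) |> df_from
top = [r for r in rows if r["score"] >= 90]

# Same shape as: df_group_sum(df, "subject", "score")
by_subject = group_sum(rows, "subject", "score")
```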
3. Derived columns
Use record literals to create a new row shape, then convert back with df_from.
# Source: .sources/vela/docs/DATAFRAMES.md, derived from codex_df_weather.vela
with_f = df_rows(df) |> map(r => {
  date: r.date,
  city: r.city,
  temp_c: r.temp_c,
  temp_f: (r.temp_c * 9.0 / 5.0) + 32.0,
  rain_mm: r.rain_mm
}) |> df_from
Use float literals such as 9.0 when you need real-number arithmetic. Integer division truncates, so 9 / 5 yields 1 rather than 1.8.
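The effect on the temperature conversion can be seen in a small Python comparison. This is an illustration rather than Vela code: Python's `//` operator stands in for Vela's integer division, which matches truncation only for non-negative operands (Python floors negative results), and the input value 21 is an arbitrary example.

```python
temp_c = 21  # arbitrary non-negative integer reading

# Integer arithmetic: 21 * 9 = 189, and 189 // 5 drops the remainder.
temp_f_int = (temp_c * 9 // 5) + 32

# Float arithmetic keeps the fractional part of the conversion.
temp_f_float = (temp_c * 9.0 / 5.0) + 32.0

print(temp_f_int)    # 69
print(temp_f_float)  # approximately 69.8
```

The integer version loses 0.8 degrees here, which is why the derived-column example writes 9.0, 5.0, and 32.0.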
4. Planned typed dataframe model
design/DATAFRAME.md describes the target design:
- Column<T> is planned to use Apache Arrow array layout.
- DataFrame<{age: Int, income: F64?, region: String}> carries a schema in the type system.
- Bare column names inside dataframe verbs are planned to resolve as typed column references.
- Joins, selections, and mutations are planned to compute output row types.
- Query optimization, operator fusion, SIMD lowering, and zero-copy Arrow interop are planned compiler/runtime work.
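To make "computing output row types" concrete, the sketch below models a schema as a Python dictionary from column name to type name and derives the result schema of a selection and a join. Everything here is an assumption for illustration: the function names, the `manager` column, and the rules for unknown or colliding columns are not taken from design/DATAFRAME.md, and the planned compiler would do this at compile time rather than at run time.

```python
def select_schema(schema, columns):
    """Output schema of a selection: the chosen columns, in the order given."""
    missing = [c for c in columns if c not in schema]
    if missing:
        raise KeyError(f"unknown columns: {missing}")
    return {c: schema[c] for c in columns}


def join_schema(left, right, key):
    """Output schema of a join on `key`: left columns, then new right columns."""
    if key not in left or key not in right:
        raise KeyError(f"join key {key!r} must exist on both sides")
    if left[key] != right[key]:
        raise TypeError(f"join key {key!r} has mismatched types")
    out = dict(left)
    for name, type_name in right.items():
        if name == key:
            continue  # the key column appears once in the output
        if name in out:
            # Assumed policy: reject collisions instead of renaming.
            raise ValueError(f"column {name!r} appears on both sides")
        out[name] = type_name
    return out


people = {"age": "Int", "income": "F64?", "region": "String"}
regions = {"region": "String", "manager": "String"}  # hypothetical right side

selected = select_schema(people, ["age", "region"])
joined = join_schema(people, regions, "region")
```

Under these assumptions, `joined` has the columns `age`, `income`, `region`, and `manager`, and a selection of a misspelled column fails before any data is touched.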
Treat this as roadmap material. If a page shows read_csv, df_rows, df_sort, or df_group_sum, it describes
current Stage-0 behavior. If it mentions static row types, Arrow-native columns, or compile-time query planning, it
describes the planned compiler.