polars-random
Generate random numbers and statistical distributions natively in Polars DataFrames — a NumPy-style random API exposed as first-class Polars expressions, with reproducible seeds and per-row parameters.
polars-random is a Rust plugin that offers four equivalent entry points, so it composes naturally with the rest of Polars:
| Use case | API |
|---|---|
| "Add a column of random draws to a DataFrame" | df.random.<dist>(...) |
| Same thing, lazy | lf.random.<dist>(...) |
| Inside any expression / with_columns / select | pl.col("x").random.<dist>(...) or polars_random.<dist>(...) |
| Just give me N values as a Series | polars_random.<dist>(..., size=N) |
import polars as pl
import polars_random as pr # registers DataFrame/LazyFrame/Expr namespaces
# 1. eager Series
pr.normal(mean=0.0, std=1.0, size=5, seed=42)
# 2. as a polars expression in any context
df = pl.DataFrame({"id": range(5)})
df.with_columns(noise=pr.normal(mean=0.0, std=1.0, seed=42))
df.with_columns(noise=pl.col("id").random.normal(seed=42))
# 3. as a DataFrame method (returns a new DataFrame with the column appended)
df.random.normal(mean=0.0, std=1.0, seed=42, name="noise")
# 4. inside a lazy pipeline
df.lazy().random.normal(seed=42, name="noise").collect()
Available distributions: rand / uniform, normal, binomial, randint. Every parameter (low, high, mean, std, n, p) accepts a Python scalar, a column name ("my_col"), or any pl.Expr. Nulls in column-valued parameters propagate as null in the output (no panic).
Why polars-random?
- Polars-native: outputs are regular Polars columns, composable with the rest of your pipeline (no NumPy round-trips).
- Per-row parameters: mean, std, low, high, n, p can come from other columns, so each row can be drawn from a different distribution.
- Reproducible: pass seed=... for deterministic draws.
- Fast: implemented in Rust on top of rand / rand_distr.
Installation
uv add polars-random
poetry add polars-random
pip install polars-random
How it works (mental model)
Every distribution is a single underlying Rust kernel exposed in four ways:
| Form | Returns |
|---|---|
| polars_random.<dist>(..., size=N) | pl.Series |
| polars_random.<dist>(...) | pl.Expr |
| pl.col("x").random.<dist>(...) | pl.Expr |
| df.random.<dist>(...) / lf.random.<dist>(...) | pl.DataFrame / pl.LazyFrame (column appended) |
- Each parameter accepts a Python literal, a column name as a string, or a Polars expression (pl.col(...), arithmetic, etc.). Within a single call, the distribution's parameters must be the same kind: either all literals or all expressions/column names (no mixing).
- seed makes the draw reproducible. Omit it for entropy-based randomness.
- For DataFrame/LazyFrame methods, name is the new column's name. It defaults to the distribution name ("rand", "normal", "binomial", "randint").
- Nulls in column-valued parameters become null in the output.
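The rules above can be modelled in a few lines of plain Python. This is only a sketch of the semantics, not the plugin's Rust kernel: normal_rows is a hypothetical helper, the standard-library random module stands in for rand_distr, and None plays the role of a Polars null. Its draws will not match the values polars-random produces for the same seed.

```python
import random
from typing import Optional, Sequence


def normal_rows(
    mean: Sequence[Optional[float]],
    std: Sequence[Optional[float]],
    seed: Optional[int] = None,
) -> list[Optional[float]]:
    """Hypothetical stand-in: one normal draw per row, using that row's mean/std."""
    if len(mean) != len(std):
        raise ValueError("mean and std must have the same length")
    rng = random.Random(seed)  # seed=None seeds from system entropy
    out: list[Optional[float]] = []
    for m, s in zip(mean, std):
        if m is None or s is None:
            out.append(None)  # a null parameter yields a null output
        else:
            out.append(rng.gauss(m, s))
    return out


means = [0.0, 5.0, None]
stds = [1.0, 2.0, 0.5]

first = normal_rows(means, stds, seed=42)
second = normal_rows(means, stds, seed=42)  # same seed, same draws
```

The third row has a null mean, so its output is None while the other rows are still drawn normally.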
Coming from NumPy?
| NumPy | polars-random |
|---|---|
| np.random.uniform(low, high, size=n) | pr.rand(low=low, high=high, size=n) |
| np.random.normal(mean, std, size=n) | pr.normal(mean=mean, std=std, size=n) |
| np.random.binomial(n, p, size=size) | pr.binomial(n=n, p=p, size=size) |
| np.random.randint(low, high, size=n) | pr.randint(low=low, high=high, size=n) |
| np.random.seed(42) (global) | seed=42 per call |
| Different params per row (loop / vectorize manually) | Pass a column name or pl.col(...) as the parameter |
When used as a DataFrame/LazyFrame method or via the pl.col(...).random namespace, the output length is taken from the parent — no size= needed. Use size=N only with the top-level functions for "give me N values without a frame."
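The seed row in the table is the main behavioural difference: a global seed mutates one shared generator, whereas a per-call seed gives each call its own stream. A minimal sketch of why that matters, using only the standard library (draw is a hypothetical helper, not part of polars-random):

```python
import random


def draw(n: int, seed: int) -> list[float]:
    """Hypothetical helper: n uniform draws from a generator private to this call."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


# Per-call seeding: the result depends only on the arguments.
a = draw(3, seed=42)
_ = draw(100, seed=7)  # unrelated work in between does not matter
b = draw(3, seed=42)

# Global seeding: one unrelated draw in between shifts the whole stream.
random.seed(42)
c = [random.random() for _ in range(3)]
random.seed(42)
random.random()  # unrelated draw consumes the first value
d = [random.random() for _ in range(3)]
```

Here a and b are identical, while d is c shifted by one position. With a per-call seed, adding or reordering unrelated random calls elsewhere in a pipeline cannot change a seeded column.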
Distributions
rand (uniform) · also aliased as uniform
| Parameter | Type | Default | Description |
|---|---|---|---|
| low | float, str, pl.Expr, or None | 0.0 | Lower bound (inclusive). |
| high | float, str, pl.Expr, or None | 1.0 | Upper bound (exclusive). |
| seed | int or None | None | Reproducible draws. |
import polars as pl
import polars_random as pr
df = pl.DataFrame({
"custom_low": [0.0, 10.0, 100.0],
"custom_high": [1.0, 20.0, 200.0],
})
# DataFrame-method form
(
df
.random.rand(low=1_000., high=2_000., seed=42, name="rand_scalar")
.random.rand(seed=42, name="rand_default") # default range [0, 1)
.random.rand(low=pl.col("custom_low"), high=pl.col("custom_high"), seed=42, name="rand_expr")
.random.rand(low="custom_low", high="custom_high", seed=42, name="rand_str")
)
# Top-level / expression forms
pr.rand(low=0, high=1, size=100, seed=42) # Series of 100
df.with_columns(r=pr.rand(low=0, high=1, seed=42)) # uses df's height
df.with_columns(r=pl.col("custom_low").random.rand(seed=42)) # anchored to a column
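For intuition about the [low, high) contract, a uniform draw on an arbitrary range is commonly obtained by rescaling a unit draw u in [0, 1) as low + (high - low) * u. The sketch below assumes that construction; it is not taken from the plugin's source, and uniform_rows is a hypothetical name.

```python
import random
from typing import Sequence


def uniform_rows(low: Sequence[float], high: Sequence[float], seed: int) -> list[float]:
    """Hypothetical stand-in: one uniform draw per row on that row's [low, high)."""
    rng = random.Random(seed)
    # rng.random() lies in [0.0, 1.0), so each value starts at low and stops
    # short of high (up to floating-point rounding on very wide ranges).
    return [lo + (hi - lo) * rng.random() for lo, hi in zip(low, high)]


custom_low = [0.0, 10.0, 100.0]
custom_high = [1.0, 20.0, 200.0]
values = uniform_rows(custom_low, custom_high, seed=42)
```

Each row gets its own range, which mirrors what passing custom_low and custom_high as column names does in the examples above.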
normal
| Parameter | Type | Default | Description |
|---|---|---|---|
| mean | float, str, pl.Expr, or None | 0.0 | Mean of the normal distribution. |
| std | float, str, pl.Expr, or None | 1.0 | Standard deviation (must be > 0). |
| seed | int or None | None | Reproducible draws. |
df = pl.DataFrame({
"custom_mean": [0.0, 5.0, -3.0],
"custom_std": [1.0, 2.0, 0.5],
})
(
df
.random.normal(mean=3., std=2., seed=42, name="normal_scalar")
.random.normal(seed=42, name="normal_default")
.random.normal(mean=pl.col("custom_mean"), std=pl.col("custom_std"), seed=42, name="normal_expr")
.random.normal(mean="custom_mean", std="custom_std", seed=42, name="normal_str")
)
binomial
| Parameter | Type | Default | Description |
|---|---|---|---|
| n | int, str, or pl.Expr | (required) | Number of trials. |
| p | float, str, or pl.Expr | (required) | Probability of success on each trial (0 ≤ p ≤ 1). |
| seed | int or None | None | Reproducible draws. |
df = pl.DataFrame({
"n": [10, 50, 100],
"p": [0.1, 0.5, 0.9],
})
(
df
.random.binomial(n=100, p=.5, seed=42, name="binomial_scalar")
.random.binomial(n=pl.col("n"), p=pl.col("p"), seed=42, name="binomial_expr")
.random.binomial(n="n", p="p", seed=42, name="binomial_str")
)
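A binomial draw counts the successes in n independent trials that each succeed with probability p. The sketch below spells that definition out per row in plain Python. binomial_rows is a hypothetical helper, and a real implementation would normally use a faster sampling algorithm than this O(n) loop.

```python
import random
from typing import Sequence


def binomial_rows(n: Sequence[int], p: Sequence[float], seed: int) -> list[int]:
    """Hypothetical stand-in: per row, count successes in n[i] trials with probability p[i]."""
    rng = random.Random(seed)
    out = []
    for trials, prob in zip(n, p):
        if trials < 0 or not 0.0 <= prob <= 1.0:
            raise ValueError("need n >= 0 and 0 <= p <= 1")
        # Each trial succeeds when a unit uniform draw falls below prob.
        out.append(sum(rng.random() < prob for _ in range(trials)))
    return out


counts = binomial_rows(n=[10, 50, 100], p=[0.1, 0.5, 0.9], seed=42)
edge = binomial_rows(n=[10, 10], p=[0.0, 1.0], seed=42)  # always 0 and always n
```

Every result lies between 0 and that row's n, and the boundary probabilities 0 and 1 give deterministic outcomes.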
randint
Uniform random integers in [low, high) — high is exclusive (matches numpy.random.randint).
| Parameter | Type | Default | Description |
|---|---|---|---|
| low | int, str, or pl.Expr | 0 | Lower bound (inclusive). |
| high | int, str, or pl.Expr | 2 | Upper bound (exclusive). |
| seed | int or None | None | Reproducible draws. |
df = pl.DataFrame({"lo": [0, 10, 100], "hi": [10, 20, 200]}) # columns used as per-row bounds
df.random.randint(low=0, high=10, seed=42) # one column, scalar bounds
df.random.randint(low="lo", high="hi", seed=42) # per-row bounds via columns
pr.randint(low=0, high=10, size=1_000, seed=42) # Series of 1000 ints
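The exclusive upper bound is easy to trip over because Python's own random.randint includes it, while random.randrange (like numpy.random.randint) does not. A quick standard-library check of the difference, independent of polars-random:

```python
import random

rng = random.Random(42)

# Half-open [0, 10): the convention described above; 10 is never returned.
half_open = [rng.randrange(0, 10) for _ in range(1_000)]

# Closed [0, 10]: the standard library's randint can return 10.
closed = [rng.randint(0, 10) for _ in range(1_000)]

half_open_values = set(half_open)  # subset of {0, ..., 9}
closed_values = set(closed)        # subset of {0, ..., 10}
```

To include the upper value in a randint column, pass high + 1 as the bound.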
API reference
See the API Reference for the full signatures and docstrings.