
polars-random

License: MIT

Generate random numbers and statistical distributions natively in Polars DataFrames — a NumPy-style random API exposed as first-class Polars expressions, with reproducible seeds and per-row parameters.

polars-random is a Rust plugin with four equivalent entry points, so it composes naturally with the rest of Polars:

| Use case | API |
| --- | --- |
| "Add a column of random draws to a DataFrame" | df.random.<dist>(...) |
| Same thing, lazy | lf.random.<dist>(...) |
| Inside any expression / with_columns / select | pl.col("x").random.<dist>(...) or polars_random.<dist>(...) |
| Just give me N values as a Series | polars_random.<dist>(..., size=N) |

import polars as pl
import polars_random as pr  # registers DataFrame/LazyFrame/Expr namespaces

# 1. eager Series
pr.normal(mean=0.0, std=1.0, size=5, seed=42)

# 2. as a polars expression in any context
df = pl.DataFrame({"id": range(5)})
df.with_columns(noise=pr.normal(mean=0.0, std=1.0, seed=42))
df.with_columns(noise=pl.col("id").random.normal(seed=42))

# 3. as a DataFrame method (returns a new DataFrame with the column appended)
df.random.normal(mean=0.0, std=1.0, seed=42, name="noise")

# 4. inside a lazy pipeline
df.lazy().random.normal(seed=42, name="noise").collect()

Available distributions: rand / uniform, normal, binomial, randint. Every parameter (low, high, mean, std, n, p) accepts a Python scalar, a column name ("my_col"), or any pl.Expr. Nulls in column-valued parameters propagate as null in the output (no panic).

Why polars-random?

  • Polars-native: outputs are regular Polars columns, composable with the rest of your pipeline (no NumPy round-trips).
  • Per-row parameters: mean, std, low, high, n, p can come from other columns, so each row can be drawn from a different distribution.
  • Reproducible: pass seed=... for deterministic draws.
  • Fast: implemented in Rust on top of rand / rand_distr.

Installation

uv add polars-random
poetry add polars-random
pip install polars-random

How it works (mental model)

Every distribution is a single underlying Rust kernel exposed in four ways:

| Form | Returns |
| --- | --- |
| polars_random.<dist>(..., size=N) | pl.Series |
| polars_random.<dist>(...) | pl.Expr |
| pl.col("x").random.<dist>(...) | pl.Expr |
| df.random.<dist>(...) / lf.random.<dist>(...) | pl.DataFrame / pl.LazyFrame (column appended) |

  • Each parameter accepts a Python literal, a column name as a string, or a Polars expression (pl.col(...), arithmetic, etc.). Within a single call, the distribution's parameters must all be of the same kind: either all literals or all expressions/column names (no mixing).
  • seed makes the draw reproducible. Omit it for entropy-based randomness.
  • For DataFrame/LazyFrame methods, name is the new column's name. Defaults to the distribution name ("rand", "normal", "binomial", "randint").
  • Nulls in column-valued parameters become null in the output.
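
The rules above can be modelled in plain Python. The sketch below is not the plugin's Rust kernel: `normal_per_row` is a hypothetical name, and the standard library's `random.Random` stands in for the Rust generator. It only illustrates three of the behaviours described here, namely one draw per row, per-row parameters, and null propagation. The values polars-random produces for a given seed will differ from the ones this model produces.

```python
import random
from typing import Optional, Sequence


def normal_per_row(
    means: Sequence[Optional[float]],
    stds: Sequence[Optional[float]],
    seed: Optional[int] = None,
) -> list[Optional[float]]:
    """Conceptual model only: one normal draw per row, None where a parameter is None."""
    if len(means) != len(stds):
        raise ValueError("parameter columns must have the same length")
    rng = random.Random(seed)  # seed=None lets Python choose an unpredictable seed
    out: list[Optional[float]] = []
    for mean, std in zip(means, stds):
        if mean is None or std is None:
            out.append(None)  # a null parameter yields a null result
        else:
            out.append(rng.gauss(mean, std))
    return out


draws = normal_per_row([0.0, 5.0, None], [1.0, 2.0, 0.5], seed=42)
print(draws)  # the third entry is None because its mean is missing
```

Calling the function twice with the same seed returns the same list, which is the property `seed` is meant to give you.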

Coming from NumPy?

| NumPy | polars-random |
| --- | --- |
| np.random.uniform(low, high, size=n) | pr.rand(low=low, high=high, size=n) |
| np.random.normal(mean, std, size=n) | pr.normal(mean=mean, std=std, size=n) |
| np.random.binomial(n, p, size=size) | pr.binomial(n=n, p=p, size=size) |
| np.random.randint(low, high, size=n) | pr.randint(low=low, high=high, size=n) |
| np.random.seed(42) (global) | seed=42 per call |
| Different params per row (loop / vectorize manually) | Pass a column name or pl.col(...) as the parameter |

When used as a DataFrame/LazyFrame method or via the pl.col(...).random namespace, the output length is taken from the parent — no size= needed. Use size=N only with the top-level functions for "give me N values without a frame."
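
The difference between a global seed and a per-call seed can be illustrated with Python's standard library alone. This is an analogy rather than polars-random code: `draw_with_seed` is a hypothetical helper in which every call builds its own generator, so its result does not depend on what else ran earlier in the program.

```python
import random


def draw_with_seed(n: int, seed: int) -> list[float]:
    """Per-call seeding: a private generator, independent of global state."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


# Global seeding: the result depends on every draw made since random.seed().
random.seed(42)
first = [random.random() for _ in range(3)]
second = [random.random() for _ in range(3)]  # continues the same stream
print(first == second)  # False, the global state has advanced

# Per-call seeding: the same seed always reproduces the same values.
print(draw_with_seed(3, seed=42) == draw_with_seed(3, seed=42))  # True
```

With a global seed, inserting one extra draw anywhere upstream shifts every later result. A per-call seed avoids that coupling.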

Distributions

rand (uniform) · also aliased as uniform

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| low | float, str, pl.Expr, or None | 0.0 | Lower bound (inclusive). |
| high | float, str, pl.Expr, or None | 1.0 | Upper bound (exclusive). |
| seed | int or None | None | Reproducible draws. |

import polars as pl
import polars_random as pr

df = pl.DataFrame({
    "custom_low":  [0.0, 10.0, 100.0],
    "custom_high": [1.0, 20.0, 200.0],
})

# DataFrame-method form
(
    df
    .random.rand(low=1_000., high=2_000., seed=42, name="rand_scalar")
    .random.rand(seed=42, name="rand_default")  # default range [0, 1)
    .random.rand(low=pl.col("custom_low"), high=pl.col("custom_high"), seed=42, name="rand_expr")
    .random.rand(low="custom_low", high="custom_high", seed=42, name="rand_str")
)

# Top-level / expression forms
pr.rand(low=0, high=1, size=100, seed=42)                    # Series of 100
df.with_columns(r=pr.rand(low=0, high=1, seed=42))           # uses df's height
df.with_columns(r=pl.col("custom_low").random.rand(seed=42)) # anchored to a column

normal

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| mean | float, str, pl.Expr, or None | 0.0 | Mean of the normal distribution. |
| std | float, str, pl.Expr, or None | 1.0 | Standard deviation (must be > 0). |
| seed | int or None | None | Reproducible draws. |

df = pl.DataFrame({
    "custom_mean": [0.0, 5.0, -3.0],
    "custom_std":  [1.0, 2.0, 0.5],
})

(
    df
    .random.normal(mean=3., std=2., seed=42, name="normal_scalar")
    .random.normal(seed=42, name="normal_default")
    .random.normal(mean=pl.col("custom_mean"), std=pl.col("custom_std"), seed=42, name="normal_expr")
    .random.normal(mean="custom_mean", std="custom_std", seed=42, name="normal_str")
)

binomial

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| n | int, str, or pl.Expr | (required) | Number of trials. |
| p | float, str, or pl.Expr | (required) | Probability of success on each trial (0 ≤ p ≤ 1). |
| seed | int or None | None | Reproducible draws. |

df = pl.DataFrame({
    "n": [10, 50, 100],
    "p": [0.1, 0.5, 0.9],
})

(
    df
    .random.binomial(n=100, p=.5, seed=42, name="binomial_scalar")
    .random.binomial(n=pl.col("n"), p=pl.col("p"), seed=42, name="binomial_expr")
    .random.binomial(n="n", p="p", seed=42, name="binomial_str")
)
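
For intuition about what `n` and `p` mean, a binomial draw is the number of successes in `n` independent trials that each succeed with probability `p`. The sketch below models that definition in plain Python with per-row `n` and `p`. It is not how the plugin computes its values (this README only states that it builds on rand / rand_distr), and `binomial_per_row` is a hypothetical name.

```python
import random
from typing import Optional, Sequence


def binomial_per_row(
    ns: Sequence[int], ps: Sequence[float], seed: Optional[int] = None
) -> list[int]:
    """Count successes in n Bernoulli(p) trials, separately for each row."""
    rng = random.Random(seed)
    out = []
    for n, p in zip(ns, ps):
        if n < 0 or not 0.0 <= p <= 1.0:
            raise ValueError("need n >= 0 and 0 <= p <= 1")
        # rng.random() is in [0, 1), so p=0 never succeeds and p=1 always does
        out.append(sum(rng.random() < p for _ in range(n)))
    return out


counts = binomial_per_row([10, 50, 100], [0.1, 0.5, 0.9], seed=42)
print(counts)  # each count lies between 0 and the row's n
```

Looping over trials costs time proportional to `n`, so this is only a teaching model and not something to use for large `n`.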

randint

Uniform random integers in [low, high); high is exclusive (matches numpy.random.randint).

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| low | int, str, or pl.Expr | 0 | Lower bound (inclusive). |
| high | int, str, or pl.Expr | 2 | Upper bound (exclusive). |
| seed | int or None | None | Reproducible draws. |

df = pl.DataFrame({"lo": [0, 10, 100], "hi": [10, 20, 200]})  # per-row bounds used below

df.random.randint(low=0, high=10, seed=42)             # one column, scalar bounds
df.random.randint(low="lo", high="hi", seed=42)        # per-row bounds via columns
pr.randint(low=0, high=10, size=1_000, seed=42)        # Series of 1000 ints
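
The half-open interval is easy to get wrong, so here is a small standard-library check of the same convention. `random.Random.randrange(low, high)` also excludes `high`. This is an analogy for the bounds described above, not a call into polars-random.

```python
import random

rng = random.Random(42)
low, high = 0, 10

# randrange excludes the upper bound, like the [low, high) range above
values = [rng.randrange(low, high) for _ in range(1_000)]
print(min(values) >= low, max(values) < high)  # True True

# To make 10 a possible outcome, raise the exclusive bound by one
inclusive = [rng.randrange(low, high + 1) for _ in range(1_000)]
print(max(inclusive) <= high)  # True
```

The same adjustment should carry over to any API that follows the [low, high) convention: pass high + 1 when the upper value must be reachable.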

API reference

See the API Reference for the full signatures and docstrings.