Aggregations & Window Functions
Aggregation Functions (11)#
Aggregate functions collapse multiple rows to a single value. Use inside RETURN or WITH — ArcFlow infers implicit GROUP BY from the surrounding non-aggregated columns.
| Function | Syntax | Description |
|---|---|---|
count | count(expr) | Number of non-null values |
count(DISTINCT) | count(DISTINCT expr) | Distinct non-null values |
sum | sum(expr) | Sum of numeric values |
avg | avg(expr) | Arithmetic mean |
min | min(expr) | Minimum value |
max | max(expr) | Maximum value |
collect | collect(expr) | Collect values into a list |
percentile | percentile(expr, p) | Percentile (p: 0.0–1.0) |
stdev | stdev(expr) | Sample standard deviation |
variance | variance(expr) | Sample variance |
median | median(expr) | Median (50th percentile) |
MATCH (n:Person)
RETURN n.department AS dept,
count(n) AS headcount,
avg(n.salary) AS avg_salary,
percentile(n.salary, 0.95) AS p95_salary,
stdev(n.salary) AS salary_std-- Collect names per city
MATCH (n:Person)
RETURN n.city, collect(n.name) AS residents-- Count distinct values
MATCH (n:Transaction)
RETURN count(DISTINCT n.account_id) AS unique_accountsWindow Functions (7)#
Window functions compute a value over a moving window of rows. Unlike aggregation functions, they do not collapse rows — each row gets its own computed value.
Syntax:
function() OVER (
[PARTITION BY column, ...]
[ORDER BY column [ASC|DESC], ...]
[ROWS BETWEEN N PRECEDING AND CURRENT ROW]
)| Function | Description |
|---|---|
lag(expr, N) | Value from N rows before the current row within the partition |
lead(expr, N) | Value from N rows after the current row within the partition |
row_number() | Sequential integer position within each partition (1-based) |
percent_rank() | Relative rank as a fraction (0.0–1.0) |
stddev_pop(expr) | Population standard deviation over the window frame |
dense_rank() | Rank without gaps (tied rows share a rank) |
LAG — look back#
MATCH (n:DailyBar)
RETURN n.symbol, n.date, n.close,
lag(n.close, 1) OVER (PARTITION BY n.symbol ORDER BY n.date) AS prev_closePrevious day's price, grouped per symbol and ordered by date.
LEAD — look forward#
MATCH (n:DailyBar)
RETURN n.symbol, n.date, n.close,
lead(n.close, 5) OVER (PARTITION BY n.symbol ORDER BY n.date) AS next_5d_closeRolling statistics#
MATCH (n:DailyBar)
RETURN n.symbol, n.date, n.close,
avg(n.close) OVER (
PARTITION BY n.symbol ORDER BY n.date
ROWS BETWEEN 199 PRECEDING AND CURRENT ROW
) AS sma_200,
stddev_pop(n.close) OVER (
PARTITION BY n.symbol ORDER BY n.date
ROWS BETWEEN 59 PRECEDING AND CURRENT ROW
) AS vol_60d200-day simple moving average and 60-day volatility, per symbol.
ROW_NUMBER — position within partition#
MATCH (n:Product)
RETURN n.category, n.name, n.revenue,
row_number() OVER (PARTITION BY n.category ORDER BY n.revenue DESC) AS rank_in_categoryRank products within each category by revenue.
PERCENT_RANK — cross-sectional ranking#
MATCH (n:DailyBar)
RETURN n.symbol, n.date,
percent_rank() OVER (PARTITION BY n.date ORDER BY n.close) AS cross_sectional_rankRank each symbol by its closing price on each date. Returns 0.0 (lowest) to 1.0 (highest).
Live Window Views#
Window functions work inside CREATE LIVE VIEW:
-- Maintained incrementally as new bars arrive
CREATE LIVE VIEW rolling_volatility AS
MATCH (n:DailyBar)
RETURN n.symbol, n.date,
stddev_pop(n.close) OVER (
PARTITION BY n.symbol ORDER BY n.date
ROWS BETWEEN 59 PRECEDING AND CURRENT ROW
) AS vol_60dEach new DailyBar triggers a delta update — only the affected partition is recomputed, not the full result set.
HAVING — post-aggregation filter#
HAVING filters after aggregation completes, unlike WHERE which filters before:
MATCH (n:Person)
WITH n.department AS dept, count(n) AS headcount, avg(n.salary) AS avg_sal
HAVING headcount > 5 AND avg_sal > 80000
RETURN dept, headcount, avg_salSee Also#
- Window Functions in Live Views — incremental maintenance
- Algorithms — graph algorithms (PageRank, Louvain, etc.)