| Title: | Evaluate the Discrepancy between Two Real-Time Updated Probabilistic Forecasts |
|---|---|
| Description: | Methods from Yeh, Rice, and Dubin (2022, doi:10.1080/00031305.2021.1967781; arXiv:2010.00781) for comparing two continuously updated probabilistic forecasts under squared (Brier) loss: pointwise loss and variance, a global delta test (Monte Carlo p-values), simulation designs, and a naive pointwise band plot. |
| Authors: | Chi-Kuang Yeh [aut, cre], Gregory Rice [ctb, ths], Joel A. Dubin [ctb, ths] |
| Maintainer: | Chi-Kuang Yeh <[email protected]> |
| License: | GPL-3 |
| Version: | 0.0.0.9000 |
| Built: | 2026-05-29 09:09:46 UTC |
| Source: | https://github.com/chikuang/rtforecasteval |
Methods from Yeh, Rice, and Dubin (2022, doi:10.1080/00031305.2021.1967781; arXiv:2010.00781) for comparing two continuously updated probabilistic forecasts under squared (Brier) loss: pointwise loss and variance, a global delta test (Monte Carlo p-values), simulation designs, and a naive pointwise band plot.
Maintainer: Chi-Kuang Yeh [email protected]
Other contributors:
Gregory Rice [contributor, thesis advisor]
Joel A. Dubin [contributor, thesis advisor]
Yeh, C.-K., Rice, G., & Dubin, J. A. (2022). Evaluating real-time probabilistic forecasts with application to National Basketball Association outcome prediction. The American Statistician, 76, 214–223. doi:10.1080/00031305.2021.1967781
Preprint: https://arxiv.org/abs/2010.00781 (PDF: https://arxiv.org/pdf/2010.00781.pdf).
Useful links:
Report bugs at https://github.com/chikuang/rtForecastEval/issues
Forms the empirical covariance matrix of either centered or non-centered
forecast differences across time points (see diff_cent vs diff_non_cent
in the paper), and returns the leading eigenvalues scaled by one over
nsamp, for use with calc_pval(). This matches the construction in the
replication utility.R (calc_p_val / eigs_sym).
calc_eig(df, n_eig = 10, ngame, nsamp, grid = "grid", cent = FALSE)calc_eig(df, n_eig = 10, ngame, nsamp, grid = "grid", cent = FALSE)
df |
Data frame containing |
n_eig |
Number of leading eigenvalues to extract (dimension D in the paper). |
ngame |
Number of independent games (used for scaling inner products). |
nsamp |
Number of time grid points. |
grid |
Name of the time grid column. |
cent |
If |
A numeric vector of length n_eig of eigenvalues (descending).
Yeh, C.-K., Rice, G., & Dubin, J. A. (2022). The American Statistician, 76, 214–223. doi:10.1080/00031305.2021.1967781
Preprint: https://arxiv.org/abs/2010.00781
For each time grid point, estimates the pointwise difference in expected
squared (Brier) loss between two probabilistic forecasts, and the variance
factor used for inference (Yeh, Rice, and Dubin, 2022). With default loss
L(x,y) = (x-y)^2, the function computes the mean over games of
L(Y, phat_1) - L(Y, phat_2) at each time, and the influence-style terms
si used to form sigma-squared over four, matching the paper replication
code.
calc_L_s2( df, pA = "phat_1", pB = "phat_2", Y = "Y", grid = "grid", L = function(x, y) (x - y)^2 )calc_L_s2( df, pA = "phat_1", pB = "phat_2", Y = "Y", grid = "grid", L = function(x, y) (x - y)^2 )
df |
A data frame with one row per (game × time) in long format. |
pA, pB
|
Column names for the two forecast vectors (probabilities between 0 and 1). |
Y |
Column name for the binary outcome (0/1). |
grid |
Column name for the normalized time grid between 0 and 1. |
L |
Loss function; default is squared error, \(L(x,y)=(x-y)^2\). |
A tibble with one row per grid value and columns L (mean loss
difference), sigma2 (variance factor), and n (number of rows).
Yeh, C.-K., Rice, G., & Dubin, J. A. (2022). The American Statistician, 76, 214–223. doi:10.1080/00031305.2021.1967781
Preprint: https://arxiv.org/abs/2010.00781
Given leading eigenvalues of the covariance operator used in the delta test
(from calc_eig()) and the observed statistic \(Z\) (from calc_Z()),
draws a Monte Carlo sample from the weighted sum of chi-square(1) variables
and returns the right-tail p-value and quantiles of the null distribution
(as in the paper’s replication code).
calc_pval(Z, eig, quan, n_MC = 5000)calc_pval(Z, eig, quan, n_MC = 5000)
Z |
Observed test statistic from |
eig |
Numeric vector of leading eigenvalues (length |
quan |
Probabilities for which to return quantiles of the simulated null
statistic (e.g. |
n_MC |
Number of Monte Carlo draws (default |
A list with p_val (one-sided p-value) and quantile (named
vector of quantiles at quan).
Yeh, C.-K., Rice, G., & Dubin, J. A. (2022). The American Statistician, 76, 214–223. doi:10.1080/00031305.2021.1967781
Preprint: https://arxiv.org/abs/2010.00781
Computes the global test statistic Z for comparing two real-time forecasters under squared loss: Z equals (n_game / n_samp) times the sum over grid points of the squared pointwise mean loss differences, matching the implementation in the paper replication code (utility.R).
calc_Z( df, pA = "phat_1", pB = "phat_2", Y = "Y", grid = "grid", nsamp, ngame, L = function(x, y) (x - y)^2 )calc_Z( df, pA = "phat_1", pB = "phat_2", Y = "Y", grid = "grid", nsamp, ngame, L = function(x, y) (x - y)^2 )
df |
Data frame containing forecasts, outcomes, and |
pA, pB
|
Names of columns with the two probability forecasts. |
Y |
Name of the binary outcome column. |
grid |
Name of the column with normalized times between 0 and 1. |
nsamp |
Number of distinct time points (length of the grid). |
ngame |
Number of independent replicates (e.g. games). |
L |
Loss function; default is squared error. |
A single numeric value of the test statistic Z.
Yeh, C.-K., Rice, G., & Dubin, J. A. (2022). The American Statistician, 76, 214–223. doi:10.1080/00031305.2021.1967781
Preprint: https://arxiv.org/abs/2010.00781
Generates replicated “games” and two competing forecast trajectories on a
uniform grid from 0 to 1, following the simulation constructions used in
Yeh, Rice, and Dubin (2022) (see sanity_generator / simulation scripts in
the replication repository). Ornstein–Uhlenbeck (type = "OU") or Brownian
motion (type = "BM") noise can drive the latent processes; outputs include
binary outcomes Y and forecast probabilities phat_A, phat_B suitable
for calc_Z(), calc_eig(), etc.
df_gen(N, Ngame, type = c("OU", "BM"), a = 1, b = 0.27)df_gen(N, Ngame, type = c("OU", "BM"), a = 1, b = 0.27)
N |
Number of interior time steps (grid has |
Ngame |
Number of independent replicates (games). |
type |
|
a, b
|
Constants controlling the drift of the latent trajectory (see replication code). |
Requires the sde package (Suggests) for Brownian paths.
A tibble with columns including game, grid, Y, phat_A,
phat_B, and latent W1, W2, etc., in long format.
Yeh, C.-K., Rice, G., & Dubin, J. A. (2022). The American Statistician, 76, 214–223. doi:10.1080/00031305.2021.1967781
Preprint: https://arxiv.org/abs/2010.00781
Maps a vector of probabilities prob observed at times grid onto an
equally spaced grid of length length(grid) between 0 and 1 using linear
interpolation (stats::approx). Useful when aligning ESPN or model outputs
to the common grid used in the NBA analysis (see replication pre_process /
utility.R).
lin_interp(prob, grid, outcome)lin_interp(prob, grid, outcome)
prob |
Numeric vector of forecast probabilities. |
grid |
Numeric vector of time points (same length as |
outcome |
Scalar binary outcome \(Y\) to attach to every interpolated row. |
A tibble with columns phat_approx, grid, and Y.
Yeh, C.-K., Rice, G., & Dubin, J. A. (2022). The American Statistician, 76, 214–223. doi:10.1080/00031305.2021.1967781
Preprint: https://arxiv.org/abs/2010.00781
Plots the pointwise mean loss difference from calc_L_s2() (column L)
against time, with a normal-theory band using sigma2 and the point sample
size n. This is the naive \(t\)-style band from the paper; global
inference uses calc_Z() / calc_pval() instead.
plot_pcb( df, grid = "grid", L = "L", var = "sigma2", title = "Pointwise mean loss difference (A vs B)", subtitle = "Naive normal band (95%); use calc_pval() for global test", caption = NULL, xlab = "Normalized time (grid)", ylab = "Mean squared loss difference" )plot_pcb( df, grid = "grid", L = "L", var = "sigma2", title = "Pointwise mean loss difference (A vs B)", subtitle = "Naive normal band (95%); use calc_pval() for global test", caption = NULL, xlab = "Normalized time (grid)", ylab = "Mean squared loss difference" )
df |
Output of |
grid |
Name of the x-axis column (default |
L |
Name of the pointwise mean difference column (default |
var |
Name of the variance column (default |
title, subtitle, caption
|
Passed to |
xlab, ylab
|
Axis labels (see |
This plot summarizes relative skill (loss difference) over time — not a
classical calibration plot (forecast vs observed event rate). For a
simple reliability-style view from the same long-format data, see the package
vignette example that bins phat and plots mean Y vs mean forecast.
A ggplot2 object.
Yeh, C.-K., Rice, G., & Dubin, J. A. (2022). The American Statistician, 76, 214–223. doi:10.1080/00031305.2021.1967781
Preprint: https://arxiv.org/abs/2010.00781