class: center, middle, inverse, title-slide # CRPS-Learning ## Jonathan Berrisch, Florian Ziel ### University of Duisburg-Essen ### 2021-11-17 --- class:middle name: content # Outline - [Motivation](#motivation) - [The Framework of Prediction under Expert Advice](#pred_under_exp_advice) - [The Continuous Ranked Probability Score](#crps) - [Optimality of (Pointwise) CRPS-Learning](#crps_optim) - [A Simple Probabilistic Example](#simple_example) - [The Proposed CRPS-Learning Algorithm](#proposed_algorithm) - [Simulation Results](#simulation) - [Possible Extensions](#extensions) - [Application Study](#application) - [Wrap-Up](#conclusion) - [References](#references) --- name: motivation # Motivation .pull-left[ The Idea: - Combine multiple forecasts instead of choosing one - Combination weights may vary over **time**, over the **distribution**, or **both** Two popular options for combining distributions: - Combining across quantiles (this paper) - Horizontal aggregation, Vincentization - Combining across probabilities - Vertical aggregation ] .pull-right[ <div style="position:relative; margin-top:-50px; z-index: 0"> .panelset[ .panel[.panel-name[Time] ![](data:image/png;base64,#index_files/figure-html/unnamed-chunk-1-1.svg)<!-- --> ] .panel[.panel-name[Distribution] ![](data:image/png;base64,#index_files/figure-html/unnamed-chunk-2-1.svg)<!-- --> ]] ] --- name: pred_under_exp_advice # The Framework of Prediction under Expert Advice ### The sequential framework .pull-left[ Each day, `\(t = 1, 2, \ldots, T\)` - The **forecaster** receives predictions `\(\widehat{X}_{t,k}\)` from `\(K\)` **experts** - The **forecaster** assigns weights `\(w_{t,k}\)` to each **expert** - The **forecaster** calculates her prediction: `\begin{equation} \widetilde{X}_{t} = \sum_{k=1}^K w_{t,k} \widehat{X}_{t,k}. \label{eq_forecast_def} \end{equation}` - The realization for `\(t\)` is observed ] .pull-right[ - The experts can be institutions, persons, or models - The forecasts can be point forecasts (e.g., mean or median) or full predictive distributions - We do not need any assumptions concerning the underlying data - <a id='cite-cesa2006prediction'></a><a href='#bib-cesa2006prediction'>Cesa-Bianchi and Lugosi (2006)</a> ] --- name: regret # The Regret Weights are updated sequentially according to the past performance of the `\(K\)` experts.
A loss function `\(\ell\)` is needed to compute the **cumulative regret** `\(R_{t,k}\)`: `\begin{equation} R_{t,k} = \widetilde{L}_{t} - \widehat{L}_{t,k} = \sum_{i = 1}^t \left[ \ell(\widetilde{X}_{i},Y_i) - \ell(\widehat{X}_{i,k},Y_i) \right] \label{eq_regret} \end{equation}` The cumulative regret: - Indicates the predictive accuracy of the expert `\(k\)` until time `\(t\)`. - Measures how much the forecaster *regrets* not having followed the expert's advice Popular loss functions for point forecasting <a id='cite-gneiting2011making'></a><a href='#bib-gneiting2011making'>Gneiting (2011a)</a>: .pull-left[ - `\(\ell_2\)`-loss `\(\ell_2(x, y) = |x - y|^2\)` - optimal for mean predictions ] .pull-right[ - `\(\ell_1\)`-loss `\(\ell_1(x, y) = |x - y|\)` - optimal for median predictions ] --- name: popular_algs # Popular Algorithms and the Risk .pull-left[ ### Popular Aggregation Algorithms #### The naive combination `\begin{equation} w_{t,k}^{\text{Naive}} = \frac{1}{K} \end{equation}` #### The exponentially weighted average forecaster (EWA) `\begin{align} w_{t,k}^{\text{EWA}} & = \frac{e^{\eta R_{t,k}} }{\sum_{k = 1}^K e^{\eta R_{t,k}}} = \frac{e^{-\eta \ell(\widehat{X}_{t,k},Y_t)} w^{\text{EWA}}_{t-1,k} }{\sum_{k = 1}^K e^{-\eta \ell(\widehat{X}_{t,k},Y_t)} w^{\text{EWA}}_{t-1,k} } \label{eq_ewa_general} \end{align}` ] .pull-right[ ### Optimality In stochastic settings, the cumulative risk should be analyzed <a id='cite-wintenberger2017optimal'></a><a href='#bib-wintenberger2017optimal'>Wintenberger (2017)</a>: `\begin{align} &\underbrace{\widetilde{\mathcal{R}}_t = \sum_{i=1}^t \mathbb{E}[\ell(\widetilde{X}_{i},Y_i)|\mathcal{F}_{i-1}]}_{\text{Cumulative Risk of Forecaster}} \\ &\underbrace{\widehat{\mathcal{R}}_{t,k} = \sum_{i=1}^t \mathbb{E}[\ell(\widehat{X}_{i,k},Y_i)|\mathcal{F}_{i-1}]}_{\text{Cumulative Risk of Experts}} \label{eq_def_cumrisk} \end{align}` ] --- # Optimal Convergence .pull-left[ ### The selection problem `\begin{equation} \frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\min} \right) \stackrel{t\to \infty}{\rightarrow} a \quad \text{with} \quad a \leq 0. \label{eq_opt_select} \end{equation}` The forecaster is asymptotically not worse than the best expert `\(\widehat{\mathcal{R}}_{t,\min}\)`. ### The convex aggregation problem `\begin{equation} \frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\pi} \right) \stackrel{t\to \infty}{\rightarrow} b \quad \text{with} \quad b \leq 0. \label{eq_opt_conv} \end{equation}` The forecaster is asymptotically not worse than the best convex combination `\(\widehat{X}_{t,\pi}\)` in hindsight (**oracle**).
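As a toy illustration of the regret \eqref{eq_regret} and the EWA weights \eqref{eq_ewa_general} from the previous slides, a minimal sketch with invented numbers (not part of the original slides):

```r
# Toy sketch: two experts, l2 loss, EWA weights driven by cumulative regret.
set.seed(42)
t_max <- 200
eta   <- 0.1
y     <- rnorm(t_max)
x_hat <- cbind(rnorm(t_max, mean = 0.2), rnorm(t_max, mean = -0.5))  # invented expert forecasts

w <- matrix(1 / 2, t_max + 1, 2)  # start with naive weights
R <- numeric(2)                   # cumulative regret per expert
for (t in 1:t_max) {
  x_tilde <- sum(w[t, ] * x_hat[t, ])                   # forecaster's combination
  R <- R + (x_tilde - y[t])^2 - (x_hat[t, ] - y[t])^2   # regret update with the l2 loss
  w[t + 1, ] <- exp(eta * R) / sum(exp(eta * R))        # EWA weights
}
round(w[t_max + 1, ], 3)  # the weights favour the less biased expert
```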
] .pull-right[ Optimal rates with respect to selection \eqref{eq_opt_select} and convex aggregation \eqref{eq_opt_conv} <a href='#bib-wintenberger2017optimal'>Wintenberger (2017)</a>: `\begin{align} \frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\min} \right) & = \mathcal{O}\left(\frac{\log(K)}{t}\right)\label{eq_optp_select} \end{align}` `\begin{align} \frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\pi} \right) & = \mathcal{O}\left(\sqrt{\frac{\log(K)}{t}}\right) \label{eq_optp_conv} \end{align}` Algorithms can satisfy both \eqref{eq_optp_select} and \eqref{eq_optp_conv} depending on: - The loss function - Regularity conditions on `\(Y_t\)` and `\(\widehat{X}_{t,k}\)` - The weighting scheme ] --- name:crps .pull-left[ ## Optimality EWA satisfies optimal selection convergence \eqref{eq_optp_select} in a deterministic setting if: - Loss `\(\ell\)` is exp-concave - Learning rate `\(\eta\)` is chosen correctly These results carry over to stochastic i.i.d. settings <a id='cite-kakade2008generalization'></a><a href='#bib-kakade2008generalization'>Kakade and Tewari (2008)</a> <a id='cite-gaillard2014second'></a><a href='#bib-gaillard2014second'>Gaillard, Stoltz, and Van Erven (2014)</a>. Optimal convex aggregation convergence \eqref{eq_optp_conv} can be satisfied by applying the gradient trick: `\begin{align} \ell^{\nabla}(x,y) = \ell'(\widetilde{X},y) x \end{align}` `\(\ell'\)` is the subgradient of `\(\ell\)` at the forecast combination `\(\widetilde{X}\)`. ] .pull-right[ ## Probabilistic Setting **An appropriate choice:** `\begin{align*} \text{CRPS}(F, y) & = \int_{\mathbb{R}} {(F(x) - \mathbb{1}\{ x > y \})}^2 dx \label{eq_crps} \end{align*}` It's strictly proper <a id='cite-gneiting2007strictly'></a><a href='#bib-gneiting2007strictly'>Gneiting and Raftery (2007)</a>. Using the CRPS, we can calculate time-adaptive weights `\(w_{t,k}\)`. However, what if the experts' performance varies in parts of the distribution?
Utilize this relation: `\begin{align*} \text{CRPS}(F, y) = 2 \int_0^{1} \text{QL}_p(F^{-1}(p), y) \, d p. \label{eq_crps_qs} \end{align*}` ... to combine quantiles of the probabilistic forecasts individually using the quantile-loss QL. ] --- name: crps_optim # CRPS-Learning Optimality
QL is convex, but not exp-concave
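A minimal sketch (not part of the original slides) of the quantile loss and of the CRPS–QL relation from the previous slide; the probability grid, the toy observation, and the `\(\mathcal{N}(0,\,1)\)` expert are assumptions:

```r
# Approximate the CRPS of one predictive distribution by averaging quantile
# losses over a probability grid, mirroring CRPS(F, y) = 2 * int_0^1 QL_p dp.
ql <- function(q, y, p) (as.numeric(y < q) - p) * (q - y)  # pinball / quantile loss

p_grid <- 1:99 / 100          # assumed equidistant grid (as in tau = 1:99 / 100 later)
y_obs  <- 0.5                 # hypothetical observation
q_pred <- qnorm(p_grid)       # quantile forecasts of a N(0, 1) expert

crps_grid  <- 2 * mean(ql(q_pred, y_obs, p_grid))
crps_exact <- integrate(function(x) (pnorm(x) - (x >= y_obs))^2, -Inf, Inf)$value
c(crps_grid, crps_exact)      # both approximately 0.33
```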
Bernstein Online Aggregation (BOA) lets us weaken the exp-concavity condition. It satisfies that there exists a `\(C>0\)` such that for `\(x>0\)` it holds that `\begin{equation} P\left( \frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\pi} \right) \leq C \log(\log(t)) \left(\sqrt{\frac{\log(K)}{t}} + \frac{\log(K)+x}{t}\right) \right) \geq 1-e^{-x} \label{eq_boa_opt_conv} \end{equation}`
Almost optimal w.r.t *convex aggregation* \eqref{eq_optp_conv} <a href='#bib-wintenberger2017optimal'>Wintenberger (2017)</a>. The same algorithm satisfies that there exists a `\(C>0\)` such that for `\(x>0\)` it holds that `\begin{equation} P\left( \frac{1}{t}\left(\widetilde{\mathcal{R}}_t - \widehat{\mathcal{R}}_{t,\min} \right) \leq C\left(\frac{\log(K)+\log(\log(Gt))+ x}{\alpha t}\right)^{\frac{1}{2-\beta}} \right) \geq 1-2e^{-x} \label{eq_boa_opt_select} \end{equation}` if `\(Y_t\)` is bounded and the considered loss `\(\ell\)` is convex, `\(G\)`-Lipschitz, and weakly exp-concave in its first coordinate.
Almost optimal w.r.t *selection* \eqref{eq_optp_select} <a id='cite-gaillard2018efficient'></a><a href='#bib-gaillard2018efficient'>Gaillard and Wintenberger (2018)</a>.
We show that this holds for QL under feasible conditions. --- name: simple_example # A Probabilistic Example .pull-left[ Simple Example: `\begin{align} Y_t & \sim \mathcal{N}(0,\,1) \\ \widehat{X}_{t,1} & \sim \widehat{F}_{1} = \mathcal{N}(-1,\,1) \\ \widehat{X}_{t,2} & \sim \widehat{F}_{2} = \mathcal{N}(3,\,4) \label{eq:dgp_sim1} \end{align}` - True weights vary over `\(p\)` - Figures show the ECDF and calculated weights using `\(T=25\)` realizations - Pointwise solution creates rough estimates - Pointwise is better than constant - Smooth solutions may be better than pointwise ] .pull-right[ <div style="position:relative; margin-top:-50px; z-index: 0"> .panelset[ .panel[.panel-name[CDFs] <img src="data:image/png;base64,#index_files/figure-html/unnamed-chunk-3-1.svg" style="display: block; margin: auto;" /> ] .panel[.panel-name[Weights] <img src="data:image/png;base64,#index_files/figure-html/unnamed-chunk-4-1.svg" style="display: block; margin: auto;" /> ]] ] --- # The B-Smooth Procedure .pull-left[ Represent weights as linear combinations of bounded basis functions: `\begin{align} w_{t,k} = \sum_{l=1}^L \beta_{t,k,l} \varphi_l = \boldsymbol \beta_{t,k}' \boldsymbol \varphi \end{align}` A popular choice are B-splines as local basis functions. `\(\boldsymbol \beta_{t,k}\)` is calculated using a reduced regret matrix: `\(\underbrace{\boldsymbol r_{t}}_{\text{LxK}} = \frac{L}{P} \underbrace{\boldsymbol B'}_{\text{LxP}} \underbrace{\left({\boldsymbol{QL}}_{\mathcal{P}}^{\nabla}(\widetilde{\boldsymbol X}_{t},Y_t)- {\boldsymbol{QL}}_{\mathcal{P}}^{\nabla}(\widehat{\boldsymbol X}_{t},Y_t)\right)}_{\text{PxK}}\)`
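A small numeric sketch of this reduction (illustrative only: the dimensions, the random placeholder loss gradients, and the use of `splines::bs()` for the basis are assumptions, not the package's implementation):

```r
# Reduce a P x K matrix of (gradient) quantile-loss regrets to L x K pseudo-regrets.
library(splines)

set.seed(1)
P <- 99; K <- 2; L <- 10
p_grid <- 1:P / (P + 1)

B <- bs(p_grid, df = L, intercept = TRUE)        # P x L cubic B-spline basis
ql_comb    <- matrix(runif(P), P, K)             # placeholder: QL gradients of the combination, recycled over experts
ql_experts <- matrix(runif(P * K), P, K)         # placeholder: QL gradients of the K experts

r_t <- (L / P) * t(B) %*% (ql_comb - ql_experts) # L x K reduced regret matrix
dim(r_t)                                         # 10 x 2
```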
`\(\boldsymbol r_{t}\)` is transformed from PxK to LxK If `\(L = P\)` it holds that `\(\boldsymbol \varphi = \boldsymbol{I}\)` For `\(L = 1\)` we receive constant weights ] .pull-right[ Reducing the number of knots (and thus `\(L\)`) shrinks the weights towards the constant solution <center> <img src="weights_kstep.gif"> </center> ] --- # The P-Smooth Procedure .pull-left[ Penalized cubic B-splines for smoothing the weights: Let `\(\varphi=(\varphi_1,\ldots, \varphi_L)\)` be bounded basis functions on `\((0,1)\)` Then we approximate `\(w_{t,k}\)` by `\begin{align} w_{t,k}^{\text{smooth}} = \sum_{l=1}^L \beta_l \varphi_l = \beta'\varphi \end{align}` with parameter vector `\(\beta\)`. The latter is estimated by penalized `\(L_2\)`-smoothing, which minimizes `\begin{equation} \| w_{t,k} - \beta' \varphi \|^2_2 + \lambda \| \mathcal{D}^{d} (\beta' \varphi) \|^2_2 \label{eq_function_smooth} \end{equation}` with differential operator `\(\mathcal{D}\)`. Computation is easy since we have an analytical solution ] .pull-right[ We receive the constant solution for high values of `\(\lambda\)` when setting `\(d=1\)` <center> <img src="weights_lambda.gif"> </center> ] --- name:proposed_algorithm # The Proposed CRPS-Learning Algorithm .pull-left-3[ .font90[ **Initialization:** Array of expert predictions: `\(\widehat{X}_{t,k,p}\)` Vector of prediction targets: `\(Y_t\)` Starting Weights: `\(\boldsymbol w_0=(w_{0,1},\ldots, w_{0,K})\)` Penalization parameter: `\(\lambda\geq 0\)` B-spline and penalty matrices `\(\boldsymbol B\)` and `\(\boldsymbol D\)` on `\(\mathcal{P}= (p_1,\ldots,p_M)\)` Hat matrix: `$$\boldsymbol{\mathcal{H}} = \boldsymbol B(\boldsymbol B'\boldsymbol B+ \lambda (\alpha \boldsymbol D_1'\boldsymbol D_1 + (1-\alpha) \boldsymbol D_2'\boldsymbol D_2))^{-1} \boldsymbol B'$$` Cumulative Regret: `\(R_{0,k} = 0\)` Range parameter: `\(E_{0,k}=0\)` Variance parameter: `\(V_{0,k}=0\)` Starting pseudo-weights: `\(\boldsymbol \beta_0 = \boldsymbol B^{\text{pinv}}\boldsymbol w_0(\boldsymbol{\mathcal{P}})\)` ]] .pull-right-3[ .font90[ **Core**: for( t in 1:T ) { `\(\widetilde{\boldsymbol X}_{t} = \text{Sort}\left( \boldsymbol w_{t-1}'(\boldsymbol P) \widehat{\boldsymbol X}_{t} \right)\)` .grey[\# Prediction] `\(\boldsymbol r_{t} = \frac{L}{M} \boldsymbol B' \left({\boldsymbol{QL}}_{\boldsymbol{\mathcal P}}^{\nabla}(\widetilde{\boldsymbol X}_{t},Y_t)- {\boldsymbol{QL}}_{\boldsymbol{\mathcal P}}^{\nabla}(\widehat{\boldsymbol X}_{t},Y_t)\right)\)` `\(\boldsymbol E_{t} = \max(\boldsymbol E_{t-1}, \boldsymbol r_{t}^+ + \boldsymbol r_{t}^-)\)` `\(\boldsymbol V_{t} = \boldsymbol V_{t-1} + \boldsymbol r_{t}^{ \odot 2}\)` `\(\boldsymbol \eta_{t} =\min\left( \left(-\log(\boldsymbol \beta_{0}) \odot \boldsymbol V_{t}^{\odot -1} \right)^{\odot\frac{1}{2}} , \frac{1}{2}\boldsymbol E_{t}^{\odot-1}\right)\)` `\(\boldsymbol R_{t} = \boldsymbol R_{t-1}+ \boldsymbol r_{t} \odot \left( \boldsymbol 1 - \boldsymbol \eta_{t} \odot \boldsymbol r_{t} \right)/2 + \boldsymbol E_{t} \odot \mathbb{1}\{-2\boldsymbol \eta_{t}\odot \boldsymbol r_{t} > 1\}\)` `\(\boldsymbol \beta_{t} = K \boldsymbol \beta_{0} \odot \boldsymbol {SoftMax}\left( - \boldsymbol \eta_{t} \odot \boldsymbol R_{t} + \log( \boldsymbol \eta_{t}) \right)\)` `\(\boldsymbol w_{t}(\boldsymbol P) = \underbrace{\boldsymbol B(\boldsymbol B'\boldsymbol B+ \lambda (\alpha \boldsymbol D_1'\boldsymbol D_1 + (1-\alpha) \boldsymbol D_2'\boldsymbol D_2))^{-1} \boldsymbol B'}_{\boldsymbol{\mathcal{H}}} \boldsymbol B \boldsymbol \beta_{t}\)` } .grey[\#t] ] ] --- name: simulation # Simulation Study .pull-left[ Data Generating
Process of the [simple probabilistic example](#simple_example) - Constant solution (`\(\lambda \rightarrow \infty\)`) - Pointwise solution of the proposed BOAG - Smoothed solution of the proposed BOAG - Weights are smoothed during learning - Smooth weights are used to calculate the regret, adjust weights, etc. ] .pull-right[ <div style="position:relative; margin-top:-50px; z-index: 0"> .panelset[ .panel[.panel-name[QL Deviation] Deviation from best attainable `\(\boldsymbol{QL}_\boldsymbol{\mathcal{P}}\)` (1000 runs). ![](data:image/png;base64,#pre_vs_post.gif) ] .panel[.panel-name[Lambda] CRPS values for different `\(\lambda\)` (1000 runs) ![](data:image/png;base64,#pre_vs_post_lambda.gif) ] .panel[.panel-name[Knots] CRPS values for different numbers of knots (1000 runs) ![](data:image/png;base64,#pre_vs_post_kstep.gif) ] ] ] --- # Simulation Study The same simulation was carried out for different algorithms (1000 runs): <center> <img src="algos_constant.gif"> </center> --- # Simulation Study .pull-left-1[ **New DGP:** `\begin{align} Y_t & \sim \mathcal{N}\left(\frac{\sin(0.005 \pi t )}{2},\,1\right) \\ \widehat{X}_{t,1} & \sim \widehat{F}_{1} = \mathcal{N}(-1,\,1) \\ \widehat{X}_{t,2} & \sim \widehat{F}_{2} = \mathcal{N}(3,\,4) \label{eq_dgp_sim2} \end{align}`
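A quick R sketch of simulating this DGP (the horizon and the reading of `\(\mathcal{N}(3,\,4)\)` as having variance 4 are assumptions):

```r
# Target with a slowly drifting mean plus two time-constant experts.
set.seed(1)
t_max <- 5000
y <- rnorm(t_max, mean = sin(0.005 * pi * seq_len(t_max)) / 2, sd = 1)

p_grid <- 1:99 / 100
x1 <- qnorm(p_grid, mean = -1, sd = 1)  # quantile forecasts of expert 1
x2 <- qnorm(p_grid, mean =  3, sd = 2)  # expert 2, assuming variance 4 means sd = 2
```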
Changing optimal weights
A single-run example is depicted on the right
No forgetting leads to long-term constant weights <center> <img src="forget.png"> </center> ] .pull-right-2[ **Weights of expert 2** <img src="data:image/png;base64,#index_files/figure-html/unnamed-chunk-5-1.svg" style="display: block; margin: auto;" /> ] --- name:extensions # Possible Extensions .pull-left[ **Forgetting** - Only taking part of the old cumulative regret into account - Exponential forgetting of past regret `\begin{align*} R_{t,k} & = R_{t-1,k}(1-\xi) + \ell(\widetilde{F}_{t},Y_t) - \ell(\widehat{F}_{t,k},Y_t) \label{eq_regret_forget} \end{align*}` **Fixed Shares** <a id='cite-herbster1998tracking'></a><a href='#bib-herbster1998tracking'>Herbster and Warmuth (1998)</a> - Adding fixed shares to the weights - Shrinkage towards a constant solution `\begin{align*} \widetilde{w}_{t,k} = \rho \frac{1}{K} + (1-\rho) w_{t,k} \label{fixed_share_simple} \end{align*}` ] .pull-right[ **Non-Equidistant Knots** - A non-equidistant spline basis could be used - Potentially improves the tail behavior - Destroys the shrinkage towards the constant solution <center> <img src="uneven_grid.gif"> </center> ] --- name: application # Application to NN Power Price Forecasts .pull-left[ Apply BOA to Normal and JSU power price forecasts:

```r
mod <- online_mv(
  y = Y,
  experts = experts,
  tau = 1:99 / 100
)
```

This yields:

**CRPS Values**

| norm   | jsu    | comb   |
|:-------|:-------|:-------|
| 1.4449 | 1.4594 | 1.4438 |
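For reference, a sketch of how such CRPS values can be computed from quantile forecasts and realizations; the object shapes are assumptions, not the actual output format of the package:

```r
# Hypothetical shapes: q_pred[t, p] holds the quantile forecasts of one model,
# y[t] the observations; twice the mean pinball loss over the grid gives the CRPS.
ql <- function(q, y, p) (as.numeric(y < q) - p) * (q - y)

crps_from_quantiles <- function(q_pred, y, p_grid = 1:99 / 100) {
  losses <- sapply(seq_along(p_grid), function(i) ql(q_pred[, i], y, p_grid[i]))
  2 * mean(losses)  # averaged over time and probabilities
}
```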
The figure shows the most recent JSU weights ] .pull-right[ <div style="position:relative; margin-top:0px; z-index: 0">
] --- # Application to NN Power Price Forecasts .scroll-output[ .pull-left[ The combination consistently outperforms the individual experts

```r
mod <- online_mv(
  y = Y,
  experts = experts,
  tau = 1:99 / 100,
  smooth_pr = list(
    penalized = list(
      lambda = c(-Inf, 2^(-2:15), 2^20)
    ) # lambda = 4096 was selected
  ),
  smooth_mv = list(
    penalized = list(
      lambda = c(-Inf, 2^(-2:15), 2^20)
    ) # lambda = 8 was selected
  ),
  forget_regret = c(0, 2^seq(-11, -4, 1)) # .15625 was selected
)
```

] .pull-right[

**CRPS Values** (by hour `h`)

| h   | norm   | jsu    | n-     | n+     | j-     | j+     | comb   |
|:----|:-------|:-------|:-------|:-------|:-------|:-------|:-------|
| all | 1.4449 | 1.4594 | 2.4058 | 2.1155 | 2.4808 | 2.0809 | 1.4174 |
| 1   | 0.9902 | 1.0036 | 2.1853 | 1.9754 | 2.2674 | 1.9561 | 0.9738 |
| 2   | 1.0044 | 1.0150 | 2.1728 | 2.0126 | 2.2553 | 1.9944 | 0.9867 |
| 3   | 1.0691 | 1.0777 | 2.1637 | 2.1003 | 2.2466 | 2.0823 | 1.0532 |
| 4   | 1.0508 | 1.0567 | 2.0804 | 2.1182 | 2.1604 | 2.0981 | 1.0354 |
| 5   | 1.0952 | 1.1072 | 2.1584 | 2.0755 | 2.2386 | 2.0508 | 1.0879 |
| 6   | 1.2991 | 1.3096 | 2.2087 | 2.0722 | 2.2852 | 2.0321 | 1.2941 |
| 7   | 1.5783 | 1.5834 | 2.3376 | 2.2261 | 2.4062 | 2.1852 | 1.5692 |
| 8   | 1.6908 | 1.6944 | 2.3958 | 2.3145 | 2.4615 | 2.2762 | 1.6790 |
| 9   | 1.6132 | 1.6268 | 2.4659 | 2.1745 | 2.5379 | 2.1364 | 1.5743 |
| 10  | 1.5718 | 1.5855 | 2.4508 | 2.1499 | 2.5223 | 2.1089 | 1.5370 |
| 11  | 1.5481 | 1.5628 | 2.4340 | 2.1602 | 2.5063 | 2.1184 | 1.5292 |
| 12  | 1.5604 | 1.5777 | 2.5063 | 2.1749 | 2.5809 | 2.1338 | 1.5462 |
| 13  | 1.6328 | 1.6538 | 2.5389 | 2.2658 | 2.6119 | 2.2239 | 1.6195 |
| 14  | 1.6748 | 1.6930 | 2.5327 | 2.3406 | 2.6038 | 2.3028 | 1.6546 |
| 15  | 1.5961 | 1.6149 | 2.5196 | 2.2063 | 2.5916 | 2.1705 | 1.5609 |
| 16  | 1.6523 | 1.6739 | 2.6334 | 2.1475 | 2.7051 | 2.1147 | 1.6006 |
| 17  | 1.5672 | 1.5888 | 2.5707 | 2.0356 | 2.6457 | 2.0011 | 1.5140 |
| 18  | 1.6772 | 1.6915 | 2.6129 | 2.1303 | 2.6899 | 2.0897 | 1.6285 |
| 19  | 2.1245 | 2.1341 | 2.9295 | 2.5147 | 2.9936 | 2.4767 | 2.0790 |
| 20  | 1.7891 | 1.7960 | 2.5656 | 2.2921 | 2.6319 | 2.2481 | 1.7635 |
| 21  | 1.3525 | 1.3641 | 2.2497 | 1.9638 | 2.3264 | 1.9188 | 1.3312 |
| 22  | 1.1864 | 1.2058 | 2.3075 | 1.7877 | 2.3902 | 1.7524 | 1.1449 |
| 23  | 1.1070 | 1.1325 | 2.2820 | 1.7693 | 2.3648 | 1.7358 | 1.0754 |
| 24  | 1.2453 | 1.2767 | 2.4381 | 1.7649 | 2.5149 | 1.7334 | 1.1797 |
] ] --- # Insights Into the Extended Model .pull-left[
] .pull-right[ <img src="data:image/png;base64,#index_files/figure-html/unnamed-chunk-13-1.svg" style="display: block; margin: auto auto auto 0;" /> ] --- name: conclusion # Wrap-Up .font90[ .pull-left[ Potential Downsides: - Pointwise optimization can induce quantile crossing - Can be solved by sorting the predictions Upsides: - Pointwise learning outperforms the naive solution significantly - Online learning is much faster than batch methods - Smoothing further improves the predictive performance - Asymptotically not worse than the best convex combination ] .pull-right[ Important: - The choice of the learning rate is crucial - The loss function has to meet certain criteria The [
profoc](https://profoc.berrisch.biz/) R Package: - Implements all algorithms discussed above - Is written using RcppArmadillo,
so it's fast - Accepts vectors for most parameters - The best parameter combination is chosen online - Implements - Forgetting, Fixed Share - Different loss functions + gradients ] ] <a href="https://github.com/BerriJ" class="github-corner" aria-label="View source on Github"><svg width="80" height="80" viewBox="0 0 250 250" style="fill:#f2f2f2; color:#212121; position: absolute; top: 0; border: 0; right: 0;" aria-hidden="true"><path d="M0,0 L115,115 L130,115 L142,142 L250,250 L250,0 Z"></path><path d="M128.3,109.0 C113.8,99.7 119.0,89.6 119.0,89.6 C122.0,82.7 120.5,78.6 120.5,78.6 C119.2,72.0 123.4,76.3 123.4,76.3 C127.3,80.9 125.5,87.3 125.5,87.3 C122.9,97.6 130.6,101.9 134.4,103.2" fill="currentColor" style="transform-origin: 130px 106px;" class="octo-arm"></path><path d="M115.0,115.0 C114.9,115.1 118.7,116.5 119.8,115.4 L133.7,101.6 C136.9,99.2 139.9,98.4 142.2,98.6 C133.8,88.0 127.5,74.4 143.8,58.0 C148.5,53.4 154.0,51.2 159.7,51.0 C160.3,49.4 163.2,43.6 171.4,40.1 C171.4,40.1 176.1,42.5 178.8,56.2 C183.1,58.6 187.2,61.8 190.9,65.4 C194.5,69.0 197.7,73.2 200.1,77.6 C213.8,80.2 216.3,84.9 216.3,84.9 C212.7,93.1 206.9,96.0 205.4,96.6 C205.1,102.4 203.0,107.8 198.3,112.5 C181.9,128.9 168.3,122.5 157.7,114.1 C157.9,116.9 156.7,120.9 152.7,124.9 L141.0,136.5 C139.8,137.7 141.6,141.9 141.8,141.8 Z" fill="currentColor" class="octo-body"></path></svg></a><style>.github-corner:hover .octo-arm{animation:octocat-wave 560ms ease-in-out}@keyframes octocat-wave{0%,100%{transform:rotate(0)}20%,60%{transform:rotate(-25deg)}40%,80%{transform:rotate(10deg)}}@media (max-width:500px){.github-corner:hover .octo-arm{animation:none}.github-corner .octo-arm{animation:octocat-wave 560ms ease-in-out}}</style> ??? Execution Times: T = 5000 Opera: Ml-Poly > 157 ms Boa > 212 ms Profoc: Ml-Poly > 17 BOA > 16 --- class: center, middle [
CRPS-Learning](https://arxiv.org/abs/2102.00968) --- name:references # References 1 Cesa-Bianchi, N. and G. Lugosi (2006). _Prediction, learning, and games_. Cambridge University Press. Gaillard, P., G. Stoltz, and T. Van Erven (2014). "A second-order bound with excess losses". In: _Conference on Learning Theory_. PMLR, pp. 176-196. Gaillard, P. and O. Wintenberger (2018). "Efficient online algorithms for fast-rate regret bounds under sparsity". In: _Advances in Neural Information Processing Systems_, pp. 7026-7036. Gneiting, T. (2011a). "Making and evaluating point forecasts". In: _Journal of the American Statistical Association_ 106.494, pp. 746-762. Gneiting, T. and A. E. Raftery (2007). "Strictly proper scoring rules, prediction, and estimation". In: _Journal of the American Statistical Association_ 102.477, pp. 359-378. Herbster, M. and M. K. Warmuth (1998). "Tracking the best expert". In: _Machine Learning_ 32.2, pp. 151-178. Kakade, S. M. and A. Tewari (2008). "On the Generalization Ability of Online Strongly Convex Programming Algorithms". In: _NIPS_, pp. 801-808. --- # References 2 Wintenberger, O. (2017). "Optimal learning with Bernstein online aggregation". In: _Machine Learning_ 106.1, pp. 119-141. --- class: center, middle [
](#content)