Multivariate Probabilistic CRPS Learning with an Application to Day-Ahead Electricity Prices

.title[
# Multivariate Probabilistic CRPS Learning with an Application to Day-Ahead Electricity Prices
]
.subtitle[
## Jonathan Berrisch, Florian Ziel
]
.author[
### University of Duisburg-Essen
]
.date[
### 2023-06-28
]

---

# Outline

</br>

1. Multivariate CRPS Learning
  - Introduction
  - Smoothing procedures
  - Application to multivariate electricity price forecasts
2. The `profoc` R package
  - Package overview
  - Implementation details
  - Illustrative examples

---

# Introduction to Multivariate CRPS Learning

The Idea:
- Combine multiple forecasts instead of choosing one

- Combination weights may vary over **time**, over the **distribution**, and over **covariates**

Two popular options for combining distributions:
- Combining across quantiles (this paper)
  - Horizontal aggregation, vincentization
- Combining across probabilities
  - Vertical aggregation
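The difference between the two options can be illustrated with a minimal sketch, assuming two hypothetical Gaussian experts and equal weights (both choices are illustrative and not taken from the paper):

```r
# Two hypothetical experts: N(0, 1) and N(2, 1), combined with equal weights
p <- 1:99 / 100
w <- c(0.5, 0.5)

# Horizontal aggregation: average the quantile functions at each probability
q_horizontal <- w[1] * qnorm(p, mean = 0) + w[2] * qnorm(p, mean = 2)

# Vertical aggregation: average the CDFs at each value (a mixture distribution)
cdf_vertical <- function(x) w[1] * pnorm(x, mean = 0) + w[2] * pnorm(x, mean = 2)

# Horizontal aggregation of these two experts is again Gaussian: N(1, 1)
all.equal(q_horizontal, qnorm(p, mean = 1))

# Both combinations share the median 1, but the mixture is more dispersed:
# its central 80% interval is wider than that of the quantile average
q_vertical <- sapply(c(0.1, 0.9), function(a) {
  uniroot(function(x) cdf_vertical(x) - a, interval = c(-10, 10))$root
})
diff(q_vertical) > diff(q_horizontal[c(10, 90)])
```

Averaging quantiles keeps the shape of the experts' distributions here, while averaging probabilities produces a wider mixture.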

]

![](data:image/png;base64,#index_files/figure-html/unnamed-chunk-1-1.svg)

]

.panel[.panel-name[Distribution]
![](data:image/png;base64,#index_files/figure-html/unnamed-chunk-2-1.svg)

![](data:image/png;base64,#index_files/figure-html/unnamed-chunk-3-1.svg)

]]
]]

---

# The Framework of Prediction under Expert Advice

### The sequential framework

Each day, `\(t = 1, 2, \ldots, T\)`:
- The **forecaster** receives predictions `\(\widehat{X}_{t,k}\)` from `\(K\)` **experts**
- The **forecaster** assigns weights `\(w_{t,k}\)` to each **expert**
- The **forecaster** calculates her prediction:
`\begin{equation}
    \widetilde{X}_{t} = \sum_{k=1}^K w_{t,k} \widehat{X}_{t,k}.
    \label{eq_forecast_def}
\end{equation}`
- The realization for `\(t\)` is observed
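For a single day, the combination step amounts to a weighted sum. A minimal sketch with made-up expert predictions and weights (all numbers are illustrative, and the convex weights are an assumption of this example):

```r
# Hypothetical predictions of K = 3 experts for one day t
x_hat <- c(expert_1 = 42.0, expert_2 = 45.5, expert_3 = 39.0)

# Weights assigned by the forecaster (assumed non-negative and summing to one)
w <- c(0.5, 0.3, 0.2)

# Combined prediction: sum over k of w_{t,k} * x_hat_{t,k}
x_tilde <- sum(w * x_hat)
x_tilde
```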
]

- The experts can be institutions, persons, or models
- The forecasts can be point-forecasts (i.e., mean or median) or full predictive distributions
- We do not need any assumptions concerning the underlying data
- <a id='cite-cesa2006prediction'></a><a href='#bib-cesa2006prediction'>Cesa-Bianchi and Lugosi (2006)</a>

]

---

Weights are updated sequentially according to the past performance of the `\(K\)` experts.

`\begin{equation}
    R_{t,k}  = \widetilde{L}_{t} - \widehat{L}_{t,k} =  \sum_{i = 1}^t \left( \ell(\widetilde{X}_{i},Y_i) - \ell(\widehat{X}_{i,k},Y_i) \right)
    \label{eq_regret}
\end{equation}`

The cumulative regret:
- Indicates the predictive accuracy of expert `\(k\)` until time `\(t\)`.
- Measures how much the forecaster *regrets* not having followed the expert's advice
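A minimal sketch of the regret computation, assuming the squared loss and simulated data. The weight update uses the exponentially weighted average (EWA) rule with a fixed learning rate `eta`; the simulated experts, the value of `eta`, and all variable names are illustrative assumptions:

```r
set.seed(1)
T_obs <- 200
K <- 3
y <- rnorm(T_obs)                                  # simulated observations
# Simulated experts: the observation plus noise of increasing magnitude
x_hat <- sapply(c(0.5, 1, 2), function(s) y + rnorm(T_obs, sd = s))

loss <- function(x, y) (x - y)^2                   # squared loss
eta <- 0.1                                         # learning rate (assumed)

w <- matrix(1 / K, nrow = T_obs + 1, ncol = K)     # start with uniform weights
R <- matrix(0, nrow = T_obs, ncol = K)             # cumulative regret R_{t,k}
R_prev <- rep(0, K)

for (t in seq_len(T_obs)) {
  x_tilde <- sum(w[t, ] * x_hat[t, ])              # forecaster's prediction
  # Instantaneous regret: forecaster's loss minus each expert's loss
  r <- loss(x_tilde, y[t]) - loss(x_hat[t, ], y[t])
  R[t, ] <- R_prev + r
  R_prev <- R[t, ]
  # EWA update: weights proportional to exp(eta * cumulative regret)
  w_new <- exp(eta * (R[t, ] - max(R[t, ])))       # shifted for numerical stability
  w[t + 1, ] <- w_new / sum(w_new)
}

round(w[T_obs + 1, ], 3)  # final weights
```

The expert with the smallest noise accumulates the largest regret for the forecaster and therefore receives most of the weight.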

Popular loss functions for point forecasting <a id='cite-gneiting2011making'></a><a href='#bib-gneiting2011making'>Gneiting (2011a)</a>:
.pull-left[
- `\(\ell_2\)`-loss `\(\ell_2(x, y) = | x -y|^2\)`
  - optimal for mean predictions
]
.pull-right[
- `\(\ell_1\)`-loss `\(\ell_1(x, y) = | x -y|\)` 
  - optimal for median predictions 
]
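That the `\(\ell_2\)`-loss is minimized by the mean and the `\(\ell_1\)`-loss by the median can be checked numerically. A small sketch on a simulated skewed sample (the sample and the search grid are illustrative choices):

```r
set.seed(42)
y <- rexp(1001)                          # skewed sample: mean and median differ

x_grid <- seq(0, 3, by = 0.001)          # candidate constant point forecasts

# Average loss of each candidate
avg_l2 <- sapply(x_grid, function(x) mean((x - y)^2))
avg_l1 <- sapply(x_grid, function(x) mean(abs(x - y)))

x_l2 <- x_grid[which.min(avg_l2)]
x_l1 <- x_grid[which.min(avg_l1)]

c(l2_minimizer = x_l2, mean = mean(y))
c(l1_minimizer = x_l1, median = median(y))
```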

---

name:crps

# Probabilistic Setting

An appropriate loss:

`\begin{align*}
    \text{CRPS}(F, y) & = \int_{\mathbb{R}} {(F(x) - \mathbb{1}\{ x > y \})}^2 dx
    \label{eq_crps}
\end{align*}`

It's strictly proper <a id='cite-gneiting2007strictly'></a><a href='#bib-gneiting2007strictly'>Gneiting and Raftery (2007)</a>.

Using the CRPS, we can calculate time-adaptive weights `\(w_{t,k}\)`. However, what if the experts' performance varies in parts of the distribution?

`\begin{align*}
    \text{CRPS}(F, y) = 2 \int_0^{1}  \text{QL}_p(F^{-1}(p), y) \, d p.
    \label{eq_crps_qs}
\end{align*}`

This relation allows us to combine the quantiles of the probabilistic forecasts individually, using the quantile loss QL.
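The identity between the CRPS and the integrated quantile loss can be verified numerically. The sketch below assumes a standard normal forecast, for which the CRPS has a known closed form, and approximates the integral over `\(p\)` with a midpoint rule (the observation and the grid size are arbitrary choices):

```r
# Quantile (pinball) loss of the quantile forecast q at probability level p
ql <- function(q, y, p) ((y < q) - p) * (q - y)

y <- 0.5                                   # observation
n <- 1e5
p <- (seq_len(n) - 0.5) / n                # midpoints of an equidistant grid on (0, 1)

# CRPS as twice the integral of the quantile loss over all probability levels
crps_ql <- 2 * mean(ql(qnorm(p), y, p))

# Closed-form CRPS of a standard normal forecast
crps_exact <- y * (2 * pnorm(y) - 1) + 2 * dnorm(y) - 1 / sqrt(pi)

c(quantile_loss = crps_ql, closed_form = crps_exact)
```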

]

# Optimal Convergence

</br>

Convergence rates of BOA are:

]

---

# Multivariate CRPS Learning

Additionally, we extend the **B-Smooth** and **P-Smooth** procedures to the multivariate setting:

- Basis matrices for reducing
  - the probabilistic dimension from `\(P\)` to `\(\widetilde P\)`
  - the multivariate dimension from `\(D\)` to `\(\widetilde D\)`

- Hat matrices
  - penalized smoothing across P and D dimensions

We utilize the mean Pinball Score over the entire space for hyperparameter optimization (e.g., `\(\lambda\)`).

]

*Basis Smoothing*

Represent weights as linear combinations of bounded basis functions:

`\begin{equation}
  \underbrace{\boldsymbol w_{t,k}}_{D \times P} = \sum_{j=1}^{\widetilde D} \sum_{l=1}^{\widetilde P} \beta_{t,j,l,k} \varphi^{\text{mv}}_{j} \varphi^{\text{pr}}_{l} = \underbrace{\boldsymbol \varphi^{\text{mv}}}_{D \times \widetilde D} \boldsymbol \beta_{t,k} \underbrace{{\boldsymbol\varphi^{\text{pr}}}'}_{\widetilde P \times P} \nonumber
\end{equation}`

A popular choice: B-Splines

`\(\boldsymbol \beta_{t,k}\)` is calculated using a reduced regret matrix:

`\(\underbrace{\boldsymbol r_{t,k}}_{\widetilde P \times \widetilde D} = \boldsymbol \varphi^{\text{pr}} \underbrace{\left({\boldsymbol{QL}}_{\mathcal{P}}^{\nabla}(\widetilde{\boldsymbol X}_{t},Y_t)- {\boldsymbol{QL}}_{\mathcal{P}}^{\nabla}(\widehat{\boldsymbol X}_{t},Y_t)\right)}_{P \times D}\boldsymbol \varphi^{\text{mv}}\)`

If `\(\widetilde P = P\)`, it holds that `\(\boldsymbol \varphi^{\text{pr}} = \boldsymbol{I}\)` (pointwise weights).

For `\(\widetilde P = 1\)`, we obtain constant weights.
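A minimal sketch of basis smoothing in the probabilistic dimension only. It uses `splines::bs()`, which places its knots at quantiles of the grid, as a stand-in basis; this is an illustrative choice and not necessarily how `profoc` constructs its basis matrices. The coefficient values are made up:

```r
library(splines)

p <- 1:99 / 100                      # probability grid, P = 99
P_tilde <- 5                         # reduced dimension

# Cubic B-spline basis on (0, 1): a P x P_tilde matrix whose rows sum to one
phi_pr <- bs(p, df = P_tilde, degree = 3, intercept = TRUE,
             Boundary.knots = c(0, 1))
dim(phi_pr)

# Weights of one expert as a linear combination of the basis functions
beta <- c(0.1, 0.3, 0.5, 0.3, 0.1)   # hypothetical coefficients
w <- as.vector(phi_pr %*% beta)      # smooth weight function over the quantiles

# Special case P_tilde = 1: a single constant basis function yields
# the same weight at every quantile
phi_const <- matrix(1, nrow = length(p), ncol = 1)
w_const <- as.vector(phi_const %*% 0.4)
```

Only `P_tilde` coefficients per expert have to be learned instead of `P` separate weights, which is the dimension reduction described above.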

]

---

# Multivariate CRPS Learning

#### Penalized smoothing:

Let `\(\boldsymbol{\psi}^{\text{mv}}=(\psi_1,\ldots, \psi_{D})\)` and `\(\boldsymbol{\psi}^{\text{pr}}=(\psi_1,\ldots, \psi_{P})\)` be two sets of bounded basis functions on `\((0,1)\)`:

`\begin{equation}
  \boldsymbol w_{t,k} = \boldsymbol{\psi}^{\text{mv}} \boldsymbol{b}_{t,k} {\boldsymbol{\psi}^{\text{pr}}}'
\end{equation}`

with parameter matrix `\(\boldsymbol b_{t,k}\)`. The latter is estimated by penalized `\(L_2\)`-smoothing, which minimizes

`\begin{align}
   & \| \boldsymbol{\beta}_{t,d, k}' \boldsymbol{\varphi}^{\text{pr}}  - \boldsymbol b_{t, d, k}' \boldsymbol{\psi}^{\text{pr}}  \|^2_2 + \lambda^{\text{pr}}  \| \mathcal{D}_{q}  (\boldsymbol b_{t, d, k}' \boldsymbol{\psi}^{\text{pr}})  \|^2_2 +                       \nonumber \\
   & \| \boldsymbol{\beta}_{t, p, k}' \boldsymbol{\varphi}^{\text{mv}}  - \boldsymbol b_{t, p, k}' \boldsymbol{\psi}^{\text{mv}}  \|^2_2 + \lambda^{\text{mv}}  \| \mathcal{D}_{q}  (\boldsymbol b_{t, p, k}' \boldsymbol{\psi}^{\text{mv}})  \|^2_2  \nonumber
\end{align}`

with differential operator `\(\mathcal{D}_q\)` of order `\(q\)`

Computation is easy since we have an analytical solution.
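The analytical solution can be sketched for a simplified discrete case. Assuming an identity basis on the quantile grid and a finite-difference matrix `\(\boldsymbol D\)` in place of the differential operator `\(\mathcal{D}_q\)`, the penalized least-squares problem is solved by `\(\mathcal{H} \boldsymbol w\)` with hat matrix `\(\mathcal{H} = (\boldsymbol I + \lambda \boldsymbol D' \boldsymbol D)^{-1}\)`. The raw weights and the value of `lambda` below are made up:

```r
set.seed(123)
P <- 99
p <- seq_len(P) / (P + 1)

# Hypothetical noisy weights of one expert over the quantile grid
w_raw <- 0.5 + 0.3 * sin(2 * pi * p) + rnorm(P, sd = 0.1)

q <- 2                                        # order of the difference penalty
lambda <- 10                                  # smoothing parameter (assumed)
D <- diff(diag(P), differences = q)           # (P - q) x P difference matrix

# Hat matrix of the penalized least-squares problem
#   min_s ||w_raw - s||^2 + lambda * ||D s||^2
H <- solve(diag(P) + lambda * crossprod(D))
w_smooth <- as.vector(H %*% w_raw)

# Roughness (sum of squared q-th differences) before and after smoothing
c(raw = sum((D %*% w_raw)^2), smooth = sum((D %*% w_smooth)^2))
```

Because the hat matrix depends only on `lambda` and the penalty order, it can be computed once and reused in every update step.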

]

</br>

for( t in 1:T ) {

&nbsp; for( d in 1:D ) {

&nbsp; .grey[...]

&nbsp;&nbsp; `\(\boldsymbol w_{t,k} = \mathcal{H}^{\text{mv}}  \boldsymbol \varphi^{\text{mv}} \beta'_{t,k} \varphi^{\text{pr}} \mathcal{H}^{\text{pr}}\)`

&nbsp; }  .grey[\#h]

} .grey[\#t]

]

---

# Application

#### Data

- Day-Ahead electricity price forecasts from <a id='cite-marcjasz2022distributional'></a><a href='#bib-marcjasz2022distributional'>Marcjasz, Narajewski, Weron, and Ziel (2022)</a>
- Produced using probabilistic neural networks
- 24-dimensional distributional forecasts
- Distribution assumptions: JSU and Normal
- 8 experts (4 JSU, 4 Normal)
- 27th Dec. 2018 to 31st Dec. 2020 (736 days)
- We extract 99 quantiles (percentiles)

]

#### Setup

Evaluation: Exclude first 182 observations

Extensions: Penalized smoothing | Forgetting

Tuning strategies:
- Bayesian Fix
  - Sophisticated Bayesian search algorithm
- Online
  - Dynamic based on past performance
- Bayesian Online
  - First Bayesian Fix, then Online
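The online strategy can be sketched as follows: at each step, use the hyperparameter value with the lowest cumulative loss so far. The candidate grid and the simulated losses are illustrative assumptions, and the selection rule is a simplified stand-in for the tuning used in the application:

```r
set.seed(7)
T_obs <- 300
lambda_grid <- c(0.1, 1, 10, 100)                 # hypothetical candidates

# Simulated per-step losses; in practice these would be the pinball scores
# obtained under each candidate value
base_loss <- c(1.2, 1.0, 0.8, 1.1)
loss <- sapply(base_loss, function(m) m + rnorm(T_obs, sd = 0.2))

cum_loss <- apply(loss, 2, cumsum)                # cumulative loss per candidate

# The candidate chosen for step t uses information up to t - 1 only
chosen <- integer(T_obs)
chosen[1] <- 1L                                   # arbitrary start
for (t in 2:T_obs) {
  chosen[t] <- which.min(cum_loss[t - 1, ])
}

table(lambda_grid[chosen])                        # how often each value was used
```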

Computation Time: ~30 Minutes

]

---

# Special Cases

]

]

]

]

]

---

# Results

---

# Results

---

# Results: Hour 16:00-17:00

---

# Results: Mean

---

# Profoc R Package

#### Probabilistic Forecast Combination - profoc

Available on [GitHub](https://github.com/BerriJ/profoc) and [CRAN](https://CRAN.R-project.org/package=profoc)

Main Function: `online()` for online learning.
- Works with multivariate and/or probabilistic data
- Implements BOA, ML-POLY, EWA (and the gradient versions)
- Implements many extensions like smoothing, forgetting, thresholding, etc.
- Various loss functions are available 
- Various methods (`predict`, `update`, `plot`, etc.)

]

#### Speed

Large parts of profoc are implemented in C++.

We use `Rcpp`, `RcppArmadillo`, and OpenMP.

We use `Rcpp` modules to expose a class to R
- Offers great flexibility for the end-user
- Requires very little knowledge of C++ code
- High-level interface is easy to use

]

---

# Profoc - B-Spline Basis

Basis specification `b_smooth_pr` is internally passed to `make_basis_mats()`:

```r
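# Assumes Y (observations) and experts (expert forecasts) are already defined
library(profoc)

mod <- online(
  y = Y,
  experts = experts,
  tau = 1:99 / 100, # probability grid: 99 percentiles
  b_smooth_pr = list(
    knots = 9, # number of knots
    mu = 0.5, # mu, sigma, nonc, tailweight: knot distribution parameters
    sigma = 1,
    nonc = 0,
    tailweight = 1,
    deg = 3 # degree of the B-splines
  )
)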
```

Knots are distributed using the generalized beta distribution.
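To build intuition for how such a distribution steers the knot placement, the sketch below places knots at quantiles of an ordinary Beta distribution. This is a simplified stand-in: the helper `beta_knots()` and its parameterization are hypothetical and do not reproduce the generalized beta distribution or the remaining parameters (`sigma`, `nonc`, `tailweight`) used by `profoc`:

```r
# Hypothetical helper: n inner knots on (0, 1), placed at quantiles of a
# Beta distribution with mean mu (mu = 0.5 gives the uniform distribution)
beta_knots <- function(n, mu = 0.5) {
  probs <- seq_len(n) / (n + 1)
  qbeta(probs, shape1 = 2 * mu, shape2 = 2 * (1 - mu))
}

beta_knots(9, mu = 0.5)   # equidistant knots
beta_knots(9, mu = 0.3)   # knots shifted towards the lower quantiles
```

In this simplified version, lowering `mu` concentrates the knots, and with them the flexibility of the basis, in the lower part of the distribution.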

]

Exemplary Basis with 9 Knots:

</br>

]

---

# Profoc - B-Spline Basis

Basis specification `b_smooth_pr` is internally passed to `make_basis_mats()`:

```r
mod <- online(
  y = Y,
  experts = experts,
  tau = 1:99 / 100,
  b_smooth_pr = list(
    knots = 9,
    mu = 0.3, # NEW
    sigma = 1,
    nonc = 0,
    tailweight = 1,
    deg = 3
  )
)
```

Knots are distributed using the generalized beta distribution.

]

Exemplary Basis with 9 Knots ...

... and `mu` set to `0.3`:

]

---

# Profoc - B-Spline Basis

Basis specification `b_smooth_pr` is internally passed to `make_basis_mats()`:

```r
mod <- online(
  y = Y,
  experts = experts,
  tau = 1:99 / 100,
  b_smooth_pr = list(
    knots = 9,
    mu = 0.3,
    sigma = 1,
    nonc = 0,
    tailweight = 1,
    deg = 3,
    periodic = TRUE # NEW
  )
)
```

Knots are distributed using the generalized beta distribution.

]

Exemplary Basis with 9 Knots ...

... and `mu` set to `0.3` and `periodic = TRUE`

]

---

# Wrap-Up

The [profoc](https://profoc.berrisch.biz/) R package:

Profoc is a flexible framework for online learning.

- It implements several algorithms
- It implements several loss functions
- It implements several extensions
- Its high- and low-level interfaces offer great flexibility

Profoc is fast.

- The core components are written in C++
- The core components utilize OpenMP for parallelization

]

Multivariate Extension:

- Code is available now
- [Pre-Print](https://arxiv.org/abs/2303.10019) is available now

Get these slides:

.center[
<center>
<img src="web_pres.png">
</center>
[https://berrisch.biz/slides/23_06_ecmi/](https://berrisch.biz/slides/23_06_ecmi/)
]
]
