class: center, middle, inverse, title-slide .title[ # High-Resolution Peak Demand Estimation Using Generalized Additive Models and Deep Neural Networks ] .subtitle[ ##
Jonathan Berrisch
, Michał Narajewski, Florian Ziel ]
.author[ ### University of Duisburg-Essen ]
.date[ ### 2023-09-13 ]

---
name: motivation

# Motivation

.bold[High Resolution Peak Demand Estimation Challenge]

- Organized by Western Power Distribution and Energy Systems Catapult
- Does limited high-resolution monitoring help estimate future high-resolution peak loads?

The Objective:

- Estimate minimum and maximum electricity load values (one-minute resolution)
- Given data with only a 30-minute resolution
- For a single substation, for every half-hour of September 2021

Data:

- From Nov. 2019 to Sept. 2021 (30-minute resolution)
- MERRA-2 weather reanalysis data from five locations close to the substation

---

# Motivation

### Location Overview

<img src="fig/map.svg" width="1000px" style="display: block; margin: auto;" />

---

# Motivation

### Data Overview

<img src="index_files/figure-html/unnamed-chunk-2-1.svg" style="display: block; margin: auto;" />

---

# Data

.pull-left[

Targets: `\(\Delta^{\min}_t\)` and `\(\Delta^{\max}_t\)`, the deviations of the one-minute minimum and maximum load from the half-hourly load

Possible explanatory variables:

The *half-hourly* load: `\(L_t\)`

Discrete second-order central difference (DSOCD): `\(L_t'' = L_{t-1}-2L_t+L_{t+1}\)`

Deterministic components (to capture potential seasonal characteristics):

- Daily `\(D_t\)`: hour of the day
- Weekly `\(W_t\)`: hour of the week
- Annual `\(A_t\)`: hour of the meteorological year with `\(365.24\)` days

]

.pull-right[

Weather Inputs: Temperature, Windspeed (North / East), Solar, Pressure, Humidity

The figure on the next slide shows how these variables correlate.

- Lower triangle: Pearson's correlation
- Upper triangle: Distance correlation

Distance correlation: a non-linear dependency measure that takes values in `\([0, 1]\)` and equals zero exactly under stochastic independence <a id='cite-szekely2007measuring'></a><a href='#bib-szekely2007measuring'>Székely, Rizzo, and Bakirov (2007)</a>.

]

---

# Correlation

<img src="fig/cortikz_sa.svg" width="1000px" style="display: block; margin: auto;" />

---

# Data

<img src="fig/delta.svg" width="1000px" style="display: block; margin: auto;" />

---

# Feature Set and Models

.pull-left[

</br>

Selected features based on a preliminary analysis:

</br>

| Variable type | Included feature | Number |
|---|---|---|
| Lagged load | `\(L_{t-1}\)`, `\(L_t\)`, `\(L_{t+1}\)` | 3 |
| Lagged DSOCD load | `\(\widetilde{L}_{t-4}'',\ldots , \widetilde{L}_{t+4}''\)` | 9 |
| Weather inputs | Temp, Solar, WindN, WindE, Press, Humid | 6 |
| Seasonal inputs | `\(D_t\)`, `\(W_t\)`, `\(A_t\)` | 3 |

]

.pull-right[

</br>

Considered Models:

- Generalized Additive Model (GAM)
- Multilayer Perceptron Network (MLP)
- Combinations of the above

Competition Benchmark: .bold[Naive]

`$$L_t^{\max} = L_t$$`
`$$L_t^{\min} = L_t$$`

]

---

# Modelling Approach: GAM

### Generalized Additive Model (GAM)

.pull-left[

`\begin{equation} \Delta^{m}_t = \sum_{i=1}^L f_i( X_{t,1}, \ldots, X_{t,N}) + \epsilon_t \label{eq_GAM_gen} \end{equation}`

where `\(m\in\{\min , \max\}\)`

Traditional framework using cubic B-splines

*GAM.full* specification (with all 2-way interactions):

`\begin{align} \Delta^{m}_t =& \sum_{i=1}^N b_{k_0}( X_{t,i} ) + \nonumber \\ &\sum_{i=1}^N \sum_{j=1,j>i}^N b_{k_1,k_2}( X_{t,i}, X_{t,j} ) +\epsilon_t \label{eq_big_GAM} \end{align}`

]

.pull-right[

`\(b_{k_0}\)` and `\(b_{k_1,k_2}\)` denote univariate and bivariate splines with `\(k_0\)` and `\((k_1,k_2)\)` knots.

Tensor interaction splines capture only the joint effects.

We set `\(k_0=27\)` and `\(k_1=k_2=9\)`.
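
In R, such a specification could look like the minimal sketch below (using `mgcv`). This is not the competition code: the data frame `df`, its columns `delta_max`, `L`, `Solar`, and the cubic regression spline basis (`bs = "cr"`) are assumptions for illustration.

```r
library(mgcv)

# Assumes df is ordered in time and holds the half-hourly load in column L.
# DSOCD feature: L''_t = L_{t-1} - 2 L_t + L_{t+1} (NA at the series ends)
df$L_dd <- c(NA, diff(df$L, differences = 2), NA)

# Univariate smooths with k0 = 27 knots plus one tensor-interaction term;
# ti() captures only the joint effect, here with k1 = k2 = 9 knots.
fit_max <- gam(
  delta_max ~ s(L, bs = "cr", k = 27) +
    s(L_dd, bs = "cr", k = 27) +
    s(Solar, bs = "cr", k = 27) +
    ti(L, Solar, bs = "cr", k = c(9, 9)),
  data = na.omit(df)
)
summary(fit_max)  # reports EDF and F per term, as on the significance slide
```

*GAM.full* extends this pattern to univariate `s()` terms for all inputs and `ti()` terms for all 2-way interactions.
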
Thus, univariate terms are specified by `\(27\)` parameters and bivariate terms by `\(81\)` parameters.

*GAM.red*: only interactions involving `\(L_t\)`, `\(\widetilde{L}_t''\)` and `\(\text{Solar}_t\)` are included.

*GAM.simple*:

`\begin{align} \Delta^{m}_t = b_{k_0}( L_t ) + b_{k_0}( \widetilde{L}_t'') + b_{k_0}( \text{Solar}_t ) +\epsilon_t \label{eq_simple_GAM} \end{align}`

]

---

# Modelling Approach: MLP

.pull-left[

Multilayer Perceptron Network:

<img src="fig/nn_fig.svg" width="500px" style="display: block; margin: auto;" />

Hyperparameters are tuned using the OPTUNA Python package <a id='cite-akiba2019optuna'></a><a href='#bib-akiba2019optuna'>Akiba, Sano, Yanase, Ohta, and Koyama (2019)</a>

]

.pull-right[

Tuned hyperparameters:

- Input feature selection
- Number of hidden layers -- either 2 or 3
- Dropout layer -- whether to use it after the input layer and, if yes, at what rate
- Activation functions in the hidden layers: elu, relu, sigmoid, softmax, softplus, and tanh
- Number of neurons in each hidden layer, drawn on an exp-scale from `\([4,128]\)`
- `\(L_1\)` regularization -- whether to use it on the hidden layers and, if yes, at what rate
- Learning rate for the Adam optimization algorithm, drawn on an exp-scale from the `\(\left(10^{-5}, 10^{-1}\right)\)` interval

]

---

# GAM Parameter Significance

</br>

.pull-left[

Univariate terms:

| Min | EDF | F | Max | EDF | F |
|---|---|---|---|---|---|
| `\(L_t\)` | 9.5 | 8.9 | `\(L_t\)` | 10.8 | 17.6 |
| `\(\widetilde{L}_{t-1}''\)` | 8.8 | 4.4 | `\(\widetilde{L}_{t-2}''\)` | 6.7 | 4.9 |
| `\(\widetilde{L}_{t+1}''\)` | 8.4 | 4.8 | `\(L_{t-1}\)` | 6.3 | 6.9 |
| `\(\widetilde{L}_{t}''\)` | 8.2 | 4.6 | `\(L_{t+1}\)` | 5.4 | 5.0 |
| Humid | 6.1 | 3.7 | `\(D_t\)` | 4.8 | 3.1 |
| WindE | 5.1 | 8.9 | Temp | 4.1 | 11.5 |
| `\(A_t\)` | 4.7 | 4.9 | Solar | 4.0 | 5.5 |
| WindN | 4.2 | 8.5 | WindE | 3.8 | 5.8 |
| Temp | 3.5 | 3.6 | `\(\widetilde{L}_{t+4}''\)` | 3.2 | 2.1 |
| `\(L_{t-1}\)` | 3.3 | 1.3 | `\(\widetilde{L}_{t+1}''\)` | 3.1 | 2.2 |

]

.pull-right[

Interaction terms:

| Min | EDF | F | Max | EDF | F |
|---|---|---|---|---|---|
| `\(L_t\)`, `\(A_t\)` | 26.7 | 7.8 | `\(L_t\)`, `\(A_t\)` | 37.8 | 28.1 |
| `\(L_t\)`, Solar | 20.7 | 11.2 | `\(L_t\)`, `\(D_t\)` | 25.1 | 20.6 |
| `\(L_t\)`, `\(D_t\)` | 18.3 | 8.2 | Solar, `\(A_t\)` | 15.4 | 6.4 |
| `\(L_t\)`, `\(L_{t+1}\)` | 17.1 | 4.0 | `\(\widetilde{L}_{t}''\)`, `\(\widetilde{L}_{t-1}''\)` | 14.9 | 6.5 |
| `\(\widetilde{L}_{t}''\)`, `\(\widetilde{L}_{t-1}''\)` | 13.5 | 2.5 | Solar, `\(D_t\)` | 11.0 | 8.5 |
| `\(L_t\)`, Temp | 13.3 | 3.1 | `\(\widetilde{L}_{t}''\)`, `\(\widetilde{L}_{t+1}''\)` | 11.0 | 2.4 |
| `\(L_t\)`, `\(L_{t-1}\)` | 13.0 | 4.9 | `\(\widetilde{L}_{t}''\)`, `\(\widetilde{L}_{t+2}''\)` | 10.7 | 2.8 |
| `\(\widetilde{L}_{t}''\)`, `\(\widetilde{L}_{t+1}''\)` | 12.1 | 2.3 | `\(L_t\)`, Solar | 10.0 | 2.9 |
| `\(L_t\)`, `\(W_t\)` | 11.7 | 3.0 | `\(\widetilde{L}_{t}''\)`, Temp | 9.5 | 3.4 |
| `\(\widetilde{L}_{t}''\)`, `\(\widetilde{L}_{t+2}''\)` | 11.4 | 2.9 | `\(L_t\)`, WindE | 9.3 | 2.5 |

]

---

# MLP Feature Importance

<img src="fig/input_feature_freq.svg" width="1000px" style="display: block; margin: auto;" />

---

# Study Design and Evaluation

</br>

.pull-left[

Rolling Window Forecasting Study:

- Length: 12 months (10/2020 - 09/2021)
- 1-month shifts
- Evaluation by RMSE

Competition Design:

- Only 09/2021 is evaluated
- Ranked based on the Score (relative RMSE)

`\begin{align} \text{Score} & = \text{RMSE}(\text{Model}) / \text{RMSE}(\textbf{naive}). \end{align}`
]

.pull-right[

Considered Models:

- .bold[GAM.full]
- .bold[GAM.red]
- .bold[DNN]
- .bold[naive]
- .bold[Combination of GAM.full, GAM.red, and DNN]

Two additional GAM models for diagnostic purposes:

- .bold[GAM.simple]
- .bold[GAM.noWeather]

]

---

# Results

</br>
</br>
| Model | 20/10 | 20/11 | 20/12 | 21/1 | 21/2 | 21/3 | 21/4 | 21/5 | 21/6 | 21/7 | 21/8 | 21/9 | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GAM.full | .1239 (56.3) | .0703 (58.5) | .0497 (56.6) | .0532 (57.6) | .0988 (55.4) | .1202 (57.8) | .1447 (57.2) | .1746 (56.7) | .1368 (56.3) | .1537 (53.9) | .1383 (60.2) | .1163 (56.9) | .1150 (56.9) |
| GAM.red | .1241 (56.2) | .0700 (58.7) | .0496 (56.6) | .0536 (57.3) | .0994 (55.2) | .1201 (57.8) | .1467 (56.6) | .1750 (56.6) | .1385 (55.8) | .1554 (53.4) | .1374 (60.5) | .1162 (56.9) | .1155 (56.7) |
| DNN | .1230 (56.6) | .0704 (58.4) | .0507 (55.7) | .0545 (56.6) | .1027 (53.7) | .1251 (56.1) | .1514 (55.2) | .1653 (59.0) | .1474 (52.9) | .1538 (53.9) | .1422 (59.1) | .1225 (54.6) | .1174 (56.0) |
| Combination | .1221 (56.9) | .0689 (59.3) | .0491 (57.1) | .0527 (58.0) | .0979 (55.8) | .1193 (58.1) | .1443 (57.3) | .1684 (58.2) | .1365 (56.4) | .1512 (54.6) | .1371 (60.6) | .1164 (56.9) | .1137 (57.4) |
| GAM.noWeather | .1301 (54.1) | .0707 (58.2) | .0508 (55.6) | .0533 (57.5) | .1058 (52.3) | .1253 (56.0) | .1471 (56.5) | .1822 (54.8) | .1401 (55.3) | .1584 (52.5) | .1424 (59.1) | .1249 (53.7) | .1193 (55.3) |
| GAM.simple | .1531 (46.0) | .0940 (44.5) | .0584 (49.0) | .0672 (46.5) | .1267 (42.9) | .1500 (47.3) | .1797 (46.8) | .2093 (48.1) | .1733 (44.7) | .1873 (43.8) | .1671 (52.0) | .1413 (47.6) | .1423 (46.7) |
| Naive | .2833 | .1693 | .1144 | .1255 | .2217 | .2847 | .3380 | .4029 | .3131 | .3334 | .3478 | .2699 | .2670 |
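
The values in parentheses can be read as skill scores relative to the .bold[Naive] benchmark, i.e. `\((1 - \text{Score})\cdot 100\)`; for example, .bold[GAM.full] in the competition month 21/9:

`$$\left(1 - \frac{0.1163}{0.2699}\right)\cdot 100 \approx 56.9$$`
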
---
name: wrap

# Wrap-Up

.pull-left[

Estimating high-resolution electricity peak demand using lower-resolution data:

- .bold[GAM.full] and .bold[GAM.red] perform similarly
- .bold[DNN] beats .bold[GAM.full] in some months
- .bold[Combination of GAM.full, GAM.red, and DNN] performs best
- Weather variables improve the skill score by 1.5 percentage points on average
- .bold[DNN] performs better at predicting maximum peak loads

<a id='cite-berrisch2023high'></a><a href='#bib-berrisch2023high'>Berrisch, Narajewski, and Ziel (2023)</a>

]

.pull-right[

We won the competition.

- 42.6% vs. 43.6% (second place)
- Using a slightly different model

.center[
<center>
<img src="frame.png">
</center>
[berrisch.biz/slides/23_09_stat_woche](https://berrisch.biz/slides/23_09_stat_woche/)
]

]

---
name: references

# References

<p><cite><a id='bib-akiba2019optuna'></a><a href="#cite-akiba2019optuna">Akiba, T., S. Sano, T. Yanase, et al.</a> (2019). “Optuna: A next-generation hyperparameter optimization framework”. In: <em>Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining</em>, pp. 2623–2631.</cite></p>

<p><cite><a id='bib-berrisch2023high'></a><a href="#cite-berrisch2023high">Berrisch, J., M. Narajewski, and F. Ziel</a> (2023). “High-resolution peak demand estimation using generalized additive models and deep neural networks”. In: <em>Energy and AI</em> 13, p. 100236.</cite></p>

<p><cite><a id='bib-szekely2007measuring'></a><a href="#cite-szekely2007measuring">Székely, G. J., M. L. Rizzo, and N. K. Bakirov</a> (2007). “Measuring and testing dependence by correlation of distances”. In: <em>The Annals of Statistics</em> 35.6, pp. 2769–2794.</cite></p>