Data-Driven Identification of Power Plant Operation States Using Clustering

Jonathan Berrisch, Philipp Castro, Maike Spilger, Florian Ziel, Christoph Weber

University of Duisburg-Essen, House of Energy Markets and Finance

2025-01-21

Outline

  EfeMOD Project

  Motivation and Objective

  Data

  Empirical Approach

  Results

  Conclusion / Outlook

EfeMOD

Empirisch fundierte Elektrizitätsmarkt-Modellierung mit Open Data (Empirically Grounded Electricity Market Modelling with Open Data)

Project Entities:

Chair of Prof. Dr. Christoph Weber (Management Sciences and Energy Economics)

Chair of Prof. Dr. Florian Ziel (Data Science in Energy and Environment)

Project Goal:

Use publicly available data (particularly ENTSO-E Transparency Platform) to estimate parameters for energy system and energy market models.

Motivation and Objective

Identification of Power Plant Operation States Using Clustering

Gain Knowledge about the Power Plant Characteristics

  • Operation Points,
  • Efficiency
  • Capacity, etc.

This Presentation:

Identify Operation States:

  • Stable Operation
  • Startup
  • Minimum-Stable Operation, etc.

Provide these characteristics to other researchers

e.g. to estimate efficiency

Data

ENTSO-E Data:

  • ActualGenerationOutputPerGenerationUnit_16.1.A
  • UnavailabilityOfGenerationUnits_15.1.A_B

We focus on natural gas units:

  • 63 units in DE_LU bidding zone
  • 299 units across all bidding zones

We use recent data:

  • 2020-01-01 until “now”

Data

Heizkraftwerk Lausward

Location: Düsseldorf

Block Anton (Block AGuD)

Combined cycle gas turbine (CCGT)

Electrical output: 103 MW

Up to 75 MW of district heat can be extracted

Efficiency: 54%

Fuel Utilization Rate: 87% (with district heating)

Erdgaskraftwerk Emsland

Location: Lingen (Ems)

Block C

Combined cycle gas turbine (CCGT)

Electrical output: 475 MW

Efficiency: 46%

Black-start capable.

Empirical Approach

Overview

Empirical identification of states

3-Step Approach:

  • Prior Partitioning
    • We create preliminary clusters
    • They will be used to initialize the main clustering
  • Main Clustering
    • Gaussian Model Based Clustering
  • Label Assignment
    • We assign meaningful labels to the final clusters

Empirical Approach

Prior Partitioning

Divide the space in meaningful partitions:

Define the capacity: \(\zeta = \max(t0)\)

Define a threshold: \(\gamma = \frac{\zeta}{50}\)

  • \(\pm \gamma\) around the diagonal: Stable
  • \(t0 < 1\) & \(t1 < 1\): Zero
  • \(t0 < \gamma\) & \(t1 > 1\): Startup
  • \(t0 > 1\) & \(t1 < \gamma\): Shutdown
  • \(t1 > t0\): Ramp-Up
  • \(t1 < t0\): Ramp-Down

We project Stable observations onto the diagonal, Startup observations onto the \(t1\) axis, and Shutdown observations onto the \(t0\) axis for the next step.
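The partition rules above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' R code. It assumes that \(t0\) is the output in one period and \(t1\) the output in the following period, that the band "around the diagonal" means \(|t1 - t0| \le \gamma\), and that the rules are checked in the order Zero, Startup, Shutdown, Stable, Ramp-Up, Ramp-Down (the slide does not state a precedence). The function name `prior_partition` and the sample series are hypothetical.

```python
def prior_partition(t0, t1, zeta):
    """Assign one (t0, t1) pair to a preliminary region."""
    gamma = zeta / 50.0  # threshold gamma = zeta / 50
    if t0 < 1 and t1 < 1:
        return "Zero"
    if t0 < gamma and t1 > 1:
        return "Startup"
    if t0 > 1 and t1 < gamma:
        return "Shutdown"
    if abs(t1 - t0) <= gamma:  # assumed meaning of "+/- gamma around the diagonal"
        return "Stable"
    if t1 > t0:
        return "Ramp-Up"
    return "Ramp-Down"


# Hypothetical output series in MW (not real plant data).
output = [0.0, 0.0, 30.0, 60.0, 61.0, 40.0, 0.5]
pairs = list(zip(output[:-1], output[1:]))  # (t0, t1) = (this period, next period)
zeta = max(t0 for t0, _ in pairs)           # capacity: zeta = max(t0)
regions = [prior_partition(t0, t1, zeta) for t0, t1 in pairs]
print(regions)
```

Checking Zero first keeps idle periods out of the Stable band, which they would otherwise fall into because they also lie on the diagonal.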

Empirical Approach

Prior Partitioning

Model-Based Clustering of the Regions using mclust::Mclust in R.

  • Stable: 2-5 Clusters
  • Ramp Up: 2-4 Clusters
  • Ramp Down: 2-4 Clusters

Obtain finite mixture distribution:

\[\sum_{k=1}^{G}{\pi_k f_k (\mathbf{x}; \mathbf{\theta}_k)}\]

  • \(f_k\): density of the \(k\)-th component
  • \(\pi_k\): mixture weights
  • \(\theta_k\): parameters of the \(k\)-th component density
  • \(G\): number of components

Empirical Approach

Prior Partitioning

\[f(\mathbf{x}; \mathbf{\Psi}) = \sum_{k=1}^{G}{\pi_k \phi (\mathbf{x}; \mathbf{\mu}_k, \mathbf{\Sigma}_k)}\]

\(\phi(\cdot)\) Multivariate Gaussian density

Maximum Likelihood Estimation via Expectation Maximization (EM) algorithm

Log-likelihood for Gaussian Mixture Models (GMMs):

\[\begin{align} \ell(\Psi) = \sum_{i=1}^n \log \left\{ \sum_{k=1}^G \pi_k \phi(x_i; \mu_k, \Sigma_k) \right\} \end{align}\]
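As a concrete reading of the mixture density and its log-likelihood, the following Python sketch evaluates both for bivariate observations. It is a minimal illustration, not the mclust implementation; the function names and all parameter values are hypothetical.

```python
import math


def gaussian_pdf_2d(x, mu, sigma):
    """Bivariate normal density phi(x; mu, Sigma) for a symmetric 2x2 Sigma."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), using the 2x2 inverse.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def mixture_density(x, weights, means, covs):
    """f(x; Psi) = sum_k pi_k * phi(x; mu_k, Sigma_k)."""
    return sum(w * gaussian_pdf_2d(x, m, s) for w, m, s in zip(weights, means, covs))


def log_likelihood(data, weights, means, covs):
    """l(Psi) = sum_i log f(x_i; Psi)."""
    return sum(math.log(mixture_density(x, weights, means, covs)) for x in data)


# Hypothetical two-component mixture: an idle cluster and a near-diagonal cluster.
weights = [0.4, 0.6]
means = [(0.0, 0.0), (80.0, 80.0)]
covs = [((1.0, 0.0), (0.0, 1.0)), ((25.0, 20.0), (20.0, 25.0))]
data = [(0.1, -0.2), (79.0, 81.0), (82.0, 80.5)]

density_at_origin = mixture_density((0.0, 0.0), weights, means, covs)
loglik = log_likelihood(data, weights, means, covs)
print(density_at_origin, loglik)
```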

We reformulate this log-likelihood as a complete-data log-likelihood in order to apply the EM algorithm

\[\begin{align} \ell_{\mathcal{C}}(\Psi) = \sum_{i=1}^n \sum_{k=1}^G z_{ik} \left\{ \log \pi_k + \log \phi(x_i; \mu_k, \Sigma_k) \right\} \end{align}\]

\[\begin{align} z_{ik} = \begin{cases} 1 & \text{if } x_i \text{ belongs to component }k \\ 0 & \text{otherwise.} \end{cases} \end{align}\]

E-Step:

\[\begin{align} \hat{z}_{ik} = \frac{\hat{\pi}_k \phi(x_i; \hat{\mu}_k, \hat{\Sigma}_k)}{\sum_{g=1}^{G} \hat{\pi}_g \phi(x_i; \hat{\mu}_g, \hat{\Sigma}_g)}, \end{align}\]

M-Step:

\[\begin{align} \quad \hat{\mu}_k = \frac{\sum_{i=1}^{n} \hat{z}_{ik} x_i}{n_k}, \quad \text{where} \quad n_k = \sum_{i=1}^{n} \hat{z}_{ik}. \end{align}\]
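The E- and M-steps can be illustrated with a small univariate example in Python. This is a sketch under simplifying assumptions, not the mclust implementation: the data are one-dimensional, the iteration count is fixed, and the weight and variance updates (\(\hat{\pi}_k = n_k / n\) and the \(\hat{z}_{ik}\)-weighted variance) are the standard updates for an unconstrained Gaussian mixture, which the slide does not show. All names and data are hypothetical.

```python
import math


def normal_pdf(x, mu, var):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, mus, variances, weights, n_iter=50):
    """Run a fixed number of EM iterations for a univariate Gaussian mixture."""
    mus, variances, weights = list(mus), list(variances), list(weights)
    n, G = len(data), len(mus)
    for _ in range(n_iter):
        # E-step: posterior membership probabilities z_hat[i][k].
        z = []
        for x in data:
            num = [weights[k] * normal_pdf(x, mus[k], variances[k]) for k in range(G)]
            total = sum(num)
            z.append([v / total for v in num])
        # M-step: z_hat-weighted updates of mean, variance and mixture weight.
        for k in range(G):
            n_k = sum(z[i][k] for i in range(n))
            mus[k] = sum(z[i][k] * data[i] for i in range(n)) / n_k
            var_k = sum(z[i][k] * (data[i] - mus[k]) ** 2 for i in range(n)) / n_k
            variances[k] = max(var_k, 1e-6)  # guard against a collapsing component
            weights[k] = n_k / n
    return mus, variances, weights


# Hypothetical outputs in MW: an idle group and a group near 80 MW.
data = [0.0, 0.5, 1.0, 0.2, 79.0, 80.0, 81.5, 80.5]
mus, variances, weights = em_gmm_1d(data, [10.0, 70.0], [100.0, 100.0], [0.5, 0.5])
print(mus, variances, weights)
```

With well-separated groups, the estimated means settle on the two group averages and the weights on the group shares.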

Empirical Approach

Prior Partitioning

Initialization

We initialize the EM algorithm (E-Step) using the partitions obtained from model-based agglomerative hierarchical clustering (MBAHC)

Estimation

The Bayesian information criterion (BIC) is used for model selection
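To make the selection step concrete, the sketch below computes the BIC in the "larger is better" form \(2\,\ell(\hat{\Psi}) - \nu \log n\) that mclust reports, where \(\nu\) is the number of free parameters. It assumes unconstrained covariance matrices; mclust additionally compares constrained covariance structures, which this sketch ignores. The log-likelihood values are made up for illustration.

```python
import math


def n_params_gmm(G, d):
    """Free parameters of a G-component, d-dimensional GMM with unconstrained covariances."""
    return (G - 1) + G * d + G * d * (d + 1) // 2


def bic(loglik, n_params, n_obs):
    """BIC in the 'larger is better' convention: 2 * loglik - n_params * log(n_obs)."""
    return 2.0 * loglik - n_params * math.log(n_obs)


# Hypothetical maximised log-likelihoods for G = 2, 3, 4 on 1000 bivariate points.
logliks = {2: -5200.0, 3: -5100.0, 4: -5090.0}
scores = {G: bic(ll, n_params_gmm(G, 2), 1000) for G, ll in logliks.items()}
best_G = max(scores, key=scores.get)
print(scores, best_G)
```

In this made-up example the fourth component improves the fit too little to offset its six extra parameters, so three components are selected.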

Prior Partitioning Results

Right graph shows prior clusters.

Empirical Approach

Main Clustering

MBAHC

Prior Clusters are used in MBAHC

The results of the MBAHC are used to initialize the EM Algorithm in the main Gaussian Model Based Clustering

Main Clustering Results

Right graph shows Maximum A Posteriori (MAP) Classification

Colour indicates the cumulative log-density over all components.

Empirical Approach

Label Assignment

We assign labels to the clusters using their mean \(\mu\) and correlation \(\rho\)

Multiple clusters may describe one Generation State (e.g., along the diagonal)
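One way to obtain a correlation \(\rho\) per cluster is from the fitted component covariance \(\Sigma_k\), via \(\rho = \Sigma_{12} / \sqrt{\Sigma_{11} \Sigma_{22}}\). Whether the authors use the fitted covariance or an empirical correlation is not stated, so the Python sketch below is an assumption, with a hypothetical covariance matrix.

```python
import math


def correlation_from_cov(sigma):
    """Correlation implied by a 2x2 covariance matrix."""
    (s00, s01), (_, s11) = sigma
    return s01 / math.sqrt(s00 * s11)


# Hypothetical covariance of one near-diagonal cluster.
sigma_k = ((25.0, 20.0), (20.0, 25.0))
rho = correlation_from_cov(sigma_k)
print(rho)
```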

# A tibble: 6 × 4
  classification      mu_t0      mu_t1      cor
           <int>      <dbl>      <dbl>    <dbl>
1              1 -0.0000290 -0.0000338 -0.00562
2              2 33.6       33.8        0.703  
3              3 10.5       48.0        0.795  
4              4 83.2       88.4        0.821  
5              5 82.4       82.4        1.00   
6              6 80.5       80.1        0.978  

\[\begin{align} \text{State} = \begin{cases} \text{Zero} & (\mu_{t0} < 1) \land (\mu_{t1} < 1), \\ \text{MSO} & \left[ (\mu_{t0} > \zeta/10) \land (\mu_{t1} > \zeta / 10) \land (\left| \mu_{t0} - \mu_{t1} \right| < \zeta / 10) \right]\\ & \rightarrow \operatorname{argmin}(\mu_{t0} + \mu_{t1}), \\ \text{Max Capacity} & \rightarrow \operatorname{argmax}(\mu_{t0} + \mu_{t1}), \\ \text{Startup} & (\mu_{t1} \geq \zeta / 10) \land (\mu_{t0} < \gamma) \land (\rho < 0.3), \\ \text{Shutdown} & (\mu_{t0} \geq \zeta / 10) \land (\mu_{t1} < \gamma) \land (\rho < 0.3), \\ \text{Stable Operation} & \text{Remaining clusters with } \rho > 0.8, \\ \text{Ramp Up} & \text{Remaining clusters: } \mu_{t1} > \mu_{t0}, \\ \text{Ramp Down} & \text{Remaining clusters: } \mu_{t1} < \mu_{t0}. \end{cases} \end{align}\]
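The labelling rules can be applied to the six cluster summaries from the table with a short Python sketch. Several points are assumptions rather than the authors' implementation: the cases are evaluated in the order listed, each cluster receives exactly one label, the MSO candidates must have means close to the diagonal (\(|\mu_{t0} - \mu_{t1}| < \zeta/10\)), and \(\zeta = 100\) is a placeholder capacity because the capacity behind the table is not given.

```python
def assign_labels(clusters, zeta):
    """Map cluster id -> state label from cluster means and correlation."""
    gamma, tenth = zeta / 50.0, zeta / 10.0
    labels = {}

    def total(c):
        return c["mu_t0"] + c["mu_t1"]

    for c in clusters:
        if c["mu_t0"] < 1 and c["mu_t1"] < 1:
            labels[c["id"]] = "Zero"

    # MSO: lowest cluster among those clearly above zero and close to the diagonal.
    near_diag = [c for c in clusters if c["id"] not in labels
                 and c["mu_t0"] > tenth and c["mu_t1"] > tenth
                 and abs(c["mu_t0"] - c["mu_t1"]) < tenth]
    if near_diag:
        labels[min(near_diag, key=total)["id"]] = "MSO"

    # Max Capacity: highest cluster among those still unlabelled.
    rest = [c for c in clusters if c["id"] not in labels]
    if rest:
        labels[max(rest, key=total)["id"]] = "Max Capacity"

    for c in clusters:
        if c["id"] in labels:
            continue
        if c["mu_t1"] >= tenth and c["mu_t0"] < gamma and c["cor"] < 0.3:
            labels[c["id"]] = "Startup"
        elif c["mu_t0"] >= tenth and c["mu_t1"] < gamma and c["cor"] < 0.3:
            labels[c["id"]] = "Shutdown"
        elif c["cor"] > 0.8:
            labels[c["id"]] = "Stable Operation"
        elif c["mu_t1"] > c["mu_t0"]:
            labels[c["id"]] = "Ramp Up"
        elif c["mu_t1"] < c["mu_t0"]:
            labels[c["id"]] = "Ramp Down"
    return labels


# Cluster summaries copied from the table above.
clusters = [
    {"id": 1, "mu_t0": -0.0000290, "mu_t1": -0.0000338, "cor": -0.00562},
    {"id": 2, "mu_t0": 33.6, "mu_t1": 33.8, "cor": 0.703},
    {"id": 3, "mu_t0": 10.5, "mu_t1": 48.0, "cor": 0.795},
    {"id": 4, "mu_t0": 83.2, "mu_t1": 88.4, "cor": 0.821},
    {"id": 5, "mu_t0": 82.4, "mu_t1": 82.4, "cor": 1.00},
    {"id": 6, "mu_t0": 80.5, "mu_t1": 80.1, "cor": 0.978},
]
labels = assign_labels(clusters, zeta=100.0)  # zeta is a placeholder value
print(labels)
```

Under these assumptions, clusters 5 and 6 both end up as Stable Operation, an instance of several clusters describing one generation state along the diagonal.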

Empirical Approach

Label Assignment

Right graphs show assigned states

The points are coloured according to

  • MAP
  • Probability (each pure colour reflects a probability of 1)

Some points below/above the diagonal are assigned to Ramp Up/Ramp Down

  • Can be easily fixed for MAP
  • Fixing the probabilistic predictions is less straightforward

Empirical Approach

Label Assignment

Fixing assignments

Relabeling Ramp Up and Ramp Down MAP predictions is trivial:

\[\begin{align} \text{State} = \begin{cases} \text{Ramp Up} & x_{t1} > x_{t0}, \\ \text{Ramp Down} & x_{t1} < x_{t0}. \end{cases} \end{align}\]

Fixing the probability array is more involved:

Find observations with \(x_{t1} < x_{t0}\); these cannot be “Ramp Up”:

Set the probability of all Ramp Up clusters to \(0\) for these observations.

Normalize the remaining probabilities so that they sum to \(1\).
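The probability fix for a single observation can be sketched in Python as follows. The symmetric treatment of “Ramp Down” for \(x_{t1} > x_{t0}\) and the fallback when no probability mass remains are assumptions added for illustration; the function name and the example probabilities are hypothetical.

```python
def fix_ramp_probabilities(x_t0, x_t1, probs, labels):
    """Zero out impossible ramp states for one observation, then renormalise."""
    fixed = list(probs)
    for k, label in enumerate(labels):
        impossible_up = label == "Ramp Up" and x_t1 < x_t0
        impossible_down = label == "Ramp Down" and x_t1 > x_t0  # assumed symmetric rule
        if impossible_up or impossible_down:
            fixed[k] = 0.0
    total = sum(fixed)
    if total == 0.0:
        return list(probs)  # assumption: keep the original probabilities if nothing remains
    return [p / total for p in fixed]


# One hypothetical observation with falling output and three labelled clusters.
labels = ["Stable Operation", "Ramp Up", "Ramp Down"]
probs = [0.5, 0.3, 0.2]
fixed = fix_ramp_probabilities(60.0, 55.0, probs, labels)
map_label = labels[max(range(len(fixed)), key=fixed.__getitem__)]
print(fixed, map_label)
```

The MAP label is then the state with the largest corrected probability, so the MAP and probabilistic outputs stay consistent.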

Outlook



  • The approach works in general
  • Conceptually simple
  • Label assignment needs some more work
  • Probabilistic statements may need adjustments for Ramp-Up / Ramp-Down predictions
  • Some kind of validation would be desirable
  • Results will be used in part in another research project within the EfeMOD project