rcpptimer: Rcpp Tic-Toc Timer with OpenMP Support

Jonathan Berrisch

University of Duisburg-Essen, House of Energy Markets and Finance

2024-07-01

How it started

I searched a benchmarking tool for my (R)cpp code

I found RcppClock

RcppClock was slow, so I started accelerating it

But why is it so cool?

It’s simple:

Rcpp::Clock clock;   // Creates clock instance

clock.tick("my_clock");
// Some code I want to benchmark
clock.tock("my_clock");

clock.stop();       // Passes results to R

How did it work?

class Clock {
  private:
    std::vector<std::chrono::steady_clock::time_point> tick_times, tock_times;
    std::vector<std::string> tick_names, tock_names;

  public:
    // start a timer
    void tick(std::string name) {
      tick_names.push_back(name);
      tick_times.push_back(std::chrono::steady_clock::now());
    }

    // stop a timer 
    void tock(std::string name) {
      tock_names.push_back(name);
      tock_times.push_back(std::chrono::steady_clock::now());
    }
}

Incomplete section for illustration purposes. The complete source is available on GitHub

Why was it slow?

void stop() {
  // Fill `timers` and `tickers` with the results
  for (unsigned int i = 0; i < tick_names.size(); ++i) {
    for (unsigned int j = 0; j < tock_names.size(); ++j) {
      if (tick_names[i] == tock_names[j]) {
        timers.push_back(duration(tock_times[j] - tick_times[i]));
        tickers.push_back(tick_names[i]);
        tock_times.erase(tock_times.begin() + j);
        tock_names.erase(tock_names.begin() + j);
        break;
      }
    }
  }
  // Pass data to R
  DataFrame df = DataFrame::create(Named("ticker") = tickers, Named("timer") = timers);
  df.attr("class") = "RcppClock";
  Environment::global_env()["times"] = df;
}

Incomplete section for illustration purposes. The complete source is available on GitHub

Status quo of RcppClock

  • It became faster with commit aa19372, See #1
    • By moving the ‘matching algorithm’ into the .tock method.
    • Durations are now also calculated in .tock
  • It does support overlapping timers
  • It lacks OpenMP Support
  • Results are returned as a (potentially huge) Dataframe to R
    • Results need to be grouped / summarized in R (which is potentially unefficient )
  • It misses some less important options

Improvements RcppTimer

Speed

Improve Speed

Using std::map to match .tick() and .tock() calls is efficient:

std::map<std::string, tp> tickmap;

// Start timer: write name + time into tickmap
void tick(std::string name)
{
    tickmap.insert(std::pair<std::string, tp>(name, sc::high_resolution_clock::now()));
}

// Stop timer: write duration into timers, save key 
void tock(std::string name)
{
    timers.push_back(
      sc::duration_cast<sc::nanoseconds>(
        sc::high_resolution_clock::now() - tickmap[name]
        ).count()
      );
    keys.push_back(name);
}

Commit

Parallelism

OMP Support

At this stage, RcppTimer can’t handle OpenMP parralelism as it can’t distinguish between threads.

Simple solution: Add Threadnumber to the Key of the tickmap.

using key = std::string;
std::map<key, tp> tickmap;

// Start timer: write name + time into tickmap
void tick(std::string name)
{
    tickmap.insert(std::pair<key, tp>(name, sc::high_resolution_clock::now()));
}

Commit

OMP Support

At this stage, RcppTimer can’t handle OpenMP parralelism as it can’t distinguish between threads.

Simple solution: Add Threadnumber to the Key of the tickmap.

using keypair = std::pair<std::string, int>;
std::map<keypair, tp> tickmap;

// Now key is augmented by the thread number
void tick(std::string name)
{
    keypair key(name, omp_get_thread_num());
    tickmap.insert(std::pair<keypair, tp>(key, sc::high_resolution_clock::now()));
}

Commit

OMP Memory Access

OMP Memory Access

Parallelism can cause memory access problems

But we can “lock” objects in OMP threads, before accessing them


using keypair = std::pair<std::string, int>;
std::map<keypair, tp> tickmap;

// Now key is augmented by the thread number
void tick(std::string name)
{
    keypair key(name, omp_get_thread_num());
    
    
    tickmap.insert(std::pair<keypair, tp>(key, sc::high_resolution_clock::now()));
    
}

OMP Memory Access

Parallelism can cause memory access problems

But we can “lock” objects in OMP threads, before accessing them


using keypair = std::pair<std::string, int>;
std::map<keypair, tp> tickmap;

// Now key is augmented by the thread number
void tick(std::string name)
{
    keypair key(name, omp_get_thread_num());
    #pragma omp critical
    {
        tickmap.insert(std::pair<keypair, tp>(key, sc::high_resolution_clock::now()));
    }
}

Convenience

Automatically Return Results to R


Use deconstructor to pass results to R

// Destroy - pass data to R
~Clock()
{
    DataFrame df = DataFrame::create(
        Named("Name") = names,
        Named("Nanoseconds") = timers);
    df.attr("class") = "cppclock";
    Environment env = 
      Environment::global_env();
    env[name] = df;
}

Commit

Scoped Timer

The Timer::ScopedTimer calls

  • tic at instantiation and
  • toc as it goes out of scope
void greet(const std::string lang) {

    Rcpp::Timer timer;                              // Init Timer instance


    if(lang == "german"){
      cout << "Servus, Haselnuss";
    }else{
      cout << "Hello ISF"
    }
}

Commit

Scoped Timer

The Timer::ScopedTimer calls

  • tic at instantiation and
  • toc as it goes out of scope
void greet(const std::string lang) {

    Rcpp::Timer timer;                              // Init Timer instance
    Rcpp::Timer::ScopedTimer isf(timer, "greet");  // Measure exec time

    if(lang == "german"){
      cout << "Servus, Haselnuss";
    }else{
      cout << "Hello ISF"
    }
}

Commit

Efficiency

Efficiency

Returning whole timers and names vectors is unnecessary

  • They may contain lots of durations for the same region

New aggregate() method

  • Extracts unique names from names vector, Iterates over all timers
  • Calculates mean and variance

Destructor calls aggregate():

~clock() stop() { aggregate() Return results }

Commit

Fragmentation

Fragmentation

rcpptimer

Includes cpptimer as submodule

R-Specific Code

  • stop()
  • Warnings

cpptimer

Shared C++ core

Header only

Contains aggregate method

cppytimer

Includes cpptimer as submodule

Python specific code

  • stop()
  • Warnings

Wrap-up

Wrap-up

Simple

//[[Rcpp::depends(rcpptimer)]]
#include <rcpptimer.h>

void main(){
  Rcpp::Timer timer;
  Rcpp::Timer::ScopedTimer _(timer, "ST");

  timer.tic("First");
  timer.tic("Second");
  timer.toc("First");
  timer.tic("Second");
}

Wrap-up

Simple

//[[Rcpp::depends(rcpptimer)]]
#include <rcpptimer.h>

void main(){
  Rcpp::Timer timer;
  Rcpp::Timer::ScopedTimer _(timer, "ST");

  timer.tic("First");
  timer.tic("Second");
  timer.toc("First");
  timer.tic("Second");
}
[R] times
  Name Milliseconds    SD Count
First         0.002 0.001    10
Second        0.048 0.011    10
ST          658.280 0.000    1

Wrap-up

Simple

//[[Rcpp::depends(rcpptimer)]]
#include <rcpptimer.h>

void main(){
  Rcpp::Timer timer;
  Rcpp::Timer::ScopedTimer _(timer, "ST");

  timer.tic("First");
  timer.tic("Second");
  timer.toc("First");
  timer.tic("Second");
}
[R] times
  Name Milliseconds    SD Count
First         0.002 0.001    10
Second        0.048 0.011    10
ST          658.280 0.000    1

Powefull

  • Overlapping timers
  • OMP support
  • Auto-Return to R
  • Reset method
  • Sanity checks
  • Countless speed optimizations
  • Microseconds resolution
  • On Cran: rcpptimer
  • Student presentation