FRTB: Sparking new approaches for big data analytics

The introduction of the Basel Committee’s Fundamental Review of the Trading Book (FRTB) standards involves a comprehensive overhaul of banks’ market risk capital frameworks. The move from value-at-risk (VaR) to scaled expected shortfall (ES) in order to capture tail risk will significantly increase the number and complexity of the capital calculations that banks need to undertake, as well as the sheer volume of data they must manage.

From a computation perspective, this means that P&L vectors need to be generated per risk class, per liquidity horizon and per risk set. Once redundant permutations are removed, the total number of P&L runs comes to 63 (some of which can be done weekly), compared with just two (VaR and stressed VaR) under the current approach.
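To make the per-liquidity-horizon step concrete, here is a minimal sketch in plain Python of how a 97.5% expected shortfall could be taken from a P&L vector and then scaled across the five FRTB liquidity horizons (10, 20, 40, 60 and 120 days) using the square-root cascade in the Basel text. The function names, the input layout and the simple tail-average estimator are assumptions made for illustration. The sketch covers only the liquidity-horizon scaling, not the full capital charge.

```python
import math

# Liquidity horizons (days) and base horizon T from the Basel FRTB text.
LIQUIDITY_HORIZONS = (10, 20, 40, 60, 120)
BASE_HORIZON = 10


def expected_shortfall(pnl, confidence=0.975):
    """Mean of the worst (1 - confidence) share of outcomes, as a positive loss.

    Assumption: a plain tail average; banks may use other estimators.
    """
    if not pnl:
        raise ValueError("P&L vector must not be empty")
    losses = sorted((-x for x in pnl), reverse=True)  # largest loss first
    # round() guards against float noise such as 40 * 0.025 = 1.0000000000000009
    n_tail = max(1, math.ceil(round(len(losses) * (1 - confidence), 9)))
    return sum(losses[:n_tail]) / n_tail


def liquidity_scaled_es(pnl_by_bucket, confidence=0.975):
    """Combine per-bucket ES figures into one liquidity-horizon-scaled ES.

    pnl_by_bucket[j] is the 10-day P&L vector obtained by shocking only the
    risk factors whose liquidity horizon is at least LIQUIDITY_HORIZONS[j].
    """
    if len(pnl_by_bucket) != len(LIQUIDITY_HORIZONS):
        raise ValueError("expected one P&L vector per liquidity horizon")
    total = 0.0
    previous = 0
    for horizon, pnl in zip(LIQUIDITY_HORIZONS, pnl_by_bucket):
        es = expected_shortfall(pnl, confidence)
        # Each term is ES_j^2 * (LH_j - LH_{j-1}) / T; the first weight is 1.
        total += es ** 2 * (horizon - previous) / BASE_HORIZON
        previous = horizon
    return math.sqrt(total)
```

Each of the five inputs is a separate P&L vector, which is one reason the number of runs multiplies once risk classes and risk sets are layered on top.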

Firms are faced with the challenge of performing a significantly increased range of FRTB capital calculations at scale while also managing their costs and risk. The question is: are banks’ current IT risk infrastructures up to the task ahead?

If banks want to achieve proactive, intraday risk management while also managing their capital effectively over the long term, they will need high-performing IT infrastructure that can handle the intensive calculations involved. However, many banks today rely on technologies such as relational databases and in-memory data grids (IMDGs) to conduct risk analytics, aggregation and capital calculations.

IMDGs work by replicating data or logging updates across machines. This requires copying large amounts of data over the cluster network, which has a far lower bandwidth than that of RAM. As a result, IMDGs incur substantial storage overheads, are sub-optimal when applied to pure analytics use cases, such as FRTB analytics, and are expensive to run.

In short, banks’ legacy IT architectures will need a significant overhaul when it comes to FRTB and firms are looking for alternative options. One of those options is Apache Spark, an open source processing engine built around speed, ease of use and sophisticated analytics.

Spark has a distributed programming model based on an in-memory data abstraction called Resilient Distributed Datasets (RDDs), which is purpose-built for fast analytics. RDDs are immutable, support coarse-grained transformations and keep track of the transformations that have been applied to them (their lineage). Immutability rules out a large class of problems caused by concurrent updates from multiple threads, and the lineage can be used to reconstruct an RDD if data is lost. As a result, checkpointing requirements are low in Spark, and caching, sharing and replication are easy. These are significant design wins and there are other advantages over IMDGs too:

  • Memory optimisation: IMDGs require the entire working set to be held in memory and are therefore limited by the physical memory available. Spark can spill to disk when portfolios do not fit into memory, making it far more scalable and resource-efficient.
  • Efficient joins: IMDGs have fixed cubes and cannot do joins across datasets. Spark supports joining multiple datasets natively, which allows more flexible reporting without the need for new cubes and additional memory. Joins perform well because Spark broadcasts the smaller datasets to the worker nodes behind the scenes, using a peer-to-peer, BitTorrent-like protocol.
  • Polyglot analytics: Spark supports custom aggregations and analytics which can be implemented in a variety of languages: Python, Scala, Java or R. IMDGs allow only limited SQL or OLAP expressions.
  • Multi-tenant support: Spark supports dynamic resource allocation, resource management, queues and quotas, allowing multiple users and processes to share the same cluster. Typical workloads include operations reporting, decision support, what-if analysis and back testing.
  • Frugal hardware requirements: The immutable nature of RDDs enables Spark to scale and provide fault tolerance efficiently. A Spark cluster is highly available without the need for Active-Active hardware.
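The lineage idea behind RDDs can be illustrated with a toy model in plain Python. This is not Spark's API: the `ToyDataset` class and its methods are hypothetical names invented for this sketch. It only shows how an immutable dataset that records its transformations can be rebuilt from its source at any time instead of being replicated.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)  # frozen: instances cannot be modified after creation
class ToyDataset:
    """Hypothetical stand-in for an RDD: immutable and aware of its lineage."""
    source: Optional[Tuple[float, ...]] = None  # set only on a root dataset
    parent: Optional["ToyDataset"] = None       # dataset this one derives from
    op: Optional[str] = None                    # "map" or "filter"
    fn: Optional[Callable] = None               # function applied by the op

    def map(self, fn):
        # Coarse-grained transformation: returns a new dataset, changes nothing.
        return ToyDataset(parent=self, op="map", fn=fn)

    def filter(self, fn):
        return ToyDataset(parent=self, op="filter", fn=fn)

    def lineage(self):
        """List the recorded steps from the root down to this dataset."""
        if self.parent is None:
            return ["source"]
        return self.parent.lineage() + [self.op]

    def compute(self):
        """Rebuild the data by replaying the lineage from the root."""
        if self.parent is None:
            return list(self.source)
        data = self.parent.compute()
        if self.op == "map":
            return [self.fn(x) for x in data]
        return [x for x in data if self.fn(x)]


# Usage: derive a loss vector from a small, made-up P&L vector.
pnl = ToyDataset(source=(-5.0, 2.0, -1.0, 4.0))
losses = pnl.filter(lambda x: x < 0).map(lambda x: -x)
print(losses.lineage())
print(losses.compute())
```

Because nothing is ever updated in place, a lost result can be recomputed by calling `compute()` again, which is the property that keeps checkpointing and replication needs low.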

In fact, our own studies have demonstrated many of these capabilities and shown the power of Spark in terms of performance, scalability and flexibility. For example, we recently completed a proof-of-concept with a European bank for our capital analytics and aggregation engine, FRTB Studio. It showed that the engine can produce the capital charges for the internal models approach (IMA) and the standardised approach (SA) in single-digit seconds, based on a portfolio of one million trades with 9 million sensitivities and 18 million P&L vectors, on hardware costing just USD 20,000.

As one of the most active projects in the Apache Software Foundation, Spark benefits from thousands of contributors continuously enhancing the platform. In fact, we’ve seen a 20% improvement in Spark aggregation performance year-on-year since we started building our solutions on the platform in 2016. We’re excited to see the improvements that are bound to come in the year ahead!



 
Tel: +44 20 562 6630

 
paul.jones@ihsmarkit.com