Using semantic technologies for GRC in the financial industry

July 4, 2016

Peter G Cowap and Professor Tom Butler from the GRC Technology Centre talk about how semantic technologies can be employed to solve the problems of regulatory compliance and risk management in the financial industry…

Much has changed in the financial industry since 2008. In 2016, the industry faces higher levels of regulatory oversight than ever before. This is reflected in the volume, variety, breadth, and complexity of the regulations that must be complied with, which place enormous burdens on financial organisations of all shapes and sizes. Compliance with regulatory principles and rules has become more problematic, and so has the management of risks, particularly operational risk sub-categories such as conduct risk and cyber risk. Thus, the problems of governance, risk management, and compliance reporting have grown to almost unmanageable proportions.

In response to these challenges, the Governance Risk and Compliance Technology Centre (GRCTC) was instituted in 2013 by Enterprise Ireland and the IDA, on behalf of the Irish Government. Its mission was to investigate how semantic technologies could be employed to solve the problems of regulatory compliance and risk management in the financial industry. In this, the GRCTC is unique as it is the only industry-led, multi-institutional, and multi-disciplinary research centre to be addressing such problems.

Avoiding the groundhog day of regulatory compliance in the financial industry

It is clear from our research at the GRCTC that the way financial institutions deal with the mountain of ever more complex regulations resembles the labours of the mythical Greek king Sisyphus, or the experiences of Phil Connors, the central character in the film Groundhog Day. Financial institutions are seemingly repeating the same regulatory compliance process over and over and are not developing a cumulative knowledge of financial regulations and rules. They are effectively stuck in what Harvard Business School Professor Chris Argyris called single-loop learning. Take the Markets in Financial Instruments Directive (MiFID), for example. Researchers at the GRCTC found no evidence of cumulative, double-loop learning in major financial institutions as they began to address MiFID II. And so it will be with MiFID III, when it comes along, because organisational mental models and knowledge bases are not being enriched by the experience of unpacking MiFID II.

The multi-disciplinary team of researchers at the GRC Technology Centre has developed a standards-based methodology for unpacking regulations and extracting the knowledge contained therein. The methodology’s semantic model and related guidelines form the architectural basis of a prototype software application called Ganesha, named after the Hindu god associated with wisdom and learning and known as the remover of obstacles. Ganesha is designed to be both a regulatory compliance information system and a knowledge base. The methodology, guidelines, and Ganesha application are based on the Semantics of Business Vocabulary and Rules (SBVR), an Object Management Group (OMG) standard. They enable business practitioners to capture regulations in a Regulatory Natural Language (RNL) that is both human readable (in structured English) and machine readable in XML and, in a future version, RDF/OWL. Essentially, Ganesha stores the unstructured data of a regulatory text or rule as semantically enriched structured data in XML and, ultimately, in an RDF triple store or knowledge base. The XML data store can then be queried using XQuery, and the RDF knowledge base using SPARQL, to identify obligations, derogations, exemptions, exclusions, etc. expressed as regulatory rules. The advantage of this approach is that an organisation can cumulatively build a knowledge base of regulations represented as regulatory vocabulary elements and regulatory rules.
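To make the idea of querying a structured rule store concrete, here is a minimal sketch in Python. It assumes a drastically simplified XML layout: the element names, the `modality` attribute, and the sample rules are all invented for illustration and are not the SBVR XML schema or Ganesha's actual format. The standard library's XPath subset stands in for XQuery.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule store. Element names, attributes, and rule wording are
# invented for this sketch; they are not SBVR XML or Ganesha output.
RULES_XML = """
<regulation name="Example Directive">
  <rule id="r1" modality="obligation">
    <text>It is obligatory that an investment firm keeps a record of each client order.</text>
  </rule>
  <rule id="r2" modality="prohibition">
    <text>It is prohibited that an investment firm misuses client order information.</text>
  </rule>
  <rule id="r3" modality="exemption">
    <text>It is permitted that a small firm reports annually instead of quarterly.</text>
  </rule>
</regulation>
"""


def rules_by_modality(xml_text, modality):
    """Return (rule id, rule text) pairs for every rule of the given modality."""
    root = ET.fromstring(xml_text)
    # ElementTree supports attribute predicates such as [@modality='obligation'].
    matches = root.findall(f".//rule[@modality='{modality}']")
    return [(rule.get("id"), rule.findtext("text").strip()) for rule in matches]


if __name__ == "__main__":
    for rule_id, text in rules_by_modality(RULES_XML, "obligation"):
        print(rule_id, text)
```

Because each rule carries its modality as structured data, the same store can answer "list all exemptions" or "list all prohibitions" by changing one argument, rather than by re-reading the regulatory text.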

There are currently no solutions on the market that enable legal and financial subject matter experts (SMEs) to capture, store, and share knowledge of regulatory compliance imperatives according to an international standard, and to map these onto business vocabularies and rules in order to draft governance policies and inform regulatory compliance. Ganesha currently provides the capabilities for legal or financial SMEs to cumulatively build disambiguated and clarified vocabularies and rules in a regulatory natural language (RNL) that is complete, logical, and free of legalese, complexity, and ambiguity. We are also developing the capability to use a combination of semantic technologies, such as our Financial Industry Regulatory Ontology (FIRO), and standard Natural Language Processing (NLP) tools to identify and extract obligations, prohibitions, and so on, and to load them into Ganesha in a semi-structured XML format, thereby automating part of this process. Building on these innovations, sophisticated tagging of knowledge in SBVR-compliant XML documents by the Ganesha application provides powerful capabilities for practitioners to query, extract, transform, and load regulatory compliance information into a user interface for the purpose of knowledge sharing or training inexperienced personnel. Future versions will enable the creation of knowledge bases using semantic technologies such as RDF and triple stores.

Navigating the digital labyrinth of structured and unstructured data

At the core of many of the problems financial institutions face in managing risk is the manner in which they manage their data, both structured and unstructured. In contrast with regulatory oversight, this approach has seen little change since 2008. It remains to be seen whether BCBS 239 will herald a new era for risk data governance, let alone risk data aggregation.

In an industry that generates more data, and spends more on its storage than any other, there still persists a basic inability to manage that data, to interconnect it, link it with external information, and to make inferences from disparate and diverse data, wherever it exists. This makes risk management and compliance reporting hugely problematic and expensive.

With few exceptions, the current fragmented offerings from the FinTech sector are merely adding to the digital labyrinth, as new structured and unstructured data silos are being created. The same can be said of the nascent RegTech sector in terms of offering comprehensive solutions for the particular problems faced by the financial industry. Thus firms are not addressing the core problems of developing a common language for their data, or agreed conceptions of the risks they face, that would, in turn, enable data integration and make risk data aggregation a reality.

The solution to the problem of the digital labyrinth is technically feasible and practically possible, although there are few players in the market providing comprehensive solutions for the financial industry. One approach that is receiving much attention is data virtualisation. This approach provides access to data directly from one or more disparate data sources, without physically moving the data, and presents it in a form that hides the technical complexity from the end user. There is broad agreement across industry sectors that semantic metadata is required to make data virtualisation and other NoSQL approaches work.
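The role of semantic metadata in data virtualisation can be illustrated with a small Python sketch. It assumes two in-memory SQLite databases standing in for separate silos with different schemas; the table names, column names, figures, and the `LossEvent` concept are all hypothetical. A mapping from the shared concept to each silo's own columns lets one query run against both sources in place, with nothing copied into a central store.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an in-memory database that stands in for one data silo."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


# Two silos describing the same kind of thing with different schemas (invented data).
trading = make_source(
    "CREATE TABLE op_loss (evt_id TEXT, amt_eur REAL)",
    "INSERT INTO op_loss VALUES (?, ?)",
    [("T-1", 1200.0), ("T-2", 300.0)],
)
retail = make_source(
    "CREATE TABLE incidents (ref TEXT, loss_amount REAL)",
    "INSERT INTO incidents VALUES (?, ?)",
    [("R-1", 5000.0), ("R-2", 50.0)],
)

# Semantic metadata: one shared concept mapped onto each silo's table and columns.
MAPPINGS = {
    "LossEvent": [
        {"source": trading, "table": "op_loss",
         "columns": {"id": "evt_id", "amount": "amt_eur"}},
        {"source": retail, "table": "incidents",
         "columns": {"id": "ref", "amount": "loss_amount"}},
    ]
}


def virtual_query(concept, min_amount=0.0):
    """Query every mapped source in place and return records in the shared vocabulary."""
    results = []
    for mapping in MAPPINGS[concept]:
        cols = mapping["columns"]
        sql = (
            f"SELECT {cols['id']}, {cols['amount']} "
            f"FROM {mapping['table']} WHERE {cols['amount']} >= ?"
        )
        for record_id, amount in mapping["source"].execute(sql, (min_amount,)):
            results.append({"id": record_id, "amount": amount})
    return results


if __name__ == "__main__":
    print(virtual_query("LossEvent", min_amount=1000.0))
```

In this sketch the mapping is a plain dictionary; the text's argument is that expressing it in a shared, standards-based vocabulary is what allows such mappings to be agreed and reused across systems.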

In our view, semantic metadata should be expressed in the Web Ontology Language (OWL). Indeed, this is exactly the approach adopted by the EDM Council in its Financial Industry Business Ontology (FIBO). In the area of operational risk, we have created the Financial Industry Operational Risk Ontology (FiORO). Two approaches are then available to underpin data virtualisation. In the first, the ontology-based semantic metadata is used to develop SPARQL queries, which are translated into SQL queries that run against the relational database silos and extract the data of interest for further analysis, without disruption to the financial systems of record. In the second approach, semantic technologies are used to perform Extraction, Transformation and Loading (ETL) of data from heterogeneous data stores, such as relational databases and spreadsheets, into RDF, which is then persisted in a triple store, one type of graph data store. A combination of semantic metadata and instance data in the RDF triple store will enable enhanced querying and relationship identification among data. The existence of semantic metadata in OWL will also enable sophisticated inferencing over the data, to identify previously unknown relationships. Take, for example, FiORO, which could be applied to classify operational risk data with greater precision through the semantic enrichment of multiple relational databases and other data stores. It could also enable enhanced predictive analytics using the data. The second option also enables unstructured data to be queried.
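The second approach, ETL into triples followed by inferencing, can be sketched in a few lines of plain Python. This is an illustration only: the class hierarchy is invented and is not taken from FiORO, the `ex:` identifiers are placeholders, and a set of tuples stands in for a real RDF triple store and OWL reasoner. The inference shown is the simplest useful kind, propagating types up a subclass hierarchy.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical class hierarchy (not FiORO), expressed as triples.
ontology = {
    ("ex:CyberRisk", SUBCLASS_OF, "ex:OperationalRisk"),
    ("ex:ConductRisk", SUBCLASS_OF, "ex:OperationalRisk"),
    ("ex:PhishingAttack", SUBCLASS_OF, "ex:CyberRisk"),
}

# Rows as they might come out of a relational table or spreadsheet (invented).
rows = [
    {"id": 1, "category": "PhishingAttack", "loss": 2500},
    {"id": 2, "category": "ConductRisk", "loss": 900},
]


def rows_to_triples(records):
    """Transform step: turn each record into subject-predicate-object triples."""
    triples = set()
    for row in records:
        subject = f"ex:event/{row['id']}"
        triples.add((subject, RDF_TYPE, f"ex:{row['category']}"))
        triples.add((subject, "ex:lossAmount", row["loss"]))
    return triples


def infer_types(triples):
    """Add every type implied by the subclass hierarchy, until nothing new appears."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_pairs = [(s, o) for s, p, o in inferred if p == SUBCLASS_OF]
        for s, p, o in list(inferred):
            if p != RDF_TYPE:
                continue
            for sub, sup in subclass_pairs:
                if o == sub and (s, RDF_TYPE, sup) not in inferred:
                    inferred.add((s, RDF_TYPE, sup))
                    changed = True
    return inferred


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    hits = [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]
    return sorted(hits, key=lambda t: (t[0], t[1], str(t[2])))


if __name__ == "__main__":
    graph = infer_types(ontology | rows_to_triples(rows))
    for triple in match(graph, p=RDF_TYPE, o="ex:OperationalRisk"):
        print(triple)
```

Neither source row mentions operational risk, yet both events are returned by the final query because the hierarchy implies it. That is the sense in which semantic metadata can surface relationships that are not stated in the instance data.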

One of the clear benefits of a semantic metadata model in OWL, with instance data in RDF identified by Uniform Resource Identifiers (URIs), is that such models can be linked automatically with related semantic models like the Financial Industry Business Ontology (FIBO) and any other standards-based knowledge base. Also, unlike traditional SQL-based approaches, the model can be extended easily. In addition, adopting such an approach avoids the danger of floundering in the digital labyrinth.

Peter G Cowap

Centre Director

Professor Tom Butler

Centre Principal Investigator

GRC Technology Centre

info@grctc.com

www.grctc.com

https://twitter.com/GRCTC_Tweet

Please note: this is a commercial profile