3 Aug, 2021

Why is AI so difficult in finance?

By Gavin Cheung, 03/08/2021

The last decade has seen a revolution in the field of AI stemming from advancements in machine learning (ML), deep learning and computer architecture.

These developments have been applied to a wide variety of fields, such as computer vision, natural language processing, drug discovery, bioinformatics, self-driving vehicles, and recommender systems.

Even more impressive, some of these systems can run on a personal laptop. For example, the figure below shows a demonstration of YOLOv3, a system designed for real-time object recognition. It is powered by a complex deep learning model tailored to that task, yet it can still run on a laptop.

Figure 1: Demonstrating use of AI for object recognition.
Source: https://www.youtube.com/watch?v=BNHJRRUKMa4

Given how difficult these applications are, it is natural to ask what AI can do in a financial setting, with its significant complexities and the economic risk of getting things wrong. At Arabesque AI, our goal is to use AI to power customised, sustainable investing. We implement AI in our Engine, which forecasts the price movement of equities at a given point in the future. In this article, we explore some of the difficulties we face in applying machine learning to finance.

Data quality

The first topic to discuss is data quality. In the financial world, there is no shortage of data. It can easily be collected every second from a wide variety of sources, such as instrument prices, news articles, stock fundamentals, social media posts, macroeconomic data, satellite images, ESG data, credit card transactions, footfall traffic and so on. Some of this data is structured: it is typically numerical and has a well-defined format (e.g. stock prices). Structured data is relatively easy to feed into an ML model. Other data is unstructured: it does not come in a pre-defined format and often requires extra processing to extract meaningful information (e.g. news articles, social media posts or images). The difficulty of this extraction process is clear if we take a news article that discusses ‘apples’ as an example. A human would easily recognise that the article is talking about ‘apples’ the fruit rather than Apple (NASDAQ:AAPL), but it is non-trivial to build an intelligent system that can replicate this feat.
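To make the ‘apples’ problem concrete, here is a minimal Python sketch. It is an illustration only, not a description of any production system: the cue-word list and both function names are hypothetical, and a real entity-recognition system would be far more sophisticated.

```python
import re

# Hypothetical cue words that suggest the company rather than the fruit.
COMPANY_CONTEXT = {"iphone", "shares", "nasdaq", "earnings", "cupertino"}

def naive_mentions_apple(text):
    """Flag any text containing 'apple' or 'apples', regardless of meaning."""
    return re.search(r"\bapples?\b", text, flags=re.IGNORECASE) is not None

def context_mentions_apple(text):
    """Flag the text only if 'apple' appears alongside a company cue word."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return "apple" in words and bool(words & COMPANY_CONTEXT)

fruit = "Farmers expect a record harvest of apples this autumn."
company = "Apple shares rose after strong iPhone earnings."

for label, text in [("fruit", fruit), ("company", company)]:
    print(label, naive_mentions_apple(text), context_mentions_apple(text))
```

The naive matcher flags both sentences, so the fruit article would be wrongly attached to the stock. The context check separates these two examples, but it is brittle: any article that falls outside the hand-written cue list would be misclassified, which is why this step is hard to get right at scale.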

The greater concern is not the quantity of data but its quality and usefulness, specifically the signal-to-noise ratio of the dataset. In a system as complex as the stock market, the reality is that the signal is largely drowned out by the noise. Our key challenge is therefore to build an intelligent system that can extract the meaningful signal from this sea of noise. We predominantly tackle this with mathematical tools and techniques that reliably separate signal from stochastic fluctuations.
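A small simulation gives a feel for what a low signal-to-noise ratio means. The sketch below is a toy example under stated assumptions: the returns are synthetic, the "feature" is a made-up predictor, and the scales (a signal ten times smaller than the noise) are chosen purely for illustration.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def simulate_returns(n_days, signal_scale, noise_scale, seed=0):
    """Synthetic returns: a weak linear effect of a feature plus Gaussian noise."""
    rng = random.Random(seed)
    feature = [rng.gauss(0.0, 1.0) for _ in range(n_days)]
    returns = [signal_scale * f + rng.gauss(0.0, noise_scale) for f in feature]
    return feature, returns

# Assumed scales: the signal is ten times smaller than the noise.
feature, returns = simulate_returns(n_days=5000, signal_scale=0.001, noise_scale=0.01)
r = pearson(feature, returns)
print(f"correlation = {r:.3f}, share of variance explained = {r * r:.2%}")
```

Even though the feature genuinely drives the returns in this toy setup, it explains only a small fraction of their variance. With fewer observations, such a relationship becomes hard to distinguish from chance.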

Data quality is an issue in all areas of AI. As has been seen before (e.g. at Amazon), your AI is only as good as the data it is trained on. A more finance-specific problem relates to the time-series nature of financial data: events on Tuesday have to be analysed with knowledge of the events that happened on Monday.

Non-stationarity

The time-series nature of financial data brings a further problem: the data is ‘non-stationary’. ‘Stationary’ refers to data whose statistical properties, such as its mean and variance, stay largely the same over time. For example, we could train an AI to recognise images of ducks. If we show the AI an array of images of ducks, it will learn that if it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck. Most importantly, whether the pictures are from 1900 or 2000, they will contain similar features that an AI can pick up. These features are stationary, and they are what the AI uses to identify what a picture represents.
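One way to make ‘stays the same over time’ concrete is to compare summary statistics across time windows. The sketch below uses synthetic data only: white noise stands in for a stationary series and a random walk for a non-stationary, price-like series. The helper name `window_means` and all parameters are illustrative assumptions.

```python
import random
import statistics

def window_means(series, n_windows):
    """Split a series into equal consecutive windows and return each window's mean."""
    size = len(series) // n_windows
    return [statistics.fmean(series[i * size:(i + 1) * size]) for i in range(n_windows)]

rng = random.Random(42)

# Stationary example: independent draws from the same distribution.
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# Non-stationary example: a random walk built by accumulating those draws.
walk = []
level = 0.0
for step in noise:
    level += step
    walk.append(level)

# How much does the average level move from one window to the next?
noise_spread = statistics.pstdev(window_means(noise, 4))
walk_spread = statistics.pstdev(window_means(walk, 4))
print(f"spread of window means: noise = {noise_spread:.3f}, walk = {walk_spread:.3f}")
```

For the white noise, every window looks much like the others, so a model trained on one window transfers to the next. For the random walk, the window means drift apart, so what was learned in an early window may not describe a later one.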

Compare this with financial data, which exhibits highly non-stationary behaviour. This phenomenon is often summarised by the mantra: “past performance is no guarantee of future results”. Many patterns can arise, such as a company's stock price going up when soybean futures go down and bonds go up. However, there is absolutely no guarantee that such a pattern will hold in the future. As AI mostly makes its decisions from past results, the ever-changing nature of the financial market poses a significant obstacle for any AI system. Although non-stationarity is a significant issue when developing an AI system, there are many ways to understand and remedy its effects. For example, we combat it with regular retraining of our models, alongside many other safeguards.
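The effect of regular retraining can be sketched with a toy walk-forward experiment. Everything here is an assumption made for illustration: the data is synthetic, the relationship between `x` and `y` flips sign halfway through to mimic a regime change, and the "model" is a one-parameter least-squares fit. It is not a description of how our Engine is trained.

```python
import random

def fit_slope(xs, ys):
    """Least-squares slope through the origin: sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def make_regime_data(n, seed=0):
    """Synthetic data whose true slope flips from +1 to -1 halfway through."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = []
    for t, x in enumerate(xs):
        beta = 1.0 if t < n // 2 else -1.0
        ys.append(beta * x + rng.gauss(0.0, 0.5))
    return xs, ys

def walk_forward_mse(xs, ys, train_size, retrain):
    """Predict each point using only earlier data; optionally refit at every step."""
    slope = fit_slope(xs[:train_size], ys[:train_size])
    errors = []
    for t in range(train_size, len(xs)):
        if retrain:
            # Refit on the most recent window, which excludes the point being predicted.
            slope = fit_slope(xs[t - train_size:t], ys[t - train_size:t])
        errors.append((ys[t] - slope * xs[t]) ** 2)
    return sum(errors) / len(errors)

xs, ys = make_regime_data(n=1000)
static_mse = walk_forward_mse(xs, ys, train_size=100, retrain=False)
rolling_mse = walk_forward_mse(xs, ys, train_size=100, retrain=True)
print(f"trained once: MSE = {static_mse:.2f}, retrained regularly: MSE = {rolling_mse:.2f}")
```

The model trained once keeps applying the old relationship after the regime change, while the retrained model adapts after a lag of roughly one training window. The window length is a trade-off: shorter windows adapt faster but fit noisier estimates.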

Figure 2: An AI determining whether an image is a duck (top) and an AI determining whether to buy or sell a stock from the price (bottom)

Self-correction

Assuming that one can handle these problems, the next issue is the timescale over which the AI can produce profitable results. The reality is that the financial market is a complex dynamic system that will eventually correct away any profitable opportunity once enough participants exploit it. Any predictable pattern that an AI finds may eventually be washed out as prices correct themselves. The only solution is to build a constantly evolving system that can keep up with the dynamic market. At Arabesque AI, we are tackling this problem with a large team of researchers who contribute new ideas and state-of-the-art technology, in order to build an AI system for finance that can withstand the test of time.