INSIGHTS Research
By Leo Carlos-Sandberg, 05/10/2021
Hi, I’m Leo, an AI Researcher at Arabesque AI working in research and development. Specifically, I focus on our input data, analysing it, understanding its structure and processing it. I have a background in finance, computer science, and physics and have seen how all three disciplines deal with complex systems. This piece will give an overview of complex systems, their importance, and some associated challenges. I’ve written this overview to illustrate the difficulty of understanding financial markets and the need for highly sophisticated approaches.
Complex systems
Whether or not you realise it, your life has been impacted by complex systems. These systems are everywhere and lead to much of the complexity associated with decision making in the natural world. A complex system is a system composed of interacting components. Some well-known complex systems are the human brain (interacting neurons), social group structures (interacting people), gases (interacting particles), and financial markets (interacting market participants). Often, multiple different complex systems can be derived from a single system. Take, for example, financial markets; complex systems may be composed of traded assets (for stock price analysis), banks (for bankruptcy risk analysis), and non-market values with traded assets (for an investigation of the impact of ESG data on stock price).
Emergent behaviour
The construction of a complex system is often simple: it is merely composed of interacting components, and in many cases these components and/or interactions are themselves simple in nature and well understood. However, even simple components with simple interactions can, at a large scale, exhibit a phenomenon known as emergent behaviour: complex system-level behaviour that is not immediately obvious from the individual interactions. A classic example of emergent behaviour is Conway’s Game of Life, a 2D grid in which each square (cell) is either black or white (alive or dead) and evolves according to the following rules:
1. Any live cell with two or three live neighbours survives.
2. Any dead cell with three live neighbours becomes a live cell.
3. All other live cells die in the next generation. Similarly, all other dead cells stay dead.
This game is given an initial configuration and then repeatedly iterated (with each iteration being a new generation) based on the above rules. This setup is relatively simple but can lead to the occurrence of some impressive emergent behaviour, as shown below. In fact, this game is Turing complete1, and from these simple rules, people have even been able to make the Game of Life within the Game of Life!
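As a concrete illustration, the three rules above fit in a few lines of code. The sketch below is a minimal NumPy implementation (not tied to any particular reference implementation) that steps a glider, one of the simplest emergent structures, forward through a few generations:

```python
import numpy as np

def life_step(grid):
    """Advance Conway's Game of Life by one generation (toroidal edges)."""
    # Count live neighbours by summing the eight shifted copies of the grid.
    neighbours = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)
    )
    survive = (grid == 1) & ((neighbours == 2) | (neighbours == 3))  # rule 1
    born = (grid == 0) & (neighbours == 3)                           # rule 2
    return (survive | born).astype(int)                              # rule 3: everything else is dead

# A 'glider': a five-cell pattern that travels across the grid as it evolves.
grid = np.zeros((10, 10), dtype=int)
for r, c in [(1, 2), (2, 3), (3, 1), (3, 2), (3, 3)]:
    grid[r, c] = 1

for generation in range(4):
    grid = life_step(grid)
```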
This emergent behaviour matters for real-world systems: though it can be benign, it may also be beneficial or harmful. An example of harmful emergent behaviour is ‘hot potato trading’ between high-frequency market makers, in which a market maker buys assets and then rapidly sells them to another market maker, who repeats the process (much like passing a hot potato around a group of people). This artificially inflates the trading volume of the asset, heavily affecting the market price. Some hypothesise that this behaviour was a significant contributor to the 2010 flash crash. More broadly, emergent behaviour can also be seen in financial markets in the form of regimes defining high-level market behaviours, such as bubbles.
Due to the potentially significant impact of these emergent behaviours, those participating in the markets should have methods incorporating these complexities to make well-informed decisions.
A popular approach to investigating emergent behaviour is through simulation, which is often done in two stages. First, a model of the system is created by building pieces of code to act as each component of the system and allowing these components to exchange information; such models are referred to as agent-based models. Given initial conditions and run forward, they allow emergent behaviour to occur naturally in a system that is amenable to analysis (with its interactions and behaviours tracked). Second, because of the randomness associated with running these models (or the lack of knowledge of the initial conditions), the model is run many times under different conditions and the results are aggregated to build up a picture of the system. This approach is known as a Monte Carlo method.
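As a rough sketch of how these two stages fit together (the agent behaviour here is entirely made up for illustration, not a model we actually use), consider a toy market in which each agent either follows the herd or trades at random, and the model is run many times so the outcomes can be aggregated:

```python
import numpy as np

def simulate_market(n_agents=100, n_steps=250, seed=None):
    """Toy agent-based market: each agent submits a buy (+1) or sell (-1) order per step."""
    rng = np.random.default_rng(seed)
    price = np.empty(n_steps)
    price[0] = 100.0
    last_move = 0.0
    for t in range(1, n_steps):
        random_orders = rng.choice([-1, 1], size=n_agents)
        herd = rng.random(n_agents) < 0.3          # 30% of agents imitate the last price move
        orders = np.where(herd & (last_move != 0), np.sign(last_move), random_orders)
        imbalance = orders.sum() / n_agents        # net buying pressure this step
        price[t] = price[t - 1] * (1 + 0.01 * imbalance)
        last_move = imbalance
    return price

# Monte Carlo: run the model many times under different random conditions and aggregate.
final_prices = np.array([simulate_market(seed=s)[-1] for s in range(500)])
print(f"mean final price {final_prices.mean():.2f}, std {final_prices.std():.2f}")
```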
Relation discovery
So far, we have been discussing systems under the assumption that we know how the components interact; in reality this is rarely the case. Interactions are often unknown and challenging to discover, with naive perturbation methods (nudging one component to see how the others react) being either impossible or unethical (for very good reason, it is illegal to intentionally crash a market just to observe the reaction). Because of this, practitioners typically rely on statistical inference methods (methods that use data or results from the system to infer its structure) to determine the relationships between components. Inference methods are particularly useful for data that is not directly associated with the stock price, and whose interactions are therefore harder to infer, such as ESG data.
Statistical inference of relationships between components relies on a measurable output of each component; taking an example system composed of US technology companies, one could use the stock price of each company. This naturally takes the form of a time series2 representing the company’s state through time, and multiple such time series can then be compared to explore how the components of the system interact over time.
For time series data, relationship discovery usually takes one of two conceptual approaches: either the similarity of movement between series, or the predictive information content of one series for another. Here we briefly describe a popular, straightforward method for each: Pearson correlation for the former and Granger causality for the latter. Pearson correlation measures how similarly two series move, i.e. if one increases by some amount over time, does the other also increase? A strong correlation is often taken to imply that two components are linked. However, as the adage goes, correlation does not imply causation, and many things with a high correlation are obviously not linked, such as the decrease in pirates and the increase in global warming. A potentially more robust approach to relationship discovery is Granger causality. This measures how much the past of one series can be used to predict the future of another, over and above the predictive power of that series’ own history; e.g. if two series X and Y are identical, there would be no Granger causality, but there would be correlation. This measure has a sense of direction, where one variable “causes” another, and accounts for how much a series may predict itself. Both measures are quite simple, being based on linearity and bivariate systems, and many other approaches, both simpler and more complex, exist.
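Both measures are available in standard Python libraries. The sketch below uses synthetic data with a deliberately built-in lag, so that X leads Y; note that `grangercausalitytests` in statsmodels tests whether the second column helps predict the first:

```python
import numpy as np
from scipy.stats import pearsonr
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
x = rng.normal(size=500).cumsum()                                       # series X, e.g. a log-price
y = np.concatenate([x[:3], x[:-3]]) + rng.normal(scale=0.5, size=500)   # Y follows X with a 3-step lag

# Pearson correlation: do the two series move together?
r, p = pearsonr(x, y)
print(f"Pearson correlation: {r:.2f} (p-value {p:.3g})")

# Granger causality: does X's past improve forecasts of Y beyond Y's own past?
results = grangercausalitytests(np.column_stack([y, x]), maxlag=5)
```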
Relationship patterns
Statistical inference of relationships is often discussed in the context of two variables, e.g. given X and Y, does one cause the other? In reality, however, the approximation that a system is composed of only two variables is unlikely to hold. Most real-world systems contain confounding variables (especially relevant when considering causation) that may affect both X and Y; e.g. a confounding variable Z may cause both X and Y, which can appear as X causing Y even though that is not the case. This makes statistical inference of causation and relationships within multivariate systems significantly more complex and challenging: confounding variables need to be accounted for, as the system cannot simply be decomposed into bivariate pairs.
To represent the relationship information of a multivariate system, matrices are often used, frequently referred to as patterns preceded by the type of relationship being shown, e.g. a causality pattern. An example of how a network of causality links can be encoded as a pattern is shown below3:
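Concretely, such a pattern is just a square matrix with one row and one column per component, following the convention in footnote 3 (0 for no causal link, 1 for a causal link). A minimal sketch with three hypothetical components:

```python
import numpy as np

components = ["A", "B", "C"]                 # hypothetical components of the system
causal_links = [("A", "B"), ("B", "C")]      # A causes B, B causes C

index = {name: i for i, name in enumerate(components)}
pattern = np.zeros((len(components), len(components)), dtype=int)
for source, target in causal_links:
    pattern[index[source], index[target]] = 1   # row = cause, column = effect

print(pattern)
# [[0 1 0]
#  [0 0 1]
#  [0 0 0]]
```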
Time-varying relationships
Another level of complexity in real-world systems is that these relationships are frequently dynamic, changing in strength and even existence over time. Though this dynamic behaviour likely has logical and understandable causes, these may occur at too low a level to be visible when modelling or analysing the system; for example, the investment strategy of an individual investor might change if that investor is about to go on holiday and wants to clear their positions, but investors’ vacation calendars are rarely public knowledge. Therefore, when discussing complex systems, it is likely that the components themselves could in theory be described by another system, with their behaviour being the emergent behaviour of that system. This abstraction (turning systems into components of larger systems), though it introduces some level of randomness, is necessary: a truly complete model of a system would require considering the entire universe and every particle in it, which is somewhat impractical.
Though systems do not exist in isolation, many outside effects may be negligible or treated as noise, which, if kept at an appropriate level, can be acceptable for a given objective. This type of abstraction can be seen in measures such as temperature: while technically a measure of particle excitement, its abstraction to a Celsius temperature (a simpler measure) is more appropriate for everyday use. Another example is in finance, where modelling the logic of every single investor may not be needed; instead their actions may be captured as overall trends and behaviours, with deviations treated as noise.
To transform statistical approaches for inferring static relationships into ones for time-varying relationships, windowing is frequently employed. Windowing breaks the time series data into segments, and the aforementioned methods are applied to each segment sequentially instead of to the whole series. This produces a new series in which each item is the matrix of relationships for the system during that segment. However, windowing introduces a trade-off of its own: short windows capture short-term behaviour, but longer windows provide more robust statistical estimation.
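A minimal sketch of windowing with pandas (using correlation patterns and arbitrary window settings chosen purely for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
returns = pd.DataFrame(rng.normal(size=(1000, 4)), columns=list("ABCD"))  # toy return series

window = 60                     # short windows react faster, long windows estimate more robustly
patterns = []                   # one relationship matrix per segment
for start in range(0, len(returns) - window + 1, window):
    segment = returns.iloc[start:start + window]
    patterns.append(segment.corr().to_numpy())   # correlation pattern for this segment

patterns = np.stack(patterns)   # shape: (n_segments, n_components, n_components)
print(patterns.shape)
```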
The behaviour of these changing interactions (trying to find logic and predictability within it) is another area of research, adding even more complexity to these types of systems.
Concluding remarks
In this blog, I have gone through some causes of complexity in understanding, modelling, and predicting complex systems, such as financial markets. It should now be apparent that financial markets are rife with complex, multi-level, and non-obvious behaviours and connections. The analysis of financial markets is by no means a solved problem. To gain a greater understanding of its complex and dynamic nature, advanced tools and techniques need to be developed and implemented. This type of advancement can be seen in the work done here at Arabesque AI.
Footnotes:
1: In principle, a Turing complete system could be used to solve any computational problem [8].
2: A time series is a sequence of data points that occur consecutively over a time period.
3: In the network diagram circles represent components of the system and arrows the causal links between them. In the causality pattern 0 represents no causality and 1 represents a causal link.
References:
McKenzie. R. H, 2017, Emergence in the Game of Life, blogspot, viewed 1 September 2021, <https://condensedconcepts.blogspot.com/2017/09/emergence-in-game-of-life.html >
Bradbury. P, 2012, Life in life, youtube, viewed 10 September 2021, <https://www.youtube.com/watch?v=xP5-iIeKXE8>
Court. E, 2013, ‘The Instability of Market-Making Algorithms’, MEng Dissertation, UCL, London
Benesty. J, Chen. J, Hung. Y, & Cohen. I, 2009, ‘Pearson Correlation Coefficient’, Noise Reduction in Speech Processing, vol. 2, pp. 1-4
Granger. C, 1969, ‘Investigating Causal Relations by Econometric Models and Cross-Spectral Methods’, Econometrica, vol. 37, pp. 424-438
Andersen. E, 2012, True Fact: The Lack of Pirates is Causing Global Warming, Forbes, viewed 2 September 2021, < https://www.forbes.com/sites/erikaandersen/2012/03/23/true-fact-the-lack-of-pirates-is-causing-global-warming/?sh=3a6520033a67 >
Jiang. M, Gao. X, An. H, Li. H, & Sun. B, 2017, ‘Reconstructing complex network for characterizing the time-varying causality evolution behavior of multivariate time series’, Scientific Reports, https://doi.org/10.1038/s41598-017-10759-3
Sellin. E, 2017, What exactly is Turing Completeness?, evin sellin medium, viewed 3 September 2021, <http://evinsellin.medium.com/what-exactly-is-turing-completeness-a08cc36b26e2>
INSIGHTS Research
What is the socio-economic impact of carbon emissions? Carbon dioxide and other greenhouse gas emissions impose a burden on society and future generations, including a financial one. Most economic transactions today, however, underestimate the impact of carbon emissions.
In the third part of this four-chapter series, Isabel Verkes looks at carbon taxes as an additional mechanism to put a price on carbon emissions. Needless to say, new taxes are often unpopular. Yet, under some circumstances, people can directly benefit from carbon taxes. Whether carbon taxes can be effective at pricing carbon, and at driving lower emissions, depends on a variety of factors. This piece gives an overview of when and how carbon taxes will (not) work.
To read the full article, click here.
To read part one of the series, click here.
To read part two of the series, click here.
INSIGHTS Research
Adopted in 1997, the Kyoto Protocol set the basis for the development of a carbon market with the goal of limiting and reducing the GHG emissions of industrialised countries and economies, in accordance with targets agreed by each member. The creation and regulation of this market was conceived as an instrument to help signatories comply with emission targets (‘Parties’ assigned amounts’), or global ceilings for greenhouse gas emissions.
In part two of this four-part research series, Maria Belen Ahumada provides an overview of compulsory and voluntary carbon markets. She focuses on the European regulated carbon market as she walks us through the EU ETS, addressing the role of the government, the market structure, the main products of the primary and secondary markets, and the main supply and demand drivers of the carbon price. In addition, Belen outlines some of the main differences between regulated and non-regulated carbon markets and offers useful final remarks as we approach COP 26.
To read the full article, click here.
To read part one of the series, click here.
INSIGHTS Research
Within a relatively short timeframe, an array of top-down regulatory initiatives has been introduced by policy-makers, including the proposed French Climate and Resilience Law, the UK Climate Change Act, and the recently adopted European Climate Law. These are accompanied by a range of mechanisms intended to internalise the cost of carbon emissions, such as carbon trading schemes, renewable energy certificates and carbon offsets, which have emerged as important drivers of climate action by the public and private sectors alike.
In this four-part research series, the Arabesque Research team explores the policy and market-based perspectives behind these emissions reduction measures, as well as the viability of carbon trading schemes, carbon taxation and emissions offsets within the context of the ongoing drive to reach net zero.
In part one, Dr Inna Amesheva provides an introduction to the global carbon markets landscape, together with the underlying policy initiatives that underpin carbon trading regimes in key jurisdictions. She also outlines the main implementation mechanisms set out by the international climate change legal framework, as well as an overview of how this translates into private sector action and engagement.
To read the full article, click here.
INSIGHTS Research
By Sofia Kellogg, 6/09/2021
If you asked me four months ago, ‘What is unsupervised learning and how do we use it to train Machine Learning (“ML”) models?’ I would have given you a blank stare. My background is in political science and sustainability, with a general knowledge about AI. When I say general knowledge, this is what I pictured when I thought about AI:
During my first few weeks, I spoke to the AI Researchers and Engineers on our team to gain a better understanding of exactly what AI is and how we’re using it in the Arabesque AI Engine. Thanks to their guidance, I now understand how our AI Engine works (at a very high level – I am still by no means an expert). If you need help demystifying the AI Engine, continue reading and hopefully this will help you on your journey.
What is AI?
To start, what is AI? Artificial intelligence is a program with the ability to learn and reason like a human. Machine Learning (ML) is a subset of AI: ML is when an algorithm learns by itself from some input data. An algorithm is a set of instructions that a computer program runs; algorithms take in inputs and spit out outputs.
What is the Arabesque AI Engine?
The Arabesque AI Engine is a group of ML models that take in financial and non-financial data and work together to attempt to analyse patterns and behaviours in equity markets. In this process, the AI Engine can analyse significantly more data than one human could. It is designed to provide an unbiased analysis of a vast amount of data, and to extract potentially unique conclusions from that data. It also significantly reduces the complexity of the data, which creates a more scalable process. We use a combination of supervised and unsupervised learning, incorporating financial data and other inputs, to analyse the probability of the price of a stock going up or down in the future.
AI Engine Inputs
The AI Engine takes in data and outputs the signals (the price predictions) that we use in portfolio construction. We input a variety of financial and non-financial data into the Engine. For example, we look at price returns, net profit, earnings per share, and indices such as the S&P 500. For our input data, we need at least 10 years of history for all data in order to make predictions. Additionally, we input non-financial data, such as news and media (via Natural Language Processing (“NLP”) methods) and ESG data from our sister company, Arabesque S-Ray®. All of this data is treated equally by the algorithms, and the models learn to choose which inputs are most relevant for the universe of the given asset. For each asset we want to analyse, the Engine predicts the probability of the price of the asset going up or down relative to the relevant benchmark index.
Supervised Learning
Supervised learning is when a program takes an input and learns to assign it an output, which is known a priori. Feedback loops help to adjust a model’s output. To do this, we give a ML algorithm a set of labelled data to use as training examples. One caveat: data can still be biased. If I train a model to classify cats and dogs but 99% of the pictures I feed the model are dogs, then the model will have a bias towards dogs. We have to make sure we feed the model relevant information. Below is an example of classification using supervised learning. We feed the algorithm labelled pictures of dogs and cats, which it then uses to categorise animals and give them corresponding labels.
The algorithm learns a ML model from this data, and then to test the model, we give it an input it’s never seen before (like an unlabelled picture of a dog) to see if it sorts the new data appropriately and gives a correct output.
If the model does not sort the new data appropriately, we continue to try to improve the algorithm, typically through more training or training on a more diverse dataset (e.g. labradors, spaniels, as well as poodles). The problem with supervised learning is that there can be issues around human error or rare occurrences (e.g. we load one picture of a labrador but it is tagged as a spaniel). It can also take a long time to label all the data we need (although in this case, we would see a lot of cute dog photos, which some of our team members wouldn’t mind at all).
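For readers who want to see what this looks like in code, here is a minimal supervised-learning sketch using scikit-learn on a small built-in labelled dataset (a stand-in for the cat/dog pictures; this is purely illustrative and not how the AI Engine itself is built):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)                      # inputs and their known labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)  # learn from labelled examples
print("accuracy on data the model has never seen:", model.score(X_test, y_test))
```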
Unsupervised Learning
Unsupervised learning is when a computer program must find structure in the input data because it is unlabelled. There are different types of unsupervised learning, such as clustering and dimensionality reduction. The example below demonstrates clustering, in which we give the ML algorithm a set of inputs, the algorithm finds similarities between these inputs, and the ML model learns to group these inputs together. In the AI Engine, we have hundreds of input features, and we have to try to condense these data points. We use unsupervised learning to compress the input data.
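A minimal unsupervised-learning sketch, again with scikit-learn and random stand-in data (not our actual features): clustering groups similar inputs without labels, and dimensionality reduction compresses many input features into a few:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 100))     # 500 unlabelled observations, 100 features each

clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(features)  # grouping
compressed = PCA(n_components=10).fit_transform(features)                          # compression

print(clusters[:10])        # cluster assignment for the first 10 observations
print(compressed.shape)     # (500, 10): same observations, far fewer numbers each
```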
Lifecycle of a ML Model
It is important to note that training a ML model is not a one-time process. We are constantly retraining our ML models in order to test their accuracy. Below is the lifecycle of a model, which demonstrates its constant evolution.
- Model retraining: Every few months, we re-train the whole model. We input data and the output is the model.
- Model testing: We take the model and put in new data to test the model’s accuracy.
- Model deployment: When a model passes our validation and performance tests, we put this model into production.
- Inference of the model: On a daily basis, we input new data and output predictions.
AI Engine (high-level) Architecture
Now that we understand how a ML model works, let’s look at how we can use this information in the AI Engine. Let’s say we want to analyse the price of a stock. These are the 3 main steps in the Engine pipeline:
- Extract Features (Encode): We first take our input data, convert the data to a list of numbers, and give this list to the Engine. At the beginning of the Engine pipeline, it extracts important features. Basically, it takes the long list of numbers and compresses it to a shorter list (this happens through the unsupervised learning we talked about earlier – more specifically through our encoder models). We have a lot of data points, which could confuse our models, so we try to decrease their number in order for the models to have an easier time analysing and forecasting the output, as well as discarding redundant data.
- Make Predictions (Serving the Model): We take this list of numbers and feed it into the machine learning models, each of which has been trained with a different machine learning algorithm. The aim of this process is to improve predictive accuracy. Each model produces its own prediction of what the output will be.
- Combine Predictions (Ensemble): At the end, a final machine learning model combines all of the individual model predictions into a single prediction, which is our signal. We provide this signal to the portfolio construction team, who use it to build portfolios for asset managers.
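To make the three steps above a little more tangible, here is a schematic sketch using generic scikit-learn components and random data. PCA stands in for the encoder models, and two off-the-shelf classifiers stand in for the prediction models; none of this is the Engine’s actual code or model choice:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 200))                                    # long list of input numbers
y = (X[:, :5].sum(axis=1) + rng.normal(size=2000) > 0).astype(int)  # toy up/down label

# 1. Extract features (encode): compress 200 numbers per observation down to 20.
encoder = PCA(n_components=20).fit(X)
Z = encoder.transform(X)

# 2. Make predictions (serve): several models, each trained with a different algorithm.
models = [LogisticRegression(max_iter=1000).fit(Z, y),
          GradientBoostingClassifier(random_state=0).fit(Z, y)]
predictions = np.column_stack([m.predict_proba(Z)[:, 1] for m in models])

# 3. Combine predictions (ensemble): a final model merges them into a single signal.
signal = LogisticRegression().fit(predictions, y).predict_proba(predictions)[:, 1]
print(signal[:5])   # probability of the price going up, one value per observation
```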
Wrapping Up
Hopefully the concepts of AI and the AI Engine are a little less scary than at the beginning of this article. Here are some key takeaways:
- Artificial intelligence is a program with the ability to learn and reason similarly to a human.
- The AI Engine is a group of ML models that take in financial and non-financial data and work together to attempt to analyse patterns and behaviours in equity markets.
- Supervised learning is when a program takes an input and learns to assign it an output.
- Unsupervised learning is when a computer program must find structure in the input data because it is unlabelled.
- The lifecycle of a ML model includes: model retraining, model testing, model deployment, and the inference of the model.
- The AI Engine extracts features, makes predictions, and combines these predictions to create a signal.
This article by no means covers all the complexities of the AI Engine and only begins to explain how the ML models work. Each of our prediction models could have entire research papers written about them! However, for an AI novice this is the right place to start. Perhaps I’ll try my hand at learning some coding basics next…
INSIGHTS Research
Despite the rising interest in veganism, the UN predicts that demand for animal-based products will still increase in line with rising populations, growing urbanisation and increased demand for more diversified diets from an emerging middle class. Consequently, they expect that meat and milk consumption will grow by 73 and 58 per cent, respectively, from 2010 to 2050. This continued increase in demand means that despite various harmful effects of meat production and consumption, the large-scale animal agriculture industry is here to stay.
In this blog, Martyna Szumniak and Dr. Roan du Feu explore the climate impact of vegan companies, meaning those that do not generate revenue from animal or dairy products and agriculture, and aim to answer the question “Is veganism a more sustainable choice when it comes to investing?”
Read the full blog here.
INSIGHTS Research
An article by Min Low
Since the birth of civilisation, language has been one of humanity’s greatest tools. Developing alongside human society, language has become more than simple communication and education, having the power to shape perspectives and attitudes towards the subject at hand.
As early as the 18th century, when shipbuilding and mining were consuming increasing amounts of wood [1], people in Europe were already conscious of resource sustainability. Then, in 1975, US scientist Wallace Broecker brought the term ‘global warming’ into the public consciousness by including it in the title of one of his papers [2]. Public awareness of the issues around sustainability and climate change has existed for decades, even centuries, but has never been as widespread as it is today. This growing recognition can be explored by taking a deeper look at how the language of climate change has evolved and the importance this has.
To read the full article, click here
INSIGHTS Research
By Gavin Cheung, 03/08/2021
The last decade has seen a revolution in the field of AI stemming from advancements in machine learning (ML), deep learning and computer architecture.
The recent developments have been applied to a wide variety of fields such as computer vision, natural language processing, drug discovery, bioinformatics, self-driving vehicles, and recommender systems.
Even more impressive is that some of these systems can be run on your own personal laptop. For example, the figure below shows a demonstration of YOLOv3 that has been designed for object recognition in real-time. This system is powered by a complex deep learning model that has been tailored for its needs, but can be run on your laptop.
Given the difficulty of these applications in other domains, a natural question is what AI can do in a financial setting, with its significant complexities and the economic risk of getting things wrong. At Arabesque AI, our goal is to use AI to power customised, sustainable investing. We implement AI in our Engine, which forecasts the price movement of equities at a given point in the future. In this article, we will explore some of the difficulties we face in applying machine learning to finance.
Data quality
The first topic to discuss is the issue of data quality. In the financial world, the abundance of data is not an issue. Data can easily be collected every second from a wide variety of sources such as instrument prices, news articles, stock fundamentals, social media posts, macroeconomic data, satellite images, ESG data, credit card transactions, footfall traffic and so on. Some of this data is classified as structured: it typically has a numerical quantity and a well-defined structure (e.g. stock prices), and is relatively easy to feed into a ML model. Other data, unstructured data, does not come in a pre-defined structure and often requires extra processing to extract meaningful information (e.g. news articles, social media posts or images). The difficulty of this information extraction process is clear if we take a news article that discusses ‘apples’ as an example. While a human would fairly easily identify that the article is talking about ‘apples’ the fruit rather than Apple (NYSE:AAPL), it is non-trivial to build an intelligent system that can replicate this feat.
The greater concern is not the quantity of data but rather its quality and usefulness, specifically the signal-to-noise ratio of the dataset. In a system as complex as the stock market, the reality is that the signal is largely drowned out by the noise. Therefore, our key challenge is building an intelligent system that can extract the meaningful signal from this sea of noise. We predominantly address this with mathematical tools and techniques that reliably discern signal from stochastic fluctuations.
Data quality is an issue in all areas of AI: as has been seen before (e.g. at Amazon), your AI is only as good as the data it is trained on. A more finance-specific problem we face relates to the time-series nature of financial data – that is, events on Tuesday have to be analysed with the knowledge of events that happened on Monday.
Non-stationarity
The time-series nature of financial data makes the data ‘non-stationary’. ‘Stationary’ refers to data whose statistical properties largely stay the same over time. For example, we could train an AI to recognise images of ducks. If we show the AI an array of images of ducks, it will learn that if it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck. Most importantly, whether a picture is from 1900 or 2000, it will contain similar features that can be picked up by an AI. These features are stationary, and are what the AI uses to identify what the picture represents.
Compare this with financial data, which exhibits highly non-stationary behaviour. This phenomenon is often summarised by the mantra: “past performance is no guarantee of future results”. Many patterns can arise, such as the stock price of a company going up if soybean futures go down and bonds go up. However, there is absolutely no guarantee that this will be the case again in the future. As AI mostly makes its decisions from past results, the ever-changing nature of the financial market poses a significant obstacle for any AI system. Although non-stationarity is a significant issue when developing an AI system, there are many ways to understand and remedy its effects. For example, we combat it with regular retraining of our models as well as many other safeguards.
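One simple, standard way to see non-stationarity in the data (not a description of our internal safeguards) is the augmented Dickey-Fuller test from statsmodels: price levels typically fail it, while returns come much closer to stationarity:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
prices = 100 * np.exp(rng.normal(0, 0.01, size=1000).cumsum())  # random-walk price series
returns = np.diff(np.log(prices))

# Augmented Dickey-Fuller test: a small p-value is evidence the series is stationary.
print("price levels p-value:", adfuller(prices)[1])   # typically large -> non-stationary
print("log-returns  p-value:", adfuller(returns)[1])  # typically small -> near-stationary
```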
Self-correction
Assuming that one can handle these problems, the next issue is the timescale over which an AI can produce profitable results. The reality is that the financial market is a complex dynamic system that will eventually correct away any profitable opportunity once it becomes saturated. Any predictable pattern that an AI could find may eventually be washed out as prices correct themselves. The only solution to this is to build a constantly evolving system that can keep up with the dynamic market. At Arabesque AI, we are tackling this problem with a large team of researchers who provide new ideas and state-of-the-art technology in order to build an AI system for finance that can withstand the test of time.
INSIGHTS Research
Dr Tom McAuliffe – Arabesque AI
In order to find an edge in hyper-competitive markets, at Arabesque AI we utilise ideas with their roots in a wide range of research areas, including computer science, maths, and physics. In this blog post we’ll consider computer vision (CV), a subfield of machine learning focused on the automated analysis of images. Artificial intelligence as a whole owes much of its success to breakthroughs in CV, and its broader relevance persists today.
During the early 2010s the video gaming industry was booming, contributing to an increased supply of affordable graphics processing units (GPUs). This brought the back-propagation algorithm back into prominence for neural network training, being perfectly suited to the massively parallel capabilities of GPUs. The ImageNet database, conceived by Li et al in 2006 [1], contains some 14 million photographs spanning approximately 20,000 categories, and the associated challenge asks competitors to classify its images automatically. By leveraging the power of GPUs, in 2012 the AlexNet [2] convolutional neural network (CNN) surpassed all competition, achieving an error rate marginally over 15%, almost 11 percentage points better than its closest rival. This breakthrough in CV, powered by ImageNet, CNNs, and GPU-powered back-propagation, was a paradigm shift for artificial intelligence.
In the pursuit of out-of-sample generalisation, classes of models have emerged that are very well suited to specific types of data. To understand why the CNN is particularly well equipped for the analysis of images, we need to take a closer look at its architecture – what is convolution?
Images are just grids of pixels. In order to generate useful (from an ML point of view) features from these pixels, one traverses a small matrix (filter) across the grid, from top left to bottom right, performing a pixel-wise (or element-wise, in matrix terms) multiplication of filter and target pixels and then summing the results (i.e. taking dot products)[1]. This is shown schematically in Figure 1. Depending on the filters chosen, we can highlight specific features of the image, as shown in Figure 2 (in all figures we use 3-by-3 pixel filters). The ‘Sobel’ filters used in Figure 2 (b) and (c) highlight abrupt changes in the horizontal and vertical directions respectively. The gradient (or ‘Scharr’) filter in (d) identifies object boundaries independent of direction. These are simple, linear examples. In a CNN, rather than specifying a priori what the filter matrices should be, we allow the system to learn the optimal filters for the problem at hand. Rather than identifying horizontal edges, with enough complexity (neural network depth) a CNN learns to identify features as abstract as “cat ears” or “human faces”. This is achieved through hierarchical combinations of simpler features (like the horizontal-edge Sobel filter) noted above, akin to the human visual system [3].
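The scanning operation described above is easy to reproduce with SciPy. The sketch below applies a horizontal Sobel filter to a tiny synthetic image containing a single vertical edge, using `scipy.ndimage.correlate`, which performs exactly the multiply-and-sum scan described here (as footnote [1] notes, this is technically a cross-correlation):

```python
import numpy as np
from scipy.ndimage import correlate

image = np.zeros((8, 8))
image[:, 4:] = 1.0                       # left half dark, right half bright: a vertical edge

sobel_x = np.array([[-1, 0, 1],          # 3x3 filter highlighting abrupt horizontal changes
                    [-2, 0, 2],
                    [-1, 0, 1]])

# Slide the filter over the image, multiplying element-wise and summing at each position.
response = correlate(image, sobel_x, mode="nearest")
print(np.abs(response).max(axis=0))      # the response peaks along the edge columns
```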
In the years since AlexNet, we have seen increasingly highly performing architectures that, under certain conditions, transfer extremely well to other domains despite having been developed for more specialised subfields. Key examples of this are the CNN for CV and the Transformer for natural language processing. The transferability ultimately stems from shared symmetries in data. Convolutional models are so successful for CV applications because they utilise inherent symmetries present in natural images. As we saw in the simple example demonstrated in Figures 1 and 2, a 2D convolutional filter scans across a 2D image[2]. This very act of scanning a filter across the image is itself exploiting the fact that by definition, images emerge as strongly local groupings at multiple nested scales. Cat-ear pixels are very likely to be adjacent to additional cat-ear pixels, lending the possibility of an appropriately tuned filter specifically for cat-ears. As humans we have evolved to consider this concept as stating-the-obvious, but the same logic does not apply, for example, to rows and columns of a spreadsheet. Independent entries (rows) can be completely unrelated to nearby entries, and there is no importance to the ordering of the columns (features). If you were to randomly shuffle the columns of an image, the meaning would be completely lost.
In Machine Learning nomenclature, this type of resistance-to-shuffling is called translational symmetry. It is a property of images, but not of tabular (spreadsheet) data. The ability of a model to exploit this symmetry is called an inductive bias.
And so we arrive at quantitative finance. At Arabesque AI we are particularly interested in identifying and analysing trends in capital markets, including stock prices. These prices form a time-series, another type of dataset that possesses translational symmetry. In this case the symmetry is due to natural (causal) order present in price movements. Time moves in one direction, so the ordering of our observations in the time dimension is important, and shuffling breaks continuity and causality. Rather than the 2D filters described previously, for a time-series we can perform exactly the same operation but with a 1D filter. Using a 1D-CNN in this way we can learn filters that, similarly to looking for abstract features like faces or cat-ears in an image, let us identify trends like ‘bull’ markets, ‘bear’ markets, and complex interactions between company fundamentals (revenue, profitability, liabilities, etc).
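A minimal 1D-CNN sketch in PyTorch (an arbitrary toy architecture, not the model we run in production), taking a batch of 60-step return series and producing two logits, e.g. for ‘up’ vs ‘down’:

```python
import torch
import torch.nn as nn

# Tiny 1D-CNN: eight learned filters scan along the time dimension of each series.
model = nn.Sequential(
    nn.Conv1d(in_channels=1, out_channels=8, kernel_size=5),  # 8 learned 1D filters of width 5
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),   # pool the filter responses over time
    nn.Flatten(),
    nn.Linear(8, 2),           # two outputs, e.g. logits for 'up' vs 'down'
)

series = torch.randn(32, 1, 60)   # a batch of 32 single-channel series, 60 time steps each
logits = model(series)            # shape: (32, 2)
print(logits.shape)
```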
But why stop there? Rather than a 1D view of a time-series, which simply observes a value changing over time, approaches exist for fully converting a 1D time-series into a 2D image. We can then directly analyse these with CV techniques.
Following Wang & Oates [4], we can represent our time-series as a 2D space using the Gramian sum angular field (GSAF), Gramian difference angular field (GDAF), and the Markov transition field (MTF) transforms. We can also represent a time-series as a recurrence plot (RP), or by its Fourier transform (with real and imaginary components stacked so as to form a narrow image). These transforms are shown in Figure 5, implemented after Faouzi & Janati [5] for a historical returns time-series. Each transform shows its own idiosyncrasies and tends to highlight specific behaviours and features. Considering application to synthetic data in Figure 6, we take a closer look at how varying the frequency of a simple sine wave affects its GSAF transform.
Figure 6: The GSAF transform of a sine wave
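These transforms are available off the shelf in the pyts package of [5]. A minimal sketch, here applied to a sine wave as in Figure 6, though any 1D series such as historical returns works the same way:

```python
import numpy as np
from pyts.image import GramianAngularField, MarkovTransitionField, RecurrencePlot

t = np.linspace(0, 4 * np.pi, 60)
series = np.sin(t).reshape(1, -1)          # pyts expects shape (n_samples, n_timestamps)

gsaf = GramianAngularField(method="summation").fit_transform(series)
gdaf = GramianAngularField(method="difference").fit_transform(series)
mtf = MarkovTransitionField(n_bins=8).fit_transform(series)
rp = RecurrencePlot().fit_transform(series)

# Each transform turns the 60-step series into a 60x60 'image'.
print(gsaf.shape, gdaf.shape, mtf.shape, rp.shape)   # (1, 60, 60) each
```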
With such transforms at our disposal, we can convert the time-series of equity prices, individual company fundamentals, and macroeconomic indicators (like US GDP, $ to £ exchange rate, etc) into 2D representations. This lets us consider slices of a market as a stack of images. For example, over the same 60-day period we could have images corresponding to each of asset daily returns, daily highest price, daily lowest price, with each pixel representing a single day. This makes up a data stack akin to the red, green, blue layers of a coloured digital image. Recent research from Zhang et al [6] applies a similar approach directly to a limit order book in order to aid predictions of financial instruments.
Machine learning is about transforming complicated data into useful representations. CV techniques are very powerful in learning the extremely complex interactions between pixels in an image of a cat, to the degree that they can distinguish it from those of a dog. This is achieved by learning to look for (and distinguish between) the abstract features of ‘dog ear’ vs ‘cat ear’. By exploiting the translational symmetries shared between time-series and natural images, CNNs are able to efficiently identify these complex interactions.
We have the choice to use such techniques in either a supervised or unsupervised learning paradigm. In the former, one may train a classification model to take such images as inputs, and predict a future price movement, similarly to classifying an image as containing a cat or a dog. In this setting we would provide a corresponding label to each image (or set of images), representing examples of the mapping from image(s) to label we wish to learn. In an unsupervised setting, we provide data but no labels. An auto-encoder model compresses the information stored in, for example, an image down to a handful of representative (hidden) features, which it then uses to reconstruct the input as accurately as possible[3]. Presented in Figure 7 is an example of a CNN auto-encoder trained to reconstruct GSAF-transformed features of a time-series. The input can be reconstructed well, meaning the low-dimensional representations we access through this model contain the same information as the original data.
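As a rough illustration of the unsupervised setting (an arbitrary small architecture on random data, not the model behind Figure 7), a convolutional auto-encoder in PyTorch might look like this:

```python
import torch
import torch.nn as nn

# Compress a 60x60 'image' (e.g. a GSAF-transformed series) to 32 hidden features,
# then reconstruct it from those features alone.
encoder = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, stride=2, padding=1),   # 60x60 -> 30x30
    nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1),  # 30x30 -> 15x15
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 15 * 15, 32),                            # the low-dimensional representation
)
decoder = nn.Sequential(
    nn.Linear(32, 16 * 15 * 15),
    nn.Unflatten(1, (16, 15, 15)),
    nn.ReLU(),
    nn.ConvTranspose2d(16, 8, kernel_size=2, stride=2),     # 15x15 -> 30x30
    nn.ReLU(),
    nn.ConvTranspose2d(8, 1, kernel_size=2, stride=2),      # 30x30 -> 60x60
)

images = torch.randn(4, 1, 60, 60)                      # a batch of transformed time-series 'images'
reconstruction = decoder(encoder(images))
loss = nn.functional.mse_loss(reconstruction, images)   # training minimises reconstruction error
print(loss.item())
```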
Learning to find the most important parts of the dataset with unsupervised learning increases the efficiency with which we can handle data, reducing compute cost and permitting more algorithmic complexity. Convolutional architectures do this extremely well for images, and other data with translational symmetry. Identifying key features of a time-series with unsupervised learning remains an important research focus for us.
At Arabesque AI, we aim to forecast stock market movements using a wide range of models, but finding useful features of very noisy data remains a key challenge. We research and develop the powerful technology discussed in this post towards our core objective: accurately forecasting stock market movements with cutting edge machine learning.
[1] Note that this operation, performed in CNNs, is actually a cross-correlation rather than a convolution. The misnomer is due to the fact that a convolution operation flips the kernel before calculating the dot product, such that a copy of the filter is obtained from a convolution with a unit ‘impulse’. As CNNs are already a complex system, and we do not care about this specific property we drop the filter flipping, making the operation technically a cross-correlation. In the case of Figure 2, the symmetric filters mean that the convolution and cross-correlation operations are identical, but in CNNs the learned filters need not be symmetric.
[2] Note that the concept of ‘scanning’ is what is mathematically happening in this operation. This would be an inefficient algorithmic implementation.
[3] This is similar to the function of principal component analysis (PCA), widely used in quantitative finance to remove the market factor from a portfolio’s performance, but an auto-encoder can identify complex non-linear interactions that PCA does not see.
References
[1] Fei-Fei, L. Deng, J. Li, K. (2009). “ImageNet: Constructing a large-scale image database.” Journal of Vision, vol. 9 http://journalofvision.org/9/8/1037/, doi:10.1167/9.8.1037
[2] Krizhevsky, A., Sutskever, I., Hinton, G.E. (2012). “ImageNet classification with deep convolutional neural networks,” Communications of the ACM, vol. 60 (6), pp. 84–90, doi:10.1145/3065386
[3] R. W. Fleming and K. R. Storrs. (2019). “Learning to see stuff,” Current Opinion in Behavioural Sciences, vol. 30, pp. 100–108.
[4] Wang, Z. and Oates T. (2015). “Imaging time-series to improve classification and imputation,” International Joint Conference on Artificial Intelligence, pp. 3939 – 3945.
[5] Faouzi, J. and Janati, H. (2020). “pyts: A python package for time series classification,” Journal of Machine Learning Research, 21(46): pp. 1−6.
[6] Zhang, Z. Zohren, S., Roberts, S. (2019) “DeepLOB: Deep convolutional neural networks for limit order books,” IEEE Transactions on Signal Processing, 67 (11): pp. 3001 – 3012.
INSIGHTS Research
There is much excitement in the machine learning community surrounding language models (LMs): neural networks trained to “understand” the intricacies of language, semantics, and grammar. These have revolutionised natural language processing (NLP). In this newsletter we’ll go over what they are, some examples of what they can do, and the ethical implications of their use that we as a community must consider.
LMs transform sentences into numerical (vector) representations, which are subsequently used as inputs to a more traditional machine learning model, such as a classifier or regressor. They do this by modelling the statistical distributions of words in sentences: they are trained to predict the most likely words at a given position in a sentence given the surrounding context. The LM does a lot of heavy lifting in finding useful and relevant representations of language, compressing the meaning of a sentence into a handful of real numbers.
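For example, with the Hugging Face transformers library one can pull a fixed-length vector representation of a sentence out of a pre-trained BERT model in a few lines (a common recipe, though not the only way to pool the output):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Language models turn sentences into vectors.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One common choice: use the hidden state of the [CLS] token as the sentence vector.
sentence_vector = outputs.last_hidden_state[:, 0, :]
print(sentence_vector.shape)   # (1, 768) for bert-base
```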
Leveraging this approach, the introduction of BERT (Bidirectional Encoder Representations from Transformers) by Google researchers in 2018 constituted a serious paradigm shift, outperforming the previous state-of-the-art LMs on eleven language modelling challenges. BERT was 7.7% better than the competition on the GLUE sentence modelling benchmark, which evaluates models’ ‘understanding’ of test sentences and which had previously been dominated by a type of recurrent neural network called an LSTM (Long Short-Term Memory). Much of the success can be attributed to a powerful neural network architecture known as the Transformer [1], which has since been widely adopted in other NLP frameworks, computer vision, and time-series modelling. Transformers are now a state-of-the-art neural architecture, bringing performance gains over traditional sequence models due to their computational and data efficiency.
A major advantage of using LMs is that only a relatively small amount of labelled data is required to solve a supervised learning problem. Raw, unlabelled data is used to train the LM, for example the text of Wikipedia articles or Reddit posts – you just need a very large corpus of human-written text. Once the LM “understands” language, you can fine-tune a specific task with a handful of manually labelled examples to get good results. For example, only a handful of Amazon reviews, labelled by their ‘star’ rating, are required to train a product sentiment classifier if a LM is used. BERT is now used in production for almost every Google search query [2].
After BERT, which had 340 million parameters and was trained on about 16 GB of text, there was a quick succession of bigger and bigger LMs (with strictly increasing improvements in sentence “understanding” metrics as well as intended task performance). A few examples include:
- (2019) XLNet (340 million parameters, 126 GB of data)
- (2019) MegatronLM (8.3 billion parameters, 174 GB of data)
- (2020) T-NLG (17 billion parameters, 174 GB of data)
- (2020) GPT-3 (175 billion parameters, 570 GB of data)
GPT-3 is broadly the state-of-the-art, and is essentially an LM trained on the entire internet. It is an extremely powerful tool, and as a result of being publicly available, has been used for some mind-blowing applications (see here for some examples: https://www.educative.io/blog/top-uses-gpt-3-deep-learning).
GPT-3 based models can also do things like generate code (HTML, CSS, JavaScript) for websites from a one-sentence prompt, as well as generate narrative stories, and even let us ask ‘Albert Einstein’ to teach us about astrophysics [3]:
Me: Teach me about black holes.
Albert Einstein: A black hole is formed when a star has burnt all of its fuel. There are two ways this can happen. If the mass of the star is less than a certain limit, known as the Chandrasekhar Limit, then it can end in a Type 1a supernova explosion. When the mass of the star is above that limit, however, it can collapse in on itself. The reason for this is that the star is being pulled in on itself by its own gravity. It is being accelerated due to its own gravity. When a force is applied to something that is being accelerated, its momentum increases. The more massive the object is, the more force it takes to keep it from slowing down.
Me: what is Chandrasekhar Limit?
Albert Einstein: The Chandrasekhar Limit is 1.44 solar masses. An object with a mass less than this limit will not collapse into a black hole. If it is heavier than this limit, then its gravity will not be strong enough to fight against itself and it will collapse into a black hole.
Me: who is Chandrasekhar
Albert Einstein: Chandrasekhar was a scientist who calculated this mass limit.
Me: When did Chandrasekhar die?
Albert Einstein: He died in 1995 at the age of 84.
While this is clearly extremely impressive technology, there are ongoing concerns about the environmental and ethical consequences of such powerful software. Training a BERT model (by now considered a relatively small LM) has been estimated to consume as much energy as a trans-American flight [4]. This is exacerbated by the fact that the model is often trained a few times to trial different hyperparameters. Cloud computing companies generally use some renewable energy sources and/or carbon credit offsetting, but the majority of energy used is non-renewable [5].
Furthermore, and potentially more worrying, Bender et al [5] note that the datasets used to train massive LMs vastly overrepresent racist, misogynistic, and white-supremacist views, which they suggest is a result of the predominance of this sort of text on the English internet. Machine learning models cannot be separated from their training data, and essentially replicate the patterns observed in training. McGuffie & Newhouse [6] show that it is relatively easy to use GPT-3 to generate large quantities of grammatically coherent, racist, or extremist text which can then be used, for example, to swiftly populate forums and message boards, with the intent to radicalise human readers.
The AI community has yet to agree on approaches for addressing such problems, but the consensus will likely involve a push towards better curated training data for powerful models. For example, Google have pushed this forward in image-based training data by releasing the ‘More Inclusive Annotations for People’ image dataset. This changes labels of humans within images from (person, man, woman, boy, girl) to (person), with secondary gender labelling of (predominantly feminine, predominantly masculine, or unknown) and age labelling of (young, middle, older, or unknown) [7]. On the NLP side, the ‘Translated Wikipedia Biographies’ dataset aims to provide a mechanism for assessing common gender errors in machine translation, such as an implicit grammatical assumption that ‘doctor’ refers to a man [8].
In this month’s Arabesque AI newsletter, we’ve discussed language modelling, some powerful examples of its use, and highlighted a handful of concerns toward their use. There’s no doubt that LM technology is extremely powerful and effective at the task it has been trained to perform, but as a community we must be aware of potential ethical caveats, as well as the evolution of real-world dangers.
Dr Tom McAuliffe – with thanks to Dr Isabelle Lorge (both Arabesque AI)
References
[1] Vaswani, A., N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. “Attention is all you need.” In NIPS. 2017.
[2] https://searchengineland.com/google-bert-used-on-almost-every-english-query-342193 (accessed 26/06/21)
[3] https://news.ycombinator.com/item?id=23870595 (accessed 26/06/21)
[4] Strubell, Emma, Ananya Ganesh, and Andrew McCallum. “Energy and policy considerations for deep learning in NLP.” arXiv preprint arXiv:1906.02243. 2019.
[5] Bender, Emily M., Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?🦜.” In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 610-623. 2021.
[6] McGuffie, Kris, and Alex Newhouse. “The radicalization risks of GPT-3 and advanced neural language models.” arXiv preprint arXiv:2009.06807. 2020.
[7] Schumann, Candice, Susanna Ricco, Utsav Prabhu, Vittorio Ferrari, and Caroline Pantofaru. “A Step Toward More Inclusive People Annotations for Fairness.” arXiv preprint arXiv:2105.02317. 2021.
[8] https://ai.googleblog.com/2021/06/a-dataset-for-studying-gender-bias-in.html (accessed 26/06/21)