MASTER THESIS

Term paper submitted in partial fulfillment of the requirements for the degree of Master of Science in Engineering at the University of Applied Sciences Technikum Wien - Degree Program Information Systems Management

Opportunity detection and trade simulation system for arbitrage trading on the crypto market

By: Lukas Fankhauser, BA
Student Number: 2110302055
Supervisor 1: Mag. Robert Jonas
Supervisor 2: Sophie Weidenhiller, BA
Vienna, 28/05/2023

Declaration of Authenticity

“As author and creator of this work to hand, I confirm with my signature knowledge of the relevant copyright regulations governed by higher education acts (see Urheberrechtsgesetz/ Austrian copyright law as amended as well as the Statute on Studies Act Provisions / Examination Regulations of the UAS Technikum Wien as amended). I hereby declare that I completed the present work independently and that any ideas, whether written by others or by myself, have been fully sourced and referenced. I am aware of any consequences I may face on the part of the degree program director if there should be evidence of missing autonomy and independence or evidence of any intent to fraudulently achieve a pass mark for this work (see Statute on Studies Act Provisions / Examination Regulations of the UAS Technikum Wien as amended). I further declare that up to this date I have not published the work to hand nor have I presented it to another examination board in the same or similar form. I affirm that the version submitted matches the version in the upload tool.”

Vienna, 28/05/2023

Kurzfassung

This thesis examines the potential of arbitrage trading on the crypto market across a broader spectrum than previous work. It considers 48 different cryptocurrencies from among the 100 with the largest market capitalisation, traded against the Euro on 16 crypto exchanges.
A systematic approach is applied, based on a comprehensive analysis of historical data from the last 16 months, to select the assets with the highest potential for arbitrage. These findings are extended by developing a prototype for arbitrage trading that uses centralised crypto exchanges. The trading opportunities are identified from real-time exchange data, and trading with market orders is simulated as paper trading. The results show that the crypto market is characterised by recurring episodes of opening and closing arbitrage opportunities, which can last up to several days. Furthermore, the market exhibits characteristics that favour the emergence of arbitrage opportunities. In summary, the results indicate that arbitrage trading can be regarded as a profitable strategy on the crypto market. This thesis extends existing studies on arbitrage trading in the cryptocurrency market by widening the range of assets and exchanges considered. The results are consistent with previous related studies. They offer valuable insights for traders and researchers by highlighting the potential of arbitrage strategies and the importance of automated trading solutions.

Keywords: Arbitrage Trading, Cryptocurrency Market, Automated Trading, Trading Strategies, Development Prototype, Historical Price Data

Abstract

This thesis examines the potential of arbitrage trading in the cryptocurrency market across a broader spectrum. It considers 48 diverse crypto assets from among the top 100 by market capitalisation, traded against the Euro on 16 exchanges. To provide a systematic approach for selecting the assets with the highest potential for arbitrage opportunities, a comprehensive analysis of historical data over the last 16 months is conducted.
Furthermore, these findings are extended by the development of an arbitrage trading prototype utilising centralized crypto exchanges. The performance of the identified opportunities is evaluated using real-time exchange data, and trades are simulated with market orders via paper trading. The results show that the crypto market is characterised by recurring episodes of opening and closing arbitrage opportunities, which can last up to several days. In addition, the crypto market shows characteristics that should encourage arbitrage opportunities to occur. Overall, the results indicate that arbitrage trading can be considered a profitable strategy in the cryptocurrency market. This thesis contributes to the existing body of knowledge on arbitrage trading in the cryptocurrency market by expanding the scope of assets and exchanges considered. The findings are consistent with previous related studies. They offer valuable insights for traders and researchers, highlighting the potential of arbitrage strategies and the significance of automated trading solutions.

Keywords: Arbitrage Trading, Crypto market, Automated Trading, Trading Strategies, Development Prototype, Historical Pricing data

Acknowledgement

I want to express my appreciation and gratitude to several people without whom this thesis would not have been finished. First of all, I would like to thank my supervisor at the FH Technikum Wien, Mag. Robert Jonas, for his valuable input and clear feedback in bringing this thesis to its final stage. His great responsiveness and his readiness to set up meetings at short notice, when needed, are also worth emphasising. Second, I would like to thank my company supervisor at Autowhale GmbH, Sophie Weidenhiller, for her patience and valuable help with questions in the field of the crypto market.
In addition, I am grateful that I was able to write this thesis at this company as an external student, which significantly increased my knowledge of and interest in this area. Finally, words cannot express my gratitude to my family and friends for their support, for their motivation and for keeping my spirits high throughout the process of writing this thesis.

Table of Contents
1 Introduction
1.1 Research Subject
1.2 Structure
2 Current State of Literature and Technology
2.1 Literature Review
2.1.1 Search Process
2.1.2 Literature Summary
2.1.3 Literature Analysis
2.1.4 Conclusion
2.2 Arbitrage Trading
2.2.1 Categorization and importance in trading strategies
2.2.2 History of arbitrage trading
2.2.3 Types of arbitrage trading
2.3 Crypto market
2.3.1 Types of exchanges
2.3.2 Fees
2.3.3 Price formation
2.3.4 Order Books and corresponding definitions
2.3.5 Theoretical concepts for Arbitrage Trading
3 Methods
3.1 Data collection & management
3.2 Concept for crypto asset filtering
3.3 Arbitrage trading prototype
4 Data collection & management
4.1 Data sources
4.2 Assets and Exchanges
4.3 Collection of data
4.4 Data pre-processing
5 Crypto asset filtering
5.1 Data cleaning & adjustments
5.2 Measurements
5.2.1 Arbitrage Index
5.2.2 Price differences
6 Arbitrage trading prototype
6.1 Programming Language
6.2 Prototype Architecture
6.3 Finding Arbitrage Opportunities
6.4 Exchange Connections
6.5 Dictionaries
6.6 Testing
7 Results
7.1 Data collection & management
7.2 Crypto asset filtering
7.2.1 Results for Arbitrage Index
7.2.2 Results for Price Differences
7.3 Arbitrage trading prototype
8 Discussion
8.1 Interpretation of results
8.2 Limitations
8.3 Outlook & Future work
9 Conclusion

1 Introduction

The beginning of today's crypto market was marked by Satoshi Nakamoto's white paper "Bitcoin: A Peer-to-Peer Electronic Cash System", published in October 2008 (Nakamoto, 2008). The first software for it was released in January 2009. Bitcoin was the first widely adopted mechanism to provide absolute scarcity of a money supply. It uses cryptography to control the creation and distribution of units without the need for centralized authorities such as banks or governments (Böhme et al., 2015). After Bitcoin, more and more cryptocurrencies and exchanges emerged over the years. As of April 2023, this has led to a market of thousands of assets and hundreds of exchanges with a market capitalization of over 1 trillion dollars (“Total Cryptocurrency Market Cap,” 2023). Trading strategies are the basis for amateur and professional traders to generate profits in asset markets such as the crypto or stock market. Some of these strategies, which are often automated, are also beneficial to the market.
Arbitrage trading, for instance, supports price stability and reduces price differences. The idea is to profit from market inefficiencies (Heckel and Waldenberger, 2022) by simultaneously buying an asset on a lower-priced exchange and selling it on a higher-priced one. However, not every trading strategy of traditional financial markets can be applied to the crypto market. In theory, equilibrium hypotheses such as the Efficient-Market Hypothesis or the Law of One Price, which are explained in the following chapters, should not allow market inefficiencies, and therefore arbitrage opportunities, to appear. Nevertheless, several papers have shown that such opportunities exist. They can persist over longer periods and show larger price differences across geographical regions (Brauneis and Mestel, 2018; Duan et al., 2021; Makarov and Schoar, 2020). In addition, crypto markets show characteristics which, in theory, should encourage arbitrage opportunities to occur: fast-moving markets, low regulation, high accessibility, a high number of speculators and often inefficient markets (Al-Yahyaee et al., 2020; Duan et al., 2021; Dwyer, 2015; Levus et al., 2021; Makarov and Schoar, 2020). Therefore, the motivation for this thesis is to address these opportunities in more detail for multiple crypto exchanges and assets. Regarding the research gap, the crypto market is still comparatively little researched academically in comparison to other financial markets. Research did increase after the price spike in 2017, when the market attracted a lot of public attention. To date, only a limited number of academic studies have addressed arbitrage trading in the crypto market, and the majority focus on just a few cryptocurrencies, mostly Bitcoin and Ethereum. Additionally, most papers focus on the general arbitrage conditions in this market and on trading pairs quoted against the US dollar.
Very few papers considered a diverse range of assets and exchanges, and none known to the author did so for Euro (EUR) pairs. In the scope of this thesis, research will be expanded to include a broader range of crypto assets and exchanges. The aim is to provide valuable insights into the efficiency of, and opportunities for, arbitrage trading within the cryptocurrency market. Additionally, a software prototype for an arbitrage trading system will be implemented.

1.1 Research Subject

In this thesis, four research questions (RQ) were formulated. One of them is the main research question, and each of the other three corresponds to one method.

Nr. | Research Question | Methods
1 | How can traditional trading strategies of financial markets, such as arbitrage trading, also be applied to assets on the crypto market in an automated way? | Data collection & management, Crypto asset filtering, Arbitrage trading prototype
1.1 | Which information must be gathered to enable decision making for arbitrage trading opportunities? | Data collection & management
1.2 | What are the requirements and criteria for a crypto asset to be considered for arbitrage trading systems? | Crypto asset filtering
1.3 | How can an arbitrage trading strategy in the crypto market be realized as a software prototype? | Arbitrage trading prototype
Table 1: Research Questions and used methods.

The expected result for the main research question is to demonstrate the existence and exploitability of arbitrage opportunities in the crypto market. This will be shown by a concept for finding suitable assets and exchanges based on historical pricing data. Subsequently, these findings are validated by implementing a development prototype for arbitrage trading, which evaluates arbitrage opportunities with live exchange data. The research aims to provide evidence that arbitrage trading can be a profitable strategy in the crypto market as well.
For RQ 1.1, the expected result is to identify the information and requirements needed to determine the suitability of assets and exchanges for arbitrage trading. This is demonstrated by providing historical price data of the last 16 months in the correct format and structure, as required for further data processing and for decision making.

For RQ 1.2, the expected outcome is to prove the existence of arbitrage opportunities with a concept for evaluating suitable crypto assets and exchanges. This is shown by conducting an analysis with two chosen mathematical measurements based on historical pricing data. Additionally, the research is expected to show that the discovered arbitrage opportunities can also be found with an arbitrage trading prototype using live exchange data.

For RQ 1.3, the expected result is to show that arbitrage opportunities can be found not only in historical pricing data, but also live between crypto exchanges. This is shown by identifying and utilising the right technologies and developing a functional software prototype. The prototype connects to real-time data feeds of crypto exchanges, discovers price differences for an asset and simulates trades based on them.

1.2 Structure

This thesis consists of nine chapters. The first chapter gives a general introduction to the topic and presents the research gap, motivation and research subject. In the second chapter, a comprehensive literature review explains the relevant basics and relationships of arbitrage trading and the crypto market. Next, the scientific methods used are explained, followed by a separate chapter on the implementation of each of the three methods. The results of the methods are then presented. Thereafter, these results are discussed, limitations are presented, and an outlook on future work is given.
Finally, the outcomes regarding the research subject, the applicability of arbitrage trading in the crypto market, are concluded.

2 Current State of Literature and Technology

2.1 Literature Review

Arbitrage is not a novel idea and has been around for a long time. Banks and other financial institutions all over the world have been utilizing this technique in the stock, forex, commodity and other markets for decades. The intention of this thesis is to review and implement the application of arbitrage trading in the crypto market in a similar manner, building on the findings of existing literature. The cryptocurrency market is no exception to this concept; in fact, price divergences there are often greater than in traditional markets. This seems to be due to several factors: the large number of cryptocurrencies and exchanges, low government regulation, decentralization, a high degree of volatility and a high number of speculators. Together, these make it difficult to achieve a consistent price and therefore provide a good basis for arbitrage (Levus et al., 2021).

2.1.1 Search Process

The search process is a key element of a literature review, so several inclusion and exclusion factors were considered. Inclusion factors were primarily used to narrow the search, such as:
- Language - In order to avoid ambiguity between documents in different languages, the documents included in the search must be in English.
- Accessibility - Articles must be freely available via Open Access or accessible via an institutional access system for universities.
- Keywords - The keywords listed in Table 2 must be included in the document.
Exclusion factors were used to refine the search and obtain accurate results, such as:
- Literature type - Only peer-reviewed, published papers in journals and books were considered for this review.
- Abstract content - The abstract of a paper must relate to arbitrage trading and should relate to the cryptocurrency market.
- Misleading terms - Some terms are strongly correlated with the topic of this paper but were excluded due to their thematic divergence.

Search engine used | Keywords used (number of results)
search.onb.ac.at (Österr. Nationalbibliotheken) | Arbitrage Trading (54), Arbitrage Handel (37)
buechereien.wien.gv.at | Arbitrage Trading (2)
ProQuest Ebook Central | Arbitrage Trading (98), Arbitrage Crypto (8)
base-search.net | Arbitrage Trading (9603), Arbitrage Handel (49), Arbitrage Crypto (55)
sciencedirect.com | Arbitrage Trading (20095), Arbitrage Crypto (168), Arbitrage Cryptomarket (10), Arbitrage Bitcoin (383), Arbitrage Ethereum (180)
econbiz.de | Arbitrage Trading (8205), Arbitrage Handel (1064), Arbitrage Crypto (13), Arbitrage DeFi (153), Arbitrage Bitcoin (81)
Table 2: Keyword search results

Arbitrage trading has been used for a long time, so a large body of literature exists. Books on trading strategies are available in abundance; however, their number drops considerably when narrowed down to crypto assets. Therefore, mainly peer-reviewed papers published in journals were found. Since the world of trading and the crypto market is not dominated by the German-speaking market, almost all resources found were in English. As a consequence, the search was limited to English resources only.

2.1.1.1 Clarification of Terms and scope

In the first phase of the literature review, much literature was found that is strongly related to arbitrage trading but thematically has a different meaning or is outside the scope of this thesis. For this reason, some resources had to be excluded from the search based on certain terms. These were, for instance:
- Statistical Arbitrage: Although similar in name, statistical arbitrage refers to strategies such as pairs trading, which is not the trading strategy covered in this paper. It therefore had to be excluded, as it is often simply called arbitrage trading.
- Impact on the market: As arbitrage opportunities exist in inefficient markets, there is a lot of literature on how arbitrage impacts the market. Much of it also covers how these opportunities are eliminated, i.e. how the price gap closes until the arbitrage disappears.

2.1.2 Literature Summary

Searching for the exact title of this work on various search engines yields no other works. This presumably results from the title being defined quite specifically. With simplified search terms related to arbitrage or cryptocurrencies, several papers could be found. It was also observed that arbitrage trading is often discussed together with market efficiency. This is to be expected, since opportunities for arbitrage trading arise from inefficiencies in markets. Most search results were publications that addressed arbitrage trading in general. Of these, 29 papers were considered more closely. Four of them address arbitrage theory in the non-crypto market (Angerer et al., 2023; Fernández-Pérez et al., 2012; Heckel and Waldenberger, 2022; Kiuchi, 2022). Two papers discuss the limitations of the theory of arbitrage in the non-crypto market (Gromb and Vayanos, 2018, 2002). Six explore high-frequency trading, of which arbitrage trading can be a form when it is automated (Brogaard et al., 2014; Brogaard and Garriott, 2019; Budish et al., 2015; Carrion, 2013; Kiuchi, 2022; O’Hara, 2015). Four papers relate to the crypto market and address the theoretical concepts of trading systems, specifically including arbitrage trading (Bruzgė and Šapkauskienė, 2022; Kabašinskas and Šutienė, 2021; Makarov and Schoar, 2020; Mohan, 2022). Three publications describe this topic further with technical concepts (Kakushadze and Yu, 2019; Levus et al., 2021; Pauna, 2018).
Since arbitrage opportunities exist through market inefficiencies, most literature about arbitrage trading in the crypto market in fact deals with market efficiency; ten such papers were considered (Berg et al., 2022; Brauneis and Mestel, 2018; Clements, 2021; Duan et al., 2021; Holste and Gallus, 2019; Krückeberg and Scholz, 2020; Lee et al., 2020; Saengchote, 2021; Urquhart, 2016; Zhang et al., 2018).

2.1.3 Literature Analysis

When analysing the literature, it becomes apparent that certain topics were addressed more often and in more detail. Research on crypto assets in finance and economics is still in its infancy compared to traditional markets. Most studies in this area focus on the practical implications of using cryptocurrencies as a form of payment and for conducting transactions. The first serious research addressed the economic dynamics, theory and price formation of Bitcoin (Ciaian et al., 2016; Dwyer, 2015). Literature on arbitrage trading dates back to long before the 2000s. Following the publication of the Bitcoin white paper in 2008, the first academic papers on arbitrage trading in the crypto market, and therefore mainly on Bitcoin, appeared in 2012. The first literature on efficiency appeared in 2016 and gained significantly more traction after the first spike of the crypto market in 2017. Literature on practical approaches to automated arbitrage trading systems in the crypto market has appeared since 2018 (Kakushadze and Yu, 2019; Levus et al., 2021; Makarov and Schoar, 2020; Pauna, 2018). Using a long-memory method, one study (Duan et al., 2021) analyses the development of informational efficiency and its effect on cross-market arbitrage opportunities. The findings indicate that all five of the largest crypto markets studied were nearly fully informationally efficient over the sample period. However, the level of market efficiency varied among markets and over time.
A further study (Al-Yahyaee et al., 2020) shows that the top-ranked cryptocurrencies are not efficient, which aligns with other previously published findings. It concludes that the inefficiency of crypto markets changes over time. Additionally, the study examines the multifractality, long-memory process and efficiency hypothesis of six major cryptocurrencies (Bitcoin, Ethereum, Monero, Dash, Litecoin and Ripple). Two other papers support this by discussing the limits of arbitrage. They suggest that prices may not always align with the law of one price, even when arbitrageurs are present (Gromb and Vayanos, 2018, 2002). In addition, one paper found that cryptocurrency efficiency increases with liquidity (Brauneis and Mestel, 2018). In terms of risk, general investments in the crypto market are considered risky, whereas arbitrage trading is regarded as low-risk or even risk-free (Bruzgė and Šapkauskienė, 2022; Makarov and Schoar, 2020; Mohan, 2022). This holds with the caveat that every trade is subject to a basic execution risk (Krückeberg and Scholz, 2020). The success of automated arbitrage systems depends on several factors. Multiple studies show that the decisive factor is how quickly such systems can search and transmit information, recognize arbitrage opportunities and execute trades, particularly relative to the speed or latency of other traders (Brogaard et al., 2014; Brogaard and Garriott, 2019; Budish et al., 2015; Carrion, 2013; Kiuchi, 2022; O’Hara, 2015).

2.1.4 Conclusion

Arbitrage trading is a well-tested trading strategy which, in addition to generating profits, also benefits market efficiency by increasing price stability and reducing price differences. A number of studies have already been carried out on the crypto market, but research in this area is still at an early stage.
It has to be taken into consideration that, due to decentralization, these markets cannot be directly influenced. A paper from 2020 shows that some top-ranked cryptocurrencies are not efficient and that this inefficiency changes over time (Al-Yahyaee et al., 2020). Even when markets are nearly informationally efficient, efficiency still varies (Duan et al., 2021). Furthermore, it can be assumed that markets become more efficient when liquidity increases (Brauneis and Mestel, 2018). These inefficiencies suggest that a suitable basis for finding arbitrage opportunities exists. This is further supported by the large number of crypto assets and exchanges, low government regulation or influence, a high degree of volatility and a high number of speculators (Levus et al., 2021). Moreover, speed and latency relative to other traders play a major role in the success of arbitrage trading in the crypto market (Brogaard et al., 2014; Brogaard and Garriott, 2019; Budish et al., 2015; Carrion, 2013; Kiuchi, 2022; O’Hara, 2015).

2.2 Arbitrage Trading

Arbitrage trading has long been established in traditional financial markets, and its application to the crypto market presents unique opportunities and challenges. The following chapter gives an overview of its categorization among trading strategies, its history and the different types available.

2.2.1 Categorization and importance in trading strategies

Algorithmic trading has become established in the markets and is used by both individual and institutional market participants. To get an overview of different trading strategies and to classify arbitrage trading, algorithmic trading can be categorized into six types by objective and method (Kiuchi, 2022):
1. Execution algorithms: These automate the allocation and timing of buy and sell orders, select optimal markets and make adjustments to achieve goals such as cost reduction.
Some of these algorithms hide trade execution from other investors, reducing the cost of market influence, others ensure compliance with market rules. Often large orders are split into smaller ones and placed in stages to reduce market impact. An important task of execution algorithms is therefore to determine and implement optimal timing to minimize the sum of these two costs. 2. Benchmark execution algorithms: These are used, especially when executing large orders, to ensure that the average price of each small order resulting from it matches a benchmark such as the market closing price in order to limit the cost of market manipulation. 3. Market-making algorithms: Algorithmic market makers place both buy and sell orders at lower prices than the current market price and try to profit from the difference between the market price and the bid or ask price. Their main goal is to contribute or provide market liquidity and stability, while benefiting from it. 4. Arbitrage Algorithms: Arbitrage algorithms utilize occurring price differences of identical assets by simultaneously selling at the higher price and buying at the lower price. In this way, they try to make profits and limit the risk of price changes. They help to eliminate distortions in the markets and thus increase market efficiency. This strategy will be examined in detail in this thesis for the crypto market. 5. Directional Algorithms: These algorithms use market data such as prices, trading volumes and news to predict market price changes and profit from unidirectional changes in market prices. This trading strategy is generally high risk but also high return. 6. Market Manipulation Algorithms: Market manipulation algorithms are used to influence market prices in their favour by providing false information about liquidity and intent, thereby misleading other market participants. They can lead to lower trading costs and profits, but also to delays or prevention of orders from other market 17 participants. 
These algorithms can enable users to make significant profits, but they are ethically troublesome, have a negative impact on market efficiency and can have a greater impact especially on smaller cryptocurrencies.

It can be concluded that arbitrage trading is an important trading strategy that benefits from inefficiencies in the market. At the same time, it is one of the strategies with a positive effect on the market, as it drives market efficiency.

2.2.2 History of arbitrage trading

The first mention of the concept of arbitrage trading is found in the Code of Hammurabi from about 1760 BC, which dealt extensively with trade and financial matters across geographical regions. Arbitrage trading of coins and bars across different geographical regions was common, but the minting of coins by different political jurisdictions and the lack of a standardized unit of account made trades difficult. The introduction of a standardized currency expanded the opportunities for geographical arbitrage of physical coins to take advantage of different exchange rates. Opportunities for arbitrage arose from the trading activities of networks of traders and money changers and included uncovered interest arbitrage between areas with low interest rates and those with high rates. However, there were challenges such as a lack of liquidity, difficulties in obtaining information and transporting goods over long distances, and inherent political and economic risks (Poitras, 2010). In the 16th century, exchanges like the one in Antwerp replaced medieval fairs as important international venues for exchange trading. Medieval bankers operated arbitrage exchanges to profit from discrepancies in exchange rates, and by the 18th century the exchange market had developed in financial centres such as Amsterdam, London, Hamburg and Paris.
During the 18th century, the depth and breadth of exchanges expanded significantly, which led to the development of different types of securities and commodities that could be traded. The increased speed of communication between major financial centres, driven by the introduction of the telegraph and the ticker, made it easier to trade between exchanges and to engage in geographical arbitrage. In the 19th century, records of option trading appear, in which, for example, short positions in Constantinople were combined with a written put and a bought call in London (Poitras, 2010). After the trading of assets such as shares and bonds, the emergence of Bitcoin has been followed by the trading of electronic currencies, opening new opportunities for arbitrage that can be completely automated.

2.2.3 Types of arbitrage trading

Arbitrage is the execution of a specific sequence of actions that begins and ends with the same asset and whose completion results in an increased value at the end of the sequence (Levus et al., 2021). For example, if an asset A is bought for €20 on exchange X and sold for €20.50 on exchange Y, a profit of €0.50 is made, not including transaction costs. Along with the evolution of financial markets and their possibilities, arbitrage trading has become an overarching term with several sub-concepts and ways of functioning. This trading concept is therefore not new, and different types of price spreads exist in every financial market, including stock exchanges and currency markets. Banks and other financial institutions around the world have been using this mechanism for hundreds of years to exploit price discrepancies and bring efficiency to the markets (Levus et al., 2021). The crypto market is no exception to arbitrage and works almost entirely on the same principles as traditional markets, but with different assets.
Today there are about 4000 cryptocurrencies and multiple exchanges in every region of the world, with Binance, Kraken, Coinbase, Bitfinex and Bittrex among the most popular. Arbitrage in the crypto market is particularly attractive because low levels of state regulation, decentralization, many different markets, large numbers of speculators looking to make money and high volatility make it difficult for a single price to exist (Krückeberg and Scholz, 2020). Large price divergences between exchanges also often occur against a backdrop of political instability. In August 2019, for example, against the backdrop of a sharp fall in the national currency, 1 bitcoin was priced 4% higher in US dollars in Argentina than on international platforms. However, price differences between exchanges are more common than they seem, even when extreme economic and political events are excluded (Levus et al., 2021).

A further financial definition needed for all types of arbitrage trading is the concept of a currency pair. This is a pair of two currencies that are traded against each other. The first currency is usually referred to as the base currency and the second as the quote currency. For example, the BTC/EUR pair is needed to buy Bitcoin (BTC) with EUR. Besides cryptocurrency/fiat currency pairs, there are also cryptocurrency/cryptocurrency pairs, which are often more popular than fiat currency pairs due to trading between cryptocurrencies (Levus et al., 2021).

The economic equilibrium mentioned above occurs when no arbitrage opportunities are left, since the arbitrage-free condition essentially implies that there is no price mismatch between two markets. Arbitrage has several properties that drive markets towards this equilibrium. First, it is attractive because it offers low-risk profits, so one can be sure that agents will rush to arbitrage opportunities when they arise.
Second, the process of arbitrage eventually removes these opportunities, and markets move towards an equilibrium where prices are equal, to the extent permitted by transaction costs (Mohan, 2022).

2.2.3.1 Pure arbitrage / Two-point arbitrage

The opportunity for two-point arbitrage appears due to a difference in the prices of the same asset across different exchanges at time $t$. An asset bought in one market can be sold simultaneously in the other market in order to realise a profit that is, in theory, free of risk. If there is a mismatch between the prices quoted on two exchanges, doing so is profitable (subject to transaction costs) (Mohan, 2022). To make this method possible, a trader must hold a positive balance of the corresponding assets or fiat money on the respective exchanges at the time of a trade. The arbitrageur's balance of the asset falls on the higher-priced exchange, where the asset is sold, and rises on the lower-priced exchange, where it is bought. To replenish these balances, assets or capital must be transferred from the exchange with the high balance to the exchange with the low balance and vice versa. Ideally, profits could be transferred immediately from the expensive to the cheap market and the arbitrage repeated until prices converge. The faster the arbitrageur can recycle capital from one account to another, the more effective the arbitrage (Makarov and Schoar, 2020). Exploiting the opportunity also has the effect of eliminating the price distortion. Speed is therefore crucial, as the first person to seize an arbitrage opportunity can make the greatest profit (Kiuchi, 2022). In two-point arbitrage, traders use market orders because they calculate with a specific price (Foucault et al., 2005). The uncertainty of whether, and when, a limit order is matched increases the risk of price fluctuations while the order is being processed.
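The mechanics described above can be illustrated with a minimal Python sketch. It is a simplified paper-trading illustration under stated assumptions, not the prototype developed in this thesis: the best bid and best ask per exchange are taken as given, the full volume is assumed to fill at those prices, fees are ignored, and all names (`Quote`, `Balances`, `detect_two_point`, `execute_paper_trade`) are hypothetical. The sketch buys at the ask of the cheaper exchange, sells at the bid of the more expensive one, and shifts the balances as described: the asset balance falls where the asset is sold and rises where it is bought, while the fiat balances move the other way.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    """Best prices of one exchange for one pair, e.g. BTC/EUR (hypothetical structure)."""
    exchange: str
    bid: float  # highest buy order price: the price at which one can sell immediately
    ask: float  # lowest sell order price: the price at which one can buy immediately


@dataclass
class Balances:
    """Paper-trading balances held on one exchange (hypothetical structure)."""
    asset: float  # e.g. BTC
    fiat: float   # e.g. EUR


def detect_two_point(q1: Quote, q2: Quote):
    """Return (buy_quote, sell_quote) if buying at one ask and selling at the other bid is profitable before fees."""
    if q2.bid > q1.ask:
        return q1, q2
    if q1.bid > q2.ask:
        return q2, q1
    return None


def execute_paper_trade(buy: Quote, sell: Quote, volume: float, balances: dict) -> float:
    """Simulate simultaneous market orders of the same volume and return the gross profit."""
    cost = volume * buy.ask
    if balances[buy.exchange].fiat < cost or balances[sell.exchange].asset < volume:
        raise ValueError("insufficient balance on one of the exchanges")
    # Buy on the cheaper exchange: fiat falls, asset rises.
    balances[buy.exchange].fiat -= cost
    balances[buy.exchange].asset += volume
    # Sell on the more expensive exchange: asset falls, fiat rises.
    balances[sell.exchange].asset -= volume
    balances[sell.exchange].fiat += volume * sell.bid
    # Gross profit = volume * (sell price - buy price), transaction costs not included.
    return volume * (sell.bid - buy.ask)


# Example mirroring the text: buy at €20.00 on exchange X, sell at €20.50 on exchange Y.
quotes = (Quote("X", bid=19.90, ask=20.00), Quote("Y", bid=20.50, ask=20.60))
balances = {"X": Balances(asset=0.0, fiat=100.0), "Y": Balances(asset=5.0, fiat=0.0)}
match = detect_two_point(*quotes)
profit = execute_paper_trade(*match, volume=1.0, balances=balances)
print(profit)  # 0.5
```

After the trade, the asset has moved from exchange Y to exchange X and the fiat the other way, which is exactly the imbalance that later has to be corrected by transfers between the exchanges.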
Figure 1: Visualization of a two-point arbitrage trading process

As already mentioned, if a profitable arbitrage opportunity is found, the arbitrage trade $t$, consisting of a buy and a sell, is executed simultaneously with the same volume: a buy on the exchange with the lower price and a sell on the exchange with the higher price. For each trade $t$, the profit $\text{profit}_t$ equals the difference between the sell price $\text{sell}_t$ and the buy price $\text{buy}_t$, multiplied by the traded volume $V_t$. The formula to calculate the profit is (Pauna, 2018):

$$\text{profit}_t = V_t \left( \text{sell}_t - \text{buy}_t \right)$$

As mentioned before, crypto assets are traded in pairs, like BTC/EUR or BTC/ETH. The second example defines how much Ethereum (ETH) one needs to “sell”, or trade in, to buy one unit of BTC. If one exchange quotes a certain price to buy BTC/ETH and another exchange quotes a higher price for the same pair, a two-point arbitrage trade can be made to profit from the price difference. A problem sometimes occurs between exchanges when the relationship between two crypto assets is not expressed in the same way. One exchange can, for instance, state a price for BTC/ETH and another one for ETH/BTC. The assets involved are the same, but the prices are quoted the other way round. To work around this issue, the software must convert the price from one exchange into the quoting convention of the other before comparing them, in order to calculate and match the prices of the trade correctly. In this case, a reversed arbitrage trade is done, where the formula for the profit is (Pauna, 2018):

$$\text{profit}_t = V_t \left( \text{sell}_t - \frac{1}{\text{buy}_t} \right)$$

2.2.3.2 Triangular arbitrage

In contrast to pure arbitrage, which requires two price quotes, one from each market, for arbitrage possibilities to emerge, triangular arbitrage needs three for its implementation.
This method can also be performed across exchanges, but it is mostly used within one exchange to exploit internal price misalignments (Mohan, 2022). When carried out within one specific exchange, it is an ideal way to identify market-specific frictions and to compare the price efficiency of different markets (Barbon and Ranaldo, 2021). Consider three tokens X, Y and Z available for trading, where a trader can swap between any pair. The idea of triangular arbitrage is to initiate a sequence of trades that starts with one asset, converts it into another one and then converts it back to the starting asset, ending with more units than at the start. For example, one takes 1 unit of an asset, here Z, and loops through the assets by selling Z for X, then the X proceeds for Y, before converting back to Z and ending with more than 1 unit of Z. This sequence of conversions is represented by Z → X → Y → Z. It can also be executed in the opposite direction, Z → Y → X → Z, and Y or X can also be the starting asset. In either case, if the trader starts with 1 unit of an arbitrary asset (X, Y or Z) and ends with more than 1 unit, the triangular arbitrage trade is successful. A triangle can represent all these possibilities, with the three assets at the vertices and each cycle tracing a path along the sides, beginning and ending at a particular vertex (Mohan, 2022). It should be noted that this method of arbitrage requires rapid data collection and analysis and, consequently, rapid execution of the necessary trading sequence. Trading commissions and types of order execution must also be taken into consideration. A number of algorithms are used to find such arbitrage opportunities by finding the shortest path from one node to another in a weighted graph (Levus et al., 2021).
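The cycle logic described above can be sketched in a few lines of Python. The sketch assumes hypothetical conversion rates `rates[(a, b)]`, meaning how many units of asset b one unit of asset a buys, already net of fees, and simply checks both directions around the triangle. The helper `log_weight` shows why graph algorithms are used for larger sets of assets: with edge weights of −ln(rate), a product of rates above 1 becomes a sum of weights below 0, so a profitable cycle corresponds to a negative cycle, which algorithms such as Bellman-Ford can detect. Only the three-asset case is covered here, and all names and rates are illustrative assumptions.

```python
import math


def cycle_result(rates: dict, cycle: list, start_amount: float = 1.0) -> float:
    """Convert start_amount along the cycle, e.g. ["Z", "X", "Y", "Z"], and return the final amount."""
    amount = start_amount
    for src, dst in zip(cycle, cycle[1:]):
        amount *= rates[(src, dst)]  # units of dst received per unit of src
    return amount


def find_triangular(rates: dict, a: str, b: str, c: str) -> list:
    """Check both directions around the triangle a-b-c and return the profitable cycles with their end amount."""
    found = []
    for cycle in ([a, b, c, a], [a, c, b, a]):
        end = cycle_result(rates, cycle)
        if end > 1.0:  # more than the 1 unit we started with
            found.append((cycle, end))
    return found


def log_weight(rate: float) -> float:
    """Edge weight -ln(rate): a cycle is profitable exactly when its summed weights are negative."""
    return -math.log(rate)


# Hypothetical net conversion rates between the three assets.
rates = {
    ("Z", "X"): 2.0, ("X", "Y"): 3.0, ("Y", "Z"): 0.17,
    ("Z", "Y"): 5.8, ("Y", "X"): 0.33, ("X", "Z"): 0.49,
}
opportunities = find_triangular(rates, "Z", "X", "Y")
print(opportunities)  # only Z -> X -> Y -> Z is profitable, ending with about 1.02 Z
```

With these rates, 1 Z becomes 2 X, then 6 Y, then about 1.02 Z, while the opposite direction ends with less than 1 Z.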
Figure 2: Visualization of a triangular arbitrage trading process

2.2.3.3 Statistical arbitrage / Pairs trading

In statistical arbitrage trading, also referred to as pairs trading, the strategy lies in going short on one asset and long on another. Several statistical methods form the basis for this strategy, like the Arbitrage Pricing Theory, which will be described later. This strategy involves going long on assets that are relatively undervalued and going short on assets that are relatively overvalued. When the spread, the measure of relative mispricing, converges, a profit can be made by unwinding the position. Specifically, pairs trading involves simultaneously opening long and short positions in two correlated assets with an equilibrium relationship between them (Do et al., 2006). Regardless of bullish or bearish market conditions, this type of strategy seeks to take advantage of market inefficiencies (Carrasco Blázquez et al., 2018). While this strategy does not expose the arbitrageur to the risk of overall price movements, its drawback is the exposure to the risk that the expected price convergence does not occur (Makarov and Schoar, 2020). This type of arbitrage trading is equivalent to a dollar-neutral mean-reversion strategy when the amounts invested in the short and the long position are the same (Kakushadze and Yu, 2019).

2.3 Crypto market

Despite the significant growth of the cryptocurrency market, it has remained largely unregulated by government institutions. It is hypothesized that this unique, early-stage environment for a financial market may present pricing inefficiencies that could be identified and exploited through arbitrage trading (Fischer et al., 2019). To study the possibilities of arbitrage and to implement an automated system for it, it is necessary to examine how the crypto market is organized. In general, crypto assets can be acquired in two ways.
They are either mined, with miners finding new blocks through computation and receiving a reward in the respective cryptocurrency, or they are exchanged for fiat currencies or other cryptocurrencies on exchange platforms (Böhme et al., 2015).

2.3.1 Types of exchanges

To purchase assets in a market, one must use the services of an exchange platform. Exchanges are institutions that standardize assets and trading rules for multiple participants, in contrast to over-the-counter (OTC) markets, where direct trading between two parties takes place. To enable this, an exchange must provide a number of services, some of which resemble those of typical stock exchanges, while others are additional services that crypto exchanges have to provide, often referred to as alternative trading services (ATS). In traditional markets, these services are often split among separate institutions as a division of labour (Johnstone, 2019). The majority of exchanges are similar to traditional stock exchanges in that they maintain the liquidity of assets and determine asset prices through order book systems, which match the orders of buyers and sellers. Orders are usually public information, allowing market participants to gather information about interest in an asset and the price at which it is traded. On traditional exchanges, such as stock exchanges, orders are maintained by a central authority (Mohan, 2022). ATS commonly include market making, contract counterparty, broking, dealing, advisory, custody and, for some exchanges, over-the-counter trading offers, which are not typical of traditional exchanges (Johnstone, 2019). In the crypto market, a distinction must be made between centralized and decentralized exchanges, with centralized exchanges (CEX) accounting for the vast majority of around 99% (Goldenberg, 2018). CEX can be further divided into custodial and non-custodial exchanges, where the latter do not manage user wallets but still match orders through their internal system and take fees off the top.
Custodial exchanges still predominate among CEX, meaning that a trusted central institution processes orders, maintains and secures assets, and provides wallets for customers. This also means that the private keys of user wallets are held by the centralized exchange and are not known to the users themselves, which puts users’ assets at risk if a problem occurs at the company running the exchange. This results in a single point of failure (SPOF), whereas decentralized exchanges (DEX) imply a distributed risk with no SPOF. Centralized exchanges are easier to implement and provide a better user experience and functionality, but due to their centralized design they suffer from several drawbacks. These include the possibility of users losing custody of their assets, a single point of attack for hackers, little regulation, a lack of privacy and mismanagement by the exchange operators (Mohan, 2022). After each hack, CEX implemented new security measures, such as storing the vast majority of assets in “cold storage”, where wallets are not connected to the internet, providing an extra layer of protection. Still, hacks on exchanges like Mt. Gox in 2014, Bitfinex in 2015 and Coincheck in 2018 (Goldenberg, 2018), or mismanagement as at FTX in 2022 (Fu et al., 2022), led to a serious lack of trust in centralized exchanges. In CEX, all transactions are processed through their servers, which settle user trades immediately, in contrast to the typical transaction verification time of, for example, Bitcoin, which ranges from about 5 minutes to several hours. This is made possible by processing transactions off-chain, where transactions are verified by the exchange itself instead of recording and verifying every order on the blockchain. The only transactions recorded on-chain for centralized exchanges are withdrawals to external wallets and deposits from external wallets to internal ones (Pourpounehnajafabadi et al., 2020; Schär, 2020).
In contrast, a decentralized exchange allows participants to exchange one asset for another without a centralized third party responsible for overseeing trading activity, while users retain custody of their assets and private keys (Mohan, 2022). DEX in their pure form take full advantage of blockchain technology. They use on-chain order books, which means that all transactions and their verification are handled by software, usually the blockchain itself or smart contracts. However, a user must pay for every update to the order book and wait for the network to reach consensus, which results in a less censorable and more trustworthy exchange, but also in lower speed and higher transaction costs (Pourpounehnajafabadi et al., 2020). DEX can, however, also be implemented in other ways, with off-chain order books or automated market makers. As with centralized exchanges, off-chain order books can be used, where all orders are handled centrally and only the final confirmation of a transaction is verified by a smart contract on the blockchain. This again results in improved speed and lower costs, although it requires more trust, as the order book is not confirmed each time. A newer way of providing liquidity in DEX is through automated market makers, which operate without the traditional order book system and use algorithmic agents or smart contracts instead. Instead of pricing through supply and demand, automated market makers pool liquidity and set prices through a deterministic pricing mechanism. This removes the need for counterparties, but requires arbitrageurs to remove price differences (Pourpounehnajafabadi et al., 2020). It must be stated that, however customer-friendly the immediate handling of user orders is, it dissolves the benefits that a decentralized value-transfer network like the blockchain was originally invented for.
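As an illustration of such a deterministic pricing mechanism, the following Python sketch implements the constant-product rule reserve_x · reserve_y = k popularized by Uniswap v2. This is one specific AMM design chosen for illustration, not the mechanism of every DEX, and the 0.3% fee, the class name `ConstantProductPool` and the pool sizes are assumptions for the example. Because the price depends only on the pool reserves, it can drift away from prices on other exchanges, and it is arbitrageurs trading against the pool who move it back.

```python
from dataclasses import dataclass


@dataclass
class ConstantProductPool:
    """Two-asset liquidity pool priced by the invariant reserve_x * reserve_y = k (fees aside)."""
    reserve_x: float
    reserve_y: float
    fee: float = 0.003  # assumed 0.3% fee, charged on the input amount

    def spot_price(self) -> float:
        """Marginal price of one unit of X, expressed in units of Y."""
        return self.reserve_y / self.reserve_x

    def swap_x_for_y(self, amount_x: float) -> float:
        """Sell amount_x of X into the pool and return the amount of Y received."""
        effective_in = amount_x * (1.0 - self.fee)
        # Output chosen so that (reserve_x + effective_in) * (reserve_y - amount_y) = reserve_x * reserve_y.
        amount_y = self.reserve_y * effective_in / (self.reserve_x + effective_in)
        self.reserve_x += amount_x  # the full input, including the fee, stays in the pool
        self.reserve_y -= amount_y
        return amount_y


pool = ConstantProductPool(reserve_x=100.0, reserve_y=200_000.0)
before = pool.spot_price()         # 2000.0 units of Y per X
received = pool.swap_x_for_y(1.0)  # less than 2000, due to price impact and the fee
after = pool.spot_price()          # the price of X in the pool has fallen
```

If the pool price of X falls below the price on other exchanges after such a swap, an arbitrageur can buy X from the pool and sell it elsewhere until the prices are aligned again.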
DEX are often limited by the points mentioned above, and their liquidity and trading volume are lower than those of centralized exchanges, but the trend is moving in their direction, and centralized exchanges will move towards a more hybrid form with decentralized elements (Goldenberg, 2018). Regarding the fiat currencies into which cryptocurrencies can be sold, in most cases the currency of the exchange’s country of operation must be used. This results from regulations for crypto exchanges that have become increasingly strict in recent years. Even if large exchanges operate across multiple regions, investors can usually only choose the local currency of their respective country, as order books are held separately (Makarov and Schoar, 2020). In this thesis, the fiat currency, and thus the quote currency of all traded pairs, will therefore be EUR (Euro).

2.3.2 Fees

As a trader, or in the case of this thesis as an arbitrageur, one has to be aware of a number of transaction fees in the crypto market throughout the entire trading process. This is of even greater importance for automated trades, which usually yield only small profit margins. Papers on traditional markets have shown that, after the strategy itself, the trading costs associated with a particular strategy are the second most important determinant of investment performance, or, as stated, “Trading costs can substantially reduce the notional, or ‘paper,’ return to an investment strategy” (Keim and Madhavan, 1997). It is therefore not surprising that empirical data from the fund industry has shown that the level of expenses is negatively correlated with the net return on investment (Carhart, 1997). In general, the majority of exchanges strive to offer lower fees to customers. This is based on the theory that lower fees lead to a substantial increase in liquidity in terms of narrower bid-ask spreads, greater depth and growth in trading volume (Malinova and Park, 2011).
Fees cannot be generalized for every crypto exchange, as they differ between exchanges. However, for the majority it can be summarized that fees are applied to every sell, buy and withdrawal, and sometimes also to deposits (Kabašinskas and Šutienė, 2021). For sells and buys, most exchanges state maker and taker fees. Maker fees are paid when liquidity is added to the order book by placing a limit order that is not immediately matched, at or below the ticker price for a buy and at or above it for a sell. Taker fees are paid when liquidity is removed from the order book by placing any order that is executed against an existing order in the order book (“Bitfinex | Our Fees,” 2023; “Fee Rate,” 2023; “Fee Structures | Explore our trading fees | Kraken,” 2023). These maker and taker fees can be fixed rates, as for instance at Bitfinex, which charges 0.100% maker and 0.200% taker fees for crypto-to-crypto, crypto-to-stablecoin and crypto-to-fiat transactions. On other exchanges, like Kraken or Coinbase, they can only be estimated in advance, as the final fees are calculated at the time an order is placed and may be determined by a combination of factors, including but not limited to the user’s location, the selected payment method, the asset, the size of the order and market conditions like volatility and liquidity (“Coinbase pricing and fees disclosures,” 2023; “Fee Structures | Explore our trading fees | Kraken,” 2023). For withdrawals, every centralized exchange charges a fee, which is effectively a miner fee, as these transactions have to be recorded on-chain on the respective blockchain, smart contract or protocol of the crypto asset; this can also lead to varying costs. Fees for depositing money on an exchange also vary: while a transfer of cryptocurrencies or a standard bank transfer is usually free, fiat deposits by credit card or PayPal come with costs. Maker and taker fees can therefore be calculated per trade.
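To make the maker/taker distinction and its effect on arbitrage margins concrete, the following Python sketch classifies a limit order against the current best bid and ask and computes the fees of a two-point trade. It assumes the fixed Bitfinex rates quoted above (0.100% maker, 0.200% taker) purely as example values, and it assumes that the fee is charged on the traded value in the quote currency; exchanges with dynamic fees, such as Kraken or Coinbase, would need the rate determined at order time. The classification rule used here, that an order crossing the opposite best price executes immediately and pays the taker fee while any other limit order rests in the book and pays the maker fee, is a simplification that ignores partial fills and special order types. All function names are hypothetical.

```python
MAKER_FEE = 0.001  # 0.100 %, example rate (Bitfinex, as quoted in the text)
TAKER_FEE = 0.002  # 0.200 %, example rate (Bitfinex, as quoted in the text)


def is_taker(side: str, limit_price: float, best_bid: float, best_ask: float) -> bool:
    """A buy at or above the best ask, or a sell at or below the best bid, matches immediately."""
    if side == "buy":
        return limit_price >= best_ask
    if side == "sell":
        return limit_price <= best_bid
    raise ValueError(f"unknown side: {side}")


def trade_fee(notional: float, taker: bool) -> float:
    """Fee on the traded value (price * volume), assumed to be charged in the quote currency."""
    return notional * (TAKER_FEE if taker else MAKER_FEE)


def net_two_point_profit(buy_price: float, sell_price: float, volume: float) -> float:
    """Two-point arbitrage with market orders: taker fees apply to both legs."""
    gross = volume * (sell_price - buy_price)
    fees = trade_fee(volume * buy_price, taker=True) + trade_fee(volume * sell_price, taker=True)
    return gross - fees


# The €20.00 / €20.50 example: €0.50 gross profit minus €0.04 + €0.041 in taker fees.
net = net_two_point_profit(20.00, 20.50, 1.0)
```

In this example the fees consume about 16% of the gross profit; for a price difference below roughly 0.4% of the asset price, the same rates would turn the trade into a loss.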
Costs for withdrawals or deposits, in contrast, must be analysed across trades. The prices net of fees for trading the quantity $q$ on exchange $i$, i.e. the effective bid when selling and the effective ask when buying, can be calculated with the following formulas (Hautsch et al., 2018), where $B_i(q)$ (bid) and $A_i(q)$ (ask) are the respective asset prices and $\tau^{B}_i(q) \geq 0$ and $\tau^{A}_i(q) \geq 0$ are the proportional fees on the sell and buy side. Fees lower the effective price received when selling and raise the effective price paid when buying:

$$\tilde{B}_i(q) = B_i(q)\,\bigl(1 - \tau^{B}_i(q)\bigr)$$

$$\tilde{A}_i(q) = A_i(q)\,\bigl(1 + \tau^{A}_i(q)\bigr)$$

2.3.3 Price formation

How prices of crypto assets are formed can best be explained by examining the biggest and most popular one, Bitcoin. As there are now thousands of crypto assets, not all of them function the same way: some are backed by a real-world asset and some are not, some have a fixed supply and some do not, and for some the entire supply is already available, while for others it is issued on an ongoing basis. Bitcoin is a virtual currency with zero intrinsic value, issued by the blockchain and not backed by a real-world asset or a government. Additionally, its supply is fixed at 21,000,000 coins, but not all of them are accessible yet, because new bitcoins are mined on a continuous basis (Bouoiyour and Selmi, 2015). Two papers confirm that the market forces of supply and demand have one of the highest impacts on the price formation of Bitcoin (as of any other currency) and that their importance tends to increase over time. Its scarcity on the market is determined by the number of units in circulation. Demand is mainly driven by transactional demand, i.e. its use as a medium of exchange for goods and services. Consequently, price movements can be explained by the interaction of supply and demand (Bouoiyour and Selmi, 2015; Buchholz et al., 2012). In the case of Bitcoin, supply is exogenous, so it has no relationship to demand or price. Observed price changes are therefore due to shifts in demand, because supply does not change in response to price.
Therefore, the intersection between supply and demand should continuously move down the demand curve, because the quantity of Bitcoin increases over time due to mining. The supply curve is effectively a vertical line, as any change in quantity is fully expected. Hence, all observed price fluctuations should be due to shifts in demand, as supply should not affect the price of bitcoins in dollars over time (Buchholz et al., 2012). The following figure visualizes this example:

Figure 3: All observed price fluctuations occur due to shifts in demand (Buchholz et al., 2012).

Moreover, most cryptocurrencies lack several of the supply and demand features of fiat currencies that normally form the basis for their price. For this reason, their price formation cannot be explained by common economic theories such as the future cash flow model, purchasing power parity or uncovered interest parity. Also, since Bitcoin is not issued by a central bank or government, it is detached from the current economy, which implies that there are no macroeconomic fundamentals that could determine its pricing (Bouoiyour et al., 2014; Kristoufek, 2013). Additionally, the arrival of new information, new posts and an increase in search results on the internet have a positive impact on the Bitcoin price in the short run. This is also associated with incoming speculative investors, who affect the price and provide liquidity to the market. In combination with the growing amount of information on the internet, this has a downside in the short run, as it increases price volatility and creates price bubbles (Ciaian et al., 2016; Kristoufek, 2013). In the short term, but not in the long term, there is also a significant influence of global macro-financial developments, captured by the Dow Jones Index, exchange rates and the oil price (Ciaian et al., 2016).
To further understand how prices are set, one must differentiate between asset price sources, like coinmarketcap.com, one of the most famous crypto comparison platforms, and exchange-specific prices. Prices vary because exchanges are not connected and form prices based on their own trading volume and buy and sell activity, and therefore on the supply and demand of their respective users. The more trading operations and volume an exchange processes, the more relevant its prices are for the market. News services like Google use an aggregated price model, and cointelegraph.com or the already mentioned coinmarketcap.com use their own price indexes, which calculate asset prices as an average value based on the prices of top exchanges (Egorova, 2018). As prices for crypto assets differ between exchanges, it can be concluded that arbitrage opportunities exist. However, not every price discrepancy presents a profitable arbitrage opportunity.

2.3.4 Order Books and corresponding definitions

Order books are used by most crypto exchanges, specifically Limit Order Books (LOB), which are a collection of limit orders at which market participants are willing to buy or sell. Centralized exchanges (e.g. Binance, Kraken, Coinbase) are based on Limit Order Books, whereas decentralized exchanges (e.g. Uniswap, Pancakeswap, Sushiswap) rely on an Automated Market Maker protocol (Barbon and Ranaldo, 2021). The vast majority of traditional stock exchanges also use a LOB or hybrid LOB system to facilitate trading (Gould et al., 2013). Due to this similarity between centralized exchanges and stock exchanges, trading strategies such as the arbitrage trading examined in this thesis can be compared on a sound basis. To understand how arbitrage opportunities can be exploited, one must examine how they occur and how the trading mechanism of exchanges works. Therefore, the mechanism of Limit Order Books is investigated in this chapter. LOBs work in a flexible way, where every trader has the possibility of submitting buy or sell orders.
When a buy or sell order $x$ is posted on an exchange, a trade-matching algorithm checks whether it can be matched with a previously submitted order on the opposite side. If this is possible, the trade executes immediately; otherwise $x$ becomes active and remains active until it is matched to an incoming order or it is cancelled. Cancellation usually occurs when an exchange platform terminates active orders after a certain time to prevent an excessive accumulation of active orders, or when the owner of an order cancels it because they no longer wish to trade at the stated price (Gould et al., 2013). The following definitions for LOBs from Gould et al. (2013) need to be understood and are used for calculations in further methods of this thesis:

- An order $x = (p_x, \omega_x, t_x)$ submitted at time $t_x$ with price $p_x$ and size $\omega_x > 0$ (respectively $\omega_x < 0$) is a commitment to sell (respectively buy) up to $|\omega_x|$ units of a traded asset at a price no less than (respectively no greater than) $p_x$.
- The resolution parameters of a LOB are the lot size $\sigma$, which is the smallest amount of an asset that can be traded within it, so that $\omega_x \in \{\pm k\sigma \mid k = 1, 2, 3, \ldots\}$, and the tick size $\pi$, which is the smallest possible price interval, also called accuracy.
- A LOB $\mathcal{L}(t)$ is the set of all active orders in a market at time $t$.
- A LOB can be considered as a set of queues of active buy orders $\mathcal{B}(t)$, for which $\omega_x < 0$, and active sell orders $\mathcal{A}(t)$, for which $\omega_x > 0$.
- The depth of available orders at price $p$ at time $t$ is defined for the bid side as $n^{b}(p, t) := \sum_{\{x \in \mathcal{B}(t) \mid p_x = p\}} \omega_x$ and for the ask side as $n^{a}(p, t) := \sum_{\{x \in \mathcal{A}(t) \mid p_x = p\}} \omega_x$.
- The bid price is the highest stated price among active buy orders at time $t$: $b(t) := \max_{x \in \mathcal{B}(t)} p_x$.
- The ask price is the lowest stated price among active sell orders at time $t$: $a(t) := \min_{x \in \mathcal{A}(t)} p_x$.
- The bid-ask spread is the difference between the ask and the bid price at time $t$: $s(t) := a(t) - b(t)$.
- The mid-price at time $t$ is the middle price between the ask and the bid price: $m(t) := [a(t) + b(t)]/2$.
- Sometimes it is better to compare orders by relative price, where the bid-relative price is $\delta^{b}(p) := b(t) - p$ and the ask-relative price is $\delta^{a}(p) := p - a(t)$, so that both are non-negative for orders resting on their respective side of the book.

Figure 4: Schematic functionality of a Limit Order Book System (Gould et al., 2013)

In a LOB at time $t$, the bid price $b(t)$ represents the highest price at which at least the lot size of the traded asset can be sold immediately, while the ask price $a(t)$ represents the lowest price at which at least the lot size of the traded asset can be bought immediately. In addition to buy and sell orders, a further distinction has to be made between limit and market orders. While limit orders may be matched at better prices, they are also at risk of never being matched and remaining in the active queue until cancellation. In contrast, market orders do not face the uncertainty associated with limit orders, although they never match at prices better than the bid price $b(t)$ or the ask price $a(t)$ (Gould et al., 2013). Specifically, a market order is a request for immediate trading at the best price currently available in the market (Parlour and Seppi, 2008). The bid-ask spread $s(t)$ can be considered a measure of the market’s assessment of the value of the immediacy and certainty associated with market orders versus the waiting and uncertainty of the completion of limit orders (Gould et al., 2013). LOBs are popular because they allow some traders to demand immediacy, while allowing others to provide it to those who require it. Arbitrageurs, technical traders and indexers, whose activities are fast and often automated, will most likely submit market orders, whereas portfolio managers, whose focus is on long-term investments, will submit limit orders (Foucault et al., 2005).
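The definitions above translate directly into code. The following Python sketch is a simplified illustration, not a full order book implementation: it represents a LOB snapshot as a list of orders $(p_x, \omega_x)$ using the sign convention of Gould et al. (2013), negative sizes for buy orders and positive sizes for sell orders, and derives $b(t)$, $a(t)$, $s(t)$, $m(t)$ and the depth at a price. It also computes the average execution price of a market buy order that walks up the ask side of the book, which shows why a large market order can fill at worse prices than the best quote. The names `Order`, `bid_price`, `market_buy_avg_price` and so on are hypothetical, and time priority, lot size and tick size are ignored; a market sell would work symmetrically on the bid side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    price: float  # p_x
    size: float   # omega_x: > 0 for sell orders, < 0 for buy orders


def bid_price(book: list) -> float:
    """b(t): highest price among active buy orders."""
    return max(o.price for o in book if o.size < 0)


def ask_price(book: list) -> float:
    """a(t): lowest price among active sell orders."""
    return min(o.price for o in book if o.size > 0)


def spread(book: list) -> float:
    """s(t) = a(t) - b(t)."""
    return ask_price(book) - bid_price(book)


def mid_price(book: list) -> float:
    """m(t) = [a(t) + b(t)] / 2."""
    return (ask_price(book) + bid_price(book)) / 2


def depth(book: list, price: float, side: str) -> float:
    """n^b(p, t) or n^a(p, t): summed sizes of the active orders on one side at price p."""
    if side == "bid":
        return sum(o.size for o in book if o.size < 0 and o.price == price)
    return sum(o.size for o in book if o.size > 0 and o.price == price)


def market_buy_avg_price(book: list, volume: float) -> float:
    """Average price paid by a market buy of `volume` units, consuming asks from the lowest price upwards."""
    remaining, cost = volume, 0.0
    for o in sorted((o for o in book if o.size > 0), key=lambda o: o.price):
        filled = min(remaining, o.size)
        cost += filled * o.price
        remaining -= filled
        if remaining <= 1e-12:  # tolerance for floating-point leftovers
            return cost / volume
    raise ValueError("not enough liquidity on the ask side")


# Hypothetical snapshot: two asks and three bids.
book = [
    Order(20.00, 0.5), Order(20.10, 2.0),    # sell orders (asks)
    Order(19.90, -1.0), Order(19.90, -0.5),  # buy orders (bids)
    Order(19.80, -3.0),
]
print(bid_price(book), ask_price(book), mid_price(book))  # best bid 19.9, best ask 20.0, mid about 19.95
print(market_buy_avg_price(book, 1.0))  # about 20.05: 0.5 units at 20.00 and 0.5 units at 20.10
```

For an arbitrage system, this means that the quoted best ask is only a reliable buy price for volumes up to the depth available at that price level; larger volumes have to be evaluated against the average fill price.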
Arbitrageurs favour market orders because their strategy consists of simultaneously buying and selling an asset in an attempt to make an instant profit. They calculate with a specific price at the moment an order is submitted. The uncertainty of whether, and when, a limit order will be matched is therefore of little use to them.

2.3.5 Theoretical concepts for Arbitrage Trading

The efficient market hypothesis, together with the arbitrage pricing theory and the capital asset pricing model, has been crucial in explaining movements in financial markets through mathematical models. However, their applicability is often challenged by real-world data (Weron and Weron, 2000). This highlights the need for a broader perspective on these theoretical concepts, so that potential discrepancies and inefficiencies in financial markets can be better understood. This chapter examines the efficient market hypothesis, the arbitrage pricing theory and the law of one price. Finally, a conclusion is drawn on the potential for arbitrage opportunities.

2.3.5.1 The Efficient Market Hypothesis

The efficient market hypothesis (EMH) has become one of the dominant paradigms in finance. It is widely accepted in the literature of finance, accounting and the economics of uncertainty (Keim and Madhavan, 1997). Bachelier (1900) is generally credited as the first to observe that security prices in speculative financial markets follow a Brownian motion, i.e., an irregular random walk. This implies that investors cannot earn excess returns by detecting price fluctuations. The assumption can also be expressed by the economic saying that there is no such thing as a free lunch (Liu et al., 2022). Building on this idea, Fama (1970) was the first to formally propose the efficient market hypothesis. He stated that a market is efficient if transactions take place at the correct value, because all available information is always reflected in the price.
According to the efficient market hypothesis, an ideal market is one in which prices provide accurate signals for allocating financial resources. This rests on the assumption that asset prices fully reflect all available information at any time (Fama, 1970). Jensen (2002) makes the hypothesis dependent on the information available about the market. Let $\theta_t$ denote the given information set, and let economic profits be the risk-adjusted returns net of all costs. A market is then efficient with respect to $\theta_t$ if it is impossible to make economic profits by trading on the basis of $\theta_t$. The EMH has been studied in different forms with varying results. This has led to three categories of the hypothesis, which are mainly distinguished by the information set $\theta_t$ (Fama, 1970; Jensen, 2002):
1. In the weak form of the EMH, $\theta_t$ comprises all information contained in the past price history of the market as of time $t$.
2. In the semi-strong form of the EMH, $\theta_t$ comprises all information publicly available at time $t$, which includes the past prices of the weak form.
3. In the strong form of the EMH, $\theta_t$ comprises all information known to anyone at time $t$.
The third version is an extreme form that logically completes the first two but is not a realistic representation. When the literature refers to the efficient market hypothesis, it normally means the second version.

2.3.5.2 Arbitrage Pricing Theory

Arbitrage pricing theory (APT) was developed by Ross in 1976 as an alternative to the popular capital asset pricing model for explaining asset or portfolio returns. Its purpose was to determine the fair value of an asset (Ross, 1976). It assumes that equity returns can be predicted using a linear model of several systematic risk factors.
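A common textbook statement of the APT pricing relation is sketched below. The notation is introduced here for illustration and is not quoted from Ross (1976): $r_f$ is the risk-free rate, $\beta_{ik}$ is the sensitivity of asset $i$ to factor $k$, and $\lambda_k$ is the risk premium of factor $k$.

```latex
\mathbb{E}[r_i] = r_f + \sum_{k=1}^{K} \beta_{ik}\,\lambda_k
```

Under this model, an asset whose market price implies an expected return above this value is cheap relative to its factor exposure. An asset whose market price implies a lower expected return is expensive.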
Typical economic risk factors in such a model are the business cycle and changes in inflation, interest rates and exchange rates. If the current market value of an asset differs from the value calculated by the APT, the asset can be regarded as either overvalued or undervalued. For example, suppose the arbitrage pricing model valued a stock at €300 while its current market price was €250. The stock would then be considered undervalued. According to the APT, the price of the stock should eventually correct itself, which creates an arbitrage opportunity (Ross, 1976). Pure arbitrage places constraints on prices observed at a particular moment in time. The APT, by contrast, attempts to explain expected returns at different points in time. It is therefore mainly useful for statistical arbitrage, which is explained later, and of only minor use for pure arbitrage (Poitras, 2010).

2.3.5.3 Law of One Price

The law of one price is an economic cornerstone. It states that, in a perfectly competitive market with many buyers and sellers and no barriers to entry, identical goods and services must sell at the same price, which represents an equilibrium between supply and demand. If, for example, different prices existed in a market for a particular product, buyers would tend to buy only from the supplier offering the lowest price. This would force the other suppliers to lower their prices to remain competitive, ultimately leading to a single price. In addition, a market with different prices would attract arbitrageurs, whose trading would in turn standardise prices (Isard, 1976).
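Applied to crypto exchanges, the law of one price implies that the same asset should trade at the same EUR price everywhere, up to transaction costs. The sketch below checks whether buying at the ask price on one exchange and selling at the bid price on another yields a positive net return. It assumes proportional taker fees on both exchanges and ignores transfer or withdrawal costs and slippage. The function names and fee values are hypothetical.

```python
def net_cross_exchange_return(ask_buy: float, bid_sell: float,
                              fee_buy: float, fee_sell: float) -> float:
    """Net relative return of buying one unit at `ask_buy` on exchange A and
    selling it at `bid_sell` on exchange B. Fees are proportional taker fees
    (e.g. 0.002 = 0.2 %); transfer costs and slippage are ignored."""
    cost = ask_buy * (1 + fee_buy)        # EUR paid on A including the fee
    proceeds = bid_sell * (1 - fee_sell)  # EUR received on B after the fee
    return proceeds / cost - 1


def violates_law_of_one_price(ask_buy: float, bid_sell: float,
                              fee_buy: float, fee_sell: float) -> bool:
    """True if the price gap exceeds the round-trip trading costs."""
    return net_cross_exchange_return(ask_buy, bid_sell, fee_buy, fee_sell) > 0


# A 1 % gap is profitable with 0.2 % fees, but not with 0.6 % fees.
print(violates_law_of_one_price(100.0, 101.0, 0.002, 0.002))  # True
print(violates_law_of_one_price(100.0, 101.0, 0.006, 0.006))  # False
```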
2.3.5.4 Chances for Arbitrage Opportunities

In theory, financial markets are in economic equilibrium or moving towards it. This excludes the possibility of "making money out of nothing", the "free lunch" mentioned above. Arbitrage opportunities should therefore not occur, and in the no-arbitrage framework of mathematical finance their existence is considered unrealistic. In addition, all mathematical models of financial markets have to satisfy an arbitrage-free condition to be realistic (Fontana, 2015). In theory, economic equilibrium is a state of balance of market forces. The concept is borrowed from the physical sciences, where observable physical forces can balance each other. Because the conditions underlying supply and demand are dynamic and uncertain, equilibrium is a fundamentally theoretical construct that may never occur in an economy. Consequently, the economy pursues equilibrium without ever actually reaching it (Jofre et al., 2014). The quality of markets varies strongly and does not meet the conditions of financial theory, so arbitrage opportunities occur. If the efficient market hypothesis holds, arbitrage opportunities should not be available for assets cross-listed on multiple markets, such as crypto assets traded across exchanges. However, these hypotheses were developed on the basis of traditional financial markets. The crypto market differs from traditional markets in many key characteristics, and four aspects were identified that influence arbitrage opportunities. First, price formation happens on each exchange individually, and fiat currency cannot flow seamlessly across regions. Price fluctuations in individual markets therefore do not reflect cross-market information in a timely manner.
This friction prevents markets from forming a consensus, so price disparities between markets do not disappear and are inevitably linked to failures of market efficiency (Duan et al., 2021; Makarov and Schoar, 2020). These findings were obtained for Bitcoin. However, they are consistent with evidence from traditional financial markets. The crypto market also orients itself by the largest cryptocurrency, and exchanges treat other assets similarly. It is therefore assumed that the findings also hold for other crypto assets (Duan et al., 2021). Both papers conclude that arbitrage opportunities exist within and across markets, and that these opportunities are larger across regions than within them (Duan et al., 2021; Makarov and Schoar, 2020).

Second, crypto assets are completely identical across exchanges and countries, unlike stocks and bonds, which can differ between listings. Third, crypto markets operate 24 hours a day, seven days a week, with continuously available pricing data (Dwyer, 2015). Fourth, the crypto market lacks government regulation such as the US Securities and Exchange Commission's National Best Bid and Offer regulation, which ensures that traders obtain the best available price across different markets (Makarov and Schoar, 2020).

One of these papers also linked cross-market arbitrage opportunities to market efficiency. The potential for cross-market arbitrage is closely related to the share of active arbitrageurs in each market and to their migration from one market to another. A market with a high level of arbitrage activity processes new information more quickly than a market with a low level, which increases market efficiency. In addition, a sudden change in arbitrage opportunities can cause arbitrageurs to migrate between markets, leading to changes in market efficiency.
Conversely, changes in market efficiency can cause arbitrage migration, which can lead to price dispersion between markets. This does not require arbitrageurs to migrate physically, only to switch between being active and inactive. For example, if arbitrageurs are active only when markets are relatively inefficient and inactive when they are relatively efficient, a similar migration effect occurs (Duan et al., 2021). In general, arbitrage opportunities arise in low-quality markets, which is why market quality is examined in more detail below. This is supported by the positive relationship between liquidity and arbitrage activity, which in turn improves market efficiency (Chordia et al., 2008). In addition, three further papers found that liquidity and market efficiency are positively correlated in the crypto market (Al-Yahyaee et al., 2020; Brauneis and Mestel, 2018; Wei, 2018). Market quality is therefore addressed by focusing on market liquidity and efficiency.

2.3.5.5 Liquidity

Definitions of liquidity vary, which makes it difficult to define formally. Black (1971) describes the market for a stock as liquid if the following conditions hold. First, bid and ask prices are always available for a trader who wants to buy or sell small amounts of shares immediately. Second, the bid-ask spread is always small. Third, an investor who buys or sells many shares can, in the absence of specific information, expect to do so over a long period of time at a price that on average does not differ significantly from the current market price. Fourth, a trader can buy or sell large blocks of securities immediately, but at a premium or discount that depends on the size of the block: the larger the block, the greater the premium or discount. Building on these conditions, Kyle (1985) identified tightness, depth and resilience as the three key characteristics of a liquid market.
Tightness refers to the cost of turning around a position over a short period of time, since unwinding a position should be costless in a perfectly liquid market. Depth refers to a market's ability to absorb order volumes without significant price impact. This relates to the order book depth defined earlier: when there are many market and limit orders at prices around the last trade, a market is said to have depth. Resilience measures how quickly prices recover from a random, uninformative shock.

2.3.5.6 Efficiency

Regarding efficiency, a distinction has to be made between markets that are efficient in terms of prices and those that are efficient in terms of information. Extending the EMH discussed above, informational efficiency refers to the ability of financial markets to incorporate unexpected news quickly and accurately into current prices. If new information is incorporated into asset prices correctly and quickly, the EMH suggests that prices follow a random walk, so that price movements are unpredictable over time. The random walk is therefore a property of a perfectly efficient market. Tests for it have been used to assess the informational efficiency of markets and thereby to test the EMH (Ozenbas et al., 2022). Kühl (2010) also shows that market inefficiency can originate not only from an individual market or exchange, which is the common approach, but also from cross-market (or cross-exchange) developments.

3 Methods

In the context of this thesis, the overarching research question "How are traditional trading strategies of financial markets, such as arbitrage trading, also applicable with assets on the crypto market in an automated way?" will be answered. To achieve this goal, the work is divided into three methods, each with its own research question. In the first, the necessary data and information are collected.
In the second, a concept is developed to find suitable crypto assets and exchanges with increased arbitrage potential. In the last method, a prototype for arbitrage trading is implemented and tested. To achieve this, literature research and the Design Science method according to Alan Hevner (2004) are used. In Design Science, artifacts are created through an iterative cycle of analysis, implementation and evaluation of a model. It is thus a method of addressing research problems by creating and testing artifacts designed to meet a specific business requirement. Design Science consists of three research cycles: the relevance cycle, the design cycle and the rigor cycle. The relevance cycle connects the project environment with the design science activities. The rigor cycle connects the design science activities with experience, expertise and scientific foundations. The central design cycle iterates between the research processes and the building and evaluation of the design artifacts. An artifact is an answer to a specific research problem. It is tested to see how well it actually solves the initial problem (Hevner and Chatterjee, 2010).

Case studies and expert interviews were also considered as scientific methods. Apart from some open-source bots, no accessible and performant arbitrage trading software is available to study in detail. Such software can generate significant amounts of money, so the larger companies or investors who use it tend to keep it to themselves. Case study research is therefore not appropriate. Expert interviews would have shortened the first and second methods. However, the literature provides the necessary basic information, so a separate concept for filtering assets based on historical data was developed instead.
Known principles from the literature were thus applied to the data, which is regarded as the single source of truth. Expert interviews would therefore not have produced equally conclusive results or added substantial benefit.

3.1 Data collection & management

The first method answers the research question of which information must be gathered to enable decision making for arbitrage trading opportunities. To achieve this, data has to be collected from exchanges and knowledge gathered from the literature. The method is separated into two steps.

In the first step, a knowledge base is built up. It covers which information about crypto assets and exchanges is needed, according to current literature, to detect arbitrage. Most crypto exchanges are nowadays subject to government regulation (Kabašinskas and Šutienė, 2021). Some exchanges can therefore be ruled out, and the selection can be limited to those that are accessible. In addition, only centralised exchanges are used. Within the scope of this thesis they are easier to deal with, and they have almost no validation time. On decentralised exchanges, by contrast, a submitted trade is not certain to be executed at the given price (Hautsch et al., 2018). On centralised exchanges, trading fees must be paid for exchanging assets against each other, as well as deposit and withdrawal fees. Some of this information can be collected from crypto market utility sites, which aggregate and report data on different exchanges or assets. If it is not available there, it has to be mined or aggregated independently from the websites of the respective exchanges.

In the second step, pricing data on diverse crypto assets is collected. The highest possible resolution is tick-level data. This is intraday data representing a series of executed trades or bid/ask quotes from different exchanges.
For the scope of this thesis, OHLCV data fits best, because the prices at which trades were actually executed are needed. Historical price data in this format is retrieved per time interval, which should ideally be one minute. Several platforms offer such data on the internet, and it can then be joined based on timestamps. However, these providers usually charge high fees. If reliable data is found from such a source, it is aggregated in this way. Otherwise, the data has to be scraped directly from the exchanges via their public Application Programming Interfaces (APIs), using an open-source library. The price data and the additional information about exchanges and assets are then combined, cleaned and prepared for further analysis in the second method.

3.2 Concept for crypto asset filtering

The second method answers the specific research question of which requirements and criteria a crypto asset has to fulfil to be considered for arbitrage trading systems. This concept builds on the data gathered in the first method. It forms the basis for the arbitrage-opportunity detection system of the next method. Through literature research, this method aims to understand how arbitrage occurs, how it can be detected mathematically and which indicators of arbitrage exist for assets, markets or exchanges. The arbitrage index, which is also used in other papers, is applied, and price differences are calculated and visualised. With this knowledge base, a concept for filtering crypto assets by their suitability for arbitrage trading is developed using Design Science. Crypto assets that are suitable and have an increased potential for arbitrage opportunities are then filtered from the data and information gathered in the previous method. This data analysis is conducted in Python with its associated packages.
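One way to compute the arbitrage index on the collected hourly data is sketched below. It assumes the definition of Makarov and Schoar (2020): the ratio of the highest to the lowest price of the same asset across exchanges at the same timestamp. It works on hourly close prices keyed by Unix timestamp, as in the collected OHLCV data. The data layout and the example prices are hypothetical.

```python
from typing import Dict, List


def arbitrage_index(closes: Dict[str, Dict[int, float]]) -> Dict[int, float]:
    """Arbitrage index per timestamp: max price / min price across exchanges.

    `closes[exchange][unix_ts]` is the hourly close price in EUR. Only
    timestamps quoted by at least two exchanges are returned.
    """
    prices_at: Dict[int, List[float]] = {}
    for series in closes.values():
        for ts, price in series.items():
            prices_at.setdefault(ts, []).append(price)
    return {ts: max(p) / min(p)
            for ts, p in sorted(prices_at.items()) if len(p) >= 2}


# Hypothetical hourly closes for one asset on three exchanges.
closes = {
    "kraken":   {1640995200: 40000.0, 1640998800: 40100.0},
    "bitstamp": {1640995200: 40400.0, 1640998800: 40100.0},
    "bitpanda": {1640995200: 40200.0},
}
for ts, idx in arbitrage_index(closes).items():
    print(ts, round(idx, 4))
```

An index of 1.01 at a timestamp means that the most expensive exchange quoted the asset 1 % above the cheapest one. A value of 1.0 means that all exchanges agreed on the price.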
3.3 Arbitrage trading prototype

The third method answers the specific research question of how an arbitrage trading strategy can be realised in the crypto market as a software prototype. To achieve this, a system is designed and implemented that can identify arbitrage opportunities, calculate profits and simulate trades. Within the scope of this thesis, only paper trading is done. The function that would execute a trade instead logs it, including trading fees, to a local database in order to simulate a real trade. Design Science according to Alan Hevner is also used in this method. To relate it more closely to the implementation of a development prototype, the more specific Systems Development Framework in Information Systems Research is used, which consists of five stages.

Figure 5: System development research model according to Design Science by Hevner, adapted from Nunamaker (Hevner and Chatterjee, 2010). Stages: Construct a conceptual framework; Develop a system architecture; Analyze & design the system; Build the prototype system; Evaluate the system.

In the first stage, a conceptual framework is constructed, based on the method's research question, by studying relevant disciplines through literature research. In the second stage, the system architecture is created. This involves developing a modular and extensible architecture and defining the functionalities of the components and their interrelationships. In the third stage, the system is analysed, and a process is designed to carry out the system functions. In the fourth stage, the prototype is developed according to the previous steps. In the fifth and last stage, the system is evaluated by observing its use in case studies (Hevner and Chatterjee, 2010). For this method, a software tool is selected and a technical concept of the arbitrage trading system is planned. This is followed by the development of the prototype with continuous testing and improvement.
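A minimal sketch of such a paper-trading step is shown below, using Python's built-in sqlite3 module as the local database. The table layout and the function names are hypothetical. The sketch also assumes that fees are proportional to the traded EUR amount on both legs. The prototype's actual schema and fee handling may differ.

```python
import sqlite3
import time


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local database holding simulated trades."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS paper_trades (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               ts REAL NOT NULL,           -- Unix time of the simulated trade
               asset TEXT NOT NULL,
               buy_exchange TEXT NOT NULL,
               sell_exchange TEXT NOT NULL,
               amount REAL NOT NULL,       -- units of the asset
               buy_price REAL NOT NULL,    -- EUR per unit (ask on buy exchange)
               sell_price REAL NOT NULL,   -- EUR per unit (bid on sell exchange)
               fees_eur REAL NOT NULL,
               profit_eur REAL NOT NULL)"""
    )
    return conn


def log_paper_trade(conn: sqlite3.Connection, asset: str, buy_exchange: str,
                    sell_exchange: str, amount: float, buy_price: float,
                    sell_price: float, fee_buy: float, fee_sell: float) -> float:
    """Simulate a market buy and a market sell and log them instead of
    sending orders. Returns the net profit in EUR after trading fees."""
    fees = amount * (buy_price * fee_buy + sell_price * fee_sell)
    profit = amount * (sell_price - buy_price) - fees
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO paper_trades (ts, asset, buy_exchange, sell_exchange, "
            "amount, buy_price, sell_price, fees_eur, profit_eur) "
            "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
            (time.time(), asset, buy_exchange, sell_exchange, amount,
             buy_price, sell_price, fees, profit),
        )
    return profit


# Hypothetical prices and fee rates
conn = init_db()
profit = log_paper_trade(conn, "ETH", "kraken", "bitstamp", 2.0,
                         1500.0, 1512.0, 0.001, 0.001)
print(profit)
```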
4 Data collection & management

This method answers the research question of which information has to be gathered to enable decision making for arbitrage trading opportunities. First, reliable sources of historical crypto asset prices per exchange are identified. Next, diverse top exchanges are considered, and their data is mapped into a common format and saved as files. Finally, the data is analysed and processed to ensure correct results.

4.1 Data sources

There are various sources for historical crypto asset prices from exchanges: digital asset data providers like kaiko.com, crypto utility sites like coinmarketcap.com, cryptocompare.com and coingecko.com, the exchanges themselves, or open-source libraries. In the context of this thesis, historical data is collected using cost-free approaches. The required data is information about crypto assets and exchanges, together with the respective OHLCV data, ideally per minute. OHLCV stands for the open, high, low and close price and the volume per time interval. The period for the analysis of historical data was chosen from January 1st, 2022 to April 1st, 2023.

Digital asset data providers like kaiko.com offer high-quality exchange data in formats such as tick-level, order book and OHLCV in different intervals. These sources are the most convenient and reliable, and they have also been used in other papers, such as that of Makarov and Schoar (2020). Their disadvantage is their high price, as selling data is their main business. In the case of kaiko.com, exact pricing is only available upon request. Access for academic institutions exists, but it must be purchased by the respective institution. For this reason, this data source is not an option for this thesis. Crypto utility sites, most prominently coinmarketcap.com, cryptocompare.com and coingecko.com, offer diverse data about assets and exchanges.
This data is available on the platforms' websites or via their own APIs. These APIs provide both cost-free and paid access, with differences in features, limitations and data types. Free access generally offers basic data such as coin lists, market data and social data, and historical data only in limited form. Paid plans offer more comprehensive and detailed data, including historical order book records, in-depth market analysis, detailed historical data and initial coin offering (ICO) information, enabling deeper insights. The information on assets and exchanges required for this thesis can therefore be accessed free of charge via these APIs.

The best free access to historical data in OHLCV format is provided by cryptocompare.com. However, minute-level data only covers the last day, and hourly data covers up to three months per call. The desired period in minute intervals would only be available with an annual subscription of around €4,500. The other providers charge similar amounts for the same results ("CoinGecko API Pricing Plans," 2023; "Pricing | CryptoCompare API," 2023; CoinMarketCap, 2023). A similar result in hourly intervals can be achieved with a Python script that executes several requests, each covering three months, via the free version of the CryptoCompare API. This is the main data source of this thesis, partly because of the high number of available exchanges.

Exchanges themselves also offer historical data via their APIs. Unfortunately, because these APIs are poorly documented, this option is difficult to use without registering on each exchange. Some exchanges, like Binance, also offer historical data as a zip file download per currency pair. This option is rarely available on other exchanges. Both sources are therefore not shortlisted because of the high complexity and effort involved.
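Splitting the desired period into several requests can be sketched as follows. The endpoint URL, the parameter names (`fsym`, `tsym`, `e`, `limit`, `toTs`) and the maximum of 2,000 hourly points per call are assumptions based on the public CryptoCompare API. They should be checked against its documentation. The sketch only builds the request URLs. The HTTP calls, pauses for rate limits and error handling can then be added separately. Rows that overlap at window boundaries can be removed by timestamp afterwards.

```python
from datetime import datetime, timezone
from typing import List, Tuple
from urllib.parse import urlencode

# Assumed endpoint and per-call limit; verify against the CryptoCompare docs.
BASE_URL = "https://min-api.cryptocompare.com/data/v2/histohour"
HOUR = 3600


def request_windows(start: datetime, end: datetime,
                    max_points: int = 2000) -> List[Tuple[int, int]]:
    """Split [start, end) into (to_ts, n_hours) windows, newest first.

    The endpoint is assumed to page backwards from `toTs`, so each window
    ends at `to_ts` and covers `n_hours` hourly candles.
    """
    start_ts = int(start.timestamp())
    to_ts = int(end.timestamp()) - HOUR  # open time of the last full hour
    windows = []
    while to_ts >= start_ts:
        n_hours = min(max_points, (to_ts - start_ts) // HOUR + 1)
        windows.append((to_ts, n_hours))
        to_ts -= n_hours * HOUR
    return windows


def build_urls(fsym: str, tsym: str, exchange: str,
               start: datetime, end: datetime) -> List[str]:
    """One request URL per window for a trading pair on one exchange."""
    return [BASE_URL + "?" + urlencode({"fsym": fsym, "tsym": tsym,
                                        "e": exchange, "limit": n,
                                        "toTs": to_ts})
            for to_ts, n in request_windows(start, end)]


start = datetime(2022, 1, 1, tzinfo=timezone.utc)
end = datetime(2023, 4, 1, tzinfo=timezone.utc)
urls = build_urls("BTC", "EUR", "Kraken", start, end)
print(len(urls))  # 6 requests for 10,920 hourly candles
```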
An open-source library can reduce the complexity of collecting data directly from individual exchanges. CCXT ("CCXT – CryptoCurrency eXchange Trading Library," 2023) is used to connect to and trade on cryptocurrency exchanges, and it provides quick access to market data. It connects directly to the exchanges' public APIs and makes these interfaces easily accessible through its own functions. Although the library wraps them, the API connections are still direct, so the exchanges' rate limits remain a problem ("CCXT - Documentation," 2023). The desired historical data is usually not available in a single request, so several requests have to be made via a script. The public APIs of exchanges are quite sensitive to rate limits. Retrieving data therefore takes a very long time, provided the IP address is not blocked temporarily or for a longer period. As of April 2023, CCXT offers connections to 101 exchanges ("CCXT – CryptoCurrency eXchange Trading Library," 2023). It serves as the second data source for historical data in this thesis. OHLCV data in minute intervals would have been the desired format, which most other papers also use. Due to the constraints of the APIs, hourly data has to be used, which is the first limitation of this thesis.

4.2 Assets and Exchanges

All papers known to the author that deal with arbitrage opportunities in the crypto market have only addressed top crypto assets, such as Bitcoin or Ethereum. The literature shows that arbitrage opportunities are more likely to arise when liquidity is lower. For heavily traded assets, price differences close quickly or do not arise at all. For this reason, the selection is extended to the top 100 assets by market cap on coinmarketcap.com and cryptocompare.com.
The assets considered at the beginning are listed below with abbreviation and full name:

ETH (Ethereum), USDT (Tether), BNB (Binance Coin), USDC (USD Coin), BTC (Bitcoin), XRP (XRP), ADA (Cardano), DOGE (Dogecoin), MATIC (Polygon), SOL (Solana), DOT (Polkadot), LTC (Litecoin), SHIB (Shiba Inu), BUSD (Binance USD), AVAX (Avalanche), TRX (TRON), DAI (Dai), WBTC (Wrapped Bitcoin), LINK (Chainlink), UNI (Uniswap), ATOM (Cosmos), OKB (OKB), LEO (UNUS SED LEO), ETC (Ethereum Classic), XMR (Monero), TON (Toncoin), XLM (Stellar), FIL (Filecoin), BCH (Bitcoin Cash), APT (Aptos), LDO (Lido DAO), TUSD (TrueUSD), ARB (Arbitrum), HBAR (Hedera), NEAR (NEAR Protocol), VET (VeChain), CRO (Cronos), ICP (Internet Computer Protocol), APE (ApeCoin), ALGO (Algorand), GRT (The Graph), QNT (Quant), FTM (Fantom), EOS (EOS), STX (Stacks), MANA (Decentraland), AAVE (Aave), THETA (Theta Network), IMX (Immutable), EGLD (MultiversX), FLOW (Flow), XTZ (Tezos), AXS (Axie Infinity), CFX (Conflux), SAND (The Sandbox), BIT (BitDAO), RPL (Rocket Pool), USDP (Pax Dollar), CHZ (Chiliz), NEO (NEO), KCS (KuCoin Token), OP (Optimism), CRV (Curve DAO), KLAY (Klaytn), GMX (GMX), MKR (Maker), LUNC (Terra Classic), FXS (Frax Share), USDD (USDD), SNX (Synthetix), MINA (Mina Protocol), BSV (Bitcoin SV), ZEC (Zcash), CAKE (PancakeSwap), DASH (Dash), INJ (Injective Protocol), HT (Huobi Token), MIOTA (IOTA), XEC (eCash), RNDR (Render Token), XDC (XDC Network), GT (GateToken), BTT (BitTorrent), WOO (WOO Network), RUNE (THORChain), PAXG (PAX Gold), CSPR (Casper), AGIX (SingularityNET), LRC (Loopring), TWT (Trust Wallet Token), ZIL (Zilliqa), 1INCH (1inch Network), FLR (Flare), CVX (Convex Finance), KAVA (Kava), DYDX (dYdX)

Table 3: Top 100 assets by market cap considered at the beginning with abbreviation and full name.
The following 85 exchanges are available as data sources on both cryptocompare.com and CCXT. They are listed with abbreviation and full name:

aax (AAX), ABCC (ABCC Exchange), ataix (ATAIX), bequant (Bequant), Bibox (Bibox), BigONE (BigONE), Binance (Binance), binanceusa (Binance US), Bit2C (Bit2C), BitBank (BitBank), BitBay (BitBay), bitbuy (Bitbuy), Bitfinex (Bitfinex), bitFlyer (bitFlyer), bitflyereu (bitFlyer EU), bitflyerus (bitFlyer US), Bithumb (Bithumb), bithumbglobal (Bithumb Global), Bitkub (Bitkub), BitMart (BitMart), Bitpanda (Bitpanda), Bitso (Bitso), Bitstamp (Bitstamp), BitTrex (Bittrex), blockchaincom (Blockchain.com), BTCBOX (BTCBOX), BTCMarkets (BTC Markets), BTCTurk (BTCTurk), btse (BTSE), bullish (Bullish), CBX (CBX), Cexio (CEX.IO), Coinbase (Coinbase), Coincheck (Coincheck), CoinCorner (CoinCorner), CoinEx (CoinEx), coinfield (CoinField), CoinJar (CoinJar), Coinmate (Coinmate), Coinone (Coinone), Coinsbit (Coinsbit), crosstower (CrossTower), cryptodotcom (Crypto.com), currency (Currency.com), dcoin (Dcoin), decoin (Decoin), DigiFinex (DigiFinex), eidoo (Eidoo), erisx (ErisX), etoro (eToro), Exmo (Exmo), ftxus (FTX US), Gateio (Gate.io), Gemini (Gemini), gopax (GOPAX), HitBTC (HitBTC), huobijapan (Huobi Japan), huobikorea (Huobi Korea), HuobiPro (Huobi Global), IndependentReserve (Independent Reserve), indodax (Indodax), itBit (itBit), Korbit (Korbit), Kraken (Kraken), Kucoin (KuCoin), LAToken (LATOKEN), Liquid (Liquid), lmax (LMAX Digital), Luno (Luno), Lykke (Lykke), NDAX (NDAX), nominex (Nominex), OKCoin (OKCoin), OKEX (OKEx), P2PB2B (P2PB2B), Paymium (Paymium), probit (ProBit), TheRockTrading (The Rock Trading), Upbit (Upbit), valr (VALR), Vaultoro (Vaultoro), Zaif (Zaif), ZB (ZB.com), ZBG (ZBG), zebitex (Zebitex)

Table 4: 85 available exchanges from cryptocompare.com and CCXT with abbreviation and full name.

As already mentioned, only assets traded against the fiat currency Euro are considered in this thesis, so that no additional currency conversions have to be incorporated.
This restriction alone already narrows down the assets and exchanges and reduces the number of considered assets by about half. Not all exchanges support trading against EUR, and those that do not all support the same asset pairs. In addition, a few exchanges can be expected to provide entirely false data, or false data only for certain asset pairs. These are removed manually during data processing.

As mentioned earlier in this thesis, centralised exchanges carry several risks: loss of users' assets held in custody, being a single point of attack for hackers, little regulation, a lack of privacy, and mismanagement by the exchange operators (Mohan, 2022). A basic assessment of the exchanges under consideration is therefore necessary. To obtain a shortlist of centralised exchanges that can be considered trustworthy, the top 25 exchanges of coingecko.com, cryptocompare.com, coinmarketcap.com and kaiko.com were compared by their exchange score. These ratings depend on different criteria, vary slightly per platform and are called Trust Scores, Exchange Scores or Points. First, the first 25 rows of the ranking table of each website were scraped. Second, the results were combined per exchange with the rating of each platform. Next, every exchange that was not present in at least three of the four top 25 rankings was excluded. Additionally, exchanges that are not accessible on the European market were excluded. This thesis was carried out in Austria, so Austria's best-known crypto exchange Bitpanda was added, as Bitpanda Pro, the product that allows trading via an API. Thirteen exchanges remain from this selection. They are regarded as the preferred exchanges in the context of this thesis:

Figure 6: Preferred exchanges, resulting from top 25 exchange rankings by platforms trust score.
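The selection rule described above can be sketched as follows. It keeps exchanges present in at least three of the four top 25 rankings, excludes those not accessible in Europe and manually adds Bitpanda Pro. The input format, the lower-case name normalisation and the example rankings are assumptions. In practice, the scraped exchange names differ slightly between platforms and first have to be mapped to a common identifier.

```python
from collections import Counter
from typing import Dict, Iterable, List


def shortlist_exchanges(rankings: Dict[str, List[str]],
                        min_platforms: int = 3,
                        top_n: int = 25,
                        excluded: Iterable[str] = (),
                        always_include: Iterable[str] = ()) -> List[str]:
    """Keep exchanges that appear in the top `top_n` of at least
    `min_platforms` platform rankings.

    `rankings` maps a platform name to its scraped list of exchange names,
    best first. Names are compared in lower case.
    """
    counts: Counter = Counter()
    for ranked in rankings.values():
        # a set, so each exchange is counted at most once per platform
        counts.update({name.lower() for name in ranked[:top_n]})
    selected = {name for name, n in counts.items() if n >= min_platforms}
    selected -= {name.lower() for name in excluded}        # e.g. not available in Europe
    selected |= {name.lower() for name in always_include}  # manual additions
    return sorted(selected)


# Hypothetical, shortened rankings (not the scraped data).
rankings = {
    "coinmarketcap": ["ex_a", "ex_b", "ex_c"],
    "coingecko": ["ex_a", "ex_b", "ex_d"],
    "cryptocompare": ["ex_a", "ex_c", "ex_d"],
    "kaiko": ["ex_b", "ex_c", "ex_e"],
}
print(shortlist_exchanges(rankings, excluded=["ex_c"],
                          always_include=["Bitpanda Pro"]))
# ['bitpanda pro', 'ex_a', 'ex_b']
```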
Scraped websites with exchange rankings by Trust Scores, Exchange Scores or Points:

Address of exchange ranking | Ranked by | Accessed
www.coinmarketcap.com/rankings/exchanges/ | Score | 10/04/2023
www.cryptocompare.com/exchanges/#/overview?f2=Centralized | Points | 10/04/2023
www.coingecko.com/en/exchanges | Trust Score | 10/04/2023
www.kaiko.com/pages/exchange-ranking | Kaiko Exchange Score™ | 10/04/2023
Table 5: Scraped Websites with exchange ranking by Trust Scores, Exchange Scores or Points.

4.3 Collection of data

As previously mentioned, minute-level OHLCV data is not available free of charge from the main data source cryptocompare.com, so hourly intervals are the finest data granularity that can be used. Via the library CCXT, which retrieves data from the exchanges' public APIs, OHLCV data in minute intervals could be retrieved correctly only from Binance, Bitstamp and Bitvavo. Even then, all three exchanges rarely provided minute-interval data for the same crypto asset over the desired 16 months. Additionally, CCXT generally returned data for fewer exchanges than cryptocompare.com, although the numbers should in fact be the same, since the same exchanges offer the same trading pairs. For these reasons, cryptocompare.com is used as the main data source, and incorrect data is substituted with data from CCXT where available.

The gathered data is saved in the file structure shown in Figure 7. In a main data directory, the script creates one folder per asset named “historical_asset_EUR”, where asset is the asset traded against the fiat currency Euro (“EUR”). Each asset folder contains the OHLCV data of the exchanges that offer the respective trading pair, stored as “exchange_asset_h.csv”, where exchange stands for the respective trading platform and “h” for the hourly data format.
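A minimal sketch of this naming scheme, assuming a data directory named data and the hourly suffix "h"; the helper names exchange_file() and parse_exchange_file() are hypothetical. Parsing the file name back mirrors how the analysis code takes the exchange name as the first "_"-separated field.

```python
from pathlib import Path


def exchange_file(data_dir, exchange, asset, quote="EUR", interval="h"):
    """Path of one exchange's OHLCV file, e.g. data/historical_BTC_EUR/kraken_BTC_h.csv."""
    return Path(data_dir) / f"historical_{asset}_{quote}" / f"{exchange}_{asset}_{interval}.csv"


def parse_exchange_file(path):
    """Recover (exchange, asset) from a file name such as 'kraken_BTC_h.csv'."""
    exchange, asset, _interval = Path(path).stem.split("_")
    return exchange, asset


path = exchange_file("data", "kraken", "BTC")
print(path.as_posix())            # data/historical_BTC_EUR/kraken_BTC_h.csv
print(parse_exchange_file(path))  # ('kraken', 'BTC')
```

The split assumes that neither exchange nor asset names contain an underscore, which holds for the names listed in this chapter.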
Data within these exchange files is saved in OHLCV format, i.e. the Open, High, Low and Close price and the Volume per time interval. Cryptocompare.com delivers the volume split into “Volume from” and “Volume to”: the first is the number of assets traded for EUR, while the second is the amount of EUR traded against the respective asset (“Glossary of Trading Terms,” 2023). CCXT delivers only the comparable “Volume from”. Since only this value is needed later, this does not pose a problem.

Figure 7: File structure of gathered historical data.

As every OHLCV record comes with a timestamp, the first column holds it as a Unix timestamp. This format is commonly used in software development and tracks time as a running total of seconds since the Unix epoch, January 1st, 1970, 00:00:00 UTC.

Figure 8: Process of gathering data from exchanges for cryptocompare.com and CCXT

The two functions ‘historicalData_cryptocompare()’ and ‘historicalData_ccxt()’, which gather the required OHLCV data from January 1st, 2022 to April 1st, 2023, can be found in Attachments A and B. They take as input the desired trading pairs, the start and end date, the API address, the exchange name and the timeframe (1h). Both functions loop over the assets and exchanges, which are stored in separate arrays, and execute the download for each combination. Because these are mainly Input/Output (I/O)-bound tasks, they are executed in a multithreaded manner to speed up the process. The functions perform the following steps: creating the directory per asset, requesting the API for data, setting up the csv file with column headers, creating a data frame with the requested data, filtering out rows outside the set timeframe, basic cleaning and indexing, and writing to the csv file.
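The multithreaded collection loop can be sketched with Python's standard library. This is not the code of Attachments A and B: the fetch function is a hypothetical stand-in for the cryptocompare.com or CCXT request, assumed to return rows of [unix_time, open, high, low, close, volumefrom], and the data frame step is omitted. The end date is treated as exclusive.

```python
import csv
import tempfile
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone
from pathlib import Path

COLUMNS = ["time", "open", "high", "low", "close", "volumefrom"]


def to_unix(date_string):
    """'2022-01-01' -> seconds since the Unix epoch (1970-01-01 00:00:00 UTC)."""
    dt = datetime.strptime(date_string, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def collect(asset, exchange, fetch, data_dir, start, end):
    """Download, filter and store one asset/exchange series; returns the rows written."""
    folder = Path(data_dir) / f"historical_{asset}_EUR"
    folder.mkdir(parents=True, exist_ok=True)              # directory per asset
    rows = fetch(asset, exchange)                          # request the API
    rows = sorted(r for r in rows if start <= r[0] < end)  # keep the timeframe, order by time
    with open(folder / f"{exchange}_{asset}_h.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)                           # column headers
        writer.writerows(rows)
    return len(rows)


def collect_all(assets, exchanges, fetch, data_dir, start, end, workers=8):
    """Run every (asset, exchange) download in a thread pool, as the work is I/O-bound."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            (asset, exchange): pool.submit(collect, asset, exchange, fetch, data_dir, start, end)
            for asset in assets
            for exchange in exchanges
        }
        return {key: future.result() for key, future in futures.items()}


START, END = to_unix("2022-01-01"), to_unix("2023-04-01")


def fake_fetch(asset, exchange):
    """Hypothetical stand-in for the API request: one row at the end, one before, one inside."""
    return [[END, 1.0, 1.0, 1.0, 1.0, 0.0],
            [START - 3600, 1.0, 1.0, 1.0, 1.0, 0.0],
            [START, 1.0, 2.0, 0.5, 1.5, 10.0]]


with tempfile.TemporaryDirectory() as tmp:
    written = collect_all(["BTC", "ETH"], ["kraken"], fake_fetch, tmp, START, END)
    with open(Path(tmp) / "historical_BTC_EUR" / "kraken_BTC_h.csv", newline="") as f:
        stored = list(csv.reader(f))

print(START, END)  # 1640995200 1680307200
print(written)     # {('BTC', 'kraken'): 1, ('ETH', 'kraken'): 1}
```

Threads fit here because each task mostly waits for network responses; a real fetch function would also have to respect the exchanges' rate limits.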
4.4 Data pre-processing

Of the top 100 considered assets, 82 were available to trade against EUR. As a first cleaning step, the data is reviewed manually. The manual cleaning steps are:

1. Check every file and its file size to judge whether it is likely to contain all the intended data.
2. If only one exchange file is available for a crypto asset, the asset folder is removed, as at least two exchanges are needed to perform two-point arbitrage trading.
3. For some assets, exchange data was available but consisted throughout of only one or a few constant values, or only the value 0.0. These exchange files are unusable and are deleted.
4. The files deleted in the previous step are noted, and their data is requested again from the second data source, CCXT.

Of the 100 considered assets, 48 remain after gathering data from cryptocompare.com: they are available to trade against EUR, have been cleaned manually and are of acceptable data quality, as far as can be judged at this first step.
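Cleaning steps 2 and 3 can partly be automated. The sketch below assumes files with a "close" column, as written by the collection step, and uses a hypothetical threshold of three distinct non-zero closing prices to flag a file as "constant"; steps 1 and 4 remain manual.

```python
import csv
import tempfile
from pathlib import Path


def is_unusable(csv_path, min_distinct=3):
    """True if a file holds only 0.0 or just a few constant closing prices (step 3)."""
    with open(csv_path, newline="") as f:
        closes = {float(row["close"]) for row in csv.DictReader(f)}
    closes.discard(0.0)
    return len(closes) < min_distinct


def review_asset_folder(folder, min_distinct=3):
    """Usable exchange files of one asset, or [] if fewer than two remain (step 2)."""
    usable = [p for p in sorted(Path(folder).glob("*.csv")) if not is_unusable(p, min_distinct)]
    return usable if len(usable) >= 2 else []


def write_demo_file(path, closes):
    """Write a tiny hypothetical file with a time and a close column."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["time", "close"])
        writer.writerows([1640995200 + 3600 * i, c] for i, c in enumerate(closes))


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "historical_BTC_EUR"
    folder.mkdir()
    write_demo_file(folder / "kraken_BTC_h.csv", [30000.0, 30100.0, 29950.0, 30020.0])
    write_demo_file(folder / "bitstamp_BTC_h.csv", [30010.0, 30090.0, 29960.0, 30030.0])
    write_demo_file(folder / "exmo_BTC_h.csv", [0.0, 0.0, 0.0, 0.0])
    kept = [p.name for p in review_asset_folder(folder)]

print(kept)  # ['bitstamp_BTC_h.csv', 'kraken_BTC_h.csv']
```

Returning an empty list signals that the whole asset folder can be dropped, because a single remaining exchange is not enough for two-point arbitrage.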
The following table lists them with abbreviation and full name:

1INCH - 1inch Network
AAVE - Aave
ADA - Cardano
ALGO - Algorand
APE - ApeCoin
APT - Aptos
ARB - Arbitrum
ATOM - Cosmos
AVAX - Avalanche
AXS - Axie Infinity
BAT - Basic Attention Token
BCH - Bitcoin Cash
BTC - Bitcoin
CHZ - Chiliz
CRO - Cronos
CRV - Curve DAO
DASH - Dash
DOGE - Dogecoin
DOT - Polkadot
EGLD - MultiversX
EOS - EOS
ETC - Ethereum Classic
ETH - Ethereum
FIL - Filecoin
FTM - Fantom
GRT - The Graph
ICP - Internet Computer
IMX - Immutable
LINK - Chainlink
LRC - Loopring
LTC - Litecoin
MANA - Decentraland
MASK - Mask Network
MATIC - Polygon
NEAR - NEAR Protocol
SAND - The Sandbox
SHIB - Shiba Inu
SNX - Synthetix
SOL - Solana
TRX - TRON
UNI - Uniswap
USDC - USD Coin
USDT - Tether
VET - VeChain
XLM - Stellar
XRP - XRP
XTZ - Tezos
ZEC - Zcash
Table 6: 48 available assets of the 100 considered, with abbreviation and full name.

5 Crypto asset filtering

5.1 Data cleaning & adjustments

While implementing the filtering on the dataset pre-processed in the previous chapter, it became clear that the dataset needed further cleaning, as the results were extremely unrealistic. The data quality was therefore lower than assumed after data collection. First, some exchanges deliver OHLCV data only from a certain timestamp onwards and only the value 0.0 before it. This can usually be traced back to the respective trading pair having been listed on that exchange only at this timestamp. Since the pair is tradable today, the API does not return an error for the earlier period, but zeros instead. Second, some exchanges deliver a fixed price for specific assets over longer periods of time, even though the volume changes; these values mostly lie within the price range of the other exchanges. For some, however, the prices lie completely outside any plausible range, i.e.
about 100 to 1,000 times the average of the other exchanges, and are therefore considered wrong. These data points do not show up as errors at the beginning of the analysis, since data is available and the values are not null. For this reason, the incorrect data must be excluded and gathered again via cryptocompare.com or CCXT. If the new data is still the same, or not available from the second source, the affected exchange data is removed from the comparison. The exchanges where these problems occur most often, sorted by number of occurrences, are:

Exchange name | Occurrences
Bitstamp | 15
Bitpanda | 12
Kraken | 9
Binance | 6
Bittrex | 2
Coinbase | 1
Table 7: Exchanges with false data by occurrences

By excluding these problematic exchanges and data points, the crypto asset filtering achieves a higher level of reliability and accuracy. Nevertheless, because the overall quality of the dataset is rather low, the calculated results must be interpreted with caution; they can, however, indicate a trend or serve as a benchmark.

5.2 Measurements

To assess the price disparities of the assets, the arbitrage index is used, as it is also applied in other papers that examine arbitrage possibilities. In addition, for assets with a high arbitrage index, the price differences relative to the mean price are calculated to better illustrate the price discrepancies.

5.2.1 Arbitrage Index

To show the extent of price deviations across exchanges in a specific timeframe, the arbitrage index is computed, which captures the maximum price difference between the exchanges as the ratio of the highest to the lowest price. It quantifies the degree of price variation in order to identify potential arbitrage opportunities (Duan et al., 2021; Makarov and Schoar, 2020). The first step is to calculate the arbitrage index for each time interval of the data, which is hourly. For this, the volume-weighted average price (VWAP) of each hour is determined for every exchange.
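The steps of the arbitrage index can be sketched without pandas, which makes each step explicit. The input format (time-sorted tuples of unix time, high, low, close and volume per exchange) and the function names are assumptions; unlike the pandas function arbitrageIndex_allCurrencies(), this sketch skips hours quoted by only one exchange instead of counting them with a ratio of 1.

```python
from collections import defaultdict
from statistics import mean

SECONDS_PER_DAY = 86400


def cumulative_vwap(candles):
    """candles: time-sorted (unix_time, high, low, close, volume) tuples of one exchange.

    Returns {unix_time: VWAP}, using cumulative sums since the first candle."""
    vwap, price_volume, volume_total = {}, 0.0, 0.0
    for t, high, low, close, volume in candles:
        typical_price = (high + low + close) / 3
        price_volume += typical_price * volume
        volume_total += volume
        if volume_total > 0:  # the VWAP is undefined until some volume has traded
            vwap[t] = price_volume / volume_total
    return vwap


def daily_arbitrage_index(candles_by_exchange):
    """Hourly max/min VWAP ratio across exchanges, averaged per UTC day.

    Returns {day number since 1970-01-01: mean arbitrage index of that day}."""
    vwaps = [cumulative_vwap(c) for c in candles_by_exchange.values()]
    ratios_per_day = defaultdict(list)
    for t in sorted(set().union(*vwaps)):
        prices = [v[t] for v in vwaps if t in v]
        if len(prices) > 1:  # a ratio needs quotes from at least two exchanges
            ratios_per_day[t // SECONDS_PER_DAY].append(max(prices) / min(prices))
    return {day: mean(r) for day, r in ratios_per_day.items()}


t0 = 1640995200  # 2022-01-01 00:00 UTC
sample = {
    "kraken": [(t0, 101.0, 99.0, 100.0, 2.0), (t0 + 3600, 103.0, 101.0, 102.0, 2.0)],
    "bitstamp": [(t0, 102.0, 100.0, 101.0, 2.0), (t0 + 3600, 104.0, 102.0, 103.0, 2.0)],
}
index = daily_arbitrage_index(sample)
print(index)
```

In the sample, the cumulative VWAPs are 100 and 101 on Kraken and 101 and 102 on Bitstamp, so the hourly ratios are 101/100 and 102/101, and day 18993 (2022-01-01) gets their mean of about 1.00995. In an efficient market the index would stay at exactly 1.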
The typical price per timeframe t is needed first and is calculated as:

$$\text{Typical Price}_t = \frac{\text{High}_t + \text{Low}_t + \text{Close}_t}{3}$$

The VWAP per timeframe is then calculated as follows, where the denominator $\sum_{k \le t} \text{Volume}_k$ is the cumulative volume since the start of the observed price period:

$$\text{VWAP}_t = \frac{\sum_{k \le t} \text{Typical Price}_k \cdot \text{Volume}_k}{\sum_{k \le t} \text{Volume}_k}$$

Subsequently, for each hour, the maximum VWAP across all exchanges is divided by the minimum VWAP. Finally, the arbitrage index is averaged at the daily level to reduce the impact of intra-day volatility.

The function “arbitrageIndex_allCurrencies()” implements the calculation of the arbitrage index for each asset over its available exchanges. It is adapted depending on the desired output, for example to compute the arbitrage index for only one asset:

```python
import glob
import os

import pandas as pd

# assumed location of the asset folders (same layout as in relative_priceDifferences_byMean())
currency_folders = glob.glob(os.path.join('../data', 'historical_*_EUR'))


def arbitrageIndex_allCurrencies():
    arbitrage_indices = {}
    value_columns = ['open', 'high', 'low', 'close', 'volumefrom']

    # calculate the arbitrage index for each currency
    for currency in currency_folders:
        currency_name = os.path.basename(currency).split('_')[1]
        currency_exchangefiles = glob.glob(os.path.join(currency, '*.csv'))
        exchange_data = {}

        # format data and calculate the VWAP per exchange
        for exchangefile in currency_exchangefiles:
            ohlcv_data = pd.read_csv(exchangefile)
            exchange = os.path.basename(exchangefile).split('_')[0]

            # ensure correct datetime format and set as index
            ohlcv_data['time'] = pd.to_datetime(ohlcv_data['time'], unit='s')
            ohlcv_data = ohlcv_data.set_index('time')

            # ensure all value columns are in float format
            ohlcv_data[value_columns] = ohlcv_data[value_columns].astype(float)

            # exclude rows where all OHLCV values are 0.0, as these are null values
            ohlcv_data = ohlcv_data.loc[~ohlcv_data[value_columns].eq(0.0).all(axis=1)].copy()

            # calculate the typical price and then the VWAP
            ohlcv_data['typical_price'] = (ohlcv_data['high'] + ohlcv_data['low'] + ohlcv_data['close']) / 3

            # cumulative total of price times volume
            ohlcv_data['price*volume'] = ohlcv_data['typical_price'] * ohlcv_data['volumefrom']
            ohlcv_data['cumulative_price*volume'] = ohlcv_data['price*volume'].cumsum()

            # cumulative total of volume, then calculate the VWAP
            ohlcv_data['cumulative_volume'] = ohlcv_data['volumefrom'].cumsum()
            ohlcv_data['vwap'] = ohlcv_data['cumulative_price*volume'] / ohlcv_data['cumulative_volume']
            exchange_data[exchange] = ohlcv_data['vwap'].dropna()

        # the arbitrage index is only calculated if more than one exchange is available
        if len(exchange_data) > 1:
            combined_data = pd.concat(exchange_data, axis=1)

            # get the max and min VWAP per hour across exchanges
            max_vwap_per_hour = combined_data.max(axis=1)
            min_vwap_per_hour = combined_data.min(axis=1)

            # calculate the arbitrage index
            arbitrage_ratios = max_vwap_per_hour / min_vwap_per_hour

            # average the arbitrage ratio at the daily level
            arbitrage_ratios = arbitrage_ratios.resample('1D').mean().dropna()

            # add the currency's arbitrage index to the dictionary
            arbitrage_indices[currency_name] = arbitrage_ratios

    return arbitrage_indices
```

5.2.2 Price differences

To analyse the price differences of each currency relative to the mean price over a given period of time, the percentage deviation of each data point must be calculated. This helps to understand the degree of price variation and to identify potential arbitrage opportunities. As for the arbitrage index, the typical price is calculated first. Then, for each point in time, the mean price across all exchanges is calculated, followed by the deviation of each exchange from it.
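To check this definition on a few hand-made values, the deviation from the cross-exchange mean can be computed with plain dictionaries instead of pandas. The function name and the input (typical prices per exchange and timestamp) are hypothetical.

```python
from statistics import mean


def relative_deviation_from_mean(typical_prices):
    """typical_prices: {exchange: {time: typical price}}.

    Returns {exchange: {time: deviation from the mean of all exchanges at that time, in %}}."""
    times = set().union(*typical_prices.values())
    deviations = {exchange: {} for exchange in typical_prices}
    for t in times:
        quotes = {ex: prices[t] for ex, prices in typical_prices.items() if t in prices}
        mean_price = mean(list(quotes.values()))
        for ex, price in quotes.items():
            deviations[ex][t] = (price - mean_price) * 100 / mean_price
    return deviations


# Hypothetical typical prices of two exchanges at two points in time
prices = {"kraken": {0: 99.0, 1: 100.0}, "bitstamp": {0: 101.0, 1: 100.0}}
dev = relative_deviation_from_mean(prices)
print(dev["kraken"][0], dev["bitstamp"][0])  # -1.0 1.0
```

A deviation of -1% on one exchange and +1% on another at the same time corresponds to a spread of about 2% between them, before fees.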
The function “relative_priceDifferences_byMean()” implements the calculation of the price deviations per asset over its available exchanges:

```python
import glob
import os

import pandas as pd


def relative_priceDifferences_byMean(currency):
    typical_prices = {}
    value_columns = ['open', 'high', 'low', 'close', 'volumefrom']
    currency_exchangefiles = glob.glob(os.path.join(f'../data/historical_{currency}_EUR', '*.csv'))

    # format data and calculate the typical price per exchange
    for exchangefile in currency_exchangefiles:
        ohlcv_data = pd.read_csv(exchangefile)
        exchange = os.path.basename(exchangefile).split('_')[0]

        # ensure correct datetime format and set as index
        ohlcv_data['time'] = pd.to_datetime(ohlcv_data['time'], unit='s')
        ohlcv_data = ohlcv_data.set_index('time')

        # ensure all value columns are in float format
        ohlcv_data[value_columns] = ohlcv_data[value_columns].astype(float)

        # exclude rows where all OHLCV values are 0.0, as these are null values
        ohlcv_data = ohlcv_data.loc[~ohlcv_data[value_columns].eq(0.0).all(axis=1)].copy()

        # calculate the typical price
        ohlcv_data['typical_price'] = (ohlcv_data['high'] + ohlcv_data['low'] + ohlcv_data['close']) / 3
        typical_prices[exchange] = ohlcv_data['typical_price']

    typical_prices_df = pd.concat(typical_prices, axis=1)

    # mean price across all exchanges per point in time
    mean_prices = typical_prices_df.mean(axis=1)

    # price differences relative to the mean price, in percent
    price_differences = typical_prices_df.sub(mean_prices, axis=0).div(mean_prices, axis=0) * 100
    return price_differences
```

6 Arbitrage trading prototype

This chapter describes the development and implementation of the arbitrage trading prototype. The goal of the system is to find arbitrage opportunities, exploit price differences of assets and simulate the resulting trades as paper trading.
6.1 Programming Language

Choosing a suitable way of developing the arbitrage trading system is essential. A key factor for success is known to be how quickly a trading system can search for and transmit information, specifically its speed and latency relative to other traders (Brogaard et al., 2014; Brogaard and Garriott, 2019; Budish et al., 2015; Carrion, 2013; Kiuchi, 2022; Levus et al., 2021; O’Hara, 2015). The system must therefore be developed with an approach that is both convenient and fast. In the scope of this thesis, the system is implemented in the Python programming language.

6.2 Prototype Architecture

The architecture of the arbitrage prototype, shown in Figure 9, involves three actors with their corresponding data streams and functions. The first actor is the trading person, called the trader, who controls the system: the trader starts and stops the prototype and provides the desired websocket URLs, API keys and trading pairs. Additionally, the trader is continuously notified about system updates of interest, which can be all incoming price updates, found arbitrage opportunities or simulated trades. The system is started by the call “asyncio.run(main())”, as shown in Figure 9. The second actor is the arbitrage prototype, which in the scope of this thesis runs locally on a computer. It handles the business logic: connecting to the exchanges, subscribing to the desired information, mapping the price streams into a common format, finding arbitrage opportunities including trading fees, and simulating trades. The third actor consists of the exchanges (Exchange 1 to Exchange n), which deliver their price streams via websockets.

[Figure 9 diagram: Trader, Arbitrage Prototype, Exchange 1 … Exchange n (Websocket); call sequence 1. asyncio.run(main()), 2. websocketsConnector(), 3. websocketSubscriber(), 4. mapBest_ask_price(), 5. arbitrageOpportunityFinder(), 6. calculateFees(), 7. simulateTrade()]
Figure 9: Architectural approach with three actors and their corresponding data streams

In this example, the arbitrage prototype connects to the websockets of the exchanges Binance and Kraken. To handle the continuous I/O tasks efficiently, the software runs asynchronously, so that several websocket connections are served concurrently; the ‘asyncio’ library is used for this. As the incoming price streams of the exchanges' websockets must be analysed for profitable price differences, they are stored in a temporary ‘price_data’ dictionary. This dictionary is cleared every 300 milliseconds, because only current arbitrage opportunities should be found and longer time intervals lead to misleading results. The value of 300 milliseconds was chosen arbitrarily and proved to be a good time window in the tests conducted in the scope of this thesis.

Figure 10: File structure of the arbitrage trading prototype

The arbitrage trading system consists of four files, a main, a functions, a dictionaries and a database file, as seen in Figure 10. The “main.py” file is the core of the system, from which all functions are invoked asynchronously. The “functions.py” file contains all the business logic. The “dicts.py” file contains the dictionaries with the input information provided to the system. Finally, the “db.py” file saves the desired logs of found arbitrage opportunities or simulated trades to a local database.

6.3 Finding Arbitrage Opportunities

The type of arbitrage trading chosen in this thesis is pure arbitrage, also called two-point arbitrage, as explained before. To calculate the possible profit, both the arbitrage opportunity and the related fees must be considered. For each trade $i$, the profit $\pi_i$ equals the difference between the sell price $P^{\text{sell}}_i$ and the buy price $P^{\text{buy}}_i$, multiplied by the traded volume $q_i$ (Pauna, 2018):

$$\pi_i = q_i \left( P^{\text{sell}}_i - P^{\text{buy}}_i \right)$$

Taker fees for trading the quantity $q$ on exchange $i$ can be calculated as follows (Hautsch et al., 2018), where $A_i$ (Ask) is the respective asset price and $\tau_i$ the proportional taker fee rate of exchange $i$:

$$F_i(q) = \tau_i \, A_i \, q, \qquad \tau_i \ge 0$$

6.4 Exchange Connections

To interact programmatically with different exchanges, it is necessary to work with their APIs. Exchanges offer two types of interfaces for communication in both directions: Representational State Transfer (REST) and Websocket. As real-time market data from several exchanges is needed to identify arbitrage opportunities quickly, the call/response mechanism of REST would have to be invoked repeatedly in short, predefined intervals, which is not suitable. In addition, rate limits would again become a problem: they would restrict a high-frequency arbitrage trading system and could result in temporary or permanent bans. What is needed is a real-time data stream that delivers price updates at the intervals available per exchange. For this use case, websockets are used, which first perform a Websocket protocol handshake and, if the request to the server was successful, keep an open connection over the Transmission Control Protocol (TCP). Websocket addresses use the “wss” scheme instead of “https”; the websocket URL of the exchange Kraken, for example, is “wss://ws.kraken.com”. Once a connection has been established, the next step is to subscribe to the desired channel. In the case of this thesis, order book data of the respective trading pairs is needed, from which the best (lowest) ask price is extracted, since a market order, which is executed immediately on centralized exchanges, buys at this price. For this, only the top levels of the order book are needed (a depth of 10 levels in the Kraken subscription), which also reduces the amount of incoming data and therefore the amount of data to compute (“Websocket API | Binance Developers,” 2023; “Which API should I use?,” 2023).
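The interplay of concurrent websocket connections and the 300-millisecond reset of ‘price_data’ described in Section 6.2 can be sketched as follows. This main() is a hypothetical reconstruction, not the thesis' main.py: it receives the connector function as a parameter so that the demo can use a stub instead of live websockets, and it keys quotes by (exchange, symbol). The Binance URL in the demo is an assumption and is never contacted.

```python
import asyncio

price_data = {}       # latest quote per (exchange, symbol)
CLEAR_INTERVAL = 0.3  # 300 ms window chosen in the thesis


async def clear_price_data(interval=CLEAR_INTERVAL):
    """Drop all stored quotes every `interval` seconds, so only recent prices are compared."""
    while True:
        await asyncio.sleep(interval)
        price_data.clear()


async def main(websocket_urls, connector):
    """Hypothetical main(): one connector task per exchange plus the clearing task."""
    await asyncio.gather(
        clear_price_data(),
        *(connector(name, url) for name, url in websocket_urls.items()),
    )


# Demo without network access: a stub replaces websocketsConnector()
snapshots = []


async def stub_connector(exchange_name, websocket_url):
    price_data[(exchange_name, "BTC/EUR")] = (30000.0, "BTC/EUR", exchange_name)
    snapshots.append(len(price_data))
    await asyncio.sleep(10)  # a real connector keeps receiving messages


async def demo():
    urls = {"kraken": "wss://ws.kraken.com", "binance": "wss://stream.binance.com:9443/ws"}
    try:
        await asyncio.wait_for(main(urls, stub_connector), timeout=0.5)
    except asyncio.TimeoutError:
        pass  # the stub never returns; stop the demo after 0.5 s


asyncio.run(demo())
print(snapshots, price_data)  # [1, 2] {}
```

Both stubs store a quote before the first reset at 0.3 s, which then empties the dictionary; in the live system, new messages would refill it between resets.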
As already mentioned, the prototype is invoked from the “main.py” file in an asynchronous manner. For every websocket URL provided in the dictionary, it connects to the respective websocket through the “websocketsConnector()” function, which takes the exchange name and the URL as input:

```python
import json

import websockets

from dicts import trading_pairs  # input dictionaries from dicts.py


async def websocketsConnector(exchange_name, websocket_url):
    async with websockets.connect(websocket_url) as websocket:
        await websocketSubscriber(websocket, exchange_name)
        while True:
            message = await websocket.recv()
            mapped_data = mapBest_ask_price(json.loads(message), exchange_name)
            if mapped_data is not None:
                # store the latest quote per exchange and symbol, so that several
                # trading pairs of the same exchange do not overwrite each other
                symbol = mapped_data[1]
                price_data[(exchange_name, symbol)] = mapped_data
                arbitrageOpportunityFinder()
            await websocketMessager(mapped_data)
```

“websocketsConnector()” invokes the next function, “websocketSubscriber()”, to subscribe to the desired channels and receive the corresponding price streams. As the format of the subscription request differs per exchange, the request is chosen based on the passed exchange name:

```python
async def websocketSubscriber(websocket, exchange_name):
    if exchange_name == 'kraken':
        # Kraken: order book channel with a depth of 10 levels
        await websocket.send(json.dumps({
            'event': 'subscribe',
            'subscription': {'name': 'book', 'depth': 10},
            'pair': trading_pairs['kraken'],
        }))
    elif exchange_name == 'binance':
        # Binance: diff. depth stream, updated every 100 ms
        await websocket.send(json.dumps({
            'method': 'SUBSCRIBE',
            'params': [f'{pair.lower()}@depth@100ms' for pair in trading_pairs['binance']],
            'id': 1,
        }))
```

The incoming messages are mapped by the “mapBest_ask_price()” function to extract the best ask price and are assigned to the temporary “price_data” variable. As different exchanges use different formats for trading pairs, they are mapped to the ASSET/EUR format, which is used in most cases. Additionally, Kraken, for instance, abbreviates Bitcoin internally as XBT instead of the commonly used BTC.
This function returns the best ask price as a float, together with the symbol and the exchange name:

```python
def mapBest_ask_price(raw_data, exchange):
    if exchange.lower() == 'binance':
        # depth update event: raw_data['a'] holds [price, quantity] ask levels
        if (isinstance(raw_data, dict) and raw_data.get('e') == 'depthUpdate'
                and len(raw_data.get('a', [])) > 0):
            symbol = raw_data['s'].replace('EUR', '/EUR')  # 'BTCEUR' -> 'BTC/EUR'
            # first ask level of the update: [price, quantity]
            return float(raw_data['a'][0][0]), symbol, exchange
    elif exchange.lower() == 'kraken':
        # book update: [channelID, {'a': [[price, volume, timestamp], ...]}, 'book-10', pair]
        if (isinstance(raw_data, list) and len(raw_data) >= 2
                and isinstance(raw_data[1], dict) and 'a' in raw_data[1]):
            symbol = raw_data[-1].replace('XBT', 'BTC')  # 'XBT/EUR' -> 'BTC/EUR'
            return float(raw_data[1]['a'][0][0]), symbol, exchange
    else:
        raise ValueError('Unsupported exchange')
    # messages without ask data (e.g. heartbeats, subscription status) are ignored
    return None
```

As seen in the “websocketsConnector()” function, there is also a logger function, which logs the output in the format “Exchange: Best Ask Price, Symbol” if needed:

```python
async def websocketMessager(mapped_data):
    if mapped_data is not None:
        price, symbol, exchange = mapped_data
        print(f'{exchange}: {price}, {symbol}')
```

Finally, “websocketsConnector()” invokes the “arbitrageOpportunityFinder()” function to check whether arbitrage opportunities exist. The function loops through the temporary “price_data” dictionary, which contains the mapped price streams of the exchanges, and determines for each trading pair the maximum and minimum price in the current time window. The maximum prices are stored in the max_price_Symbol variables and the minimum prices in the min_price_Symbol variables. The maximum values are initialized to negative infinity (and the minimum values to positive infinity), which ensures that the first price encountered replaces the initial value and that the variables are updated correctly while iterating. In addition, price, symbol and exchange are stored in the max/min_price_data_Symbol tuples. Trading fees are also included, as can be seen in the following function.
In this example, an arbitrary threshold is used: the system looks for price differences of at least 1%. The higher this threshold is set, the less often arbitrage opportunities are found, but the more likely it is that they can be converted into profitable trades. If an opportunity above the threshold is found, the system simulates the trade and prints the corresponding information to the console.

```python
def arbitrageOpportunityFinder():
    if not price_data:
        return

    max_price_BTC = max_price_ETH = -float('inf')
    min_price_BTC = min_price_ETH = float('inf')
    max_price_data_BTC = min_price_data_BTC = max_price_data_ETH = min_price_data_ETH = None

    # each value is a (price, symbol, exchange) tuple from mapBest_ask_price()
    for price, symbol, exchange in price_data.values():
        if symbol == 'BTC/EUR':
            if price > max_price_BTC:
                max_price_BTC = price
                max_price_data_BTC = (price, symbol, exchange)
            if price < min_price_BTC:
                min_price_BTC = price
                min_price_data_BTC = (price, symbol, exchange)
        elif symbol == 'ETH/EUR':
            if price > max_price_ETH:
                max_price_ETH = price
                max_price_data_ETH = (price, symbol, exchange)
            if price < min_price_ETH:
                min_price_ETH = price
                min_price_data_ETH = (price, symbol, exchange)

    # compare only if both prices exist and come from two different exchanges;
    # each leg uses the taker fee of the exchange on which it is traded
    if max_price_data_BTC and min_price_data_BTC and max_price_data_BTC[2] != min_price_data_BTC[2]:
        max_price_BTC_wFees = calculateFees(max_price_BTC, max_price_data_BTC[2], side='sell')
        min_price_BTC_wFees = calculateFees(min_price_BTC, min_price_data_BTC[2], side='buy')
        price_difference_BTC = (max_price_BTC_wFees - min_price_BTC_wFees) / min_price_BTC_wFees
        if price_difference_BTC > 0.01:
            simulateTrade(price_difference_BTC, max_price_data_BTC, min_price_data_BTC)
            print(f'Price difference greater than 1% ({price_difference_BTC}):')
            print(f'Max price: {max_price_data_BTC}')
            print(f'Min price: {min_price_data_BTC}')

    if max_price_data_ETH and min_price_data_ETH and max_price_data_ETH[2] != min_price_data_ETH[2]:
        max_price_ETH_wFees = calculateFees(max_price_ETH, max_price_data_ETH[2], side='sell')
        min_price_ETH_wFees = calculateFees(min_price_ETH, min_price_data_ETH[2], side='buy')
        price_difference_ETH = (max_price_ETH_wFees - min_price_ETH_wFees) / min_price_ETH_wFees
        if price_difference_ETH > 0.01:
            simulateTrade(price_difference_ETH, max_price_data_ETH, min_price_data_ETH)
            print(f'Price difference greater than 1% ({price_difference_ETH}):')
            print(f'Max price: {max_price_data_ETH}')
            print(f'Min price: {min_price_data_ETH}')
```

In the “arbitrageOpportunityFinder()” function, the trading fees are calculated with the function “calculateFees()”, at this stage still without deposit and withdrawal fees. The taker fee increases the effective buy price and reduces the effective sell price:

```python
from dicts import exchange_fees  # input dictionaries from dicts.py


def calculateFees(price, exchange, side='buy'):
    # taker fee of the lowest 30-day volume tier, converted from percent to a fraction
    taker_fee = exchange_fees[exchange]['Taker Fee'][0]['percentage'] / 100
    if side == 'buy':
        return price * (1 + taker_fee)
    return price * (1 - taker_fee)
```

If an arbitrage opportunity above the threshold is found, the next step is to simulate the trade as paper trading in the function “simulateTrade()”, which appends it to a csv file, where it can later be analysed and evaluated:

```python
import csv
import time


def simulateTrade(price_difference, max_price_data, min_price_data):
    # write the trade to csv: timestamp, difference in %, then (price, symbol, exchange) of both legs
    with open('./trades.csv', 'a', newline='') as f:
        csv.writer(f).writerow([time.time(), price_difference * 100, *max_price_data, *min_price_data])
```

6.5 Dictionaries

To encapsulate all static data structures and to improve flexibility, performance and code reusability, a separate Python file with multiple dictionaries is used. The first, “websocket_urls”, stores all public websocket addresses by exchange name. The second, “trading_pairs”, stores the trading pair formats used by each exchange, as these are not consistent. The third, “exchange_fees”, stores the taker fees and the EUR deposit and withdrawal fees by exchange name.
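Building on the exchange_fees structure, the sketch below selects the taker fee tier by 30-day trading volume and applies the profit formula $\pi_i = q_i (P^{\text{sell}}_i - P^{\text{buy}}_i)$ net of proportional taker fees on both legs. The function names, the tier-selection rule and the example prices are assumptions; deposit and withdrawal fees are ignored, as in calculateFees(), and tiers not listed in the dictionary are not considered.

```python
# Excerpt of the exchange_fees structure (Kraken taker fee tiers from dicts.py)
exchange_fees = {
    "kraken": {
        "Taker Fee": [
            {"30day-minAmount": 0, "30day-maxAmount": 50000, "percentage": 0.26, "absolute": None},
            {"30day-minAmount": 50001, "30day-maxAmount": 100000, "percentage": 0.24, "absolute": None},
        ]
    }
}


def taker_fee_rate(fees, exchange, volume_30d):
    """Fee fraction of the highest listed tier whose minimum 30-day volume is reached."""
    rate = None
    for tier in fees[exchange]["Taker Fee"]:  # tiers are listed in ascending order
        if volume_30d >= tier["30day-minAmount"]:
            rate = tier["percentage"] / 100
    return rate


def net_profit(q, buy_price, buy_fee, sell_price, sell_fee):
    """pi = q * (P_sell - P_buy), minus proportional taker fees on the buy and the sell leg."""
    gross = q * (sell_price - buy_price)
    fees_paid = q * buy_price * buy_fee + q * sell_price * sell_fee
    return gross - fees_paid


fee = taker_fee_rate(exchange_fees, "kraken", volume_30d=10_000)  # 0.26 % tier
profit = net_profit(q=0.1, buy_price=30_000.0, buy_fee=fee, sell_price=30_450.0, sell_fee=fee)
print(round(profit, 3))  # 29.283
```

In the example, a 1.5% gross spread yields 45 EUR on 0.1 BTC, of which about 15.72 EUR is paid in taker fees on the two legs.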
These are the dictionaries used for the exchange Kraken:

```python
websocket_urls = {
    'kraken': 'wss://ws.kraken.com',
}

trading_pairs = {
    'kraken': ['XBT/EUR', 'ETH/EUR'],
}

exchange_fees = {
    'kraken': {
        'Taker Fee': [
            {'30day-minAmount': 0, '30day-maxAmount': 50000, 'percentage': 0.26, 'absolute': None},
            {'30day-minAmount': 50001, '30day-maxAmount': 100000, 'percentage': 0.24, 'absolute': None},
        ],
        'Deposit EUR': {
            'Credit Card': {'allowed': True, 'percentage': 3.75, 'additional-fixed': 0.25, 'absolute': None},
            'SEPA': {'percentage': None, 'absolute': 1},
        },
        'Withdraw EUR': {
            'Credit Card': {'allowed': False, 'percentage': None, 'additional-fixed': None, 'absolute': None},
            'SEPA': {'percentage': None, 'absolute': 1},
        },
    },
}
```

6.6 Testing

Since automated trading systems involve real money, testing their functionality is of particular importance. In the scope of this thesis, a distinction is made between detecting arbitrage opportunities, which is the main focus, and simulating trades. Detection can be tested in detail, as it does not yet require executing real trades. The type of testing used for the trading system in this thesis is referred to as paper trading: the final execution step is only recorded, here in a csv file, instead of being sent to an exchange, and is evaluated for profitability later. For the actual placement of trades, only speed and latency are the limiting factors when converting a detected arbitrage opportunity into a successful arbitrage trade.

7 Results

7.1 Data collection & management

Of the 100 assets considered at the beginning, 48 remain after gathering data from cryptocompare.com. These are available to trade against EUR, cleaned, and of acceptable data quality as far as can be judged with the first method.
The following table shows the 48 assets, with abbreviation and full name, that were considered for further use:

1INCH - 1inch Network
AAVE - Aave
ADA - Cardano
ALGO - Algorand
APE - ApeCoin
APT - Aptos
ARB - Arbitrum
ATOM - Cosmos
AVAX - Avalanche
AXS - Axie Infinity
BAT - Basic Attention Token
BCH - Bitcoin Cash
BTC - Bitcoin
CHZ - Chiliz
CRO - Cronos
CRV - Curve DAO
DASH - Dash
DOGE - Dogecoin
DOT - Polkadot
EGLD - MultiversX
EOS - EOS
ETC - Ethereum Classic
ETH - Ethereum
FIL - Filecoin
FTM - Fantom
GRT - The Graph
ICP - Internet Computer
IMX - Immutable
LINK - Chainlink
LRC - Loopring
LTC - Litecoin
MANA - Decentraland
MASK - Mask Network
MATIC - Polygon
NEAR - NEAR Protocol
SAND - The Sandbox
SHIB - Shiba Inu
SNX - Synthetix
SOL - Solana
TRX - TRON
UNI - Uniswap
USDC - USD Coin
USDT - Tether
VET - VeChain
XLM - Stellar
XRP - XRP
XTZ - Tezos
ZEC - Zcash
Table 8: The 48 assets, for which data was available, with abbreviation and full name

Of the 85 available exchanges, 16 remain. Most of them were sorted out because they do not allow trading against Euro and therefore deliver no data. Some were further excluded in the data cleaning of this method, based on the criteria described above. The following table shows the 16 remaining exchanges, with abbreviation and full name, that were considered for further use:

binance - Binance
bitfinex - Bitfinex
bitflyer - Bitflyer
bitpanda - Bitpanda
bitstamp - Bitstamp
bittrex - Bittrex
blockchaincom - Blockchain.com
cexio - CEX.IO
Coinbase - Coinbase
coinfield - CoinField
currency - Currency.com
exmo - Exmo
gemini - Gemini
kraken - Kraken
lmax - LMAX Digital
paymium - Paymium
Table 9: The 16 remaining exchanges from which data was available, with abbreviation and full name.

To show a data example, the first 24 hours of the selected time window from January 1st, 2022 to April 1st, 2023 are shown here.
This sample is from the exchange Binance for the trading pair ETH/EUR, downloaded via the cryptocompare.com API:

Figure 11: Data example from Binance for ETH/EUR over the specified timeframe.

7.2 Crypto asset filtering

Two measures were chosen to filter the 48 considered crypto assets for arbitrage opportunities over the last 16 months. First, the arbitrage index was calculated for all assets across their available exchanges, and then for groups of ten assets each, selected by specific criteria. Based on these results, the price differences were computed and visualized for the top 10 assets with the highest mean arbitrage index.

7.2.1 Results for Arbitrage Index

Calculating the arbitrage index for all assets over their respective available exchanges yields the results shown in the following graph. A tendency towards arbitrage opportunities is already visible, since in an efficient market, where no price discrepancies exist between markets, the arbitrage index would always equal 1.

Figure 12: Arbitrage Index for all 48 assets over all their available exchanges

To gain better insights, Figure 13 plots only the top ten assets by market cap. These include Bitcoin (BTC), Ethereum (ETH), Tether (USDT), USD Coin (USDC), Ripple (XRP), Cardano (ADA), Dogecoin (DOGE), Polygon (MATIC) and Solana (SOL). Solana has the highest arbitrage index of these, which makes it the asset of this selection most likely to encounter arbitrage opportunities across exchanges. In contrast, USDT is a "stablecoin" pegged to the US dollar and, as the name implies, should be stable. Nevertheless, a significant jump in its arbitrage index can be observed around February/March 2022, which is most likely due to errors in the data. From the statistical table in Figure 14, it can be concluded that SOL has the highest maximum value, standard deviation and mean value.
The two stablecoins in the list, USDC and USDT, have the lowest mean values.
Figure 13: Arbitrage Index for the top 10 assets (by market cap) over all their available exchanges
Figure 14: Corresponding table with statistical insights about the arbitrage index for the top 10 assets (by market cap)
For comparison, Figure 15 plots only the ten assets with the lowest market cap in the considered list. These are 1inch Network (1INCH), Dash (DASH), Zcash (ZEC), Synthetix (SNX), Curve DAO Token (CRV), Chiliz (CHZ), Immutable (IMX), Axie Infinity (AXS), Tezos (XTZ) and Aave (AAVE). Overall, these show higher volatility and higher indexes. While the majority remain fairly steady, 1INCH, XTZ, AAVE and AXS stand out with a substantially higher arbitrage index. The statistical table in Figure 16 shows that XTZ has the highest maximum value, standard deviation and mean value, whereas DASH has the lowest of those values.
Figure 15: Arbitrage Index for the 10 assets with the lowest market cap of the considered list, over all their available exchanges
Figure 16: Corresponding table with statistical insights about the arbitrage index for the 10 assets with the lowest market cap of the considered list
Figure 17 displays the assets with the highest mean arbitrage index. These are Decentraland (MANA), Cosmos (ATOM), Loopring (LRC), Solana (SOL), Tezos (XTZ), NEAR Protocol (NEAR), 1inch Network (1INCH), Algorand (ALGO), Shiba Inu (SHIB) and Polkadot (DOT). As the assets with the highest mean, they can be considered the most likely to generate arbitrage opportunities. The statistical table shows that MANA has the highest mean and standard deviation, and DOT the lowest values among them.
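The arbitrage index and the summary statistics reported in Figures 14, 16, 18, 20 and 21 can be illustrated with a minimal Python sketch. It assumes that the index at a given timestamp is the ratio of the highest to the lowest price of an asset across its available exchanges, similar in spirit to Makarov and Schoar (2020), so that it equals 1 when all exchanges agree. The nested-dictionary input layout and the function names are hypothetical and not taken from the thesis code.

```python
from __future__ import annotations

from statistics import mean, pstdev

# Hypothetical layout: asset -> {timestamp -> {exchange -> price}}
PriceTable = dict[str, dict[int, dict[str, float]]]


def arbitrage_index(prices_by_exchange: dict[str, float]) -> float | None:
    """Ratio of the highest to the lowest price across exchanges at one timestamp.

    Returns None when fewer than two exchanges quote the asset, because no
    cross-exchange comparison is possible then. A value of 1 means all prices agree.
    """
    quotes = [p for p in prices_by_exchange.values() if p > 0]
    if len(quotes) < 2:
        return None
    return max(quotes) / min(quotes)


def summarize(table: PriceTable) -> list[tuple[str, float, float, float, float]]:
    """Return (asset, mean, std, min, max) of the index, sorted by mean ascending."""
    rows = []
    for asset, series in table.items():
        indexes = [arbitrage_index(snapshot) for snapshot in series.values()]
        values = [v for v in indexes if v is not None]
        if values:
            rows.append((asset, mean(values), pstdev(values), min(values), max(values)))
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    sample: PriceTable = {
        "SOL": {0: {"binance": 20.0, "kraken": 20.4}, 1: {"binance": 21.0, "kraken": 21.0}},
        "BTC": {0: {"binance": 30000.0, "kraken": 30030.0}},
    }
    for asset, m, s, lo, hi in summarize(sample):
        print(f"{asset}: mean={m:.4f} std={s:.4f} min={lo:.4f} max={hi:.4f}")
```

Sorting the rows by mean gives the ascending order used in Figure 21; taking the last ten rows would give the assets with the highest mean arbitrage index.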
Figure 17: Arbitrage Index for the 10 assets with the highest mean, over all their available exchanges
Figure 18: Corresponding table with statistical insights about the arbitrage index for the 10 assets with the highest mean
Figure 19 displays the assets with the lowest mean arbitrage index. These are Ethereum Classic (ETC), USD Coin (USDC), Arbitrum (ARB), Zcash (ZEC), Tron (TRX), Mask Network (MASK), Dash (DASH), Cronos (CRO), Synthetix (SNX) and Filecoin (FIL). They can therefore be considered the least likely to offer arbitrage opportunities. The statistical table shows that ETC has the lowest mean and CRO the lowest standard deviation. Notably, CRO has a mean close to zero until November 2022, but has been rising strongly and continuously since then.
Figure 19: Arbitrage Index for the 10 assets with the lowest mean, over all their available exchanges
Figure 20: Corresponding table with statistical insights about the arbitrage index for the 10 assets with the lowest mean
To give a better picture of all 48 assets, the complete table is shown below in Figure 21. It lists the asset abbreviations in the rows and the mean value, standard deviation, minimum and maximum value in the columns. The table is sorted in ascending order by the mean value over the considered timeframe, the last 16 months.
Figure 21: All 48 assets, sorted by the arbitrage index mean value of the last 16 months.
7.2.2 Results for Price Differences
Based on the results of the arbitrage index calculations, the price differences per asset were computed over the last 16 months across all respective available exchanges for the top 10 cryptocurrencies with the highest mean arbitrage index.
For some of these, price differences occur at high frequency; for others they occur at low frequency and are therefore clearly visible in a plot. The results are shown below for five assets whose price differences are easily visible both in the plot and in the data. In addition to the plots, a table per asset lists the largest price differences, including the lower-priced exchange, the higher-priced exchange and the maximum price difference. For better comparison, the time component is also included: the "Time" at which the price difference peaked, the "Start Time" and "End Time" of the arbitrage window, and its duration. To assess the size of each price difference, the mean price across all exchanges at that time is also given, together with the price difference as a percentage of this mean.
Figure 22: Relative Price Differences to the mean price for 1INCH
Figure 23: Price Differences to the mean price table for 1INCH
Four exchanges are available for 1INCH, and they show clear price differences. Which exchange quotes the lowest and which the highest price is balanced, and these roles change regularly. The highest relative price difference was around 39%, and the two largest price differences each lasted 12 days.
Figure 24: Relative Price Differences to the mean price for ALGO
Figure 25: Price Differences to the mean price table for ALGO
Four exchanges are available for ALGO, which also show clear price differences; these are largest at the beginning of the period and become increasingly smaller over time. Looking at the lowest and highest prices, Exmo mostly quotes the highest price and is rarely the lowest. The highest relative price difference was around 11%, but all arbitrage opportunities disappeared again within one day.
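The columns of the price-difference tables (lower- and higher-priced exchange, maximum price difference, mean price, percentage difference, start time, end time and duration) can be sketched as follows. The sketch assumes that the relative price difference at time t is (max price - min price) / mean price * 100 across the available exchanges, and that an arbitrage window is a run of consecutive timestamps on which this value reaches a chosen threshold. The threshold, the data layout and all names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Window:
    start: datetime
    end: datetime
    peak_time: datetime      # "Time" column: when the difference was largest
    low_exchange: str
    high_exchange: str
    max_diff: float          # absolute price difference at the peak
    mean_price: float        # cross-exchange mean price at the peak
    rel_diff_pct: float      # max_diff / mean_price * 100

    @property
    def duration(self) -> timedelta:
        # A window seen at a single timestamp has zero duration, i.e. it
        # closed again within one sampling interval.
        return self.end - self.start


def relative_difference(snapshot: dict[str, float]) -> tuple[float, str, str, float, float]:
    """Return (percent difference, low exchange, high exchange, abs difference, mean price)."""
    low = min(snapshot, key=snapshot.__getitem__)
    high = max(snapshot, key=snapshot.__getitem__)
    diff = snapshot[high] - snapshot[low]
    mean_price = sum(snapshot.values()) / len(snapshot)
    return diff / mean_price * 100, low, high, diff, mean_price


def find_windows(series: list[tuple[datetime, dict[str, float]]],
                 threshold_pct: float) -> list[Window]:
    """Group consecutive timestamps whose relative difference reaches the threshold."""
    windows: list[Window] = []
    current: Window | None = None
    for ts, snapshot in sorted(series, key=lambda item: item[0]):
        if len(snapshot) < 2:
            continue  # not comparable; neither opens nor closes a window
        rel, low, high, diff, mean_price = relative_difference(snapshot)
        if rel < threshold_pct:
            current = None  # the opportunity has closed
            continue
        if current is None:
            current = Window(ts, ts, ts, low, high, diff, mean_price, rel)
            windows.append(current)
            continue
        current.end = ts
        if rel > current.rel_diff_pct:  # keep the peak values for the table row
            current.peak_time, current.low_exchange, current.high_exchange = ts, low, high
            current.max_diff, current.mean_price, current.rel_diff_pct = diff, mean_price, rel
    return windows
```

Applied to daily mean prices per exchange, sorting the resulting windows by `rel_diff_pct` in descending order would give a table of the largest price differences in the style of Figures 23 to 31.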
Figure 26: Relative Price Differences to the mean price for ATOM
Figure 27: Price Differences to the mean price table for ATOM
Six exchanges are available for ATOM, which also show clear price differences that become smaller towards the end of the period. The roles of lowest- and highest-priced exchange are balanced. The highest relative price difference was around 35%. Most arbitrage opportunities disappeared within one day, but one of about 15% persisted for six days.
Figure 28: Relative Price Differences to the mean price for MANA
Figure 29: Price Differences to the mean price table for MANA
Four exchanges are available for MANA. Price differences were largest in April/May 2022, somewhat smaller in July to August 2022 and smaller again in November 2022. From January to April 2023, MANA again experienced a phase of larger price differences. The roles of lowest- and highest-priced exchange are balanced. The highest relative price difference was around 18%. Most arbitrage opportunities disappeared within a day, but some remained for one or even three days.
Figure 30: Relative Price Differences to the mean price for XTZ
Figure 31: Price Differences to the mean price table for XTZ
Four exchanges are also available for XTZ. Price differences were generally smaller here, apart from a few large outliers. Looking at the lowest and highest prices, CEX.IO mostly quotes the highest price. The highest relative price difference reached 60%. These strong but short-lived price differences either persisted for two days or disappeared within one day.
7.3 Arbitrage trading prototype
The functionality and approach of the arbitrage trading prototype were already discussed in detail; this section evaluates its results. Once started, the prototype runs until it is explicitly stopped.
The arbitrage opportunities found are logged to the console, and the corresponding trades are simulated with market orders to test the viability of the identified openings: for each opportunity, the price difference, the minimum and maximum price, the symbol and the exchange names are written to a CSV file, together with a Unix timestamp. For this test, the prototype was run for Binance and Kraken with the trading pairs BTC/EUR and ETH/EUR for eight hours on May 10th, with a minimum price-difference threshold of 1%. Figure 32 shows a snapshot of the first 20 arbitrage opportunities found:
Figure 32: Example of the prototype logging to CSV, where arbitrage opportunities were found.
In this example, the largest arbitrage opportunity found was 9.9% for ETH/EUR at 15:43 CEST, with Binance quoting the higher price. Trading fees are already accounted for, as they are deducted before the relative price difference is compared with the 1% threshold. Throughout these eight hours, Binance was always the higher-priced and Kraken the lower-priced exchange for both Bitcoin and Ethereum, so all opportunities found for both assets involved buying on Kraken and selling on Binance. Note that once an arbitrage opportunity is used, i.e. a trade is executed, subsequent opportunities are ignored for a short period of time. This is because the available assets on the exchanges must first be recalculated before they can be sold on the higher-priced exchange, as would be required in an autonomous arbitrage trading system on the real market. Consequently, directly consecutive arbitrage opportunities in this example cannot each be counted as separate opportunities, and the number of opportunities found is only indicative.
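The paper-trading step described in this section can be sketched in Python as shown below. This is a minimal sketch, not the prototype's actual code: it assumes one price per exchange and symbol, a taker fee on both the buy and the sell leg, and a fixed cooldown after each simulated trade that stands in for the recalculation of available assets. The fee rates, the cooldown length, the CSV column names and the class name `PaperTrader` are hypothetical placeholders.

```python
from __future__ import annotations

import csv
import time
from pathlib import Path

# Hypothetical taker fee rates (fractions); real rates depend on exchange and account tier.
DEFAULT_FEES = {"binance": 0.001, "kraken": 0.0026}


def net_spread_pct(buy_price: float, sell_price: float, buy_fee: float, sell_fee: float) -> float:
    """Relative profit in percent after paying a fee on the buy and the sell leg."""
    cost = buy_price * (1 + buy_fee)
    proceeds = sell_price * (1 - sell_fee)
    return (proceeds - cost) / cost * 100


class PaperTrader:
    """Logs simulated market-order arbitrage trades to a CSV file instead of executing them."""

    HEADER = ["timestamp", "symbol", "buy_exchange", "min_price",
              "sell_exchange", "max_price", "net_spread_pct"]

    def __init__(self, csv_path: Path, threshold_pct: float = 1.0,
                 cooldown_s: float = 5.0, fees: dict[str, float] | None = None) -> None:
        self.csv_path = csv_path
        self.threshold_pct = threshold_pct
        self.cooldown_s = cooldown_s  # hypothetical pause after a simulated trade
        self.fees = DEFAULT_FEES if fees is None else fees
        self._blocked_until: dict[str, float] = {}  # symbol -> Unix time

    def on_prices(self, symbol: str, prices: dict[str, float],
                  now: float | None = None) -> bool:
        """Evaluate one price snapshot; return True if a simulated trade was logged."""
        now = time.time() if now is None else now
        if len(prices) < 2 or now < self._blocked_until.get(symbol, 0.0):
            return False  # nothing to compare, or still in the cooldown period
        buy_ex = min(prices, key=prices.__getitem__)
        sell_ex = max(prices, key=prices.__getitem__)
        spread = net_spread_pct(prices[buy_ex], prices[sell_ex],
                                self.fees[buy_ex], self.fees[sell_ex])
        if spread < self.threshold_pct:
            return False
        print(f"{symbol}: buy {buy_ex} @ {prices[buy_ex]}, "
              f"sell {sell_ex} @ {prices[sell_ex]}, net {spread:.2f}%")
        write_header = not self.csv_path.exists()
        with self.csv_path.open("a", newline="") as f:
            writer = csv.writer(f)
            if write_header:
                writer.writerow(self.HEADER)
            writer.writerow([int(now), symbol, buy_ex, prices[buy_ex],
                             sell_ex, prices[sell_ex], round(spread, 4)])
        # Ignore directly following opportunities while balances would be recalculated.
        self._blocked_until[symbol] = now + self.cooldown_s
        return True
```

For example, with the placeholder fees, `PaperTrader(Path("opportunities.csv")).on_prices("ETH/EUR", {"binance": 1700.0, "kraken": 1650.0})` would log a simulated trade buying on Kraken and selling on Binance, because the fee-adjusted spread of roughly 2.7% exceeds the 1% threshold.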
8 Discussion
8.1 Interpretation of results
The arbitrage opportunities found in this thesis are notable and in line with other papers (Brauneis and Mestel, 2018; Duan et al., 2021; Makarov and Schoar, 2020). Based on data from the last 16 months, from January 1st, 2022 to April 1st, 2023, it was shown that arbitrage opportunities exist. Most of them disappear again within a day, but some last a few days or even up to 12 days. Furthermore, relative price differences with respect to the mean price usually reach a maximum of around 30%, but in exceptional cases can reach up to 60%.
Regarding how trustworthy the results of the crypto asset filtering method are, it must be noted again that the quality of the data from the free sources considered turned out to be low. In addition, the OHLCV data were only available at hourly intervals, i.e. one open, high, low and close price per hour, although considerable price changes can occur within an hour. To ease this limitation somewhat, the daily mean price was used to reduce intraday volatility. Moreover, it is extremely unlikely that these recorded prices occurred at the same moment, e.g. within the fraction of a second in which a trading decision is made. The results therefore cannot be taken at face value, as prices on the crypto market can change quickly. However, they can be seen as indicating a trend, since arbitrage opportunities were also demonstrated over longer periods of time in this thesis.
Consistent with these points, the arbitrage prototype also found such opportunities, in a test example for Bitcoin and Ethereum on Binance and Kraken over a period of 8 hours.
The relative price differences reached up to 9.9%. The corresponding trades with market orders were simulated by logging them to a CSV file. To execute these trades successfully in practice, further factors come into play, such as the execution and latency times of the exchanges and the assets available for sale on each exchange. As these arbitrage opportunities existed for a short but sufficient period of time, it can be assumed that they could be exploited profitably. Testing trade execution with starting capital is part of the future work of this thesis, among other activities.
In the following, the research questions are answered, starting with the main question: how trading strategies from financial markets, such as arbitrage trading, can be applied to assets on crypto markets in an automated way. This thesis showed that arbitrage trading can also be applied to assets on the crypto market. First, price differences were found both with the crypto asset filtering method and subsequently with the prototype. Second, a system written in a programming language suitable for the use case finds arbitrage opportunities and executes trades, here simulated, in an automated way. It can thus be said that arbitrage trading can be applied to the crypto market and has turned out to be well-suited to it because of the market's characteristics.
RQ 1.1 posed the question of which information must be gathered to enable decision making for arbitrage trading. This requires a dataset of the prices per time interval at which trades took place. Historical data in OHLCV (Open, High, Low, Close, Volume) format, available per asset and per exchange, is suitable. For accurate calculations, the smallest possible time interval is recommended.
In addition, information about trading fees, transfer times and regulatory/legal aspects must be gathered for the exchanges or assets, as well as information on market liquidity, since a certain level of liquidity is needed to enable trading.
RQ 1.2 raised the question of which requirements and criteria a crypto asset must meet to be considered for arbitrage trading. To trade using the two-point arbitrage method, the asset must be listed on at least two exchanges that are available in the trader's region and offer a certain level of security and reliability. The asset must also have sufficient liquidity to be traded; this is generally given on centralised exchanges, as assets would otherwise not be offered there. In addition, a cryptocurrency should have low transaction fees so that these do not absorb the profits, which again is less relevant on CEX than on DEX. If these criteria are met, the asset should be tested for how likely it is to offer arbitrage opportunities. Stablecoins such as USDT or USDC can be excluded since, as the name implies, their price is designed to remain stable.
RQ 1.3 posed the question of how an arbitrage trading strategy in the crypto market can be realized as a software prototype. This requires a performant programming language suitable for the use case that can be extended with systems for specific purposes. At least three actors are involved: the trader, the arbitrage trading prototype and the connected exchanges. These must be able to communicate with each other, and the prototype must connect to the exchanges via websockets to receive real-time data. The system then needs to be extended with business logic: subscribing to the desired information on the exchanges, mapping the price streams into a common format, finding arbitrage opportunities including trading fees, and executing trades. Finally, the software must be continuously tested, improved and extended with functions that enable fully autonomous arbitrage trading.
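Mapping the price streams of different exchanges into a common format and keeping the latest quotes in a plain Python dictionary, as the prototype does, might look like the sketch below. The two message shapes and the parser names are hypothetical simplifications; real websocket payloads differ per exchange and must be taken from each exchange's official API documentation. The sketch compares the best ask against the best bid across exchanges and reports the gross spread before fees.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    exchange: str
    symbol: str  # unified symbol, e.g. "BTC/EUR"
    bid: float
    ask: float


# Hypothetical, simplified message shapes; real payloads differ per exchange.
def parse_exchange_a(msg: dict) -> Quote:
    """Assumed shape: {"pair": "BTC-EUR", "bid": "30100", "ask": "30110"}."""
    base, quote = msg["pair"].split("-")
    return Quote("exchange_a", f"{base}/{quote}", float(msg["bid"]), float(msg["ask"]))


def parse_exchange_b(msg: list) -> Quote:
    """Assumed shape: ["XBT/EUR", ["29990.0", "30000.0"]], i.e. symbol and [bid, ask]."""
    symbol, (bid, ask) = msg
    symbol = symbol.replace("XBT", "BTC")  # some exchanges use XBT for Bitcoin
    return Quote("exchange_b", symbol, float(bid), float(ask))


# Latest quote per (exchange, symbol): a plain dictionary as the in-memory store.
latest: dict[tuple[str, str], Quote] = {}


def update(quote: Quote) -> None:
    latest[(quote.exchange, quote.symbol)] = quote


def best_opportunity(symbol: str) -> tuple[str, str, float] | None:
    """Return (buy exchange, sell exchange, gross spread in %) before fees, or None."""
    quotes = [q for (_, sym), q in latest.items() if sym == symbol]
    if len(quotes) < 2:
        return None
    buy = min(quotes, key=lambda q: q.ask)   # cheapest place to buy
    sell = max(quotes, key=lambda q: q.bid)  # best place to sell
    if buy.exchange == sell.exchange:
        return None
    return buy.exchange, sell.exchange, (sell.bid - buy.ask) / buy.ask * 100
```

Before comparing the result with a threshold, the gross spread would still have to be reduced by the trading fees of both exchanges, as done in the prototype.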
As shown in this thesis, an arbitrage trading prototype implemented as outlined for RQ 1.3 can find opportunities and simulate trades based on them. Implementing this strategy fully and autonomously requires further work, such as real trade execution or a strategy for balancing assets between the exchanges, as described in the section Outlook & Future work.
8.2 Limitations
Two limitations emerged in the course of this work. The first concerns testing: since this is an academic thesis, the possibilities for testing with real money on the crypto market are limited. For this reason, paper trading was used, where the final step, the trade execution, was logged to a CSV file with all details that would otherwise have constituted a trade. As centralized exchanges offer market orders, which the prototype uses so that a trade is filled immediately at the best available price, these values can still be considered realistic, although slippage may occur for larger order sizes. One factor that cannot be considered, however, is execution time, i.e. the time it takes to actually execute a trade, since a minimum amount of time elapses during data processing, recognition of the arbitrage opportunity and the decision to place a trade. To confirm the results of this thesis, not only substantial starting capital but also a longer test phase or frequent sample tests over a longer period would therefore be required, as arbitrage opportunities vary over time.
The second limitation is the previously mentioned data quality. Since free sources of crypto asset prices from various exchanges were used in this work, the data turned out to be of significantly lower quality than that of other papers using paid data sources, which can cost up to several thousand euros. In addition, OHLCV data per minute could not be obtained from free sources, only per hour, which makes the prices difficult to evaluate.
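Deriving one price per hour from the OHLC candles and averaging it per day, as described for the crypto asset filtering method, can be sketched as follows. The conventional typical price is (high + low + close) / 3; since the text refers to four known prices per hour, this sketch assumes the average of all four values instead. The candle field names (`time` as a Unix timestamp in seconds, `open`, `high`, `low`, `close`) are assumptions about the downloaded data format.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import date, datetime, timezone
from statistics import mean


def hourly_price(candle: dict[str, float]) -> float:
    """Average of the four OHLC prices of one hourly candle (OHLC/4).

    Assumption: the conventional typical price would be (high + low + close) / 3.
    """
    return (candle["open"] + candle["high"] + candle["low"] + candle["close"]) / 4


def daily_means(candles: list[dict[str, float]]) -> dict[date, float]:
    """Group hourly candles by UTC calendar day and average their hourly prices."""
    by_day: dict[date, list[float]] = defaultdict(list)
    for candle in candles:
        day = datetime.fromtimestamp(candle["time"], tz=timezone.utc).date()
        by_day[day].append(hourly_price(candle))
    return {day: mean(prices) for day, prices in sorted(by_day.items())}
```

Averaging over a whole day smooths intraday volatility, but, as noted in the text, the resulting prices of different exchanges need not refer to the same moment.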
The typical price was derived from the four known prices per hour (open, high, low, close), but because prices in the crypto market change rapidly, it is very unlikely that the calculated prices occurred at the same moment. Since arbitrage opportunities only occur when assets are mispriced at the same time, an average price per hour is not a reliable indicator. For this reason, the results of the second method (crypto asset filtering) should be regarded as a trend, but not as a basis for any monetary decisions. With financial resources, these limitations can be overcome; they are therefore addressed in the next section. Because of these limitations, which could not be tackled within the scope of this thesis, no hypotheses could be conclusively tested, such as whether profits can be realised on the crypto market with arbitrage trading.
8.3 Outlook & Future work
Some limitations result from general circumstances such as money or time, while other tasks are simply outside the scope of this thesis but would benefit the arbitrage prototype. Since only a prototype was developed in this thesis, additional components are needed to build a fully autonomous arbitrage trading system. The following seven areas are proposed as future work to extend and strengthen the findings of this thesis.
First, addressing the limitations, the arbitrage measures of the second method of this thesis can be re-tested with higher-quality data, i.e. from paid sources. Since the calculations of the arbitrage index and the price differences are already in place, they can easily be run again on new data with the same file structure and in OHLCV format.
Second, replacing trade simulation with real trade execution on the crypto market is the next element of future work; this requires starting capital to buy and hold assets on the different exchanges so that they can be sold there.
Based on this, further improvements and benchmark tests can be conducted to achieve high performance of the arbitrage software, considering both the processing power of the executing computer or server and the latency of the exchanges.
Third, running an arbitrage system fully autonomously requires a strategy for rebalancing assets between the exchanges. Since purchases are made on exchanges with low prices, assets accumulate there one-sidedly and must be redistributed to exchanges with currently higher prices. Otherwise, arbitrage opportunities expire unused whenever selling on the more expensive exchange is not possible. This step is therefore particularly important for an autonomous arbitrage system, since poorly distributed assets mean that arbitrage opportunities can only be exploited to a limited extent.
Fourth, a method for determining the optimal trading amount per crypto asset can be added. In the context of this thesis, the basic functions of an arbitrage prototype were developed, so crypto assets are currently only traded as whole units. However, since several units or only a fraction of a unit can be bought at once, this should be taken into account depending on the available money or assets per exchange.
Fifth, the incoming price streams from the websockets are currently only passed through a simple dictionary in Python. This works for the test example in this thesis with BTC/EUR and ETH/EUR on Binance and Kraken. However, as the number of assets and exchanges increases, data streaming and data mapping should be handled by a dedicated server, such as the event streaming platform Apache Kafka.
Sixth, a Graphical User Interface (GUI) can be added to enhance the user experience. This can be built either in Python as a simple local program or as a web application, for example with Angular.
Currently, the prototype is only available as a command line tool that can be started and stopped, but a GUI could also be used to set custom trade thresholds and to provide an overview of data streams, profits, the asset distribution on the exchanges, live events of the software and more.
Finally, in addition to the two-point arbitrage used in this thesis, which exploits market inefficiencies across two exchanges, triangular arbitrage can be tested to exploit inefficiencies within a single exchange.
9 Conclusion
In conclusion, this thesis investigated the feasibility of arbitrage trading in the cryptocurrency market, as practised in traditional financial markets. This was examined across a broader spectrum, ultimately including 48 crypto assets among the top 100 by market cap, traded against EUR on 16 exchanges. Based on these, a method to filter crypto assets using historical data was developed, aiming to identify those most likely to experience price inefficiencies across exchanges. These findings were then extended by developing an arbitrage trading prototype, which finds arbitrage opportunities using continuous real-time data from exchanges and simulates trades with market orders via paper trading to test the viability of the identified openings. The implementation uses centralized crypto exchanges. In summary, the results show that over the past 16 months, cryptocurrency exchanges have been characterised by recurring episodes of opening and closing arbitrage opportunities, as well as a few periods of high arbitrage spreads that lasted for up to twelve days. Furthermore, relative price differences usually reach a maximum of around 30%, but in exceptional cases up to 60%.
In addition, arbitrage opportunities increased significantly from the second half of 2022 onwards for most assets and remained at a higher level than in the first half of 2022 until April 2023. Most of the time, prices converge again within a day as arbitrageurs exploit the differences, but at times arbitrage capital appears to be overwhelmed by speculators or other characteristics of the crypto market. The results of this thesis therefore indicate that arbitrage opportunities do exist in this market, even though, in theory, they should not according to equilibrium-based hypotheses such as the Efficient-Market Hypothesis or the Law of One Price. The outcomes are consistent with related papers on arbitrage opportunities in the crypto market published between 2018 and 2021 (Brauneis and Mestel, 2018; Duan et al., 2021; Makarov and Schoar, 2020). Several aspects may explain the occurrence of price differences: the low level of regulation, the high number of speculators, a continuously available and easily accessible market, and the independent pricing of each asset on every exchange. Another possible reason for arbitrage opportunities in assets traded against fiat currencies is capital controls, which relates to the regulatory aspect above; this is consistent with the finding that arbitrage spreads are smaller in crypto-to-crypto trades, for example Bitcoin to Ethereum, than in trades against the dollar or euro (Makarov and Schoar, 2020). Overall, the results of this thesis suggest that arbitrage trading can be considered a profitable strategy in the cryptocurrency market and that the methods and the prototype developed can be valuable tools for traders looking to leverage these opportunities. However, it is important to note that the scope of this thesis was limited to trade simulation and did not include executing actual trades in the real market.
Further research and development would be required to evaluate the performance of the proposed strategy in practice and to implement a fully autonomous arbitrage trading system.
References
[1] Al-Yahyaee, K.H., Mensi, W., Ko, H.-U., Yoon, S.-M., Kang, S.H., 2020. Why cryptocurrency markets are inefficient: The impact of liquidity and volatility. The North American Journal of Economics and Finance 52, 101168. https://doi.org/10.1016/j.najef.2020.101168
[2] Angerer, M., Neugebauer, T., Shachat, J., 2023. Arbitrage bots in experimental asset markets. Journal of Economic Behavior & Organization 206, 262–278. https://doi.org/10.1016/j.jebo.2022.12.004
[3] Bachelier, L., 1900. Theory of speculation. Scientific Annals of the École normale supérieure 17, 21–86.
[4] Barbon, A., Ranaldo, A., 2021. On The Quality Of Cryptocurrency Markets: Centralized Versus Decentralized Exchanges. https://doi.org/10.48550/ARXIV.2112.07386
[5] Berg, J.A., Fritsch, R., Heimbach, L., Wattenhofer, R., 2022. An Empirical Study of Market Inefficiencies in Uniswap and SushiSwap. https://doi.org/10.48550/ARXIV.2203.07774
[6] Bitfinex | Our Fees [WWW Document], 2023. URL https://www.bitfinex.com/fees/ (accessed 3.30.23).
[7] Black, F., 1971. Toward a Fully Automated Stock Exchange, Part I. Financial Analysts Journal 27, 28–35. https://doi.org/10.2469/faj.v27.n4.28
[8] Böhme, R., Christin, N., Edelman, B., Moore, T., 2015. Bitcoin: Economics, Technology, and Governance. Journal of Economic Perspectives 29, 213–238. https://doi.org/10.1257/jep.29.2.213
[9] Bouoiyour, J., Selmi, R., 2015. What Does Bitcoin Look Like? Annals of Economics and Finance 16, 449–492.
[10] Bouoiyour, J., Selmi, R., Tiwari, A., 2014. Is Bitcoin business income or speculative bubble? Unconditional vs. conditional frequency domain analysis. MPRA Paper 59595.
[11] Brauneis, A., Mestel, R., 2018. Price discovery of cryptocurrencies: Bitcoin and beyond. Economics Letters 165, 58–61. https://doi.org/10.1016/j.econlet.2018.02.001
[12] Brogaard, J., Garriott, C., 2019. High-Frequency Trading Competition. J. Financ. Quant. Anal. 54, 1469–1497. https://doi.org/10.1017/S0022109018001175
[13] Brogaard, J., Hendershott, T., Riordan, R., 2014. High-Frequency Trading and Price Discovery. Rev. Financ. Stud. 27, 2267–2306. https://doi.org/10.1093/rfs/hhu032
[14] Bruzgė, R., Šapkauskienė, A., 2022. Network analysis on Bitcoin arbitrage opportunities. The North American Journal of Economics and Finance 59, 101562. https://doi.org/10.1016/j.najef.2021.101562
[15] Buchholz, M., Delaney, J., Warren, J., Parker, J., 2012. Bits and Bets Information, Price Volatility, and Demand for Bitcoin. Economics 312.
[16] Budish, E., Cramton, P., Shim, J., 2015. The High-Frequency Trading Arms Race: Frequent Batch Auctions as a Market Design Response. The Quarterly Journal of Economics 130, 1547–1621. https://doi.org/10.1093/qje/qjv027
[17] Carhart, M.M., 1997. On Persistence in Mutual Fund Performance. The Journal of Finance 52, 57–82. https://doi.org/10.1111/j.1540-6261.1997.tb03808.x
[18] Carrasco Blázquez, M., De la Orden De la Cruz, C., Prado Román, C., 2018. Pairs trading techniques: An empirical contrast. European Research on Management and Business Economics 24, 160–167. https://doi.org/10.1016/j.iedeen.2018.05.002
[19] Carrion, A., 2013. Very fast money: High-frequency trading on the NASDAQ. Journal of Financial Markets 16, 680–711. https://doi.org/10.1016/j.finmar.2013.06.005
[20] CCXT – CryptoCurrency eXchange Trading Library, 2023.
[21] CCXT - Documentation [WWW Document], 2023. URL https://docs.ccxt.com/#/?id=rate-limit (accessed 4.29.23).
[22] Chordia, T., Roll, R., Subrahmanyam, A., 2008. Liquidity and market efficiency. Journal of Financial Economics 87, 249–268. https://doi.org/10.1016/j.jfineco.2007.03.005
[23] Ciaian, P., Rajcaniova, M., Kancs, d’Artis, 2016. The economics of BitCoin price formation. Applied Economics 48, 1799–1815. https://doi.org/10.1080/00036846.2015.1109038
[24] Clements, R., 2021. Built to Fail: The Inherent Fragility of Algorithmic Stablecoins. SSRN Journal. https://doi.org/10.2139/ssrn.3952045
[25] Coinbase pricing and fees disclosures [WWW Document], 2023. Coinbase Help. URL https://help.coinbase.com/en/coinbase/trading-and-funding/pricing-and-fees/fees (accessed 3.30.23).
[26] CoinGecko API Pricing Plans [WWW Document], 2023. CoinGecko. URL https://www.coingecko.com/en/api/pricing (accessed 4.29.23).
[27] CoinMarketCap, 2023. CoinMarketCap API Pricing [WWW Document]. coinmarketcap.com. URL https://coinmarketcap.com/api/pricing/ (accessed 4.29.23).
[28] Do, B., Faff, R., Hamza, K., 2006. A New Approach to Modeling and Estimation for Pairs Trading. Proceedings of 2006 Financial Management Association European Conference.
[29] Duan, K., Li, Z., Urquhart, A., Ye, J., 2021. Dynamic efficiency and arbitrage potential in Bitcoin: A long-memory approach. International Review of Financial Analysis 75, 101725. https://doi.org/10.1016/j.irfa.2021.101725
[30] Dwyer, G.P., 2015. The economics of Bitcoin and similar private digital currencies. Journal of Financial Stability 17, 81–91. https://doi.org/10.1016/j.jfs.2014.11.006
[31] Egorova, K., 2018. Crypto Exchanges, Explained [WWW Document]. Cointelegraph. URL https://cointelegraph.com/explained/crypto-exchanges-explained (accessed 3.31.23).
[32] Fama, E.F., 1970. Efficient Capital Markets: A Review of Theory and Empirical Work. The Journal of Finance 25, 383. https://doi.org/10.2307/2325486
[33] Fee Rate [WWW Document], 2023. Binance. URL https://www.binance.com (accessed 3.30.23).
[34] Fee Structures | Explore our trading fees | Kraken [WWW Document], 2023. URL https://www.kraken.com/features/fee-schedule (accessed 3.30.23).
[35] Fernández-Pérez, A., Fernández-Rodríguez, F., Sosvilla-Rivero, S., 2012. Genetic Algorithm for Arbitrage with More than Three Currencies. TI 03, 181–186. https://doi.org/10.4236/ti.2012.33025
[36] Fischer, T., Krauss, C., Deinert, A., 2019. Statistical Arbitrage in Cryptocurrency Markets. JRFM 12, 31. https://doi.org/10.3390/jrfm12010031
[37] Fontana, C., 2015. Weak and strong no-arbitrage conditions for continuous financial markets. Int. J. Theor. Appl. Finan. 18, 1550005. https://doi.org/10.1142/S0219024915500053
[38] Foucault, T., Kadan, O., Kandel, E., 2005. Limit Order Book as a Market for Liquidity. The Review of Financial Studies 18, 1171–1217.
[39] Fu, S., Wang, Q., Yu, J., Chen, S., 2022. FTX Collapse: A Ponzi Story. https://doi.org/10.48550/ARXIV.2212.09436
[40] Glossary of Trading Terms [WWW Document], 2023. CryptoCompare. URL https://www.cryptocompare.com/coins/guides/glossary-of-trading-terms/ (accessed 4.30.23).
[41] Goldenberg, T., 2018. Watch Out Crypto Exchanges, Decentralization Is Coming [WWW Document]. URL https://www.coindesk.com/markets/2018/05/31/watch-out-crypto-exchanges-decentralization-is-coming/ (accessed 3.27.23).
[42] Gould, M.D., Porter, M.A., Williams, S., McDonald, M., Fenn, D.J., Howison, S.D., 2013. Limit order books. Quantitative Finance 13, 1709–1742. https://doi.org/10.1080/14697688.2013.803148
[43] Gromb, D., Vayanos, D., 2018. The Dynamics of Financially Constrained Arbitrage. The Journal of Finance 73, 1713–1750. https://doi.org/10.1111/jofi.12689
[44] Gromb, D., Vayanos, D., 2002. Equilibrium and welfare in markets with financially constrained arbitrageurs. Journal of Financial Economics 66, 361–407. https://doi.org/10.1016/S0304-405X(02)00228-3
[45] Hautsch, N., Scheuch, C., Voigt, S., 2018. Limits to Arbitrage in Markets With Stochastic Settlement Latency. SSRN Journal. https://doi.org/10.2139/ssrn.3302159
[46] Heckel, M., Waldenberger, F. (Eds.), 2022. The Future of Financial Systems in the Digital Age: Perspectives from Europe and Japan, Perspectives in Law, Business and Innovation. Springer Singapore, Singapore. https://doi.org/10.1007/978-981-16-7830-1
[47] Hevner, A., Chatterjee, S., 2010. Design Science Research in Information Systems, in: Design Research in Information Systems, Integrated Series in Information Systems. Springer US, Boston, MA, pp. 9–22. https://doi.org/10.1007/978-1-4419-5653-8_2
[48] Hevner, A.R., March, S.T., Park, J., Ram, S., 2004. Design Science in Information Systems Research. MIS Quarterly 28, 75. https://doi.org/10.2307/25148625
[49] Holste, B., Gallus, C., 2019. Sind Krypto-Währungsmärkte Fair? (Are Crypto-Currency Markets Fair?). SSRN Journal. https://doi.org/10.2139/ssrn.3466919
[50] Isard, P., 1976. How Far Can We Push The “Law of One Price”? Int. finance discuss. pap. 1976, 1–22. https://doi.org/10.17016/IFDP.1976.84
[51] Jensen, M.C., 2002. Some Anomalous Evidence Regarding Market Efficiency. SSRN Journal. https://doi.org/10.2139/ssrn.244159
[52] Jofre, A., Rockafellar, R.T., Wets, R.J.-B., 2014. General Economic Equilibrium with Financial Markets and Retainability. SSRN Journal. https://doi.org/10.2139/ssrn.2460128
[53] Johnstone, S., 2019. Requisites for Development of a Regulated Secondary Market in Digital Assets. SSRN Journal. https://doi.org/10.2139/ssrn.3379623
[54] Kabašinskas, A., Šutienė, K., 2021. Key Roles of Crypto-Exchanges in Generating Arbitrage Opportunities. Entropy 23, 455. https://doi.org/10.3390/e23040455
[55] Kakushadze, Z., Yu, W., 2019. Altcoin-Bitcoin Arbitrage. SSRN Journal. https://doi.org/10.2139/ssrn.3327524
[56] Keim, D.B., Madhavan, A., 1997. Transactions costs and investment style: an inter-exchange analysis of institutional equity trades. Journal of Financial Economics 46, 265–292. https://doi.org/10.1016/S0304-405X(97)00031-7
[57] Kiuchi, T., 2022. High-Frequency Trading in Japan: A Unique Evolution, in: Heckel, M., Waldenberger, F. (Eds.), The Future of Financial Systems in the Digital Age, Perspectives in Law, Business and Innovation. Springer Singapore, Singapore, pp. 159–183. https://doi.org/10.1007/978-981-16-7830-1_9
[58] Kristoufek, L., 2013. BitCoin meets Google Trends and Wikipedia: Quantifying the relationship between phenomena of the Internet era. Sci Rep 3, 3415. https://doi.org/10.1038/srep03415
[59] Krückeberg, S., Scholz, P., 2020. Decentralized Efficiency? Arbitrage in Bitcoin Markets. Financial Analysts Journal 76, 135–152. https://doi.org/10.1080/0015198X.2020.1733902
[60] Kühl, M., 2010. Bivariate cointegration of major exchange rates, cross-market efficiency and the introduction of the Euro. Journal of Economics and Business 62, 1–19. https://doi.org/10.1016/j.jeconbus.2009.07.002
[61] Kyle, A.S., 1985. Continuous Auctions and Insider Trading. Econometrica 53, 1315. https://doi.org/10.2307/1913210
[62] Lee, S., Meslmani, N.E., Switzer, L.N., 2020. Pricing Efficiency and Arbitrage in the Bitcoin Spot and Futures Markets. Research in International Business and Finance 53, 101200. https://doi.org/10.1016/j.ribaf.2020.101200
[63] Levus, R., Berko, A., Chyrun, L., Panasyuk, V., Hrubel, M., 2021. Intelligent System for Arbitrage Situations Searching in the Cryptocurrency Market, in: MoMLeT+DS.
[64] Liu, G., Yu, C.-P., Shiu, S.-N., Shih, I.-T., 2022. The Efficient Market Hypothesis and the Fractal Market Hypothesis: Interfluves, Fusions, and Evolutions. SAGE Open 12, 215824402210821. https://doi.org/10.1177/21582440221082137
[65] Makarov, I., Schoar, A., 2020. Trading and arbitrage in cryptocurrency markets. Journal of Financial Economics 135, 293–319. https://doi.org/10.1016/j.jfineco.2019.07.001
[66] Malinova, K., Park, A., 2011. Subsidizing Liquidity: The Impact of Make/Take Fees on Market Quality. SSRN Journal. https://doi.org/10.2139/ssrn.1944054
[67] Mohan, V., 2022. Automated market makers and decentralized exchanges: a DeFi primer. Financ Innov 8, 20. https://doi.org/10.1186/s40854-021-00314-5
[68] Nakamoto, S., 2008. Bitcoin: A Peer-to-Peer Electronic Cash System.
[69] O’Hara, M., 2015.
High frequency market microstructure. Journal of Financial Economics 116, 257–270. https://doi.org/10.1016/j.jfineco.2015.01.003 [70] Ozenbas, D., Pagano, M.S., Schwartz, R.A., Weber, B.W., 2022. Liquidity, Markets and Trading in Action: An Interdisciplinary Perspective, Classroom Companion: Business. Springer International Publishing, Cham. https://doi.org/10.1007/978-3- 030-74817-3 [71] Parlour, C.A., Seppi, D.J., 2008. Limit Order Markets: A Survey, in: Handbook of Financial Intermediation and Banking. Elsevier, pp. 63–96. https://doi.org/10.1016/B978-044451558-2.50007-6 [72] Pauna, C., 2018. Arbitrage Trading Systems for Cryptocurrencies. Design Principles and Server Architecture. IE 22, 35–42. https://doi.org/10.12948/issn14531305/22.2.2018.04 [73] Poitras, G., 2010. Arbitrage: Historical Perspectives, in: Cont, R. (Ed.), Encyclopedia of Quantitative Finance. John Wiley & Sons, Ltd, Chichester, UK, p. eqf01010. https://doi.org/10.1002/9780470061602.eqf01010 [74] Pourpounehnajafabadi, M., Nielsen, K., Ross, O., 2020. Automated Market Makers. Department of Food and Resource Economics, University of Copenhagen IFRO Working Paper. [75] Pricing | CryptoCompare API [WWW Document], 2023. . CryptoCompare. URL https://min-api.cryptocompare.com/pricing (accessed 4.29.23). [76] Ross, S.A., 1976. The arbitrage theory of capital asset pricing. Journal of Economic Theory 13, 341–360. https://doi.org/10.1016/0022-0531(76)90046-6 [77] Saengchote, K., 2021. A DeFi Bank Run: Iron Finance, IRON Stablecoin, and the Fall of TITAN. SSRN Journal. https://doi.org/10.2139/ssrn.3888089 [78] Schär, F., 2020. Decentralized Finance: On Blockchain- and Smart Contract-based Financial Markets. SSRN Journal. https://doi.org/10.2139/ssrn.3571335 [79] Total Cryptocurrency Market Cap [WWW Document], 2023. . CoinMarketCap. URL https://coinmarketcap.com/charts/ (accessed 4.25.23). [80] Urquhart, A., 2016. The inefficiency of Bitcoin. Economics Letters 148, 80–82. 
https://doi.org/10.1016/j.econlet.2016.09.019 [81] Websocket API | Binance Developers [WWW Document], 2023. URL https://developers.binance.com/docs/binance-trading-api/websocket_api (accessed 2.4.23). [82] Wei, W.C., 2018. Liquidity and market efficiency in cryptocurrencies. Economics Letters 168, 21–24. https://doi.org/10.1016/j.econlet.2018.04.003 88 [83] Weron, A., Weron, R., 2000. Fractal market hypothesis and two power-laws. Chaos, Solitons & Fractals 11, 289–296. https://doi.org/10.1016/S0960-0779(98)00295-1 [84] Which API should I use? REST versus WebSocket [WWW Document], 2023. . Kraken. URL https://support.kraken.com/hc/en-us/articles/4404197772052-Which- API-should-I-use-REST-versus-WebSocket (accessed 2.4.23). [85] Zhang, W., Wang, P., Li, X., Shen, D., 2018. The inefficiency of cryptocurrency and its cross-correlation with Dow Jones Industrial Average. Physica A: Statistical Mechanics and its Applications 510, 658–670. https://doi.org/10.1016/j.physa.2018.07.032 89 List of Figures Figure 1: Visualization of a two-point arbitrage trading process ............................................ 19 Figure 2: Visualization of a triangular arbitrage trading process ............................................ 21 Figure 3: All observed price fluctuations occur due to shifts in demand (Buchholz et al., 2012). ....................................................................... .............................................................. 26 Figure 4: Schematic functionality of a Limit Order Book System (Gould et al., 2013) ........... 28 Figure 5: System development research model according to Design Science by Hefner. Adopted from Nunamaker (Hevner and Chatterjee, 2010) .................................................... 36 Figure 6: Preferred exchanges, resulting from top 25 exchange rankings by platforms trust score. ....................................................................... .............................................................. 
Figure 7: File structure of gathered historical data
Figure 8: Process of gathering data from exchanges for cryptocompare.com and CCXT
Figure 9: Architectural approach with three actors and their corresponding data streams
Figure 10: File structure of the arbitrage trading prototype
Figure 11: Data example from Binance for ETH/EUR over the specified timeframe
Figure 12: Arbitrage Index for all 48 assets across all their available exchanges
Figure 13: Arbitrage Index for the top 10 assets (by market cap) across all their available exchanges
Figure 14: Corresponding table with statistical insights about the Arbitrage Index for the top 10 assets (by market cap)
Figure 15: Arbitrage Index for the 10 assets with the lowest market cap of the considered list, across all their available exchanges
Figure 16: Corresponding table with statistical insights about the Arbitrage Index for the 10 assets with the lowest market cap of the considered list
Figure 17: Arbitrage Index for the 10 assets with the highest mean, across all their available exchanges
Figure 18: Corresponding table with statistical insights about the Arbitrage Index for the 10 assets with the highest mean
Figure 19: Arbitrage Index for the 10 assets with the lowest mean, across all their available exchanges
Figure 20: Corresponding table with statistical insights about the Arbitrage Index for the 10 assets with the lowest mean
Figure 21: All 48 assets, sorted by the mean Arbitrage Index value of the last 16 months
Figure 22: Relative Price Differences to the mean price for 1INCH
Figure 23: Price Differences to the mean price table for 1INCH
Figure 24: Relative Price Differences to the mean price for ALGO
Figure 25: Price Differences to the mean price table for ALGO
Figure 26: Relative Price Differences to the mean price for ATOM
Figure 27: Price Differences to the mean price table for ATOM
Figure 28: Relative Price Differences to the mean price for MANA
Figure 29: Price Differences to the mean price table for MANA
Figure 30: Relative Price Differences to the mean price for XTZ
Figure 31: Price Differences to the mean price table for XTZ
Figure 32: Example of the prototype logging found arbitrage opportunities to CSV

List of Tables
Table 1: Research questions and methods used
Table 2: Keyword search results
Table 3: Top 100 assets by market cap considered at the beginning, with abbreviation and full name
Table 4: 85 available exchanges from cryptocompare.com and CCXT, with abbreviation and full name
Table 5: Scraped websites with exchange rankings by Trust Scores, Exchange Scores or Points
Table 6: 48 available assets of the 100 considered, with abbreviation and full name
Table 7: Exchanges with false data by occurrences
Table 8: The 48 assets for which data was available, with abbreviation and full name
Table 9: The 16 remaining exchanges from which data was available, with abbreviation and full name

List of Abbreviations
API Application Programming Interface
APT Arbitrage Pricing Theory
ATS Alternative Trading System
BTC Bitcoin
CEX Centralised Exchange
DEX Decentralised Exchange
EMH Efficient Market Hypothesis
ETH Ethereum
EUR Euro
GUI Graphical User Interface
I/O Input/Output
ICO Initial Coin Offering
LOB Limit Order Book
OHLCV Open, High, Low, Close, Volume
OTC Over-the-counter Market
REST Representational State Transfer
RQ Research Question
SPOF Single Point of Failure
TCP Transmission Control Protocol
VWAP Volume Weighted Average Price

Attachment A: Data gathering from cryptocompare.com

```python
import csv
import os

import pandas as pd
import requests


def historicalData_cryptocompare(fsym, tsym, limit, start_timestamp, api_url, exchange, end_timestamp):
    # start_timestamp / end_timestamp: bounds of the requested time range in Unix seconds
    symbol = f"{fsym}_{tsym}"
    folder_name = f"data/historical_{symbol}"
    fieldnames = ["time", "open", "high", "low", "close", "volumefrom", "volumeto"]

    # Create the folder if it doesn't exist and set the filename
    os.makedirs(folder_name, exist_ok=True)
    filename = f"{folder_name}/{exchange}_{symbol}_h.csv"

    # Create the csv and write the header
    with open(filename, mode="w", newline="") as csv_file:
        csv.DictWriter(csv_file, fieldnames=fieldnames).writeheader()

    to_ts = None
    while True:
        current_url = f"{api_url}&e={exchange}&fsym={fsym}&tsym={tsym}&limit={limit}"
        # Every request after the first pages backwards from the earliest timestamp received
        if to_ts is not None:
            current_url += f"&toTs={to_ts}"

        # Get the response from the API
        response = requests.get(current_url)
        ohlcv_data = response.json()

        # If the request returns an error, the trading pair is not available and is skipped;
        # the created file is removed as well
        if ohlcv_data.get("Response") == "Error":
            print(f"Skipping {exchange}: {ohlcv_data.get('Message')}")
            os.remove(filename)
            return

        data_points = ohlcv_data["Data"]["Data"]
        earliest_timestamp = ohlcv_data["Data"]["TimeFrom"]

        # Append the data of every request to the csv
        with open(filename, mode="a", newline="") as csv_file:
            writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
            for data_point in data_points:
                writer.writerow({k: data_point[k] for k in fieldnames})

        # Stop once the beginning of the requested range (or of the available history) is reached
        if not data_points or earliest_timestamp <= start_timestamp:
            break
        to_ts = earliest_timestamp

    # Clean the csv: keep only rows inside the time range, sorted and without duplicate hours
    df = pd.read_csv(filename)
    df["time"] = df["time"].astype(int)
    df = df[(df["time"] >= start_timestamp) & (df["time"] <= end_timestamp)]
    df = df.sort_values(by="time").drop_duplicates(subset="time", keep="first")
    df.to_csv(filename, index=False)
```

Attachment B: Data gathering from the CCXT library

```python
import os
import time

import ccxt
import pandas as pd


def historicalData_ccxt(exchange_id, symbol, timeframe="1h", start_date=None, end_date=None):
    # Instantiate the exchange client from its CCXT id
    exchange = getattr(ccxt, exchange_id)()

    # Check if the exchange offers OHLCV data
    if not exchange.has["fetchOHLCV"]:
        print(f"{exchange_id} does not support fetchOHLCV")
        return

    try:
        if start_date and end_date:
            # parse8601 converts ISO 8601 strings to milliseconds since the epoch
            start_ms = exchange.parse8601(start_date)
            until = exchange.parse8601(end_date)
            since = start_ms
            ohlcv = []
            while since < until:
                ohlcv_data = exchange.fetch_ohlcv(symbol, timeframe, since)
                if not ohlcv_data:
                    break
                # Continue after the last candle received and respect the rate limit
                since = ohlcv_data[-1][0] + 1
                ohlcv += ohlcv_data
                time.sleep(exchange.rateLimit / 1000)

            # Create the folder if it doesn't exist and set the filename
            pair = symbol.replace("/", "_")
            folder_name = f"data/historical_{pair}"
            os.makedirs(folder_name, exist_ok=True)
            filename = f"{folder_name}/{exchange_id}_{pair}_h.csv"

            # Convert to a DataFrame with times in seconds and remove rows outside the time range
            start_timestamp, end_timestamp = start_ms // 1000, until // 1000
            df = pd.DataFrame(ohlcv, columns=["time", "open", "high", "low", "close", "volume"])
            df["time"] = (df["time"] // 1000).astype(int)
            df = df[(df["time"] >= start_timestamp) & (df["time"] <= end_timestamp)]
            df = df.sort_values(by="time").drop_duplicates(subset="time", keep="first")

            # Write the data to csv
            df.to_csv(filename, index=False)
            print(f"Historical data saved for {symbol} on {exchange_id}")
    except Exception as error:
        print(f"Error for {symbol} on {exchange_id}: {error}")
```
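Both attachment functions write one CSV file per exchange into a folder such as `data/historical_ETH_EUR`. The following stdlib-only sketch illustrates how such files could feed a cross-exchange comparison similar to the relative price differences listed in Figures 22–31. It is not the thesis' own analysis code. The file-name parsing, the use of the `close` column and the helper names `load_close_prices`, `relative_deviation_from_mean` and `max_spread` are hypothetical choices made for this example. Fees, transfer times and order book depth are ignored.

```python
import csv
from collections import defaultdict
from pathlib import Path


def load_close_prices(folder):
    """Return {exchange: {unix_time: close}} for every "*_h.csv" file in folder.

    Assumption: files follow the "<exchange>_<FSYM>_<TSYM>_h.csv" naming and carry
    the "time" and "close" columns written by Attachments A and B, and exchange
    ids contain no underscore.
    """
    prices = {}
    for path in sorted(Path(folder).glob("*_h.csv")):
        exchange = path.name.split("_", 1)[0]
        with path.open(newline="") as csv_file:
            prices[exchange] = {
                int(float(row["time"])): float(row["close"])
                for row in csv.DictReader(csv_file)
            }
    return prices


def relative_deviation_from_mean(prices):
    """Per timestamp, return {exchange: (close - mean) / mean} across quoting exchanges.

    Timestamps quoted by fewer than two exchanges are skipped, because a
    cross-exchange comparison needs at least two prices.
    """
    by_time = defaultdict(dict)
    for exchange, series in prices.items():
        for ts, close in series.items():
            by_time[ts][exchange] = close

    deviations = {}
    for ts in sorted(by_time):
        quotes = by_time[ts]
        if len(quotes) < 2:
            continue
        mean = sum(quotes.values()) / len(quotes)
        deviations[ts] = {ex: (close - mean) / mean for ex, close in quotes.items()}
    return deviations


def max_spread(quotes):
    """Gross two-point spread (max - min) / min between the dearest and the cheapest exchange."""
    low, high = min(quotes.values()), max(quotes.values())
    return (high - low) / low
```

In this sketch, calling `relative_deviation_from_mean(load_close_prices("data/historical_ETH_EUR"))` yields one dictionary per hour. A positive value marks an exchange quoting above the cross-exchange mean. `max_spread` applied to the same hour's quotes gives the gross margin a two-point arbitrage would have to beat after fees.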