
Guide: how to use Python for algorithmic trading on the exchange. Part 1



Technology has become an asset: financial institutions now not only run their core business but also pay close attention to new developments. As we have written before, in high-frequency trading the best results go to those whose software and hardware are not just efficient but also fast.

R and Python are among the most popular programming languages in finance; C++, C#, and Java are also widely used. A guide published on the DataCamp website explains how to start using Python to build financial applications. We present a series of articles adapted from its chapters.


Introduction: finance in plain language


Before diving into trading strategies, it makes sense to cover the basic concepts. That does not mean the material below is aimed purely at beginners. Ideally, you should first take a course on using Python to work with data, know how to work with Python lists and packages, and be familiar with NumPy and Pandas at least at a basic level.

Shares and trading on the stock exchange


When a company wants to keep developing its business, launch new projects, or expand, it can use stocks as a financing instrument. A share represents a stake in the ownership of a company and is issued in exchange for money. Shares can also be bought and sold: in such transactions, participants trade existing, previously issued shares.

The price at which a particular stock is bought or sold can change constantly, regardless of the business performance of the issuing company: everything is determined by supply and demand. It is important to understand the difference between stocks and, for example, bonds, which are used to raise borrowed funds.

Trading is not limited to buying and selling shares: a transaction can involve various assets, including financial instruments and, for example, precious metals or commodities such as oil.

When buying shares, the investor receives a stake in the company and can profit later by selling it. Strategies vary: there are long trades, entered in the hope that the shares will rise further, and short trades, where the investor expects the shares to get cheaper and therefore sells them in the hope of buying them back at a lower price.
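The difference between the two kinds of trade can be sketched with a few lines of arithmetic. The function name `trade_pnl` and the prices are hypothetical, and fees and borrowing costs are ignored for simplicity.

```python
def trade_pnl(entry_price, exit_price, shares, side):
    """Profit or loss of a round-trip trade, ignoring fees and borrowing costs.

    side is "long" (buy first, sell later) or "short" (sell first, buy back later).
    """
    if side == "long":
        return (exit_price - entry_price) * shares
    if side == "short":
        return (entry_price - exit_price) * shares
    raise ValueError("side must be 'long' or 'short'")


# Hypothetical scenario: the stock falls from 100 to 90.
long_result = trade_pnl(100.0, 90.0, shares=10, side="long")
short_result = trade_pnl(100.0, 90.0, shares=10, side="short")
print(long_result, short_result)
```

In this scenario the long trade loses exactly what the short trade gains, because both sides see the same price move with opposite signs.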

Developing a trading strategy involves several stages, much like building a machine learning model: first you formulate the strategy and describe it in a form that can run on a computer, then you test the resulting program, optimize it, and finally evaluate its performance and reliability.

Trading strategies are usually tested using backtesting: the strategy is run on historical trading data, and the program generates trades based on that data. This shows whether the strategy would have been profitable under market conditions observed in the past and gives a preliminary estimate of its prospects in live trading. There is no guarantee, however, that good results on historical data will be repeated in the real market.
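To make the idea concrete, here is a minimal backtest sketch. It is not a strategy from the original guide: the prices are a seeded random walk rather than real market data, and the rule (hold the stock while the price is above its 20-day moving average) is chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic daily closing prices (a seeded random walk), not real market data.
rng = np.random.default_rng(seed=0)
dates = pd.bdate_range("2020-01-01", periods=250)
close = pd.Series(100 + rng.normal(0, 1, size=len(dates)).cumsum(), index=dates)

# Toy rule: hold one unit whenever the price is above its 20-day moving average.
moving_average = close.rolling(window=20).mean()
signal = (close > moving_average).astype(int)

# Act on the signal the next day, so no decision uses data that was not yet known.
position = signal.shift(1).fillna(0)

# Daily strategy return = position held that day * that day's price return.
daily_return = close.pct_change().fillna(0)
strategy_return = position * daily_return

# Growth of one unit of capital for the strategy and for simple buy-and-hold.
strategy_equity = (1 + strategy_return).cumprod()
buy_and_hold_equity = (1 + daily_return).cumprod()

print(strategy_equity.iloc[-1], buy_and_hold_equity.iloc[-1])
```

Shifting the signal by one day is the important design choice: without it, the backtest would trade on the same closing price that produced the signal, which flatters the results.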

Time series data


A time series is a sequence of numerical data points recorded at consecutive equal intervals of time. In finance, such series are used to track price movements over a given period. Here is what it looks like:



Dates are on the X axis and the price is on the Y axis. "Consecutive equal intervals of time" here means that the dates on the time axis are spaced two weeks apart: compare 3/7/2005 with 3/31/2005, and 4/5/2005 with 4/19/2005 (the dates use the US format, with the month first and then the day).

However, financial data usually contains not two fields (date and price) but five: besides the trading period itself, there are the opening price, the highest and lowest prices within the period, and the closing price. For a daily period, this tells us the price at the start and at the end of trading on the selected day, as well as the maximum and minimum price reached during the session.
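As an illustration of how the four price fields relate to the underlying prices, the sketch below collapses minute-by-minute prices into one row per day. The prices are synthetic, and the variable names are chosen for this example only.

```python
import numpy as np
import pandas as pd

# Synthetic minute-by-minute prices for two trading days (hypothetical data).
rng = np.random.default_rng(seed=1)
timestamps = pd.date_range("2005-03-07 09:30", periods=390, freq="min").append(
    pd.date_range("2005-03-08 09:30", periods=390, freq="min")
)
prices = pd.Series(
    40 + rng.normal(0, 0.05, size=len(timestamps)).cumsum(), index=timestamps
)

# Collapse each day into its first, highest, lowest, and last price.
daily_ohlc = prices.resample("D").ohlc()
print(daily_ohlc)
```

Each row of `daily_ohlc` is one trading period, with the columns `open`, `high`, `low`, and `close`.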

Above, we have described the basic concepts that you need to know in order to continue studying this manual.

Basics of Python for Finance: Pandas


One of the most popular tools when using Python to develop financial applications is the Pandas package. It is needed at the very beginning, but as you go deeper into the development process, you will also need packages such as NumPy, SciPy, Matplotlib.

First, focus on Pandas and apply this tool to time series analysis. Below we will discuss how to use this package to import data, analyze and manipulate it.

Import Financial Data


The pandas-datareader package lets you pull data from sources such as Google, Yahoo! Finance, or the World Bank; the documentation describes the available data sources in more detail. This tutorial uses data from Yahoo! Finance. To get started, install the latest version of the package using pip:

pip install pandas-datareader 

Installation instructions for the development version are available here.

    import datetime
    import pandas_datareader as pdr

    aapl = pdr.get_data_yahoo('AAPL',
                              start=datetime.datetime(2006, 10, 1),
                              end=datetime.datetime(2012, 1, 1))

The Yahoo API changed not long ago, so to work with the library on your own you need to install workarounds until an official patch is released. The problem is described in more detail here. For this guide, however, the data was downloaded in advance, so you will not run into problems while following along.

It is also important to understand that although pandas-datareader is a handy tool for loading data, it is not the only one for Python. You can also use libraries such as Quandl; the call below requests the WIKI/AAPL dataset with Apple's stock prices:

    import quandl

    aapl = quandl.get("WIKI/AAPL", start_date="2006-10-01", end_date="2012-01-01")

Many people also know that Excel is very popular for data analysis in finance. For convenience, you can integrate it with Python (more information is available here).

Work with time series data


To import the data, we used pandas_datareader. The result is the aapl object, a DataFrame: a two-dimensional labeled data structure with columns of potentially different types. The first thing to do with such a frame is to call the head() and tail() methods to look at its first and last rows. For a useful statistical summary of the downloaded data, use the describe() method.

An example of this code can be found on the source page .
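For readers who do not have the downloaded data at hand, here is a minimal sketch of the same three calls. It uses a small synthetic frame named `demo` as a stand-in for `aapl`; the values are random numbers, not real Apple quotes, and only three of the columns are reproduced.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the `aapl` frame: random prices, not real Apple quotes.
rng = np.random.default_rng(seed=42)
index = pd.bdate_range("2006-10-02", periods=30, name="Date")
close = 75 + rng.normal(0, 1, size=len(index)).cumsum()
demo = pd.DataFrame(
    {
        "Open": close + rng.normal(0, 0.5, size=len(index)),
        "Close": close,
        "Volume": rng.integers(1_000_000, 5_000_000, size=len(index)),
    },
    index=index,
)

print(demo.head())      # first five rows
print(demo.tail())      # last five rows
print(demo.describe())  # count, mean, std, min, quartiles, and max per column
```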

The data contains four columns with the opening and closing prices of the trading period and its maximum and minimum prices; here we look at daily intervals for Apple stock. There are also two additional columns: Volume and Adj Close. The first records the number of shares traded during the day. The second is the "adjusted closing price": the closing price of the period adjusted for all corporate actions that could have taken place before the opening of the next trading day.
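Why an adjusted price is useful can be shown with a simplified sketch. It assumes a hypothetical 2-for-1 stock split and invented prices, and it ignores dividends; real data providers apply their own adjustment rules, so this illustrates the idea rather than how the Adj Close column is actually computed.

```python
import pandas as pd

# Hypothetical closing prices around a 2-for-1 split effective on 2020-01-06.
close = pd.Series(
    [200.0, 202.0, 101.5, 102.0],
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07"]),
)
split_date = pd.Timestamp("2020-01-06")
split_ratio = 2.0

# Divide every price before the split by the ratio so the series is comparable.
adj_close = close.copy()
adj_close.loc[adj_close.index < split_date] /= split_ratio

# The raw series shows a misleading drop of about 50% on the split date.
print(close.pct_change().iloc[2])
# The adjusted series shows the actual move of about 0.5%.
print(adj_close.pct_change().iloc[2])
```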

To save the data to a CSV file, use the to_csv() method; to read it back, use read_csv(). This is useful when the data source changes or access to it is temporarily lost.

    import pandas as pd

    aapl.to_csv('data/aapl_ohlc.csv')
    df = pd.read_csv('data/aapl_ohlc.csv', header=0, index_col='Date', parse_dates=True)

After a first look at the downloaded data, it is time to move on. You can, for example, inspect the index and columns and select the last ten rows of a specific column. This is called subsetting, because you take only a small part of the available data. The resulting subset is a Series, that is, a one-dimensional labeled array.

To look at the index and the columns, use the index and columns attributes. You can then select a subset of the ten most recent observations in a column using square brackets. The result is stored in the ts variable, and its type is checked with the type() function.

    # Inspect the index
    aapl.index

    # Inspect the columns
    aapl.columns

    # Select only the last 10 observations of `Close`
    ts = aapl['Close'][-10:]

    # Check the type of `ts`
    type(ts)

Square brackets are convenient, but they are not the most idiomatic way to work with Pandas. It is therefore worth looking at loc and iloc as well: the first is used for label-based indexing and the second for positional indexing.

In practice, this means you can pass a label such as 2007 or 2006-11-01 to loc, while integers such as 22 or 43 are passed to iloc.

    # Inspect the first rows of November-December 2006
    print(aapl.loc[pd.Timestamp('2006-11-01'):pd.Timestamp('2006-12-31')].head())

    # Inspect the first rows of 2007
    print(aapl.loc['2007'].head())

    # Inspect November 2006
    print(aapl.iloc[22:43])

    # Inspect the 'Open' and 'Close' values at 2006-11-01 and 2006-12-01
    print(aapl.iloc[[22,43], [0, 3]])

If you look closely at the results of subsetting, you will see that certain days are missing from the data. A closer look at the pattern shows that usually two or three days are missing at a time. These are weekends and public holidays, when the exchange is closed.
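One way to check which days are missing is to compare the index of the data with a full calendar. The sketch below does this for a hypothetical list of trading dates (weekdays in late November 2006, with Thanksgiving removed) instead of the real `aapl` index.

```python
import pandas as pd

# Hypothetical trading dates: weekdays only, with Thanksgiving (2006-11-23) removed.
trading_days = pd.bdate_range("2006-11-20", "2006-11-30").drop(
    [pd.Timestamp("2006-11-23")]
)

# Every calendar day in the same span.
calendar_days = pd.date_range(trading_days[0], trading_days[-1], freq="D")

# Calendar days that have no row in the trading data.
missing = calendar_days.difference(trading_days)
print(missing.day_name().tolist())
```

With real data, `trading_days` would simply be the index of the data frame.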

Besides indexing, there are several other ways to learn more about the data. You can, for example, draw a random sample of 20 rows, and then resample the data so that aapl is at a monthly rather than daily frequency. This can be done with the sample() and resample() methods:

    # Sample 20 rows
    sample = aapl.sample(20)

    # Print `sample`
    print(sample)

    # Resample to monthly level
    monthly_aapl = aapl.resample('M').mean()

    # Print `monthly_aapl`
    print(monthly_aapl)

Before moving on to data visualization and financial analysis, you can calculate the difference between the opening and closing prices of the trading period. This arithmetic can be done directly in Pandas: subtract the values of the Close column of the aapl data from the values of the Open column, that is, compute aapl.Open - aapl.Close. The result is stored in a new column of the aapl data frame called diff, which can be removed again with the del statement:

    # Add a column `diff` to `aapl`
    aapl['diff'] = aapl.Open - aapl.Close

    # Delete the new `diff` column
    del aapl['diff']

The resulting absolute values may already be useful when developing a financial strategy, but a deeper analysis is usually needed, for example the percentage rise or fall in the price of a stock.
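As a minimal sketch of such a percentage calculation, the example below applies the Pandas pct_change() method to a short series of hypothetical closing prices; the numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical closing prices for five consecutive trading days.
close = pd.Series(
    [100.0, 102.0, 99.96, 99.96, 104.958],
    index=pd.bdate_range("2006-10-02", periods=5),
)

# Change relative to the previous day; the first value has no predecessor (NaN).
daily_pct_change = close.pct_change()

# Total change over the whole window.
total_change = close.iloc[-1] / close.iloc[0] - 1

print(daily_pct_change)
print(total_change)
```

The values are fractions, so 0.02 means a rise of 2% and -0.02 a fall of 2%.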

Time Series Data Visualization


Besides inspecting the data with head(), tail(), and indexing, you can also visualize it. Thanks to the integration of Pandas with the Matplotlib plotting library, this is quite easy: call the plot() method and pass it the relevant parameters. If you also pass grid=True, the chart is drawn with a grid in the background.

    # Import Matplotlib's `pyplot` module as `plt`
    import matplotlib.pyplot as plt

    # Plot the closing prices for `aapl`
    aapl['Close'].plot(grid=True)

    # Show the plot
    plt.show()

This code produces the following chart:



The next part of this tutorial focuses on the financial analysis of time series data using Python.

To be continued…



Source: https://habr.com/ru/post/331542/

