How I earned $500K with machine learning and high-frequency trading - Part 2

From the translator: This continues the translation of the article (part 1) that caught my attention and was well received by readers, about a man who used his technical skills to earn half a million dollars in a year.

Creating a full-fledged trading simulator

So, I had a framework that allowed me to backtest and optimize indicators. But I had to go further: I needed a framework that would let me backtest and optimize the entire trading system as a whole, one in which I could send orders and open positions. In this case, I would be optimizing for total profit and loss (P&L) and, to a certain extent, average P&L per trading session.

Such a framework would be harder to build, and in some respects exact simulation is not even possible, but I did the best I could. Here are some of the issues I had to deal with:

When an order is sent to the market in the simulator, I have to model the lag time. The fact that my system “saw” an offer does not mean that it can buy it immediately. The system sends the order, waits approximately 20 milliseconds, and only if the offer is still there is it treated as an executed trade. This is inexact, since the real lag is not constant and is not reported.
When I place bids or offers, I have to look at the trade execution stream (provided by the API) and use it to gauge when my order would have been executed. To do this correctly, I have to track the position of my order in the queue (the queue is first-in, first-out). Again, I could not do this exactly, but I modeled it as closely as I could.
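The two issues above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the author's simulator: the names `simulate_take` and `RestingOrder` are hypothetical, the lag is treated as a fixed 20 ms even though the real lag varied, and cancellations by other participants ahead in the queue are not modeled.

```python
from dataclasses import dataclass

LAG_MS = 20  # assumed fixed latency; the real lag varied and was not reported


def simulate_take(send_time_ms, offer_valid_until_ms, lag_ms=LAG_MS):
    """Return True if an order sent at send_time_ms would still hit the offer.

    The order only arrives lag_ms later, so it fills only if the offer
    is still resting in the book at arrival time.
    """
    arrival_ms = send_time_ms + lag_ms
    return arrival_ms <= offer_valid_until_ms


@dataclass
class RestingOrder:
    """A simulated passive order tracked by its place in a FIFO queue."""
    size: int        # contracts we want filled
    ahead: int       # contracts queued in front of us at this price
    filled: int = 0  # contracts filled so far

    def on_trade(self, traded_qty):
        """Apply one trade from the execution stream at our price level."""
        # Trades first consume the contracts ahead of us in the queue.
        consumed = min(self.ahead, traded_qty)
        self.ahead -= consumed
        remaining = traded_qty - consumed
        # Whatever is left over executes against our own order.
        fill = min(self.size - self.filled, remaining)
        self.filled += fill
        return fill
```

For example, an order for 2 contracts with 5 contracts ahead of it gets nothing from a 3-lot trade, then 1 contract from the next 3-lot trade, because the first 2 contracts of that trade still go to orders in front.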

To refine the order execution simulation, I took the logs collected during live trading through the API and compared them with logs produced by simulated trading over the same time period. I was able to bring the simulator to a state very close to reality, and for the parts that could not be modeled exactly, I made sure the outcomes were at least statistically similar (in the metrics I considered important).
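One way to picture such a comparison is sketched below. The article does not say which metrics were compared or what the logs looked like, so everything here is an assumption: the log is a list of dictionaries with an `"event"` key, and the fill rate is used as a stand-in metric.

```python
def fill_rate(events):
    """Fraction of placed orders that were eventually filled."""
    placed = sum(1 for e in events if e["event"] == "placed")
    filled = sum(1 for e in events if e["event"] == "filled")
    return filled / placed if placed else 0.0


def compare_logs(live_events, sim_events):
    """Return live and simulated fill rates plus their absolute gap."""
    live, sim = fill_rate(live_events), fill_rate(sim_events)
    return {"live": live, "sim": sim, "gap": abs(live - sim)}
```

A large gap for the same time period would suggest that the lag or queue model needs adjusting.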

Ensuring profitable trades

With an order simulation model in place, I could send orders in simulation mode and track simulated profits and losses. But how would my system know where and when to buy and sell?

The price move predictions were the starting point for the system, but not the whole story. Next, I developed a scoring system for each of 5 price levels on the bid and on the offer. These included one level above the inside bid (for buy orders) and one level below the inside offer (for sell orders).

If the score at any price level was above a predetermined threshold, my system should have an active bid or offer at that level. If the score was below the threshold, any active orders there should be canceled. As a result, it was not uncommon for my system to flash a bid in the market and then immediately cancel it (I tried to minimize this, since it is terribly annoying for any human watching the screen).
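The threshold rule can be sketched as a small reconciliation step. This is a minimal illustration, not the author's code: the function name `reconcile_orders`, the use of integer tick indices for price levels, and the data structures are all assumptions.

```python
def reconcile_orders(scores, active, threshold):
    """Decide which price levels need a new order and which need a cancel.

    scores: dict mapping price level (integer tick index) -> score
    active: set of price levels where an order is currently working
    Returns (to_place, to_cancel) as sorted lists of price levels.
    """
    # Levels whose score clears the threshold should have a working order.
    wanted = {level for level, score in scores.items() if score > threshold}
    to_place = sorted(wanted - active)
    to_cancel = sorted(active - wanted)
    return to_place, to_cancel
```

With `scores = {101: 0.8, 100: 0.3, 99: 0.6}`, `active = {100}` and a threshold of 0.5, the function asks for new orders at levels 99 and 101 and a cancel at level 100. A score that hovers around the threshold would produce exactly the place-then-cancel flashing described above.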

The scores for the different price levels were calculated from the following factors:

The price move prediction (which we discussed earlier)
The price level in question (inner levels required a larger predicted price move)
The number of contracts ahead of my order in the queue (the fewer, the better)
The number of contracts behind my order in the queue (the more, the better)

Essentially, these factors identified “safe” places to bid or offer. The price move prediction alone was not an adequate measure, because it did not account for the fact that placing a bid did not fill me automatically: I was filled only when someone actually sold to me at that price. In reality, the mere fact that someone was selling to me at a given price changed the statistical odds of the trade.
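A score built from these four factors might look like the sketch below. The article does not give the formula, so the linear form, the function name `level_score`, the per-level `level_adjustment` term and the weight values are all assumptions chosen only to show the direction of each factor.

```python
def level_score(prediction, level_adjustment, contracts_ahead, contracts_behind,
                w_ahead=0.01, w_behind=0.01):
    """Combine the four factors into one score for a price level.

    prediction:        predicted price move in our favor
    level_adjustment:  per-level offset (placeholder values, set by optimization)
    contracts_ahead:   queue size in front of our order (fewer is better)
    contracts_behind:  queue size behind our order (more is better)
    """
    return (prediction
            + level_adjustment
            - w_ahead * contracts_ahead
            + w_behind * contracts_behind)
```

Under these assumptions, two orders with the same prediction at the same level score differently depending on queue position: the one with fewer contracts ahead and more behind scores higher, which matches the idea of a "safe" place in the queue.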

All variables used in this step were subject to optimization. This was done in the same way I optimized the variables in the price move indicators, except that in this case I was optimizing for bottom-line profit and loss.
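As a rough picture of optimizing variables against simulated P&L, the sketch below runs an exhaustive grid search. The article does not state which search method was used, so grid search is a stand-in, and `optimize`, `simulate_pnl` and the parameter names are hypothetical.

```python
from itertools import product


def optimize(simulate_pnl, grid):
    """Try every parameter combination and keep the one with the best P&L.

    simulate_pnl: callable taking keyword parameters, returning simulated P&L
    grid: dict mapping parameter name -> list of candidate values
    """
    names = list(grid)
    best_params, best_pnl = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        pnl = simulate_pnl(**params)
        if pnl > best_pnl:
            best_params, best_pnl = params, pnl
    return best_params, best_pnl
```

Here `simulate_pnl` would wrap a full run of the trading simulator over historical data, so each evaluation is expensive and the grid has to stay small.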

What my program ignored

When humans trade, powerful emotions and biases often lead to less-than-optimal decisions. Of course, I did not want to codify these biases. So my system ignored some factors:

Entry price of a position - In a trading office you often hear talk about the price at which someone is long or short, as if that should affect their future decisions. While this has some validity as part of a risk reduction strategy, it has no bearing on the future course of events in the market. So my program completely ignored this information. It is the same idea as ignoring sunk costs.
Going short vs. exiting a long position - Typically, a human trader has different criteria for where to sell a long position and where to go short. From my algorithm's point of view, however, there was no difference between the two. If my algorithm expected prices to fall, selling was the logical step regardless of whether it was currently long, short, or flat.
“Doubling up” strategy - This is a common strategy in which traders buy more stock when the original trade goes against them. As a result, your average purchase price goes down, which means that when (if) the stock turns around, you make your money back quickly. I think this is a horrible strategy unless you are Warren Buffett. You fool yourself into thinking you are doing well because most of your trades will be winners. The problem is that when you lose, you lose big. Another consequence is that it becomes very hard to tell whether you really have an edge on the market or are just getting lucky. Being able to monitor and confirm that my program really did have such an edge was an important goal for me.

Risk management

Since my algorithm made decisions the same way regardless of where it had entered a trade and whether it was currently long or short, it did occasionally hold losing positions and take large losses (although there were also equally large winning trades). However, you should not assume that I did nothing to manage risk.

I enforced a maximum position size of 2 contracts at a time, occasionally raising it on high-volume days. I also had a maximum daily loss limit to protect myself against unexpected market conditions, as well as against bugs in my own program. These limits were enforced in my code, and I additionally had them enforced on the broker's side. As it turned out, I never ran into any significant problems.
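A pre-trade check for these two limits could be sketched as follows. The function name `order_allowed` and the 1000.0 loss limit are placeholders (the article gives no figure), and letting risk-reducing orders through even after the loss limit is hit is a design choice of this sketch, not something the article describes.

```python
def order_allowed(position, order_qty, daily_pnl,
                  max_position=2, max_daily_loss=1000.0):
    """Check an order against the position and daily loss limits.

    position:  current signed position in contracts (negative means short)
    order_qty: signed order size (positive buys, negative sells)
    daily_pnl: realized plus unrealized P&L for the day
    """
    new_position = position + order_qty
    # An order that shrinks the absolute position only reduces exposure.
    reduces_risk = abs(new_position) < abs(position)
    if daily_pnl <= -max_daily_loss:
        # Loss limit hit: only allow orders that reduce exposure.
        return reduces_risk
    return abs(new_position) <= max_position or reduces_risk
```

With the default limit of 2 contracts, buying 1 more while already long 2 is rejected, while selling 1 from that position is always allowed.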

Running the algorithm

Six months passed from the start of work on my program before I got it to the point of profitability and began running it live. Although, to be fair, much of that time went to learning a new programming language. As I worked on improving the program, I saw profits increase in each of the next four months.

Every week I retrained my program on the data collected over the previous 4 weeks. I found that this struck the right balance between capturing recent behavioral trends in the market and giving my algorithm enough data to establish meaningful patterns. As training began to take more and more time, I split it up so that it could be performed by 8 virtual machines on Amazon EC2. The results were then combined on my local machine.
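The weekly schedule and the 8-way split can be sketched as below. The article does not describe how work was divided or how results were combined, so the round-robin split, the `(params, pnl)` result format and picking the best result when merging are all assumptions, and the function names are hypothetical.

```python
from datetime import date, timedelta


def training_window(retrain_day, weeks=4):
    """Return (start, end) dates of the data used for one weekly retrain."""
    return retrain_day - timedelta(weeks=weeks), retrain_day


def split_jobs(jobs, n_workers=8):
    """Deal jobs round-robin into n_workers roughly equal chunks."""
    return [jobs[i::n_workers] for i in range(n_workers)]


def merge_results(chunks):
    """Combine per-worker (params, pnl) results and return the best one."""
    combined = [result for chunk in chunks for result in chunk]
    return max(combined, key=lambda r: r[1])
```

Each chunk would be sent to one virtual machine, which runs the simulations for its share of parameter combinations and sends its results back to be merged locally.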

The high point of my trading was October 2009, when I earned almost $100,000. After that, I spent another 4 months trying to improve my program, even though profits were declining every month. Unfortunately, by that point I had apparently already implemented all my best ideas, because nothing I tried seemed to help much.

Frustrated by my inability to improve the program and by the lack of a sense of growth, I began to think about a new direction. I emailed 6 different high-frequency trading firms and asked whether they would be interested in buying my program and hiring me. No one answered. By then I had ideas for new startups that I wanted to work on, so I dropped the matter.

Note: I posted this on Hacker News, where it got a lot of attention. I just want to say that I do not recommend that anyone try to do something similar on their own now. You would need a team of very smart colleagues with a wide range of skills just to have a chance of competing in this market. Even when I was writing my program, individuals rarely succeeded (though I had heard of some who did).

At the top of the page [in the original post - translator's note] there is a comment that mentions “manipulated statistics” and calls me a “retail investor” of the kind that real quants [both renderings of this term are used in Russian translation practice - translator's note] would “pick off.” This is a rather unfortunate comment that is simply far from reality. That aside, there are some more interesting comments on my article.

Note 2: I have posted a list of answers to frequently asked questions that I received from traders who read this article.

Source: https://habr.com/ru/post/211642/
