
Collecting quote data programmatically

The title suggests going straight to the program code... But I think an introduction is still needed. And why, in fact, is it needed?

Acting effectively on the exchange requires a thorough analysis of what is happening in the market. What lies behind the movement of the numbers, the quotes?

A lack of such analysis, or muddled decision making on a trade, can lead to losses. More than once I have watched people make decisions, the right ones... or the wrong ones, in the dealing room of a brokerage office.

Dealing rooms of brokerage houses... they have their own special atmosphere. An atmosphere of communication, of exchanging experience and emotions. I like dealing rooms. By the way a person enters a trade, traders can be divided into two groups. I will talk about those whose result is usually sad. And these traders are the majority. So, I describe how a trader from this group enters the market. A man somewhere between 20 and 60 shouts into the dealing room: "Where are we going?! Up?! Down?!" From the crowd come conflicting shouts of "Up!" and "Down!" The newcomer joins the group that shouts loudest and... click. He clicks the buy or sell button. That's it. Now the man is in the market. From now on he is risking his money. From this point on, the trader does not look like a trader. He looks like a sports fan. A vuvuzela in the hands of such a trader would, I think, be an appropriate trading tool.

And now he is part of a group of fellow sufferers, and he greets every move of the market with a groan. And when news comes out, he gets a surge of adrenaline that guys climbing mountain slopes can only dream of.

The result of such trades is quite predictable. But... is there a happier outcome? Of course. And it comes from analyzing quote data. How do we get this data? How do we get it in large volumes? How great that there is such a wonderful company as FINAM, with its online resource finam.ru! FINAM's servers offer a great opportunity: downloading quotes, for example in this form:



However, this way only one file can be downloaded at a time. What if we want more data to analyze? Much more? Almost all instruments! For all periods! That would open up a wealth of opportunities for data analysis. Oh... is that possible? Answer: yes, it is.

To begin with, we will determine the list of securities (instruments), as well as the key points that will allow us to obtain quote data. The list of securities (instruments) that FINAM provides will be taken from here:



This page interests us because it has, first, most of the instruments that FINAM provides and, second, web links that lead directly to the page of each security (instrument).

The links are as follows:

www.finam.ru/profile/moex-akcii/polymetal-international-plc/export
www.finam.ru/profile/moex-akcii/pllc-yandex-nv/export
www.finam.ru/profile/moex-akcii/alrosa-ao/export

By parsing the corresponding page, we get a file of links. Now we know where the instruments live. The file can be downloaded from this link. Why do we need the place of residence of each instrument? This parameter will come in handy later. Be patient. So far we have links to 6131 securities (instruments).

What does the FINAM server require? What parameters are needed to get data? Let's try to download one file and look at the parameters of the request. When downloading Polymetal quotes, I get this GET request:
http://export.finam.ru/POLY_170620_170623.txt?market=1&em=175924&code=POLY&apply=0&df=20&mf=5&yf=2017&from=20.06.2017&dt=23&
mt=5&yt=2017&to=23.06.2017&p=8&f=POLY_170620_170623&e=.txt&cn=POLY&dtf=1&tmf=1&
MSOR=1&mstime=on&mstimever=1&sep=1&sep2=1&datf=1&at=1

Out of the entire list I would like to focus on the parameters em, market and code. The em parameter should be understood as an index, a kind of label for the security (instrument). If we want to download data not for one instrument but for a whole array of securities (instruments), we need to know the em of each of them. The market parameter indicates where this security (instrument) trades: in which market? There are a lot of markets: Moscow Exchange top, Moscow Exchange mutual funds, Moscow Exchange bonds, depositary receipts, and so on. The code parameter is the ticker symbol of the instrument.

So, to get a quotes file, we need three parameters: em, market and code. For all securities (instruments). The question is where to get them. Answer: remember the file with links. It contains, for example, the following link:
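To make the three parameters concrete, here is a minimal Python sketch that pulls them out of the sample request. It assumes the request is written without the stray spaces and that the from and to fields use the dd.mm.yyyy form; the function name instrument_keys is made up for this illustration.

```python
from urllib.parse import urlsplit, parse_qs

# The sample request from the text, with stray spaces removed.
# Assumption: "from" and "to" are dd.mm.yyyy, matching df/mf/yf and dt/mt/yt.
SAMPLE_URL = (
    "http://export.finam.ru/POLY_170620_170623.txt"
    "?market=1&em=175924&code=POLY&apply=0"
    "&df=20&mf=5&yf=2017&from=20.06.2017"
    "&dt=23&mt=5&yt=2017&to=23.06.2017"
    "&p=8&f=POLY_170620_170623&e=.txt&cn=POLY&dtf=1&tmf=1"
    "&MSOR=1&mstime=on&mstimever=1&sep=1&sep2=1&datf=1&at=1"
)


def instrument_keys(url):
    """Return the (market, em, code) triple carried by an export URL."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs maps each name to a list of values; take the first one.
    return tuple(query[name][0] for name in ("market", "em", "code"))


if __name__ == "__main__":
    print(instrument_keys(SAMPLE_URL))
```

Everything else in the query string describes the date range and the output format, so these three values are the only ones that change from one instrument to another.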

www.finam.ru/profile/moex-akcii/polymetal-international-plc/export

Follow it, and in the page source we will see what we need: sitting in the JavaScript are the desired parameters for this security (instrument):

Finam.IssuerProfile.Main.issue = {"quote": {"id": 175924, "code": "POLY", "fullUrl": "moex-akcii/polymetal-international-plc", "title": "Polymetal", "decp": 1, "testDriveEnabled": false, "market": {"id": 1, "title": " ", "volumeEnabled": true},"info": {"decp": 1, "last": 680, "pchange": 1.87266, "change": 12.50001, "bid": null, "ask": null, "open": 668, "high": 686, "low": 666, "close": 667.5, "volume": 53037, "date": "05.07.2017 18:47:18", "weekMin": 653.5, "weekMax": 688, "monthMin": 653.5, "monthMax": 753, "yearMin": 572, "yearMax": 1009.5,"currency": ".","volumeCode": "."}," /*…    ,      …*/ 175924, "url": "/profile/moex-akcii/polymetal-international-plc/secondary/", }, "corporativeEvents": {"quote": 175924, "url": "/profile/moex-akcii/polymetal-international-plc/corporate/", }, "blogsAndGraphs": {"quote": 175924, "url": "http://whotrades.com/markets/instrument/polymetal-international-plc", "count": "1", "pageSize": 1, "pageNumber": 1, "pagesCount": 1}}};

Note that in this piece of code id is em; there is also the code parameter, as well as the market parameters: its id and its Russian name. This piece of code, with variations, is present on the page of every security (instrument). Let's go, for example, to:

www.finam.ru/profile/moex-akcii/pllc-yandex-nv/export
www.finam.ru/profile/moex-akcii/alrosa-ao/export

and we see the same thing. Now, I think, the overall chain of data acquisition is clear: in a loop we go through the links where the individual securities (instruments) live. We parse the JavaScript fragments, collecting the em, market and code parameters for each position. With this data in hand, we can access the FINAM server programmatically and receive quote files. The rest is a matter of technique.

What are we going to parse with? We will parse using Java. And... out of all the bicycles, I choose the one that stands in my garage. Namely Jsoup. Although htmlunit could be used as well.
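The extraction step of this chain can be sketched in a few lines. Note that this is not the author's parser, which is written in Java with Jsoup: it is a Python illustration that uses a regular expression instead, and it assumes the page source contains a fragment shaped like the one quoted above. The names SAMPLE_PAGE, QUOTE_RE and extract_params are made up for the example.

```python
import re

# Shortened copy of the JavaScript fragment quoted in the article
# (only the fields needed here are kept).
SAMPLE_PAGE = (
    'Finam.IssuerProfile.Main.issue = {"quote": {"id": 175924, "code": "POLY", '
    '"fullUrl": "moex-akcii/polymetal-international-plc", "title": "Polymetal", '
    '"decp": 1, "testDriveEnabled": false, '
    '"market": {"id": 1, "title": " ", "volumeEnabled": true}}};'
)

# Assumption: "id" and "code" directly follow the opening of "quote",
# and the first "market" object after them belongs to the same instrument.
QUOTE_RE = re.compile(
    r'"quote":\s*\{\s*"id":\s*(?P<em>\d+),\s*"code":\s*"(?P<code>[^"]+)"'
    r'.*?"market":\s*\{\s*"id":\s*(?P<market>\d+)',
    re.DOTALL,
)


def extract_params(page_source):
    """Return em, market and code found in a page, or None if absent."""
    match = QUOTE_RE.search(page_source)
    if match is None:
        return None
    return {
        "em": match.group("em"),
        "market": match.group("market"),
        "code": match.group("code"),
    }


if __name__ == "__main__":
    print(extract_params(SAMPLE_PAGE))
```

A regular expression is brittle if the page layout changes, which is one reason a real HTML parser such as the Jsoup one mentioned in the article may be the safer choice.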



A small clarification. When parsing the page, I also collected the Russian-language name of the security (1) and the section in which FINAM places the security (instrument) (2). Thus, the parser takes three files as input. Let me remind you that we have 6131 positions, that is, securities (instruments). All this information, together with the results of parsing, is merged into one file. The parser code can be downloaded from this link.

As a result of the run, we have the function_parameters.csv file. When the file is read line by line, each line can serve as the parameter list for the function that requests quotes from the FINAM server. The function_parameters.csv file can be downloaded from this link.

To write the function that accesses the FINAM server (and we will write it in Python), let's look once again at the parameters of the GET request:
http://export.finam.ru/POLY_170620_170623.txt?market=1&em=175924&code=POLY&apply=0&df=20&mf=5&yf=2017&from=20.06.2017&dt=23&
mt=5&yt=2017&to=23.06.2017&p=8&f=POLY_170620_170623&e=.txt&cn=POLY&dtf=1&tmf=1&
MSOR=1&mstime=on&mstimever=1&sep=1&sep2=1&datf=1&at=1

POLY_170620_170623 - obviously, this string is made up of the code parameter and the start and end dates.

.txt - the file extension; the extension is also passed in the e parameter; keep this nuance in mind when writing the function.

We will also take into account the source code of a page like www.finam.ru/profile/moex-akcii/gazprom/export inside the form tag (where name = "exportdata"). Let's describe the parameters.

market, em, code - the parameters discussed earlier; when the function is called, their values will be taken from the file.
df, mf, yf, from, dt, mt, yt, to - the time parameters (in the sample request mf = 5 goes with June, so months are numbered from zero).
p - period of the quotes (ticks, 1 min., 5 min., 10 min., 15 min., 30 min., 1 hour, 1 day, 1 week, 1 month)
e - file extension; possible options are .txt or .csv
dtf - date format (1 - yyyymmdd, 2 - yymmdd, 3 - ddmmyy, 4 - dd/mm/yy, 5 - mm/dd/yy)
tmf - time format (1 - hhmmss, 2 - hhmm, 3 - hh:mm:ss, 4 - hh:mm)
MSOR - which candle time is reported (0 - the start of the candle, 1 - the end of the candle)
mstimever - time zone of the output (not Moscow time - mstimever = 0; Moscow time - mstime = 'on', mstimever = '1')
sep - field separator (1 - comma (,), 2 - period (.), 3 - semicolon (;), 4 - tab, 5 - space)
sep2 - digit group separator (1 - none, 2 - period (.), 3 - comma (,), 4 - space, 5 - apostrophe ('))
datf - set of output columns (#1 - TICKER, PER, DATE, TIME, OPEN, HIGH, LOW, CLOSE, VOL; #2 - TICKER, PER, DATE, TIME, OPEN, HIGH, LOW, CLOSE; #3 - TICKER, PER, DATE, TIME, CLOSE, VOL; #4 - TICKER, PER, DATE, TIME, CLOSE; #5 - DATE, TIME, OPEN, HIGH, LOW, CLOSE, VOL; #6 - DATE, TIME, LAST, VOL, ID, OPER).
at - add a header line to the file (0 - no, 1 - yes)
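The format parameters matter again when the downloaded file is read back. Below is a rough Python sketch of a reader for the datf = 1 layout, driven by the dtf, tmf, sep and at codes listed above. It assumes sep2 = 1 (no digit grouping). The two sample lines are illustrative only and not a real download: the header text and the PER value are placeholders, and the numbers are borrowed from the page fragment quoted earlier.

```python
import csv
from datetime import datetime

# strptime patterns for the dtf and tmf codes from the list above.
DATE_FORMATS = {1: "%Y%m%d", 2: "%y%m%d", 3: "%d%m%y", 4: "%d/%m/%y", 5: "%m/%d/%y"}
TIME_FORMATS = {1: "%H%M%S", 2: "%H%M", 3: "%H:%M:%S", 4: "%H:%M"}
# Field separators for the sep codes from the list above.
SEPARATORS = {1: ",", 2: ".", 3: ";", 4: "\t", 5: " "}
# Column order for datf=1.
COLUMNS = ["TICKER", "PER", "DATE", "TIME", "OPEN", "HIGH", "LOW", "CLOSE", "VOL"]

# Illustrative input, NOT real FINAM output (placeholder header and PER value).
SAMPLE = (
    "TICKER,PER,DATE,TIME,OPEN,HIGH,LOW,CLOSE,VOL\n"
    "POLY,D,20170705,184718,668,686,666,667.5,53037\n"
)


def parse_quotes(lines, dtf=1, tmf=1, sep=1, at=1):
    """Parse datf=1 rows into dicts; assumes sep2=1 (plain numbers)."""
    stamp_format = DATE_FORMATS[dtf] + " " + TIME_FORMATS[tmf]
    header_pending = (at == 1)  # with at=1 the first non-empty line is a header
    rows = []
    for fields in csv.reader(lines, delimiter=SEPARATORS[sep]):
        if not fields:
            continue  # skip blank lines
        if header_pending:
            header_pending = False
            continue
        raw = dict(zip(COLUMNS, fields))
        rows.append({
            "ticker": raw["TICKER"],
            "per": raw["PER"],
            "time": datetime.strptime(raw["DATE"] + " " + raw["TIME"], stamp_format),
            "open": float(raw["OPEN"]),
            "high": float(raw["HIGH"]),
            "low": float(raw["LOW"]),
            "close": float(raw["CLOSE"]),
            "vol": int(raw["VOL"]),
        })
    return rows


if __name__ == "__main__":
    for row in parse_quotes(SAMPLE.splitlines()):
        print(row)
```

Keeping the same dtf, tmf and sep values for the download and for the reader avoids guessing the layout from the file itself.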

Once the list of parameters has been determined and the sources of their values established, we write the following function for obtaining quotes. As an example, one security: our favorite, Polymetal.

# -*- coding: utf-8 -*-
"""
Created on Sat Jun 24 01:46:38 2017

@author: optimusqp
"""
# Python 3
from urllib.parse import urlencode
from urllib.request import urlopen

# Instrument (for other securities these values come from function_parameters.csv)
code = 'POLY'; market = '1'; em = '175924'
# Date range
yf = '2017'; month_start = '05'; day_start = '20'
yt = '2017'; month_end = '06'; day_end = '20'
# Output format
e = '.txt'; p = '3'; dtf = '1'; tmf = '1'; MSOR = '1'; mstimever = '0'
sep = '1'; sep2 = '3'; datf = '1'; at = '1'

year_start = yf[2:]
year_end = yt[2:]
# In the sample request mf=5 goes with June and df=20 with the 20th:
# months are zero-based, days are passed as they are.
mf = int(month_start) - 1
mt = int(month_end) - 1
df = int(day_start)
dt = int(day_end)


def quotes(code, year_start, month_start, day_start, year_end, month_end, day_end,
           e, market, em, df, mf, yf, dt, mt, yt, p, dtf, tmf, MSOR, mstimever,
           sep, sep2, datf, at):
    # File name: the code plus the start and end dates as yymmdd
    name = '%s_%s%s%s_%s%s%s' % (code, year_start, month_start, day_start,
                                 year_end, month_end, day_end)
    params = [
        ('market', market), ('em', em), ('code', code), ('apply', 0),
        ('df', df), ('mf', mf), ('yf', yf),
        ('from', '%s.%s.%s' % (day_start, month_start, yf)),
        ('dt', dt), ('mt', mt), ('yt', yt),
        ('to', '%s.%s.%s' % (day_end, month_end, yt)),
        ('p', p), ('f', name), ('e', e), ('cn', code),
        ('dtf', dtf), ('tmf', tmf), ('MSOR', MSOR), ('mstimever', mstimever),
        ('sep', sep), ('sep2', sep2), ('datf', datf), ('at', at),
    ]
    # The extension appears both in the path and in the e parameter
    url = 'http://export.finam.ru/' + name + e + '?' + urlencode(params)
    content = urlopen(url).read()  # raw bytes, written to disk unchanged
    with open('company_quotes.txt', 'wb') as f:
        f.write(content)


quotes(code, year_start, month_start, day_start, year_end, month_end, day_end,
       e, market, em, df, mf, yf, dt, mt, yt, p, dtf, tmf, MSOR, mstimever,
       sep, sep2, datf, at)

The function code can also be downloaded from this link.

What's next? Now this function can be used in a loop over our positions. There are, I remind you, 6131 of them. From the function_parameters.csv file we load the parameters, specify the dates, and select the desired format. And when using this code, do not forget good manners: put a delay of a couple of seconds into each loop iteration so as not to overload the source server.

I think you will have plenty of data for analyzing the market. I sincerely hope that FINAM will only gain customers after this article!
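A polite loop of this kind might look like the sketch below. The column layout of function_parameters.csv is not shown in the article, so each row is handed over untouched to a caller-supplied fetch callback; download_all, fetch and the UTF-8 encoding are all assumptions made for the illustration.

```python
import csv
import time


def download_all(csv_path, fetch, delay_seconds=2.0, sleep=time.sleep):
    """Call fetch(row) for every row of the parameter file.

    fetch is a hypothetical callback that turns one CSV row into a
    request to the quotes server. A pause of delay_seconds is made
    between consecutive calls so the server is not hammered.
    """
    done = 0
    # Assumption: the file is UTF-8; adjust the encoding if it is not.
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            if not row:
                continue  # skip empty lines
            if done:
                sleep(delay_seconds)  # pause before every call but the first
            fetch(row)
            done += 1
    return done
```

With the quotes function from the article, fetch could be a small wrapper that unpacks the row into market, em and code and supplies the dates and format options. The sleep argument exists only so that the pause can be replaced in tests.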

Source: https://habr.com/ru/post/332700/

