
Predicting Events and Data Mining - Forward to the Future



An interesting service for monitoring open-source information has appeared on the Web: Recorded Future.

It aggregates information from more than 150,000 media sources, keeps an archive going back up to 5 years, and supports subsequent analysis to extract knowledge about the likely consequences of an incident and about future events.
The author of the service is Chris Holden, who kindly offered us free use of Recorded Future, although the full functionality is available only on a commercial basis.

For example, the service currently runs continuous monitoring of more than 8,000 political leaders from various countries, letting you keep track of where and why a well-known figure is going to travel. Good analysis of these events can sometimes reveal connections in international relations and, by analyzing the travel history of a selected person, predict the most likely scenarios of their development.

The most interesting cases demonstrating the capabilities of the system are reflected in the following application examples:

- tracking emerging cyber threats and hacker actions in the world
- analysis of the contents of letters from Osama bin Laden's inner circle
- analysis of protest activity
- analysis of elections in Greece and Egypt

Recorded Future in action

The service is applicable well beyond the analysis of geopolitics, terrorism and protest activity. It is well suited to monitoring corporate news, information about competing companies and their products, and how they are covered in the press.

Analytics let you track events such as the emergence of a new technology, the signing of contracts, or changes to a company's board of directors or key personnel. This already makes it a very powerful and convenient analytical tool, and it can also evaluate the emotional tone (positive or negative) of the coverage:

Futures - “What Apple has planned for 2012/2013”



The service offers a paid API ( http://code.google.com/p/recordedfuture/wiki/RecordedFutureAPI ) that lets you flexibly set tags for tracking by specified criteria, including geography:

The forecast of protest activity in August 2012 against the Russian Federation



Example of creating a query (Python):

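# Python 3 port of the sample (the original used Python 2 syntax).
import json
import sys
import time
import urllib.parse
import urllib.request
import zlib


def query(q, usecompression=True):
    """Send a JSON query string to the API and return the parsed JSON response."""
    try:
        url = 'http://api.recordedfuture.com/ws/rfq/instances?%s'
        if usecompression:
            url = url + '&compress=1'
        data = None
        for i in range(3):  # try the call up to three times
            try:
                data = urllib.request.urlopen(url % urllib.parse.urlencode({"q": q}))
                if type(data) != str:
                    data = data.read()
                if usecompression:
                    data = zlib.decompress(data)
                break
            except Exception:
                data = None  # discard any partial result before retrying
                print("Retrying failed API call.", file=sys.stderr)
                time.sleep(1)
        if data is None:
            raise IOError("API call failed after 3 attempts")
        res = json.loads(data)
        if res['status'] != "SUCCESS":
            print("Error", str(res['errors']), file=sys.stderr)
        return res
    except Exception as e:
        # Report the problem and return a failure record instead of raising
        print(str(e))
        return {'status': 'FAILURE', 'errors': str(e)}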


The idea behind the service is very simple: dates in various notations (numeric, verbal) are extracted from all sources, and the events associated with them are recorded. The service also analyzes exactly when each event is expected to occur (“soon”, “in a few months”, “in the distant future”). It constantly sends updates on the most interesting areas to track.
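To make the idea of pulling dates "in various notations" out of text more concrete, here is a minimal sketch in Python. It is not Recorded Future's actual algorithm, which the article does not describe; the function name extract_time_references, the two regular expressions and the list of vague phrases are all assumptions made for illustration.

```python
import re
from datetime import date

# Hypothetical helper data: English month names mapped to month numbers.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}

# Numeric notation, e.g. "2012-08-15".
NUMERIC_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
# Verbal notation, e.g. "17 June 2012".
VERBAL_DATE = re.compile(
    r"\b(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})\b", re.IGNORECASE
)
# Vague expressions taken from the examples in the article.
VAGUE_PHRASES = ["soon", "in a few months", "in the distant future"]


def _safe_date(year, month, day):
    """Return a date, or None when the numbers do not form a valid date."""
    try:
        return date(year, month, day)
    except ValueError:
        return None


def extract_time_references(sentence):
    """Return a list of ("date", date) and ("vague", phrase) tuples."""
    found = []
    for match in NUMERIC_DATE.finditer(sentence):
        year, month, day = (int(part) for part in match.groups())
        parsed = _safe_date(year, month, day)
        if parsed is not None:
            found.append(("date", parsed))
    for match in VERBAL_DATE.finditer(sentence):
        parsed = _safe_date(
            int(match.group(3)), MONTHS[match.group(2).lower()], int(match.group(1))
        )
        if parsed is not None:
            found.append(("date", parsed))
    lowered = sentence.lower()
    for phrase in VAGUE_PHRASES:
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            found.append(("vague", phrase))
    return found


if __name__ == "__main__":
    print(extract_time_references(
        "Elections are scheduled for 17 June 2012, and a new vote may follow soon."
    ))
```

A real system would need far more notations and languages, plus a step that links each time reference to the event described in the same sentence; the sketch only covers the date-spotting part.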



Using the ready-made Python scripts:

python company-entquery.py MYTOKEN tickerfile.txt 2010-06-14 2010-06-20 > entoutputfile.txt
python company-aggquery.py MYTOKEN tickerfile.txt 2010-06-14 2010-06-20 > aggrawoutputfile.txt
Where:

MYTOKEN is the API access token you received;
tickerfile.txt is a special file whose directives point to the media and the resources to be analyzed.

The summary report is output in the following form:

Ticker,Entity,Time,Count,Momentum,Positive,Negative
MSFT,33312449,2011-11-01 19:30:00,780,0.43689,0.062,0.00461
GOOG,33321272,2011-11-01 19:30:00,1707,0.72436,0.07052,0.0254
AMZN,33328212,2011-11-01 19:30:00,344,0.20139,0.05491,0.01374
CHK,33511577,2011-11-01 19:30:00,6,0.00817,0,0
MSFT,33312449,2011-11-02 19:30:00,1235,0.4538,0.04981,0.0137
GOOG,33321272,2011-11-02 19:30:00,2602,0.80317,0.06482,0.02282
AMZN,33328212,2011-11-02 19:30:00,619,0.22222,0.06884,0.00787
CHK,33511577,2011-11-02 19:30:00,45,0.02334,0,0.02581


Processing this information is left to the programmer, except for the “positive” and “negative” scores, which the service computes itself. Such a resource makes it possible to build a powerful and effective tool for competitive analysis and for BI purposes.
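As an illustration of that processing step, the sketch below loads a report in the format shown above with Python's standard csv module. The sample rows are copied from the report; the function names load_report, total_mentions and net_sentiment are hypothetical. Reading Count as a number of mentions, and subtracting Negative from Positive to get a "net sentiment", are assumptions made for this example, not definitions given by the service.

```python
import csv
import io
from datetime import datetime

# A few rows copied from the sample report above.
REPORT = """Ticker,Entity,Time,Count,Momentum,Positive,Negative
MSFT,33312449,2011-11-01 19:30:00,780,0.43689,0.062,0.00461
CHK,33511577,2011-11-01 19:30:00,6,0.00817,0,0
MSFT,33312449,2011-11-02 19:30:00,1235,0.4538,0.04981,0.0137
CHK,33511577,2011-11-02 19:30:00,45,0.02334,0,0.02581
"""


def load_report(text):
    """Parse the CSV report text into a list of typed dictionaries."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "ticker": row["Ticker"],
            "entity": row["Entity"],
            "time": datetime.strptime(row["Time"], "%Y-%m-%d %H:%M:%S"),
            "count": int(row["Count"]),
            "momentum": float(row["Momentum"]),
            "positive": float(row["Positive"]),
            "negative": float(row["Negative"]),
        })
    return rows


def total_mentions(rows):
    """Sum the Count column per ticker (assumed to be a mention count)."""
    totals = {}
    for row in rows:
        totals[row["ticker"]] = totals.get(row["ticker"], 0) + row["count"]
    return totals


def net_sentiment(row):
    """Illustrative derived value: Positive minus Negative for one row."""
    return row["positive"] - row["negative"]


if __name__ == "__main__":
    report_rows = load_report(REPORT)
    print(total_mentions(report_rows))
    for report_row in report_rows:
        print(report_row["ticker"], report_row["time"].date(), net_sentiment(report_row))
```

In practice the text would come from a file such as aggrawoutputfile.txt rather than an inline string, and the parsed rows could be fed into whatever BI tooling is already in use.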

Source: https://habr.com/ru/post/147145/

