
Reposting from Twitter (or any RSS feed) to a vkontakte.ru status in Haskell

This article walks through a small program that reposts tweets to your VKontakte status.
The task is quite simple and entirely unoriginal. It all started when I read an article on Habr about solving it in Python, and a similar article about PHP. There even seem to be online services built specifically for this. But the whole point here is to solve this simple task yourself, with your favorite tools. In fact, the PHP solution appeared later, for exactly the same reason.

So what did I write it in? In Haskell, natürlich!
Below I describe in detail how I did everything and how to repeat it. No special background should be needed to follow along.

Introduction

Those two articles, plus an article about reposting from RSS to LiveJournal in Haskell, helped me implement the solution.
At first I honestly wanted to talk to Twitter through twitter-api: I tried the corresponding library from Hackage, but it did not work right away and I dropped it. I wanted a quick result and was too lazy to dig in and find out what I was doing wrong. Since Twitter also publishes timelines as RSS, and reading RSS in Haskell is already a solved problem, I went that way.
It is also the more general solution: you can relay any RSS feed to VKontakte. You could even say this is not twitter2vkontakte but rss2vkontakte.
In addition, I used vkontakte-api instead of parsing the page in search of the status, as my predecessors did. I consider this a plus.
The rest is literate Haskell: not code with comments, but detailed commentary with pieces of code that together form an ordinary Haskell source. You can save this whole post to a file with the .lhs extension and feed it to the interpreter or compiler. Everything should work.
All working code is marked with this character at the start of the line: >

Necessary preparations

It is assumed that you already have a Haskell compiler and the basic set of libraries. If not, that is easy to fix: install the Haskell Platform.

Now, to install additional libraries, just type in the console:
cabal update
cabal install regex-tdfa curl feed utf8-string

Next comes a short list of imports with brief explanations.
A couple of times I used regular expressions:

> import Text.Regex.TDFA ( (=~) )

Once I needed to glue a list back together:

> import Data.List ( intercalate )

For all network requests I used the curl library:

> import Network.Curl ( curlGetString )
> import Network.Curl.Opts

To read and parse RSS feeds:

> import Text.Feed.Import ( parseFeedString )
> import Text.Feed.Query ( getFeedItems , getItemSummary )

And once I encoded a string as UTF-8:

> import Codec.Binary.UTF8.String ( encodeString )


What follows is more substantial code, with more substantial and perhaps, in places, overly detailed explanations.

Twitter via rss

The first thing we need is the address of the RSS feed with our tweets. You can find it on your Twitter account page. Let's make it a separate constant:

> feedUrl = "https://twitter.com/statuses/user_timeline/22251772.rss"

I learned how to fetch and parse RSS feeds from the article about rss2lj, but I did not use that library. Everything there is certainly well done, but I need just one simple function that downloads the RSS feed, takes the first item and extracts its content. Here is how I did it:

> getTweet :: IO String
> getTweet = do
>     (_,feed) <- curlGetString feedUrl []
>     return $ getMsg $ head $ getItems feed
>   where
>     getItems = maybe (error "rss parsing failed!") getFeedItems . parseFeedString
>     getMsg   = maybe (error "rss-item parsing failed!") format . getItemSummary
>     format   = unwords . ("twitter:" :) . tail . words . encodeString

Let me explain what happens here. curlGetString :: URLString -> [CurlOption] -> IO (CurlCode, String) takes a URL and a list of options, and returns a result code (CurlOK if everything went well) together with the server response. Here we pass the address of our Twitter RSS feed and no options. We ignore the result code and bind the body of the response to the name feed.
The next line is read from right to left: we extract the items of the feed (getItems feed), which gives a list; take its first element (head); extract the message from it (getMsg); and return the result.
Now the same functions in more detail, in the same order. Each of them is written in point-free style, that is, without naming its argument, simply as a composition (the dot .) of other functions.
A composition is also read from right to left, dot by dot (: that is, in the order in which the functions are applied. getItems first applies parseFeedString (from the feed library), which has the type String -> Maybe Feed: it takes a string with some mess of RSS tags and returns an abstract Feed value that we can actually work with. Since the result is a Maybe Feed (“maybe a feed”), the parser may choke and return Nothing, in which case we raise an error with the text "rss parsing failed!". If parsing succeeds, we get a value of the form Just feed, and getFeedItems is applied to the feed inside it, extracting the items of the feed as a list. This branching (Nothing or Just ...) is implemented by the standard function maybe.
getItems gives us the list of feed items, [Item]. We only need the first one (that is, the most recent). We take it with head. Then we want to pull the message text out of it, which is the job of getMsg.
This function has a structure similar to getItems: first getItemSummary is applied, which returns a Maybe String. If the content could not be extracted, we raise the corresponding error. Otherwise we format the message.
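To see the maybe pattern in isolation, here is a minimal sketch that is not part of the program (its lines carry no > marks). The names parseNumber and describe are hypothetical: parseNumber merely stands in for a parser such as parseFeedString, and a fallback string is returned instead of calling error.

```haskell
import Data.Char (isDigit)

-- Hypothetical stand-in for a parser like parseFeedString:
-- it succeeds only on non-empty input made of digits.
parseNumber :: String -> Maybe Int
parseNumber s
  | not (null s) && all isDigit s = Just (read s)
  | otherwise                     = Nothing

-- Same shape as getItems: a fallback for Nothing, a function for Just,
-- composed point-free with the parser on the right.
describe :: String -> String
describe = maybe "parsing failed" (("parsed " ++) . show) . parseNumber
```

Loaded in GHCi, describe "42" gives "parsed 42", while describe "4x2" gives "parsing failed".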
Formatting, briefly, goes like this (again from right to left): encode the string as UTF-8, split it into words (on whitespace), drop the first word, put "twitter:" in its place (optional), and glue the words back into one string. The first word in an RSS tweet is always your nickname, which is why we drop it.
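The pure part of this pipeline can be tried on its own. The sketch below is not part of the program (no > marks); formatTweet is a hypothetical name, the encodeString step is left out so that only the Prelude is needed, and drop 1 is swapped in for tail so that an empty summary does not raise an exception.

```haskell
-- Pure part of the formatting step, without the UTF-8 encoding.
-- drop 1 replaces tail here so that empty input is handled too.
formatTweet :: String -> String
formatTweet = unwords . ("twitter:" :) . drop 1 . words
```

In GHCi, formatTweet "nick: hello   world" gives "twitter: hello world". Note that words also collapses runs of whitespace, so the original spacing of the tweet is not preserved.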

That's all for RSS. I may have described everything in too much detail, but I think that for curious readers unfamiliar with Haskell this was informative.

Vkontakte api

First of all, let's define a few constants for working with VKontakte:

> email = "e-mail"
> uid = "user-id"
> pass = "password"

These are placeholders: substitute the data of your own VKontakte registration.

All operations are performed as GET requests to the server (the same curlGetString function) with suitably crafted addresses. They are built as follows:
the base address (for example, http://userapi.com/data?) plus a list of parameters of the form key=value, separated by ampersands (&).
To build such addresses we write a couple of helper functions:

> param :: (String, String) -> String
> param (key, value) = key ++ "=" ++ value ++ "&"

This function simply takes a pair (key, value) and makes it a string of the desired format.

> formUrl :: String -> [(String, String)] -> String -> String
> formUrl base opts sid = base ++ ( concatMap param (opts ++ [( "id" ,uid)]) ) ++ sid

We build a URL of the required format from the base address base, the list of options opts (as pairs), and the session identifier sid (more on it later).
The essential part is in parentheses: map takes a function and a list and applies the function to every element of the list. That is, from a list of pairs (key, value) it makes a list of strings "key=value&". concat then simply glues all these strings together (concatMap f = concat . map f).
The set of options differs from task to task, but the user id (uid) is needed in every case, so to avoid writing that option each time we add it in the definition of this function.
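As an illustration of what such an address looks like, here is a standalone sketch that is not part of the program (no > marks). It repeats param and uses a hypothetical variant of formUrl, called buildUrl, that takes the user id as an argument (userId) instead of reading the global uid.

```haskell
-- Same as param in the article.
param :: (String, String) -> String
param (key, value) = key ++ "=" ++ value ++ "&"

-- Hypothetical variant of formUrl: the user id is passed explicitly.
buildUrl :: String -> String -> [(String, String)] -> String -> String
buildUrl userId base opts sid =
  base ++ concatMap param (opts ++ [("id", userId)]) ++ sid
```

In GHCi, buildUrl "42" "http://userapi.com/data?" [("act","set_activity"),("text","hi")] "sid=abc" gives "http://userapi.com/data?act=set_activity&text=hi&id=42&sid=abc". Because every parameter ends with &, the sid string can be appended directly.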

To do anything with VKontakte you first have to log in. The server then gives us cookies and a session id (sid). I did not use the cookies, but the sid is needed for almost every operation that reads or changes user data.

> login :: IO String
> login = do
>     (_,headers) <- curlGetString authUrl [CurlHeader True]
>     return (headers =~ "sid=[a-z0-9]*" :: String)
>   where
>     authUrl = formUrl "http://login.userapi.com/auth?"
>                 [ ("site", "2"), ("fccode", "0")
>                 , ("fcsid", "0"), ("login", "force")
>                 , ("email", email), ("pass", pass) ] ""

The authentication address has a bunch of options whose purpose I did not figure out; I took them from the documentation, and nothing works without them. We build this address with the formUrl function we have just written, inserting our e-mail and password in the last two options. The sid parameter is left empty: we do not have one yet, and obtaining it is exactly why we wrote login.
What happens in it: curl sends a request to authUrl and returns the response headers as headers (that is what the CurlHeader option is for). They contain the cookies, the redirect address and a few other things. The address the server redirects us to is where what we are looking for is hidden. With the secret technique of regular expressions, the coveted session id, of the form "sid=35dfe55b09b599c9fx622fcx8cd83a37", is pulled out of headers.
I will not dwell on regular expressions in Haskell; that is a separate topic. For our purposes it is just a search for a substring of the required shape.
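For readers who want to see that substring search spelled out, here is a sketch that does the same job without regular expressions, using only base. It is not part of the program (no > marks), and extractSid is a hypothetical name. Unlike the regex version, which yields an empty string when nothing matches, it returns Nothing.

```haskell
import Data.Char (isAsciiLower, isDigit)
import Data.List (isPrefixOf, tails)

-- Find the first occurrence of "sid=" and keep the run of
-- characters from [a-z0-9] that follows it.
extractSid :: String -> Maybe String
extractSid headers =
  case filter ("sid=" `isPrefixOf`) (tails headers) of
    []          -> Nothing
    (match : _) -> Just ("sid=" ++ takeWhile isSidChar (drop 4 match))
  where
    isSidChar c = isAsciiLower c || isDigit c
```

In GHCi, extractSid "Location: http://example.com/?sid=35dfe55b09;path=/" gives Just "sid=35dfe55b09" (the header line is made up for the example).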

Wonderful! We have the sid, and the whole VKontakte API is now open to us. Our task needs only one call: changing the status.
In principle, any interaction with VKontakte boils down to the following command:

(_,answer) <- curlGetString someUrl []

where someUrl is the corresponding request (see the documentation), and answer is the server response. Here is the status change request:

> setActivityUrl :: String -> String -> String
> setActivityUrl text = formUrl "http://userapi.com/data?" [( "act" , "set_activity" ), ( "text" , text)]

Note that the third argument of formUrl, sid, is not given. This is partial application: the function takes 3 arguments and we supplied only 2, so what remains is a function of the one remaining argument. In other words, setActivityUrl is a function not only of text (the status itself) but also of a second argument, sid, which is implicitly appended on the right.
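Partial application is easy to try on a smaller example. The sketch below is not part of the program (no > marks); joinUrl and statusUrl are hypothetical, simplified stand-ins for formUrl and setActivityUrl.

```haskell
-- A three-argument function in the same spirit as formUrl (simplified).
joinUrl :: String -> String -> String -> String
joinUrl base query sid = base ++ query ++ "&" ++ sid

-- Only two arguments are supplied on the right-hand side,
-- so statusUrl text is still a function waiting for sid.
statusUrl :: String -> String -> String
statusUrl text = joinUrl "http://userapi.com/data?" ("text=" ++ text)
```

In GHCi, statusUrl "hi" "sid=abc" gives "http://userapi.com/data?text=hi&sid=abc", and :type statusUrl "hi" shows String -> String.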

One more detail: the tweet text contains spaces, which are not allowed in a URL. So we write a simple function that replaces every space with %20:

> escSpaces = intercalate "%20" . words

It splits the string into a list of words, inserts the string "%20" between adjacent elements of this list, and then glues it all back together into one string (the last two steps are done by intercalate).
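escSpaces handles only spaces, so a tweet containing & or # would still produce a broken address. A fuller percent-encoding could look like the sketch below. It is not part of the program (no > marks); percentEncode is a hypothetical name, and it assumes its input is already a UTF-8 byte string in which every Char is below 256, such as the output of encodeString.

```haskell
import Data.Char (isAlphaNum, isAscii, ord, toUpper)
import Numeric (showHex)

-- Percent-encode everything except the RFC 3986 unreserved characters.
-- Assumption: every Char of the input represents a single byte.
percentEncode :: String -> String
percentEncode = concatMap encodeChar
  where
    encodeChar c
      | isUnreserved c = [c]
      | otherwise      = '%' : pad (map toUpper (showHex (ord c) ""))
    isUnreserved c = (isAscii c && isAlphaNum c) || c `elem` "-._~"
    -- Hex codes must always have two digits.
    pad [d] = ['0', d]
    pad ds  = ds
```

In GHCi, percentEncode "a b&c" gives "a%20b%26c". Unlike escSpaces, this version keeps leading, trailing and repeated spaces, because it does not go through words.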

Now we can assemble the status change function from the parts already discussed:

> setStatus :: String -> String -> IO ()
> setStatus text sid = do
>     (_,answer) <- curlGetString url []
>     if answer =~ "\"ok\":1" :: Bool
>       then putStrLn text
>       else error "something is bad with vkontakte-api..."
>   where
>     url = setActivityUrl (escSpaces text) sid

It would be shorter to write this function in one line (its result type would then be IO (CurlCode, String) rather than IO ()):

setStatus text sid = curlGetString (setActivityUrl (escSpaces text) sid) []

But the first version is clearer, and it checks the server's response: if the answer contains "ok":1, everything is fine and the status has changed, which we report to the user by printing the new status.
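The check itself is pure and can be separated from the network call. The following sketch is not part of the program (no > marks); checkAnswer is a hypothetical name, and a plain substring test is used in place of the regular expression. The only thing it assumes about the server's answer is what is stated above: success means the text contains "ok":1.

```haskell
import Data.List (isInfixOf)

-- Right () when the answer contains "ok":1,
-- otherwise Left with the raw answer for diagnostics.
checkAnswer :: String -> Either String ()
checkAnswer answer
  | "\"ok\":1" `isInfixOf` answer = Right ()
  | otherwise                     = Left ("unexpected answer: " ++ answer)
```

Returning Either instead of calling error lets the caller decide what to do on failure, for example print the message and retry later.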
That's it! Now we have all the pieces of the mosaic, and putting it together is very easy.

main

This is what all those functions were written for:

> main = do
>     tweet <- getTweet
>     sid <- login
>     setStatus tweet sid

Looks extremely simple, doesn't it? No comments needed here.
I think all the other functions are quite understandable with my explanations.
Statistics: about 40 lines of code.

Conclusion

To run this code, as already mentioned, just save the entire post to a file with the .lhs extension and type in the console:

runhaskell _.lhs

That's all.
I am not sure whether a sequel on automating the launch is needed.
Personally, as a Mac OS X user, I solved this by creating a “Service” in Automator and assigning it a hot key for quick access. This only automates the launch, but for me it is enough.

I hope it was interesting for someone to read. Waiting for your questions / suggestions / objections (:

upd: moved to the thematic blog.

Source: https://habr.com/ru/post/85894/

