This article describes a small program that reposts tweets to your VKontakte status.
The task is simple and entirely unoriginal. It all started when I read an article on Habr about how to solve it in Python, and a similar article about PHP. There even seem to be online services built specifically for this task. But the whole point here is to solve this simple task yourself, with your favorite tools. In fact, the PHP solution appeared later and with the same motivation.
So what did I write it in? In Haskell, natürlich!
Below I explain in detail how I did everything and how to repeat it. No special knowledge should be required to follow along.
Introduction
Those two articles, along with an article about reposting from RSS to LiveJournal in Haskell, helped me implement the solution.
At first I honestly wanted to work with Twitter through twitter-api: I tried the corresponding library from Hackage, but it did not work right away and I dropped it. I wanted a quick result and was too lazy to dig into what I was doing wrong. Since Twitter publishes an RSS feed, and reading RSS in Haskell is already a solved problem, I went that way.
Besides, it is a more general solution: you can broadcast any RSS feed to VKontakte. You could even say that this is not twitter2vkontakte but rss2vkontakte.
In addition, I used vkontakte-api rather than parsing the page in search of the status, as my predecessors did. I think that is a plus.
The rest is literate Haskell code: not code with comments, but detailed comments with pieces of code that together form an ordinary Haskell source. You can save this entire post to a file with the .lhs extension and feed it to the interpreter or compiler. Everything should work.
All working code is marked with this character at the start of the line:
>
Necessary preparations
It is assumed that you already have a Haskell compiler and the basic set of libraries. If not, this is easy to fix: install the Haskell Platform.
To install the additional libraries, type in the console:
cabal update
cabal install regex-tdfa curl feed utf8-string
Next is a short list of imports with brief explanations.
A couple of times I used regular expressions:
> import Text.Regex.TDFA ( (=~) )
Once I split a list and glued it back together:
> import Data.List ( intercalate )
For all online requests, I used the curl library:
> import Network.Curl ( curlGetString )
> import Network.Curl.Opts
I read and parsed RSS feeds:
> import Text.Feed.Import ( parseFeedString )
> import Text.Feed.Query ( getFeedItems , getItemSummary )
And once I even encoded a string as UTF-8:
> import Codec.Binary.UTF8.String ( encodeString )
From here on the code becomes more meaningful, with more meaningful and, in places, perhaps overly detailed explanations.
Twitter via RSS
The first thing we need is the address of the RSS feed of our tweets. It can be found in your Twitter account. Let's create a separate constant for it:
> feedUrl = "https://twitter.com/statuses/user_timeline/22251772.rss"
I learned how to fetch and parse RSS feeds from the article about rss2lj, but I did not use that library. Everything in it is certainly well done, but I only need one simple function that downloads the RSS feed, takes the first item, and extracts its content. This is how I did it:
> getTweet :: IO String
> getTweet = do
>     (_,feed) <- curlGetString feedUrl []
>     return $ getMsg $ head $ getItems feed
>   where
>     getItems = maybe (error "rss parsing failed!") getFeedItems . parseFeedString
>     getMsg   = maybe (error "rss-item parsing failed!") format . getItemSummary
>     format   = unwords . ("twitter:" :) . tail . words . encodeString
Let me explain what happens here. The function
curlGetString :: URLString -> [CurlOption] -> IO (CurlCode, String)
takes a URL and a list of options and returns a result code (CurlOK if everything went well) together with the server response. Here we pass our Twitter RSS feed as the address and give no options. We ignore the result code and bind the body of the response to the name feed.
The next line should be read from right to left: we extract the feed items (getItems feed), which gives a list; take its first element (head); extract the message from it (getMsg); and return the result.
Now for these functions in more detail, in the same order. Each of them is written in point-free style, that is, without naming the argument, simply as a composition (the dot, .) of other functions.
A composition can also be read from right to left, dot by dot, that is, in the order in which the functions are applied.
getItems first applies parseFeedString (from the feed library), which has the type String -> Maybe Feed: it takes a string containing the RSS tag soup and returns an abstract Feed value that you can actually work with. Because the result is a Maybe Feed ("maybe a feed"), the parser may choke and return Nothing; in that case we raise an error with the text "rss parsing failed!". If parsing goes well, we get a Just value, and getFeedItems is applied to its contents, extracting the feed items as a list. This branching (Nothing or Just ...) is implemented by the standard maybe function.
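The same branching can be tried on something simpler than a feed. Below is a minimal sketch, assuming a hypothetical parser parseCount (built on readMaybe from base) standing in for parseFeedString; describe has the same shape as getItems. Like the other illustrative snippets in this post, it is not marked with > and so is not part of the program.

```haskell
import Text.Read (readMaybe)

-- Hypothetical stand-in for parseFeedString: parsing may fail,
-- so the result is wrapped in Maybe.
parseCount :: String -> Maybe Int
parseCount = readMaybe

-- Same shape as getItems: a fallback for Nothing, a function for
-- the contents of Just, composed with the parser.
describe :: String -> String
describe = maybe "parsing failed" (\n -> "parsed " ++ show n) . parseCount

main :: IO ()
main = do
  putStrLn (describe "42")     -- prints: parsed 42
  putStrLn (describe "forty")  -- prints: parsing failed
```

The only difference from getItems is that the fallback here is an ordinary value rather than a call to error.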
After getItems runs we have a list of feed items, [Item]. We only need the first one (that is, the most recent by date). We take it with the head function. Now we want to extract the message text from it, which is the job of getMsg.
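Note that head fails on an empty list, so an empty feed would crash the program with an unhelpful message. A possible alternative, shown here only as a sketch, is listToMaybe from Data.Maybe; the helper name firstOr is made up for this illustration and is not used in the program.

```haskell
import Data.Maybe (listToMaybe)

-- Hypothetical helper: like head, but an empty list yields a readable
-- error value instead of a runtime exception.
firstOr :: String -> [a] -> Either String a
firstOr err = maybe (Left err) Right . listToMaybe

main :: IO ()
main = do
  print (firstOr "feed is empty" ["newest", "older"])  -- prints: Right "newest"
  print (firstOr "feed is empty" ([] :: [String]))     -- prints: Left "feed is empty"
```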
This function has a structure similar to getItems: first getItemSummary is applied, which returns a Maybe String. If the content could not be extracted, we raise the corresponding error. Otherwise we format the received message.
Formatting works as follows (again from right to left): encode the string as UTF-8, split it into words (on whitespace), drop the first word, put "twitter:" in its place (optional), and glue all the words back into one string. The first word in RSS tweets is always your nickname, so we throw it out.
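The formatting pipeline can be tried in isolation. This is a sketch under two assumptions: the encodeString step is left out so that only the Prelude is needed, and drop 1 is used instead of tail so that an empty string does not cause an error. The name formatTweet and the sample text are made up for the illustration.

```haskell
-- Read right to left: split into words, drop the nickname,
-- put "twitter:" in front, join the words back with spaces.
formatTweet :: String -> String
formatTweet = unwords . ("twitter:" :) . drop 1 . words

main :: IO ()
main = putStrLn (formatTweet "nick: reading about Haskell")
-- prints: twitter: reading about Haskell
```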
That's all for RSS. I may have described everything in too much detail, but I think that for curious readers unfamiliar with Haskell this description was informative.
VKontakte API
First of all, let's define a few constants for working with VKontakte:
> email = "your e-mail"
> uid = "your user-id"
> pass = "your password"
These are the credentials of your VKontakte account.
All operations are performed as GET requests to the server (with the same curlGetString function) at suitably constructed addresses. They are built as follows: a base address (for example, http://userapi.com/data?) plus a list of parameters of the form key=value, separated by ampersands (&).
To build such addresses, we write a couple of helper functions:
> param :: (String, String) -> String
> param (key, value) = key ++ "=" ++ value ++ "&"
This function simply takes a pair (key, value) and makes it a string of the desired format.
> formUrl :: String -> [(String, String)] -> String -> String
> formUrl base opts sid = base ++ ( concatMap param (opts ++ [( "id" ,uid)]) ) ++ sid
This builds a URL of the required format from the base address base, the list of options opts (as pairs), and the session identifier sid (more on it later).
The essential part is in parentheses: map takes a function and a list and applies the function to each element of the list. That is, from a list of pairs (key, value) it makes a list of strings "key=value&". Then concat simply glues all these strings together (concatMap f = concat . map f).
The set of options differs from task to task, but the user ID (uid) must be specified in every case, so to avoid writing this option each time we add it in the definition of this function.
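To see what these helpers produce, here is a self-contained sketch that repeats their definitions with made-up values: the uid "12345", the option values, and the session string "sid=abc" are all hypothetical. It is not marked with > and is not part of the program.

```haskell
uid :: String
uid = "12345"  -- hypothetical user id, for illustration only

param :: (String, String) -> String
param (key, value) = key ++ "=" ++ value ++ "&"

formUrl :: String -> [(String, String)] -> String -> String
formUrl base opts sid = base ++ concatMap param (opts ++ [("id", uid)]) ++ sid

main :: IO ()
main = do
  putStrLn (param ("act", "set_activity"))
  -- prints: act=set_activity&
  putStrLn (formUrl "http://userapi.com/data?"
                    [("act", "set_activity"), ("text", "hi")]
                    "sid=abc")
  -- prints: http://userapi.com/data?act=set_activity&text=hi&id=12345&sid=abc
```

Because param always appends an ampersand, the sid string can be attached directly at the end; with an empty sid the address simply ends with a trailing &.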
To do anything with VKontakte, you must first log in. The server then gives us cookies and a session ID (sid). I did not use the cookies, but the sid is needed for almost every operation that reads or changes user data.
> login :: IO String
> login = do
>     (_,headers) <- curlGetString authUrl [CurlHeader True]
>     return ( headers =~ "sid=[a-z0-9]*" :: String )
>   where
>     authUrl = formUrl "http://login.userapi.com/auth?"
>                       [ ("site", "2"), ("fccode", "0")
>                       , ("fcsid", "0"), ("login", "force")
>                       , ("email", email), ("pass", pass) ] ""
The authentication address has a bunch of options whose purpose I did not figure out; I took them from the documentation, and nothing works without them. We build this address with the formUrl function we just wrote, inserting our email and password as the last two options. The sid parameter is left empty: we do not have one yet, and obtaining it is exactly what the login function is for.
Here is what happens in it: a curl request is sent to authUrl, and the response includes the headers, bound to the name headers (the CurlHeader option is set for this). They contain the cookies, a redirect address, and a few other things. What we are looking for is hidden in the address the server redirects us to. With the secret technique of regular expressions, the coveted session ID, of the form "sid=35dfe55b09b599c9fx622fcx8cd83a37", is pulled out of headers.
I will not dwell on regular expressions in Haskell; that is a separate topic. You can think of this as simply searching for a substring of the required form.
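For readers who prefer to see that substring search spelled out, here is a sketch that does the same job without regular expressions, using only Data.List and Data.Char. It is a swapped-in technique, not what the program uses: the name extractSid and the sample header line are hypothetical. Like the regular expression match, it returns an empty string when nothing is found.

```haskell
import Data.Char (isAsciiLower, isDigit)
import Data.List (isPrefixOf, tails)

-- Find the first occurrence of "sid=" and keep the run of
-- lowercase letters and digits that follows it.
extractSid :: String -> String
extractSid headers =
  case filter ("sid=" `isPrefixOf`) (tails headers) of
    []         -> ""
    (rest : _) -> "sid=" ++ takeWhile isSidChar (drop 4 rest)
  where
    isSidChar c = isAsciiLower c || isDigit c

main :: IO ()
main = do
  -- A made-up redirect header, only to show the shape of the input.
  let headers = "Location: http://example.invalid/?sid=35dfe55b09b599c9fx622fcx8cd83a37\r\n"
  putStrLn (extractSid headers)
  -- prints: sid=35dfe55b09b599c9fx622fcx8cd83a37
```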
Wonderful! We have the sid, and now all the possibilities of the VKontakte API are open to us. For our task only one is needed: changing the status.
In principle, any interaction with VKontakte comes down to the following command:
(_,answer) <- curlGetString someUrl []
where someUrl is the corresponding request (see the documentation) and answer is the server response. Here is the status change request:
> setActivityUrl :: String -> String -> String
> setActivityUrl text = formUrl "http://userapi.com/data?" [( "act" , "set_activity" ), ( "text" , text)]
Note that the third parameter of formUrl, sid, is not given. This is partial application: the function has three parameters and we supplied only two, so we are left with a function of the one remaining parameter. That is, setActivityUrl is a function not only of text (the status itself) but also of a second parameter, sid, which is implicitly appended on the right.
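Partial application is easier to see on a toy function. The sketch below uses made-up names (greet, hello) and mirrors the way setActivityUrl is defined through formUrl: the definition mentions one argument, while the type shows two.

```haskell
-- A function of three parameters, like formUrl.
greet :: String -> String -> String -> String
greet greeting name punctuation = greeting ++ ", " ++ name ++ punctuation

-- Only two of the three arguments are supplied on the right-hand side,
-- so hello still expects the last one, just as setActivityUrl expects sid.
hello :: String -> String -> String
hello name = greet "Hello" name

main :: IO ()
main = putStrLn (hello "VKontakte" "!")
-- prints: Hello, VKontakte!
```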
One more detail: the tweet text contains spaces, which are not allowed in a URL. So we write a simple function that replaces all spaces with %20:
> escSpaces = intercalate "%20" . words
It splits the string into a list of words, inserts the string "%20" between adjacent elements of the list, and glues everything back into one string (the last two steps are done by the intercalate function).
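A quick way to check the behavior is to run the function on a couple of strings. The sketch below repeats the definition with a type signature added; the sample inputs are made up.

```haskell
import Data.List (intercalate)

escSpaces :: String -> String
escSpaces = intercalate "%20" . words

main :: IO ()
main = do
  putStrLn (escSpaces "twitter: hello world")  -- prints: twitter:%20hello%20world
  putStrLn (escSpaces "  two   spaces  ")      -- prints: two%20spaces
```

Two consequences of building on words are worth keeping in mind: runs of spaces collapse into a single %20 and leading or trailing spaces disappear, and no other characters are escaped, so a tweet containing & or = would still produce a malformed request.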
Now we can assemble the status change function from the parts already discussed:
> setStatus :: String -> String -> IO ()
> setStatus text sid = do
>     (_,answer) <- curlGetString url []
>     if answer =~ "\"ok\":1" :: Bool
>       then putStrLn text
>       else error "something is bad with vkontakte-api..."
>   where
>     url = setActivityUrl (escSpaces text) sid
This function could be written more simply, in one line:
setStatus text sid = curlGetString (setActivityUrl (escSpaces text) sid) []
But the first version is clearer, and it checks the server's response: if the answer contains "ok":1, everything is fine and the status has changed, which we report to the user by printing the new status text.
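The response check can also be separated from the network call and tried on its own. This is a sketch, not what the program does: it uses isInfixOf instead of the regular expression match and returns an Either instead of calling error. The name checkAnswer and the sample responses are assumptions; the real server reply may look different.

```haskell
import Data.List (isInfixOf)

-- Succeeds when the response contains the marker "ok":1,
-- otherwise carries the same error text as setStatus.
checkAnswer :: String -> Either String ()
checkAnswer answer
  | "\"ok\":1" `isInfixOf` answer = Right ()
  | otherwise = Left "something is bad with vkontakte-api..."

main :: IO ()
main = do
  print (checkAnswer "{\"ok\":1}")   -- prints: Right ()
  print (checkAnswer "{\"ok\":-1}")  -- prints: Left "something is bad with vkontakte-api..."
```

Being a plain substring test, it would also accept a response such as "ok":10; the regular expression in setStatus behaves the same way.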
That's it! Now we have all the pieces of the puzzle, and assembling it is easy.
main
This is what all these functions were written for:
> main = do
>     tweet <- getTweet
>     sid <- login
>     setStatus tweet sid
It looks extremely simple, doesn't it? No comments are needed here.
I think all the other functions are also quite understandable with my explanations.
Statistics: about 40 lines of code.
Conclusion
To run this code, as already mentioned, just save the entire post to a file with the .lhs extension and type in the console:
runhaskell _.lhs
That's all.
I don't know whether a sequel is needed on how to automate the launch.
For myself (as a Mac OS X user), I solved this by creating a "Service" in Automator and assigning it a hot key for quick access. This only automates the launch, but it is enough for me.
I hope someone found this interesting to read. I look forward to your questions, suggestions, and objections (:
upd: moved to the thematic blog.