A friend recently asked me to write a bot that imports news from a site's RSS feed into a Telegram channel. The biggest advantage of this notification method is push notifications, which reach every subscribed user on their device. I had long wanted to do something similar, so without thinking twice I chose the Habr channel telegram.me/habr_ru as a model. Python was chosen as the programming language.
In the end, I had to solve the following problems:
To these I added a few requirements of my own:
To solve this problem, I decided to use an SQLite database, with the SQLAlchemy library handling database access.
The structure is trivially simple: just one table. The model code is shown below:
```python
import base64
from datetime import datetime

from sqlalchemy import Column, Integer, String

class News(Base):
    __tablename__ = 'news'

    id = Column(Integer, primary_key=True)
    text = Column(String)         # news title, base64-encoded
    link = Column(String)         # link to the item, base64-encoded
    date = Column(Integer)        # date from the feed, Unix time
    publish = Column(Integer)     # scheduled publication time, Unix time
    chat_id = Column(Integer)     # chat the message was posted to
    message_id = Column(Integer)  # id of the posted message

    def __init__(self, text, link, date, publish=0, chat_id=0, message_id=0):
        self.link = link
        self.text = text
        self.date = date
        self.publish = publish
        self.chat_id = chat_id
        self.message_id = message_id

    def _keys(self):
        # The identity of a news item is its title and link.
        return (self.text, self.link)

    def __eq__(self, other):
        return self._keys() == other._keys()

    def __hash__(self):
        return hash(self._keys())

    def __repr__(self):
        return "<News ('%s', '%s', %s)>" % (base64.b64decode(self.text).decode(),
                                            base64.b64decode(self.link).decode(),
                                            datetime.fromtimestamp(self.publish))
```
Base64 is used to store the text and links, and the Unix timestamp was chosen as the date/time storage format.
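A quick round trip shows what this storage scheme looks like in practice (the title and link here are illustrative, not from the bot):

```python
import base64
import time
from datetime import datetime

title = "Example news title"
link = "https://example.com/post/1"

# Encode the way the bot stores them: base64 strings in the database.
stored_title = base64.b64encode(title.encode()).decode()
stored_link = base64.b64encode(link.encode()).decode()

# Decode on the way out, as __repr__ does.
assert base64.b64decode(stored_title).decode() == title
assert base64.b64decode(stored_link).decode() == link

# Dates are kept as Unix timestamps (seconds since the epoch).
ts = int(time.mktime(datetime(2016, 6, 1, 12, 0).timetuple()))
print(datetime.fromtimestamp(ts))  # 2016-06-01 12:00:00
```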
Session data processing is carried out by a separate class.
```python
import time

from sqlalchemy import create_engine, and_
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Database:
    """Thin wrapper around an SQLAlchemy session for the news table."""

    def __init__(self, obj):
        engine = create_engine(obj, echo=False)
        Session = sessionmaker(bind=engine)
        self.session = Session()

    def add_news(self, news):
        self.session.add(news)
        self.session.commit()

    def get_post_without_message_id(self):
        # Unpublished posts (message_id == 0) whose scheduled time has passed.
        return self.session.query(News).filter(
            and_(News.message_id == 0,
                 News.publish <= int(time.mktime(time.localtime())))).all()

    def update(self, link, chat, msg_id):
        self.session.query(News).filter_by(link=link).update(
            {"chat_id": chat, "message_id": msg_id})
        self.session.commit()

    def find_link(self, link):
        if self.session.query(News).filter_by(link=link).first():
            return True
        else:
            return False
```
When a news item is found, it is added to the database, and its publication time is set right away.
To detect news that is ready for publication, the get_post_without_message_id method is used: it selects from the database all posts for which message_id = 0 and whose publication date is earlier than the current time.
To check whether an item is new, we query the database for the presence of its link (the find_link method).
The update method updates the record once the news has been published in the channel.
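For readers who prefer to see the SQL, here is roughly what those three methods boil down to, sketched with the standard-library sqlite3 module instead of SQLAlchemy (table and column names follow the model above; the row values are dummies):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE news
                (id INTEGER PRIMARY KEY, text TEXT, link TEXT,
                 date INTEGER, publish INTEGER,
                 chat_id INTEGER, message_id INTEGER)""")

now = int(time.time())
conn.execute("INSERT INTO news (text, link, date, publish, chat_id, message_id) "
             "VALUES (?, ?, ?, ?, ?, ?)",
             ("dGl0bGU=", "bGluaw==", now, now - 10, 0, 0))

# get_post_without_message_id: unpublished posts whose time has come.
due = conn.execute("SELECT link FROM news WHERE message_id = 0 AND publish <= ?",
                   (now,)).fetchall()
print(due)  # [('bGluaw==',)]

# find_link: does this link already exist?
exists = conn.execute("SELECT 1 FROM news WHERE link = ?",
                      ("bGluaw==",)).fetchone() is not None
print(exists)  # True

# update: record where the message ended up after publishing.
conn.execute("UPDATE news SET chat_id = ?, message_id = ? WHERE link = ?",
             (-1001, 42, "bGluaw=="))
conn.commit()
```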
I have to admit that I didn't want to write my own RSS parser at all, so the feedparser library entered the fray.
```python
import time
import binascii

import feedparser

class Source(object):
    def __init__(self, link):
        self.link = link
        self.news = []
        self.refresh()

    def refresh(self):
        data = feedparser.parse(self.link)
        # Turn each feed entry into a News object: title and link are
        # stored base64-encoded, the publication date as a Unix timestamp.
        self.news = [News(binascii.b2a_base64(i['title'].encode()).decode(),
                          binascii.b2a_base64(i['link'].encode()).decode(),
                          int(time.mktime(i['published_parsed'])))
                     for i in data['entries']]
```
The code is ridiculously simple. When the refresh method is called, a list comprehension builds a list of News objects from the last 30 posts in the RSS feed.
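feedparser does the heavy lifting here; purely to illustrate what it extracts, a dependency-free sketch with the standard-library xml.etree can pull the same title/link/pubDate triples out of a minimal RSS 2.0 string (the feed contents are made up):

```python
import xml.etree.ElementTree as ET

rss = """<rss version="2.0"><channel>
  <item><title>First post</title>
    <link>https://example.com/1</link>
    <pubDate>Wed, 01 Jun 2016 12:00:00 GMT</pubDate></item>
</channel></rss>"""

root = ET.fromstring(rss)
entries = [(item.findtext("title"), item.findtext("link"), item.findtext("pubDate"))
           for item in root.iter("item")]
print(entries[0][0])  # First post
```

On top of this, feedparser also normalizes dates into a ready-made time struct (published_parsed) and tolerates malformed feeds, which is why it is worth using over a hand-rolled parser.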
As mentioned above, bit.ly was chosen as the link-shortening service; its API raises no unnecessary questions.
```python
import json
import urllib.request

class Bitly:
    def __init__(self, access_token):
        self.access_token = access_token

    def short_link(self, long_link):
        url = 'https://api-ssl.bitly.com/v3/shorten?access_token=%s&longUrl=%s&format=json' \
              % (self.access_token, long_link)
        try:
            return json.loads(urllib.request.urlopen(url).read().decode('utf8'))['data']['url']
        except Exception:
            # On any failure, fall back to the original link.
            return long_link
```
Only our access_token is passed to the constructor. If fetching a shortened link fails, the short_link method returns the original link passed to it.
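One weakness of building the request with string interpolation is that long_link is not URL-encoded, so links containing & or ? would break the query string. A safer way to assemble the same bit.ly v3 request URL with the standard library (the token here is a dummy value):

```python
import urllib.parse

access_token = "TOKEN"  # dummy value, not a real token
long_link = "https://example.com/post?id=1&lang=en"

# urlencode percent-encodes each value, so & and ? in the link are safe.
params = urllib.parse.urlencode({
    "access_token": access_token,
    "longUrl": long_link,
    "format": "json",
})
url = "https://api-ssl.bitly.com/v3/shorten?" + params
print(url)
```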
```python
import base64
import configparser
import logging
import time

import telegram

class ExportBot:
    def __init__(self):
        config = configparser.ConfigParser()
        config.read('./config')
        log_file = config['Export_params']['log_file']
        self.pub_pause = int(config['Export_params']['pub_pause'])
        self.delay_between_messages = int(config['Export_params']['delay_between_messages'])
        logging.basicConfig(format=u'%(filename)s[LINE:%(lineno)d]# %(levelname)-8s '
                                   u'[%(asctime)s] %(message)s',
                            level=logging.INFO, filename=u'%s' % log_file)
        self.db = Database(config['Database']['Path'])
        self.src = Source(config['RSS']['link'])
        self.chat_id = config['Telegram']['chat']
        bot_access_token = config['Telegram']['access_token']
        self.bot = telegram.Bot(token=bot_access_token)
        self.bit_ly = Bitly(config['Bitly']['access_token'])

    def detect(self):
        # Fetch the last 30 entries of the RSS feed.
        self.src.refresh()
        news = self.src.news
        news.reverse()
        # Any link not yet in the database is new: schedule it for
        # publication pub_pause seconds from now.
        for i in news:
            if not self.db.find_link(i.link):
                now = int(time.mktime(time.localtime()))
                i.publish = now + self.pub_pause
                logging.info(u'Detect news: %s' % i)
                self.db.add_news(i)

    def public_posts(self):
        # Posts that are due (message_id == 0, publish time reached) ...
        posts_from_db = self.db.get_post_without_message_id()
        self.src.refresh()
        line = [i for i in self.src.news]
        # ... intersected with what is still present in the feed.
        for_publishing = list(set(line) & set(posts_from_db))
        for_publishing.reverse()
        # Publish each post and remember where it ended up.
        for post in for_publishing:
            text = '%s %s' % (base64.b64decode(post.text).decode('utf8'),
                              self.bit_ly.short_link(base64.b64decode(post.link).decode('utf-8')))
            a = self.bot.sendMessage(chat_id=self.chat_id, text=text,
                                     parse_mode=telegram.ParseMode.HTML)
            message_id = a.message_id
            chat_id = a['chat']['id']
            self.db.update(post.link, chat_id, message_id)
            logging.info(u'Public: %s;%s;' % (post, message_id))
            time.sleep(self.delay_between_messages)
```
On initialization we read the config file with the configparser library and set up logging.
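The config file itself is not shown in the article, but from the keys read in the constructor it evidently looks something like the following (section and key names are taken from the code; all values are invented placeholders):

```python
import configparser

CONFIG_TEXT = """
[Export_params]
log_file = bot.log
pub_pause = 300
delay_between_messages = 5

[Database]
Path = sqlite:///news.db

[RSS]
link = https://example.com/rss

[Telegram]
chat = @example_channel
access_token = <bot token>

[Bitly]
access_token = <bitly token>
"""

config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)
print(int(config['Export_params']['pub_pause']))  # 300
```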
News detection is handled by the detect method: we take the last 30 published posts and check each one's link against the database.
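Stripped of the database machinery, the detection step is just "schedule anything we have not seen yet". A minimal sketch, with an in-memory set standing in for the links table and made-up feed URLs:

```python
import time

seen_links = set()   # stands in for the database of known links
pub_pause = 300      # delay before publication, in seconds

def detect(feed_links):
    """Return (link, publish_time) pairs for previously unseen links."""
    scheduled = []
    for link in feed_links:
        if link not in seen_links:
            seen_links.add(link)
            scheduled.append((link, int(time.time()) + pub_pause))
    return scheduled

first = detect(["https://example.com/1", "https://example.com/2"])
print(len(first))             # 2
again = detect(["https://example.com/2", "https://example.com/3"])
print([l for l, _ in again])  # ['https://example.com/3']
```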
Before publishing, we check that the posts loaded from the database are still present in the RSS feed; this saves us a lot of trouble. After that, the news is published with the help of the telegram library, whose functionality is quite broad and well suited to writing bots. After publishing, message_id and chat_id need to be updated in the database.
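The intersection set(line) & set(posts_from_db) only works because News defines __eq__ and __hash__ over (text, link): distinct object instances representing the same item then compare equal. A self-contained illustration with a minimal stand-in class:

```python
class Item:
    """Minimal stand-in for News: identity is the (text, link) pair."""
    def __init__(self, text, link):
        self.text = text
        self.link = link

    def _keys(self):
        return (self.text, self.link)

    def __eq__(self, other):
        return self._keys() == other._keys()

    def __hash__(self):
        return hash(self._keys())

feed = [Item("a", "L1"), Item("b", "L2")]
db   = [Item("b", "L2"), Item("c", "L3")]  # different instances, same keys

# Without __eq__/__hash__, this intersection would be empty, since
# default object identity would treat every instance as distinct.
common = set(feed) & set(db)
print([i.link for i in common])  # ['L2']
```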
As a result, we obtain:
It is worth noting that if you rewrite the Source class, you can also import news from other sources (VK, Facebook, etc.).
Sources can be found on Github: https://github.com/Vispiano/rss2telegram
UPD: Yes, an accidentally left-over "print" looks awful, and the class names would be better off in CamelCase.
Source: https://habr.com/ru/post/302688/