
Import and convert LinguaLeo vocabulary to Anki flashcards

Problem statement


Anyone learning English is probably familiar with Anki, a program for memorizing words, expressions, and any other information using spaced repetition.

Another popular service that needs no introduction is LinguaLeo. While reading a text in the original, you can send unfamiliar words straight to study; they are stored in your personal dictionary along with the pronunciation, an image, the transcription, and the context in which the word was used. A couple of years ago LinguaLeo introduced its own spaced repetition system, but unlike Anki's it is less powerful and cannot be tuned.

What if we try to cross a hedgehog with a snake and combine the strengths of the two platforms? Take the words from LinguaLeo, along with all the media files and metadata, and use Anki to memorize them.

Overview of solutions


Several ready-made solutions already exist, but none of them is very user-friendly, and all require a lot of extra steps: you have to create note types and card templates, add CSS styles, upload media files separately, and so on.
In addition, all of them are implemented either as a separate program (LinguaGet) or as browser extensions (one, two), which is also not very convenient. Ideally, I wanted the user to get new cards directly from Anki itself.

Selection of tools


Anki 2.0 is written in Python 2.7 and supports add-ons, which are separate Python modules. The GUI of the stable version uses PyQt 4.8.

The task can thus be described in one sentence: write a Python add-on that runs directly from Anki and requires nothing from the user other than their LinguaLeo login and password.

Solution


The whole process can be divided into three parts:

  1. Log in and retrieve a dictionary from LinguaLeo.
  2. Convert the data to Anki cards.
  3. Create a graphical interface.

Data import


LinguaLeo has no official API, but digging through the browser extension's code turns up the two addresses we need:

userDictUrl = "http://lingualeo.com/userdict/json"
loginURL = "http://api.lingualeo.com/api/login"


The authorization procedure is straightforward and is well described in this article on Habr.

The JSON dictionary holds a hundred words per page. Using urlencode, we pass the value all in the filter parameter and the number of the desired page in the page parameter, then walk through the pages one by one and save the whole dictionary.
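As a rough illustration of the request parameters described above, here is a minimal sketch written for Python 3, where `urlencode` lives in `urllib.parse` (the add-on itself targets Python 2.7 and uses `urllib.urlencode`). The helper name `build_page_request` is hypothetical; only the `filter` and `page` parameter names come from the text.

```python
from urllib.parse import urlencode  # Python 3 home of Python 2's urllib.urlencode


def build_page_request(page_number):
    """Return the URL-encoded POST body for one dictionary page."""
    # 'filter' and 'page' are the parameter names given in the text;
    # a list of pairs keeps the parameter order fixed.
    values = [("filter", "all"), ("page", page_number)]
    return urlencode(values).encode("ascii")


body = build_page_request(3)
print(body)  # b'filter=all&page=3'
```

The encoded bytes are what would be passed as the `data` argument when opening the dictionary URL, which turns the request into a POST.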

Code of the authorization and dictionary import module
    import json
    import urllib
    import urllib2
    from cookielib import CookieJar


    class Lingualeo:
        def __init__(self, email, password):
            self.email = email
            self.password = password
            self.cj = CookieJar()

        def auth(self):
            url = "http://api.lingualeo.com/api/login"
            values = {"email": self.email, "password": self.password}
            return self.get_content(url, values)

        def get_page(self, page_number):
            url = 'http://lingualeo.com/ru/userdict/json'
            values = {'filter': 'all', 'page': page_number}
            return self.get_content(url, values)['userdict3']

        def get_content(self, url, values):
            data = urllib.urlencode(values)
            opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(self.cj))
            req = opener.open(url, data)
            return json.loads(req.read())

        def get_all_words(self):
            """
            The JSON consists of list "userdict3" on each page
            Inside of each userdict there is a list of periods with names
            such as "October 2015". And inside of them lay our words.
            Returns: type == list of dictionaries
            """
            words = []
            have_periods = True
            page_number = 1
            while have_periods:
                periods = self.get_page(page_number)
                if len(periods) > 0:
                    for period in periods:
                        words += period['words']
                else:
                    have_periods = False
                page_number += 1
            return words
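The paging loop in get_all_words can be tried without a network connection. The sketch below (Python 3) takes the page fetcher as a parameter so a stub can replace the server; `collect_words`, `fake_get_page`, the sample data, and the `name` key of each period are hypothetical stand-ins, not part of the add-on or of LinguaLeo's real response.

```python
def collect_words(get_page):
    """Walk pages 1, 2, 3, ... until a page comes back with no periods."""
    words = []
    page_number = 1
    while True:
        periods = get_page(page_number)
        if not periods:
            break
        for period in periods:
            words.extend(period["words"])
        page_number += 1
    return words


# Hypothetical stand-in for the server: two pages of data, then empty pages.
FAKE_PAGES = {
    1: [{"name": "October 2015", "words": [{"word_value": "hedgehog"}]},
        {"name": "September 2015", "words": [{"word_value": "casket"}]}],
    2: [{"name": "August 2015", "words": [{"word_value": "deck"}]}],
}


def fake_get_page(page_number):
    return FAKE_PAGES.get(page_number, [])


all_words = collect_words(fake_get_page)
print([w["word_value"] for w in all_words])  # ['hedgehog', 'casket', 'deck']
```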


Convert the data to Anki cards


Data in Anki is organized as follows: a collection contains models (note types), each of which includes a CSS style, a list of fields, and HTML templates for the cards (front and back).

In our case there are six fields: the word itself, its transcription, the translation, a reference to the image file, a reference to the pronunciation file, and the context.

Filling in all the fields gives us a note. Two cards can then be generated from each note: "English - Russian" and "Russian - English".

[Image: flash card]

Cards are stored in decks (nested like Koschei's death: in an egg, which is in a casket).
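To make the note-to-cards relationship concrete, here is a small plain-Python sketch (Python 3, no Anki imports). It only imitates the idea: Anki templates refer to fields as `{{field}}`, and one note produces one card per template. The template strings and the helper names `render` and `make_cards` are assumptions for illustration; the add-on keeps its real templates in its styles module.

```python
import re

# Hypothetical templates, named like the two card types in the text
TEMPLATES = {
    "en -> ru": {"front": "{{en}} {{transcription}}", "back": "{{ru}}"},
    "ru -> en": {"front": "{{ru}}", "back": "{{en}} {{transcription}}"},
}


def render(template, note):
    """Replace every {{field}} placeholder with the note's value for it."""
    return re.sub(r"\{\{(\w+)\}\}",
                  lambda m: note.get(m.group(1), ""), template)


def make_cards(note):
    """One note yields one card per template."""
    return [
        {"template": name,
         "front": render(t["front"], note),
         "back": render(t["back"], note)}
        for name, t in TEMPLATES.items()
    ]


note = {"en": "hedgehog", "transcription": "[ˈhedʒhɒɡ]", "ru": "ёж"}
cards = make_cards(note)
# cards[0] is the "en -> ru" card, cards[1] the "ru -> en" card
```

Because both cards are rendered from the same note, editing a field once changes both directions.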

In addition to the above, we will need functions that download audio and images.

Code of the module with utilities for creating models, notes, etc.
    import os
    from random import randint
    from urllib2 import urlopen

    from aqt import mw
    from anki import notes

    from lingualeo import styles

    fields = ['en', 'transcription', 'ru',
              'picture_name', 'sound_name', 'context']


    def create_templates(collection):
        template_eng = collection.models.newTemplate('en -> ru')
        template_eng['qfmt'] = styles.en_question
        template_eng['afmt'] = styles.en_answer
        template_ru = collection.models.newTemplate('ru -> en')
        template_ru['qfmt'] = styles.ru_question
        template_ru['afmt'] = styles.ru_answer
        return (template_eng, template_ru)


    def create_new_model(collection, fields, model_css):
        model = collection.models.new("LinguaLeo_model")
        model['tags'].append("LinguaLeo")
        model['css'] = model_css
        for field in fields:
            collection.models.addField(model, collection.models.newField(field))
        template_eng, template_ru = create_templates(collection)
        collection.models.addTemplate(model, template_eng)
        collection.models.addTemplate(model, template_ru)
        model['id'] = randint(100000, 1000000)  # Essential for upgrade detection
        collection.models.update(model)
        return model


    def is_model_exist(collection, fields):
        name_exist = 'LinguaLeo_model' in collection.models.allNames()
        if name_exist:
            fields_ok = collection.models.fieldNames(collection.models.byName(
                'LinguaLeo_model')) == fields
        else:
            fields_ok = False
        return (name_exist and fields_ok)


    def prepare_model(collection, fields, model_css):
        """
        Returns a model for our future notes.
        Creates a deck to keep them.
        """
        if is_model_exist(collection, fields):
            model = collection.models.byName('LinguaLeo_model')
        else:
            model = create_new_model(collection, fields, model_css)
        # Create a deck "LinguaLeo" and write id to deck_id
        model['did'] = collection.decks.id('LinguaLeo')
        collection.models.setCurrent(model)
        collection.models.save(model)
        return model


    def download_media_file(url):
        destination_folder = mw.col.media.dir()
        name = url.split('/')[-1]
        abs_path = os.path.join(destination_folder, name)
        resp = urlopen(url)
        media_file = resp.read()
        binfile = open(abs_path, "wb")
        binfile.write(media_file)
        binfile.close()


    def send_to_download(word):
        picture_url = word.get('picture_url')
        if picture_url:
            picture_url = 'http:' + picture_url
            download_media_file(picture_url)
        sound_url = word.get('sound_url')
        if sound_url:
            download_media_file(sound_url)


    def fill_note(word, note):
        note['en'] = word['word_value']
        note['ru'] = word['user_translates'][0]['translate_value']
        if word.get('transcription'):
            note['transcription'] = '[' + word.get('transcription') + ']'
        if word.get('context'):
            note['context'] = word.get('context')
        picture_url = word.get('picture_url')
        if picture_url:
            picture_name = picture_url.split('/')[-1]
            note['picture_name'] = '<img src="%s" />' % picture_name
        sound_url = word.get('sound_url')
        if sound_url:
            sound_name = sound_url.split('/')[-1]
            note['sound_name'] = '[sound:%s]' % sound_name
        return note


    def add_word(word, model):
        collection = mw.col
        note = notes.Note(collection, model)
        note = fill_note(word, note)
        collection.addNote(note)
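The way fill_note turns media URLs into field values can be checked in isolation. The sketch below (Python 3) mirrors that logic with a plain dictionary instead of an Anki note: the file name is the last path segment of the URL, images become an `<img>` tag, and sounds use Anki's `[sound:...]` marker. The helper name `media_fields` and the sample URLs are made up for illustration.

```python
def media_fields(word):
    """Build the picture and sound field values the way fill_note does."""
    fields = {}
    picture_url = word.get("picture_url")
    if picture_url:
        # Only the file name is stored; the file itself sits in the media folder
        fields["picture_name"] = '<img src="%s" />' % picture_url.split("/")[-1]
    sound_url = word.get("sound_url")
    if sound_url:
        fields["sound_name"] = "[sound:%s]" % sound_url.split("/")[-1]
    return fields


# Hypothetical sample entry; the URLs are invented for this example
sample = {
    "picture_url": "//example.com/pics/hedgehog.png",
    "sound_url": "http://example.com/audio/hedgehog.mp3",
}
print(media_fields(sample))
# {'picture_name': '<img src="hedgehog.png" />', 'sound_name': '[sound:hedgehog.mp3]'}
```

Words without a picture or sound simply leave the corresponding field unset, which matches the `if` checks in the module above.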


GUI creation


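Since the add-on aims for minimalism, the window needs only login and password fields, an "Unstudied only?" checkbox, and Import and Cancel buttons.

[Image: the add-on window]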

Graphical interface
    class PluginWindow(QDialog):
        def __init__(self, parent=None):
            QDialog.__init__(self, parent)
            self.initUI()

        def initUI(self):
            self.setWindowTitle('Import From LinguaLeo')

            # Window Icon
            if platform.system() == 'Windows':
                path = os.path.join(os.path.dirname(__file__), 'favicon.ico')
                loc = locale.getdefaultlocale()[1]
                path = unicode(path, loc)
                self.setWindowIcon(QIcon(path))

            # Buttons and fields
            self.importButton = QPushButton("Import", self)
            self.cancelButton = QPushButton("Cancel", self)
            self.importButton.clicked.connect(self.importButtonClicked)
            self.cancelButton.clicked.connect(self.cancelButtonClicked)
            loginLabel = QLabel('Your LinguaLeo Login:')
            self.loginField = QLineEdit()
            passLabel = QLabel('Your LinguaLeo Password:')
            self.passField = QLineEdit()
            self.passField.setEchoMode(QLineEdit.Password)
            self.progressLabel = QLabel('Downloading Progress:')
            self.progressBar = QProgressBar()
            self.checkBox = QCheckBox()
            self.checkBoxLabel = QLabel('Unstudied only?')

            # Main layout - vertical box
            vbox = QVBoxLayout()

            # Form layout
            fbox = QFormLayout()
            fbox.setMargin(10)
            fbox.addRow(loginLabel, self.loginField)
            fbox.addRow(passLabel, self.passField)
            fbox.addRow(self.progressLabel, self.progressBar)
            fbox.addRow(self.checkBoxLabel, self.checkBox)
            self.progressLabel.hide()
            self.progressBar.hide()

            # Horizontal layout for buttons
            hbox = QHBoxLayout()
            hbox.setMargin(10)
            hbox.addStretch()
            hbox.addWidget(self.importButton)
            hbox.addWidget(self.cancelButton)
            hbox.addStretch()

            # Add form layout, then stretch and then buttons in main layout
            vbox.addLayout(fbox)
            vbox.addStretch(2)
            vbox.addLayout(hbox)

            # Set main layout
            self.setLayout(vbox)
            # Set focus for typing from the keyboard
            # You have to do it after creating all widgets
            self.loginField.setFocus()

            self.show()


The adventure begins when, on top of that, you want a progress bar so that the user does not think the add-on has hung during the download.

For the progress bar to work without freezing the GUI, you need a separate thread (using QThread) where all the data would be downloaded and the cards created, with only the progress counter sent back to the graphical interface. Here we hit a snag: Anki stores its data in an SQLite database and does not allow it to be modified from outside the main thread. The solution: the costly work, downloading the media files, runs in the second thread, which also sends progress updates to the progress bar, while the fields are filled in and the notes saved in the main thread. The result is a working progress bar without a frozen interface.
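The division of labor between the two threads can be sketched without Qt or Anki. The example below (Python 3) swaps QThread and Qt signals for the standard library's `threading` and `queue`: the worker only downloads and reports, while every "database" write happens in the thread that calls `run_import`. All names here (`worker`, `run_import`, the event labels) are hypothetical, and the blocking `events.get()` loop stands in for Qt's event loop, which in the real add-on delivers the signals without blocking the GUI.

```python
import queue
import threading


def worker(words, download, events):
    """Background thread: slow downloads only, no database access."""
    for count, word in enumerate(words, start=1):
        events.put(("word", word))      # ask the main thread to save the note
        download(word)                  # the slow part
        events.put(("counter", count))  # progress update
    events.put(("done", len(words)))


def run_import(words, download, save_note, set_progress):
    """Main thread: consume events and do all storage writes here."""
    events = queue.Queue()
    thread = threading.Thread(target=worker, args=(words, download, events))
    thread.start()
    while True:
        kind, payload = events.get()
        if kind == "word":
            save_note(payload)
        elif kind == "counter":
            set_progress(payload)
        else:  # "done"
            break
    thread.join()


saved, progress = [], []
run_import(["cat", "dog"], download=lambda w: None,
           save_note=saved.append, set_progress=progress.append)
print(saved, progress)  # ['cat', 'dog'] [1, 2]
```

Since a single worker feeds a FIFO queue, notes are saved and the counter advances in the same order the words were processed.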

Full code of the GUI module
    # -*- coding: utf-8 -*-
    import locale
    import os
    import platform
    import socket
    import urllib2

    from anki import notes
    from aqt import mw
    from aqt.utils import showInfo
    from PyQt4.QtGui import (QDialog, QIcon, QPushButton, QHBoxLayout,
                             QVBoxLayout, QLineEdit, QFormLayout,
                             QLabel, QProgressBar, QCheckBox)
    from PyQt4.QtCore import QThread, SIGNAL

    from lingualeo import connect
    from lingualeo import utils
    from lingualeo import styles


    class PluginWindow(QDialog):
        def __init__(self, parent=None):
            QDialog.__init__(self, parent)
            self.initUI()

        def initUI(self):
            self.setWindowTitle('Import From LinguaLeo')

            # Window Icon
            if platform.system() == 'Windows':
                path = os.path.join(os.path.dirname(__file__), 'favicon.ico')
                loc = locale.getdefaultlocale()[1]
                path = unicode(path, loc)
                self.setWindowIcon(QIcon(path))

            # Buttons and fields
            self.importButton = QPushButton("Import", self)
            self.cancelButton = QPushButton("Cancel", self)
            self.importButton.clicked.connect(self.importButtonClicked)
            self.cancelButton.clicked.connect(self.cancelButtonClicked)
            loginLabel = QLabel('Your LinguaLeo Login:')
            self.loginField = QLineEdit()
            passLabel = QLabel('Your LinguaLeo Password:')
            self.passField = QLineEdit()
            self.passField.setEchoMode(QLineEdit.Password)
            self.progressLabel = QLabel('Downloading Progress:')
            self.progressBar = QProgressBar()
            self.checkBox = QCheckBox()
            self.checkBoxLabel = QLabel('Unstudied only?')

            # Main layout - vertical box
            vbox = QVBoxLayout()

            # Form layout
            fbox = QFormLayout()
            fbox.setMargin(10)
            fbox.addRow(loginLabel, self.loginField)
            fbox.addRow(passLabel, self.passField)
            fbox.addRow(self.progressLabel, self.progressBar)
            fbox.addRow(self.checkBoxLabel, self.checkBox)
            self.progressLabel.hide()
            self.progressBar.hide()

            # Horizontal layout for buttons
            hbox = QHBoxLayout()
            hbox.setMargin(10)
            hbox.addStretch()
            hbox.addWidget(self.importButton)
            hbox.addWidget(self.cancelButton)
            hbox.addStretch()

            # Add form layout, then stretch and then buttons in main layout
            vbox.addLayout(fbox)
            vbox.addStretch(2)
            vbox.addLayout(hbox)

            # Set main layout
            self.setLayout(vbox)
            # Set focus for typing from the keyboard
            # You have to do it after creating all widgets
            self.loginField.setFocus()

            self.show()

        def importButtonClicked(self):
            login = self.loginField.text()
            password = self.passField.text()
            unstudied = self.checkBox.checkState()
            self.importButton.setEnabled(False)
            self.checkBox.setEnabled(False)
            self.progressLabel.show()
            self.progressBar.show()
            self.progressBar.setValue(0)

            self.threadclass = Download(login, password, unstudied)
            self.threadclass.start()
            self.connect(self.threadclass, SIGNAL('Length'),
                         self.progressBar.setMaximum)
            self.setModel()
            self.connect(self.threadclass, SIGNAL('Word'), self.addWord)
            self.connect(self.threadclass, SIGNAL('Counter'),
                         self.progressBar.setValue)
            self.connect(self.threadclass, SIGNAL('FinalCounter'),
                         self.setFinalCount)
            self.connect(self.threadclass, SIGNAL('Error'),
                         self.setErrorMessage)
            self.threadclass.finished.connect(self.downloadFinished)

        def setModel(self):
            self.model = utils.prepare_model(mw.col, utils.fields,
                                             styles.model_css)

        def addWord(self, word):
            """
            Note is an SQLite object in Anki so you need
            to fill it out inside the main thread
            """
            utils.add_word(word, self.model)

        def cancelButtonClicked(self):
            if hasattr(self, 'threadclass') and not self.threadclass.isFinished():
                self.threadclass.terminate()
            mw.reset()
            self.close()

        def setFinalCount(self, counter):
            self.wordsFinalCount = counter

        def setErrorMessage(self, msg):
            self.errorMessage = msg

        def downloadFinished(self):
            if hasattr(self, 'wordsFinalCount'):
                showInfo("You have %d new words" % self.wordsFinalCount)
            if hasattr(self, 'errorMessage'):
                showInfo(self.errorMessage)
            mw.reset()
            self.close()


    class Download(QThread):
        def __init__(self, login, password, unstudied, parent=None):
            QThread.__init__(self, parent)
            self.login = login
            self.password = password
            self.unstudied = unstudied

        def run(self):
            words = self.get_words_to_add()
            if words:
                self.emit(SIGNAL('Length'), len(words))
                self.add_separately(words)

        def get_words_to_add(self):
            leo = connect.Lingualeo(self.login, self.password)
            try:
                status = leo.auth()
                words = leo.get_all_words()
            except urllib2.URLError:
                self.msg = "Can't download words. Check your internet connection."
            except ValueError:
                try:
                    self.msg = status['error_msg']
                except:
                    self.msg = "There's been an unexpected error. Sorry about that!"
            if hasattr(self, 'msg'):
                self.emit(SIGNAL('Error'), self.msg)
                return None
            if self.unstudied:
                words = [word for word in words
                         if word.get('progress_percent') < 100]
            return words

        def add_separately(self, words):
            """
            Divides downloading and filling note to different threads
            because you cannot create SQLite objects outside the main
            thread in Anki. Also you cannot download files in the main
            thread because it will freeze GUI
            """
            counter = 0
            problem_words = []
            for word in words:
                self.emit(SIGNAL('Word'), word)
                try:
                    utils.send_to_download(word)
                except (urllib2.URLError, socket.error):
                    problem_words.append(word.get('word_value'))
                counter += 1
                self.emit(SIGNAL('Counter'), counter)
            self.emit(SIGNAL('FinalCounter'), counter)
            if problem_words:
                self.problem_words_msg(problem_words)

        def problem_words_msg(self, problem_words):
            error_msg = ("We weren't able to download media for these "
                         "words because of broken links in LinguaLeo "
                         "or problems with an internet connection: ")
            for problem_word in problem_words[:-1]:
                error_msg += problem_word + ', '
            error_msg += problem_words[-1] + '.'
            self.emit(SIGNAL('Error'), error_msg)


Add error handling and that's it: the add-on is ready to go.

There are plans to port the plugin to Python 3 for the new, still-beta versions of Anki, and to download media files asynchronously to speed things up.
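One possible shape for the planned speed-up, offered only as a sketch and not as the author's design: in Python 3 a thread pool from `concurrent.futures` can fetch several media files at once while still collecting the URLs that failed. The function name `download_all`, the `max_workers` default, and the stub `fake_download` are assumptions; in Python 3 both `urllib.error.URLError` and `socket.error` are `OSError` subclasses, so catching `OSError` covers the same failures the add-on handles today.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(urls, download, max_workers=4):
    """Run download(url) for every URL in a thread pool; return failed URLs."""
    failed = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [(url, pool.submit(download, url)) for url in urls]
        for url, future in futures:
            try:
                future.result()  # re-raises any exception from the worker
            except OSError:
                failed.append(url)
    return failed


def fake_download(url):
    """Hypothetical stand-in for a real download; fails on 'broken' URLs."""
    if "broken" in url:
        raise OSError("cannot fetch " + url)


print(download_all(["a.mp3", "broken.png", "b.mp3"], fake_download))
# ['broken.png']
```

The pool still must not touch Anki's collection; as in the current design, only the downloads would move off the main thread.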

Source code on BitBucket.
Plugin page on the Anki Add-ons forum.

Source: https://habr.com/ru/post/345864/

