Character content other than whitespace is not allowed because the content type is 'element-only'. Tag - <cite> , 245 times.
empty tag . Tag - <td> , 19 times.
However, my Kindle file (after converting to .mobi) perfectly displays the resulting file.
And from an informal point of view?
UPD: Fixed.I had a bug.Download version from repository. There is a strange bug with the disappearance of the spaces around the tags inside the text. That is, the HTML code of yet another bicycle turns into yet another bicycle . This is probably a BeautifulSoup bug, but maybe I have a bug somewhere.
Anything interesting in implementation?
Not really. All parsing comes down to the right library calls. After it, I download all the pictures and replace tags. on <image l:href="#image_id"/> . Then with the help of a small set of crutches I rearrange the parsing trees. Delete some tags, replace others, insert third ones. Finally, I collect all this together, add a header, a cellar and write to the file. The only not quite trivial moment is the replacement of <br> tags with . - . - conversion.py , make_paragraphs .. - . - conversion.py , make_paragraphs .
Quite possible. I am only exploring the possibilities of BeautifulSoup, so for sure some things did not go the way it is done. Write in the comments your point of view, discuss.