
Provide EPUB sanitizer
Open, Low, Public, Feature

Description

Author: stf

Description:
EPUB is an open format for e-books. Even though EPUB files are not really easy to create, the format's XML-based design enables broad use. I expect a lot of Wikimedia-related EPUBs, e.g. from Wikipedia, Wikisource or Wikibooks pages, which would be nice to store right in the projects, next to their sources.


Version: unspecified
Severity: enhancement

Details

Reference
bz17858

Event Timeline

bzimport raised the priority of this task from to Low. Nov 21 2014, 10:31 PM
bzimport set Reference to bz17858.
bzimport added a subscriber: Unknown Object (MLST).

jeluf wrote:

EPUB is a ZIP file containing (X)HTML files. We should not distribute these without sanitizing them first. Even though JavaScript is not part of the EPUB specification, we can't be sure that browser plugins properly disable the browser's JavaScript engine.
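To make the concern concrete, here is a minimal sketch of a first-pass check, not a proposed implementation. It assumes Python's standard zipfile and html.parser modules are good enough for a rough scan; the names ActiveContentFinder and find_active_content are made up for illustration. It only reports script elements, on* event handler attributes and javascript: URLs in the (X)HTML members of the archive. It does not rewrite anything, and it ignores other possible carriers such as SVG or CSS.

```python
import io
import zipfile
from html.parser import HTMLParser

# Assumption: only members with these suffixes are treated as markup.
MARKUP_SUFFIXES = (".xhtml", ".html", ".htm")


class ActiveContentFinder(HTMLParser):
    """Hypothetical helper: collects signs of active content in one document."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "script":
            self.findings.append("script element")
        for name, value in attrs:
            if name.startswith("on"):
                # Rough heuristic for onclick, onload, and similar handlers.
                self.findings.append(f"event handler attribute {name}")
            elif value and value.strip().lower().startswith("javascript:"):
                self.findings.append(f"javascript: URL in {name}")


def find_active_content(epub):
    """Return (member name, finding) pairs for an EPUB path or file object."""
    report = []
    with zipfile.ZipFile(epub) as archive:
        for member in archive.namelist():
            if not member.lower().endswith(MARKUP_SUFFIXES):
                continue
            parser = ActiveContentFinder()
            parser.feed(archive.read(member).decode("utf-8", errors="replace"))
            parser.close()
            report.extend((member, finding) for finding in parser.findings)
    return report


# Tiny in-memory archive to exercise the scan (not a valid EPUB).
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as archive:
    archive.writestr("clean.xhtml", "<html><body><p>Hello</p></body></html>")
    archive.writestr(
        "bad.xhtml",
        '<html><body onload="x()"><script>alert(1)</script>'
        '<a href="javascript:void(0)">link</a></body></html>',
    )
sample.seek(0)
for member, finding in find_active_content(sample):
    print(member, finding)
```

A real sanitizer would more likely work from an allowlist of elements and attributes and rewrite the archive, rather than report on a blocklist as this sketch does.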

> changed bug summary, keywords, product

Might be interesting, but as noted it would need some special support for inline reading, sanitization, etc.

There exists a tool to validate such files at http://code.google.com/p/epubcheck/ which might be useful here.

I'm resetting the priority field. You really shouldn't be touching those unless you're a developer, and you definitely shouldn't mess with them without an explanation as to why.

It would be really great if Commons could support EPUB. It is one of the main book formats on Project Gutenberg, which has 100,000 public domain books available with its partners. Also, as far as I understand, fixing the validation problem for EPUB will also fix the same problem for the OpenDocument format.

I agree about the need for strong Commons support for EPUB files. Commons can be seen as a shared multimedia repository, and books too are "media". In my vision, Wikisource projects should be considered "the typographies" and Commons "the library"; a central library could be managed with robust librarian techniques, combining the best skills of MediaWiki people.

There's another way to do this, with a creative use of existing DjVu files; I'm testing such a fuzzy idea. Is anyone here interested?

@Alex_brollo, I think plenty of people would be interested, and things like this are mostly held back by the lack of people able to work on them, so if you want to experiment with this, by all means go ahead.

It would be nice to be able to export a pandoc-friendly MediaWiki format, and thus be able to create several formats, including EPUB.

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 11:01 AM