Gutenberging

As an Aberdonian who likes his culture, Project Gutenberg for me is pretty close to heaven. And since I’ve been kindled up, the old problem of reading books on the computer screen has fallen away. Project Gutenberg is far from perfect, however: a lot of its earlier books were the work of one man and his scanner, and are full of assorted mistakes. Worse, the obvious ‘big’ books (Dickens, Shakespeare and so on) were produced first and therefore worst, but they are exactly the ones people download most often. Problem.

Enter Distributed Proofreading. The DP project aims to produce high-quality e-texts by having a community of volunteers check the scans and mark up the formatting. Although the volunteers aren’t qualified proofreaders, the system is meant to compensate by passing each page through several rounds (currently three proofreading rounds, two formatting rounds, plus post-processing), in each of which successively more experienced eyeballs pick over each letter and punctuation mark.

From what I’ve seen, the system works really rather well. Pedantic book-lovers with time on their hands and a touch of OCD are pretty good at spotting stray or missing commas and exotic diacritics, or checking other editions of the work to find out whether words at the end of printed lines should stay hyphenated. DP is now the main producer of Project Gutenberg’s books, so the more recent additions should be of good quality, with HTML (which easily converts to Kindle or EPUB formats) as well as text versions available. The problem of low-quality existing texts remains, though some of these are now being re-done. Unfortunately, PG by default sorts its texts by the number of downloads, so older texts with downloads already accumulated get prime position. (Tip of the day: editions with fewer downloads are likely to be newer and better-produced.)

While most of DP’s activities require signing up for an account, anyone can ‘smooth read’ a text (http://www.pgdp.net/c/tools/post_proofers/smooth_reading.php). Many books are released for this stage before being uploaded to PG, so that people can read the book as a whole and pick up on any last mistakes or oddities. If one takes your fancy, check the ‘Days left’ column to see if you’ll have enough time to read it, then download away. They’re generally plain text files with longish lines, so Kindles are best rotated 90°…
