Scott Floyd

Book Genome Project


Does anybody listen to Pandora? Is anybody blown away by how awesome it is?? Let's take readers' advisory to the next level and make a Pandora-style service for books!!

They did it via the Music Genome Project: http://en.wikipedia.org/wiki/Music_Genome_Project

If anything, this is a great topic as an emerging technology.


Let's make a project like this come from the library world, not let a vendor steal the thunder (& charge a whole lot of money).

Gale offers a (conceptually) similar service with its "What Do I Read Next?" product. This, however, is a mindless program that simply pulls from subject headings and doesn't really get at the qualitative aspects of books.

Thanks for reading. Obviously this is in the initial stages of thought. As far as I know, and from what I've looked into, no one in the library field is working on this.



Scott Floyd

Julia Bauder

There was a brief discussion of this concept on the ANTHROLIB listserv last year, but nothing came of it. Still, I said then, and I'll say again -- while I don't have the time (or technical chops!) to do it by myself, I will happily contribute however I can to a project like this. This is exactly the kind of thing libraries ought to be doing if we want to stay relevant.

Scott Floyd

Thanks for the input! I guess I didn't do enough background searching, because there have been some serious advancements since Gale's product. I checked out a couple of these, and they appear to be pretty good. Still, none of them is as effective as a genome-type project would be, but then again, Pandora took almost 10 years. I still think a more qualitative framework for describing books could be established. There is plenty of room to improve how we find new books (that you'll like), so any more brainstorming is welcome.






Neil Mussett (non-member)

Hi Scott,

I think you have a fantastic idea!

A few thoughts from a (non-librarian) database programmer:

  1. A genome-type engine would be much better for books than the sites you mention, because it could recommend books from far fewer ratings by the user (or just one). It is easy to rate hundreds of songs (or movies), but difficult to do the same for books. It is also very difficult for these sites to recommend an obscure author, because unless someone has already rated that author's books favorably or added them to a list, the engine does not know much about them.
  2. Your engine will be much more difficult to maintain than the Music Genome Project because of the time it will take to develop and answer the questions for each book. The MGP folks have reported that it takes them about half an hour per song. I think it makes sense to propose this idea to librarians, because it will take some kind of large-scale community of experts to make this work.
  3. Your database will always be profoundly incomplete, so it might be helpful to start with a manageable sub-sub-category of books (what is your own expertise?). 
  4. My understanding of the MGP database is that there are just a few major categories, and once a song is identified as falling under one (e.g. classical vs. rock), all of the attributes under that category are required. Your database might want to take a looser, tree-based approach, where some trees have far more sub-trees than others (there are a lot more questions to answer to give the 'genome' of a mystery novel than of a government pamphlet).
  5. There are other qualitative standards you may want to research, including the PANOSE specification for typefaces (cf. Fonts and Encodings by Y. Haralambous). The goal with PANOSE is to be able to substitute one typeface for another without the user being aware that a change has been made.
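Points 1 and 4 can be made concrete with a small sketch. This is a minimal, hypothetical illustration, not how the Music Genome Project or any existing service actually works. It assumes each book is scored on numeric attributes (say 0 to 5), that attributes hang off a tree of categories so a mystery novel picks up more required attributes than a pamphlet, and that a single liked book is enough to rank the rest. The category names, attributes, titles, scores, and the similarity formula are all made up for illustration.

```python
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class Category:
    """One node in a hypothetical attribute tree. Child categories add
    more specific attributes, so some branches grow deeper than others."""
    name: str
    attributes: list[str] = field(default_factory=list)
    children: dict[str, "Category"] = field(default_factory=dict)

    def add(self, child: "Category") -> "Category":
        self.children[child.name] = child
        return child

    def required_attributes(self, path: list[str]) -> list[str]:
        """Collect every attribute required along a path like ["fiction", "mystery"]."""
        attrs = list(self.attributes)
        node = self
        for step in path:
            node = node.children[step]
            attrs.extend(node.attributes)
        return attrs


Genome = dict[str, float]  # attribute name -> score (assumed 0-5 scale)


def missing_attributes(root: Category, path: list[str], genome: Genome) -> list[str]:
    """List the required attributes a cataloger has not filled in yet."""
    return [a for a in root.required_attributes(path) if a not in genome]


def similarity(a: Genome, b: Genome) -> float:
    """Closeness on shared attributes, scaled by the fraction of the combined
    attribute set the two books share (0.0 if they share nothing)."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    overlap = len(shared) / len(a.keys() | b.keys())
    distance = sqrt(sum((a[k] - b[k]) ** 2 for k in shared))
    return overlap / (1.0 + distance)


def recommend(liked: str, genomes: dict[str, Genome], n: int = 3) -> list[str]:
    """Rank every other book by similarity to one liked book."""
    target = genomes[liked]
    others = [title for title in genomes if title != liked]
    others.sort(key=lambda title: similarity(target, genomes[title]), reverse=True)
    return others[:n]


# Hypothetical example data: titles, attributes, and scores are invented.
root = Category("book", ["pace", "tone"])
fiction = root.add(Category("fiction", ["character_depth"]))
fiction.add(Category("mystery", ["puzzle_complexity", "violence"]))
root.add(Category("pamphlet", ["readability"]))

genomes: dict[str, Genome] = {
    "Mystery A": {"pace": 4, "tone": 2, "character_depth": 3, "puzzle_complexity": 5, "violence": 2},
    "Mystery B": {"pace": 4, "tone": 2, "character_depth": 4, "puzzle_complexity": 4, "violence": 1},
    "Pamphlet C": {"pace": 1, "tone": 3, "readability": 2},
}

print(missing_attributes(root, ["fiction", "mystery"], {"pace": 3, "tone": 1}))
# ['character_depth', 'puzzle_complexity', 'violence']
print(recommend("Mystery A", genomes, n=2))
# ['Mystery B', 'Pamphlet C']
```

Scaling by overlap is one simple way to keep a book that matches closely on only a couple of shared attributes from outranking one that is close on many; it is an assumption for this sketch, not a tested design.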

Good luck!

-Neil Mussett

Scott Floyd

Hi Neil,

Great feedback, thank you. I just saw your reply. I believe this topic still has merit and I should continue to pursue it. With more investigation it might be a worthy emerging technologies topic for a future conference.


Thanks again,



Jesse Watson (non-member)

Hi Neil,

I'm also a non-librarian, but rather an author, and coincidentally, a software architect.

I think the idea of a Book Genome Project would be fantastic... Has there been any more activity along these lines?

I also agree with your analysis, especially the idea that the biggest challenge would be the work of categorizing each book. Unlike a song, whose attributes are rather inarguable (either the lead guitar has distortion or it doesn't), many attributes of a book would be open for debate (is the writing style casual or transparent, is the genre horror or suspense, etc.).

In fact, that aspect of it seems so difficult that it might be necessary to make the index openly user-maintained, like Wikipedia.  Let the debate surrounding each book's attributes happen in a public forum, and then let the community directly edit the database in a revision-controlled manner with self-appointed librarians (also like Wikipedia).  The definition of each attribute and its meaning would be more tightly controlled, but could still be open for public commentary.  All that said, actual librarians might end up being heavy contributors to such an index, and thus involvement from the ALA could be invaluable.
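The openly edited, revision-controlled index described above can be sketched as an append-only edit history. This is a minimal, hypothetical sketch: the class names, fields, and revert rule are assumptions for illustration, not Wikipedia's or any library's actual data model. Nothing is overwritten, the current attributes are rebuilt by replaying the edits, and a revert is simply another edit.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Revision:
    editor: str
    attribute: str
    value: Optional[str]  # None means "attribute unset"
    comment: str = ""


@dataclass
class BookRecord:
    """Hypothetical wiki-style record: changes are appended to a history,
    never overwritten, so any edit can be inspected or rolled back."""
    title: str
    history: list[Revision] = field(default_factory=list)

    def edit(self, editor: str, attribute: str, value: Optional[str], comment: str = "") -> None:
        self.history.append(Revision(editor, attribute, value, comment))

    def current(self) -> dict[str, str]:
        """Replay the history; the latest revision of each attribute wins."""
        state: dict[str, str] = {}
        for rev in self.history:
            if rev.value is None:
                state.pop(rev.attribute, None)
            else:
                state[rev.attribute] = rev.value
        return state

    def revert(self, index: int, editor: str) -> None:
        """Roll the attribute touched by revision `index` back to the value it
        had just before that revision, recorded as a new revision."""
        target = self.history[index]
        earlier = [r for r in self.history[:index] if r.attribute == target.attribute]
        restored = earlier[-1].value if earlier else None
        self.edit(editor, target.attribute, restored, f"revert revision {index}")


record = BookRecord("Example Novel")  # hypothetical title and editors
record.edit("alice", "genre", "horror")
record.edit("bob", "genre", "suspense", "more dread than gore")
record.revert(1, "alice")
print(record.current())  # {'genre': 'horror'}
```

Keeping reverts as ordinary revisions means a debate such as horror versus suspense stays visible in the history instead of disappearing when someone changes the value.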

Anyway, just thinking out loud... Let me know if anyone out there is interested enough in this to spend some time on it.





Frank Skornia

Not to raise an old thread from the dead, but this does sound interesting and promising.

One project I came across about 8 months ago that looks very promising is Booklamp (http://www.booklamp.org). The project describes itself as an attempt to do for books what Pandora did for music. It was originally pitched as an idea to Google (the founder of Booklamp has an interesting blog chronicling his experiences presenting to the Big G), but he ultimately ended up venturing out on his own. What's interesting about how Booklamp works is that the software analyzes the book itself, without human interaction.

Booklamp is still in very early development, and while they have a demo up on their site, it will improve greatly as more content is added. Right now they have a lot of science fiction and fantasy provided by the Baen Free Library. It also seems like a project that would benefit greatly from access to the content Google has scanned for Google Books: not to provide the texts, but to build a more complete database of information.

I'm excited to see librarians looking into this idea, though, and I look forward to seeing what comes of it.

Scott Floyd

Thanks, it sounds like they have some interesting technology under development. And I don't think you're raising the post from the dead, mainly because this technology is in its infant stages. I'm sure in a few years we'll all be plugging into this type of API left and right.

Anyway, a tool that effectively captures our reading interests is surely on the way!