Mondotofu wrote:Ideally, digital publishing (or conversion) would embed each digital product with useful metadata such as a Marc record, but for the DIY crowd, few of us are librarians. Perhaps a metadata transfer from a repository would help on converting the pricey college textbooks, but what about home-brew collections? Now, that's an interesting problem to work on.
I mentioned MARC records precisely because they are already easily available for basically any book! There is no need for a freedb for books. Many library catalogs offer access to the full MARC records. Ideally, for books with an ISBN, we could just search for it in the LOC catalog, grab the MARC record, stick it in the folder with the images, and then compile it along with our images into the archival format of choice. For older books, it's a bit more work to find the exact edition, but for most things the LOC will have it. There is no way that I would ever take the time to write a MARC record! (Catalogers are very special and rare people.)
Code to pull metadata out of MARC records already exists, and there are software libraries that can be used in conversion programs and for other data mining activity. There are a ridiculous number of book metadata formats (each with its own benefits and problems!), so there are also tons of conversion scripts, which I managed to find with some quick Google searches. It seems that lots of popular libraries and scripts, such as Bibutils, use LOC's MODS format as an intermediary.
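To give a rough idea of how little code it takes to pull the basics out of a MARC record, here is a minimal sketch in Python using only the standard library. It assumes the record has been saved in MARCXML form (the XML serialization of MARC 21) and reads the usual fields: 020 for ISBN, 100 for main author, 245 for title, 260 for publisher and date. The sample record and the helper names `subfield` and `extract_basic_metadata` are made up for illustration; a real record from a catalog will have many more fields and quirks.

```python
import xml.etree.ElementTree as ET

# Made-up MARCXML record for illustration only; a real one would come
# from a library catalog export.
SAMPLE_MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="020" ind1=" " ind2=" ">
    <subfield code="a">0000000000</subfield>
  </datafield>
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Doe, Jane.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title /</subfield>
    <subfield code="c">Jane Doe.</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="b">Example Press,</subfield>
    <subfield code="c">1999.</subfield>
  </datafield>
</record>"""

NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def subfield(record, tag, code):
    """Return the first matching subfield's text, or None if it is absent."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    node = record.find(path, NS)
    if node is None or node.text is None:
        return None
    # MARC subfields usually end in ISBD punctuation (" /", ",", ".").
    # This crude strip also eats a trailing period after an initial.
    return node.text.strip().rstrip(" /,.:;")


def extract_basic_metadata(marcxml_text):
    """Map a handful of common MARC fields to a plain dict."""
    record = ET.fromstring(marcxml_text)
    return {
        "isbn": subfield(record, "020", "a"),
        "author": subfield(record, "100", "a"),
        "title": subfield(record, "245", "a"),
        "publisher": subfield(record, "260", "b"),
        "year": subfield(record, "260", "c"),
    }


if __name__ == "__main__":
    print(extract_basic_metadata(SAMPLE_MARCXML))
```

For anything beyond a quick experiment, one of the existing MARC libraries would be a safer bet than hand-rolled XML parsing, since they also handle the binary MARC format and repeated fields.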
Mondotofu wrote:when you're searching PubMed, you're searching a lot of metadata first, before plunging into their compilation of articles on medicine.
Exactly! I like the idea of abstracts in metadata, even though for most things using uniform subject headings would probably be enough. Most of the abstract databases that I use (RILM, Arts and Humanities Citation Index, etc.) have the ability to export their data in several formats, so we could probably convert those into an appropriate ebook metadata format depending on need. (That's the ultimate problem with metadata: everyone has different needs.)
strider1551 wrote:Does anyone (esp. abmartin) have any experience with xmp for storing metadata?
I played around with XMP a few years ago to tag music files, before going back to ID3 tags (neither can do the level of detail I want...). It looks absolutely brilliant for our purposes! Since it works with so many formats, it could be a great way to go. It seems as if there are two different sources for the technology; the more important one for any detailed metadata is the BibTeX tags. I'll have to try it out on some files to see how detailed the XMP actually is. The spec sheet doesn't really explain whether it can handle all BibTeX tags or just some, but since it is extensible it would probably handle them. Since the two are so analogous, it probably wouldn't be too difficult to just type up a BibTeX file for each book and then convert that into the XMP in the DjVu file! BibTeX files are easy enough to create manually, and it is very easy to convert all of the other bib formats into a BibTeX file.
I'll actually have to give it a try and see how it works. (If I can manage to figure out how to do it...)
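In the meantime, here is a minimal sketch of the "type a BibTeX file for each book" step, done from a plain Python dict instead of by hand. The function names `bibtex_escape` and `to_bibtex`, the citation key, and the sample values are all hypothetical, and the escaping is deliberately simplistic. It does not touch the XMP or DjVu side at all, since I haven't tested that yet.

```python
def bibtex_escape(value):
    """Escape a few characters that are special in LaTeX.

    Assumption: only these four matter for this sketch; backslashes and
    braces already present in the value are left untouched.
    """
    for ch in ("&", "%", "#", "_"):
        value = value.replace(ch, "\\" + ch)
    return value


def to_bibtex(key, fields, entry_type="book"):
    """Render one BibTeX entry from a dict of field name -> value.

    Empty or missing values are skipped so the entry stays tidy.
    """
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        if value:
            lines.append(f"  {name} = {{{bibtex_escape(str(value))}}},")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample values; in practice these would come from the
    # catalog record for the scanned book.
    entry = to_bibtex(
        "doe1999",
        {
            "author": "Doe, Jane",
            "title": "An Example Title",
            "publisher": "Example Press",
            "year": "1999",
        },
    )
    print(entry)
```

The resulting .bib file could sit in the folder next to the page images until the conversion into the final format is worked out.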
