url: http://dark.fhtr.org/repos/metadata
tarball: http://dark.fhtr.org/repos/metadata/metadata-0.1.tar.gz


Description
-----------

This package `Metadata' comes with a library called `metadata' and a small
program called `mdh'.

The library probes files for their metadata (e.g. JPEG dimensions and camera
make, MP3 artist, PDF word count) and returns the metadata as a Hash.

Mdh can print out file metadata as YAML and package the metadata with the file.

This package has many dependencies because there is no single universal
metadata header format that all files use. Blame resource forks, filename
extensions, bags of bytes and mimetypes.

The metadata hash mostly follows the naming of the freedesktop.org shared file
metadata spec:
http://wiki.freedesktop.org/wiki/Specifications/shared-filemetadata-spec


Usage
-----

  # print out the metadata header
  mdh -p myfile.jpg

  # create myfile.jpg.mdh, which consists of the metadata header followed by myfile.jpg
  mdh myfile.jpg

  # print out the metadata header of an mdh file
  mdh -e -p myfile.jpg.mdh

  # strip the metadata header from an mdh file and save the original file as myfile.jpg
  mdh -e myfile.jpg.mdh

  irb> Metadata.extract('myfile.jpg')
  irb> Metadata.extract_text('myfile.pdf')
  irb> Pathname.new("myfile.jpg").metadata


Requirements
------------

* Ruby 1.8
* Tons of metadata extraction programs; the Debian packages are:
    dcraw libimlib2-ruby extract libimage-exiftool-perl poppler-utils
    mplayer html2text imagemagick unhtml pstotext antiword catdoc
    shared-mime-info
* Install the latest versions of dcraw and shared-mime-info to be able to
  handle camera raw images:
    http://cybercom.net/~dcoffin/dcraw/
    http://freedesktop.org/wiki/Software/shared-mime-info
* Python + the chardet library
    http://chardet.feedparser.org/


License
-------

Ruby's license.

Ilmari Heikkinen <ilmari.heikkinen gmail com>
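The idea behind an .mdh file in the Usage section (a metadata header followed by the original file bytes) can be pictured with the small Ruby sketch below. It is only an illustration: it does not use the `metadata' library, and the container layout (a decimal header length, a newline, a YAML header, then the file bytes) is a made-up assumption, not the documented .mdh format. The helper names `pack_with_header' and `unpack_header' are hypothetical. The sketch uses String#force_encoding, so it needs Ruby 1.9 or later rather than the Ruby 1.8 listed under Requirements.

```ruby
require 'yaml'

# HYPOTHETICAL layout, not the real .mdh format:
#   <header byte length in decimal>\n<YAML header><original file bytes>
def pack_with_header(metadata, data)
  # Work on binary copies so non-ASCII metadata and raw file bytes concatenate safely.
  header = metadata.to_yaml.dup.force_encoding(Encoding::BINARY)
  body   = data.dup.force_encoding(Encoding::BINARY)
  "#{header.bytesize}\n".force_encoding(Encoding::BINARY) + header + body
end

# Splits a blob produced by pack_with_header back into [metadata_hash, data].
def unpack_header(blob)
  blob    = blob.dup.force_encoding(Encoding::BINARY)
  newline = blob.index("\n") or raise ArgumentError, "missing header length line"
  length  = Integer(blob[0...newline])
  start   = newline + 1
  # The YAML header was written as UTF-8 text, so reinterpret it as such before parsing.
  header  = blob[start, length].force_encoding(Encoding::UTF_8)
  data    = blob[(start + length)..-1]
  [YAML.load(header), data]
end

# Illustrative metadata Hash; the key names are assumed to resemble the
# shared file metadata spec naming and are not output from Metadata.extract.
meta = { "Image.Width" => 640, "Image.Height" => 480, "Image.CameraMake" => "Canon" }
fake_jpeg = "\xFF\xD8\xFF\xE0 not really a jpeg".force_encoding(Encoding::BINARY)

blob = pack_with_header(meta, fake_jpeg)
puts meta.to_yaml                       # roughly what a YAML metadata header looks like
restored_meta, restored_data = unpack_header(blob)
puts restored_meta == meta && restored_data == fake_jpeg
```

Prefixing the header with its byte length means the reader never has to scan the YAML for a terminator, so any file bytes, including ones that look like YAML, survive the round trip unchanged.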