Internet Archive News

updates about archive.org

How Archive.org items are structured

What is an item?

An item is a logical “thing” that we present on one web page on archive.org. An item may be one video file along with scans of the DVD cover, one book, one audio file, a set of audio files that represent a CD, etc.

How do you know whether your files should be in one item or separate items?  You get one metadata file per item.  If the same metadata describes ALL of the files (like a CD), then that’s one item.  If the files are too different to have the same metadata (title, creator, description, etc.), they should be in different items.

How Items Are Structured

Every archive.org item has a URL in this format:
http://archive.org/details/[identifier]
(where [identifier] is unique within our system).

Example: For this item
http://www.archive.org/details/popeye_taxi-turvey
the identifier is popeye_taxi-turvey
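
If you need to pull the identifier out of a details URL in a script, a minimal sketch could look like the following. The function name identifier_from_details_url is made up for this illustration; the only assumption is the /details/[identifier] layout described above.

```python
from urllib.parse import urlparse


def identifier_from_details_url(url: str) -> str:
    """Return the identifier from a URL shaped like .../details/[identifier]."""
    # Split the path into its non-empty leading segments.
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 2 or parts[0] != "details" or not parts[1]:
        raise ValueError(f"not a details URL: {url!r}")
    return parts[1]


if __name__ == "__main__":
    print(identifier_from_details_url(
        "http://www.archive.org/details/popeye_taxi-turvey"))
```

Running it on the example item prints popeye_taxi-turvey.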

An item is just a directory or folder of files. It includes the originally uploaded content file(s), such as audio, video, or text, along with any derivative files we create from the originals and the metadata that describes the item.  To see all files in an item, click the HTTP link in the upper left box on the item page.

That link takes you to a directory listing showing all original, derived, and metadata files for the item.

You can view information about every file in this directory by viewing the file ending in _files.xml (in this example, popeye_taxi-turvey_files.xml). Each file in the item is listed here, along with its source: “original” (uploaded by the user), “derivative” (derived by archive.org), or “metadata”.  You will also find a format designation, various checksums, and sometimes titles for the files.
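
As a rough illustration of reading a _files.xml listing, the sketch below groups files by their source. The embedded XML is a hand-written, hypothetical excerpt: the element and attribute names, the derivative file name, and the placeholder checksum are assumptions for this example, so check them against a real _files.xml before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt. The layout, file names, and values are assumptions
# for illustration, not copied from the real item.
SAMPLE_FILES_XML = """<files>
  <file name="popeye_taxi-turvey.mpg" source="original">
    <format>MPEG2</format>
    <md5>00000000000000000000000000000000</md5>
  </file>
  <file name="popeye_taxi-turvey_512kb.mp4" source="derivative">
    <format>512Kb MPEG4</format>
  </file>
  <file name="popeye_taxi-turvey_meta.xml" source="metadata">
    <format>Metadata</format>
  </file>
</files>"""


def group_by_source(files_xml: str) -> dict:
    """Map each source value to a list of (file name, format) pairs."""
    root = ET.fromstring(files_xml)
    groups = {}
    for entry in root.findall("file"):
        pair = (entry.get("name"), entry.findtext("format"))
        groups.setdefault(entry.get("source"), []).append(pair)
    return groups


if __name__ == "__main__":
    for source, entries in group_by_source(SAMPLE_FILES_XML).items():
        print(source, entries)
```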

To see all of the metadata for the item, view the file ending in _meta.xml (in this example, popeye_taxi-turvey_meta.xml). This file should list all of the pertinent information about the item, such as title, creator, description, etc.  IA’s metadata schema is based on Dublin Core, but it is extremely flexible.  You can add any key=value pair to this file and we will store it and make it searchable in the IA search engine.  (However, it may not automatically show up on the item page.)
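
To show how the key=value pairs in a _meta.xml file might be read, here is a small sketch. The sample document is hypothetical: the root element name, the field values, and the custom key my_custom_key are all assumptions made for this example. Values are collected in lists because a key may appear more than once in the sample.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata file. The structure and values are assumptions for
# illustration only.
SAMPLE_META_XML = """<metadata>
  <identifier>popeye_taxi-turvey</identifier>
  <title>Example title</title>
  <creator>Example creator</creator>
  <subject>example</subject>
  <subject>cartoon</subject>
  <my_custom_key>any value</my_custom_key>
</metadata>"""


def read_metadata(meta_xml: str) -> dict:
    """Return a dict mapping each key to the list of its values."""
    root = ET.fromstring(meta_xml)
    result = {}
    for child in root:
        result.setdefault(child.tag, []).append((child.text or "").strip())
    return result


if __name__ == "__main__":
    for key, values in read_metadata(SAMPLE_META_XML).items():
        print(f"{key} = {values}")
```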

Reviews, if there are any, are contained in the _reviews.xml file.

One thing to note: Many display features on archive.org, among other things, work better if your item’s identifier matches your file name.  So if you’re uploading a file called popeye_taxi-turvey.mpg, it’s best to use the identifier popeye_taxi-turvey (just remove the file extension).  If you’re using the upload button on archive.org, put your desired identifier in the Title field of the upload form.  We turn that into the identifier automatically, and after upload you can go back into the item and change the title to something more readable.
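
The "just remove the file extension" advice can be sketched in a few lines. The helper name identifier_from_filename is hypothetical, and the sketch only strips the final extension; it does not check whether the result is otherwise acceptable as an identifier.

```python
import os


def identifier_from_filename(filename: str) -> str:
    """Suggest an identifier by dropping the directory and final extension."""
    stem, _extension = os.path.splitext(os.path.basename(filename))
    return stem


if __name__ == "__main__":
    print(identifier_from_filename("popeye_taxi-turvey.mpg"))
```

For the example file this prints popeye_taxi-turvey.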

Archival URLs

An item’s “details” page will always be available at
http://archive.org/details/[identifier]

The item directory is always available at
http://archive.org/download/[identifier]

A particular file can always be downloaded from
http://archive.org/download/[identifier]/[filename]
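
The three archival URL patterns above can be built from an identifier with a sketch like this one. The function names are invented for the example, and percent-encoding the identifier and file name with urllib.parse.quote is a defensive assumption rather than something this post specifies.

```python
from urllib.parse import quote

BASE = "http://archive.org"


def details_url(identifier: str) -> str:
    """URL of the item's details page."""
    return f"{BASE}/details/{quote(identifier)}"


def download_dir_url(identifier: str) -> str:
    """URL of the item's directory listing."""
    return f"{BASE}/download/{quote(identifier)}"


def download_file_url(identifier: str, filename: str) -> str:
    """URL of one file inside the item."""
    return f"{BASE}/download/{quote(identifier)}/{quote(filename)}"


if __name__ == "__main__":
    item = "popeye_taxi-turvey"
    print(details_url(item))
    print(download_dir_url(item))
    print(download_file_url(item, "popeye_taxi-turvey_meta.xml"))
```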

Please note: Archival URLs may redirect to the actual server that currently holds the content.  For example,

http://www.archive.org/download/popeye_taxi-turvey

currently redirects to

http://ia600204.us.archive.org/14/items/popeye_taxi-turvey/

DO NOT LINK to any archive.org URL whose host name is a numbered server like this one (ia600204.us.archive.org).  It refers to the particular machine that we’re serving the file from right now, but we move items to new servers all the time.  If you link to this sort of URL, instead of the archival URL, your link WILL break at some point.
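
If you already have server-specific links lying around, something like the sketch below could rewrite them as archival URLs. The regular expression is inferred from the single example in this post (a host like ia600204.us.archive.org followed by /14/items/[identifier]/), so treat the pattern as an assumption; other server URL shapes may exist and would pass through unchanged.

```python
import re

# Pattern guessed from the one example URL in this post. It is an assumption,
# not a complete description of every server URL.
_SERVER_URL = re.compile(
    r"^https?://ia\d+\.us\.archive\.org/\d+/items/"
    r"(?P<identifier>[^/]+)(?P<rest>/.*)?$"
)


def to_archival_url(url: str) -> str:
    """Rewrite a server-specific URL as a /download/ URL; else return it as-is."""
    match = _SERVER_URL.match(url)
    if match is None:
        return url
    rest = match.group("rest") or ""
    return f"http://archive.org/download/{match.group('identifier')}{rest}"


if __name__ == "__main__":
    print(to_archival_url(
        "http://ia600204.us.archive.org/14/items/popeye_taxi-turvey/"))
```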

Originally posted on The Internet Archive Blog by internetarchive.

Written by internetarchive

March 31, 2011 at 4:59 am

