Internet Archive News

updates about archive.org

The Internet Archive Metadata API

The Metadata API is intended for fast, flexible, and reliable reading and writing of Internet Archive items.

Metadata Read API

The Metadata Read API is the fastest and most flexible way to retrieve metadata for items on archive.org. We’ve seen upwards of 500 reads per second for some collections!

Overview

Returns all of an item’s metadata in JSON.

Resource URL

http://archive.org/metadata/:identifier

Parameters

identifier: The globally unique ID of a given item on archive.org.

Usage

For example, frenchenglishmed00gorduoft is the identifier for http://archive.org/details/frenchenglishmed00gorduoft. You can retrieve all of this item’s metadata from the Metadata API using the following curl command:

$ curl http://archive.org/metadata/frenchenglishmed00gorduoft

The Metadata API also supports HTTPS:

$ curl https://archive.org/metadata/frenchenglishmed00gorduoft
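
For scripted use, the same request can be wrapped in small shell functions. The following is a minimal sketch, not an official client: the function names metadata_url and fetch_metadata are hypothetical, and only the URL pattern comes from the text above.

```shell
#!/bin/bash
# Hypothetical helper names; only the URL pattern is taken from the post.

# Build the Metadata Read API URL for an item identifier.
metadata_url() {
  printf 'https://archive.org/metadata/%s\n' "$1"
}

# Fetch an item's metadata as JSON.
# --fail makes curl exit non-zero on HTTP errors instead of printing the error page.
fetch_metadata() {
  curl --silent --show-error --fail "$(metadata_url "$1")"
}

# Example (requires network access):
# fetch_metadata frenchenglishmed00gorduoft
```

Keeping the URL construction separate from the network call makes it possible to check the URL without contacting archive.org.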

Sub-item Access

The Metadata API returns all of an item’s metadata by default. You can access specific metadata elements like so:

http://archive.org/metadata/:identifier/metadata

http://archive.org/metadata/:identifier/server

http://archive.org/metadata/:identifier/files_count

http://archive.org/metadata/:identifier/files?start=1&count=2

http://archive.org/metadata/:identifier/metadata/collection

http://archive.org/metadata/:identifier/metadata/collection/0

http://archive.org/metadata/:identifier/metadata/title

http://archive.org/metadata/:identifier/files/0/name
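
Sub-item paths can be appended from a script in the same way. A minimal sketch, assuming a hypothetical helper named metadata_sub_url; the paths themselves are the ones listed above:

```shell
#!/bin/bash
# metadata_sub_url is a hypothetical helper name, not part of the API.
# It joins an identifier and a sub-item path such as "metadata/title".
metadata_sub_url() {
  printf 'https://archive.org/metadata/%s/%s\n' "$1" "$2"
}

# Example calls (require network access). Quote any URL that contains ? or &,
# otherwise the shell treats & as "run the command in the background":
# curl --silent "$(metadata_sub_url frenchenglishmed00gorduoft metadata/title)"
# curl --silent "$(metadata_sub_url frenchenglishmed00gorduoft 'files?start=1&count=2')"
```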

Metadata Write API

The Metadata Write API is intended to make metadata changes timely, safe, and flexible.
It uses version 02 of the draft JSON Patch standard.

Overview

timely

  • Callers receive results (success or failure) immediately.
  • Changes are quickly reflected through the metadata read API.

safe

  • All writes pass through the catalog, so all changes are recorded.
  • All writes are checked before they’re submitted to the catalog.
  • If there’s a problem, no catalog task is created. Goal: no red rows!
  • All checks are repeated when the catalog task is executed.

flexible

  • Supports arbitrary changes to multiple metadata targets through a unified API.
  • Changes are easy — no string concatenation or libraries needed.

Resource URL

http://archive.org/metadata/:identifier

Parameters

identifier: The globally unique ID of a given item on archive.org.

Targets

The Metadata Write API supports three kinds of target:

metadata: Changes item_meta.xml (e.g. http://archive.org/metadata/:identifier/metadata).
files/:filename: Changes the file entry in the item’s files.xml (e.g. http://archive.org/metadata/:identifier/files).
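other: Any other target name changes the JSON file of the same name; for example, the target foo_client changes foo_client.json (e.g. http://archive.org/metadata/:identifier/other).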

For XML targets (e.g. ‘metadata’ and ‘files’), patches should be composed against their JSON representation, as found in Metadata Read API results.

Usage

As an HTTP POST or GET request to:

http://archive.org/metadata/:identifier

With the following URL-encoded arguments (the leading hyphen in -target and -patch is part of the argument name):

-target: The metadata target you would like to modify.
-patch: The patch you are submitting to the Metadata API.
username: The email address associated with your Archive.org account.
access: Your IA-S3 access key.
secret: Your IA-S3 secret key.
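
These arguments can be bundled into one reusable function. This is a sketch under assumptions: the function name write_metadata and the environment variable names IA_USERNAME, IA_ACCESS, and IA_SECRET are hypothetical, chosen here for illustration; the argument names are the ones listed above.

```shell
#!/bin/bash
# write_metadata IDENTIFIER TARGET PATCH
# Hypothetical wrapper: reads credentials from IA_USERNAME, IA_ACCESS and
# IA_SECRET (names assumed for this sketch) and POSTs the patch.
# --data-urlencode takes care of URL-encoding each value.
write_metadata() {
  curl --silent --show-error \
       --data-urlencode "-target=$2" \
       --data-urlencode "-patch=$3" \
       --data-urlencode "username=$IA_USERNAME" \
       --data-urlencode "access=$IA_ACCESS" \
       --data-urlencode "secret=$IA_SECRET" \
       "https://archive.org/metadata/$1"
}

# Example (requires network access and valid credentials):
# write_metadata metadata_test_item metadata '{"add":"/scan_sponsor", "value":"Starfleet"}'
```

The sketch uses the HTTPS form of the URL so that the credentials are not sent in clear text.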

Authentication

NOTE: These calls must be made with appropriate authentication – at the moment, this means passing your Archive.org username and IA-S3 credentials. Please visit http://archive.org/account/s3.php to obtain your IA-S3 access key and secret key.
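
Because the access and secret keys grant write access, a script may prefer to read them from the environment rather than hard-code them. A minimal sketch, assuming the hypothetical variable names IA_USERNAME, IA_ACCESS, and IA_SECRET and a hypothetical function name require_credentials:

```shell
#!/bin/bash
# require_credentials is a hypothetical helper; the variable names are
# assumptions made for this sketch, not names the API defines.
# Returns 0 when all three values are set and non-empty, 1 otherwise.
require_credentials() {
  if [ -z "${IA_USERNAME:-}" ] || [ -z "${IA_ACCESS:-}" ] || [ -z "${IA_SECRET:-}" ]; then
    echo "Set IA_USERNAME, IA_ACCESS and IA_SECRET before writing metadata." >&2
    return 1
  fi
}

# Example:
# require_credentials || exit 1
```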

Patches

Patches are JSON strings. They should comply with the draft JSON Patch standard:

http://tools.ietf.org/html/draft-ietf-appsawg-json-patch-02
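
As a rough illustration, patch strings can be kept in shell variables. Only the "add" form appears in this post; the "replace" and "remove" forms below are assumptions based on the draft using the operation name as the key, and the variable names and the value ‘Starfleet Academy’ are purely illustrative. Check the draft before relying on them.

```shell
#!/bin/bash
# Single quotes keep the JSON double quotes intact in the shell.

# From the post: add a field.
PATCH_ADD='{"add":"/scan_sponsor", "value":"Starfleet"}'

# Assumed forms (not shown in the post): change or delete an existing field.
PATCH_REPLACE='{"replace":"/scan_sponsor", "value":"Starfleet Academy"}'
PATCH_REMOVE='{"remove":"/scan_sponsor"}'
```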

Examples

Writing to an item’s meta.xml

Add ‘scan_sponsor’ with value ‘Starfleet’ to the ‘metadata’ target of the item metadata_test_item:

#!/bin/bash
# IA-S3 credentials; replace the placeholders with your own keys.
ACCESS='<redacted>'
SECRET='<redacted>'
USERNAME="user@example.com"
IDENTIFIER=metadata_test_item
TARGET=metadata
PATCH='{"add":"/scan_sponsor", "value":"Starfleet"}'

# Send the URL-encoded arguments to the item's metadata endpoint.
curl --data-urlencode "-target=$TARGET" \
     --data-urlencode "-patch=$PATCH" \
     --data-urlencode "username=$USERNAME" \
     --data-urlencode "access=$ACCESS" \
     --data-urlencode "secret=$SECRET" \
     "http://archive.org/metadata/$IDENTIFIER"

The call returns a JSON object like the following:

{"success":true,"task_id":114350522,"log":"http://www.us.archive.org/log_show.php?task_id=114350522"}

or perhaps

{"error":"Some problem applying the patch"}
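
A script can branch on which of these two shapes came back. A rough sketch using grep; the function name patch_succeeded is hypothetical, and a real JSON parser would be more robust than pattern matching:

```shell
#!/bin/bash
# patch_succeeded RESPONSE
# Hypothetical helper: exits 0 if the response text contains "success":true,
# and non-zero otherwise (for example when only an "error" member is present).
patch_succeeded() {
  printf '%s' "$1" | grep -q '"success"[[:space:]]*:[[:space:]]*true'
}

# Example:
# if patch_succeeded "$response"; then echo "queued"; else echo "failed: $response" >&2; fi
```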

Writing to a files.xml entry

#!/bin/bash
# IA-S3 credentials; replace the placeholders with your own keys.
ACCESS='<redacted>'
SECRET='<redacted>'
USERNAME="user@example.com"
IDENTIFIER=metadata_test_item
TARGET='files/glogo.png'
PATCH='{"add":"/camera", "value":"Canon A150"}'

curl --data-urlencode "-target=$TARGET" \
     --data-urlencode "-patch=$PATCH" \
     --data-urlencode "username=$USERNAME" \
     --data-urlencode "access=$ACCESS" \
     --data-urlencode "secret=$SECRET" \
     "http://archive.org/metadata/$IDENTIFIER"

Writing to metadata_test_item/foo_client.json

NOTE: Keys and values are binary-safe and unrestricted.

#!/bin/bash
# IA-S3 credentials; replace the placeholders with your own keys.
ACCESS='<redacted>'
SECRET='<redacted>'
USERNAME="user@example.com"
IDENTIFIER=metadata_test_item
TARGET='foo_client'
PATCH='{"add":"/of concern to foo", "value":{"foo-ness":["buckle", "shoe"]}}'

curl --data-urlencode "-target=$TARGET" \
     --data-urlencode "-patch=$PATCH" \
     --data-urlencode "username=$USERNAME" \
     --data-urlencode "access=$ACCESS" \
     --data-urlencode "secret=$SECRET" \
     "http://archive.org/metadata/$IDENTIFIER"

After the above call, a metadata read of metadata_test_item will include a top-level member ‘foo_client’ with the value:

{"foo-ness":["buckle", "shoe"]}
Originally posted on The Internet Archive Blog by internetarchive.

Written by internetarchive

July 4, 2013 at 12:08 am

Posted in internet archive
