Jason Scott is announcing that, thanks to his work and that of many deep collecting communities and volunteers, the Internet Archive now hosts some very large software and computer documentation collections, perhaps making it the largest overall host of such material.
Now we all have to make it larger, more findable, and more reusable. Please help: donate money, time, anything. This is our history; let's write it well.
Just over 8-1/2 years ago, I wrote a multi-process daemon in PHP that we refer to as “catalogd”. It runs 24 hours a day, 7 days a week, no rest!
It is in charge of uploading all content to our archive.org servers, as well as all changes to uploaded files.
We recently passed the 100 millionth “task” (upload or edit to an archive “item”).
After starting with a modest 100 or so tasks/day, we currently run nearly 100,000 tasks/day. We’ve done some minor scaling, but for the most part, it has become our little daemon that could!
Here’s to the next 100 million tasks at archive.org!
Since 1997, a dedicated team of scanners and curators has been assembling a collection of historical computer and technology-related items. This collection, called BITSAVERS.ORG, contains tens of thousands of documents and software products dating from the 1950s into the 2000s. From the days of mainframes and electronic counting machines, through the home computer revolution and the short lives (and shorter support) of various pieces of equipment, Bitsavers volunteers have been scanning industriously. There are piles of manuals and brochures, as well as guides and overviews, that were cast aside in favor of the next big thing. Bitsavers has been working tirelessly to rescue these lost documents.
And now, they are mirrored on the Internet Archive.
Currently, over 23,000 individual manuals, books, memos, and guides are hosted on the Archive in the collection, automatically ported over from the Bitsavers mirrors.
Every week, a dozen or more new documents join the Bitsavers archive, from all reaches of technological history. Whether you want to browse the original manual for the Apple I or learn the benefits of a Sanders Associates 5700 Tape System, there’s something for every person interested in seeing where computing has come from.
Some other gems in the collection:
- Rare Atari 400/800 technical reference notes and operating system source code from the Atari corporation.
- The proper way to handle a 5 1/4″ floppy disk.
- A history of the TX-0, the transistorized computer built in 1956 that represented the playground of beginning hacker culture. (The document, from 1999, is a reprint of a 1974 history.)
- A lost-for-decades history of the Whirlwind Project (an important early research computer), which was intended for publication in 1967 but never distributed.
- An eye-opening glance into the complicated instructions for using a word processor in 1985.
- That time someone put an entire Apple I circuit board into a scanner.
- A 1950s education from IBM about how all their functional wiring works, including punchcards, printers, and controls.
Whether for research, nostalgia, or interesting inspiration for artwork and writing, the millions of scanned pages in the Bitsavers collection are a click away from the collection page. Where possible, further sub-collections for companies like IBM, DEC and Control Data Corporation are also available.
A toast to this flood of computer history!
Bitcoin to Cash Converter Box
(Please leave this page visible on the computer next to the cash box). To help us try out bitcoins, I am putting up $200 ($100 cash and $100 worth of bitcoins) to make an honor-based converter box to be available at the Internet Archive Friday lunches. Please donate a $1 conversion fee for each transaction to help cover loss and mistakes. If this works, then maybe other offices or hacker spaces will do this. Please leave this page visible after you finish.
Convert your bitcoins into dollars:
Then use your bitcoin client to send the bitcoins to this address: 1Pt9TRJKeAW61aR1ELQpUZKdMaYXzkCTrn (you can send a Skype message containing the address from this machine, or use this webpage on your own machine). Take your dollars from the cash box. Please leave $1 for each transaction to cover loss or mistakes. Please leave this page visible after you finish.
Convert your dollars into bitcoins:
Calculate the conversion amount for the dollars you want to convert and make sure it is under $100 (our current limit). Please subtract $1 from the amount you want to convert as a conversion fee to cover loss or mistakes. For more than $100, consider using Coinbase.
Then use the Bitcoin-qt application on the computer (the right-hand one of the displays in the Internet Archive library). Make sure it has enough coins; if not, please email me. Skype your bitcoin address to this machine (bitcoinconverter). Use the Send Coins button in the application window to send yourself the coins.
Please leave this page visible after you finish.
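The dollars-to-bitcoins arithmetic above can be sketched as a small shell function. This is a hypothetical helper, not part of the converter box setup: the name `btc_for_dollars` and its arguments are made up for illustration, the exchange rate has to be looked up by hand, and the function only applies the $1 fee and checks that the converted amount stays under the $100 limit.

```shell
#!/bin/sh
# Hypothetical helper (not part of the converter box itself): how many
# bitcoins to send for a cash amount, using the box's rules of a $1 fee per
# transaction and a converted amount that must stay under $100.
# usage: btc_for_dollars DOLLARS USD_PER_BTC
# USD_PER_BTC is an exchange rate you look up yourself; nothing is fetched.
btc_for_dollars() {
  awk -v d="$1" -v p="$2" 'BEGIN {
    fee = 1; limit = 100
    amt = d - fee                      # dollars actually converted after the fee
    if ((p + 0) <= 0) { print "rate must be positive" > "/dev/stderr"; exit 1 }
    if (amt <= 0)     { print "amount must exceed the $1 fee" > "/dev/stderr"; exit 1 }
    if (amt >= limit) { print "converted amount must be under $100" > "/dev/stderr"; exit 1 }
    printf "%.8f\n", amt / p           # bitcoins, to 8 decimal places
  }'
}
```

For example, at an assumed rate of $20 per bitcoin, `btc_for_dollars 21 20` prints `1.00000000`, the amount to send yourself with Bitcoin-qt.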
If you want to do this, this is the cashbox we got:
Today we are launching a new uploader that handles much larger files. We’ve tested files well over 100GB in size, so if you’re using the right browser it should be able to take care of all your uploading needs. We recommend using the latest versions of Chrome or Firefox for the best experience.
The new uploader does not work in Internet Explorer due to the limitations of that browser. The previous Flash-based uploader is still available for IE users, and for anyone who has issues with the new one.
Let us know in the comments if you’re having any issues.
Thanks to Raj Kumar, Sam Stoller, Michael Ang, Tracey Jaquith, Jeff Kaplan and Alexis Rossi.
Aaron Swartz, champion of the open world, committed suicide yesterday.
Working at the Internet Archive, Aaron was the architect and first coder of OpenLibrary.org, a site to open the world of books to the Internet generation. He helped put public domain books on the site that had been locked up by libraries. Public access to the public domain, while it seems obvious, is not the position of many institutions, and this caused friction for Aaron.
As a volunteer, he helped make the RECAP system to offer free public access to public domain government court documents. He took the bold step of seeding this system by going to a public library to download public domain documents and then uploading them to the Internet Archive; this got him in trouble with the FBI. Now many millions of public domain documents have been used by over six million people for free, including researchers who could never have afforded the high fees to gain access.
If there is a sin in the open world, it is locking up the public domain. Aaron took selfless action.
When he was downloading a large number of old journal articles, he was arrested at MIT. I was shocked by this. When I was at MIT, if someone set out to hack the system, say by downloading databases to play with them, they might become a hero, get a degree, and start a company, but they called the cops on him. Cops. MIT used to protect us when we transgressed the traditional. Despite many of us supporting Aaron's lawyers, he was still hounded by prosecutors. (I hope JSTOR.org and MIT will act differently in the future.)
Aaron was steadfast in his dedication to building a better and open world. Selfless. Willing to cause change.
He is among the best spirits of the Internet generation. I am crushed by his loss, but will continue to be enlightened by his work and dedication.
May a hero and founder of our open world rest in peace.
Founder, Digital Librarian of the Internet Archive
Today we updated the Wayback Machine with much more data and some code improvements. Now we cover from late 1996 to December 9, 2012, so you can surf the web as it was up until a month ago. We have also gone from 150,000,000,000 URLs to 240,000,000,000 URLs, a total of about 5 petabytes of data. (Want a humorous description of a petabyte? Start at 28:55.) This database is queried over 1,000 times a second by over 500,000 people a day, helping make archive.org the 250th most popular website.
Over the past year we archived tons of pages about the United States 2012 presidential election. You can revisit the New York Times live coverage page from election day, the campaign sites of Republican hopefuls like Newt Gingrich and Ron Paul, and mini-scandals like Romney’s car elevator or aspirin as a contraceptive. The Wayback record of the 2008 election was recently used by the Sunlight Foundation to contrast how Obama’s team dealt with disclosing inauguration donors then vs. now, so hopefully the 2012 election content will prove just as useful in the future.
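As a rough back-of-envelope reading of these figures, treating "about 5 petabytes" as exactly 5×10^15 bytes and "over 1,000 times a second" as exactly 1,000 (both simplifications), the average stored data per URL and the minimum daily query volume work out as follows. The first number is only an average over everything stored, not a typical page size.

```latex
\begin{aligned}
\frac{5\times10^{15}\ \text{bytes}}{2.4\times10^{11}\ \text{URLs}} &\approx 2.1\times10^{4}\ \text{bytes} \approx 21\ \text{KB per URL} \\
1000\ \frac{\text{queries}}{\text{s}} \times 86\,400\ \frac{\text{s}}{\text{day}} &= 8.64\times10^{7}\ \text{queries per day}
\end{aligned}
```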
The prolific volunteers of Archive Team spent a lot of time this year archiving web sites on the verge of disappearing and then contributing those records to Internet Archive. City of Heroes (including the boards with years of posts), Fortune City and Splinder were all saved from the proverbial wood chipper.
The updated version does have at least one known issue: a small amount of older content is missing from the index, and it will take us another month or two to sort out that problem. In the meantime, you can still visit the previous version of the Wayback Machine for that content.
We would like to thank the following for all their efforts in making the updated Wayback Machine:
- Andy Bezella
- Aaron Binns
- Hank Bromley
- Kris Carpenter
- Dominic Dela Cruz
- Vinay Goel
- Jake Johnson
- Brewster Kahle
- Jeff Kaplan
- Ilya Kreymer
- Raj Kumar
- John Lekashman
- Noah Levitt
- Adam Miller
- Gordon Mohr
- Ralf Muehlen
- Kenji Nagahashi
- Alexis Rossi
- Jim Shankland
- Sam Stoller
- Brad Tofel
- Travis Wellman
Thanks to the generous support of our users, we raised $250,000 in donations during the month of December, and with the 3-to-1 match from one of our donors, that gives us $1,000,000! We raised enough to purchase 4 petabytes of storage, which helps us toward the 10 petabytes we estimate we will need next year. Beyond that, this will help us archive books, music, video and web sites. If you haven’t donated yet, please help keep the archive open!
We brought an unprecedented amount of information into the archive in 2012:
- 50,000,000,000 web pages
- 1,000,000 hours of television
- 370,000 new audio/music items
- 100,000 new videos
We launched the TV News Search & Borrow service, which makes almost 400,000 television news programs searchable and borrowable. We made all of Balinese literature available online. And you can play with a new, beta Wayback that has a much more up to date index.
We look forward to archiving even more great material in 2013. Thank you for helping to support the goal of Universal Access to All Knowledge.
The Internet Archive has received a generous offer this holiday season. For every dollar we raise before December 31st, one of our supporters will match that money three to one. Please consider donating now.
Every day three million people around the world use our collections. We have archived over ten petabytes (that’s 10,000,000,000,000,000 bytes!) of information, including everything ever written in
Our constantly expanding collections require a lot of storage space, and if we can raise $150,000 by the end of the year, the 3-for-1 match will give us an additional $450,000. Together that’s enough to buy four more petabytes of storage.
Please help us keep the library free for millions of people by making a tax-deductible donation today. On behalf of all of us at the Internet Archive, we wish you a happy holiday.
Per some requests from our friends in the Live Music Archive community:
You can get any archive.org item downloaded to your local machine as a .zip file (something we’ve been doing for 5+ years!).
But whereas before it would include all files/formats,
now you can be selective and pick *just* certain formats.
We’ll put links up on audio item pages at a minimum, but the URL pattern is simple for any item.
It looks like this (replace IDENTIFIER with the identifier of your item, i.e. the part after archive.org/details/):
for the entire item, and
```
wget -q -O - 'http://archive.org/compress/ellepurr/formats=Metadata,Checksums,Flac' > zip; unzip -l zip
  Length      Date    Time    Name
---------  ---------- -----   ----
  1107614  2012-10-30 19:49   elle.flac
       44  2012-10-30 19:49   ellepurr.md5
     3114  2012-10-30 19:49   ellepurr_files.xml
      693  2012-10-30 19:49   ellepurr_meta.xml
      602  2012-10-30 19:49   ellepurr_reviews.xml
  1112067                     5 files
```
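For scripting, the format-filtered URL from the ellepurr example can be assembled by a small shell function. This is a sketch under assumptions: `compress_url` is a hypothetical name, it only builds the `formats=` form shown in that example, and it does no URL-encoding, so format names containing spaces would need escaping first.

```shell
#!/bin/sh
# Hypothetical helper: build a format-filtered /compress/ URL following the
# ellepurr example (http://archive.org/compress/IDENTIFIER/formats=A,B,C).
# usage: compress_url IDENTIFIER FORMAT [FORMAT ...]
# No URL-encoding is done; format names are passed through as given.
compress_url() {
  if [ "$#" -lt 2 ]; then
    echo "usage: compress_url IDENTIFIER FORMAT [FORMAT ...]" >&2
    return 1
  fi
  id=$1
  shift
  # Join the remaining arguments with commas, e.g. Metadata,Checksums,Flac
  formats=$(printf '%s,' "$@")
  formats=${formats%,}            # drop the trailing comma
  printf 'http://archive.org/compress/%s/formats=%s\n' "$id" "$formats"
}
```

For instance, `url=$(compress_url ellepurr Metadata Checksums Flac)` yields the same URL used above, which can then be fetched with `wget -q -O ellepurr.zip "$url"` and listed with `unzip -l ellepurr.zip`.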