Archive for the ‘internet archive’ Category
The BBC has an article about Kalev Leetaru’s project to extract images from millions of Open Library pages.
You can read about how it works…
The Internet Archive had used an optical character recognition (OCR) program to analyse each of its 600 million scanned pages in order to convert the image of each word into searchable text. As part of the process, the software recognised which parts of a page were pictures in order to discard them.
Mr Leetaru’s code used this information to go back to the original scans, extract the regions the OCR program had ignored, and then save each one as a separate file in the Jpeg picture format. The software also copied the caption for each image and the text from the paragraphs immediately preceding and following it in the book. Each Jpeg and its associated text was then posted to a new Flickr page, allowing the public to hunt through the vast catalogue using the site’s search tool.
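The approach described above can be sketched roughly as follows. This is only an illustration, not Mr Leetaru's actual code: the `Region` record, its `kind` labels, and the idea of a page as a grid of pixel values are all assumptions made for the example. Given the regions an OCR pass reports for a page, it crops each region marked as a picture and keeps the text of the nearest text regions before and after it.

```python
from dataclasses import dataclass

@dataclass
class Region:
    """Hypothetical OCR layout record: a labelled box on a scanned page."""
    kind: str        # "text" or "picture" (labels assumed for this sketch)
    left: int
    top: int
    right: int       # exclusive
    bottom: int      # exclusive
    text: str = ""   # recognised text; empty for pictures

def crop(page, region):
    """Cut the region's box out of a page stored as a list of pixel rows."""
    return [row[region.left:region.right] for row in page[region.top:region.bottom]]

def extract_pictures(page, regions):
    """Return each picture crop with the text just before and after it."""
    results = []
    for i, region in enumerate(regions):
        if region.kind != "picture":
            continue
        # Nearest text region before and after the picture, in reading order.
        before = next((r.text for r in reversed(regions[:i]) if r.kind == "text"), "")
        after = next((r.text for r in regions[i + 1:] if r.kind == "text"), "")
        results.append({"pixels": crop(page, region), "before": before, "after": after})
    return results
```

A real pipeline would read the region boxes from the OCR output and write each crop out as a JPEG; here the crop stays in memory so the sketch has no dependencies.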
“I think one of the greatest things people will do is time travel through the images,” Mr Leetaru said.
… or just check out some of the results. Images plus citations plus metadata! We couldn’t be happier. Free to use with no restrictions.
The Internet Archive had a booth at Wikimania in London. The booth was in the Community Village section of the conference. We hope you stopped by and said hello, grabbed a sticker or a handout, learned a bit more about our book scanning projects, and told us what you were up to. If you’d like to pick up digital copies of our handouts, PDFs are here.
We also went to a lot of really worthwhile programs. The free/open culture vibe was palpable and exciting, with 2500+ people all getting together to find ways to share more content in more ways. Here are a few other documents we picked up that might be interesting to other folks:
- Teaching students how to edit: Wikipedia Assignments (pdf) – for educators who are Wikimedia-curious but don’t know how to get started
- Wikimedian in Residence 2014 review – talks about outcomes and other words that make programs like this appealing to organizations with bottom lines and boards of directors
- How to work successfully with Wikipedia: A guide for galleries, libraries, archives & museums – helping forge partnerships across cultural heritage organizations
If you like working on Wikipedia but are often flustered by paywalls, you should know about the Wikipedia Library, which has a project to help editors access reliable sources. The Wikipedia Loves Libraries project is gearing up for a month of wiki-workshops and edit-a-thons at libraries around Open Access Week in October/November.
San Francisco Weekly said we are the best Bitcoin Evangelists in their BestOf section. Fun.
We now accept bitcoin at our Archive swag store. We continue to offer bitcoins to our employees as salary and eat sushi for bitcoin next door. We supported bitcoin as well as we could at our credit union, have a cool honor-based bitcoin ATM (please come and use it), accept bitcoin at movies, and graciously accept bitcoins as donations to keep our servers humming. (We get a few bits every day, thank you!)
It sounds simple enough for those familiar with the ubiquitous keyboard shortcut Ctrl+F…but it turns out that’s actually only 10% of you! So why use this feature when you’re browsing the TV News Archive of 500,000+ US TV News Shows? Several reasons:
1) More Better Context – The TV News search inside feature enables users to discover a word or combination of words within a show by highlighting the desired term in every segment where it occurs in a show. Furthermore, for every 1-minute segment where a term occurs, all accompanying closed captioning text is surfaced!
2) Less Background Noise – Columns of 1-minute segments that don’t contain a “search inside” term collapse so you can find exactly what you need faster.
3) Remedies the “Refer Problem” – About 80% of the time a user is referred to a TV News show page from a third party search engine, the user’s original search term doesn’t carry over. In other words, you land on a show page with zero terms highlighted, and that’s annoying. While we can’t exactly solve this problem, we can prescribe medication for the pain: “search inside.”
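To make the first two points concrete, here is a minimal sketch of the idea, not the TV News Archive's actual implementation. It assumes a show is simply a list of caption strings, one per 1-minute segment; the function name `search_inside` and the square-bracket highlighting are made up for the example. Segments containing the term come back with the term marked, and all other segments are left out, which stands in for collapsing them.

```python
import re

def search_inside(segments, term):
    """Return (minute, highlighted_caption) for each segment containing term.

    segments: list of caption strings, one per 1-minute segment (assumed layout).
    Segments without a match are omitted, mimicking collapsed columns.
    """
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    hits = []
    for minute, caption in enumerate(segments):
        if pattern.search(caption):
            # Wrap every occurrence in brackets as a stand-in for highlighting.
            highlighted = pattern.sub(lambda m: f"[{m.group(0)}]", caption)
            hits.append((minute, highlighted))
    return hits
```

Matching is case-insensitive here, and `re.escape` keeps punctuation in the search term from being read as a regular expression.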
Why Cable TV Is Dying and Twitter is Winning | André-Pierre du Plessis, Columbia Graduate School of Journalism
Tiny Numbers | Bodo Winter, UC Merced Cognitive Sciences
Happy April Fool’s Day! We couldn’t think of a better day to launch the fully redesigned TV News Archive.
This research library, originally released in September 2012, is a free service provided as a way to enhance the capabilities of journalists, scholars, teachers, librarians, civic organizations and other engaged citizens. It repurposes closed captioning to enable users to search, quote and borrow from the Internet Archive’s collection of 500,000+ US TV news broadcasts aired since 2009.
The new interface has been designed to give users better access to this collection, and to provide new tools that enable users to share short clips from any broadcast and track play and share statistics of those clips over time.
Here’s a quick overview of the site’s features; we hope they serve you well.
Search transcripts of US TV news shows aired since 2009
- Search with topical terms to return shows with corresponding transcripts. Remember, you are searching the words spoken in the show.
- Use the advanced search tool (click the icon) to specify a network or show name, or sort your search results.
- Refer to the “info” panel throughout the site for details about your search results, related topics and other stats.
Scan and view show segments
- Shows are presented in 60 second segments, each with a video and corresponding transcript text.
- Scroll left and right to scan through segments of a show; search terms are highlighted in transcript text.
- To search within a show’s transcript text, try Ctrl + F (⌘ + F on Mac) to search inside the page. (Scrollable transcripts are coming soon!)
Share and embed short clips (aka quotes) from a show
- Shareable quotes are limited to 60 seconds. Refine your quote selection by clicking the “Edit” button and dragging the handles.
- Click a social media button (or click the embed button twice) to finalize and share your quote.
- Your quote will be assigned a permalink. You can always come back to see it!
Track popularity of show quotes shared over time
- Quotes with a unique start and stop time within a show will be tracked to see how often they are re-shared or played.
- View a specific quote by saving or sharing its unique permalink, or you can browse quotes from shows on the TV News Archive site by looking for the icon.
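As a rough illustration of how quotes could be tracked by their unique start and stop times, here is a small sketch. It is an assumption about one possible design, not a description of how the TV News Archive actually does it: the `QuoteTracker` class, the `show_id` strings, and times measured in whole seconds are all hypothetical. It also enforces the 60-second limit on shareable quotes mentioned above.

```python
from collections import Counter

MAX_QUOTE_SECONDS = 60  # shareable quotes are limited to 60 seconds

class QuoteTracker:
    """Counts plays and shares per quote, keyed by (show_id, start, stop)."""

    def __init__(self):
        self.plays = Counter()
        self.shares = Counter()

    @staticmethod
    def key(show_id, start, stop):
        """Validate a quote and return the tuple that identifies it."""
        if not 0 <= start < stop:
            raise ValueError("start must be non-negative and before stop")
        if stop - start > MAX_QUOTE_SECONDS:
            raise ValueError("quotes are limited to 60 seconds")
        return (show_id, start, stop)

    def record_play(self, show_id, start, stop):
        self.plays[self.key(show_id, start, stop)] += 1

    def record_share(self, show_id, start, stop):
        self.shares[self.key(show_id, start, stop)] += 1
```

Because the key includes both times, two quotes from the same show that differ by even one second are counted separately.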
Borrow full shows on DVD
- Borrow shows (click the icon on any show detail page) from the Internet Archive library on a DVD-ROM for 30 days for a $25 processing fee.
- Internet Archive does not sell or license this content. Please note that this is a copyrighted work, and performance, copying, or sale by the recipient, whether or not for profit, is not authorized.
An intrepid researcher wanted to figure out which magazine was used in the movie WarGames and, using the Internet Archive collection, found it was Creative Computing (which was a key magazine for me in the ’70s, when I sold personal computers during the pre-Apple ][ kit days).