DLF Forum: Library of Congress and Flickr

Women at work on bomber, from the Library of Congress

Phil Michel and Michelle Springer from the Library of Congress presented on the LOC’s Flickr Pilot Project. The Library of Congress was the first cultural heritage institution to partner with Flickr to share photographic content and invite user participation and comments. With 15 institutions participating in what is now the Flickr Commons, it is an idea that caught on quickly and has been quite successful. I’ve been very excited about this project since its launch, and so I was motivated to clean up and blog my rather extensive notes on the session. For more information about the project, check out this LOC webcast.


The motivation for the project came from a desire to explore including user generated content (UGC) in LOC descriptive processes. Photos seemed like a good place to start because there is no language barrier, there was already a big collection of photos online, and because they’re fun.

Initial investigations showed that bringing tagging to LOC collections would have had high technical barriers if handled in-house. There was a desire to keep initial expenditures low, and so they started looking around for existing web 2.0 sites that were doing the things they wanted to do.

The project had three goals: increase awareness of LOC collections; gain a better understanding of social tagging; and gain experience participating in the kinds of web communities that are interested in LOC materials.

There were a number of principles that guided the development of the pilot project: the content involved must already be available on the LOC site; the agreement with the third-party site must be non-exclusive; access to the content must be free; there must be an option to control or exclude advertising on the account; LOC must be clearly identified as the source of the images; LOC must be able to remove and moderate user-supplied content to prevent inappropriate tags and comments; UGC must be clearly distinguishable from Library-generated content; and it must be possible to accurately convey copyright status.

Flickr had a great deal of appeal as a partner: It recently announced the upload of its 3 billionth picture, and has an active user community of over 23 million members. It had a pre-existing, vibrant community built around photography and a conversation that included notes, comments, and tags. From a technical standpoint, it also had APIs that allow for batch uploads and batch downloads of UGC, and a history of dealing with alternative copyright status (Creative Commons licenses).

Getting it off the ground

Flickr programmed the “No Known Restrictions” option especially for the LOC partnership, and it is now used by most of the institutions participating in The Commons. Every institution has its own page in its own webspace that explains exactly what it means by the statement.

Some time and effort was required on the part of the General Counsel’s office to work with Flickr to create a modified Terms and Conditions agreement that could deal appropriately with the Library’s status as a government institution.

Technical process: Someone (I missed who – Flickr, LOC, or both) built a Java(?) app called Flickrj to push and pull content between the LOC databases and Flickr’s. They selected the MARC fields whose content would go to Flickr along with the photos. The MARC 856 field was used as a unique machine tag value, and so was the Dublin Core identifier field.
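For readers who haven't met machine tags: on Flickr they follow the pattern namespace:predicate=value, which is what makes them usable as unique keys back to a catalog record. Here is a minimal sketch in Python of building and parsing one. The function names, the sample record, and the example.org URL are all hypothetical, and pairing the 856 URL with a dc:identifier tag is my assumption about how the two fields fit together, not something confirmed in the session.

```python
import re

# Flickr machine tags have the shape namespace:predicate=value.
_MACHINE_TAG = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.+)$")


def make_machine_tag(namespace, predicate, value):
    """Build a machine tag string from its three parts."""
    return f"{namespace}:{predicate}={value}"


def parse_machine_tag(tag):
    """Return (namespace, predicate, value), or None for an ordinary tag."""
    match = _MACHINE_TAG.match(tag)
    if match is None:
        return None
    return match.groups()


# Hypothetical catalog record: the key name and URL are made up for illustration.
record = {"marc_856_u": "http://example.org/photo/12345"}

# Assumption: the 856 URL is carried in a dc:identifier machine tag.
tag = make_machine_tag("dc", "identifier", record["marc_856_u"])
print(tag)
print(parse_machine_tag(tag))
print(parse_machine_tag("airplane"))  # an ordinary user tag is not a machine tag
```

Because the value is the record's own URL, anything pulled back from Flickr can be matched to the right catalog entry without guessing from titles.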

Altogether, getting the project off the ground took about 100 hours of work for technical staff.

The photos all went to Flickr in their rough state. LOC folks didn’t do any cropping, color fixing, or clean up of dust or scratches. Part of the curiosity was to see how the public would respond to the images in this rough form.

Startup investments: The Library of Congress purchased a $24.95 Flickr Pro account, which offers members unlimited uploads and stats about traffic to photos. The Pro account is an annual expense that will go on as long as the project does. All Commons member institutions have Pro accounts. There was no full-time staff assigned to the project, but it required General Counsel involvement, some big conference calls, and eight staffers who contributed about 20 hours each to collaborate with Flickr on development.

Launching and maintaining

This was the first project that LOC ever announced without a press release. There were announcements on the Library of Congress and Flickr blogs, and the organizers considered it a soft launch. Though it involved no mainstream press, there was an enormous initial response, totally out of proportion to what was expected. The result was some near-immediate revisions of plans for maintenance and direct staff involvement; the scale was too big to be as involved as they’d planned.

A number of LOC staff share responsibility for monitoring all new comments, notes, and tags. They use the Flickrj app to pull all the new UGC at once. It takes about 2 hours a week to moderate comments/notes/tags for spam and inappropriateness. Sometimes users call attention to these things before staff find them. There are very few problems with inappropriate tags or comments; the Flickr community is quite well-behaved. LOC staff don’t correct spelling or syntax or remove seemingly useless tags. Staff do accept group invitations from public group administrators, but they only join public, nudity/vulgarity-free groups, so monitoring the group invites also takes time.
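The batch pull is what makes a weekly moderation pass workable: instead of visiting each photo, staff get everything posted since the last check in one list. Below is a rough sketch of that filtering step in Python. It is not the Flickrj code, which I haven't seen; the record layout, field names, and sample data are invented for illustration.

```python
from datetime import datetime, timezone


def new_items_since(items, last_checked):
    """Return the items posted after last_checked, oldest first."""
    fresh = [item for item in items if item["posted"] > last_checked]
    return sorted(fresh, key=lambda item: item["posted"])


# Hypothetical user generated content already downloaded in one batch.
ugc = [
    {"kind": "comment", "photo_id": "A", "text": "My grandmother worked here.",
     "posted": datetime(2008, 11, 3, 9, 0, tzinfo=timezone.utc)},
    {"kind": "tag", "photo_id": "B", "text": "bomber",
     "posted": datetime(2008, 11, 8, 14, 30, tzinfo=timezone.utc)},
    {"kind": "note", "photo_id": "A", "text": "Sign reads 'No Smoking'",
     "posted": datetime(2008, 11, 6, 18, 15, tzinfo=timezone.utc)},
]

# Hypothetical timestamp of the previous moderation pass.
last_checked = datetime(2008, 11, 5, tzinfo=timezone.utc)

queue = new_items_since(ugc, last_checked)
for item in queue:
    print(item["posted"].date(), item["kind"], item["photo_id"], item["text"])
```

Note that the filter works on when the comment, note, or tag was posted, not on when the photo was uploaded, so activity on older photos lands in the same queue.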

Updates to the images themselves take 15-20 hours a week. These involve corrections to descriptive information, fixes in the LOC catalog, and occasionally image fixes. Sometimes the orientation of images is wrong. First they fix it on the LOC server, then they generate new derivatives, and then they send corrected versions to Flickr. In general, they limit edits to very basic changes and real errors. Sometimes they’ll point people from the LOC catalog back to Flickr when large amounts of conversation, updating, and information-sharing are taking place for a particular photo.

During the Q&A, someone asked how time pressures have changed over the course of the project. Turns out, they haven’t exactly gone down, though they have shifted. When the project first launched, staff were checking the new comments and tags every 24 hours, and it was totally overwhelming. Efficiencies have come from the technical solutions, like the ability to batch download all new comments, notes, and tags. However, as the number of photos keeps going up, time demands on moderators continue to go up. Part of the time demand comes from the level of participation in the community, which is a steady stream; activity doesn’t stop on the older photos, so the rising total number of images leads to a rising total amount of new user generated content.


One of the main goals of the project was to drive more traffic to the LOC photo collections website, and it worked. People visit the LOC pages for higher resolution images, to get additional information, and to browse related collections. The organizers feel that the pilot has definitely achieved the goal of raising awareness of LOC photo collections.

An unexpected outcome: Major search engines are finding, exposing, and weighting LOC’s Flickr images in search results. Many of the photos rank very high in image searches. It’s an unforeseen way to further expose the content to the world.

Many of the LOC photos are also being embedded in blogs all over the web (including this one). When it happens via the “Blog this” function in Flickr, it’s easy for LOC to track it (and I imagine it’s trackable even when it happens in other ways).

The user involvement has been very interesting as a source of further study. There is a core group of about 20 commenters who provide historical research, fixes, comments, notes, etc. They’ll often support the information with citations, links to NYTimes archives and other external sites and archives. There are also 10 “power taggers” who have applied more than 3,000 tags each. One person was responsible for over 5,000 tags. The people at LOC did some work examining the different types of tags that people apply, and identified nine different categories: LC description based, new descriptive words, new subject words, emotional/aesthetic responses, personal knowledge/research, machine tags, variant forms, foreign language, and miscellaneous.

Users frequently post modern photos in the comments to show what the featured locations look like now. Sometimes people will go to the featured location and reenact the photos. There is quite a bit of playfulness and humor in much of the user involvement. Notes are a useful way to identify people in crowd shots and to transcribe text that appears in the photos. Some people also use notes to make jokes or silly comments, and while some people in the Flickr community have objected to the proliferation of notes, LOC has decided that for now the value of the function outweighs the irritation.

Conclusion: There has been a great response to the pilot and great user participation, and the Library has learned a lot. It stimulated conversation both between users and librarians and among librarians. The project tapped into expertise that resides in communities of interest. It brought up issues related to presentation and engagement that can inform decisions about how materials are presented on the Library’s own web site. While there are some risks associated with jumping into the web 2.0 world, and you have to be willing to cede some control, the benefits and rewards have been terrific.

A bit of shameless self promotion

This has been a big week for the country, a big week for the University of Michigan Library, and a big week for me. I got published. Twice! Here are the articles:

Also this week, I was featured in the University Record’s “staff spotlight” series. It’s the paper for faculty and staff at U-M, and every week they do a profile of someone who works at the University who isn’t teaching faculty. This week it was me. You can learn all about my secret life in experimental theater.

</shameless self promotion>