DLF Forum: Library of Congress and Flickr

Women at work on bomber, from the Library of Congress

Phil Michel and Michelle Springer from the Library of Congress presented on the LOC’s Flickr Pilot Project. The Library of Congress was the first cultural heritage institution to partner with Flickr to share photographic content and invite user participation and comments. With 15 institutions participating in what is now the Flickr Commons, it is an idea that caught on quickly and has been quite successful. I’ve been very excited about this project since its launch, and so I was motivated to clean up and blog my rather extensive notes on the session. For more information about the project, check out this LOC webcast.


The motivation for the project came from a desire to explore including user-generated content (UGC) in LOC descriptive processes. Photos seemed like a good place to start because they pose no language barrier, a large collection of them was already online, and they're fun.

Initial investigations showed that bringing tagging to LOC collections would have involved high technical barriers if handled in-house. There was a desire to keep initial expenditures low, so they started looking around for existing Web 2.0 sites that were already doing the things they wanted to do.

The project had three goals: increase awareness of LOC collections; gain a better understanding of social tagging; and gain experience participating in the kinds of web communities that are interested in LOC materials.

A number of principles guided the development of the pilot project: the content involved must already be available on the LOC site; the agreement with the third-party site must be non-exclusive; access to the content must be free; there must be an option to control or exclude advertising on the account; LOC must be clearly identified as the source of the images; LOC must be able to remove and moderate user-supplied content to prevent inappropriate tags and comments; UGC must be clearly distinguishable from Library-generated content; and it must be possible to accurately convey copyright status.

Flickr had a great deal of appeal as a partner: It recently announced the upload of its 3 billionth picture, and has an active user community of over 23 million members. It had a pre-existing, vibrant community built around photography and a conversation that included notes, comments, and tags. From a technical standpoint, it also had APIs that allow for batch uploads and batch downloads of UGC, and a history of dealing with alternative copyright status (Creative Commons licenses).

Getting it off the ground

Flickr programmed the “No Known Restrictions” option especially for the LOC partnership, and it is now used by most of the institutions participating in The Commons. Every institution has a page in its own webspace that explains exactly what it means by the statement.

Some time and effort was required on the part of the General Counsel’s office to work with Flickr to create a modified Terms and Conditions agreement that could deal appropriately with the Library’s status as a government institution.

Technical process: Someone (I missed who: Flickr, LOC, or both) built a Java(?) app called Flickrj to push and pull content between the LOC databases and Flickr's. They selected the MARC fields whose content would go to Flickr along with the photos. The MARC 856 field was used as a unique machine tag value, and so was the Dublin Core identifier field.
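For readers who haven't met machine tags: Flickr's convention is a tag of the form namespace:predicate=value, which lets software search for an exact identifier. The sketch below shows how catalog identifiers might be turned into such tags. It is a minimal illustration, not the Flickrj code or the Library's actual mapping; the "loc" namespace and the field keys "marc_856" and "dc_identifier" are hypothetical, and the name-validation rule reflects my understanding of Flickr's documented machine-tag syntax.

```python
import re

# Assumed Flickr rule: namespace and predicate start with a letter and
# contain only ASCII letters, digits, and underscores.
_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def machine_tag(namespace: str, predicate: str, value: str) -> str:
    """Return a machine tag 'namespace:predicate=value', rejecting bad input."""
    for part in (namespace, predicate):
        if not _NAME.match(part):
            raise ValueError(f"invalid machine-tag name: {part!r}")
    if not value:
        raise ValueError("machine-tag value must be non-empty")
    return f"{namespace}:{predicate}={value}"


def tags_for_record(record: dict) -> list:
    """Map selected catalog fields to machine tags.

    The field keys and the 'loc' namespace are hypothetical placeholders.
    Fields that are missing or empty are simply skipped.
    """
    mapping = {"marc_856": "url", "dc_identifier": "id"}
    return [
        machine_tag("loc", predicate, record[key])
        for key, predicate in mapping.items()
        if record.get(key)
    ]


if __name__ == "__main__":
    # Hypothetical record; real LOC identifiers would go here.
    print(tags_for_record({"marc_856": "http://hdl.loc.gov/EXAMPLE",
                           "dc_identifier": "EXAMPLE-001"}))
```

Because each tag carries a stable identifier, a later batch download of comments and tags can be matched back to the right catalog record without relying on titles or free-text descriptions.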

Altogether, getting the project off the ground took about 100 hours of work from technical staff.

The photos all went to Flickr in their rough state. LOC folks didn’t do any cropping, color fixing, or clean up of dust or scratches. Part of the curiosity was to see how the public would respond to the images in this rough form.

Startup investments: The Library of Congress purchased a $24.95 Flickr Pro account, which offers members unlimited uploads and stats about traffic to photos. The Pro account is an annual expense that will continue as long as the project does. All Commons member institutions have Pro accounts. No full-time staff were assigned to the project, but it required General Counsel involvement, some big conference calls, and eight staffers who contributed about 20 hours each to collaborate with Flickr on development.

Launching and maintaining

This was the first project that LOC ever announced without a press release. There were announcements on the Library of Congress and Flickr blogs, and the organizers considered it a soft launch. Though it involved no mainstream press, there was an enormous initial response, totally out of proportion to what was expected. The result was some near-immediate revisions of plans for maintenance and direct staff involvement; the scale was too big to be as involved as they’d planned.

A number of LOC staff share responsibility for monitoring all new comments, notes, and tags. They use the Flickrj app to pull all the new UGC at once. It takes about 2 hours a week to moderate comments/notes/tags for spam and inappropriateness. Sometimes users call attention to these things before staff find them. There are very few problems with inappropriate tags or comments; the Flickr community is quite well-behaved. LOC staff don’t correct spelling or syntax or remove seemingly useless tags. Staff do accept group invitations from public group administrators, but they only join public, nudity/vulgarity-free groups, so monitoring the group invites also takes time.

Updates to the images themselves take 15-20 hours a week. These involve corrections to descriptive information, fixes in the LOC catalog, and occasionally image fixes. Sometimes the orientation of images is wrong. First they fix it on the LOC server, then they generate new derivatives, and then send corrected versions to Flickr. In general, they limit edits to very basic changes and real errors. Sometimes they’ll point people from the LOC catalog back to Flickr when large amounts of conversation, updating, and information-sharing are taking place for a particular photo.

During the Q&A, someone asked how time pressures have changed over the course of the project. It turns out they haven't exactly gone down, though they have shifted. When the project first launched, staff were checking the new comments and tags every 24 hours, and it was totally overwhelming. Efficiencies have come from technical solutions, like the ability to batch download all new comments, notes, and tags. However, as the number of photos keeps going up, time demands on moderators continue to rise. Part of the time demand comes from the level of participation in the community, which is a steady stream; activity doesn't stop on the older photos, so a rising total number of images leads to a rising total amount of new user-generated content.


One of the main goals of the project was to drive more traffic to the LOC photo collections website, and it worked. People visit the LOC pages for higher-resolution images, to get additional information, and to browse related collections. The organizers feel that the pilot has definitely achieved the goal of raising awareness of LOC photo collections.

An unexpected outcome: Major search engines are finding, exposing, and weighting LOC's Flickr images in search results. Many of the photos rank very high in image searches. It's an unforeseen way to further expose the content to the world.

Many of LOC photos are also being embedded in blogs all over the web (including this one). When it happens via “Blog this” function in Flickr, it’s easy for LOC to track it (and I imagine it’s trackable even when it happens in other ways).

The user involvement has been very interesting as a subject of further study. There is a core group of about 20 commenters who provide historical research, fixes, comments, notes, etc. They'll often support the information with citations and links to NYTimes archives and other external sites and archives. There are also 10 “power taggers” who have applied more than 3,000 tags each; one person was responsible for over 5,000 tags. The people at LOC did some work examining the different types of tags that people apply, and identified nine categories: LC description-based, new descriptive words, new subject words, emotional/aesthetic responses, personal knowledge/research, machine tags, variant forms, foreign language, and miscellaneous.

Users frequently post modern photos in the comments to show what the featured locations look like now. Sometimes people will go to the featured location and reenact the photos. There is quite a bit of playfulness and humor in much of the user involvement. Notes are a useful way to identify people in crowd shots and to transcribe text that appears in the photos. Some people also use notes to make jokes or silly comments, and while some people in the Flickr community have objected to the proliferation of notes, LOC has decided that for now the value of the function outweighs the irritation.

Conclusion: The pilot drew a great response and strong user participation, and the Library learned a lot. It stimulated conversation both between users and librarians and among librarians themselves. The project tapped into expertise that resides in communities of interest. It raised issues of presentation and engagement that can inform decisions about how materials are presented on the Library's own web site. While there are some risks associated with jumping into the Web 2.0 world, and you have to be willing to cede some control, the benefits and rewards have been terrific.

Experimenting with Slideshare

Slideshare is one of those specialized Web 2.0 creations that I hear a lot about but have never really found a use for. Like Twitter, only more time intensive and with pictures. Since I teach a lot of workshops and periodically get requests to share my slides, it seems like the kind of thing I might use and appreciate, so I’m giving it a try.

I created a new page called Workshops and Presentations to link to some of my recent workshops and presentations in Slideshare. For now there’s just one, about Creative Commons. Another on copyright, author rights, and the NIH mandate is coming soon. I just taught a workshop on Open Access as well, but the OA landscape changes so frequently that it’s already out of date.

I have doubts about the usefulness of these Slideshare presentations, especially since there's no audio and I tend to keep my slides light on text. I am putting up handouts as well, but the talks themselves are largely improvised, and there's no script or set of notes to share. It will be interesting to see what feedback, if any, these get.

The Psychology of Creative Commons: A response in two parts

Paul Courant recently posted on his blog about changing his Creative Commons license from Attribution-NonCommercial (BY-NC) to Attribution (BY). It has me thinking about the significance of the different licenses, and it also has me wondering whether I should change mine. What follows is my meandering thought process.

For reference, here’s a page that describes all the CC licenses.

Part 1: What does your Creative Commons License say about you?

In his post, Courant writes about what he believes the NonCommercial restriction signifies to others, especially to people in business. He fears that a NonCommercial license marks the person using it as “anti-commerce,” and he is not anti-commerce (he's an economist, after all) and does not want to be perceived as such. This is really interesting. I've given some thought to what different CC licenses say about the people using them, but the possibility of appearing anti-commerce hadn't occurred to me.

Some of my opinions about the different licenses have made their way into workshops I’ve taught on the subject, but I’ve never considered these judgments systematically. I decided to give it a shot:

I call the Attribution license the “really generous license.” People who use this license are basically ceding all control over their work, granting blanket permission for anyone to do anything with it, even profit-making things. I assume that Attribution people are financially stable, but I also think of them as a little bit gutsy. I associate BY with people who are very dedicated to the cause of open content.

The Share-Alike (SA) set of licenses are also associated with Free Culturites in my mind, but in a slightly different way. These people care about promoting open content, but they do so in a way that I believe is both idealistic and naive. In my experience, Share-Alike licenses can be very confusing for people not already steeped in Open Source culture, and that limits the ability of those people to use SA-licensed works. For example, I spoke to someone who thought that he couldn’t use an unaltered BY-SA-licensed photograph in a conference presentation unless he licensed the whole presentation BY-SA. He had to sign the copyright over to the conference organizers, and therefore couldn’t apply a CC license, so he thought he couldn’t use the image. Share-Alike only applies to derivative works, but that’s a notoriously hard concept for non-lawyers to understand. As a result, I see SA licensors as people who put the cause of open content above the goal of maximizing future use.

[Leigh Blackall of Otago Polytechnic talks about his take on the limits of Share-Alike in an interesting interview on the Creative Commons blog].

I don’t think much about the No-Derivatives (ND) licenses, mostly because I don’t see them very often. My impression of ND people is that they want to share, and understand the potential power of CC to extend the reach of their work, but they are afraid of losing control. No Derivatives users, especially NonCommercial-No Derivatives users, are Creative Commons dabblers.

And then there’s the Attribution-NonCommercial license, which is the one I use, and it’s my favorite. I sell this license to my classes as a nice balance between sharing your work and protecting your interests. As long as the user is non-commercial – a librarian, a fan, a student – she can do whatever she wants with your work. If the user is planning to make money, she has to ask first. You’re still free to say yes, without compensation even, but you get to decide on a case by case basis.

This formula resonates with the people in my workshops, most of whom are either university faculty or librarians. When they use a NonCommercial license, they’re essentially granting permission to people like themselves: academics, scholars, teachers. People like them, making uses like they might make, are easy to trust. Commercial users, whose motives and methods are different, can feel less trustworthy.

This brings me back to Courant's concern that profit-making enterprises see NonCommercial license users as anti-commerce, and his implicit suggestion that NC licensors put the cause of anti-commerce above the goal of maximizing future use. I realized that in my case, he's right. I do privilege the teacher, the student, the fan. I see their uses as more valuable, more worthy of my generosity, than the profit-makers'. Is that so wrong?

Part 2: Promoting the progress

Courant’s main reason for dropping the NonCommercial restriction comes from a combination of opinions about economic theory and copyright.

If you believe, as I do, that the purpose of copyright is to “Promote the progress of science and the useful arts”, then it is more important that the work be out in the world being read, and contributing to a larger discourse, than that strangers not be able to make money from it.

I do believe, as he does, that the purpose of copyright is to promote the progress. I love promoting the progress; I do it all the time. I think universities and governments should license everything they do under CC-BY, because maximizing access to scholarly and government works is so very important. But I struggle, as an individual, especially an individual at the bottom of the professional food chain, to feel comfortable offering up my work freely to the profit-makers. I want to contribute to the larger discourse, and I want my works to be read and my photographs to be seen; I just haven't been ready to give everything away.

But Courant makes a compelling argument:

One maximizes the influence of the work by maximizing potential uses of the work, recognizing that commercial uses have just as much power to promote progress as non-commercial uses…

Maximizing influence sounds good, too. As an individual at the bottom of the professional food chain, I think a lot about maximizing my influence. What’s more, I tell people all the time about how Creative Commons (and Open Access) can help maximize their influence, increase their impact, improve their visibility. It follows that the freer you make a work, the farther it can travel.

When it comes right down to it, the chances that anyone is going to make any money at all from this blog are tiny. My pictures on Flickr are similarly lacking in likely financial value. This exercise is entirely theoretical. However, I am called upon with some regularity to advise people on the choosing of a Creative Commons license, and unpacking my beliefs about the meanings and significances of the different options has been very helpful, for me at least. It will certainly change my standard advice about choosing a license; I used to suggest BY-NC automatically, to everyone. Now I’m more likely to push BY, especially for projects that are meant to serve as resources for a broader community, like wikis or research guides.

Me, I’m sticking with BY-NC for the moment. But watch the sidebar; I might change my mind.