Monday, March 02, 2015

Everyday Digital Archives Q&A: Richard Pearce-Moses

What digital archives-related resources do you read--blogs, social media, articles, journals, listservs, etc.?

Lots of social media is noisy and demands a lot of time to follow, so I don’t use it too much. I’ve largely abandoned Facebook.  I check Twitter once or twice a day, but I limit the people I follow to individuals in the field.  Their posts are interesting and useful, because they point me to something I wasn’t aware of and wouldn’t have thought to search.  Charles Bailey (@DigitalKoans) links to interesting reports and jobs.  LC’s National Digital Information Infrastructure and Preservation Program (@ndiipp) has lots of news.  Martin Hawksey (@mhawksey) and Trevor Owens (@tjowens) often link to interesting projects, but often they’re very technical and over my head.

I read The Signal, published by Library of Congress (blogs.loc.gov/digitalpreservation).  Twitter feeds often point me to a blog entry that I’m interested in.

Tech magazines often have lots of really interesting things that are relevant to digital archives, although the connection may be a bit tangential.  I really like Ars Technica and Wired.

Do you actively curate or archive your own personal digital materials? If so, how?

I don’t know that my personal papers are that interesting or merit selection by an archives.  Much of my work is captured in publications, ranging from chatty articles for Archival Outlook to research in professional journals and to a few monographs.  I have identified a few records (mostly email) from when I was president of SAA that may be of importance to its archives.  I did grab a copy of all the email from the Archives and Archivists list from 1998 to 2006 when there was some question as to its fate.  I’ve used it to study the history of the profession, and I think it would be useful to others.

On the more personal side, my life has been good, but not extraordinary.  My personal records wouldn’t add much or be a source of fascinating research – or even a boring dissertation. My family is small, and they don’t need a terabyte of travel photos, email, and receipts from Amazon to keep my memory alive.  A few photos will probably be enough.

I am pretty religious about keeping things backed up, but that’s not an archives.  I don’t have a routine to keep backups off site, so if the house burns, I might lose both my computer and backup drive.  However, I use cloud storage for essential records. 

I’ve migrated a few important files as software became obsolete.  However, I have a lot of email in client applications (Eudora, WinMail) that Outlook couldn’t import.  My partner has email from a CompuServe account that requires proprietary software; I kept the software, but I haven’t tried to recover those files.

Why is curating or archiving your own personal digital materials important?

As I noted above, I don’t see great value in my personal records.  I’ve inherited some records from family members.  I was pretty serious about appraisal and threw out many that were unidentified or redundant.  I’ll add a few of mine to that collection and pass it on to relatives. 

I try to keep enough records to remember people and events that were important in my life.  I have a digital picture frame with a few dozen images of people and events.  In many ways, that small collection captures the highlights of my life, without documenting the drudgery.

“Won't personal digital archiving solve itself as the digital generation comes of age?” Your thoughts?
**To give credit where credit is due, this question is taken from Catherine Marshall’s “Rethinking Personal Digital Archiving, Part 1” (http://www.dlib.org/dlib/march08/marshall/03marshall-pt1.html)

The problems of archives won’t change because those problems aren’t tied to technology.  They’re people problems.  Who will decide what to save?  Who will reach out to find collections that can be acquired for archives that are accessible to a larger public? 

Finding ways to get people to attend to their personal records for future use is always going to be a problem.  Most people live in the present, and their lives are busy.  Getting them to think about building and preserving a collection for future use can easily be put off a day, a month, a year.  To use myself as an example, I never thought too much about my own family’s archives until this year, when my mother-in-law passed and I was then part of the oldest surviving generation in my family.  Like many, my family relied on the elders to bear the torch of memory and preserve the past, and now it’s my turn. 

It may very well be that the digital generation will have the skills to preserve digital records.  At the same time, many members of the digital era are sophisticated consumers of  technology.  They can use existing tools, but they’re stymied when confronted with something out of the ordinary.  Unless there are tools readily available for extracting email from older formats, for reading archaic formats, those materials may yet be lost.
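As a rough illustration of the kind of tool mentioned above, the following is a minimal sketch that lists the messages in an old mailbox. It assumes the mail has already been exported or converted to the plain-text mbox format, which Python's standard `mailbox` module can read; the function name `summarize_mbox` and the file name in the usage note are hypothetical. Mail locked in a proprietary format would need a converter first.

```python
import mailbox
from email.header import decode_header, make_header


def summarize_mbox(path):
    """Return a (date, sender, subject) tuple for each message in an mbox file."""
    rows = []
    # create=False makes a missing file an error instead of creating an empty mailbox.
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            # Decode encoded-word subjects such as =?utf-8?q?...?= into readable text.
            subject = str(make_header(decode_header(msg.get("Subject", ""))))
            rows.append((msg.get("Date", ""), msg.get("From", ""), subject))
    finally:
        box.close()
    return rows
```

Calling `summarize_mbox("old_mail.mbox")` returns one tuple per message, which is often enough to judge whether a mailbox is worth keeping before investing in a fuller migration.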

Due to the distributed nature of personal digital archives, (i.e. content of an individual all over the web in different arenas: Facebook, Twitter, blogs, etc.) how should archivists approach the challenge of acquiring these dispersed digital materials? Are there tools to help?

A good example of why some problems of digital archives will never go away. Archivists will need to reach out to a wide range of people, seeking individuals who will help archivists build collections.  The key difference is that archivists no longer have the luxury of time.  Previously, personal records were often donated to an archives after an individual passed away.  The family would look at their diaries, letters, and photos.  They’d think “history”, and the records would be donated. 

Archivists can’t wait until people pass away to collect their records.  Currently, they may not be able to get records because the family doesn’t have passwords and social media sites won’t grant access.  Even if they have a password, it’s likely that much would be lost over time.  Facebook may be pervasive now, but I will be surprised if it’s as popular in ten years.  Would an archivist in 2050 be able to capture Facebook posts of someone living now?  Archivists need to identify people and groups that the future will want to know about, to understand, to study, and then approach them about capturing records now.

Martin Hawksey has a nifty tool to capture Twitter feeds using freely available Google tools that aren’t that hard to implement (see http://mashe.hawksey.info/2013/02/twitter-archive-tagsv5/).  Facebook allows individuals to download their pages, but that will require cooperation of the page owner.  I firmly believe in harvesting web content relevant to an archives’ mission, including blogs.  The Internet Archive’s Archive-It is probably the best service out there, but it’s not free.  However, HTTrack is better than nothing.  One of the challenges is finding web content that’s in scope.  I developed a methodology to do that when I was in Arizona, but the tools to help automate that process are no longer available.  I’m in the process of building new tools (see http://arstweb.clayton.edu/domainid/ to read about the project and follow my progress).

What can we do as archivists to change the culture of “benign neglect” that people so often have in regards to their personal digital records?  How do you see people accessing personal digital records/archives in the future? 10 years? 20 years?

To avoid benign neglect – in a word – quit neglecting things.  In a sense, personal collections are not archives. People have collections of personal records that they have saved – often by happenstance – and not through considered appraisal.  Scrapbooking is an example of a conscious effort to save select items or tell a story, but I don’t know that the process is sufficiently systematic or complete to tell the whole story.

Archives are more than a bunch of old records.  They are in a secure place (although copies in the cloud may provide greater protection than a hard drive or DVD).  Most important, though, archives are a program to appraise, process, provide access, and support long-term preservation.  Records are selected to capture context, to document the richness of history – not a rosy remembrance.  Records are arranged and described to ensure that enough information is captured to make the records meaningful – identifying photographs, documenting family trees, noting important life events.  Records are preserved to make sure that the whole of the collection is preserved, and history is not documented by things that survived by chance.

I don’t want to discount collections of personal records because they don’t match up with a theoretical model.  Records that tell an important story of the past are valuable to families – and some of these records should be acquired by formal archives to make them more widely accessible.  Archivists must find new ways to connect with those people to acquire records before preservation is an insurmountable barrier and bring them into collections where they will be properly cared for.  Without them, we will not have records that tell a complete, accurate, and authentic story of the past.

A call for archives to be actively engaged in acquiring personal records is no small task.  But adapting sage advice from Fynnette Eaton, “Whatever we do, we may fail.  But if we do nothing, failure is guaranteed.”  Better that we make some effort and capture some piece of this history than to leave the future with none.
-------
Thanks to Richard for sharing his insights!  Want to volunteer to be interviewed for our Q&A blog posts? Know a digital records steward we should interview? Let us know: outreach [at] soga [dot] org.

Wednesday, January 21, 2015

Everyday Digital Archives Q&A: Seth Shaw

In our latest installment of our Everyday Digital Archives Q&A, Seth Shaw, assistant professor in the Master of Archival Studies program at Clayton State University, reminds us to be patient with ourselves and take comfort in the fact that we are not alone in the process of figuring out how to manage digital records.  Read on to learn about Seth’s reflections on digital archives and “glancing into the cloudy crystal ball” that is the future (where digital archives are concerned).

What digital archives-related resources do you read--blogs, social media, articles, journals, listservs, etc.?

There are a lot of blogs that drop into my RSS reader, although their posting frequency is highly variable and I often just skim unless something really strikes me.
·        Agogified
·        Archives in the Digital Era
·        Engineering the Future of the Past
·        Future Proof
·        HangingTogether
·        Practical E-Records
·        The Signal: Digital Preservation
·        Web Science and Digital Libraries Research Group

There are a number of journals I will scan the contents of looking for interesting items but the two that I look to specifically for digital-archives would be D-Lib & Code4Lib.

For listservs I follow (again, posting frequency varies):
·        SAA: Electronic Records Section
·        SAA: Metadata & Digital Object Section
·        Digital-curation@googlegroups.com
·        Digital-preservation@googlegroups.com

What advice would you give to an archivist who is nervous to start tackling digital archives?  

First, be patient with yourself. None of us are born with the necessary technical skills; all of us learned at some point. You can too. Also recognize that you don’t need to know everything (none of us do). Focus on the technology that will meet your needs and learn as you go. Technology skills are best learned in tandem with a project.

Second, start small. It is easy to be overwhelmed. Starting with a small project allows you to start familiarizing yourself with the terminology and concepts in addition to building confidence.  Generally the best small project to begin with is inventorying the digital materials you have (or would like to acquire). Archivists are already familiar with this type of activity and it can help you identify the scope of issues you need to deal with. Then you can decide, based on that inventory, on your next small project. Don’t let a lengthy list of projects & problems intimidate you. Simply focus on the next one. Progress one step (project) at a time.

Third, take comfort that you are not alone in this process. More and more archivists with no previous technical training are taking their first steps on this journey and making good progress. Every time I go to SAA I hear new stories of archivists tackling digital archives projects and succeeding. Solutions may not be “ideal” but progress is better than avoidance and neglect. We all are trying to improve. There is no perfect program. Find colleagues who can serve as support, who either are facing or have already dealt with the same challenges.

Do you actively curate or archive your own personal digital materials? If so, how?

In a way. I don’t generate PREMIS metadata for my digital materials. I don’t have a finding-aid or inventory. I certainly haven’t installed a digital repository to store my files. Rather, I simply try to be a good personal records manager. I try to keep my materials organized and I tend to prefer open formats for my own materials when practical. (I usually generate PDF/A versions of any document-like files I distribute to others.) I am also generally considerate about how my data moves from one system to another. Sure, some of my materials could be better backed up and/or could have better descriptive metadata (e.g. our family photos & videos which are organized by year & date stored on a pair of external drives). But I am generally confident that an archivist (or even my family) could take custody of my files just fine.

Facebook though, that is another issue…

Why is curating or archiving your own personal digital materials important?

The fact is that there is a good amount of materials that I have lost from the past because I wasn’t always concerned about it. For example, I don’t have any of my email from my undergraduate email account or the personal email accounts predating Gmail. I am sure there are a number of floppies and CDs I have lost over the years (I couldn’t read them now anyway). It wasn’t until I began learning about good records management principles that my habits began to change.

Do your personal digital archives exist outside of the virtual/online environment? In what form?

Kind of. I have a number of articles I have printed and annotated. I also have some paperwork (e.g. travel reimbursements) that is part analog and part digital, but it has relatively short retention periods.

The only digital materials you might find in an archives (family or otherwise) that might end up duplicated as analog are our family photos. One of these days I am going to start creating photo-albums to have printed but I haven’t done it yet.

“Won't personal digital archiving solve itself as the digital generation comes of age?” Your thoughts?
**To give credit where credit is due, this question is taken from Catherine Marshall’s “Rethinking Personal Digital Archiving, Part 1” (http://www.dlib.org/dlib/march08/marshall/03marshall-pt1.html)

Probably not. Has personal “analog archiving” solved itself as the analog generation came of age? That question doesn’t make sense because both notions are flawed for the same reason. The “digital generation” may have more experience engaging technology, generally speaking, but this does not equate to effective personal recordkeeping or to understanding how technology works. Besides, new information technologies are being developed faster than generations come of age. Archivists will have a hard enough time keeping up. There will always be a new challenge to face.

Due to the distributed nature of personal digital archives, (i.e. content of an individual all over the web in different arenas: Facebook, Twitter, blogs, etc.) how should archivists approach the challenge of acquiring these dispersed digital materials? Are there tools to help?

Yes, I am sure I have data either created by or about me in systems all over the internet. (A quick google search for “Seth E Shaw” found profiles on Drupal.org, SlideShare, Delicious, MacOSX.com, and archives2014.sched.org.) Do any of these actually matter? Maybe. I still use only one of them. (SlideShare, and there is nothing unique there except the view counts.) The Delicious account, and possibly the Drupal one, might be of interest to future researchers but most likely not to my family. My Facebook and Twitter accounts, among other web-based accounts, didn’t come up at all.

Here’s the thing: we all leave digital traces but it would be nigh impossible and certainly impractical to gather them all. The salient questions are appraisal questions. Which of these traces are important, to whom, and why? Answer these questions, at least broadly, on a case-by-case (e.g. collection-by-collection or donor-by-donor) basis first. Only then can you begin to make decisions about appropriate capture tools.

Web-harvesters (e.g. HTTrack, Wget, and Heritrix) are the general purpose tool for this task. They work fine in some cases, but not all. Unlike web 1.0, which relied on the HTML standards making them easy to capture, Web 2.0 additionally makes heavy use of JavaScript & custom APIs (Application Programming Interfaces) which are often poorly captured and may require different tactics. API-based captures can be effective but these interfaces are not standardized and require custom coding for each one (although some tools, such as ThinkUp, will support multiple APIs).
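To make the capture problem concrete, here is a minimal sketch of the link-discovery step that a harvester performs on each page it fetches. It is not how HTTrack, Wget, or Heritrix are actually implemented; the names `LinkCollector` and `same_site_links` are illustrative only. The sketch uses Python's standard HTML parser, which follows links declared in the markup but treats anything inside a script as opaque text, so links that JavaScript would generate in a browser are never found.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag that appears in the HTML markup."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own address.
                self.links.append(urljoin(self.base_url, value))


def same_site_links(html, base_url):
    """Return de-duplicated links that stay on the same host as base_url."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    found = []
    for link in collector.links:
        if urlparse(link).netloc == host and link not in found:
            found.append(link)
    return found
```

A page whose navigation is written out by a script would yield few or no links here, which is one reason script-heavy sites are often captured incompletely by a crawl that only reads markup.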


What can we do as archivists to change the culture of “benign neglect” that people so often have in regards to their personal digital records?

I think the Library of Congress’ efforts in establishing Personal Digital Archiving Days is a good model to follow. It will take public awareness if things are to change. This effort could be expanded and taken further. We, as a profession, could reach out to the popular press and blogosphere to educate their readers about this topic. We occasionally see these types of pieces written and I occasionally come across online forums or blog posts discussing these issues. These conversations are occurring even if we aren’t starting them but they could be made broader. Of course, and unfortunately, even with the awareness it often takes personal loss for the lessons to sink in.

How do you see people accessing personal digital records/archives in the future? 10 years? 20 years?

If there is one thing I have learned it is that the technology crystal ball is awfully cloudy. Previous trends pointed towards larger volumes of personal data held locally. You could have an entire library in your pocket! This came true although few, if any, expected it to converge with our phones. Also, most of us are far more likely to not have the library on our phone, but to use the portable connectivity technology to access the library remotely. Instead of using the device as a large personal repository it has become a thin-client accessing remote repositories. We saw part of this coming, but not as it actually occurred. Science-Fiction has often been pointed to as a predictor of future trends, but I think that is more of a function of throwing hundreds of darts at a dart board. Some are going to hit closer to the center than others and there is a chance someone will hit the bull’s-eye. I am not inclined to bet on which.

Glancing into the cloudy crystal ball does reveal a few cloudy figures: First, semantic, natural language processing, and physical context awareness technologies have the capability to influence the nature of search and browsing. Second, wearable technology is likely to influence access for whim-based inquiries although more serious reminiscence or research inquiries will probably be influenced by developments in recreational and working-styles (fixed and mobile). Finally, there is a trend towards more data-centric, rather than document-centric, activities (e.g. personal health tracking). How will all this play out in 10 or 20 years? I haven’t the foggiest – I’m just casually throwing darts – but it will be fun to see!

Thanks to Seth for sharing his insights! Want to volunteer to be interviewed for our Q&A blog posts? Know a digital records steward we should interview? Let us know: outreach [at] soga [dot] org.

Monday, December 22, 2014

Everyday Digital Archives Q&A: Erika Farr

After the rich discussions about digital records that were shared at the annual Society of Georgia Archivists meeting in Athens last month, the SGA Outreach Managers are resuming their "Everyday Digital Archives" Q&A blog posts (after a bit of a lull).  In this fourth installment of our Q&A blog posts, we continue the conversation about everyday digital archives with Erika Farr, Head of Digital Archives at Emory’s Manuscript, Archives, and Rare Book Library.

What digital archives-related resources do you read--blogs, social media, articles, journals, listservs, etc.?

There are some really helpful web resources in a range of formats including blogs, reports, white papers, and project websites.  A selection of highlights from each of these categories would include the Library of Congress’s The Signal (http://blogs.loc.gov/digitalpreservation/) and Chris Prom’s Practical E-Records (http://e-records.chrisprom.com/); a number of recent CLIR reports (http://www.clir.org/pubs/reports) and many of the Digital Preservation Coalition Technology Watch Reports (http://www.dpconline.org/advice/technology-watch-reports); and project/institution websites such as BitCurator (http://www.bitcurator.net/) and the Born Digital Archives program at the Hull History Centre (http://www.hullhistorycentre.org.uk/discover/hull_history_centre/about_us/born_digital_archives/work_in_progress.aspx). In addition, I rely on resources such as the AIMS White Paper (http://www.digitalcurationservices.org/aims/white-paper/) and the OCLC publications included in their Demystifying Born Digital (http://oclc.org/research/activities/borndigital.html).

What advice would you give to an archivist who is nervous to start tackling digital archives?  

If you are feeling nervous, find some test floppies, hardware and/or data so you can begin experimenting with new tools and workflows, without the stress of working on collection material. Beginning to understand the tools and practice can simplify the process and make someone new to the field feel more comfortable.

Also, the important exercise of counting what you already have in the collection can be both productive and familiar. Creating an inventory of existing born-digital content is a crucial first step and requires little more than effort and consistent documentation.
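A first inventory can be as simple as a spreadsheet, but for a folder of files already copied from a disk or drive, a short script can do the counting. The following is a minimal sketch under that assumption; the function names, the column names, and the choice of SHA-256 checksums are illustrative choices rather than an established standard.

```python
import csv
import hashlib
import os
from datetime import datetime, timezone

FIELDS = ["path", "bytes", "modified_utc", "sha256"]


def inventory(root):
    """Walk a folder and describe every file: relative path, size, date, checksum."""
    rows = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subfolders in a stable order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(full, "rb") as handle:
                # Read in 1 MB chunks so large files do not exhaust memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            info = os.stat(full)
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            rows.append({
                "path": os.path.relpath(full, root),
                "bytes": info.st_size,
                "modified_utc": modified.isoformat(),
                "sha256": digest.hexdigest(),
            })
    return rows


def write_inventory(rows, csv_path):
    """Save the inventory as a CSV file that opens in any spreadsheet program."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Running `write_inventory(inventory("accession_folder"), "inventory.csv")` on a hypothetical folder produces one row per file. Keeping the checksum column makes it possible to confirm later that a file has not changed since it was first counted.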

Do you actively curate or archive your own personal digital materials? If so, how?
Why is curating or archiving your own personal digital materials important?
Do your personal digital archives exist outside of the virtual/online environment? In what form?

Within my own personal digital archives, I have focused on curating digital photographs more than anything else. I use an online cloud storage service (JustCloud) to back up my personal files, mainly my photo library and my financial documentation. Otherwise, I have not curated my personal email correspondence or social media accounts. I should probably think more about social media curation, especially since there is information in applications like Facebook that probably doesn’t exist anywhere else.

As for importance, I have unconsciously prioritized my own born-digital content, by actively curating the digital photography and financial/tax documentation while largely ignoring all other content. Because I print so few photographs, my photo library is probably the single most important born-digital collection.

“Won't personal digital archiving solve itself as the digital generation comes of age?” Your thoughts?
**To give credit where credit is due, this question is taken from Catherine Marshall’s “Rethinking Personal Digital Archiving, Part 1” (http://www.dlib.org/dlib/march08/marshall/03marshall-pt1.html)

I don’t think all the complications and challenges of personal digital archiving will sort themselves out as engagement with social media and mobile devices becomes a cultural standard, though I do think some basic habits of data back-up will become more pervasive, even if in a passive way. Already, the use of applications on multiple devices via cloud storage allows users to access and synchronize their data in numerous ways and using various devices. These shared applications across devices mean that data is less likely to be lost through device crash or loss, thus, making the data more persistent. Having all of your data managed by a for-profit mobile application developer has its perils, too, of course, and there could be some hard future lessons on data loss when an application is no longer supported or an applications developer goes bust.

Due to the distributed nature of personal digital archives, (i.e. content of an individual all over the web in different arenas: Facebook, Twitter, blogs, etc.) how should archivists approach the challenge of acquiring these dispersed digital materials? Are there tools to help?

Emerging tools like ArchiveSocial (http://archivesocial.com/) and ePADD (http://library.stanford.edu/spc/more-about-us/projects-and-initiatives/epadd-project) offer us hope for more effective means of identifying, transferring, and managing social media and electronic correspondence. Neither of these services is ready for archives and libraries as they acquire personal digital archives, as of yet, so for now little is immediately helpful other than engaging with the donor.  My preferred approach now is to use survey tools and pre-acquisition efforts to identify relevant material and accounts for transfer and then to work with the donor to transfer that data effectively.

What can we do as archivists to change the culture of “benign neglect” that people so often have in regards to their personal digital records?

As a profession, we need to find a way to talk about donors’ personal digital archives that doesn’t ratchet up anxiety. In my experience, conversations with donors can be anxious for two different reasons: one, over the course of the conversation the donor gets spooked by the breadth of the data transfer and worries about potential exposure; and/or, two, the technical nature of the conversation overwhelms the donor, prompting him or her to shrug off local preservation tactics because they seem out of reach. We need to find easy-to-use tools and applications that our donors can use and then introduce them in understandable, reassuring ways.

How do you see people accessing personal digital records/archives in the future? 10 years? 20 years?

I think it will depend on the records and the institution that holds them. Some records will demand different types of researcher access because of their format or because of the nature of the research questions likely to be brought to bear on the material. Furthermore, I worry that there will be real differences in how institutions can manage and provide access to born-digital records based on available local resources and infrastructure.  Because digital material is so easily disseminated and shared, I hope to see much more web-based access to born-digital records over the next decade or two. There needs to be a real effort made to develop means of virtual access that allows researchers full access to content and the tools they need to leverage that content without violating copyright and intellectual property law. Such advances will require much in the way of technical innovation, policy advancement, and legal advocacy, though, so I don’t expect such tools and access to appear on the scene without considerable consolidated effort.

Thanks to Erika for sharing her insights! Want to volunteer to be interviewed for our Q&A blog posts? Know a digital records steward we should interview? Let us know: outreach [at] soga [dot] org.

Friday, October 03, 2014

Space still available in pre-annual meeting workshops!

Register now for one of SGA's pre-annual meeting educational workshops, taking place Wednesday, November 5 in Athens, GA!

Disaster Recovery: Wet Salvage Techniques
9:00 a.m. - 5:00 p.m.
Every collection is susceptible to damage from water – whether from ground floods or from above through leaks, fire suppression systems, or the water used to put out a fire. This hands-on workshop will give participants experience in dealing with the aftermath of waterlogged materials. Experienced collections conservator Ann Frellsen will share her experiences and train you on how to recover collections after the water recedes.
Participants will learn the importance of team effort in how to manage a response, from safety to public relations through recovery of materials after a water incident. In addition, participants will gain hands-on experience in the salvage of typical library and archive collections materials. A wrap-up session includes time for questions, especially regarding specific collections or situations.
Lunch will be on your own in Athens from 11:30 to 1:00.

Registration fees are $75.
Attendance is limited to 20.
To register for this course, click here.

Accessioning and Ingest of Electronic Records [DAS course]
9:00 a.m. - 5:00 p.m.
Perhaps your institution has found itself in a situation where a prominent donor has offered a trove of significant Office documents and digital photographs stored on her hard drive; or, an important department is ready to transfer records of long-term value from a file server to the archives; or, a professor drops off an external hard drive and DVDs with video footage from a symposium featuring nationally recognized participants….

If you were unprepared or unsure of how to handle such a donation, this one-day course will introduce you to basic policies, resources, and procedures that will enable your institution to successfully accession and ingest common born-digital materials (Office documents, PDFs, images, audio, video, and email).

Upon completion of this course you’ll be able to:
  • Discuss current practices and resources; and
  • Develop policies and workflows best suited to your institution’s mission and resources.
Registration - Early-Bird/Regular
SAA Member $199 / $269
Employees of Member Institutions $229 / $299
Nonmember $259 / $319

Early-bird registration ends October 5, 2014.

Attendance limited to 30. You may be asked to bring a laptop to successfully participate in this course.

For more information on this course and to register, click here.

This course is co-sponsored by the Society of American Archivists.