Wednesday, January 21, 2015

Everyday Digital Archives Q&A: Seth Shaw

In the latest installment of our Everyday Digital Archives Q&A, Seth Shaw, assistant professor in the Master of Archival Studies program at Clayton State University, reminds us to be patient with ourselves and to take comfort in the fact that we are not alone in figuring out how to manage digital records. Read on for Seth’s reflections on digital archives and on “glancing into the cloudy crystal ball” that is the future (where digital archives are concerned).

What digital archives-related resources do you read--blogs, social media, articles, journals, listservs, etc.?

There are a lot of blogs that drop into my RSS reader, although their posting frequency is highly variable and I often just skim unless something really strikes me.
·        Agogified
·        Archives in the Digital Era
·        Engineering the Future of the Past
·        Future Proof
·        HangingTogether
·        Practical E-Records
·        The Signal: Digital Preservation
·        Web Science and Digital Libraries Research Group

There are a number of journals whose contents I scan for interesting items, but the two I look to specifically for digital archives are D-Lib & Code4Lib.

For listservs I follow (again, posting frequency varies):
·        SAA: Electronic Records Section
·        SAA: Metadata & Digital Object Section
·        Digital-curation@googlegroups.com
·        Digital-preservation@googlegroups.com

What advice would you give to an archivist who is nervous to start tackling digital archives?  

First, be patient with yourself. None of us are born with the necessary technical skills; all of us learned at some point. You can too. Also recognize that you don’t need to know everything (none of us do). Focus on the technology that will meet your needs and learn as you go. Technology skills are best learned in tandem with a project.

Second, start small. It is easy to be overwhelmed. Starting with a small project allows you to start familiarizing yourself with the terminology and concepts in addition to building confidence.  Generally the best small project to begin with is inventorying the digital materials you have (or would like to acquire). Archivists are already familiar with this type of activity and it can help you identify the scope of issues you need to deal with. Then you can decide, based on that inventory, on your next small project. Don’t let a lengthy list of projects & problems intimidate you. Simply focus on the next one. Progress one step (project) at a time.

Third, take comfort that you are not alone in this process. More and more archivists with no previous technical training are taking their first steps on this journey and making good progress. Every time I go to SAA I hear new stories of archivists tackling digital archives projects and succeeding. Solutions may not be “ideal” but progress is better than avoidance and neglect. We all are trying to improve. There is no perfect program. Find colleagues who can serve as support: people who are facing, or have already dealt with, the same challenges.
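
As an illustration of the inventory suggested in the second point above, here is a minimal sketch in Python. It assumes the materials sit in a folder you can read. The function names `inventory_files` and `write_inventory` and the column names are illustrative choices, not an established standard or anything Seth prescribes.

```python
import csv
import os
from datetime import datetime, timezone


def inventory_files(root):
    """Walk `root` and return one record (a dict) per file found."""
    records = []
    for folder, _subfolders, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(folder, name)
            info = os.stat(path)
            records.append({
                # Path relative to the folder being inventoried
                "path": os.path.relpath(path, root),
                # Lower-cased extension as a rough hint at the format
                "extension": os.path.splitext(name)[1].lower(),
                "size_bytes": info.st_size,
                "modified_utc": datetime.fromtimestamp(
                    info.st_mtime, tz=timezone.utc
                ).isoformat(),
            })
    return records


def write_inventory(records, csv_path):
    """Save the records as a CSV file that opens in any spreadsheet."""
    fields = ["path", "extension", "size_bytes", "modified_utc"]
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(records)
```

Calling `write_inventory(inventory_files("some_folder"), "inventory.csv")` (both names are placeholders) would produce a spreadsheet-ready list. Sorting that list by extension or size is one simple way to see the scope of what you hold before choosing the next small project.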

Do you actively curate or archive your own personal digital materials? If so, how?

In a way. I don’t generate PREMIS metadata for my digital materials. I don’t have a finding-aid or inventory. I certainly haven’t installed a digital repository to store my files. Rather, I simply try to be a good personal records manager. I try to keep my materials organized and I tend to prefer open formats for my own materials when practical. (I usually generate PDF/A versions of any document-like files I distribute to others.) I am also generally considerate about how my data moves from one system to another. Sure, some of my materials could be better backed up and/or could have better descriptive metadata (e.g. our family photos & videos which are organized by year & date stored on a pair of external drives). But I am generally confident that an archivist (or even my family) could take custody of my files just fine.

Facebook though, that is another issue…

Why is curating or archiving your own personal digital materials important?

The fact is that I have lost a good amount of material from the past because I wasn’t always concerned about it. For example, I don’t have any of my email from my undergraduate email account or the personal email accounts predating Gmail. I am sure there are a number of floppies and CDs I have lost over the years (I couldn’t read them now anyway). It wasn’t until I began learning about good records management principles that my habits began to change.

Do your personal digital archives exist outside of the virtual/online environment? In what form?

Kind of. I have a number of articles I have printed and annotated. I also have some paperwork (e.g. travel reimbursements) that is part analog and part digital, but it has a relatively short retention period.

The only digital materials you might find in an archives (family or otherwise) that might end up duplicated as analog are our family photos. One of these days I am going to start creating photo-albums to have printed but I haven’t done it yet.

“Won't personal digital archiving solve itself as the digital generation comes of age?” Your thoughts?
**To give credit where credit is due, this question is taken from Catherine Marshall’s “Rethinking Personal Digital Archiving, Part 1” (http://www.dlib.org/dlib/march08/marshall/03marshall-pt1.html)

Probably not. Has personal “analog archiving” solved itself as the analog generation came of age? That question doesn’t make sense because both notions are flawed for the same reason. The “digital generation” may have more experience engaging technology, generally speaking, but this does not equate to effective personal recordkeeping or to an understanding of how technology works. Besides, new information technologies are being developed faster than generations come of age. Archivists will have a hard enough time keeping up. There will always be a new challenge to face.

Due to the distributed nature of personal digital archives, (i.e. content of an individual all over the web in different arenas: Facebook, Twitter, blogs, etc.) how should archivists approach the challenge of acquiring these dispersed digital materials? Are there tools to help?

Yes, I am sure I have data either created by or about me in systems all over the internet. (A quick Google search for “Seth E Shaw” found profiles on Drupal.org, SlideShare, Delicious, MacOSX.com, and archives2014.sched.org.) Do any of these actually matter? Maybe. I still use only one of them (SlideShare, and there is nothing unique there except the view counts). The Delicious account, and possibly the Drupal one, might be of interest to future researchers but most likely not to my family. My Facebook and Twitter accounts, among other web-based accounts, didn’t come up at all.

Here’s the thing: we all leave digital traces but it would be nigh impossible and certainly impractical to gather them all. The salient questions are appraisal questions. Which of these traces are important, to whom, and why? Answer these questions, at least broadly, on a case-by-case (e.g. collection-by-collection or donor-by-donor) basis first. Only then can you begin to make decisions about appropriate capture tools.

Web harvesters (e.g. HTTrack, Wget, and Heritrix) are the general-purpose tools for this task. They work fine in some cases, but not all. Web 1.0 sites relied on standard HTML, which made them easy to capture. Web 2.0 sites additionally make heavy use of JavaScript & custom APIs (Application Programming Interfaces), which are often poorly captured and may require different tactics. API-based captures can be effective, but these interfaces are not standardized and require custom coding for each one (although some tools, such as ThinkUp, support multiple APIs).
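
To make the JavaScript problem concrete, here is a minimal sketch in Python of what a harvester sees if it only reads a page's markup and never runs its scripts. It is not how any of the tools named above is implemented. The class `LinkCollector`, the function `static_links`, and the sample page are all made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag present in the raw markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def static_links(html_text):
    """Return the links visible without executing any JavaScript."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


# Hypothetical page: one ordinary link, plus a script that would add a
# second link only when run inside a browser.
SAMPLE_PAGE = """
<html><body>
  <a href="/about.html">About</a>
  <div id="feed"></div>
  <script>
    var link = document.createElement("a");
    link.href = "/posts/1";
    document.getElementById("feed").appendChild(link);
  </script>
</body></html>
"""

print(static_links(SAMPLE_PAGE))  # prints ['/about.html']
```

The link to `/posts/1` exists only after a browser runs the script, so a markup-only crawl never learns that it should follow it. That gap is why script-heavy sites may call for a different tactic, such as a browser-based capture or an API-based one.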


What can we do as archivists to change the culture of “benign neglect” that people so often have with regard to their personal digital records?

I think the Library of Congress’ efforts in establishing Personal Digital Archiving Days are a good model to follow. It will take public awareness if things are to change. This effort could be expanded and taken further. We, as a profession, could reach out to the popular press and blogosphere to educate their readers about this topic. We occasionally see these types of pieces written and I occasionally come across online forums or blog posts discussing these issues. These conversations are occurring even if we aren’t starting them but they could be made broader. Of course, and unfortunately, even with the awareness it often takes personal loss for the lessons to sink in.

How do you see people accessing personal digital records/archives in the future? 10 years? 20 years?

If there is one thing I have learned it is that the technology crystal ball is awfully cloudy. Previous trends pointed towards larger volumes of personal data held locally. You could have an entire library in your pocket! This came true although few, if any, expected it to converge with our phones. Also, most of us are far more likely not to have the library on our phone, but to use the portable connectivity technology to access the library remotely. Instead of using the device as a large personal repository it has become a thin client accessing remote repositories. We saw part of this coming, but not as it actually occurred. Science fiction has often been pointed to as a predictor of future trends, but I think that is more a function of throwing hundreds of darts at a dart board. Some are going to hit closer to the center than others and there is a chance someone will hit the bull’s-eye. I am not inclined to bet on which.

Glancing into the cloudy crystal ball does reveal a few cloudy figures. First, semantic, natural language processing, and physical context awareness technologies have the capability to influence the nature of search and browsing. Second, wearable technology is likely to influence access for whim-based inquiries, although more serious reminiscence or research inquiries will probably be influenced by developments in recreational and working styles (fixed and mobile). Finally, there is a trend towards more data-centric, rather than document-centric, activities (e.g. personal health tracking). How will all this play out in 10 or 20 years? I haven’t the foggiest (I’m just casually throwing darts), but it will be fun to see!

Thanks to Seth for sharing his insights! Want to volunteer to be interviewed for our Q&A blog posts? Know a digital records steward we should interview? Let us know: outreach [at] soga [dot] org.