Posts Tagged ‘digital records’

NoClone: Identifying and deleting duplicated records

Sunday, June 28th, 2009

My records management class, although it ended over a month ago, still has me thinking about my personal file management.  My computer's hard drive has about 180GB of storage.  While at the time (2006) it seemed like it would last forever, the advent of larger and larger files such as MP3s and Photoshop documents has already brought me to about 50% full. I finally decided to look for a program to help with cleaning it up.  Searching “duplicate files” in Google led me to NoClone 2007, which advertises itself with the following capabilities

After downloading the 30-day trial, I decided to use it on the music files first.  I inherited the music library of one of my friends, so the content is mostly unknown to me and I have noticed duplicates.  My first pass resulted in 22 files to be deleted.  This freed up approximately 60MB of space (about 0.06 GB). I was impressed.  I then pointed the program at the My Documents folder, which ends up being where I put files when I’m done with them. The trial version only lets you delete 30 files at a time, and I had selected over 150 when I realized that I would have to do this in stages.

Activities such as moving files from one computer to another, using a flash drive, and editing and creating drafts are the main causes of the duplicate files on my system.  Creating folders that have similar uses, different names and the same files is another.  This program shows you the name, location, file size and file type of each of the files that it believes are duplicates.   I found the file location to be most helpful, especially when choosing which of the 2 or 4 copies did not need to be kept. In some cases, I kept all 4 copies.  Identifying the files can be, and probably needs to be, an automated process, but the decision on what to keep or delete needs to be made by a human.
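To illustrate the split between automated identification and human decision described above, here is a minimal Python sketch. It is not how NoClone works internally (the post does not say); it assumes one common approach, grouping files by size first and then by a SHA-256 hash of their contents. The function names `file_digest`, `find_duplicates` and `report` are made up for this example, and the script only lists candidates with their locations. It never deletes anything.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under root whose contents hash to the same value."""
    # Pass 1: bucket by size, which is cheap and rules out most files.
    by_size = defaultdict(list)
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files that share a size with another file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)


def report(groups):
    """List each group with its size and locations; nothing is deleted."""
    lines = []
    for group in groups:
        size = os.path.getsize(group[0])
        lines.append(f"{size} bytes, {len(group)} copies:")
        lines.extend(f"  {path}" for path in group)
    return "\n".join(lines)
```

Calling `print(report(find_duplicates(some_folder)))` on a folder of your choosing prints each set of identical files with its full paths, which is the information I found most useful when deciding which copy to keep.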

[Image: NoClone file list]

What I’ve found from this is that not only do I have duplicate files, but I also have duplicate folders: folders I created with similar purposes and similar documents that don’t need to be two separate entities.  I like how simple the program is. It keeps track of cumulative statistics, and so far I’ve deleted over 220 MB (about 0.22 GB) of duplicate files.

Selecting and Appraising Websites

Wednesday, October 8th, 2008

Our speaker today was Jonathan Nelson from the Wisconsin Historical Society.

They’re working on projects using Archive-It (a subscription service from the Internet Archive) to document websites relevant to the institution’s collections.  I took a look at Archive-It’s Partners list and I see some impressive names on there.

I must insert here that I am a student with no administrative experience, so the following comments are based on my general impressions, not on real-life experience.

It was mentioned that this subscription costs approximately $10,000 a year.  That includes 1/2 Gigabyte of storage, the program and the interface with customizable depth of drilling for URLs.  This seems an extravagant amount for a service such as this, especially given the low amount of storage space.  With the thousands of spam bots and viruses out there that do similar crawling, the programming of the project does not seem to be the limiting factor.  Is the draw that the information is being held off-site, in a “safe” repository?
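As a rough illustration of what “customizable depth of drilling” could mean, here is a small Python sketch of a depth-limited, breadth-first crawl. It is not Archive-It’s implementation, which I have not seen. The `crawl` function and the `SITE` link map are hypothetical, and the map stands in for real pages so that nothing is actually fetched.

```python
from collections import deque


def crawl(seed, links, max_depth):
    """Walk outward from seed, following links at most max_depth hops.

    Returns a dict mapping each captured URL to its depth from the seed.
    """
    seen = {seed: 0}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        depth = seen[url]
        if depth == max_depth:
            continue  # depth limit reached: capture the page, follow no links
        for target in links.get(url, []):
            if target not in seen:
                seen[target] = depth + 1
                queue.append(target)
    return seen


# Hypothetical link map standing in for a real website.
SITE = {
    "example.org/": ["example.org/about", "example.org/news"],
    "example.org/news": ["example.org/news/2008"],
    "example.org/news/2008": ["example.org/news/2008/october"],
}
```

With this map, `crawl("example.org/", SITE, 1)` captures the home page plus the two pages it links to, while a depth of 2 also picks up `example.org/news/2008`. A deeper setting captures more of a site but uses more of the storage allowance.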

The fact that it is held off-site and made searchable by the public through Archive-It’s website is a plus, but I still think the cost is high. Being able to search any partner’s archived sites impresses me as an end-user, since I don’t have to sign in or limit myself to a certain repository.  Strangely enough, I don’t see the Wisconsin Historical Society on the list.

Some of the caveats that were mentioned about digital archiving in this manner included costliness (of course), over-collecting, and the question of how important the material will be in the future.  Are institutional or individual websites just a fad?  Or do they have lasting value?  I find this a difficult question to grapple with.  The internet, for the most part, has been available in the home for a little over 10 years now.  Since then, the cost of a web domain on which to host pages has continued to decrease, to where it can be as low as $10/year (that is about what the www.thenovicearchivist.com domain was purchased for).

How does an archivist know whose stuff to capture?  Often, famous writers are not well known until after they’ve passed away.  By the time an archive is aware of their existence, the website domain will probably have been resold.

Website capturing and archiving seems just as sticky as something like sound or video archiving, where the migration (or emulation) of the media will incur annual expenses. Paper records, by contrast, can be processed and stored, and the only annual costs are storage and proper conditions (and perhaps later on, digitization, but that is a whole different can of worms).

Digital records: Emulate or migrate?

Saturday, October 4th, 2008

I am not a programmer and I don’t necessarily consider myself “techy,” even though some of my friends tell me I am. Listening to an electronic records archivist speak in class the other day about the emulation or migration question, I asked what his preferred method was. His answer of “I am a records person, format isn’t important to me,” didn’t satisfy me.

Image courtesy of http://www.fotosearch.com

In my understanding, to “emulate” is to use an intermediary program which mimics the original operating system or environment in which the file was created or was designed to run. To “migrate” is to reformat a file/record/program into a newer version of a similar type of software.

My opinion: Emulation would also allow other files that may come into the repository’s possession to be read and accessed. Migration, it seems, is a temporary fix to a persisting issue, and once started it will need to continue because software keeps changing.

Perhaps a completely different approach is needed: trying to use non-proprietary methods of storing information (which would be one of those cases where the archivist needs to be involved in the planning stages of an organization’s records program). For example: If one would like to access an image saved as a .PSD, they would need a version of Adobe Photoshop. But Photoshop allows you to save image files in any number of other formats. A format such as .JPG, .TIF or .BMP can be opened by many different types of image editing and viewing programs.

If emulators can be created to mimic programs, I would say that I am more in favor of the emulator instead of the perpetual cycle of migrating.