Posts Tagged ‘electronic records’

NoClone: Identifying and deleting duplicated records

Sunday, June 28th, 2009

My records management class, although it ended over a month ago, still has me thinking about my personal file management.  My computer's hard drive holds about 180GB.  While at the time, 2006, it seemed like it would last forever, the advent of larger and larger files such as MP3s and Photoshop documents has already filled about 50% of it. I finally decided to look for a program to help with the process.  Searching “duplicate files” in Google led me to NoClone 2007, which advertises itself with the following capabilities

After downloading the 30-day trial, I decided to use it on the music files first.  I inherited the music library of one of my friends, so the content is mostly unknown and I have noticed duplicates.  My first pass resulted in 22 files to be deleted.  This freed up approximately 60MB of space, over half a GB. I was impressed.  I then pointed the program at the My Documents folder, which ends up being where I put files when I’m done with them. The trial version only lets you delete 30 files at a time, and I had selected over 150 when I realized that I would have to do this in stages.

Activities such as moving files from one computer to another, using a flash drive, and creating and editing drafts are the main causes of the duplicate files on my system.  Creating folders that have similar uses, different names and the same files is another.  This program shows you the name, location, file size and file type of each of the files that it believes are duplicates.  I found the file location to be most helpful, especially when choosing which of the 2 or 4 files did not need to be present. In some cases, all 4 copies were kept.  Identifying the files can be, and probably needs to be, an automated process, but the decision on what is kept or deleted needs to be a human-made choice.

[Image: NoClone file list]
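The split described above (automated identification, human decision) can be sketched in a few lines of Python. This is only an illustration and not how NoClone works internally; the function names `file_digest` and `find_duplicates` are made up for this example, and it assumes that "duplicate" means byte-for-byte identical content. It reports groups of matching files and never deletes anything.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under root whose contents are identical.

    Returns a sorted list of groups; each group is a sorted list of paths.
    Nothing is deleted: choosing what to keep is left to a person.
    """
    # Files of different sizes cannot be identical, so group by size first
    # and only hash the files that share a size with at least one other.
    by_size = defaultdict(list)
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if os.path.islink(path):
                continue  # skip links so a file is not matched with itself
            try:
                by_size[os.path.getsize(path)].append(path)
            except OSError:
                continue  # unreadable file: leave it out of the report

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            try:
                by_hash[file_digest(path)].append(path)
            except OSError:
                continue
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

Calling `find_duplicates` on a folder returns the groups with full paths, which is the piece of information I found most useful when deciding which copy to keep.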

What I’ve found from this is that not only do I have duplicate files, but I also have duplicate folder types: folders I created with similar purposes and similar documents that don’t need to be two separate entities.  I like how simple this is. It keeps track of cumulative statistics and so far I’ve deleted over 220 MB or 2.2 GB of duplicate files.

Hot linking images, e-archiving and other archival concepts

Wednesday, February 11th, 2009

Hot linking to a photo is much like tagging it for your own use.  It may prevent copyright/citation issues, but it creates other problems. Think about this:

  • It could allow someone to track the usage to you.
    While this may be negative or positive, it is something to consider.  There is a link between their photo and their site and your site, even if it is a one-way link.  Improper use of an image and your link could create issues for you.
  • What if the admin of the site changes the photo, but uses the same location name?
    Your purpose for linking might not make sense with the image that now appears.  The new image may be wholly inappropriate or controversial.
  • Bandwidth
    While hot-linking in itself does not take up a lot of bandwidth, many people doing it to the same place does.  The host of the site ends up hosting the image on the many other sites that connect to it and that can cause traffic issues, as well as incur costs to the hosting entity.  In general, it is poor etiquette from one site creator to another.

What is my solution on this site?
I save images to my hard drive and then give credit to the image if I haven’t provided a link to the site or to the individual image.  One reason I have embraced this practice more earnestly is that some locations limit what can be viewed.  Companies that restrict certain websites cannot view images that appear only through a hot link. Having the image hosted by my site ensures that if they can view the site, they can view the images posted with it.

What e-archival implications does this have? Why am I tagging this with electronic records?
This again stems from the DIRKS project and my records management class.  As part of my file structure, one could see My Pictures -> Not Mine -> Archives Blog.  I save the images to this folder to upload to the site.  The questions of copyright and image credit do concern me, especially having artistic photos of my own, and I prefer to give credit where credit is due.

The archival issue (or perhaps this falls more under records management) is that of origination.  Where did it come from?  Who created it? Is it safe?  Looking through the picture folder, I right-click on the images and some have a warning: “This file came from another computer and may be blocked to protect your computer.”  I haven’t seen that before, so it helps with security, but it does not tell us the author or purpose for its creation.  Just because it’s on my hard drive does not mean that I’m the creator, but it is definitely evidence of my activities as a blogger.

Me, as a blogger.  I also realized that my file structure is flawed.  Keeping the Archives Blog folder inside the Not Mine folder does not account for photos that ARE mine that I wish to put in the blog.  Slightly semantic, though.

Afterthought:  I think this is the first example where I really, truly grasp the meaning of evidential value.  The previous entry includes an image of the PARADIGM logo.  It is not my creation, but I use it as illustration in my blog and it is evidence of that activity. Yay!

PARADIGM and personal organizational structure

Monday, February 9th, 2009

For the DIRKS project for the Records Management class, it was suggested that we run a command on the C:\ drive that lists the directory structure in tree form.  This is a really neat thing to see! Instructions can be found here on the PARADIGM site (or click logo).

[Image: PARADIGM logo]

As they say, doing the entire C drive can take a long time.  It also results in a listing full of files that are completely unnecessary.  After the file was generated, I spent a solid 10 minutes checking out my files and noticing some redundancies that can be eliminated. I then spent the next HALF HOUR deleting the program files from the listing so that I wouldn’t send a freaking huge .txt file to my group mates.  Even after that, it is still a 3+ MB text file.
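The half hour of trimming could, in principle, be avoided by leaving the unwanted folders out while the listing is built. Below is a minimal Python sketch of that idea. It is not the command from the PARADIGM instructions; the function name `tree_lines` and its `skip` parameter are my own inventions for illustration, and the indentation style is an arbitrary choice.

```python
import os


def tree_lines(root, skip=()):
    """Return an indented listing of root, omitting folders named in skip.

    Folder names in skip are matched case-insensitively; files are never
    skipped. Folders are marked with a trailing slash.
    """
    skip_names = {name.lower() for name in skip}
    lines = [root]

    def walk(folder, depth):
        try:
            entries = sorted(os.scandir(folder), key=lambda e: e.name.lower())
        except OSError:
            return  # a folder we are not allowed to read
        for entry in entries:
            is_dir = entry.is_dir(follow_symlinks=False)
            if is_dir and entry.name.lower() in skip_names:
                continue  # leave this folder and everything below it out
            lines.append("    " * depth + entry.name + ("/" if is_dir else ""))
            if is_dir:
                walk(entry.path, depth + 1)

    walk(root, 1)
    return lines
```

Joining the returned lines with newlines and writing them to a .txt file gives something that can be shared with group mates. The same caveat applies as with the original command: the listing exposes every file name under the starting folder.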

It’s quite a cool trick to know, but there is a caveat: This is invasive, as it shows you everything down to the file level.  Knowledge is power and archivists and records managers have access to tons of it.

Selecting and Appraising Websites

Wednesday, October 8th, 2008

Our speaker today was Jonathan Nelson from the Wisconsin Historical Society.

They’re working on projects using Archive-It (a subscription service from the Internet Archive) to document websites relevant to the institution’s collections.  I took a look at Archive-It’s Partners list and I see some impressive names on there.

I must insert here that I am a student with no administrative experience, so the following comments are based on my general impressions, not on real-life experience.

It was mentioned that this subscription costs approximately $10,000 a year.  That includes 1/2 Gigabyte of storage, the program and the interface with customizable depth of drilling for URLs.  This seems an extravagant amount for a service such as this, especially given the low amount of storage space.  With the thousands of spam bots and viruses out there that do similar crawling, the programming of the project does not seem to be the limiting factor.  Is the draw that the information is being held off-site, in a “safe” repository?

The fact that it is held off-site and made searchable by the public through Archive-It’s website is a plus, but I’m still thinking the cost is high. Being able to search any partner’s archived sites is impressive to me, in the shoes of an end-user, since I don’t have to sign in or align and limit myself to a certain repository.  Strangely enough, I don’t see Wisconsin Historical Society on the list.

Some of the caveats that were mentioned about digital archiving in this manner included costliness (of course), over-collecting, and the consideration of the importance of the material in the future.  Are institutional or individual websites just a fad?  Or do they have lasting value?  I find this a difficult question to grapple with.  The internet, for the most part, has been available in the home for a little over 10 years now.  Since then, the cost of a web domain on which to host pages has continued to decrease, to where it can be as low as $10/year (that is about what the www.thenovicearchivist.com domain was purchased for).

How does an archivist know whose stuff to capture?  Oftentimes famous writers are not known until they’ve passed away.  By the time an archive is aware of their existence, the website domain probably will have been resold.

Website capturing and archiving seems just as sticky as something like sound or video archiving, where the migration (or emulation) of the media will incur annual expenses. Paper records, by contrast, can be processed and stored, with the only annual costs being storage and proper conditions (and perhaps later on, digitization, but that is a whole different bag of worms).

Digital records: Emulate or migrate?

Saturday, October 4th, 2008

I am not a programmer and I don’t necessarily consider myself “techy,” even though some of my friends tell me I am. Listening to an electronic records archivist speak in class the other day about the emulation or migration question, I asked what his preferred method was. His answer of “I am a records person, format isn’t important to me,” didn’t satisfy me.

Image courtesy of http://www.fotosearch.com

In my understanding, to “emulate” is to use an intermediary program that mimics the original operating system or system requirements that the file was designed to run in. To “migrate” is to reformat a file/record/program into a newer version of a similar type of software.

My opinion: Emulation would allow for other files that may come into the repository’s possession to also be read and accessed. Migration, it seems, is a temporary fix to a persisting issue and if started, migration will need to continue, due to the ever-changing software that is produced.

Perhaps a completely different approach is needed: trying to use non-proprietary methods of storing information (which would be one of those cases where the archivist needs to be involved in the planning stages of an organization’s records program). For example: If one would like to access an image saved as a .PSD, they would need a version of Adobe Photoshop. But Photoshop allows you to save image files in any number of other formats. A format such as .JPG, .TIF or .BMP can be opened by many different types of image editing and viewing programs.
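One small, practical step in that planning is simply knowing which formats are sitting in a set of folders. The Python sketch below counts files by extension and flags any that appear on a list of formats to watch. The names `format_inventory`, `flag_at_risk` and `AT_RISK_EXTENSIONS` are hypothetical, and the list contains only .psd because that is the example used above; a real repository would have to decide on its own list. A file extension is also only a hint about the actual format, so treat the result as a starting point.

```python
import os
from collections import Counter

# Illustration only: .psd is the single example named in the post.
# Which formats count as "at risk" is a policy decision, not a fact.
AT_RISK_EXTENSIONS = {".psd"}


def format_inventory(root):
    """Count the files under root by lower-cased file extension."""
    counts = Counter()
    for _folder, _subfolders, names in os.walk(root):
        for name in names:
            extension = os.path.splitext(name)[1].lower() or "(none)"
            counts[extension] += 1
    return counts


def flag_at_risk(counts, extensions=AT_RISK_EXTENSIONS):
    """Return only the inventory entries whose extension is on the list."""
    return {ext: n for ext, n in counts.items() if ext in extensions}
```

Running `flag_at_risk(format_inventory(folder))` on a picture folder would show how many files might be candidates for saving a second copy in a more widely readable format.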

If emulators can be created to mimic programs, I would say that I am more in favor of the emulator instead of the perpetual cycle of migrating.