Backup exists to protect against data loss from various causes: environmental outages, logical failures, human error or even sabotage.
In contrast, archiving is about storing data for long periods, in many cases for legal reasons. In addition, most archiving solutions provide fast search indexes so that information can be retrieved as quickly as possible.
I don’t want to dive into the legal part of archiving: on the one hand it depends on national and industry-specific regulations, and on the other hand I’m not a lawyer and therefore the wrong person to write about it.
But there is another reason to implement a professional archiving solution: reducing the amount of data stored in your production environment that has to be backed up daily or, depending on business criticality, even more frequently.
E-mail servers and file servers in particular host, in most environments, years of outdated data that may or may not be needed. Even if this old data is required, it is worth asking whether it has to sit on expensive active disk rather than on slower SATA storage arrays, where it can still be retrieved and costs less to maintain. Nevertheless, it resides on the production servers and mainly does two things: it increases storage requirements and extends the backup window. This data is classified as being “in the wild”, and it is the type of data for which an archiving solution offers the greatest benefits.
For this reason, I will concentrate on this data type in this article.
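To get a rough feeling for how much of this “in the wild” data sits on a file server, a small script can add up the size of files that have not been modified for a given time. The following is only a minimal sketch: the function name `stale_bytes`, the age threshold and the use of the last-modified timestamp as the criterion for "outdated" are my assumptions for illustration, not the way any particular archiving product classifies data.

```python
import os
import time


def stale_bytes(root, max_age_days, now=None):
    """Return (stale_bytes, total_bytes) for all regular files below root.

    A file counts as stale if its last-modified time is older than
    max_age_days. This criterion is an assumption made for this sketch.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    stale = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                # Skip files that vanished or cannot be read.
                continue
            total += st.st_size
            if st.st_mtime < cutoff:
                stale += st.st_size
    return stale, total
```

Calling `stale_bytes("/srv/share", 730)` (the path is a placeholder) returns the number of bytes untouched for roughly two years together with the total, which gives a first estimate of how much an archive could take out of the daily backup.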
Archiving this kind of data sounds quite easy at first, but as with most IT-related solutions, there are quite a number of things to consider. From my point of view, the two most important ones are scalability and, even more importantly, usability and “invisibility” for end users.
By “invisible” I mean that such a solution should allow users to work the way they are used to. Let me give you an example: when implementing an archive solution for Exchange, I think it’s quite important that users can access their archived e-mails not only through Outlook, but also through Outlook Web App (formerly known as “Outlook Web Access”, OWA) or via their smartphones or tablets.
Another helpful feature of an archiving solution is that you can define whether archived data is kept for a defined period of time (retention) and then deleted automatically, based on a policy or an event (expiry).
(Being German, I know that quite a lot of people set the retention time in their archiving solution to “forever”, because they would only allow data to be deleted after at least two copies of it have been created…)
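The retention and expiry idea can be sketched in a few lines of code. This is a simplified illustration under my own assumptions: the names `RetentionPolicy` and `is_expired` are hypothetical, the policy is purely time-based (event-based expiry is left out), and a retention of `None` stands for the “forever” setting mentioned above. Real archiving products model this in their own, more detailed way.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class RetentionPolicy:
    # Number of days an item must be kept; None means "keep forever".
    retention_days: Optional[int]


def is_expired(archived_on: date, policy: RetentionPolicy, today: date) -> bool:
    """Return True once the retention period of an archived item has passed."""
    if policy.retention_days is None:
        # "Forever": the item never becomes eligible for deletion.
        return False
    return today >= archived_on + timedelta(days=policy.retention_days)
```

An expiry job would then periodically walk the archive and delete every item for which `is_expired` returns `True`, while items under a “forever” policy are never touched.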
So, even though backup and archiving are two completely different matters, the two technologies complement each other: having an archiving solution in place helps to reduce the amount of data stored on the primary file servers, e-mail systems and so on, and therefore also reduces the amount of data to be backed up (and restored).
On the other hand, every archive system should be protected by a solid backup. After all, the archived data might be business critical in terms of compliance. (If it weren’t, why would it be archived?)
To summarize: even though backup and archiving complement each other, they are two completely different matters, and neither can replace the other.
I want to thank Liam Finn, Senior Product Manager, Technical Field Enablement eDiscovery at Veritas, for helping me write this article.