Data archiving is the process of moving data that is no longer actively in use to a separate storage device for long-term retention. Archive data is older data that remains important to the organization or needs to be retained for future reference or regulatory compliance. Data archives are indexed and have search capabilities, so files can be located and retrieved.
Archived data is stored on a lower-cost storage tier, reducing primary storage consumption and associated costs. An important aspect of a company’s data archiving strategy is to inventory its data and identify data that can be archived.
Some archiving systems treat archive data as read-only to protect it from modification. WORM (write once, read many) technology, for example, uses media that are not rewritable. Other data archiving products allow writes as well as reads.
Data archiving is most appropriate for data that must be retained due to operational or regulatory requirements, such as document files, email messages, and possibly old database records.
Benefits of data archiving
The biggest benefit of data archiving is that it reduces the cost of primary storage. Primary storage is typically expensive because a storage array must deliver enough IOPS to meet the operational demands of user read/write activity. In contrast, archival storage costs less because it is usually based on high-capacity, low-performance storage media. Data archives can be stored on hard disk drives (HDDs), tape, or low-cost optical storage, all of which are typically slower than performance disks or flash drives.
Archival storage also reduces the amount of data that needs to be backed up. Removing infrequently accessed data from the backup dataset improves backup and restore performance. Typically, data deduplication is performed on data moved to a lower storage tier, reducing the overall storage footprint and lowering secondary storage costs.
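To illustrate how deduplication shrinks the storage footprint, here is a minimal sketch that stores each distinct payload only once, keyed by its SHA-256 digest. It assumes whole-file deduplication over in-memory data; the function name and sample files are hypothetical, and commercial products may instead deduplicate at the block or chunk level.

```python
import hashlib


def deduplicate(blobs):
    """Store each distinct payload once, keyed by its SHA-256 digest.

    Returns (store, index): `store` maps digest -> bytes, and `index`
    maps each original name to the digest of its content.
    """
    store, index = {}, {}
    for name, data in blobs.items():
        digest = hashlib.sha256(data).hexdigest()
        store.setdefault(digest, data)  # keep only the first copy
        index[name] = digest
    return store, index


# Hypothetical sample data: two of the three files have identical content.
files = {
    "a.txt": b"quarterly report",
    "b.txt": b"quarterly report",
    "c.txt": b"meeting notes",
}
store, index = deduplicate(files)
raw_bytes = sum(len(data) for data in files.values())
stored_bytes = sum(len(data) for data in store.values())
print(f"{raw_bytes} bytes before, {stored_bytes} bytes after deduplication")
```

The index still lets every original file name be resolved to its content, so nothing is lost when the duplicate copy is dropped.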
Data archiving vs. backup
Data archives should not be confused with data backups, which are copies of data. Although both are considered secondary storage and use lower-performance, higher-capacity storage media than primary storage, they serve different purposes. Archives preserve data for long-term retention, while backups are used for data protection and disaster recovery.
Data archives can be thought of as a data repository for data that is rarely accessed, but still readily available. Backups, on the other hand, are part of a data recovery mechanism that can be used to restore data in case of corruption or destruction. Backup data often consists of important information that needs to be restored quickly if lost or deleted.
Online or offline data storage
Data archives take different forms. Some systems use online data storage, which places archive data on disk systems where it is easily accessible. Archives are often file-based, but object storage is growing in popularity.
Other archival systems use offline data storage, in which archival data is written to tape or other removable media using data archiving software, rather than being stored online. Because tape cartridges can be removed from the drive and draw no power while sitting on a shelf, tape archives consume much less power than disk-based systems. This results in lower archive storage costs.
Cloud storage is another possible archive target. Amazon S3 Glacier (formerly Amazon Glacier), for example, is designed for data archiving. This method is inexpensive, but it requires an ongoing investment because storage fees recur for as long as the data is retained. Additionally, costs can increase over time as more data is added to cloud storage. Cloud providers typically store archived data on tape or on slower, high-capacity hard drives.
Data archiving and data lifecycle management
The archiving process is almost always automated using archiving software. The capabilities of this software vary from vendor to vendor, but most archiving software automatically moves aging data to archives according to a data archiving policy defined by the storage administrator. This policy may also include specific retention requirements for each type of data.
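A policy of this kind can be sketched in a few lines. The example below assumes a simple rule, "archive anything that has not been accessed for a given period"; the `FileRecord` and `ArchivePolicy` types, the sample paths, and the one-year threshold are all hypothetical, and real archiving software typically supports richer criteria such as file type, owner, or size.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical inventory entry for one file on primary storage."""
    path: str
    last_accessed: datetime
    size_bytes: int


@dataclass(frozen=True)
class ArchivePolicy:
    """Hypothetical policy: archive anything idle longer than `max_idle`."""
    max_idle: timedelta


def select_for_archive(records, policy, now):
    """Return the records whose idle time exceeds the policy threshold."""
    return [r for r in records if now - r.last_accessed > policy.max_idle]


# Fixed reference date so the example is reproducible.
now = datetime(2024, 1, 1)
inventory = [
    FileRecord("reports/2019-q4.pdf", datetime(2020, 1, 15), 2_000_000),
    FileRecord("reports/2023-q4.pdf", datetime(2023, 12, 20), 1_500_000),
]
policy = ArchivePolicy(max_idle=timedelta(days=365))
candidates = select_for_archive(inventory, policy, now)
print([r.path for r in candidates])
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the selection logic deterministic and easy to test.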
Some archiving software will automatically purge data from archives once it has exceeded the lifespan imposed by the organization’s data retention policy. Many backup software products and data management platforms have added archiving functionality. Depending on your needs, this can be a cost-effective and efficient way to archive data. However, these products may not include all the functionality of a dedicated archiving product.
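As a rough illustration of retention-based purging, the sketch below separates archived items into those still within their retention period and those that have exceeded it. The retention periods, item fields, and function names are hypothetical placeholders, not actual regulatory requirements, and the sketch only decides what to purge; it does not delete anything.

```python
from datetime import date, timedelta

# Hypothetical retention periods per data type, in days (illustrative only).
RETENTION_DAYS = {"email": 3 * 365, "invoice": 7 * 365}


def is_expired(data_type, archived_on, today):
    """True when an item has outlived its retention period."""
    keep_for = RETENTION_DAYS.get(data_type)
    if keep_for is None:
        return False  # unknown types are kept rather than purged
    return today > archived_on + timedelta(days=keep_for)


def purge_expired(archive, today):
    """Split the archive into (kept, purged) lists without mutating it."""
    kept, purged = [], []
    for item in archive:
        expired = is_expired(item["type"], item["archived_on"], today)
        (purged if expired else kept).append(item)
    return kept, purged


archive = [
    {"id": "a1", "type": "email", "archived_on": date(2019, 1, 1)},
    {"id": "a2", "type": "invoice", "archived_on": date(2019, 1, 1)},
    {"id": "a3", "type": "contract", "archived_on": date(2010, 1, 1)},
]
kept, purged = purge_expired(archive, today=date(2024, 1, 1))
print("kept:", [i["id"] for i in kept], "purged:", [i["id"] for i in purged])
```

Defaulting to "keep" for data types with no defined retention period is a deliberately conservative choice here, since purging is irreversible.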
Some companies are required to retain data for certain lengths of time due to regulatory compliance. Whether mandated by industry regulations or government legislation, meeting compliance guidelines is a common business concern. Sanctions for compliance violations can include payments for damages, fines, and canceled contracts.
Data archiving helps companies achieve compliance by both storing data long-term and consolidating data for easy access in the event of an audit. Rules dictating how long data must be retained, where it can be stored, and who has access to it vary by industry and the type of data that companies in that industry generate.
Examples of regulations organizations may need to comply with include the Sarbanes-Oxley Act (SOX), the Health Insurance Portability and Accountability Act (HIPAA), and the General Data Protection Regulation (GDPR).