About Vault

Overview of Vault

Vault is the Internet Archive’s digital repository and preservation service. It provides an extensible, affordable suite of features for institutions that need to manage and preserve digital collections. Vault leverages Internet Archive’s existing non-profit infrastructure and open-source tools for collecting, providing access to, and preserving digital collections, and it integrates with many of Internet Archive’s widely used services, such as Archive-It and archive.org. Institutions can use Vault to customize a repository or preservation approach that fits their technical requirements, preservation goals, and financial resources, including extensible features for data replication, geographic redundancy, and fixity reporting. If you have questions about the service, please contact us.

Background

Internet Archive has offered digital storage and preservation services to over 1,000 libraries, archives, museums, and cultural heritage and non-profit organizations for over two decades. These range from the basic, free storage and access options at archive.org, to the more robust preservation included in services such as Archive-It, to customized solutions provided via contracts with governments and large institutions. Because it owns and operates its own data centers and physical infrastructure, Internet Archive is able to offer storage and preservation services at a far lower cost than commercial “cloud” computing providers. Internet Archive is also a 501(c)(3) non-profit organization, which allows it to provide these services to mission-aligned organizations free of any profit-seeking or mercantile interest. Internet Archive has many partnerships and systems integrations with other digital library services and currently stewards over 100 unique petabytes of data; with multiple copies of this collection, it has hundreds of petabytes of data under management. Vault combines this technical expertise and community alignment with Internet Archive’s experience and infrastructure to provide a low-cost, extensible storage and preservation solution that can meet the needs of a variety of organizations.

Vault Design

Vault provides advanced repository and preservation services. It facilitates the transfer of data with multiple data ingest and egress methods, stores archived data in multiple geographic locations, and includes fixity audit and repair and other digital object management tools. Its low-cost pricing model is based on a one-time price per terabyte for depositing data into the system, with no additional annual storage fees and no data ingest or egress costs.

Vault design and service principles include:

  • Content Diversity: Any type of content, from individual files to datasets to WARCs to Archival Information Packages (AIPs), can be deposited in Vault.
  • Multiple Geolocation Options: Archived data can be stored in Internet Archive data center locations around the world, currently in 3 nations on 2 different continents. Basic Vault services include storage of data in a minimum of 2 locations, with features that let users select additional locations or specific territorial or geographic locations.
  • Multiple Data Replication Options: Basic Vault repository services include multiple copies of data housed in multiple locations, with add-on features allowing partners to have additional replicas of their data as needed. As many copies as you wish of your archived data can be stored and preserved with Vault.
  • Multiple Technology Architectures: Archived data can be stored in multiple technical storage architectures in order to reduce technology risk.
  • Fixity Check Frequency: Fixity checks can be run as frequently as you wish, with the additional capability to apply different frequencies for different collections of archived data and receive reports on all audit and repair actions.
  • Third-Party Cloud Replication: Vault includes the option for partners to mirror or have portions of their archived data copied into various commercial and institutional third-party cloud systems. Contact us for more information on this feature if it is of interest.
  • API Interoperability: Vault is API-first in design, meaning that most information available in the service’s web application and dashboards is also retrievable via API. Basic API integration for syncing (meta)data to popular external repository services is also possible, as Internet Archive has many existing integrations with peer mission-aligned services, repositories, access, and preservation systems.
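
To illustrate what a fixity check involves, here is a minimal Python sketch that recomputes a checksum for each file and compares it with a previously recorded value. It is not Vault's implementation: the choice of SHA-256, the manifest format (relative path mapped to hex digest), and the function names `sha256_of` and `audit` are all assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(manifest: dict[str, str], root: Path) -> dict[str, str]:
    """Compare each file under root with its recorded digest.

    manifest is a hypothetical format: relative path -> expected hex digest.
    Returns a status per path: "ok", "mismatch", or "missing".
    """
    report = {}
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file():
            report[relative_path] = "missing"
        elif sha256_of(target) == expected:
            report[relative_path] = "ok"
        else:
            report[relative_path] = "mismatch"
    return report
```

A file reported as "mismatch" or "missing" is the kind of result that a repair step would address, for example by restoring the file from another replica.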

General Storage and Preservation Policy

  • For all Vault accounts, data deposited in Vault is stored on servers within at least two of Internet Archive's self-owned and self-operated data centers in separate locations, with a minimum of two copies at each location. Add-on features to a Vault service plan allow users to increase the number of replicas and locations or to specify the geographic regions where preserved data is stored.
  • Internet Archive has six primary online data centers in three different countries. Storage of Vault partners' deposited data may also be held in other offline or nearline storage locations for further preservation replication.
  • Vault partners’ deposited data is stored and preserved in diverse repository systems and architectures, so that it is managed by more than one technological system.
  • Periodic integrity checks are performed on all Vault partner data on an ongoing basis as part of overall monitoring operations. Fixity reporting in the Vault application and dashboard occurs at least yearly for all accounts. Add-on features to a Vault service plan allow for more frequent fixity audits and reporting.
  • All Vault partner data is stored and hosted in a controlled-access, alarmed, fire-protected building. Data integrity and system availability are assured using a combination of internal and external systems and processes.
  • Security and monitoring of the deposited data are accomplished through a mix of internal and external systems, data integrity through routine internal tests, and system availability through internal and commercial web monitoring services.
  • Deposited data is periodically migrated onto new physical media to account proactively for physical media reliability. Monitoring, logging, and notification systems escalate any hardware issues to an on-call team responsible for infrastructure maintenance.
  • Incidents such as service outages, network issues, or other irregular performance parameters exceeding operating tolerances are detected, tracked on system support tools, and addressed promptly.
  • Partners are notified in advance of any routine maintenance or system reconfiguration with the potential of service interruption.
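
The baseline replication rule in this policy (at least two data centers, with at least two copies at each) implies a minimum of four copies of deposited data. The short Python sketch below expresses that rule as a check. The function name `meets_baseline`, the input format, and the data center labels in the usage example are hypothetical and are not part of any Vault API.

```python
from collections import Counter

MIN_LOCATIONS = 2            # policy: at least two data centers
MIN_COPIES_PER_LOCATION = 2  # policy: at least two copies at each location


def meets_baseline(replica_locations: list[str]) -> bool:
    """Return True if the replicas satisfy the baseline rule.

    replica_locations holds one entry per stored copy, naming the
    data center that holds it (labels are hypothetical).
    """
    copies = Counter(replica_locations)
    qualifying = [
        location
        for location, count in copies.items()
        if count >= MIN_COPIES_PER_LOCATION
    ]
    return len(qualifying) >= MIN_LOCATIONS
```

For example, `meets_baseline(["dc-a", "dc-a", "dc-b", "dc-b"])` is true, while three copies, or four copies held in a single data center, do not satisfy the rule.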

Last updated on November 30, 2023. 
