HiPerGator-RV Data Management

This page describes the procedures and tools available to users for managing the data stored in the HiPerGator-RV system.

DEFINITIONS

  • HiPerGator-RV has two ways to store data:
    • My Vault: Files are stored and accessed much as they are in cloud storage services such as Google Drive, Dropbox, OneDrive/Teams, and Box.
    • Drives: For more complex workflows, files are stored in encrypted drives that can be mounted in Linux or Windows virtual machines (VMs) for accessing and processing.
  • Snapshots are copies of files taken at regular intervals, with the system maintaining a number of historical copies of each file. Users can recover a file as it existed at some point in the past, as long as that version is still among the retained copies. Snapshots are often stored on the same storage system as the original and therefore do not protect against failure of that system.
  • Replication maintains a copy (replica) of a file that is synchronized with the original at a regular interval, such as daily. When the original file is deleted, modified, or damaged, the replica still exists. However, at the next synchronization time, the modification or corruption will affect the replica as well. Replicas are therefore most useful for protecting against obvious damage that is noticed immediately, so that replication can be stopped or the file can be restored from the replica before the next synchronization time. Replicas are usually kept on a different system in the same data hall; they protect against failure of a single system, but not against data hall-wide issues such as fire or flood.
  • Backups of a file are copies of various versions of the file stored on another storage device, often one that uses a different technology (for example, magnetic tape instead of spinning disks).
  • Off-site copies are stored on a storage device that is located at another geographic location. Off-site copies are the only way to protect against total system failure and issues that affect the whole datacenter or geographic site, such as flooding and destruction.
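The replication window described above can be illustrated with a short Python sketch. It assumes a single daily synchronization at 1 a.m. (the schedule this page gives for My Vault) and treats the synchronization as instantaneous, which is a simplification. The function names are hypothetical and are not part of any HiPerGator-RV tool.

```python
from datetime import datetime, timedelta


def next_sync_after(moment: datetime, sync_hour: int = 1) -> datetime:
    """Return the first daily synchronization time strictly after `moment`.

    Assumption: one synchronization per day at `sync_hour`:00.
    """
    candidate = moment.replace(hour=sync_hour, minute=0, second=0, microsecond=0)
    if candidate <= moment:
        candidate += timedelta(days=1)
    return candidate


def replica_still_clean(damaged_at: datetime, noticed_at: datetime,
                        sync_hour: int = 1) -> bool:
    """True if the damage was noticed before the next sync copied it to the replica."""
    return noticed_at < next_sync_after(damaged_at, sync_hour)


damaged = datetime(2024, 3, 4, 15, 30)
# Noticed the same afternoon: the replica has not been overwritten yet.
print(replica_still_clean(damaged, datetime(2024, 3, 4, 17, 0)))  # True
# Noticed the next morning: the 1 a.m. sync has already copied the damage.
print(replica_still_clean(damaged, datetime(2024, 3, 5, 9, 0)))   # False
```

In the second case the replica no longer helps, and recovery would have to come from a snapshot or a backup instead.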

PROCESSES

  • At the hardware level, all storage systems have RAID redundancy to protect against hardware failures of disk drives.
  • All files in My Vault are individually encrypted and stored on the primary storage system.
    • The files are replicated to a secondary storage system once per day at 1 am. 
    • The files are also backed up to tape with an off-site copy using incremental backup.
    • Incremental copies are retained for 90 days, with the last retained copy kept for 1 year, including after the file is deleted.
  • The drives are listed in the Drives sub-tab of the Virtual Machines tab.
    • The drives are subject to an incremental snapshot backup process using QEMU, the virtual device system that works with the KVM hypervisor used in HiPerGator-RV to run VMs. QEMU ensures that the state of the virtual drive is consistent before it takes an incremental snapshot and replicates it to the secondary storage device. Drives can therefore be restored from one of the incremental snapshot backups on secondary storage.
    • This process operates on encrypted data; neither the QEMU system nor any HiPerGator-RV administrator has access to the data at any time.
    • Coming soon: As an option, users will be able to designate particular drives to be backed up to tape with an off-site copy. This backup will be done using QEMU in exactly the same way, but with the incremental snapshots copied to tape. The keys to decrypt the drives or their incremental backups will not be stored on the tape system or transferred off-site.
  • Windows drives have the Microsoft Volume Shadow Copy Service (VSS) turned on by default; Linux drives do not have this capability. VSS uses a fraction of the virtual drive to keep snapshots of previous versions of all files that change, which the user can then easily recover. The frequency and timing of the snapshots, the space available for them, and the number of versions kept are configurable by the Research Computing staff who maintain the VM images. This mechanism handles 90% of the typical requests for restoring deleted or corrupted files. It does not protect the data from underlying system-level problems.
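The My Vault retention rule listed above (incremental copies kept for 90 days, the last copy kept for 1 year) can be sketched as follows. This is only an illustration of one reading of that rule: it assumes the retention clock starts on the date each copy was made and approximates "1 year" as 365 days. The function and constant names are hypothetical and do not describe the actual backup software.

```python
from datetime import date, timedelta

INCREMENTAL_RETENTION = timedelta(days=90)   # from the policy text
LAST_COPY_RETENTION = timedelta(days=365)    # "1 year", approximated as 365 days


def recoverable_copies(copy_dates: list[date], today: date) -> list[date]:
    """Return the backup dates still expected to be recoverable on `today`."""
    if not copy_dates:
        return []
    ordered = sorted(copy_dates)
    latest = ordered[-1]
    # Every incremental copy made within the last 90 days is retained.
    kept = [d for d in ordered if today - d <= INCREMENTAL_RETENTION]
    # The most recent copy is retained longer, even if the file was deleted since.
    if latest not in kept and today - latest <= LAST_COPY_RETENTION:
        kept.append(latest)
    return kept


copies = [date(2024, 1, 10), date(2024, 2, 20), date(2024, 3, 1)]
print(recoverable_copies(copies, date(2024, 5, 15)))  # the Feb 20 and Mar 1 copies
print(recoverable_copies(copies, date(2024, 9, 1)))   # only the Mar 1 copy
print(recoverable_copies(copies, date(2025, 6, 1)))   # nothing left to restore
```

Under these assumptions, a file last backed up on March 1 can still be restored six months later, but only in its final backed-up version.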

BEST PRACTICES

We recommend that users choose a preferred workflow for using HiPerGator-RV. It is best practice to review the contractual requirements that apply to the project's data to ensure that the selected workflow satisfies the terms of the contract. Start simple and add complexity as the need arises. We describe a few use cases.

  1. Using My Vault – For some workflows that only involve storing, tracking and viewing files, My Vault may be sufficient for the project.
  2. Using encrypted drives and VM for processing – The files for the project can be stored in one or more encrypted virtual drives with all the work done in Windows or Linux VMs.
  3. Using a single-user VM – Users can work on data stored in drives using the applications available in Linux or Windows VMs. Several VM templates are available with applications installed and ready to use.
  4. Using a multi-user VM – To support larger and more complex operations, multiple users can log in to a single Linux or Windows VM that provides access to the drives shared by the users in the group. Within the VM, the security mechanisms of the operating system provide access control to the files in the drives mounted in the VM.

COSTS

  1. There is no additional cost for the My Vault replication and backup processes. 
  2. Backup and off-site copies for virtual drives are not included in the base-level pricing for HiPerGator-RV. Please contact Research Computing if you require this service for your project.

DATA RECOVERY PROCESS

  1. Restoring files to a previous version using the VSS feature within Windows VMs can be done by users. No request to Research Computing is needed for this type of recovery. For instructions on the procedure, please refer to the following link: https://it.clas.ufl.edu/kb/recover-files/
  2. Data recovery requests for My Vault files or encrypted virtual drives should be submitted to Research Computing as a ticket through http://support.rc.ufl.edu
  3. Recovery timeframes depend on the type and size of recovery requested.