Data Management

Difference between data backup and replication

Replication of data means that every data set has an identical copy maintained in another system. The second system may live in the same data center, but not in the same rack.

  • The purpose of replication is to protect data from system malfunctions and system damage caused by unexpected external events, like power surges.
  • When the original copy of a data set is modified, the replicated data set changes in the same way within a defined time.
  • Replication does not allow recovery of older versions of data sets.

Backup of data means that a copy of data sets is made at a defined frequency, and that each copy is kept for a defined retention period.

  • The purpose of backup is to protect data from loss and corruption caused by system malfunctions.
  • Backup maintains a historical copy of data sets that allows restoration of older file versions.
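
The difference can be illustrated with a minimal Python sketch. It is a toy model for illustration only, not a description of how any University storage system is implemented: the class names (ReplicatedStore, BackupStore), the file name, and the 30-day retention period are all hypothetical. The replica mirrors every change to the primary, including an unwanted one, while the backup keeps dated snapshots from which an older version can be restored.

```python
import copy
from datetime import datetime, timedelta


class ReplicatedStore:
    """Toy primary data set with a replica that mirrors every change."""

    def __init__(self):
        self.primary = {}
        self.replica = {}

    def write(self, name, content):
        self.primary[name] = content
        # Replication: the replica follows the primary (here, immediately),
        # so it never holds an older version than the primary does.
        self.replica = copy.deepcopy(self.primary)


class BackupStore:
    """Toy backup: dated snapshots kept for a fixed retention period."""

    def __init__(self, retention):
        self.retention = retention
        self.snapshots = []  # list of (timestamp, copy of the data)

    def take_snapshot(self, data, now):
        self.snapshots.append((now, copy.deepcopy(data)))
        # Discard snapshots older than the retention period.
        cutoff = now - self.retention
        self.snapshots = [(t, d) for t, d in self.snapshots if t >= cutoff]

    def restore(self, as_of):
        """Return the newest snapshot taken at or before as_of, or None."""
        candidates = [(t, d) for t, d in self.snapshots if t <= as_of]
        if not candidates:
            return None
        return max(candidates, key=lambda item: item[0])[1]


store = ReplicatedStore()
backups = BackupStore(retention=timedelta(days=30))  # hypothetical policy

day1 = datetime(2024, 1, 1)
store.write("results.csv", "v1")
backups.take_snapshot(store.primary, day1)

day2 = day1 + timedelta(days=1)
store.write("results.csv", "v2 (bad edit)")
backups.take_snapshot(store.primary, day2)

print(store.replica["results.csv"])         # v2 (bad edit)
print(backups.restore(day1)["results.csv"])  # v1
```

In this sketch the replica cannot undo the bad edit because it only ever holds the current state, whereas the backup can return the data as it was on the first day, as long as that snapshot is still within the retention period.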

It is important to consider several factors in the selection of the storage type to be used:

  1. The type of data that is being stored (restricted, sensitive, open) – Users must be aware of the different data types and of the policies and responsibilities governing the use of the data. The storage option will depend on legal, regulatory, and University policy requirements. For example, data that contains Personally Identifiable Information (PII) requires an appropriate level of security, which is provided only by data storage designed and managed for restricted data.
  2. The volume of data stored – Each data storage option has a maximum volume of data that it can hold. Faculty must know the volume of data to be stored in order to select the appropriate option.
  3. Data access performance – Depending on how the data is used, transfer speed may be an important consideration. For example, transferring or accessing large volumes of data is best done from storage on local high-performance infrastructure rather than from a cloud service, where data must travel over distant networks with unknown performance.
  4. Need for collaboration – Sharing data is often required to conduct work.  Care must be taken to ensure:
    • Privacy and security – Individuals with whom the data is shared must have the appropriate authorization to see and work with the data.
    • Ease of use – Consider the complexity of making the data available to the target group of collaborators. Ease of use is often in tension with privacy and security, so these two factors must be carefully balanced.
    • Scope of collaborators – Depending on the nature of the data and collaboration, some systems will require that all individuals have a GatorLink account. Other systems support convenient sharing with any individual or group, without the need for a GatorLink account, within or outside the University.
  5. Cost – Some services are provided by the University at no cost to the faculty. Other services, although subsidized, will require a fee for use or purchase of storage capacity. Fees generally apply to high-performance, high-capacity, and high-security systems.
  6. Management convenience – Various data storage systems are available and are managed according to the unit provisioning the service:
    • On-premises enterprise and research storage systems are managed by departmental and University IT staff. These systems require GatorLink accounts for access.
    • Cloud storage systems provided by the University (Dropbox for Faculty/Staff, OneDrive for Business) are managed by the individual. Using these services to share files and folders with small groups of people is quite easy. Faculty may find managed enterprise storage more convenient, though, when there are larger numbers of collaborators or when the group of collaborators changes over time.