Data Center Terms
In this glossary you will find many terms related to data backup and storage technologies.
All Flash Storage
An all-flash storage array runs SAN or NAS protocols on a set of controllers and performs best when its software is optimized for enterprise SSDs, delivering very high IOPS and throughput. All-flash systems contain only SSDs and do not accept spinning hard drives.
An array is a method of storing information on multiple devices. In storage terms, an array is commonly a collection of hard disk drives in a server, arranged in a particular way so that each drive holds the same kind of data, though the values stored on each can differ.
With asynchronous replication, once data has been written to the primary storage site, new data can be written to that site without waiting for the secondary (remote) storage site to finish writing its copy. Asynchronous replication avoids the latency impact of synchronous replication, but carries the risk of data loss should the primary site fail before the data has been written to the secondary site.
Autotiering is a technology first introduced by Compellent around 2007; EMC and 3PAR followed with their own solutions around 2008. The term most commonly refers to data residing on two or three performance classes of storage, today typically SSD, 10k RPM and 7200 RPM drives. Autotiering runs on the controllers and, as data is used, moves it to the appropriate performance class. For example, high-speed data can be written to SSD first and then moved to lower-cost 10k drives. Once the data is needed less often, it can move to lower-cost nearline 7200 RPM archive drives.
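The trade-off can be sketched with a toy in-memory model. The class and method names below are hypothetical and do not come from any replication product; the sketch only shows which writes exist at the primary site alone at a given moment.

```python
from collections import deque


class ReplicatedVolume:
    """Toy model of a primary site replicating writes to a secondary site."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = []         # writes committed at the primary site
        self.secondary = []       # writes committed at the secondary site
        self.in_flight = deque()  # acknowledged writes not yet replicated

    def write(self, block):
        self.primary.append(block)
        if self.synchronous:
            # Synchronous: the remote copy is written before the write completes.
            self.secondary.append(block)
        else:
            # Asynchronous: acknowledge immediately, replicate later.
            self.in_flight.append(block)

    def drain(self):
        """Ship queued writes to the secondary site (asynchronous mode)."""
        while self.in_flight:
            self.secondary.append(self.in_flight.popleft())

    def lost_if_primary_fails_now(self):
        """Writes that exist only at the primary site."""
        return list(self.in_flight)


async_vol = ReplicatedVolume(synchronous=False)
async_vol.write("A")
async_vol.drain()
async_vol.write("B")  # acknowledged, but not yet replicated
print(async_vol.lost_if_primary_fails_now())  # ['B']
```

In synchronous mode the same sequence of writes leaves nothing in flight, at the cost of waiting for the remote site on every write.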
Backup is the process of replicating your vital data onto a secondary storage device or off site storage, for the purpose of recovery in case the original data is accidentally erased, damaged, or destroyed.
A two-step process. Data is first backed up to a secondary storage device, e.g. external media such as a hard drive, tape or DVD, or backed up remotely (online storage). In the event of computer problems (such as disk drive failures, power outages, or virus infection) resulting in data loss or damage to the original data, the backed-up data is then retrieved and restored to a functional system.
Allows backups to occur automatically at a designated time on set days of the week.
The software used to create your backup data as a precaution against loss or damage of the original data.
Backup Storage Device
A hardware device used to record and store data.
The time period available or allotted for backing up data.
Software or hardware designed to be compatible with earlier versions of the same product.
In storage terms, bandwidth is the total amount of data that can be transferred per unit of time between CPU and storage. Generally, bandwidth refers to large block data transfers and is usually measured in MBps.
Bare-metal restore is a form of data recovery which allows users to restore a system from “bare metal”, i.e. without any requirements as to previously installed software or operating system.
Raw data which does not have a file structure imposed on it. Database applications such as Microsoft SQL Server and Microsoft Exchange Server transfer data in blocks. Block transfer is the most efficient way to write to disk.
The ability of an organization to continue to function even after a disastrous event, accomplished through the deployment of redundant hardware and software, the use of fault tolerant systems, as well as a solid backup and recovery strategy.
A unit of storage capable of holding a single character. On almost all modern computers, a byte is equal to 8 bits. Large amounts of memory are indicated in terms of kilobytes (1,024 bytes), megabytes (1,048,576 bytes), and gigabytes (1,073,741,824 bytes).
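As a small illustration of the binary multiples listed above, the following sketch converts a raw byte count into the largest unit that fits. The function name and unit list are illustrative choices, not part of any standard library.

```python
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB"]


def format_size(num_bytes):
    """Express a byte count in the largest binary unit (steps of 1,024)."""
    size = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value drops below 1,024,
        # or at the last unit in the list.
        if size < 1024 or unit == UNITS[-1]:
            return f"{size:.1f} {unit}"
        size /= 1024


print(format_size(1_048_576))      # 1.0 MB
print(format_size(1_073_741_824))  # 1.0 GB
```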
A high-speed memory or storage device that helps reduce the time required to read and write data to a slower device, such as a hard drive.
Cloning is a type of backup which allows users to copy a whole partition or disk, with all its files and folders, to another partition or disk. If the source partition is bootable, the cloned partition will be bootable too.
Cloud storage is a cloud computing model in which data is stored on remote servers accessed over the Internet, or “cloud.” It is maintained, operated and managed by a cloud storage service provider or on storage servers that are built on virtualization techniques. Cloud computing essentially delivers computing as a service and uses different chargeback plans than traditional data center approaches, typically billing flat fees per user per month. Some providers, known as co-location facilities, instead rent space in their data center where customers house their own hardware; these data centers also provide data backup solutions.
The process of reformatting data so it takes less space on your storage medium(s).
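A minimal sketch of lossless compression using Python's standard `zlib` module. The sample data is made up and deliberately repetitive, so it compresses far better than typical files would.

```python
import zlib

original = b"backup " * 1000              # highly repetitive sample data
compressed = zlib.compress(original, 9)   # 9 = strongest compression level
restored = zlib.decompress(compressed)

# The compressed form is smaller, and decompression returns the exact input.
print(len(original), len(compressed), restored == original)
```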
A device’s ability to connect or link with other devices on the network.
Consolidated storage connects multiple servers and/or workstations to a centralized array of hard disk drives. This type of storage setup is designed to result in higher availability, manageability, scalability and performance for the applications these servers support.
The capability of software or hardware to run on different platforms, e.g. an application that runs on both Windows and Macintosh operating systems.
DAS (Direct Attached Storage)
DAS is storage that is directly connected to a server/workstation. This direct connection provides fast access to the data; however, storage is only accessible from that server/workstation. DAS includes the internally attached local disk drives or any other storage medium attached directly.
Data protection is the set of measures used to protect data from being erased, damaged or destroyed.
Data Transfer Rate
The speed at which data is transmitted from one device to another. Data transfer rates are often measured in megabits per second (Mbps) or megabytes per second (MBps).
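Because Mbps and MBps differ by a factor of eight, it is easy to misjudge transfer times. The sketch below computes an ideal transfer time and ignores protocol overhead; the function name is illustrative.

```python
def transfer_seconds(size_megabytes, link_megabits_per_second):
    """Ideal time to move size_megabytes over a link rated in megabits/s."""
    megabytes_per_second = link_megabits_per_second / 8  # 8 bits per byte
    return size_megabytes / megabytes_per_second


# 1,000 MB over a 100 Mbps link: 100 Mbps is 12.5 MBps.
print(transfer_seconds(1000, 100))  # 80.0
```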
A dedicated device has only one function, e.g. a backup server which has no job other than to back up data.
A backup of all changes made after the last full backup. The advantage to this is the quicker recovery time, requiring only a full backup and the latest differential backup to restore the system. The disadvantage is that for each day elapsed since the last full backup, more data needs to be backed up.
Disaster recovery is the process used to restore your backed up data in the event of a disaster.
A round plate on which data can be stored. Disks are divided into two categories: magnetic (floppy disks, hard disk drives) and optical (CD-ROMs, DVDs).
A disk drive is a machine that reads and writes data on a disk. There are many different types of disk drive, e.g. a hard disk drive, which writes data to a hard disk, or a floppy drive, which writes data to a floppy disk.
Disk duplexing is a variation of disk mirroring in which each of the storage disks has its own SCSI controller. Mirroring is the practice of duplicating data in separate volumes on two hard disks to make storage more fault-tolerant: if one drive goes down, you can still access the other.
The practice used to spread data over multiple hard disks. Disk striping can speed up data retrieval operations from disk storage. The user is normally allowed to set the data unit size of each strip.
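One common way to lay out striped data is round-robin: consecutive strips go to consecutive disks. The sketch below assumes that layout and a strip size counted in blocks; real controllers may use other layouts, and the function name is hypothetical.

```python
def locate_block(logical_block, num_disks, blocks_per_strip):
    """Map a logical block number to (disk index, block offset on that disk)."""
    strip_index = logical_block // blocks_per_strip
    offset_in_strip = logical_block % blocks_per_strip
    disk = strip_index % num_disks          # strips rotate across the disks
    stripe_row = strip_index // num_disks   # full rows already written
    return disk, stripe_row * blocks_per_strip + offset_in_strip


# Two disks, two blocks per strip: blocks 0-1 land on disk 0, 2-3 on disk 1, ...
for block in range(8):
    print(block, locate_block(block, num_disks=2, blocks_per_strip=2))
```

Sequential reads of a large file therefore touch both disks in turn, which is where the speed-up comes from.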
This is an approach to computer storage backup and archiving in which data is initially copied to backup storage on a disk storage system and then periodically copied again to a tape storage system.
Distributed storage is set up so that each server has its own external storage subsystem.
Encryption is a method used to convert data (in the form of passwords, files or emails) into a format that is illegible if intercepted or accessed without the correct encryption key. The key is a passphrase set by the user encrypting the data before it is stored or sent online, with the encryption software performing complex mathematical operations on the data.
Ethernet is a nearly ubiquitous network technology that divides data into packets or frames. First commercially available in 1980, it has become an industry standard. Throughput typically ranges from 1 to 10 gigabits per second, and the IEEE has published standards for 40 and 100 Gigabit Ethernet.
External RAID Controllers
Allow multiple devices/servers to access a form of Direct/Network attached storage, by keeping the RAID controller external from any one particular server. This means that if the server goes offline others will still be able to access the storage.
A Fibre Channel (or iSCSI) topology with at least one switch present on the network.
In the event of a physical disruption to a network component, data is immediately rerouted to an alternate path so that services remain uninterrupted. Failover applies both to clustering and to multiple paths to storage. In the case of clustering, one or more services (such as Exchange) are moved over to a standby server in the event of a failure. In the case of multiple paths to storage, a path failure results in data being rerouted to a different physical connection to the storage.
Fault tolerance is the ability of computer hardware or software to ensure data integrity when hardware failures occur. Fault-tolerant features appear in many server operating systems and include mirrored volumes, RAID-5 volumes, and server clusters.
Data which has an associated file system. NAS
A high-speed interconnect used in storage area networks (SANs) to connect servers to shared storage. Fibre Channel components include HBAs, hubs, switches, and cabling. The term Fibre Channel also refers to the storage protocol.
File Replication Service
The File Replication Service (FRS) is a technology that replicates files and folders stored in the SYSVOL shared folder on domain controllers and in Distributed File System (DFS) shared folders. When FRS detects that a change has been made to a file or folder within a replicated shared folder, FRS replicates the updated file or folder to other servers.
The methods or functions used to view, manage and manipulate your users’ data files, locally on the storage server or remotely from the users’ workstations. This includes monitoring file sizes, file contents and the lifecycle of the data, i.e. how long the business information remains pertinent and should be stored.
This function allows backup selections of your data to be restricted to certain types of files, either by extension (e.g. .doc, .xls and .txt files) or by date (e.g. nothing older than 01/04/05).
A firewall is a piece of hardware or software which prevents unauthorized data from being sent from or received by a computer or computers in a network. For example, it can be used to stop users accessing certain websites or to stop programs from connecting to computers outside your network. It is, however, more commonly put in place to stop external computers from trying to access your network, computers and data remotely.
Fragmentation is the splitting up of a large data file into smaller segments which can then be stored or sent across a network. The term also describes how far the contents of a storage medium have become scattered and disorganized, e.g. a badly fragmented hard drive takes longer to read and write files.
A full backup is usually created the first time a backup is run, as it takes a copy of all the data selected to be backed up. All data can be restored from this backup alone, without the need to access additional incremental backup tapes or online archives.
A gigabyte is a unit of size used to measure hard drive capacity or the amount of online backup storage needed. It equates to approximately 1,000MB (where 1MB is roughly 25 Word document files).
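A minimal sketch of such a filter. The file list is invented for illustration, the function name is hypothetical, and the cut-off date assumes that 01/04/05 is read as 1 April 2005 (day/month/year), which the source does not state.

```python
from datetime import date
from pathlib import PurePath


def select_for_backup(files, extensions, not_older_than):
    """Keep files with a wanted extension modified on or after a cut-off date.

    files: iterable of (path, last_modified_date) pairs
    """
    wanted = {ext.lower() for ext in extensions}
    return [
        path
        for path, modified in files
        if PurePath(path).suffix.lower() in wanted and modified >= not_older_than
    ]


files = [
    ("report.doc", date(2005, 6, 1)),    # wanted type, recent enough
    ("budget.xls", date(2004, 12, 31)),  # wanted type, too old
    ("photo.jpg", date(2005, 7, 4)),     # recent, but not a wanted type
]
cutoff = date(2005, 4, 1)  # assumed reading of 01/04/05
print(select_for_backup(files, {".doc", ".xls", ".txt"}, cutoff))  # ['report.doc']
```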
High Availability – HA
A continuously available computer system is characterized as having essentially no downtime in any given year. A system with 99.999% availability experiences only about five minutes of downtime per year. In contrast, a high availability system is defined as having 99.9% uptime, which translates into just under nine hours of planned or unplanned downtime per year.
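The downtime figures follow directly from the availability percentage. The sketch below assumes a 365-day year; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent):
    """Expected unavailable minutes per year for a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


print(round(downtime_minutes_per_year(99.999), 2))     # 5.26 (minutes)
print(round(downtime_minutes_per_year(99.9) / 60, 2))  # 8.76 (hours)
```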
HBA (Host Bus Adapter)
The HBA is the intelligent hardware residing on the host server which controls the transfer of data between the host and the target storage device.
A host is a computer or device in a network which stores not only data but also applications for other client machines to access online, effectively “hosting” them.
A hot spare is a drive within a RAID setup which automatically takes over and rebuilds the data if one of the hard drives within the RAID array fails. Once the failed hard drive is replaced, the data is copied back to the new drive and the hot spare resumes its function as a spare.
HSM (Hierarchical Storage Management)
This is an automatic management process, used in conjunction with a policy describing the importance of data within an organization. The process moves older, unused data onto a cheaper form of long-term storage, e.g. accounts older than 5 years are automatically moved from hard drive to tape.
ILM (Information Lifecycle Management)
The process of managing information growth, storage, and retrieval over time, based on its value to the organization. Sometimes referred to as data lifecycle management.
An initiator is the device (usually contained within a server) that makes the application requests, which are then sent to the target device.
iSCSI (Internet SCSI)
A protocol that enables transport of block data over IP networks, without the need for a specialized network infrastructure, such as Fibre Channel.
An incremental backup is a method of backing up only the files which have changed since the last full or incremental backup. Only the latest changes are backed up, not the entirety of the data within the backup set, allowing quicker backups of less data.
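The difference between full, differential and incremental backups comes down to which point in time a file's modification is compared against. The sketch below uses plain integers as timestamps and a hypothetical function name; real backup software may track changes differently (for example with archive bits or block-level change tracking).

```python
def files_to_back_up(mtimes, last_full, last_backup, mode):
    """Pick files by modification time.

    mtimes:      dict of path -> modification timestamp
    last_full:   timestamp of the last full backup
    last_backup: timestamp of the most recent backup of any kind
    """
    if mode == "full":
        cutoff = None          # everything is selected
    elif mode == "differential":
        cutoff = last_full     # changes since the last full backup
    elif mode == "incremental":
        cutoff = last_backup   # changes since the last backup of any kind
    else:
        raise ValueError(f"unknown mode: {mode}")
    return sorted(p for p, t in mtimes.items() if cutoff is None or t > cutoff)


mtimes = {"a.doc": 5, "b.xls": 12, "c.txt": 25}
# Full backup taken at t=10, incremental backup taken at t=20.
print(files_to_back_up(mtimes, 10, 20, "differential"))  # ['b.xls', 'c.txt']
print(files_to_back_up(mtimes, 10, 20, "incremental"))   # ['c.txt']
```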
JBOD (Just a Bunch of Disks)
As the name suggests, a group of disks housed in its own box; JBOD differs from RAID in not having any storage controller intelligence or data redundancy capabilities.
A Local Area Network (LAN) connects workstations to data centers through switches, normally using Ethernet technology and often including wireless connectivity.
Latency is the amount of time between a packet of data being sent by one device and being received by another device on a network.
Load balancing is the ability to redistribute load (read/write requests) to an alternate path between server and storage device; it helps to maintain high-performance networking.
LUN (Logical Unit Number)
A logical unit is a conceptual division (a subunit) of a storage disk or a set of disks. Logical units can directly correspond to a volume drive (for example, C: can be a logical unit). Each logical unit has an address, known as the logical unit number (LUN), which allows it to be uniquely identified.
A method to restrict server access to storage not specifically allocated to that server. LUN masking is similar to zoning, but is implemented in the storage array, not the switch.
A unit of measurement equal to 1,024KB (1,048,576 bytes), or roughly 1 million bytes; 1MB holds approximately 25 Word document files.
A generic term usually used in conjunction with the amount of long-term or short-term storage in a computer, for example a 60GB hard drive (long term) or 512MB of RAM (short term).
Mirrored servers provide the same services and data to users, giving a higher level of redundancy: if one server is unavailable to the users, the other takes over and provides the services and data.
The process of transferring data to another device or server within a mirrored set of devices (RAID array)/servers (Mirrored Servers).
Multipathing is the use of redundant storage network components responsible for transfer of data between the server and storage. These components include cabling, adapters and switches and the software that enables this.
NAS (Network Attached Storage)
A storage device connected directly to a network, so all devices on the network have the ability to independently access, read and write data to it.
A set of devices, such as computers, printers and routers, that use connections to transmit, store or access data. A network also allows the sharing of resources such as information, programs and peripherals (scanners, printers, etc.).
The use of a device e.g. NAS (see above) for the storage of files backed up from another device on the network. Alternatively, can also mean the backup of all devices on a network from a centralized location.
An application which runs on a remote storage device (server or computer), which allows users to read, write and change data files online as if they were stored locally on their own machine.
Network File System – (NFS)
The physical topology, speed or current availability of devices/ computers on a network being able to access each other or external resources via the cables, routers and switches which comprise it.
A component device which is attached to or makes up part of a network. For example, a computer, server, printer, network switch or router.
Online Capacity Expansion (OLCE)
The ability to add additional storage media, in the form of hard drives or external devices to a server, without the need to disconnect or disrupt any user’s current access to their data.
Operating System (OS)
The base software that all other software on a computer or server runs on. It controls access to input and output devices and to programs, and provides a single method of control. This allows a user to interact with the hardware to access their data and run programs.
A unit of measure approximately equal to 1,000 terabytes.
A base of the system comprised of hardware and software, onto which other functions/programs are built or installed.
A full copy or a backup of all data, available for restoration on a system from a particular date and time.
The amount of space taken up by a server or other device in a server cabinet. It is usually measured in rack units (U); for example, a server may be 2U high.
RAID – (Redundant Array of Independent Disks)
A method of splitting up data across a set of hard disks, allowing for faster access, more resiliency or a combination of both.
* RAID 0
RAID 0 writes and reads data across two or more drives simultaneously, effectively allowing it to be accessed much more quickly. All the hard drives in the RAID array appear as a single device with the combined storage of the drives. E.g. two 256GB hard drives would appear as a single 512GB drive; with drives of unequal size, many controllers use only as much of each drive as the smallest one provides. This form of RAID is sometimes called striping and provides no more redundancy than storing your data on a single hard drive. If one drive fails, the data cannot be rebuilt from the remaining drives, even after the failed drive is replaced.
* RAID 1
RAID 1 uses two or more disks to store the data as exact images of each other. As data is written to one disk, the RAID controller “mirrors” it to the other drive(s). This provides a high level of redundancy, because if one drive fails the other has a full working set of data. However, the storage space available is only as large as the smallest disk in the RAID array. E.g. two hard drives of 256GB and 512GB would appear as a single 256GB drive.
* RAID 4
Data is striped at a block level across several drives, with parity stored on one drive. The parity information allows recovery from the failure of a single drive. The performance of RAID 4 is very good for reads. Writes, however, require that parity data be updated each time. This slows down random writes in particular, though large or sequential writes are fairly fast.
* RAID 5
RAID 5 employs data striping and distributes parity across all drives in the array, giving good performance and fault tolerance. Since parity information is striped across all drives, the data on a single failed drive can be rebuilt from the parity and the surviving drives.
* RAID 10
RAID 10 is a combination of RAID 0 and RAID 1: data is distributed across multiple drives without parity, and then the entire array is mirrored. Although this delivers good performance, the drive storage overhead is 50% because you are mirroring the data.
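Two ideas from the RAID levels above can be sketched in a few lines: usable capacity, and rebuilding a lost block from parity. The sketch assumes every member drive is limited to the size of the smallest drive (common, but implementations that concatenate unequal drives would report more for RAID 0) and uses simple XOR parity as in RAID 4 and RAID 5. The function names are hypothetical.

```python
from functools import reduce


def usable_capacity_gb(level, drive_sizes_gb):
    """Usable space, assuming each member is limited to the smallest drive."""
    n, smallest = len(drive_sizes_gb), min(drive_sizes_gb)
    if level == 0:
        return n * smallest          # striping only, no redundancy
    if level == 1:
        return smallest              # every drive holds the same data
    if level in (4, 5):
        return (n - 1) * smallest    # one drive's worth of space holds parity
    if level == 10:
        return (n // 2) * smallest   # half the drives hold mirror copies
    raise ValueError(f"unsupported RAID level: {level}")


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"AAAA", b"BBBB", b"CCCC"]   # one strip on each of three data drives
parity = xor_blocks(data)            # stored on a fourth drive

# The drive holding b"BBBB" fails: XOR the survivors with the parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)                             # b'BBBB'
print(usable_capacity_gb(1, [256, 512]))   # 256
```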
The duplication of information or a hardware component to ensure that should the primary resource fail, a secondary resource can take over its function.
A feature enabling a “remote” user or piece of equipment to gain access to/the use of resources that are not directly connected to it by a physical means. Examples include the ability to “Dial in” to a server to control it from a distance, or viewing your e-mail from a shared computer via the Internet.
Similar to Remote Access, but more focused. Often only allows access to a pre-determined set of tools to access management features of a device or network.
Remote Mirroring of Data
Remote mirroring is the process of creating an additional copy of written/stored data at a remote location, enabling a more robust disaster recovery environment. This capability is key for a disaster recovery solution that provides for resilience and access to business critical data.
Replication is the process of duplicating mission critical data from one highly available site to another.
A retrieval of data previously backed up to storage media. A restore is performed if data has been lost or corrupted since the backup.
A device that is connected to at least two networks, and administers the data sent between them depending on how it is configured.
SAN – (Storage Area Network)
A high-speed special-purpose network (or subnetwork) that can interconnect various kinds of data storage devices (usually disks) with associated data servers. This can be useful when providing storage to a larger network of users, where the requirements of each server can vary dynamically.
SAS (Serial Attached SCSI)
A serial, point-to-point interface that replaces the older parallel SCSI connectors and cabling while continuing to use the SCSI protocol.
Scale Out NAS
Scale-out network-attached storage (NAS) addresses the explosive growth of structured and unstructured data, and performance demands of today’s workloads. A scale-out storage architecture takes advantage of the superior price and performance of clustered components, facilitates nondisruptive operations, and employs policy-based management for improved efficiency and agility.
SCSI (Small Computer System Interface)
A set of standards allowing computers to communicate with attached devices, such as storage devices (disk drives, tape libraries etc) and printers. SCSI also refers to a parallel interconnect technology which implements the SCSI protocol.
The amount of data a server is able to store. It is usually measured in megabytes (MB), gigabytes (GB) or terabytes (TB).
A term coined by Microsoft. A “Shadow Copy” is a point-in-time copy of the original data, and can often be taken while the data is in use.
A storage processor (SP) is an intelligent RAID controller (and in some cases, a dedicated computer) that is enclosed within a storage device. They control the allocation and administration of the disks within.
In synchronous replication, each time data is written to the primary disk, the secondary (remote) disk must complete writing the copy before the primary can begin writing the next piece of data.
A storage subsystem houses a group of disks together, controlled by software that usually resides within the subsystem.
The storage controller provides the intelligence for the storage subsystem, including such functionality as disk aggregation (RAID), I/O routing, and error detection and recovery. Each storage subsystem contains one or more storage controllers.
An intelligent device residing on the network, responsible for directing data from the source (such as a server) or sources directly to a specific target device (such as a specific storage device) with minimum delay. Switches differ in their capabilities; a director-class switch, for example, is a high-end switch that provides advanced management and availability features.
A unit of storage measurement equal to approximately 1,000 GB.
Tiered Storage Data
Storage is arranged according to its intended use. For instance, data intended for regular use, or for restoration in the event of data loss or corruption, is stored locally on more expensive, faster disks. Data that must be kept for regulatory purposes is archived to lower-cost disks. This approach directs spending to where it is needed and saves cost elsewhere, so that expensive hardware is not wasted on low-value data.
Tiering refers to the storage of data in the most appropriate medium based on its intended use. Data needed on demand would be top-tier and stored on solid-state or fast disks. Data rarely needed would be archived on the lowest tier, usually optical disks or tape (sometimes offline).
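A tiering policy can be as simple as a rule keyed on how recently data was accessed. The thresholds and tier names below are hypothetical, chosen only to illustrate the idea; real policies are set per organization and often consider more than access age.

```python
def choose_tier(days_since_last_access):
    """Pick a storage tier from access recency (hypothetical thresholds)."""
    if days_since_last_access <= 7:
        return "solid-state"   # top tier: data needed on demand
    if days_since_last_access <= 90:
        return "fast-disk"
    return "archive"           # lowest tier: optical disk or tape


for age in (1, 30, 400):
    print(age, choose_tier(age))
```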
The term topology refers to network design. Planning data center operations and enhancements requires both physical and logical topologies.
This is the pooling of physical storage from multiple network devices, allowing it to appear as a single storage device, managed from a central console. It is commonly used in a SAN environment.
VDS (Virtual Disk Service)
VDS is a set of application programming interfaces (APIs) that provides a single interface for managing disks in Windows Server 2003 operating systems. VDS provides a means of managing storage hardware and disks, and for creating volumes on those disks.
VDI (Virtual Desktop Infrastructure)
A Virtual Desktop Infrastructure (VDI) is a desktop-oriented service that hosts user desktop environments on remote servers and/or blade PCs. Users access the desktops over a network using a remote display protocol. A connection-brokering service connects users to their assigned desktop sessions. For users, this means they can access their desktop from any location, without being tied to a single client device. Since the resources are centralized, users moving between work locations can still access the same desktop environment with their applications and data. For IT administrators, this means a more centralized, efficient client environment that is easier to maintain and able to respond more quickly to the changing needs of the user and business.
VM (Virtual Machines)
A virtual machine is a server defined in software; virtualization software allows multiple virtual servers to reside on a single physical server. Most virtual servers run on VMware or Hyper-V.
VTL (Virtual Tape Library)
A disk array with built-in software that makes its disks appear to the backup software as a tape library.
A volume is an area of storage on a hard disk. A volume is formatted by using a file system, such as file allocation table (FAT) or NTFS, and typically has a drive letter assigned to it (in Windows). A single hard disk can have multiple volumes, and volumes can also span multiple disks.
A Server whose role is to access and publish information on the public Internet or private company Intranet via web pages, databases and other back-end systems. These have become popular vehicles for sharing information, and examples include e-commerce, web-based services (such as Online Backup) and security functions.
Wide Area Network (WAN)
A WAN comprises two or more local area networks (LANs) that are connected together through an intermediary network. For example, a LAN in London needs to connect to a LAN in Manchester. When these two are connected together, a WAN has been created.
A method used to restrict server access to storage resources that are not allocated to that server. Zoning is similar to LUN masking, but is implemented in the switch and operates on the basis of port identification (often using port numbers).