Storage Concepts
Storing and Managing Digital Data
Edition 1.3
Published by
HDS Academy, Hitachi Data Systems
Corporate Headquarters
2845 Lafayette Street
Santa Clara, California 95050-2639 USA
www.HDS.com community.HDS.com
Americas: +1 866 374 5822 or [email protected]
Europe, Middle East and Africa: +44 (0) 1753 618000 or [email protected]
Acknowledgements
Editor: Peter Manijak
Martin Stewart
Pavel Vild
Rebecca George
Copyeditor:
Graphical work:
Vojtech Marek
Tina Pankievich
Consultants and specialists:
Dr. Miroslav Kotrle
Radim Petrzela
Project managers:
Cecil Chamberlain
Jaroslav Fojtik
Table of contents

Foreword  11
Conventions used in the book  13
9. Business challenges  297
    Data growth forecast  299
    Advanced data classification and tiering  303
    High availability challenges  307
    Fast and cost-effective response to business growth  308
    Compliance and security challenges  309
    Power requirements and cooling  311
    HDD and FAN power savings  313
    Green data center  314
    Total cost of ownership  316
Self-assessment test  319
Glossary  327
Foreword
Data. It's a word we all know and use regularly, but how does it impact our personal and professional lives? We live in an era of technology that places information at our fingertips. Technology empowers us
to create anything from digital photographs, films and symphonies to
advanced medical images and space systems. Data grows exponentially
and stretches further each day. It makes the highest achievements possible and connects us like never before.
But how do we store this ever-growing data, the countless files
created daily around the world? The answer depends on the data owner. Individuals need to store files in an easily accessible location and
maximize free space for new files. Small businesses face more complex
issues with the data they use and store, but it is critical to store their
data. Midsized businesses and large corporations face far greater concerns. Their information, or data, is a corporate asset that needs to be
stored, managed and protected, and there are government compliance regulations that impact the process.
For years, many IT professionals have seen data storage as a mystery.
Larger businesses employ dedicated storage administrators or contract
with professional services consultants to perform the more complex
procedures and manage the all-important data. Most IT professionals
know a lot about servers but very little about storage systems. However,
these specialized systems are a modern day necessity for the storage
of data. In this regard, an IT administrator (or other responsible party)
must have at least elementary knowledge of common problems related
to data storage and storage systems.
This book covers the basic areas of data storage. Each chapter is written as an independent section, so you do not have to read the chapters in serial order to glean knowledge on a particular topic; feel free
to jump from one chapter to another and select only the chapters that
interest you.
The book is for anyone who would like to understand the cornerstones of data storage technology and the significance of basic terminology used in connection with storage systems. We recommend this
book for students and beginning administrators, as well as experienced IT
workers who want to deepen their knowledge in data storage. Some
individuals do not work directly with storage systems but need to communicate with personnel in IT; the book is suitable for this audience as
well.
Our intent is not to provide a detailed manual for coping with the
technologies of individual manufacturers, concrete commands, scripts
and procedures, or to compare equipment from different vendors in
the market. We respectfully leave these next steps to you.
After reading this book, you should understand the basic rules of
data storage and get a systematic overview of modern tools and technologies. You will also be acquainted with the economic viewpoint of
technology operation, and for this reason, you will better understand
the essence of IT investments.
If the subject of data storage systems piques your interest and you
want to deepen your knowledge after reading this book, you can search
for additional information through data storage companies such as Hitachi Data Systems (www.hds.com).
We wish you a successful first step in the amazing world of data storage and storage systems.
Hitachi Data Systems Academy
This approach will help you become familiar with key terms that we'll look at
in detail later.
Common user data:
Photos uploaded to web galleries or social networks
Movies uploaded to services like youtube.com
Documents created by online applications like Google Docs
Email communication stored on the Internet
Data backed up to online storage such as Microsoft SkyDrive
Personal web pages and blogs
Increasingly popular cloud systems and online applications
All this data can be considered important, at least from the user's
point of view, and deserves careful and secure handling. However,
limited access to these types of data, for example during hardware maintenance, usually doesn't create any financial loss to the
end user. If you can't access your photo gallery for an hour once a
year, it most likely won't cost you a lot of money. For large institutions that store ongoing financial transaction data, such a disruption can
sometimes have devastating consequences. To prevent data loss, you
need to employ several basic concepts, such as various techniques for data
backup and data replication. The key term in storage technology is
redundancy. Data needs to be stored as several copies in several locations to ensure no disaster can affect its availability.
Figure 1.1 Services like YouTube and Facebook have to store huge amounts of data.
Figure 1.2 Businesses are dependent on a properly working data storage system.
The term data lifecycle applies to all data that comes into existence,
while a data retention period applies only to certain types of data and is
governed by law.
Another thing to mention here is that long-term data retention is often inversely proportional to high data availability. This means we have
to store certain data for, let's say, forty years, but its fast availability is not a priority.
Figure: The data lifecycle across storage tiers: high performance tier, low cost tier, online archive tier and offline archive.
To manage and process structured data (organized in databases and tables), you can use software for analysis. Then people
can use this data for interpretation. As for managing and processing
unstructured data, you need more sophisticated software able to work
with just one file type (for example, with MP3 files or Microsoft Word
documents). The computer analysis of unstructured data does not always provide satisfying results and there is often a need for human factor intervention. In other words, there must be a person to organize,
manage, process and analyze this data. Some examples of unstructured
data are:
Medical images such as MRI scans and x-rays
Photographs and diagrams
Digital documents such as check images and contracts
Electronic assets such as presentations, emails and design documents
To have a general idea of the structured and unstructured data ratio,
refer to Figure 1.4, which illustrates the exponentially growing demand
for data storage.
Figure 1.4 Structured and unstructured data ratio, growing demands on data
storage capacity, HDS statistics.
In this section, we discussed several types of data. You learned that different levels of importance can be attributed to data, at least from the perspective of business and storage systems. The trick is ensuring that data
is safely stored and available, which means creating several copies. Whenever you
send an email or upload a picture to a web gallery, you create at least two,
maybe even three or four copies, because the service provider cannot afford to lose your data just because of a technical failure. This, of course, demonstrates that the need for storage space is increasing, because all data
must be duplicated multiple times. This is called redundancy and it is a
key word that will be mentioned many times throughout this book. Other
terms worth remembering are business continuity and disaster recovery.
We will talk about these in detail later.
With cloud computing, you do not have to install the software on your computer; instead, you can use online software in the cloud. The advantages of cloud computing are clear:
The user does not have to worry about licensing and software
costs.
The user has the same working environment on any computer in
any location that is connected to the Internet.
The user does not need powerful hardware to run certain applications; all the data processing and computing is done by a
remote computer that is part of the cloud solution.
All online applications are automatically updated.
Systems are scalable and pay as you go; you can buy the exact
amount of services you need.
There is a low risk of data loss.
Figure 1.5 An example of cloud technology, iCloud solutions provided by Apple. Among its
competitors is Google, which offers Google Apps such as Google Docs.
In the last section, we discussed what forms data can take and its
importance to businesses. Before we continue, let's define the term
data itself and make a clear distinction between data and information. This should help you understand more clearly the way data is
handled in computers and storage systems. We can say that data is a
physical and written representation of information, knowledge and
results of any real world observation.
From an information technology point of view, data:
Is a succession of written characters, which can be represented by numbers, letters or symbols. These characters may seem
strange or meaningless at first glance, but they serve to organize
the data.
Does not have to bear any useful information.
Must be processed, organized and structured in order to extract
information.
Is easily processed by modern electronic devices. It's the language of computers and storage systems.
The distinction between data and information can also help us understand the more complex distinction between logical and physical levels of data. This distinction can be more difficult to grasp because
it is quite abstract. Generally speaking, we can say that the physical
level is the hardware level, with a primary focus on where exactly
raw blocks of data are stored on a hard drive, how they are optimized
for input and output (I/O) performance, and mutual communication
among hardware components on the lowest level. The logical level of
data, however, can be understood as a virtual interface working as a
bridge between the lowest level hardware operations and the user.
The user of the logical construct can be the server, the application or
the person using the equipment. This may sound confusing, but there
are many practical implementations of this abstract concept. The key
terms that will help you understand how physical and logical levels of
data work include physical unit, logical unit, partition, volume, file
system and microcode (to an extent).
In most modern systems, the physical level of data processing is hidden from the user. The user either has no way to control
this level or can make partial changes using tools that influence the physical level through the logical level.
Figure 1.6 One physical disk drive is partitioned into two logical drives (C: and D:).
Now you may ask what the joining element between physical hard
drive and logical hard drives is. In this case, it is the metadata stored on
the first physical sector of a hard drive that contains the Master Boot
Record (MBR) table with information on the partitioning of the disks,
their proportions and their status (i.e., whether they are formatted and
contain some kind of file system). The MBR table is able to address only
disk drives with a capacity of up to 2TB. For bigger disks, there is a new
standard called a GUID Partition Table (GPT). This is, however, not
very important for us at the moment.
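To make the MBR layout described above more tangible, here is a minimal Python sketch, not tied to any particular operating system tool, that reads the first 512-byte sector of a disk image (the file name disk.img is a made-up placeholder) and lists the four primary partition entries. The 32-bit sector fields in each entry are the reason an MBR can address only about 2TB with 512-byte sectors.

    import struct

    SECTOR_SIZE = 512
    PARTITION_TABLE_OFFSET = 446   # four 16-byte partition entries start here
    BOOT_SIGNATURE_OFFSET = 510    # the last two bytes must be 0x55, 0xAA

    def read_mbr_partitions(path):
        """Return a list of (bootable, type, start_lba, sectors) tuples."""
        with open(path, "rb") as f:
            mbr = f.read(SECTOR_SIZE)
        if mbr[BOOT_SIGNATURE_OFFSET:BOOT_SIGNATURE_OFFSET + 2] != b"\x55\xaa":
            raise ValueError("Not a valid MBR: boot signature missing")
        partitions = []
        for i in range(4):
            entry = mbr[PARTITION_TABLE_OFFSET + i * 16:PARTITION_TABLE_OFFSET + (i + 1) * 16]
            bootable, ptype = entry[0] == 0x80, entry[4]
            start_lba, num_sectors = struct.unpack("<II", entry[8:16])
            if num_sectors:                      # skip empty slots
                partitions.append((bootable, hex(ptype), start_lba, num_sectors))
        return partitions

    # Example call (hypothetical image file):
    # print(read_mbr_partitions("disk.img"))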
Why would people partition their hard drives? There can be several
reasons:
OS and application system files may be separated from user files
for purely organizational reasons.
It is easier to create a backup image of a partition that contains
the operating system. (It is not very practical to create an image
of the whole disk.)
You may want to install different file systems and operating systems on a single drive.
You may want to encrypt one partition that contains important
data.
Volume
Compared to the term partition, the term volume is more general
as it describes any single accessible storage area with a single installed
file system. The term volume is mostly used in the context of operating
systems, but we can encounter this term even when we use software
for managing storage arrays. Volume will always be interpreted as a logical layer because it is not connected directly with physical operations.
When we have a physical hard drive with two or three partitions, the
first partition can contain a New Technology File System (NTFS) used by
the Windows NT series, the second partition can be formatted with the
ext3 file system used by Linux operating systems and the third partition
can remain unformatted, without a file system, perhaps because we
have not yet decided what to do with this disk space. When we boot
Windows, we see only one volume, because Windows systems cannot
access and use the Linux ext3 file system (at least not without third party
utilities that may enable access to partitions that use different file systems). Windows also cannot use unformatted disk space. When we boot
Linux, we see two volumes, one with the ext3 file system and one
with the NTFS, because Linux is able to access disk space that is using
the NTFS (even though Linux does not use the NTFS for its installation by
default settings). The third partition cannot be called volume because it
does not have a file system installed, and thus does not comply with the
definition of the term volume.
The term volume does not even have to be connected with partitions. If you insert a CD with music or a DVD with a film, it will again be
called a volume. The same applies to USB pen drives, memory cards or
legacy floppy disks. They are not partitioned, and yet they are recognized as volumes by an operating system.
The term volume is then again part of the logical level of data and
it comprises the terms partition or disk drive and file system. The term
volume can be understood as a logical interface used by an operating
system to access data stored on the particular media while using a single instance of a file system. A volume, therefore, cannot contain two
or more file systems.
We have learned that you can split one physical hard drive into two or
more logical drives through a technique known as partitioning. If a logical drive contains a file system, an operating system sees it as a volume.
It is very important to note that a partition (or volume) can also be created from two or more physical hard drives. If you have two physical hard
drives and you want to create one unified volume, then you can join them
together, creating a single logical disk that will have the capacity of both
physical hard drives. This technique is called concatenation of hard drives.
File system
We have mentioned that a volume is defined by two things: it must
be a single storage area and it must have some sort of file system installed. We have encountered the term file system quite a lot throughout the last few paragraphs without describing what it actually is. In
simple words, a file system is a way in which files are named, organized
and stored on the hard drive. Again, it can be described as some sort of
logical interface between the user layer (individual files, for example,
.mp3 music files and .avi video files) and the physical layer that contains
only ones and zeros. File system translates raw binary data into what we
see when we open a file explorer in our preferred operating system. A
file system is, therefore, also a way of storing file attributes and metadata that contain additional information about the file and are used by
the operating system. This information can bear details such as access
permissions to users, a read only attribute, and the date of file creation
and modification. Please note that the file system is closely connected
with the user's operating system choice. If you prefer to run Windows,
you must use NTFS. If you wish to run Linux, you must employ a different type of file system, probably ext3.
To sum it up, a file system serves these purposes:
It sets up the way files are named. Applications such as text processors and video players use these names to access files. These
applications do not see where the particular file is physically
stored on the hard drive.
Applications access the file system using an application programming interface (API) and make requests, such as copy, paste, rename, create directory, update metadata and file removal.
A file system is always installed on a homogenous storage area
that is, in the context of operating systems, called a volume.
A file system stores information about where data is physically
located.
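As a small illustration of the API idea from the list above, the following Python sketch issues create, copy, rename and delete requests purely by name; the folder and file names are invented, and the operating system's file system decides where the bytes actually end up on the volume.

    import shutil
    from pathlib import Path

    docs = Path("demo_documents")                            # hypothetical folder on some volume
    (docs / "reports").mkdir(parents=True, exist_ok=True)    # "create directory" request

    src = docs / "draft.txt"
    src.write_text("quarterly summary")                      # create a file purely by name

    shutil.copy(src, docs / "reports" / "draft_copy.txt")    # "copy" request
    (docs / "reports" / "draft_copy.txt").rename(docs / "reports" / "final.txt")  # "rename" request

    print(src.stat().st_mtime)                               # metadata kept by the file system
    src.unlink()                                             # "file removal" request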
Figure 1.7 shows how a file system works in general terms. To have a
better idea how a particular file system works, we can take a closer look
at the NTFS used by Windows.
To understand how this particular system works, please refer to Figure 1.8, where you can see the Windows boot procedure. The MBR is
stored on the first sector of a hard drive. This record contains disk partitioning information, and also tells BIOS which partition is active and
bootable. The boot sector can be found on each partition and contains
information about the file structure (in Windows it is a tree structure).
The NT loader is a set of instructions describing how to use the file system and in which order to load all the drivers and libraries. At the end of
this process, we have a running operating system. Note the difference
between kernel mode and user mode. In kernel mode, the operating
system has direct access to installed hardware. In user mode, applications access these devices via the operating system and its drivers.
Figure 1.7: The file system acts as a layer between user files (Picture.jpg, Holiday.png, Worksheet1, user.dat, win.ini) handled by the operating system and the raw data stored on the disk.
Figure 1.8: The Windows boot procedure: Master Boot Record, boot sector, NT loader, operating system, applications (kernel mode versus user mode).
What you create on a storage system are basically logical units that are striped
across a large number of physical hard drives. When we talked about
partitioning, you learned that you can split one physical drive into several partitions. On a storage system, this works the other way around:
you usually connect several physical hard drives, or their parts, to create one partition. This partition or logical unit is then assigned a unique
identification number, which is called a logical unit number (LUN). Remember this definition because we will be talking about LUNs many
times throughout this book. The LUN created on the storage system's
physical hard drives is then linked to a server. This is called mapping.
We create a logical unit and then we map it to a server. When we map
a LUN to a server, the server then sees this LUN as a physical disk and
is able to perform read and write operations as if it really were a single
physical hard drive.
From this description, we can see that physical and logical layers
work on several levels. You have physical hard drives on a storage system and then you create a logical unit and map it to a server, which sees
it as if it were a physical hard drive. The server installs a file system on
this physical drive, making it logical to the user, which can be an application or an actual person sitting in front of the server.
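The following toy Python sketch, with invented names and no relation to any vendor's actual management software, models this idea: physical disks are grouped into a logical unit, the logical unit is given a LUN, and the LUN is mapped to a host that sees only the LUN, never the disks behind it.

    class StorageSystem:
        """Toy model: physical disks -> logical unit (LUN) -> host mapping."""
        def __init__(self):
            self.luns = {}       # lun_id -> list of physical disk names
            self.mappings = {}   # host name -> list of lun_ids

        def create_lun(self, lun_id, physical_disks):
            self.luns[lun_id] = list(physical_disks)

        def map_lun(self, lun_id, host):
            self.mappings.setdefault(host, []).append(lun_id)

        def host_view(self, host):
            # The host sees only LUN ids, never the physical disks behind them.
            return self.mappings.get(host, [])

    array = StorageSystem()
    array.create_lun(0, ["HDD-00", "HDD-01", "HDD-02", "HDD-03"])
    array.map_lun(0, "server-a")
    print(array.host_view("server-a"))   # -> [0]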
Though this is already a very complex topic, it is actually just the beginning. Storage systems today create so many virtualized logical layers
that it is easy to get lost. At this point, it is also good to mention why
storage systems work on so many virtualized logical layers: it is because of performance. In the next chapter, we'll explain how we achieve
performance through virtual logical layers.
Microcode
By definition, microcode is built-in software that works
on the lowest layer of instructions, directly controlling hardware equipment. It is important to note that microcode is a term used not only with
computers and storage systems but with all electronic devices that are
based on microprocessor technology. As a cell phone or digital camera
user, you may have encountered the procedure of a firmware update:
you connect your cell phone or digital camera to a computer and
it downloads new internal software (firmware) from the Internet. The
term firmware can, in most cases, be used interchangeably with the term
microcode. However, firmware installed in your digital camera or even
in your washing machine usually contains microcode and some kind of
user interface (menu, icons, etc.), while the term microcode can be interpreted as the most basic software, with no graphical user interface
(GUI). Microcode is usually stored on a high performance memory chip,
which can be either read only or read and write.
Microcode, in a way, represents the interface between the physical hardware and the user, where the user can be another piece of
hardware, an application or even an end user. It can also be seen as a set of
instructions that governs the relationship between the physical layer and
the logical layer.
Remember that when we talk about storage systems, the term microcode refers to the built-in software that contains basic tools for managing
disk arrays. To perform more complicated operations (LUN mapping, RAID
group setup, replication, etc.) you need additional software provided by a
storage system manufacturer. Storage systems do not require an operating
system to run; they use microcode instead.
Data consistency
Data consistency plays a crucial role in any data storing device. To
have consistent data basically means you have valid, usable and readable data. In other words, you need to have all ones and zeros in the
right order to be able to read data. If something goes wrong in the
storage device and just one bit of information is changed, then the
whole sequence of ones and zeros becomes unreadable and useless
for the end user, which can be an application running on the server. Therefore, it is vitally important to employ various technological
means to maintain data consistency and prevent data corruption.
When we use a computer at home, we usually have just one hard
drive installed. If this hard drive malfunctions and one sector becomes
unreadable, we have little protection and are likely to lose the whole
file stored on this bad sector. However, all modern hard drives have bad
sector detection tools included in their microcode, and when one particular sector on the disk shows signs of possible failure, the hard drive
automatically moves the data from this sector to a more secure place
on the platter. Another way to make sure your data remains consistent
is to back it up regularly to another data storage device.
The problem of data consistency is even more painful on large storage systems. Due to redundancy concepts, there is virtually no risk of
data inconsistency caused by a single hard drive malfunction; the problem is connected with ongoing data transactions. Imagine a situation
where a bank operates a server that runs an Internet banking application. This application stores data on an enterprise storage system.
The internet banking service is used by many users and there is a large
number of transactions made every second. If the server or the storage
system loses power, there would be at least one incomplete transaction
that hasn't been stored in time. When the bank gets the power back,
this incomplete transaction is likely to cause database inconsistency. It
can be very difficult to restore data consistency, and that is why storage
systems offer sophisticated software tools that in most cases are able to
achieve this result.
When we are talking about data consistency, we can distinguish among:
Point-in-time consistency
Transaction consistency
Application consistency
Data integrity
While data consistency deals with data readability and usability
based on the correct sequence of ones and zeros (if there is one mistake in the sequence, the whole file is corrupted and unreadable), data
integrity describes accuracy, reliability and correctness in terms of security and authorized access to the file. To put it a different way, to
maintain data integrity, we have to ensure data in a database is not
altered. If we lose data integrity, data is still readable and technically
usable, but it is not correct. If we disrupt data integrity, we replace one
sequence of ones and zeros with another sequence of ones and zeros
that also makes sense to the storage system (i.e., data is consistent) but
alters the information carried by the data in an undesirable way.
Data can be altered either by unauthorized user access or by an incorrectly behaving
application, which can endanger previously stored data by overwriting it
instead of, for example, keeping both copies of the file (the original and
the altered version).
It is important to maintain integrity and trustworthiness for the entire lifecycle of the data. The reason is that the differences among the
various versions of a single file or a single database entry can also carry
useful and, in some cases, essential information. To ensure data integrity, we have a large set of tools, ranging from permissions and access restrictions on a single file or a single database entry (we can set
the data to read only and forbid write operations) to
data integrity policies that consist of a set of rules governing
access and possible data alterations.
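One simple and widely used technical safeguard is to record a cryptographic checksum when data is stored and verify it before the data is used again. The short Python sketch below shows the idea; the file name and its contents are made up for the demonstration.

    import hashlib

    def sha256_of(path):
        """Compute the SHA-256 digest of a file, reading it in chunks."""
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        return digest.hexdigest()

    # A throwaway file standing in for an archived document.
    with open("contract.txt", "w") as f:
        f.write("original wording of the contract")

    stored_checksum = sha256_of("contract.txt")   # recorded when the file is archived
    # ... later, before the data is trusted again ...
    if sha256_of("contract.txt") != stored_checksum:
        print("Integrity violation: the file was altered after it was stored.")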
Storage administrators must take data integrity into consideration
when designing data architecture, and they must be trained on how to
set rules governing data integrity. Data integrity becomes a complicated
matter especially when it comes to structured data, or databases.
Storage components
and technologies
Figure: The main components of a hard disk drive: platters, spindle, actuator arm, actuator, read/write head, interface and power connectors, and the HDD mainboard with controller.
The read/write head of an HDD reads data from and writes data to
the platters. It detects (when reading) and modifies (when writing) the
magnetization of the material immediately underneath it. Information
is written to the platter as it rotates at high speed past the selected
head. The data is, therefore, written in circles called tracks. The tracks
are then divided into sectors, which represent the smallest usable space
on the hard drive.
An actuator arm with an R/W head is another component that strongly influences the performance of a hard drive. The actuator arm must
quickly find the data on the platters; the time this takes is called seek time. Average
seek times range from 8 to 10ms for common hard drives. However, in
the most modern hard drives designed for use in servers, it can be as low as
3 to 5ms.
Interface
Figure: HDD interface block diagram (storage system, HDD bus, cache, controller, hard disk assembly).
Greater bandwidth
Faster data transfer rates, up to 600MB/sec
Easy to set up and route in smaller computers
Low power consumption
Easier manipulation
Hot-swap support
To sum it up, PATA and SCSI interface technologies are becoming obsolete. SATA can be found in most modern personal computers. Both SATA
and SAS hard drives are suitable for use in servers and storage systems.
You have learned how a hard drive looks and what its components
are. You have also learned that when you interconnect multiple hard
drives you get a disk array. Now its time to find out how this is done.
The key word is redundancy.
The technology behind connecting hard drives together is called redundant array of inexpensive (or independent) drives (RAID). The trick
is to combine several physical hard drives to create a single logical disk
that can be further split or partitioned into LUNs, as mentioned in the
previous chapter. The physical hard drives connected in one RAID group
must be of the same make and the same capacity; in short, they must be identical. The outcome of all this is that an operating system does not see
individual hard drives, but just one logical disk that combines the capacity of all individual hard drives.
Figure: One logical disk made up of two physical disks; logical blocks 1 through 8 are distributed across Disk 1 and Disk 2.
Different types of RAID can be implemented, according to application requirements. The different types of RAID implementation are
known as RAID levels. Common RAID levels are: RAID-0, RAID-1, RAID-1+0, RAID-3, RAID-5 and RAID-6. Each RAID level is a combination and
application of four features, or parameters, which are data striping,
data mirroring, data spanning and parity distribution. Selection and
implementation of RAID levels is closely connected with performance
requirements and the desired amount of redundancy. See Figure 2.6.
Data striping
Data striping is one of the basic techniques that help us achieve increased performance of a storage system. With one hard drive configuration, you store your files sequentially on the platter. When you wish
to access particular data, the R/W head on the actuator must find the
correct place on the spinning platters, and then it reads the sequence
of bits. This can be further complicated by data fragmentation, when
one file is stored on several locations within a single hard drive. This
is how data is accessed on personal computers. On a storage system,
performance of this sequential access to data is far from sufficient. Data
striping provides a solution to this problem.
Imagine that you have, for example, five hard drives and a single file.
You can divide this single file into five parts, and then you can store each
part on one of those five hard drives. When you want to access this file, it
will be read from five hard drives at once, combining their performance.
In a storage system, you concatenate these five drives into a single logical disk that can be further partitioned and mapped to a host (server).
Whenever you store something on this logical drive, a RAID controller
installed in a storage system will stripe the data and distribute it to the
physical drives in a fashion that ensures the highest possible performance. Remember, by data striping, you can achieve increased read
and write speed, but you cannot improve seek time performance (with
the exception of solid state drives with no moving parts). The significant
disadvantage of data striping is the higher risk of failure: when any of
the above-mentioned five physical drives fails, you will not be able to
access your data anymore. In other words, there is no redundant data
that could be used for reconstruction of files stored on the damaged
hard drive. So data striping must be combined with other techniques to
ensure there is no data loss.
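A minimal Python sketch of the striping idea follows; it assumes five drives modeled as simple byte buffers and a tiny stripe size, whereas a real RAID controller performs the same distribution in hardware with much larger stripe units.

    STRIPE_SIZE = 4      # bytes per stripe unit; real systems use tens of kilobytes or more
    NUM_DRIVES = 5

    def stripe(data, num_drives=NUM_DRIVES, stripe_size=STRIPE_SIZE):
        """Distribute data across the drives round-robin; returns one buffer per drive."""
        drives = [bytearray() for _ in range(num_drives)]
        for i in range(0, len(data), stripe_size):
            drives[(i // stripe_size) % num_drives].extend(data[i:i + stripe_size])
        return drives

    def unstripe(drives, total_len, stripe_size=STRIPE_SIZE):
        """Reassemble the original data by reading the drives in the same order."""
        out = bytearray()
        index = 0
        while len(out) < total_len:
            drive = drives[index % len(drives)]
            offset = (index // len(drives)) * stripe_size
            out.extend(drive[offset:offset + stripe_size])
            index += 1
        return bytes(out[:total_len])

    data = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    assert unstripe(stripe(data), len(data)) == data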
Data mirroring
Data mirroring is the easiest and an efficient way to prevent data loss
due to malfunction or failure of one or more physical drives. The principle of data mirroring is quite simple: read and write operations of
data are conducted simultaneously on two physical hard drives. One
file is automatically saved on two physical drives. This not only ensures
a high level of data redundancy, but it also provides increased read performance, because data is accessed on both drives. This performance
boost, however, is not as significant as in data striping. This is why these
two techniques, data striping and data mirroring, are implemented
together. Storage system data mirroring works on the hardware level in
the RAID controller. Note that because the data is written in duplicates,
the created logical disk is just half the size of the actual installed capacity of physical hard drives. The major disadvantage is the cost.
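Conceptually, mirroring just duplicates every write, as the following small sketch shows (the two drives are again modeled as byte buffers); a read can then be served from either copy, which is where the modest read speedup comes from.

    class MirroredPair:
        """RAID-1 in miniature: every write goes to both drives."""
        def __init__(self, size):
            self.drives = [bytearray(size), bytearray(size)]

        def write(self, offset, data):
            for drive in self.drives:                 # duplicate the write
                drive[offset:offset + len(data)] = data

        def read(self, offset, length, preferred=0):
            # Either copy can satisfy the read; the other remains available if one drive fails.
            return bytes(self.drives[preferred][offset:offset + length])

    pair = MirroredPair(16)
    pair.write(0, b"payroll")
    assert pair.read(0, 7) == pair.read(0, 7, preferred=1) == b"payroll"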
Data spanning
Data spanning, or linear data writing, is the simplest way of storing
data. It offers no increase in performance and no redundancy, and it is therefore not suitable for enterprise applications. It
can, however, be practical for home use or in small servers. The whole
trick is to concatenate several physical drives to create one logical disk.
The data is then written on this disk sequentially. In other words, when
one physical hard drive gets full, the data is stored on the next available physical drive. This can be useful when you want to have a unified logical disk on several hard drives with no demands on increased
performance. The main advantage of data spanning implementation is
that you do not need a hardware RAID controller. The other advantage
is that the system that consists of several physical hard drives is easily
scalable; when you run out of space, you can easily add more physical
hard drives. The last advantage is that if one physical hard drive fails,
the data on the other hard drives is not lost [except for any file(s) that
span two physical hard drives]. Data spanning is also used on
CDs and DVDs. When you are making a backup image of your data, it
often happens that capacity of one backup medium is not enough. You
will therefore back up your system on several DVDs.
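The logical-to-physical translation behind spanning is simple arithmetic, as in this sketch with invented drive capacities: walk the drives in order and subtract their capacities until the requested logical block falls inside one of them.

    def locate(logical_block, drive_capacities):
        """Map a logical block number to (drive_index, block_on_that_drive)."""
        for index, capacity in enumerate(drive_capacities):
            if logical_block < capacity:
                return index, logical_block
            logical_block -= capacity
        raise ValueError("logical block beyond the concatenated capacity")

    capacities = [1000, 2000, 1500]    # blocks per physical drive (made-up numbers)
    print(locate(999, capacities))     # -> (0, 999)  last block of drive 0
    print(locate(1000, capacities))    # -> (1, 0)    first block of drive 1
    print(locate(3400, capacities))    # -> (2, 400)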
Parity
The last procedure for RAID implementations uses parity data to balance the disadvantages of data striping and data mirroring. Data striping
offers no redundancy and data mirroring is costly because you can only
use half of the physically installed capacity. If you want to overcome
these issues and get the best of all the above mentioned techniques,
you should use parity. In simple terms, parity data contains information that can be used to reconstruct and recalculate data on a failed
hard drive. Using parity requires at least three physical hard drives. You
would concatenate these drives to create a single logical disk. The logical disk offers two-thirds of the physically installed capacity. Whenever
you perform write operation on this logical disk, the RAID controller
stripes data on two physical hard drives and then, using sophisticated algorithms, calculates parity data to be stored on the third physical drive.
Note that you can also stripe parity data. In this case, the RAID controller calculates the parity data and then stores data and parity stripes
on all three physical hard drives. Parity data works by complementing
actual data. The advantages are obvious: you save money, and you
have a good level of redundancy and performance. The disadvantage is
that, if one disk fails, it takes time to reconstruct the data from the parity. However, the RAID controllers used in modern storage systems are
powerful enough to overcome even this issue, and when a disk fails, they
provide fast reconstruction times and prioritized immediate access to
required data.
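Parity in RAID is typically based on bitwise XOR: the parity stripe is the XOR of the data stripes, and XOR-ing the surviving stripes with the parity regenerates a missing one. A minimal Python sketch, assuming equal-length stripes:

    def xor_parity(stripes):
        """Compute the parity stripe as the bytewise XOR of all given stripes."""
        parity = bytearray(len(stripes[0]))
        for stripe in stripes:
            for i, byte in enumerate(stripe):
                parity[i] ^= byte
        return bytes(parity)

    data_stripes = [b"\x0f\x0f", b"\xf0\x01", b"\x33\x44"]
    parity = xor_parity(data_stripes)

    # The drive holding stripe 1 fails; rebuild it from the survivors plus parity.
    rebuilt = xor_parity([data_stripes[0], data_stripes[2], parity])
    assert rebuilt == data_stripes[1]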
RAID-0
In RAID-0, data is spread evenly across two or more disks; data is
stored just once without any backup copy. RAID-0 is easy to implement
and offers increased performance in terms of data access (more disks
equals more heads, which enables parallel access to more data records). This RAID level has no redundancy and no fault tolerance. If any
disk fails, data on the remaining disks cannot be retrieved, which is the
major disadvantage.
With a three disk implementation, one third of the
data is sequentially written to
each disk. In other words, if
you store a file to a logical disk
that consists of three physical
disks connected in a RAID-0
level, this file is then divided
into three pieces, and each
piece is stored on a separate
disk. We can see that RAID-0
uses only data striping.
Figure: RAID-0 striping of blocks 1 through 9 across three disks.
RAID-1
RAID-1 implements mirroring to create exact copies of data on two
or more disks. In this way, it can offer increased availability of the data,
resistance to a disk failure and, thus, a good level of redundancy. RAID-1
is easy to implement, and it reduces the overhead of managing multiple
disks and tracking the data. It provides fast read/write speed because
the system can read from either disk. If a disk fails, RAID-1 ensures there
is an exact copy of the data on the second disk. This RAID level also has
some disadvantages. The most important is limited capacity: in RAID-1, the true storage capacity is only half of the actual capacity, as data is
written twice. This also makes RAID-1 implementation more expensive;
it doubles the costs.
Figure: RAID-1, where each block (A1 through A4) is written identically to a mirrored pair of disks, and RAID-1+0 (2D+2D), where data is striped across two mirrored pairs (Disks 1 through 4).
RAID-1+0
RAID-1+0 is an example of multiple, or nested, RAID levels. A nested
RAID level combines the features of multiple RAID levels. The sequence
in which they are implemented determines the naming of the nested
RAID level. For example, if RAID-0 is implemented before RAID-1, the
RAID level is called RAID-0+1. RAID-0+1 combines the features of RAID-0
and RAID-1 by mirroring a striped array. The advantages of this connection are obvious: you get fast read/write speed and data protection
based on data redundancy. The only disadvantage is high implementation cost.
Figure: RAID-5 (3D+1P), where data stripes D1 through D3 and the parity stripe rotate across four disks, and RAID-6 (6D+2P), where data stripes D1 through D6 and two parity stripes (P1 and P2) rotate across eight disks.
The last few pages provided you with fundamental redundancy concepts based on RAID implementation. You should also understand the direct connection of RAID implementation to the performance of a storage
system.
You now know what a hard drive is and how it works. You also know
basic principles of interconnecting hard drives together. In the next pages,
we will essentially build up a midrange storage system, taking a look at all
the components and their architecture.
Remember the key principles: data striping, data mirroring, redundancy and parity. You will hear about these often.
For RAID-5 you need at least three hard drives: two for data and one
for parity. However, this configuration can be extended. When we use
a distributed parity, we can, for example, have seven hard drives used
for data and one for parity and the result will still be RAID-5. Usually it
is more advantageous to use a higher number of hard drives than allowed by a minimal configuration, especially for performance reasons.
The more hard drives you have in a parity group, the more data stripes
created, which offers higher read speed. Storage systems from various
vendors support different RAID level configurations. These RAID level
configurations are also often called a RAID level range. The RAID parity
group is a building block of a RAID group. In other words, if you create a
RAID-5 group in an eight-disk configuration (7D+1P), this will be your
parity group. Several parity groups will then make a RAID group. Remember that hard drives used in one RAID group must be of the same
make, capacity and interface.
RAID level           Random Read, Sequential Read    Sequential Write    Random Write
RAID-1+0 (2D+2D)     100%                            100%                100%
RAID-5 (3D+1P)       100%                            150%                50%
RAID-5 (7D+1P)       200%                            350%                100%
RAID-6 (6D+2P)       200%                            300%                66.7%
Note: random and sequential read performance is proportional to the number of disks; sequential write performance is proportional to the number of data disks.
The second column shows that the random read and sequential read
performance is proportional to the number of disks, because the disks
can be accessed simultaneously. With sequential writes, there are no
reads involved as with random writes. Therefore, the performance is
proportional to the number of data disks.
As for the random writes column, the reason for the performance
difference between RAID-6 (6D+2P) and RAID-5 (7D+1P) is that RAID-6
(6D+2P) must process 1.5 times (see below) more disk I/Os than RAID-5
(7D+1P). Therefore, the random write performance in RAID-6 (6D+2P) is
33% lower than with RAID-5 (7D+1P). The number of disk I/Os in RAID-5
random writes is four (old data/old parity reads, new data/new parity
writes). The number of disk I/Os in RAID-6 random writes is six [old
data/old parity (P)/old parity (Q) reads, new data/new parity (P)/new
parity (Q) writes].
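The 33% figure follows directly from these I/O counts; the short calculation below simply restates the read-modify-write cost of each level.

    raid5_ios_per_write = 4   # read old data, read old parity, write new data, write new parity
    raid6_ios_per_write = 6   # as above, plus reading and writing the second parity (Q)

    overhead = raid6_ios_per_write / raid5_ios_per_write       # 1.5x as many back end I/Os
    relative_throughput = raid5_ios_per_write / raid6_ios_per_write
    print(overhead, relative_throughput)   # 1.5 and 0.667: roughly 33% lower random write rate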
The values listed in the table were collected using the Hitachi Adaptable Modular Storage 2000 family system. The table is meant to illustrate that different applications may require different performance
characteristics from their RAID groups.
In a software RAID implementation, the host system is responsible for all the operations. All operations are controlled by the
central processing unit (CPU), which also calculates parity information.
Performance of the CPU is therefore diminished. As a result, this type of
solution is applicable only in home environments and is not suitable for
application in business environments.
Some manufacturers of mainboards for personal computers offer
dedicated RAID chip controllers. This is, in most cases, just marketing
because all the computing is done by the CPU. This type of solution
also does not usually support interfaces other than SATA or the now
obsolete IDE.
Hardware RAID controllers, on the other hand, offer much more than
their software counterparts. They include dedicated processing units
for parity data calculation, and they have cache memory for improved
I/O operations. If one disk fails, the reconstruction is automatic and
faster. You do not need any additional software, and you can configure
the RAID array and perform all the operations by accessing the built-in
microcode interface. Hardware RAID controllers can ensure data consistency by adding an independent backup battery that is connected to
cache and keeps it alive in case of power disruption. When the power is
back online, data in the cache is stored on the hard drive, and the user
does not lose any data. A hardware RAID controller is usually employed
in servers or in high performance professional computers. In storage
systems, you always have a powerful integrated RAID controller modified to support a large number of individual hard drives.
Spare disks
In any storage system you must have spare hard drives so that if one
disk in a RAID group fails, there is another one ready to take over. The
damaged hard drive is later manually replaced with a new disk.
There are two methods that support sparing out of RAID group
data:
Correction copy
Dynamic sparing
Correction copy occurs when a drive in a RAID group fails and a compatible spare drive exists. Data is then reconstructed on the spare drive.
Figure 2.10 shows how the reconstruction is conducted depending on
the RAID level implemented.
Dynamic sparing occurs if the online verification process (built-in diagnostic) determines that the number of errors has exceeded the specified threshold of a disk in a RAID group. Data is then moved to the spare
disk, which is a much faster process than data reconstruction. See Figure 2.11 for details.
Notice that the speed of reconstruction of a failed disk depends on
the implemented RAID level. If we have a RAID-1 level, which uses only
mirroring and no parity information, the whole process of reconstruction is simple, because the system just copies the data from the mirrored drive to the spare disk. With RAID-5, the reconstruction will take
longer, because lost data must be calculated from the remaining disks
and parity information. In RAID-6, we have double-disk failure protection; if two disks fail, the procedure takes even longer than in RAID-5.
In Figures 2.10 through 2.13, all the diagrams are marked with an "I/O is possible" label. This means the data is accessible even during the process of
reconstruction. Performance of storage will be affected, but you have
immediate access to your data because requested data reconstruction
is prioritized during the process. Remember that in the event of double-disk failure on a RAID level other than RAID-6, you lose all the data
and you are not able to reconstruct it using conventional means. That is
why RAID technology is usually combined with other techniques to ensure redundancy and high availability. The most common of these techniques is data replication, which will be discussed in its own chapter.
Figure 2.10 This diagram shows the process of data reconstruction to a spare disk in the event of failure of one or more hard drives, for RAID-5 (4D+1P), RAID-6 (4D+2P), RAID-1 and RAID-1+0 (2D+2D); I/O is possible during reconstruction. In RAID-6 we have two parity disks: P and Q.
Figure 2.11 Dynamic sparing: when a disk in a RAID group exceeds its error threshold, its data is copied to the spare disk; I/O is possible during the process.
On most modern storage systems, correction copy can have two parameters set by a storage system administrator. These two parameters
are copy back and no copy back, and they influence the behavior of
the storage system once the failed drive is manually removed from the
storage system and replaced by a new hard drive. Depending on the
settings, the storage system can move data from the spare disk that was
used for data reconstruction to this new drive, or it can set this newly
added hard drive as a new spare disk while the former spare disk becomes the regular member of the RAID group.
In Figure 2.12, you can see the diagram of the copy back setting.
A copy back procedure takes less time than data reconstruction, but it
can still have a minor impact on the storage system performance. That
is why it should be conducted when the system is not under high load, for
example at night. The copy back parameter can also be used with dynamic
sparing technology.
Figure 2.12 Copy back: after the failed drive is manually replaced, the data is moved from the spare disk back to the new drive, which rejoins the RAID group.
Figure 2.13 No copy back: the newly installed drive becomes the new spare disk, while the former spare disk remains a regular member of the RAID group; I/O is possible during the process.
Figure 2.14 Hot-swappable hard drive units include a small drawer that simplifies disk installation.
In SATA hard drives, an additional board with electronics is included. These electronics turn the SATA
interface into a SAS interface with two ports for two
independent controllers to ensure redundancy. SAS
hard drives have this double architecture implemented natively.
For storage systems, the term hard drive enclosure often covers not
only the hot-swappable disk chassis, but the whole expansion unit that
is basically a rack for installing many individual physical hard drives. In
other words, in the context of storage systems, the terms hard drive
enclosure and expansion unit are interchangeable.
An expansion unit, or disk enclosure (based on SAS architecture),
consists of the following components:
Individual hard drives (SAS, SATA, SSD)
Expander (buses and wires for connecting the drives together)
Power supplies
Cooling system (fans)
Chassis
Figure 2.15 Front and rear view of an expansion unit/disk enclosure, Hitachi Adaptable Modular Storage 2000 family.
In Figure 2.15, you can see how a typical expansion unit looks. Note
that we usually have two types of expansion units a standard expansion unit, which can usually accommodate 15 hard disks, and a high
density expansion unit, which can accommodate up to 50 hard drives.
In high density expansion units, the disks are installed vertically and are
not directly accessible from the front panel. The advantage is that you
save space. The disadvantage is more complicated access to individual
disks and the demand for a more powerful cooling system (more disks generate more heat). Also notice the status LED lights that are a standard
part of regular expansion units and serve for monitoring purposes. Another reason each disk is accompanied by LED lights is localization
of a particular hard drive. Imagine that one disk in the enclosure fails.
The reconstruction process starts immediately and a spare disk is used.
An illuminated LED marks the failed disk that needs to be manually replaced. In high density expansion units, this is more complicated since
you need to pull out the whole expansion unit from the rack, remove its
cover and then localize the failed hard disk.
On the rear side of an expansion unit, you can see ports for connecting the enclosure to the back end controller of a storage system. These
ports can be based either on Fibre Channel or SAS technology, whichever is more efficient. We will describe the operations and differences
of these two technologies in detail later in this chapter. You can also see
power cord outlets.
We know that each SAS disk is equipped with two SAS connectors.
An expansion unit has two expanders; each one is connected to one
SAS connector on each hard drive. If we use a SATA hard drive, we need
an additional device that will convert one SATA port into two SAS ports.
All this is done to meet redundancy requirements. Storage systems are also equipped with double architecture: every component
in a storage system is duplicated to maintain the no-single-point-of-failure requirement. This means that we also have two back end controllers connecting the expansion units. Any hard drive is, therefore,
connected to both back end controllers via expanders and SAS cables.
See Figure 2.16.
Figure 2.16 A base unit with two controllers (CTL0 and CTL1), each connected to a chain of expansion units.
Figure: Back end architecture of a base unit and an expansion unit. Each controller (CTL0, CTL1) contains a front end, cache, a DCTL/RAID processor and a SAS controller; SAS buses and wide-link cables lead through expanders (ENC0, ENC1 in the expansion unit) to the drives, with an AAMux adapter providing the dual-ported connection for SATA drives.
Figure 2.18 Fibre Channel Arbitrated Loop back end architecture. Only one hard drive can
communicate with the controller at a time. A token determines which device can communicate
with the controller; other hard drives have to wait for their turn.
In a configuration with four SAS links, there can be four hard drives communicating with the controller at the same time, each with a data transfer
rate of 3Gb/sec. We know that FC-AL can deliver 4Gb/sec. In SAS architecture we can, therefore, reach three times higher performance. FC-AL
provides good performance with larger chunks of data written sequentially, while SAS performs much better with random I/O operations. We
can illustrate this with an example: imagine that you have a train and
a car. Train cargo transportation is efficient when you have lots of load
directed to one destination. On the other hand, the car will be more
efficient with small loads to multiple locations. Since storage systems
must handle more small random I/O operations, SAS back end architecture offers better performance.
Figure 2.19 SAS back end architecture. Hard drives are connected to expanders, which are
placed in enclosures. In this picture, we can see an example of a four SAS link expander.
Cache
We have described individual hard drives in detail and how they are
connected together. You learned that the hard drive is the component
that stores data, and its performance is measured in transfer data rate.
In paragraphs dedicated to RAID, you learned that data striping can increase performance of a storage system the storage system performs
more I/O operations per second. I/O operations are basically read and
write requests coming from the host, which is the server. Nevertheless, hard drives are still the major bottleneck in storage systems. Their
performance is not sufficient, simply because of the limitations of their
construction. When you want to perform I/O operations with a hard
drive, you have to wait for the platter to turn into the right position and
for the actuator to move the head on the spot where you want to write
or read data. You may object that solid-state drives have already overcome
these limitations, but they are still very expensive and their limitation lies
elsewhere: solid-state drives offer a limited number of writes
per memory cell, compared to common hard drives with platters.
The technology of solid-state drives is advancing, but it still needs to mature. The component that drastically improves storage system performance is cache.
Two kinds of cache can be implemented: global cache for both read
and write operations or dedicated cache that supports either read or write
operations, but not both. Modern systems are equipped with global cache.
In midrange storage systems, the cache provides a capacity of at least
8GB to 16GB. Cache is the component that lies between the storage system's front end and back end.
Note that individual hard drives also have a small built-in cache to improve their performance. The capacity of this built-in cache ranges from 8MB
to 64MB. This cache basically works on the same principles as cache in a
storage system. Built-in cache improves performance of the external data
transfer rate compared to the internal data transfer rate.
Cache operations
Cache operations are controlled by the CPU and interface board that
are usually part of the front end. Because the capacity of cache is limited, efficient data management is of crucial importance. To maintain uninterrupted and fast data flow from the host to the back end hard drives
and vice-versa, the storage system's microcode contains algorithms that
should anticipate what data is advantageous to keep (for faster read access) and what data should be erased. There are two basic algorithms
that affect the way cache is freed up:
Least Recently Used (LRU): Data that is stored in cache and has
not been accessed for a given period of time (i.e., data that is not
accessed frequently enough) is erased.
Most Recently Used (MRU): Data accessed most recently is
erased. This is based on the assumption that recently used data
may not be requested for a while.
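The LRU policy is easy to express in a few lines of Python, as in the sketch below (the capacity and block keys are arbitrary); a storage controller implements the same eviction rule in microcode over gigabytes of cache, but the principle is identical.

    from collections import OrderedDict

    class LRUCache:
        """Evict the entry that has gone the longest without being accessed."""
        def __init__(self, capacity):
            self.capacity = capacity
            self.entries = OrderedDict()       # key -> cached block

        def get(self, key):
            if key not in self.entries:
                return None                    # cache miss: caller must read from the disks
            self.entries.move_to_end(key)      # mark as most recently used
            return self.entries[key]

        def put(self, key, block):
            self.entries[key] = block
            self.entries.move_to_end(key)
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)   # drop the least recently used block

    cache = LRUCache(capacity=2)
    cache.put("blk-1", b"...")
    cache.put("blk-2", b"...")
    cache.get("blk-1")            # blk-1 becomes most recently used
    cache.put("blk-3", b"...")    # evicts blk-2
    assert cache.get("blk-2") is None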
The capacity of cache is shared by two basic kinds of data: read
data, coming from the hard drives to the servers, and write data, coming from the servers to the hard drives. The ratio of cache available
for read and write operations is dynamically adjusted according to the
workload.
Other algorithms govern the behavior of cache when it is getting
full or is underused. When cache usage approaches the threshold set by the
microcode, dirty data is immediately sent to the hard drives to be written.
Dirty data is a term for write data that has been confirmed to the server
as written but has not been written to actual hard drives yet.
Other operations are based on configuration. Storage systems are
usually supplied with software for cache management and cache partitioning. Cache partitioning can be useful because various applications
running on servers have various requirements on cache. Optimal configuration of cache highly influences performance of the whole storage
system and the whole infrastructure.
There are two paths from a server to one LUN. One of these paths goes via CTL0, and the other goes
via CTL1. Similarly, back end enclosures (with individual hard drives) are
connected to back end controllers.
Figure 2.20 diagram details: each of the two controllers has four 8Gb/sec Fibre Channel ports (QE8 controllers), a 1.67GHz Celeron CPU with 1GB RAM, 2-4GB of cache, a DCTL/RAID processor and a SAS controller with expander, interconnected by PCIe x8 links; the back end leads through an internal enclosure and a stack of trays of 15 or 24 HDDs, for 120-159 disks in total (SAS, SATA, SSD).
Figure 2.20 Overview of the whole mainstream midrange storage system architecture and
components using Hitachi Adaptable Modular Storage 2100 as an example. In the front end,
we have QE8 Fibre Channel port controllers that are part of the interface board. These controllers are mainly responsible for conversion of FC transfer protocol into PCIe bus used for internal
interconnection of all components. The storage system in this configuration is not equipped
with FCoE or iSCSI ports because they are optional. Notice the CPU and local RAM memory (not
cache). This processor is the microcode engine or the I/O management brain.
LUN ownership
Earlier we discussed that, in a storage system, we have to configure
data paths. The data path starts in a server that has two FC ports for
connection to two FC ports on a storage system: one port is located on CTL0, the other on CTL1. From the interface board, data is sent
to cache. The DCTL/RAID processor then takes care of data striping and
sending the stripes to their destination physical hard drives.
On the user level, we have to configure where the path leads: we
have to map the LUN to a server. From one server to one LUN, we have
two paths for redundancy. When one path fails, data can be transferred
through the second path. As you learned, in a storage system we have
double architecture with two controllers for redundancy, but so far, we
have not mentioned that double architecture also equals double performance if no malfunction occurs. Since the storage system is accessed
from several servers, there are also several paths leading to different
LUN destinations. Imagine you have five servers accessing five LUN destinations. This makes 10 data paths. To use all hardware means efficiently, each controller is assigned a certain number of LUNs. When we create LUNs, we must also set LUN ownership, or decide which controller
will be responsible for which LUN. This way we can split the workload
efficiently between two controllers.
Data will then be transferred primarily through the path that leads to the controller that owns the particular LUN. The second path is used as a backup. In the event of a failure of the path that leads to the owning controller (let's say CTL0), the backup path is used. However, the backup path leads to CTL1, which is not set as the LUN owner. Therefore, data is transferred internally to CTL0, bypassing the malfunctioning FC port. The path failure is depicted in Figure 2.21 on the next page. This configuration, in which one controller is dominant and the second one serves backup purposes, is called active-passive architecture.
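The path-selection logic of active-passive access can be sketched as follows. The controller names and the shape of the paths dictionary are hypothetical, and real multipathing software is far more involved; the sketch only shows the preference for the owning controller with failover to the backup path.

```python
def choose_path(lun_owner, paths):
    """Active-passive sketch: prefer the path to the owning controller and
    fall back to the other controller's path if the primary one has failed.
    `paths` maps controller name -> {"healthy": bool}."""
    if paths[lun_owner]["healthy"]:
        return lun_owner                       # normal case: use the owner's path
    for controller, path in paths.items():
        if controller != lun_owner and path["healthy"]:
            return controller                  # backup path; I/O is routed internally to the owner
    raise RuntimeError("no healthy path to the LUN")

# Usage:
# choose_path("CTL0", {"CTL0": {"healthy": False}, "CTL1": {"healthy": True}})  # -> "CTL1"
```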
[Figure 2.21 diagram: a host connected to ports on CTL 0 and CTL 1 accessing LU0; (1) the host changes the access path because of a path failure, (2) LU ownership is not changed.]
Figure 2.21 Active-passive architecture. In the event of a path failure, LUN ownership does not
change. Data is transferred via the backup path to CTL1 and then internally to CTL0, bypassing
CTL0 front end ports. This diminishes performance because of internal communication.
Modern storage systems, however, add the functionality of active-active architecture, which brings several major advantages. In active-active architecture, you can use more than one path to a single LUN simultaneously.
[Figure 2.22 diagram: four hosts, each with two HBAs, connected to LUN:0 through LUN:3 via both CTL0 and CTL1.]
Figure 2.22 Multiple paths to a single LUN are possible. LUN ownership automatically changes
in the event of path failure. Unlike active-passive architecture, active-active architecture offers
equal access to the particular LUN via both paths. This means the performance is not influenced
by what path is currently used.
If both paths are working correctly, both of them can be used for communication. In Figure 2.22, you can see that one path from the first server to LUN:0 is connected directly, and the second path to the same LUN leads via CTL1 and also includes internal communication. This requires communication overhead; however, this overhead has been reduced drastically through the implementation of PCIe buses in modern storage systems. The PCIe interface allows fast internal communication between CTL0 and CTL1.
In the event of a path failure in active-active architecture, LUN ownership can be changed dynamically. When the storage administrator creates the LUN, ownership is assigned to a controller automatically.
[Figure diagram: the firmware creates LU0 through LU3.]
The active-active front end design of modern storage system controllers allows simultaneous host access to any LUN on any host port on either controller with very little added overhead. A host accessing a LUN via a port on CTL0 can have most of the I/O request processed completely by CTL1, with little intervention by the main CPU and DCTL processors in CTL0.
[Figure 2.24 diagram, before and after: host servers connect through switches (SW) and ports to the MPUs on CTL 0 and CTL 1, which own LU0-LU3. Load balance monitoring detects a bottleneck on one MPU; LU1 ownership is changed automatically and the bottleneck is recovered.]
Figure 2.24 Controller load balancing diagram. Black boxes at the top of the pictures are host
servers. SW stands for switches, which are part of the storage area network (discussed in the
next chapter). MPU stands for microprocessing units (DCTL together with MPU). If one MPU is too busy while the other one is underused, LUN:1 ownership is automatically changed to CTL1 and the corresponding path is redirected.
At this point, it is good to mention that one more element plays a role in using several paths simultaneously. If you have one host server connected to a single LUN via two paths, then the server's OS will see two mapped LUNs, because it has no way to recognize that there is only one LUN connected via two paths. That is why there must be an additional software layer installed on the host server. This software layer is based on dynamic link provisioning and enables the server to work with the mapped LUN via two paths. Note that dynamic link provisioning cannot obtain the information that one particular controller is too busy. That is why the storage system must redirect the path to a less busy controller, exactly as in Figure 2.24. Notice that the green arrow in the After diagram is redirected within the storage system, even though it would be better to change LUN ownership and communicate directly via the second path. This example demonstrates that even active-active architecture is still imperfect.
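Conceptually, the controller load balancing shown in Figure 2.24 boils down to comparing controller utilization and moving a LUN when one controller is too busy while the other is underused. The following sketch uses made-up thresholds and data structures and only illustrates the idea, not any vendor's implementation.

```python
def rebalance_lun_ownership(ownership, load, busy_threshold=0.8, idle_threshold=0.4):
    """Load balancing sketch: if one controller is too busy while the other is
    underused, move one of its LUNs over. `ownership` maps LUN -> controller,
    `load` maps controller -> utilization between 0 and 1."""
    busy = max(load, key=load.get)
    idle = min(load, key=load.get)
    if load[busy] < busy_threshold or load[idle] > idle_threshold:
        return ownership                       # load is acceptable; do nothing
    for lun, controller in ownership.items():
        if controller == busy:
            ownership[lun] = idle              # ownership changes, the path is redirected
            break
    return ownership

# Usage:
# rebalance_lun_ownership({"LU0": "CTL0", "LU1": "CTL0"}, {"CTL0": 0.95, "CTL1": 0.2})
```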
iSCSI interface
The iSCSI interface is an alternative to the Fibre Channel interface. Fibre Channel ports are powerful, efficient and can be used for transmitting data over long distances. However, their disadvantage is that they require the entire network infrastructure to be built on Fibre Channel technology, using Fibre Channel optical cabling. This means that you have to build a new SAN, and all the components, such as switches, servers and other storage systems, must support Fibre Channel. This can be costly. To balance performance and cost, it is possible to use iSCSI front end ports, which are often optional parts of a midrange storage system. The first letter in iSCSI stands for Internet. This technology is therefore built on a combination of the TCP/IP and SCSI protocols. By encapsulating SCSI instructions into TCP/IP packets, data can be transmitted without special requirements on the network environment (i.e., without a specialized SAN). This allows you to connect your storage system to a LAN or WAN (especially the Internet) using common Ethernet cabling. The advantage is that you can use your existing infrastructure and do not have to invest in costly Fibre Channel components. The downside is that the performance of this solution is much lower. Therefore, iSCSI connections are usually implemented in small companies that use only a few midrange storage systems. Any storage system equipped with iSCSI ports also offers Fibre Channel ports, so it is not complicated to upgrade to a Fibre Channel infrastructure if needed.
Enterprise storage systems offer a higher level of reliability, scalability and availability. We know that midrange storage systems have two
independent controller boards, which contain components that can be
attributed to front end, cache or back end. In enterprise storage systems, these components are installed on dedicated boards that can be
replaced individually. They are also often hot swappable, so there is no
need to turn off or reboot a storage system when replacing. In enterprise storage systems, you would have one board for the DCTL controller, one board for cache, boards for individual front end ports, etc. This
also means that you can install more than one instance of the component, which makes enterprise storage systems more flexible and more
scalable. In case you decide to upgrade your enterprise storage system,
for example, with additional ports or cache, you just need to order the
specific board that is then installed into the grid switch that provides
the platform and interface for mutual communication of the installed
components. For redundancy reasons, whichever component you order is always duplicated to maintain no single point of failure. The set
of two identical components that are to be installed in the grid switch is
called an option or feature. You can, therefore, order one cache option
that will consist of two identical modules, one for CTL0 and one for CTL1.
Enterprise storage systems can offer additional performance by
providing:
A large storage capacity: more back end ports can be installed and more expansion units can be connected to these back end ports.
A significantly larger amount of cache to service host I/O optimally.
The ability to deliver high performance in multi-thread processing: it can handle a much higher number of I/O requests from multiple hosts.
Higher throughput: the internal buses and interfaces are able to handle more data, faster, than a midrange storage system. Throughput is measured in GB/sec.
nology. Therefore, you can have a fast tier for frequently accessed data
based on SSD drives. Then you can have a tier based on SAS or SATA
drives or an archiving tier with tape backup technology. Automated dynamic tiering is very useful and will be discussed later when we talk
about archiving.
In Figure 2.25, we can see the Hitachi Virtual Storage Platform, which
is an enterprise storage system that brings state-of-the-art technology
and many innovative features. One of them is 3D scaling, which allows
you to expand the storage system into three dimensions: you can add performance and storage space, and you can connect other vendors' storage systems and virtualize them.
This chapter provided you with detailed information on the design and
architecture of midrange storage systems. Now you should be able to:
Name all individual components and explain their function.
Explain basic concepts of achieving and improving performance.
Explain basic concepts that ensure data redundancy.
In addition, we compared midrange and enterprise storage systems.
Both are based on very similar concepts and architectures; the main differences are in performance and application.
Storage networking
and security
What are you going to learn in this chapter?
How to describe basic networking concepts
How common network devices operate
What possibilities we have in storage system networking
How devices communicate with each other through a network
How to secure storage area networks
Introduction to networks
Nowadays, a standalone piece of IT equipment is of little use. It is
the ability to communicate with other devices that makes technology
so powerful. People would hardly spend so much time in front of their
computers if they could not share the outcome of their work with other
people. When we say the ability to communicate with other devices is
important, we especially mean the ability to transmit and receive information encoded in the form of data to and from other members of
a network. A network, therefore, describes an infrastructure of technical means that facilitate this interchange of information. These means
are both hardware and software based. Among the hardware network
components, there are cables, network adaptors installed in personal
computers, servers, storage systems or other devices, and components
dedicated for routing the stream of data to its destination. The software layer of networking consists mainly of protocols that contain sets
of instructions that govern the process of communication among the
network members. Each protocol supports a specific area of application (i.e., type of network) and type of network adaptor. For each type
of network, we need a dedicated solution through a particular set of
cables, network adaptors and protocols. Storage systems are highly sophisticated devices that must be able to provide extremely high performance. As a result, storage systems require a different type of networking than regular workgroup computers. In this chapter, we will focus
primarily on storage system networking.
Network cables
Network cables are the basic building block of any wired network.
They serve as the medium through which data flows from one network device to another.
In Figure 3.1, you can see an example of a twisted pair cable. A twisted pair cable is usually made of copper and has pairs of insulated wires twisted together. The twisting reduces electromagnetic interference and cross-talk from neighboring wires. Twisted pair is the least expensive type of cabling used in networks. However, it provides limited distance connectivity. This means that you cannot use twisted pair cables in a wide area network (WAN). The maximum available throughput depends on the cable quality and construction. The tighter the twisting, the higher the supported transmission rate. The cables allowing higher speeds are also more expensive. The common throughput ranges from 10Mb/sec to 1Gb/sec. The maximum cable length reaches only 100m. Twisted pair cables usually use standardized RJ45 connectors, well known from the network adaptors installed in personal computers and laptops.
[Figure diagram: cross-section of a fiber optic cable, showing the core, cladding, coating, strengthening fibres and cable jacket.]
Fiber optics is a technology that uses glass or plastic fibers to transmit data as light pulses. A fiber optic cable consists of a bundle of
fibers; each fiber can transmit millions of messages modulated onto
light waves. Usage of glass or plastic materials makes the cable resistant
to electromagnetic interference. However, these materials also make
the cable fragile. That is why additional cladding and strengthening fibers are added. Fiber optic cables have the following advantages over
twisted pair cables:
Provide a greater bandwidth
Allow data to be transmitted digitally
Can carry data over long distances
All these advantages make fiber optic cables ideal for WANs. The
main disadvantage of fiber optic cables is that they are expensive to install. The decision to use a twisted pair or a fiber optic cable in a network
depends on the protocol used. It is possible to distinguish between multi-mode optical fiber and single-mode optical fiber. Multi-mode optical
fiber is suitable for communication over short distances, up to 600m.
This points to the area of application in an organizations infrastructure.
Single-mode optical fiber is suitable for long-distance communication.
The difference in technology is that multi-mode optical fiber is cheaper, since it has a larger and more robust core that allows cheaper implementation of light emitting diodes (LEDs) and vertical-cavity surface-emitting lasers (VCSELs). In simple terms, multi-mode optical fiber is easier to handle and less expensive to implement. Fiber
optical cables are color coded to easily show the type. Single-mode
optical fiber cables are yellow, and multi-mode optical fiber cables
are orange. However, purely for practical reasons, sometimes we use
different colored cables of a single type to distinguish among paths in a
network and simplify cable management.
As for the connectors that are used with fiber optic cables, the situation is more complex than with twisted pair. There are many standards
of fiber optic connectors depending on the area of application. Optic cables are used not only in networking, but also for transmission of audio
signal and other purposes. The common connector for fiber optic cables used with networking of storage systems is called LC, which stands
for Lucent connector (name of manufacturer), little connector or local
connector. Fiber optic cable connection to a port also requires a special
device able to convert optical signals into electrical signals and vice
versa. This device is called a transceiver. Sometimes the transceiver is
built-in as part of a port, but more often it is a standalone component.
There are two types of transceivers commonly used in storage system
networking:
Gigabit interface converter (GBIC)
Small form-factor pluggable (SFP)
It is possible to say that both GBIC and SFP do the same work, while
the GBIC solution is older and becoming obsolete. SFP transceivers are
smaller and offer higher speeds. Both types of transceivers are highly
reliable and hot swappable. There are several sub-types of SFP transceivers that are implemented according to the area of application.
For multi-mode fiber with orange color coded cable coating:
SX: 850nm, maximum 4.25Gb/sec over 150m
A media access control (MAC) address is a worldwide unique identifier assigned to a network interface card (NIC). This identifier is not
dependent on protocol, such as IP addressing, because a MAC address
is built into the device by its manufacturer. This is why MAC addresses
are sometimes referred to as physical addresses. A MAC address looks
something like this: 01:2F:45:67:AB:5E. As you can see, hexadecimal
digits separated by colons or hyphens are used. Behind the hexadecimal digits there is a 48-bit binary code.
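As a small illustration, the following Python sketch validates a MAC address string and shows the 48-bit binary value behind the hexadecimal notation; the regular expression and helper name are our own, not part of any standard tooling.

```python
import re

MAC_PATTERN = re.compile(r"^([0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}$")

def mac_to_bits(mac):
    """Validate a MAC address and return its 48-bit binary representation."""
    if not MAC_PATTERN.match(mac):
        raise ValueError(f"not a valid MAC address: {mac}")
    value = int(mac.replace(":", "").replace("-", ""), 16)
    return format(value, "048b")               # 48 binary digits

# mac_to_bits("01:2F:45:67:AB:5E")
# -> '000000010010111101000101011001111010101101011110'
```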
A World Wide Name (WWN) is a unique label, which identifies
a particular device in a Fibre Channel network. To be more precise, it
is used for addressing Fibre Channel communication targets or serial
attached SCSI (SAS) targets. As you may guess, WWN addresses are
used in storage area networks (SANs), the primary focus of this chapter. Later we will discuss the structure and usage of WWN addressing.
Besides such nodes as computers, printers or even storage systems, which we call endpoint network members, we have network devices that serve primarily as data stream redistribution points. To understand how data is transmitted and distributed, it is necessary to get
familiar with the Open Systems Interconnection (OSI) model definition
of network communication layers.
The lowest communication layer, the physical layer, is represented
by a stream of bits and their transport over a network. On this level of
communication we deal just with ones and zeros.
The second layer is called the data link layer, in which data frames
are transported over a network. A frame consists of three basic elements: a header, a payload field and a trailer. The payload field contains the usable data we want to transfer, the data that conveys information. The header and the trailer carry metadata that mark the beginning of a frame, the end of a frame and, most importantly, the physical addresses of the originating device and the target device (i.e., MAC or WWN). The primary function of the second OSI layer is, therefore, physical addressing. Thanks to this metadata, we know exactly where the frame is coming from and where it should be delivered.
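A data link frame can be modeled very roughly as a header (addresses), a payload and a trailer. The sketch below is purely illustrative: the field names are ours and the trailer uses a toy checksum rather than a real CRC.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    """Sketch of a data link layer frame: header and trailer carry metadata,
    the payload field carries the usable data."""
    source_mac: str        # originating device physical address
    destination_mac: str   # target device physical address
    payload: bytes         # the data that conveys information
    trailer_crc: int       # simple stand-in for the frame check sequence

def build_frame(source_mac, destination_mac, payload):
    return Frame(source_mac, destination_mac, payload,
                 trailer_crc=sum(payload) % 65536)   # toy checksum, not a real CRC

# frame = build_frame("01:2F:45:67:AB:5E", "6C:3B:00:11:22:33", b"hello")
```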
[Figure diagram: the OSI model. The host layers are application, presentation and session (interhost communication, data) and transport (end-to-end connections and reliability, segments); the media layers are network (packets), data link (frames) and physical (bits).]
[Figure 3.5 diagram: a client, a server, a storage system and a printer connected through a hub.]
Figure 3.5 Network nodes are all the devices connected in the network. We distinguish between endpoint communication nodes and data redistribution nodes.
[Figure 3.6 diagram: servers and a Fibre Channel storage system connected through a switch.]
Figure 3.6 Switched fabric topology. All the devices in the network can communicate with each other at any time at full speed.
The disadvantage is that one or two servers usually do not use the full potential of a storage system, which is better exploited by sharing the hardware across the network.
[Figure 3.7 diagram: a server attached to the LAN and directly to a storage system.]
Figure 3.7 Direct attached storage infrastructure. The server is directly connected to a storage system; the storage system can be accessed only through the server, and the server can be accessed from the LAN.
[Figure 3.8 diagram: a 64-bit WWN with a 4-bit prefix and a 12-bit extension; WWN example: 50060E8010405303016.]
Figure 3.8 Host bus adaptor and an example of WWN.
connection of up to the theoretical maximum of 16 million devices, limited only by the available address space (2²⁴). Multiple switches in a
fabric usually form a fully connected mesh network, with devices being
on the edges of the mesh.
Fibre Channel switches are not highly reliable or scalable. To ensure
high availability and no single point of failure, each node needs duplicate paths across the SAN via independent Fibre Channel switches.
A Fibre Channel director is a large and complex switch designed for
application in large storage area networks such as enterprises or data
centers. It is highly available, reliable, scalable and manageable. It is designed with redundant hardware components, which provide the ability
to recover from a non-fatal error. Fibre Channel directors are capable of
supporting mainframe connection via FICON protocol. FICON protocol
will be described in detail later in this chapter.
iSCSI is best suited for web server, email and departmental applications. iSCSI ports can be found in a storage system front end interface
board. Storage systems usually support both Fibre Channel ports and
iSCSI ports.
simultaneously. See Figure 3.9 to get a general idea of how NAS solutions actually look.
[Figure 3.9 diagram: clients, an application server and a print server on the network; the NAS device consists of a NAS head and storage.]
Figure 3.9 Network attached storage. The NAS device is represented by the server that functions as the NAS head and common storage system. There are solutions that integrate both
these functionalities in one package (NAS appliances). NAS devices work relatively independently; they do not require servers with applications. All clients, application and other servers
can access files stored in a NAS device.
Network interface using one or more interface cards (NIC), typically 10Gb Ethernet
Network file system protocols (NFS, CIFS)
Internal or external RAID storage
File system
Cache
Industry standard storage protocols to connect to and manage
physical disk storage resources
The advantages of a NAS solution are that it offers storage to computers running different operating systems over a LAN. It can be implemented within the common network infrastructure with no requirements for SAN Fibre Channel technology. It solves the problem of
isolating data behind servers typical for DAS and even SAN block-level
storage systems. NAS devices are optimized for sharing files between
many users. Implementation into an existing infrastructure is easy.
The disadvantages are that a NAS solution relies on the client-server
model for communication and data transport, which creates network
overhead. Compared to SAN, NAS provides lower performance. Furthermore, NAS provides only file sharing; no other traditional file server applications, such as email, database or printing servers, are supported.
The mentioned advantages and disadvantages suggest the area of
application to be midsized organizations with remote departments, or
branch offices of large organizations, with networks that have a mix of
clients, servers and operations. It is ideal for collaborative environments
with file sharing between departments. NAS solutions are successfully
implemented where there are requirements for easy file sharing, software development, computer-aided design and computer-aided manufacturing (CAD/CAM), audio-video streaming, publishing, broadcasting,
etc.
[Figure 3.10 diagram: hosts and a NAS gateway connected over a SAN to RAID storage.]
Figure 3.10 Converged SAN and NAS infrastructure. The NAS head (gateway) has its storage connected over the SAN. NAS scales to the limits of the SAN, bounded by the NAS file system's capacity. Implementation of several NAS gateways is possible; this type of solution is scalable. A converged solution can coexist with application servers, and it can be centrally managed.
[Figure diagram: comparison of DAS, NAS and SAN, showing where the application and file system reside and whether storage is direct connected, accessed over an IP network or accessed over a SAN.]
Network protocols
A protocol is a set of rules that governs communication between
computers on a network. It regulates network characteristics in terms
of access method, physical topologies allowed in the network, types of
cable that can be used in the network and speed of data transfer. The
different types of protocols that can be used in a network are:
Ethernet
Fibre Channel Protocol (FCP)
Fiber connectivity (FICON)
Internet Protocol (IP)
Internet small computer system interface (iSCSI)
Fibre Channel over IP (FCIP)
Internet Fibre Channel Protocol (iFCP)
Fibre Channel over Ethernet (FCoE)
Ethernet protocol uses an access method called carrier sense multiple access with collision detection (CSMA/CD). Before transmitting, a node checks whether any other node is using the network. If clear,
the node begins to transmit. Ethernet allows data transmission over
twisted pair or fiber optic cables and is mainly used in LANs. There
are various versions of Ethernet with various speed specifications.
The Fibre Channel protocol defines a multi-layered architecture for
moving data. FCP packages SCSI commands into Fibre Channel frames
ready for transmission. FCP also allows data transmission over twisted
pair and over fibre optic cables. It is mainly used in large data centers
for applications requiring high availability, such as transaction processing and databases.
FICON is a protocol that connects a mainframe to its peripheral devices and disk arrays. FICON is based on FCP and has evolved from IBM's
implementation of Enterprise Systems Connection (ESCON).
Internet protocol is used to transfer data across a network. Each device on the network has a unique IP address that identifies it. IP works
in conjunction with the TCP, iSCSI and FCIP protocols. When you transfer
messages over a network by using IP, IP breaks the message into smaller
units called packets (third layer in OSI model). Each packet is treated as
an individual unit. IP delivers the packets to the destination. Transmission control protocol (TCP) is the protocol that combines the packets
into the correct order to reform the message that was sent from the
source.
iSCSI establishes and manages connections between IP-based storage devices and hosts, and it enables the deployment of IP-based storage area
networks. It facilitates data transfers over intranets, manages storage
over long distances and is cost-effective, robust and reliable. As we already mentioned, iSCSI is best-suited for web server, email and departmental business applications in small to medium sized businesses.
Fibre Channel over IP is a TCP/IP based tunneling protocol that connects geographically distributed Fibre Channel SANs. FCIP encapsulates
Fibre Channel frames into frames that comply with TCP/IP standards.
It can be useful for connecting two SAN networks through an Internet tunnel, in a similar fashion to virtual private networks (VPNs) allowing
connection to a distant LAN over the Internet.
iFCP is again TCP/IP based. It is basically an adaptation of FCIP using routing instead of tunneling. It interconnects Fibre Channel storage
devices or SANs by using an IP infrastructure. iFCP moves Fibre Channel data over IP networks by using iSCSI protocols. Both FCIP and iFCP
provide means to extend Fibre Channel networks over distance. Both
these protocols are highly reliable and scalable. They are best suited
for connecting two data centers for centralized data management or
disaster recovery.
what port in a switch) you do not have to reconfigure zoning. The WWN
address is a physical address, which means it is built-in and does not
change. On the other hand, in port based zoning, we do not use WWN
addresses; we define which ports of a switch allow mutual communication instead. This is sometimes called hard zoning. Compared to WWN
zoning, port zoning is more secure. But if you want to change cabling,
you have to reconfigure zoning.
[Figure 3.12 diagram: two servers connected through an FC switch to a storage system.]
Figure 3.12 An example of WWN based zoning.
Mixed zoning combines both port zoning and WWN based addressing. When we set up a mixed zone, we define which ports located on
the FC switch can communicate, and then, using WWN, we authorize
the particular host bus adaptor to access the zone. Mixed zoning is the
most secure. Implementation of certain types of zoning depends on security requirements and the number of installed nodes.
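The effect of zoning on a switch can be illustrated with a simple membership check. The zone contents below (WWNs and port names) are hypothetical; real switches implement this in firmware with far richer semantics.

```python
def frame_allowed(zones, source, target):
    """Fabric zoning sketch: a switch forwards frames only between members of
    the same zone. A member can be a WWN (soft/WWN zoning), a switch port
    (hard/port zoning), or a mix of both (mixed zoning)."""
    return any(source in zone and target in zone for zone in zones)

zones = [
    {"50060E8000000001", "50060E8000000002"},    # WWN based zone (hypothetical WWNs)
    {"port3", "port7"},                          # port based (hard) zone
    {"port5", "50060E8000000003"},               # mixed zone
]

# frame_allowed(zones, "port3", "port7")            -> True
# frame_allowed(zones, "port3", "50060E8000000001") -> False
```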
Additional tools that can help us make SANs more secure include the
definition of policies and standards and regular auditing. It is a matter
of course that all node microcode and firmware are updated regularly,
access to all node configuration tools is secured by using strong passwords, and event logs are stored in a secure location.
Both LUN masking and fabric zoning should ensure that SAN resources are accessed only on a need-to-access basis. This means that a server can access only its own assigned LUN, even though the switched fabric allows connections from practically all servers to all storage devices. LUN masking and fabric zoning cannot be configured from the production servers, eliminating possible intrusion threats. In LUN masking, a logical unit is paired with the server's host bus adaptor WWN address. LUN masking
is configured on the storage system controller. In zoning, we can apply
soft zoning, hard zoning or a combination of both. Zoning is configured on
Fibre Channel switches and prevents frame forwarding to nodes that are
not members of the particular zone.
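LUN masking can likewise be reduced, for illustration, to a lookup table on the storage system that pairs each logical unit with the HBA WWNs allowed to see it. The WWN values and table layout below are invented for the example.

```python
def lun_visible(masking_table, hba_wwn, lun_id):
    """LUN masking sketch: a LUN is presented only to the host bus adaptor
    WWNs it has been paired with. `masking_table` maps LUN id -> set of
    allowed HBA WWNs (hypothetical values)."""
    return hba_wwn in masking_table.get(lun_id, set())

masking_table = {
    0: {"10000000C9000001"},                      # LUN 0 only for server A's HBA
    1: {"10000000C9000002", "10000000C9000003"},  # LUN 1 shared by a two-node cluster
}

# lun_visible(masking_table, "10000000C9000001", 0) -> True
# lun_visible(masking_table, "10000000C9000001", 1) -> False
```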
resources that are needed by a certain group of users. For example, you
can create VLANs for company departments, project teams, subsidiaries, etc. To a certain extent, VLAN segmenting can resemble fabric zoning used in SAN because both are configured on the switch level. Access
to a particular VLAN is granted according to which switch port is used
or according to MAC address. Joining a VLAN can also be governed by a
special server that can grant access based on authentication.
This chapter provided you with a technological description of networking. Now, you should be familiar with storage system networking possibilities. Remember that modern storage systems are usually implemented in
a SAN environment, i.e., they are connected to Fibre Channel switches and
HBAs installed in servers. NAS devices connected directly to a LAN are also
increasingly popular. On the other hand, DAS and iSCSI connection implementations are increasingly marginal. The most important protocols are FC, TCP/IP, iFCP and FCIP. The FICON protocol is used only with mainframes.
most countries, it is a legal requirement to have a plan for handling crisis situations. The standards provide methodology that helps to create
a functional business continuity plan.
The best known standard is the British Standard for Business Continuity Management (BS25999), which provides terminology, as well as
guidance for the determination of business processes and their importance. Training based on this standard also provides you with best practices that can significantly simplify business continuity management
implementation in your organization. When you meet all the criteria
represented by BS25999, you can receive certification from an independent external auditing company, which proves that your organization is well prepared for handling the most probable incidents. This can
bring you an advantage over your competitors, not only through an improved response to times of distress, but also through increased credibility and trustworthiness. Business continuity management also helps
you protect your trademark, shows you how to make your investments
and can earn you lower insurance premiums.
There are other well known and commonly used standards in addition to BS25999, such as the North American Business Continuity Standard (NFPA1600) and BS ISO/IEC 27001-2:2005 International Information
Security Standard. It is important to note that, according to these standards, business continuity principles are applicable to any organization,
regardless of its business area and size.
However, to prepare a functional BC plan, both organizations need to undergo certain procedures based mainly on analysis of their inner and outer
environments. The output of these procedures is then represented by
several documents that are all part of the BC plan. The functional business continuity plan includes:
A business impact analysis is the assessment of crucial business
processes of a particular organization.
A risk assessment is the determination of possible risks and the
attitude an organization assumes towards these risks.
The definition of policies and roles is based on the risk assessment output. A BC policy should define the focus of the BC plan
with respect to the organization's business strategy and the current market state. The definition of roles provides a transparent
and clear division of responsibilities among BC team members.
A recovery resources description is also a vital part of the BC
plan. An organization can have its own resources in several locations or these resources can be provided by suppliers and contractors.
A business recovery plan is a step-by-step plan, or script, that defines procedures to be taken when a particular situation occurs.
A disaster recovery plan is a recovery plan for IT infrastructure
and equipment. It describes how to get the critical applications
working again after an incident takes place.
A testing and training schedule and maintenance plan provide
methodology and a timeline for business continuity plan testing.
It should also include the name of a person responsible for keeping the BC plan updated.
The first step is to set up a business continuity team, which usually
includes business processes owners and upper management. This team
must translate the business requirements into an overall business continuity plan that includes technology, people and business processes for
recovery. The major considerations are requirements for data restoration (such as whether the business requires restoration up to the point
of the disaster or can restore from a previous data backup) and recovery
time requirements. These considerations determine the technologies
and method used to support the disaster recovery plan. For example,
if the business requires near-continuous recovery of data (with no lost
transactions), it will likely use remote mirroring of data and wide-area clusters that enable a hot standby application environment. The
shorter the recovery time and the less transaction loss, the higher the
cost of the recovery solution. As more and more businesses rely on
critical applications to generate revenue, the requirement increases
for shorter recovery time (less than 1hr) or even continuous recovery
(within minutes).
We have described a business recovery plan, which includes scripts
that provide step-by-step instructions on how to handle particular situations. There can be anywhere from two or three scripts to dozens of
scripts prepared for different possible scenarios, depending on the size
of an organization. Scripts are prepared for handling events such as fire,
burglary or sabotage. When such an event occurs, the business continuity team activates the right script and simply follows the instructions
divided into tasks. A well prepared BC plan significantly increases an organization's resistance to unexpected situations. All BC documentation
needs to be regularly updated and audited to ensure that the business
continuity plan reflects the current state of inner and outer environments for the particular organization.
of data unavailability (downtime), which can be used to support decisions for various recovery solutions.
A BIA provides information necessary for creating BC strategies. Its
extent is determined by the scope of products and services the organization provides. Business processes can be divided into operation processes (such as production, sales, customer support, distribution and
billing), support processes (such as IT, human resources and external
services) and strategic processes (such as management, project management and planning). If the importance of a process changes over
time (e.g., accounting at the end of financial year), assessment is made
according to the busiest period. The mutual relationship and dependencies of the processes also need to be considered.
The output of a business impact analysis is expressed through the
following values: maximum tolerable period of disruption (MTPD), recovery time objective (RTO) and recovery point objective (RPO). These
values are attributed to each process and process support layers. These
values are very important and basically tell us what level of redundancy
and availability we need and how costly it will be to meet these requirements. RTO and RPO are major concerns when deciding what backup or
replication strategy we must use.
The recovery time objective value tells us what the tolerable duration of the process disruption is. In other words, how soon we have
to resume the process to avoid significant negative consequences.
RTO is usually determined in cooperation with the business process
owner. We can also set RTO for process dependencies; in this case we
are talking about RTO layers, which can be, for example, resumption
of power supply, air conditioning in the server room, resumption of
server operation, network or storage system, RTO of installed application.
The recovery point objective, on the other hand, tells what data loss
we can afford for the particular business process. In other words, it tells
us how old our backup data can be. If the last backup is at 7:00 PM and
[Figure diagram: example MTPD thresholds for a business process.]
The maximum tolerable period of disruption is an output of the business impact analysis, but risk management also plays a role when determining MTPD values. The MTPD value shows us the length of a business process disruption after which the viability of this particular process, or of the organization as a whole, is seriously threatened. In other words, once the time marked by the MTPD value passes, it is practically impossible to restore the business process because the damage caused by the disruption is too severe. The MTPD is determined on the basis of the financial loss incurred because the business process was interrupted, and the risk acceptance level, which marks the amount of financial loss an organization is capable of handling. To avoid reaching the MTPD boundary, an organization must take precautions. The best option would be to take such precautions that the financial loss is nearly negligible. However,
[Figure 4.2 diagram: a timeline of regular process operation, a disaster, and the replacement procedure; the RPO window lies before the disaster, the RTO covers the replacement procedure after it, and the MTPD marks the latest point by which regular operation must resume.]
In Figure 4.2, you can see that the RPO is always lower than the RTO.
We also know that the RTO is lower than the MTPD. The resumption of
a disrupted business process is achieved by completing individual recovery steps. First, it is important to recover the infrastructure to have
power and cooling for servers and disk arrays. Then we can recover the
data from backup. The technology nowadays allows us to set the RPO
value to zero and the RTO to seconds or minutes. This can be achieved
by redundant duplicate infrastructure (servers and disk arrays) placed
in a separate location, which is ready to take over when the first location fails. This type of solution is called geoclustering, and it is by far
Risk assessment
Risk assessment is another part of a proper business continuity plan. It provides important data that amends the business impact analysis and helps to make the right choice of means when designing a failure resistant IT environment. Risk assessment determines the probability of threats that can lead to disruption of an organization, and it also
determines the impact of these threats. The difference between BIA and
RA is that BIA determines the impact of business process disruption on
the operation of an organization, while RA measures and evaluates the
possibility of threat occurrence and its impact on the business processes.
The major concerns are the probability of threat and the impact of
this threat. Threats can be represented by a human factor failure, a natural disaster, a technological failure, an economic situation or political
instability. Another categorization divides threats into random and intentional, or into threats that are possible to prevent and threats that are not possible to prevent. Some threats can be recognized before they strike,
which buys us some time to prepare for them, and other threats have
no prior notice. It is also necessary to determine whether the particular
threat can strike locally (e.g., fire) or regionally (e.g., earthquake). All
these factors need to be accounted for, and this information, in cooperation with a business impact analysis, helps us choose a functional
business continuity strategy.
The impact of a threat can lead to financial loss, unavailability of
employees, limited or no access to a building and premises, hardware
failure, data unavailability, infrastructure outages and unavailability of
critical documentation. If you took classes in risk management, you
probably know there are many methods that help us assess and evaluate the risks. Among the most important are the what-if analysis, the
cause-consequence analysis, the brainstorming approach and the event
tree analysis. For the purpose of this book, all you need to know is that
some of these methods are more sophisticated and scientific than others, and good risk management usually makes use of a combination of
different methodologies.
[Figure 4.3 diagram: a matrix of probability against financial loss, with the strategies risk acceptance, risk mitigation (precautions), insurance or outsourcing, and risk elimination.]
Figure 4.3 Risk assessment matrix.
If the threat is unlikely to happen and would cause low financial loss,
an organization usually accepts the risk. The threats that are unlikely
to happen but would cause high financial loss (e.g., a fire or flood) are
usually treated with the proper insurance. If the threat is likely to occur and cause low financial loss, an organization will try to mitigate the
risk by taking precautions. When the threat is likely to take place and
the financial loss that would be incurred is high (e.g., your organization
is set up on a river bank that floods every spring), the risk needs to be
eliminated.
From the perspective of the IT infrastructure and the continuity of
its operation, we are mostly concerned with threats with impact that
can be avoided by taking precautions. The precautions to be taken are
summed up in a risk treatment plan, which is part of risk assessment
documentation. The risk treatment plan determines clearly what precautions and actions will be taken to prevent the threat or to mitigate
the threat impact.
RPO and RTO are major considerations when designing a failure resistant IT environment. Remember that RPO stands for recovery point objective, and it tells us what amount of data transactions we are allowed to
lose in the event of disaster. It determines how old our backup data can
be. RTO, which stands for recovery time objective, tells us what the maximum desirable period of disruption is. After the RTO time passes, the application should work properly again. The MTPD, or maximum tolerable period of disruption, is purely hypothetical and defines the point of no return. The RTO will always be significantly lower than the MTPD.
All these values are outputs of the business impact analysis. The risk
assessment complements and amends the BIA.
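The way RPO and RTO constrain a recovery solution can be shown with a tiny check. The function name and the hour-based parameters are illustrative only; real BIA outputs are, of course, far more detailed.

```python
def meets_objectives(last_backup_age_h, estimated_restore_h, rpo_h, rto_h):
    """Sketch: check a recovery scenario against RPO and RTO.
    The age of the newest usable backup must not exceed the RPO, and the time
    needed to bring the process back must not exceed the RTO."""
    return {
        "rpo_met": last_backup_age_h <= rpo_h,    # how much data we may lose
        "rto_met": estimated_restore_h <= rto_h,  # how long the disruption may last
    }

# A nightly backup (up to 24h old) cannot satisfy an RPO of 1 hour:
# meets_objectives(last_backup_age_h=24, estimated_restore_h=2, rpo_h=1, rto_h=4)
# -> {'rpo_met': False, 'rto_met': True}
```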
WAN recovery
Domain controller
Email server
SAN recovery
Backup server
Each procedure has its own recovery time estimate, checklist, and
supplier and maintenance provider contact details (escalation). Once
all the procedures are complete, the functionality of all systems can be
tested. When everything is operational, the DR plan is terminated.
Speed of individual process recovery is determined especially by
technological means. The performance of disaster recovery planning
depends highly on backup strategies, data replication strategies and
cluster solutions. Some solutions are less resistant to failure than others, but they are usually less expensive.
Backup concepts
We have been talking about backup for a while and now it is time
to learn how it is actually done. In your organization, you have several
[Figure diagram: a file server and an SQL server backed up via a backup server to disk backup and tape backup.]
which contains production data. This image can then be used for recovering the server to the exact state it was in at the time of backup. This
technique is often used for recording known good configurations after
a clean install and for distribution of this configuration to other servers. A system image can be useful for backing up employee desktops
and notebooks, but it is not likely to be used as a tool for making ongoing backups of all the server applications and their databases. Other
backup models include:
Incremental backup: at the beginning, a full backup is made. Then, at every scheduled backup, only the data that has changed or been added since the last backup is recorded. The disadvantage is that when you need to recover data, you need to combine the initial full backup with all incremental backups, which is demanding in terms of time and capacity.
Differential backup: at the beginning, a full backup is made. At every scheduled backup, only the data that differs from the initial full backup is recorded. This technique delivers good recovery times: you need to combine only the initial full backup with the most recent differential backup.
Reverse delta backup: at the beginning, a full backup is made. At every scheduled backup, the initial full backup image is synchronized so that it mirrors the current state of data on the servers. During the reverse delta backup, a differential file is created that allows you to go back and restore previous versions of data if they are needed. This is probably the best option to choose, because it allows you to achieve the fastest recovery times: you always have the most recent full backup ready to use. (A restore-chain sketch follows this list.)
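As promised above, here is a small sketch of the restore chains implied by each backup model. The model names and list format are our own; backup software tracks this bookkeeping internally.

```python
def restore_chain(model, backups):
    """Sketch: which backup sets are needed for a full restore.
    `backups` is a chronological list such as ["full", "mon", "tue", "wed"]."""
    full, increments = backups[0], backups[1:]
    if model == "incremental":
        return [full] + increments             # full backup plus every increment since
    if model == "differential":
        return [full] + increments[-1:]        # full backup plus the newest differential only
    if model == "reverse_delta":
        return [backups[-1]]                   # the synchronized mirror is already current
    raise ValueError(f"unknown backup model: {model}")

# restore_chain("incremental", ["full", "mon", "tue", "wed"])  -> ['full', 'mon', 'tue', 'wed']
# restore_chain("differential", ["full", "mon", "tue", "wed"]) -> ['full', 'wed']
```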
The last thing that remains to be considered is backup validation. No
matter what backup model you choose, it will always be a structured
backup, meaning that backed up data contains additional information,
such as when the data was recorded, where, and what the extent of the
backup was. Backup validation is a functionality provided by backup management software. It checks whether all data was backed up successfully.
We didn't mention one backup concept closely connected with storage systems: serverless data backup. This technique is sometimes called extended copy and is based on a SCSI command that allows you to back up data from LUNs directly onto a tape library, without the participation of any backup server. A serverless backup environment is therefore SAN based.
Backup strategies
When deciding what backup strategy to employ, we need to sum up
all our requirements and make sure these requirements are met. We
have to be sure we are protected against:
Disk failure: if we use a storage system, then RAID redundancy should provide a good level of security. In addition, we can employ a storage-to-tape backup strategy.
Accidental file deletion: backup software running on a backup server should take care of this problem. It is also possible to restore files from tapes.
Complete machine destruction: we need a backup machine, a system image that will restore the known and used configuration, a storage system, tapes, etc.
Destruction of any onsite backups: we would need remote replication and offsite tapes.
We also need to decide to what extent we will back up. A business
impact analysis lets us know the importance of business processes. This,
together with RPO and RTO values, is the key to determining our backup
strategy. You need to work with many variables: how much storage
capacity is possible to dedicate to backups, time and money available
for backup software and technology, human resources allocation on
backups, and performance impact on production servers and network
Backup optimization
Every organization has limited resources, especially in terms of capacity that can be used for backup and the cost of backup hardware and
software. It is therefore important to use resources carefully to maximize the utilization of your equipment. There are several techniques
that can make backup more effective. The desirable backup solution is
fast, does not have much influence on production server performance
and network bandwidth, and offers good value for money. The techniques that allow you to achieve better utilization are data compression, deduplication, multiplexing and staging.
Compression is well known from personal computers. Most users have already encountered .zip or .rar files that represent a package of files encapsulated into one archive to provide easier manipulation and compression: the output archive file is smaller than the total size of the original files. The efficiency of compression is expressed as a ratio. For example, if you have a compression ratio of 1:2, the compressed file will be half the size of the original file. The compression is achieved by coding the files with a special algorithm. The efficiency of the algorithm is based on the level of information entropy present in the source file. Simply said, information entropy is represented by the sequence of numbers in the code and their randomness. The higher the level of randomness, the higher the entropy. If we had code represented by totally random numbers with no hidden pattern, then compression would not be possible. This, however, does not happen very often, and usually it is possible to achieve at least some compression ratio. With lossless compression, data can be compressed only to the extent allowed by its entropy. A lower amount of randomness allows the algorithm to find underlying patterns, which are logged. If a pattern occurs again, the repeated occurrence is replaced by a reference to the first occurrence.
Compression is employed in tape libraries without exception. Processing compression algorithms may affect backup performance. In tape
libraries, this problem is eliminated by adding extra hardware processors that take care only of calculating data compression and decompression. Compression therefore does not affect read and write speed
in tape drives.
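The dependence of the compression ratio on data randomness is easy to demonstrate with Python's standard zlib module; this is a generic illustration, not the algorithm used inside tape drives.

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Sketch: the achievable ratio depends on the entropy of the data."""
    compressed = zlib.compress(data, 9)         # level 9 = strongest standard compression
    return len(data) / len(compressed)

print(compression_ratio(b"backup " * 10_000))   # highly repetitive data: ratio far above 2
print(compression_ratio(os.urandom(70_000)))    # random data: ratio close to 1 (or below)
```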
Deduplication is the technique that eliminates duplicates in data. Thanks to this technique, it is possible to save disk space. Unlike compression, deduplication searches for larger strings of identical data. If we back up our email database, it is likely there will be dozens of instances of the same email. If duplicate data is found, it is replaced by a reference to the original data. Deduplicated backup data can occupy up to 90% less disk space than data that contains duplicates.
present on the staging disk, ready for fast restore (depending on the
backup software configuration).
All these techniques can be employed together, and they often are.
Remember that you cannot employ deduplication when using a tape
library as a primary backup device. Deduplication is, however, possible
when we also employ disk staging. Data is deduplicated on the hard
drive and then transferred to a tape library. This, however, adds an extra step when restoring the data: data is loaded from the tape and
then it must be restored one more time by reversing the deduplication
process. Multiplexing and compression are common features offered by
most backup software manufacturers.
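Deduplication itself can be sketched as chunk fingerprinting: identical chunks are stored once and later occurrences become references. The chunking and the SHA-256 fingerprint below are illustrative choices, not a description of any particular product.

```python
import hashlib

def deduplicate(chunks):
    """Deduplication sketch: identical chunks of backup data are stored once;
    later occurrences are replaced by a reference to the stored chunk."""
    store = {}                                   # chunk fingerprint -> chunk data
    references = []                              # the backup as a list of fingerprints
    for chunk in chunks:
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if fingerprint not in store:
            store[fingerprint] = chunk           # new, unique data
        references.append(fingerprint)           # duplicates become references only
    return store, references

# Ten copies of the same email attachment occupy the space of one:
# store, refs = deduplicate([b"attachment"] * 10)   # len(store) == 1, len(refs) == 10
```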
Replication objectives
In this chapter, we are describing techniques that ensure high availability of data and business continuity with a focus on disaster recovery
and data loss prevention. There is a large number of solutions that help
to accomplish these objectives. Their efficiency goes hand-in-hand with
their cost. The backup solutions we have discussed so far are among the
cheaper solutions. However, even backup solutions are scalable and can
deliver high performance. As you know, all backup techniques require
some time for restoration, which will always be their limitation. For
critical applications that need to be online all the time, backup itself is
not sufficient, because recovery time from backup does not meet RPO
and RTO requirements. For this case, we have data replication. On the
following pages, we will not be dealing with database replication or replication done on the application level. We will focus mainly on replication possibilities offered by storage system manufacturers. Remember
that all the replication techniques we describe in detail happen on the
controller level, without participation of production servers. We are
talking about array-based replication.
The reason we use data replication is to achieve lower RPO and RTO
of key business processes. When data replication is employed together with server redundancy (server clusters), we can reach zero RPO and an RTO in minutes. When we add a hot standby site in another location, we can reach zero RPO and an RTO of a few seconds. The data replica contains the same data blocks as the source volume, which enables immediate usage of the data by an application with no restore procedures. Data replicas are also useful for application testing and data migration. We are going to start with in-system replication and snapshots, which can also be considered a replication technique. We will see how
replication works with server clusters. Towards the end of this chapter
we will get to solutions that can offer 99.999% availability. These solutions are based on remote replication and geocluster technology. As
the chapter progresses, the solutions presented are going to get more
technical, sophisticated and costly.
The replication strategy depends on the importance of data and its
availability. Bank institutions that offer Internet banking services or
mobile service providers need the highest availability possible because
banking transactions and call logging are ongoing processes that will
always require zero RPO and an RTO of seconds. Imagine a bank that loses
data on user accounts or a mobile service provider that cannot store
information about customer calls. Even a minute of disruption can pose
an unacceptable threat and can lead to devastating consequences. Data
backup is not powerful enough to prevent data loss on such a level. That
is why we employ more sophisticated and costly solutions: data replication and a geographically diversified IT infrastructure.
[Figure 4.5 diagram: a business application (Exchange/SQL/Oracle/DB2) writes to a primary volume, which is replicated to a secondary volume.]
Figure 4.5 The basic depiction of data replication as it is supported in storage systems. The Primary Volume (P-VOL) is a LUN mapped to a server. Data from the P-VOL is replicated to an S-VOL that can be located in the same storage system as the P-VOL or in a storage system located elsewhere. The P-VOL and S-VOL must be the same size.
[Figure 4.6 diagram: during the initial copy, all data is copied from the P-VOL to the S-VOL; during update copies, only the data flagged in the P-VOL differential bitmap is copied. In both cases the P-VOL remains available to the host for R/W I/O operations.]
Figure 4.6 When we create a replication pair, an initial copy takes place. Once the initial copy has been completed, the content of the S-VOL is synchronized with the P-VOL regularly. When the S-VOL status is PAIR, it is not possible to read from or write to the S-VOL.
The pair split operation applies all pending S-VOL updates (those issued prior to the split command and recorded in the P-VOL differential bitmap) to make the S-VOL identical to the state of the P-VOL
when the suspend command was issued and then provides full read/
write access to the split S-VOL. While the pair is split, the system establishes a bitmap for the split P-VOL and S-VOL and records all updates to
both volumes. The P-VOL remains fully accessible during the pair split
operation.
Once the P-VOL and S-VOL replication pair is split, you can access the
data on the S-VOL for testing, data mining, performance test or backup.
When you want to resynchronize the suspended pairs, you start the pair
synchronization operation. During this operation, the S-VOL becomes
unavailable for I/O operations. The regular pair synchronization procedure resynchronizes the S-VOL with the P-VOL. It is also possible to use
reverse synchronization with the copy direction from the S-VOL to the
P-VOL. When a pair synchronization is performed on a suspended (split)
154
pair, the storage system merges the S-VOL differential bitmap into the
P-VOL differential bitmap and then copies all flagged data from the PVOL to the S-VOL. When reverse pair synchronization takes place, the
process goes the other way. This ensures that the P-VOL and S-VOL are
properly resynchronized in the desired direction.
Another common operation that can be performed on a replication
pair is pair status transition. If necessary, it is possible to swap P-VOL
and S-VOL labels. S-VOL then becomes P-VOL and P-VOL becomes SVOL.
When you want to use the data replication feature, you need to create
a target LUN that has the exactly same size (measured in blocks) as the
source LUN. You should create the target LUN in a RAID group different
from the one that contains the source LUN to avoid performance bottlenecks. You can also create the target LUN on slower hard drives, depending on your requirements. Then you can start pair create operation. The
result of this operation is a replication pair. The replication pair consists of
a P-VOL (source volume) and an S-VOL (target volume). When the replication pair is active, data on S-VOL is updated from the P-VOL. It is not possible to perform read or write operations on a paired S-VOL. To be able to
use the S-VOL, you need to split the pair. To resynchronize the P-VOL and
S-VOL, issue a pair synchronization command. The synchronization can
work in both directions according to your needs.
155
vides you the contents of static LUNs without stopping the access. It
is nondisruptive and allows the primary volume of each volume pair
to remain online for all hosts for both read and write I/O operations.
When the replication pair is established and the initial copy procedure
is completed, replication operations continue unattended to provide an
internal data backup.
In-system replication automatically creates a differential management logical unit (DM-LU), which is an exclusive volume used for storing
replication management information when the storage system is powered down. For redundancy purposes, it is usually possible to configure
two DM-LUs. The data stored on the differential management logical
unit is used only during shutdown and startup of the system. Note that
the DM-LU does not store the actual changed data; it just has the metadata for the differential data.
When we split the replication pair, we stop the ongoing synchronization and we make the S-VOL accessible for other applications. The SVOL created through in-system replication can then be used for backup
purposes. This enables IT administrators to:
Execute logical backups faster and with less effort than previously
possible
Easily configure backups to execute across a SAN
Manage backups from a central location
Increase the speed of applications
Ensure data availability
To use in-system replication to produce full logical unit clones for
backup purposes makes sense because it works on the volume level.
There is no need for a backup server, and the traffic does not lower network performance. The storage system management software controls
the whole procedure. Data from the S-VOL can be used for serverless
backup to tape library. Aside from high-performance serverless backup,
156
the replicated volume can be used for data mining, data warehousing
and full volume batch cycle testing.
Data mining is a thorough analysis of structured data stored in databases. It uncovers and extracts meaningful and potentially useful
information. The data mining methodology is based on statistics functions, interpretation and generalization of mutual relationships among
the data stored in a database. Data mining can provide highly valuable
information for marketing purposes. Data mining is conducted by specialized software that is programmed with some degree of artificial intelligence to be able to deliver relevant results.
A data warehouse is a specialized database used for data reporting
and analysis. The data stored in a warehouse is uploaded from production servers and their databases. This could affect the performance of
the server and the mapped LUN. This effect can be avoided by implementing in-system replication and using an S-VOL for feeding the data
warehouse database. The data warehouse database provides powerful
search capabilities conducted by queries.
Testing and benchmarks can be conducted using an S-VOL to determine database performance and suggest changes in configuration. It is
risky to test new procedures and configuration on a P-VOL because it
can potentially lead to data loss and it can affect performance. An S-VOL
provides the ideal testing environment.
The S-VOL replica also provides hot backup for instantaneous data
recovery if the P-VOL fails. If this happens, the P-VOL and S-VOL statuses
are swapped, which makes the S-VOL immediately available to the production server.
157
Server A
Server B
Backup
server
Onsite
tape
SAN
Disk Array
Disk
Production data B
Production data A
Clone A2
Offsite
tape
Clone A2
Clone B1
Clone B2
Figure 4.7 In-system data replication in combination with backup infrastructure. In-system
replication mirrors the source volumes and creates its exact replica or clone.
Copy-on-write snapshots
Copy-on-write snapshot software is a storage-based functionality
for point-in-time, read-only backups. Point-in-time backups capture the
state of production volume at a particular time. With in-system replication, the whole LUN is mirrored or cloned. On the other hand, snapshots
store only differential data into a data pool. The data pool consists of
allocated physical storage and is used only for storing differential data.
One data pool can store differential data from several P-VOLs. A snapshot pair consists of a P-VOL and a virtual volume (V-VOL). The V-VOL
does not physically exist but represents a set of pointers that refer to
the datas physical location, partially in the pool and partially in the PVOL. Since only part of the data belonging to the V-VOL is located in the
pool (and the other part is still on P-VOL), copy-on-write snapshot software does not require twice the disk space to establish a pair in the way
in-system replication creates full volume clones. However, a host will
158
recognize the P-VOL and the V-VOL as a pair of volumes with identical
capacity. The copy-on-write snapshot internally retains a logical duplication of the primary volume data at the time of command instruction.
This software is used for restoration; it allows data to be restored data
to the time of snapshot instruction. This means that if a logical error occurs in the P-VOL, the snapshot can be used to return the P-VOL to the
state it was in at time the snapshot was taken. The duplicated volume
of the copy-on-write snapshot function consists of physical data stored
in the P-VOL and differential data stored in the data pool. Although the
capacity of the used data pool is smaller than that of the primary volume, a duplicated volume can be created logically when the instruction
is given. The data pool can share two or more primary volumes and
the differential data of two or more duplicated volumes. Copy-on-write
means that whenever the data in the P-VOL is modified, the modified
blocks are transferred to the pool to maintain the V-VOL relevance to
the point-in-ime the snapshot was created.
Read
Write
Read
Differential
Data Save
Virtual
Volume
P-VOL
POOL
Write
10:00
11:00
12:00
Figure 4.8 Copy-on-write snapshot. Notice that both the P-VOL and V-VOL are accessible for
I/O operations. Snapshots can be created instantly.
159
P-VOL
V03
Wednesday
V02
Tuesday
V01
Monday
Link
Pool
Physical VOL
Figure 4.9 Snapshots are taken regularly. In this case the snapshot was created on Wednesday. The new snapshot image will refer to the data on the P-VOL. As the data is modified on the
P-VOL, the set of pointers in V-VOL 03 will change to point to the data stored in the pool.
160
Copy-on-write snapshot
In-system replication
Instantaneous
Data recovery
time after P-VOL is
corrupted
Instantaneous
Size of physical
volume
P-VOL = S-VOL
Features
Time to create
Pair configuration
Restore
Server clusters
To create completely failure resistant IT environment with zero RPO
and nearly zero RTO, it is not enough to employ data replication and
backup. Data backup requires time for data restore. Data replication
provides instantly available data if the P-VOL becomes corrupted and
fails. Neither of these solutions ensures high availability and redundan-
161
162
External LAN
5
2
LAN 2
LAN 1
Heartbeat 1
Heartbeat 2
SAN 1
Server 1
Server 2
SAN 2
Figure 4.11 A failure resistant server cluster. Servers are connected to a storage system
through two Fibre Channel switches to meet redundancy requirements. The servers are also
interconnected by a LAN that allows communication with external users and between servers.
Notice the line between servers that is labeled as the heartbeat. The heartbeat is a clustering
software monitoring process. If Server 1 fails, Server 2 does not receive a heartbeat response
from Server 1, so it takes over the tasks Server 1 was working on.
To enable cluster functionality, you need to install a specialized clustering software layer on the servers. Servers then share the same tasks.
It is possible to implement an active-active cluster connection, which
provides load balancing. In this case, both servers are working together,
performing the tasks of one application. When we implement an activepassive cluster connection, only one server is processing the application
request and the second is in stand-by mode, ready to take over in case
the active server fails. The application status and configuration is syn-
163
chronized constantly. The clustering software allows you to set the conditions for processing takeover. Usually, the servers are communicating
their updated statuses in defined periods of time this is called the
heartbeat. If one server does not receive a response for a status update
request, it determines that the server on the other side is broken, and it
takes over immediately with no noticeable lag. This allows us to achieve
zero RPO and RTO.
The cluster protection can work within one location (organization
building). In this case, the IT infrastructure is completely resistant to
hardware failure and, to an extent, is resistant to power failure (depending on power source redundancy). It is also resistant to minor security incidents, such as theft and sabotage. To make your IT infrastructure resistant to natural disasters such as flood, fire or destruction of
the building, you need to geographically diversify your IT infrastructure.
This means that you need to replicate data to another location or create
geoclusters clusters over a WAN.
164
The public line that allows asynchronous transfer mode (ATM) represents a standard of unified network that serves both for transmission
of telecommunication services (landline) and data. It was designed to
achieve low latency and high speed. It requires a dedicated infrastructure. This technology is declining in favor of transmitting data over IP
networks.
Dark fiber is a specific technology based on dense wavelength-division multiplexing (DWDM). At your primary location, you have several
storage systems that need to replicate data over distance. Fibre Channel
cables from every storage system are plugged into a special multiplexing device that merges signals from multiple Fibre Channel cables into
one optical cable that connects both locations primary and secondary. At the secondary location, the multiplexing device transforms the
signal from one optical cable into more Fibre Channel cables connected
to each storage system. Dark fiber optical cable is capable of transmission over 100km. To enable data transmission over longer distances, it is
necessary to implement a signal amplification device a repeater. Dark
fiber technology requires an optical cable infrastructure that connects
both sites. Nowadays, it is possible to rent the dark fiber bandwidth
from a telecommunication company that owns optical wiring infrastructure. Dark fiber technology ensures high performance and reliability.
Once we have successfully interconnected storage systems at both
sites, we can perform standard replication procedures create, split
and resynchronize replication pairs. The only thing different from insystem replication is that we always have some network limitations,
because the bandwidth and reliability of local networking is always
higher than the bandwidth and reliability of solutions designed to carry
data over significant distances. To overcome these limitations, we need
to optimize remote replication. We need to consider network performance, especially latency, and decide whether we want to employ synchronous or asynchronous remote replication. Both solutions are highly
reliable, and their function will be further described.
165
DWDM/ATM/IP
any distance
Production
server
Write
acknowledge
1
Fibre
Channel
Extender
Fibre
Channel
3
Extender
S-VOL
P-VOL
te
Remote site
Primary site
Prima
Figure 4.12 Remote replication scheme. As you can see, DWDM, ATM or IP connections to a
remote site are possible.
166
167
168
The log of changes that is exchanged between sites during asynchronous replication is called a journal. Journal data is stored in a dedicated
journal volume (pool).
169
170
Rapid recovery
Support of disk-to-disk-to-tape storage for backup and activation
Single site or remote site data center support
Heterogeneous storage and server environment support
Application integration with Microsoft Exchange, Microsoft SQL
Server, Microsoft SharePoint, Oracle, etc. (dedicated agents designed to run on server)
The replication appliance is usually implemented in split write over
TCP/IP mode as depicted in Figure 4.13. Pass-through implementation
is possible but usually avoided because the replication appliance can
decrease the performance of data processing over the SAN.
Primary Data Center
Production Servers
Virtual/Physical Targets
LAN
Dynamic Replicator
Replicato
Appliance(s)
Figure 4.13 Possible implementation of replication appliance. Data is collected from servers
over the LAN and then sent to a storage system. Each server is running an agent that splits the
data.
171
Location B
LAN A
LAN B
SAN A
SAN B
WAN (FC)
FC Switch
2
1
Disk array A
2
1
FC Switch
Disk array B
Figure 4.14 Geocluster interconnection scheme. Both sites (local and remote) are equipped
with the same nodes. Data from Disk Array A is synchronously replicated to Disk Array B over
iFCP, FCIP or dark fiber technology. Both SANs are interconnected. Servers in both locations are
also interconnected, usually using TCP/IP protocol.
172
You can imagine that the configuration and installation is rather complicated. Crucial parameters that determine the overall performance of
geoclusters are again the network transfer rate and response time. Geoclustering with synchronous replication can be deployed to a remote
site located no more than 50km from the primary location.
When we talk about geoclustering, we are usually talking about
high availability clustering, not computing clustering, which would be
extremely demanding on network performance. The key words here
are cold site and hot site. A cold site is a remote site with hardware
that is powered down and consists only of the most important components. This solution is cheaper, but it takes time to resume all the
business processes if the primary site fails. For large institutions that
are highly dependent on data availability, hot site geoclustering is the
right solution. The whole IT infrastructure is mirrored. The heartbeat
between servers is checked regularly and specialized geoclustering
software is used. Complicated algorithms that are part of geoclustering software need to decide when to forward the traffic to the remote
location. In other words, it needs to evaluate whether the situation in
the primary location is critical enough to perform a swap operation.
A swap operation is usually performed by a tailored script that
switches the S-VOL status of the remote replica to the P-VOL and makes
it accessible for production servers. This script also restores database
consistency, usually by issuing a rollback command. Maintaining database consistency is of major importance and will be discussed later.
173
Synchronous
Remote Copy
Ansynchronous
Remote Copy
JNL
JNL
Figure 4.15 Three data center multi-target replication. The maximum possible data protection is ensured by using two remote sites for data replication.
174
175
176
back step by step, reversing all individual changes that have been made.
Continuous data protection works on the block level, not the file level.
This means that if we have a Microsoft Word document and you change
one letter, the backup software does not save a copy of the whole file.
It saves only the changed bytes. Traditional backup usually make copies
of entire files. The CDP technique is therefore more disk space efficient.
The downside is that continuous data protection can be expensive to
implement, and it can also heavily affect LAN performance, especially if
the data changes are too frequent and extensive. CDP is, therefore, not
very suitable for backup of large multimedia files. Small changes in databases or documents can be handled efficiently because we are talking
about relatively small amounts of data. In the case of multimedia processing, changes in data can be too large (gigabytes of data). The whole
backup procedure is then slow, and LAN performance is highly affected.
If the CDP functionality is supported by your backup software, it can
usually be implemented using your existing backup infrastructure. CDP
functionality requires the presence of agents on the production servers.
177
even discussed location redundancy issues: remote replication and geoclustering. The trick is that you need to have everything at least twice.
The last thing to take care of is a redundant basic infrastructure.
By basic infrastructure we mean especially power supply, air conditioning of server rooms and Internet (WAN) connectivity. When there is
a power outage, a storage system powers down immediately. Installed
batteries keep data in cache alive for at least 24hrs, which allows operation resumption with no data loss once we have power back on. In
some storage systems, large batteries are installed that allow flushing
the data from the cache to the designated hard drive. Once this operation is completed, the whole system powers down, including the cache.
To overcome short power outages that last for seconds or minutes,
we can install uninterrupted power supply (UPS) units. A UPS device
is equipped with extremely powerful batteries that are able to provide
electricity to servers usually for several minutes. There is a large scale
of UPS products that can be used with home computers as well as with
whole server rooms. UPS devices able to support whole server rooms
are very expensive. If a power outage occurs, the UPS provides electricity stored in batteries. UPS batteries can last for several minutes, which
is enough to overcome short power outages or cover the time necessary
for switching to an alternate source of electricity, such as a power generator. If the UPS batteries are nearly depleted and the electricity supply has
not been resumed, the UPS unit starts a controlled shutdown procedure,
safely shutting down all servers, preventing a cold/hard shutdown. Large
companies and especially data centers have redundant power sources
such as connection to two separate grid circuits or a diesel engine based
power generator.
IT equipment produces a lot of heat. That is why server rooms are
equipped with air conditioning. If there is a disruption of air conditioning, the threat of IT equipment overheating becomes imminent and all
the servers and disk arrays must be powered down to prevent hardware
damage. That is why air conditioning must also be made redundant.
Regular air conditioning maintenance is necessary as well.
178
Internet connectivity and access to public data circuits based on optical cables are another threat of operation disruption. An organization
should ensure connectivity availability by contracting at least two connectivity providers.
179
etc. These drills are very effective and help to discover weaknesses of
the documentation. Tests and drills are an essential part of business
continuity documentation, and their importance should not be underestimated.
180
Disaster
Power cut
Stolen
equipment
Fire
incident,
minor
RTO
Delayed
RTO
Immediate
RTO
Delayed
RTO
Immediate
RTO
Delayed
RTO
Immediate
RTO
Delayed
RTO
Flood
Immediate
RTO
Delayed
RTO
Immediate
RTO
Earthquake
181
Completely Duplicated/
Interconnected Hot Site
Remote Disk Mirroring
More
Disk Mirroring
Shared Disk
Disk Consolidation
Single Disk Copy
Electronic Vaulting
Tape Onsite
Tape Backup
Offsite (trucks)
Importance
of Data
More
Amount
of Data
Less
Less
Delayed
Recovery
Time
Immediate
All data is not created equal. It is likely that only a portion of data is
critical to the basic operation of a company. The key is to think through
the data protection requirements for different classes of data. Its quite
likely that in most scenarios you will have some subset of data that
would warrant remote disk mirroring. Refer to Figure 4.17 to see appropriate solutions depending on the data importance, recovery time
objective and amount of data.
Fibre Channel offers the best performance for both short and long
distances, but it has the highest cost. NAS and iSCSI are at the low end
of the cost spectrum, but they sacrifice performance for both short and
long distances. FCIP, iFCP and DWDM are solutions that perform better than NAS and iSCSI but at a higher cost. However, they are cheaper
when compared to Fibre Channel. See Figure 4.18 for reference.
182
High
FC
DWDM
NAS, iSCSI
Cost
FCIP, iFCP
Longest
Low
Distance
Less
Worst
Performance
Best
This chapter provided you with the most important aspects of business
continuity planning and solutions that enable disaster recovery. Remember that the final design of failure resistant IT infrastructure is always a
combination of the techniques mentioned, and it must reflect the needs
of the particular organization.
Virtualization of storage
systems
184
What is virtualization?
When we take a look at trends in information technology, we can observe several important facts. For example, we can agree that hardware
is getting more and more powerful. You have probably heard about
Moores law, which says that every two years the processing of computers is doubled. At the same time, the price of hardware is decreasing. It
means that, for less money, you can get more performance. This is more
or less applicable in every area of IT. Powerful enterprise level storage
systems used to be so expensive that only a few organizations could afford them. Nowadays, a midsized organization can afford to purchase
IT equipment that was too expensive for government institutions and
banks about ten years ago. As hardware gets more and more powerful
and costs less and less, we can observe another phenomenon software becomes the most expensive item on the shopping list.
In the past few years the hardware became so powerful that it was
nearly impossible to achieve good utilization of all the performance it
could provide. The traditional model is based on a one server per one
application scheme. (Refer to Figure 5.1.) If your company uses 10 applications, then you need to have 10 physical servers. Lets say that out
of these 10 applications, only three are mission critical and can use all
the performance of a server. The remaining applications are not used
so intensively and thus require only about 25% of the actual computing
power of a server. Yet all 10 servers consume power, have high cooling
requirements, occupy some space and need to be backed up or clustered. This leads to inefficiency because you have 10 servers and you
pay for their operations, but you actually use less than 50% of their
computing capacity.
Virtualization addresses this problem. If you looked up the definition of the word virtualization you would probably find that virtualization is an abstraction of hardware. This is true, but it can be hard to
imagine what actually lies behind this definition. Virtualization takes all
the hardware resources, such as processor performance, RAM size and
disk space, and creates aresource pool. This resource pool can then be
185
P
OS
AP
Ap
pli
Op
cat
era
AP
OS
OS
Vir
tu
ion
aliz
tin
gS
yst
e
Traditional Architecture
ati
on
AP
AP
P
OS
AP
P
OS
P
OS
Lay
er
Virtual Architecture
Figure 5.1 Server virtualization. The traditional architecture model requires one physical server
per operating system and application. A virtualized server is able to run several virtual machines
that all share the physical hardware.
186
187
the Internet. Usage of virtual drive eliminates the need to burn these
images on physical media before usage. CD/DVD imaging is also often
used for backup of purchased film DVDs or audio CDs.
Some operating systems also natively allow you to run another operating system. We can take Windows 7 Ultimate as an example. This
operating system allows you to run older Windows XP as a virtual machine to enhance compatibility of user application with operating system. If you have an old application that does not run in a Windows
7 environment, you can set up a virtualized Windows XP environment
in which the application runs without problems. It is also possible to
purchase specialized virtualization software, such as VMware Workstation, that allows you to run several instances of an operating system.
This can be useful for setting up a testing environment. Information
technology students, software developers or IT administrators can set
up several virtual machines while running an ordinary computer with
regular client version of operating system. There they can perform application development or testing without the interference with other
installed applications and operating systems. VMware Workstation is installed within the operating system. Thanks to encapsulation, all virtual
machines are then saved on the hard drive in the form of files that can
be copied or easily transferred to another computer. Virtual machines
are then accessed in a similar fashion as remote desktops you can
open a full screen window with an OS instance.
VMware Workstation and similar virtualization applications usually
require a CPU that natively supports virtualization. Nowadays, most
processors in personal computers offer virtualization capabilities. Hardware support of virtualization is sometimes referred to as virtualization
acceleration. Virtualization support in a CPU allows virtual machines
to effectively access hardware means through the virtualization layer
without performance overhead. This leads to very good virtual machine
performance.
In servers, virtualization is becoming a standard for several reasons.
Virtualization in servers allows you to achieve a higher level of hardware
188
Application
Application
Application
Application
Windows
Server
2008 R2
Windows
Server
2008 R2
Linux
Solaris
Virtualization Layer
Physical Hardware
CPU
RAM
Disk
LAN
HBA
Figure 5.2 The virtualization layer provides interface between physically installed components
and virtual machines. Virtual machines see unified
virtual components that
are linked to their physical counterparts.
189
190
Layers of virtualization
We can virtualize personal computers, servers, storage systems and
even networks. We can picture this as horizontal virtualization. However virtualization can also be vertical, implemented in layers. When we
have a storage system, we get the first layer of virtualization when we
create a parity group, or RAID array. When we take several parity groups
and create a storage pool, we take already virtualized logical storage
and virtualize it even further. Out of this pool we can create several virtual volumes that will be dynamically allocated the storage space; this
is yet another layer of virtualization. We usually deal with several layers
of virtualization, which are implemented to provide enhanced functionality. The host will then access the highest layer of virtualization in
this case a virtual volume without knowing about other virtualization
layers. Only virtualization software knows where the data is physically
located, because data is striped within a single parity group across several physical hard drives and then over several parity groups within one
storage pool.
Users
Virtualization
Application
Virtualization
Compute
Virtualization
191
Virtualization benefits
Lets review a summary of the virtualization benefits we have mentioned in this chapter. The most important motivation that leads us to
virtualization implementation is management simplification. Virtualization is implemented to simplify management of heterogeneous IT infrastructure. The individual benefits of virtualization are:
Migration virtual machines and LUNs can be easily transferred
from one physical device to another. In some cases this is possible
without disruption of operation. This means that we can transfer
a running virtual machine with OS and applications to another
hardware environment without impacting its operations. Users can still access the application during the migration. This is
achieved by encapsulation of a virtual machine into a single file.
Backup encapsulation also makes it easier to back up the
whole virtual machine. Virtualization software usually provides
backup functionalities.
Hardware platform independence physical servers can be of
different configurations, yet the virtualization layer ensures unified environment for virtual machines. This means that within
one virtualization platform, you can transfer virtual machines
from one physical server to another regardless of their physical
configuration. There are no issues with drivers.
Enhanced utilization virtualization allows effective use of resources. A virtual machine is allocated the exact amount of computing performance, RAM and storage space it requires, and
no more. If it requires more, other resources are provided dynamically and nondisruptively. Imagine you are running a virtual
server that is under high load. The high load is detected by the
virtualization platform and the virtual server is allocated another
processor that would support its performance.
Unification unification of infrastructure allows you to save
money. You can manage several storage systems by different ven-
192
193
194
195
VMI Provider
Virtual Machine
Management Service
Applications
Applications
VM Worker
Processes
Windows
Kernel
Device
Drivers
Virtualization
Service
Provider
(VPS)
VMBus
Virtualization
Service
Consumer
(VSC)
Windows
Kernel
VMBus
Virtualization
Service
Consumer
(VSC)
VMBus
Windows
Kernel
User Mode
Child Partition
Kernel Mode
Parent Partition
Hypervisior
Hardware
Figure 5.4 Server virtualization with Hyper-V. The hypervisor virtualization layer is thin and
optimized for direct access to hardware resources. Virtual machines in child partitions access
the virtualization layer through the VMBus interface. Device drivers for virtual machines are
loaded from the parent partition with the original instance of Windows 2008 Server. Virtual machine configuration and management are also done in the parent partition operating system.
196
197
blades with CPUs and memory installed in the chassis. Individual blades
are hot swappable. This is a powerful solution that does not require the
participation of any third party virtualization software. Logical partitioning functionality sometimes hides behind the abbreviation LPAR.
Figure 5.5 Blade servers are installed in a blade server chassis. The chassis is then placed in
a standard rack. In this picture we see Hitachi Data Systems blade servers that are an essential
part of Hitachi Content Platform. These blade servers offer logical partitioning, which is a highly
sophisticated form of server virtualization.
198
hardware requirements are usually determined by the operating system requirements to be installed on the virtual machine. The most important resources are computing power represented by the number of
processors or processor cores and the size of RAM. For small dedicated
servers, we usually need at least one virtual processor and 2GB of RAM.
With robust solutions and servers that run mission critical applications
under high load, we have up to 64 processor cores and 256GB of RAM
per virtual machine. These are rough numbers to give you a general
idea of virtual machine hardware requirements.
A standalone virtual machine would hardly be of any use with no
connection to a network. Virtual machine networking is configured from
a virtualization software management console by allocating a network
interface card (NIC) and HBA cards. Once you add an NIC, you may configure it in the traditional way using common protocols, usually meaning TCP/IP protocol for LAN networks and Fibre Chanel Protocol for SAN
networks. When we connect a virtual machine to physical network, we
say we bridge the connection. Virtualization software also provides the
functionalities of a virtual switch by enabling mutual communication
among virtual machines.
199
hundreds of client operating systems instances. The reason for this approach is to use one powerful server with high computing performance
instead of dozens of expensive and powerful client computers or laptops. Employees of the company then use thin clients, for example, less
powerful, inexpensive computers that serve only for connection to a
virtualized client operating system running on a server. This is similar
to a remote desktop connection, only the performance and stability of
this solution is much higher. Desktop virtualization also simplifies management and backup of client workstations. The example of memory
overcommitment is based on two assumptions. The first is that not all
employees are using their computers simultaneously. In other words,
there is always a certain number of virtual machines that are turned off.
The second assumption is that the client operating system is not using
the allocated memory completely. If you cleverly allocate more hardware resources than physically available, you save money and increase
utilization, because you can install more virtual machines per server.
Memory overcommitment is not limited to desktop virtualization; there
are other areas of application. The IT administrator must decide how
much memory to overcommit. If the overcommitment were too high,
running virtual machines could potentially use all the physically installed
memory, and on the attempt to use memory beyond what is physically
available, the virtual machines would crash.
Memory overcommitment is another example of a virtualization
technique. Even though it may seem illogical and potentially dangerous, it can increase utilization of hardware and save costs when skillfully
implemented. In the context of storage systems, memory overcommitment is similar to thin provisioning.
Thin provisioning
Provisioning is the process of allocating storage resources and assigning storage capacity for an application, usually in the form of server
200
disk drive space. The storage administrator has to determine the applications need for storage and allocate adequate resources on a storage
system. These resources are then mapped to a server as a logical unit
(LUN) and made available for an application. Nowadays, we differentiate between the traditional model of provisioning (also called fat provisioning) and thin provisioning.
When you deploy a new application, you need to assign it a storage
area. To avoid future service interruption, it is common to overallocate
storage by 75% or more. This is called fat provisioning. Imagine that
your application requires a storage area of 100GB. You expect that application requirements on storage will rise in future. To ensure that the
application always has enough storage, you allocate 200GB. To allocate
200GB you need to have this capacity installed in your storage system.
If you do not, you need to purchase additional hard drives. Remember
that with RAID implementation and possible employment of in-system
replication you may need about 1TB of physically installed capacity to
be able to provide 200GB of storage for this particular application. This
can be costly, especially if you have dozens of applications. To achieve
higher utilization, you can implement thin provisioning functionality.
In thin provisioning, whenever there is a new application, its entire
storage allocation is not required initially, so the application is allocated
virtual storage. This is similar to memory overcommitment in server virtualization.
Purchased,
Allocated
BUT UNUSED
Actual DATA
Fat Provisioning
Purchase
capacity
As needed
Initial
purchase
and
allocation
Purchased, Allocated
BUT UNUSED
What you
need initially
Purchased, Allocated
BUT UNUSED
Initial
purchase
Actual DATA
Thin Provisioning
Figure 5.6 Comparison of fat and thin provisioning.
201
SERVERS
HDP POOL
Parity Groups/HDDs
Figure 5.7 Thin provisioning. Parity groups are added to a thin provisioning pool. Virtual volumes are mapped to servers. Virtual volumes do not contain any actual data. Data is stored in
the storage pool. Virtual volumes contain pointers that point to the location of data in the pool.
202
203
Example: When you store 200GB in each LU, you need to purchase
1TB only. Some margin needs to be taken into account.
Data size
200GB
LU
(2TB)
LU
(2TB)
LU
(2TB)
LU
(2TB)
LU
(2TB)
Data size
200GB
Virtual
LU
(2TB)
Virtual
LU
(2TB)
Virtual
LU
(2TB)
Virtual
LU
(2TB)
Virtual
LU
(2TB)
Figure 5.8 An example of how thin provisioning can help you save the cost of buying all the
capacity in advance. Thin provisioning is sometimes called dynamic provisioning.
204
To add physical capacity at a later time, you should analyze the actual
data consumption for each thin provisioning storage pool in the past.
With this data you can plan the schedule to add physical capacity to the
storage pool. Also, when the actual data consumption increases rapidly,
you can get notification through alerts sent from the storage system.
205
mapped virtual volume, this data should be removed from the storage
pool and the space it occupied should be freed for other virtual volumes. This does not happen. When you delete data, the space in the
storage pool is not automatically freed. The file system usually does not
delete data it just marks it as empty and allows overwriting. The storage system does not know this and therefore cannot free the storage
space in the storage pool.
The file system can be tuned for performance or for effective data
management. NTFS is performance oriented. That is why it does not remove deleted data completely; it just marks it as deleted instead. NTFS
also performs write operations where the actuator arm and I/O head
are at the moment. With frequent delete and write operations, more
and more storage pool capacity is taken and almost none is freed. This
is not very effective when using thin provisioning.
On the other hand, certain file systems prefer more effective usage of
storage space. For example, the VMware file system overwrites deleted
data, which does not consume additional capacity from storage pool.
The issues connected with deleted data and cleaned storage pool
are caused by the independent nature of the relationship between
thin provisioning functionality and the file system. When we are not
using thin provisioning, the file system operations do not cause trouble. However, customers often demand thin provisioning implementation, so the choice of file system needs careful consideration. To ensure
optimized utilization of the storage pool, storage system vendors implement zero page reclaim functionality. The storage system scans the
storage pool for used data blocks that contain only zeros. These blocks
are then erased and freed automatically. The problem is that zero page
reclaim is hardware functionality, which is again independent from the
file system. A file system usually does not overwrite deleted data blocks
with zeros. To make zero page reclaim functionality work, we have to
interconnect its logic with file system logic by choosing the suitable file
system and configuring it properly.
206
Because of its advantages, thin provisioning is very often implemented. To ensure good utilization of a storage pool that provides capacity to
virtual volumes, we need to choose a suitable file system. The file system
should support effective deletion of once used data blocks and it should
perform write operations by overwriting data blocks that were marked as
empty instead of using unused capacity. Storage systems offer zero page
reclaim functionality, which helps to solve issues connected with freeing
up deleted data blocks in a storage pool. The file system must support zero
page reclaim functionality.
207
Unix
Host
Windows
Host
Mainframe
SAN
Virtual Storage Platform
Cache
Vol1
Hitachi Modular
Array
Vol1
Vol2
Vol2
Vol3
Hitachi Enterprise
Array
Vol3
Vol1
Vol2
Vol4
Vol5
EMC Array
Vol3
Windows
Host
Vol1
Vol2
Vol3
Vol4
Unix
Host
IBM Array
Vol1
Vol2
Windows
Host
HP Array
Vol3
Vol1
Vol2
Vol3
Vol4
Unused
Volumes
Mainframe
SAN
Virtual Storage Platform
Cache
Vol1
Hitachi Modular
Array
Vol2
Vol3
Vol4
Vol1
Vol2
Vol3
Vol1
Vol2
Vol1
Vol1
Vol2
Vol3
Vol3
Vol4
Vol3
Hitachi Enterprise
Array
Vol5
Vol2
EMC Array
Vol1
Vol2
Vol3
Vol4
IBM Array
HP Array
Figure 5.9 Controller based virtualization of external storage, the physical and logical view.
Hitachi Virtual Storage Platform (VSP) is an example of an enterprise level storage system that
supports virtualization of external storage.
208
209
210
211
HP
IBM
Microsoft
Windows
FC/IP
SAN
4
2
1
Virtual Storage
Platform
Physical Storage Pool
High Perf.
99.99%
Thunder 9500V
100%
General Purpose
Lightning 9900V
Backup
Archive
Thunder 9500
SATA
Figure 5.10 Logical partitioning of a storage system. In this figure we see one storage
system (Hitachi Virtual Storage Platform) with three external storage systems that create
a virtualized storage pool. USP is then virtualized to provide two logical partitions private
virtual storage machines. Hosts are then able to access and use only the resources (cache,
ports and disks) assigned to the respective partition.
212
213
214
With automated (or dynamic) tiering, the complexities and overhead of implementing data lifecycle management and optimizing use
of tiered storage are solved. Dynamic tiering automatically moves data
on fine grained pages within virtual volumes to the most appropriate
media according to workload, maximizing service levels and minimizing
the total cost of storage system ownership.
For example, a database index that is frequently read and written
will migrate to high performance flash technology, while older data that
has not been touched for a while will move to slower, cheaper media.
No elaborate decision criteria are needed; data is automatically moved
according to simple rules. One, two or three tiers of storage can be
defined and used within a single virtual volume using any of the storage media types available for the particular storage system. Tier creation is automatic based on user configuration policies, including media
type and speed, RAID level and sustained I/O level requirements. Using ongoing embedded performance monitoring and periodic analysis,
the data is moved at the sub-LUN level to the most appropriate tier.
The most active data moves to the highest tier. During the process, the
system automatically maximizes the use of storage, keeping the higher
tiers fully utilized. Automated tiering functionality is usually available
only on enterprise level storage systems.
All the benefits of Dynamic
Provisioning
Dynamic Tiering
Figure 5.11 Dynamic tiering (also automated tiering) uses dynamic provisioning technology
and thus inherits its advantages of simplified provisioning, capital savings and self-optimizing
performance maximization. OPEX is an abbreviation of operational expenses.
215
Tier 2
Least
Referenced
Pages
Automated
Tiering
Volume
Normal
Working
Set
Tier 3
Quiet
Data Set
216
Automated tiering works on the storage level, without the participation of hosts. It does not recognize files; instead it divides raw data blocks
into chunks called pages. This is often called page level tiering.
20% of data accounts for 80% of the activity this is why automated
tiering increases utilization and optimizes costs. It moves active, frequently accessed data to high performance and expensive tiers, while storing
cold data in cheaper volumes.
Automated tiering is often implemented with thin provisioning. Both
technologies use virtual volumes that draw capacity from virtualized storage space storage pools.
217
look at the difference between the two shortly. First, lets look at the
similarities.
Both in-band and out-of-band virtualization controllers add intelligence to provide functionality available only in high-end enterprise level storage systems. These functionalities include mirroring, replication
and leveraging the connectivity of the SAN. Storage and SAN vendors
argue that intelligence and storage functionality belongs in the SAN (it
should be controlled by a dedicated appliance), and storage should be a
commodity. Both in-band and out-of-band solutions have no relevance
to the controller-based storage virtualization implemented in some
enterprise storage system. These solutions are thought to have an advantage over storage controllers since they map common functionality
across heterogeneous storage systems.
The general concept of in-band is that the mapping of virtual to
physical is done in an appliance that sits in the data path between the
host application and the target device. With out-of-band, the virtual to
physical mapping is done in an appliance that sits outside the data path
and the mapping is communicated over a control path to SAN switches
(or HBAs) that sit in the data path between the host application and the
target device.
In-band virtualization refers to its location in the storage network
path, between the application host servers and the storage systems.
An appliance or blade intercepts the I/O from the host, cracks open
the data packets to examine the original content directory blocks and
remaps them to another storage address. It provides both control and
data along the same connection path.
Out-of-band virtualization describes products where the controller
is located outside of the SAN data path. The virtualization appliance
communicates over a separate network with agents installed on the
hosts, intercepting and redirecting the I/O across the SAN. This type of
solution separates control and data on different connection paths.
218
In-band
virtualization engine
SAN interconnection
Data
Server
Out-of-band
virtualization engine
Storage pool
SAN interconnection
Data
Server
Storage pool
219
Private
Hybrid
Internal
Data center agility
Public
External cloud
Trusted cloud
External cloud
Regional Data
Center 1
Hybrids
Hybrids
Regional Data
Center 2
Public Cloud
Service Provider
Data Center
Private Cloud
Hybrids
Regional Data
Center 3
Regional Data
Center 4
Adoption
Figure 5.14 Cloud solution topology.
220
221
222
Future of virtualization
Virtualization of IT infrastructure is becoming a mainstream trend.
We can encounter a certain degree of virtualization implementation in
almost every organization that uses an IT infrastructure. Its likely that
blade servers with native virtualization capabilities are going to spread
massively. The technology of Hyper-V Server is also undergoing major
development and can be expected to increase its share in the virtualization solutions market.
As for storage systems, we primarily use networked storage systems.
By networked storage systems we mean either a NAS system or a traditional block-level SAN storage system. Virtualization in storage systems
is frequently represented by thin provisioning and automated tiering.
In the future, its likely there will be an increased demand for storage
system logical partitioning as IT moves toward the cloud model of IT
services distribution. The capability to virtualize external storage is also
expected to undergo a boom as the need for consolidation increases. So
far, expensive licenses and storage systems capable of external storage
virtualization prevent large-scale implementation of this type of solution.
Market Adoption Cycles
Direct Attached
Storage
TODAY
Network
Storage
Cloud or Virtualized
Storage
Figure 5.15 Storage systems are heading towards fully virtualized solutions.
This chapter provided you with all the essential information on virtualization, its layers and areas of application. You were introduced to server
virtualization with VMware and Hyper-V software. You should know that
blade servers can be especially suitable for virtualization implementation.
Some blade servers offer advanced built-in virtualization capabilities we
call logical partitioning.
In storage systems, the prevailing virtualization capabilities are represented by thin provisioning and automated tiering. Both of these functionalities allow for greater flexibility, increased performance and improved
utilization of a storage system. They also provide simplified management
of your IT infrastructure.
High-end enterprise level storage systems can be used to virtualize external storage, which is very helpful when you need to consolidate your
heterogeneous IT infrastructure. Logical partitioning is extremly useful
when you want to ensure high performance and QoS of a critical application. Logical partitioning of a storage system is extremely useful in large
data centers that lease virtual storage machines for money. Cloud solutions can also use and benefit from this functionality.
223
224
226
Introduction to archiving
In previous chapters, you learned that there are different types of
data. Traditional storage systems, both modular and enterprise, are
very efficient when processing structured data, or data in transaction based applications that work with large databases. This efficiency
is possible thanks to large, powerful cache and high data throughput
on front end and internal busses. Storage systems can handle a large
amount of I/O per second (IOPS).
Unstructured data includes text documents, worksheets, presentations, pictures or digitized material, such as scanned copies. This data
is represented by individual files. To store this kind of data, we can implement tiering that allows low cost storage of data that isnt accessed
frequently. It is also possible to implement a NAS device and share logical volumes of a storage system as network drives, making them easily
accessible to users. To effectively organize and manage your unstructured data, you can use content management platforms, such as Microsoft SharePoint or Hitachi Content Platform. A content management
application usually runs on a server and offers extensive possibilities
of storing, organizing and distributing unstructured data. We will talk
about content management platform capabilities in detail later in this
chapter.
Apart from structured and unstructured data, we have backup data
and data that needs to be archived. Backup data is traditionally stored
using low performance hard drives or tape libraries. The purpose of
backup data is clear in the event that we lose primary data, we can
restore it from a backup copy.
Archiving data is very different from backing up data. In archiving,
we always work with fixed content. Fixed content represents data that
will not be changed anymore. We archive data to meet legal requirements (as with medical records and accounting). We also archive data
because we want to protect older information that is not needed for
everyday operations but may occasionally need to be accessed. Data
227
Fixed content
Fixed content consists of data such as digital images, email messages, video content, medical images and check images that do not
change over time. Fixed content differs from transaction based data
that changes frequently and is marked by relatively short usefulness.
Fixed content data must be kept for long periods of time, often to comply with retention periods and provisions specified by government regulations and law. Transaction data needs to be accessed very quickly.
However, access times for fixed content data are not so important.
Traditionally, fixed content data used to be stored on low cost tapes
or optical media usually with write once read many requirements,
while high performance storage systems were used mainly for transaction data. With the price of hardware going down and the application
of techniques such as automated tiering, nowadays its customary to
store fixed content data on low performance, inexpensive, online SATA
drives installed on storage systems.
Fixed content can be managed through a server running specialized
archiving software or by a dedicated appliance. In an organization with
many branches, an archiving appliance can be installed in the branch
office and feed the archiving solution located in the main data center.
228
Legal Records
Satellite Images
Biotechnology
Email
Digital Video
Medical Records
229
230
NAS Head
Storage
Tape Library
NAS
- No retention policies
- Limited protection from
user deletion of files
Archive
Optical Jukebox
RAID Array
- No protection from
user deletion (WORM)
- Limited grooming policies
(deduplication, search... )
- Limited WORM
flexibility
for deletion
- Older (expensive)
technology
Figure 6.2 Seeking an archiving solution. The storage systems we have discussed up until now
are not very suitable for fixed content storage and archiving.
231
Manufacturing biologics
Healthcare HIPAA
All hospital records in original forms
Medical records for minors
3 years
End-of-life enterprise
OSHA
Sarbanes - Oxley
Figure 6.3 Legal requirements of data retention periods. We can see that many organizations have increasing regulations, especially in the pharmaceutical industry, the food processing
industry, healthcare, financial services and auditing. The Sarbanes-Oxley Act very strictly regulates the length of retention of financial records and accounting in companies.
232
Data creation
applications
Email Server
Document
Management
General Accounting
Web Application
Completely Duplicated/
Interconnected hot-site
Remote Disk Mirroring
More
Disk Mirroring
Shared Disk
Disk Consolidation
Single DIsk Copy
Electronic Vaulting
Tape On-site
Tape Backup
Off-site (trucks)
Search
#1
- Lack of scalability
Search
#2
SMTP
CIFS
Imortance
of Data
More
Search
#3
Amount
of Data
Less
Less
Delayed
Recovery
Time
NFS
Immediate
Search
#4
HTTP
- No search across
disparate storage
system
Storage
NAS Head
Tape Library
Optical Kukebox
NAS
RAID Array
Figure 6.4 An example of a decentralized and fragmented archiving solution. Disparate storage systems do not provide a common search engine, and they are not very scalable. A digital
archive can solve this problem.
233
234
235
<M>
Policies
Blocks
Files
Email
X-rays
Photographs
Sattelite images
Microsoft Word documents
Microsoft PowerPoint
presentations
Object
<M>
Figure 6.5 A digital archive works on the object level. Each object contains fixed content data,
metadata and description of policies.
236
<M>
LUN Based
Object Based
High Speed
Object Aware
Huge Capacity
Policy Enforcement
Figure 6.6 A traditional block level storage system compared to an object level storage system. The object-level storage system consists of powerful proprietary servers and management
software. These servers are connected to a RAID array.
237
.lost+found
My Document.doc
fcfs_metadata
.directory-metadata
info
settings
My Document.doc
Metadata presented as
file system object
core-metadata.xml
created.txt
hash.txt
Authenticity established
with standard hash
algorithms
retention.txt
Retention period
managed for
each object
dpl.txt
index.txt
shred.txt
tpof.txt
Number of copies
maintained
Figure 6.7 A data object and its components in detail. This example illustrates how objects are
handled by a Hitachi Content Platform digital archive.
238
239
digital archive must repair the data and return it to its original, uncorrupted state.
Corrupted data can be repaired from an existing good copy (such
as a RAID or digital archive replica). The digital archive creates a new
copy of the data from the good copy and marks the corrupted copy for
deletion.
Protection service is implemented in order to ensure stability of the
digital archive by maintaining a specified level of archived data redundancy. This level is called the data protection level (DPL) and is set by
the digital archive administrator. Different settings can be configured
for different types of archived content. DPL configuration tells the digital archive how many copies of the archived document it should keep.
Compression service can be implemented in order to achieve better
utilization of storage space assigned to the digital archive. As we described in Chapter 4, compression detects repeating sequences of ones
and zeros, searching for repeating patterns. These patterns are then replaced by pointers to the first occurrence of the pattern. Compression
effectiveness is determined by the level of data entropy. Please refer to
Chapter 4 for more detail.
Deduplication service, which is very well known from backup solutions, can be implemented in a digital archive as well. Unlike compression, which searches for strings of repeating bits, deduplication searches for duplicate files. Compression works on the level of ones and zeros, while deduplication works on the file level. It is possible for a single document to be marked for archiving from several locations. The digital archive detects these duplicates. For two or more files to be deemed identical, their hash values must match. Deduplication can save significant amounts of storage space.
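As an illustration only (not the implementation used by any particular digital archive), the following minimal Python sketch shows the idea of hash based duplicate detection: files whose SHA-256 digests match are treated as identical, and all but one copy become candidates for replacement by a reference.

import hashlib
from pathlib import Path

def file_digest(path, chunk_size=1 << 20):
    # Return the SHA-256 digest of a file, read in chunks.
    h = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(paths):
    # Group files by content hash; groups with more than one member
    # are candidates for deduplication (keep one copy, reference the rest).
    groups = {}
    for p in paths:
        groups.setdefault(file_digest(p), []).append(p)
    return {digest: files for digest, files in groups.items() if len(files) > 1}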
Replication service represents a recommended solution for ensuring redundancy of archived data. It is usually not possible to back up a
Search is a faster way to find desired information within multiple storage resources. Indexing is a process that helps you search content much
more quickly. The indexing process creates an interpretation of the content and metadata of a file or object and stores that interpretation in an
optimized way for faster, more complete and accurate search results.
Search tools query the index for keywords, metadata and more. Results
are presented back through the search tool. A digital archive often provides indexing services that help search engines.
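To make the idea of an index concrete, here is a minimal sketch (not tied to any product) of an inverted index: each keyword points to the set of objects whose content or metadata contains it, so a query only has to intersect a few small sets instead of scanning every object.

import re
from collections import defaultdict

def build_index(documents):
    # documents maps an object ID to its extracted text and metadata.
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(doc_id)
    return index

def search(index, query):
    # Return the IDs of objects that contain every keyword in the query.
    tokens = [t.lower() for t in query.split()]
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for t in tokens[1:]:
        result &= index.get(t, set())
    return result

docs = {"obj1": "x-ray scan patient 4711", "obj2": "invoice patient 4711 paid"}
print(search(build_index(docs), "patient 4711"))   # {'obj1', 'obj2'}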
The NFS and CIFS protocols allow connection of a digital archive as a network drive on UNIX and Windows systems. The SMTP protocol allows a direct connection between an email server and the digital archive. Thanks to the HTTP and HTTPS protocols, authorized users can access the archived data through any compatible web browser.
NFS: compatibility interface, primarily for UNIX; mounts the HCP file system path (data or metadata); high protocol overhead.
CIFS: compatibility interface, primarily for Microsoft Windows; maps a network drive to the HCP file system path (data or metadata).
SMTP: archives email directly from email servers.
HTTP: the fastest protocol, with many good client libraries; supports GET, PUT and HEAD operations and can specify metadata in the URL.
Figure 6.8 In this example, a digital archive can be accessed using four independent standard protocols. WebDAV (Web-based Distributed Authoring and Versioning) is an extension to the HTTP protocol that allows remote management of files stored on a web server.
Open standards based protocols provide applications and users with a common and familiar means of interfacing with the repository. Each maintains the familiar and ingrained file system metaphor people have embraced for years and does not require changing the paradigm for how files and content are organized in storage. This also gives independent software vendors (ISVs) options for integrating based on the best approach for their particular application and its specific performance and/or functional needs. As a result, multiple applications, and potentially users, can share a common repository environment simultaneously.
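As a hedged illustration of the HTTP interface (the URL, credentials and namespace layout below are invented for the example and are not a documented API), a client can store, check and retrieve an archived object with nothing more than standard HTTP verbs:

import requests

ARCHIVE_URL = "https://archive.example.com/rest/projects/report.pdf"  # hypothetical endpoint
AUTH = ("archive_user", "secret")                                     # hypothetical credentials

# Store an object in the archive (PUT).
with open("report.pdf", "rb") as f:
    requests.put(ARCHIVE_URL, data=f, auth=AUTH, timeout=30).raise_for_status()

# Confirm the object exists and read its size without downloading it (HEAD).
head = requests.head(ARCHIVE_URL, auth=AUTH, timeout=30)
print(head.status_code, head.headers.get("Content-Length"))

# Retrieve the object again from any HTTP client (GET).
data = requests.get(ARCHIVE_URL, auth=AUTH, timeout=30).content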
Digital archives work with objects. The reason for this is to accommodate metadata and related information for each file or set of
files that constitutes an object. These objects are also ideal for future
data migration. Archived data is stored in the form of objects; the objects are then locked for compliance reasons. Individual objects are created in such a way that they are hardware platform independent. By
design, the objects must allow easy manipulation and migration to a
new hardware platform.
application. You learned earlier that archiving does not create copies
of data; it moves the original data to the archiving location. The data
taken from the parent application is moved to the digital archive and
replaced by a link reference to its location in the digital archive. Complete archiving solutions can be differentiated as file archiving, email
archiving, enterprise content management (ECM)/electronic resource
management (ERM) archiving, enterprise resource planning (ERP) archiving or health care archiving. Before choosing a particular model for
a digital archive, we need to think through the process of integration
into our existing infrastructure. The right digital archive must be capable of processing data from all important systems running on production servers.
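The move-and-link behavior described above can be pictured with a minimal sketch (assumptions: the archive is reachable as a mounted path, and a plain symbolic link stands in for the proprietary stub file a real archiving agent would leave behind):

import shutil
from pathlib import Path

def archive_file(source, archive_root):
    # Move the original data into the archive location...
    source = Path(source)
    target = Path(archive_root) / source.name
    shutil.move(str(source), str(target))
    # ...and replace it with a link reference pointing to the archived copy.
    source.symlink_to(target)
    return target

# Hypothetical usage:
# archive_file("/data/projects/closed/spec.doc", "/mnt/digital_archive")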
Figure 6.9 Virtualization of a digital archive. Digital archives are suitable for integration in
large data centers. Individual virtual digital archives can then be rented to small and midsized
organizations that wish to outsource digital archiving.
Healthcare:
All medical images are automatically replicated to meet regulatory
requirements. All treatment records are to be kept for 50 years after a
patient passes away. All billing records are to be kept for 10 years after
they are paid in full. All MRI images are automatically moved from Tier
1 storage to the archive after 30 days.
Financial Services:
Automation based on defined policies stores mortgage documents on WORM media upon completion and keeps them for the life of the mortgage plus 10 years. Content that has fulfilled all retention requirements is deleted and purged.
Policies are defined using the management console of a digital archive. Each object contains information about related policies and their
configuration in the form of metadata.
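As a rough sketch of how such policy metadata might be evaluated (the field names archived_on and retention_years are illustrative, not a real product schema), a purge job only has to compare each object's retention metadata against the current date:

from datetime import date, timedelta

def retention_expired(obj_metadata, today=None):
    # True once an object has fulfilled its retention requirement and may be purged.
    today = today or date.today()
    archived_on = obj_metadata["archived_on"]      # date the object was ingested
    years = obj_metadata["retention_years"]        # policy-defined retention period
    return today >= archived_on + timedelta(days=365 * years)

billing_record = {"archived_on": date(2010, 3, 1), "retention_years": 10}
print(retention_expired(billing_record, today=date(2021, 1, 1)))   # True, may be purged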
ing reports. We also have live data that does not fit these categories.
Every organization that conducts business needs to work on project
documentation. Every organization also needs to share information resources among employees to ensure knowledge is available throughout the company. This documentation is handled very well by content
management solutions such as Microsoft SharePoint.
SharePoint is a software based solution independent of the storage system field. It is an application that runs on a production server.
Files that are fed into SharePoint are stored in the form of a database.
This database can be stored on the internal hard drives of a server,
on a NAS device or on a mapped LUN in a traditional storage system.
The management console allows sharing of documents (mostly text documents, graphs, spreadsheets and similar files) among groups of
users that are given permission to access them. SharePoint allows both
effective content management and workflow management. Project
documentation can be modified, reviewed and versioned by all people involved in the particular project. SharePoint is a file based sharing
platform that allows comprehensive and sophisticated organization of
documents in relation to users and individual projects or tasks.
Digital archives can support integration with Microsoft SharePoint.
Specialized agents are installed on a server that runs SharePoint. Once
the particular project is finalized and closed, the live content (project
documentation) effectively becomes fixed content, which is no longer
accessed frequently. An agent identifies such files and automatically
transfers them to the digital archive.
a storage system administrator with very few duties. (In this situation,
whenever you need to change the configuration of your storage system,
you have to call the support line.) In contrast, some vendors prefer for a
storage system administrator to do all the tasks while the implementer
performs only the equipment installation. As a result, we will discuss
only the most common storage administrator tasks. At the end of the
chapter, we will also discuss the most common storage system implementer tasks.
RAID group, you need to choose physical hard drives that will be linked
together in a RAID array, and you need to select the desired RAID level.
Depending on your requirements, you will probably choose a RAID-5
level with a distributed parity that protects against one hard drive failure. If higher data protection is desired, you may choose a RAID-6 level,
which protects against two hard drive failures. If the highest possible
performance is required, you may even choose RAID-1+0, which implements data striping and disk mirroring. There is no parity in this RAID
level. Therefore, there is no need for data reconstruction if one disk
fails, since you always have a fully operational mirror disk. To balance
performance and cost, you would probably go with RAID-5 or RAID-6.
Once you have created RAID groups, you need to decide your LUN
allocation policy. In particular, you have to decide whether you prefer fat
provisioning or thin provisioning. In the first case, you create a LUN and
map it to the server. For thin provisioning, you need to set up a storage
pool, create virtual volumes and map the volumes to servers.
As a storage administrator, you need to consider which front end
ports are used for mapping LUNs. To balance the load, the number of
LUNs linked to servers should be even for each port. If you have eight
LUNs and four ports, you should configure LUN mapping so that there
are two LUNs per port. This important step allows you to achieve high
performance, and it prevents bottlenecks that might have occurred if
you had linked all eight LUNs through only one or two ports.
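A simple way to picture this balancing step is a round-robin assignment of LUNs to front end ports, as in the following minimal sketch (the LUN and port names are placeholders, and real mappings would also weigh the expected load per LUN):

def balance_luns(luns, ports):
    # Assign LUNs to front end ports in round-robin order so the load is
    # spread evenly: eight LUNs over four ports gives two LUNs per port.
    mapping = {port: [] for port in ports}
    for i, lun in enumerate(luns):
        mapping[ports[i % len(ports)]].append(lun)
    return mapping

print(balance_luns([f"LUN{n}" for n in range(8)],
                   ["port1", "port2", "port3", "port4"]))
# {'port1': ['LUN0', 'LUN4'], 'port2': ['LUN1', 'LUN5'], ...}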
In the case of fat provisioning, you should know how to enlarge an existing LUN. There are various tools to do so. First, the changes in configuration must be made in the storage system's management console, where the existing volume is expanded. To complete this task, you also need to extend the file system so that it spreads over the enlarged volume.
Imagine you have created a 10GB volume and mapped it to a server, which formatted the volume with a file system. When you enlarge the LUN on the storage system level, the LUN will have, let's say, 20GB, but the file system is still capable of accessing only the originally allocated 10GB. To overcome this issue, you need to perform a file system extension operation. At the end of the procedure, you have a 20GB LUN formatted with a file system that is able to use all 20GB. In thin provisioning, when you get an alert that signifies the virtual volume and storage
pool are running out of capacity, you need to contact your supplier for
installation of additional physical hard drives.
Storage administrator tasks connected with data replication are
often limited. Usually, the supplier creates and configures replication
pairs, and you can make adjustments, such as changing the frequency with which snapshots are created. The default supplier setting (according
to organization requirements) is that snapshots are created every 4hrs.
If this becomes insufficient, you can change this to 2hrs. You can also
perform pairsplit operations with in-system replication to access a data replica located on an S-VOL for the purposes of data mining or
data warehousing. However, you cannot usually create new pairs or set
up and configure remote replication pairs.
Cache optimization is one of the most difficult tasks you will face as
a storage administrator. Cache greatly influences the performance of
the whole storage system. We know that hard drives, even with implemented data striping, are still the slowest component of the storage
system. Cache is designed to compensate for the performance differential between hard drives and solid state components. Proper configuration is, therefore, essential. You can adjust certain parameters of cache
to enhance I/O performance, which is the key for transaction based applications that use large databases. According to the database type, you
need to choose the right block size the cache is handling. If a database
uses large chunks of data (512K), the cache setting must correspond.
When a database works with small chunks of data (16K), you need to
adjust the cache settings as well to achieve the best performance and
utilization. It is also possible to prioritize certain hosts. I/O requests
coming from such hosts are processed with higher priority. Additionally,
you can divide cache into logical units when you virtualize the whole
storage system (one storage system is divided into several logical storage systems), or when you need to support volumes under high load
and the character of the load requires large cache. It is, therefore, possible to assign a certain portion of cache to a particular volume.
You can export the configuration of the whole storage system into
a single file. You should export regularly and back up the configuration
files. If the configuration is accidentally altered, it is your task as a storage administrator to return the configuration to its working state, which
is easy to do using the backed up configuration files.
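The export itself is done from the vendor's management console, but keeping the exported files versioned is easy to script. A minimal, generic sketch (paths and retention count are arbitrary examples) might look like this:

import shutil
from datetime import datetime
from pathlib import Path

def keep_config_copy(exported_file, backup_dir, keep=30):
    # Copy an exported configuration file into a backup directory under a
    # timestamped name and prune the oldest copies beyond the retention count.
    src = Path(exported_file)
    dst_dir = Path(backup_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dst = dst_dir / f"{src.stem}-{stamp}{src.suffix}"
    shutil.copy2(src, dst)
    for old in sorted(dst_dir.glob(f"{src.stem}-*{src.suffix}"))[:-keep]:
        old.unlink()
    return dst

# Hypothetical usage: keep_config_copy("array_config.xml", "/backup/array-configs")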
Figure 7.2 In addition to IOPS on ports and CPU load, the storage administrator also monitors effective utilization of installed capacity. Here you see an example of RAID group load and performance monitoring.
(Figure labels: checking, operating, action and alarm setting stages; to create an alarm definition, Performance Reporter provides an Alarm Settings GUI, including the capability to import and export alarm definitions.)
Figure 7.3 Automated monitoring functionality. You can define situations that trigger an action once they occur. Whenever performance and hardware health values reach the defined status, an alarm (alert, warning) is sent to the storage system administrator. The storage system administrator assesses the situation and ensures the problem is eliminated.
Figure 7.4 Command line interface. An example of a Hitachi Dynamic Link Manager configuration. This software takes care of path management.
Security management
Part of the storage administrator's job is to make sure the data is stored securely. A SAN environment provides inherent security features, mainly because of its isolation from LANs, which require more complex security management. LUN mapping itself can be seen as a security feature because it links a LUN to a designated HBA (according to its WWN)
through a selected front end port. Other servers cannot see this LUN.
Zoning implemented on Fibre Channel switches is another common
method of security enhancement. Zoning is described in detail in Chapter 3. It is also possible to use data encryption on the disk array, but this
can affect its performance. In the case of NAS devices connected directly
the customer with the new device. The storage implementer is authorized to conduct all hardware replacement and upgrade procedures. If
a disk or other component fails, it is the implementer who can replace
the malfunctioning part. The implementer also monitors the storage
system remotely and, in cooperation with a storage system administrator, helps with performance tuning and advanced operations. Microcode updates are also the responsibility of a storage implementer.
This chapter provided you with a more practical insight into the duties
and responsibilities of a storage system administrator. Remember that the
storage system administrator is responsible for the whole SAN environment and should be able to configure Fibre Channel switches and server
HBAs, as well as perform basic storage system operations. By basic operations we particularly mean the creation and configuration of volumes
and their management. A storage system administrator also monitors the
operation of a storage system to detect possible malfunctions or performance bottlenecks. The storage system administrator is the person responsible for data management, design and infrastructure upgrades that
ensure data storage.
Performance tuning, optimization and storage consolidation
(Figure labels: CPU time, queue time and I/O time, shown as 25%, 15% and 60% of the transaction.)
Figure 8.1 Typical online transaction processing. A single transaction requires a certain portion of CPU time. Then it queues and is processed further. At the end of the whole process, the transaction is concluded. In other words, the requested data or acknowledgment of a successful write operation is sent back to the host. Notice that the average time for processing a single transaction is 30ms. This does not include the time necessary for delivering data to its destination over the network.
When the I/O response time takes longer than it should, the storage administrator must determine where there is a problem. The simplest performance tuning is based on an even distribution of load over
storage systems and their components. If one controller is overloaded,
then the first option is to reassign the problematic volume to a different controller or to enable load balancing. The same applies to front
end port traffic. When one port becomes overutilized, redirect the data
traffic over a different port.
Storage system vendors offer software tools for performance tuning. These tools are designed to cooperate with the most commonly deployed applications and databases that run on the production servers.
These tuning managers can analyze load characteristics and suggest
changes in configuration. Modern storage systems allow nondisruptive
performance tuning. When we change the storage system configuration, it usually does not happen quickly. For example, when we change
the RAID level type, enable load balancing or change LUN characteristics, it takes time to implement these changes (especially because all
stored data needs to be moved to a new location). Nondisruptive performance tuning is, therefore, a very important feature.
However helpful the tuning managers are, their area of application
is usually limited to storage systems. The bottlenecks may occur on a
SAN as well. The tuning managers provide only monitoring and are able
to recommend changes in configuration. But in the end, it is always
the storage system administrator who must decide which changes to
implement and how. To complete this task, the storage system administrator must have the tools to measure the most important characteristics and the knowledge to use them. The administrator (or the storage
system implementer, depending on vendor policies) must have tools
for implementing a new, improved storage system configuration. The
administrator must also be able to write scripts that can provide automation of the most common performance tuning tasks.
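As one illustration of such a script (the CSV layout, column names and the 700MB/sec limit are assumptions made up for the example, not values from any vendor tool), the following sketch reads collected throughput samples and flags front end ports whose average load exceeds a chosen limit:

import csv
from collections import defaultdict

def flag_busy_ports(samples_csv, limit_mb_s=700.0):
    # Read throughput samples (columns: port, mb_per_sec) exported by a
    # monitoring tool and report ports whose average exceeds the limit.
    totals, counts = defaultdict(float), defaultdict(int)
    with open(samples_csv, newline="") as f:
        for row in csv.DictReader(f):
            totals[row["port"]] += float(row["mb_per_sec"])
            counts[row["port"]] += 1
    return {port: totals[port] / counts[port]
            for port in totals
            if totals[port] / counts[port] > limit_mb_s}

# Hypothetical usage: print(flag_busy_ports("port_throughput.csv"))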
(Figure labels: host, switches and interswitch links, and the storage system's front end, cache and back end as measurement points.)
Figure 8.2 This picture depicts all the locations where we can measure performance. Continuous monitoring of these locations provides us with data necessary for performance analysis.
(Figure labels: application, host bus adapter, edge switch, core director switch, storage system port, cache, storage system back end, storage system groups and external storage system.)
Figure 8.3 The I/O journey starts and ends at the application. In the host we can measure
the response time, including the sum of time required to process an I/O transaction in each
device located in the data path. The response time measured in the host includes network time
and storage system I/O transaction response time. This measurement can show you there is a
problem with response time, but it does not tell you whether the problem is in the network or
the storage system.
In a SAN we focus mainly on Fibre Channel switches. We can measure throughput on individual ports (MB/sec), buffer credits, cyclic redundancy check (CRC) errors and the operations of interswitch links.
The throughput measured on a particular Fibre Channel switch port
provides us with the amount of data transferred from all hosts connected to this port. Remember that we connect all our hosts to Fibre
Channel switches. These switches are also connected to our storage
systems. Depending on your zoning and infrastructure architecture, the
port linked to a storage system has to service several ports connected
to the host. One Fibre Channel switch port can therefore aggregate
traffic coming from several hosts.
As an example, imagine a situation where three hosts are linked
to one front end port on a storage system. The throughput measured
on the switch should correspond with the throughput measured in
the HBA and the storage system's front end port. The Fibre Channel
switches need to be configured so that they are able to handle the
throughput. There need to be reserves in the available bandwidth to
accommodate a peak load without delays or disruptions.
Buffer credit flow control is Fibre Channel switch functionality. Buffer credits represent the number of data frames a port can store. This
is particularly useful for long distance Fibre Channel connections (for
example, a Fibre Channel offsite connection for synchronous remote
replication), where the response time is longer because of limitations determined by the laws of physics (the signal cannot travel faster than the speed of light). If the network is under a
high load and its bandwidth becomes insufficient, ports can store several data frames that then wait for processing. If a port cannot store
more data frames, the host stops sending I/O requests until space is
freed for more data frames. Buffer credit values can show us whether
our network has any bottlenecks.
CRC and other error reporting mechanisms operate on the switch
level. Their function is to ensure consistency of transmitted data. The
switch communicates with the HBA and front end ports in a storage
system and detects any possible loss of data frames. Data frame loss
can occur mainly if there is a problem with cabling and connectors.
If a data frame is lost during the transport, the host must repeat the
I/O request, which causes delays. This type of error points to problems
with the physical networking infrastructure. All cables and connectors
should be checked.
Note that CRC errors are well known in CD and DVD media, as well as
hard drives. In CD and DVD media, it usually means the surface of the disk
is scratched and the data cannot be reconstructed, which results in missing blocks of data and inconsistency. In a hard drive, the CRC error occurs
when the data is stored on a bad sector. This bad sector is then marked as
damaged and no longer used for data storage.
An interswitch link is the connection between individual Fibre Channel switches. The status report shows us whether there are any problems with these connections.
In storage systems we have a number of possibilities regarding what
and where to measure:
Cache write pending rate
- This metric shows us when cache is not able to handle all I/O requests efficiently.
- If it is too high, you must reconfigure cache or upgrade cache capacity.
- Problems are likely to occur when we have insufficient cache capacity for the installed or virtualized capacity (external storage).
Response time on front end ports
Throughput on front end ports
Usage of parity groups
Back end FC-AL or SAS operations
CPU load and the load of the SAS controller that calculates parity data
All these values will be described later in this chapter, together with examples of best practices. In Figure 8.4 you can see an overview of the performance data we can collect. We have mentioned the most important ones, but there are many more. This should give you an idea of how complex performance tuning can be.
(Figure labels, grouped as in the figure: storage systems (by processor, by LDEV, cache utilization, by disk parity group, by database instance, tablespace, index and more), SAN switches, servers (server capacity/utilization/performance; I/O performance: total MB/sec, queue lengths, read/write IOPS, I/O wait time; file system space allocated, used and available; reads/writes and queue lengths; device file performance and capacity) and applications.)
Figure 8.4 An overview of performance data we can collect. Remember that, for performance analysis and evaluation, you need to collect data over time. Current values are of little use. The measurements should be performed regularly, and the data for performance analysis should include weeks or months of measurement in order to get a precise and helpful picture of the utilization of individual resources.
When you collect all this data, you need to perform analysis. The analysis helps you investigate the tuning possibilities and suggest configuration changes. It should also help you formulate your long term performance strategy. Reporting tools that operate on all levels of your infrastructure (hosts, Fibre Channel switches and storage systems) provide you with output data. To interpret this data meaningfully, you must relate a value from one source to values from the other sources. If you have an I/O response time value from a host, you need to interpret it in relation to the I/O response time measured in Fibre Channel switches and storage systems. To do this you need to analyze the mutual relationship of the values and their meaning, either manually or using specialized software. See Figure 8.5 for reference.
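A toy sketch of that correlation step (the input numbers and the 10ms target are invented for the example) shows the principle: subtract the storage system's own response time from the host-measured response time and see where the larger share of the delay sits.

def locate_bottleneck(host_ms, storage_ms, target_ms=10.0):
    # host_ms comes from host-level monitoring, storage_ms from the storage
    # system; the remainder is time spent in the SAN and in queuing outside
    # the storage system.
    network_ms = max(host_ms - storage_ms, 0.0)
    if host_ms <= target_ms:
        return "response time within target"
    if network_ms > storage_ms:
        return "investigate the SAN (switches, ISLs, cabling)"
    return "investigate the storage system (cache, parity groups, ports)"

print(locate_bottleneck(host_ms=28.0, storage_ms=6.0))    # points at the SAN
print(locate_bottleneck(host_ms=28.0, storage_ms=22.0))   # points at the storage system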
(Figure labels: in the manual approach, separate server, switch and storage tools produce separate reports that are combined in a spreadsheet; a tuning manager instead covers the server, application, SAN, storage and external storage together. Server metrics include Oracle, SQL Server and DB instances, tablespaces, file systems, CPU utilization, memory, paging, swapping, file system performance, capacity and utilization, and VM guest correlation; switch metrics cover the whole fabric, each switch and each port in MB/sec, frames/sec and buffer credits; storage metrics cover ports, LDEVs, parity groups, cache utilization, and performance in IOPS, MB/sec and utilization.)
Figure 8.5 You can analyze collected values and their relationships manually or by using a specialized software tool.
read request much faster when the desired data is stored in cache than
if this data has already been destaged to hard drives. You can imagine this as the highest tier: when frequently used data stays in cache, it
can be accessed much faster than if it was stored on a hard disk or even
a solid state disk. Note that by frequently accessed data we mean data
that is accessed several times a minute. It is therefore a very small portion of data compared to the total capacity of a storage system.
When it comes to cache configuration, another thing to remember
is that sequential workloads use smaller amounts of cache memory as
compared to random workloads. The key to success, therefore, is in
understanding how an application utilizes cache. Applications have different cache utilization characteristics.
Impact of cache size on performance
A positive impact will be seen with a larger cache, especially on workloads that are predominantly random read/write activity. The larger the
cache is, the longer a frame will stay in cache, thus increasing the chance
of a read hit on the same frame. Sequential workloads use small amounts
of cache memory and free cache segments as soon as the job is completed. As a result, cache sizing generally has little to no impact on sequential
workloads.
The size of cache you should use, therefore, largely depends on how
your application works. However, there are some basic rules for cache
sizing. When you upgrade your infrastructure with a new storage system, you should use at least as much cache as the storage system being replaced. Increase cache proportionally to any increase in storage
capacity. If you do not have any current performance data to start with,
configure at least 1.5GB to 2GB of cache per TB of installed capacity.
Start with the current hit rate if you know it. Remember the guideline
that doubling cache size reduces the miss rate by half. For example, if
your read hit rate is 80% with 16GB of cache, then doubling cache size
to 32GB should improve hit rate to 90%.
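These rules of thumb are easy to turn into quick estimates, as in the following sketch (purely an arithmetic illustration of the guidelines above, not a sizing tool):

def cache_for_capacity(installed_tb, gb_per_tb=2.0):
    # Guideline: roughly 1.5GB to 2GB of cache per TB of installed capacity
    # when no performance data is available.
    return installed_tb * gb_per_tb

def hit_rate_after_doubling(current_hit_rate, doublings=1):
    # Guideline: each doubling of cache size halves the miss rate,
    # so an 80% hit rate becomes 90% after one doubling.
    miss = 1.0 - current_hit_rate
    return 1.0 - miss / (2 ** doublings)

print(cache_for_capacity(100))          # 200.0GB of cache for 100TB installed
print(hit_rate_after_doubling(0.80))    # 0.9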
The last thing you need to consider when configuring cache is what
portion of cache will be used either for read or read/write operations.
The correct settings of cache proportions help to achieve higher performance. If these settings are not correct, the write and read operations
may perform more slowly than desired.
In addition to ports and cache, we can optimize LUN response time
and parity group performance. The LUN response time characteristics
and optimization are practically the same as in midrange storage systems and have been discussed earlier in this chapter. Parity group performance optimization is, again, similar to the midrange storage systems, so let's just look at a few of the most important tips (a short classification sketch follows the list below):
Treat every sequential I/O request as four random I/O requests
for sizing purposes.
Do not allocate more than four hot LUNs per parity group. The
LUN is characterized as hot when its utilization reaches 70% or
more.
Do not allocate more than eight medium hot LUNs per parity
group. The LUN is characterized as medium hot when its utilization ranges between 50% and 70%.
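The sketch below simply restates the last two tips as code (the utilization figures passed in are invented for the example): it classifies LUNs by utilization and warns when a parity group exceeds the suggested limits.

def classify_lun(utilization):
    # Hot at 70% utilization or more, medium hot between 50% and 70%.
    if utilization >= 0.70:
        return "hot"
    if utilization >= 0.50:
        return "medium hot"
    return "cool"

def check_parity_group(lun_utilizations):
    # Warn when a parity group holds more than four hot LUNs
    # or more than eight medium hot LUNs.
    counts = {"hot": 0, "medium hot": 0, "cool": 0}
    for u in lun_utilizations:
        counts[classify_lun(u)] += 1
    warnings = []
    if counts["hot"] > 4:
        warnings.append("more than four hot LUNs in this parity group")
    if counts["medium hot"] > 8:
        warnings.append("more than eight medium hot LUNs in this parity group")
    return warnings

print(check_parity_group([0.9, 0.85, 0.75, 0.72, 0.71, 0.4]))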
I/O rate (I/Os per second): normal value N/A, bad value N/A.
Read rate (I/Os per second): normal value N/A, bad value N/A.
Write rate (I/Os per second): normal value N/A, bad value N/A.
Read block size (bytes transferred per I/O operation): normal value 4096, bad value N/A.
Write block size (bytes transferred per I/O operation): normal value 4096, bad value N/A.
Read response time (time required to complete a read I/O, in milliseconds): normal value 10 to 20 (DB) or 2 to 5 (log), bad value >20 (DB) or >5 (log).
Write response time (time required to complete a write I/O, in milliseconds): normal value 10 to 20 (DB) or 5 to 10 (log), bad value >20 (DB) or >10 (log).
Average queue length (average number of disk requests queued for execution on one specific LUN): normal value 1 to 8, where 8 is the maximum queue depth value, bad value >8.
Read hit ratio (% of read I/Os satisfied from cache): normal value 25% to 50%, bad value <25%.
Write hit ratio (% of write I/Os satisfied from cache): bad value <100%.
Average write pending (% of the cache used for write pending): normal value 1% to 35%, bad value >35%.
Figure 8.6 Overview of metrics relevant to performance tuning and recommended normal and bad values.
Microsoft Exchange
IBM DB2
This software correlates and analyzes storage resources with servers
and applications to improve overall system performance. It continuously monitors comprehensive storage performance metrics to reduce delay or downtime caused by performance issues. It facilitates root cause
analysis to enable administrators to efficiently identify and isolate performance bottlenecks. It allows users to configure alerts for early notification when performance or capacity thresholds have been exceeded.
In addition, it forecasts future storage capacity and performance requirements to minimize unnecessary infrastructure purchases.
Among other features of advanced tuning management software, we can find tools that provide in-depth performance statistics of storage systems and of all network resources on the application's data path. Such software works together with storage functionalities such as thin provisioning or automated tiering pools for usage analysis and optimization.
This software provides customizable storage performance reports
and alerts for different audiences and reporting needs.
When we implement such software, we can benefit from:
Simplification of performance reporting and management of the
storage environment
Improvement of service quality through accurate performance
reporting
Increased application availability through rapid problem identification and isolation
Reduction of storage costs through proper forecasting and planning of required storage resources
Automation of performance tuning is usually available only on enterprise level storage systems. Prerequisites for such automation are
implementation of thin provisioning and automated tiering. Some storage systems have utilities for performance monitoring and tuning built into the microcode.
Good tuning management software focuses not only on storage
systems but also on the SAN and hosts. It can happen that we look
for problems in the storage systems when the problems are actually
caused by incorrect application settings.
Consolidation project
When we realize that our company is spending too much money on
IT infrastructure operations, we should take action. The consolidation
starts with a consolidation project, which can be divided into four stages. To illustrate how a consolidation project looks and what it should
include, we will use consolidation of servers and storage systems as an
example situation. The stages of a consolidation project are:
Analysis of the current IT infrastructure state
Development of consolidation plan
Migration to new consolidated platform
Optimization of the new solution
The first step is to analyze the current IT infrastructure state. To do
so, you need to write down all your hardware and software resources.
Among hardware resources there are servers, networking components,
storage systems and workstations. Software resources include operating systems and applications. When you complete this list, you need to
focus on how all these resources work together and what performance
they deliver. You need extensive, detailed information on how your
hardware resources are implemented, interconnected and secured. As
for the software resources, you need to know the versions of the operating systems and applications being used, a history of updates
and patches, and a detailed licensing analysis. You could use performance reports together with the performance monitoring history. Bills
for electricity and personnel training may come in handy as well.
Essentially, we can say that the first stage of consolidation, the analysis stage, focuses on three levels: technical, financial and performance
statistics. You can see that there are plenty of factors that need to be
considered. Because of the consolidation project's complexity, it is not
something that can be completed overnight. It can take up to several
months and the efforts of both IT management and IT administrators.
In the second stage of IT infrastructure consolidation, we take the
output of the analysis we made in the first stage and suggest a solution. There are many strategies on how to do this, but the key is to have
comprehensive knowledge of the products available on the market
that could help in consolidation. We also need to calculate operating
expense savings to prove the profitability of the new solution. Nowadays, most consolidation projects include migration to blade servers
and virtualization. Virtualization in servers is usually much simpler and
cheaper to implement than virtualization of storage systems.
The third stage is the migration to a new consolidated platform. To
perform a seamless migration, we should stick to the well known Information Technology Infrastructure Library (ITIL) methodology. Migration can be physical to virtual (P2V). Virtual to virtual (V2V) migration is
an option when we are talking about consolidation in an environment
that has already been virtualized.
Once we have migrated to the new solution, the last stage takes
place. We perform regular monitoring to determine whether the expected results of consolidation have been reached.
In this chapter, you learned performance tuning basics. You are now
familiar with basic performance metrics. You know where in your IT equipment you can make performance measurements, and you are able to interpret the output of these measurements. We have also discussed what
software tools you can use for performance tuning and what these tools
usually offer. Remember that the values presented in this chapter represent general information. Recommended performance values for your
storage system can differ from what is mentioned here. Please refer to
the documentation that was supplied with your storage system for more
detail.
Business challenges
What are you going to learn in this chapter?
How to see a storage systems infrastructure from the
perspective of IT managers
What business challenges companies face
How to accommodate accelerating data growth
How to describe advanced data classification and tiered
storage
Environmental concerns connected with data center
operations
The importance of TCO, CAPEX and OPEX calculations
9 Business challenges
Business challenges
Previous chapters of this book dealt mostly with the technical aspects of storage systems and SANs. You also gained insight into storage
system administration and maintenance. In this final chapter, we will
provide you with yet another perspective on storage systems. We will
show you what challenges companies face today and what solutions
help IT managers meet these challenges. This chapter will therefore
be particularly useful to IT managers. The data provided in this chapter
originated from complex research conducted by Hitachi Data Systems
on its customers. You will learn what the most painful issues are and
the options we have to overcome them. You will also be introduced to
some functionalities of storage systems that are designed to be environmentally friendly.
At the beginning, it is necessary to realize that different companies
have different needs and they encounter different challenges. Your organization may face extremely fast data growth and need more and
more capacity to accommodate such growth. Another possibility is that
your organization does not suffer from a lack of storage capacity, but instead has serious problems with high availability; this happens when
your data becomes so important that you cannot afford any disruptions. You may also have problems with a lack of physical space to place
new racks, insufficient cooling systems unable to service newly integrated devices and other issues. As a result, we will need some clear
classification of business challenges to avoid getting lost in the myriad
of possibilities. We can classify the challenges as follows:
Accelerating storage growth: we need more and more capacity
Increasing requirements on high availability: we cannot afford any disruptions because our data has become too important for our business
Fast and effective response to business growth: we need to be able to react quickly to new conditions
(Figure axes: capacity in GB, from 0 to 250, plotted over the years 2008 to 2013.)
Figure 9.1 An overview of the storage requirements in past years provides you with the necessary information to forecast data growth.
The demand for storage capacity can be linear. If your storage infrastructure has not been consolidated yet, it can happen that the amount
of stored data drops once you centralize and consolidate your storage.
During consolidation, you also classify data and implement tiering. A
portion of your data can be archived to tapes, freeing space for live
data. Linear data growth is represented by a straight line in a graph. If
the growth is linear, you can forecast growth easily.
However, more often, the demand for storage capacity is geometrical, meaning the amount of data coming into existence is growing exponentially. Exponential growth is often caused by an increase of unstructured data, large quantities of metadata or an increased demand
for in-system replication, where S-VOLs are used by data mining or
warehousing applications.
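As a back-of-the-envelope illustration (the capacity history below is made up), a forecast can extrapolate either a straight line or a constant yearly growth factor from past measurements:

def forecast(history, years_ahead=3, exponential=False):
    # history holds yearly capacity figures (for example in GB).
    n = len(history)
    if exponential:
        # Constant yearly growth factor fitted to the first and last values.
        factor = (history[-1] / history[0]) ** (1 / (n - 1))
        return [history[-1] * factor ** k for k in range(1, years_ahead + 1)]
    # Straight line fitted to the first and last values.
    slope = (history[-1] - history[0]) / (n - 1)
    return [history[-1] + slope * k for k in range(1, years_ahead + 1)]

capacity_gb = [50_000, 75_000, 110_000, 160_000, 235_000]   # illustrative history
print(forecast(capacity_gb))                     # linear extrapolation
print(forecast(capacity_gb, exponential=True))   # exponential extrapolation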
(Figure axes: capacity in GB, from 0 to 250,000, over the years 2011 to 2013, broken down into Tier 1, Tier 2 and Tier 3.)
Figure 9.2 Data growth forecast in relation to performance tiers.
(Figure axes: capacity in GB, from 0 to 250,000, over the years 2005 to 2011, broken down into replicated data, unstructured data and traditional structured data.)
Figure 9.3 Structured, unstructured and replicated data growth.
(Figure labels: tiers 1 to 3 arranged along performance, availability and security axes, ranging from expensive to inexpensive.)
Figure 9.4 Tiered storage infrastructure.
Implementation of a tiered storage infrastructure is a very powerful solution, which can offer many benefits, especially in terms of TCO.
Implementation of a tiered storage infrastructure would, however, be
useless without careful classification of data. This is the task of IT managers who are responsible for IT infrastructure design, development
and maintenance, including a SAN. Data classification is primarily a
manual process that cannot be fully automated. It is the IT manager
who has to discover and evaluate information assets and determine
whether or not this information has worth for your business. Data classification is, therefore, not only based on frequency of access, but also
on data value. For reference, visit the Storage Networking Industry Association (SNIA) Data Management Forum at www.snia.org. SNIA defines information classes in the following categories:
Mission-critical
Business vital
Business important
Important productive
Not important
Discard
At this point, it is good to realize that we also need to determine
what kind of data can be deleted. Companies and their employees are
reluctant to erase data, but there should be certain policies governing
data shredding as well. These should be included in the information
lifecycle planning. To assign data to one of the above mentioned categories, we can use various metrics:
Usage pattern of the information: frequent, regular, periodic, occasional, rare, on demand or request, or never
Information availability requirement: immediate, reasonable, defined time frame, extended time frames, limited, not defined or unnecessary
Financial impact of information unavailability: significant long-term and/or short-term, potential long-term, possible, unlikely or none
Operational impact of information unavailability: significant and immediate, significant over time, probable over time, possible over time, doubtful or none
Compliance impact of information unavailability: definite and significant, eventual, probable, potential or possible, or none expected
(Figure labels: Tier 1 high performance for mission critical applications such as SCM and ERP; Tier 1 good performance for business critical applications such as data mining, CAD, data warehouse and CRM; NAS, video, online and tape for nearline or offline storage. Virtualized systems shown include the Hitachi Universal Storage Platform, Hitachi Virtual Storage Platform, IBM Enterprise Storage Server, EMC DMX2000, CLARiiON, and Hitachi Adaptable Modular Storage 2100 and 2300.)
Figure 9.5 Overview of a virtualized multitiered storage infrastructure with implemented data classification.
(Figure labels: throughput versus scalability, with data-in-place upgrades between the models; enterprise design features include host multipathing, hardware load balancing, dynamic optimized performance, dense storage with SAS and SATA, and 99.999+% reliability. Adaptable Modular Storage 2100: 4-8GB cache; 4 FC, 8 FC, or 4 FC and 4 iSCSI ports; a mix of up to 159 SSD, SAS and SATA-II disks; up to 313TB, 2048 LUNs and 1024 hosts. Adaptable Modular Storage 2300: 8-16GB cache; 8 FC, 16 FC, or 8 FC and 4 iSCSI ports; a mix of up to 240 SSD, SAS and SATA-II disks; up to 472TB, 4096 LUNs and 2048 hosts. Adaptable Modular Storage 2500: 16-32GB cache; 16 FC, 8 iSCSI, or 8 FC and 4 iSCSI ports; a mix of up to 480 SSD, SAS and SATA-II disks; up to 492TB, 4096 LUNs and 2048 hosts.)
Figure 9.6 Not all organizations are big enough to implement a multitiered storage infrastructure. For small and midsized organizations and companies, there are modular storage systems designed to meet their needs. In this picture, you can see the Hitachi Adaptable Modular Storage 2000 family and its scalability. This scalability and the possibility of easy upgrades are important features that should be considered when designing a storage system infrastructure.
the best protection available, you can implement three data center
multitarget replication. Please refer to Chapter 4 for more details. The
benefits of such a solution lie in fast data synchronization with long
distance recoverability and in reduced storage cost by dynamically provisioning pools at backup sites.
The universal replication feature, which is available in modern enterprise storage systems, supports synchronous and asynchronous replication for both internal and external virtualized storage. A single replication framework for all externally attached arrays by different vendors
represents a powerful solution that can effectively address all issues
connected with high availability clustering. There is no requirement
to implement multiple, disparate business continuity solutions across
each array. This significantly minimizes costs, reduces infrastructure
and simplifies disaster recovery management and implementation.
This single framework approach to managing business continuity also
eases IT manager concerns related to having a consistent, repeatable,
enterprise wide disaster recovery process.
Virtualization of external storage also simplifies backup solutions
since the backup infrastructure only needs to be zoned to one storage
system, no matter how many different physical storage systems are virtualized behind it. This simplifies the overall management of the entire
centralized backup infrastructure.
(Figure: power and cooling spending in US$B plotted against new server shipments in millions of units from 1996 to 2009, together with survey shares of data center facility problems such as excessive heat, insufficient power, poor location and excess facility cost.)
Electricity and cooling costs are not the only motivation for reducing
power consumption and implementing electricity saving features. The
onset of global warming is expected to increase the demand on existing
cooling systems, leading to costly investments in new plants. More efficient use of an existing storage infrastructure can reduce the demand
for new storage, ease the demand for power and cooling and enable
a business to reduce costs and address its green obligations. Some of
the challenges facing data centers in respect to electricity, cooling and
environmental requirements include:
Running out of power, cooling and space
Growing energy costs
Increasing regulatory compliance issues
Data center expansion without consideration for future power
and cooling requirements
Data storage configured without adequate consideration of heat distribution (equipment racks should be installed with cold rows and hot rows; this will be discussed later in this chapter)
Difficulty relieving data center hot spots without disrupting applications
The solution capable of addressing these issues should provide features that lead to cost-effective and energy efficient management and
operation of storage systems. It should also provide the ability to extend the life of existing storage assets.
When you design a storage infrastructure, you need to analyze the
power and cooling requirements as well as the requirements on the
space in your server room. In addition to power consumption metrics
in kW, there are other metrics that should be considered:
Total five-year power and space costs
Heat loading (kW/sq ft)
figured, SATA drives will park heads when idle for more than 2hrs. Modern
storage systems must also constantly assess cooling requirements and dynamically adjust fan speeds to maintain correct temperatures.
It would be useless to implement a spin down feature on RAID
groups that are highly utilized (production volumes used by transaction
based applications). However, there are disk functions eligible for spin
down solutions. These include:
Mirrored drive groups involved in backup to tape
Virtual tape library drive groups involved in backups
Local (internal) backup copies of data
Drive groups within archive storage
Unused drive groups
The capability to turn off hard drives when they have not been accessed for
2hrs can increase power savings by 23%. This functionality is often implemented together with smart algorithms that minimize HDD usage.
These algorithms ensure that sequential writes and reads are done in
massive batches. Data to be written to a particular hard drive is kept
in cache as long as possible. Imagine that within a few hours there are
only four write requests for that particular hard drive. The algorithm
ensures these four write requests are sent to the hard drive in a batch
instead of accessing the hard drive four times.
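A toy sketch of that batching idea (the batch size and drive name are arbitrary examples, not how any particular controller is implemented) looks like this:

from collections import defaultdict

class WriteBatcher:
    # Hold writes destined for spun-down drives and flush them in one batch,
    # so each drive is woken up once per batch instead of once per write.
    def __init__(self, batch_size=4):
        self.batch_size = batch_size
        self.pending = defaultdict(list)

    def write(self, drive_id, block):
        self.pending[drive_id].append(block)
        if len(self.pending[drive_id]) >= self.batch_size:
            self.flush(drive_id)

    def flush(self, drive_id):
        batch = self.pending.pop(drive_id, [])
        if batch:
            # A real controller would spin the drive up here, write all blocks
            # sequentially and then let the drive spin down again.
            print(f"waking {drive_id}, writing {len(batch)} blocks in one batch")

batcher = WriteBatcher()
for block in ["b1", "b2", "b3", "b4"]:
    batcher.write("sata-drive-07", block)   # the drive is accessed once, not four times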
The storage system controller also varies the fan speed according to the internal temperature to reduce the power required for fan operation.
friendly and power efficient, is now possible. In 2009, this ideal data
center was built and began its operations. See Figure 9.8 for reference.
(Figure 9.8 labels: applies principles from the Eco-friendly Data Center Project and IT Power Saving Plan; location Yokohama; in operation since 2009; dimension 10,000m2; power usage effectiveness rating 1.6; features include rooftop forestation, an air conditioning facility, an electricity generator, a high voltage substation, a revolving gate, photovoltaic lighting, security cameras and infrared sensors.)
In large data centers, racks with storage systems and servers are
placed in rows. Modern storage systems are designed for such positioning. The goal is clever distribution of heat by managing air flow. As
a result, you get hot rows and cold rows. In simple words, you arrange
racks in alternating rows with cold air intakes facing one way and hot
air exhausts facing the other. Typically, the front side of a storage system is designed to face the cold row, where cold air is distributed from
the air conditioner. The rear side of a storage system is then pouring
heated exhausts into hot rows. Typically, hot rows face air conditioner
return ducts, which take the hot air away.
(Figure labels: TCO components include purchase, hardware maintenance, software maintenance, internal labor, contract labor, outage risk, scheduled downtime, SLA penalties for backups, power and cooling, and floor space.)
Throughout this chapter, we have discussed many business challenges. Most of the challenges presented are related to TCO, and most
of the solutions presented help to lower TCO.
In addition to considering TCO, an IT manager must consider distribution of costs over time, such as the costs at purchase, the costs
after one year and the costs after five years. These expenses are calculated through two other values: capital expenditure (CAPEX) and
operational expenditure (OPEX). CAPEX represents mostly the initial
purchase and the investments into technology. In other words, CAPEX
is how much you have to spend to get new equipment, licenses and
personnel. OPEX represents the costs for maintenance, electricity and
cooling.
CAPEX can be lowered by implementing functionalities such as thin
provisioning and zero page reclamation. OPEX can be lowered by virtualization and consolidation of infrastructure through higher utilization
of hardware resources.
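A minimal sketch of how these figures combine (the amounts are invented for the example) simply accumulates OPEX on top of the initial CAPEX over the planning horizon:

def total_cost_of_ownership(capex, yearly_opex, years=5):
    # TCO over time: the initial capital expenditure plus the operational
    # expenditure accumulated year by year.
    return [capex + yearly_opex * year for year in range(years + 1)]

# Illustrative figures only: 400,000 to purchase, 90,000 per year to run.
costs = total_cost_of_ownership(capex=400_000, yearly_opex=90_000)
print(costs[0], costs[1], costs[5])   # cost at purchase, after one year, after five years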
The motto for lowering TCO, CAPEX and OPEX can be best described
as: Reduce costs by implementing technology! By now you have learned
that virtualizing and consolidating storage infrastructure can result in major cost savings. Implementing data tiers is also very important. All these
factors represent major investments, but when properly calculated, they
will pay you back through long term cost savings. Companies are often reluctant to invest large quantities of money into storage infrastructure, but
the financial figures are clear: it is not efficient to maintain old technology that is no longer supported. From a long-term perspective, it is always
best to keep your hardware up to date.
Self-assessment test
14. To ensure that the stored data has not been modified, the digital
archive:
a) Periodically checks whether the stored data still matches its
cryptographic hash value
b) Performs a pair split operation, which disconnects the volume
from production servers
c) Uses password protected file access and encryption
d) Needs to be regularly audited by an external auditing company
Correct answers:
1a, 2b, 3c, 4d, 5a, 6b, 7b, 8a, 9c, 10d, 11a, 12b, 13c, 14a, 15b, 16b, 17b,
18c, 19b, 20a
Glossary
A
AaaS Archive as a Service. A cloud computing business model.
AD Active Directory
Address A location of data, usually in
ASSY Assembly
Asymmetric virtualization See Out-of-band virtualization.
ATA A disk drive implementation that integrates the controller on the disk drive itself. Also known as IDE (Integrated Drive Electronics) Advanced Technology Attachment.
B
Back end In client/server applications,
Backup image Data saved during an archive operation. It includes all the associated files, directories, and catalog
information of the backup operation.
C
Cache Cache Memory. Intermediate
buffer between the channels and
drives. It is generally available and controlled as two areas of cache (cache
A and cache B). It may be battery-backed.
CBI Cloud-based Integration. Provisioning of a standardized middleware platform in the cloud that can be used for
various cloud integration scenarios.
CF Coupling Facility
Chargeback A cloud computing term
that refers to the ability to report on
capacity and utilization by application
or dataset, charging business users or
departments based on how much they
use.
Cloud Fundamental A core requirement to the deployment of cloud computing. Cloud fundamentals include:
Self service
Pay per use
Dynamic scale up and scale down
Data Integrity Assurance that information will be protected from modification and corruption.
Controller-based virtualization Driven by the physical controller at
the hardware microcode level versus
at the application software layer and
integrates into the infrastructure to allow virtualization across heterogeneous storage and third party products.
Data migration The movement of data from one storage device to another. In this context, data migration is
the same as Hierarchical Storage Management (HSM).
D
DAS Direct Attached Storage
Data block A fixed-size unit of data that
Direct Attached Storage (DAS) Storage that is directly attached to the application or file server. No other device
on the network can access the stored
data.
E
ENC Enclosure or Enclosure Controller.
F
FaaS Failure as a Service. A proposed
business model for cloud computing in
which large-scale, online failure drills
are provided as a service in order to
test real cloud deployments. Concept
developed and proposed by the College of Engineering at the University
of California, Berkeley in 2011.
ure tolerant systems in which a component has failed and its function has
been assumed by a redundant component. A system that protects against
single failures operating in failed over
mode is not failure tolerant, since failure of the redundant component may
render the system unable to function.
Some systems (e.g., clusters) are able
to tolerate more than one failure;
these remain failure tolerant until no
redundant component is available to
protect against further failures.
level, when one or more of its components has failed. Failure tolerance in
disk subsystems is often achieved by
including redundant instances of components whose failure would make the
system inoperable, coupled with facilities that allow the redundant components to assume the function of failed
ones.
FC-3 This layer contains common services used by multiple N_Ports in a node.
FS File System
FTP File Transfer Protocol. A client-
FICON An input/output (I/O) interface for mainframe computer connections to storage devices. As part of IBM's S/390
server, FICON channels increase I/O
capacity through the combination of
a new architecture and faster physical
link rates to make them up to 8 times
as efficient as ESCON (Enterprise System Connection), IBMs previous fiber
optic channel standard.
G
Gb Gigabit
GB Gigabyte
Gb/sec Gigabit per second
GB/sec Gigabyte per second
GbE Gigabit Ethernet
Gbps Gigabit per second
GBps Gigabyte per second
GBIC Gigabit Interface Converter
Global Cache Cache memory is used
H
HA High Availability
HBA Host Bus Adapter - An I/O adapter
kind
I
I/O Input/Output. Term used to describe
IP Internet Protocol
iSCSI Internet SCSI. Pronounced eye
skuzzy. An IP-based standard for linking data storage devices over a network and transferring data by carrying
SCSI commands over IP networks.
IDE Integrated Drive Electronics Advanced Technology. A standard designed to connect hard and removable
disk drives.
Internal bus Another name for an internal data bus. Also, an expansion bus
is often referred to as an internal bus.
J
Java A widely accepted, open systems
programming language. Hitachis enterprise software products are all accessed using Java applications. This
enables storage administrators to access the Hitachi enterprise software
products from any PC or workstation
that runs a supported thin-client internet browser application and that
has TCP/IP network access to the computer on which the software product
runs.
Mb Megabit
MB Megabyte
Metadata In database management
MP Microprocessor
MTS Multitiered Storage
Multitenancy In cloud computing, mul-
N
NAS Network Attached Storage. A disk
array connected to a controller that
gives access to a LAN Transport. It handles data at the file level.
O
OPEX Operational Expenditure. This is
Out-of-band virtualization Used in systems where the controller is located outside of the SAN data path. Separates control and data on different
connection paths. Also called asymmetric virtualization.
P
PaaS Platform as a Service. A cloud com-
PiT Point-in-Time
PL Platter. The circular disk on which the
magnetic data is stored. Also called
motherboard or backplane.
R
RAID Redundant Array of Independent
Q
QoS Quality of Service. In the field of
computer networking, the traffic engineering term quality of service (QoS)
RD/WR Read/Write
Round robin A load balancing technique which distributes data packets equally among the available paths.
Round robin DNS is usually used for
balancing the load of geographically
distributed Web servers. It works on
a rotating basis in that one server IP
address is handed out, then moves
to the back of the list; the next server
IP address is handed out, and then it
moves to the end of the list; and so
on, depending on the number of servers being used. This works in a looping
fashion.
SCSI Small Computer System Interface. A parallel bus architecture and a protocol for transmitting large data blocks
up to a distance of 15m to 25m.
SM Shared Memory Module. Stores the shared information about the subsystem and the
cache control information (director
names). This type of information is
used for the exclusive control of the
subsystem. Like CACHE, shared memory is controlled as 2 areas of memory
and fully non-volatile (sustained for
approximately seven days).
Spare An object reserved for the purpose of substitution for a like object in
case of that objects failure.
U
UPS Uninterruptible Power Supply - A
power supply that includes a battery
to maintain power in the event of a
power outage.
type that is undergoing active development, and the details of the implementation may change considerably. It
is an application interface that gives user-level processes direct but protected
access to network interface cards.
This allows applications to bypass IP
processing overheads (for example,
copying data, computing checksums)
and system call overheads while still
preventing 1 process from accidentally
or maliciously tampering with or reading data being used by another.
Volume A fixed amount of storage on a disk or tape. The term volume is often used as a synonym for the storage
medium itself, but it is possible for a
single disk to contain more than one
volume or for a volume to span more
than one disk.
YB Yottabyte
Yottabyte A highest-end measurement
Z
Zettabyte (ZB) A high-end measure-
Storage Concepts
Storing and Managing Digital Data