On the Reliability of RAID Systems: An Argument for More Check Drives

Mann, Sarah Edge; Anderson, Michael; Rychlik, Marek

Abstract: In this paper we address issues of reliability of RAID systems. We focus on "big data" systems with a large number of drives and advanced error correction schemes beyond RAID 6. Our RAID paradigm is based on Reed-Solomon codes, and thus we assume that the RAID consists of $N$ data drives and $M$ check drives. The RAID fails only if the combined number of failed drives and sector errors exceeds $M$, a property of Reed-Solomon codes.
We review a number of models considered in the literature and build upon them to construct models usable for a large number of data and check drives. We attempt to account for a significant number of factors that affect RAID reliability, such as drive replacement or lack thereof, mistakes during service such as replacing the wrong drive, delayed repair, and the finite duration of RAID reconstruction. We evaluate the impact of sector failures that do not result in drive replacement.
The reader who needs to consider large $M$ and $N$ will find applicable mathematical techniques concisely summarized here, and should be able to apply them to similar problems. Most methods are based on the theory of continuous time Markov chains, but we move beyond this framework when we consider the fixed time to rebuild broken hard drives, which we model using systems of delay and partial differential equations.
One universal statement is applicable across the various models: increasing the number of check drives increases the reliability of the system in every case, and is vastly superior to other approaches to ensuring reliability, such as mirroring.
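
The failure criterion stated in the abstract (data is lost only when more than $M$ of the $N + M$ drives fail) can be illustrated with a deliberately simplified calculation. The sketch below is not one of the paper's models: it assumes a static setting with no repair, no sector errors, and independent drives that each fail with the same probability `p` over some mission time. The function names and the example values of `p`, `N`, and `M` are hypothetical and chosen only for illustration.

```python
from math import comb


def rs_loss_probability(n_data, m_check, p):
    """Probability that more than m_check of the n_data + m_check drives fail.

    Assumption (not from the paper): drives fail independently with
    probability p and are never repaired, so the failure count is binomial.
    """
    total = n_data + m_check
    return sum(
        comb(total, k) * p**k * (1 - p) ** (total - k)
        for k in range(m_check + 1, total + 1)
    )


def mirror_loss_probability(n_data, p):
    """Probability that at least one mirrored pair loses both of its drives.

    Assumption: each of the n_data drives has exactly one mirror, and all
    2 * n_data drives fail independently with probability p.
    """
    return 1 - (1 - p**2) ** n_data


if __name__ == "__main__":
    n_data, p = 10, 0.05  # hypothetical example values
    for m_check in range(0, 7):
        loss = rs_loss_probability(n_data, m_check, p)
        print(f"M = {m_check}: loss probability = {loss:.3e}")
    print(f"mirroring ({n_data} extra drives): "
          f"loss probability = {mirror_loss_probability(n_data, p):.3e}")
```

Under these assumptions the loss probability falls with every added check drive, and a handful of check drives can do better than mirroring every data drive. The paper's own models go further by accounting for repair, service mistakes, sector errors, and finite rebuild times.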

Comments:	13 pages, 11 figures, 3 tables
Subjects:	Performance (cs.PF); Probability (math.PR)
MSC classes:	60K10 (Primary) 62N05, 90B25 (Secondary)
Cite as:	arXiv:1202.4423 [cs.PF]
	(or arXiv:1202.4423v1 [cs.PF] for this version)
	https://doi.org/10.48550/arXiv.1202.4423
