
Isilon OneFS: An Ops Manager's Introduction

Legend (from the original diagram): customer-contributed services & gear vs. EMC-contributed services & gear

References:
http://www.emc.com/collateral/hardware/white-papers/h10719-isilon-onefs-technical-overview-wp.pdf
http://www.emc.com/collateral/TechnicalDocument/docu52911_Isilon-Site-Preparation-and-Planning-guide.pdf

Theory

The Isilon product delivers scale-out NAS using a distributed file system running over a symmetric cluster. The file system employs a mix of mirroring and Reed-Solomon erasure codes as its parity scheme for delivering fault tolerance. In the language of the CAP theorem, OneFS delivers Consistency and Availability, sacrificing Partition tolerance in the face of hardware failure (retaining read/write functionality only so long as a simple majority of nodes survives). OneFS leverages the underlying cluster to excel at streaming read/write throughput and concurrency at the cost of transactional performance. Inspired by the *nix philosophy, Isilon clusters support rich flexibility, allowing fine-grained hardware and software customization to support a wide range of workflows, along with a rich tool set for optimizing performance. As nodes are added, the cluster's processing, caching, and IO capabilities increase, along with the efficiency and resiliency of its file system layout.

Clients

Clients access the cluster over SMB, NFS, FTP, Aspera, HTTP, and Hadoop.

Operational Services Integration

Isilon clusters require tight integration with DNS servers in order to load-balance clients across nodes (the SmartConnect function), depend on a reliable NTP time hierarchy, and rely on NIS/LDAP/Kerberos/Active Directory services for authentication and authorization. In short, clusters require robust DNS and NTP infrastructure and consult one or more authentication / authorization providers.
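To make the SmartConnect dependency on DNS concrete, the sketch below repeatedly resolves a SmartConnect zone name and tallies which node addresses come back; a healthy delegation hands out different node IPs across successive lookups. The zone name is a hypothetical placeholder, not taken from the poster.

# Sketch: observe SmartConnect-style DNS load balancing from a client.
# Assumption: "smartconnect.example.com" stands in for your cluster's
# SmartConnect zone name; substitute your own.
import socket
from collections import Counter

ZONE = "smartconnect.example.com"  # hypothetical SmartConnect zone name

def sample_zone(zone: str, lookups: int = 20) -> Counter:
    """Resolve the zone repeatedly and count which node IPs are returned."""
    seen = Counter()
    for _ in range(lookups):
        try:
            _, _, addresses = socket.gethostbyname_ex(zone)
        except socket.gaierror:
            continue  # DNS not reachable or zone not delegated
        for ip in addresses:
            seen[ip] += 1
    return seen

if __name__ == "__main__":
    for ip, hits in sample_zone(ZONE).most_common():
        print(f"{ip}: answered {hits} time(s)")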
Front-end Network

Customers typically provide redundant Ethernet switches (Ethernet-A and Ethernet-B) equipped with SFP+ or QSFP+ ports. Twinax or fiber-optic cabling supports connectivity at 1G, 10G, or 40G. Prepare for substantial IP space needs, which can range from a single /24 to multiple /20s.

Protection Levels

The administrator specifies a cluster's Protection Level: how many simultaneous failures of disks and/or nodes the cluster can tolerate before data loss begins. OneFS responds to this setting by striping data appropriately. In the event of hardware failure, or when the administrator changes the Protection Level, the FlexProtect job runs, rebuilding the stripes as needed. Choosing a low Protection Level increases capacity while simultaneously increasing the risk of data loss. Isilon recommends careful attention to this choice.
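To make the capacity-versus-protection trade-off tangible, here is a rough back-of-the-envelope model (an illustration only, not OneFS's actual layout math): in an N+M erasure-coded stripe, M parity units out of N+M total are overhead, while a 2x mirror costs half the raw space.

# Rough capacity-overhead model for an N+M protection stripe.
# Illustrative simplification, not OneFS's real layout engine.

def parity_overhead(data_units: int, parity_units: int) -> float:
    """Fraction of raw capacity consumed by parity in an N+M stripe."""
    total = data_units + parity_units
    return parity_units / total

def mirror_overhead(copies: int) -> float:
    """Fraction of raw capacity consumed by mirroring (copies >= 2)."""
    return (copies - 1) / copies

if __name__ == "__main__":
    # e.g. a 16-wide stripe at N+2 vs. N+4, compared with a 2x mirror
    print(f"14+2 stripe : {parity_overhead(14, 2):.1%} overhead")
    print(f"12+4 stripe : {parity_overhead(12, 4):.1%} overhead")
    print(f"2x mirror   : {mirror_overhead(2):.1%} overhead")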
Node Types

Each Node Type contains multiple models, with configure-to-order mixes of drive types, drive sizes, and RAM within each model.

F-Series   SSD drives          Flash: IOPS-intensive applications
H-Series   SAS & SATA drives   Hybrid: customized mix of capacity & performance
A-Series   SATA drives         Archive: capacity (cheap & deep)

Common Ways to Degrade Your Cluster

- Consume more than 85% of the cluster's available space (see the sketch after this list)
- Exceed the cluster's resources in terms of CPU, RAM, and/or IOPS
- Redline the cluster and then kick off a resource-intensive job, like FlexProtect or AutoBalance
- Set the Protection Level below your business's tolerance for risk and for data loss
- Dawdle (or hit typical supply chain / delivery delays) when replacing failed disks or nodes
- Deploy suboptimal power delivery & cable management strategies
- Employ complex configurations: VLANs, identity management, and Access Zones

These choices interact synergistically to increase the chance of cluster downtime and data loss, i.e. pick two or more to substantially increase the odds of knocking out your cluster.
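A minimal sketch of guarding against the first item: from any client that mounts /ifs (over NFS, for example), you can watch utilization and flag when it crosses the 85% line. The mount point below is an assumption for illustration.

# Sketch: warn when cluster utilization crosses the 85% line, as seen from
# a client-side mount of /ifs. The mount point is a hypothetical example.
import shutil

MOUNT_POINT = "/mnt/ifs"   # hypothetical NFS mount of the cluster's /ifs
THRESHOLD = 0.85           # the 85% guideline from the list above

def utilization(path: str) -> float:
    """Return the fraction of space used on the file system backing `path`."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total

if __name__ == "__main__":
    used = utilization(MOUNT_POINT)
    status = "WARNING: over 85%" if used > THRESHOLD else "ok"
    print(f"/ifs utilization {used:.1%} -> {status}")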
Node Design

Nodes consist of customized PC server hardware, equipped with a dual-port back-end Network card (Infiniband or 40G Ethernet), a Journal card (aka NVRAM), RAM, and redundant batteries to keep uncommitted writes in the Journal alive in the event of power loss. Nodes run OneFS (a FreeBSD derivative) plus a slew of standard and custom daemons: httpd, ntpd, sshd, vsftpd, ndmp, Aspera, Flexnet, Likewise, and more. Each node also carries front-end Ethernet ports, redundant power supplies, mirrored boot disks (flash), NVRAM batteries, and its data disks. Nodes contain disks: they are heavy, requiring server-lift gear, robust racks, and reinforced flooring.
Optional Services

Customers can choose to add supporting services, such as SyncIQ, InsightIQ, or third-party applications like anti-virus and auditing. InsightIQ provides a performance monitoring & reporting station; ICAP integrates with anti-virus servers; Audit integrates with audit servers.

Replication

Replication between clusters uses SyncIQ, leveraging each node (node-1 through node-n) to parallelize data flow to the Remote Site.

Licensed Features

CloudPools           Tier data to cloud providers
InsightIQ            File system metrics and performance trending
Isilon for vCenter   VMWare integration
SmartConnect         Client load-balancing across nodes
SmartDedupe          Space conservation within directories
SmartLock            WORM in support of regulatory requirements
SmartPools           Automated tiering between node pools of varying resources
SmartQuotas          Limit utilization based on directories, users, and groups (sketch below)
SnapshotIQ           Copy-On-Write snapshotting strategy
SyncIQ               Replication between Isilon clusters with failover/failback
...
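SmartQuotas enforces directory-, user-, and group-based limits natively on the cluster. As a rough client-side analogue (purely illustrative, with a hypothetical mount path and quota), the sketch below totals a directory tree's size and compares it to a limit.

# Sketch: a client-side approximation of a directory quota check.
# SmartQuotas does this natively on the cluster; this is only an
# illustration, run against a hypothetical mount of /ifs.
import os

QUOTA_BYTES = 500 * 1024**3            # hypothetical 500 GiB directory quota
DIRECTORY = "/mnt/ifs/projects/alpha"  # hypothetical path

def tree_size(path: str) -> int:
    """Sum the size of every regular file under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total

if __name__ == "__main__":
    used = tree_size(DIRECTORY)
    verdict = "over" if used > QUOTA_BYTES else "within"
    print(f"{DIRECTORY}: {used / 1024**3:.1f} GiB of "
          f"{QUOTA_BYTES / 1024**3:.0f} GiB quota ({verdict} limit)")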
File System

Each node contributes its data disks to the global file system /ifs (Isilon File System), with no intermediate volume or partitioning abstraction. Files and their associated parity are chunked into 128K Protection Groups, which are striped across multi-drive DiskPools spread between multiple nodes according to algorithms which first honor the Protection Level settings and then optimize performance. Where needed, these algorithms will mirror instead of stripe; e.g. small files consume a single Protection Group and are mirrored (i.e. the smallest file will consume 128K x 2). Metadata is always mirrored at 8x. Storing a metadata mirror on SSD (Global Namespace Acceleration) is a popular performance tweak. As nodes are added and removed, the available space in /ifs expands and contracts automatically, while the AutoBalance and FlexProtect jobs modify and shuffle Protection Groups to continue to meet Protection Level guarantees and performance strategies. OneFS contains no in-built limitation on the size of /ifs, the number of files, or the breadth/depth of directory trees. File size is currently limited to 4TB (and files larger than 1TB degrade job engine performance).

Each node maintains a cluster-coherent view of the file system in terms of {node, disk, offset}, allowing the node to which a client is attached to initiate its reads & writes, reaching across the Back-end Network to other nodes as needed. From the File System point of view, all nodes are peers -- there are no metadata or coordinating masters. File locking and locking coordination are similarly distributed. The administrator can tune read caching at the cluster, directory, or file level to optimize for concurrency (adaptive algorithm), streaming, or random IO.

OneFS makes heavy use of caching and delayed writes to improve performance. All writes pass through the Journal on their way to disk; battery backup allows OneFS to treat all writes as synchronous, i.e. to acknowledge commit-to-disk before actually writing bits to spinning rust.

If the cluster drops below quorum, defined as a simple majority of nodes, OneFS places itself into read-only mode.
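To make the layout description more tangible, here is a toy model (an illustration only, not the real OneFS layout engine) of two ideas from this section: chunking a file into 128K Protection Groups with a mirror-versus-stripe decision, plus the simple-majority quorum rule.

# Toy model of two ideas from the File System section above.
# Purely illustrative; it does not reproduce OneFS's actual layout engine.

PROTECTION_GROUP = 128 * 1024  # 128K chunks, per the text above

def protection_groups(file_size: int) -> int:
    """Number of 128K Protection Groups a file of `file_size` bytes occupies."""
    return max(1, -(-file_size // PROTECTION_GROUP))  # ceiling division

def layout(file_size: int, stripe_width: int = 16) -> str:
    """Mirror small files; stripe larger ones (simplified decision rule)."""
    groups = protection_groups(file_size)
    if groups == 1:
        # Smallest files: one Protection Group, mirrored (128K x 2 on disk).
        return "mirror: 1 group x 2 copies = 256K on disk"
    return f"stripe: {groups} groups spread across up to {stripe_width} drives"

def has_quorum(nodes_total: int, nodes_up: int) -> bool:
    """Simple majority of nodes: below this, OneFS goes read-only."""
    return nodes_up > nodes_total // 2

if __name__ == "__main__":
    for size in (4 * 1024, 100 * 1024, 10 * 1024**2):
        print(f"{size:>10} bytes -> {layout(size)}")
    print("5-node cluster with 3 up has quorum:", has_quorum(5, 3))
    print("6-node cluster with 3 up has quorum:", has_quorum(6, 3))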
Job Engine

This scheduler runs numerous cluster maintenance processes:

AutoBalance     Redistribute data to more effectively leverage spindles
AVScan          Initiated by ICAP servers
Collect         Return deleted blocks to the free pool
DeDupe          Deduplicate identical blocks within a directory
FlexProtect     Restripe data and parity after disk/node failure
IntegrityScan   Sanity-check and repair file system and block layout metadata
MediaScan       Scan drives for media-level errors
TreeDelete      Delete paths in the file system
...

Back-end Network

The data plane communicates using the Remote Block Manager protocol, while the control plane uses TCP/IP protocols, both riding atop a 20Gb/s Infiniband or 40G Ethernet fabric. Switches (Switch-A and Switch-B) are dedicated to the cluster: historically Infiniband, though Gen6+ Isilon nodes also support 40G Ethernet. Infiniband cabling places intense demands on cable management infrastructure.

Data Center Impact

Isilon clusters place particular demands on power and cable management, behaving most reliably when nodes are continuously powered and sophisticated cable management strategies are used to support and route cabling.

Stuart Kendrick 2019-03-07
