Pepperdata Provides Guaranteed SLAs in Multi-Tenant Hadoop Clusters
Rely on Hadoop to guarantee service levels in mixed-workload, multi-tenant production clusters. Pepperdata installs in less than 30 minutes on your existing Hadoop cluster without any modifications to the scheduler, workflow, or jobs.
Visibility
Diagnose problems faster. Pepperdata gives you both a macro and granular view of everything that’s happening across the cluster by monitoring the use of CPU, memory, disk I/O, and network for every job and task, by user or group, in real time. This detailed telemetry data is captured second by second, and is saved so that you can analyze performance variations and anomalies over time.
Control
Make Hadoop run the way you want it to and guarantee SLAs. Pepperdata senses contention for CPU, memory, disk I/O, and network at runtime and automatically slows down low-priority tasks when needed to ensure that high-priority jobs are completed on time.
Enterprises deploying Hadoop benefit from its ability to scale across thousands of servers and offer unprecedented insight into business operations. Hadoop’s downside today is that it lacks predictability. Hadoop does not allow enterprises to ensure that the most important jobs complete on time, and it does not effectively use a cluster’s full capacity. As a result, companies deploying Hadoop are often forced to create separate clusters for different workloads and to overprovision those clusters, which results in lower ROI on their Hadoop investments and significantly higher recurring operational costs.
Guarantee on-time execution of critical jobs
Enterprises depend on production jobs executing reliably, but Hadoop jobs are notoriously unpredictable, especially in the world of multi-tenancy. A single poorly behaved job can unexpectedly consume far more than its fair share of network or disk, causing critical jobs to miss their SLAs and leaving the operations team scrambling. Pepperdata constantly monitors the use of all computing resources by every task on the cluster and takes automated action to ensure that each job, queue, user, or group receives the resources specified in the cluster policies. It allows administrators to set specific SLA goals, monitor adherence to those goals during execution, and, if needed, take manual control to reassign resources.
Pepperdata enables administrators to prioritize the completion of business-critical production jobs over ad hoc jobs in multi-tenant clusters running diverse workloads, for example by deploying a policy that limits the bandwidth low-priority jobs can use whenever that bandwidth is needed by higher-priority jobs.
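To make the idea concrete, here is a minimal sketch of what such a bandwidth-limiting rule could look like. The field names, values, and enforcement logic are invented for illustration; they do not represent Pepperdata's actual policy syntax.

```python
# Hypothetical policy sketch: keys, values, and logic are illustrative
# only and do not reflect Pepperdata's actual configuration format.
POLICY = {
    "etl-production": {"priority": "high"},
    "adhoc-analytics": {
        "priority": "low",
        # Caps apply only while higher-priority jobs contend for bandwidth.
        "max_network_mbps_under_contention": 100,
        "max_disk_mbps_under_contention": 50,
    },
}

def network_cap_mbps(group, contention_detected):
    """Return the network cap for a group, or None if unthrottled."""
    rules = POLICY.get(group, {})
    if contention_detected and rules.get("priority") == "low":
        return rules.get("max_network_mbps_under_contention")
    return None  # high priority, or no contention: run unthrottled

print(network_cap_mbps("adhoc-analytics", contention_detected=True))   # 100
print(network_cap_mbps("adhoc-analytics", contention_detected=False))  # None
```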
Capacity
Run more jobs in less time on your existing hardware. Pepperdata can improve your cluster throughput by 30-70% by measuring the true hardware resource capacity of your cluster and dynamically allowing more tasks to run on servers that have free resources at any given moment. In many cases, individual jobs also run much faster.
Reclaim wasted capacity
On ad hoc Hadoop clusters, capacity is typically limited, and there is little, if any, ability to accommodate unexpected but high-priority jobs.
Meanwhile, on production Hadoop clusters, usage can be bursty and inefficient. These servers may have as much wasted capacity as used capacity. Pepperdata addresses these challenges, making job execution predictable while reclaiming this unused capacity.
Instantly increase effective cluster capacity and throughput
Pepperdata monitors and controls CPU, RAM, disk, network, and HDFS per task, job, user, and virtual cluster. Proprietary, patented algorithms identify critical resources and contention across jobs and reallocate resources accordingly, generating 30% to 70% more usable capacity that would otherwise be wasted.
Pepperdata allows you to run your ad hoc and production jobs on a combined cluster, and then unlocks extra capacity on that combined cluster.
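A simplified way to picture this reclamation: compare each node's measured usage against its true hardware limits and admit additional tasks into the headroom. The sketch below illustrates the idea only; the numbers and thresholds are made up, and this is not Pepperdata's actual algorithm.

```python
# Simplified illustration of capacity reclamation: admit extra tasks on
# nodes whose measured usage is well below their hardware limits.
# Thresholds and numbers are invented for the example.

def extra_task_slots(cpu_pct, mem_pct, configured_slots, running_tasks):
    """Estimate how many tasks beyond the static slot count a node can take."""
    # A static scheduler stops at configured_slots even if the node is idle.
    if running_tasks < configured_slots:
        return 0  # static capacity is not yet exhausted
    # If both CPU and memory have headroom, allow oversubscription
    # proportional to the scarcer of the two.
    headroom_pct = min(100 - cpu_pct, 100 - mem_pct)
    return int(configured_slots * headroom_pct / 100)

# A node configured for 8 slots, all occupied, but only half utilized:
print(extra_task_slots(cpu_pct=45.0, mem_pct=50.0,
                       configured_slots=8, running_tasks=8))  # -> 4
```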
Pepperdata Technology
Pepperdata software is easy to install and runs on top of all Hadoop distributions without modifying the existing scheduler, workflow, or job submission process.
Architecture
Pepperdata is composed of two main components: the Supervisor and the Dashboard.
The Pepperdata Supervisor runs on the JobTracker/ResourceManager node and communicates with Pepperdata Agents that run on every data node in the cluster. It collects over 200 metrics associated with the consumption of CPU, memory, disk I/O, and network resources by task, job, user, and group on a second-by-second basis and dynamically optimizes the usage of those resources. It enables administrators to implement, continuously measure, and improve the service-level policies that guarantee the completion of high-priority jobs while maintaining the cluster at peak performance.
The Pepperdata Dashboard communicates with the Pepperdata Agents and renders interactive, real-time visualizations and reports on hardware usage with user-level, job-level, and developer views.
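For a feel of what second-by-second hardware telemetry involves, the sketch below shows a bare-bones sampling loop built on the open-source psutil library. It is a stand-in only: real Pepperdata Agents collect over 200 metrics and attribute them to individual tasks, jobs, users, and groups, which this sketch does not attempt.

```python
# Minimal stand-in for an agent's per-second sampling loop (illustrative
# only; Pepperdata Agents collect far more metrics and attribute them
# per task, job, user, and group).
import time
import psutil

def sample():
    disk = psutil.disk_io_counters()
    net = psutil.net_io_counters()
    return {
        "ts": time.time(),
        "cpu_pct": psutil.cpu_percent(interval=None),
        "mem_pct": psutil.virtual_memory().percent,
        "disk_read_bytes": disk.read_bytes,
        "disk_write_bytes": disk.write_bytes,
        "net_bytes_sent": net.bytes_sent,
        "net_bytes_recv": net.bytes_recv,
    }

if __name__ == "__main__":
    while True:
        print(sample())  # a real agent would ship this to the Supervisor
        time.sleep(1)
```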
Key Features and Capabilities
- Cluster configuration policies file – Hadoop administrators can specify how much cluster hardware to guarantee to specific users, groups, or jobs. Pepperdata’s optimization ensures that high-priority workloads get the hardware resources they need, while dynamically making any remaining capacity available to other jobs.
- Charge-back reports – Administrators can accurately allocate hardware expenses by measuring cluster resource consumption at the user, group, and job level over any time period (a rough illustration follows this list).
- HBase protection – HBase jobs can be safely run side-by-side with MapReduce and other types of jobs on the same cluster.
- Near-zero overhead – Pepperdata agents consume just 1-2% of a single core, out of the 8 to 24 cores on a typical Hadoop server.
- Installs on any Hadoop cluster – Pepperdata runs on clusters using any standard distribution, including Apache, Cloudera, Hortonworks, IBM, and MapR. Pepperdata works with both classic Hadoop (Hadoop 1) and YARN (Hadoop 2). Pepperdata supports clusters running on either physical nodes or virtual machines.
- Complements schedulers – Pepperdata works with all popular schedulers (capacity scheduler, fair scheduler, etc.) without modification to workflows, job code, or existing cluster tuning parameters.
- Complements YARN – The YARN ResourceManager allows a more diverse range of job types to be scheduled and launched on the cluster. Once those jobs start, Pepperdata’s optimization ensures they complete safely and on time.
- Multi-cluster support – Multiple clusters can be monitored in a single dashboard.
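As a rough illustration of the charge-back idea above, the sketch below rolls hypothetical per-job resource samples up to per-user totals and applies an assumed internal rate. The sample schema and rate are invented for the example.

```python
# Rough illustration of charge-back accounting: roll hypothetical
# per-job samples (schema invented for this example) up to per-user
# CPU core-hour totals that can be mapped to hardware cost.
from collections import defaultdict

samples = [
    # (user, job_id, cpu_core_seconds)
    ("alice", "job_001", 3600.0),
    ("alice", "job_002", 1800.0),
    ("bob",   "job_003", 7200.0),
]

COST_PER_CORE_HOUR = 0.05  # assumed internal rate, in dollars

usage = defaultdict(float)
for user, _job, core_seconds in samples:
    usage[user] += core_seconds

for user, core_seconds in sorted(usage.items()):
    cost = core_seconds / 3600 * COST_PER_CORE_HOUR
    print(f"{user}: {core_seconds / 3600:.1f} core-hours -> ${cost:.2f}")
```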