NPU IP family for TinyML applications in AIoT devices

Ceva-NeuPro-Nano is a highly efficient, self-sufficient Edge NPU designed for TinyML applications. It balances ultra-low power with strong performance in a small silicon area to efficiently execute TinyML workloads across AIoT product categories, including Hearables, Wearables, Home Audio, Smart Home, Smart Factory, and more. Scaling from 10 GOPS up to 200 GOPS per core, Ceva-NeuPro-Nano is designed to enable always-on use cases on battery-operated devices, integrating voice, vision, and sensing across a wide array of end markets. It brings the possibilities of TinyML to low-cost, energy-efficient AIoT devices.

Ceva-NeuPro-Nano is not an AI accelerator and does not require a host CPU/DSP to operate. Instead, it includes all the processing elements of a standalone NPU, including code execution and memory management. The Ceva-NeuPro-Nano embedded AI NPU architecture is fully programmable and efficiently executes neural networks, feature extraction, control code, and DSP code, and it supports the most advanced machine learning data types and operators, including native transformer computation, sparsity acceleration, and fast quantization. This optimized, self-sufficient architecture enables Ceva-NeuPro-Nano NPUs to deliver superior power efficiency, a smaller silicon footprint, and better performance than the existing TinyML processor solutions that combine a CPU or DSP with an AI accelerator.

Benefits

The Ceva-NeuPro-Nano NPU family is specially designed to bring the power of AI to the Internet of Things (IoT), through efficient deployment of Tiny Machine Learning (TinyML) models on low-power, resource-constrained devices.

  • Fully programmable, efficiently executing neural networks, feature extraction, control code, and DSP code
  • Supports the most advanced machine learning data types (4-, 8-, 16-, and 32-bit) and operators, including native transformer computation, fast quantization (illustrated in the sketch after this list), and sparsity support with weight decompression and acceleration
  • Self-sufficient architecture that enables Ceva-NeuPro-Nano NPUs to deliver superior power efficiency, a smaller silicon footprint, and optimal performance
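
In the data-types item above, "fast quantization" refers to the affine integer arithmetic that TinyML frameworks use: tensors are stored as small integers plus a scale and zero point, and each layer's wide accumulator must be re-quantized back to int8 before the next layer. Below is a minimal Python sketch of that arithmetic; the scale and zero-point values are made up for illustration and are not Ceva parameters.

    import numpy as np

    # Affine int8 quantization: real_value ~= scale * (q - zero_point).
    # All scale/zero-point values here are illustrative, not Ceva's.
    def quantize(x, scale, zero_point):
        q = np.round(x / scale) + zero_point
        return np.clip(q, -128, 127).astype(np.int8)

    def dequantize(q, scale, zero_point):
        return scale * (q.astype(np.int32) - zero_point)

    # "Re-quantizing" between layers: a layer's wide int32 accumulator
    # is rescaled back to int8 for the next layer. This is the internal
    # task on which the NPU claims up to 5x acceleration.
    def requantize(acc_int32, in_scale, w_scale, out_scale, out_zp=0):
        m = in_scale * w_scale / out_scale
        q = np.round(acc_int32 * m) + out_zp
        return np.clip(q, -128, 127).astype(np.int8)

    x = np.array([0.0, 0.5, -1.2, 3.1])
    q = quantize(x, scale=0.025, zero_point=0)
    print(q, dequantize(q, 0.025, 0))              # int8 codes and reconstruction
    print(requantize(np.array([12345, -6789]), 0.025, 0.01, 0.05))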

Main Features

  • Fully programmable to efficiently execute Neural Networks, feature extraction, signal processing, audio and control code
  • Scalable performance by design to meet a wide range of use cases, with configurations of up to 64 int8 MACs per cycle (natively 128 4x8 MACs); peak throughput is worked through in the sketch after this list
  • Future proof architecture that supports the most advanced ML data types and operators
    • 4-bit to 32-bit integer support
    • Native transformer computation
  • Ultimate ML performance for all use cases
    • Acceleration of sparse model inference (sparsity acceleration)
    • Acceleration of non-linear activation types
    • Fast quantization – up to 5 times acceleration of internal re-quantizing tasks
  • Ultra-low memory requirements achieved with Ceva-NetSqueeze™
    • Up to 80% memory footprint reduction through direct processing of compressed model weights, with no intermediate decompression stage (the principle is illustrated in the packed-weight sketch after this list)
    • Solves a key bottleneck inhibiting the broad adoption of AIoT processors today
  • Ultra-low energy achieved through innovative energy optimization
    • Dynamic Voltage and Frequency Scaling (DVFS) support, tunable per use case
    • Dramatic energy and bandwidth reduction by skipping computations on zero-valued weights (weight-sparsity acceleration)
  • Complete, simple to use Ceva NeuPro-Studio AI SDK
    • Optimized to work seamlessly with leading, open-source AI inference frameworks such as TFLM (TensorFlow Lite for Microcontrollers) and µTVM (microTVM)
    • Model Zoo of pre-trained and optimized machine learning models covering TinyML voice, vision, and sensing use cases
    • Comprehensive portfolio of optimized runtime libraries and off-the-shelf application-specific software
  • Two NPU configurations to address a wide variety of use cases
    • Ceva-NPN32: 32 4x8, 32 8x8, 16 16x8, 8 16x16, or 4 32x32 MPY/MAC operations per cycle
    • Ceva-NPN64: 128 4x8, 64 8x8, 32 16x8, 16 16x16, or 4 32x32 MPY/MAC operations per cycle, plus 2x performance acceleration at 50% weight sparsity (Sparsity Acceleration)
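
To put the MAC configurations above in concrete terms, peak throughput follows directly from the per-cycle MAC count: each MAC counts as two operations (a multiply and an accumulate), so peak GOPS = MACs/cycle x 2 x clock frequency in GHz. The sketch below assumes a 1 GHz clock purely for illustration; achievable frequency depends on the process node and implementation.

    # Back-of-the-envelope peak throughput from the MAC configurations.
    # The 1 GHz clock is an assumption for illustration only.
    CLOCK_HZ = 1e9
    OPS_PER_MAC = 2  # one multiply plus one accumulate

    int8_macs_per_cycle = {"Ceva-NPN32": 32, "Ceva-NPN64": 64}
    for name, macs in int8_macs_per_cycle.items():
        gops = macs * OPS_PER_MAC * CLOCK_HZ / 1e9
        print(f"{name}: {gops:.0f} GOPS dense int8")

    # NPN64 doubles effective throughput at 50% weight sparsity:
    print(f"Ceva-NPN64, 50% sparsity: "
          f"{2 * 64 * OPS_PER_MAC * CLOCK_HZ / 1e9:.0f} GOPS effective")

These figures scale linearly with clock frequency, which, together with the choice of configuration and data type, is how the family spans its quoted 10 GOPS to 200 GOPS range.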
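
The Ceva-NetSqueeze item above describes consuming compressed weights directly, with no decompressed copy held in memory. Ceva's actual compression format is not public, so the sketch below only illustrates the principle with simple int4 nibble packing: weights stay packed in the buffer, and each one is unpacked on the fly inside the dot product.

    import numpy as np

    # Conceptual illustration only; NetSqueeze's real format is
    # proprietary. The point is that the weight buffer stays compressed
    # end to end, with no intermediate decompression stage.
    def pack_int4(w):
        """Pack signed 4-bit weights (range -8..7) two per byte."""
        u = w.astype(np.uint8) & 0x0F          # two's-complement nibbles
        return (u[0::2] | (u[1::2] << 4)).astype(np.uint8)

    def dot_packed(packed, x):
        """Dot product that unpacks each signed int4 weight on the fly."""
        acc = 0
        for i, byte in enumerate(packed):
            for j, shift in enumerate((0, 4)):
                nib = (int(byte) >> shift) & 0x0F
                w = nib - 16 if nib >= 8 else nib   # sign-extend the nibble
                acc += w * int(x[2 * i + j])
        return acc

    w = np.array([3, -2, 7, -8], dtype=np.int8)
    x = np.array([10, 20, 30, 40], dtype=np.int8)
    packed = pack_int4(w)   # 2 bytes instead of 4 (50% here; Ceva quotes
                            # up to 80% with its own scheme)
    assert dot_packed(packed, x) == int(np.dot(w.astype(np.int32),
                                               x.astype(np.int32)))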

Block Diagram

TechInsights report

Ceva NPU Core Targets TinyML Workloads

Ceva’s NeuPro-Nano licensable neural processing unit (NPU) targets processors that run TinyML workloads, offering up to 200 billion operations per second (200 GOPS) for power-constrained edge IoT devices.