There is an explosion of edge and endpoint artificial intelligence (AI) in the world today. To address this wave of edge and endpoint AI devices, Arm has designed the microNPU, a new class of machine learning (ML) processor built specifically to accelerate ML inference in area-constrained embedded and IoT devices. With Arm's range of Ethos-U microNPUs, you can easily build low-cost, highly efficient AI solutions in a wide range of embedded devices and systems. Ethos-U provides a scalable range of performance and memory interfaces, and integrates with low-power Cortex-M SoCs as well as SoCs based on high-performance Arm Cortex-A, Cortex-R, and Arm Neoverse.
Figure 1: Chip diagram of Arm Ethos-U microNPU
To deploy your neural network (NN) model on Ethos-U, the first step is to compile your prepared model with Vela. Vela is an open-source Python tool that optimizes a neural network model into a version that can run on an embedded system containing an Ethos-U NPU.
After compilation, the optimized model contains TensorFlow Lite custom operators for the parts of the model that can be accelerated by the Ethos-U microNPU. Parts of the model that cannot be accelerated are left unchanged and run on the CPU using an appropriate kernel.
Figure 2: Operators workflow
We are constantly working to broaden the set of operators that Vela supports. To check the operator list that your installed version of Vela supports, run the following command in a terminal (such as Windows cmd or a Linux shell) after you have installed Vela. The report is generated in your current working directory and named "SUPPORTED_OPS.md".
vela --supported-ops-report
Note, however, that some operators in the report carry constraints. If a constraint is not met, that operator is scheduled on the CPU. This does not mean the whole model cannot run: only the unsupported parts of your network fall back to the CPU, and those parts run on the CPU instead of Ethos-U. The following command shows which operators of your model fall back to the CPU.
vela network.tflite --show-cpu-operations
Running Vela is the first and essential step in deploying your NN model on Arm Ethos-U microNPUs. In this blog, we show you the generic workflow for compiling your model with Vela.
Vela runs on Linux, macOS, and Microsoft Windows 10. You can easily install it from PyPI with the following command; you can also obtain the source code and find more advanced installation methods on the Arm ML Platform.
pip3 install ethos-u-vela
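If you prefer to keep Vela isolated from other Python packages, you can install it into a virtual environment first. This is a minimal sketch, not a required step; the environment name is arbitrary:

python3 -m venv vela-env          # create an isolated environment
source vela-env/bin/activate      # activate it (Linux/macOS)
pip3 install ethos-u-vela         # install Vela from PyPI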
Please note that your computer should meet the prerequisites listed in the "Prerequisites" section of the Vela documentation before you kick off the installation. You can check the installed version with the following command.
vela --version
The generic workflow is shown in the following diagram.
Figure 3: Generic workflow
To be accelerated by the Ethos-U microNPU, your network's operators must be quantized to either 8-bit or 16-bit (signed). Vela is run with an input .tflite file, containing your quantized neural network, passed on the command line.
You can prepare the initial .tflite model in either of two ways: quantize and convert your own trained model using the TensorFlow Lite converter, or start from a model that is already quantized, for example a pre-trained, pre-quantized .tflite model from a model zoo.
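For the first route, the sketch below shows one common way to produce a fully int8-quantized .tflite file with TensorFlow's post-training quantization. The Keras model "model" and the "representative_data" samples are placeholders you must supply; the converter options shown are standard TensorFlow Lite API.

import tensorflow as tf

# 'model' is your trained Keras model and 'representative_data' is an
# iterable of sample inputs; both are placeholders you must provide.
def representative_dataset():
    for sample in representative_data:
        # The converter calibrates quantization ranges from these samples.
        yield [sample.astype("float32")]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Restrict conversion to int8 kernels so every operator is quantized,
# as required for Ethos-U acceleration.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("network.tflite", "wb") as f:
    f.write(converter.convert())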
Vela is a highly customizable offline compilation tool for the Ethos-U series. You can customize various properties of the Ethos-U embedded system, such as memory latencies and bandwidths, by editing the Vela configuration file. We strongly recommend matching this configuration as closely as possible to the real hardware system on which you plan to deploy your NN model.
The Vela configuration file uses the Python ConfigParser .ini file format. It consists mainly of two kinds of sections, System Configuration and Memory Mode, each of which names a configuration and contains key/value pair options specifying its properties. Note that all sections and key/value pairs are case-sensitive.
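As an illustration, a configuration might look like the following excerpt. It is modeled on entries in the default vela.ini file; the section names and numeric values here are placeholders to replace with your platform's real figures, and you should verify the exact key names against the vela.ini shipped with your Vela version.

; Hypothetical system: Ethos-U55 with on-chip SRAM and off-chip flash
[System_Config.My_Embedded_System]
core_clock=500e6
axi0_port=Sram
axi1_port=OffChipFlash
Sram_clock_scale=1.0
OffChipFlash_clock_scale=0.25

[Memory_Mode.My_Shared_Sram]
const_mem_area=Axi1
arena_mem_area=Axi0
cache_mem_area=Axi0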
As with the model, there are two ways to prepare your Vela configuration file.
Use the default Vela configuration file. We provide a default vela.ini file that describes some generic classes of embedded devices and systems. You can use it directly as your Vela configuration file; for example, it contains the Ethos_U55_High_End_Embedded system configuration and the Shared_Sram memory mode used later in this blog. For the detailed properties of each choice, check the vela.ini file itself.
Write a custom Vela configuration file. If the generic configuration choices in the current Vela version do not meet your hardware system requirements, you can write your own Vela configuration file and pass its path with the following command. Refer to the detailed writing instructions in the "Configuration File" section to complete it. The settings should be consistent with how the driver programs the region configuration registers, which control which AXI port is used for model data accesses (see the Ethos-U programmer's model for more details).
vela network.tflite --config your_vela_configuration_file.ini
Vela provides a rich command-line interface (CLI) to configure each compilation run. Verbose, detailed descriptions of the options can be found in the "Command Line Interface" section of the documentation.
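You can also print the full option list for your installed version directly from the command line:

vela --help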
Among the numerous parameters, besides the required network file, it is essential to set the following key options correctly so that they reflect your real hardware platform configuration. If you do not specify them, Vela uses internal default values, which are version-specific; refer to each version's "Vela Options" documentation for the default value of each parameter.
Currently, the hardware accelerator configuration (--accelerator-config) can be one of ethos-u55-32, ethos-u55-64, ethos-u55-128, ethos-u55-256, ethos-u65-256, or ethos-u65-512, and the optimization strategy (--optimise) can be Size or Performance.
A complete configuration and invocation example looks as follows:
vela network.tflite \
  --output-dir ./output \
  --accelerator-config ethos-u55-256 \
  --optimise Performance \
  --config vela.ini \
  --system-config Ethos_U55_High_End_Embedded \
  --memory-mode Shared_Sram
After running the command above, you will find the optimized output model in the specified directory, ./output. The output file keeps the input name with a _vela suffix, for example network_vela.tflite. Meanwhile, your computer's console window presents a log of the Vela compilation process.
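To inspect the result, you can list the operators in the compiled model and see how the accelerated subgraphs were wrapped into a custom operator. One way, sketched below, uses TensorFlow's experimental model analyzer (available in recent TensorFlow releases); the file path assumes the example command above.

import tensorflow as tf

# Print a human-readable operator summary of the compiled model; the
# accelerated parts appear as a single "ethos-u" custom operator.
tf.lite.experimental.Analyzer.analyze(model_path="./output/network_vela.tflite")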
Sometimes warnings appear in the log. Take a careful look at them: they indicate the decisions the compiler made while creating the optimized network, such as which operators fall back to the CPU.
As the first step in deploying your NN model on Ethos-U, the Vela compiler is open-source and easy to use. Try it out today to experience the huge improvement in machine-learning capability that Arm's Ethos-U brings to an embedded system.
[CTAToken URL = "https://review.mlplatform.org/plugins/gitiles/ml/ethos-u/ethos-u-vela" target="_blank" text="Access Ethos-u Vela" class ="green"]