|
| 1 | +# MVE-ACI Example |
| 2 | + |
| 3 | + |
| 4 | +As a case study, this example implements **image-copying-with-an-alpha-mask**, an alpha-blending algorithm widely used in 2D image processing. It provides three versions of the same algorithm: a pure C implementation, a Helium-accelerated version, and a Helium-ACI-accelerated version. The output of each version is displayed on the LCD panel simulated by the FVP, so users can visually compare the three implementations. In addition, `__cycleof__()` measures the CPU cycle count of each version. |
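To make the algorithm concrete, here is a minimal scalar sketch of copy-with-alpha-mask on RGB565 pixels. It assumes an 8-bit opacity mask (0 keeps the target pixel, 255 takes the source pixel) and a per-channel blend of `(src * a + dst * (255 - a)) / 255`. The function name `copy_with_mask_rgb565` and the exact rounding are hypothetical, not the example's actual C implementation, which may use a different mask format or scaling.

```c
#include <stddef.h>
#include <stdint.h>

/* Blend one 5- or 6-bit colour channel with an 8-bit opacity. */
static inline uint32_t blend_channel(uint32_t src, uint32_t dst, uint32_t alpha)
{
    return (src * alpha + dst * (255u - alpha)) / 255u;
}

/* Copy a width x height RGB565 region from src into dst, weighting each pixel
 * by the opacity stored in mask (0 = keep dst, 255 = take src).
 * All strides are counted in elements, not bytes. */
void copy_with_mask_rgb565(const uint16_t *src, size_t src_stride,
                           const uint8_t *mask, size_t mask_stride,
                           uint16_t *dst, size_t dst_stride,
                           size_t width, size_t height)
{
    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x++) {
            uint32_t s = src[y * src_stride + x];
            uint32_t d = dst[y * dst_stride + x];
            uint32_t a = mask[y * mask_stride + x];

            /* Unpack R (5 bits), G (6 bits), B (5 bits) and blend each one. */
            uint32_t r = blend_channel(s >> 11, d >> 11, a);
            uint32_t g = blend_channel((s >> 5) & 0x3Fu, (d >> 5) & 0x3Fu, a);
            uint32_t b = blend_channel(s & 0x1Fu, d & 0x1Fu, a);

            dst[y * dst_stride + x] = (uint16_t)((r << 11) | (g << 5) | b);
        }
    }
}
```

The Helium and Helium-ACI versions compute the same per-pixel result but process several pixels per instruction, which is where the measured cycle savings come from.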
| 5 | + |
| 6 | +The white paper [Innovate by Customized Instructions, but Without Fragmenting the Ecosystem](https://armkeil.blob.core.windows.net/developer/Files/pdf/white-paper/arm-custom-instructions-without-fragmentation-whitepaper.pdf) reports that the MVE-ACI-accelerated alpha-blending algorithm achieves 19x the performance of the scalar implementation and 5.9x the performance of the Helium-accelerated implementation. |
| 7 | + |
| 8 | +In chip design, functional verification of hardware logic, especially of newly added ACI instructions, relies on dedicated **test benches**. These test benches run C-based test cases tailored to specific functionality. |
| 9 | + |
| 10 | +To support development and validation of ACI-related firmware in parallel with hardware development, the **test** folder provides a software environment that executes test-bench test cases directly. |
| 11 | + |
| 12 | +> [!CAUTION] |
| 13 | +> |
| 14 | +> Note that **FVP is not a cycle-accurate simulation model**. Although `__cycleof__()` relies on hardware counters such as SysTick or the PMU, these counters are themselves simulated in software by the FVP and are not cycle-accurate. Treat performance measurements obtained in simulation as **rough estimates**, not definitive benchmarks. |
| 15 | +
|
| 16 | + |
| 17 | + |
| 18 | +## Steps to Create a Custom Instruction |
| 19 | + |
| 20 | +The following steps describe how to implement the `popc_u32` function as a custom instruction. Each step is explained in more detail below, and its source files are in the directory listed in the table. |
| 21 | + |
| 22 | +| Step | Description | Directory | |
| 23 | +| :----: | :----------------------------------------------------------- | :-------- | |
| 24 | +| **1.** | [Map custom instructions](#map-custom-instructions) to Custom Datapath Extension (CDE) | inc | |
| 25 | +| **2.** | [Create plugin for AVH-FVP simulation models](#create-avh-fvp-plugin) that adds custom instructions | plugin | |
| 26 | +| **3.** | [Create test code to validate](#create-test-code) the correctness of the AVH-FVP simulation | test | |
| 27 | +| **4.** | [Use custom instructions](#use-custom-instructions) in your algorithm to estimate performance gains | example | |
| 28 | + |
| 29 | +The steps for creating the processor hardware are not described here, but the test code created in step 3 can also be reused for hardware verification. |
| 30 | + |
| 31 | +### Map Custom Instructions |
| 32 | + |
| 33 | +The include file `./inc/aci_gpr_lib.h` contains the ACI mapping for the `popc_u32` instruction. This example uses the `CX1A` intrinsic with `ID=0` and `imm=0`. Further instructions can be defined with different `imm` values. |
| 34 | + |
| 35 | +The header file also defines the following functions: |
| 36 | + |
| 37 | +- `aci_gpr_init`, which enables the associated ACI accelerator. |
| 38 | +- `aci_gpr_NS_access`, which is called in Secure state to enable access from Non-secure state. |
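The following is a minimal sketch of what such a mapping could look like, not the contents of `aci_gpr_lib.h`. It assumes the ACLE intrinsic `__arm_cx1a(coproc, acc, imm)` from `<arm_cde.h>`, with the operand passed in and the result returned through the accumulator register. The macro names `ACI_GPR_COPROC_ID` and `ACI_POPC_IMM` and the mask helpers are hypothetical. The helpers encode the Armv8-M register layout: CPACR.CPn occupies bits [2n+1:2n] (0b11 = full access), and NSACR.CPn is bit n. An init function would OR these masks into `SCB->CPACR` and, from Secure state, into `SCB->NSACR`, followed by `__DSB()` and `__ISB()`.

```c
#include <stdint.h>

/* Hypothetical values; the real ones are defined in ./inc/aci_gpr_lib.h. */
#define ACI_GPR_COPROC_ID  0   /* CDE coprocessor number (ID=0)        */
#define ACI_POPC_IMM       0   /* immediate that selects popc_u32      */

#if defined(__ARM_FEATURE_CDE)
#include <arm_cde.h>
/* CX1A: the accumulator register carries the operand in and the result out. */
static inline uint32_t popc_u32(uint32_t x)
{
    return __arm_cx1a(ACI_GPR_COPROC_ID, x, ACI_POPC_IMM);
}
#else
/* Host fallback with the same semantics, so the code also builds off-target. */
static inline uint32_t popc_u32(uint32_t x)
{
    uint32_t n = 0;
    while (x != 0u) {
        x &= x - 1u;   /* clear the lowest set bit */
        n++;
    }
    return n;
}
#endif

/* CPACR.CPn, bits [2n+1:2n]: 0b11 grants full (privileged + unprivileged) access. */
static inline uint32_t cpacr_full_access_mask(unsigned coproc)
{
    return 3u << (2u * coproc);
}

/* NSACR.CPn, bit n: set from Secure state to allow Non-secure access. */
static inline uint32_t nsacr_access_mask(unsigned coproc)
{
    return 1u << coproc;
}
```

The host fallback keeps application code portable; only builds compiled with CDE enabled for coprocessor 0 emit the custom instruction.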
| 39 | + |
| 40 | +### Create AVH-FVP Plugin |
| 41 | + |
| 42 | +The simulation of the CX1A instruction is implemented in the function `aci_fvp::exec_cx1` in the module `./plugin/cde_plugin.cpp`. For `imm=0`, it calls the simulation code for `popc_u32`. |
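The core of such a plugin is a reference model that decodes the immediate and computes the result. The sketch below shows only that dispatch logic in plain C. It deliberately leaves out the FVP plugin interface, and the names `model_exec_cx1a` and `model_popc_u32` are hypothetical, not the actual `aci_fvp::exec_cx1` code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit-by-bit population count used as the instruction's reference semantics. */
static uint32_t model_popc_u32(uint32_t x)
{
    uint32_t n = 0;
    for (int i = 0; i < 32; i++) {
        n += (x >> i) & 1u;
    }
    return n;
}

/* Model of CX1A: acc is the value read from the destination register,
 * *result is the value written back. Returns false for immediates this
 * sketch does not define (a real plugin would raise UNDEFINED or similar). */
bool model_exec_cx1a(uint32_t imm, uint32_t acc, uint32_t *result)
{
    switch (imm) {
    case 0u:                        /* imm = 0: popc_u32 */
        *result = model_popc_u32(acc);
        return true;
    default:                        /* further instructions: other imm values */
        return false;
    }
}
```

Adding another instruction then means adding a `case` for its `imm` value, which mirrors how further instructions are mapped in the include file.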
| 43 | + |
| 44 | +### Create Test Code |
| 45 | + |
| 46 | +The test code verifies the execution of the `popc_u32` instruction. The same test can later be reused to validate the hardware implementation. |
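A test of this kind typically compares the custom instruction against a simple golden model over edge cases and a pseudo-random sweep. The sketch below assumes that on target `aci_gpr_lib.h` provides `popc_u32`. On the host, a SWAR popcount stands in for it so the test itself can be checked. The function `test_popc_u32`, the edge values, and the xorshift seed are illustrative choices, not the repository's test code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#if defined(__ARM_FEATURE_CDE)
#include "aci_gpr_lib.h"          /* target: popc_u32 maps to CX1A */
#else
/* Host stand-in (SWAR popcount), deliberately different from the golden model. */
static inline uint32_t popc_u32(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return (x * 0x01010101u) >> 24;
}
#endif

/* Golden model: count bits one at a time. */
static uint32_t popc_ref(uint32_t x)
{
    uint32_t n = 0;
    for (; x != 0u; x >>= 1) {
        n += x & 1u;
    }
    return n;
}

/* Report a mismatch for one input; returns 1 on failure, 0 on success. */
static int check_one(uint32_t x)
{
    uint32_t got = popc_u32(x), want = popc_ref(x);
    if (got != want) {
        printf("popc_u32(0x%08lX) = %lu, expected %lu\n",
               (unsigned long)x, (unsigned long)got, (unsigned long)want);
        return 1;
    }
    return 0;
}

/* Returns the number of failing inputs; 0 means the instruction passed. */
int test_popc_u32(void)
{
    static const uint32_t edge[] = {
        0x00000000u, 0x00000001u, 0x80000000u, 0xFFFFFFFFu,
        0x55555555u, 0xAAAAAAAAu, 0x12345678u,
    };
    int failures = 0;

    for (size_t i = 0; i < sizeof edge / sizeof edge[0]; i++) {
        failures += check_one(edge[i]);
    }

    /* Pseudo-random sweep with a 32-bit xorshift generator (seed must be non-zero). */
    uint32_t s = 0x2545F491u;
    for (int i = 0; i < 10000; i++) {
        s ^= s << 13;
        s ^= s >> 17;
        s ^= s << 5;
        failures += check_one(s);
    }
    return failures;
}
```

Because the test only depends on `popc_u32` and standard C, the same source can run on the FVP, in a hardware test bench, and on the host.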
| 47 | + |
| 48 | +### Use Custom Instructions |
| 49 | + |
| 50 | +To use the custom instruction, call the function `popc_u32`, which is defined in the include file `./inc/aci_gpr_lib.h`. |
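As an illustration of calling it from an algorithm, the sketch below computes the Hamming distance between two bit vectors with one `popc_u32` call per 32-bit word. The function `hamming_distance` is hypothetical, and the host fallback is an assumption added so the code also builds off-target.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_CDE)
#include "aci_gpr_lib.h"   /* provides popc_u32 via CX1A */
#else
/* Host fallback with the same semantics as the custom instruction. */
static inline uint32_t popc_u32(uint32_t x)
{
    uint32_t n = 0;
    while (x != 0u) {
        x &= x - 1u;       /* clear the lowest set bit */
        n++;
    }
    return n;
}
#endif

/* Number of differing bits between two bit vectors of `words` 32-bit words:
 * XOR marks the differing bits, popc_u32 counts them. */
uint32_t hamming_distance(const uint32_t *a, const uint32_t *b, size_t words)
{
    uint32_t dist = 0;
    for (size_t i = 0; i < words; i++) {
        dist += popc_u32(a[i] ^ b[i]);
    }
    return dist;
}
```

Comparing the cycle counts of this loop with and without the custom instruction is one way to estimate the gain, keeping in mind that FVP figures are only rough estimates.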
| 51 | + |
| 52 | +------ |
| 53 | + |
| 54 | + |
| 55 | + |
1 | 56 |
|
2 | 57 |
|
3 | 58 | # Get Started with MVE-ACI using Fast Model |
@@ -48,16 +103,6 @@ The AN552 implements a Helium-ACI instruction set for RGB565 image processing; h |
48 | 103 | | plugin | the CDE plugin makefile project | |
49 | 104 | | test | test project | |
50 | 105 |
|
51 | | -In the **example** folder, we use a typical algorithm in graphics processing—**image-copying-with-an-alpha-mask**—as a case study. We provide three versions of the same algorithm: a pure C implementation, a Helium-accelerated version, and a Helium-ACI-accelerated version. The copied output is displayed on an LCD panel simulated by the FVP, allowing users to visually compare and inspect the effects of these three implementations. Additionally, the `__cycleof__()` function is used to measure the CPU cycle count for each version. |
52 | | - |
53 | | -In the chip design, the functional verification of hardware logic—especially newly added ACI instructions—relies on dedicated **test benches**. These test benches use C-based test cases tailored for specific functionalities. |
54 | | - |
55 | | -To facilitate development and validation of ACI-related firmware alongside hardware development, we provide a software environment in the **test** folder that allows direct execution of test bench test cases. |
56 | | - |
57 | | -> [!CAUTION] |
58 | | -> |
59 | | -> It is important to note that **FVP is not a cycle-accurate simulation model**. While `__cycleof__()` relies on system hardware counters such as SysTick or PMU for cycle measurement, these counters themselves are software-simulated in FVP and do not guarantee cycle accuracy. Therefore, the performance measurements obtained during simulation should be considered only as **rough estimates** rather than definitive benchmarks. |
60 | | -
|
61 | 106 |
|
62 | 107 |
|
63 | 108 | ## 1 Prepare the Environment |
|