Commit fbea718

committed
minor update
1 parent c5d8140 commit fbea718

2 files changed

+65
-21
lines changed

GPR/README.md

Lines changed: 10 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -25,6 +25,16 @@ uint32_t popc_u32 (uint32_t x) {
2525
The [White-Paper Arm Custom Instructions: Enabling
2626
Innovation and Greater Flexibility on Arm](https://armkeil.blob.core.windows.net/developer/Files/pdf/white-paper/arm-custom-instructions-wp.pdf) uses this example and shows a reduction of the execution time from 25 cycles to just 1 cycle using ACI.
2727

28+
In the chip design, the functional verification of hardware logic—especially newly added ACI instructions—relies on dedicated **test benches**. These test benches use C-based test cases tailored for specific functionalities.
29+
30+
To facilitate development and validation of ACI-related firmware alongside hardware development, we provide a software environment in the **test** folder that allows direct execution of test bench test cases.
31+
32+
> [!CAUTION]
33+
>
34+
> It is important to note that **FVP is not a cycle-accurate simulation model**. While `__cycleof__()` relies on system hardware counters such as SysTick or PMU for cycle measurement, these counters themselves are software-simulated in FVP and do not guarantee cycle accuracy. Therefore, the performance measurements obtained during simulation should be considered only as **rough estimates** rather than definitive benchmarks.
35+
36+
37+
2838
## Steps to Custom Instruction
2939

3040
The following steps describe how to implement the `popc_u32` function as a custom instruction. Each step is explained in more detail and in a related directory.
@@ -107,17 +117,6 @@ To simulate the ACI instruction set, **ACI-GetStarted** contains a makefile proj
107117
| plugin | the CDE plugin makefile project |
108118
| test | test project |
109119

110-
In the **example** folder, we implemented a Hamming distance computation function in C and provided an ACI-accelerated version for comparison. During this process, we use `__cycleof__()` to measure the CPU cycle count for both algorithms.
111-
112-
In the chip design, the functional verification of hardware logic—especially newly added ACI instructions—relies on dedicated **test benches**. These test benches use C-based test cases tailored for specific functionalities.
113-
114-
To facilitate development and validation of ACI-related firmware alongside hardware development, we provide a software environment in the **test** folder that allows direct execution of test bench test cases.
115-
116-
> [!CAUTION]
117-
>
118-
> It is important to note that **FVP is not a cycle-accurate simulation model**. While `__cycleof__()` relies on system hardware counters such as SysTick or PMU for cycle measurement, these counters themselves are software-simulated in FVP and do not guarantee cycle accuracy. Therefore, the performance measurements obtained during simulation should be considered only as **rough estimates** rather than definitive benchmarks.
119-
120-
121120

122121
## 1 Prepare the Environment
123122

MVE/README.md

Lines changed: 55 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,58 @@
1+
# MVE-ACI Example
2+
3+
4+
This example uses an alpha-blending algorithm that is widely used in 2D image processing, **image-copying-with-an-alpha-mask**, as a case study. We provide three versions of the same algorithm: a pure C implementation, a Helium-accelerated version, and a Helium-ACI-accelerated version. The copied output is displayed on an LCD panel simulated by the FVP, allowing users to visually compare and inspect the results of these three implementations. Additionally, the `__cycleof__()` function is used to measure the CPU cycle count for each version.
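As a rough illustration of what the pure C variant of image-copying-with-an-alpha-mask might look like, the sketch below blends RGB565 source pixels onto a target buffer using one 8-bit mask value per pixel. The function names, the buffer layout, and the rounding scheme are assumptions made for this sketch; they are not taken from the **example** folder.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: blend one 5- or 6-bit colour channel.
 * alpha = 255 keeps the source value, alpha = 0 keeps the target value. */
static inline uint16_t blend_channel(uint16_t src, uint16_t dst, uint8_t alpha)
{
    /* Map 0..255 onto 0..256 so that full opacity is exact after ">> 8". */
    uint32_t a = (uint32_t)alpha + (alpha >> 7);
    return (uint16_t)((src * a + dst * (256u - a)) >> 8);
}

/* Hypothetical scalar reference: copy `count` RGB565 pixels from `src`
 * onto `dst`, weighting each pixel by the matching entry in `mask`. */
void rgb565_copy_with_mask(const uint16_t *src, const uint8_t *mask,
                           uint16_t *dst, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint16_t s = src[i];
        uint16_t d = dst[i];

        /* Unpack the 5-6-5 channels, blend each one, then repack. */
        uint16_t r = blend_channel((s >> 11) & 0x1F, (d >> 11) & 0x1F, mask[i]);
        uint16_t g = blend_channel((s >> 5) & 0x3F, (d >> 5) & 0x3F, mask[i]);
        uint16_t b = blend_channel(s & 0x1F, d & 0x1F, mask[i]);

        dst[i] = (uint16_t)((r << 11) | (g << 5) | b);
    }
}
```

The per-pixel unpack, blend, and repack sequence is the kind of work that a vectorised or custom-instruction version would aim to collapse into fewer operations.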
5+
6+
The [White-Paper Innovate by Customized Instructions, but Without Fragmenting the Ecosystem](https://armkeil.blob.core.windows.net/developer/Files/pdf/white-paper/arm-custom-instructions-without-fragmentation-whitepaper.pdf) reports that the MVE-ACI-accelerated alpha-blending algorithm is 19x as fast as the scalar implementation and 5.9x as fast as the Helium-accelerated implementation.
7+
8+
In the chip design, the functional verification of hardware logic—especially newly added ACI instructions—relies on dedicated **test benches**. These test benches use C-based test cases tailored for specific functionalities.
9+
10+
To facilitate development and validation of ACI-related firmware alongside hardware development, we provide a software environment in the **test** folder that allows direct execution of test bench test cases.
11+
12+
> [!CAUTION]
13+
>
14+
> It is important to note that **FVP is not a cycle-accurate simulation model**. While `__cycleof__()` relies on system hardware counters such as SysTick or PMU for cycle measurement, these counters themselves are software-simulated in FVP and do not guarantee cycle accuracy. Therefore, the performance measurements obtained during simulation should be considered only as **rough estimates** rather than definitive benchmarks.
15+
16+
17+
18+
## Steps to Custom Instruction
19+
20+
The following steps describe how to implement the `popc_u32` function as a custom instruction. Each step is explained in more detail and in a related directory.
21+
22+
| Step | Description | Directory |
23+
| :----: | :----------------------------------------------------------- | :-------- |
24+
| **1.** | [Map custom instructions](#map-custom-instructions) to Custom Datapath Extension (CDE) | inc |
25+
| **2.** | [Create plugin for AVH-FVP simulation models](#create-avh-fvp-plugin) that adds custom instructions | plugin |
26+
| **3.** | [Create test code to validate](#create-test-code) the correctness of the AVH-FVP simulation | test |
27+
| **4.** | [Use custom instructions](#use-custom-instructions) in your algorithm to estimate performance gains | example |
28+
29+
The steps for creating the processor hardware are not described here, but the test code created in step 3 can also be reused for hardware verification.
30+
31+
### Map Custom Instructions
32+
33+
The include file `./inc/aci_gpr_lib.h` contains the ACI mapping for the `popc_u32` instruction. In this example, the `CX1A` intrinsic function is used with `ID=0` and `imm=0`. Further instructions may be defined with a different `imm` value.
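A mapping of this kind could look roughly like the sketch below. It uses the ACLE intrinsic `__arm_cx1a` from `<arm_cde.h>` with coprocessor 0 and `imm=0` when the compiler reports CDE support through `__ARM_FEATURE_CDE`, and falls back to plain C otherwise so the same source also builds on a host. The name `popc_u32_sketch` and the fallback are assumptions for illustration; the actual header may be organised differently.

```c
#include <stdint.h>

#if defined(__ARM_FEATURE_CDE)
#include <arm_cde.h>

/* CX1A on coprocessor 0 with imm = 0: the accumulator register carries
 * x into the instruction and the bit count back out. */
static inline uint32_t popc_u32_sketch(uint32_t x)
{
    return __arm_cx1a(0, x, 0);
}
#else

/* Assumed host fallback: clear the lowest set bit until none remain. */
static inline uint32_t popc_u32_sketch(uint32_t x)
{
    uint32_t count = 0;
    while (x != 0) {
        x &= x - 1;
        count++;
    }
    return count;
}
#endif
```

Keeping a software fallback behind the same function name lets the calling code stay identical whether or not the target provides the custom instruction.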
34+
35+
The header file also defines the functions:
36+
37+
- `aci_gpr_init` to enable the related ACI accelerator.
38+
- `aci_gpr_NS_access` which is called in secure mode to enable access in non-secure mode.
39+
40+
### Create AVH-FVP Plugin
41+
42+
The simulation of the CX1A instruction is implemented in the module `./plugin/cde_plugin.cpp` with the function `aci_fvp::exec_cx1`. For `imm=0`, the simulation code for `popc_u32` is called.
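The dispatch on `imm` can be pictured with the following sketch. It is written in C for consistency with the other examples, and `sim_exec_cx1a`, its signature, and `sim_popc_u32` are hypothetical; they do not reproduce the real `aci_fvp::exec_cx1` interface, which is defined by the plugin sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical behavioural model of the popcount operation, using the
 * classic parallel (SWAR) bit-counting steps. */
static uint32_t sim_popc_u32(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit sums */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit sums */
    return (x * 0x01010101u) >> 24;                   /* add the four bytes */
}

/* Hypothetical dispatcher: select the simulated operation by `imm`.
 * Returns false for an immediate that has no instruction assigned. */
bool sim_exec_cx1a(uint32_t imm, uint32_t acc, uint32_t *result)
{
    switch (imm) {
    case 0:
        *result = sim_popc_u32(acc);
        return true;
    default:
        return false;
    }
}
```

Further instructions would add more `case` labels, one per `imm` value.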
43+
44+
### Create Test Code
45+
46+
The test code verifies the execution of the `popc_u32` instruction. This test may also be reused later to validate the hardware implementation.
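One possible shape for such a check is a table of known input and output pairs that is run against whichever `popc_u32` implementation is under test. Everything named in the sketch below (`popc_vectors`, `popc_run_vectors`, `popc_reference`) is an assumption for illustration and is not taken from the **test** folder.

```c
#include <stddef.h>
#include <stdint.h>

/* Signature shared by the implementations under test. */
typedef uint32_t (*popc_fn_t)(uint32_t);

struct popc_vector {
    uint32_t input;
    uint32_t expected;
};

/* Hand-checked vectors covering the edge cases of a 32-bit popcount. */
static const struct popc_vector popc_vectors[] = {
    {0x00000000u, 0},
    {0x00000001u, 1},
    {0x80000000u, 1},
    {0xF0F0F0F0u, 16},
    {0xFFFFFFFFu, 32},
};

/* Run every vector through `fn` and return the number of mismatches. */
size_t popc_run_vectors(popc_fn_t fn)
{
    size_t failures = 0;
    for (size_t i = 0; i < sizeof popc_vectors / sizeof popc_vectors[0]; i++) {
        if (fn(popc_vectors[i].input) != popc_vectors[i].expected) {
            failures++;
        }
    }
    return failures;
}

/* Stand-in implementation so the sketch builds on a host: test each bit. */
uint32_t popc_reference(uint32_t x)
{
    uint32_t count = 0;
    for (int bit = 0; bit < 32; bit++) {
        count += (x >> bit) & 1u;
    }
    return count;
}
```

Because the runner only takes a function pointer, the same vectors can be applied to the simulated instruction and, later, to the hardware implementation.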
47+
48+
### Use Custom Instructions
49+
50+
To use the custom instruction, just call the function `popc_u32` that is defined in the include file `./inc/aci_gpr_lib.h`.
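As an illustration of calling a population count from an algorithm, the sketch below computes the Hamming distance between two word buffers. `popc_u32_standin` is a hypothetical placeholder so that the block compiles on its own; in a real project the call would go to `popc_u32` from the include file instead. The function name `hamming_distance_u32` is likewise an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder for popc_u32 so this sketch is self-contained. */
static uint32_t popc_u32_standin(uint32_t x)
{
    uint32_t count = 0;
    while (x != 0) {
        x &= x - 1; /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Hamming distance: number of bit positions in which a and b differ. */
uint32_t hamming_distance_u32(const uint32_t *a, const uint32_t *b, size_t words)
{
    uint32_t distance = 0;
    for (size_t i = 0; i < words; i++) {
        /* XOR marks the differing bits; the popcount tallies them. */
        distance += popc_u32_standin(a[i] ^ b[i]);
    }
    return distance;
}
```

The inner loop performs one population count per word, so replacing the placeholder with the custom instruction changes only that single call.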
51+
52+
------
53+
54+
55+
156

257

358
# Get Started with MVE-ACI using Fast Model
@@ -48,16 +103,6 @@ The AN552 implements a Helium-ACI instruction set for RGB565 image processing; h
48103
| plugin | the CDE plugin makefile project |
49104
| test | test project |
50105

51-
In the **example** folder, we use a typical algorithm in graphics processing—**image-copying-with-an-alpha-mask**—as a case study. We provide three versions of the same algorithm: a pure C implementation, a Helium-accelerated version, and a Helium-ACI-accelerated version. The copied output is displayed on an LCD panel simulated by the FVP, allowing users to visually compare and inspect the effects of these three implementations. Additionally, the `__cycleof__()` function is used to measure the CPU cycle count for each version.
52-
53-
In the chip design, the functional verification of hardware logic—especially newly added ACI instructions—relies on dedicated **test benches**. These test benches use C-based test cases tailored for specific functionalities.
54-
55-
To facilitate development and validation of ACI-related firmware alongside hardware development, we provide a software environment in the **test** folder that allows direct execution of test bench test cases.
56-
57-
> [!CAUTION]
58-
>
59-
> It is important to note that **FVP is not a cycle-accurate simulation model**. While `__cycleof__()` relies on system hardware counters such as SysTick or PMU for cycle measurement, these counters themselves are software-simulated in FVP and do not guarantee cycle accuracy. Therefore, the performance measurements obtained during simulation should be considered only as **rough estimates** rather than definitive benchmarks.
60-
61106

62107

63108
## 1 Prepare the Environment
