@lesteral

@jeremybennett - Here's the PR to update the "master" branch, as per our discussion yesterday. Regards, Lester

I-mikan-I added 30 commits March 4, 2024 12:36
Remove minver;
Remove st;
Update SPDX Identifiers
Setup toolchain (STM32CubeF4 + gcc based).
Update python scripts.
Update example README.
ZSusskind and others added 19 commits May 1, 2024 11:17
This change removes all floating-point operations from the benchmark,
     and reduces the size of the x86 executable to 57k. It also enables
     the use of deeper trees (max_depth increased from 4 to 5), which
     slightly increases the complexity of the benchmark. Overall
     accuracy on the 8x8 downscaled MNIST dataset is 95.82%.
Update xgboost benchmark to use uint8-quantized weights
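
To illustrate what an integer-only tree evaluation with uint8-quantized values can look like, here is a minimal sketch. It is not taken from the xgboost benchmark source: the `node_t` layout, the `eval_tree` function and the `demo_tree` data are all hypothetical names and values chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical node layout, assumed for this sketch only. */
typedef struct {
  uint8_t feature;   /* index of the input feature to compare */
  uint8_t threshold; /* quantized split threshold */
  int8_t left;       /* child taken when value < threshold; -1 marks a leaf */
  int8_t right;      /* child taken otherwise */
  uint8_t weight;    /* quantized leaf weight, only meaningful on leaves */
} node_t;

/* Walk one tree using only 8-bit integer comparisons; no floating point. */
static uint8_t eval_tree(const node_t *tree, const uint8_t *x)
{
  int i = 0;
  while (tree[i].left >= 0)
    i = (x[tree[i].feature] < tree[i].threshold) ? tree[i].left
                                                 : tree[i].right;
  return tree[i].weight;
}

/* Illustrative depth-1 tree: split feature 0 at 128. */
static const node_t demo_tree[3] = {
  { 0, 128, 1, 2, 0 },    /* root */
  { 0, 0, -1, -1, 10 },   /* leaf reached when x[0] < 128 */
  { 0, 0, -1, -1, 200 },  /* leaf reached when x[0] >= 128 */
};
```

Because every comparison and weight is an 8-bit integer, a build along these lines needs no floating-point support code, which is consistent with the smaller executable described above.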
If we call exit, we end up pulling in the C standard library.

	* support/beebsc.c: Use assert_beebs rather than assert with
	init_heap_beebs.
	* support/beebsc.h: rewrite assert_beebs to not use exit.

Signed-off-by: Jeremy Bennett <[email protected]>
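
One way an assertion macro can avoid `exit`, and so avoid pulling in the C standard library, is to route failures through a small local hook. The following is only a sketch under that assumption and is not the actual definition in support/beebsc.h; `beebs_fail` and `beebs_failures` are hypothetical names.

```c
/* Count of failed checks; hypothetical, for this sketch only. */
static volatile int beebs_failures = 0;

/* Hypothetical failure hook. A bare-metal port might instead spin
   forever or trigger a breakpoint here. */
static void beebs_fail(void)
{
  beebs_failures++;
}

/* Assertion that never references exit(), so no libc is required. */
#define assert_beebs(expr) \
  do { if (!(expr)) beebs_fail(); } while (0)
```

Note that this sketch continues execution after a failed check, which may not match the behavior chosen in the real change.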
We separate out the CPU_MHZ into its two roles.  The first uses
GLOBAL_SCALE_FACTOR to scale the benchmarks when building so each runs in
around 4 seconds.  The second is to work out the Embench score per MHz.

We now scale the benchmarks, with two nested loops, one for the
LOCAL_SCALE_FACTOR and one for the GLOBAL_SCALE_FACTOR.  This allows us to
not overflow the loop count with 8/16-bit architectures, while being able
to scale up to modern big fast machines.

We adjust LOCAL_SCALE_FACTOR values for the benchmarks kept from Embench
IoT 1.0 to take account of improvements in compiler performance.
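
The nested scaling loops described above can be sketched as follows. This is an illustration only: the scale factor values, `benchmark_body`, `run_scaled` and `work_done` are hypothetical and not taken from any benchmark source.

```c
#include <stdint.h>

#define LOCAL_SCALE_FACTOR 100  /* illustrative per-benchmark value */
#ifndef GLOBAL_SCALE_FACTOR
#define GLOBAL_SCALE_FACTOR 16  /* illustrative; set at build time in practice */
#endif

/* Stand-in for one iteration of real benchmark work. */
static volatile uint32_t work_done = 0;

static void benchmark_body(void)
{
  work_done++;
}

/* Each counter stays within its own small range, so neither loop
   needs to hold the product of the two scale factors. */
static void run_scaled(void)
{
  int gsf, lsf;

  for (gsf = 0; gsf < GLOBAL_SCALE_FACTOR; gsf++)
    for (lsf = 0; lsf < LOCAL_SCALE_FACTOR; lsf++)
      benchmark_body();
}
```

With a single loop, the bound would be the product of the two factors, which can exceed 32767 on a target with a 16-bit int. Keeping two separate counters avoids that while still allowing a large total iteration count.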

	* baseline-data/speed.json: Updated for Embench 2.0.
	* benchmark_speed.py: Script updated for new GLOBAL_SCALE_FACTOR;
	remove parallel execution; new options to generate MD and CSV
	output; generate total and per MHz scores for relative results.
	* doc/README.md: Updated to document GLOBAL_SCALE_FACTOR.
	* examples/arm/stm32f4-discovery/README.md: Updated to use
	GLOBAL_SCALE_FACTOR.
	* pylib/embench_core.py: Add MD and CSV to class output_format;
	move stats output functions to benchmark_speed.py.
	* pylib/run_stm32f4-discovery.py: Move --cpu_mhz to
	benchmark_speed.py, pass args to functions.
	* sconstruct.py: Add --gsf option and help text, remove trailing
	whitespace.
	* src/aha-mont64/mont64.c: Use LOCAL_SCALE_FACTOR and
	GLOBAL_SCALE_FACTOR in nested loop to scale performance.
	* src/crc32/crc_32.c: Likewise.
	* src/depthconv/depthconv.c: Likewise.
	* src/edn/libedn.c: Likewise.
	* src/huffbench/libhuffbench.c: Likewise.
	* src/matmult-int/matmult-int.c: Likewise.
	* src/md5sum/md5.c: Likewise.
	* src/nettle-aes/nettle-aes.c: Likewise.
	* src/nettle-sha256/nettle-sha256.c: Likewise.
	* src/nsichneu/libnsichneu.c: Likewise.
	* src/picojpeg/picojpeg_test.c: Likewise.
	* src/qrduino/qrtest.c: Likewise.
	* src/sglib-combined/combined.c: Likewise.
	* src/slre/libslre.c: Likewise.
	* src/statemate/libstatemate.c: Likewise.
	* src/tarfind/tarfind.c: Likewise.
	* src/ud/libud.c: Likewise.
	* src/wikisort/libwikisort.c: Likewise.
	* src/xgboost/testbench.c: Likewise.

Signed-off-by: Jeremy Bennett <[email protected]>
	* sconstruct.py: Set up the environment from the parent process.

Signed-off-by: Jeremy Bennett <[email protected]>
The previous data fell foul of the scons config not importing the
environment, so it was in fact generated with the system GCC 13.2.  This
update has the correct data for GCC 14.1, and adjusts local scale factors
accordingly.

	* baseline-data/speed.json: Updated data for GCC 14.1.
	* src/aha-mont64/mont64.c: Adjust LOCAL_SCALE_FACTOR.
	* src/edn/libedn.c: Likewise.
	* src/huffbench/libhuffbench.c: Likewise.
	* src/matmult-int/matmult-int.c: Likewise.
	* src/md5sum/md5.c: Likewise.
	* src/nettle-aes/nettle-aes.c: Likewise.
	* src/nettle-sha256/nettle-sha256.c: Likewise.
	* src/sglib-combined/combined.c: Likewise.
	* src/sglib-combined/sglib.h: Likewise, also replace assert by
	assert_beebs throughout.
	* src/slre/libslre.c: Adjust LOCAL_SCALE_FACTOR.
	* src/statemate/libstatemate.c: Likewise.
	* src/tarfind/tarfind.c: Likewise.
	* src/ud/libud.c: Likewise.
	* src/wikisort/libwikisort.c: Likewise.

Signed-off-by: Jeremy Bennett <[email protected]>
	* baseline-data/size.json: Updated values for Embench 2.0.
	* benchmark_size.py: Extend to measure BSS separately, add CSV and
	MarkDown output formats, generate statistics for relative runs.

Signed-off-by: Jeremy Bennett <[email protected]>
	* benchmark_speed.py (benchmark_speed): Ensure res is set before
	use.
	* pylib/run_stm32f4-discovery.py: Add dictionary of exported
	functions.

Signed-off-by: Jeremy Bennett <[email protected]>
We have updated the defaults to be based on garbage collection of unused
sections. The baseline data for speed is from a run configured with:

  scons --config-dir=examples/arm/stm32f4-discovery/ \
    cc=arm-none-eabi-gcc \
    cflags='-O2 -mcpu=cortex-m4 -mthumb -mfloat-abi=soft -ffunction-sections -fdata-sections' \
    ldflags='-O2 -Wl,--gc-sections -mcpu=cortex-m4 -mthumb -mfloat-abi=soft -T${CONFIG_DIR}/STM32F407IGHX_FLASH.ld -L${CONFIG_DIR} -static -nostartfiles' \
    user_libs='m startup' gsf=16

with results collected using:

  ./benchmark_speed.py --target-module run_stm32f4-discovery \
    --gdb-command gdb-multiarch --cpu-mhz 16 --gsf 16 --absolute \
    --baseline-output

The baseline for size is from a run configured with:

  scons --config-dir=examples/arm/stm32f4-discovery/ cc=arm-none-eabi-gcc \
    cflags='-Os -ffunction-sections -fdata-sections -mcpu=cortex-m4 -mfloat-abi=soft -mthumb' \
    ldflags='-Os -Wl,--gc-sections -mcpu=cortex-m4 -mfloat-abi=soft -mthumb -T${CONFIG_DIR}/STM32F407IGHX_FLASH.ld -L${CONFIG_DIR} -static -nostartfiles' \
    user_libs='m startup' gsf=1

with results collected using:

  ./benchmark_size.py --absolute --baseline-output

	* baseline-data/size.json: Update data.
	* baseline-data/speed.json: Likewise.

Signed-off-by: Jeremy Bennett <[email protected]>
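
As a small illustration of what section garbage collection acts on, consider the sketch below. The function names are hypothetical. When compiled with -ffunction-sections, each function is placed in its own section, and linking with -Wl,--gc-sections allows the linker to discard sections that nothing reachable refers to.

```c
/* Referenced from entry(), so its section is kept. */
int used_helper(int x)
{
  return x * 2;
}

/* Never referenced; with -ffunction-sections and --gc-sections the
   linker is free to drop the section holding this function. */
int unused_helper(int x)
{
  return x + 1;
}

/* Hypothetical entry point standing in for a benchmark's main code. */
int entry(void)
{
  return used_helper(21);
}
```

Without those flags, both helpers would typically share one text section and the unused one would still count towards the measured size.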
This is a read-through to clarify wording and ensure consistency for
Embench 2.0 and its Arm reference board.

	* README.md: Updated for Embench 2.0.
	* doc/Makefile: Correct spelling of hunspell dictionary.
	* doc/README.md: Updated for Embench 2.0.
	* doc/custom.wordlist: Add new words needed for updated documentation.
	* examples/arm/stm32f4-discovery/README.md: Updated for Embench 2.0.

Signed-off-by: Jeremy Bennett <[email protected]>
	* examples/riscv32/cv32e40pv2fpga/README.md: Created.
	* examples/riscv32/cv32e40pv2fpga/boardsupport.c: Created.
	* examples/riscv32/cv32e40pv2fpga/boardsupport.h: Created.
	* examples/riscv32/cv32e40pv2fpga/link.ld: Created.
	* examples/riscv32/cv32e40pv2fpga/openocd-nexys-hs2.cfg: Created.
	* examples/riscv32/cv32e40pv2fpga/unilink.ld: Created.

Signed-off-by: Jeremy Bennett <[email protected]>