update master branch from embench-2.0-branch #210
Open
lesteral wants to merge 49 commits into embench:master from lesteral:master
This change removes all floating-point operations from the benchmark,
and reduces the size of the x86 executable to 57k. It also enables
the use of deeper trees (max_depth increased from 4 to 5), which
slightly increases the complexity of the benchmark. Overall
accuracy on the 8x8 downscaled MNIST dataset is 95.82%.
Update xgboost benchmark to use uint8-quantized weights
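The integer-only idea in this commit can be sketched roughly as follows. This is not the benchmark's code: the node layout, the helper names `quantize_uint8`, `predict_tree` and `classify`, and the toy trees are all assumptions made for illustration. The sketch also assumes that every class uses the same number of trees and one shared scale and offset, so comparing integer sums picks the same class as comparing the original floating-point sums would.

```python
def quantize_uint8(weights):
    """Offline step: map float leaf weights onto 0..255 (hypothetical helper)."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0  # avoid dividing by zero for constant weights
    return [round((w - lo) / scale) for w in weights], scale, lo


# Assumed node layout, for illustration only:
#   internal node: (feature_index, threshold, left_child, right_child)
#   leaf node:     (-1, quantized_weight, 0, 0)
def predict_tree(nodes, pixels):
    """Walk one tree using only integer comparisons; return its uint8 leaf weight."""
    i = 0
    while nodes[i][0] >= 0:
        feature, threshold, left, right = nodes[i]
        i = left if pixels[feature] < threshold else right
    return nodes[i][1]


def classify(trees_per_class, pixels):
    """Sum the integer leaf weights per class and return the best class index."""
    scores = [sum(predict_tree(tree, pixels) for tree in trees)
              for trees in trees_per_class]
    return scores.index(max(scores))


# Toy two-class model with one depth-1 tree per class (made-up values).
tree_a = [(0, 128, 1, 2), (-1, 10, 0, 0), (-1, 200, 0, 0)]
tree_b = [(1, 64, 1, 2), (-1, 180, 0, 0), (-1, 20, 0, 0)]
model = [[tree_a], [tree_b]]
```

Only `quantize_uint8` touches floating point, and in this sketch it would run once on the host when the model is exported, so the code executed on the target stays integer-only.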
If we call exit, we end up pulling in the C standard library.
* support/beebsc.c: Use assert_beebs rather than assert with init_heap_beebs.
* support/beebsc.h: Rewrite assert_beebs to not use exit.
Signed-off-by: Jeremy Bennett <[email protected]>
We separate out the CPU_MHZ into its two roles. The first uses GLOBAL_SCALE_FACTOR to scale the benchmarks when building, so each runs in around 4 seconds. The second is to work out the Embench score per MHz.
We now scale the benchmarks with two nested loops, one for the LOCAL_SCALE_FACTOR and one for the GLOBAL_SCALE_FACTOR. This avoids overflowing the loop count on 8/16-bit architectures, while still allowing us to scale up to modern big fast machines.
We adjust LOCAL_SCALE_FACTOR values for the benchmarks kept from Embench IoT 1.0 to take account of improvements in compiler performance.
* baseline-data/speed.json: Updated for Embench 2.0.
* benchmark_speed.py: Script updated for new GLOBAL_SCALE_FACTOR; remove parallel execution; new options to generate MD and CSV output; generate total and per MHz scores for relative results.
* doc/README.md: Updated to document GLOBAL_SCALE_FACTOR.
* examples/arm/stm32f4-discovery/README.md: Updated to use GLOBAL_SCALE_FACTOR.
* pylib/embench_core.py: Add MD and CSV to class output_format; move stats output functions to benchmark_speed.py.
* pylib/run_stm32f4-discovery.py: Move --cpu_mhz to benchmark_speed.py, pass args to functions.
* sconstruct.py: Add --gsf option and help text, remove trailing whitespace.
* src/aha-mont64/mont64.c: Use LOCAL_SCALE_FACTOR and GLOBAL_SCALE_FACTOR in nested loop to scale performance.
* src/crc32/crc_32.c: Likewise.
* src/depthconv/depthconv.c: Likewise.
* src/edn/libedn.c: Likewise.
* src/huffbench/libhuffbench.c: Likewise.
* src/matmult-int/matmult-int.c: Likewise.
* src/md5sum/md5.c: Likewise.
* src/nettle-aes/nettle-aes.c: Likewise.
* src/nettle-sha256/nettle-sha256.c: Likewise.
* src/nsichneu/libnsichneu.c: Likewise.
* src/picojpeg/picojpeg_test.c: Likewise.
* src/qrduino/qrtest.c: Likewise.
* src/sglib-combined/combined.c: Likewise.
* src/slre/libslre.c: Likewise.
* src/statemate/libstatemate.c: Likewise.
* src/tarfind/tarfind.c: Likewise.
* src/ud/libud.c: Likewise.
* src/wikisort/libwikisort.c: Likewise.
* src/xgboost/testbench.c: Likewise.
Signed-off-by: Jeremy Bennett <[email protected]>
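The nested-loop scaling described in this commit can be modelled in a few lines. The benchmarks themselves are written in C; the following is only a Python model of the counting argument, and the names `run_scaled`, `counters_fit_int16` and `score_per_mhz` are hypothetical. The per-MHz calculation assumes the score is simply divided by the clock rate, which the commit message implies but does not spell out.

```python
INT16_MAX = 32767  # largest value a signed 16-bit loop counter can hold


def run_scaled(body, local_scale_factor, global_scale_factor):
    """Call body() LOCAL * GLOBAL times using two nested counters."""
    runs = 0
    for _ in range(global_scale_factor):      # outer loop: GLOBAL_SCALE_FACTOR
        for _ in range(local_scale_factor):   # inner loop: LOCAL_SCALE_FACTOR
            body()
            runs += 1
    return runs


def counters_fit_int16(local_scale_factor, global_scale_factor):
    """True if each counter on its own stays within a signed 16-bit range."""
    return (local_scale_factor <= INT16_MAX
            and global_scale_factor <= INT16_MAX)


def score_per_mhz(score, cpu_mhz):
    """Assumed normalisation: divide the score by the clock rate in MHz."""
    return score / cpu_mhz
```

For example, a local factor of 1000 and a global factor of 100 give 100000 iterations in total, which a single 16-bit counter could not hold, yet each of the two nested counters stays well inside the 16-bit range.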
* sconstruct.py: Set up the environment from the parent process.
Signed-off-by: Jeremy Bennett <[email protected]>
The previous data fell foul of the scons config not importing the environment, so it was in fact generated with the system GCC 13.2. This update has correct data for GCC 14.1 and adjusts the local scale factors accordingly.
* baseline-data/speed.json: Updated data for GCC 14.1.
* src/aha-mont64/mont64.c: Adjust LOCAL_SCALE_FACTOR.
* src/edn/libedn.c: Likewise.
* src/huffbench/libhuffbench.c: Likewise.
* src/matmult-int/matmult-int.c: Likewise.
* src/md5sum/md5.c: Likewise.
* src/nettle-aes/nettle-aes.c: Likewise.
* src/nettle-sha256/nettle-sha256.c: Likewise.
* src/sglib-combined/combined.c: Likewise.
* src/sglib-combined/sglib.h: Likewise, also replace assert by assert_beebs throughout.
* src/slre/libslre.c: Adjust LOCAL_SCALE_FACTOR.
* src/statemate/libstatemate.c: Likewise.
* src/tarfind/tarfind.c: Likewise.
* src/ud/libud.c: Likewise.
* src/wikisort/libwikisort.c: Likewise.
Signed-off-by: Jeremy Bennett <[email protected]>
* baseline-data/size.json: Updated values for Embench 2.0.
* benchmark_size.py: Extend to measure BSS separately, add CSV and Markdown output formats, generate statistics for relative runs.
Signed-off-by: Jeremy Bennett <[email protected]>
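As a rough picture of what separate BSS reporting in CSV and Markdown might look like, here is a minimal sketch. It does not reproduce benchmark_size.py: the column names, the row layout, the helper names `to_csv` and `to_markdown`, and the sample numbers are all made up for illustration.

```python
import csv
import io

# Hypothetical column set: BSS is reported in its own column.
COLUMNS = ("benchmark", "text", "data", "bss")


def to_csv(rows):
    """Render rows (a list of dicts keyed by COLUMNS) as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(COLUMNS)
    for row in rows:
        writer.writerow([row[c] for c in COLUMNS])
    return buf.getvalue()


def to_markdown(rows):
    """Render the same rows as a Markdown table."""
    lines = [
        "| " + " | ".join(COLUMNS) + " |",
        "|" + "|".join(" --- " for _ in COLUMNS) + "|",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row[c]) for c in COLUMNS) + " |")
    return "\n".join(lines) + "\n"


# Illustrative values only, not measured data.
sample = [{"benchmark": "demo", "text": 100, "data": 4, "bss": 16}]
```

Keeping both renderers driven by the same column tuple means that adding a section to the report only requires changing `COLUMNS`.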
* benchmark_speed.py (benchmark_speed): Ensure res is set before use.
* pylib/run_stm32f4-discovery.py: Add dictionary of exported functions.
Signed-off-by: Jeremy Bennett <[email protected]>
We have updated the defaults to use garbage collection of
unused sections. The baseline data for speed is from a run configured
with:
scons --config-dir=examples/arm/stm32f4-discovery/ \
cc=arm-none-eabi-gcc \
cflags='-O2 -mcpu=cortex-m4 -mthumb -mfloat-abi=soft -ffunction-sections -fdata-sections' \
ldflags='-O2 -Wl,--gc-sections -mcpu=cortex-m4 -mthumb -mfloat-abi=soft -T${CONFIG_DIR}/STM32F407IGHX_FLASH.ld -L${CONFIG_DIR} -static -nostartfiles' \
user_libs='m startup' gsf=16
with results collected using:
./benchmark_speed.py --target-module run_stm32f4-discovery \
--gdb-command gdb-multiarch --cpu-mhz 16 --gsf 16 --absolute \
--baseline-output
The baseline for size is from a run configured with:
scons --config-dir=examples/arm/stm32f4-discovery/ cc=arm-none-eabi-gcc \
cflags='-Os -ffunction-sections -fdata-sections -mcpu=cortex-m4 -mfloat-abi=soft -mthumb ' \
ldflags='-Os -Wl,--gc-sections -mcpu=cortex-m4 -mfloat-abi=soft -mthumb -T${CONFIG_DIR}/STM32F407IGHX_FLASH.ld -L${CONFIG_DIR} -static -nostartfiles' \
user_libs='m startup' gsf=1
with results collected using:
./benchmark_size.py --absolute --baseline-output
* baseline-data/size.json: Update data.
* baseline-data/speed.json: Likewise.
Signed-off-by: Jeremy Bennett <[email protected]>
This is a read-through to clarify wording and ensure consistency for Embench 2.0 and its Arm reference board.
* README.md: Updated for Embench 2.0.
* doc/Makefile: Correct spelling of hunspell dictionary.
* doc/README.md: Updated for Embench 2.0.
* doc/custom.wordlist: Add new words needed for updated documentation.
* examples/arm/stm32f4-discovery/README.md: Updated for Embench 2.0.
Signed-off-by: Jeremy Bennett <[email protected]>
* examples/riscv32/cv32e40pv2fpga/README.md: Created.
* examples/riscv32/cv32e40pv2fpga/boardsupport.c: Created.
* examples/riscv32/cv32e40pv2fpga/boardsupport.h: Created.
* examples/riscv32/cv32e40pv2fpga/link.ld: Created.
* examples/riscv32/cv32e40pv2fpga/openocd-nexys-hs2.cfg: Created.
* examples/riscv32/cv32e40pv2fpga/unilink.ld: Created.
Signed-off-by: Jeremy Bennett <[email protected]>
@jeremybennett - Here's a PR to update the "master" branch as per our discussion yesterday. Regards, Lester