

@domcharrier domcharrier commented Oct 28, 2021

FEATURES:

  • Initial support for acc declare

  • OpenACC (gpufort runtime)

    • offloaded loops:
      • add default clause handling
      • default strategy is present_or_copy if neither default(none) nor default(present) is specified.
    • Initial support for acc declare (module/subroutine/function/program variables, fixed-size, allocatable, pointer)
  • Add interoperable GPUFORT array datatype (up to 7 dimensions; autogenerated):

    • Manage host and device pointer pair (can be null)
      • Either wrap existing pointers or
      • Allocate (pinned) host memory if requested
      • Allocate device memory if requested
    • Provide H2D, D2H copy operations
    • Encode bounds and sizes of a Fortran array
    • Can be configured to perform H2D, D2H copy operations at init/destruction
    • In C++, equipped with operator()(int i1, int i2, ...) to support Fortran-style array indexing
      in C++ code without index macros.
  • Will be used by GPUFORT to construct interoperable derived types from non-interoperable device types.
    This will allow AoS syntax such as
    domains(5).cells(i).coord_x in GPUFORT C++ code, which is analogous to the Fortran expression
    domains(5)%cells(i)%coord_x.
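
A minimal C++ sketch of the Fortran-style indexing listed above, assuming a column-major layout with per-dimension lower bounds. `Array2D` is a hypothetical, host-only stand-in for the autogenerated GPUFORT type: it covers 2 dimensions instead of up to 7 and omits the device pointer and the H2D/D2H copies.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, host-only 2-D stand-in for the autogenerated GPUFORT array type.
// The real type supports up to 7 dimensions and also manages a device pointer.
template <typename T>
struct Array2D {
  T* data;        // wrapped host pointer (not owned)
  int lbound[2];  // Fortran lower bounds per dimension
  int extent[2];  // number of elements per dimension

  Array2D(T* ptr, int n1, int n2, int lb1 = 1, int lb2 = 1)
      : data(ptr), lbound{lb1, lb2}, extent{n1, n2} {}

  // Fortran-style indexing: shift by the lower bounds, then linearize
  // column-major (the first index varies fastest).
  T& operator()(int i1, int i2) {
    const int j1 = i1 - lbound[0];
    const int j2 = i2 - lbound[1];
    assert(j1 >= 0 && j1 < extent[0] && j2 >= 0 && j2 < extent[1]);
    return data[static_cast<std::size_t>(j1) +
                static_cast<std::size_t>(extent[0]) * static_cast<std::size_t>(j2)];
  }

  std::size_t size() const {
    return static_cast<std::size_t>(extent[0]) * static_cast<std::size_t>(extent[1]);
  }
};
```

For example, `Array2D<double> a(buf.data(), 2, 3)` mirrors `real(8) :: a(2,3)`, so `a(1,3)` addresses element 4 of the buffer. A device-side variant would additionally need `__host__ __device__` qualifiers under HIP, which this host-only sketch leaves out.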

BUGFIXES:

  • Fix crash when encountering top-level subroutines/functions.

* Reason: GPUFORT needs to know all variables in the loop kernel
  body to generate `present_or_copy`/`present_or_copyin` runtime calls
  for the variables not appearing in clauses.
* If no default clause is specified, present_or_copy is performed for all
  unmapped variables.
* If 'default(none)' is specified and not all variables are mapped, a warning
  is posted (an option to convert it to an error will be added).
* If 'default(present)' is specified,
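
The default-clause rules above can be sketched in C++ as follows. All names are hypothetical. The `default(present)` branch is an assumption based on the OpenACC specification (implicitly mapped arrays are treated as if they appeared in a `present` clause), since the text above does not spell out GPUFORT's behavior for that case.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical names; they mirror the rules listed above.
enum class DefaultClause { Unspecified, None, Present };
enum class ImplicitAction { PresentOrCopy, Present, WarnUnmapped };

// Variables used in the kernel body that appear in no data clause.
std::vector<std::string> unmapped_variables(const std::vector<std::string>& body_vars,
                                            const std::set<std::string>& clause_vars) {
  std::vector<std::string> result;
  for (const std::string& var : body_vars) {
    if (clause_vars.count(var) == 0) result.push_back(var);
  }
  return result;
}

// Action taken for each unmapped variable, depending on the default clause.
ImplicitAction implicit_action(DefaultClause clause) {
  switch (clause) {
    case DefaultClause::None:
      return ImplicitAction::WarnUnmapped;  // default(none): post a warning
    case DefaultClause::Present:
      return ImplicitAction::Present;  // assumption (OpenACC semantics)
    case DefaultClause::Unspecified:
      break;
  }
  return ImplicitAction::PresentOrCopy;  // no default clause
}
```

Collecting the unmapped set first is what requires knowing every variable in the kernel body, as stated in the reason above.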

TODO: Take the parent data directive into account to prevent
some unnecessary runtime calls (if the current behaviour turns out to be a performance issue).
* The vector-add-declare/vector-add.f90 example with declare in a program seems
  to work correctly.
* Some more care is required to enable declare in subroutines.
FEATURE: gpufort acc runtime behavior can now be influenced/tuned
 via the following environment variables:
 GPUFORT_LOG_LEVEL                    (default=0)
   Log level. The maximum log level used in the code is 3.
 GPUFORT_MAX_QUEUES                   (default=64)
   Maximum number of async queues.
 GPUFORT_INITIAL_RECORD_LIST_CAPACITY (default=4096)
   Mapping records are managed via a vector. This variable
   specifies the initial vector capacity.
   If the maximum capacity is reached, the vector
   capacity is doubled.
 GPUFORT_BLOCK_SIZE                   (default=32)
   All device arrays are allocated as multiples of
   this block size.
 GPUFORT_REUSE_THRESHOLD              (default=0.9)
   Reuse an existing device buffer if the requested
   buffer size is greater than GPUFORT_REUSE_THRESHOLD x the size
   of the existing buffer.
 GPUFORT_NUM_REFS_TO_DEALLOCATE       (default=-5)
   Number of references for which a released array
   will be deallocated.
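
A minimal C++ sketch of how such variables could be read with `std::getenv`, using the names and defaults listed above. `env_long`, `env_double`, and `RuntimeOptions` are hypothetical names, and falling back to the default on unparsable values is an assumption, not documented GPUFORT behavior.

```cpp
#include <cstdlib>

// Parse an integer environment variable; fall back to `fallback` if the
// variable is unset, empty, or not a complete integer (assumed policy).
long env_long(const char* name, long fallback) {
  const char* value = std::getenv(name);
  if (value == nullptr || *value == '\0') return fallback;
  char* end = nullptr;
  const long parsed = std::strtol(value, &end, 10);
  return (*end == '\0') ? parsed : fallback;
}

// Same policy for floating-point variables such as the reuse threshold.
double env_double(const char* name, double fallback) {
  const char* value = std::getenv(name);
  if (value == nullptr || *value == '\0') return fallback;
  char* end = nullptr;
  const double parsed = std::strtod(value, &end);
  return (*end == '\0') ? parsed : fallback;
}

// Hypothetical options struct mirroring the variables listed above.
struct RuntimeOptions {
  long log_level;
  long max_queues;
  long initial_record_list_capacity;
  long block_size;
  double reuse_threshold;
  long num_refs_to_deallocate;
};

RuntimeOptions read_runtime_options() {
  RuntimeOptions opts;
  opts.log_level = env_long("GPUFORT_LOG_LEVEL", 0);
  opts.max_queues = env_long("GPUFORT_MAX_QUEUES", 64);
  opts.initial_record_list_capacity = env_long("GPUFORT_INITIAL_RECORD_LIST_CAPACITY", 4096);
  opts.block_size = env_long("GPUFORT_BLOCK_SIZE", 32);
  opts.reuse_threshold = env_double("GPUFORT_REUSE_THRESHOLD", 0.9);
  opts.num_refs_to_deallocate = env_long("GPUFORT_NUM_REFS_TO_DEALLOCATE", -5);
  return opts;
}
```

Usage (shell): `GPUFORT_LOG_LEVEL=3 GPUFORT_MAX_QUEUES=16 ./my_app`, where `my_app` stands for any program linked against the runtime.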

OPTIMIZATION: gpufort acc runtime will now try to
reuse device arrays that have been previously
released but not deallocated yet. This behavior can be tuned
via the environment variables GPUFORT_REUSE_THRESHOLD,
GPUFORT_NUM_REFS_TO_DEALLOCATE and, to some extent,
GPUFORT_BLOCK_SIZE.
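
The reuse criterion can be sketched in C++ as below, under stated assumptions: requests are rounded up to a multiple of the block size, and a released buffer qualifies only if the rounded request fits into it (an assumption) and exceeds `threshold` x its size (as described for GPUFORT_REUSE_THRESHOLD). `DeviceBuffer`, `can_reuse`, and `find_reusable` are hypothetical names; the GPUFORT_NUM_REFS_TO_DEALLOCATE countdown is not modeled.

```cpp
#include <cstddef>
#include <vector>

// Round a request up to the next multiple of the block size. Assumes block_size > 0.
std::size_t round_up_to_block(std::size_t num_bytes, std::size_t block_size) {
  return ((num_bytes + block_size - 1) / block_size) * block_size;
}

// Reuse if the request fits (assumption) and is greater than threshold x existing size.
bool can_reuse(std::size_t requested, std::size_t existing, double threshold) {
  return requested <= existing &&
         static_cast<double>(requested) > threshold * static_cast<double>(existing);
}

// Hypothetical bookkeeping entry for a device buffer.
struct DeviceBuffer {
  std::size_t num_bytes;  // allocated size
  bool released;          // released by the program but not yet deallocated
};

// Index of the first reusable released buffer, or -1 if a new allocation is needed.
long find_reusable(const std::vector<DeviceBuffer>& buffers, std::size_t requested,
                   std::size_t block_size, double threshold) {
  const std::size_t rounded = round_up_to_block(requested, block_size);
  for (std::size_t i = 0; i < buffers.size(); ++i) {
    if (buffers[i].released && can_reuse(rounded, buffers[i].num_bytes, threshold)) {
      return static_cast<long>(i);
    }
  }
  return -1;
}
```

The lower bound keeps a small request from occupying a much larger buffer; with the default 0.9, a 4096-byte buffer would only serve requests above roughly 3.7 KB.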

BUGFIX/OPTIMIZATION: Look up records from the back.
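
A sketch of a back-to-front record lookup in C++, assuming records are appended to a vector when a mapping is created (so the newest mappings sit at the back) and that each record covers a contiguous host address range. `Record` and `find_record` are hypothetical names. One plausible reason this helps is that data regions nest, so recently created mappings tend to be queried most; that rationale is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mapping record: a contiguous host range that has been mapped.
struct Record {
  const void* hostptr;
  std::size_t num_bytes;
};

// Search newest-to-oldest (back to front); return the index of the first
// record whose host range contains `ptr`, or -1 if there is none.
long find_record(const std::vector<Record>& records, const void* ptr) {
  const std::uintptr_t p = reinterpret_cast<std::uintptr_t>(ptr);
  for (std::size_t i = records.size(); i-- > 0;) {
    const std::uintptr_t begin = reinterpret_cast<std::uintptr_t>(records[i].hostptr);
    if (p >= begin && p - begin < records[i].num_bytes) {
      return static_cast<long>(i);
    }
  }
  return -1;
}
```

Comparing addresses as `std::uintptr_t` avoids relational comparisons between pointers into unrelated objects, which the C++ standard leaves unspecified.
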
* add `scope` arg to signature of _intrnl_inout_arrays_in_subtree
* Fixes all tests in <gpufort_dir>/python/test/grammar_translator/openacc
* Now additionally detects expressions such as
  ```
  <line 1>&
  !$acc <rest of line 2>
  ```

  and removes the `&\s*\n\s*!$acc` pattern, joining the two lines.
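
The joining step can be sketched with `std::regex` in C++. GPUFORT's translator lives under `python/`, so this only illustrates the pattern, not its implementation. `join_acc_continuations` is a hypothetical name; matching `!$acc` case-insensitively and replacing the match with a single space are assumptions.

```cpp
#include <regex>
#include <string>

// Remove "&<ws>\n<ws>!$acc" so that a directive continued on the next line
// becomes a single line. Case-insensitive because Fortran sentinels may be
// written as !$ACC (assumption).
std::string join_acc_continuations(const std::string& text) {
  static const std::regex continuation(R"(&\s*\n\s*!\$acc)",
                                       std::regex::ECMAScript | std::regex::icase);
  return std::regex_replace(text, continuation, " ");
}
```

Ordinary continuations without a following `!$acc` sentinel are left unchanged.
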
FEATURE/linemapper:
  Allow prepending and appending lines directly to
  statement data structures, not only to whole lines.
  The new data structure triggered changes in all dependent packages (scanner, indexer).
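
A hypothetical C++ model of per-statement prepend/append (the actual linemapper is part of the Python code base and its data structures may differ). It shows why attaching lines to a statement is finer-grained than attaching them to a whole line: one source line may hold several statements. `Statement` and `Line` are illustrative names, and emitting each statement on its own output line is a simplification.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical statement record with lines emitted right before/after it.
struct Statement {
  std::string body;
  std::vector<std::string> prepended;
  std::vector<std::string> appended;

  // prepend() inserts at the front, so the most recent call is emitted first.
  void prepend(std::string line) { prepended.insert(prepended.begin(), std::move(line)); }
  void append(std::string line) { appended.push_back(std::move(line)); }

  std::vector<std::string> render() const {
    std::vector<std::string> out(prepended);
    out.push_back(body);
    out.insert(out.end(), appended.begin(), appended.end());
    return out;
  }
};

// Hypothetical line record: e.g. "a = 1; b = 2" holds two statements.
struct Line {
  std::vector<Statement> statements;

  std::vector<std::string> render() const {
    std::vector<std::string> out;
    for (const Statement& s : statements) {
      const std::vector<std::string> lines = s.render();
      out.insert(out.end(), lines.begin(), lines.end());
    }
    return out;
  }
};
```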

FEATURE/gpufort:
  Add option to dump the linemapper data structure
@domcharrier domcharrier added the enhancement New feature or request label Oct 28, 2021
* Fix mismatching argument lists between the wrapper and implementation
functions; put the long dummy argument list of gpufort_acc_present_...
in a macro and reuse the macro in the wrapper
(gpufort_acc_runtime) and the implementation
(gpufort_acc_runtime_base).
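
The macro technique can be sketched in C++ for consistency with the other examples; the same preprocessor approach applies to any preprocessed source. The macro and function names are hypothetical, the real gpufort_acc_present_... argument list is not reproduced, and the second macro for forwarding argument names is an extra assumption.

```cpp
#include <cstddef>

// One definition of the dummy argument list, shared by wrapper and implementation,
// so the two signatures cannot drift apart.
#define EXAMPLE_PRESENT_ARGS void* hostptr, std::size_t num_bytes, int async_queue
#define EXAMPLE_PRESENT_ARG_NAMES hostptr, num_bytes, async_queue

// Implementation (cf. gpufort_acc_runtime_base); placeholder logic only.
std::size_t example_present_impl(EXAMPLE_PRESENT_ARGS) {
  (void)async_queue;  // unused in this sketch
  return hostptr == nullptr ? 0 : num_bytes;
}

// Wrapper (cf. gpufort_acc_runtime) forwards with the identical argument list.
std::size_t example_present(EXAMPLE_PRESENT_ARGS) {
  return example_present_impl(EXAMPLE_PRESENT_ARG_NAMES);
}
```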

minor/unrelated:
* Rename internal function in gpufort.py (parse_cl_args)
@domcharrier domcharrier changed the title New OpenACC features for GPUFORT runtime WiP: New OpenACC features for GPUFORT runtime Nov 25, 2021
@domcharrier domcharrier self-assigned this Nov 25, 2021
domcharrier and others added 6 commits November 25, 2021 05:52
TODO: Extend the translator test suite
to improve declaration parsing.
* Fix issue with parsing expressions that contain '=>' in the RHS of a
  declared variable.
* RHS of declared variable can now be logical expression too.
* Add more rigorous test for declaration.
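
A C++ sketch of splitting one declared entity into name and RHS while distinguishing pointer initialization (`=>`) from ordinary initialization (`=`). It is not the translator's parser: `EntityDecl` and `split_entity` are hypothetical, and it assumes the part left of the first `=` (name and optional array spec) contains no `=`, so comparison operators such as `==` or `>=` in a logical RHS are left intact.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result of splitting "name => rhs" or "name = rhs".
struct EntityDecl {
  std::string name;
  std::string rhs;
  bool pointer_init;  // true for "=>", false for "=" or no initializer
};

std::string trim(const std::string& s) {
  const std::size_t first = s.find_first_not_of(" \t");
  if (first == std::string::npos) return "";
  const std::size_t last = s.find_last_not_of(" \t");
  return s.substr(first, last - first + 1);
}

EntityDecl split_entity(const std::string& entity) {
  const std::size_t pos = entity.find('=');  // first '=' separates name and RHS
  if (pos == std::string::npos) return {trim(entity), "", false};
  const bool is_pointer = pos + 1 < entity.size() && entity[pos + 1] == '>';
  const std::size_t rhs_start = pos + (is_pointer ? 2 : 1);
  return {trim(entity.substr(0, pos)), trim(entity.substr(rhs_start)), is_pointer};
}
```

For example, `p => null()` yields a pointer initialization with RHS `null()`.
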
GPUFORT tries to preserve comments.
Unfortunately, this becomes
difficult when a comment begins
after a line continuation character.

GPUFORT will move these comments
to just before the statement that
contains them.
* Add test to folder python/test/utils
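
The comment relocation can be sketched in C++ as below. `hoist_continuation_comments` is a hypothetical name, and the sketch simplifies by ignoring `&` and `!` inside string literals. Lines starting with `!` (such as `!$acc` directives) and trailing comments not preceded by `&` are left in place.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Move comments that follow a continuation '&' in front of the statement.
// Simplification: '&' and '!' inside string literals are not handled.
std::vector<std::string> hoist_continuation_comments(
    const std::vector<std::string>& statement_lines) {
  std::vector<std::string> comments;
  std::vector<std::string> code;
  for (const std::string& line : statement_lines) {
    const std::size_t bang = line.find('!');
    if (bang != std::string::npos && bang > 0) {
      const std::size_t last = line.find_last_not_of(" \t", bang - 1);
      if (last != std::string::npos && line[last] == '&') {
        comments.push_back(line.substr(bang));     // the comment text
        code.push_back(line.substr(0, last + 1));  // code up to and including '&'
        continue;
      }
    }
    code.push_back(line);
  }
  comments.insert(comments.end(), code.begin(), code.end());  // comments first
  return comments;
}
```
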
…l-statement

WiP: BUGFIX: Support comments in multi-line statements