Conversation
cava/samples/cudart/cudart.cpp
```
ava_argument(dst) {
    if (kind == cudaMemcpyHostToDevice) {
-        ava_opaque;
+        ava_handle;
```
Since this change requires annotating devptrs as handles, and handles don't support the "offsetting" (+/-) operation, I think we should merge this after we redesign the handle.
Do you know of any test applications that this breaks?
There is another issue. Devptrs should be handles everywhere or nowhere. I don't think this spec makes the devptrs in arguments to kernels handles.
> Do you know of any test applications that this breaks?
Not for the Rodinia CUDA benchmarks. But for the supported AI frameworks (like TF), copying memory at offsets is normal behavior.
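
For illustration (my own sketch, not from the PR), this is the kind of TF-style pattern that breaks when devptrs are pure handles: frameworks sub-allocate out of one big device arena, so copy destinations are often interior pointers, i.e. handle + offset.

```
#include <cuda_runtime.h>

/* Hypothetical sketch: frameworks like TF sub-allocate from one pooled
 * device allocation, so cudaMemcpy destinations are often base + offset. */
void copy_at_offset(const float *host_buf, size_t n) {
    float *d_pool = NULL;
    cudaMalloc((void **)&d_pool, 1 << 20);      /* one big device arena */
    /* dst is an interior pointer: a handle table keyed on d_pool alone
     * cannot translate d_pool + 4096 without range information. */
    cudaMemcpy(d_pool + 4096, host_buf, n * sizeof(float),
               cudaMemcpyHostToDevice);
    cudaFree(d_pool);
}
```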
> There is another issue. Devptrs should be handles everywhere or nowhere. I don't think this spec makes the devptrs in arguments to kernels handles.
@arthurp The spec for cuLaunchKernel already checked if the argument was a handle and applied the annotation accordingly. Is this not doing the right thing?
```
CUresult CUDAAPI
cuLaunchKernel(CUfunction f,
               unsigned int gridDimX,
               unsigned int gridDimY,
               unsigned int gridDimZ,
               unsigned int blockDimX,
               unsigned int blockDimY,
               unsigned int blockDimZ,
               unsigned int sharedMemBytes,
               CUstream hStream,
               void **kernelParams,
               void **extra)
{
    ava_argument(hStream) ava_handle;
    ava_argument(kernelParams) {
        ava_in; ava_buffer(ava_metadata(f)->func->argc);
        ava_element {
            // FIXME: use the generated index name in the spec to
            // reference the outer loop's loop index at this moment.
            if (ava_metadata(f)->func->args[__kernelParams_index_0].is_handle) {
                ava_type_cast(void *);
                ava_buffer(ava_metadata(f)->func->args[__kernelParams_index_0].size);
                ava_element ava_handle;
            } else {
                ava_type_cast(void *);
                ava_buffer(ava_metadata(f)->func->args[__kernelParams_index_0].size);
            }
        }
    }
    ava_argument(extra) {
        ava_in; ava_buffer(__helper_launch_extra_size(extra));
#warning The buffer size below states that every kernelParams[i] is 1 byte long.
        ava_element ava_buffer(1);
    }
}
```
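
For context, a sketch of the application side (mine, not part of the spec): a devptr kernel argument arrives as kernelParams[i] pointing at a CUdeviceptr value, which is why the spec has to consult args[i].is_handle per element.

```
#include <cuda.h>

/* Hypothetical sketch of a driver-API launch: kernelParams[0] points at a
 * CUdeviceptr, so that element must be translated as a handle. */
void launch(CUfunction f, CUstream stream, CUdeviceptr dbuf, int n) {
    void *params[] = { &dbuf, &n };  /* args[0].is_handle should be true */
    cuLaunchKernel(f, /*grid*/ 1, 1, 1, /*block*/ 256, 1, 1,
                   /*sharedMemBytes*/ 0, stream, params, NULL);
}
```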
If you haven't touched that piece of spec, then it treats devPtr kernel args as non-handle.
What should it look like? It seems like it checks whether each argument is a handle and, if it is, adds the ava_handle annotation. Is this doing something different?
cudaLaunchKernel spec:
```
ava_argument(args) {
    ava_in;
    ava_buffer(ava_metadata(func)->func->argc);
    ava_element {
        // FIXME: use the generated index name in the spec to
        // reference the outer loop's loop index at this moment.
        if (ava_metadata(func)->func->args[__args_index_0].is_handle) {
            ava_type_cast(void *);
            ava_buffer(ava_metadata(func)->func->args[__args_index_0].size);
            // ava_element ava_handle;
        } else {
            ava_type_cast(void *);
            ava_buffer(ava_metadata(func)->func->args[__args_index_0].size);
        }
    }
}
```
Oh, it's commented out now. Should it be uncommented?
I uncommented `ava_element ava_handle` and the behavior is unchanged.
> If you haven't touched that piece of spec, then it treats devPtr kernel args as non-handle.
@yuhc can you elaborate on this?
This needs a range map to handle pointer arithmetic correctly. In what part of the source tree should the files for the range map go?
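
One possible shape for it (a hedged sketch; the class name and layout are hypothetical): an ordered map from allocation base to size, so any interior devptr resolves to a (base handle, offset) pair.

```
#include <cstdint>
#include <cstddef>
#include <map>

// Hypothetical sketch of a range map: track each allocation as
// [base, base + size) so pointer arithmetic on devptrs stays resolvable.
class RangeMap {
    std::map<uintptr_t, size_t> ranges_;  // allocation base -> size
public:
    void insert(uintptr_t base, size_t size) { ranges_[base] = size; }
    void erase(uintptr_t base) { ranges_.erase(base); }

    // Resolve an interior pointer to its allocation base and offset;
    // returns false if ptr lies inside no tracked allocation.
    bool resolve(uintptr_t ptr, uintptr_t *base, size_t *offset) const {
        auto it = ranges_.upper_bound(ptr);       // first base > ptr
        if (it == ranges_.begin()) return false;
        --it;                                     // last base <= ptr
        if (ptr >= it->first + it->second) return false;
        *base = it->first;
        *offset = ptr - it->first;
        return true;
    }
};
```

Inserting on cudaMalloc and erasing on cudaFree would keep such a map in sync with the handle table.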
Authors: @ArnavMohan and @arthurp
Added annotations to cudaMalloc and cudaFree for migration support.
Memory allocations must be handles.
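
For reference, a minimal sketch of what such annotations might look like, assuming the ava_argument/ava_element syntax quoted above (the actual spec in this PR may differ):

```
cudaError_t cudaMalloc(void **devPtr, size_t size)
{
    ava_argument(devPtr) {
        ava_out; ava_buffer(1);
        ava_element ava_handle;   /* returned allocation becomes a handle */
    }
}

cudaError_t cudaFree(void *devPtr)
{
    ava_argument(devPtr) ava_handle;  /* translate and release the handle */
}
```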