Description
A note: I don't know whether this issue should be categorized as a bug. My setup steps might be wrong as well. If it is the latter, please guide me accordingly.
Describe the bug
Concurrent execution of many instances of the same CUDA executable causes some of them to fail randomly.
In the case of:
- onnx_dump, it fails with terminate called without an active exception
  - Sometimes it shows ERROR sending to socket: Bad file descriptor before printing terminate called without an active exception
  - Sometimes only terminate called without an active exception is printed
- cudart, it fails with a simple Segmentation fault (core dumped)
Suppose I write a CUDA program, say toy.cu, as follows:
#include <cuda.h>
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>
#define BLOCK_SIZE 128
__global__
void do_something(float* d_array)
{
    int idx = blockIdx.x*blockDim.x + threadIdx.x;
    d_array[idx] *= 100;
}
int main()
{
    long N = 1<<7;
    float *arr = (float*) malloc(N*sizeof(float));
    long i;
    for (i = 1; i <= N; i++)
        arr[i-1] = i;
    float *d_array;
    cudaError_t ret;
    ret = cudaMalloc(&d_array, N*sizeof(float));
    printf("Return value of cudaMalloc = %d\n", ret);
    if (ret != cudaSuccess)
    {
        fprintf(stderr, "GPUassert: %s\n", cudaGetErrorString(ret));
        exit(1);
    }
    ret = cudaMemcpy(d_array, arr, N*sizeof(float), cudaMemcpyHostToDevice);
    printf("Return value of cudaMemcpy = %d\n", ret);
    if (ret != cudaSuccess)
    {
        fprintf(stderr, "GPUassert: %s \n", cudaGetErrorString(ret));
        exit(1);
    }
    int num_blocks = (N+BLOCK_SIZE-1)/BLOCK_SIZE;
    do_something<<<num_blocks, BLOCK_SIZE>>>(d_array);
    ret = cudaMemcpy(arr, d_array, N*sizeof(float), cudaMemcpyDeviceToHost);
    printf("Return value of cudaMemcpy = %d\n", ret);
    int j;
    for (i = 0; i < N;)
    {
        for (j = 0; j < 8; j++)
            printf("%.0f\t", arr[i++]);
        printf("\n");
    }
    cudaFree(d_array);
    return 0;
}

And compile it as:
nvcc -o toy toy.cu --cudart shared

Then, in the docker container set up to use the appropriate libguestlib.so, I run the following script.sh:
#!/bin/bash
if [ $# -ne 2 ]; then
    echo "Usage: $0 <executable> <num_instances>"
    exit 1
fi
executable=$1
num_instances=$2
for ((i=1; i<=$num_instances; i++)); do
    $executable &
done

And run the following command:
$ ./script.sh toy 20

Many (but not all) of the instances fail, whether I use cudart or onnx_dump.
To Reproduce
I'll go ahead and describe how I set up AvA.
First, I installed NVIDIA driver 418.226.00 using the NVIDIA-Linux-x86_64-418.226.00.run from the NVIDIA website.
Second, I installed CUDA Toolkit 10.1 using the cuda_10.1.168_418.67_linux.run from the NVIDIA website.
Third, I installed cuDNN 7.6.3.30 using the following files (rough install commands are sketched after the list):
libcudnn7_7.6.3.30-1+cuda10.1_amd64.deb
libcudnn7-doc_7.6.3.30-1+cuda10.1_amd64.deb
libcudnn7-dev_7.6.3.30-1+cuda10.1_amd64.deb
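For reference, these installs roughly amounted to the following commands (a sketch; the exact runfile flags here are approximate):

$ sudo sh NVIDIA-Linux-x86_64-418.226.00.run
$ sudo sh cuda_10.1.168_418.67_linux.run --toolkit --silent
$ sudo dpkg -i libcudnn7_7.6.3.30-1+cuda10.1_amd64.deb \
    libcudnn7-dev_7.6.3.30-1+cuda10.1_amd64.deb \
    libcudnn7-doc_7.6.3.30-1+cuda10.1_amd64.deb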
Next, I forked the AvA repository.
I modified the ava/guestlib/cmd_channel_socket_tcp.cpp to connect to my host using its IP address.
And then did the following:
$ cd ava
$ ./generate -s onnx_dump
$ cd ..
$ mkdir build
$ cd build
$ cmake ../ava
$ ccmake . # and then selected the options for onnx_dump and demo manager
$ make -j72
$ make install
Then I used a CUDA 10.1 docker image (the one provided in this repository under tools/docker, with a small modification to work around the CUDA apt key issue during apt update).
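For reference, one way to get apt-get update working again in that image is to refresh NVIDIA's rotated repository key; the key URL below is my best recollection and may need adjusting:

$ apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub
$ apt-get update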
I bind-mounted my build directory into the docker container, copied libguestlib.so from the build directory to /usr/lib/x86_64-linux-gnu and /usr/local/cuda-10.1/targets/x86_64-linux/lib/ inside the container, and modified the library symlinks accordingly; the resulting layout is shown in the listings below.
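Roughly, this amounted to the following commands inside the container (a sketch; /path/to/build stands in for wherever the bind-mounted build directory is, and the symlink targets match the listings that follow):

# copy the generated guestlib into both library directories
$ cp /path/to/build/libguestlib.so /usr/lib/x86_64-linux-gnu/
$ cp /path/to/build/libguestlib.so /usr/local/cuda-10.1/targets/x86_64-linux/lib/
# point the driver/library sonames at libguestlib.so
$ cd /usr/lib/x86_64-linux-gnu
$ ln -sf libguestlib.so libcuda.so.1
$ ln -sf libguestlib.so libcublas.so.10
$ ln -sf libguestlib.so libcublasLt.so.10
$ ln -sf libguestlib.so libcudnn.so.7
# and the CUDA runtime/math libraries under the toolkit tree
$ cd /usr/local/cuda-10.1/targets/x86_64-linux/lib
$ ln -sf libguestlib.so libcudart.so.10.1
$ ln -sf libguestlib.so libcufft.so.10
$ ln -sf libguestlib.so libcurand.so.10
$ ln -sf libguestlib.so libcusolver.so.10
$ ln -sf libguestlib.so libcusparse.so.10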
/usr/lib/x86_64-linux-gnu$ ls -lh libcu*
lrwxrwxrwx 1 root root 17 Feb 25 2019 libcublasLt.so -> libcublasLt.so.10
lrwxrwxrwx 1 root root 14 Sep 10 04:41 libcublasLt.so.10 -> libguestlib.so
-rw-r--r-- 1 root root 12M Sep 10 04:40 libcublasLt.so.10.1.0.105
-rw-r--r-- 1 root root 23M Feb 25 2019 libcublasLt_static.a
lrwxrwxrwx 1 root root 15 Feb 25 2019 libcublas.so -> libcublas.so.10
lrwxrwxrwx 1 root root 14 Sep 10 04:41 libcublas.so.10 -> libguestlib.so
-rw-r--r-- 1 root root 12M Sep 10 04:40 libcublas.so.10.1.0.105
-rw-r--r-- 1 root root 87M Feb 25 2019 libcublas_static.a
lrwxrwxrwx 1 root root 29 Sep 9 16:09 libcudadebugger.so.1 -> libcudadebugger.so.535.104.05
-rwxr-xr-x 1 root root 9.8M Sep 9 15:43 libcudadebugger.so.535.104.05
lrwxrwxrwx 1 root root 12 Sep 9 16:09 libcuda.so -> libcuda.so.1
lrwxrwxrwx 1 root root 14 Sep 18 15:49 libcuda.so.1 -> libguestlib.so
-rw-r--r-- 1 root root 16M Feb 25 2019 libcuda.so.418.39
-rwxr-xr-x 1 root root 28M Sep 9 15:43 libcuda.so.535.104.05
lrwxrwxrwx 1 root root 29 Mar 7 2019 libcudnn.so -> /etc/alternatives/libcudnn_so
lrwxrwxrwx 1 root root 14 Sep 10 04:42 libcudnn.so.7 -> libguestlib.so
-rw-r--r-- 1 root root 7.0M Sep 9 16:14 libcudnn.so.7.5.0
lrwxrwxrwx 1 root root 32 Mar 7 2019 libcudnn_static.a -> /etc/alternatives/libcudnn_stlib
-rw-r--r-- 1 root root 351M Feb 15 2019 libcudnn_static_v7.a
lrwxrwxrwx 1 root root 23 Apr 6 2018 libcupsfilters.so.1 -> libcupsfilters.so.1.0.0
-rw-r--r-- 1 root root 211K Apr 6 2018 libcupsfilters.so.1.0.0
-rw-r--r-- 1 root root 34K Dec 12 2018 libcupsimage.so.2
-rw-r--r-- 1 root root 558K Dec 12 2018 libcups.so.2
-rw-r--r-- 1 root root 12M Sep 10 04:40 libcurand.so.10
lrwxrwxrwx 1 root root 19 Jan 29 2019 libcurl-gnutls.so.3 -> libcurl-gnutls.so.4
lrwxrwxrwx 1 root root 23 Jan 29 2019 libcurl-gnutls.so.4 -> libcurl-gnutls.so.4.5.0
-rw-r--r-- 1 root root 499K Jan 29 2019 libcurl-gnutls.so.4.5.0
lrwxrwxrwx 1 root root 16 Jan 29 2019 libcurl.so.4 -> libcurl.so.4.5.0
-rw-r--r-- 1 root root 507K Jan 29 2019 libcurl.so.4.5.0
lrwxrwxrwx 1 root root 12 May 23 2018 libcurses.a -> libncurses.a
lrwxrwxrwx 1 root root 13 May 23 2018 libcurses.so -> libncurses.so

/usr/local/cuda-10.1/targets/x86_64-linux/lib$ ls -lh libcu*
-rw-r--r-- 1 root root 701K Feb 25 2019 libcudadevrt.a
lrwxrwxrwx 1 root root 17 Feb 25 2019 libcudart.so -> libcudart.so.10.1
lrwxrwxrwx 1 root root 14 Sep 18 15:45 libcudart.so.10.1 -> libguestlib.so
-rw-r--r-- 1 root root 493K Feb 25 2019 libcudart.so.10.1.105
-rw-r--r-- 1 root root 868K Feb 25 2019 libcudart_static.a
lrwxrwxrwx 1 root root 14 Feb 25 2019 libcufft.so -> libcufft.so.10
lrwxrwxrwx 1 root root 14 Oct 29 21:39 libcufft.so.10 -> libguestlib.so
-rw-r--r-- 1 root root 112M Feb 25 2019 libcufft.so.10.1.105
-rw-r--r-- 1 root root 132M Feb 25 2019 libcufft_static.a
-rw-r--r-- 1 root root 119M Feb 25 2019 libcufft_static_nocallback.a
lrwxrwxrwx 1 root root 15 Feb 25 2019 libcufftw.so -> libcufftw.so.10
lrwxrwxrwx 1 root root 21 Feb 25 2019 libcufftw.so.10 -> libcufftw.so.10.1.105
-rw-r--r-- 1 root root 489K Feb 25 2019 libcufftw.so.10.1.105
-rw-r--r-- 1 root root 33K Feb 25 2019 libcufftw_static.a
lrwxrwxrwx 1 root root 18 Feb 25 2019 libcuinj64.so -> libcuinj64.so.10.1
lrwxrwxrwx 1 root root 22 Feb 25 2019 libcuinj64.so.10.1 -> libcuinj64.so.10.1.105
-rw-r--r-- 1 root root 7.5M Feb 25 2019 libcuinj64.so.10.1.105
-rw-r--r-- 1 root root 32K Feb 25 2019 libculibos.a
lrwxrwxrwx 1 root root 15 Feb 25 2019 libcurand.so -> libcurand.so.10
lrwxrwxrwx 1 root root 14 Oct 29 21:39 libcurand.so.10 -> libguestlib.so
-rw-r--r-- 1 root root 58M Feb 25 2019 libcurand.so.10.1.105
-rw-r--r-- 1 root root 58M Feb 25 2019 libcurand_static.a
lrwxrwxrwx 1 root root 17 Feb 25 2019 libcusolver.so -> libcusolver.so.10
lrwxrwxrwx 1 root root 14 Oct 29 21:40 libcusolver.so.10 -> libguestlib.so
-rw-r--r-- 1 root root 175M Feb 25 2019 libcusolver.so.10.1.105
-rw-r--r-- 1 root root 88M Feb 25 2019 libcusolver_static.a
lrwxrwxrwx 1 root root 17 Feb 25 2019 libcusparse.so -> libcusparse.so.10
lrwxrwxrwx 1 root root 14 Oct 29 21:40 libcusparse.so.10 -> libguestlib.so
-rw-r--r-- 1 root root 87M Feb 25 2019 libcusparse.so.10.1.105
-rw-r--r-- 1 root root 97M Feb 25 2019 libcusparse_static.a

I then added the guest config in the docker container as:
$ cat /etc/ava/guest.conf
channel = "TCP";
manager_address = "10.192.34.20:3333";
gpu_memory = [1024L];
Then I tried to launch the manager on the host as follows:
build$ ./install/bin/demo_manager --worker_path install/onnx_dump/bin/worker
Manager Service listening on ::3333
On the guest, I then run the toy CUDA program, but it fails as described earlier.
I have described the setup for onnx_dump; the setup for cudart is similar and produces the errors described above.
Expected behavior
I expect all the instances of the toy executable launched concurrently to run successfully.
Environment:
- OS: Ubuntu 18.04.6 LTS x86_64
- Python version: 3.6.9
- GCC version: 7.5.0
- Kernel: 5.4.0-150-generic
- Host: SYS-7049GP-TRT 0123456789
- CPU: Intel Xeon Gold 6140 (72) @ 3.700GHz
- GPU: NVIDIA Tesla P40
- NVIDIA Driver Version: 418.226.00
- CUDA Version: 10.1