CL_OUT_OF_RESOURCES error on Nvidia GTX 1060 copying position buffer from device #136

jhurlbut · 2018-01-04T07:12:32Z

I can run the simulation on the CPU and on the Intel HD GPU, but when I run it on the Nvidia GPU in the same machine I get a CL_OUT_OF_RESOURCES error when the code tries to copy the position buffer from the device. This is on Windows 10.

lungd · 2018-01-04T09:26:17Z

@jhurlbut I haven't run Sibernetic on a Windows machine yet. On Ubuntu, I think I got the same error after Nvidia released and shipped CUDA 9 with the graphics driver package. After downgrading to an older driver version (375.82) it worked again.

@a-palyanov @skhayrulin Any comments?

skhayrulin · 2018-01-04T18:36:01Z

@lungd You can install the clinfo utility (sudo apt-get install clinfo). Then, after installing the newest driver (the one you have the problem with), run it and it will show you all available OpenCL devices. If your NVIDIA card is listed there, the problem is on the Sibernetic side; if it is not, something is wrong with the new NVIDIA driver. Also, could you please give some information about this driver?

lungd · 2018-01-04T19:17:36Z

@skhayrulin
clinfo_out
sibernetic_out
nvidia-driver version: 387.12
I got the same error on another machine (different GPU and driver version), so I think this issue can be fixed by changing the kernel?
Or maybe the OpenCL version has to be >= 2.0 with CUDA >= 9.0?

@jhurlbut Do you get the same error?

jhurlbut · 2018-01-04T23:41:24Z

I tried installing CUDA 7.5 next to the CUDA v9 already installed. The same error occurred as with CUDA v9. The Nvidia driver version is 385.54. Yes, I get that same error, "ERROR: Could not enqueue read data from buffer error code is error code is -5". After looking it up online, -5 is the OpenCL error CL_OUT_OF_RESOURCES, which I understand is usually caused by trying to read a buffer of a larger size than was written into the buffer. Is there a version of clinfo for Windows? Google only comes up with Linux installs. I will try the 375.82 driver version. @lungd do you recall which version of CUDA comes with that driver? Thanks!
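
For reference, the numeric code in that message can be mapped back to its symbolic name. The following is a minimal sketch, assuming the standard status values from the Khronos cl.h header; the helper name cl_error_name is hypothetical, it is not part of Sibernetic, and only a subset of codes is listed.

```python
# Subset of OpenCL status codes as defined in the Khronos cl.h header.
# This table is illustrative and deliberately incomplete.
CL_ERROR_NAMES = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -30: "CL_INVALID_VALUE",
    -36: "CL_INVALID_COMMAND_QUEUE",
    -38: "CL_INVALID_MEM_OBJECT",
}


def cl_error_name(code: int) -> str:
    """Return the symbolic name for an OpenCL status code (hypothetical helper)."""
    return CL_ERROR_NAMES.get(code, f"UNKNOWN_CL_ERROR({code})")


if __name__ == "__main__":
    # The code reported in the error message above.
    print(cl_error_name(-5))
```

Because commands are queued asynchronously, the read call that reports the code may only be the point where an earlier failure becomes visible, so the name alone does not pin down the cause.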

lungd · 2018-01-05T19:55:52Z

@jhurlbut with 375.82 I get the following output:

...
Configuration was loaded
 CL_PLATFORM_VERSION [0]: 	OpenCL 2.1 LINUX
 CL_PLATFORM_VERSION [1]: 	OpenCL 1.2 CUDA 8.0.0
 CL_PLATFORM_VERSION [2]: 	OpenCL 2.0 LINUX
CL_CONTEXT_PLATFORM [1]: CL_DEVICE_NAME [0]:	GeForce GTX 1080 Ti

CL_CONTEXT_PLATFORM [1]: CL_DEVICE_TYPE [0]:	GPU
...

So if you can see CUDA 8, it should work.
If your GPU is listed after 'CL_DEVICE_NAME [0]', you don't need to check the devices with clinfo or with an alternative tool for Windows.

BTW, I didn't install the cuda toolkit only the nvidia driver.
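
The check described above (look for a CUDA platform line in the startup output) can be sketched as a small script. This assumes the CL_PLATFORM_VERSION line format shown in the output above; the function name cuda_platforms is hypothetical and the parsing is only as robust as that assumption.

```python
import re

# Matches lines such as: " CL_PLATFORM_VERSION [1]: \tOpenCL 1.2 CUDA 8.0.0"
PLATFORM_RE = re.compile(
    r"CL_PLATFORM_VERSION \[(\d+)\]:\s*OpenCL (\d+\.\d+)\s*(.*)"
)


def cuda_platforms(log_text):
    """Return (platform_index, opencl_version, cuda_version) for CUDA platforms."""
    found = []
    for line in log_text.splitlines():
        match = PLATFORM_RE.search(line)
        if not match:
            continue
        vendor_info = match.group(3).split()
        # Only keep platforms whose vendor string starts with "CUDA".
        if vendor_info and vendor_info[0] == "CUDA":
            cuda_version = vendor_info[1] if len(vendor_info) > 1 else ""
            found.append((int(match.group(1)), match.group(2), cuda_version))
    return found


if __name__ == "__main__":
    sample = (
        " CL_PLATFORM_VERSION [0]: \tOpenCL 2.1 LINUX\n"
        " CL_PLATFORM_VERSION [1]: \tOpenCL 1.2 CUDA 8.0.0\n"
        " CL_PLATFORM_VERSION [2]: \tOpenCL 2.0 LINUX\n"
    )
    print(cuda_platforms(sample))
```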

LucianSheen · 2024-12-27T09:08:21Z

I am experiencing a similar issue while trying to run the ow-0.9.5 code on Ubuntu 22.04.
The hardware I am using is an Nvidia 4060 laptop GPU with 16 GB VRAM,
the driver version is 550.120, and here are the clinfo results:

Platform Name NVIDIA CUDA
Platform Vendor NVIDIA Corporation
Platform Version OpenCL 3.0 CUDA 12.4.131

and the following is the error log I saw:

[[ Step 112934 (total steps: unlimited, t in sim: 2.25868s) dt: 2e-05 (in s), time elapsed: 40.42 (in min) ]]
_runHashParticles: 0.022 ms
_runSort: 9.880 ms
_runSortPostPass: 0.046 ms
_runIndexx: 0.023 ms
_runIndexPostPass: 0.261 ms
_runFindNeighbors: 7.734 ms
_runPCISPH: 0.172 ms 3 iteration(s)
membraneHandling: 0.028 ms
Python >> MuscleSimulation does NOT save results
ERROR: Could not enqueue read data from buffer error code is error code is -5

Could this be caused by my hardware? My laptop has a Ryzen 8645HS and 16 GB RAM.
Or should I change the version of the Nvidia driver?

skhayrulin · 2024-12-27T09:18:00Z

@LucianSheen I don't think that changing the driver version will help. Did you have a chance to check how many resources Sibernetic was using before the crash? Maybe there is some sort of memory leak...

LucianSheen · 2024-12-27T10:29:19Z

@skhayrulin I ran the simulation while monitoring both system memory and the GPU. System memory usage stayed around 2.3 GB / 16 GB and GPU memory around 298 MB / 8188 MB during the whole simulation and never spiked (sorry, I was mistaken: my GPU has 8 GB of memory, not 16). If memory isn't the problem here, do you think I should install a CUDA toolkit separately, given that I already installed the NVIDIA driver? nvidia-smi showed "GPU 00000000:01:00.0: Detected Critical Xid Error" when the crash happened.

Also, I don't experience this issue with worm_crawl_half_resolution, only when I run -f worm_crawling.
Regardless of the logstep setting, it seems to crash precisely at Step 112934.

EDITED: installed CUDA toolkit but got stuck at the very same step of 112934.
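
To compare runs and confirm that the crash always lands on the same step, the step number can be pulled out of a saved console log. This is a minimal sketch, assuming the "[[ Step N (...) ]]" and "ERROR:" line formats shown in the log above; the function name last_step_before_error is hypothetical.

```python
import re

# Matches the step header printed by the simulation, e.g. "[[ Step 112934 (total ..."
STEP_RE = re.compile(r"\[\[ Step (\d+) ")


def last_step_before_error(log_text):
    """Return the last step number seen before the first ERROR line, or None."""
    last_step = None
    for line in log_text.splitlines():
        match = STEP_RE.search(line)
        if match:
            last_step = int(match.group(1))
        elif line.startswith("ERROR:"):
            return last_step
    # No ERROR line found in the log.
    return None


if __name__ == "__main__":
    sample = (
        "[[ Step 112934 (total steps: unlimited, t in sim: 2.25868s) "
        "dt: 2e-05 (in s), time elapsed: 40.42 (in min) ]]\n"
        "_runPCISPH: 0.172 ms 3 iteration(s)\n"
        "ERROR: Could not enqueue read data from buffer "
        "error code is error code is -5\n"
    )
    print(last_step_before_error(sample))
```

Running this over logs from several runs with different settings would show whether the failing step really is identical each time, which points toward something in the simulation state rather than a random driver fault.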
