CL_OUT_OF_RESOURCES error on Nvidia GTX 1060 copying position buffer from device #136

Open
jhurlbut opened this issue Jan 4, 2018 · 8 comments

@jhurlbut

jhurlbut commented Jan 4, 2018

I am able to run the simulation on the CPU and the Intel HD GPU, but when running on my Nvidia GPU on the same machine I get a CL_OUT_OF_RESOURCES error when the code tries to copy the position buffer from the device. This is on Windows 10.

@lungd
Contributor

lungd commented Jan 4, 2018

@jhurlbut I haven't run Sibernetic on a Windows machine yet. On Ubuntu, I think I got the same error after NVIDIA released and shipped CUDA 9 with the graphics driver package. After downgrading to an older driver version (375.82) it worked again.

@a-palyanov @skhayrulin Any comments?

@skhayrulin
Member

@lungd You can install the clinfo utility (sudo apt-get install clinfo). Then, after installing the newest driver (the one that gives you the problem), run it and it will list all available OpenCL devices. If your NVIDIA card is listed, the problem is on the Sibernetic side; if it is not, something is wrong with the new NVIDIA driver. Also, could you please share the details of this driver?

@lungd
Contributor

lungd commented Jan 4, 2018

@skhayrulin
clinfo_out
sibernetic_out
nvidia-driver version: 387.12
I got the same error on another machine (different GPU and driver version), so I think this issue could be fixed by changing the kernel?
Or maybe the OpenCL version has to be >= 2.0 with CUDA >= 9.0?

@jhurlbut Do you get the same error?

@jhurlbut
Author

jhurlbut commented Jan 4, 2018

I tried installing CUDA 7.5 next to the CUDA v9 already installed. The same error occurred as with CUDA v9. The NVIDIA driver version is 385.54. Yes, I get the same error, "ERROR: Could not enqueue read data from buffer error code is error code is -5", which after looking it up online is the OpenCL error CL_OUT_OF_RESOURCES. As I understand it, this usually means trying to read a buffer of a larger size than was written into the buffer. Is there a version of clinfo for Windows? Google only comes up with Linux installs. I will try the 375.82 driver version. @lungd do you recall which version of CUDA comes with that driver? Thanks!
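
For reference, the numeric code in that message can be mapped back to its symbolic name. The following is a minimal sketch in Python, using a subset of the values defined in the Khronos `cl.h` header; the helper name `cl_error_name` is hypothetical and is not part of Sibernetic.

```python
# Subset of OpenCL error codes as defined in the Khronos cl.h header.
CL_ERROR_NAMES = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -11: "CL_BUILD_PROGRAM_FAILURE",
    -30: "CL_INVALID_VALUE",
    -36: "CL_INVALID_COMMAND_QUEUE",
    -38: "CL_INVALID_MEM_OBJECT",
}


def cl_error_name(code: int) -> str:
    """Return the symbolic name for an OpenCL error code (hypothetical helper)."""
    return CL_ERROR_NAMES.get(code, f"UNKNOWN_CL_ERROR({code})")


print(cl_error_name(-5))  # CL_OUT_OF_RESOURCES
```

On NVIDIA drivers this code is often reported at the first blocking call after an earlier kernel has failed, so the buffer read itself may not be the culprit. That is a general observation, not something confirmed for Sibernetic in this thread.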

@lungd
Contributor

lungd commented Jan 5, 2018

@jhurlbut with 375.82 I get the following output:

...
Configuration was loaded
 CL_PLATFORM_VERSION [0]: 	OpenCL 2.1 LINUX
 CL_PLATFORM_VERSION [1]: 	OpenCL 1.2 CUDA 8.0.0
 CL_PLATFORM_VERSION [2]: 	OpenCL 2.0 LINUX
CL_CONTEXT_PLATFORM [1]: CL_DEVICE_NAME [0]:	GeForce GTX 1080 Ti

CL_CONTEXT_PLATFORM [1]: CL_DEVICE_TYPE [0]:	GPU
...

So if you can see CUDA 8 it should work.
If you can see your GPU after 'CL_DEVICE_NAME [0]', you don't need to check the devices with clinfo or with an alternative tool for Windows.

BTW, I didn't install the CUDA toolkit, only the NVIDIA driver.
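
The check described above (look for a CUDA platform and its version in Sibernetic's startup output) could be scripted roughly as follows. This is a sketch that assumes the `CL_PLATFORM_VERSION` line format shown in the output quoted above; the function name `cuda_platforms` is hypothetical.

```python
import re

# Platform lines copied from the Sibernetic output quoted above.
SAMPLE = """\
 CL_PLATFORM_VERSION [0]: \tOpenCL 2.1 LINUX
 CL_PLATFORM_VERSION [1]: \tOpenCL 1.2 CUDA 8.0.0
 CL_PLATFORM_VERSION [2]: \tOpenCL 2.0 LINUX
"""

# Captures: platform index, OpenCL version, CUDA major version.
PATTERN = re.compile(
    r"CL_PLATFORM_VERSION \[(\d+)\]:\s*OpenCL (\d+\.\d+) CUDA (\d+)"
)


def cuda_platforms(log_text):
    """Return (platform index, OpenCL version, CUDA major version) for each CUDA platform."""
    return [(int(i), ocl, int(cuda)) for i, ocl, cuda in PATTERN.findall(log_text)]


print(cuda_platforms(SAMPLE))  # [(1, '1.2', 8)]
```

An empty result would mean that no CUDA platform was reported at all, which points at the driver installation rather than at Sibernetic.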

@LucianSheen

I am experiencing a similar issue while running the ow-0.9.5 code on Ubuntu 22.04.
The hardware I am using is an NVIDIA 4060 laptop GPU with 16 GB VRAM.
The driver version is 550.120, and here are the clinfo results:

Platform Name NVIDIA CUDA
Platform Vendor NVIDIA Corporation
Platform Version OpenCL 3.0 CUDA 12.4.131

This is the error log I saw:

[[ Step 112934 (total steps: unlimited, t in sim: 2.25868s) dt: 2e-05 (in s), time elapsed: 40.42 (in min) ]]
_runHashParticles: 0.022 ms
_runSort: 9.880 ms
_runSortPostPass: 0.046 ms
_runIndexx: 0.023 ms
_runIndexPostPass: 0.261 ms
_runFindNeighbors: 7.734 ms
_runPCISPH: 0.172 ms 3 iteration(s)
membraneHandling: 0.028 ms
Python >> MuscleSimulation does NOT save results
ERROR: Could not enqueue read data from buffer error code is error code is -5

Could this be caused by my hardware? My laptop has a Ryzen 8645HS and 16 GB RAM.
Or should I change the NVIDIA driver version?

@skhayrulin
Member

@LucianSheen I don't think that changing the driver version will help. Did you have a chance to check how many resources Sibernetic was using before the crash? Maybe there is some sort of memory leak...

@LucianSheen

LucianSheen commented Dec 27, 2024

@skhayrulin I ran the simulation while monitoring both system memory and the GPU. System memory usage stayed around 2.3 GB / 16 GB and GPU memory at 298 MB / 8188 MB throughout the simulation and never spiked (sorry, I was mistaken: my GPU has 8 GB of memory, not 16). If memory isn't the problem, do you think I should install the CUDA toolkit separately, given that I have already installed the NVIDIA driver? When the crash happened, nvidia-smi showed "GPU 00000000:01:00.0: Detected Critical Xid Error".

Also, I don't experience this issue with worm_crawl_half_resolution, only when I run -f worm_crawling.
Regardless of logstep settings it seems to crash precisely at Step 112934.

EDITED: I installed the CUDA toolkit but the run still stops at the very same step, 112934.
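
Since the crash is reported at a fixed step, it may help to pull the step number, simulated time, and dt out of the log automatically and compare runs. The sketch below assumes the step header format shown in the log quoted earlier in this thread; `parse_step` and `last_step_before_error` are hypothetical helpers, not part of Sibernetic.

```python
import re

# Step header copied from the log quoted earlier in this thread.
HEADER = ("[[ Step 112934 (total steps: unlimited, t in sim: 2.25868s) "
          "dt: 2e-05 (in s), time elapsed: 40.42 (in min) ]]")

STEP_RE = re.compile(
    r"\[\[ Step (\d+) .*?t in sim: ([\d.eE+-]+)s\) dt: ([\d.eE+-]+)"
)


def parse_step(line):
    """Return (step, simulated time in s, dt in s), or None if not a step header."""
    m = STEP_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), float(m.group(2)), float(m.group(3))


def last_step_before_error(lines):
    """Return the last step header parsed before the first line starting with 'ERROR:'."""
    last = None
    for line in lines:
        if line.startswith("ERROR:"):
            break
        parsed = parse_step(line)
        if parsed is not None:
            last = parsed
    return last


step, t_sim, dt = parse_step(HEADER)
print(step, t_sim, dt)                 # 112934 2.25868 2e-05
print(abs(step * dt - t_sim) < 1e-9)   # True: simulated time equals step * dt
```

The failing step corresponds to a simulated time of about 2.25868 s (112934 × 2e-05 s), which gives a fixed point in the worm_crawling scenario to inspect.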
