A libcurl function was given a bad argument #35365

Open
vdytyniak-exos opened this issue Apr 28, 2023 · 7 comments

@vdytyniak-exos

Describe the bug, including details regarding any error messages, version, and platform.

We use pyarrow to read data from S3 and sometimes we get the following error:

File "/usr/local/lib/python3.10/dist-packages/{org}/store/storage.py", line 794, in _load_partition
    table = ds.dataset(
  File "pyarrow/_dataset.pyx", line 369, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 2818, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
OSError: AWS Error NETWORK_CONNECTION during GetObject operation: curlCode: 43, A libcurl function was given a bad argument

We have been trying to find out why this happens, but the error occurs at random. Can you help us understand where the libcurl problem might originate?
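
While the root cause is unknown, one possible stopgap is to retry the read when this specific error appears. The sketch below is an assumption, not something proposed in this thread: `call_with_retry`, `is_transient_curl_error`, and the `attempts`, `base_delay`, and `sleep` parameters are hypothetical names, and the match relies only on the message text shown in the traceback above.

```python
import time


def is_transient_curl_error(exc: OSError) -> bool:
    """Heuristic: match the message text seen in the traceback above."""
    return "curlCode: 43" in str(exc)


def call_with_retry(fn, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(); retry with exponential backoff on the curlCode 43 OSError.

    `attempts`, `base_delay`, and `sleep` are hypothetical knobs for this sketch.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except OSError as exc:
            # Re-raise anything that is not the transient error,
            # and re-raise the transient error once attempts are exhausted.
            if not is_transient_curl_error(exc) or attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

A caller might wrap the failing read as `call_with_retry(lambda: ds.dataset(path, filesystem=fs).to_table())`, where `path` and `fs` stand in for whatever the application already passes. Retrying hides the symptom only; it does not explain why libcurl reports a bad argument.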

Component(s)

Python

@westonpace
Member

We don't use curl directly, only indirectly through aws-cpp-sdk. This is going to be hard to debug without some way to reliably reproduce.

What version of pyarrow are you using? What OS?

@vdytyniak-exos
Author

> We don't use curl directly, only indirectly through aws-cpp-sdk. This is going to be hard to debug without some way to reliably reproduce.
>
> What version of pyarrow are you using? What OS?

pyarrow=10.0.1
os: ubuntu:20.04

@westonpace
Member

I did some basic research on the error and didn't find much. The only thing I could see that might cause this is if there is an incompatibility between the S3 SDK and the curl versions (e.g. if the S3 SDK was developed / compiled against one version and linked / run with another version).

How are you obtaining pyarrow? Is it from conda, pip, or a build from source? Can you use ldd to check which library versions it is linking against? For example, I use conda so I run this:

(arrow-release-11) pace@pace-desktop:~$ ldd ~/miniconda3/envs/arrow-release-10/lib/python3.11/site-packages/pyarrow/libarrow_python.so.1000.1.0 
...
	libcurl.so.4 => /home/pace/miniconda3/envs/arrow-release-10/lib/python3.11/site-packages/pyarrow/../../../././libcurl.so.4 (0x00007f3ffa8ac000)
...
	libaws-c-s3.so.0unstable => /home/pace/miniconda3/envs/arrow-release-10/lib/python3.11/site-packages/pyarrow/../../.././././libaws-c-s3.so.0unstable (0x00007f3ffa64c000)
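
The same check can be scripted so it covers every shared library in the installed package. This is only a sketch: `parse_ldd`, `find_libs`, and `ldd_pyarrow_libs` are hypothetical helper names, and it assumes a Linux system with `ldd` on the PATH and the usual `name => path (address)` output format.

```python
import glob
import os
import subprocess


def parse_ldd(output: str) -> dict:
    """Map each `name => path` entry in ldd output to its resolved path.

    Entries that are "not found" or have no path map to None; lines without
    "=>" (for example the dynamic loader) are skipped.
    """
    libs = {}
    for line in output.splitlines():
        parts = line.split()
        if "=>" not in parts:
            continue
        idx = parts.index("=>")
        target = parts[idx + 1] if idx + 1 < len(parts) else None
        if target is None or target == "not" or target.startswith("("):
            target = None
        libs[parts[0]] = target
    return libs


def find_libs(output: str, needles=("curl", "aws")) -> dict:
    """Keep only libraries whose name contains one of the needles."""
    return {
        name: path
        for name, path in parse_ldd(output).items()
        if any(n in name for n in needles)
    }


def ldd_pyarrow_libs() -> dict:
    """Run ldd on every shared object in the pyarrow package directory."""
    import pyarrow  # imported lazily so the parser works without pyarrow

    pkg_dir = os.path.dirname(pyarrow.__file__)
    results = {}
    for lib in glob.glob(os.path.join(pkg_dir, "*.so*")):
        out = subprocess.run(["ldd", lib], capture_output=True, text=True).stdout
        results[lib] = find_libs(out)
    return results
```

If every entry returned by `ldd_pyarrow_libs()` is empty, no curl or AWS library is being resolved dynamically, which would suggest those dependencies are linked statically into the package.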

@vdytyniak-exos
Author

I install from pip. I don't see libaws-c-s3.so:

root@fba404d79f64:/dir# ldd /usr/local/lib/python3.10/dist-packages/pyarrow/libarrow_python.so.1000.1.0
	linux-vdso.so.1 (0x00007ffe52b60000)
	libarrow_dataset.so.1000 => /usr/local/lib/python3.10/dist-packages/pyarrow/libarrow_dataset.so.1000 (0x00007f136944b000)
	libparquet.so.1000 => /usr/local/lib/python3.10/dist-packages/pyarrow/libparquet.so.1000 (0x00007f1368d10000)
	libarrow.so.1000 => /usr/local/lib/python3.10/dist-packages/pyarrow/libarrow.so.1000 (0x00007f136663d000)
	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f1366456000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f1366307000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f13662ec000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f13660f8000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f13660ee000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f13660cb000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f13660c5000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f13697bd000)

@westonpace
Member

> I install from pip. I don't see libaws-c-s3.so:

Ah, I think that if you installed from pip, everything is statically linked, which I suppose rules out a version incompatibility.

In that case, I'm afraid I'm at a bit of a loss as to how to proceed. If the error could be reproduced reliably, we could build against a debug version of curl and break at the point where the error is generated to figure out what exactly is invalid.

@shomilj

shomilj commented Nov 16, 2023

We're facing the same issue: AWS Error NETWORK_CONNECTION during HeadObject operation: curlCode: 43 shows up transiently and is hard to reproduce. The only thing that seems to trigger it more often is a higher-latency network connection to S3, so we suspect that something at a lower layer is not handling high latency properly (cc @westonpace in case you have any pointers or additional debugging tips).

@vdytyniak-exos did you ever root cause this?
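
Since the error has now been reported for both GetObject and HeadObject, it may help to tally failures by operation when collecting evidence. The sketch below is an assumption built only on the two message strings quoted in this thread; `parse_aws_error` and `tally_errors` are hypothetical names, and other pyarrow versions may format the message differently.

```python
import re
from collections import Counter

# Pattern inferred from the two messages quoted in this thread.
_AWS_ERROR_RE = re.compile(
    r"AWS Error (?P<kind>\w+) during (?P<operation>\w+) operation: "
    r"curlCode: (?P<curl_code>\d+)"
)


def parse_aws_error(message: str):
    """Return (kind, operation, curl_code), or None if the message does not match."""
    m = _AWS_ERROR_RE.search(message)
    if m is None:
        return None
    return m["kind"], m["operation"], int(m["curl_code"])


def tally_errors(messages) -> Counter:
    """Count matching errors, keyed by (kind, operation, curl_code)."""
    counts = Counter()
    for msg in messages:
        parsed = parse_aws_error(msg)
        if parsed is not None:
            counts[parsed] += 1
    return counts
```

Feeding `str(exc)` from each caught `OSError` into `tally_errors`, alongside whatever latency measurements the application already records, could show whether the failures cluster on one operation or on slow requests.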

@mayanksingh2298

Any updates on this?
