Post by Daniel StenbergFor example, he was using an older libssh2 than you. Are you also on Windows?
I use Fedora Linux and RHEL. It is reproducible issue.
Post by Daniel StenbergIt does give me the feeling it might be a libssh2 bug, but without more
details we can't tell for sure.
Post by VourheyInfinite loop (invoking of clock_gettime, poll, recv functions) loads
CPU.
But why does it loop? When the EAGAIN situation is returned from
libssh2, libcurl checks for what to wait for socket-wise and that
shouldn't happen immediately. Is libssh2 telling us to wait for the
wrong direction in this situation? Can you dig up more specific details
on what exactly is happening?
Looks like it is not a libssh2 bug.
I've compiled 7.19.5 version of curl with the latest version of libssh2
(1.2). The test is simple (to upload a file to a remote host by SFTP):
src/curl -T ./big.iso -u anyuser:anypass sftp://my_host.com/~/
after that curl sends .iso file to the host. And gets 99% of CPU.
libssh2 opens a socket for sending the file:
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 4
file/socket descriptor is 4.
After that:
send(4,
"D\320kE\33\10\253:\241~O\261\2407l\346\313\2275^\313X\305\303RY\264Y\201foH|"...,
16452, MSG_NOSIGNAL) = 16452 //sending data from the file
recv(4, 0x827c7e4, 16384, MSG_NOSIGNAL) = -1 EAGAIN (Resource temporarily
unavailable) //there is no socket data
clock_gettime(CLOCK_MONOTONIC, {14965, 269959641}) = 0
clock_gettime(CLOCK_MONOTONIC, {14965, 270097791}) = 0
clock_gettime(CLOCK_MONOTONIC, {14965, 270215983}) = 0
poll([{fd=4, events=POLLOUT}], 1, 1000) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=4, events=POLLOUT}], 1, 0) = 1 ([{fd=4, revents=POLLOUT}])
recv(4, 0x827c7e4, 16384, MSG_NOSIGNAL) = -1 EAGAIN (Resource temporarily
unavailable) //check again, but still nothing
clock_gettime(CLOCK_MONOTONIC, {14965, 270749652}) = 0
clock_gettime(CLOCK_MONOTONIC, {14965, 270877781}) = 0
clock_gettime(CLOCK_MONOTONIC, {14965, 270972324}) = 0
poll([{fd=4, events=POLLOUT}], 1, 1000) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=4, events=POLLOUT}], 1, 0) = 1 ([{fd=4, revents=POLLOUT}])
recv(4, 0x827c7e4, 16384, MSG_NOSIGNAL) = -1 EAGAIN (Resource temporarily
unavailable) //check again, nothing
clock_gettime(CLOCK_MONOTONIC, {14965, 271424763}) = 0
. . .
and many calls of "clock_gettime" and "poll" functions until:
recv(4, "\207<{\317@\t\256\36|\34\33\24P{\0A\212q
\346\34_\323\270V\322\225\222-\213\243\237\237"..., 16384, MSG_NOSIGNAL) =
68 //the data is received from the socket
Then we read the file and send the next portion of data:
read(3, "\1CD001\1\0LINUX "..., 16384) = 16384 //read
file data
send(4,
"D\320kE\33\10\253:\241~O\261\2407l\346\313\2275^\313X\305\303RY\264Y\201foH|"...,
16452, MSG_NOSIGNAL) = 16452 //send
invoke clock_gettime, poll & recv again and again until the data is received
from the socket.
Looks like these functions: clock_gettime, poll & recv are invoked in
infinite loop, because the socket is non-blocking and ârecvâ syscall does
not wait for data from the socket but exits immediately.
Here is the result of the test "Who does use the most of CPU time":
% time calls errors syscall
------ ----------- ----------- --------- ---------
39.82 70527 clock_gettime
34.33 46339 poll
20.92 23251 23199 recv
1.10 38 send
clock_gettime, poll are invoked very often.
We can see the following backtraces:
clock_gettime is invoked:
#0 0x00a2ecf4 in clock_gettime () from /lib/librt.so.1
#1 0x0074cf69 in curlx_tvnow () from
/home/nortel/curl-7.19.5/lib/.libs/libcurl.so.4
#2 0x00777842 in Curl_socket_ready () from
/home/nortel/curl-7.19.5/lib/.libs/libcurl.so.4
#3 0x0076e8cf in Curl_perform () from
/home/nortel/curl-7.19.5/lib/.libs/libcurl.so.4
#4 0x0076f58d in curl_easy_perform () from
/home/nortel/curl-7.19.5/lib/.libs/libcurl.so.4
#5 0x0804fc74 in main ()
clock_gettime is invoked:
#0 0x00a2ecc6 in clock_gettime () from /lib/librt.so.1
#1 0x002f6f69 in curlx_tvnow () from
/home/nortel/curl-7.19.5/lib/.libs/libcurl.so.4
#2 0x003176bd in Curl_readwrite () from
/home/nortel/curl-7.19.5/lib/.libs/libcurl.so.4
#3 0x003188ea in Curl_perform () from
/home/nortel/curl-7.19.5/lib/.libs/libcurl.so.4
#4 0x0031958d in curl_easy_perform () from
/home/nortel/curl-7.19.5/lib/.libs/libcurl.so.4
#5 0x0804fc74 in main ()
poll is invoked:
#0 0x00e25402 in __kernel_vsyscall ()
#1 0x003fc93b in poll () from /lib/libc.so.6
#2 0x003218ce in Curl_socket_ready () from
/home/nortel/curl-7.19.5/lib/.libs/libcurl.so.4
#3 0x003188cf in Curl_perform () from
/home/nortel/curl-7.19.5/lib/.libs/libcurl.so.4
#4 0x0031958d in curl_easy_perform () from
/home/nortel/curl-7.19.5/lib/.libs/libcurl.so.4
#5 0x0804fc74 in main ()
So, poll and clock_gettime are invoked from the libcurl.
Looks like these backtraces are performed by the following code in infinite
loop (lib/transfer.c file, Transfer() function):
switch (Curl_socket_ready(fd_read, fd_write, timeout_ms)) {
case -1: /* select() error, stop reading */
#ifdef EINTR
/* The EINTR is not serious, and it seems you might get this more
often when using the lib in a multi-threaded environment! */
if(SOCKERRNO == EINTR)
continue;
#endif
return CURLE_RECV_ERROR; /* indicate a network problem */
case 0: /* timeout */
default: /* readable descriptors */
result = Curl_readwrite(conn, &done);
/* "done" signals to us if the transfer(s) are ready */
break;
}
Post by Daniel Stenberglibcurl is supposed to do that blocking magic on its own
using the same mechanism that libssh2 uses internally when blocking is
selected. It would then rather indicate that libssh2 _is_ right and that
libcurl gets confused for some reason.
Could you please provide info, how to enable this mechanism on libcurl
level? "--disable-nonblock" option for the configure does not help and curl
still loads CPU to 100%.
Post by Daniel StenbergPost by VourheyCould you please tell me, can these changes cause any issues in
libcurl work?
Yes. It'll make libcurl work less like it is supposed to, as libcurl
wants to work non-blocking with the transfer-layer. It'll be most
evident if you use the multi interface.
Blocking mode in the libssh2 is performed by non-blocked socket and the
select syscall. So, libcurl just waits for return from a libssh2 call
("libssh2_sftp_write", probably). After that it will invoke the same code.
Could you please tell me, what kind of issues can this cause?
Post by Daniel StenbergI'd like us to fix all bugs, yes. But fixing bugs is a lot of hard work
and we're all busy people. I hope we can help you track this down when
you tell us more details from what you experience.
I understand. You're doing very good job. Thanks a lot.
Please, see details above and feel free to comment.
Thanks,
Pavel