Discussion:
Wait for a response using curl_multi_wait
Richard Copley
2018-03-19 21:04:43 UTC
Hi,

I'm posting this question a second time, although the first time it was
apparently moderated as spam. Meanwhile somebody asking how to add a header
to a request got through. It's confusing. I put effort into writing a good
question. If there's something wrong with my question, I'd like to hear
about it. Don't worry about offending me.

Anyway, here's the original message again:

Using the "multi-single.c" example[1] as a testbed, I notice that after
adding the easy handle and calling curl_multi_perform, the curl_multi_wait
call returns immediately and sets numfds to 0, indicating there is no fd to
wait on.

Question: Isn't there a socket fd to wait on for the connect() and/or the
data?

In the "multi-single.c" example, there is a Sleep() in the main loop. If
the sleep interval is too small then we use too much CPU in going round the
loop. If the interval is too big, we risk reacting too slowly when the
response arrives. A 50ms to 100ms sleep interval is probably Just Right,
but if possible, it would be neater to avoid the Sleep() altogether and
wait (one or more times) for something to happen.

Question: Is it possible?

I tried the sample on Windows 10 with libcurl 7.57.0 built by the MSYS2
project for native Windows (i.e., not relying on Cygwin or MSYS for POSIX
emulation) and on Ubuntu 17.10 using libcurl 7.55.1 from the Ubuntu package
sources (package libcurl4-openssl-dev). Same results on both.

Big thanks to Daniel and everyone who has worked on libcurl. It's a
pleasure to use and the documentation is outstandingly good.

[1] https://curl.haxx.se/libcurl/c/multi-single.html
Erik Janssen
2018-03-20 09:10:01 UTC
Hi,
Using the "multi-single.c" example[1] as a testbed, I notice that after adding the easy
handle and calling curl_multi_perform, the curl_multi_wait call returns immediately
and sets numfds to 0, indicating there is no fd to wait on.
What is 'immediately' and what happens on the wire? I checked my own code: I don't
wait inside such a loop and actually use the waiting of curl_multi_wait for all my internal
timing needs. I never check numfds either and just proceed to curl_multi_perform/curl_multi_info_read.
This example isn't doing that so you don't see what happens. Maybe the data is just there?

Disclaimer: maybe my code is flawed and happens to work because I also pass some extra_fds.

Erik

-------------------------------------------------------------------
Unsubscribe: https://cool.haxx.se/list/listinfo/curl-library
Etiquette:
Richard Copley
2018-03-20 20:44:22 UTC
Post by Erik Janssen
Hi,
Post by Richard Copley
Using the "multi-single.c" example[1] as a testbed, I notice that after
adding the easy handle and calling curl_multi_perform, the curl_multi_wait
call returns immediately and sets numfds to 0, indicating there is no fd to
wait on.
Post by Erik Janssen
What is 'immediately'
It here means "not mediated (by something happening)". A few microseconds
pass, but no data is sent or received.
Post by Erik Janssen
and what happens on the wire?
By collating timestamped debug output against wireshark logs
(disclaimer: I'm not sure of all of this, my timestamps weren't accurate
enough and I don't know the protocol too well):

0.957 Before first curl_multi_perform
SEND: connect request (SYN, TCP length 0)
0.957 After first curl_multi_perform, still_running=1

0.957 Before curl_multi_wait
0.957 After curl_multi_wait, numfds=0

0.957 Before curl_multi_perform
0.957 After curl_multi_perform, still_running=1

0.957 Before curl_multi_wait
0.957 After curl_multi_wait, numfds=0

0.957 Before waiting 100 ms
RECV: connect accept (SYN/ACK, TCP length 0)
SEND: connect confirm (ACK, TCP length 0)
SEND: request (TCP length 54)
1.064 After waiting 100 milliseconds

1.064 Before curl_multi_perform
1.064 After curl_multi_perform, still_running=1

1.064 Before curl_multi_wait
RECV: response data (3 segments) (ACK, length 1598)
SEND: (ACK, TCP length 0)
RECV: (ACK, TCP length 0)
1.166 After curl_multi_wait, numfds=1

1.166 Before curl_multi_perform
SEND: (ACK, TCP length 0)
SEND: (FIN/ACK, TCP length 0)
1.167 After curl_multi_perform, still_running=1

1.167 Before curl_multi_wait
1.267 After curl_multi_wait, numfds=0

1.267 Before curl_multi_perform
1.267 After curl_multi_perform, still_running=1

1.267 Before curl_multi_wait
RECV: (FIN/ACK, TCP length 0)
1.271 After curl_multi_wait, numfds=1

1.271 Before curl_multi_perform
SEND: (ACK, TCP length 0)
WRITE CALLBACK INVOKED
1.271 After curl_multi_perform, still_running=0
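
The message does not say how the timestamps in this trace were produced. As a sketch of one way to get microsecond-resolution trace lines that can be lined up against a packet capture showing absolute times, assuming a POSIX system with clock_gettime() (the helper names format_trace and trace are made up for illustration):

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* Format one trace line as "<seconds>.<microseconds> <message>".
   Returns the value of snprintf(). */
int format_trace(char *buf, size_t size, struct timespec ts, const char *msg)
{
    return snprintf(buf, size, "%lld.%06ld %s",
                    (long long)ts.tv_sec, ts.tv_nsec / 1000, msg);
}

/* Print a trace line stamped with the current wall-clock time, so that it
   can be compared with the absolute timestamps of a packet capture. */
void trace(const char *msg)
{
    struct timespec now;
    char line[256];

    clock_gettime(CLOCK_REALTIME, &now);
    format_trace(line, sizeof line, now, msg);
    fprintf(stderr, "%s\n", line);
}
```

Calling trace("Before curl_multi_wait") and trace("After curl_multi_wait") around each call would give lines in the same shape as the trace above, with epoch seconds instead of a relative offset.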

Post by Erik Janssen
I checked my own code, I don't wait inside such a loop and actually use the
waiting of curl_multi_wait for all my internal timing needs.
Well, the way it is done in the example is also how curl_easy_perform works
(with a more careful choice of timeout).
Post by Erik Janssen
I never check numfds either and just proceed to
curl_multi_perform/curl_multi_info_read.
This example isn't doing that so you don't see what happens. Maybe the data is just there?
I might have misunderstood you. If curl_multi_wait returns with numfds=0,
then there is no point trying to read the data.
Post by Erik Janssen
Disclaimer: maybe my code is flawed and happens to work because I also pass some extra_fds.
I don't think that would be a flaw. Passing a "dummy" extra_fd (that will
never be ready) is mentioned in the documentation as a workaround. I can do
that, but I'd like to better understand what's happening first.
Post by Erik Janssen
Erik
Thanks!
Daniel Stenberg
2018-03-23 08:43:11 UTC
Post by Richard Copley
I might have misunderstood you. If curl_multi_wait returns with numfds=0,
then there is no point trying to read the data.
When curl_multi_wait() returns, you should call curl_multi_perform() no matter
what numfds says though since there are also timeouts involved that aren't
reflected in numfds.

(Barring the busy-loop problem already being discussed that is.)
--
/ daniel.haxx.se
Richard Copley
2018-03-23 11:37:28 UTC
Post by Richard Copley
I might have misunderstood you. If curl_multi_wait returns with numfds=0,
then there is no point trying to read the data.
Post by Daniel Stenberg
When curl_multi_wait() returns, you should call curl_multi_perform() no
matter what numfds says though since there are also timeouts involved that
aren't reflected in numfds.
(Barring the busy-loop problem already being discussed that is.)
OK, thanks.
Post by Erik Janssen
I checked my own code, I don't wait inside such a loop and actually use the
waiting of curl_multi_wait for all my internal timing needs. I never check
numfds either and just proceed to curl_multi_perform/curl_multi_info_read.
This example isn't doing that so you don't see what happens. Maybe the data is just there?
But I think the example [1] does call curl_multi_perform after every
curl_multi_wait.

In any case, I think what I said about "no point trying to read the
data" doesn't make much sense, and I take it back.

[1] https://curl.haxx.se/libcurl/c/multi-single.html
Cunningham, Joel
2018-03-20 22:41:59 UTC
Hi curl devs,

I just wanted to share some of the problems/solutions I've had integrating with curl_multi_wait; see my response inline below.
-----Original Message-----
Richard Copley
Sent: Monday, March 19, 2018 4:05 PM
Subject: Wait for a response using curl_multi_wait
Hi,
I'm posting this question a second time, although the first time it was
apparently moderated as spam. Meanwhile somebody asking how to add a header
to a request got through. It's confusing. I put effort into writing a good
question. If there's something wrong with my question, I'd like to hear
about it. Don't worry about offending me.
Using the "multi-single.c" example[1] as a testbed, I notice that after adding
the easy handle and calling curl_multi_perform, the curl_multi_wait call
returns immediately and sets numfds to 0, indicating there is no fd to wait
on.
Question: Isn't there a socket fd to wait on for the connect() and/or the
data?
In the "multi-single.c" example, there is a Sleep() in the main loop. If the
sleep interval is too small then we use too much CPU in going round the loop.
If the interval is too big, we risk reacting too slowly when the response
arrives. A 50ms to 100ms sleep interval is probably Just Right, but if possible,
it would be neater to avoid the Sleep() altogether and wait (one or more
times) for something to happen.
Question: Is it possible?
curl_multi_wait() has been particularly hard to integrate with in a manner which doesn’t unnecessarily sleep or end up in a tight loop. What I’ve found is that the main issue is a lack of information about whether any waiting was actually done.

Here are some of the cases I’ve found, which can’t be distinguished from one another based on the function's output:

1) Multi handle has no valid FDs and did not wait the requested timeout.
2) Multi handle has valid FDs, but did no work on them, AND did not wait the requested timeout (I have seen this with CURLOPT_MAX_RECV_SPEED_LARGE).
3) Multi handle has valid FDs and waited the entire timeout, no FDs are ready.

All of these set numfds to 0, so the application code has no choice but to guess at which case happened.

The current solutions I’ve come up with are:

1) Add a dummy FD as an extra FD so that curl_multi_wait() always calls select()/poll() and waits the provided timeout.
2) Record the amount of time spent in curl_multi_wait and, if numfds is 0, make sure the entire timeout was waited; if not, do an extra wait before calling curl_multi_perform.
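
The second solution can be sketched as two small helpers. This is an illustration only, not the code from any real application: the names elapsed_ms and remaining_wait_ms are assumptions, and the clock readings are expected to come from something like clock_gettime(CLOCK_MONOTONIC, ...) taken just before and just after curl_multi_wait().

```c
#include <time.h>

/* Milliseconds elapsed between two clock readings. */
long elapsed_ms(struct timespec start, struct timespec end)
{
    long long ns = (long long)(end.tv_sec - start.tv_sec) * 1000000000LL
                 + (long long)(end.tv_nsec - start.tv_nsec);
    return (long)(ns / 1000000LL);
}

/* How much longer the caller should sleep after a wait call that was
   asked to wait timeout_ms, actually took spent_ms, and reported numfds
   ready descriptors. Returns 0 when no extra wait is needed. */
long remaining_wait_ms(long timeout_ms, long spent_ms, int numfds)
{
    if (numfds > 0)
        return 0;                 /* something is ready: act right away */
    if (spent_ms < 0)
        spent_ms = 0;             /* guard against a bad clock reading */
    if (spent_ms >= timeout_ms)
        return 0;                 /* the full timeout was already waited */
    return timeout_ms - spent_ms; /* returned early: wait for the rest */
}
```

The application would sleep for the returned number of milliseconds before calling curl_multi_perform(), which covers cases 1 and 2 above without adding delay in case 3.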
I tried the sample on Windows 10 with libcurl 7.57.0 built by the MSYS2
project for native Windows (i.e., not relying on Cygwin or MSYS for POSIX
emulation) and on Ubuntu 17.10 using libcurl 7.55.1 from the Ubuntu package
sources (package libcurl4-openssl-dev). Same results on both.
Big thanks to Daniel and everyone who has worked on libcurl. It's a pleasure
to use and the documentation is outstandingly good.
[1] https://curl.haxx.se/libcurl/c/multi-single.html
I'm curious if anyone else has other ideas for better dealing with this issue or if there is any interest in updating curl_multi_wait to indicate when it did not wait.

Joel

Daniel Stenberg
2018-03-20 22:50:23 UTC
Post by Cunningham, Joel
curl_multi_wait() has been particularly hard to integrate with in a manner
which doesn't unnecessarily sleep and end up in a tight loop. What I've
found is the main issue is a lack of information about if any waiting was
actually done.
Yeah.

When we added curl_multi_wait() to offer a simpler API, we clearly missed out
on exactly that detail and it unfortunately makes the function hard to use in
an effective way. It is not as easy and convenient as I wanted it to be, which
was the original plan.

Let's say we would introduce an improved version of this function. How would
you propose its API would work to reduce the current annoyance as much as
possible?

curl_multi_wait_done_again() (yes, the name can also be discussed)
--
/ daniel.haxx.se
Cunningham, Joel
2018-03-20 23:14:19 UTC
-----Original Message-----
Daniel Stenberg
Sent: Tuesday, March 20, 2018 5:50 PM
Subject: RE: Wait for a response using curl_multi_wait
Post by Cunningham, Joel
curl_multi_wait() has been particularly hard to integrate with in a
manner which doesn’t unnecessarily sleep and end up in a tight loop.
What I’ve found is the main issue is a lack of information about if
any waiting was actually done.
Yeah.
When we added curl_multi_wait() to offer a simpler API, we clearly missed
out on exactly that detail and it unfortunately makes the function hard to use
in an effective way. And not that easy and convenient as I wanted it to be
and which was the original plan.
Let's say we would introduce an improved version of this function. How
would you propose its API would work to reduce the current annoyance as
much as possible?
My thoughts are either:

1) Make the new API behave just like poll/select, i.e. it can be used as a blocking construct even when there are no input FDs.

Given a background in network programming, my initial assumption when first encountering curl_multi_wait() was that it would behave in this way since it took FDs and had "wait" in the name. Only after reading the docs did I realize it was much more complicated than that.

I found a previous discussion about this on the mailing list [1] but it's not clear why the 'not waiting' behavior was preferred.

2) Have the API behavior (not waiting in some cases) be the same, but return some indication of whether any waiting was done. Maybe this is through an additional parameter?

Optionally waiting seems to be the motivation behind the current API behavior [1], but I'm not sure I understand the use case of passing a timeout into curl_multi_wait() but then not wanting to actually wait that entire time. I would figure the user would use a 0 timeout in this case.

This second option would probably not help clarify the API for new users and they may still integrate in a non-performant manner, but it would be helpful for those already familiar with this issue.
curl_multi_wait_done_again() (yes, the name can also be discussed)
[1] https://curl.haxx.se/mail/lib-2012-10/0078.html

Joel

Daniel Stenberg
2018-03-21 14:35:22 UTC
Post by Cunningham, Joel
1) Make the new API behave just like poll/select, i.e. it can be used as a
blocking construct even when there are no input FDs.
I decided to make an attempt and it turned out that making a new
function called curl_multi_poll() that does exactly this required a *minimal*
change:

https://github.com/curl/curl/pull/2415

There's no docs and no tests or anything but I'm interested in thoughts and
feedback. It basically acts exactly like curl_multi_wait, *but* if there's
nothing to wait for it will wait for the timeout period (or shorter if the
multi handle has a timeout set that expires before the given timeout).

If we are to add a new function, we might as well make sure that we correct
all the problems we can think of while at it!

An alternative design could be to add a curl_multi_setopt() option that
changes how curl_multi_wait() behaves, but I figure that is slightly more
subtle...
--
/ daniel.haxx.se
Richard Copley
2018-03-21 18:24:46 UTC
Post by Daniel Stenberg
Post by Cunningham, Joel
1) Make the new API behave just like poll/select, i.e. it can be used as a
blocking construct even when there are no input FDs.
I decided to make an attempt and it turned out that making a new function
called curl_multi_poll() that does exactly this required a *minimal* change:
https://github.com/curl/curl/pull/2415
There's no docs and no tests or anything but I'm interested in thoughts
and feedback. It basically acts exactly like curl_multi_wait, *but* if
there's nothing to wait for it will wait for the timeout period (or shorter
if the multi handle has a timeout set that expires before the given timeout).
If we are to add a new function, we might as well make sure that we
correct all the problems we can think of while at it!
An alternative design could be to add a curl_multi_setopt() option that
changes how curl_multi_wait() behaves, but I figure that is slightly more
subtle...
Probably a stupid question, but in the example I was talking about, where
does the socket returned by connect() fit in? After the first SYN packet
has been sent, and before the SYN/ACK arrives, can't curl wait on that
socket?
Daniel Stenberg
2018-03-22 15:16:05 UTC
Post by Richard Copley
Probably a stupid question, but in the example I was talking about, where
does the socket returned by connect() fit in? After the first SYN packet has
been sent, and before the SYN/ACK arrives, can't curl wait on that socket?
Yes it can and it does. The typical no-socket gap in time is before that
socket is created, when libcurl resolves a host name in a separate thread.
--
/ daniel.haxx.se
Richard Copley
2018-03-22 17:09:26 UTC
Post by Richard Copley
Probably a stupid question, but in the example I was talking about, where
does the socket returned by connect() fit in? After the first SYN packet
has been sent, and before the SYN/ACK arrives, can't curl wait on that
socket?
Post by Daniel Stenberg
Yes it can and it does. The typical no-socket gap in time is before that
socket is created, when libcurl resolves a host name in a separate thread.
Oh, I had been assuming the resolve had happened synchronously, but I
didn't check.

I did some packet-sniffing. The results are in an earlier message in this
thread. If you have the time and energy, and if you can make any sense of
what I wrote, would you mind having a look at it? It looked to me as though
what you describe wasn't happening (i.e., curl wasn't waiting on that
socket), but I'm not really confident of that.
Richard Copley
2018-03-25 11:03:26 UTC
Post by Richard Copley
Post by Daniel Stenberg
Post by Richard Copley
Probably a stupid question, but in the example I was talking about, where
does the socket returned by connect() fit in? After the first SYN packet
has been sent, and before the SYN/ACK arrives, can't curl wait on that
socket?
Yes it can and it does. The typical no-socket gap in time is before that
socket is created, when libcurl resolves a host name in a separate thread.
Oh, I had been assuming the resolve had happened synchronously, but I
didn't check.
I did some packet-sniffing. The results are in an earlier message in this
thread. If you have the time and energy, and if you can make any sense of
what I wrote, would you mind having a look at it? It looked to me as though
what you describe wasn't happening (i.e., curl wasn't waiting on that
socket), but I'm not really confident of that.
I don't imagine anyone else was worried, but I kept trying to see why my
results seemed to contradict what Daniel said.
One problem was that my timestamps were inaccurate (and not just
imprecise). With better timestamps, the contradiction disappears (when the
SYN is on the wire, curl_multi_wait doesn't indicate a no-socket gap).
I won't bore you with more details.
Thanks!
Daniel Stenberg
2018-03-25 11:16:37 UTC
Post by Richard Copley
I don't imagine anyone else was worried, but I kept trying to see why my
results seemed to contradict what Daniel said. One problem was that my
timestamps were inaccurate (and not just imprecise). With better timestamps,
the contradiction disappears (when the SYN is on the wire, curl_multi_wait
doesn't indicate a no-socket gap). I won't bore you with more details.
Haha, well I'm glad you figured it out and that it wasn't a problem in
libcurl. Thanks for the follow-up!
--
/ daniel.haxx.se
Richard Alcock
2018-03-21 18:44:30 UTC
Post by Daniel Stenberg
Post by Cunningham, Joel
1) Make the new API behave just like poll/select, i.e. it can be used as a
blocking construct even when there are no input FDs.
An alternative design could be to add a curl_multi_setopt() option that
changes how curl_multi_wait() behaves, but I figure that is slightly more
subtle...
+1 on making it like poll/select. As a recent new user of
curl_multi_wait, the example in the doc confused the hell out of me.
I ended up copying it "just because it was there", which is entirely
unsatisfying!

Adding a new option seems like it'll just make the doc for curl_multi_wait
even more confusing. So unless there are other strong reasons not to
add a new function, I like curl_multi_poll.

Richard
Richard Copley
2018-03-21 18:59:45 UTC
Post by Richard Alcock
Post by Daniel Stenberg
Post by Cunningham, Joel
1) Make the new API behave just like poll/select, i.e. it can be used as a
blocking construct even when there are no input FDs.
An alternative design could be to add a curl_multi_setopt() option that
changes how curl_multi_wait() behaves, but I figure that is slightly more
subtle...
+1 on making it like poll/select. As a recent new user of
curl_multi_wait, the example in the doc confused the hell out of me.
I ended up copying it "just because it was there", which is entirely
unsatisfying!
I hate to be negative, and I don't want to start a pointless argument. I
think I've already said that I admire this software very much.
So, please forgive me for saying it, and accept my apologies if I'm being
stupid, but ...

In the PR, curl_multi_poll isn't polling or selecting. It just moves the
sleep from user code into the library, where the user has no control over
it.
Did I miss something?
Daniel Stenberg
2018-03-22 15:20:33 UTC
Post by Richard Copley
I hate to be negative, and I don't want to start a pointless argument.
Negative is good too. If you can tell us why this is a bad idea, I think
that's valuable information! If it truly is a bad idea we shouldn't do it.
Post by Richard Copley
In the PR, curl_multi_poll isn't polling or selecting. It just moves the
sleep from user code into the library, where the user has no control over
it. Did I miss something?
Clearly: The existing curl_multi_wait() function already polls all sockets
libcurl knows for the multi handle plus your set of custom file descriptors.

The new thing here would be that the discussed new function would work more
similarly to how poll() works and actually sleep for a while if there's nothing
to do on any file descriptor. The "for a while" part is what is usually
tricky for applications to figure out and deal with correctly.
--
/ daniel.haxx.se
Cunningham, Joel
2018-03-21 18:53:18 UTC
-----Original Message-----
Daniel Stenberg
Sent: Wednesday, March 21, 2018 9:35 AM
Subject: RE: Wait for a response using curl_multi_wait
Post by Cunningham, Joel
1) Make the new API behave just like poll/select, i.e. it can be used as a
blocking construct even when there are no input FDs.
I decided to make an attempt and it turned out that making a new function called
curl_multi_poll() that does exactly this required a *minimal* change:
https://github.com/curl/curl/pull/2415
There's no docs and no tests or anything but I'm interested in thoughts and
feedback. It basically acts exactly like curl_multi_wait, *but* if there's
nothing to wait for it will wait for the timeout period (or shorter if the multi
handle has a timeout set that expires before the given timeout).
Great, I like the name! I'll take a look at the PR and post any feedback.
If we are to add a new function, we might as well make sure that we correct
all the problems we can think of while at it!
An alternative design could be to add a curl_multi_setopt() option that
changes how curl_multi_wait() behaves, but I figure that is slightly more
subtle...
Ahh, I didn't think of a setopt. That's a better way to control the behavior without breaking the API, but I prefer the simplicity of curl_multi_poll 😊

Joel
