Incorrect documentation of CURLOPT

Discussion:

Incorrect documentation of CURLOPT_HEADERFUNCTION

Guido Berhoerster

2018-02-15 15:54:27 UTC

The header callback will be called once for each header and only
complete header lines are passed on to the callback. Parsing headers
is very easy using this.

This seems to be completely wrong, headers are chopped into chunks
of CURL_MAX_WRITE_SIZE by Curl_client_chop_write(). The only way to
distinguish a complete header seems to be a check whether the last
character is a 0x0a.
I don't think that most client code is aware of this, naive
implementations e.g. check for either a single 0x0a or a 0x0d 0x0a in
order to delimit responses. IMO this makes dealing with headers more
tedious than it already is, so I think it would be better to fix the
code to match the implementation rather than the other way around.

A complete HTTP header that is passed to this function can be up to
CURL_MAX_HTTP_HEADER (100K) bytes.

This also doesn't seem to be true, with libcurl 7.55.1 headers up to
~ 190K are read although I haven't checked where that limit actually
comes from.

--
Guido Berhoerster
-------------------------------------------------------------------
Unsubscribe: https://cool.haxx.se/list/listinfo/curl-library
Etiquett

Daniel Stenberg

2018-02-15 22:10:56 UTC

Permalink

The header callback will be called once for each header and only
complete header lines are passed on to the callback. Parsing headers
is very easy using this.

The documentation is right here, the code is wrong. The code is supposed to
always deliver a full header or none at all. This is a bug.

This also doesn't seem to be true, with libcurl 7.55.1 headers up to ~ 190K
are read although I haven't checked where that limit actually comes from.

Please let me know how to reproduce that, I can't. I just wrote up a test case
with a header line that is longer than 102400 bytes and it ends up with:

curl: (27) Avoided giant realloc for header (max is 102400)!

--
/ daniel.haxx.se
-------------------------------------------------------------------
Unsubscribe: https://cool.haxx.se/list/listinfo/curl-library
Etiquette: https://curl.

Guido Berhoerster

2018-02-15 22:39:05 UTC

Permalink

Post by Daniel Stenberg

The header callback will be called once for each header and only
complete header lines are passed on to the callback. Parsing headers
is very easy using this.

The documentation is right here, the code is wrong. The code is supposed to
always deliver a full header or none at all. This is a bug.

Post by Daniel Stenberg

This also doesn't seem to be true, with libcurl 7.55.1 headers up to ~ 190K
are read although I haven't checked where that limit actually comes from.

Please let me know how to reproduce that, I can't. I just wrote up a test case
curl: (27) Avoided giant realloc for header (max is 102400)!

That happens with the curl tool, but using the attached test case it'll read
headers for up to 191K and pass them to the header function in 16K chunks:

$ ./http-server.sh &
[1] 14366
$ ./header-write-test
GET / HTTP/1.1
Host: 127.0.0.1:8000
Accept: */*

received 17 bytes
received 16384 bytes
received 16384 bytes
received 16384 bytes
received 16384 bytes
received 16384 bytes
received 16384 bytes
received 16384 bytes
received 16384 bytes
received 16384 bytes
received 16384 bytes
received 16384 bytes
received 15362 bytes
received 19 bytes
received 2 bytes

--
Guido Berhoerster

Daniel Stenberg

2018-02-15 22:55:20 UTC

Permalink

Post by Guido Berhoerster
That happens with the curl tool, but using the attached test case it'll read
$ ./http-server.sh &
[1] 14366
$ ./header-write-test

Strange. It doesn't for me:

$ sh http-server.sh &
[1] 19515
$ ./header-write-test
GET / HTTP/1.1
Host: 127.0.0.1:8000
Accept: */*

received 17 bytes
debugit: curl_easy_perform: Out of memory

BTW, I also created an issue out of the first bug:
https://github.com/curl/curl/issues/2314 and I submitted a PR for a new test
that verifies that curl detects "too long" HTTP headers:
https://github.com/curl/curl/pull/2315

--
/ daniel.haxx.se
-------------------------------------------------------------------
Unsubscribe: https://cool.haxx.se/list/listinfo/curl-library
Etiquette:

Daniel Stenberg

2018-02-16 07:58:22 UTC

Permalink

Update: The PR #2315 CI test failures show that this check is at least not
consistently working:

https://travis-ci.org/curl/curl/builds/342103314?utm_source=github_status&utm_medium=notification

Most curious.

--
/ daniel.haxx.se
-------------------------------------------------------------------
Unsubscribe: https://cool.haxx.se/list/listinfo/curl-library
Etiquette: h

Daniel Stenberg

2018-02-16 09:27:36 UTC

Permalink

Post by Daniel Stenberg
Most curious.

I figured out the bug in the max header length detection and PR #2315 now
contains that fix [1].

I submitted PR #2316 to make sure the header callback always gets the full
header as documented. Includes a test case to verify it [2].

[1] = https://github.com/curl/curl/pull/2315
[2] = https://github.com/curl/curl/pull/2316

--
/ daniel.haxx.se
-------------------------------------------------------------------
Unsubscribe: https://cool.haxx.se/list/listinfo/curl-library
Etiquette: https://cu

Guido Berhoerster

2018-02-16 07:54:56 UTC

Permalink

Post by Daniel Stenberg

Post by Guido Berhoerster
That happens with the curl tool, but using the attached test case it'll read
$ ./http-server.sh &
[1] 14366
$ ./header-write-test

$ sh http-server.sh &
[1] 19515
$ ./header-write-test
GET / HTTP/1.1
Host: 127.0.0.1:8000
Accept: */*
received 17 bytes
debugit: curl_easy_perform: Out of memory

Strange, indeed. I just downloaded and built 7.58.0 from source and
tried again and I can still reproduce it, this is on Ubuntu
17.10/amd64.

Post by Daniel Stenberg
https://github.com/curl/curl/issues/2314 and I submitted a PR for a new test
https://github.com/curl/curl/pull/2315

OK, thanks for the quick response!

--
Guido Berhoerster
-------------------------------------------------------------------
Unsubscribe: https://cool.haxx.se/list/listinfo/curl-library
Etiquette: https://curl.hax

Guido Berhoerster

2018-02-16 09:50:43 UTC

Permalink

Post by Daniel Stenberg

Post by Daniel Stenberg
Most curious.

I figured out the bug in the max header length detection and PR #2315 now
contains that fix [1].
I submitted PR #2316 to make sure the header callback always gets the full
header as documented. Includes a test case to verify it [2].
[1] = https://github.com/curl/curl/pull/2315
[2] = https://github.com/curl/curl/pull/2316

Thanks again for the quick fix.

One more issue I ran into when dealing with response headers is that
"only complete header lines are passed on to the callback" does not
hold true for folded headers. While RFC7230 does deprecate them, it
also requires clients to accept and normalize them before
interpretation. I don't know wheter you consider that to be in scope
for (lib)curl, if not, it would be good to mention this in the
CURLOPT_HEADERFUNCTION documentation.

--
Guido Berhoerster
-------------------------------------------------------------------
Unsubscribe: https://cool.haxx.se/list/listinfo/curl-librar

Daniel Stenberg

2018-02-16 09:55:05 UTC

Permalink

One more issue I ran into when dealing with response headers is that "only
complete header lines are passed on to the callback" does not hold true for
folded headers. While RFC7230 does deprecate them, it also requires clients
to accept and normalize them before interpretation. I don't know wheter you
consider that to be in scope for (lib)curl, if not, it would be good to
mention this in the CURLOPT_HEADERFUNCTION documentation.

So what happens when libcurl gets one?

They're deprecated in 7230 since they are and were virtually unused. I don't
think adding support for them now makes much sense since we've managed pretty
good without them for decades!

--
/ daniel.haxx.se
-------------------------------------------------------------------
Unsubscribe: https://cool.haxx.se/list/listinfo/curl-library
Etiquette: https://curl.haxx.s

Guido Berhoerster

2018-02-16 10:12:55 UTC

Permalink

Post by Daniel Stenberg

So what happens when libcurl gets one?

Then the callback does not get a complete header and it's up to the
libcurl consumer to merge it with the previous header. Which is
fine, since one usually needs to do other normalization as well,
it's just not obvious from the documentation.

Post by Daniel Stenberg
They're deprecated in 7230 since they are and were virtually unused. I don't
think adding support for them now makes much sense since we've managed pretty
good without them for decades!

--
Guido Berhoerster
-------------------------------------------------------------------
Unsubscribe: https://cool.haxx.se/list/listinfo/curl-library
Eti

Daniel Stenberg

2018-02-16 10:28:02 UTC

Permalink

Then the callback does not get a complete header and it's up to the libcurl
consumer to merge it with the previous header. Which is fine, since one
usually needs to do other normalization as well, it's just not obvious from
the documentation.

Thanks. Then how about adding the following section to the
CURLOPT_HEADERFUNCTION man page to clafify?

LIMITATIONS

libcurl does not unfold HTTP "folded headers" (deprecated since RFC 7230). A
folder header is a header that continues on a subsequent line and starts with
a whitespace. Such folds will be passed to the header callback as a separate
one, although strictly it is just a continuation of the previous line.

--
/ daniel.haxx.se
-------------------------------------------------------------------
Unsubscribe: https://cool.haxx.se/list/listinfo/curl-library
Etiquette: https://curl.haxx.se/m

Guido Berhoerster

2018-02-16 11:06:35 UTC

Permalink

Post by Daniel Stenberg

Then the callback does not get a complete header and it's up to the libcurl
consumer to merge it with the previous header. Which is fine, since one
usually needs to do other normalization as well, it's just not obvious from
the documentation.

Thanks. Then how about adding the following section to the
CURLOPT_HEADERFUNCTION man page to clafify?
LIMITATIONS
libcurl does not unfold HTTP "folded headers" (deprecated since RFC 7230). A
folder header is a header that continues on a subsequent line and starts with
a whitespace. Such folds will be passed to the header callback as a separate
one, although strictly it is just a continuation of the previous line.

Thanks, I think that would be a useful addition.

--
Guido Berhoerster
-------------------------------------------------------------------
Unsubscribe: https://cool.haxx.se/list/listinfo/curl-library
Etiquette: https://cu