Opened 8 years ago

Closed 7 years ago

#13625 closed bug (fixed)

Webpositive crashes on URL

Reported by: vidrep Owned by: pulkomandy
Priority: normal Milestone: Unscheduled
Component: Applications/WebPositive Version: R1/Development
Keywords: Cc:
Blocked By: Blocking:
Platform: All

Description

hrev51307 x86_64

This URL causes Web+ to reliably crash:

https://thedigitaltheater.com/index.php/2015/09/09/dts-the-digital-experience-1993/

Debug report attached

Attachments (1)

WebPositive-1825-debug-25-07-2017-03-05-18.report (38.3 KB ) - added by vidrep 8 years ago.

Download all attachments as: .zip

Change History (9)

comment:1 by vidrep, 8 years ago

patch: 01

comment:2 by accessays, 7 years ago

I've managed to narrow it down to two urls:

  1. https://ir-na.amazon-adsystem.com/e/ir?t=thedigitalt02-20&l=ur2&o=1
  2. https://ir-na.amazon-adsystem.com/e/ir?l=w41&t=thedigitalt02-202&o=1&cb=1501216857775

Both are Amazon's ad system urls, probably used for tracking. Problem comes from the fact that the web server uses LF (\n) instead of CRLF(\r\n) to separate lines in headers, and that causes Web+ to crash (uint overflow and thus segment violation).

Looking at the RFC https://tools.ietf.org/html/rfc7230#section-3.5

...
Although the line terminator for the start-line and header fields is
the sequence CRLF, a recipient MAY recognize a single LF as a line
terminator and ignore any preceding CR.
...

Which hints that such behavior may be considered normal. Firefox and Python Requests (Linux) load the page without errors, tho headers get messed up - instead of Connection header is called nnCoection.

comment:3 by pulkomandy, 7 years ago

This is a MAY, so basically it reads "be careful not to send bare LF in the middle of a header line".

Still, we should try to not crash. I'll change the code to allow just \n then...

Thanks for the investigation. It would be nice if it was possible to report the bug to Amazon.

comment:4 by pulkomandy, 7 years ago

I cannot reproduce the crash with these URL.

Moreover, the GetLine method used for parsing HTTP headers is already handling the lack of \r appropriately: http://cgit.haiku-os.org/haiku/tree/src/kits/network/libnetapi/NetworkRequest.cpp#n91

So the problem must be elsewhere?

It's possible to build the HTTP library in debug mode, and in that case it will trace all operations it performs to standard output. This does not reveal any problem with parsing the headers:

    HTTPS: Resolving https://ir-na.amazon-adsystem.com/e/ir?l=w41&t=thedigitalt02-202&o=1&cb=1501216857775
    HTTPS: Hostname resolved to: 176.32.96.208:443
    HTTPS: Connection to ir-na.amazon-adsystem.com on port 443.
    HTTPS: Connection opened, sending request.
--> HTTPS: Host: ir-na.amazon-adsystem.com
--> HTTPS: Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
--> HTTPS: Accept-Encoding: gzip
--> HTTPS: Connection: close
--> HTTPS: User-Agent: Mozilla/5.0 (Macintosh; Intel Haiku R1 x86) AppleWebKit/602.1.19 (KHTML, like Gecko) WebPositive/1.2 Version/8.0 Safari/602.1.19
    HTTPS: Request sent.
    HTTPS: Status line received: Code 200 ()
<-- HTTPS: Content-Type: image/gif
<-- HTTPS: Connection: close
<-- HTTPS: Content-Length: 42
<-- HTTPS: Cache-Control: no-cache
<-- HTTPS: Pragma: no-cache
    HTTPS: 5 headers and 0 bytes of data remaining

(unless I'm not served the page from the same server and only some servers have the problem?)

Last edited 7 years ago by pulkomandy (previous) (diff)

in reply to:  3 comment:5 by accessays, 7 years ago

Replying to pulkomandy: Here is what I get:

    HTTPS: Resolving https://ir-na.amazon-adsystem.com/e/ir?t=thedigital02-20&l=ur2&o=1
    HTTPS: Hostname resolved to: 72.21.215.147:443
    HTTPS: Connection to ir-na.amazon-adsystem.com on port 443.
    HTTPS: Connection opened, sending request.
--> HTTPS: Host: ir-na.amazon-adsystem.com
--> HTTPS: Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
--> HTTPS: Accept-Encoding: gzip
--> HTTPS: Connection: close
--> HTTPS: User-Agent: Mozilla/5.0 (Macintosh; Intel Haiku R1 x86) AppleWebKit/602.1.19 (KHTML, like Gecko) WebPositive/1.2 Version/8.0 Safari/602.1.19
    HTTPS: Request sent.
    HTTPS: Status line received: Code 200 ()
<-- HTTPS: Content-Type: image/gif
<-- HTTPS: Connection: close
<-- HTTPS: Content-Length: 42
<-- HTTPS: Cache-Control: no-cache
<-- HTTPS: Pragma: no-cached

And then the segment violation happens. What seemingly happens, is that in _GetLine() characterIndex is assumed to never be 0 when size of the input buffer is not zero, but size the headers end with LF LF instead of CRLF CRLF, there are no characters between the first and the second separators (i.e. no CR after first LF) so characterIndex becomes 0 and that later causes it to overflow. Here are the values from fInputBuffer->fImpl at the time of the crash:

fBuffer: HTTP/1.1 200 \nContent-Type: image/gif\nConnection: close\nContent-Length: 42\nCache-Control: no-cache\nPragma: no-cache\n\nGIF89a...
fBufferSize: 318
fBufferStart: 117
fBufferEnd: 159

fBuffer has some zeroes at the end (GIF itself) that are not included.

comment:6 by pulkomandy, 7 years ago

You're right, there was a problem there. Fixed in hrev51314.

It did not crash for me because I run 32-bit haiku, so it would just access array[-1] here, which is most likely still within the malloc()ed area. But on 64bit, bad things would happen.

Can you confirm it does not crash anymore?

Last edited 7 years ago by pulkomandy (previous) (diff)

comment:7 by accessays, 7 years ago

Yep, no crashing on x86_64 with that patch. OP's URL loads without problems.

comment:8 by pulkomandy, 7 years ago

Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.