#11800 closed bug (invalid)
WebPositive inconveniently strict
| Reported by: | donn | Owned by: | pulkomandy |
|---|---|---|---|
| Priority: | normal | Milestone: | |
| Component: | Applications/WebPositive | Version: | R1/Development |
| Keywords: | | Cc: | |
| Blocked By: | | Blocking: | #12163 |
| Platform: | All | | |
Description
WebPositive won't accept invalid HTML, where Safari for example manages fine with the same file. Per suggestion on the forum, I'm posting an example that I received in email from Expedia. That's where I encounter this problem, when I invoke WebPositive on an HTML disk file that I have received via email. They clearly aren't valid HTML - this one starts with a 1x1 img and then tries to follow that with doctype - but like I say, Safari renders them perfectly without a whimper.
Because I have this problem only with email, I wondered if there might be some different standard for file: vs http:.
Attachments (1)
Change History (11)
by , 10 years ago
Attachment: | mail009.html added |
---|
comment:1 by , 10 years ago
... and it turns out that the same file is handled fine when it's encountered online, so it appears that the problem does indeed involve different standards for file: vs. http: content. I run into it when I read my email on Haiku, but it would also affect HTML documentation, etc.
comment:2 by , 10 years ago
I think the main problem is that the file's doctype declares it to be XHTML. We use our MIME sniffing rules to set the MIME type accordingly, but it would seem the file is actually HTML (which is parsed in a more relaxed way). When getting files online, the server provides a MIME type, so we don't use the sniffing. It should be possible to get the MIME type from the e-mail header and use that instead of trying to guess. If the mail is set to use the more correct "HTML" MIME type, it would work better. You can also force it by editing the file attributes.
In any case, Web+ should probably fall back to HTML when parsing as XHTML fails.
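The fallback suggested above can be illustrated outside of WebKit. The following is only a minimal Python sketch of the idea, not how Web+ does or would implement it: `parse_with_fallback`, `TagCollector`, and the sample markup are hypothetical names and data made up for illustration. It tries a strict XML parse first and switches to a lenient HTML parser when that fails.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Lenient HTML parser that records the start tags it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_with_fallback(markup):
    """Try strict XML parsing first; fall back to lenient HTML on failure."""
    try:
        root = ET.fromstring(markup)
        return "xhtml", [element.tag for element in root.iter()]
    except ET.ParseError:
        collector = TagCollector()
        collector.feed(markup)
        collector.close()
        return "html", collector.tags


# Hypothetical stand-in for the attachment: an <img> precedes the doctype,
# so the document is not well-formed XML even though it claims to be XHTML.
broken = (
    '<img src="pixel.gif" width="1" height="1">'
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
    "<html><body><p>Itinerary</p></body></html>"
)
well_formed = "<html><body><p>ok</p></body></html>"

broken_mode, broken_tags = parse_with_fallback(broken)
clean_mode, clean_tags = parse_with_fallback(well_formed)
```

With this approach the strict path is still used for documents that really are well-formed, and only the broken ones take the relaxed path.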
comment:3 by , 10 years ago
I am pleased to confirm that I can get it to work by having my email program set file type to text/html.
comment:4 by , 10 years ago
I'd be tempted to change this to a bug against the mime sniffing process. Haiku might take a tip from the httpd server, which gets better results for everyone by classifying everything as "text/html". At least where the file name is .html. I guess in practice this is a broader category that includes the various specifications.
For example if people search for html files, how likely is it that they will care which specification the files claim to adhere to?
Conversely, files that really should have this distinct application/xhtml+xml type should probably have a different icon as well, right?
comment:5 by , 10 years ago
I have introduced the separate MIME type because it is required by some of the tests in the WebKit test suite. Other pages could be affected as well, because the parsing rules for HTML and XHTML are different and incompatible. In your case it's HTML failing to render because it's parsed as XHTML, but it could be the reverse (because of extra features in XHTML like the use of XML namespaces). So it is important to have both types for proper operation of the web browser.
Now, the "sniffing" detection for these is difficult because they can look quite close to each other (especially if people use the wrong doctype in their documents, as in the sample file here). So the way to handle this is to get the information from elsewhere. When the file comes from an e-mail, the native Mail application could get it from the mail headers, which specify a content type. When the page comes from an HTTP server, Web+ uses the HTTP Content-Type header. When downloading a file, it should probably also store the content type in the file's MIME type attribute. This way no sniffing would be needed.
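To make the "get it from the mail headers" idea concrete, here is a rough Python sketch using the standard `email` module. It is not the Haiku Mail application's code; `declared_html_type`, `choose_mime_type`, and the sample message are hypothetical and exist only for illustration. The declared type wins whenever one is present, and the sniffed type is used only as a last resort.

```python
from email import message_from_string

HTML_TYPES = ("text/html", "application/xhtml+xml")


def declared_html_type(raw_message):
    """Return the Content-Type declared for the first HTML-like part, if any."""
    message = message_from_string(raw_message)
    for part in message.walk():
        content_type = part.get_content_type()
        if content_type in HTML_TYPES:
            return content_type
    return None


def choose_mime_type(declared, sniffed):
    """Prefer the type declared by the sender; sniffing is only a last resort."""
    return declared if declared else sniffed


# Hypothetical message: the header says text/html regardless of the doctype.
raw = (
    "From: sender@example.com\n"
    "Subject: Itinerary\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/html; charset=utf-8\n"
    "\n"
    "<html><body>Itinerary</body></html>\n"
)

declared = declared_html_type(raw)
chosen = choose_mime_type(declared, "application/xhtml+xml")
fallback = choose_mime_type(None, "application/xhtml+xml")
```

The chosen value is what would then be written to the saved file's MIME type attribute, so that the browser never has to guess.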
comment:6 by , 10 years ago
Keep both types, for sure. I'm just saying that the sniffing should always return "text/html", just like httpd always says "text/html" when it serves up a .html file, regardless of what's in it.
We can deal with it at the application level by applying the type we get from email or httpd, as you say. We're doing that because email & httpd have got it right. They haven't searched the contents and discovered evidence of xhtml etc. They call it all text/html because that's what we call an .html file. The sniffing detection is not only difficult, it's inherently wrong.
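For comparison, this is roughly what extension-based typing looks like, sketched with Python's standard `mimetypes` module. It only illustrates the behaviour described above and is not Haiku's sniffing code; `type_from_name` and the default value are hypothetical. The file contents are never inspected.

```python
import mimetypes


def type_from_name(path, default="application/octet-stream"):
    """Classify a file purely by its name, never by its contents."""
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed or default


# The doctype inside the file plays no role in the result.
attachment_type = type_from_name("mail009.html")
```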
comment:8 by , 9 years ago
Blocking: | 12163 added |
---|
comment:9 by , 6 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
The file claims to be XHTML but is invalid as such; this isn't really our problem.
In fact, current Firefox (on Windows) refuses to open it and shows an XML error, so this seems to be fine behavior to me.
comment:10 by , 5 years ago
Milestone: | R1 |
---|
Remove milestone for tickets with status = closed and resolution != fixed
HTML mail from expedia.com