Opened 13 years ago

Closed 11 years ago

#7670 closed bug (fixed)

Sniffing rule for text/html is wrong

Reported by: pulkomandy Owned by: axeld
Priority: normal Milestone: R1
Component: Preferences/FileTypes Version: R1/Development
Keywords: Cc:
Blocked By: Blocking:
Platform: All

Description

The current sniffing rule for text/html is:

0.40  [0:64]( -i "<HTML" | "<HEAD" | "<TITLE" | "<BODY" | "<TABLE" | "<!--" | "<META" | "<CENTER")

This looks, case-insensitively, for fragments of HTML in the first 64 bytes of the file.
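To make the matching concrete, here is a rough Python emulation of this rule. It is only a sketch: it assumes that the range [0:64] bounds the offsets at which a pattern may begin and that -i means an ASCII case-insensitive comparison. The function names are hypothetical and are not part of Haiku's sniffer API.

```python
# Rough, hypothetical emulation of the FileTypes sniffing rule
#   0.40  [0:64]( -i "<HTML" | "<HEAD" | "<TITLE" | "<BODY" | "<TABLE" | "<!--" | "<META" | "<CENTER")
# Assumptions: [lo:hi] limits the offsets where a pattern may start,
# and -i means ASCII case-insensitive comparison.

HTML_PATTERNS = (b"<HTML", b"<HEAD", b"<TITLE", b"<BODY",
                 b"<TABLE", b"<!--", b"<META", b"<CENTER")


def rule_matches(data: bytes, lo: int = 0, hi: int = 64,
                 patterns=HTML_PATTERNS) -> bool:
    """True if any pattern begins at an offset in [lo, hi], ignoring case."""
    folded = data.lower()
    for start in range(lo, min(hi, len(data)) + 1):
        if any(folded.startswith(p.lower(), start) for p in patterns):
            return True
    return False


def sniff_priority(data: bytes) -> float:
    """Priority the rule would report: 0.40 on a match, otherwise 0.0."""
    return 0.40 if rule_matches(data) else 0.0


bare = b"<html><head><title>t</title></head><body></body></html>\n"
doctype = (b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
           b'"http://www.w3.org/TR/html4/strict.dtd">\n')

print(sniff_priority(bare))              # 0.4: "<html" is at offset 0
print((doctype + bare).index(b"<html"))  # 91: beyond the 64-byte window
print(sniff_priority(doctype + bare))    # 0.0: no pattern starts in [0:64]
```

Testing every start offset in the window is the most literal reading of the range syntax; it is quadratic in the worst case, but the window is small.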

However, valid HTML documents start with a doctype, which can easily take more than 64 bytes (the HTML 4.01 Strict doctype alone is about 90 bytes). The "<HTML" tag then falls outside the scanned range, so detection fails on most HTML files. Checking for title, body, table, meta and center seems barely useful. Checking for the doctype must be done carefully to avoid mistakenly accepting other XML files; looking for "<!DOCTYPE HTML" may work.
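One way to look for "<!DOCTYPE HTML" without accepting other XML files is to require the document type name to be exactly "html" (ignoring case), so that an SVG or other XML doctype is rejected. The Python sketch below does this with a regular expression; the helper name, the optional UTF-8 BOM and XML declaration handling, and the 512-byte limit are illustrative assumptions, and it is not meant as a drop-in sniffing rule.

```python
import re

# Hypothetical doctype check: optional UTF-8 BOM, optional whitespace,
# optional XML declaration, then "<!DOCTYPE html" followed by whitespace
# or ">" so that names like "htmlx" or "svg" are rejected.
DOCTYPE_HTML = re.compile(
    rb"\A(?:\xef\xbb\xbf)?\s*(?:<\?xml[^>]*\?>\s*)?<!DOCTYPE\s+html(?=[\s>])",
    re.IGNORECASE,
)


def has_html_doctype(data: bytes, limit: int = 512) -> bool:
    """True if the first `limit` bytes open with an HTML document type."""
    return DOCTYPE_HTML.match(data[:limit]) is not None


print(has_html_doctype(b"<!DOCTYPE html>\n<html></html>"))
print(has_html_doctype(b'<?xml version="1.0"?>\n<!DOCTYPE svg PUBLIC "x" "y">'))
```

The first call should report True and the second False. A file without any doctype is not accepted by this check alone, so in practice it would complement the tag-based patterns rather than replace them.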

Change History (1)

comment:1 by pulkomandy, 11 years ago

Resolution: fixed
Status: new → closed

The rule now looks in the first 512 bytes, with better results.
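To see why a 512-byte window behaves better, the hypothetical Python sketch below finds where the first marker from the original pattern list lands in a few sample prologues and checks it against both window sizes. The samples are illustrative, and the sketch again assumes a pattern counts if it starts within the window.

```python
# Hypothetical comparison of the 64-byte and 512-byte windows.
PATTERNS = (b"<html", b"<head", b"<title", b"<body",
            b"<table", b"<!--", b"<meta", b"<center")

SAMPLES = {
    "bare": b"<html><body></body></html>\n",
    "html5": b"<!DOCTYPE html>\n<html><body></body></html>\n",
    "html401": (b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
                b'"http://www.w3.org/TR/html4/strict.dtd">\n'
                b"<html><body></body></html>\n"),
    "xhtml10": (b'<?xml version="1.0" encoding="UTF-8"?>\n'
                b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
                b'"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
                b'<html xmlns="http://www.w3.org/1999/xhtml"><body></body></html>\n'),
}


def first_marker_offset(data: bytes) -> int:
    """Smallest offset at which any marker starts (case-insensitive), or -1."""
    folded = data.lower()
    hits = [i for i in (folded.find(p) for p in PATTERNS) if i >= 0]
    return min(hits, default=-1)


def in_window(data: bytes, hi: int) -> bool:
    """True if some marker starts at an offset in [0, hi]."""
    offset = first_marker_offset(data)
    return 0 <= offset <= hi


for name, data in SAMPLES.items():
    print(f"{name:8} offset={first_marker_offset(data):4} "
          f"64={in_window(data, 64)} 512={in_window(data, 512)}")
```

A short HTML5 doctype still fits within 64 bytes, but the HTML 4.01 and XHTML 1.0 prologues push the first marker past it while staying well inside 512 bytes.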
