Sniffing rule for text/html is wrong
|Reported by:||pulkomandy||Owned by:||axeld|
|Has a Patch:||no||Platform:||All|
The current sniffing rule for HTML is:
0.40 [0:64]( -i "<HTML" | "<HEAD" | "<TITLE" | "<BODY" | "<TABLE" | "<!--" | "<META" | "<CENTER")
This looks for bits of HTML in the first 64 bytes of the file.
However, valid HTML starts with a doctype, which often takes more than 64 bytes, so the detection will fail on most HTML files. Checking for title, body, table, meta and center seems barely useful. Checking for the doctype must be done carefully so that other XML files are not mistakenly accepted. Looking for <!DOCTYPE HTML may work.
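
To illustrate the problem, here is a rough Python simulation of the rule, not the actual sniffer code. It assumes that [0:64] means the pattern may start at any offset from 0 to 64 and that -i makes the match case-insensitive. The helper names, the sample documents and the range used for the proposed rule are all made up for this sketch.

```python
# Rough simulation of the sniffer rule; NOT the real implementation.
# Assumption: [0:64] means a pattern may START at any offset 0..64.
# Assumption: -i means the comparison ignores ASCII case.

OLD_PATTERNS = [b"<HTML", b"<HEAD", b"<TITLE", b"<BODY",
                b"<TABLE", b"<!--", b"<META", b"<CENTER"]

# Hypothetical replacement pattern, as suggested in this ticket.
NEW_PATTERNS = [b"<!DOCTYPE HTML"]


def matches_in_range(data, patterns, first, last):
    """Return True if any pattern starts at an offset in first..last."""
    upper = data.upper()
    for pattern in patterns:
        pattern = pattern.upper()
        for offset in range(first, last + 1):
            if upper[offset:offset + len(pattern)] == pattern:
                return True
    return False


def old_rule(data):
    return matches_in_range(data, OLD_PATTERNS, 0, 64)


def new_rule(data):
    # The 0..64 range here is an assumption; the ticket does not give one.
    return matches_in_range(data, NEW_PATTERNS, 0, 64)


# An HTML 4.01 Strict document: the doctype alone is 90 bytes long,
# so <html> starts past offset 64.
HTML401 = (b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
           b'"http://www.w3.org/TR/html4/strict.dtd">\n'
           b'<html><head><title>t</title></head><body></body></html>\n')

# An XML file with a doctype that must not be detected as HTML.
SVG = (b'<?xml version="1.0"?>\n'
       b'<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" '
       b'"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">\n'
       b'<svg xmlns="http://www.w3.org/2000/svg"/>\n')

if __name__ == "__main__":
    print("old rule, HTML 4.01:", old_rule(HTML401))
    print("new rule, HTML 4.01:", new_rule(HTML401))
    print("new rule, SVG:", new_rule(SVG))
```

Under these assumptions the old rule misses the HTML 4.01 sample, while the doctype check accepts it and rejects the SVG sample. Note that the case-insensitive doctype check would also accept XHTML doctypes, which may or may not be desirable.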