Opened 5 years ago

Last modified 5 years ago

#11591 new bug

C++ source file is identified as HTML

Reported by: waddlesplash
Owned by: bonefish
Priority: normal
Milestone: R1
Component: Servers/registrar
Version: R1/Development
Keywords:
Cc:
Blocked By:
Blocking:
Has a Patch: no
Platform: All

Description

As in title. The source file that has this problem is attached.

This probably happens because it has some HTML tag names in it...

Attachments (1)

HTSearchParser.cpp (5.7 KB) - added by waddlesplash 5 years ago.
File that causes the issue.


Change History (5)

by waddlesplash, 5 years ago

Attachment: HTSearchParser.cpp added

File that causes the issue.

comment:1 by pulkomandy, 5 years ago

Yes, any file with "<title" in the first 512 chars is identified as HTML with a priority of 0.4. The source code rule matches files starting with "//" or "/*", or having a #include or #ifdef in the first 32 chars, but with a priority of only 0.20.
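To make the conflict concrete, here is a minimal sketch of priority-based sniffing as described in this comment. It is not the registrar's actual code: the `Pattern`, `SniffRule`, `DefaultRules` and `Sniff` names are hypothetical, the "text/x-source-code" type name is an assumption, "first N chars" is assumed to mean a pattern starting at an offset below N, and only the patterns "/*", "#include" and "#ifdef" are modelled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A pattern that must start within the first `window` bytes of the file.
struct Pattern {
	std::string text;
	std::size_t window;
};

// One sniffer rule: MIME type, priority, and alternative patterns.
struct SniffRule {
	std::string type;
	double priority;
	std::vector<Pattern> patterns;
};

// The two rules quoted above (hypothetical representation).
std::vector<SniffRule> DefaultRules()
{
	return {
		{"text/html", 0.40, {{"<title", 512}}},
		// window 1 means "the file starts with this pattern"
		{"text/x-source-code", 0.20,
			{{"/*", 1}, {"#include", 32}, {"#ifdef", 32}}},
	};
}

// True if any pattern of the rule starts inside its window.
bool Matches(const SniffRule& rule, const std::string& data)
{
	for (const Pattern& pattern : rule.patterns) {
		// find() returns the first occurrence; if that one is outside
		// the window, every later occurrence is outside it as well.
		std::size_t pos = data.find(pattern.text);
		if (pos != std::string::npos && pos < pattern.window)
			return true;
	}
	return false;
}

// Returns the type of the highest-priority matching rule, or "" if none.
std::string Sniff(const std::vector<SniffRule>& rules, const std::string& data)
{
	const SniffRule* best = nullptr;
	for (const SniffRule& rule : rules) {
		if (Matches(rule, data)
			&& (best == nullptr || rule.priority > best->priority)) {
			best = &rule;
		}
	}
	return best != nullptr ? best->type : std::string();
}
```

With these rules, a file that opens with a "/*" comment but mentions "<title" within its first 512 bytes matches both rules, and the HTML rule wins because 0.40 is greater than 0.20.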

MIME sniffing can't always make a perfect guess, and these two rules are probably the fuzzier ones. You can try to move the "<title" tag further down in your file so it isn't in the first 512 chars anymore.

I'm not sure whether fixing the sniffing rules is possible. Do you have an idea of what could be done?

comment:2 by diver, 5 years ago

Milestone: R1/beta1 → R1

Certainly not a beta1 blocker.

comment:3 by waddlesplash, 5 years ago

Can we give more weight to anything that has /* and #ifdef in the first 64 chars? Because the file obviously has far more C/C++ keywords than HTML ones.
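One way to read this suggestion is as an extra rule that requires both markers and carries a higher priority. The sketch below is only an illustration of that idea: `StartsWithin` and `SourceCodePriority` are hypothetical helpers, the boosted value 0.50 is an assumed number chosen to exceed the HTML rule's 0.40, and the fallback branch is a simplified version of the existing 0.20 rule.

```cpp
#include <cstddef>
#include <string>

// True if `pattern` starts within the first `window` bytes of `data`.
bool StartsWithin(const std::string& data, const std::string& pattern,
	std::size_t window)
{
	std::size_t pos = data.find(pattern);
	return pos != std::string::npos && pos < window;
}

// Hypothetical scoring for the source code type.
double SourceCodePriority(const std::string& data)
{
	// Proposed rule: both "/*" and "#ifdef" in the first 64 chars.
	if (StartsWithin(data, "/*", 64) && StartsWithin(data, "#ifdef", 64))
		return 0.50;	// assumed boosted value, above HTML's 0.40

	// Simplified existing rule: starts with "/*", or has "#include" or
	// "#ifdef" in the first 32 chars.
	if (StartsWithin(data, "/*", 1) || StartsWithin(data, "#include", 32)
		|| StartsWithin(data, "#ifdef", 32)) {
		return 0.20;
	}
	return 0.0;
}
```

A file that only opens with a comment would keep the 0.20 priority under this sketch, so it would still lose to the "<title" rule unless an "#ifdef" also appears within the first 64 chars.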

comment:4 by tangobravo, 5 years ago

Is there no weight at all given to the file extension? To me that feels like a much more likely indicator of the type than having "<title" in the first 512 chars.
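As a rough illustration of what weighting the extension could look like, the sketch below lets an extension-based guess override a content sniff whose priority is lower than a fixed weight. Everything here is hypothetical: the `Extension` and `GuessType` helpers, the extension table, the "text/x-source-code" type name, and the weight of 0.50 are assumptions made for the example, not a description of how the registrar behaves.

```cpp
#include <cstddef>
#include <string>

// Returns the extension of a leaf file name (without the dot), or "" if
// there is none. Directory components are not handled.
std::string Extension(const std::string& name)
{
	std::size_t dot = name.rfind('.');
	if (dot == std::string::npos || dot + 1 == name.size())
		return std::string();
	return name.substr(dot + 1);
}

// Picks between the sniffed type and an extension-based guess.
std::string GuessType(const std::string& name, const std::string& sniffedType,
	double sniffedPriority)
{
	const double kExtensionWeight = 0.50;	// assumed value

	std::string ext = Extension(name);
	std::string extType;
	if (ext == "cpp" || ext == "h")
		extType = "text/x-source-code";
	else if (ext == "html" || ext == "htm")
		extType = "text/html";

	// The extension wins only if it is known and outweighs the sniff.
	if (!extType.empty() && kExtensionWeight > sniffedPriority)
		return extType;
	return sniffedType;
}
```

Under these assumptions, `GuessType("HTSearchParser.cpp", "text/html", 0.40)` would return the source code type, while a file without a known extension keeps whatever the content sniff produced.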
