Opened 12 years ago

Closed 12 years ago

Last modified 12 years ago

#9193 closed bug (fixed)

Sniffer rule for video/mp2t too simple

Reported by: dsuden Owned by: jscipione
Priority: normal Milestone: R1
Component: Preferences/FileTypes Version: R1/alpha4.1
Keywords: Cc:
Blocked By: Blocking: #7935
Platform: All

Description (last modified by humdinger)

[See comment 2 to reveal the mystery]

I had a Flash video (downloaded with UberTuber) on my desktop and then deleted it, so it was no longer there. Later that day, I saved a text file from Pe to the desktop. I didn't look at it at the time, but this morning I noticed that my text document had the same icon as the Flash video I had trashed. I was able to use FileTypes to change the document back to a text file, which let me open it normally in Pe again, and its text looked fine. Still, that seemed like a strange enough turn of events that I thought I'd better report it. Note that the trashed video file also had a completely different filename from the text file on the desktop.

Attachments (1)

0001-Remove-MPEG2-transport-stream-mp2t-sniffer-rule.patch (905 bytes), added by jscipione 12 years ago.
Patch that removes the mp2t sniffer rule that is causing the problem reported by this bug.


Change History (13)

comment:1 by dsuden, 12 years ago

I just saved another text file to the desktop, and it happened again. This time I checked the file type right away, and it's not Flash, as I had thought, but "MPEG-2 transport stream". So the deleted Flash file might be a red herring. Still, how strange to have Pe saving files as video!

comment:2 by humdinger, 12 years ago

Component: changed from User Interface to Preferences/FileTypes
Owner: changed from stippi to axeld

Hold on, let me contact the spirits beyond... <insert dark atmospheric music>

I get a "G"... Do the contents of your text file start with a "G" by any chance? I bet they do. The culprit is the naive sniffer rule for the video/mp2t file type: it only checks whether the first byte is 0x47. Transport streams apparently always begin with 0x47, but since that is just the ASCII code for "G", anything starting with that letter qualifies.
ReadMe.IntroductionToHaiku in the root of the repository suffers from the same issue.
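A minimal Python sketch of what a first-byte-only check amounts to. The function name `naive_mp2t_sniff` is hypothetical; it only mimics the behavior described above and is not Haiku's actual sniffer implementation.

```python
def naive_mp2t_sniff(data: bytes) -> bool:
    """Mimic a rule that only checks for the 0x47 sync byte at offset 0."""
    return len(data) > 0 and data[0] == 0x47


# 0x47 is the ASCII code for an uppercase "G".
assert chr(0x47) == "G"

print(naive_mp2t_sniff(b"Good morning, this is plain text"))  # True: a false positive
print(naive_mp2t_sniff(b"good morning"))  # False: lowercase "g" is 0x67
```

Any text file whose first character is an uppercase "G" passes this check, which is exactly the misdetection reported here.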

You can "solve" this for the time being by removing the rule in the FileTypes preferences.

Can anyone come up with a better sniffer rule? It looks like after that first sync byte there isn't much to analyze (see Wikipedia), or is there?

comment:3 by humdinger, 12 years ago

Description: modified
Summary: changed from Filetype changed on its own to Sniffer rule for video/mp2t too simple

comment:4 by axeld, 12 years ago

Indeed, I think the format spec is too weak to be sniffed, at least not without a high probability of false positives. We could remove the sniffer rule and detect the type only by its extension.

We could also try to lower its support level and require that the following bytes are not ASCII. Maybe that would already be good enough, but it could become a source of problems again in the future, so it's probably not worth pursuing.
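A rough Python sketch of the second idea, under stated assumptions: "not ASCII" is interpreted here as "at least one of the remaining three bytes of the 4-byte transport stream packet header is neither printable ASCII nor common whitespace", and the low priority of 0.2 is an arbitrary, hypothetical value. The function name `guarded_mp2t_sniff` is likewise made up for illustration.

```python
from typing import Optional

# Printable ASCII plus the whitespace characters typical of text files.
TEXT_BYTES = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def guarded_mp2t_sniff(data: bytes, low_priority: float = 0.2) -> Optional[float]:
    """Return a (low) match priority, or None if the data does not qualify.

    Requires the 0x47 sync byte at offset 0 AND at least one non-text byte
    among the remaining three bytes of the 4-byte TS packet header.
    """
    if len(data) < 4 or data[0] != 0x47:
        return None
    if all(b in TEXT_BYTES for b in data[1:4]):
        return None  # looks like ordinary ASCII text starting with "G"
    return low_priority


print(guarded_mp2t_sniff(b"Getting started"))  # None
print(guarded_mp2t_sniff(bytes([0x47, 0x40, 0x00, 0x10])))  # 0.2
```

The fragility mentioned above shows up quickly: UTF-8 text such as "Gößer" starts with 0x47 followed by non-ASCII bytes, so this sketch would still report it as a (low-priority) transport stream.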

Attachment 0001-Remove-MPEG2-transport-stream-mp2t-sniffer-rule.patch added by jscipione, 12 years ago.

Patch that removes the mp2t sniffer rule that is causing the problem reported by this bug.

comment:5 by jscipione, 12 years ago

patch: changed from 0 to 1

comment:6 by jscipione, 12 years ago

Owner: changed from axeld to jscipione
Status: changed from new to assigned

comment:7 by jscipione, 12 years ago

Resolution: fixed
Status: changed from assigned to closed

Fixed in hrev44886 by removing the sniffer rule.

comment:8 by diver, 12 years ago

Blocking: 7935 added

comment:9 by Giova84, 12 years ago

hrev48250

I have encountered this bug again: text files that start with a "G" character are mistakenly marked as video/mp2t. This occurs on BFS, too.

comment:10 by phoudoin, 12 years ago (in reply to comment:2)

Replying to humdinger:

Can anyone come up with a better sniffer rule? It looks like after that first sync byte there isn't much to analyze (see Wikipedia), or is there?

Our sniffer rule syntax doesn't support a repeat mode; otherwise, a much better rule for MPEG transport streams would be to simply check for the 0x47 ('G') sync byte every 188 bytes, not just the first one.

While a repeat pattern could be hard to implement, something that could be done quickly is to support combining patterns with & (AND), not just with | (OR) as today. That would make the rule:

"0.9 ('G' & [188] 'G' & [376] 'G' & [564] 'G')"

...which would detect MPEG-2 transport stream content much more reliably.
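For illustration, here is a minimal Python sketch of what that ANDed rule would check: the 0x47 sync byte at offsets 0, 188, 376 and 564, i.e. at the start of each of the first four 188-byte packets. The function and constant names are hypothetical, and this is not how Haiku's sniffer evaluates rules internally.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def anded_sync_sniff(data: bytes, packets: int = 4) -> bool:
    """True if the sync byte starts each of the first `packets` packets.

    Equivalent in spirit to ANDing the patterns at offsets 0, 188, 376, 564.
    """
    offsets = [i * TS_PACKET_SIZE for i in range(packets)]  # 0, 188, 376, 564
    return all(off < len(data) and data[off] == SYNC_BYTE for off in offsets)


# Four fake packets: a sync byte followed by 187 zero bytes each.
stream = (bytes([SYNC_BYTE]) + bytes(TS_PACKET_SIZE - 1)) * 4
print(anded_sync_sniff(stream))                # True
print(anded_sync_sniff(b"G" + b"x" * 1000))    # False: only offset 0 matches
```

A text file would now have to contain a "G" at exactly four specific offsets to be misdetected. The trade-off in this sketch is that any file shorter than 565 bytes is rejected, even a genuine (very short) transport stream.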

Last edited 12 years ago by phoudoin

comment:11 by X512, 12 years ago

Maybe giving plain-text detection a higher priority would solve the problem?
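X512's suggestion depends on how competing matches are ranked. A hypothetical Python sketch, assuming that the matching rule with the highest priority wins and that a separate plain-text detector exists; the rule list, the priorities 0.5 and 0.6, and the helper names are all illustrative, not Haiku's actual rules or values.

```python
from typing import Callable, List, Optional, Tuple

Rule = Tuple[str, float, Callable[[bytes], bool]]


def looks_like_text(data: bytes) -> bool:
    """Crude hypothetical check: every byte is printable ASCII or whitespace."""
    return all(0x20 <= b < 0x7F or b in (0x09, 0x0A, 0x0D) for b in data)


def first_byte_sync(data: bytes) -> bool:
    """The weak mp2t check: only the first byte is inspected."""
    return data[:1] == b"\x47"


def pick_type(data: bytes, rules: List[Rule]) -> Optional[str]:
    """Return the MIME type of the highest-priority matching rule, if any."""
    matches = [(prio, mime) for mime, prio, test in rules if test(data)]
    return max(matches)[1] if matches else None


rules: List[Rule] = [
    ("video/mp2t", 0.5, first_byte_sync),
    ("text/plain", 0.6, looks_like_text),  # ranked above the mp2t rule
]

print(pick_type(b"Good morning\n", rules))                    # text/plain
print(pick_type(bytes([0x47, 0x40, 0x00, 0x10]), rules))      # video/mp2t
```

This only masks the weak rule: any non-text file that happens to start with 0x47 would still be classified as video/mp2t.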

comment:12 by phoudoin, 12 years ago (in reply to comment:11)

Replying to X512:

Maybe giving plain-text detection a higher priority would solve the problem?

That would be a workaround. The real issue is that our current sniffer rule syntax is not flexible enough to express an identification rule for a format that can only be recognized by a repeated pattern, rather than by a signature at the beginning.
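To show what such a "repeat mode" would add over fixed offsets, here is a minimal Python sketch that checks the 0x47 sync byte at every 188-byte packet boundary of whatever data is available, so the number of checked positions grows with the buffer size. The function name and the `min_packets` threshold of 3 are hypothetical choices, not part of any existing sniffer syntax.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def repeated_sync_sniff(data: bytes, min_packets: int = 3) -> bool:
    """Check the sync byte at the start of every complete 188-byte packet.

    A trailing partial packet is ignored. Buffers holding fewer than
    `min_packets` complete packets are rejected outright.
    """
    packet_count = len(data) // TS_PACKET_SIZE
    if packet_count < min_packets:
        return False
    return all(data[i * TS_PACKET_SIZE] == SYNC_BYTE for i in range(packet_count))


packet = bytes([SYNC_BYTE]) + bytes(TS_PACKET_SIZE - 1)
print(repeated_sync_sniff(packet * 10))            # True
print(repeated_sync_sniff(b"G" + b"x" * 1000))     # False
```

Unlike a fixed list of offsets, a single corrupted or missing sync byte anywhere in the sniffed range makes the check fail, which is what makes a repeated pattern a much stronger signal than a one-byte signature.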
