Opened 7 years ago

Last modified 2 months ago

#10067 new enhancement

Extension should determine file type with same sniffer rule

Reported by: humdinger
Owned by: bonefish
Priority: normal
Milestone: R1
Component: Servers/registrar
Version: R1/Package Management
Keywords:
Cc:
Blocked By:
Blocking: #10917
Platform: All

Description

This is hrev46173.

If you have several file types with the same sniffer rule, the extension should determine what mime type a file is assigned.

For example, RAR archives and CBR (a comic book archive, consisting of rar-archived images) both have the rule '("Rar!")'. Naturally, since they are both RAR archives. In this case, the file extension should decide if it's a .rar or a .cbr.
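
To make the request concrete, here is a rough sketch of the tie-break in Python. It is not registrar code: the rule table, the `application/x-rar` and `application/zip` names, the extension lists, and the function names are all assumptions made up for illustration. When more than one type matches the same magic bytes, the file extension picks among the candidates.

```python
import os

# Hypothetical rule table: MIME type -> (magic bytes at offset 0, extensions).
# These entries are illustrative and not read from Haiku's MIME database.
RULES = {
    "application/x-rar": (b"Rar!", {"rar"}),
    "application/x-cbr": (b"Rar!", {"cbr"}),
    "application/zip": (b"PK", {"zip"}),
}


def sniff_candidates(header):
    """Return every type whose magic bytes match the start of the file."""
    return [mime for mime, (magic, _) in RULES.items()
            if header.startswith(magic)]


def guess_type(filename, header):
    """Sniff first; if several types match, let the extension decide."""
    candidates = sniff_candidates(header)
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    for mime in candidates:
        if ext in RULES[mime][1]:
            return mime
    # No extension matched: fall back to the first candidate in the table.
    return candidates[0]
```

With this table, `guess_type("comic.cbr", b"Rar!...")` picks the CBR type and `guess_type("backup.rar", b"Rar!...")` picks the RAR type, although both files match the same rule.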

Change History (5)

comment:1 by axeld, 7 years ago

Summary: changed from "Extention should determine file type with same sniffer rule" to "Extension should determine file type with same sniffer rule"

comment:2 by diver, 6 years ago

Blocking: 10917 added

comment:3 by Giova84, 4 years ago

Hi, I have a suggestion for implementing this enhancement.

For those who aren't aware, there is a Haiku app called Filer (Humdinger certainly knows this app ;-) ): https://github.com/HaikuArchives/Filer

Here is what the app's readme says:

Filer is an automatic file organizer. It takes the files it's opened with or that are dropped on it and moves, renames, copies or does all sorts of other things with them according to rules created by the user. Filer is accompanied by AutoFiler. Instead of working on a set of files provided by the user, it can be started (automatically with Haiku) to monitor certain folders and deal with new files appearing there according to the user-defined rules.


In practice, the "AutoFiler" component can monitor folders on the system and automatically perform a number of actions, including shell commands; in some respects it acts like a sort of registrar.

The following is an example on my system:

As Humdinger stated

RAR archives and CBR (a comic book archive, consisting of rar-archived images) both have the rule '("Rar!")'. Naturally, since they are both RAR archives. In this case, the file extension should decide if it's a .rar or a .cbr.

Well, on my system I manage *.cbr files (I can open them using a Qt *.cbr and *.cbz reader). When I place them inside certain folders, AutoFiler, thanks to Filer, automatically runs the following commands on files that end with the *.cbr and *.cbz extensions:

settype -t application/x-cbr
settype -t application/x-cbz

http://s33.postimg.org/6fb28674v/Autofiler_CBR_Rule.png
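
For anyone who wants to reproduce this without Filer, the rule boils down to an extension-to-type mapping. The sketch below only builds the command line and does not run it. The two MIME types come from the commands above; the function name and the assumption that the file path is appended as the last argument are mine.

```python
import os

# Extension -> MIME type, taken from the two settype commands above.
TYPE_BY_EXTENSION = {
    ".cbr": "application/x-cbr",
    ".cbz": "application/x-cbz",
}


def settype_command(path):
    """Return the settype argument list for `path`, or None if unmapped.

    Assumption: the file path is passed as the final argument.
    """
    ext = os.path.splitext(path)[1].lower()
    mime = TYPE_BY_EXTENSION.get(ext)
    if mime is None:
        return None
    return ["settype", "-t", mime, path]
```

On Haiku the returned list could be handed to `subprocess.run()`; files with other extensions, such as plain *.zip or *.rar archives, are left alone.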

In this way I finally avoid the confusion with *.zip and *.rar archives (because if I set the same sniffing rule for *.cbr and *.cbz, zip and rar files will be mistakenly marked as cbr and cbz). In fact, I have now lowered the priority value of the sniffing rules:

Rule for *.cbr
0.10 ("Rar!")
Rule for *.cbz
0.10 ("PK")

But thanks to Filer/AutoFiler, these files are properly recognized even with the lower value "0.10".

So, would it make sense to borrow the code from Filer/AutoFiler and implement this feature inside the registrar?

Version 0, edited 4 years ago by Giova84

comment:4 by humdinger, 4 years ago

You manually force-set the correct MIME type with the "settype" command according to the file extension. I'm afraid, AFAIKS, there is nothing in Filer that can be re-used for the registrar.

comment:5 by CodeforEvolution, 2 months ago

In reality, to make this much simpler, we can go the BeOS route and have the registrar check a file's extension first. This would avoid the problem with rar and cbr files, as well as cases where source code files (which are just text files with no standard format) are mis-typed as plain text files.

According to https://birdhouse.org/beos/bible/exc_filetype.html:

Assigning a MIME Type Where There Is None

When a new file arrives on your system without a MIME type (as happens when bringing files over to BeOS from other operating systems), the Tracker and the Registrar work together to assign it one.

The Registrar's first recourse is to look for an extension on the end of the filename, like .jpg, .txt, or .html. If it finds one, it checks the FileTypes database to see whether you've connected that extension with any particular filetype. For example, you may have used the Extensions section of FileTypes to declare that files ending in .html were likely to be HTML documents, and that they should inherit the text/html filetype.

If no extension is found, the Tracker will actually read a small portion of the file with a "sniffer." If it encounters plain text, it will assume that this is a text document and give it the appropriate MIME type. A similar process occurs with GIFs, WAVs, and other common filetypes. Because the extensions technique is more likely to be accurate, it's run first. Assuming you've set up a few common extensions in your FileTypes database, BeOS can guess a file's type accurately the vast majority of the time, with the vast majority of files.
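
The order described in the quote (extension lookup first, sniffing only as a fallback) can be sketched as follows. This is only an illustration of the ordering, not BeOS or Haiku code: both tables, their entries, the crude plain-text check, and the function names are hypothetical.

```python
import os

# Hypothetical stand-in for the extensions registered in FileTypes.
EXTENSION_TABLE = {
    "html": "text/html",
    "cbr": "application/x-cbr",
    "rar": "application/x-rar",
    "cpp": "text/x-source-code",
}

# Hypothetical stand-in for sniffer rules: (magic bytes at offset 0, type).
SNIFFER_RULES = [
    (b"Rar!", "application/x-rar"),
    (b"GIF8", "image/gif"),
]


def sniff(header):
    """Fallback: match magic bytes, then try a crude plain-text check."""
    for magic, mime in SNIFFER_RULES:
        if header.startswith(magic):
            return mime
    try:
        header.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return "text/plain"


def assign_type(filename, header):
    """Check the extension first; sniff only if it is missing or unknown."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext in EXTENSION_TABLE:
        return EXTENSION_TABLE[ext]
    return sniff(header)
```

With this order a *.cbr file never reaches the shared "Rar!" rule, and a *.cpp file is typed as source code instead of falling through to plain text.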
