Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect identification for JS content starting with comments #86

Closed
tzabaman opened this issue Jan 18, 2023 · 1 comment
Closed

Incorrect identification for JS content starting with comments #86

tzabaman opened this issue Jan 18, 2023 · 1 comment

Comments

@tzabaman
Copy link

I am opening this issue for discussion due to some inconsistent behaviour, coming from ActiveStorage #identify method...

There is an inconsistent behaviour in method Marcel::MimeType.for concerning (at least) javascript files starting with comments and given specific declared types (text/javascript or application/x-javascript). For example:

Marcel::MimeType.for("//Comment\nconsole.log('test');", name: 'sample.js', declared_type: 'application/javascript')
# => application/javascript --> Correct

Marcel::MimeType.for("console.log('test');", name: 'sample.js', declared_type: 'text/javascript')
# => text/javascript --> Correct

Marcel::MimeType.for("//Comment\nconsole.log('test');", name: 'sample.js', declared_type: 'text/javascript')
# => text/plain --> Inconsistent (Incorrect?)

Marcel::MimeType.for("//Comment\nconsole.log('test');", name: 'sample.js', declared_type: 'application/x-javascript')
# => text/plain --> Inconsistent (Incorrect?)

Of course, the explanation is that on (generated) file lib/marcel/tables.rb there are definitions only for type application/javascript so the above behaviour is somehow expected.

The thing is that on file data/tika.xml, we have reference for types text/javascript and application/x-javascript, as they are described as "alias" of application/javascript.

marcel/data/tika.xml

Lines 335 to 340 in 8e28563

<mime-type type="application/javascript">
<alias type="application/x-javascript"/>
<alias type="text/javascript"/>
<sub-class-of type="text/plain"/>
<_comment>JavaScript Source Code</_comment>
<glob pattern="*.js"/>

So my question is, should we change the script/generate_tables.rb script in order to also take into consideration the aliases, when adding values on constants TYPE_EXTS and TYPE_PARENTS?? Something like:

TYPE_EXTS = {
  ...
  'application/javascript' => %w(js),
  'application/x-javascript' => %w(js),
  'text/javascript' => %w(js),
  ...
}

TYPE_PARENTS = {
    ...
    'application/javascript' => %w(text/plain),
    'application/x-javascript' => %w(text/plain),
    'text/javascript' => %w(text/plain),
    ...
}

Thanks!

@jeremy
Copy link
Member

jeremy commented Mar 6, 2024

Alias support added in #101. Returning the canonical application/javascript type is what should be expected for all of the above.

On current main:

>> Marcel::MimeType.for("//Comment\nconsole.log('test');", name: 'sample.js', declared_type: 'application/javascript')
=> "application/javascript"
>> Marcel::MimeType.for("console.log('test');", name: 'sample.js', declared_type: 'text/javascript')
=> "text/javascript"
>> Marcel::MimeType.for("//Comment\nconsole.log('test');", name: 'sample.js', declared_type: 'text/javascript')
=> "application/javascript"
>> Marcel::MimeType.for("//Comment\nconsole.log('test');", name: 'sample.js', declared_type: 'application/x-javascript')
=> "application/javascript"

After #101:

>> Marcel::MimeType.for("//Comment\nconsole.log('test');", name: 'sample.js', declared_type: 'application/javascript')
=> "application/javascript"
>> Marcel::MimeType.for("console.log('test');", name: 'sample.js', declared_type: 'text/javascript')
=> "application/javascript"
>> Marcel::MimeType.for("//Comment\nconsole.log('test');", name: 'sample.js', declared_type: 'text/javascript')
=> "application/javascript"
>> Marcel::MimeType.for("//Comment\nconsole.log('test');", name: 'sample.js', declared_type: 'application/x-javascript')
=> "application/javascript"

Note that Tika has switched text/javascript to become the canonical type with application/javascript as an alias. We'll be able to do that in Marcel with a 1.1 release.

@jeremy jeremy closed this as completed Mar 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants