
Sprockets 3 encoding problem when working with HTML files

I recently upgraded one of my projects to Rails 4.2 and noticed an interesting effect: the encoding of processed HTML files in assets changes to ISO-8859-1.

The issue affects Sprockets 3.0.0 and 3.0.1.

The problem turned out to be in `EncodingUtils#detect_html`:

    #...
    module Sprockets
      # Internal: HTTP transport encoding and charset detecting related functions.
      # Mixed into Environment.
      module EncodingUtils
        extend self

        #...

        # Public: Detect charset from HTML document. Defaults to ISO-8859-1.
        #
        # str - String.
        #
        # Returns a encoded String.
        def detect_html(str)
          str = detect_unicode_bom(str)

          # Attempt Charlock detection
          if str.encoding == Encoding::BINARY
            charlock_detect(str)
          end

          # Fallback to ISO-8859-1
          if str.encoding == Encoding::BINARY
            str.force_encoding(Encoding::ISO_8859_1)
          end

          str
        end
        CHARSET_DETECT[:html] = method(:detect_html)
      end
    end

When loading a file, Sprockets tries to detect the Unicode encoding from the BOM, strips the BOM from the string, and returns the string tagged with the correct encoding. For HTML, if the encoding could not be determined at this stage, Sprockets lets charlock_holmes try (if it is installed). If the string is still binary after that, it is forcibly tagged as ISO-8859-1. Note that `force_encoding` does not transcode anything: the bytes stay the same and only the encoding label changes.
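To make the first step concrete, here is a simplified sketch of BOM-based detection. It is not the Sprockets source: `BOMS` and `sketch_detect_bom` are hypothetical names made up for illustration, and only UTF-8 and UTF-16 are covered.

```ruby
# Simplified illustration only; not the actual Sprockets implementation.
# Each encoding is paired with the byte order mark that signals it.
BOMS = {
  Encoding::UTF_8    => "\xEF\xBB\xBF".b,
  Encoding::UTF_16BE => "\xFE\xFF".b,
  Encoding::UTF_16LE => "\xFF\xFE".b
}.freeze

# Hypothetical helper: if the binary string starts with a known BOM,
# strip it and tag the remainder with the matching encoding.
# Otherwise return the string unchanged (still binary).
def sketch_detect_bom(str)
  BOMS.each do |encoding, bom|
    next unless str.byteslice(0, bom.bytesize) == bom
    return str.byteslice(bom.bytesize, str.bytesize).force_encoding(encoding)
  end
  str
end

html        = "<p>café</p>"
with_bom    = BOMS[Encoding::UTF_8] + html.b  # file saved with a BOM
without_bom = html.b                          # same file saved without one

sketch_detect_bom(with_bom).encoding     # UTF-8
sketch_detect_bom(without_bom).encoding  # ASCII-8BIT (binary), nothing detected
```

The second call is the interesting one: without a BOM the string stays binary, so everything depends on the later fallback steps.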

The problem is that a BOM is optional in UTF-8, so almost all editors save UTF-8 files without one. As a result, the `detect_unicode_bom` method is largely useless here, and unless charlock_holmes is installed, HTML files in assets will always be treated as ISO-8859-1.
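The effect is easy to reproduce in plain Ruby, without Sprockets. The sketch below assumes the file content arrives as a binary string, as the `Encoding::BINARY` checks in `detect_html` suggest; the variable names are illustrative.

```ruby
# UTF-8 text, as an editor would save it without a BOM.
original = "<p>café</p>"

# Raw bytes of the file, tagged as binary.
raw = original.b

# What the ISO-8859-1 fallback does: same bytes, different label.
mislabeled = raw.dup.force_encoding(Encoding::ISO_8859_1)

# Transcoding the mislabeled string to UTF-8 treats every byte
# as a separate Latin-1 character.
garbled = mislabeled.encode(Encoding::UTF_8)

original.length  # 11 characters
garbled.length   # 12 characters: the two bytes of "é" became "Ã©"
```

ASCII-only files come through unchanged, which is why the problem only shows up once the HTML contains non-ASCII characters.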

You can solve the problem in the following ways:

1. Override the MIME type for text/html in an initializer:

 Rails.application.assets.register_mime_type('text/html', extensions: '.html', charset: :default) 

2. Install charlock_holmes.

3. Upgrade to Sprockets 3.0.2, where the default fallback was changed from ISO-8859-1 to `Encoding.default_external` (pull request).
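To illustrate what the third option changes, here is a minimal sketch of the final fallback step with the fallback encoding passed in as a parameter. `sketch_fallback` is a hypothetical helper written for this example, not the code from the pull request.

```ruby
# Hypothetical helper mirroring only the last step of detect_html:
# tag a still-binary string with the given fallback encoding.
def sketch_fallback(str, fallback)
  str = str.dup
  str.force_encoding(fallback) if str.encoding == Encoding::BINARY
  str
end

raw = "<p>café</p>".b

old_style = sketch_fallback(raw, Encoding::ISO_8859_1)       # 3.0.0 / 3.0.1 behavior
new_style = sketch_fallback(raw, Encoding.default_external)  # 3.0.2 behavior

old_style.encoding  # ISO-8859-1
new_style.encoding  # whatever Encoding.default_external is on this machine
```

In a Rails application `Encoding.default_external` is normally UTF-8, so a BOM-less UTF-8 file ends up labeled correctly.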

Source: https://habr.com/ru/post/256523/

