In early 2012, I worked on a series of articles about client-side optimization in ASP.NET MVC for MSDeveloper.RU. Two articles were published, “Compressing JS and CSS files” and “Resource Managers”, and I planned to write two more: one about graphics optimization, and another about minimizing HTML markup and GZIP/Deflate compression (hereafter simply HTTP compression). Unfortunately, these plans fell through for lack of free time (I had just started the Bundle Transformer project) and the subsequent closure of the magazine.
But recently I decided to return to the topic of optimizing HTML markup. After a little research, I realized that there are almost no full-fledged HTML minimizers for .NET. All existing .NET solutions perform only two operations, removing unnecessary whitespace and removing HTML comments, which is why they lose badly to solutions from other platforms. So I decided to write my own HTML minimizer for .NET, which is the subject of this article.
Before proceeding with the description of my project, I would like to tell you a little about the nearly 15-year history of HTML minimization and the evolution of software that automates this process.
Contrary to popular belief, techniques for minimizing HTML code appeared much earlier than similar techniques for JavaScript. As early as the end of 1998, Artemy Lebedev described several of them in section 17 of his guide, “Optimizer Paranoia”.
By the early 2000s, many HTML minimization techniques that are still relevant today were already known, such as removing optional end tags (for example, </p> and </li>).
But dangerous techniques, which can break the rendering of a document and violate its semantics, were also widespread, such as removing the <!DOCTYPE …> declaration and replacing tags with shorter equivalents (<strong> tags were replaced with <b>, and <em> with <i>).
At the same time, the first HTML minimizers appeared. For Windows alone there were about a dozen free and shareware programs: HTML Shrinker, Absolute HTML Compressor, HTML File Optimizer, HTML Source Cleaner, Anetto HTML Optimize!, HTMLCompact, HTML Code Cleaner, HTMLOpt, Absolute HTML Optimizer, etc.
By the mid-2000s, the XHTML standard had been widely adopted. It required HTML code to follow XML syntax rules, which forbid omitting attribute quotes and optional end tags. The strict rules of XHTML, together with ever-increasing network bandwidth, gradually removed the need to minimize HTML markup.
But a few years ago, the growth of the mobile web and the emergence of the HTML5 standard revived the need to minimize HTML markup. HTML5 offers many more ways to reduce the size of an HTML document than HTML 4.01. Many of the new minimization techniques are well described in the Google HTML/CSS Style Guide and in the article “Optimizing HTML” by Juriy Zaytsev.
All this led to the emergence of powerful new HTML minimizers focused on HTML5. At the moment, two are the most popular: Sergiy Kovalchuk's HtmlCompressor (written in Java) and Juriy Zaytsev's Experimental HTML Minifier (written in JavaScript).
In creating Web Markup Minifier (WebMarkupMin for short), I set out to build a modern HTML minimizer for the .NET platform, along with extensions for integrating it with ASP.NET. WebMarkupMin is an open source project: its source code is published on CodePlex, and its packages can be installed via NuGet.
In addition to the HTML minimizer, WebMarkupMin also includes XHTML and XML minimizers. Since this article is devoted to HTML minimization, it gives only examples of working with the HTML minimizer.
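The project consists of the following modules: WebMarkupMin.Core, WebMarkupMin.MsAjax, WebMarkupMin.Yui, WebMarkupMin.Web, WebMarkupMin.Mvc and WebMarkupMin.WebForms.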
The WebMarkupMin.Core module is a library for .NET Framework 4.0 that contains the markup minimization tools. It can be used in various types of .NET applications: ASP.NET, Windows Forms, WPF and console applications. Because all of the markup minimizers can process individual code fragments as well as whole documents, you can use WebMarkupMin to minimize individual content blocks (for example, the text of an article when it is saved in the administrative part of your site).
The WebMarkupMin.Core module contains 3 markup minimizers: HtmlMinifier, XhtmlMinifier and an XML minimizer.
Consider the simplest example of using the HtmlMinifier class:

    namespace WebMarkupMin.Example.Console
    {
        using System;
        using System.Collections.Generic;

        using WebMarkupMin.Core;
        using WebMarkupMin.Core.Minifiers;
        using WebMarkupMin.Core.Settings;

        class Program
        {
            static void Main(string[] args)
            {
                // Sample document to minimize
                const string htmlInput = @"<!DOCTYPE html>
    <html>
        <head>
            <meta charset=""utf-8"" />
            <title> </title>
            <link href=""favicon.ico"" rel=""shortcut icon"" type=""image/x-icon"" />
            <meta name=""viewport"" content=""width=device-width"" />
            <link rel=""stylesheet"" type=""text/css"" href=""/Content/Site.css"" />
        </head>
        <body>
            <p>- …</p>
            <script src=""http://ajax.aspnetcdn.com/ajax/jQuery/jquery-1.9.1.min.js""></script>
            <script>(window.jQuery) || document.write('<script src=""/Scripts/jquery-1.9.1.min.js""><\/script>');</script>
        </body>
    </html>";

                // Override some of the default minimization parameters
                var settings = new HtmlMinificationSettings
                {
                    WhitespaceMinificationMode = WhitespaceMinificationMode.Aggressive,
                    RemoveHttpProtocolFromAttributes = true,
                    RemoveHttpsProtocolFromAttributes = true
                };

                var htmlMinifier = new HtmlMinifier(settings);
                MarkupMinificationResult result = htmlMinifier.Minify(htmlInput,
                    generateStatistics: true);

                if (result.Errors.Count == 0)
                {
                    // Success: print the statistics and the minimized code
                    MinificationStatistics statistics = result.Statistics;
                    if (statistics != null)
                    {
                        Console.WriteLine("Original size: {0:N0} bytes", statistics.OriginalSize);
                        Console.WriteLine("Minified size: {0:N0} bytes", statistics.MinifiedSize);
                        Console.WriteLine("Saved: {0:N2}%", statistics.SavedInPercent);
                    }
                    Console.WriteLine("Minified content:{0}{0}{1}",
                        Environment.NewLine, result.MinifiedContent);
                }
                else
                {
                    // Failure: print every error with its position
                    IList<MinificationErrorInfo> errors = result.Errors;

                    Console.WriteLine("Found {0:N0} error(s):", errors.Count);
                    Console.WriteLine();

                    foreach (var error in errors)
                    {
                        Console.WriteLine("Line {0}, column {1}: {2}",
                            error.LineNumber, error.ColumnNumber, error.Message);
                        Console.WriteLine();
                    }
                }
            }
        }
    }

First, we create an instance of the HtmlMinificationSettings class and override some of the HTML minimization parameters. Then we pass it to an instance of the HtmlMinifier class through the corresponding constructor parameter, and call the Minify method with two arguments: the HTML code, and a flag that enables the generation of statistics (false by default, because generating statistics takes time and additional server resources). The Minify method returns an object of type MarkupMinificationResult, which has the following properties: MinifiedContent (the minimized code), Errors (the list of errors), Warnings (the list of warnings) and Statistics (the minimization statistics).
If the error list is empty, then statistics and minimized code are output to the console; otherwise, error information is displayed.
Now let's take a closer look at the properties of the HtmlMinificationSettings class:
Table 1. Properties of the HtmlMinificationSettings class
Property | Data type | Default value | Description |
---|---|---|---|
WhitespaceMinificationMode | Enumeration | Medium | Whitespace minimization mode (for example, Medium or Aggressive). |
RemoveHtmlComments | Boolean | true | The flag responsible for removing all HTML comments except Internet Explorer conditional comments and noindex comments. |
RemoveHtmlCommentsFromScriptsAndStyles | Boolean | true | A flag that removes HTML comments from script and style tags. |
RemoveCdataSectionsFromScriptsAndStyles | Boolean | true | Flag responsible for removing CDATA sections from script and style tags. |
UseShortDoctype | Boolean | true | The flag responsible for replacing the existing doctype with a shorter one - <!DOCTYPE html> . |
UseMetaCharsetTag | Boolean | true | The flag responsible for replacing the <meta http-equiv="content-type" content="text/html; charset=…"> tag with <meta charset="…"> . |
EmptyTagRenderMode | Enumeration | NoSlash | Rendering mode for empty tags (for example, NoSlash). |
RemoveOptionalEndTags | Boolean | true | Flag that removes optional end tags ( html , head , body , p , li , dt , dd , rt , rp , optgroup , option , colgroup , thead , tfoot , tbody , tr , th and td ). |
RemoveTagsWithoutContent | Boolean | false | The flag responsible for removing tags with empty content, except for the textarea , tr , th and td tags, and tags with the class , id , name , role , src and data-* attributes. |
CollapseBooleanAttributes | Boolean | true | The flag responsible for “folding” boolean attributes (for example, checked="checked" reduced to checked ). |
RemoveEmptyAttributes | Boolean | true | Flag that removes empty attributes (applies only to the following attributes: class , id , name , style , title , lang , dir , event attributes, the action attribute of the form tag and the value attribute of the input tag). |
AttributeQuotesRemovalMode | Enumeration | Html5 | Mode for removing quotes from HTML attributes (for example, Html5). |
RemoveRedundantAttributes | Boolean | true | Flag responsible for removing redundant attributes. |
RemoveJsTypeAttributes | Boolean | true | The flag that removes type="text/javascript" attributes from script tags. |
RemoveCssTypeAttributes | Boolean | true | The flag that removes type="text/css" attributes from the style and link tags. |
RemoveHttpProtocolFromAttributes | Boolean | false | The flag responsible for removing the HTTP protocol prefix ( http: ) from attributes that contain a URL (tags marked with the rel="external" attribute are ignored). |
RemoveHttpsProtocolFromAttributes | Boolean | false | The flag responsible for removing the HTTPS protocol prefix ( https: ) from attributes that contain a URL (tags marked with the rel="external" attribute are ignored). |
RemoveJsProtocolFromAttributes | Boolean | true | Flag responsible for removing javascript: pseudo-protocol prefix from event attributes. |
MinifyEmbeddedCssCode | Boolean | true | The flag that is responsible for minimizing the CSS code in the style tags. |
MinifyInlineCssCode | Boolean | true | The flag that is responsible for minimizing the CSS code in the style attributes. |
MinifyEmbeddedJsCode | Boolean | true | The flag responsible for minimizing JS code in script tags. |
MinifyInlineJsCode | Boolean | true | A flag that is responsible for minimizing JS code in event attributes and hyperlinks with pseudo-protocol javascript: |
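To make two of the flags in Table 1 more concrete, here is a rough, self-contained sketch of what CollapseBooleanAttributes and RemoveJsTypeAttributes do to markup. It is only an illustration built on regular expressions: the class and method names are hypothetical, it is not how WebMarkupMin is implemented, and regular expressions are not a reliable way to process arbitrary HTML.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical illustration only; not part of WebMarkupMin.
static class AttributeMinificationSketch
{
    // Mimics CollapseBooleanAttributes: checked="checked" becomes checked.
    public static string CollapseBooleanAttributes(string html)
    {
        return Regex.Replace(html,
            @"\b(checked|disabled|selected|readonly)\s*=\s*""\1""",
            "$1", RegexOptions.IgnoreCase);
    }

    // Mimics RemoveJsTypeAttributes: drops type="text/javascript" from script tags.
    public static string RemoveJsTypeAttributes(string html)
    {
        return Regex.Replace(html,
            @"(<script\b[^>]*?)\s+type\s*=\s*""text/javascript""",
            "$1", RegexOptions.IgnoreCase);
    }

    static void Main()
    {
        const string input =
            @"<input type=""checkbox"" checked=""checked""><script type=""text/javascript"">var a;</script>";

        string output = RemoveJsTypeAttributes(CollapseBooleanAttributes(input));

        Console.WriteLine(output);
        // prints: <input type="checkbox" checked><script>var a;</script>
    }
}
```

A real minimizer works on parsed markup, so it can tell an attribute from text that merely looks like one; the sketch only shows the before and after shape of the output.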
If different parts of your application require the same HTML minimization parameters, you can specify them just once in the configuration file (App.config or Web.config), in the /configuration/webMarkupMin/core/html element:

    <?xml version="1.0" encoding="utf-8"?>
    <configuration>
      <configSections>
        <sectionGroup name="webMarkupMin">
          <section name="core"
            type="WebMarkupMin.Core.Configuration.CoreConfiguration, WebMarkupMin.Core" />
          …
        </sectionGroup>
        …
      </configSections>
      …
      <webMarkupMin xmlns="http://tempuri.org/WebMarkupMin.Configuration.xsd">
        <core>
          <html whitespaceMinificationMode="Medium"
            removeHtmlComments="true"
            removeHtmlCommentsFromScriptsAndStyles="true"
            removeCdataSectionsFromScriptsAndStyles="true"
            useShortDoctype="true"
            useMetaCharsetTag="true"
            emptyTagRenderMode="NoSlash"
            removeOptionalEndTags="true"
            removeTagsWithoutContent="false"
            collapseBooleanAttributes="true"
            removeEmptyAttributes="true"
            attributeQuotesRemovalMode="Html5"
            removeRedundantAttributes="true"
            removeJsTypeAttributes="true"
            removeCssTypeAttributes="true"
            removeHttpProtocolFromAttributes="false"
            removeHttpsProtocolFromAttributes="false"
            removeJsProtocolFromAttributes="true"
            minifyEmbeddedCssCode="true"
            minifyInlineCssCode="true"
            minifyEmbeddedJsCode="true"
            minifyInlineJsCode="true" />
          …
        </core>
        …
      </webMarkupMin>
      …
    </configuration>

To get an instance of the HtmlMinificationSettings class with values from the configuration file, use the following code:

    HtmlMinificationSettings settings = WebMarkupMinContext.Current.Markup.GetHtmlMinificationSettings();

You can also create an instance of the HtmlMinifier class that uses the settings specified in the configuration file (the HTML minimization parameters, as well as the CSS minimizer, JS minimizer and logger registered as defaults):

    HtmlMinifier htmlMinifier = WebMarkupMinContext.Current.Markup.CreateHtmlMinifierInstance();

This method of creating an instance of the HTML minimizer is used in all modules responsible for integration with ASP.NET.
In addition to minimizing markup, HtmlMinifier and XhtmlMinifier support minimizing CSS code in style tags and attributes, and minimizing JavaScript code in script tags, event attributes (for example, onclick) and javascript: pseudo-protocol hyperlinks.
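CSS and JS code is minimized by classes that implement the ICssMinifier and IJsMinifier interfaces from the WebMarkupMin.Core.Minifiers namespace.
The core contains two classes that implement the ICssMinifier interface: NullCssMinifier and KristensenCssMinifier (Mads Kristensen's CSS minifier).
It also contains two classes that implement the IJsMinifier interface: NullJsMinifier and CrockfordJsMinifier (Douglas Crockford's JS minifier).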
Instances of CSS and JS minimizers can be passed to the markup minimizer through its constructor:

    var kristensenCssMinifier = new KristensenCssMinifier();
    var crockfordJsMinifier = new CrockfordJsMinifier();

    var htmlMinifier = new HtmlMinifier(cssMinifier: kristensenCssMinifier,
        jsMinifier: crockfordJsMinifier);

If the markup minimizer is created from the parameters of the configuration file, the CSS and JS minimizers can be passed to it by registering them in the configuration file as the default minimizers:

    <?xml version="1.0" encoding="utf-8"?>
    <configuration>
      <configSections>
        <sectionGroup name="webMarkupMin">
          <section name="core"
            type="WebMarkupMin.Core.Configuration.CoreConfiguration, WebMarkupMin.Core" />
          …
        </sectionGroup>
        …
      </configSections>
      …
      <webMarkupMin xmlns="http://tempuri.org/WebMarkupMin.Configuration.xsd">
        <core>
          …
          <css defaultMinifier="KristensenCssMinifier">
            <minifiers>
              <add name="NullCssMinifier"
                displayName="Null CSS Minifier"
                type="WebMarkupMin.Core.Minifiers.NullCssMinifier, WebMarkupMin.Core" />
              <add name="KristensenCssMinifier"
                displayName="Mads Kristensen's CSS minifier"
                type="WebMarkupMin.Core.Minifiers.KristensenCssMinifier, WebMarkupMin.Core" />
            </minifiers>
          </css>
          <js defaultMinifier="CrockfordJsMinifier">
            <minifiers>
              <add name="NullJsMinifier"
                displayName="Null JS Minifier"
                type="WebMarkupMin.Core.Minifiers.NullJsMinifier, WebMarkupMin.Core" />
              <add name="CrockfordJsMinifier"
                displayName="Douglas Crockford's JS Minifier"
                type="WebMarkupMin.Core.Minifiers.CrockfordJsMinifier, WebMarkupMin.Core" />
            </minifiers>
          </js>
          …
        </core>
        …
      </webMarkupMin>
      …
    </configuration>

If CSS and JS minimizers are registered in the configuration file, their instances can be created as follows:

    ICssMinifier cssMinifier = WebMarkupMinContext.Current.Code.CreateCssMinifierInstance("KristensenCssMinifier");
    IJsMinifier jsMinifier = WebMarkupMinContext.Current.Code.CreateJsMinifierInstance("CrockfordJsMinifier");

If you just want to create instances of the CSS and JS minimizers that are registered as defaults, you can do so as follows:

    ICssMinifier cssMinifier = WebMarkupMinContext.Current.Code.CreateDefaultCssMinifierInstance();
    IJsMinifier jsMinifier = WebMarkupMinContext.Current.Code.CreateDefaultJsMinifierInstance();

In addition to manual handling of errors and warnings, WebMarkupMin lets you plug in loggers, with which you can record errors and warnings centrally in your own logs. A logger can be any class that implements the ILogger interface or inherits from the LoggerBase base class in the WebMarkupMin.Core.Loggers namespace.
The core contains two classes that implement the ILogger interface: NullLogger, with which errors and warnings are available only through the Errors and Warnings properties of the MarkupMinificationResult class, and ThrowExceptionLogger, which throws an exception of type MarkupMinificationException.
A logger instance can be passed to the markup minimizer through its constructor:

    var htmlMinifier = new HtmlMinifier(logger: new ThrowExceptionLogger());

If the markup minimizer is created from the parameters of the configuration file, the logger can be passed to it by registering it in the configuration file as the default logger:

    <?xml version="1.0" encoding="utf-8"?>
    <configuration>
      <configSections>
        <sectionGroup name="webMarkupMin">
          <section name="core"
            type="WebMarkupMin.Core.Configuration.CoreConfiguration, WebMarkupMin.Core" />
          …
        </sectionGroup>
        …
      </configSections>
      …
      <webMarkupMin xmlns="http://tempuri.org/WebMarkupMin.Configuration.xsd">
        <core>
          …
          <logging defaultLogger="ThrowExceptionLogger">
            <loggers>
              <add name="NullLogger"
                displayName="Null Logger"
                type="WebMarkupMin.Core.Loggers.NullLogger, WebMarkupMin.Core" />
              <add name="ThrowExceptionLogger"
                displayName="Throw exception logger"
                type="WebMarkupMin.Core.Loggers.ThrowExceptionLogger, WebMarkupMin.Core" />
            </loggers>
          </logging>
        </core>
        …
      </webMarkupMin>
      …
    </configuration>

If the logger is registered in the configuration file, an instance of it can be created as follows:

    ILogger logger = WebMarkupMinContext.Current.CreateLoggerInstance("ThrowExceptionLogger");

Accordingly, to create the logger registered as the default, you can use the following code:

    ILogger logger = WebMarkupMinContext.Current.CreateDefaultLoggerInstance();

It is also possible to use a single logger instance for the entire application:

    ILogger logger = WebMarkupMinContext.Current.GetLoggerInstance("ThrowExceptionLogger");

and

    ILogger logger = WebMarkupMinContext.Current.GetDefaultLoggerInstance();

The built-in CSS and JS minimizers perform only simple optimizations and cannot provide a high degree of compression. To solve this problem, additional modules were created that contain adapters for minimizers popular in the .NET community: Microsoft Ajax Minifier and YUI Compressor for .Net.
The WebMarkupMin.MsAjax module contains two minimizer adapters: MsAjaxCssMinifier and MsAjaxJsMinifier. The WebMarkupMin.Yui module is organized similarly: YuiCssMinifier and YuiJsMinifier.
Minimizer adapters can be passed to the markup minimizer using the same mechanisms as the built-in minimizers.
In addition, the settings of the external minimizers listed above can be changed in the webMarkupMin/msAjax and webMarkupMin/yui sections of the configuration file (this will look familiar to anyone who uses Bundle Transformer).
The WebMarkupMin.Web module works at the level of the ASP.NET core infrastructure and can therefore be used with any of the existing ASP.NET frameworks: Web Forms, MVC and Web Pages.
WebMarkupMin.Web contains HTTP module classes that minimize and compress the output generated by ASP.NET. CompressionModule applies HTTP compression. HtmlMinificationModule minimizes responses with the text/html content type using HTML Minifier. XhtmlMinificationModule minimizes responses with the text/html or application/xhtml+xml content type using XHTML Minifier. A further module minimizes responses with XML content types (other than application/xhtml+xml) using XML Minifier.
These HTTP modules can only process GET requests and responses with a status code of 200. Note also that HtmlMinificationModule and XhtmlMinificationModule cannot be used together.
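The restrictions just described (GET requests only, status code 200 only, a matching content type) can be pictured as a simple predicate. The following is a minimal sketch under that reading; the class and method names are hypothetical, and it is not the actual code of the WebMarkupMin HTTP modules.

```csharp
using System;

// Hypothetical helper; not part of WebMarkupMin.
static class MinificationEligibility
{
    // Returns true when a response would qualify for HTML minimization
    // under the rules stated in the text: GET request, status 200, text/html.
    public static bool CanMinifyAsHtml(string httpMethod, int statusCode, string contentType)
    {
        if (!string.Equals(httpMethod, "GET", StringComparison.OrdinalIgnoreCase))
        {
            return false;
        }

        if (statusCode != 200 || string.IsNullOrEmpty(contentType))
        {
            return false;
        }

        // Strip parameters such as "; charset=utf-8" before comparing.
        string mediaType = contentType.Split(';')[0].Trim();

        return string.Equals(mediaType, "text/html", StringComparison.OrdinalIgnoreCase);
    }

    static void Main()
    {
        Console.WriteLine(CanMinifyAsHtml("GET", 200, "text/html; charset=utf-8")); // True
        Console.WriteLine(CanMinifyAsHtml("POST", 200, "text/html"));               // False
        Console.WriteLine(CanMinifyAsHtml("GET", 404, "text/html"));                // False
    }
}
```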
Consider an example of registering HTTP modules in the Web.config file:

    <?xml version="1.0" encoding="utf-8"?>
    <configuration>
      …
      <system.webServer>
        <modules>
          <add name="HtmlMinificationModule"
            type="WebMarkupMin.Web.HttpModules.HtmlMinificationModule, WebMarkupMin.Web" />
          <add name="CompressionModule"
            type="WebMarkupMin.Web.HttpModules.CompressionModule, WebMarkupMin.Web" />
          …
        </modules>
        …
      </system.webServer>
      …
    </configuration>

In addition, the behavior of these HTTP modules can be controlled using the webMarkupMin/webExtensions configuration section:

    <?xml version="1.0" encoding="utf-8"?>
    <configuration>
      <configSections>
        <sectionGroup name="webMarkupMin">
          <section name="core"
            type="WebMarkupMin.Core.Configuration.CoreConfiguration, WebMarkupMin.Core" />
          <section name="webExtensions"
            type="WebMarkupMin.Web.Configuration.WebExtensionsConfiguration, WebMarkupMin.Web" />
          …
        </sectionGroup>
        …
      </configSections>
      …
      <webMarkupMin xmlns="http://tempuri.org/WebMarkupMin.Configuration.xsd">
        …
        <webExtensions enableMinification="true"
          disableMinificationInDebugMode="true"
          enableCompression="true"
          disableCompressionInDebugMode="true"
          maxResponseSize="100000" />
        …
      </webMarkupMin>
      …
    </configuration>

Let us look at all the properties of the webExtensions configuration section in detail:
Table 2. Properties of the webExtensions configuration section
Property | Data type | Default value | Description |
---|---|---|---|
enableMinification | Boolean | true | Enables markup minimization. |
disableMinificationInDebugMode | Boolean | true | Disables markup minimization in debug mode. |
enableCompression | Boolean | true | Enables HTTP text content compression. |
disableCompressionInDebugMode | Boolean | true | Disables HTTP compression of text content in debug mode. |
maxResponseSize | Integer | 100000 | Maximum size of the HTTP response (in bytes) above which markup minimization is disabled. |
WebMarkupMin also includes two further modules: WebMarkupMin.Mvc and WebMarkupMin.WebForms.
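HTTP modules can be used in any ASP.NET application, but for MVC and Web Forms the specialized modules are the better choice. HTTP modules are best suited to sites built on ASP.NET Web Pages.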
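The WebMarkupMin.Mvc module is intended for web applications built on ASP.NET MVC 3 or 4.
WebMarkupMin.Mvc contains 4 action filters: one that applies HTTP compression (CompressContent in the example below); one that minimizes HTML (MinifyHtml in the example below) and throws InvalidContentTypeException if the response content type is not text/html; one that minimizes XHTML and throws InvalidContentTypeException if the content type is neither text/html nor application/xhtml+xml; and one that minimizes XML and throws InvalidContentTypeException if the content type is not an XML type.
Example of use: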

    namespace WebMarkupMin.Example.Mvc.Controllers
    {
        using System.Web.Mvc;

        using Infrastructure.ActionResults;
        using WebMarkupMin.Mvc.ActionFilters;

        public class HomeController : Controller
        {
            [CompressContent]
            [MinifyHtml]
            [OutputCache(CacheProfile = "CacheCompressedContent5Minutes")]
            public ActionResult Index()
            {
                …
            }

            …
        }
    }

Unlike the HTTP modules, action filters can be used together with OutputCacheAttribute, as the example shows.
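The MVC example above, and the Web Forms page directive later in the article, reference a cache profile named CacheCompressedContent5Minutes whose definition is not shown. A minimal sketch of how such a profile might be declared in Web.config follows; the 300-second duration is inferred from the profile name, and the other attribute values are assumptions rather than the author's actual settings.

```xml
<configuration>
  <system.web>
    <caching>
      <outputCacheSettings>
        <outputCacheProfiles>
          <!-- duration is in seconds; varyByContentEncoding keeps separate
               cache entries for differently encoded responses -->
          <add name="CacheCompressedContent5Minutes"
               duration="300"
               varyByParam="none"
               varyByContentEncoding="gzip;deflate" />
        </outputCacheProfiles>
      </outputCacheSettings>
    </caching>
  </system.web>
</configuration>
```

Varying the cache by content encoding matters here because the cached output is already compressed: a client that did not ask for gzip should not receive a gzip-encoded entry from the cache.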
The WebMarkupMin.WebForms module is intended for web applications built on ASP.NET Web Forms 4.0 or 4.5.
WebMarkupMin.WebForms contains 3 Web Forms page classes.
To enable HTML minimization and HTTP compression for a Web Forms page, inherit it from the MinifiedAndCompressedHtmlPage class:

    namespace WebMarkupMin.Example.WebForms
    {
        using System;

        using WebMarkupMin.WebForms.Pages;

        public partial class Contact : MinifiedAndCompressedHtmlPage
        {
            …
        }
    }

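If necessary, minimization and HTTP compression can be turned off through the EnableMinification and EnableCompression properties:

    private void Page_PreLoad(object sender, EventArgs e)
    {
        // Skip minimization and compression for postbacks
        if (IsPostBack)
        {
            EnableMinification = false;
            EnableCompression = false;
        }
    }
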
The minimized and compressed HTTP response can be cached with the OutputCache directive:

    <%@ Page Title="Contact" Language="C#" MasterPageFile="~/Site.Master" AutoEventWireup="true" CodeBehind="Contact.aspx.cs" Inherits="WebMarkupMin.Example.WebForms.Contact" %>
    <%@ OutputCache CacheProfile="CacheCompressedContent5Minutes" VaryByParam="*" %>
    …

WebMarkupMin.WebForms also contains master page classes: CompressedMasterPage, MinifiedAndCompressedHtmlMasterPage and MinifiedAndCompressedXhtmlMasterPage.
To enable HTML minimization and HTTP compression for a master page, inherit it from the MinifiedAndCompressedHtmlMasterPage class:

    namespace WebMarkupMin.Example.WebForms
    {
        using System;
        using System.Web.UI;

        using WebMarkupMin.WebForms.MasterPages;

        public partial class Site : MinifiedAndCompressedHtmlMasterPage
        {
            …
        }
    }

HTML minimization with WebMarkupMin reduces the size of a document by roughly 25-40%.
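The measurements below use the HTML of several sites built on ASP.NET. The HTML minimizer was configured with YuiCssMinifier as the CSS minimizer and MsAjaxJsMinifier as the JS minimizer.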
Table 3. HTML minimization results without HTTP compression
Site | URL | Size before minimization* | Size after minimization* | Saving |
---|---|---|---|---|
Afisha | www.afisha.ru | 162.28 | 110.64 | 31.82% |
MTS | www.mts.ru | 80.56 | 48.50 | 39.80% |
OZON | www.ozon.ru | 107.32 | 62.21 | 42.03% |
Workle | www.workle.ru | 115.26 | 72.94 | 36.72% |
Table 4. HTML minimization results with HTTP compression (GZIP)
Site | URL | Size before minimization* | Size after minimization* | Saving |
---|---|---|---|---|
Afisha | www.afisha.ru | 30.01 | 25.47 | 15.14% |
MTS | www.mts.ru | 19.08 | 14.15 | 25.86% |
OZON | www.ozon.ru | 16.58 | 14.23 | 14.22% |
Workle | www.workle.ru | 19.06 | 17.31 | 9.15% |
* In kilobytes; 1 KB = 1,024 bytes.
As Table 3 shows, without HTTP compression, HTML minimization saves 37.59% on average.
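The Saving column and the average can be checked directly from the sizes in Table 3. The short sketch below does the arithmetic; the class and method names are hypothetical, and only the figures come from the table.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper for re-deriving the Saving column of Table 3.
static class SavingsSketch
{
    // Percentage by which a document shrinks from originalSize to minifiedSize.
    public static double SavedInPercent(double originalSize, double minifiedSize)
    {
        return (originalSize - minifiedSize) / originalSize * 100.0;
    }

    static void Main()
    {
        // Sizes in KB before and after minimization, copied from Table 3.
        var rows = new[]
        {
            new { Url = "www.afisha.ru", Before = 162.28, After = 110.64 },
            new { Url = "www.mts.ru", Before = 80.56, After = 48.50 },
            new { Url = "www.ozon.ru", Before = 107.32, After = 62.21 },
            new { Url = "www.workle.ru", Before = 115.26, After = 72.94 }
        };

        foreach (var row in rows)
        {
            Console.WriteLine(string.Format(CultureInfo.InvariantCulture,
                "{0}: {1:F2}%", row.Url, SavedInPercent(row.Before, row.After)));
        }

        // Mean of the per-site savings
        double average = rows.Average(r => SavedInPercent(r.Before, r.After));
        Console.WriteLine(string.Format(CultureInfo.InvariantCulture,
            "Average: {0:F2}%", average));
    }
}
```

Run against the Table 3 figures, this reproduces the per-site savings of 31.82%, 39.80%, 42.03% and 36.72%, and the 37.59% average.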
With HTTP compression enabled (Table 4), the savings are smaller. The largest absolute gains are 4.54 KB (Afisha) and 4.93 KB (MTS).
You can make similar measurements for your own site using the online version of the HTML minimizer on the WebMarkupMin Online site.
Source: https://habr.com/ru/post/178081/