
PHP DOMDocument Note

After spending a lot of time trying to get DOMDocument::loadHTML to parse an HTML document in the cp1251 (windows-1251) encoding correctly, I want to share a note about encoding, parsing and the charset meta tag.

Unreliable variant: the meta tag comes after the title tag.

<html>
<head>
  <title></title>
  <meta http-equiv="Content-type" content="text/html; charset=windows-1251">
</head>
<body>
  <div></div>
</body>
</html>
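
To check which of the two layouts a page uses before handing it to the parser, you can compare the positions of the charset meta tag and the title tag in the raw markup. The sketch below is a hypothetical helper (charsetMetaPrecedesTitle is not part of PHP or the original post). It relies on a simple case-insensitive regular expression and assumes well-behaved markup, so comments or scripts that contain these strings are not skipped.

```php
<?php
// Hypothetical helper, not a PHP built-in and not from the original post:
// reports whether a <meta> tag declaring a charset appears before the
// first <title> tag in the raw markup. Regex-based, so comments and
// scripts containing these strings are not skipped.
function charsetMetaPrecedesTitle(string $html): bool
{
    // Offset of the first <meta ...charset=...> tag (case-insensitive).
    if (!preg_match('/<meta\b[^>]*charset\s*=/i', $html, $meta, PREG_OFFSET_CAPTURE)) {
        return false; // no charset declaration at all
    }

    // Offset of the first <title> tag; with no title there is nothing to precede.
    if (!preg_match('/<title\b/i', $html, $title, PREG_OFFSET_CAPTURE)) {
        return true;
    }

    return $meta[0][1] < $title[0][1];
}

$unreliable = '<html><head><title></title>'
    . '<meta http-equiv="Content-type" content="text/html; charset=windows-1251">'
    . '</head><body><div></div></body></html>';
$reliable = '<html><head>'
    . '<meta http-equiv="Content-type" content="text/html; charset=windows-1251">'
    . '<title></title></head><body><div></div></body></html>';

var_dump(charsetMetaPrecedesTitle($unreliable)); // bool(false)
var_dump(charsetMetaPrecedesTitle($reliable));   // bool(true)
```

A regex is used here rather than DOMDocument itself because the point is to inspect the byte order of the source before libxml2 has made any encoding decision.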

A more reliable variant: the meta tag comes before the title tag, and DOMDocument detects the encoding correctly.
<html>
<head>
  <meta http-equiv="Content-type" content="text/html; charset=windows-1251">
  <title></title>
</head>
<body>
  <div></div>
</body>
</html>
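
If the markup cannot be fixed at the source and the charset is known in advance, one possible workaround is to inject a charset meta tag right after the opening head tag before calling DOMDocument::loadHTML. ensureCharsetMetaFirst is a hypothetical helper, not part of PHP or the original post. It uses regular expressions rather than a real parser, so treat it as a sketch for simple pages.

```php
<?php
// Hypothetical helper, not part of PHP or the original post: if no charset
// <meta> tag precedes <title>, insert one right after the opening <head> tag
// so the declaration is seen before any text content. Regex-based sketch.
function ensureCharsetMetaFirst(string $html, string $charset): string
{
    $hasMeta = preg_match('/<meta\b[^>]*charset\s*=/i', $html, $meta, PREG_OFFSET_CAPTURE);
    $hasTitle = preg_match('/<title\b/i', $html, $title, PREG_OFFSET_CAPTURE);
    if ($hasMeta && (!$hasTitle || $meta[0][1] < $title[0][1])) {
        return $html; // already in the reliable order
    }

    $tag = '<meta http-equiv="Content-Type" content="text/html; charset='
        . htmlspecialchars($charset, ENT_QUOTES) . '">';

    // Insert after the first <head ...> tag; a late <meta>, if any, stays in place.
    $count = 0;
    $result = preg_replace_callback(
        '/<head\b[^>]*>/i',
        fn(array $m): string => $m[0] . $tag,
        $html,
        1,
        $count
    );

    // Without a <head> tag (or on a regex error) return the input unchanged.
    return ($result !== null && $count === 1) ? $result : $html;
}

// "\xCF\xF0\xE8\xE2\xE5\xF2" is the word "Привет" encoded in windows-1251.
$html = '<html><head><title>' . "\xCF\xF0\xE8\xE2\xE5\xF2" . '</title>'
    . '<meta http-equiv="Content-type" content="text/html; charset=windows-1251">'
    . '</head><body><div></div></body></html>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML(ensureCharsetMetaFirst($html, 'windows-1251'));

// textContent is UTF-8; it reads "Привет" only if libxml2 could decode windows-1251.
echo $doc->getElementsByTagName('title')->item(0)->textContent, "\n";
```

Whether the title actually comes back as readable text depends on the libxml2 build that PHP links against being able to convert from windows-1251, which it usually does through iconv.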

I hope this non-obvious behavior saves someone some time.

Source: https://habr.com/ru/post/56829/