
Encodings and web pages

Returning to the hackneyed problem of encoding Russian letters, I would like to have on hand a single reference or guide with solutions for the various situations of this kind. At one point I dug through many articles and publications to find the causes of such errors. The goal of this publication is to save the reader time and nerves by collecting in one place the various causes of encoding errors in Java and JSP development and the ways to eliminate them.

These solutions may not be the only ones; I will gladly add ones suggested by readers, if they work.

So let's go.

1. Problem: when the browser renders the page, all the Russian text comes out garbled, even text that is written statically in the page.
Reason: the browser determines the encoding of the text incorrectly because the encoding is not specified explicitly.
Solution: specify the encoding explicitly:
a) HTML: add a META tag to the page head:
[<meta http-equiv="Content-Type" content="text/html; charset=cp1251">] 

b) XML: specify the encoding in the XML declaration:
 [<?xml version="1.0" encoding="cp1251"?>] 

c) JSP: set the content type in the page directive:
 [<%@ page language="java" contentType="text/html;charset=cp1251"%>] 

d) JSP: set the encoding of the page source (it is also used for the response when contentType does not specify a charset):
 [<%@ page pageEncoding="cp1251"%>] 

e) Java: set the response headers (before calling response.getWriter()):
 [response.setCharacterEncoding("cp1251");] [response.setContentType("text/html;charset=cp1251");] 
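All of the variants above do the same thing: they make the declared charset match the bytes that are actually sent. The JDK-only sketch below encodes a body in cp1251, builds a matching Content-Type value, and shows what a reader gets with and without that declaration. The helper names and the ISO-8859-1 fallback (standing in for a browser's guess) are assumptions for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeDemo {
    // Hypothetical helper: a Content-Type value naming the charset of the body.
    static String contentType(String mimeType, Charset charset) {
        return mimeType + "; charset=" + charset.name();
    }

    // Reads the charset parameter back; without one the reader has to guess,
    // modelled here by the fallback argument.
    static Charset charsetOf(String contentType, Charset fallback) {
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                return Charset.forName(p.substring("charset=".length()));
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset cp1251 = Charset.forName("cp1251"); // Java alias of windows-1251
        String text = "\u041f\u0440\u0438\u0432\u0435\u0442"; // Russian "Privet"
        byte[] body = text.getBytes(cp1251);

        String declared = contentType("text/html", cp1251);
        System.out.println(declared); // text/html; charset=windows-1251

        // Declared charset matches the bytes: the text survives.
        String ok = new String(body, charsetOf(declared, StandardCharsets.ISO_8859_1));
        // No declaration: the guessed charset is wrong and the text is garbled.
        String guessed = new String(body, charsetOf("text/html", StandardCharsets.ISO_8859_1));
        System.out.println(ok.equals(text));      // true
        System.out.println(guessed.equals(text)); // false
    }
}
```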

2. Problem: static Russian text written in the JSP page comes out garbled, even though the page encoding is set.
Reason: the JSP file was saved in a different encoding from the one declared for the page.
Solution: change the file's encoding in the editor (for example, in AkelPad click "Save As" and select the desired encoding).
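When many JSP files need converting, the same "Save As" step can be scripted. A minimal sketch, assuming Java 11+ (Files.readString/writeString), files that are currently UTF-8 and pages that declare cp1251; the class and method names are hypothetical. Both calls throw an IOException rather than silently corrupting text if the file is not valid in the source encoding or contains characters cp1251 cannot represent.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ResaveJsp {
    // Re-saves a text file in another encoding, like "Save As" in an editor.
    // readString fails on bytes that are invalid in `from`; writeString fails
    // on characters that `to` cannot represent.
    static void convert(Path file, Charset from, Charset to) throws IOException {
        String text = Files.readString(file, from);
        Files.writeString(file, text, to);
    }

    public static void main(String[] args) throws IOException {
        Charset cp1251 = Charset.forName("windows-1251");
        Path jsp = Files.createTempFile("page", ".jsp");
        String html = "<p>\u041f\u0440\u0438\u0432\u0435\u0442</p>"; // Russian "Privet"
        Files.writeString(jsp, html, StandardCharsets.UTF_8); // editor saved UTF-8

        convert(jsp, StandardCharsets.UTF_8, cp1251);
        // The bytes on disk now match pageEncoding="cp1251".
        System.out.println(Files.readString(jsp, cp1251).equals(html)); // true
        Files.delete(jsp);
    }
}
```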

3. Problem: text received from the request comes out as krakozyabry (garbled characters).
Cause: the request encoding differs from the encoding used to process it.
Solution: set the encoding of the request, or re-encode the text into the required one.
a) Java, the sender does not specify the required encoding: re-encode the value into the required one:
 [String MyParam= new String(request.getParameter("MyParam").getBytes("ISO-8859-1"),"cp1251");] 

Note: unless another encoding is specified, parameters are decoded as ISO-8859-1 by default.
b) Java, the sender uses the required encoding: set the encoding of the request (before any parameter is read):
 [request.setCharacterEncoding("cp1251");] 
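What setCharacterEncoding changes is the charset the container uses to decode the percent-encoded parameter bytes. The JDK-only sketch below imitates that with a hypothetical parseForm helper (Java 10+ for the URLDecoder.decode(String, Charset) overload): decoding with ISO-8859-1 garbles the text, variant (a) repairs it, and decoding with cp1251 from the start, as in variant (b), needs no repair. The repair is lossless because ISO-8859-1 maps each of the 256 byte values to exactly one character, so getBytes("ISO-8859-1") gives back the original bytes.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormDecoding {
    static final Charset CP1251 = Charset.forName("windows-1251");

    // Hypothetical stand-in for the container: decodes an
    // application/x-www-form-urlencoded body with the given charset.
    static Map<String, String> parseForm(String body, Charset cs) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('=');
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            params.put(URLDecoder.decode(key, cs), URLDecoder.decode(value, cs));
        }
        return params;
    }

    // Variant (a): undo an ISO-8859-1 decode of bytes that were really cp1251.
    static String repair(String misDecoded) {
        return new String(misDecoded.getBytes(StandardCharsets.ISO_8859_1), CP1251);
    }

    public static void main(String[] args) {
        String body = "MyParam=%CF%F0%E8%E2%E5%F2"; // Russian "Privet" sent as cp1251
        String expected = "\u041f\u0440\u0438\u0432\u0435\u0442";

        String latin1 = parseForm(body, StandardCharsets.ISO_8859_1).get("MyParam");
        System.out.println(latin1.equals(expected));         // false
        System.out.println(repair(latin1).equals(expected)); // true
        // Variant (b): the right charset is used from the start.
        System.out.println(parseForm(body, CP1251).get("MyParam").equals(expected)); // true
    }
}
```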


4. Problem: Russian text passed as a GET parameter in a redirect arrives garbled.
Reason: Russian text packed into the URI is encoded as ISO-8859-1 by default.
Solution: encode the text in the desired encoding manually.
a) JSP, URLEncoder:
 [<%@ page import="java.net.URLEncoder"%> <%response.sendRedirect("targetPage.jsp?MyParam="+URLEncoder.encode(" ","cp1251"));%>] 
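The same packing can be checked outside a container. This sketch assumes Java 10+ for the URLEncoder.encode(String, Charset) and URLDecoder.decode(String, Charset) overloads; redirectUrl is a hypothetical helper, and the Russian word "Privet" ("hello") stands in for the text being passed. It also shows that the target page must decode with the same charset the sender used, since UTF-8 produces entirely different escapes.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RedirectParam {
    static final Charset CP1251 = Charset.forName("windows-1251");

    // Hypothetical helper: a redirect target whose parameter is
    // percent-encoded in the charset the target page expects.
    static String redirectUrl(String page, String name, String value, Charset cs) {
        return page + "?" + URLEncoder.encode(name, cs) + "=" + URLEncoder.encode(value, cs);
    }

    public static void main(String[] args) {
        String value = "\u041f\u0440\u0438\u0432\u0435\u0442"; // Russian "Privet"
        String url = redirectUrl("targetPage.jsp", "MyParam", value, CP1251);
        System.out.println(url); // targetPage.jsp?MyParam=%CF%F0%E8%E2%E5%F2

        String encoded = url.substring(url.indexOf('=') + 1);
        // Decoded with the sender's charset: the text survives.
        System.out.println(URLDecoder.decode(encoded, CP1251).equals(value)); // true
        // The same text encoded as UTF-8 gives different escapes.
        System.out.println(URLEncoder.encode(value, StandardCharsets.UTF_8));
        // %D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82
    }
}
```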


5. Problem: text read from the database comes out garbled.
Reason: the encoding of the text read from the database differs from the encoding of the page.
Solution: set the appropriate page encoding, or re-encode the values obtained from the database.
a) Java, re-encoding the string db_string read from the database:
 [String MyValue = new String(db_string.getBytes("utf-8"),"cp1251");] 
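The pattern new String(s.getBytes(X), Y) only undoes a mistake if the bytes really were in Y and were wrongly decoded as X; applied in the wrong direction, or to text that is already correct, it quietly produces '?' or U+FFFD. A stricter sketch, using the JDK's CharsetEncoder/CharsetDecoder with CodingErrorAction.REPORT, throws in that case instead. The recode name and the UTF-8/cp1251 scenario are illustrative assumptions. Since cp1251 defines almost every byte value, decoding as cp1251 rarely fails; the check mainly catches mistakes when the target is a strict encoding such as UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictRecode {
    // Undo "bytes in `actual` were decoded as `decodedAs`", failing loudly
    // instead of substituting '?' or U+FFFD when the guess is wrong.
    static String recode(String garbled, Charset decodedAs, Charset actual)
            throws CharacterCodingException {
        ByteBuffer bytes = decodedAs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(garbled));
        return actual.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(bytes)
                .toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        Charset cp1251 = Charset.forName("windows-1251");
        String original = "\u041f\u0440\u0438\u0432\u0435\u0442"; // Russian "Privet"
        // UTF-8 bytes from the database, mistakenly decoded as cp1251.
        String garbled = new String(original.getBytes(StandardCharsets.UTF_8), cp1251);
        System.out.println(recode(garbled, cp1251, StandardCharsets.UTF_8).equals(original)); // true

        try {
            // Text that is already correct is not valid UTF-8 once encoded as cp1251.
            recode(original, cp1251, StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            System.out.println("refused: " + e);
        }
    }
}
```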


6. Problem: text is written to the database garbled, even though the page displays correctly.
Reason: the encoding of the string being written differs from the encoding of the database session or from the encoding of the database itself (keep in mind that these do not always match).
Solution: set the required session encoding, or re-encode the string.
a) Java, re-encoding db_string before writing it into the encoding of the session or database:
 [String db_string = new String(MyValue.getBytes("cp1251"),"utf-8");] 

b) Java, MySQL: set the connection parameters in the dburl string passed to the connect function:
 [dburl += "?characterEncoding=cp1251";] 

c) MySQL: set the connection parameters in the XML context descriptor by adding an attribute to the <Resource> tag:
 [connectionProperties="useUnicode=no;characterEncoding=cp1251;"] 

d) MySQL: set the session encoding directly by executing SET NAMES (connect is the Connection object):
 [CallableStatement cs = connect.prepareCall("set names 'cp1251'"); cs.execute();] 
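Variant (b) appends "?characterEncoding=cp1251" unconditionally, which yields a malformed URL if dburl already carries parameters. A small hypothetical helper that picks '?' or '&' avoids that; the host, port, database name and the existing useSSL parameter in the example are placeholders, and the characterEncoding property name is taken from the article.

```java
public class JdbcUrl {
    // Hypothetical helper: append one connection property to a JDBC URL,
    // using '?' for the first parameter and '&' for later ones.
    static String withParam(String url, String key, String value) {
        char separator = url.indexOf('?') >= 0 ? '&' : '?';
        return url + separator + key + "=" + value;
    }

    public static void main(String[] args) {
        String base = "jdbc:mysql://localhost:3306/mydb"; // placeholder URL
        System.out.println(withParam(base, "characterEncoding", "cp1251"));
        // jdbc:mysql://localhost:3306/mydb?characterEncoding=cp1251
        System.out.println(withParam(base + "?useSSL=false", "characterEncoding", "cp1251"));
        // jdbc:mysql://localhost:3306/mydb?useSSL=false&characterEncoding=cp1251
    }
}
```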


7. Problem: nothing helped...
Solution: there is always the crudest method left: direct re-encoding.
a) When the source encoding is known:
[String MyValue = new String(source_string.getBytes("utf-8"),"cp1251");]

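b) For a request parameter (note that request.getCharacterEncoding() returns null when the request does not specify an encoding):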
[String MyValue = new String(request.getParameter("MyParam").getBytes(request.getCharacterEncoding()),"cp1251");]
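Variant (b) passes request.getCharacterEncoding() straight to getBytes, which throws a NullPointerException when the request declares no encoding. A null-safe sketch, with plain arguments standing in for request.getParameter(...) and request.getCharacterEncoding() so that it runs without a servlet container; the ISO-8859-1 fallback follows the default mentioned in problem 3, and the method name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ParamRecode {
    // value:    what request.getParameter(...) returned (null if absent)
    // declared: what request.getCharacterEncoding() returned, possibly null
    static String recodeParam(String value, String declared, Charset target) {
        if (value == null) {
            return null; // parameter not present in the request
        }
        Charset source = declared != null
                ? Charset.forName(declared)
                : StandardCharsets.ISO_8859_1; // assumed container default
        return new String(value.getBytes(source), target);
    }

    public static void main(String[] args) {
        Charset cp1251 = Charset.forName("windows-1251");
        String sent = "\u041f\u0440\u0438\u0432\u0435\u0442"; // Russian "Privet"
        // Simulate a container that decoded cp1251 bytes as ISO-8859-1.
        String received = new String(sent.getBytes(cp1251), StandardCharsets.ISO_8859_1);
        System.out.println(recodeParam(received, null, cp1251).equals(sent)); // true
    }
}
```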


Addendum, or what you need to know:

1. The encodings of the database and of the connection session may differ, depending on the specific DBMS and driver. For example, when connecting to MySQL with the standard com.mysql.jdbc.Driver driver without explicitly specifying the session encoding, the session was set to UTF-8, even though the database schema used a different encoding.
2. By default, the query string packed into the URI is encoded as ISO-8859-1. You may run into this, for example, when passing hard-coded text in a redirect from one page to another.
3. The relationships between the encodings of pages, databases, sessions, and request and response parameters do not depend on the development language, and the Java functions described here have counterparts in PHP, ASP, and others.

Note: I can no longer restore the links to the sources. All examples are taken from my own code, although I originally found them on numerous forums.

I hope this short review helps novice web programmers reduce debugging time and save their nerves.

Source: https://habr.com/ru/post/265795/

