
From life with Kubernetes: how the HTTP server did not favor the Spaniards



A representative of our client, whose application stack lives in Microsoft's cloud (Azure), came to us with a problem: recently, some requests from some European customers had started failing with a 400 (Bad Request) error. All the applications are written in .NET and deployed in Kubernetes...

One of the applications is an API that all the traffic eventually reaches. This traffic is handled by the Kestrel HTTP server, configured in the .NET application and running in a pod. With the debugging we were lucky in the sense that there was a specific user who could reproduce the problem consistently. However, everything was complicated by the traffic chain:


The error in Ingress looked like this:

{
  "number_fields": {
    "status": 400,
    "request_time": 0.001,
    "bytes_sent": 465,
    "upstream_response_time": 0,
    "upstream_retries": 0,
    "bytes_received": 2328
  },
  "stream": "stdout",
  "string_fields": {
    "ingress": "app",
    "protocol": "HTTP/1.1",
    "request_id": "f9ab8540407208a119463975afda90bc",
    "path": "/api/sign-in",
    "nginx_upstream_status": "400",
    "service": "app",
    "namespace": "production",
    "location": "/front",
    "scheme": "https",
    "method": "POST",
    "nginx_upstream_response_time": "0.000",
    "nginx_upstream_bytes_received": "120",
    "vhost": "api.app.example.com",
    "host": "api.app.example.com",
    "user": "",
    "address": "83.41.81.250",
    "nginx_upstream_addr": "10.240.0.110:80",
    "referrer": "https://api.app.example.com/auth/login?long_encrypted_header",
    "service_port": "http",
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36",
    "time": "2019-03-06T18:29:16+00:00",
    "content_kind": "cache-headers-not-present",
    "request_query": ""
  },
  "timestamp": "2019-03-06 18:29:16",
  "labels": {
    "app": "nginx",
    "pod-template-generation": "6",
    "controller-revision-hash": "1682636041"
  },
  "namespace": "kube-nginx-ingress",
  "nsec": 6726612,
  "source": "kubernetes",
  "host": "k8s-node-55555-0",
  "pod_name": "nginx-v2hcb",
  "container_name": "nginx",
  "boolean_fields": {}
}

Kestrel, for its part, returned:

HTTP/1.1 400 Bad Request
Connection: close
Date: Wed, 06 Mar 2019 12:34:20 GMT
Server: Kestrel
Content-Length: 0

Even at maximum verbosity, the Kestrel error contained very little useful information:

{
  "number_fields": { "ThreadId": 76 },
  "stream": "stdout",
  "string_fields": {
    "EventId": "{\"Id\"=>17, \"Name\"=>\"ConnectionBadRequest\"}",
    "SourceContext": "Microsoft.AspNetCore.Server.Kestrel",
    "ConnectionId": "0HLL2VJSST5KV",
    "@mt": "Connection id \"{ConnectionId}\" bad request data: \"{message}\"",
    "@t": "2019-03-07T13:06:48.1449083Z",
    "@x": "Microsoft.AspNetCore.Server.Kestrel.Core.BadHttpRequestException: Malformed request: invalid headers.\n at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Http1Connection.TryParseRequest(ReadResult result, Boolean& endConnection)\n at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.<ProcessRequestsAsync>d__185`1.MoveNext()",
    "message": "Malformed request: invalid headers."
  },
  "timestamp": "2019-03-07 13:06:48",
  "labels": {
    "pod-template-hash": "2368795483",
    "service": "app"
  },
  "namespace": "production",
  "nsec": 145341848,
  "source": "kubernetes",
  "host": "k8s-node-55555-1",
  "pod_name": "app-67bdcf98d7-mhktx",
  "container_name": "app",
  "boolean_fields": {}
}
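
For context, Kestrel's log verbosity in an ASP.NET Core application is normally controlled through category-based log levels, for example in appsettings.json along these lines (a generic sketch, not the client's actual configuration; the category name matches the SourceContext seen in the log above):

 {
   "Logging": {
     "LogLevel": {
       "Default": "Information",
       "Microsoft.AspNetCore.Server.Kestrel": "Debug"
     }
   }
 }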

It seemed that only tcpdump would help in solving this problem... but remember the traffic chain:



Investigation


Obviously, it is better to capture the traffic on the specific node where Kubernetes has scheduled the pod: that way the volume of the dump is small enough that you can quickly find at least something. And indeed, while examining it, we noticed the following frame:

GET /back/user HTTP/1.1
Host: api.app.example.com
X-Request-ID: 27ceb14972da8c21a8f92904b3eff1e5
X-Real-IP: 83.41.81.250
X-Forwarded-For: 83.41.81.250
X-Forwarded-Host: api.app.example.com
X-Forwarded-Port: 443
X-Forwarded-Proto: https
X-Original-URI: /front/back/user
X-Scheme: https
X-Original-Forwarded-For: 83.41.81.250
X-Nginx-Geo-Client-Country: Spain
X-Nginx-Geo-Client-City: M.laga
Accept-Encoding: gzip
CF-IPCountry: ES
CF-RAY: 4b345cfd1c4ac691-MAD
CF-Visitor: {"scheme":"https"}
pragma: no-cache
cache-control: no-cache
accept: application/json, text/plain, */*
origin: https://app.example.com
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36
referer: https://app.example.com/auth/login
accept-language: en-US,en;q=0.9,en-GB;q=0.8,pl;q=0.7
cookie: many_encrypted_cookies; .AspNetCore.Identity.Application=something_encrypted;
CF-Connecting-IP: 83.41.81.250
True-Client-IP: 83.41.81.250
CDN-Loop: cloudflare

HTTP/1.1 400 Bad Request
Connection: close
Date: Wed, 06 Mar 2019 12:34:20 GMT
Server: Kestrel
Content-Length: 0
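
For reference, a capture of this kind can be obtained directly on the node with something along these lines (a sketch: the pod IP is taken from the nginx_upstream_addr field in the Ingress log above, while the namespace and interface are assumptions):

 # find the node and IP of the application pod
 kubectl -n production get pod -o wide | grep app

 # then, on that node, dump plain-HTTP traffic to and from the pod
 tcpdump -i any -nn -A -s 0 'host 10.240.0.110 and port 80'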

A careful look at the dump revealed the word M.laga. It is easy to guess that there is no city called M.laga in Spain (though there is Málaga); the dot is simply how tcpdump renders a byte that falls outside printable ASCII. With this idea in mind, we looked at the Ingress configs, where we found the following "harmless" snippet, inserted a month earlier at the client's request:

  ingress.kubernetes.io/configuration-snippet: |
    proxy_set_header X-Nginx-Geo-Client-Country $geoip_country_name;
    proxy_set_header X-Nginx-Geo-Client-City $geoip_city;

Once the forwarding of these headers was disabled, everything was fine again! (It soon became clear that the application itself no longer needed these headers anyway.)
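
If geo headers had still been needed, one possible option (a sketch, not what was actually done here) would be to forward only values that are guaranteed to be ASCII, such as the two-letter country code exposed by the same nginx geoip module:

  ingress.kubernetes.io/configuration-snippet: |
    proxy_set_header X-Nginx-Geo-Client-Country $geoip_country_code;

The city name has no ASCII-only counterpart among the geoip variables, so if it were genuinely required, it would have to be escaped or encoded before being placed into a header.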

Now let's look at the problem more generally. It is easy to reproduce from inside the application by making a telnet request to localhost:80:

GET /back/user HTTP/1.1
Host: api.app.example.com
cache-control: no-cache
accept: application/json, text/plain, */*
origin: https://app.example.com
Cookie: test=Desiree

... it returns 401 Unauthorized, as expected. But what happens if we send:

GET /back/user HTTP/1.1
Host: api.app.example.com
cache-control: no-cache
accept: application/json, text/plain, */*
origin: https://app.example.com
Cookie: test=Désirée

?

We get 400 Bad Request back, and in the application log we see the error we already know:

{
  "@t": "2019-03-31T12:59:54.3746446Z",
  "@mt": "Connection id \"{ConnectionId}\" bad request data: \"{message}\"",
  "@x": "Microsoft.AspNetCore.Server.Kestrel.Core.BadHttpRequestException: Malformed request: invalid headers.\n at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Http1Connection.TryParseRequest(ReadResult result, Boolean& endConnection)\n at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.<ProcessRequestsAsync>d__185`1.MoveNext()",
  "ConnectionId": "0HLLLR1J974L9",
  "message": "Malformed request: invalid headers.",
  "EventId": { "Id": 17, "Name": "ConnectionBadRequest" },
  "SourceContext": "Microsoft.AspNetCore.Server.Kestrel",
  "ThreadId": 71
}
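
The same check can be scripted, which makes the byte-level nature of the problem explicit: the only difference between the two requests is that the second Cookie value contains non-ASCII bytes once encoded. Below is a minimal sketch in Python (assuming, as in the telnet example, that the application listens on localhost:80; everything else is taken from the request above):

import socket

def send_raw(cookie_value: str) -> str:
    # The same request as in the telnet example; only the Cookie value differs.
    request = (
        "GET /back/user HTTP/1.1\r\n"
        "Host: api.app.example.com\r\n"
        "accept: application/json, text/plain, */*\r\n"
        "Cookie: test=" + cookie_value + "\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    with socket.create_connection(("localhost", 80)) as s:
        # UTF-8 turns "é" into two bytes outside the ASCII range,
        # which is exactly what Kestrel rejects with 400.
        s.sendall(request.encode("utf-8"))
        reply = s.recv(4096).decode("latin-1", errors="replace")
    return reply.splitlines()[0] if reply else "(no response)"

print(send_raw("Desiree"))   # HTTP/1.1 401 Unauthorized
print(send_raw("Désirée"))   # HTTP/1.1 400 Bad Request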

Results


Specifically, Kestrel cannot correctly handle HTTP headers whose values contain perfectly valid non-ASCII (UTF-8) characters, and such characters appear in the names of quite a lot of cities.

An additional factor in our case is that the client does not plan to change the Kestrel-based implementation in the application. However, the issues in AspNetCore itself (#4318, #7707) suggest that this would not help anyway...

Summing up: this note is not really about the specific problems of Kestrel or UTF-8 (in 2019?!), but about how attentiveness and consistently examining each step while hunting down a problem will sooner or later bear fruit. Good luck!


Source: https://habr.com/ru/post/448420/

