I treat Crash'and NSLog'ami. Inexpensive. Years of experience. 100% guaranteed.
Approximately this title could describe what happened three and a half months ago in one of my projects. Or rather, it was not even my project, but I had to deal with the problem of crash.
It all started with the fact that on one of the relatively large projects, an exception began to fall out steadily when a user is authorized. "So what's wrong with that?" Everybody has it. A check for
nil forgot to put or somewhere nakosyachili. “Also, me, a big event - crash on the project“, - most programmers will think
about it . In principle, I absolutely agree. Crash is not such a rare occurrence in iPhone programming, and you come across it ten times a day. But this one was special. He was already beginning to smell magic, when I was told about some of his parameters and features:
- Reproducibility on the simulator : 100%
- Reproducibility on device : 0%
- Crash path (after crash localization): ~ 40 seconds
- Optimization settings when compiling (-O1, -O2 ...) do not affect the reproducibility
- XIBs in the project are not used.
')
Yes, he looked pretty harmless:
// Code UITextView * textView = [ [UITextView alloc] initWithFrame:CGRectMake(0, 150, _width, _height)]; // Exception *** Terminating app due to uncaught exception 'CALayerInvalidGeometry', reason: 'CALayer bounds contains NaN: [0 0; nan 200]'
"Well, right there and the no-brainer is that
width is after calculation — NaN!", I thought. After a quick glance at where and how the width of the view is calculated, and not finding anything particularly dangerous, I, to assert my guess, put the
NSLog before creating And in addition, and a breakpoint on the line with the creation of the element.
// Source: NSLog(@"width = %f", _width); //Output: width = 200
“Um,” I thought to myself, and continued the program after the breakpoint. And crash did not happen ...
Test project that reproduces this Crash
For those people who are interested in looking at this beast themselves - I have a test project.
Reproduced exclusively on iOS Simulator, SDK Version 5.1 (SDK 6+ does not play, but if there are good people who can play, write in a personal post)
If you still think that something trivial is here, then calmly read on. When we get to the interesting places, I’ll stop and make a warning about spoilers.
What have I done?
If you think logically, then everything becomes clear. Once crash disappeared, it was because I changed something (because before that the reproducibility was 100%). The question remained only what
exactly had changed.
- I added some code before creating the view.
- I added NSLog
- I suspended the flow
- I accessed the _width class variable before passing it as a parameter
- ....
If you still have ideas that I could change - you can write. Then check if the guesses were correct. But, most likely - hardly;)
All items are fairly easy to check. I removed all the changes and started experimenting.
I suspended the flow
The idea here is quite simple. Most likely - the error occurs due to multithreading.
This was the easiest thing to check, I put a breakpoint without
NSLog 'a. The code continued to crash, with the same 100%.
I accessed the _width class variable or I added some code before creating the view
The idea of ​​these points is in the "features" of code generation, which, miraculously, could lead to unintended consequences.
No dances with a tambourine and additional, randomly generated, lines of code with access to the variable
_width gave no result. The application has fallen steadily.
I added NSLog
“No, well, you're not serious, right?”
It turned out seriously.
NSLog continued to "cure" crash. Stable. Regardless of the parameters passed to him. Revealed more remarkable features. The application continued to fall, even if you do not use
_width ,
_height as parameters. Without
NSLog, any value passed to
- (id) initWithFrame: caused the application to crash with the same error. Among other things, it turned out that it is not at all necessary to put
NSLog right before creating a view. It can be put at the very beginning of the method where the view was created ... or in the method that caused this method ... or even in the method that caused the method that caused ... well, you understood, in general. The main thing - to create.
Success!
That's it all over. The problem was solved". The blocker is eliminated, and nothing prevents you from quietly testing the application on the simulator. And also, there was a lot of work that needed to be done, and I had to switch to it.
Is the problem solved?
This question gave me no rest for several evenings while I was finishing the main project. Has the problem been resolved, or is the “leech” with
NSLog 'om simply removed the symptoms of some larger problem that lurked and waited for the project to go into release? And then, when everyone forgets about it, and for its fixing, it will be necessary to wait a week for a submission in the AppStore, it will manifest itself and begin to break and crush everything, just not on the simulator anymore?
Magic?
On the one hand, I understood that the problem was in my code. And, most likely, this is not a
UIKit library
error . Familiar programmers, and said - look for the problem in your code. And I agreed with them. But how to find her? The project is rather big, attempts to localize the problem in a few lines fail. I wanted to quit, say "magic" and score. And live with a sense of magic in the project. But then, because if you know how magic works, all interest is lost, everything becomes ordinary, according to the documentation.
Nevertheless, interest in the cause of the problem pereosilil, and I began to dig. By this time, my ideas were spinning around the desire to understand: so what does
NSLog do so interesting, and how does it “cure” the fall of the program? There were a lot of options both for me and in the local chat of developers. In the end, it was agreed that the most likely cause of a
crash is
stack problems. This position did not explain the reason for the stable operation of the devices, but it was necessary to start somewhere. But before I began to look in the direction of the stack, I tried to understand where this
NaN still occurs
? . Just at that time, a
darkproger friend
was actively telling right and left how cool a
dtrace thing
was . And after a few hours, with the help of this tool, I managed to slightly reduce the search for the cause of the fall.
Dtrace magic
At that time, all I had was
stacktrace to the point of the fall. The problem was somewhere inside it, well, or something led to the fact that all of this safely fell.
#4 0x01d8b04a in -[CALayer setBounds:] () <---- NaN lives here #5 0x02d1d714 in WebCore::TileGrid::updateHostLayerSize() () #6 0x02d1af26 in WebCore::TileCache::TileCache(WAKWindow*) () #7 0x02d52507 in -[WAKWindow initWithLayer:] () #8 0x002ee5e9 in -[UIWebTiledView initWithFrame:] () #9 0x001af9b5 in -[UIWebDocumentView initWithWebView:frame:] () #10 0x001af89e in -[UIWebDocumentView initSimpleHTMLDocumentWithStyle:frame:preferences:groupName:] () #11 0x0012cb6a in -[UITextView commonInitWithWebDocumentView:isDecoding:] () #12 0x0012bf0e in -[UITextView initWithFrame:] ()
Using a relatively simple
script for dtrace, we managed to find out where the unfortunate
NaN appears. It turned out that the parameters are “lost” right under the very end:
QuartzCore`-[CALayer setBounds :] <------ (0, 0, NaN, height) WebCore`WebCore::TileGrid::updateHostLayerSize()+0x140 WebCore`WebCore::TileCache::TileCache(WAKWindow*)+0x1c6 WebCore`-[WAKWindow initWithLayer:]+0xd7 UIKit`-[UIWebTiledView initWithFrame:]+0xec " <------ (0, 0, width, height)" UIKit`-[UIWebDocumentView initWithWebView:frame:]+0xb5 UIKit`-[UIWebDocumentView initSimpleHTMLDocumentWithStyle:frame:preferences:groupName:]+0xfe UIKit`-[UITextView commonInitWithWebDocumentView:isDecoding:]+0x198 UIKit`-[UITextView initWithFrame:]+0x70 " <------ (0, 0, width, height)"
What to do? Apparently the problem is not in our code! With a heightened sense of self-importance, I'm already preparing to send a bug report to Apple. And still climb to watch how the
stack is doing there.
Spoiler alert!
If you are still going to solve this problem yourself, then it's time to stop reading this article;)
For those who are going to sort it out themselves, and thinks that the problems lie in working with the stack, or vice versa, they think that the problems are not related to the stack at all, a small spoiler
Do not read me!The problem is not related to the stack *
Stack troubles?
How in Xcode to see what is currently going on in the stack? What parameters are passed to the function? How are they transmitted and by what rules? And through the stack?
This is just a small part of the questions to which at that time I had no answers, but
darkproger helped
again (in general, Vova listened very diligently and patiently, or pretended to listen, all my delusional and not very good ideas of cause and effect;) , poking my nose in a very, very useful
documentation . To be more specific,
to a specific part of it . After a cursory reading of technouts, I was able to display the contents of the stack.
However, these data did not help my understanding of what was happening. The fact is that the stack was identical in both cases. No difference. All values ​​in the memory were exactly the same and the stack registers referenced the same memory addresses. I checked several times, restarted the application, checked memory addresses ±. It all matched.
What's next?
As I said, all the values ​​in the memory addresses match. But something was still different. Something was different. True, the search for this "something" temporarily faded into the background - I found the unfortunate
NaN .

It was placed in one of the
st registers of the FPU processor. Since I did not know what the registers were, I had to get into the
documentation from Intel . According to the documentation, the group of registers
st0-st7 is a cyclic stack for the FPU, through which the processor instructions are responsible for the operations with floating point. Everything has become clear. Someone puts the
NaN value in the FPU stack, and then removes it as a result of the operation. It remains to find only who does it.
About an hour, I killed on a suddenly appeared piece of magic, which was explained by a simple "read the documentation carefully." This magic manifested itself as follows. “Wrong” the
fstp instruction worked (page 190). The team should perform the following operations: Take the current value from the top of the FPU stack of
st0 , write it to the specified address (in my case, in
$ ebp-0x64 ), and push the value out of the FP stack.
That's only after the execution of this instruction, at the address
$ ebp + 0x64 it turned out to be 0, instead of the value 320 I need.
Before executing the
fstp command
After executing the
fstp command
Attentive experts probably already see the problem. Those who want to verify this behavior themselves - you can download a test project and put a breakpoint on the
WebCore :: TileGrid :: updateHostLayerSize () function. Inside it is easy enough to find the desired piece. Who is just interested in the reason for my provtyk - feel free to open the spoiler.
And $ ebp-0x64, they say, is not real!The reason is data types. The fstp command writes a 40-bit floating point chilso into memory, which is commonly referred to in the common people as long double . In this example, in the debugger, I look at the correct address, only I’m showing not 40 bits, but 32 bits. Everything fell into place when I indicated the correct data type. An error that cost me an hour and a half of search might not have occurred if I had read the documentation more carefully. Moral: read the documentation. It is useful and saves time.
Stack overflow?
After I figured out with
fstp , and
fld . I began to look for differences again. And since I already knew that the reason was not related to the stack, I concentrated on the values ​​of the processor registers. And here there were differences. And it even became clear where
NaN comes from. At first I looked at the
$ ftag register , the value of which immediately told me about the presence of something bad in the FPU stack registers (page 182).
This could, of course, be observed just by looking at
$ st0- $ st7 , but
$ ftag lets you know what state the registers are at the moment - which are used, which are busy, and which are empty. In the example you can see that all the registers are occupied (valid), and only one of them has something special.
But finally, all the questions about magic disappeared when I saw the values ​​of the
$ fstat register (x87 FPU Status Register, page 178).

Stack overflow
Stack overflow in FPU processor registers. In fact, the stack overflow itself does not cause any exceptions, and only puts a flag in the control word that an exception has occurred.
If it is not valid, it is not necessary to know whether it was a funeral or a rule. This is the override instruction.
But, if we use the value of the register with the invalid state, we get
NaN , which, once in memory, comes as a parameter when creating
CALayer 'a.
So, I descended to the lowest level of the problem. I knew exactly where the magic
NaNs appear in the program, and about stack overflow in FPU stack registers. That's just the answer, where and when it happens, I did not have. The idea was born again in my head that this fits
UIKit , but you can’t make a claim without proof. On the other hand, I had no idea how the code could trigger FPU Stack Overflow (the existence of which I learned that day).
Culprit
For a relatively short time, the function that was responsible for everything that happened was localized. Here it is, the culprit of all that is happening!
// ACLabel : UILabel - (CGFloat)resizeToContents { CGSize size = [self.text sizeWithFont:self.font forWidth:self.frame.size.width lineBreakMode:self.lineBreakMode]; CGRect oldFrame = self.frame; oldFrame.size.height = size.height; self.frame = oldFrame; return size.height; }
Trying to find something scary? Try. I tried too. Long and tedious. Rewrote a method, rename, make a category. And all in vain. It was this method that left trash in the FPU Stack. Moreover, magic has acquired a new form. The same method left the garbage in the stack through time. That is, the “sometimes” method normally “removed” the value from the top of the stack, and sometimes it completely forgot about it, and ignored it, thereby bringing the crash closer. So what's wrong with this code?
It is a concept that it will be on the air. In the case of the returning of the corrected state, it should be noted.
“How can I feel better about this!” (Sarcasm). Now the vector of guilt has shifted smoothly from
UIKit 'to the compiler again. After all, he must remove the value from the FPU stack! Should, but does not. Rather makes ... but not always. So when?
One step to win
And I will answer you this question. But first, I have a small question: “Are the lines given in the example identical?”
// // â„–1 [obj someMethod]; // â„–2 [obj perfromSelector:@selector(someMethod)];
A simple question like that. He is even asked during interviews. And you can't even say right away that they can work in different ways. After all, in fact, this is the same thing, just a side view. However, the example is greatly simplified due to the absence of the fact that these methods are returned. It is understood that they return
void . And if methods returned
id , or
int or
float . Would anything change?
While you are thinking about the answer to the question, I will give a short extract from the documentation:
The method is equivalent to the receiver. For example, all the same thing:
id myClone = [anObject copy]; id myClone = [anObject performSelector:@selector(copy)]; id myClone = [anObject performSelector:sel_getUid("copy")];
. For methods that return anything other than an object, use NSInvocation.
When the hook is in bold, then, of course, it is noticed. The documentation quite clearly hints that if a method should suddenly return something other than
id , we need to use
NSInvocation . But I will say that even if you return a non-object, then everything works. It worked, anyway. Until that ill-fated day.
Profit
Perhaps many have already guessed what is happening here. Everything is quite simple (especially after reading the article, well, or two and a half days of games with dtrace, debugger, digging into the documentation, and so on).
- The compiler sees the method that the float should return, and regularly generates code that, after calling the method, removes the returned value from the FPU stack.
- In the case of using the performSelector method, the compiler does not know what the method returns, so it hopes that the programmer is conscientious - he believes that the method will return id .
- If the code uses performSelector for the method that should return a float (even if you do not use the returned result), then the compiler does not know that you need to remove the returned value from the FPU stack. So, with each call to the return-float method via the performSelector , one of the registers of the FPU stack is “clogged”.
- After eight such calls (8 st-registers), the processor will set a flag stating that the stack is full, and some normal floating-point operations begin to return NaN as a result. This, in turn, leads to anything. To the extent that a dinosaur can run behind your back.
Moral : Read the documentation carefully. If something works contrary to the documentation - do not really find that it does not entail major problems.
NSLog
“Hey, hey, so what about NSLog? Why does NSLog “cure” this problem? ”
NSLog uses floating point operations for its work. Moreover, by typing, it was established that for his work two registers are needed in the FPU stack. And the FPU stack, in the case of full filling, sets the appropriate overflow flag. And during an operation that removes the value from the stack, it marks one of the registers of the stack as “free to use”.
// Tag word (state of st0 - st7 registers) // 00 - Used // 10 - Invalid // 11 - Unused // before NSLog st0 st1 st2 st3 st4 st5 st6 st7 10 00 00 00 00 00 00 00 // After NSLog st0 st1 st2 st3 st4 st5 st6 st7 11 11 00 00 00 00 00 00
In this way, the state Invalid disappears from the
st0 register, because of this, no one gets
NaN when trying to take a value from it, and does not pass it on as a parameter to the
UITextView .
So,
NSLog does not cure the problem. He gives her to live a little longer, postponing the death of the application until the time codenamed "Suddenly."
Here is such a "magic". Now you can safely write in your resume line
I treat Crash'and NSLog'ami. Inexpensive. Years of experience. 100% guaranteed.
I hope it was not only interesting, but also in some places - informative.
PS Thanks again
darkproger and
vixentael for helping and listening to my stream of thoughts.