This article is aimed at Objective-C beginners and describes one way to shoot yourself in the foot. We will try to create two different NSString objects with the same text, examine how different compilers react to this, and find out under what conditions NSLog(@"%@", @"123456789") does not print "123456789" at all.

NSString objects and pointers
What do you think the following code will print?
    #import <Foundation/Foundation.h>

    int main() {
        @autoreleasepool {
            NSString *a = @"123456789";
            NSString *b = a;  // copies the pointer, not the object
            NSLog(@"%p %p", a, b);
        }
        return 0;
    }
Naturally, the pointers will be equal ("objects are assigned by reference"), so NSLog() will print two identical memory addresses. No magic here:

    2015-01-30 14:39:27.662 1-nsstring[13574] 0x602ea0 0x602ea0
Here and below, the object addresses are only examples; when you reproduce the experiments, the actual values will of course be different.

Now let's try to get two different NSString objects with the same text. With other standard classes, for example NSArray, we could write this:
    #import <Foundation/Foundation.h>

    int main() {
        @autoreleasepool {
            // Two separately created arrays with equal contents.
            NSArray *a = @[@"123456789"];
            NSArray *b = @[@"123456789"];
            NSLog(@"%p %p", a, b);
        }
        return 0;
    }
Since we initialized the two NSArray objects separately, they were placed in different memory areas, and the console shows two different addresses:

    2015-01-30 14:40:45.799 2-nsarray[13634] 0xa9e1b8 0xaa34e8

However, applying the same approach to NSString does not have the desired effect:
    #import <Foundation/Foundation.h>

    int main() {
        @autoreleasepool {
            // Two separate literals with the same text.
            NSString *a = @"123456789";
            NSString *b = @"123456789";
            NSLog(@"%p %p", a, b);
        }
        return 0;
    }
    2015-01-30 14:41:41.898 3-nsstring[13678] 0x602ea0 0x602ea0

As you can see, despite the separate initialization, both pointers still refer to the same memory location.
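The experiments in this article compare addresses, not text, and the two questions are easy to confuse. The following is a minimal sketch of the difference in plain C (which also compiles as Objective-C); the helper names same_object and same_text are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* True when both pointers refer to the very same bytes in memory. */
static bool same_object(const char *a, const char *b) {
    return a == b;
}

/* True when the two strings hold the same text, wherever they live. */
static bool same_text(const char *a, const char *b) {
    return strcmp(a, b) == 0;
}
```

For two separate arrays, char x[] = "123456789" and char y[] = "123456789", same_object(x, y) is false while same_text(x, y) is true. For NSString, the content check is done with isEqualToString:, whereas comparing with == or printing with %p only looks at the address.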
Using stringWithString
After rummaging through NSString a little, we find the stringWithString: method, which "returns a string created by copying the characters from another given string". So this is what we need! Let's try the following code:
    #import <Foundation/Foundation.h>

    int main() {
        @autoreleasepool {
            NSString *a = @"123456789";
            NSString *b = [NSString stringWithString:@"123456789"];  // from a literal
            NSString *c = [NSString stringWithString:b];             // from another object
            NSLog(@"%p %p %p", a, b, c);
        }
        return 0;
    }
It turns out that the output of this program depends on the version of the compiler used. clang under Ubuntu on LLVM 3.4 really does create three different objects located at different memory addresses. But compiling the same code in Xcode with clang for Mac on LLVM 3.5 produces only one object and three pointers to it:
    2015-01-30 17:59:02.206 4-nsstring[670:21855] 0x100001048 0x100001048 0x100001048

Exposing the magic
These oddities are explained by the compiler's attempts to optimize string constants. When it encounters string objects with the same content in the source code, it creates them only once, which saves storage and makes comparison cheaper. The same optimization is also performed at the linking stage: even if strings with the same text are in different modules, they will most likely be created only once.
Since the NSString type is immutable (NSMutableString is used for mutable strings), this optimization is safe, as long as we manipulate strings only through the methods of the NSString class.
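To make the idea of "create equal strings only once" concrete, here is a minimal runtime sketch in plain C. It is only an analogy: the compiler and linker merge constants at build time, and nothing here reflects how clang actually implements it. The intern function, the pool array, and POOL_CAPACITY are all hypothetical names invented for this illustration.

```c
#include <stdlib.h>
#include <string.h>

#define POOL_CAPACITY 64  /* arbitrary limit chosen for this sketch */

static char *pool[POOL_CAPACITY];
static size_t pool_size = 0;

/* Returns a shared copy of `text`: equal texts always yield the same pointer.
   Returns NULL if the pool is full or memory cannot be allocated. */
static const char *intern(const char *text) {
    for (size_t i = 0; i < pool_size; i++) {
        if (strcmp(pool[i], text) == 0) {
            return pool[i];          /* already stored: reuse it */
        }
    }
    if (pool_size == POOL_CAPACITY) {
        return NULL;
    }
    size_t len = strlen(text) + 1;   /* include the terminating '\0' */
    char *copy = malloc(len);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, text, len);
    pool[pool_size++] = copy;
    return copy;
}
```

Two calls to intern("123456789") return the same address, so equality of interned strings can be checked by comparing pointers. The result is declared const char * on purpose: sharing one copy is harmless only while nobody writes to it, which is the same condition the paragraph above states for NSString.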
The compiler, however, is not all-powerful. One of the easiest ways to confuse it and actually create two different NSStrings with the same text is this:
    #import <Foundation/Foundation.h>

    int main() {
        @autoreleasepool {
            NSString *a = @"123456789";
            // The string is built at run time, so it cannot be merged with the literal.
            NSString *b = [NSString stringWithFormat:@"%@", a];
            NSLog(@"%p %p", a, b);
        }
        return 0;
    }
Gcc
Gcc performs a similar optimization of string constants when compiling C code. For example,
    #include <stdio.h>

    int main(void) {
        /* Two literals with identical text. */
        char *a = "123456789";
        char *b = "123456789";
        printf("%p %p\n", (void *)a, (void *)b);
        return 0;
    }

will output

    0x4005f4 0x4005f4
However, there is a significant difference from clang: gcc places such string constants in a read-only segment, so attempts to change them at run time (for example, a[0] = '0') result in a segmentation fault. To place the strings on the stack, where they can be changed, you need to replace char *a with char a[]; in that case, however, gcc will not apply the optimization. The following code creates two different strings:
    #include <stdio.h>

    int main(void) {
        /* Arrays are initialized with their own copies of the text. */
        char a[] = "123456789";
        char b[] = "123456789";
        printf("%p %p\n", (void *)a, (void *)b);
        return 0;
    }
    0x7fff17ed0020 0x7fff17ed0030

Shooting yourself in the foot
So, we know that when the compiler meets identical string objects in the source code, it optimizes them and creates the NSString only once. At the same time, it places the object in writable memory, where it can be changed by manipulating the pointer by hand. (In plain C, as discussed above, this is not possible: the write ends in a segmentation fault.)
Can you guess what the following code prints?
    #import <Foundation/Foundation.h>

    void bad() {
        NSString* a = @"123456789";
        // Treat the object itself as raw bytes and overwrite one of them.
        char* aa = (__bridge void *)(a);
        aa[8] = 92;
    }

    int main() {
        @autoreleasepool {
            bad();
            NSLog(@"%@", @"123456789");
        }
        return 0;
    }
The result depends on the compiler: my Xcode on the Mac prints the gibberish "㈱ 㐳 㘵 㠷 9", while clang on Ubuntu prints a fragment of some internal data, "red: pars". In either case, this is not the expected "123456789". I leave experiments with other values of aa[8], as well as with aa[16], to the reader.
Worst of all, the bad() function from the last example may be hidden behind a header, for example in a third-party library whose author, for his own purposes, modified what seemed to him to be his own private NSString. A smart compiler will still find the matching string constants and fold them into a single pointer, after which corrupting the variable inside bad() turns the string in main() into gibberish.
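A library routine that really needs to alter some text can avoid this trap by working on its own copy instead of the shared constant. Below is a minimal sketch of that defensive pattern in plain C (which also compiles as Objective-C); writable_copy and with_leading_zero are hypothetical names used only for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated, writable copy of `text`, or NULL on failure.
   The caller owns the result and must free() it. */
static char *writable_copy(const char *text) {
    size_t len = strlen(text) + 1;   /* include the terminating '\0' */
    char *copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, text, len);
    }
    return copy;
}

/* Hypothetical library routine: replaces the first character with '0'
   in its own copy and leaves the caller's string untouched. */
static char *with_leading_zero(const char *text) {
    char *copy = writable_copy(text);
    if (copy != NULL && copy[0] != '\0') {
        copy[0] = '0';
    }
    return copy;
}
```

Calling with_leading_zero("123456789") yields a new string "023456789", while the literal passed in keeps its original text. In Objective-C the analogous step is to take a mutableCopy of the NSString and modify the resulting NSMutableString through its own methods.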