
Peculiarities of creating NSString objects

NSLog(123456789) != 123456789

This article is aimed at Objective-C beginners and describes one way to shoot yourself in the foot. We will try to create two different NSString objects with the same text, see how different compilers react to this, and find out under what conditions NSLog(@"%@", @"123456789") does not print "123456789" at all.

NSString objects and pointers


What do you think the following code will print?

    #import <Foundation/Foundation.h>

    int main() {
        @autoreleasepool {
            NSString *a = @"123456789";
            NSString *b = a;
            NSLog(@"%p %p", a, b);
        }
        return 0;
    }

Naturally, the pointers are equal ("objects are assigned by reference"), so NSLog() prints two identical memory addresses. No magic:

2015-01-30 14:39:27.662 1-nsstring[13574] 0x602ea0 0x602ea0

Here and below, the object addresses are given only as examples; when you reproduce the experiments, the actual values will of course differ.

Let's try to obtain two different NSString objects with the same text. With other standard classes, such as NSArray, we could write this:

    #import <Foundation/Foundation.h>

    int main() {
        @autoreleasepool {
            NSArray *a = @[@"123456789"];
            NSArray *b = @[@"123456789"];
            NSLog(@"%p %p", a, b);
        }
        return 0;
    }

Since we initialized the two NSArray objects separately, they were placed in different memory areas, and two different addresses are printed to the console:

2015-01-30 14:40:45.799 2-nsarray[13634] 0xa9e1b8 0xaa34e8

However, applying the same approach to NSString will not lead to the desired effect:
    #import <Foundation/Foundation.h>

    int main() {
        @autoreleasepool {
            NSString *a = @"123456789";
            NSString *b = @"123456789";
            NSLog(@"%p %p", a, b);
        }
        return 0;
    }

2015-01-30 14:41:41.898 3-nsstring[13678] 0x602ea0 0x602ea0

As you can see, despite the separate initialization, both pointers still refer to the same memory area.

Using stringWithString


After rummaging through NSString a little, we find the stringWithString: method, which, according to its description, returns a string created from the characters of another string. That is exactly what we need! Let's try the following code:

    #import <Foundation/Foundation.h>

    int main() {
        @autoreleasepool {
            NSString *a = @"123456789";
            NSString *b = [NSString stringWithString:@"123456789"];
            NSString *c = [NSString stringWithString:b];
            NSLog(@"%p %p %p", a, b, c);
        }
        return 0;
    }

It turns out that the output of this program depends on the version of the compiler used. clang on Ubuntu (LLVM 3.4) really does create three different objects located at different memory addresses. Compiling the same code in Xcode with clang for Mac (LLVM 3.5), however, produces only one object and three pointers to it:

2015-01-30 17:59:02.206 4-nsstring[670:21855] 0x100001048 0x100001048 0x100001048

Exposing the magic


These oddities are explained by the compiler's attempts to optimize string resources. When it encounters string objects with the same content in the source code, it creates them only once, which saves storage and makes comparisons cheaper. The optimization is also performed at the linking stage: even if strings with the same text live in different modules, they will most likely be created only once.

Since the NSString type is immutable (NSMutableString is used for mutable strings), this optimization is safe, as long as we manipulate strings only through the methods of the NSString class.

The compiler, however, is not all-powerful. One of the easiest ways to confuse it and actually create two different NSStrings with the same text is this:
    #import <Foundation/Foundation.h>

    int main() {
        @autoreleasepool {
            NSString *a = @"123456789";
            NSString *b = [NSString stringWithFormat:@"%@", a];
            NSLog(@"%p %p", a, b);
        }
        return 0;
    }

GCC


GCC performs a similar optimization of string constants when compiling C code. For example,

    #include <stdio.h>

    int main(void) {
        char *a = "123456789";
        char *b = "123456789";
        /* %p expects a void pointer, hence the casts */
        printf("%p %p\n", (void *)a, (void *)b);
        return 0;
    }

prints 0x4005f4 0x4005f4.

However, there is a significant difference from clang: gcc places such string constants in a read-only segment, so attempts to modify them at run time (for example, a[0] = '0') result in a segmentation fault. To place the strings on the stack, where they can be modified, replace char *a with char a[]; in that case, however, gcc does not apply the optimization. The following code creates two different strings:

    #include <stdio.h>

    int main(void) {
        char a[] = "123456789";
        char b[] = "123456789";
        /* %p expects a void pointer, hence the casts */
        printf("%p %p\n", (void *)a, (void *)b);
        return 0;
    }

0x7fff17ed0020 0x7fff17ed0030

Shooting yourself in the foot


So, we know that when the compiler encounters identical string objects in the source code, it optimizes them and creates the NSString only once. It also places the object in writable memory, where it can be changed through manual pointer manipulation. (In plain C, as discussed above, this is impossible.)

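Can you guess what the following code prints?

    #import <Foundation/Foundation.h>

    void bad() {
        NSString *a = @"123456789";
        // Treat the NSString object itself as raw bytes (deliberately unsafe).
        char *aa = (__bridge void *)(a);
        aa[8] = 92;  // overwrite one byte inside the object
    }

    int main() {
        @autoreleasepool {
            bad();
            NSLog(@"%@", @"123456789");
        }
        return 0;
    }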

Depending on the compiler, the result differs: my Xcode on the Mac prints the garbage characters "㈱㐳㘵㠷9", while clang on Ubuntu prints a fragment of internal data, "red: pars". Either way, it is not the expected "123456789". I leave experiments with other values of aa[8], as well as with aa[16], to the reader.

Worst of all, the bad() function from the last example may be hidden behind a header, for example in a third-party library whose author, for his own purposes, modified what seemed to him his own private NSString. A smart compiler will still find the matching string constants and fold them into a single pointer, after which corrupting the variable inside bad() turns the string in main() into gibberish.

Source: https://habr.com/ru/post/249351/

