Swift for data scientists: a quick dive in 2 hours

Google announced that TensorFlow is moving to Swift. So drop everything, throw away Python, and learn Swift urgently. And the language, I must say, is rather strange in places.

To begin with, watch a short presentation explaining why Swift was chosen and how TensorFlow relates to it:

The TensorFlow developers have not forgotten about Python, of course, but Swift will become the foundation of the framework. It will be possible to write Python-like code in it, but that code will still be executed by the Python interpreter, which again means slowly, without parallelism, with inefficient memory use, without type checking, and so on.

So we learn Swift from scratch. Well, not quite from scratch: it is assumed that you already program well in Python, so many Swift constructs are described below by comparison with their Python counterparts.
This article in no way claims to be a detailed description of the language. It is only a first, very superficial acquaintance with the basic features of the language for those who know Python.

General remarks

Swift is a fairly new language. This is good, because it builds on a vast body of earlier languages. But it is also bad, because it has not yet outgrown its “childhood diseases”, and so the language evolves very quickly.
The Internet is full of articles and tutorials on Swift, but all of them are already outdated. Numerous recipes from StackOverflow probably will not work for you either, because they were written for previous versions of the language.

A timeline of recent events: Swift 2.2 was released in March 2016, followed in September by a “significantly different” Swift 3, and a year later by a “different yet again” Swift 4. The current version is 4.1, although Swift for TensorFlow is already on 4.2-dev. Swift 5 is due by the end of the year and will bring even more changes to the language itself, not to mention the libraries.

In short, TL;DR: the language is not yet ready for serious data science work. So I allowed myself only two hours to get acquainted with the language in its current form, so that in six months it will be easier to dive into Swift 5 with a new version of TensorFlow.

Variables and constants

All variables are strongly typed, so a declaration must specify the data type. Fortunately, the compiler can infer the type from the initial value.

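    let intConst = 5
    let strConst = "strings should be in double quotes"
    var nonInitVar: Int        // declared without a value; must be assigned before use
    var intVar = 10
    var floatVar = 10.0        // note: a floating-point literal is inferred as Double
    var doubleVar: Double = 10.5
    var strVar = "double quotes only"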

As you may have guessed, constants are declared with let and variables with var. As a matter of good style, and to help the compiler optimize, it is recommended to use let for every value that will not change during program execution. Otherwise, everything is simple and clear.
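To make the inference rules concrete, here is a minimal sketch (the variable names are made up for illustration). It shows which types the compiler picks for literals and what let forbids:

```swift
let inferredInt = 5               // inferred as Int
let inferredDouble = 10.0         // a floating-point literal is inferred as Double
let explicitFloat: Float = 10.0   // Float has to be requested explicitly

var counter = 10                  // a var can be reassigned
counter += 1

print(type(of: inferredInt), type(of: inferredDouble), type(of: explicitFloat))
// prints "Int Double Float"

// Uncommenting the next line is a compile-time error,
// because inferredInt was declared with let:
// inferredInt = 6
```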

Range

Python has slice, while Swift has a whole family of Range types: half-open, closed, one-sided, and so on.
They are written directly as literals (though they could have been even shorter):

    1...5   // from 1 through 5, inclusive
    1..<5   // from 1 up to, but not including, 5
    ...5    // everything up through 5
    2...    // from 2 onward

You can use the stride function to specify ranges with a step other than one, or with non-integer numbers:

 for i in stride(from: 0.1, to: 0.5, by: 0.1) { print(i) }

 for i in stride(from: 0.1, through: 0.5, by: 0.1) { print(i) }

You cannot guess this, you just have to know it: in the first case (with to) the range is open on the right (i.e., 0.5 is not included), while in the second case (with through) the end value is included.
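The difference is easiest to see by collecting both sequences into arrays. This small sketch uses integers so that floating-point rounding does not obscure the result:

```swift
// `to:` stops before the end value, `through:` includes it.
let upTo = Array(stride(from: 0, to: 10, by: 5))
let upThrough = Array(stride(from: 0, through: 10, by: 5))

print(upTo)       // [0, 5]
print(upThrough)  // [0, 5, 10]
```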

Strings

Every language has its own monstrous stupidity. Swift's developers decided that theirs would be strings. First, there are two string types, String and Substring. They are very similar and differ only in that a Substring does not own its memory: it always points into the memory of some String. The idea is clear and correct, but all these nuances could easily have been hidden inside the implementation of String.
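A minimal sketch of the two types in action (the variable names are illustrative). It uses the standard prefix method, which returns a Substring, and the String initializer, which makes an independent copy:

```swift
let text = "Some long string"

// prefix(_:) returns a Substring: a view into the storage of `text`.
let firstWord = text.prefix(4)

// Converting to String copies the characters into storage of their own.
let owned = String(firstWord)

print(type(of: firstWord), type(of: owned))  // Substring String
print(owned)                                 // Some
```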

It gets worse. How do you get a substring from a string? You would expect something like Python's str[1:10]. Nothing of the kind! Strings cannot be indexed with integers. So how is it done?

 str[str.index(str.startIndex, offsetBy: 1) ..< str.index(str.startIndex, offsetBy: 10)]

I am not kidding. This is the official way to work with strings. Every stupid idea comes with a long and meaningless explanation, and this case is no exception.

Look again at the expression str[str.index(str.startIndex, offsetBy: 1) ..< str.index(str.startIndex, offsetBy: 10)]. Everything in it is boilerplate except for the two integers. In other words, 84 of 87 characters are superfluous!
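For comparison, here is a sketch of a shorter route that stays within the standard library: dropFirst and prefix are ordinary Collection methods and also return a Substring. This is an alternative to the indexing above, not something the original text relies on:

```swift
let str = "Some long string"

// Characters at offsets 1..<10: skip the first one, then take the next nine.
let piece = str.dropFirst(1).prefix(9)

// The same range written with String.Index values.
let start = str.index(str.startIndex, offsetBy: 1)
let end = str.index(str.startIndex, offsetBy: 10)
let viaIndices = str[start..<end]

print(piece == viaIndices)  // true
```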

To make this humane, let's write an extension for the standard String type:

    extension String {
        public subscript(i: Int) -> Character {
            return self[index(startIndex, offsetBy: i)]
        }
        public subscript(r: Range<Int>) -> Substring {
            var a = Array(r)
            let start = index(startIndex, offsetBy: a[0])
            let end = index(startIndex, offsetBy: a[-1])
            return s[start...end]
        }
    }

We run it... and it does not work! The compiler complains:

 error: 'subscript' is unavailable: cannot subscript String with an integer range, see the documentation comment for discussion

The fact is that Swift explicitly hard-codes a prohibition on subscripting a String with a range of integers.

Okay, let's go another way, ten times longer. Apparently this is the Swift way:

    extension String {
        public subscript(i: Int) -> Character {
            return self[index(startIndex, offsetBy: i)]
        }
        public subscript(bounds: Range<Int>) -> Substring {
            let start = index(startIndex, offsetBy: bounds.lowerBound)
            let end = index(startIndex, offsetBy: bounds.upperBound)
            return self[start ..< end]
        }
        public subscript(bounds: ClosedRange<Int>) -> Substring {
            let start = index(startIndex, offsetBy: bounds.lowerBound)
            let end = index(startIndex, offsetBy: bounds.upperBound)
            return self[start ... end]
        }
        public subscript(bounds: PartialRangeFrom<Int>) -> Substring {
            let start = index(startIndex, offsetBy: bounds.lowerBound)
            let end = index(endIndex, offsetBy: -1)
            return self[start ... end]
        }
        public subscript(bounds: PartialRangeThrough<Int>) -> Substring {
            let end = index(startIndex, offsetBy: bounds.upperBound)
            return self[startIndex ... end]
        }
        public subscript(bounds: PartialRangeUpTo<Int>) -> Substring {
            let end = index(startIndex, offsetBy: bounds.upperBound)
            return self[startIndex ..< end]
        }
    }

And as an exercise, copy all of this code once more for the Substring type. By now it is clear that brevity is not what this language is about.

But now you can work normally with strings:

    var str = "Some long string"
    let char = str[4]
    var substr = str[3 ..< 8]
    let endSubstr = str[4...]
    var startSubstr = str[...5]
    // chained subscripts need the same extension on Substring
    let subSubStr = str[...8][2...][1..<4]

Tuple

The immutable sequence of values, or tuple, is a little different in Swift than in Python. Here it is more like a mix of tuple and namedtuple.

    let tuple = (100, "value", true)
    print(tuple.0) // 100
    print(tuple.1) // "value"
    print(tuple.2) // true

    var person = (name: "John", age: 24)
    print(person.name, person.age) // John 24

    // named and unnamed elements can be mixed
    let tuple2 = (10, name: "john", age: 32, 115)
    print(tuple2.1)    // john
    print(tuple2.name) // john

But you cannot unpack a tuple into function arguments. It used to be possible, then it was banned. Perhaps it will be brought back in the future.
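What still works is unpacking a tuple into local constants and passing those on explicitly. A minimal sketch, in which the function describe is a made-up example:

```swift
// Hypothetical function used only for illustration.
func describe(name: String, age: Int) -> String {
    return "\(name) is \(age)"
}

let person = (name: "John", age: 24)

// describe(person) does not compile: the tuple is not spread over the parameters.

// Option 1: destructure the tuple into constants first.
let (name, age) = person
let viaDestructuring = describe(name: name, age: age)

// Option 2: pass the tuple members one by one.
let viaMembers = describe(name: person.name, age: person.age)

print(viaDestructuring)  // John is 24
```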

Collections: Arrays, Sets, and Dictionaries

An Array is similar to a Python list in that its size can change, but all elements of the array must have the same data type. The same goes for Set: as with a Python set, you can change which elements it contains, but not their type. For a dictionary you have to define two types: one for the keys and one for the values.

    let immutableArray = [5, 10, 15]
    var intArr = [10, 20, 30]
    var nonInitIntArr: [Int]
    var emptyArr: [Int] = []
    var otherEmptyArr = [Int]()
    var names: [String] = ["John", "Anna"]

    var noninitSet: Set<String>
    var emptySet: Set<Int> = []
    var otherEmptySet = Set<Int>()

    var emptyDict: Dictionary<Int, String> = [:]   // an empty dictionary literal is [:], not []
    var strToArrDict: Dictionary<String, [Int]>
    var fullDict: Dictionary<String, Int> = ["john": 24, "anna": 22]
    let allKeys = fullDict.keys
    let allVals = fullDict.values

By the way, if you try to iterate through the dictionary in the most expected way:

 for k in fullDict.keys { print(k, fullDict[k]) }

then you will suddenly get a bunch of warnings from the compiler, because looking up a key in fullDict returns not an Int but an Optional<Int> (that is, either nil or an Int). We will talk about Optional separately; for now, it is more convenient to iterate over key-value tuples:

 for (key, val) in fullDict { print(key, val) }
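If you do want to go through the keys, the optional has to be handled one way or another. A small sketch of two standard options, if let and the subscript with a default value (the key "bob" is just an example of a missing key):

```swift
let fullDict = ["john": 24, "anna": 22]

// sorted() is used only to make the output order deterministic.
for name in fullDict.keys.sorted() {
    // The lookup returns Int?, so unwrap it before use.
    if let age = fullDict[name] {
        print(name, age)
    }
}

// A lookup with a default never returns nil.
let missing = fullDict["bob", default: 0]
print(missing)  // 0
```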

Loops

Standard iteration over a collection:

    for item in collection {
        // ...
    }

It is convenient to work with ranges:

    for i in 0...10 {
        // ...
    }

If you need only some of the indices, for example every second one, the construct gets dramatically longer:

    for i in stride(from: 0, to: 10, by: 2) {
        // ...
    }

There are also while and repeat-while loops:

    while someBool {
        // ...
    }

    repeat {
        // ...
    } while otherBool

Functions

Everything as expected:

    func myFunc(arg1: Int, arg2: String) -> Int {
        // do this
        // do that
        return someInt
    }

The obvious difference from Python is that an argument can have not only a name and a type, but also a label:

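    func fn1(a: Int, b: Int) {
        // no explicit labels: the parameter names double as labels
    }
    fn1(a: 1, b: 10)

    func fn2(from a: Int, to b: Int) {
        // inside the function the arguments are a and b
    }
    fn2(from: 1, to: 10)   // callers must use the labels
    fn2(a: 1, b: 10)       // compile error: wrong labels

    func fn3(_ a: Int, to b: Int) {
        // _ means the first argument has no label
    }
    fn3(1, to: 10)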

There are lambdas too; here they are called closures:

    { (arg1: Int, arg2: String) -> Bool in
        // ...
        return someBool
    }

Naturally, closures can be passed to functions. And here comes another surprise:

    someFunc() {
        // do this
        // do that
        return someInt
    }

It looks like a function definition without the word func. In reality, it is a call to the function someFunc, and the closure in curly braces is passed as its last argument. By the way, if the closure is the function's only argument, the parentheses can be omitted.

    let descArray = array.sorted { $0 > $1 }
    let firstValue = array.sorted { $0 > $1 }.first
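To see that both spellings are the same call, here is a minimal sketch with a made-up helper, applyRepeatedly, that takes a closure as its last parameter:

```swift
// Hypothetical helper: applies `transform` to a value `times` times.
func applyRepeatedly(_ times: Int, to start: Int, transform: (Int) -> Int) -> Int {
    var value = start
    for _ in 0..<times {
        value = transform(value)
    }
    return value
}

// Ordinary call: the closure is passed inside the parentheses.
let regular = applyRepeatedly(3, to: 1, transform: { x in x * 2 })

// Trailing closure: the last argument moves outside the parentheses.
let trailing = applyRepeatedly(3, to: 1) { $0 * 2 }

print(regular, trailing)  // 8 8
```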

Classes and Structures

To create complex data types, classes and structures are provided:

    struct MyStructure {
        public var attr1: Int
        private var count = 0

        init(arg1: Int) {
            attr1 = arg1
        }

        public func method1(arg1: Int, arg2: String) -> Float {
            // ...
            return 0.0 // must return a Float
        }
    }

    class MyClass {
        public var attr1: Int
        private var count = 0

        init(arg1: Int) {
            attr1 = arg1
        }

        public func method1(arg1: Int, arg2: String) -> Float {
            // ...
            return 0.0 // must return a Float
        }
    }

They look identical, but there are differences:

methods of a structure cannot modify its attributes by default, so methods that change attribute values must be marked with the mutating keyword;
structures are always passed by value, and classes by reference;
classes support inheritance.
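The first two differences can be sketched as follows. The types PointStruct and PointClass are invented for this illustration:

```swift
struct PointStruct {
    var x = 0
    // A struct method that changes a property must be marked mutating.
    mutating func moveRight() { x += 1 }
}

class PointClass {
    var x = 0
    // A class method needs no such marker.
    func moveRight() { x += 1 }
}

let s1 = PointStruct()
var s2 = s1          // assignment copies the value
s2.moveRight()

let c1 = PointClass()
let c2 = c1          // assignment copies the reference to the same object
c2.moveRight()

print(s1.x, s2.x)    // 0 1
print(c1.x, c2.x)    // 1 1
```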

As you already know from the strings example, classes and structures have a convenient subscript method (analogous to Python's __getitem__ and __setitem__), which allows indexing into the data, so that instead of:

    let item = someClass.getItem(itemIndex)
    let aFewItems = someClass.getSubsetOfItems(fromIndex: 0, toIndex: 10)

you can write more compactly:

    let item = someClass[itemIndex]
    let aFewItems = someClass[0...10]

It is implemented like this:

    class MyClass {
        private var myData = [Int: Double]()

        public subscript(i: Int) -> Double {
            get {
                return myData[i]!
            }
            set {
                myData[i] = newValue
            }
        }
    }

You are probably asking: where does newValue come from? This is another implicit convention: if set declares no parameter of its own, the assigned value is passed in through a variable named newValue.
And what does the exclamation mark after myData[i] mean?

Optional

A strongly typed language needs a special way of dealing with missing values. Swift has Optional for this: it holds either a value of the type given in the variable's declaration, or nil.

    var opt: Optional<Int>
    var short: Int?              // shorthand for Optional<Int>
    var anOpt: Optional<Int> = Int(32)
    var oneMore: Int? = nil

How do you work with it?

    if opt == nil {
        print("opt is nil")
    } else {
        print("opt =", opt!) // force-unwrap the value
    }

The ! operator force-unwraps the value. If opt were nil at that point, you would get a runtime crash.

Another, more recommended construct is the if-let block:

    if let val = short {
        print("val is the unwrapped value of short, of type Int:", val)
    } else {
        print("short is nil")
    }

There is also the ?? operator, which unwraps an optional and substitutes a default value if it is nil. It is simply called the nil-coalescing operator.

    print(oneMore ?? 0) // oneMore is nil, so this prints the default, 0 (the default must also be an Int)

In addition, Swift lets you conveniently get the value of an attribute even if it is buried deep inside a complex structure:

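    if let cityCode = person?.contacts?.phone?.cityCode {
        // person is not nil,
        // its contacts are not nil,
        // the phone is not nil,
        // and cityCode holds the unwrapped value
    } else {
        // at least one link in the chain was nil
        print("no city code")
    }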
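Here is a self-contained sketch of such a chain combined with the nil-coalescing operator. The types Person, Contacts and Phone, and the sample city code, are assumptions made up to mirror the property names used above:

```swift
// Hypothetical model types; every level of the chain may be missing.
struct Phone { var cityCode: String? }
struct Contacts { var phone: Phone? }
struct Person { var contacts: Contacts? }

let withPhone: Person? = Person(contacts: Contacts(phone: Phone(cityCode: "495")))
let withoutPhone: Person? = Person(contacts: Contacts(phone: nil))

// The chain yields nil as soon as any link is nil; ?? then supplies a default.
func cityCode(of person: Person?) -> String {
    return person?.contacts?.phone?.cityCode ?? "unknown"
}

print(cityCode(of: withPhone))     // 495
print(cityCode(of: withoutPhone))  // unknown
print(cityCode(of: nil))           // unknown
```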

Python

Swift for TensorFlow claims to interoperate with Python, so that you can write Python-like code:

    import Python

    let np = Python.import("numpy")
    let a = np.arange(15).reshape(3, 5)
    let b = np.array([6, 7, 8])

But for now this does not work, and you can only write it like this:

    import Python

    let np = Python.import("numpy")
    let a = np.arange.call(with: 15).reshape.call(with: 3, 5)
    let b = np.array.call(with: [6, 7, 8])

When this will be done properly is not yet known, because it requires changes to the language, and the change proposal has not even been submitted yet, although it is being actively discussed (for the third time, and so far to no avail).

In addition, Swift currently works only with Python 2.7 installed in /usr/local/lib/python27 (hard-coded, once again). It is not compatible with any virtual environments. Because Python 2 and 3 differ in their C data structures and C calls, this problem will not be solved in the near future either.

TensorFlow

Finally, we have reached the main thing, the reason all of this was started.

Let's start with the multiplication of matrices:

    import TensorFlow

    var tensor = Tensor([[1.0, 2.0], [2.0, 1.0]])
    for _ in 0...100000 {
        tensor = tensor * tensor - tensor
    }

It looks nicer, shorter, and clearer than the Python version with tf.while_loop, session creation, and variable initialization. The only problem is that, for now, it is several times slower. And the GPU, of course, is not supported.

By the way, matrix multiplication uses neither * nor @, but a remarkable special symbol. Try typing it on a keyboard.

Let's finally build a neural network! Although no, we won't: there is no documentation, there are no ready-made layers, there are no optimizers. In general, there is nothing yet. Of course, you can multiply the tensors by hand, compute the gradients, and update the weights (see the video above and the only example). But we are not going to do that, of course.

Conclusion

The language is interesting, but it is not yet ready for real use in data science. Let's wait.

Source: https://habr.com/ru/post/354876/
