
Library development: from API to public release

Let's look at a library from the side that is less familiar to us: not as its users, but from the point of view of a mobile library developer. We'll talk about which approaches to follow when developing your own library. We begin, of course, with designing the kind of API that you yourself would enjoy using. We will think about what it takes to turn merely working code into a really good library, and we will work our way up to a real, grown-up public release. Asya Sviridenko, who will share her considerable experience developing the mobile SpeechKit library at Yandex, will guide us through all of this.

The material will be useful not only to those who develop a library or framework, but also to anyone who wants to split part of an application into a separate module and reuse it, or, say, share their code with the rest of the developer community by publishing it openly.

For everyone else, the story will be filled with genuine stories from the life of the mobile SpeechKit team, so it should be fun.

Minute SpeechKit


I will not ask if you heard about SpeechKit, because even inside Yandex, not everyone knows what it is.

SpeechKit is the door to all Yandex speech technologies . With this library, you can integrate speech technology into your application: speech recognition and synthesis, voice activation.

You've probably heard of the voice assistant Alice: it runs on SpeechKit. SpeechKit itself does not perform recognition or synthesis; that happens on the server. But it is through our library that all of it gets integrated into an application.

A question usually follows: if everything happens on the server, what does the library do? Why is it needed?

The library does a lot:

  1. Synchronization of all processes. For example, with a voice assistant the user presses a button, says something, interrupts the assistant, makes requests; all of this goes through the library. For users of our library this is transparent; they should not have to worry about any of it.
  2. Networking. Since everything happens on the server, you need to receive data from there, process it, and hand it to the user. SpeechKit can now talk to several different servers within one network connection: one recognizes speech, another extracts meaning, a third recognizes music, and so on. All of this is hidden inside the library, so users don't have to think about it.
  3. Working with audio sources. We deal with human speech, and audio handling also happens inside SpeechKit. We can not only record from the standard input device but also accept data from anywhere: a file, a stream, we can work with all of it.

SpeechKit is used by internal teams: 16 Yandex teams have integrated it so far. We even know of several external teams that have done so too.

Design


Let's think about what we mean by a convenient application. Usually it is a thoughtful, understandable UX, solving the user's actual problems, stable operation, and so on.

When we say that a library is convenient, first of all we mean that it has an API that is understandable to use. How to achieve this?

Basic principles


These are some aspects that I learned from my experience with SpeechKit.


The first thing to remember is that your users are developers. On the one hand, this is good: you never have to tell ordinary users, "You see, we have a backend, that's why nothing works, and we're fine with that!" But you can explain it to developers; you can explain a lot to developers.

On the other hand, these are users who will definitely find any hole you leave and break something through it. We all use libraries ourselves and try to squeeze the maximum out of them. A library declares that it does only this, this, and this, and we think: "No, we'll tinker with it a bit, feed it what we want, and it will all work out."

In addition, the fact that users are developers means that you will always have a lot of advice and recommendations on how to develop and how to make things better.

The second important point follows directly from the first.


If your users start doing something with the library that you did not anticipate, that is a direct path to bugs, and hard-to-debug ones at that. Try to use everything the language and technology give you: public/private, final, deprecated, readonly. Reduce scopes, disable inheritance and the use of certain methods, mark properties that cannot be changed; do everything you can to prevent users from doing something your library simply was not designed for.


If a particular class can be created in only one way, forbid all the others. If a property cannot be null, say so explicitly. iOS has nullable/nonnull annotations and designated initializers; Java and Android have their equivalents. Use all of this so that a user can open the file, skim your class, and immediately understand what can and cannot be done.
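To make this concrete, here is a minimal Java sketch of the same idea (the class and parameter names are hypothetical, not part of any real API): the class is final, its single field is immutable, and null is rejected at the only entry point.

```java
import java.util.Objects;

// A hypothetical recognizer class locked down as far as the language allows.
public final class Recognizer {             // final: subclassing is not supported
    private final String language;          // immutable after construction

    // The only way to create an instance; null is rejected immediately.
    public Recognizer(String language) {
        this.language = Objects.requireNonNull(language, "language must not be null");
    }

    public String language() {              // read-only access, no setter
        return language;
    }
}
```

With this shape, misuse fails at compile time (no subclassing, no setters) or immediately at construction, rather than deep inside the library.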

Case study: the SpeechKit API


Using SpeechKit as an example, I’ll tell you how we refactored version 2 into version 3. We changed the API a lot and tried to use all these principles.

The need arose because the API was complex and "theoretical". It had global components that had to be initialized first; forget to call them and nothing works. The settings were arranged very strangely. The API was "theoretical" because SpeechKit was originally part of Navigator, and that piece was later extracted into a library, so the API basically served the use cases Navigator needed.

Gradually the number of users grew, and we began to understand what they really needed: which methods, callbacks, and parameters. They came to us with requests that the API simply could not accommodate. This happened again and again, and it became clear that the API could not keep up. So we took on a refactoring.

The refactoring process was long (six months) and painful (everyone was unhappy). The main difficulty was not rewriting a mountain of code. We could not simply go off and refactor; we had to keep supporting all the active versions in use. We couldn't just tell our users: "Guys, yes, it doesn't work for you, yes, you need this feature; we'll do everything in version 3, please wait half a year!"

As a result, the refactoring took a long time, and the process was painful for users too, because in the end we changed the API without backward compatibility. We came to users and said: "Here is a new, beautiful SpeechKit, please take it!" and heard in response: "No, we have no plans at all to move to your version 3.0." One team, for example, took a year to switch to the new version, so we supported the previous version for a whole year.

But the result was worth it. We got simple integration and fewer bugs. This is what I mentioned in the basic principles of API design: if you are sure your API is used correctly, there are simply no problems in that area; all classes are created correctly, all parameters are valid. Finding bugs becomes much easier, and there are fewer places where something can go wrong.

Below is an example of what the main class that deals with recognition looked like before refactoring.

// SpeechKit v2
@interface YSKRecognizer : NSObject
@property (nonatomic, strong, readonly, getter=getModel) NSString *model;
@property (nonatomic, assign, getter=isVADEnabled) BOOL VADEnabled;
- (instancetype)initWithLanguage:(NSString *)language model:(NSString *)m;
- (void)start;
- (void)cancel;
- (void)cancelSync;
@end

@interface YSKInitializer : NSObject
- (instancetype)init;
- (void)dealloc;
- (void)start;
+ (BOOL)isInitializationCompleted;
@end

extern NSString *const YSKInactiveTimeout;
extern NSString *const YSKVADEnabled;

@interface YSKSpeechKit : NSObject
+ (instancetype)sharedInstance;
- (void)setParameter:(NSString *)name withValue:(NSString *)value;
@end

This is an ordinary class inheriting from NSObject. Let's go through its details one by one. Clearly, we can subclass it, override its methods, and do anything else you can do with an NSObject.

Next, two strings are passed at creation (language and model). What are these strings? If you pass "Hello, world" as the language, will the output be a translation, or what? Not very clear.

In addition, since this is an NSObject subclass, we can call init, new, and so on. What will happen? Will it work, or will it wait for some parameters?

Of course, I know the answers to these questions; I know this code. But people looking at it for the first time have no idea what any of it means. Even the setter and getter methods don't look the way they would in idiomatic iOS code. The methods start, cancel, cancelSync (and the plain cancel, is it async then?): what happens if you call them together? This code raises a lot of questions.

Next comes the object I mentioned (YSKInitializer), which you absolutely must start for anything to work; this is pure magic. You can tell this code was written by developers who write C++, not iOS code.

Furthermore, the settings for this recognizer were set through global constants passed to another global object, and in practice it was impossible to create two different recognizers with different sets of parameters. That was probably one of the most requested use cases the API did not support.

How v3 is better than v2


What did we get when we refactored and switched to version 3?


Now the iOS API looks like an iOS API, and the Android API looks like an Android one.

An important point that we did not immediately realize was that platform guidelines are much more important than the uniformity of your library's API.

For example, classes for Android are created with builders, because that is a very familiar pattern for Android developers. In iOS this pattern is far less common, so a different approach is used: objects are created with a dedicated settings class.

I remember how long we argued about this. It seemed important to us that a developer could take our code on iOS or Android and find it 99% identical. But that's wrong: it is better for the code to feel native to the platform it targets.
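As an illustration, here is a minimal Java sketch of the Android-style builder approach described above (the names are illustrative, not the real SpeechKit API): required parameters go into the builder's constructor, optional ones get defaults and chainable setters, and the resulting settings object is immutable.

```java
// Hypothetical Android-style settings created with a builder.
public final class RecognizerSettings {
    private final String language;
    private final boolean vadEnabled;

    private RecognizerSettings(Builder b) {  // only the builder can construct it
        this.language = b.language;
        this.vadEnabled = b.vadEnabled;
    }

    public String language()    { return language; }
    public boolean vadEnabled() { return vadEnabled; }

    public static final class Builder {
        private final String language;       // required, set in the constructor
        private boolean vadEnabled = true;   // optional, has a default

        public Builder(String language) {
            this.language = language;
        }

        public Builder setVadEnabled(boolean enabled) {
            this.vadEnabled = enabled;
            return this;                     // chainable, as Android developers expect
        }

        public RecognizerSettings build() {
            return new RecognizerSettings(this);
        }
    }
}
```

Usage then reads naturally for an Android developer: `new RecognizerSettings.Builder("ru-RU").setVadEnabled(false).build()`.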


You need this object, and here are its settings: create them, pass them in, profit! There are no hidden global settings that have to be handed over somewhere else.


We threw out the global components that confused and scared everyone and raised questions even among the library's own developers, let alone its users.

Now the same class in the new version looks like this (still Objective-C; switching to Swift was not possible at the time).

// SpeechKit v3
NS_ASSUME_NONNULL_BEGIN
__attribute__((objc_subclassing_restricted))
@interface YSKOnlineRecognizer : NSObject<YSKRecognizing>
@property (nonatomic, copy, readonly) YSKOnlineRecognizerSettings *settings;
- (instancetype)initWithSettings:(YSKOnlineRecognizerSettings *)s
                     audioSource:(id<YSKAudioSource>)as NS_DESIGNATED_INITIALIZER;
+ (instancetype)new __attribute__((unavailable("Use designated initializer.")));
- (instancetype)init __attribute__((unavailable("Use designated initializer.")));
@end
NS_ASSUME_NONNULL_END

@protocol YSKRecognizing <NSObject>
- (void)prepare;
- (void)startRecording;
- (void)cancel;
@end

@interface YSKOnlineRecognizerSettings : NSObject<NSCopying>
@property (nonatomic, copy, readonly) YSKLanguage *language;
@property (nonatomic, copy, readonly) YSKOnlineModel *model;
@property (nonatomic, assign) BOOL enableVAD;
- (instancetype)initWithLanguage:(YSKLanguage *)l model:(YSKOnlineModel *)m NS_DESIGNATED_INITIALIZER;
@end

@interface YSKLanguage : YSKSetting
+ (instancetype)russian;
+ (instancetype)english;
@end

This is still an NSObject subclass, but now we state explicitly that it must not be subclassed. All the methods characteristic of this object are moved into a dedicated protocol. The object is created from settings and an audioSource. All the settings are now encapsulated in a single object, which is passed right here to configure this specific recognizer.

Moreover, we moved audio handling out of this class: the recognizer is no longer the component that records audio. It deals only with recognition, and any audio source can be passed in.

Other ways of creating it, via new or init, are forbidden, because this class needs settings to exist. If you want to use it, create at least default settings.

Crucially, the settings passed here are immutable: you cannot change them while recognition is in progress. Don't try to swap the model or language mid-recognition. Accordingly, we give users no way to modify a settings object that has already been passed in.

The NS_ASSUME_NONNULL_BEGIN / NS_ASSUME_NONNULL_END macros emphasize that these parameters cannot be null: the audioSource cannot be null; everything must have a concrete value in order to work.

As I said, the start and cancel methods (cancelSync is gone) moved to a separate protocol. There are places in the library where any recognizer, not just ours, can be plugged in. For example, we use Apple's native recognizer, which implements this protocol, and our components can be handed to it.

The settings conform to NSCopying so that we can copy them, ensuring they cannot be changed mid-recognition. In init, the required parameters are language and model, marked with NS_DESIGNATED_INITIALIZER. The snippet omits the part where the other initializers are marked unavailable, but the idea is clear: these are the required parameters from which settings are created. They must be present and must be nonnull.
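The same copy-on-accept idea can be sketched in Java, where there is no NSCopying protocol and a copy constructor plays its role (all names here are hypothetical):

```java
import java.util.Objects;

// Sketch of defensive copying, a Java analogue of conforming to NSCopying.
public final class Session {
    public static final class Settings {
        private String language;

        public Settings(String language)  { this.language = language; }
        Settings(Settings other)          { this.language = other.language; } // copy constructor

        public void setLanguage(String l) { this.language = l; }
        public String language()          { return language; }
    }

    private final Settings settings;

    public Session(Settings settings) {
        // Snapshot the settings: later edits by the caller do not affect this session.
        this.settings = new Settings(Objects.requireNonNull(settings));
    }

    public String language() { return settings.language(); }
}
```

Because the session snapshots the settings at creation, mutating the original object afterwards cannot affect recognition already in progress.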

The rest, roughly 20 recognizer settings, are configured here as well. Even the language and model settings are separate classes that make it impossible to pass in something arbitrary we cannot work with. In effect we say: "Please don't give us something we can't handle. The compiler won't let you."
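A minimal Java sketch of such a type-safe setting (hypothetical names, mirroring the YSKLanguage idea above): the constructor is private, so the only values that can ever exist are the ones the library itself provides.

```java
// Hypothetical type-safe setting values: the compiler rejects anything that
// is not one of the supported languages.
public final class Language {
    private final String tag;

    private Language(String tag) { this.tag = tag; }  // no public constructor

    public static Language russian() { return new Language("ru-RU"); }
    public static Language english() { return new Language("en-US"); }

    public String tag() { return tag; }
}
```

A method that accepts `Language` instead of `String` cannot be handed "Hello, world" by mistake; invalid values are unrepresentable.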

So, we talked about what can be done with the API. The development also has its own nuances.

Development


First of all, the library should do what you wrote it for and do it well. But there are ways to turn working code into a really good library. Here are several observations I collected while developing SpeechKit.

Code not only for yourself


Collecting Debug information is absolutely necessary, because you do not want users to say that their service is not working because of your library.

iOS has a debug information level that controls what information is collected. By default the compiler collects absolutely everything it can find: all calls, all values. That's great, but it is a very large amount of data. The -gline-tables-only option collects only the information needed for function-level backtraces, which is more than enough to find a problem and fix it.

This is enabled in the Xcode Build Settings under the debug information level. By turning it on, for example, we reduced the SpeechKit binary from 600 MB to 90 MB. That extra information wasn't really needed, so we simply threw it out.

The second important thing is to hide private symbols. You all know that every time you submit your library as part of an app to iTunes Connect, you risk a new warning that you are using something you shouldn't. So if you use libraries that Apple considers private, don't forget to hide those symbols. For you it changes nothing; everything keeps working. But as soon as your users try to upload an application containing your library, they will get an error. Not everyone will ask you to fix it; most will simply stop using your solution.

Avoid symbol conflicts: add prefixes to everything you have, your classes and your categories. If your library ships a UIColor+HEX category, you can be sure some of your users have exactly the same category, and integrating your library will give them symbol conflicts. And again, not everyone will bother to come and tell you about it.

Another question is when you use third-party libraries inside your own. A couple of nuances are worth remembering. First, if you depend on a framework that appeared in an OS version newer than your minimum supported one, don't forget Weak Linking (Xcode -> Build Phases -> Link Binary With Libraries -> Status: Optional). This keeps the application from crashing when that framework is absent.

Apple's documentation describes in detail how this works. But weak linking does not mean the library is skipped when unused. If launch time matters to your users, and the part of your library that needs the third-party dependency may not be needed at startup, weak linking will not help: the dependency is still loaded at launch whether it is used or not.

If you want to load the dependency at runtime, which removes the linking cost at startup, you need dlopen and dynamic loading. That involves a lot of fuss, so first decide whether it is worth it. Facebook has published quite interesting example code showing how they do dynamic linking.

Last: try not to use global entities internally. Every platform has global components, and it is best not to pull them into your library. The reason is simple: it is a global object, and the users of your library may take it and configure it however they want. If you also use it inside the library, you have to save its state, reconfigure it, and then restore it. There are many nuances and plenty of room for error. Keep that in mind and try to avoid it.

In SpeechKit, for example, up to the third version we handled audio inside the library, and we explicitly configured and activated the audio session. The audio session in iOS is something every application has; don't claim yours doesn't. It is created at startup, mediates between the application and the system media daemon, and declares what your application wants to do with audio. It is a singleton in the truest sense of the word. We calmly took it and configured it the way we needed, which caused users subtle problems like the output volume changing. On top of that, the audio session method that applies settings is quite slow, around 200 ms, which is a noticeable delay on activation and deactivation.

In the third version I happily moved the audio session out of the library. After that, practically every team with SpeechKit integrated told us how terribly unhappy they were: now they had to know there is some audio session that must be configured specifically for SpeechKit.

The conclusion: still try not to use global entities, but be prepared that your users will not always be happy with your decisions.

Making users comfortable


How else can you help your users?




This is rather inconvenient for library developers. In a release you want to ship everything that is ready: the features are done, the bugs are fixed, so let's bundle the whole pile and roll it out. A release is a process, after all. But for your users it doesn't work that way. When they are testing their product and preparing it for release, they do not want to receive a build from you with new features that need yet more testing.

We really did have cases where we rolled back releases, split them into pieces, and shipped them piece by piece. That way the teams we had made changes for could take exactly the version containing just those small changes, not everything at once.

This is genuinely inconvenient for development, but a minimal version increment will make your users a little happier.

There is no such thing as too many tests



This will not protect you from changes or from the need to speed the library up, but it will help you figure out what went wrong. If you have graphs, they let you watch in near real time which piece of functionality added latency or increased power consumption.

There is almost never time for this, because it is not the library's functionality; it is not why you develop it. But it is what helps you keep the library in good condition and good quality.
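As one possible starting point, here is a minimal Java sketch of collecting per-operation latency samples that such graphs could be built from (the class is hypothetical; a real setup would export the samples to a metrics pipeline):

```java
import java.util.ArrayList;
import java.util.List;

// Records how long each measured operation takes, so regressions show up on a graph.
public final class LatencyTracker {
    private final List<Long> samplesNanos = new ArrayList<>();

    // Times a single operation and stores the elapsed time.
    public void measure(Runnable operation) {
        long start = System.nanoTime();
        operation.run();
        samplesNanos.add(System.nanoTime() - start);
    }

    public int sampleCount() { return samplesNanos.size(); }

    // Average latency in milliseconds over all recorded samples.
    public double averageMillis() {
        return samplesNanos.stream().mapToLong(Long::longValue).average().orElse(0) / 1e6;
    }
}
```

Wrapping key public entry points (start, cancel, network round-trips) in such a tracker is cheap and gives you the raw data for trend graphs over releases.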


Launch


Finally we arrive at the last important part: the launch. The code is written and a good API is designed so that users are comfortable. Now, how do you actually ship a release?

I will start with local releases for users inside Yandex. The scheme here is the same as in the development of a regular application: regular, monthly or weekly releases.

The process consists of the usual stages, but when developing the library, each of these items has its own peculiarities.

Planning


For me this is the most painful part, because the library serves several product teams. A typical application has one product manager who sets the tasks; the team prioritizes them and works through them one by one.

When there are several product teams, each of them sends requests that must be processed as they arrive. My advice: if there is a person who knows how to deal with a multitude of tasks arriving all at once, try to get them onto your team. Someone has to stand between all the external managers and the developers, someone who takes on the job of prioritizing tasks.


Development


Development, as in any application, usually begins like a startup: we just work day and night, and processes don't matter to us. Then someone remembers the words "Agile methodology" and the building of team processes begins.

After we worked as a startup, we realized that there is a problem - unpredictability. No one could say exactly what features and when they will be launched. And it was very important!

Then we decided to try Scrum. It really helped: we began to plan a set of tasks, implement them, and release them. That is, we sort of solved the problem of making releases predictable. I say "sort of" because we should not forget the problem of multiple product teams.


Now we have switched to a kind of kanban. There is a board with tasks ordered by priority, and we simply take the top ones. On the one hand, we lost release predictability: we can no longer tell the teams using our library that a task that made it onto the board will definitely land in the next release. On the other hand, we can say for sure that the most important tasks will land, and right now that matters more to us.

Support


Remember that a release is not just one version that you shipped and someone adopted. The changes you made in this release may also be needed in other versions used by other teams. This is what I said about the minimal version increment. You cannot tell your users: "We fixed the bug in version 4, and you are on version 3, so just move to the fourth." Sometimes you can, but it is better not to abuse it. If a release contains bug fixes or minor additions, look at which versions your users are on and release the fixes for all versions currently in use.

From this follows the next point: all your releases should be fast. Set up Continuous Integration so that you really can press one big red button and ship to whichever versions you need, because there will be a lot of releases.

Prioritize it


A little bit about how we solved the problem with prioritizing tasks. I will highlight two types of tasks.

1. Product tasks.

Everything is clear here: first of all you look at importance for the company. If Arkady had somehow come to us and asked for a killer feature for Yandex, we would of course have dropped everything and done it. Although he never did.

The release schedule of other teams is another important input for prioritizing product tasks. If one feature is needed in a month and another in a week, the order seems obvious. But don't forget to warn the team waiting for the first feature that you have started on something higher priority.

2. Users' wish lists.


Next we look at how much it will simplify the user's life. If the work means a user writes 2 lines of code instead of 4, that hardly seems like the right basis for prioritization. If a huge canvas of code is replaced by a single call, or something becomes possible that wasn't before, then we put it on the board.

The last factor is how long it takes to implement. A feature may be interesting and cool, but if it takes a month, you need to weigh everything carefully.

Documentation. It. Seriously


Documentation matters especially for a library, because it is used by people who did not write the code. So be sure to add documentation in the code itself. It should live in the source files so that people can open them, read, pull up help, and see how to use everything.

Add a quick start. We all look for libraries the same way: we find something, grab a piece of code from GitHub, paste it in, and run it. It works: hooray; it doesn't: we keep looking. A quick start in the documentation brings you closer to your users; your library becomes easier to integrate and to evaluate.

After that, give examples of using so that you can understand how to do something more tricky and complex with your library, understand how to set up parameters, calls, etc.





Results



Yandex SpeechKit on GitHub for iOS, for Android, and the Mobile SDK documentation.


Source: https://habr.com/ru/post/429912/

