Proper API design: what are “one”, “many”, “null” and “nothing”

Hello, our regular and occasional readers.

Today we would like to share an interesting article on API design and its pitfalls. Don't ask how we came across it; creative searching is a very non-linear business.

Enjoy the read.

Overview

When designing an API, there are many factors to consider: security, consistency, state management, style; the list seems endless. One factor, however, is often overlooked: scale. If the API design takes the scale of the system into account from the very beginning, you can save hundreds of working hours later, as the system grows.

Introduction

It can be hard to pin down exactly what an application programming interface (API) is. Technically, any function called by another programmer's code can be considered part of an API. Debating which code qualifies as an API is beyond the scope of this article, so we will assume that an API is, at its simplest, a function.

The examples in this article are deliberately simple and serve only to illustrate its main theme. They are written as C# functions, but the principles outlined here apply in virtually any language, framework, or system. The data structures are modeled in the common relational style used in many production databases. Again, the examples are illustrations only; they should not be taken as recommendations.

Requirements

Suppose you are writing a simple order processing system for a client, and you already have three main classes (or, if you prefer, “data structures”). The Customer class has a “foreign key” (in database terms) to the Address class, and the Order class has foreign keys to the Address and Customer classes. Your task is to create a library for processing orders (Order). The first business rule is this: the state of the Customer's HomeAddress must be the same as the state of the Order's BillingAddress. Don't ask why; business rules often defy understanding :)

    public class Address
    {
        public int AddressId { get; set; }
        public string Street { get; set; }
        public string City { get; set; }
        public string State { get; set; }
        public string Zipcode { get; set; }
    }

    public class Customer
    {
        public Address HomeAddress { get; set; }
        public int CustomerId { get; set; }
        public int HomeAddressId { get; set; }
        public string CustomerName { get; set; }
    }

    public class Order
    {
        public Customer MainCustomer { get; set; }
        public Address ShippingAddress { get; set; }
        public Address BillingAddress { get; set; }
        public int OrderId { get; set; }
        public int CustomerId { get; set; }
        public int ShippingAddressId { get; set; }
        public int BillingAddressId { get; set; }
        public decimal OrderAmount { get; set; }
        public DateTime OrderDate { get; set; }
    }

Implementation

Checking whether two fields match is obviously a simple task. You hope to impress the boss, so you knock out a solution in under 10 minutes. The VerifyStatesMatch function returns a Boolean value that tells the caller whether the business rule is satisfied. You run your library through a few simple tests and confirm that the code takes 50 ms on average to execute and shows no obvious defects. The boss is very pleased and hands your library to other developers to use in their applications.

    public bool VerifyStatesMatch(Order order)
    {
        bool retVal = false;
        try
        {
            // Assume this call takes about 25 ms.
            Customer customer = SomeDataSource.GetCustomer(order.CustomerId);
            // Assume this call also takes about 25 ms.
            Address shippingAddress = SomeDataSource.GetAddress(order.ShippingAddressId);
            retVal = customer.HomeAddress.State == shippingAddress.State;
        }
        catch (Exception ex)
        {
            SomeLogger.LogError(ex);
        }
        return retVal;
    }

Problem

The next day you come to work and find a sticky note on your monitor: “See me urgently. The Boss”. You assume that yesterday's library was such a success that today the boss has decided to entrust you with an even more serious task. However, it soon turns out that your code has serious problems.

You: Good morning, boss, what happened?
Boss: It's your library. The code is nothing but problems!
You: What? How?
Boss: Bob says your algorithm is too slow, John complains that nothing works properly, and Steve said this: “Object reference not set to an instance of an object”.
You: I have no idea why. I tested it yesterday, and everything was fine.
Boss: I don't want to hear it. Go and sort it out!

Not the best start to the day, right? I suspect most developers have been in a similar situation at some point. You thought you had written a “perfect” library, and it brought a whole heap of problems. But if you understand correctly what “One”, “Many”, “Null” and “Nothing” are, you will learn to spot where your API fails to meet your colleagues' expectations.

One

en.wikipedia.org/wiki/The_Matrix

The first rule is to understand what “One” is and how to work with it. I mean that your API should always handle a single unit of expected input without errors. Such errors are theoretically possible, but you are not obliged to report them to the caller. “Isn't that obvious?” you might think. Well, let's turn to the example and consider what errors can occur while processing an Order.

    Customer customer = SomeDataSource.GetCustomer(order.CustomerId);
    Address shippingAddress = SomeDataSource.GetAddress(order.ShippingAddressId);
    // What if customer.HomeAddress failed to load and is null?
    retVal = customer.HomeAddress.State == shippingAddress.State;

As the comment above makes clear, we are assuming that the HomeAddress property was loaded correctly from the data source. In 99.99% of cases it probably will be, but a truly reliable API has to allow for the case where it is not. In addition, depending on the language, comparing the two State properties may fail if either of them failed to load. The important point is that we can assume nothing about the input we receive or about data returned by code we do not control.

This is the simplest example, so let's fix our code and move on.

    Customer customer = SomeDataSource.GetCustomer(order.CustomerId);
    Address shippingAddress = SomeDataSource.GetAddress(order.ShippingAddressId);
    if (customer.HomeAddress != null)
    {
        retVal = customer.HomeAddress.State == shippingAddress.State;
    }

Many

msdn.microsoft.com/en-us/library/w5zay9db.aspx

Let's return to the scenario above. We need to talk to Bob. Bob complained that the code is slow, yet 50 ms is in line with the execution time expected for a system with this architecture. It turns out, however, that Bob processes 100 orders from your largest customer in a single batch, so his loop spends 5 seconds in your method.

    // Bob's code:
    foreach (Order order in bobsOrders)
    {
        ...
        bool success = OrderProcess.VerifyStatesMatch(order);
        ....
    }

You: Bob, what makes you say my code is too slow? It takes only 50 ms to process an order.
Bob: Our customer Acme Inc. requires their batch orders to be processed as fast as possible. I have to handle 100 orders, so 5 seconds is too long.
You: Oh, I didn't know we had to process orders in batches.
Bob: Well, that's only for Acme. They are our largest customer.
You: Nobody told me anything about Acme or batch orders.
Bob: Shouldn't your code be able to handle multiple orders efficiently at once?
You: Ah ... yes, of course.

It is clear what happened and why the code seems “too slow” to Bob. Nobody told you about Acme or batch processing. Bob's loop loads the same Customer, and most likely the same Address, 100 times. The problem is easy to solve if you accept an array of orders rather than a single one and add some simple caching. The params keyword in C# exists for exactly this kind of situation.

    public bool VerifyStatesMatch(params Order[] orders)
    {
        bool retVal = false;
        try
        {
            // Temporary caches so that each customer and address is loaded only once.
            var customerMap = new Dictionary<int, Customer>();
            var addressMap = new Dictionary<int, Address>();
            foreach (Order order in orders)
            {
                Customer customer = null;
                if (customerMap.ContainsKey(order.CustomerId))
                {
                    customer = customerMap[order.CustomerId];
                }
                else
                {
                    customer = SomeDataSource.GetCustomer(order.CustomerId);
                    customerMap.Add(order.CustomerId, customer);
                }

                Address shippingAddress = null;
                if (addressMap.ContainsKey(order.ShippingAddressId))
                {
                    shippingAddress = addressMap[order.ShippingAddressId];
                }
                else
                {
                    shippingAddress = SomeDataSource.GetAddress(order.ShippingAddressId);
                    addressMap.Add(order.ShippingAddressId, shippingAddress);
                }

                retVal = customer.HomeAddress.State == shippingAddress.State;
                // Stop at the first order that violates the business rule.
                if (!retVal)
                {
                    break;
                }
            }
        }
        catch (Exception ex)
        {
            SomeLogger.LogError(ex);
        }
        return retVal;
    }

With the function modified in this way, Bob's batch processing speeds up dramatically. Most of the data calls disappear, because a record can easily be found by its ID in the temporary cache (the dictionary).

As soon as you open your API to “Many”, you also have to add bounds checks. What should happen, for example, if someone sends a million orders to your method? Is such a large number beyond the limits of this architecture? This is where an understanding of both the system architecture and the business processes comes in handy. If you know that in practice at most 10,000 orders will ever need to be processed, you can confidently set the limit at 50,000. That way you guarantee that nobody can bring the system down with one gigantic, unacceptable call.

Of course, this does not exhaust the possible optimizations, but the example shows how much unnecessary work you can avoid if you design for “many” instances from the very beginning.
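A bounds check of this kind could be sketched roughly as follows. The limit of 50,000 is taken from the example above; the class name OrderBatchGuard, the constant MaxBatchSize, and the decision to throw ArgumentOutOfRangeException are assumptions made purely for illustration, not part of the original library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original library.
public static class OrderBatchGuard
{
    // Assumed limit: five times the largest batch expected in practice (10,000).
    public const int MaxBatchSize = 50000;

    // Throws if the batch is missing or larger than the system is prepared to handle.
    public static void EnsureWithinLimit<T>(IReadOnlyCollection<T> batch)
    {
        if (batch == null)
        {
            throw new ArgumentNullException(nameof(batch));
        }

        if (batch.Count > MaxBatchSize)
        {
            throw new ArgumentOutOfRangeException(
                nameof(batch),
                batch.Count,
                $"A batch may contain at most {MaxBatchSize} items.");
        }
    }
}
```

A method such as VerifyStatesMatch could call OrderBatchGuard.EnsureWithinLimit<Order>(orders) before entering its loop. Whether an oversized batch should be rejected outright, as here, or split into smaller chunks depends on the system.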

Null

You: Steve, are you passing a null pointer to my code?
Steve: I don't think so. Why?
You: The boss says the system is complaining: “Object reference not set ...”.
Steve: Ah, that's probably the legacy system. I don't control its output; we simply feed it into the new system through the pipeline, as is.
You: That's nonsense. So why don't you deal with these nulls?
Steve: I do. I check for null in my code. Don't you?
You: Oh ... yes, of course.

“Object reference not set to an instance of an object.” Does this error need explaining? Many of us have lost more than one hour of our lives to it. In most languages, null, the empty set and the like are perfectly valid states for any reference (non-value) type. So any serious API must be able to accept null, even if the caller is technically not supposed to pass it.

Of course, checking every reference for null is tedious and sometimes redundant. However, you should never trust input that comes from a source you do not control. We therefore have to check the “orders” parameter for null, as well as each Order instance inside it.
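The two checks just described might look something like the sketch below. The Order class is cut down to two properties, and the helper OrderInput.WithoutNulls and its logging to the standard error stream are hypothetical stand-ins (the article's own code logs through SomeLogger).

```csharp
using System;
using System.Collections.Generic;

// Minimal stand-in for the article's Order class.
public class Order
{
    public int OrderId { get; set; }
    public int CustomerId { get; set; }
}

// Hypothetical helper, not part of the original library.
public static class OrderInput
{
    // Returns only the usable orders. A null array or null elements are
    // logged and skipped instead of causing a NullReferenceException later.
    public static List<Order> WithoutNulls(params Order[] orders)
    {
        var usable = new List<Order>();

        if (orders == null)
        {
            Console.Error.WriteLine("Received a null order array.");
            return usable;
        }

        for (int i = 0; i < orders.Length; i++)
        {
            if (orders[i] == null)
            {
                Console.Error.WriteLine($"Order at index {i} is null; skipping it.");
                continue;
            }

            usable.Add(orders[i]);
        }

        return usable;
    }
}
```

Note that with a params parameter both cases really can occur: passing a null array makes the orders parameter itself null, while passing individual arguments, some of which are null, produces a non-null array with null elements.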

By checking for null consistently, you can avoid annoying support calls from customers asking what an “instance of an object” is. I always prefer to err on the side of caution: I would rather have my function return a default value and log a message (or send a warning) than throw a rather useless “Object reference not set to an instance of an object” error. Of course, this decision depends entirely on the type of system, on whether the code runs on the client or on the server, and so on. The point is that null can be ignored, but only until it comes back to bite you.

CLARIFICATION: To be clear, I am not saying that a function should silently do nothing when it encounters an unacceptable state. If null parameters are unacceptable in your system, throw an exception (such as ArgumentNullException in .NET). In some situations, however, it is perfectly acceptable to return a meaningful default, and there is no need to throw. For example, fluent methods usually return the value that was passed to them if they cannot do anything with it. There are too many factors involved to give general recommendations on what to do when you encounter null.
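The two policies can be contrasted in a small sketch. The class NullPolicies and both method names are invented for this illustration, and normalizing a state code is just a convenient stand-in operation; only ArgumentNullException comes from .NET itself.

```csharp
using System;

// Hypothetical examples of two ways to treat null input.
public static class NullPolicies
{
    // Policy 1: null is unacceptable, so fail fast with a descriptive exception.
    public static string NormalizeStateStrict(string state)
    {
        if (state == null)
        {
            throw new ArgumentNullException(nameof(state));
        }

        return state.Trim().ToUpperInvariant();
    }

    // Policy 2: fluent style. A value the method cannot work on
    // (null, empty, or whitespace) is handed back unchanged.
    public static string NormalizeStateLenient(string state)
    {
        if (string.IsNullOrWhiteSpace(state))
        {
            return state;
        }

        return state.Trim().ToUpperInvariant();
    }
}
```

Neither policy is right in general. The strict version surfaces the problem at the API boundary with a parameter name attached, while the lenient one keeps a pipeline running and leaves the decision to a later stage.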

Nothing

youtu.be/CrG-lsrXKRM

You: John, what are you passing to my code? It looks like an incomplete Order.
John: Oh, sorry. I don't need your method, but another library requires me to pass an Order parameter. I think that library calls your code. I don't work with orders, but I have to use the other library.
You: That library needs fixing. It's badly designed!
John: You see, that library grew organically as the business requirements changed. Matt wrote it, and he is out this week. Anyway, I don't know how to change it. Shouldn't your code check whether the input is valid?
You: Yes ... indeed.

Of the four principles, “Nothing” is probably the hardest to describe. Null, although it looks like “nothing” and “emptiness”, has a definition and can be quantified. Most languages even have a dedicated keyword for it. “Nothing”, by contrast, means that your API has to cope with input that is essentially garbage. In our example, this would be an Order that has no CustomerId, or whose OrderDate is five centuries old. A more vivid example is a collection that contains no elements. Such a collection is not null, so it falls into the “Many” category, yet the caller has not filled it with any data. You always have to allow for scenarios in which “nothing” shows up. Let's adjust our example so that it handles “nothing” as well. The caller cannot pass just anything that looks like an Order; the order has to meet some minimum requirements. Otherwise, the input is treated as “nothing”.

    ...
    if (order != null && order.IsValid)
    ...
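The article does not show what IsValid checks, so the following is only one possible sketch. The requirements used here (positive IDs, an order date no more than ten years old and not more than a day in the future) are assumptions chosen to match the “no CustomerId” and “five centuries old” examples above; a real system would take its rules from the business.

```csharp
using System;

// Trimmed-down version of the article's Order class with a hypothetical IsValid property.
public class Order
{
    public int OrderId { get; set; }
    public int CustomerId { get; set; }
    public int ShippingAddressId { get; set; }
    public DateTime OrderDate { get; set; }

    // Assumed minimum requirements: positive IDs and a plausible order date.
    public bool IsValid
    {
        get
        {
            DateTime now = DateTime.UtcNow;
            bool hasIds = CustomerId > 0 && ShippingAddressId > 0;
            bool hasPlausibleDate = OrderDate > now.AddYears(-10)
                                    && OrderDate <= now.AddDays(1);
            return hasIds && hasPlausibleDate;
        }
    }
}
```

With a check like this, a freshly constructed Order (all IDs zero, OrderDate at its default value) counts as “nothing” and can be logged and skipped rather than sent to the data source.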

Conclusion

I hope I have managed to get across the main idea of this article: no code can accept just any input without problems. When implementing any function or API, you have to think about how it will be used. In our example, the original function grew from 12 to 50 lines, even though we made no fundamental changes to it. All the code we added is there to support scaling and bounds checking, and to make sure the function handles any input correctly and efficiently.
The volume of stored data has grown exponentially in recent years, so the scale of input data will keep increasing, while its quality can only decline. Writing the API properly from the start can play a crucial role in business growth and in adapting to an expanding customer base, and in the long run it saves on technical support costs (and gives you fewer headaches).

Source: https://habr.com/ru/post/263895/
