
The ideal OS: rethinking operating systems for the desktop

TL;DR: By the end of this essay I hope to convince you of the following. First, that modern desktop operating systems are no good: they are bloated, slow, and stuffed with legacy junk, and they work at all only thanks to Moore's law. Second, that innovation in desktop operating systems ceased about fifteen years ago, and the major players are unlikely to invest much in them again. Finally, that we can and should start from scratch, applying the lessons of the past.

"Modern" desktop OS inflated


Take the Raspberry Pi. For $35 I can buy a great little computer with four processor cores, each running at over a gigahertz. It also has a 3D accelerator, a gigabyte of RAM, and built-in WiFi, Bluetooth, and Ethernet. For 35 bucks! And yet, for many of the tasks I want to run on it, the Raspberry Pi is no better than the 66 MHz computer I had in college.



In fact, in some cases it fares even worse. It took tremendous effort to get Doom running with 3D acceleration under X Windows in the mid-2000s, a task that was trivial in mid-1990s Microsoft Windows.

Below is a screenshot of the Processing environment running hardware-accelerated on a Raspberry Pi for the first time, just a couple of years ago. And it was only possible thanks to a very special X Windows video driver, one that is still experimental and not officially released, five years after the Raspberry Pi came out.



Despite the problems with X Windows, the Raspberry Pi has a surprisingly powerful GPU that can produce results like the screenshot below, but only if you take X Windows out of the picture (the actual screenshot below was taken on OS X, but the same code runs at 60 fps on a Pi 3).



Or another example. Atom is one of today's most popular editors. Developers love it for its wealth of plugins, but let's look at how it is built. Atom is based on Electron, which is essentially a whole web browser bundled with the NodeJS runtime: two JavaScript engines packed into one application. Electron apps use browser graphics APIs that call native APIs, which then call the GPU (if they're lucky) to actually put an image on the screen. So many layers.



For a long time, Atom could not open a file larger than two megabytes, because scrolling was too slow. The problem was solved by rewriting the text buffer in C++, essentially removing one of the extra layers.



Even the simplest applications are remarkably complex these days. An email client like the one in the screenshot above is conceptually simple: a few database queries, a text editor, and a module that talks to IMAP and SMTP servers. Yet building a new email client is a major undertaking and consumes many megabytes on disk, so few people attempt it. And if you want to extend your email client, or at least the one in the screenshot (Mail.app, the Mac default), there is no clear way to do it. No plugins. No extension API. This is the result of layer upon layer of cruft and bloat.

No innovation


Innovation in desktop operating systems has essentially stopped. You could argue it ended somewhere in the mid-90s, or even in the 80s with the release of the Mac, but all progress certainly halted after the smartphone revolution.

Mac OS


Once upon a time, Mac OS X releases were fireworks shows of new features; every version brought significant progress and invention. Quartz 2D! Exposé! System-wide device syncing! Widgets! But now Apple puts minimal effort into the desktop OS, beyond tweaking themes and tightening the integration with mobile devices.



The latest version of Mac OS X (now renamed macOS in honor of the system from twenty years ago) is called High Sierra. What major innovations can we look forward to this fall? A new file system and a new video encoding format. Is that really all? Oh, and they restored an editing feature in Photos that iPhoto already had but that was lost in an upgrade, and Safari will now block auto-playing videos.

Apple is the most valuable company in the world, and this is the best it can come up with? Desktop UX is simply not a priority for them.

Microsoft Windows


There has been hectic activity in the Windows camp as Microsoft tried to reinvent the desktop as a touch-enabled operating system for tablets and phones. It was a disaster they are still recovering from. In the course of that transition they added no features genuinely useful to desktop users, although they did spend an absurd amount of money on a custom background image.



Instead of improving the desktop UX, they focused on adding new application models, with more and more layers on top of the old code. To be fair, Windows can still run applications from the early 90s.

CMD.exe, the terminal program that essentially lets you run DOS applications, was only replaced in 2016. And the most significant innovation in the latest version of Windows 10? They added a Linux subsystem. Yet more layers piled on top.

X Windows




X Windows has seen even fewer improvements than the other two desktop operating systems. In fact, it is practically the poster child for the absence of change. People were complaining about it in the early 90s. I'm glad you can re-skin the GUI, but what about a system-wide clipboard that can hold more than one item at a time? That hasn't changed since the 80s!

Compositing window managers arrived in the mid-2000s, but because of legacy constraints, compositing can't be used for much beyond sliding windows around.



Wayland was supposed to fix this, but after ten years of development it still isn't ready. Ensuring compatibility with old code is genuinely hard. I think Apple made the right call when it moved the old Mac OS into an emulator called Classic, isolating it from the new code.

Workstations?


In a fundamental sense, desktop operating systems got easier to use when they entered the mass market; but then that mass market moved to smartphones, and the companies lost any interest in improving desktop operating systems.

I can't blame Apple and Microsoft (and now Google). Three billion smartphones replaced every two years are a far bigger market than a few hundred million desktops and laptops replaced every five years.



I think we need to recover the sense of the desktop as a tool for getting work done. Such machines used to be called workstations. If the desktop is freed from the constraints of the mass market, the operating system can become a tool for work again.

What we do not have in 2017


It's 2017. Let's see what should exist by now but for some reason doesn't.

Why can I drag tabs between windows in my browser and my file manager, but not between two different applications? There is no technical limitation here. Application windows are, in the end, just raster rectangles of bits; the OS developers simply never implemented the feature, because it isn't considered a priority.

Why can't a file live in two places at once in my file system? Why is the file system fundamentally hierarchical? Why can't I sort files by tags and metadata? Database file systems have existed for decades. Microsoft tried to ship one as WinFS, but internal conflicts killed it before Vista was even released. BeOS did it twenty years ago. Why isn't this feature in a modern OS?
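As a sketch of what a tag-based, non-hierarchical file store could feel like, here is a tiny in-memory stand-in. Everything here (the `TagStore` class and its methods) is invented for illustration; a real implementation would live in the OS.

```python
# Sketch of a tag-based file store (all names hypothetical):
# a "file" is a record with metadata, and it can appear under
# any number of tags instead of living in exactly one directory.

class TagStore:
    def __init__(self):
        self.files = []  # each file is a plain dict of metadata

    def add(self, name, **meta):
        record = {"name": name, **meta}
        self.files.append(record)
        return record

    def query(self, **criteria):
        # return every file whose metadata matches all criteria;
        # list-valued fields (like tags) match on membership
        def matches(f):
            for key, want in criteria.items():
                have = f.get(key)
                if isinstance(have, list):
                    if want not in have:
                        return False
                elif have != want:
                    return False
            return True
        return [f for f in self.files if matches(f)]

store = TagStore()
store.add("report.pdf", tags=["work", "2017"], author="joe")
store.add("beach.jpg", tags=["vacation", "2017"], rating=4)

# the same file is reachable through any of its tags; there is
# no single hierarchy and no "one true folder"
work_files = store.query(tags="work")
recent = store.query(tags="2017")
```

The point of the sketch is that "where is this file?" becomes a query, so one file can show up in as many places as its metadata warrants.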



Any web page can be zoomed. I press Cmd and plus or minus, and the text gets bigger; everything in the window scales automatically. Why can't my native apps do that? Why can't I have one window with enlarged text and another with small text, or even have them scale automatically as I switch between windows? All of this is trivial for a compositing window manager, and the technology has been around for over a decade.

Limited interaction


My computer has a mouse, a keyboard, tilt sensors, light sensors, two cameras, three microphones, and a pile of Bluetooth accessories, yet only the first two are used as general input devices. Why can't I give my computer voice commands, or gestures in the air? Better yet, why can't it watch how I work and tell me when I'm tired and should take a break?

Why can't my computer track my eyes and see what I'm reading, or scan objects I hold up to it using any of those cool augmented-reality technologies about to arrive on smartphones? Some of these capabilities exist in individual applications, but they are not system-wide and not programmable.

Why can't my MacBook Pro communicate over Bluetooth with whatever HID devices I need, or sync with an Apple Watch? Wait, the Mac can't sync with the Apple Watch at all. Another point where it falls short of my phone.

Why can't my computer use anything besides the display to convey information? The new Razer laptop has a color LED under every key, but it's used only for rippling rainbow effects. What about putting those LEDs to some useful task? (An idea of Bjorn Stahl's, I think.)



Application silos


Almost every application on my computer is a silo. Each one has its own corner of the file system, its own configuration system, its own settings, database, file formats, and search algorithms. Even its own keyboard shortcuts. That is an incredible amount of duplicated effort.

More importantly, the lack of communication between applications makes it very hard to coordinate them. The founding principle of Unix was small tools working together, but X Windows doesn't deliver that at all.

Created for 1984


So why are our computers so clumsy? At bottom, because they were designed for 1984. The desktop GUI was invented when most users created a document from scratch, saved it, and printed it. If you were lucky, you could save the document to a shared file server or mail it to someone. That was it. The GUI was built around tasks previously done on paper.

The problem is that we live in 2017. We no longer work the way we did in 1984. On a typical day I pull code from several remote repositories, write some tests, and generate a data structure that renders a result, which is then published on the internet for other people to use. Import, synthesize, export.

I create VR content. I process images. I post to dozens of social networks. My perfect playlist is drawn from 30,000 songs. I process orders of magnitude more data from more sources than I did just 20 years ago, let alone 40 years ago when these concepts were invented. The desktop metaphor simply does not scale to modern tasks. I need a computer that helps me do modern work.

We need a modern workstation




So now we move to the theoretical level. Suppose we really had the resources, and a way to provide (or ignore) backward compatibility. Suppose we could actually design a desktop for modern ways of working, from a clean slate. How would we do it?

First, we throw out everything that isn't doing its job.


So what's left? Not much. We have a kernel and device drivers. We can keep a robust file system, but it will not be exposed to end users or applications. Now let's add some pieces back.

Document Database


Let's start with a system-wide document database. Wouldn't it be easier to build a new email client if the database were already there? The UI would be just a few lines of code. In reality, many everyday applications are just text editors combined with data queries: iTunes, the address book, the calendar, notifications, messages, Evernote, to-do lists, bookmarks, browser history, the password database, the photo manager. Each of these ships with its own unique data store. So much wasted effort, and such an obstacle to interoperability!

BeOS proved that a database file system can really work and delivers incredible benefits. We need to bring it back.



A document-database file system has many advantages over a traditional one. Not only can "files" exist in more than one place and be searched easily, but the guaranteed presence of a high-performance database makes applications far easier to build.
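To make this concrete, here is a minimal sketch of a shared document store, with a purely hypothetical API. Every application stores plain typed documents, so an email client's "inbox view" collapses into a single query rather than a private data store.

```python
# Minimal sketch of a system-wide document database (hypothetical API).
# Applications store plain documents with a "type" field; an email
# client is then little more than a query plus a text editor.

class DocumentDB:
    def __init__(self):
        self.docs = []

    def insert(self, doc):
        self.docs.append(dict(doc))

    def find(self, **criteria):
        # return all documents whose fields match every criterion
        return [d for d in self.docs
                if all(d.get(k) == v for k, v in criteria.items())]

db = DocumentDB()
db.insert({"type": "email", "from": "alice@example.com",
           "subject": "Hi", "read": False})
db.insert({"type": "email", "from": "bob@example.com",
           "subject": "Re: Hi", "read": True})
db.insert({"type": "song", "title": "Yesterday", "artist": "Beatles"})

# the whole "inbox view" is one query against the shared store
inbox = db.find(type="email")
unread = db.find(type="email", read=False)
```

Because the songs and the mail live in the same store, any application can query either, which is exactly what the per-app silos prevent today.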

Take iTunes, for example. It stores mp3 files on disk, but all the metadata lives in a closed database. Having two "sources of truth" creates endless problems. If you add a new song to the disk, you must manually tell iTunes to rescan it. If you want to write a program that works with the song database, you have to reverse-engineer the iTunes DB format and pray Apple doesn't change it. With a single system database, all these problems disappear.

Message bus


The message bus becomes the one way to do interprocess communication. We get rid of sockets, files, pipes, ioctls, shared memory, semaphores, and everything else. All communication happens as messages over the bus. That gives us a single place to enforce security, and clever proxying enables a lot of interesting features.

In practice, some of the old communication mechanisms would remain as options for applications that need them, such as sockets for browsers, but all communication with the system and between applications goes over the common bus.
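A toy in-process version of such a bus might look like the following. This is only a sketch (the real thing would cross process boundaries and enforce security at this one choke point), and the topic names are invented.

```python
# Toy in-process message bus: subscribe by topic, publish payloads.
# A real bus would cross process boundaries; security and auditing
# would hang off this single choke point.

class MessageBus:
    def __init__(self):
        self.subscribers = {}   # topic -> list of handler callables
        self.log = []           # every message seen, for auditing

    def subscribe(self, topic, handler):
        self.subscribers.setdefault(topic, []).append(handler)

    def send(self, topic, **payload):
        self.log.append((topic, payload))
        for handler in self.subscribers.get(topic, []):
            handler(payload)

bus = MessageBus()
played = []
bus.subscribe("mp3.play", lambda msg: played.append(msg["track"]))

# any application can request playback without touching files,
# sockets, or shared memory: just one bus message
bus.send("mp3.play", track="yesterday.mp3")
```

Note how the bus sees every message: that `log` is the hook where a single security or proxying layer could sit.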

Compositor


Now we can add a compositor: a window manager that works with true 3D surfaces, transforms coordinates, and is controlled via messages over the bus. Most of what a typical window manager does, such as placing windows, overlaying notifications, and deciding which window is focused, can actually be done by other programs that simply send messages to the compositor, which performs the actual work.

This means the compositor must be tightly integrated with the graphics driver; that is essential for high performance. Below is a diagram of the Wayland compositor that will someday be the default on Linux.



Applications get graphics onto the screen by requesting a surface from the compositor. Having drawn their output and any updates, they simply send a message: please redraw me. In practice we would probably have several surface types for 2D and 3D graphics, and perhaps for raw video buffers. What matters is that the compositor ultimately controls everything that appears on screen, and when. If one application goes haywire, the compositor can suppress its output and keep the rest of the system running normally.
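The surface-plus-message protocol can be sketched as follows. The class and message names are hypothetical; the point is only that the compositor, not the application, decides what reaches the display.

```python
# Sketch of compositor-style surface management (hypothetical protocol).
# Applications never draw to the screen directly: they draw into a
# surface and send a "redraw" message; the compositor alone decides
# what actually reaches the display.

class Compositor:
    def __init__(self):
        self.surfaces = {}      # app name -> surface contents
        self.muted = set()      # misbehaving apps kept off screen
        self.screen = {}        # what is actually displayed

    def create_surface(self, app):
        self.surfaces[app] = None
        return app  # a surface handle, kept trivial here

    def handle_message(self, app, msg, pixels=None):
        if msg == "redraw":
            self.surfaces[app] = pixels
            if app not in self.muted:
                self.screen[app] = pixels

    def mute(self, app):
        # if one app goes haywire, suppress it; the rest keep working
        self.muted.add(app)
        self.screen.pop(app, None)

comp = Compositor()
comp.create_surface("editor")
comp.handle_message("editor", "redraw", pixels="[text buffer]")

comp.create_surface("runaway")
comp.handle_message("runaway", "redraw", pixels="[garbage]")
comp.mute("runaway")
comp.handle_message("runaway", "redraw", pixels="[more garbage]")
```

After muting, the runaway app can keep sending redraw messages, but the screen shows only the well-behaved editor.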

Applications become modules


All applications become small modules that communicate solely over the message bus. Entirely. No more file system access. No hardware access. Everything happens through messages.

If you want to play an mp3 file, you send a play message to the mp3 service. You put graphics on screen through the compositor. This separation keeps the system secure. In Linux terms, each application would be fully isolated via user permissions and chroot, perhaps even Docker containers or virtual machines. There are plenty of details to work out here, but all of them are solvable today.

Modular applications would also be far easier to write. If the database is the single source of truth, there is no need to shuffle data into memory and back. In the audio player example, the search field doesn't load data and filter it to build the list; it simply defines a query. The list binds to that query, and the data appears automatically. If another application adds a song that matches the search, the player's UI updates automatically. All of this happens with no extra effort from the developer. Live, auto-updating queries make life much simpler and systems more reliable.
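A "live" query can be sketched in a few lines. The `LiveDB` API here is invented for illustration: the UI registers a query once, and the store pushes matching documents to it as they arrive.

```python
# Sketch of a "live" query (hypothetical API): the UI declares a
# query once, and the database pushes updates whenever a matching
# document appears.

class LiveDB:
    def __init__(self):
        self.docs = []
        self.watchers = []  # (predicate, callback) pairs

    def live_query(self, predicate, callback):
        self.watchers.append((predicate, callback))
        # deliver existing matches immediately
        for d in self.docs:
            if predicate(d):
                callback(d)

    def insert(self, doc):
        self.docs.append(doc)
        for predicate, callback in self.watchers:
            if predicate(doc):
                callback(doc)

db = LiveDB()
db.insert({"type": "song", "artist": "Beatles", "title": "Help!"})

playlist = []
# the search field just defines the query; the list binds to it
db.live_query(lambda d: d.get("artist") == "Beatles", playlist.append)

# another application adds a matching song; the player's list
# updates with no extra effort from its developer
db.insert({"type": "song", "artist": "Beatles", "title": "Yesterday"})
```

The player never polls and never copies data; it only ever describes what it wants to show.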

Rework applications


On this foundation we can build everything we need. However, it also means redoing everything from scratch. High-level constructs on top of the database make that process much easier. Let's look at a few examples.

Email. If you split a standard mail client into GUI and network modules that communicate exclusively via messages over the bus, the program becomes much easier to build. The GUI doesn't need to know anything about Gmail or Yahoo Mail, or how to handle SMTP error messages. It simply queries the database for documents of type "email". When the GUI wants to send a message, it sets the property outgoing = true on it. A simple module watches the list of outgoing emails and sends them via SMTP.

Splitting the mail client into components also makes it easy to replace individual parts. You could write a new frontend in half a day without rewriting the network modules. You could build a spam filter with no user interface at all: it just scans incoming messages, processes them, and tags suspicious ones as "spam". It neither knows nor cares how spam is displayed in the GUI. It does one thing well.
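Here is a sketch of two such GUI-less mail modules working against a shared document store. The store, the field names (`outgoing`, `tags`), and the toy spam rule are all invented for illustration; a real sender would actually speak SMTP.

```python
# Two GUI-less mail modules over a shared document store (a sketch;
# the field names and the spam heuristic are invented). The GUI
# marks a message outgoing=True; a tiny sender module picks it up.
# A spam filter just tags documents and never touches the UI.

outbox_log = []

def smtp_sender(mail_db):
    # a real module would speak SMTP; here we just record the send
    for msg in mail_db:
        if msg.get("type") == "email" and msg.get("outgoing"):
            outbox_log.append(msg["subject"])
            msg["outgoing"] = False  # mark as sent

def spam_filter(mail_db):
    # scan mail and tag suspicious messages; how "spam" is shown
    # is entirely the GUI's problem
    for msg in mail_db:
        subject = msg.get("subject", "").lower()
        if msg.get("type") == "email" and "lottery" in subject:
            msg.setdefault("tags", []).append("spam")

mail_db = [
    {"type": "email", "subject": "Meeting notes", "outgoing": True},
    {"type": "email", "subject": "You won the LOTTERY", "outgoing": False},
]

smtp_sender(mail_db)
spam_filter(mail_db)
```

Neither module knows the other exists; they cooperate only through the shared documents, which is the whole point of the architecture.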

Mail filters can do other interesting things too. Say you email your bot the command play beatles. A tiny module scans incoming mail, sends a message to the mp3 module to start the music, then marks the email as deleted.

When everything becomes a database query, the whole system becomes more flexible and customizable.

Command line


I know, I said earlier that we'd get rid of the command line. I take it back. I actually like the command line as an interface; what I object to is its purely textual nature. Instead of chaining CLI applications with text streams, we need something richer, like streams of serialized objects (think JSON, but more efficient). Then we'd have real power.

Consider the following tasks:


Each of these tasks is conceptually simple, but just think how much code you would have to write to accomplish them today. With a command-line interface over object streams, each fits into a one- or two-line script.

We could perform even more complex operations, like: "Find all photos taken in the past four years within 80 km of Yosemite National Park with a rating of 3 stars or higher, resize them to 1000px on the long side, upload them to a Flickr album called 'Best of Yosemite', and post a link to the album on Facebook." This could be done with built-in tools, with no extra programming, simply by connecting a few primitives.
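The first half of that pipeline can be sketched with generators standing in for the object-stream commands. Every stage consumes and yields structured records instead of text, so the stages compose like shell pipes. All the data and stage names here are invented, and the upload/post steps are omitted since they would call external services.

```python
# Sketch of a CLI built on streams of structured objects instead of
# plain text. Each stage is a generator over dicts, so "commands"
# compose like shell pipes (all data invented for illustration).

def photos(source):
    yield from source

def near(stream, place, km):
    # stand-in for a real geo filter
    for p in stream:
        if p["place"] == place and p["distance_km"] <= km:
            yield p

def rated(stream, min_stars):
    for p in stream:
        if p["stars"] >= min_stars:
            yield p

def resize(stream, long_side):
    for p in stream:
        yield {**p, "long_side": long_side}

library = [
    {"name": "falls.jpg", "place": "Yosemite", "distance_km": 10, "stars": 4},
    {"name": "blurry.jpg", "place": "Yosemite", "distance_km": 5, "stars": 1},
    {"name": "home.jpg", "place": "Oakland", "distance_km": 0, "stars": 5},
]

# photos | near Yosemite 80km | rated >=3 | resize 1000px
result = list(resize(rated(near(photos(library), "Yosemite", 80), 3), 1000))
```

Because each stage passes whole records downstream, a later stage like "upload to Flickr" would still have the filename, rating, and location available without any text re-parsing.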



In fact, Apple built something similar. It's called Automator. You can create powerful workflows in a GUI. The system was never promoted, and now Apple is dismantling the AppleScript underpinnings that it all runs on. Recently the entire Automator team was reassigned to other groups. Sigh...

System-wide semantic keybindings


Now that we've rebuilt the world, what do we do with it?

Services are system-wide. That means we can run a single service where the user assigns key combinations (keybindings). It also means keyboard shortcuts acquire deeper meaning. Instead of pointing at a function in a specific program, they point at a command message. Every document-oriented application can have "Create new document" or "Save" commands, and the keybinding service is responsible for turning Ctrl-S into a "Save" command. I call these semantic keybindings.
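A minimal sketch of such a service, with invented event and command names, might look like this: physical events map to abstract commands, and any input source can emit them.

```python
# Sketch of a system-wide semantic keybinding service: physical
# events map to abstract commands ("save", "new-document"), and any
# input source -- keyboard, Arduino button, voice -- can emit them.

class KeybindingService:
    def __init__(self):
        self.bindings = {}  # physical event -> semantic command
        self.handlers = {}  # semantic command -> handler callable

    def bind(self, event, command):
        self.bindings[event] = command

    def register(self, command, handler):
        self.handlers[command] = handler

    def input(self, event):
        # translate the physical event and dispatch the command
        command = self.bindings.get(event)
        if command and command in self.handlers:
            self.handlers[command]()
            return command
        return None

svc = KeybindingService()
saved = []
svc.register("save", lambda: saved.append("document"))

# the same semantic command can come from completely different devices
svc.bind("ctrl-s", "save")
svc.bind("arduino-button-3", "save")

svc.input("ctrl-s")
svc.input("arduino-button-3")
```

The application only ever registers a handler for "save"; it never learns, and never needs to learn, which physical device triggered it.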

Semantic keybindings would make it much easier to support alternative input methods. Say you build a fancy button with an Arduino that speaks a phrase when pressed. You don't need to write any special code for it. Just have the Arduino send a button event, then attach an audio file to that event in the shortcut editor. Turn a digital potentiometer into a custom scroll wheel. The UI becomes whatever you want it to be.

Some research is still needed here, but I suspect semantic keybindings would also simplify building screen readers and other accessibility tools.

Windows


In our new OS, any window can dock as a tab into any other window. Or into a sidebar. Or into something else entirely, regardless of the application. There is plenty of room to experiment here.



Old Mac OS 8 had a form of tabbed windows, at least in the Finder, which could be docked to the bottom edge of the screen for quick access. Another cool feature thrown away in the move to Mac OS X.

In the screenshot below, the user lifts a window's border to peek at what's underneath. Very cool!



This example comes from the paper "Ametista: a mini-toolkit for exploring new window management techniques" by Nicolas Roussel.

Since the system fully controls every application's environment, it can enforce security restrictions and make them visible to the user. For example, trusted applications might get green frames. An application just downloaded from the internet gets a red frame. An application of unknown origin gets a black frame, or isn't shown at all. Many kinds of spoofing become impossible.

Smart copy-paste


When you copy text in one window and switch to another, the computer knows you copied something. It can use that knowledge to do something helpful: automatically slide the first window aside while keeping it visible, and highlight the selected text in green. This helps the user stay focused on the current task. When the user pastes into the new window, the green fragment could visibly jump from one window to the other.

But why stop there? Make a clipboard that holds more than one item. We have gigabytes of memory; let's use them. When I copy something, why should I have to remember exactly what I copied before pasting it into another window? The clipboard is completely invisible. Fix that.

The clipboard should appear on screen as a kind of shelf where all copied fragments accumulate. I could visit three web pages, copy their addresses to the clipboard, then go back to my document and paste all three at once.

A clipboard viewer would let you scroll through the entire clipboard history, search it, and filter by tags. I could "pin" favorite clippings for later use.
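The shelf idea fits in a few lines. This `ClipboardShelf` class is a sketch invented for illustration: copies accumulate instead of overwriting each other, and the history is searchable and pinnable.

```python
# Sketch of a clipboard "shelf": copies accumulate instead of
# overwriting each other, and the history can be searched and pinned.

class ClipboardShelf:
    def __init__(self):
        self.history = []   # every copied item, newest last
        self.pinned = []

    def copy(self, item):
        self.history.append(item)

    def paste(self, n=1):
        # paste the n most recent items at once
        return self.history[-n:]

    def search(self, text):
        return [i for i in self.history if text in i]

    def pin(self, item):
        if item in self.history:
            self.pinned.append(item)

shelf = ClipboardShelf()
# copy three URLs from three pages, then paste all of them at once
shelf.copy("https://example.com/a")
shelf.copy("https://example.com/b")
shelf.copy("https://example.com/c")

all_three = shelf.paste(3)
hits = shelf.search("/b")
shelf.pin("https://example.com/a")
```

Nothing here is hard; the feature is missing from desktops by choice, not for technical reasons.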

Classic Mac OS actually had a great built-in tool called [name], but it was abandoned in the move to OS X. We had the future decades ago! Bring it back.

Working sets


And finally we arrive at what I consider the most powerful paradigm shift in our new ideal OS. In this system all applications are tiny isolated modules that know only what the system tells them. If the database is the single source of truth, and the database itself is versioned, and our window manager can be tuned to any taste... then truly interesting things become possible.

I usually keep personal and work files separate: separate folders, separate accounts, sometimes separate computers. In the ideal OS, the OS itself can partition my files. I can have one screen with home email and another with work email. It's the same application, just initialized with different query settings.

When I open the file manager on the home screen, it shows only files for home projects. If I create a document on the work screen, it is automatically tagged as a work document. Managing all this is trivial: just a few extra fields in the database.

Researchers at the Georgia Institute of Technology actually described such a system in their paper "Giornata: Re-envisioning the Desktop Metaphor to Support Activities in Knowledge Work".



Now let's go a step further. If everything is versioned, even GUI settings and window positions (since everything lives in the database), then I can snapshot the state of a screen. The snapshot preserves all current settings, even my keybindings. I can keep working, but I can always return to that state, or inspect an old state and restore it on a new screen. In essence I have created a "template" I can reuse whenever I start a new project, containing everything needed: email client settings, chat history, to-do lists, code, bug-report windows, even the relevant GitHub pages.
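Snapshot, restore, and fork fall out almost for free once all state lives in one versioned store. Here is a sketch with an invented `Workspace` class; a real system would version the database itself rather than deep-copying dicts.

```python
# Sketch of snapshotting workspace state when everything -- window
# layout, keybindings, queries -- lives in one versioned database.
# A real system would version the store itself; deep copies stand
# in for that here.

import copy

class Workspace:
    def __init__(self, state):
        self.state = state
        self.snapshots = {}

    def snapshot(self, name):
        # deep-copy so later edits don't mutate the saved version
        self.snapshots[name] = copy.deepcopy(self.state)

    def restore(self, name):
        self.state = copy.deepcopy(self.snapshots[name])

    def fork(self, name):
        # start a fresh workspace from a saved template
        return Workspace(copy.deepcopy(self.snapshots[name]))

ws = Workspace({"windows": ["editor", "mail"], "keys": {"ctrl-s": "save"}})
ws.snapshot("project-template")

ws.state["windows"].append("debugger")   # keep working...
ws.restore("project-template")           # ...or roll back at will

new_project = ws.fork("project-template")  # share it like an image
```

The `fork` call is the interesting one: it is the same operation that would let people publish and exchange whole working environments.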

Now the entire state of the computer is treated, essentially, like a GitHub repository, with the ability to fork the state of the whole system. I think that would be simply magical. People would share useful workspaces online, like Docker images. You could customize your workflows and add useful scripts to a workspace. The possibilities are genuinely amazing.

None of this is new


So there it is. The dream. Everything above rests on three principles: a system-wide versioned realtime database, a system-wide realtime message bus, and a programmable compositor.

I want to emphasize that absolutely nothing I've described is new. I invented none of it. All these ideas are years or decades old. Database file systems first appeared in BeOS. A single interprocess communication mechanism appeared in Plan 9. Configuring the environment from an editable document was done in Oberon. And there are plenty of research papers besides.

Why don't we have it?


Nothing here is new. So why don't we have it yet? What's the reason?

I suspect the main reason is simply how hard it is to build a successful operating system. It is far easier to extend an existing system than to build something new; but extending also means being bound by choices made in the past.

Can we really build the ideal OS? I suspect not. No one has done it so far because, frankly, there's no money in it. And without money you simply won't find the resources to build it.

Still, if someone did set out to build such an OS, or at least a working prototype, I would start with a specific, limited set of hardware that already has device drivers. Weak driver support has always been the Achilles heel of the Linux desktop. The Raspberry Pi 3, for instance, would be an excellent choice.

So my question to you: do you think the idea is worth the effort of implementing, at least as a working prototype? Would you take part in such a project? How much of the functionality would have to work before you'd agree to test such a system? And, of course, what should we call it?

If you're interested in discussing the future of desktop UX, join our new Ideal OS Design group.

Source: https://habr.com/ru/post/337202/

