For Programmer's Day, we present an interview with a developer who managed to debug a Mars spacecraft in 18 hours from a distance of 100 million miles. Part of that spacecraft's code, incidentally, was written in Ada.
DDJ: You program computers that work on the surface of another planet. That's unusual!
GR: It's unusual for me too, believe me. It's a little world of its own, where everyone is focused on making sure that every task for the next day is carried out according to plan.
You go home at 3 am, still keyed up from watching the data come back and from nights without sleep. Your wife and children are already asleep, you certainly won't be able to fall asleep yourself, and you have to be back at work by 8 am. So you turn on CNN and see your own smiling face in the control center, along with the same images you looked at 12 hours earlier. A very strange feeling.
It was a small project run by a small team, but it had a big influence on many of us. I'd also note that, from a technical point of view, the project came at the right time.
The Mars Pathfinder mission was the first mission of NASA's Discovery program, a ten-year program of robotic space exploration. The mission's main objectives included looking for evidence of past life on Mars, gathering information about the Martian climate to help explain past climate change on Earth and predict future change, and assessing resources that could be used on later Mars missions. This Jet Propulsion Laboratory (JPL) project was one of the most ambitious and most closely watched in JPL's history.
Pathfinder consisted of a stationary lander and the Sojourner rover. Its main task was to demonstrate that a low-cost landing on Mars and a study of its surface were feasible. The Pathfinder system was built from off-the-shelf software and hardware components. The heart of the system was a single RAD6000 processor (a radiation-hardened version of IBM's RS/6000) running the VxWorks real-time operating system from Wind River Systems.
Pathfinder was launched on December 4, 1996, and flew to Mars for seven months. It landed on July 4, 1997, and Sojourner began conducting experiments and collecting data almost immediately. During the first month on the surface, NASA received about 1.2 gigabits of data, including 9,669 images from the lander and 384 images from Sojourner, as well as about 4 million measurements of temperature, pressure, and wind.
Then, on September 27, 1997, contact with Sojourner was lost because the system began rebooting over and over. Many media outlets seized on the news and mistakenly described the incident as "unforeseen software crashes caused by the rover trying to do too many things at once."
As JPL engineers, Glenn Reeves among them, quickly discovered, the problem was priority inversion, a phenomenon that real-time operating system engineers have known about for more than 20 years. Contrary to most media reports, there was nothing extraterrestrial about it. A low-priority task (for example, collecting meteorological data) would sometimes hold a semaphore that a high-priority task (for example, bus management) needed. A medium-priority task (for example, communications) would then preempt the low-priority task. As a result, the high-priority task could not finish in its allotted time, and in such cases a watchdog safeguard reset the system.
A reset returned the hardware and software to their initial state, and it also aborted the activities that had been commanded from Earth. Data that had already been collected was not lost. However, the unfinished part of that day's work could not be completed until the next day.
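To make the failure mode concrete, here is a minimal Python sketch of a single-CPU, fixed-priority preemptive scheduler with one shared mutex. It models what happens with and without priority inheritance, the standard remedy in which the lock holder temporarily runs at the priority of the highest-priority task waiting for the lock. The task names (`meteo`, `bus`, `comms`), their timings, and the 6-tick deadline are hypothetical and chosen only to mirror the example roles above. This is not Pathfinder's flight code, and it does not reproduce the VxWorks scheduler.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

Op = Union[str, int]  # "lock", "unlock", or a number of compute ticks


@dataclass
class Task:
    name: str
    priority: int                # higher number = higher priority
    release: int                 # tick at which the task becomes ready
    program: List[Op]            # remaining operations, consumed as the task runs
    finish: Optional[int] = None


def simulate(tasks: List[Task], inherit: bool, max_ticks: int = 50) -> List[str]:
    """Single CPU, fixed-priority preemptive scheduling, one shared mutex."""
    owner: Optional[Task] = None  # task currently holding the mutex
    timeline: List[str] = []      # which task ran in each tick
    t = 0

    def active(task: Task) -> bool:
        return task.finish is None and task.release <= t

    def blocked(task: Task) -> bool:
        # Blocked: the next op is "lock" and another task holds the mutex.
        return task.program[0] == "lock" and owner is not None and owner is not task

    def effective(task: Task) -> int:
        # Priority inheritance: the mutex owner runs at the highest priority
        # of any task currently blocked waiting for the mutex.
        if inherit and task is owner:
            waiting = [x.priority for x in tasks if active(x) and blocked(x)]
            return max([task.priority] + waiting)
        return task.priority

    while t < max_ticks and any(x.finish is None for x in tasks):
        ready = [x for x in tasks if active(x) and not blocked(x)]
        if not ready:
            timeline.append("idle")
            t += 1
            continue
        current = max(ready, key=effective)
        op = current.program[0]
        if op in ("lock", "unlock"):
            # Zero-time operations; a "lock" only gets here when the mutex is free.
            owner = current if op == "lock" else None
            current.program.pop(0)
            if not current.program:
                current.finish = t
            continue  # reschedule within the same tick
        timeline.append(current.name)  # run one compute tick
        current.program[0] = op - 1
        if current.program[0] == 0:
            current.program.pop(0)
        t += 1
        if not current.program:
            current.finish = t
    return timeline


def scenario() -> List[Task]:
    # Hypothetical workload echoing the roles described in the text.
    return [
        Task("meteo", 1, 0, ["lock", 3, "unlock", 1]),  # low: holds the mutex for 3 ticks
        Task("bus", 3, 1, ["lock", 1, "unlock"]),       # high: needs the same mutex
        Task("comms", 2, 2, [6]),                       # medium: CPU-bound, no mutex
    ]


if __name__ == "__main__":
    DEADLINE = 6  # hypothetical watchdog limit for "bus"
    for inherit in (False, True):
        tasks = scenario()
        timeline = simulate(tasks, inherit)
        bus = next(x for x in tasks if x.name == "bus")
        verdict = "ok" if bus.finish <= DEADLINE else "missed -> watchdog reset"
        print(f"inherit={inherit}: bus done at t={bus.finish} ({verdict})")
        print("  " + " ".join(timeline))
```

Without inheritance, `comms` keeps `meteo` off the CPU while `meteo` still holds the mutex. As a result, `bus` finishes only at tick 10, past the deadline. With inheritance, `meteo` finishes its critical section first and `bus` completes at tick 4. In VxWorks, priority inheritance is an option chosen when a mutual-exclusion semaphore is created (`SEM_INVERSION_SAFE` for `semMCreate`). The sketch models only the scheduling effect of that option.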
JPL engineers had not only been aware of priority inversion from the very beginning, but had also worked out in advance the scenarios in which it might occur. As a result, they were able to reproduce the problem quickly on a duplicate system in the lab. They then sent the fix to the spacecraft 100 million miles away: enabling priority inheritance on the semaphore, which had been created with that option turned off. (Now that's remote repair!) The engineers had prudently kept two images of the flight software on board, so that either one could be replaced or updated from Earth. The team chose to patch only one of the two copies, so that they would not lose the ability to boot the unmodified version.
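The two-image arrangement described above can be illustrated with a small Python sketch of a dual-slot update. The ground patches only one copy and uplinks the checksum that the patched image should have. The boot selector falls back to the untouched copy if that check fails. Everything in the sketch is a hypothetical illustration of the general technique: `Slot`, `patch_slot`, `choose_boot`, SHA-256 as the integrity check, and the toy `PI_FLAG` byte string. It is not how Pathfinder's flight software actually stored, verified, or selected its images.

```python
import hashlib
from dataclasses import dataclass


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class Slot:
    """One stored copy of the flight software (hypothetical model)."""
    data: bytes
    checksum: str  # expected checksum, uplinked together with the image or patch

    def valid(self) -> bool:
        return sha256(self.data) == self.checksum


def patch_slot(slot: Slot, offset: int, new_bytes: bytes, expected: str) -> Slot:
    """Return a patched copy of `slot`; `expected` is the checksum of the intended result."""
    if offset < 0 or offset + len(new_bytes) > len(slot.data):
        raise ValueError("patch falls outside the image")
    data = slot.data[:offset] + new_bytes + slot.data[offset + len(new_bytes):]
    return Slot(data, expected)


def choose_boot(primary: Slot, backup: Slot) -> str:
    """Boot the patched primary if it verifies, otherwise the untouched backup."""
    if primary.valid():
        return "primary"
    if backup.valid():
        return "backup"
    raise RuntimeError("no valid flight software image")


if __name__ == "__main__":
    original = b"PI_FLAG=0;REST"
    intended = b"PI_FLAG=1;REST"
    backup = Slot(original, sha256(original))   # never modified
    primary = Slot(original, sha256(original))

    good = patch_slot(primary, 8, b"1", sha256(intended))
    print(choose_boot(good, backup))  # primary

    bad = patch_slot(primary, 8, b"X", sha256(intended))  # patch corrupted in transit
    print(choose_boot(bad, backup))   # backup
```

The point of the design is that the backup copy is never written, so a bad patch costs only a fallback boot rather than the mission.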
In the end, everything went well, and the spacecraft completed its mission. Reeves, who is now Chief Engineer of the Mission Data System project, met with DDJ journalist D. Voyer. He talked about the Pathfinder mission and other aspects of writing software for exploring other worlds.
DDJ: Presumably, writing the code was no different from any other real-time project.
GR: On the ballistic side, there is not much you can change. There is some uncertainty there, because you rely on information obtained 22 years ago. But in general we tried to make this task as simple as we could. The spacecraft was spin-stabilized, we used inertia... control was not that difficult. The challenge was to foresee what could go wrong and what we would do if something broke, because very little can be replaced. How do you test everything? How do you recreate the conditions the spacecraft will fly in? Those questions are much harder than developing the spacecraft itself.
DDJ: That's the eternal problem in developing complex systems: how to test under conditions that differ from the lab.
GR: Space is fairly forgiving. Little force is involved, and everything is just action and reaction. Passing through the atmosphere is another matter, but even there we can run enough simulations. Where we gave up was in calculating the impacts and the vehicle's orientation, and in simulating the accelerometer readings while it was moving. That was too hard. Instead, we tested the vehicle physically for impact as much as we could.
DDJ: Did you throw it off a tall building?
GR: We did strength tests. We ran a lot of airbag tests until we had proved to ourselves that the airbags could withstand the bounces on landing and would not deflate. We also had to confirm that they would hold the required pressure and that the fabric would not tear prematurely. On top of that, the airbags are retractable: cables pull them in.
We simulated cases where an airbag wrapped itself around a rock, to see how the retraction motors would behave and whether they would jam and leave the airbags flapping outside in the wind. That took us two or three years and was probably the most expensive part of our test program.
Determining orientation, opening the petals, and righting the vehicle are actually handled in a straightforward way: land the vehicle, read the accelerometers, and make sure the right petals open in the correct order.
In the end, the darn thing bounced off the surface 16 times and came to rest right on its base. All the airbags retracted, just as they were supposed to. On this mission, everything went according to the ideal scenario.
DDJ: I'd like to hear more about how the programming work fit into the project as a whole.
GR: During the mission, most of the navigation was done from Earth. The spacecraft's attitude was controlled by the onboard systems.
JPL used to design its own computers, and at one time we even made the processors ourselves. Not anymore: it's better to leave that to private companies. On-board computers are now off-the-shelf products.
Back then, it was only the second time JPL had decided to buy an on-board computer under contract. We required that a dedicated operating system be installed on the computer, so I wrote a lot of the specifications for the on-board computer.
I knew VRTX (Mentor Graphics) and VxWorks (Wind River Systems) very well, and I had also worked with pSOS (Integrated Systems). So I put together a specification for a completely generic real-time operating system.
IBM, which won the tender for the on-board computer, proposed OSOpen, developed by IBM's Raleigh branch in North Carolina. It was an early beta version. We evaluated it, and they still had work to do to finish it. Then, suddenly, around 1993-1994, IBM Federal Systems was sold to Loral. What had been one company became divisions of two large competing companies. It became obvious that we would not get a real-time operating system from the guys in Raleigh in time.
So we decided to go to Wind River and find out whether they could port their system to our RAD6000.
DDJ: The radiation-hardened version of IBM's RISC architecture?
GR: Exactly. We actually won Wind River over with the great publicity VxWorks would get if they ported it to our processor. They agreed and did the job fairly cheaply and quickly. Within six months we had a working version of VxWorks.
They embraced the idea and immediately assigned people to the job. They did everything very quickly, and there was no wrangling over the contract. When difficulties came up... I had a team of seven people developing the software (the rover was handled separately)... technical communication went directly between engineers, without a layer of contract managers in between. It was a good working relationship, and they did a good job.
DDJ: And when did you need to update the system on the fly?
GR: That's standard procedure. You always build in the ability to change the software, whatever the circumstances.
DDJ: Just in case.
GR: Just in case, yes. But also because JPL and several major companies have run into the problem of not being able to finish the software by launch. You should always make sure you've left yourself a way to change things.
DDJ: When you send something more than 30 million miles away, there's always a risk that something unforeseen will happen.
GR: Well, as I said, space itself is a benign environment, but on the surface of Mars and other places, you're right. What we'd really like is to build a spacecraft that is much smarter at handling unexpected problems without having to talk to mission control.
DDJ: A real robot! Were you an Asimov fan as a kid?
GR: You bet! I was both a kid and an Asimov fan.
DDJ: What did your role as "Flight Software Cognizant Engineer" involve during the Pathfinder mission?
GR: I was responsible for developing and operating the spacecraft's flight software. If anything had gone wrong, I would have been the one taking the heat.
DDJ: Software development management, the final frontier! How do you get the job done?
GR: In my opinion, this is why everything worked out so well on Pathfinder: we were focused. We had specific goals: launch the vehicle, deliver it to Mars, get it through the atmosphere, and put the rover on the surface. We managed to keep the whole team's attention on those goals.
In addition, JPL was caught up in Total Quality Management (TQM) at the time, so there was a lot of talk about empowerment, competence, and responsibility.

Some of us were, to put it mildly, skeptical of TQM, but we nodded and took it at its word: if we're given authority, we will really make decisions. The whole project worked that way. Management actually trusted the people working on the project.

You really had to take responsibility for the people you brought into the work.
We were not a big JPL project. At the same time, the Cassini project, which involved up to 3,000 people, was in its final phase. About 300 people worked on Pathfinder, so the team was very tight-knit.
All of this contributed to our success from a management standpoint. I would very much like to say that this was a lesson for JPL and that it began applying the experience on later projects, but I have to say the opposite. Many people considered Pathfinder a success, but what I hear is: "We don't want to do everything the same way next time."
You see, one reason for this is that JPL used to have huge projects, each of which needed substantial infrastructure of its own.

Now we have many small projects that share a common infrastructure, and that creates stronger interdependence between projects.

So the way of working we had on Pathfinder no longer fits at JPL. We were something like an isolated research department: semi-independent, with practically no oversight from management, and still delivering excellent results.
DDJ: Once space flight becomes routine, geniuses won't be needed anymore.
GR: That's still a long way off!
DDJ: But it's your job, year after year, to try to make it routine.
GR: That's right. And once we've achieved that and made sure the industry works, we'll move on to other advanced problems.
DDJ: Glenn, what are you doing right now?
GR: I'm working on one of many infrastructure-level tasks. JPL is developing avionics that will be radiation-hardened and able to survive at Europa, where the radiation environment is extremely harsh. We are testing new technologies and trying to solve the problems that arise at the one-megarad level.
A considerable amount of the work is done outside JPL under contract. We are trying to give spacecraft the most capable hardware available. Our latest choice is the 200 MHz PowerPC 750.
DDJ: You're sending a Macintosh into space!
GR: Nice, isn't it? The processor on Pathfinder ran at about 22 MHz, so that's roughly an order of magnitude faster. That's just phenomenal. JPL has always used technology that was 10 to 12 years old...
DDJ: ...while you wait for the radiation-hardened version.
GR: Right. In particular, I'm working on a project with the not-quite-accurate name "Mission Data System." It's really an institution-wide initiative to implement several things, and JPL has recently launched several cross-cutting projects of this kind. For example, we are trying to arrive at an architecture that could serve as the basis for spacecraft that are far more autonomous than today's.
This applies to software and to the latest developments in programming. I think JPL is gradually shifting its focus from hardware to software development. The computer systems we are working on are more like small local area networks. FireWire and I2C are used to connect the fast and the not-so-fast components.
DDJ: Is I2C still in use?!
GR: It isn't fast, but it's low-power, and that is often the deciding factor. For high volumes at high speed we have IEEE 1394 FireWire at 100 Mbps.
DDJ: So NASA programmers never run short of expensive toys?
GR: We try not to let that happen, and we focus on commercially available, off-the-shelf products.
Back in the Pathfinder days we said: "We'll use the VME bus and save a lot of money instead of developing every detail ourselves."
And we still adhere to this principle.
DDJ: We started talking about your current work in terms of preserving Pathfinder's best practices, given today's horizontal integration of small teams and the use of off-the-shelf parts. How are you coping?
GR: JPL is going through the painful process of optimizing the structure and interdependence of its teams. Switching from a vertical to a horizontal organizational structure is hard for any large organization.
DDJ: Is this purely an internal JPL process, or has NASA pushed you into it?
GR: For now it's mostly internal. But NASA is watching.
DDJ: You personally have now been swallowed up by management. Do you still get to write code, at least sometimes?
GR: Not lately... My title now is Chief Engineer of the Mission Data System project. It sounds as though I make decisions on technical issues, but mostly it's management. I resist as much as I can!
DDJ: Surely all our tech-savvy DDJ readers will cheerfully shout, "Don't go over to the Dark Side!"
GR: Somebody has to pull the plan together. That's the weakness that drags me over to the dark side.
DDJ: Yes, management is the final frontier. Software is now sold in shiny packaging. Figuring out how to divide work among people properly, how to sequence tasks, and how to finish on time...
GR: That's all secondary. The main thing is methodology. UML or not UML? Then the classic questions, like which languages to use and what kind of computers to buy... But some things we used to have to worry about are now a thing of the past. The 200 MHz PowerPC is about 200 times more powerful than Cassini's processors, so efficiency is fading into the background.
DDJ: So you just write "good old code" like everyone else?
GR: Well, that would be nice, wouldn't it? It really would be great if we could get closer to the cutting edge of software development. Alas, JPL is out of that game now, and we have to settle for affordable solutions.
DDJ: What about Linux?
GR: We haven't evaluated it for space flight. But many people here use it. Every developer has his own opinion about the one true way.
DDJ: JPL?GR: JPL () . Ada , - - , , , . Pathfinder .
- , - C++ Java… , — LISP'. , , — .
DDJ: , .GR: . Pathfinder 128 RAM. , . . , , , .
DDJ: ?
GR: , . — «Space Technology 4» ST4. , , , . .
. , 2003 . , , .
DDJ: ? ?GR: , , . - , . - , .
: « , ?» , .
NASA , , : ., — 200 . .
NASA , .
DDJ: . ?GR: , SpaceDev Inc., . , , , , . , , , , , …
DDJ: .GR: , . ? , , … .
DDJ: - , ?GR: : « , - , - ». , . , . Pathfinder — , .
1999, Dr. Dobb's Journal
Original article: A Conversation with Glenn Reeves