
Translation of Stephen Wolfram's post "The Practical Business of Ontology: A Tale from the Front Lines".
Chemical Philosophy
“We just have to decide: is a chemical more like a city or more like a number?” I spent yesterday, like most days of the past 30 years, designing new features of the Wolfram Language. And yesterday afternoon one of my meetings turned into a lively discussion about how to extend the language's chemistry capabilities.
At some level, the problem we were discussing was inherently practical. But as so often happens, what we are doing ultimately connects to some deep intellectual issues. And to actually get the right answer, and to design language features that stand the test of time, we needed to plumb those depths and talk about things that are not usually discussed outside of a philosophy seminar.
Part of the problem, of course, is that we are dealing with things that have never really come up before. Traditional computer languages do not try to talk directly about things like chemicals; they just deal with abstract data. But in the Wolfram Language we try to build in as much knowledge about everything as possible, which means we have to deal with actual things in the world, such as chemicals.
We have built a whole system in the Wolfram Language for handling what we call entities. An entity might be a city (for example, New York), or a film, or a planet, or a zillion other things. An entity has a name ("New York"). And it has definite properties (for example, population, area, founding date, ...).
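As a rough illustration of the idea just described (a named thing of some type with a table of properties), here is a minimal Python sketch. The `Entity` class and its field names are hypothetical and are not the Wolfram Language's own representation; unknown property values are deliberately left as `None` rather than filled in.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Entity:
    """A named thing of some type, carrying a table of properties (hypothetical)."""
    entity_type: str                                   # e.g. "City", "Planet", "Chemical"
    name: str                                          # human-readable name
    properties: Dict[str, Any] = field(default_factory=dict)

    def get(self, prop: str) -> Any:
        """Look up a property; None means 'not recorded' in this sketch."""
        return self.properties.get(prop)


# Property values we do not want to assert here are left as None.
new_york = Entity("City", "New York",
                  {"country": "United States", "population": None, "area": None})

print(new_york.name, "->", new_york.get("country"))
```

The point of the design is that callers treat the entity as a single opaque thing and only ask it for properties; they never need to look inside its structure.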
We have long had a notion of chemical entities, such as water, ethanol or tungsten carbide. Each of these chemical entities has properties such as molecular weight, structure graph or boiling point.
And we have many hundreds of thousands of chemicals for which we know lots of properties. But all of these are in a sense concrete chemicals: specific compounds that we could put in a test tube and do experiments on.
But what we were trying to figure out yesterday was how to handle abstract chemicals: chemicals that we construct abstractly, say, by giving abstract graphs that represent their chemical structures. Should these be represented by entities, like water or New York? Or should they be considered more abstract, like lists of numbers or, for that matter, mathematical graphs?
Well, of course, among the abstract chemicals we can construct are chemicals that we already represent as entities, such as sucrose or aspirin. But there is an important distinction to make. Are we talking about individual molecules of sucrose or aspirin? Or about these things as bulk materials?
At some level, this is a confusing distinction. Because we might think that once we know the molecular structure, we know everything, and the rest is just a matter of calculation. And some properties, such as molar mass, are basically trivial to calculate from the molecular structure. But others, such as melting point, are very far from trivial.
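To make the "trivial" side of that contrast concrete, here is a minimal Python sketch of computing molar mass from nothing but the atom counts of a molecule. The function name and the data layout are assumptions made for illustration; the atomic weights are standard values rounded to three decimals. No comparably short function exists for the melting point.

```python
# Standard atomic weights in g/mol, rounded to three decimals.
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "O": 15.999}


def molar_mass(atom_counts):
    """Sum atomic weights over the atoms in one molecule (hypothetical helper)."""
    return sum(ATOMIC_WEIGHT[element] * count
               for element, count in atom_counts.items())


water = {"H": 2, "O": 1}             # H2O
ethanol = {"C": 2, "H": 6, "O": 1}   # C2H6O

print(round(molar_mass(water), 3))    # 18.015
print(round(molar_mass(ethanol), 3))  # 46.069
```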
Well, but is this just a temporary problem that one should not base a long-term language design on? Or is it something more fundamental that will never change? Conveniently, I happen to have done enough basic science to know the answer: yes, it is something fundamental. It is connected to what I call computational irreducibility. For example, the precise value of the melting point of an infinite amount of some material may actually be fundamentally uncomputable. (This is related to the undecidability of the tiling problem; fitting tiles together is a bit like the way molecules fit together to make up a solid.)
So, knowing this piece of (rather advanced) basic science, we know that we can meaningfully distinguish between bulk versions of chemicals and individual molecules. Clearly there is a close relation between, say, water molecules and bulk water. But there is still something fundamentally and irreducibly different about them, and about their properties.
At least the atoms should be fine
Well, let's talk about individual molecules. Obviously, they are made of atoms. And at least when we talk about atoms, we seem to be on fairly solid ground. It is reasonable to say that any given molecule always has a definite collection of atoms in it, though perhaps we will want to consider “parameterized molecules” when we talk about polymers and the like.
But at least it seems safe to consider types of atoms as entities. After all, each type of atom corresponds to a chemical element, and there are only a limited number of those in the periodic table. Of course, in principle, one can imagine additional “chemical elements”, and one can even think of a neutron star as a giant atomic nucleus. But again, there is a reasonable distinction to make: there are almost certainly only a limited number of fundamentally stable types of atoms, and most of the others have extremely short lifetimes.
There is an immediate footnote, however. A “chemical element” is not quite as well defined as one might imagine. Because it is always a mixture of different isotopes. And, say, from one tungsten mine to another, this mixture can change, giving a different effective atomic mass.
And in fact this is a good reason to represent types of atoms by entities. Because then one just needs a single entity representing tungsten, which can be used when talking about molecules. And only if one wants to get properties of that type of atom that depend on conditions, such as which mine it came from, does one have to deal with such things.
In a few cases (for example, heavy water) one needs to talk explicitly about isotopes in what is essentially a chemical context. But in most cases it is enough to specify the chemical element.
To specify a chemical element, you just need to give its atomic number Z. And textbooks will tell you that to specify a particular isotope, you just need to say how many neutrons it contains. But that ignores the unexpected case of tantalum. Because one of the naturally occurring forms of tantalum (180mTa) is actually an excited state of the tantalum nucleus that happens to be very stable. And to specify it properly, you have to give its excitation level as well as its number of neutrons.
In a sense, quantum mechanics saves us here. Because while there are infinitely many possible excited states of a nucleus, quantum mechanics says that all of them can be characterized by just two discrete values: spin and parity.
Every isotope and every excited state is different and has its own particular properties. But the world of possible isotopes is much more orderly than, say, the world of possible animals. Because quantum mechanics says that everything in the world of isotopes can be characterized by a limited set of discrete quantum numbers.
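The tantalum case can be sketched as a small record type: proton count and neutron count alone do not separate ground-state tantalum-180 from 180mTa, so a third discrete field is needed. The `Nuclide` class below is hypothetical, and using a simple isomer index (rather than spin and parity) for the excitation level is a simplifying assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nuclide:
    """A nuclear species identified by discrete numbers (hypothetical sketch)."""
    z: int           # atomic number (number of protons)
    n: int           # number of neutrons
    isomer: int = 0  # 0 = ground state, 1 = first metastable state, ...

    @property
    def mass_number(self) -> int:
        return self.z + self.n

    def label(self, symbol: str) -> str:
        """Conventional label such as '180Ta' or '180mTa'."""
        if self.isomer == 0:
            marker = ""
        elif self.isomer == 1:
            marker = "m"
        else:
            marker = f"m{self.isomer}"
        return f"{self.mass_number}{marker}{symbol}"


# Tantalum has Z = 73; mass number 180 therefore means 107 neutrons.
ta180 = Nuclide(z=73, n=107)
ta180m = Nuclide(z=73, n=107, isomer=1)

print(ta180.label("Ta"), ta180m.label("Ta"))  # 180Ta 180mTa
print(ta180 == ta180m)                        # False
```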
We have gone from molecules to atoms to nuclei, so why not talk about elementary particles too? Well, that gets messier. Yes, there are well-known particles, such as electrons and protons, that are fairly easy to talk about and are readily represented by entities in the Wolfram Language. But there are many other particles. Some of them, just like nuclei, are fairly easy to characterize. You can say things like “this is a particular excited state of a c-quark-anti-c-quark system” or some such. But in particle physics one is dealing with quantum field theory, not just quantum mechanics. And one cannot just “count elementary particles”; one also has to deal with the possibility of virtual particles and so on. And in the end, the question of which particles can exist is very complicated and full of computational irreducibility. (For example, what stable states the gluon field can have is a much more elaborate version of something like the tiling problem I mentioned in connection with melting points.)
Perhaps one day we will have a complete theory of fundamental physics. And maybe it will even be simple. But exciting as that will be, it will not help us here. Because computational irreducibility means there is an irreducible distance between what is underneath and the phenomena that emerge from it.
And in creating a language to describe the world, we need to talk in terms of things that can actually be observed and computed with. We need to pay attention to the basic physics, not least so that we avoid setups that will lead to confusion later. But we also need to pay attention to the actual history of science and the actual things that have been measured. Yes, there are, for example, infinitely many possible isotopes. But for a great many purposes it is perfectly useful just to set up entities for the ones that are known.
Space of possible chemicals
But is it the same in chemistry? In nuclear physics, we think we know all the reasonably stable isotopes that exist, so any additional, exotic ones will be very short-lived and therefore probably not important in practical nuclear processes. But chemistry is a different story. There are tens of millions of chemicals that people have studied (and, for example, put into papers or patents). And there is really no limit on the molecules one might want to consider, and that might be useful.
But, OK, so how can we refer to all these potential molecules? Well, to a first approximation, we can specify their chemical structures by giving graphs in which every node is an atom and every edge is a bond.
What really is a “bond”? While it is incredibly useful in practical chemistry, it is at some level a fuzzy concept: a kind of semiclassical approximation to a full quantum mechanical story. There are some standard extra bits: double bonds, ionization states and so on. But in practice, chemistry is very successfully done just by characterizing molecular structures by graphs with appropriately labeled atoms and bonds.
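A labeled molecular graph of the kind described above can be written down very directly. The following Python sketch uses plain dictionaries: atoms are numbered nodes labeled with an element symbol, and bonds are edges labeled with a bond order. The layout and helper names are assumptions chosen for illustration, not any particular library's format.

```python
from collections import Counter

# Carbon dioxide, O=C=O: node index -> element symbol.
co2_atoms = {0: "O", 1: "C", 2: "O"}
# Undirected edges (i, j) -> bond order; 2 means a double bond.
co2_bonds = {(0, 1): 2, (1, 2): 2}


def neighbors(bonds, atom):
    """Indices of atoms bonded to the given atom, in ascending order."""
    found = set()
    for i, j in bonds:
        if i == atom:
            found.add(j)
        elif j == atom:
            found.add(i)
    return sorted(found)


def element_counts(atoms):
    """How many atoms of each element the graph contains."""
    return Counter(atoms.values())


print(neighbors(co2_bonds, 1))         # [0, 2]
print(dict(element_counts(co2_atoms)))
```

Because the structure is ordinary data, it is easy to "go inside" it: walk the bonds, count atoms, or swap one atom for another.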
OK, but should chemicals be represented by entities or by abstract graphs? If it is a chemical one has already heard of, such as carbon dioxide, an entity seems convenient. But what if it is a new chemical that has never been discussed before? Well, one could think about inventing a new entity to represent it.
Any self-respecting entity, though, should have a name. So what would the name be? Well, in the Wolfram Language, it could just be the graph that represents the structure. But perhaps one would like something that looks more like an ordinary textual name: a string. There is always the IUPAC way of naming chemicals, with names like 1,1′-{[3-(dimethylamino)propyl]imino}bis-2-propanol. There is also the more computer-friendly SMILES version: CC(CN(CCCN(C)C)CC(C)O)O. And whatever the underlying graph is, one can always generate one of these strings to represent it.
There is an immediate problem, though: the string is not unique. In fact, however one chooses to write down the graph, it cannot always be unique. A particular chemical structure corresponds to a particular graph. But there can be many ways to draw the graph, and many different representations of it. And in fact even the problem of determining whether two representations correspond to the same graph (“graph isomorphism”) can be difficult to solve.
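The non-uniqueness is easy to see on a tiny example: the heavy atoms of ethanol can be listed as C, C, O or as O, C, C (as in the SMILES strings `CCO` and `OCC`), and both describe the same graph. The sketch below checks this by brute force over all atom renumberings. It is an illustration only: it ignores bond orders and hydrogens, the function name is hypothetical, and trying every permutation is exactly the approach that stops being feasible as molecules grow.

```python
from itertools import permutations


def same_structure(atoms_a, bonds_a, atoms_b, bonds_b):
    """Brute-force check that two atom-labeled graphs are isomorphic.

    atoms_*: list of element symbols; bonds_*: list of (i, j) index pairs.
    Only sensible for very small graphs (cost grows factorially).
    """
    if len(atoms_a) != len(atoms_b) or len(bonds_a) != len(bonds_b):
        return False
    edges_b = {frozenset(edge) for edge in bonds_b}
    n = len(atoms_a)
    for perm in permutations(range(n)):
        # perm[i] is the atom of B that atom i of A is matched with.
        if any(atoms_a[i] != atoms_b[perm[i]] for i in range(n)):
            continue
        if all(frozenset((perm[i], perm[j])) in edges_b for i, j in bonds_a):
            return True
    return False


cco = (["C", "C", "O"], [(0, 1), (1, 2)])  # ethanol, written C-C-O
occ = (["O", "C", "C"], [(0, 1), (1, 2)])  # ethanol, written O-C-C
coc = (["C", "O", "C"], [(0, 1), (1, 2)])  # dimethyl ether, C-O-C

print(same_structure(*cco, *occ))  # True: same molecule, different write-up
print(same_structure(*cco, *coc))  # False: same atoms, different connectivity
```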
What is a chemical in the end?
OK, so let's imagine that we represent a chemical structure by a graph. At first, that is an abstract thing. There are atoms as nodes in the graph, but we do not know how they would be arranged in an actual molecule (and, for example, how many angstroms apart they would be). Of course, the answer is not completely well defined. Are we talking about the lowest-energy configuration of the molecule? (What if there are several configurations with the same energy?) Is the molecule supposed to be on its own, or in water, or in something else? How was the molecule supposed to have been made? (Maybe it is a protein that folded in a particular way as it came off the ribosome.)
Well, if we just had an entity representing, say, “naturally occurring hemoglobin”, maybe we would be better off. Because in a sense that entity could encapsulate all these details.
But if we want to talk about chemicals that have never actually been synthesized, it is a bit of a different story. And it seems to me that we would be better off with an abstract representation of any possible chemical.
But let's talk about some other cases and analogies. Maybe we should just treat everything as an entity. For example, every integer could be an entity. Yes, there are infinitely many of them. But at least it is clear what names they should be given. With real numbers, things are already messier. For example, there is no longer the kind of uniqueness that integers have: 0.99999... is really the same as 1.00000..., but it is written differently.
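For readers who want the reason the two decimal expansions name the same real number, the standard geometric-series argument is short:

```latex
0.999\ldots \;=\; \sum_{k=1}^{\infty} \frac{9}{10^{k}}
\;=\; 9 \cdot \frac{1/10}{1 - 1/10}
\;=\; 9 \cdot \frac{1}{9}
\;=\; 1
```

So a decimal string is a representation of a number, not a unique name for it, which is the same situation as a string written down for a chemical graph.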
What about sequences of integers or, for that matter, mathematical formulas? Well, every possible sequence or every possible formula could conceivably be a separate entity. But this would not be particularly useful, because much of what one wants to do with sequences or formulas is to go inside them and transform their structure. And what is convenient about entities is that each one is just a “single thing” that one does not have to “go inside”.
So what's the story with “abstract chemicals”? It is going to be a mixture. But certainly one will want to “go inside” and transform the structure. That argues for representing the chemical by a graph.
But then there is potentially a nasty discontinuity. We have the entity for carbon dioxide, about which we already know lots of properties. And then we have this graph that abstractly represents the carbon dioxide molecule.
We might worry that this would be confusing to both people and programs. But the first thing to realize is that we can distinguish what these two things represent. The entity represents the naturally occurring version of the chemical, whose properties can potentially be measured. The graph represents an abstract, theoretical chemical, whose properties would have to be computed.
But obviously there has to be a bridge. Given a concrete chemical entity, one of its properties will be the graph that represents the structure of the molecule. And given a graph, we need some kind of ChemicalIdentify function that, a bit like GeoIdentify or maybe ImageIdentify, tries to identify from the graph which chemical entity (if any) has a molecular structure corresponding to that graph.
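One way such a bridge could work, sketched in Python under loud assumptions: reduce every graph to a canonical key that does not depend on how its atoms happen to be numbered, then look that key up in a table of known chemicals. The function `chemical_identify`, the table, and the brute-force canonicalization are all hypothetical illustrations; this is not how any ChemicalIdentify function in the Wolfram Language is implemented.

```python
from itertools import permutations


def canonical_key(atoms, bonds):
    """Smallest (labels, edges) description over all atom numberings.

    atoms: list of element symbols; bonds: dict {(i, j): bond_order}.
    Brute force, so only practical for very small molecules.
    """
    n = len(atoms)
    best = None
    for perm in permutations(range(n)):
        # perm[old] is the new index assigned to atom `old`.
        labels = [None] * n
        for old, new in enumerate(perm):
            labels[new] = atoms[old]
        edges = sorted(tuple(sorted((perm[i], perm[j]))) + (order,)
                       for (i, j), order in bonds.items())
        key = (tuple(labels), tuple(edges))
        if best is None or key < best:
            best = key
    return best


# A toy table of known chemicals, keyed by canonical structure.
KNOWN_CHEMICALS = {
    canonical_key(["O", "C", "O"], {(0, 1): 2, (1, 2): 2}): "carbon dioxide",
    canonical_key(["H", "O", "H"], {(0, 1): 1, (1, 2): 1}): "water",
}


def chemical_identify(atoms, bonds):
    """Return the name of the matching known chemical, or None."""
    return KNOWN_CHEMICALS.get(canonical_key(atoms, bonds))


# Carbon dioxide again, but with the carbon atom listed first.
print(chemical_identify(["C", "O", "O"], {(0, 1): 2, (0, 2): 2}))  # carbon dioxide
print(chemical_identify(["O", "O"], {(0, 1): 2}))                  # None
```

Returning `None` for an unrecognized graph mirrors the "(if any)" in the text: an abstract structure is allowed to have no corresponding known chemical.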
Philosophy meets chemistry meets math meets physics ...
As I write out some of the issues, I realize how complicated all this may seem. And, yes, it is complicated. But in our meeting yesterday, it all went very quickly. Of course, it helps that everyone there had seen similar issues before: this is the kind of thing that lies at the core of what we do. But each case is different.
And somehow this case got a bit deeper and more philosophical than usual. “Let's talk about naming stars,” someone said. Obviously there are nearby stars that have explicit names. And some other stars may have been identified in large-scale sky surveys and given identifiers of some kind. But there are lots of stars in distant galaxies that will never be named. So how should we represent those?
That led to talking about cities. Yes, there are definite, chartered cities that have been officially assigned names, and we probably have essentially all of them in the Wolfram Language, updated regularly. But what about some village set up for a single season by some nomadic people? How should we represent that? Well, it has a certain location, at least for a while. But is it even a definite single thing, or might it later split into two villages, or cease to exist at all?
One can argue almost endlessly about the identity, and even the existence, of many of these things. But in the end it is not the philosophy of such things that we are interested in: we are trying to build software that people will find useful. And so what matters in the end is what is going to be useful.
Now, of course, that is not something one can know precisely. But it is like language design in general: think of everything people might want to do, and then see how to set up primitives that will let people do those things. Will people want to represent some chemicals by entities? Yes, that is useful. Will people want to represent arbitrary chemical structures by graphs? Yes, that is useful.
But to see what to actually do, one has to understand quite deeply what is really being represented in each case, and how everything is related. And that is where philosophy has to meet chemistry, math, physics and so on.
I am happy to say that by the end of our hour-long meeting (informed by my 40 years of experience and the 100 years of combined experience of everyone at the meeting), I think we had worked out the essence of a really good way to handle chemicals and chemical structures. It will be a while before it is all fully worked out and implemented in the Wolfram Language. But the ideas will help inform how we compute and reason about chemistry for many years to come. And for me, figuring out things like this is an extremely satisfying way to spend my time. And I am just glad that in my long-running effort to develop the Wolfram Language, I get to do so much of it.