NULL Note

Hello!
Long thought, that would write useful about Orakl, tried a bunch of topics. Each time it turned out too long, because it was carried deep into the wilds. Therefore, I decided to start with the simplest topic to assess the interest of the audience and its relation to my style of presentation (IMHO, the writer from me is not very). A few notes:

The note applies to Oracle DBMS, but most of the written concerns other DBMS;
It did not work out nicely: the source tag lang = "sql" highlights too little, how to do something else not up to date. If anyone shows, quickly bring everything in a more readable form.

So.

Main provisions

The special value NULL means the absence of data, a statement of the fact that the value is unknown. By default, columns and variables of any type can accept this value, unless NOT NULL imposed on them. Also, the DBMS automatically adds the NOT NULL to the columns included in the primary key of the table.

The main feature of NULL is that it is not equal to anything, even to another NULL. You cannot compare any value with it using any operators: = , < , > , like ... Even the expression NULL != NULL will not be true, because it is impossible to unambiguously compare one uncertainty with another. By the way, this expression will not be false either, because when calculating conditions, Oracle is not limited to the and states. Due to the presence of an element of uncertainty in the form of NULL, there is another state - .

Thus, Oracle does not operate with two-digit, but three-valued logic. Grandpa Codd put this feature into his relational theory, and Oracle, being a relational DBMS, completely follows its precepts. In order not to meditate on the “strange” results of queries, the developer needs to know the truth table of three-valued logic. You can get acquainted with it, for example, on the English Wikipedia: Three-valued_logic .
')
For convenience, we will do a procedure that prints the state of the boolean parameter:

 create procedure test_bool( p_bool boolean ) is begin case when p_bool = true then dbms_output.put_line('TRUE'); when p_bool = false then dbms_output.put_line('FALSE'); else dbms_output.put_line('UNKNOWN'); end case; end test_bool;

and enable the option to print messages to the console:

 set serveroutput on

The usual comparison operators pass before NULL:

 exec test_bool( null = null ); -- UNKNOWN exec test_bool( null != null ); -- UNKNOWN exec test_bool( null = 'a' ); -- UNKNOWN exec test_bool( null != 'a' ); -- UNKNOWN

NULL Comparison

There are special operators IS NULL and IS NOT NULL that allow you to make comparisons with NULLs. IS NULL returns true if the operand is NULL and false if it is not.

 select case when null is null then 'YES' else 'NO' end from dual; -- YES select case when 'a' is null then 'YES' else 'NO' end from dual; -- NO

Accordingly, IS NOT NULL does the opposite: returns true if the value of the operand is different from NULL and false if it is NULL:

 select case when 'a' is NOT null then 'YES' else 'NO' end from dual; -- YES select case when null is NOT null then 'YES' else 'NO' end from dual; -- NO

In addition, there are a couple of exceptions to the rules regarding comparisons with missing values. First, it is a DECODE function that considers two NULLs equivalent to each other. Second, these are composite indices: if the two keys contain empty fields, but all their non-empty fields are equal, then Oracle considers these two keys to be equivalent.

DECODE goes against the system:

 select decode( null , 1, 'ONE' , null, 'EMPTY' --     , 'DEFAULT' ) from dual;

An example with composite indices is in the paragraph about indices.

Boolean operations and NULL

Usually, the condition treated the same as . For example, if you select rows from a table and the calculation of the condition x = NULL in the WHERE gave an result, then you will not get a single row. However, there is a difference: if the expression () returns true, then () returns . The logical operators AND and OR also have their own characteristics when processing an unknown state. The specifics in the example below.

In most cases, the unknown result is processed as :

 select 1 from dual where dummy = null; --

Denial of the unknown gives the unknown:

 exec test_bool( not(null = null) ); -- UNKNOWN exec test_bool( not(null != null) ); -- UNKNOWN exec test_bool( not(null = 'a') ); -- UNKNOWN exec test_bool( not(null != 'a') ); -- UNKNOWN

OR operator:

 exec test_bool( null or true ); -- TRUE <- !!!!! exec test_bool( null or false ); -- UNKNOWN exec test_bool( null or null ); -- UNKNOWN

AND operator:

 exec test_bool( null and true ); -- UNKNOWN exec test_bool( null and false ); -- FALSE <- !!!!! exec test_bool( null and null ); -- UNKNOWN

IN and NOT IN operators

To begin, we will do some preliminary actions. For tests, create a table T with one numeric column A and four lines: 1, 2, 3 and NULL

 create table t as select column_value a from table(sys.odcinumberlist(1,2,3,null));

Enable trace request (for this you must have the role PLUSTRACE ).
In the listings for tracing, only a part of the filter is left to show where the conditions specified in the request unfold.

 set autotrace on

Preliminary actions are finished, let's work with operators now. Let's try to select all the records that are included in the set (1, 2, NULL) :

 select * from t where a in(1,2,null); --  [1,2] -- Predicate Information: -- filter("A"=1 OR "A"=2 OR "A"=TO_NUMBER(NULL))

As you can see, the string with NULL is not selected. This happened because the calculation of the predicate "A"=TO_NUMBER(NULL) returned the state . In order to include NULLs in the query result, you will have to specify this explicitly:

 select * from t where a in(1,2) or a is null; --  [1,2,NULL] -- Predicate Information: -- filter("A" IS NULL OR "A"=1 OR "A"=2)

Let's try now with NOT IN :

 select * from t where a not in(1,2,null); -- no rows selected -- Predicate Information: -- filter("A"<>1 AND "A"<>2 AND "A"<>TO_NUMBER(NULL))

No records at all! Let's see why the troika did not hit the query results. Calculate manually the filter that the DBMS applied for the case A=3 :

  3 <> 1 AND 3 <> 2 AND 3 <> to_number (NULL)
    \ / \ / \ /
    true AND true AND unknown
        \ / /
         true AND unknown
             \ /
              unknown

Because of the peculiarities of the three-valued logic, NOT IN is not at all friendly with NULLs: as soon as NULL has fallen into the selection conditions, don't wait for the data.

NULL and empty string

Here, Oracle departs from the ANSI SQL standard and declares the equivalence of NULL and the empty string. This is perhaps one of the most controversial features, which from time to time gives rise to multi-page discussions with the transition to the individual, watering each other with feces and other indispensable attributes of tough disputes. Judging by the documentation, Oracle itself would not mind changing this situation (it says that even now the empty string is treated as NULL, this may change in future releases), but today such a huge amount of code is written under this DBMS that and change the behavior of the system is hardly realistic. Moreover, they started talking about this at least from the seventh version of the DBMS (1992-1996), and now it is the twelfth on the way.

NULL and empty string are equivalent:

 exec test_bool( '' is null ); -- TRUE

an indispensable attribute of a tough dispute:

 select count(*) comments_count, avg(c.vote) avg_vote from habr.comments c join habr.posts p on( c.post_id = p.id ) where lower(p.title) like '%%' and lower(p.title) like '%%'; COMMENTS_COUNT AVG_VOTE --------------- --------- 100500 -100

If you follow the testament of the classic and look at the root, then the reason for the equivalence of the empty string and NULL can be found in the storage format of varchar and NULLs inside data blocks. Oracle stores the rows of the table in a header structure followed by columns of data. Each column is represented by two fields: the length of the data in the column (1 or 3 bytes) and, in fact, the data itself. If varchar2 has zero length, then there is nothing to write to the data field, it does not occupy a single byte, and a special value of 0xFF , indicating the absence of data, is written in the field with length. NULL is presented exactly the same: there is no data field, and 0xFF is written in the field with length. Developers Orakla could, of course, to separate these two states, but since ancient times they have been led.

Personally, the equivalence of the empty string and NULL seems quite natural and logical. The name “empty line” implies the absence of meaning, emptiness, a donut hole. NULL, in general, means the same thing. But there is an unpleasant consequence: if one can say with confidence about an empty string that its length is zero, then the length of a NULL is not determined at all. Therefore, the expression length('') will return NULL for you, and not zero, as you obviously expected. Another problem: you can not compare with an empty string. The expression val = '' returns the state , since, in fact, equivalent to val = NULL .

The length of the empty string is not defined:

 select length('') from dual; -- NULL

A comparison with an empty string is impossible:

 exec test_bool( 'a' != '' ); -- UNKNOWN

Critics of the approach proposed by Orakl suggest that the empty string does not necessarily indicate uncertainty. For example, a sales manager fills a customer card. He may indicate his contact phone number (555-123456), may indicate that he is unknown (NULL), or may indicate that there is no contact phone number (blank line). With storage of empty lines to implement the last variant will be problem. From the point of view of semantics, the argument is correct, but I always have a question for him that I never received a full answer: how does the manager enter an empty line in the “phone” field and how will he further distinguish it from NULL? There are options, of course, but still ...

Actually, if we talk about PL / SQL, then somewhere deep inside its engine there is an empty string and NULL are different. One way to see this is because associative collections allow you to save an element with an index of '' (empty string), but do not allow you to save an element with an index of NULL:

 declare procedure empty_or_null( p_val varchar2 ) is type tt is table of varchar2(1) index by varchar2(10); t tt; begin if p_val is not null then dbms_output.put_line(' '); else --      p_val t(p_val) := 'x'; -- ! ,   dbms_output.put_line(' '); end if; exception --     p_val  . , NULL when others then dbms_output.put_line('NULL'); end; begin empty_or_null( 'qwe' ); --   empty_or_null( '' ); --   empty_or_null( NULL ); -- NULL end;

Use these feints ears in practice is not worth it. To avoid problems, it is better to learn the rule from the docks: the empty string and the NULL in the Oracle are indistinguishable.

NULL math

^{This small paragraph was written on a Friday evening for beer, against the backdrop of the Friday REN-TVshnogo film.} ^{Rewrite his laziness, I'm sorry.}

Task. Before the marriage with Kolya, Masha had an unknown number of lovers. Kolya knows that after marriage, Masha had sex with him, Sasha and Vitya. Help Kolya find the exact number of Masha's lovers.

Obviously, we will not be able to help Kolya: an unknown number of Masha's lovers before marriage reduces all calculations to one value - is unknown. Oracle, although it was called an oracle, in this matter goes no further than the participants in the battle of psychics: it gives obvious answers only to obvious questions. Although, I must admit that Oracle is much more honest: in the case of Kolya, he will not be engaged in psychoanalysis and will immediately say: “I don’t know”:

 select decode( null + 10, null, '', '') a from dual; --  select decode( null * 10, null, '', '') a from dual; --  select decode( abs(null), null, '', '') a from dual; --  select decode( sign(null), null, '', '') a from dual; --

With concatenation, things are different: you can add NULL to the string and this will not change it. Such is the policy of double standards.

 select null ||'AA'|| null ||'BB'|| null from dual; -- AABB

NULL and aggregate functions

Almost all aggregate functions, with the exception of COUNT (and even then not always), ignore null values during calculations. If they did not do this, then the first null that has flown in would have led to the result of the function to an unknown value. Take for example the SUM function, which needs to sum the series (1, 3, null, 2) . If it took null values, we would get the following sequence of actions:
1 + 3 = 4; 4 + null = null; null + 2 = null 1 + 3 = 4; 4 + null = null; null + 2 = null .
It is unlikely that you will be satisfied with this calculation when calculating the aggregates, because you probably didn’t want to get this. And what would be hemorrhoids with the construction of data warehouses ... Brrrrr ...

Table with data. Used below many times:

 create table agg( id int, n int ); insert into agg values( 1, 1 ); insert into agg values( 2, 3 ); insert into agg values( 3, null ); insert into agg values( 4, 2 ); commit;

Empty values are ignored by aggregates:

 select sum(n) from agg; -- 6

The COUNT counting function, if used as a COUNT(*) or COUNT() , will take null values into account. However, if it is used in the form COUNT() , then empty values will be ignored.

with constant:

 select count(*) from agg; -- 4 select count(1+1) from agg; -- 4 select count(user) from agg; -- 4

With the expression:

 select count(n) from agg; -- 3 select count(id) from agg; -- 4 select count(abs(n)) from agg; -- 3

Also, be careful with features like AVG . Since it ignores empty values, the result for the N field will be (1+3+2)/3 , and not (1+3+2)/4 . Perhaps this calculation of the average you do not need. To solve such problems, there is a standard solution - use the NVL function:

 select avg(n) from agg; -- (1 + 3 + 2) / 3 = 2 select avg(nvl(n,0)) from agg; -- (1 + 3 + 0 + 2) / 4 = 1.5

Aggregate functions return the state , if they are applied to an empty dataset, or if it consists only of NULLs. The exceptions are the REGR_COUNT and COUNT() functions intended for counting the number of rows. They in the above cases will return zero.

Data set from NULLs only:

 select sum(n) from agg where n is null; --  select avg(n) from agg where n is null; --  select regr_count(n,n) from agg where n is null; -- 0 select count(n) from agg where n is null; -- 0

Empty data set:

 select sum(n) from agg where 1 = 0; --  select avg(n) from agg where 1 = 0; --  select regr_count(n,n) from agg where 1 = 0; -- 0 select count(n) from agg where 1 = 0; -- 0

NULL in OLAP

Very briefly about another feature associated with the aggregates. In multidimensional cubes, NULL in a query result can mean both a lack of data and a sign of grouping by dimension. The most disgusting thing is that you cannot distinguish between his two incarnations. Fortunately, there are special functions GROUPING and GROUPING_ID , which have a sharper eye. GROUPING() will return one if NULL in the dimension column indicates the grouping attribute on this column and zero if it contains a specific value (in particular, NULL). The GROUPING_ID function is a bit vector from GROUPING , in this post it is definitely superfluous.

In general, such a brief and confused information about dualism NULL in multidimensional analysis. Below is an example of using GROUPING , and for details on using the Data Warehousing Guide, chapter 21 .

Convenient feature sqlplus: when outputting data, replaces NULL with the specified string:

 set null [NULL]

Check NULL dualism in a multidimensional cube:

 with t as ( --   1  select 'IVAN' customer, 'KEFIR' product, 1 qty from dual union all --    2  select NULL customer, 'MOLOKO' product, 2 qty from dual union all --   2  select 'IVAN' customer, 'MOLOKO' product, 2 qty from dual ) select customer cust, grouping(customer) grp_c , product prod, grouping(product) grp_p , sum(qty) qty from t group by cube(customer, product) order by grp_c, grp_p; CUST GRP_C PROD GRP_P QTY ---- ----- ------ ----- ---- [NULL] 0 MOLOKO 0 2 --    2   IVAN 0 KEFIR 0 1 --   1   IVAN 0 MOLOKO 0 2 --   2   IVAN 0 [NULL] 1 3 --   3    [NULL] 0 [NULL] 1 2 --    2    [NULL] 1 MOLOKO 0 4 --    4   [NULL] 1 KEFIR 0 1 --    1   [NULL] 1 [NULL] 1 5 --    5

What's left overs

It turned out quite long, so some of the information had to be cut out. Namely:

NULL in indexes (regular, compound, binary, functional). There I have more listings than text;
analysis of functions for working with NULLs: documented and not so much;
work with NULLs in queries (all sorts of cases)
maybe I forgot to mention some other moments, write in the comments, add.

Criticism, additions and hints of the incompetence of the author (with reference to the dock) are welcome. Actually, for this purpose everything was started.

Source: https://habr.com/ru/post/127327/

All Articles