📜 ⬆️ ⬇️

SELECT ... WHERE queries in Cassandra 2.0 in CQL3

Cassandra (hereinafter C * ) limits WHERE queries because of its internal structure. This article will seem complicated, confusing, if you have not read the first article from the cycle, where I told you how C * works. Read it, please, before proceeding to this.

The purpose of this article is to serve as a reference for C * beginners.

Some differences CQL from SQL


In SELECT queries Cassandra Query Language ( CQL ) there are no familiar SQL operations JOIN , GROUP BY . And the WHERE operation is severely curtailed. In SQL, you can filter by any column, whereas in CQL only by partition keys ( partition keys ), cluster keys ( clustering columns ) and secondary indexes .
Note: In C * 2.0, you can create secondary INDEX in any column like SQL indexes. In fact, Cassandra's secondary indexes are an extra table hidden from you, so the performance of WHERE queries on them is worse than queries on key columns.


Disclaimer


 CREATE TABLE ad_click ( reseller_id text, day text, -- day in the format of 'YYYY-MM-DD' time timestamp, ad_id text, amount float, PRIMARY KEY ((reseller_id, day), time, ad_id) --   (reseller_id,day)    (time,ad_id) ) WITH CLUSTERING ORDER BY (time DESC); 

')

Filtering data using WHERE


Roughly speaking the word "filtering" is inappropriate here. Correct to say search. C * almost does not allow filtering.

Each WHERE request tells Cassandra to find the node where the string is stored and pass the request there.

Comparing columns with each other

Since there are no JOIN , it is impossible to compare columns with each other.
 SELECT * from ad_click WHERE maxTimeuuid(day) = maxTimeuuid(3141592653589); -- ; 


AND, OR

Multiple WHERE query conditions cannot be combined with the OR operator, only AND work. And only a few comparison operators work, and even then not always.

Equality =

The use of equality operators (=) is almost unlimited. It is limited to key and index columns. It is also necessary to compare all the previous key columns before comparing the next one.

Right:
 SELECT * FROM ad_click WHERE reseller_id = 'supaboobs' AND -- OK day = '2013-11-29' AND -- OK time = 3141592653589 AND -- OK ad_id = '890_567_234'; -- OK 

Wrong:
 SELECT * FROM ad_click WHERE reseller_id = 'supaboobs' AND day = '2013-11-29' AND amount = 0; -- !    . SELECT * FROM ad_click WHERE day = '2013-11-29' AND -- !    reseller_id. time = 3141592653589 AND ad_id = '890_567_234'; SELECT * FROM ad_click WHERE reseller_id = 'supaboobs' AND day = '2013-11-29' AND ad_id = '890_567_234'; -- !    time. 


Enable IN

The use of the inclusion operator (IN) is limited to the last column in the distribution key and the last column in the cluster key.

Right:
 SELECT * FROM ad_click WHERE reseller_id = 'supaboobs' AND day IN ('2013-11-28', '2013-11-29') AND -- OK time = 3141592653589 AND ad_id IN ('890_567_234', '890_567_010'); -- OK 

Wrong:
 SELECT * FROM ad_click WHERE reseller_id IN ('supaboobs') AND -- !      . day = '2013-11-28' AND time = 3141592653589 AND ad_id = '890_567_234'; SELECT * FROM ad_click WHERE reseller_id = 'supaboobs' AND day = '2013-11-28' AND time IN (3141592653589) AND -- !      . ad_id = ('890_567_234'); 


Comparison Operators =>> = <<=


Comparison Syntax

The column name must be to the left of the comparison operator, the value to the right.
Right:
 SELECT * FROM ad_click WHERE reseller_id = 'supaboobs' AND -- OK day = '2013-11-29'; -- OK 

Wrong:
 SELECT * FROM ad_click WHERE 'supaboobs' = reseller_id AND -- !    . '2013-11-29' = day AND -- !    . 3141592653589 < time; -- !    . 


Comparison Opportunities

They can be used in the last column of your CQL query, and the column should be exclusively cluster.

Right:
 SELECT * FROM ad_click WHERE reseller_id = 'supaboobs' AND day = '2013-11-29' AND time >= 3141592653589; -- OK SELECT * FROM ad_click WHERE reseller_id = 'supaboobs' AND day = '2013-11-29' AND time = 3141592653589 AND ad_id > '890_567_234'; -- OK 

Wrong:
 SELECT * FROM ad_click WHERE reseller_id = 'supaboobs' AND day = '2013-11-29' AND time >= 3141592653589 AND -- !     . ad_id = '890_567_234'; SELECT * FROM ad_click WHERE reseller_id = 'supaboobs' AND day = '2013-11-29' AND time >= 3141592653589 AND ad_id < '890_567_234'; -- !     . SELECT * FROM ad_click WHERE reseller_id = 'supaboobs' AND day = '2013-11-29' AND ad_id < '890_567_234'; -- !     time. SELECT * FROM ad_click WHERE reseller_id = 'supaboobs' AND day < '2013-11-29'; -- !    . 


Bypass Maneuver - ALLOW FILTERING

You can not specify a distribution key, and leave only the cluster key, putting at the end of the query ALLOW FILTERING . All other restrictions remain.
IMPORTANT! Timeout in this case is very likely, because you run over all the nodes, across all rows.


Right:
 SELECT * FROM ad_click WHERE time = 3141592653589 AND -- OK ad_id > '890_567_234' -- OK ALLOW FILTERING; SELECT * FROM ad_click WHERE time >= 3141592653589 AND -- OK time <= 3141592653589 -- OK ALLOW FILTERING; 

Wrong:
 SELECT * FROM ad_click WHERE time >= 3141592653589 AND ad_id > '890_567_234' -- !    . ALLOW FILTERING; SELECT * FROM ad_click WHERE time >= 3141592653589 AND time <= 3241592653589 AND ad_id = '890_567_234' -- !     . ALLOW FILTERING; 


Secondary indexes

Secondary indexes support only the equality operator, that's all. Requests for secondary indexes can be done with or without other keys.

To create a secondary index, execute the following command:
 CREATE INDEX on ad_click (amount); 


Right:
 SELECT * FROM ad_click WHERE amount = 0.0075; -- OK SELECT * FROM ad_click WHERE reseller_id = 'supaboobs' AND day = '2013-11-29' AND time = 3141592653589 AND ad_id = '890_567_234' AND amount = 0.0075; -- OK 

Wrong:
 SELECT * FROM ad_click WHERE reseller_id = 'supaboobs' AND day = '2013-11-29' AND time = 3141592653589 AND ad_id = '890_567_234' AND amount > 0.0; -- !       . 


Conclusion


I hope you will not have any more questions "why is my SELECT not working?"

Sources




Previous article of the cycle .

Source: https://habr.com/ru/post/205176/


All Articles