WHERE
queries because of its internal structure. This article will seem complicated, confusing, if you have not read the first article from the cycle, where I told you how C * works. Read it, please, before proceeding to this.SELECT
queries Cassandra Query Language ( CQL ) there are no familiar SQL operations JOIN
, GROUP BY
. And the WHERE
operation is severely curtailed. In SQL, you can filter by any column, whereas in CQL only by partition keys ( partition keys ), cluster keys ( clustering columns ) and secondary indexes .Note: In C * 2.0, you can create secondaryINDEX
in any column like SQL indexes. In fact, Cassandra's secondary indexes are an extra table hidden from you, so the performance ofWHERE
queries on them is worse than queries on key columns.
ad_click
table created in the previous , second article. It also explains the terminology used in this article. I also recommend at least a look. CREATE TABLE ad_click ( reseller_id text, day text, -- day in the format of 'YYYY-MM-DD' time timestamp, ad_id text, amount float, PRIMARY KEY ((reseller_id, day), time, ad_id) -- (reseller_id,day) (time,ad_id) ) WITH CLUSTERING ORDER BY (time DESC);
WHERE
request tells Cassandra to find the node where the string is stored and pass the request there.JOIN
, it is impossible to compare columns with each other. SELECT * from ad_click WHERE maxTimeuuid(day) = maxTimeuuid(3141592653589); -- ;
WHERE
query conditions cannot be combined with the OR
operator, only AND
work. And only a few comparison operators work, and even then not always. SELECT * FROM ad_click WHERE reseller_id = 'supaboobs' AND -- OK day = '2013-11-29' AND -- OK time = 3141592653589 AND -- OK ad_id = '890_567_234'; -- OK
SELECT * FROM ad_click WHERE reseller_id = 'supaboobs' AND day = '2013-11-29' AND amount = 0; -- ! . SELECT * FROM ad_click WHERE day = '2013-11-29' AND -- ! reseller_id. time = 3141592653589 AND ad_id = '890_567_234'; SELECT * FROM ad_click WHERE reseller_id = 'supaboobs' AND day = '2013-11-29' AND ad_id = '890_567_234'; -- ! time.
SELECT * FROM ad_click WHERE reseller_id = 'supaboobs' AND day IN ('2013-11-28', '2013-11-29') AND -- OK time = 3141592653589 AND ad_id IN ('890_567_234', '890_567_010'); -- OK
SELECT * FROM ad_click WHERE reseller_id IN ('supaboobs') AND -- ! . day = '2013-11-28' AND time = 3141592653589 AND ad_id = '890_567_234'; SELECT * FROM ad_click WHERE reseller_id = 'supaboobs' AND day = '2013-11-28' AND time IN (3141592653589) AND -- ! . ad_id = ('890_567_234');
SELECT * FROM ad_click WHERE reseller_id = 'supaboobs' AND -- OK day = '2013-11-29'; -- OK
SELECT * FROM ad_click WHERE 'supaboobs' = reseller_id AND -- ! . '2013-11-29' = day AND -- ! . 3141592653589 < time; -- ! .
SELECT * FROM ad_click WHERE reseller_id = 'supaboobs' AND day = '2013-11-29' AND time >= 3141592653589; -- OK SELECT * FROM ad_click WHERE reseller_id = 'supaboobs' AND day = '2013-11-29' AND time = 3141592653589 AND ad_id > '890_567_234'; -- OK
SELECT * FROM ad_click WHERE reseller_id = 'supaboobs' AND day = '2013-11-29' AND time >= 3141592653589 AND -- ! . ad_id = '890_567_234'; SELECT * FROM ad_click WHERE reseller_id = 'supaboobs' AND day = '2013-11-29' AND time >= 3141592653589 AND ad_id < '890_567_234'; -- ! . SELECT * FROM ad_click WHERE reseller_id = 'supaboobs' AND day = '2013-11-29' AND ad_id < '890_567_234'; -- ! time. SELECT * FROM ad_click WHERE reseller_id = 'supaboobs' AND day < '2013-11-29'; -- ! .
ALLOW FILTERING
. All other restrictions remain.IMPORTANT! Timeout in this case is very likely, because you run over all the nodes, across all rows.
SELECT * FROM ad_click WHERE time = 3141592653589 AND -- OK ad_id > '890_567_234' -- OK ALLOW FILTERING; SELECT * FROM ad_click WHERE time >= 3141592653589 AND -- OK time <= 3141592653589 -- OK ALLOW FILTERING;
SELECT * FROM ad_click WHERE time >= 3141592653589 AND ad_id > '890_567_234' -- ! . ALLOW FILTERING; SELECT * FROM ad_click WHERE time >= 3141592653589 AND time <= 3241592653589 AND ad_id = '890_567_234' -- ! . ALLOW FILTERING;
CREATE INDEX on ad_click (amount);
SELECT * FROM ad_click WHERE amount = 0.0075; -- OK SELECT * FROM ad_click WHERE reseller_id = 'supaboobs' AND day = '2013-11-29' AND time = 3141592653589 AND ad_id = '890_567_234' AND amount = 0.0075; -- OK
SELECT * FROM ad_click WHERE reseller_id = 'supaboobs' AND day = '2013-11-29' AND time = 3141592653589 AND ad_id = '890_567_234' AND amount > 0.0; -- ! .
Previous article of the cycle .
Source: https://habr.com/ru/post/205176/
All Articles