Hello! Today there was an interesting problem on data sampling, the solution of which I decided to share.
So, given:
Two tables -
shops and
products')
Roughly speaking - a platform where different stores place their goods.
And so, there was a need to make the issuance of goods on the main page, but so that the user did not see a bunch of goods from one store. Shops need to alternate.
shops:- id int not null auto_increment primary key,
- name varchar (255) null
products:- id int not null auto_increment primary key,
- shop_id int not null,
- name varchar (255) null,
A little googling - no sensible solution was found. But there was a thought how to implement the sample with alternating stores.
Initially I will describe the algorithm:
- You must select all products and sort them by store.
- Then number each product, starting with 1 with an interval equal to the number of stores.
- When numbering, as soon as the goods of one store end - the numbering is reset to zero, shifted by one, and starts again
- Select products by sorting them by numbered field.
And all this using MySQL. And preferably one request.
Putting a plan in your head, you can begin to implement it. What do we need?
- @i - the counter that will number our products
- @cnt - Number of stores
- @delta - the delta on which the counter is shifted during the numbering of the goods of the next store
- @cur - id of the current store, for adding a delta and resetting the counter when numbering a new store
Let's declare our variables:
set @cnt = 0; set @i = 0; set @delta = 0; set @cur = 0;
Next, we assign the initial values (the number of stores and the id of the first store).
select @cur:=id from shops order by id limit 1; select @cnt:=count(id) from shops;
Now you can proceed to the sample itself. What do we need?
- It is necessary to number the goods at intervals equal to the number of stores.
- At the end of the goods of one store - reset the counter, add a delta and change the current store.
I got this query:
select id, shop_id, @i:=@i+@cnt as counter, IF(@cur<>shop_id,@delta:=@delta+1,@delta) as delta, IF(@cur<>shop_id,@i:=@delta,@i) as cur, IF(@cur<>shop_id,@cur:=shop_id,@cur) as curshop from t_product order by shop_id
Read more about what is here:
@i:=@i+@cnt
In each row, we increase our counter by a number equal to the number of stores. Ie, if we have 5 stores, then we will have the following numbering: 0, 5, 10, 15, etc.
IF(@cur<>shop_id,@delta:=@delta+1,@delta) as delta
As soon as we have a new store, we increase the shift by one. Those. for the first store, the shift will be 0, for the second, 1, and so on.
IF(@cur<>shop_id,@i:=@delta,@i) as cur,
When changing store, we also need to reset our counters to start numbering products from the beginning, without forgetting to add a shift.
IF(@cur<>shop_id,@cur:=shop_id,@cur) as curshop
And in the end - to update the current store, the products of which we number ...
As a result, we get a sample of the type:
id | shop_id | counter | delta | cur | curshop |
---|
43989 | one | ten | 0 | ten | one |
46989 | one | 20 | 0 | 20 | one |
114172 | one | thirty | 0 | thirty | one |
83989 | one | 40 | 0 | 40 | one |
67172 | one | 50 | 0 | 50 | one |
94672 | 2 | eleven | one | eleven | 2 |
6489 | 2 | 21 | one | 21 | 2 |
41989 | 2 | 31 | one | 31 | 2 |
61672 | 2 | 41 | one | 41 | 2 |
97489 | 3 | 12 | 2 | 12 | 3 |
Here we see that the counter is added correctly, when the store is changed, it is reset, a shift is added, and the numbering starts from the beginning (taking into account the shift).
Actually, the case remains for small. Wrap the resulting selection in a subquery and sort by our counter:
select id as product_id, shop_id, cur from ( select id, shop_id, @i:=@i+@cnt as counter, IF(@cur<>shop_id,@start:=@delta+1,@delta) as delta, IF(@cur<>shop_id,@i:=@delta,@i) as cur, IF(@cur<>shop_id,@cur:=shop_id,@cur) as curmag from products order by shop_id ) as A order by cur ;
Voila! We got a sample of goods with alternating stores:
product_id | shop_id | cur |
---|
4187 | one | ten |
7483 | 2 | eleven |
4045 | 3 | 12 |
9091 | four | 13 |
1457 | five | 14 |
2387 | 6 | 15 |
8109 | 7 | sixteen |
1445 | eight | 17 |
2102 | 9 | 18 |
9245 | ten | nineteen |
6744 | one | 20 |
7854 | 2 | 21 |
2164 | 3 | 22 |
There is one minus - the goods at each store go in order. Those. at the beginning we will see the very first product of the first store, then the first product of the second store, the third, fourth, etc. Next come the second goods stores, third and so on.
In order to get rid of this pattern, we need to mix products in the initial sample, wrapping it in another subquery:
select id as product_id, shop_id, cur from ( select id, shop_id, @i:=@i+@cnt as counter, IF(@cur<>shop_id,@start:=@delta+1,@delta) as delta, IF(@cur<>shop_id,@i:=@delta,@i) as cur, IF(@cur<>shop_id,@cur:=shop_id,@cur) as curmag from products order by shop_id from (select id, shop_id from products order by shop_id, rand()) as A order by shop_id) as B order by cur;
So our products will already be mixed before they are numbered.
Actually, the solution of the problem ended in this. Full request can be viewed under the cut.
Full interleaved query set @cnt = 0; set @i = 0; set @start = 0; set @cur = 0; select @cur:=id from shops order by id limit 1; select @cnt:=count(id) from shops; select id as product_id, shop_id, cur from ( select id, shop_id, @i:=@i+@cnt as counter, IF(@cur<>shop_id,@start:=@delta+1,@delta) as delta, IF(@cur<>shop_id,@i:=@delta,@i) as cur, IF(@cur<>shop_id,@cur:=shop_id,@cur) as curmag from products order by shop_id from (select id, shop_id from products order by shop_id, rand()) as A order by shop_id) as B order by cur;
For the sake of interest - looked at the sampling rate. The results in my opinion turned out to be quite good:
10 stores, 10,000 products - ~ 16ms (0.016s)
100 stores, 1,000,000 products - ~ 2568ms (2.568s)
100 shops, 10,000,000 products - 129951ms (2m 9.951s)
I think that these are not bad results, although, of course, we need to test it in combat mode.
PS For me, there is only one unexplained question. Everything is good, but what to do with pagination? After all, each following page is a new request.
Accordingly, goods mixed in stores will receive a new sequence number and may appear in the sample more than once.
If you have thoughts on this subject - I will be grateful to hear them in the comments.Thank you all for your attention)