When I needed to implement shortcuts for Java “like in the web two-zero”, googling did not help to find a single library containing a similar type of collection.
I decided to do it myself.
So, we need to store objects in the collection of this type (let's call it, say, LabelsMultiMap). Both objects and labels can be of any type. The number of labels on the top is unlimited, as well as the number of objects. More than 1 object can be described with the same set of labels. At one object one label can occur only 1 time.
')
Valid labels example:
Shortcuts | Objects |
---|
green, wooden, alive | tree |
green, wooden, lifeless | bench |
green, alive, croak | frog |
The collection should allow:
- put () - put objects with a list of attached tags into it
- getValues () - return objects contained in the collection
- findValues () - search for objects whose labels contain the requested set of labels
- findValuesOnlyIn () - search for only those objects, all labels of which are included in the requested set of labels
I will explain the last 2 points in the following table for the objects that were considered in the previous example:
Labels passed as an argument | findValues () | findValuesOnlyIn () |
---|
green, wooden | tree, bench | |
green, wooden, alive, lifeless | | tree, bench |
| tree, bench, frog | |
The search should be fast, so for the indexing we will use bit masks. Those. the object itself is stored in the array, and its sequence number in this array is the sequence number of the bit in the bit mask.
Again, back to our example:
Numbering of objects in the array: 0 - tree, 1 - bench, 2 - frog.
Then the bit mask for the green label will be {1, 1, 1}, for wooden - {1, 1, 0}, for alive - {1, 0, 1}, lifeless - {0, 1, 0}, croak - { 0, 0, 1}.
Bit masks allow you to simplify the search for objects by label, simply by performing binary operations: AND, OR, NOT, etc ... As a result, if a single bit remains at a certain position, then by its number you can easily extract the desired object from the array.
We start the implementation:
Here
V is the type of objects,
L is the type of labels,
labelsBitSets is the storage of
bitmasks of labels,
values is the storage of objects:
public class LabelsMultiMap<L, V> { private final Map<L, BitSet> labelsBitSets = new HashMap<>(); private final List<V> values = new ArrayList<>(); }
The method that places a new object with its descriptive labels in our collection:
Here we put our object in
values in the
addValue () method, which will return the index of the new array cell. Next, for each label, we check if there is a bitmask for this label, if not, create an empty mask, add to
labelsBitSets , if it already exists, return the old mask, and set a single bit in it by the position of the object in the array, which is stored in
i :
public void put(Set<L> labels, V value) { int i = addValue(value); for(L label : labels) { BitSet bitSet = getOrCreateLabel(label); bitSet.set(i); } }
The method that returns all objects from the collection:
public List<V> getValues() { return Collections.unmodifiableList(values); }
A method that searches for objects whose labels contain the requested set of labels:
public Collection<V> findValues(Set<L> labels) { Iterator<L> it = labels.iterator();
If the list of labels in the argument is empty, then we return the entire list:
if (!it.hasNext()) { return getValues(); }
If the first available label did not find a single bit mask, then return an empty result:
BitSet firstBitSet = labelsBitSets.get(it.next()); if (firstBitSet == null) { return Collections.emptySet(); }
Initialize the battery, where we will accumulate the bit masks found using the AND operation. Used clone (), because BitSet has no copy constructor:
BitSet accumulator = (BitSet) firstBitSet.clone();
We
accumulate all masks in
accumulator , if we did not find at least one bit mask for one of the following shortcuts, then we return an empty collection:
while (it.hasNext()) { BitSet nextBitSet = labelsBitSets.get(it.next()); if (nextBitSet == null) { return Collections.emptySet(); } accumulator.and(nextBitSet); }
We return the result in the form of a new collection on the accumulated mask in the
accumulator :
return new ValuesByBitSetCollection<>(accumulator, values); }
Well, the last method that searches only those objects, all the labels of which are included in the requested set of labels:
In the
inAccumulator, we accumulate the bit masks that are in our query, and in the
outAccumulator — which are not in our query; if the original query is empty, we return an empty set:
public Collection<V> findValuesOnlyIn(Set<L> labels) { if (labels.isEmpty()) { return Collections.emptySet(); } BitSet inAccumulator = new BitSet(values.size()); BitSet outAccumulator = new BitSet(values.size());
We run over all the saved labels in our collection, putting off their bit masks in
in- / outAccumulator , depending on whether they are in
labels , at the end just subtract one set from another and return the result as a collection by the resulting mask.
for(Map.Entry<L, BitSet> bitSetEntry : labelsBitSets.entrySet()) { BitSet accumulator = labels.contains(bitSetEntry.getKey()) ? inAccumulator : outAccumulator; accumulator.or(bitSetEntry.getValue()); } inAccumulator.andNot(outAccumulator); return new ValuesByBitSetCollection<>(inAccumulator, values); }
Helper Classes and MethodsAuxiliary function for adding an element to an array (add an element to the end of the array, return its position):
private int addValue(V value) { values.add(value); return values.size() - 1; }
Auxiliary function to return the bit mask for this label:
private BitSet createOrGetLabel(L label) { BitSet ret = labelsBitSets.get(label); if (ret == null) { ret = new BitSet(values.size()); labelsBitSets.put(label, ret); } return ret; }
A helper class that allows you to impose a bit mask on an array, in
size, cache the number of unit bits in the mask (
BitSet.cardinality () ):
private static class ValuesByBitSetCollection<V> extends AbstractCollection<V> { private final BitSet bitSet; private final List<V> values; private int size = -1; private ValuesByBitSetCollection(BitSet bitSet, List<V> values) { this.bitSet = bitSet; this.values = values; } @Override public boolean isEmpty() { return bitSet.isEmpty(); } @Override public Iterator<V> iterator() { return new ValuesByBitSetIterator<>(bitSet, values); } @Override public int size() { if (size < 0) { size = bitSet.cardinality(); } return size; } }
A helper class for iterating over this collection, at
index we store the position of the next set bit (or -1 if there is no such bit anymore):
private static class ValuesByBitSetIterator<V> implements Iterator<V> { private final BitSet bitSet; private final List<V> values; private int index; private ValuesByBitSetIterator(BitSet bitSet, List<V> values) { this.bitSet = bitSet; this.values = values; index = bitSet.nextSetBit(0); } @Override public boolean hasNext() { return index >= 0; } @Override public V next() { if (index < 0) { throw new NoSuchElementException(); } V ret = values.get(index); index = bitSet.nextSetBit(index + 1); return ret; } @Override public void remove() { throw new UnsupportedOperationException(); } }