I got a small problem, somewhere for 4 hours of coding, which I found entertaining.
There is a user base of 10 million:
class User{ int id; time_t birthday; // int gender; // int city_id; // time_t time_reg; // };
You need to do a quick search in the database, taking into account its not frequent updates. The search can take place in the fields: age, sex, city. Search fields can be specified in grouping or separately, for example:
The issuance data must be sorted by the date of registration of users, and issued by page for 100 users.
Tip 1: DBMS will not give the desired speed.
Tip 2: Recall the complexity of operations with sets, the complexity of standard algorithms.
Tip 3: Check the search time of the implemented algorithm, a good result is about 0.005 seconds.
From ready-made containers for this task, you can use std :: vector , sorted by the desired search criteria, and std :: lower_bound with the implementation:
template <class ForwardIterator, class T, class Compare> ForwardIterator lower_bound (ForwardIterator first, ForwardIterator last, const T& val, Compare comp);
Or use std :: multiset . I chose std :: multiset for the reason that you can put a comparator into it that will be used for insertion and search.
I wanted to lay out an easy search expansion, so I followed the precepts of Alexandrescu and implemented a comparator as a strategy.
To save memory, multiset stores pointers to User, so the interface search criteria looks like this:
class AbstractCriterion { public: virtual ~AbstractCriterion() = default; virtual bool matched(const User &user) const noexcept = 0; User patternForSearch() const noexcept { User user; initPattern(user); return user; } virtual int signature() const noexcept = 0; virtual void initPattern(User &user) const noexcept = 0; virtual bool operator()(const User *first, const User *second) const noexcept = 0; };
Method | Description |
---|---|
matched | User matches this criterion or not |
patternForSearch | template method for generating a search key |
operator () | comparison of elements |
signature | used as an identifier criteria |
Examples of the implementation of two criteria:
class CityCriterion: public AbstractCriterion { public: CityCriterion() : m_city(0) { } CityCriterion(int city) : m_city(city) { } bool matched(const User &user) const noexcept final { return user.cityId == m_city; } void initPattern(User &user) const noexcept final { user.cityId = m_city; } virtual int signature() const noexcept final { return Signatures::City; } bool operator()(const User *first, const User *second) const noexcept final { return first->cityId < second->cityId; } private: int m_city; }; class GenderCriterion: public AbstractCriterion { public: GenderCriterion() : m_gender(No) { } GenderCriterion(int gender) : m_gender(gender) { } bool matched(const User &user) const noexcept final { return user.gender == m_gender; } void initPattern(User &user) const noexcept final { user.gender = m_gender; } virtual int signature() const noexcept final { return Signatures::Gender; } bool operator()(const User *first, const User *second) const noexcept final { return first->gender < second->gender; } private: int m_gender; };
Not necessarily limited to the available fields of the structure. For example, age can be calculated:
class AgeCriterion: public AbstractCriterion { public: static const unsigned int SECONDS_IN_YEAR = 31536000; AgeCriterion() : m_age(0) { time(&m_curTime); } AgeCriterion(int age) : m_age(age) { time(&m_curTime); } bool matched(const User &user) const noexcept final { const unsigned int age = difftime(m_curTime, user.birthday) / SECONDS_IN_YEAR; return m_age == age; } void initPattern(User &user) const noexcept final { user.birthday = m_curTime - (m_age + 1) * SECONDS_IN_YEAR; } virtual int signature() const noexcept final { return Signatures::Age; } bool operator()(const User *first, const User *second) const noexcept final { const int firstAge = difftime( m_curTime, first->birthday) / SECONDS_IN_YEAR; const int secondAge = difftime( m_curTime, second->birthday) / SECONDS_IN_YEAR; return firstAge < secondAge; } private: size_t m_age; time_t m_curTime; };
To search by two fields, I entered the following template class and function:
template<typename FirstCriterion, typename SecondCriterion> class BiCriterion: public AbstractCriterion { public: BiCriterion(const FirstCriterion &first, const SecondCriterion &second) : m_first(first), m_second(second) { } bool matched(const User &user) const noexcept final { return m_first.matched(user) && m_second.matched(user); } void initPattern(User &user) const noexcept final { m_first.initPattern(user); m_second.initPattern(user); } int signature() const noexcept final { const auto sign1 = m_first.signature(); const auto sign2 = m_second.signature(); return sign1 | sign2; } bool operator()(const User *first, const User *second) const noexcept final { const bool less = m_first(first, second); if (!less && !m_first(second, first)) { return m_second(first, second); } else { return less; } } private: FirstCriterion m_first; SecondCriterion m_second; }; template<typename FirstCriterion, typename SecondCriterion> BiCriterion<FirstCriterion, SecondCriterion> AND(const FirstCriterion &first, const SecondCriterion &second) { return BiCriterion<FirstCriterion, SecondCriterion>(first, second); };
If I want to find men at the age of 30, then
auto criterion = AND(AgeCriterion(30), GenderCriterion(Male));
std :: multiset wrapped in a class UserDataBase:
class UserDataBase { using SetComparator = std::function<bool(User *, User *)>; using Multiset = std::multiset<User *, SetComparator>; std::vector<User *> m_users; std::map<int, Multiset> m_searchMap; public: using SearchResultIteratorPtr = std::unique_ptr<ISearchResultIterator>; UserDataBase() = default; ~UserDataBase(); template<typename Criterion> void registryCriterion(const Criterion &criterion) { m_searchMap[criterion.signature()] = Multiset(criterion); } void append(const User &user) { auto item = new User(user); m_users.push_back(item); for (auto iter = m_searchMap.begin(); iter != m_searchMap.end(); ++iter) { iter->second.insert(item); } } template<typename Criterion> SearchResultIteratorPtr search(const Criterion &criterion) const { typedef SearchResultIterator<Multiset::const_iterator, Criterion> ResultIterator; const auto end = Multiset::const_iterator(); SearchResultIteratorPtr iterator(new ResultIterator(end, end, criterion)); User pattern; pattern = criterion.patternForSearch(); if (m_searchMap.count(criterion.signature())) { const auto &set = m_searchMap.at(criterion.signature()); auto iter = set.find(&pattern); if (iter != set.end()) { iterator.reset(new ResultIterator(iter, set.end(), criterion)); } } return iterator; } };
It seems simple. First, register the search criteria, then add the elements:
UserDataBase base; base.registryCriterion(AgeCriterion()); base.registryCriterion(GenderCriterion()); base.registryCriterion(AND(AgeCriterion(), GenderCriterion())); base.registryCriterion(AND(CityCriterion(), GenderCriterion())); //.. for (int i = 0; i < MAX_USERS; ++i) { User usr; ifs >> usr; base.append(usr); }
There is nothing interesting in the search method itself. Only returns a pointer to an ISearchResultIterator to cut the type information.
An iterator is a wrapper over an std :: multiset iterator that stores search criteria:
template<typename MultisetIterator, typename Criterion> class SearchResultIterator: public ISearchResultIterator { public: SearchResultIterator(MultisetIterator iter, MultisetIterator end, const Criterion &criterion) : m_iter(iter), m_end(end), m_criterion(criterion) { } bool isValid() const noexcept final { return (m_iter != m_end) && (m_criterion.matched(**m_iter)); } bool next() noexcept final { if (isValid()) { ++m_iter; return m_criterion.matched(**m_iter); } else { return false; } } const User &user() const noexcept final { if (isValid()) { return **m_iter; } else { return m_emptyUser; } } private: MultisetIterator m_iter; MultisetIterator m_end; Criterion m_criterion; User m_emptyUser; };
We go on std :: multiset until we reach the end and the elements match the search criteria.
Usage looks like this:
auto iterator = base.search(AND(AgeCriterion(30), GenderCriterion(Male))); if (iterator->isValid()) { printReuslt(iterator); } //... void printReuslt(std::unique_ptr<ISearchResultIterator> &iter) { int cnt = 1; std::vector<User> users; users.reserve(100); do { users.push_back(iter->user()); if (cnt == 100) { break; } ++cnt; } while (iter->next()); std::sort(users.begin(), users.end(), timeSort); std::cout << std::setw(8) << "USR ID" << std::setw(12) << std::left << " REG" << std::setw(5) << std::right << "GNDG" << std::setw(6) << "CITY" << std::setw(12) << "BIRTHDAY" << std::endl; for (const auto &usr: users) { std::cout << std::setw(8) << usr.id << std::setw(12) << usr.timeReg << std::setw(5) << usr.gender << std::setw(6) << usr.cityId << std::setw(12) << usr.birthday << std::endl; } }
In principle, when outputting, it is possible not to sort elements by time, if we use such a property that the same elements in a multiset are arranged in the order in which they were added.
It seems that there is not much code that can be written in 4 hours.
Source: https://habr.com/ru/post/312830/