Our remote office consists of 30 people. We work exclusively on SEO for the United States market. Because the staff are scattered across time zones, this means that around the clock we search for and find hundreds of links, contacts, and other details. The information arrives like an avalanche: the tables grow by 1,200-3,000 cells a day. Multiply that by half a year of work... Even now my head is spinning.

I think that any virtual company of this kind, working with large amounts of data, at some stage thinks about changing how that data is stored. In this article I want to describe the problem named in the title and discuss with commenters the questions listed at the end. I apologize in advance for the lack of detail; we cannot put the juiciest parts up for public discussion.
Microsoft Excel
Let's flash back a year and a half. When a team consists of one person (in this case, me) who takes on absolutely every responsibility, he can manage all his "wealth" in Microsoft Excel without any trouble. Whether it is links, email addresses, phone numbers, or other information, good old Excel is designed so that learning it causes no problems. If you are clumsy at first, that's fine: over time you master all the important functions as you need them.
At first, copying cells from one tab to another was not that hard. This "monkey work" was not a strain and was even somewhat fascinating. All the data fit in a single file divided into tabs. Information that was accumulated and kept for archival purposes (that is, not meant to be accessed actively) was moved to a separate file. The year was 2010.
Google Docs
Let's look at how one person's data grew over a year of work.
1 tab (.xls, Excel) => 10 tabs (.xls, Excel) => 5 files (.xls, Excel)
As a result, 3 files were formed, with a total of no more than 100,000 filled cells.
When working alone became unbearable, we decided to put together a virtual office in which, as expected, everyone would handle their own range of tasks.
Then came 2011... We immediately decided to move all the databases to Google Docs so that documents could be edited simultaneously. There are not many main reasons for moving to an online platform, but, you will agree, they outweigh everything else:
- Possibility of collaboration
- Storage reliability
- Free
Finding another online alternative to Google Docs was possible, but not necessary (forgive the clumsy wording). There was nothing better then, and there is nothing better now. We will not discuss all the options we tested here, so as not to cast a shadow on the alternative free services. There is one conclusion: there are few of them, and they all lag behind Google Docs in functionality. If you disagree, please write in the comments. The author may be wrong.
However, there was a fly in the ointment, and not just one but a whole scoop of them. Very quickly, the shortcomings of Google Docs compared with Excel, and others besides, began to show. Over time there were more and more documents in active use, about 10 of them. Every day the documents grew by 500-1,000 rows, and that still has to be multiplied by 2 to get the number of filled cells. And here the Google Docs limits made themselves felt: 400 thousand cells, gentlemen, and no more.

Unfortunately, the inconveniences do not end there. What else?
- Many necessary features are missing (removing duplicates, selecting part of the text in a cell, and similar vital trifles).
- Compared with Excel, copying cells and switching from one document to another is about twice as slow.
- Dependence on the Internet (literally) and traffic consumption. No Internet, no access to the data.
- Since we work with URLs, we have to keep cleaning hyperlinks out of the documents.
So as not to make unfounded claims, I will give the floor to my colleagues. Here is what they complain about:
Arthur: The 400,000-cell limit is annoying. So are the slowdowns; today it is impossible to use the docs at all, they just do not load.
Pasha: Yeah, same here. I can't add more than 1,000 links to a doc.
Igor: The lags and frequent errors in the docs are always confusing and annoying! Plus you cannot transfer more than 1,000 cells at a time.
Olga: Right now Google Docs is inconvenient:
1. Moving data from doc to doc: you have to delete it from the source yourself; only whole tabs are transferred.
2. When searching, if a record is not found in one document, or just to be safe, you have to search 3-4 documents in turn. If moving to SQL lets us combine all the data into one database, it will speed up the search and help avoid duplicates!
Well, and today, after 13:00, Google Docs simply refused to work.
Lada: "You cannot transfer more than 1,000 cells at a time." Yesterday I couldn't even transfer six cells :)
For dessert, there are two ordinary problems:
That's one...

And that's two...

What have we come to, or a midlife crisis
5 files (.xls, Excel) => 20 Google Docs files => ??? :(
June 2011. Storing data in spreadsheets is getting harder and harder. On top of that, the Google Docs limits are getting tighter: we have to create new documents, and those who work with email have to jump from one doc to another. For two days in a row now, some employees have not been able to get into the documents at all.
Splitting documents into smaller ones is not the best solution: you have to open several documents and search each of them for the information you need. And what if there are duplicates? You cannot check every link twice.
Olya: Right now I run each link through the files "manage mail," "manage. by mail2," and "second run," and sometimes "telephone calls and letters" as well. Altogether the check takes a couple of minutes, instead of a few seconds of searching in a single merged document.
Imagine that, ideally, one manual search query (a URL or an email) takes 10 seconds to enter into a document (assuming the search box is always open). And that does not account for the employee scratching their back and yawning before starting the search.
Even 10 minutes saved per day adds up to about 4 hours per month (10 minutes multiplied by 22 working days). Multiply that by another 10 (the number of employees actively using Google Docs search), and without saving those seconds, roughly 40 working hours go down the drain. And our pay is, in fact, hourly.
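The estimate is easy to verify. Below is a minimal Python sketch that uses only the figures from the paragraph above; the function and variable names are illustrative. The exact results come out slightly below the rounded 4 and 40 hours quoted in the text.

```python
# Figures from the text: 10 minutes a day, 22 working days, 10 employees.
MINUTES_SAVED_PER_DAY = 10
WORKING_DAYS_PER_MONTH = 22
EMPLOYEES_USING_SEARCH = 10


def monthly_hours_saved(minutes_per_day, working_days, employees=1):
    """Return the hours saved per month for the given number of employees."""
    return minutes_per_day * working_days * employees / 60


per_person = monthly_hours_saved(MINUTES_SAVED_PER_DAY, WORKING_DAYS_PER_MONTH)
whole_team = monthly_hours_saved(
    MINUTES_SAVED_PER_DAY, WORKING_DAYS_PER_MONTH, EMPLOYEES_USING_SEARCH
)

print(f"One person: {per_person:.1f} hours per month")   # 220 minutes
print(f"Whole team: {whole_team:.1f} hours per month")   # 2,200 minutes
```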
Where are we going?
All of the above suggests that the information needs a single database partitioned into categories, not files. What do I mean? Changing a category should take two clicks, not a series of motions like: copy a cell (try hitting the right area on the first attempt), paste it into another document, delete it from the old one. Yes, Google Docs still has not learned to cut information from one tab to another, except within a single workbook. And the stupid automatic activation of hyperlinks forces us to clean active URLs out of the documents regularly. In general, every little thing starts to get on our nerves. Or another example.
We must abandon spreadsheets in their Excel and GDocs incarnations.
I have long been suggesting to the boss that we store data not in Excel or Google Docs but in SQL. We now have tens of thousands of rows of various data. All of it more or less reduces to a single format: date, URL, comment, and so on. So we can move everything we have accumulated into a single SQL database without bloodshed.
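To make the idea concrete, here is a minimal sketch in Python using the standard sqlite3 module. It is an illustration only, not a chosen design: SQLite stands in for whatever DBMS is eventually picked, and the table name `links`, the column names, and the `category` and `found_by` columns are assumptions. Only the date, URL, and comment fields come from the text above. A UNIQUE constraint on the URL shows how merging several spreadsheets could drop duplicates automatically.

```python
import sqlite3

# Hypothetical schema: only date, URL and comment are named in the article;
# the table name and the category / found_by columns are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS links (
    id        INTEGER PRIMARY KEY,
    added_on  TEXT NOT NULL,
    url       TEXT NOT NULL UNIQUE,
    comment   TEXT DEFAULT '',
    category  TEXT NOT NULL DEFAULT 'new',
    found_by  TEXT
)
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def count_links(conn):
    return conn.execute("SELECT COUNT(*) FROM links").fetchone()[0]


def import_rows(conn, rows):
    """Insert (added_on, url, comment) rows; return how many were new.

    INSERT OR IGNORE skips rows whose URL is already stored.
    """
    before = count_links(conn)
    conn.executemany(
        "INSERT OR IGNORE INTO links (added_on, url, comment) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return count_links(conn) - before


# Two made-up "spreadsheets" that share one URL.
conn = open_db()
sheet_a = [("2011-06-01", "http://example.com/a", "emailed")]
sheet_b = [
    ("2011-06-02", "http://example.com/a", "same link from another file"),
    ("2011-06-02", "http://example.com/b", ""),
]
added = import_rows(conn, sheet_a) + import_rows(conn, sheet_b)
print(added, "unique links stored")
```

In this sketch the first record for a URL wins and later copies are silently ignored; whether that is the right merge rule for our data is an open question.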
But what's the catch? An SQL database cannot be opened in Notepad, and you cannot edit it by hand. On the one hand, a database will be harder to learn; on the other, we have to make data storage simple and accessible. We need a "human" shell, a UI, tailored specifically to our needs. So here they are, our requirements:
- Through search, we can find everything we need, from whether a link is in the blacklist to whether it is in the reports (whitelist). This reduces the likelihood of emailing a client twice and, accordingly, of duplicates. With search filters, we can select links in the database by a given criterion.
- In the columns we can put anything we want, and the rows will not sprawl across the whole screen. Employees will be able to export and import data in SQL format.
- The table rows will be "attributed": each will record that Ivanov found the information that led to the pleasant (or unpleasant) outcome. This way we can trace the fate of every link.
- Management via the cron scheduler, which can schedule whatever we want, for example rechecking PR and external links. In effect, Google Docs scripts are no longer needed.
- SQL has no limits like those in Google Docs. The database will last us a lifetime.
- Convenient backup in a couple of clicks.
- To move a row from one category to another, we will not need to jump between files. We change the category of the links (checkboxes can be used) and that's it. An example? We finished working on some links and changed their category to "XXX". The telephone operators open their XXX category (a list of their categories is displayed). If a link is no longer of interest, they change the category to Blacklist and edit the note. And so on.
- Fast import and export. For example, we can sort the data by the parameters we need in a couple of clicks and export the tables to CSV just by clicking an icon.
Of course, these are not all the requirements, but they are what we sorely lack in Google Docs today.
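Three of these requirements (one search across all categories, changing a category instead of copying cells, and CSV export) can be sketched in a few lines of Python with the standard sqlite3 and csv modules. This is a rough illustration under assumptions, not the planned implementation: the table layout, the category names, and the function names are all hypothetical, and a real UI would sit on top of queries like these.

```python
import csv
import io
import sqlite3

# Hypothetical table and category names, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE links ("
    "url TEXT PRIMARY KEY, category TEXT NOT NULL, note TEXT DEFAULT '')"
)
conn.executemany(
    "INSERT INTO links (url, category, note) VALUES (?, ?, ?)",
    [
        ("http://example.com/a", "calls", ""),
        ("http://example.org/b", "reports", "published"),
    ],
)


def find(conn, text):
    """Search every category at once for a URL fragment."""
    cur = conn.execute(
        "SELECT url, category, note FROM links WHERE url LIKE ? ORDER BY url",
        (f"%{text}%",),
    )
    return cur.fetchall()


def move(conn, url, category, note=""):
    """Change a link's category in place; return True if a row was updated."""
    cur = conn.execute(
        "UPDATE links SET category = ?, note = ? WHERE url = ?",
        (category, note, url),
    )
    conn.commit()
    return cur.rowcount == 1


def export_csv(conn, category):
    """Return all links of one category as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["url", "category", "note"])
    writer.writerows(
        conn.execute(
            "SELECT url, category, note FROM links "
            "WHERE category = ? ORDER BY url",
            (category,),
        )
    )
    return out.getvalue()


# The link is no longer of interest: one update instead of copy, paste, delete.
moved = move(conn, "http://example.com/a", "blacklist", "not interested")
hits = find(conn, "example.com")
blacklist_csv = export_csv(conn, "blacklist")
print(hits)
print(blacklist_csv)
```

The point of the sketch is that a "move" becomes a single UPDATE on one row, so there is nothing to paste and nothing left behind to delete.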
Presumably, the SQL database can run under the Drupal CMS. Why Drupal? A number of modules have been written for it that we can use for our needs (at least a couple of them; the rest will be written later). This CMS has a large community, including a Russian-speaking one. Load scalability and other goodies come included.

But here is the problem. We do not want to reinvent the wheel. Still, the decision will fundamentally affect the performance of the whole sizable office. What if, after moving all the data to Drupal, we end up having to move everything back, sprinkling ashes on our heads? Then I will be punished, at length and painfully, for having proposed such a solution. By bosses and subordinates alike.
So, I need the help of experts. I suggest the following questions for discussion:
- Which DBMS is best suited to storing large amounts of information, on the order of 10 million cells or more? MySQL? Oracle?!
- Database management through Drupal - how effective is it?
- What solutions, in addition to databases, can be used in a relatively small company?
- Does it make sense to keep a Drupal programmer on an ongoing basis? At first, we can move only part of the data to SQL, and store something in GDocs, for example, reports.
Thank you in advance for any advice.