📜 ⬆️ ⬇️

Excel, SQL and the legendary barometer - we solve a simple problem in different ways

Last week, an old habrotext “ Strategy for a technical interview ” surfaced in some discussion. More precisely, the task number 4 given in it
Given: .xls (Excel) file with one sheet in 4 numeric columns and 1000 lines.
Required: Load it into the SQL database, the table with the corresponding columns is available. Well, first, estimate the time for a decision.

Well, I was wondering how many different and simple ways I can solve this problem, using only what I have on my computer.



Update: In the comments tell the methods of filling the columns without "pulling": one , two
')

0. Before getting started


In terms of the problem there are two very important points:


Both of these points affect the success of the solution of the task, which consists in moving their XLS data to SQL.

I'll start with the second. The data in the table may not allow you to record the data that you have. Well, for example, if some column is a unique id, and in the existing table such id already exists. It's simple. You learn what to do with the data and either clear the table with the first operation, or do REPLACE instead of INSERT.

And now about the file received. Are you going to drive him right into the base like that? Sure? Are you sure that you did not stuff anything extra there? Will you look through all 1000 lines with your eyes?

I made it easy - right in the editor of the XLS file (in my case - LibreOffice Calc) I used regular expressions to remove everything except numeric values.



As a result, only the digits, the comma separator and the minus sign remained.

Then I made a replacement of the “comma” with a “dot” and, when saved in CSV, I received data of the following type:
11,4667.25,6874573,21336
12,466726, -6874574,21337

Now the data is safe and SQL-friendly.

I spend so much time on this not because of natural tediousness, but because if the data is not loaded because of the existing ones or destroyed because of the injection, then the task will not be completed.

In the words of a well-known expert: "It is better to lose a day, but then fly in five minutes!"

So, the preparatory stage is completed - flew. In a sense, proceed to the assignment in various ways.

Update 2: in the comments put on the idea. The data may be safe, but consist of a meaningless set of numbers, "-" and ",". In this case, the import will not work completely. How do we do:
- first do the run on the test table
- immediately to the worker, but with rollback
?

1. Uploading CSV to phpMyAdmin


If there is phpMyAdmin (or equivalent for the SQL used), then:


2. SQL to web form


If there is no web host of the CSV host, but there is just a host of SQL queries, it would seem that this is the option indicated in the source text:
I will add a column in the excel file, where in all the cells I will insert (stretch) “insert into” and additional columns with commas, I will get an sql script. Immediately plus, even in the assessment do not need.

And no. Well, the truth is, you can still stretch the column with INSERT INTO, but you can stretch the commas. On 1000 lines. And so three times. See-see.

There are two options.

First, you can:


); INSERT INTO … ( 


Secondly, you can not save to CSV, and through the buffer insert the contents of the table in Notepad ++ (after completing p. 0). Then we replace the “tabulation” with a “comma”, hyphenate the line to insert, we rule the beginning and end of the file. Post through the web form.

3. SQL Client


I don’t have a MySQL client for a long time (neither a gamer nor a console client). Yes, and access from outside it will hardly give. Therefore, I upload the file received in step 2 to the server and do it in the console.

 mysql ... < ...sql 

4. PHP script


Of course, the ideal option would be to write a 10-line script that will do fgetcsv (), form an INSERT INTO, and pull it all into the database.

Well, the truth is, the one who gives you this test task at any moment will say, “Oh, but I need the lines for which in the third column are odd integers go to another table” or “and in the fifth column you had to write the cube root of product data values ​​from all 4 columns. "

And he will not have an answer to the question “dude, but why didn’t you make these calculations in Excel?”. All he can say is “not me like this - life is like that”.

By the way, in this script, you can insert a web form with the download of a CSV file, make data disinfection and let the author of the task load everything.

Although, of course, this option is not suitable. This script with the mold will then remain on the site, something will be forgotten about it, and there will be a hole.

Therefore, we solve the task in the same way as in step 2, we just save everything in a php file and instead of

 ); INSERT INTO … ( 

do:

 )",$connect); mysql_query("INSERT INTO … ( 

Well, mysql_connect in the beginning

5. I now have Linux!


After gaining Windows Subsystem for Linux, life started playing with new colors.

Therefore:


 cat test.csv | awk '{ gsub("\r", ""); print "INSERT INTO … ( ... ) VALUES (" $0 ");"; }' | ssh ... mysql .. 

Is done.

* * *

But now there will be a piece due to which this text is not only in the MySQL hub, but also in the Website Development hub.

In addition to the obvious solutions to the problem given earlier, there are 3 more:


No need, because of the desire to prove to yourself and those around you that you still have a cake, do this garbage and be distracted from other tasks. There are cases when a person is not “able”, but simply “obliged” to redelegate the task.

It's time to learn to prioritize.

Source: https://habr.com/ru/post/312834/


All Articles