📜 ⬆️ ⬇️

Quran by searching for duplicates in Google Spreadsheet

Good day, dear readers.

Do you remember Dolly the sheep ? When working with google documents, especially spreadsheet (MS Excel) tables, you had to learn how to use JavaScript to solve many atypical tasks, such as: special formatting for a variety of conditions or removing duplicates, but not all can be done using standard functions, but according to the formulas there are limitations that we will reveal in the next issue. Now we present an educational program to find duplicate links and text in Google Spreadsheet.

image
')
So let's start writing scripts for Google Spreadsheet. If you know several programming languages ​​at least superficially, it will of course be easier for you. In JavaScript, I started writing without problems, but I worked mostly with VBA, Visual basic, C / C ++, C #. As they say, a little bit of everything and everywhere. If knowledge of programming languages ​​is lame, then welcome to our educational program.



Introduction

In order for you to understand the example, we will study the script on the example of removing duplicates in the tables of the google spreadsheet document. We will write from scratch, so we proceed.

Set the task
We set the task - to find duplicates in the text list and delete them, creating a backup copy of the list.
Create a new document and call it “Search for duplicates”, the page is called the same. We will look for duplicates on the 1st column. In order to make it more convenient, we prescribe the name of the columns and fix them. In order to fix a row or column, you need to pull the corresponding bold line, at the same time it moves one row down or one column to the right (Figure 1).

Figure 1. Fixing columns and rows.

Since we usually need to use several columns, and because of the habit we usually remove all unnecessary, we will add them again (columns B: D). If you have a new sheet, then accordingly there are columns and you do not need to add them, and deleting everything that comes after will be welcome. Since cleanliness is a pledge of order.

Create another 2 sheets of the “Working page” on which we will carry out data manipulations. “Copy” - a sheet for a copy of the initial data, in case of incorrect script operation or data loss (Figure 2).


Figure 2. Creating 2 sheets.

Start writing code. Click in the top drop-down menu Tools -> Script Editor (Figure 3).

Figure 3. Opening the script editor in Google Spreadsheet.

Before us in a new window will open the script editor with options for creating scripts
(Figure 4)

Figure 4. Script creation options in the Script Editor.

If you select any of the items on the left (creating the appropriate script for Disk, Mail, etc.), you will see an introductory script with detailed explanations of how to work with this or that script, But these are only indicative scripts. But we will select the “Empty Project” and see only the beginning of all the scripts.
function myFunction(){} 
.

The standard name of the project is “Project without a name”, but I changed the name by clicking on these words (the top line in the figure) and called it “Finding duplicates” (Figure 5). This is so that later in the presence of 10 or more scripts, it was possible to distinguish between them and without much effort to find the script we need.

Figure 5. “Empty project” with the first standard lines of code and the modified name

In braces and we will write our script.

We will not go deep into the study of JavaScript, but in order to understand even a beginner, in the course of writing the code, I will comment on what's going on in detail.
We write the algorithm
Before we start writing any automation script (task), we need to draw an algorithm for ourselves. And even if it may seem like a delusion and waste of time to many people, you may find these actions superfluous, but this is a very important stage of work. When working with large projects, this is very important, as you can get confused in 3 pines - the algorithm solves this problem. This applies not only to JavaScript, this applies to absolutely any language, and indeed any action. For understanding, we give an example based on getting a cup of tea (Figure 6).


Figure 6. Action algorithm for obtaining a cup of tea (example).

I hope you made a cup :) and in a good mood you read further.

Let us draw an algorithm for our duplicate search script (Figure 7), it will be more detailed than an example with a cup of tea.

Figure 7. Algorithm for duplicate search task

I will bring in a little explanation, this is the final form of the algorithm, so there are 2 blocks in it that may not be immediately clear to the average user.
Block 2 deals with the removal of content and comments from a page, on a page that has just been created and contains nothing at all. This block is needed when reusing the script, so we immediately put this function in the script. This will be described in more detail below.
Block 5 - array creation and conversion, this will be discussed in detail below. Now I will say that the created array will be the main element of data processing.
Codin
We declare variables for our sheets so that the program knows how to access them.

To declare a variable in the current scope, use the var keyword. Moreover, a feature of the javascript language, as well as many others, such as php, is that you do not need to specifically declare the type of the variable (as in C #: the number is declared integer; the string is string, etc.).
In order to describe the lines of code inside the code itself, we will use a comment in the form of text that begins with two straight slashes “//”. In different programming languages, comment tags look different and represent a sequence of some characters, here are a few of them: (see Table 1.)


Table 1. Comparative table of comment tags in some languages.

When writing programs in the editors of languages, or when applying the correct syntax, for example, in the program notepad ++, the text of the program will be highlighted with the corresponding colors, which are prescribed for this language by default. The color of the text of comments, for example, will be green for most languages ​​by default, but in fact, coding programs usually have the ability to customize the highlight, although not everywhere. In google js, the color of the comment will be brown.

Figure 8. Illustration of code syntax highlighting.

Let's declare three variables for our sheets:
the variable sheet_work_page for the page with the title “Working page”, here we will place the text for the search and get the result after the work.
Variable sheet_find_dubles for the page with the name “Search for duplicates”, on this page will be processed duplicates.
The variable sheet_copy for a page called “Copy” creates a copy of the data before starting processing, since you never know when the original will be needed.

 var sheet_work_page = SpreadsheetApp.getActiveSpreadsheet().getSheetByName(' '); var sheet_find_dubles = SpreadsheetApp.getActiveSpreadsheet().getSheetByName(' '); var sheet_copy = SpreadsheetApp.getActiveSpreadsheet().getSheetByName(''); </sourse>     . var i,Page_Last_Row,k,archurls;  i, k -  ,        ,      .  Page_Last_Row -   ,  ,    .  archurls -      .             “ ”  “”,        .        ,       . : ,    1000 ,     20 ,    ,     1000 ,   20.       :  ,   delete,       .   , ,    delete,  .    :    ,   Shift + Space()     ,  Shift+Ctrl+Arrow_Down ( ) (    )     " ".      . <img src="http://habrastorage.org/storage3/c53/628/51b/c5362851ba5f57dbdaa809c5f0d75dc7.jpg"/>  9   .           .     , ,     ,        ,        -   .         “A2:D   ”    -    “ ”.     A2       A1       .        .   A2:D      (2,1,  ,4)     ,       . <a href=”http://habrahabr.ru/post/157933/”>  </a> <a href=”http://webhostingw.com/google-spreadsheet-formulas/”>  </a>    ,         ,    4 . <source lang=”javascript”> sheet_work_page.getRange("A2:D" + sheet_find_dubles.getLastRow()).clearContent(); sheet_work_page.getRange("A2:D" + sheet_find_dubles.getLastRow()).clearComment(); sheet_copy.getRange("A2:D" + sheet_find_dubles.getLastRow()).clearContent(); sheet_copy.getRange("A2:D" + sheet_find_dubles.getLastRow()).clearComment(); 

We describe in more detail:
sheet_work_page is the symbolic name we assigned to the “Working Page” sheet.
getRange (first row, first column, last row, last column) - specifying the range to perform further actions.
clearContent () - deletes the content in the cells specified in the range.
clearComment () - deletes notes in the cells specified in the range.

Create a datatocopy variable and select the range of cells with A2 data: D the last line in the “Search for duplicates” sheet and copy to the “Working page” and “Copy” pages.
The peculiarity of this operation is that you can paste a copied range only when the corresponding range is selected.
 var datatocopy = sheet_find_dubles.getRange("A2:D" + sheet_find_dubles.getLastRow()).getValues(); sheet_work_page.getRange(2,1,sheet_find_dubles.getLastRow()-1,4).setValues(datatocopy); var datatocopy1 = sheet_find_dubles.getRange("A2:D" + sheet_find_dubles.getLastRow()).getValues(); sheet_copy.getRange(2,1,sheet_find_dubles.getLastRow()-1,4).setValues(datatocopy1); 

variable Page_Last_Row assign the value of the last row of the page "Search for duplicates"
 Page_Last_Row = sheet_find_dubles.getLastRow(); 

In the previously announced archurls array, we assign all the data from the first column of the “Search for duplicates” sheet.
 archurls = sheet_find_dubles.getRange(2, 1, Page_Last_Row-1, 1).getValues(); 

and convert 2-dimensional array into one-dimensional. This is necessary in order to convert all the entries in the column into characters, since we search by character.
 for (i=0; i<archurls.length; i++) //convert 2d array into 1d { archurls[i] = archurls[i][0]; } 

In order to better understand the mechanism of action and make a correct presentation, consider the screenshot “What does the data look like in the archurls array” Figure 8

Figure 8 What the array looks like with values ​​from the inside.

The variable i, which in this case will be used for the ordinal number of the checked row of the “Working page” sheet, is assigned the value of the 2nd row, since the first is the column names, i = 2;

And we proceed to writing the main loop for finding duplicates. We declare a while loop (the cycle whose condition is checked before the start of operations of the whole cycle) and in it we iterate line by line from the “Working page” sheet, comparing with the values ​​of the “Duplicate Search” sheet.
 while (i <= sheet_work_page.getLastRow()) { k = archurls.indexOf(sheet_work_page.getRange(i, 1).getValue()); if (k >= 0) { if ((k+2)==i) { sheet_work_page.getRange(i, 4).setComment('    ' + (k+1) + ' '); i++; continue; } else { sheet_work_page.getRange("A" + i + ":C" + i).clearContent(); i++; delete_count++; continue; } } } 

As a result, on the working page we get a list without 100% duplicates.
Note that if the link A contains Cyrillic alphabet but visually it is similar to the link B written in Latin, then these links will be different.

Sort if there is such a need.
Description: Sheet.sort (column number, true / false) true - from A to Z, false - from Z to A
 sheet_work_page.sort(3, true); 

In order for us not to have to enter the editor every time to launch the script, we will display a separate button on the menu bar.
The function code for the button is simple.
 function onOpen() { SpreadsheetApp.getActiveSpreadsheet().addMenu(" ?", [{name: " !", functionName: "check_duplicates_one_sheet"}]); } 

Add a menu with the name “Digging?”, Specify the name of the submenu “Digging!” And assign the button to perform the function of the one we wrote above: check_duplicates_one_sheet (Figure 9)


Figure 9

You can go to our document described in this article.
Addition:
“And if you just need to remove duplicates from column A, quickly and without nonsense?”, You ask.
Well, let's write a faster script.

 function removeDuplicates() { //  . var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("     "); //     . var data = sheet.getDataRange().getValues(); //    . var newdata = new Array(); //   for(i in data){ var row = data[i]; var duplicate = false; for(j in newdata){ if(row[0] == newdata[j][0]){ duplicate = true; } } //  if(!duplicate){ newdata.push(row); } } //      sheet.clearContents(); //      sheet.getRange(1, 1, newData.length, newData[0].length).setValues(newData); //   sheet.sort(1, true); } 


As a result, we have a high-speed script to remove duplicates.
Proven on 400,000 entries.

See you again! Your humble servant.

You can also visit our previous articles:
Talmud formulas in Google SpreadSheet
Sending letters to Google Docs (Drive)
wait for our next issues.

Source: https://habr.com/ru/post/198898/


All Articles