
The world's first search engine: a historical excursion

In the early years of the Internet era, millions of files were scattered across thousands of anonymous FTP sites. Amid this variety, it was quite difficult for users to find a program that solved their particular problem.



Moreover, they often did not know in advance whether the desired tool even existed, so they had to browse FTP archives by hand, each one organized in its own way. It is this problem that led to the emergence of one of the key features of the modern world: Internet search.



Photo by mariana abasolo, CC


History of creation



The first search engine is generally credited to Alan Emtage. In 1989 he was working at McGill University in Montreal, where he had moved from his native Barbados. One of his duties as a systems administrator in the university's computer science department was to find software for students and faculty. To ease this work and save time, Alan wrote code that performed the search for him.



“Instead of spending my time rummaging around FTP sites and trying to figure out what was on them, I wrote scripts that did it for me,” says Alan, “and did it quickly.”



Emtage wrote a simple script that automated the task of logging into FTP servers and retrieving their directory listings, which were then copied into local files. Those files could be searched quickly with the standard Unix grep command. Thus Alan created the world's first search engine, called Archie: the word Archive with the “v” dropped.



Archie could search 2.1 million files on more than a thousand sites around the world in a matter of minutes. The user entered a search term, and the system reported the locations of files whose names matched the keywords.
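The approach described above can be illustrated with a short sketch. This is hypothetical code, not Emtage's original scripts: it assumes FTP listings have already been mirrored into local text files of tab-separated "host, path" records, and performs a grep-style scan over them.

```python
import re
from pathlib import Path

def search_listings(listing_dir, keyword):
    """Grep-style scan over locally mirrored FTP listings.

    Each *.txt file in listing_dir is assumed to hold one mirrored
    listing: lines of "host<TAB>path" records (an assumed format).
    """
    # Case-insensitive literal match, like `grep -i -F keyword`.
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    hits = []
    for listing in sorted(Path(listing_dir).glob("*.txt")):
        for line in listing.read_text().splitlines():
            if pattern.search(line):
                host, path = line.split("\t", 1)
                hits.append((host, path))
    return hits
```

Searching local copies instead of the live FTP sites is exactly what made Archie fast: the expensive network crawling happened ahead of time, and queries only touched local files.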



The solution was so successful that in 1990 Emtage and his partner Peter Deutsch founded Bunyip, intending to bring a more powerful commercial version of Archie to market. Arguably it was the first Internet startup in history, since Bunyip sold its services over the Internet.



“It started with thirty queries a day, then it was thirty queries an hour, then thirty a minute,” says Peter. “Traffic kept growing, so we began working on scaling mechanisms.”



The team decided to convert the listings into a more efficient representation. The data was split into separate databases: one stored the textual file names, another held records describing the hierarchical directory trees of the hosts, and a third linked the other two together. Searches were performed element by element over the file names.
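The three-way split can be sketched roughly as follows. These structures and record layouts are hypothetical illustrations of the idea, not Bunyip's actual implementation: one table of file names, one of host directory records, and a third table joining them.

```python
# Hypothetical illustration of the three-database split described above.
filenames = {1: "gzip-1.2.4.tar",              # database 1: textual file names
             2: "emacs-19.34.tar"}
hosts = {10: ("ftp.cs.mcgill.ca", "/pub/tools"),   # database 2: host directory records
         11: ("ftp.gnu.org", "/gnu/emacs")}
links = [(1, 10), (2, 11)]                     # database 3: joins names to hosts

def lookup(substring):
    """Element-by-element search by file name, then resolve locations."""
    results = []
    for name_id, host_id in links:
        if substring in filenames[name_id]:
            host, directory = hosts[host_id]
            results.append((host, directory + "/" + filenames[name_id]))
    return results
```

The point of the design is normalization: a file name stored on a thousand hosts is kept once, and the link table carries the many-to-many relationship, so the name database the search actually scans stays small.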



Over time other improvements followed. For example, the database was reworked yet again, this time around compressed tree structures. The new version built a full-text database instead of a plain list of file names and ran much faster than its predecessors. Further minor improvements allowed Archie to index web pages as well.



Unfortunately, work on Archie eventually stopped, and the search engine revolution was postponed. Emtage and his partners disagreed over future investment, and in 1996 he decided to leave. Bunyip operated for about another year before becoming part of Mediapolis, a New York web design company. Patents on the technologies that had been developed were never obtained.



“But I gained wonderful experience: I traveled the world, spoke at conferences, and met the people who shaped the modern Internet,” recalls Alan. As a member of the Internet Society, he worked with the likes of Tim Berners-Lee, Vinton Cerf, and Jon Postel.



Leaving a mark



Still, Archie managed to influence the development of the WWW, in particular the emergence of the robots exclusion standard. The standard lets a site tell robots which parts of the server must not be accessed, via a robots.txt file retrievable over HTTP.



The file contained one or more lines in the following format:



 <field>:<optionalspace><value><optionalspace>


The <field> element could take two values: User-agent or Disallow. User-agent named the robot to which the policy applied, while Disallow specified the sections closed to it.



For example, a file with the following rules forbids all robots from accessing any URL starting with /cyberworld/map/ or /tmp/, or the page /foo.html:



 # robots.txt for http://www.example.com/

 User-agent: *
 Disallow: /cyberworld/map/ # This is an infinite virtual URL space
 Disallow: /tmp/ # these will soon disappear
 Disallow: /foo.html


This example closes access to /cyberworld/map/ for all robots except cybermapper:



 # robots.txt for http://www.example.com/

 User-agent: *
 Disallow: /cyberworld/map/ # This is an infinite virtual URL space

 # Cybermapper knows where to go.
 User-agent: cybermapper
 Disallow:


And this file turns away any robot that tries to access information on the site at all:



 # go away
 User-agent: *
 Disallow: /
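Rules like these can be checked programmatically. Python's standard library implements the robots exclusion standard in urllib.robotparser; the snippet below feeds it the first example above (the URLs and robot name are illustrative):

```python
import urllib.robotparser

# The rules from the first example above; the parser ignores comments.
rules = """\
# robots.txt for http://www.example.com/
User-agent: *
Disallow: /cyberworld/map/
Disallow: /tmp/
Disallow: /foo.html
""".splitlines()

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)

# A well-behaved robot asks before fetching each URL.
print(rp.can_fetch("AnyBot", "http://www.example.com/index.html"))  # True
print(rp.can_fetch("AnyBot", "http://www.example.com/tmp/cache"))   # False
print(rp.can_fetch("AnyBot", "http://www.example.com/foo.html"))    # False
```

Because the only User-agent line is the wildcard `*`, the same answers apply to every robot name.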


The immortal Archie



Created almost three decades ago, Archie has not been updated since, and it offers a completely different experience of the Internet. Yet even today it can find the information you need. One of the places still hosting an Archie server is the University of Warsaw, although most of the files the service turns up date back to 2001.





Primitive as Archie is, it still offers several ways to customize a search. Besides choosing a database (anonymous FTP or the Polish web index), you can choose how the entered string is interpreted: as a substring, as an exact match, or as a regular expression. There are also case-sensitivity options and three ways of displaying the results: keywords, description, or links.





There are also several optional parameters for pinpointing the necessary files more precisely. You can combine terms with the OR and AND operators, restrict the search to a specific path or domain (.com, .edu, .org, etc.), and set the maximum number of results returned.
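The three interpretation modes map naturally onto ordinary string and regex matching. The sketch below is a hypothetical emulation of those options, not Archie's actual code; the function name and parameters are invented for illustration:

```python
import re

def matches(query, filename, mode="substring", case_sensitive=False):
    """Match a file name the way Archie's options describe: as a
    substring, as an exact (literal) name, or as a regular expression."""
    if mode == "regex":
        flags = 0 if case_sensitive else re.IGNORECASE
        return re.search(query, filename, flags) is not None
    if not case_sensitive:
        query, filename = query.lower(), filename.lower()
    if mode == "substring":
        return query in filename
    if mode == "exact":
        return query == filename
    raise ValueError(f"unknown mode: {mode}")
```

For example, `matches("GZIP", "gzip-1.2.4.tar")` succeeds in the default case-insensitive substring mode, while the same query in exact mode does not.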



Although Archie is a very old search engine, it still provides fairly powerful functionality for finding the right file. Compared with modern search engines, however, it is extremely primitive. Today's engines have moved far ahead: you only begin typing a query and the system already suggests completions, to say nothing of the machine learning algorithms involved.






Today machine learning is a core component of search engines such as Google and Yandex. One example is result ranking: contextual ranking, personalized ranking, and so on, with Learning to Rank (LTR) systems used particularly often.



Machine learning also helps to “understand” users' queries. The engine corrects spelling on its own, handles synonyms, and resolves ambiguity (did the user want the band Eagles or eagles the birds?). Search engines likewise learn to classify sites by URL as blogs, news resources, forums, and so on, and to profile users in order to personalize search.



The great-grandfather of search engines



Archie paved the way for search engines such as Google; to some extent it can be considered their great-great-grandfather. That was almost thirty years ago, and today the search industry earns about $780 billion annually.



As for Alan Emtage, when asked about the missed chance to get rich, he responds with modesty. “Of course I would like to have gotten rich,” he says. “But even with registered patents, I might not have become a billionaire. It is too easy to leave inaccuracies in the description. Sometimes it is not the one who was first who wins, but the one who became the best.”



Google and other companies were not first, but they outdid their competitors, which allowed them to build a multi-billion-dollar industry.




Source: https://habr.com/ru/post/323946/


