Hello, dear Habr readers.
Many of you, I am sure, have run into the problem more than once: a website that runs entirely on JS, for example a combination of Angular.js + ui-router, looks completely wrong to robots from the outside (Google's indexer, Facebook's crawler, or Twitter's).
In a few articles I will try to show how, with a couple of simple tricks, you can stop treating "not SEO-friendly" as a reason not to build such sites, keeping up with current trends in web development.
In this part, I will describe how I fought the Facebook and Twitter crawlers, so that links to my posts on those social networks looked attractive and drew people in.
If this topic interests you, welcome under the cut.
You have surely seen big, eye-catching previews in your Facebook feed that made you want to click. Like these:
JavaScript's unfriendliness to SEO often scares developers away from building a site that uses the full power of JavaScript as it is. Using the HTML5 History API, following links without reloading the page, and so on, tends to remain the province of web applications.
It is a pity, but the trend is that sites built around JavaScript are considered unsuitable for resources that depend heavily on SEO: online stores, news sites, informational resources, and so on.
Why does this happen?
Because robots do not execute JavaScript.
That is, by feeding a link to a page of our website to the Facebook crawler, we get the same result as for the bare index.html that sits at the site root.
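It is easy to check this yourself by requesting a page the way a crawler would. A quick sketch (example.com and the post URL are placeholders; facebookexternalhit is the User-Agent string Facebook's crawler identifies itself with):

curl -A "facebookexternalhit/1.1" http://example.com/posts/1
# The response is just the empty application shell (see index.html below):
# no title, no meta tags, nothing for the crawler to show.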
Suppose you, like me, decided to build something like a blog on Angular.js using ui-router. Suppose, in addition, that we want only the browser to know that the site is navigated via AJAX, and so we enable the so-called HTML5 mode in ui-router. What would index.html look like in that case? Something like this:
<!doctype html>
<html class="no-js">
  <head>
    <meta charset="utf-8">
    <base href="/">
    <title></title>
  </head>
  <body ng-app="app">
    <div ui-view=""></div>
  </body>
</html>
To work with the HTML5 History API, you need to enable it when configuring the main application module:
angular.module('app', [
  ...
  'ui.router',
  ...
])
.config(function($locationProvider) {
  ...
  $locationProvider.html5Mode(true);
  ...
})
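The examples that follow assume posts live at URLs like /posts/:id. A minimal ui-router state for that might look like this (the state name, template, and controller are hypothetical, just to make the routing assumption explicit):

angular.module('app')
.config(function($stateProvider) {
  // Hypothetical state for a single blog post; the crawler proxy below
  // relies on URLs of the form /posts/<numeric id>.
  $stateProvider.state('post', {
    url: '/posts/:id',
    templateUrl: 'views/post.html',
    controller: 'PostCtrl'
  });
});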
Getting started.
For debugging, we can use ready-made tools: the Facebook debugger and the Twitter Card validator.
Feed any page of your site to the Facebook debugger and you will see that, sure enough, JS is not executed.
What can we do?
The first thing that comes to mind is a redirect. Social network crawlers have their own User-Agents, which we can use to redirect them to a pre-rendered page. That is, we need to render pages not on the client but on the server, and not all of them, only the ones that matter to us; in the case of a blog, those are the posts. PHP will help us here.
To do this, we will write a kind of proxy; let's call it crawler_proxy.php:
<?php
$SITE_ROOT = "http://example.com/";

$jsonData = getData($SITE_ROOT);
makePage($jsonData, $SITE_ROOT);

function getData($siteRoot) {
    // Fall back to post #1 if the id parameter is missing or not numeric
    $id = (isset($_GET['id']) && ctype_digit($_GET['id'])) ? $_GET['id'] : 1;
    $rawData = file_get_contents($siteRoot.'blog/api/get_post/?id='.$id);
    return json_decode($rawData);
}

function makePage($data, $siteRoot) {
?>
<!DOCTYPE html>
<html>
<head>
    <meta property="og:title" content="<?php echo $data->post->title; ?>" />
    <meta property="og:description" content="<?php echo strip_tags($data->post->excerpt); ?>" />
    <meta property="og:image" content="<?php echo $data->post->attachments[0]->images->full->url; ?>" />
    <meta property="og:site_name" content="My Blog"/>
    <meta property="og:url" content="http://example.com/posts/<?php echo $data->post->id ?>" />
    <meta property="og:type" content="article"/>
    <!-- etc. -->
</head>
<body>
    <h1><?php echo $data->post->title; ?></h1>
    <p><?php echo $data->post->content; ?></p>
    <img src="<?php echo $data->post->attachments[0]->images->full->url; ?>">
</body>
</html>
<?php
}
?>
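For reference, the proxy expects the blog API at blog/api/get_post/ to return JSON of roughly this shape (the field names are taken from the code above; the values are made up):

{
  "post": {
    "id": 1,
    "title": "My first post",
    "excerpt": "<p>A short summary of the post</p>",
    "content": "Full post text...",
    "attachments": [
      { "images": { "full": { "url": "http://example.com/uploads/cover.jpg" } } }
    ]
  }
}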
The meta tags are placed inside the head tag and use Open Graph properties.
Open Graph is a protocol, a set of rules by which a social network can build its social graph, determine who the author of an article is, and so on. You can read more on the Open Graph website itself.
So, in my case the web server is Apache, which means we can simply list the crawlers' User-Agents and send them, along with the post ID, to our proxy, which will return the "dry" pre-rendered page.
In the configuration we write something like the following (a sketch using mod_rewrite; facebookexternalhit and Twitterbot are the User-Agent strings the Facebook and Twitter crawlers send, and the /posts/:id URL scheme is the one assumed above):
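RewriteEngine On
# Send known social network crawlers to the proxy, passing the post id
RewriteCond %{HTTP_USER_AGENT} (facebookexternalhit|Twitterbot) [NC]
RewriteRule ^posts/(\d+)$ /crawler_proxy.php?id=$1 [L]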
After this, try feeding our link to the post to the Facebook debugger again.
It will show any warnings so you can fix them, or, if everything went like clockwork, show how your link will look in Facebook posts.
In the case of Twitter, you just need to describe a few additional meta tags. Their list can be found on the page of the Twitter Card validator.
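As an illustration, a typical set looks like this (note that Twitter's tags use the name attribute rather than property; the values are placeholders):

<meta name="twitter:card" content="summary_large_image" />
<meta name="twitter:title" content="Post title" />
<meta name="twitter:description" content="A short description of the post" />
<meta name="twitter:image" content="http://example.com/uploads/cover.jpg" />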
If the topic turns out to be interesting, in the next part I will talk about getting a site written in JavaScript indexed.