A note to the reader here – I wanted to do more than just repeat Martin’s presentation parrot-fashion, so I have extrapolated by using my own quotes which I am sure Martin would NOT always be happy to condone or say. The SEO Learning bits are ME, not Martin. Martin is welcome to correct me, but may prefer not to comment on my take.
Martin points out that what the user sees is not about the code, but about what the Browser interprets the code. Google wants to see what the user sees. So this means parsing .js first (where possible) and then looking at the DOM after the .js is parsed.
SEO Learning: Stop using the “viewsource”options in your browser to look at the underlying code… you will probably not get an accurate view of what users (or search engines) see. Better is to use the developer tools in Chrome or Firefox.Dixon
Undestanding server-side and client-side rendering
Anyone can use their Pagespeed insights tool. Take their own Japan homepage, which loads in a very impressive 1 second:
Pulling in raw HTML is a very different challenge from pulling in “rendered” HTML, as the rendering is done after the HTML is called by a browser. So this increases the load time. At scale, this has big issues for web crawlers.
SEO Learning: If a search engine wanted to crawl and render a trillion URLs a day at that 1 page per second, let’s do some possibly dubious maths: 1 trillion seconds = 2,777,778 hours. There are only 24 hours in a day, so you would need 115,741 headless chrome instances open 24/7 to render all this. When you note that this was a fast example, you can imagine that you would need more than this… so a search engine cannot scale .js rendering in this way. Martin put up a diagram or two, which I am recreating partly from memory.Dixon
Back to my take on the presentation. Google has a thing called their “Web Rendering Service” (WRS). After the initial crawl, the page goes into the rendering queue, then the rendered DOM goes back into the processing path.
He couldn’t go too much into that but suggested Puppeteer as an equivalent technology that he could talk about. They work like this:
Task 1: Decide on Viewport settings
When rendering your page, they first need to decide what screen size should their system simulate. Essentially, this is the difference between indexing as if you were looking at the site on a mobile versus looking at the site on a desktop.
Task 2: Decide on Timeout or abort options
If Page A takes 2 seconds longer to render than Page B, does their Web Rendering Service make every page wait 2 seconds longer? this is not an easy question to answer and is part of Google’s secret sauce. One thing they do – and this is very important for SEOs – is cache as much as they can (in parallel to the WRS). They use what they call a “Service Wrapper” for this. When you use testing tools you might access the WRS, but you will not see what is cached. When you are testing, you obviously want to test the live version, not an old cache, but Google needs to be more efficient at scale. They don’t want to do “live”… they want to do “fast”.
SEO Learning: Just because Google has recrawled and indexed your page after a rewrite, it might not have rendered all of the .js files correctly. That may happen later after they have updated their “wrapper”.Dixon
There is also a “Pool manager” with a bunch of headless browsers running 24/7. They do not install service workers on demand. John Mueller had already said at this conference that the .js queue time was a median length of 5 seconds but might be minutes for the 90th percentile and above of all sites. This is a massive improvement on the Matt Cutts’ era, but it is not the whole story…
SEO Learning: Google’s .js rendering queue time is NOT the same as the Render time! Infact, the render time is a bit of a misnomer, because of the “service wrapper” which may have done some of the work weeks ago (maybe on an old version of a .js file) and is yet to render the latest version of another .js file.Dixon
Tip 1: Simple Clean Caching
Martin suggested using content hashing on your files so that Google could easily see when a .js file has been updated or changed. For example, you can use an open-source library like Webpack to do this.
Tip 2: Reduce Resource Requests
For example, make the api requests serverside, then make just one request browser-side. Also, you can create one minified request to import all your apps. However – this tip came with a rather important caveat: Avoid one large bundle that needs to be sent on every page load. One single change means that the content hash changes page by page, so you lose the caching benefit. Translated – make sure not to bundle apps that change between pages.
Tip 4: Use the Tools
There was also a tip about using “lodash” and when using gzip, only sending the functions you need for the page to load in the gzip file, otherwise Google has to unpack the whole file to reread code it already had happily cached. Lodash is all a little above my paygrade, so again I leave this as a note for cleverer people to take on board.
Bonus Tip: SEO and Ajax
Although this was not a topic that Martin dwelt on in Zurich, John Mueller (who was also at the conference) has covered the most important issue in a video.
Ajax has been a problem for SEOs for many years. In particular, Ajax has a tendency to use hashtags (#) in URL structures. Anything after a # symbol in a URL is not strictly part of the main URLs so in the past, search engines stopped and only indexed the URL before the # symbol.
Luckily, now that Google renders the page before indexuing the content, this is no longer an issue for SEOs as the video explains.
Some Final Thoughts