Deliver search-friendly JavaScript-powered websites (Google I/O ’18)



[MUSIC PLAYING]
TOM GREENAWAY: Good
morning, everyone. My name is Tom Greenaway,
and I’m a partner developer advocate from Google
Sydney with a focus on the indexability of
progressive web applications.
JOHN MUELLER: Hi, everyone. I’m John Mueller. I’m a webmaster trends analyst
from Zurich in Switzerland. It’s great to see so many of you
here, even at this early hour.
TOM GREENAWAY: Now,
as you can imagine, John and I have a
lot of experience with the work web
developers must do to ensure that
websites are indexable, which is another way of
saying whether a web page can be found and understood
by search engines. But do search engines see all
web pages exactly the same way? Are some pages more
complex than others, and what about modern
JavaScript powered web sites? Today, we’ll be
taking a closer look into what it takes for a modern
JavaScript powered website to be properly indexed by
search crawlers, and especially Google Search. And I’m excited to tell
you that in this talk, we’re announcing a
bunch of cool new stuff, including a new change
to Google Search policy, a new approach for rendering
HTML to search crawlers, and even a new Google
Search Console tool. It sounds like a
lot of stuff, right? Well, that’s because it
is, so let’s get started. Now, a long time ago,
before I joined Google, I was building e-commerce
sites, and I personally felt there was a lot of mystery
at times behind Google Search, especially on the
topic of indexability. I would wonder, why do some
of my pages appear in Google Search, and some don’t? And what’s the
difference between them? Will JavaScript be
rendered correctly? Will JavaScript rendered content
appear properly and be indexed? And is lazy loading
an image safe to do? These are really
critical questions, and as developers ourselves,
we understand the frustration behind this mystery. So today, John and I are
going to do something we very rarely do at Google. We’re going to pull back
the curtain a little bit and reveal some new
pieces of information about how Google Search
sees the web and indexes it. And with this knowledge
and a few new tools, you’ll have concrete
steps you can take to ensure the
JavaScript powered websites you’re building are
visible to Google Search. Now, I want to remind
you that this talk is about modern JavaScript
powered websites, and typically,
these websites will be powered by a JavaScript
framework, such as Angular, Polymer, React, or Vue.js. And who doesn’t love a great
web development framework that’s easy to use and helps you build
your sites faster and works great for your users? But it’s important to recognize
that some of these frameworks use a single page
app configuration model, meaning they use
a single HTML file that pulls in a bunch of JavaScript. And that can make a
lot of stuff simpler, but if you don’t watch out,
JavaScript powered websites can be a problem
for search engines. So let’s take a
quick look at what the default template for a new
Angular project looks like. As you can see, the
default project template is pretty basic. It shows you how to use Angular
to render a header, an image, and a few links. Nice and simple. How could this
possibly be a problem from an indexability
perspective? Well, let’s take a peek
behind the scenes at the HTML. This is it. Take a good look. When viewed in the browser,
the default sample project had text, imagery, and links,
but you wouldn’t know that from looking at this
initial HTML that’s been delivered from the
server, now, would you? The initial HTML
that’s been sent down is actually completely
devoid of any content. See here in the app root– that’s all there is in
the body of the page, except for some script tags. So some search
engines might assume that there’s actually nothing
here to index, and to be clear, Angular isn’t the only
web framework that serves an empty response on
its initial server side render. Polymer, React, and Vue.js
have similar issues by default. So what does this mean for the
indexability of our websites from the perspective
of Google Search? Well, to answer that
question better, we’ll take a little
step back and talk about the web in general,
why search engines exist, and why search
crawlers are necessary. Perhaps a good question to start
with is, how big is the web? Well, we can tell you
that we’ve actually found over 130 trillion
documents on the web. So in other words,
it’s really big. And as you know, the aim
of all search engines, including Google, is to provide
a list of relevant search results based on a
user’s search query. And to make that
mapping of user queries to search results
fast and accurate, we need an index similar to the
catalog of a gigantic library. And given the size of the web,
that’s a really complex task. And so to build this index
to power our search engine, we need another tool– a search crawler. And traditionally,
a search crawler was basically just a computer
and a piece of software that performed two key steps. One, it aims to find a piece
of content to be crawled, and to do this, the content
must be retrievable via URL. And once we have a URL,
we get its content, and we sift through the
HTML to index the page and find new links
to crawl, as well. And thus, the cycle repeats. So let’s look at that
first step, the crawling, and break it down. Oh, and yes, as an
Australian, I felt it was imperative that I
include some spiders in my talk. So this is the cutest
possible one I could find. John, what do you think? No, you’re not convinced? OK, well, I have a
few more in the deck, so maybe you’ll come around. So to ensure the
crawling is possible, there are some key
things to keep in mind. Firstly, we need
URLs to be reachable, as in, there shouldn’t be any
issue when the crawler wants to request the web
pages and retrieve the resources necessary for
indexing them from your web server. And secondly, if there are
multiple documents that contain the same
content, we need a way to identify
the original source. Otherwise, it could
be interpreted as duplicate content. And finally, we also
want our web pages to have clean, unique URLs. Originally, this was pretty
straightforward on the web, but then the first
single page apps made things a bit
more complicated. So let’s go through each
of these concepts. First, for the
reachability of URLs, there’s a simple,
standard way to help search engines find content that
you’re probably familiar with. You add a plain text
file called robots.txt to the top level domain
of your site, which specifies which URLs to
crawl and which to ignore. And I say URLs, because these
rules can prevent JavaScript from being crawled, too, which
could affect your indexability. And this example also gives
us a link to a sitemap. A sitemap helps
crawlers by providing a recommended set of URLs to
crawl initially for a site. And to be clear,
there’s no guarantee these URLs will get crawled. They’re just one of the
signals that search crawlers will consider. OK, but now, let’s talk
about that duplicate content scenario, and how
search crawlers deal with this situation. Sometimes, websites
want multiple pages to have the same content, right? Even if it’s a
different website. For example, bloggers
will publish articles on their website and cross-post
to services like Medium to increase the reach
of their content, and this is called
content syndication. But it’s important
for search crawlers to understand which URL
you prefer to have indexed. So the canonical metadata
syntax shown here in the HTML allows the duplicate documents
to communicate to crawlers where the original,
authoritative source for the content lives. We call that source document
the canonical document. And traditionally, URLs for the
web started out quite simple– just a URL that was fetched
from a server with some HTML. But then, of course,
AJAX came along and just changed everything. Suddenly, websites could
execute JavaScript, which could fetch new
content from the server without reloading
the browser page. But developers
still wanted a way to support back and forth
browser navigation and history, as well. So a trick was invented, which
leveraged something called the fragment identifier, and
its purpose is for deep linking into the sub-content of
a page, like a subsection of an encyclopedia article. And because fragment
identifiers were supported by browsers for
history and navigation, this meant developers could
trick the browser into fetching new content dynamically, without
reloading the browser page, and yet also support the
history and the navigation we love about the web. But we realized that using the
fragment identifier for two purposes– subsections on
pages, and also deep linking into content– it wasn’t very elegant. So we moved away from that. And instead,
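
As a rough sketch of why the fragment trick worked (the function and type names are illustrative, not from the talk): everything after the `#` stays in the browser and is not sent to the server in the HTTP request, so two URLs that differ only in their fragment fetch the same document.

```typescript
// Hypothetical helper: separates the part of a URL that identifies the
// document from the fragment that the browser keeps to itself.
interface SplitUrl {
  base: string;     // what is actually requested from the server
  fragment: string; // text after "#", empty if there is none
}

function splitFragment(url: string): SplitUrl {
  const hashIndex = url.indexOf("#");
  if (hashIndex === -1) {
    return { base: url, fragment: "" };
  }
  return {
    base: url.slice(0, hashIndex),
    fragment: url.slice(hashIndex + 1),
  };
}

// A classic subsection link and a JavaScript-style deep link both
// reduce to a plain base URL plus a fragment.
const sectionLink = splitFragment("https://example.com/article#history");
const appLink = splitFragment("https://example.com/#/products/42");
```

Here `sectionLink.fragment` names a subsection of one article, while `appLink.fragment` carries an entire application route, which is the double duty the talk describes as inelegant.
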
another approach was proposed– to use the
fragment identifier, followed by an exclamation mark,
which we call the hashbang. And this way, we could
discern the difference between a traditional URL
using the fragment identifier for the sub-content on a
page, versus a fragment identifier being used by
JavaScript to deep link into a page. And this technique was
recommended for a while. However, nowadays, there
is a modern JavaScript API that makes these old
techniques less necessary, and it’s called the History API. And it’s great, because it
enables managing the history state of the URL
without requiring complete reloads of the
browser all through JavaScript. So we get the best
of both worlds– dynamically fetched content
with clean, traditional URLs. And I can tell you that
from Google’s perspective, we no longer index that
single hash workaround, and we discourage the use of
the hashbang trick, as well. OK, well, that’s
crawling out of the way. Now, let’s move on
to the indexing step. So web crawlers
ideally want to be able to find all the
content on your website. If the crawlers can’t
see some content, then how are they
going to index it? And the core content of the page
includes all the text, imagery, video, and even hidden elements,
like structured metadata. In other words, it’s
the HTML of the page. But don’t forget about that
content you dynamically fetched, either. This could be worth indexing,
as well, such as Facebook or Disqus comments. Crawlers want to see this
embedded content, too. And also, this might
seem really obvious, but I want to emphasize
that at Google, we take HTTP codes pretty
seriously, especially 404 not found codes. If crawlers find a page
that has a 404 status code, then they probably won’t
even bother indexing it. And lastly, of course,
a crawler wants to find all the links
on a page, as well, because these links allow the
crawlers to crawl further. So now, let’s just talk
a bit about those links quickly, because
honestly, they’re some of the most important
parts of the web. How do search crawlers
like Google find links? Well, I can’t speak for
all search crawlers, but I can say that at Google,
we only analyze one thing– anchor tags with HREF
attributes, and that’s it. For example, this span
here that I’ve just added– it won’t get crawled,
because it’s not an anchor. And this additional
span I’ve added, even though it’s an anchor, it
doesn’t have an HREF attribute. But if you are using JavaScript,
such as with the History API that I mentioned earlier,
to navigate the page purely on the client
and fetching new content dynamically, you
can do that, so long as you use the anchor
tags with HREF attributes like in this last example. Because most search
crawlers, including Google, will not simulate navigation
of a page to find links. Only the anchor tags will
be followed for linking. But wait– is that
really everything? In order to have sifted through
the HTML to index the page, we needed to have the
HTML in the first place. And in the early
days of the web, the server likely gave us all
the HTML that was necessary. But nowadays, that’s
not really the case. So let’s insert a step
between crawling and indexing, because we need to recognize
that the search crawlers, themselves, might need to
take on this rendering task, as well. Otherwise, how will
the search crawler understand the modern
JavaScript powered websites we’re building? Because these sites are
rendering their HTML in the browser itself, using
JavaScript and templating frameworks, just like
that Angular sample I showed you earlier. So when I say rendering, I
don’t mean drawing pixels to the screen. I’m talking about the actual
construction of the HTML itself. And ultimately,
this can only ever happen on either the
server or on the client, or a combination of
the two could be used, and we call that
hybrid rendering. Now, if it’s all
pre-rendered on the server, then a search engine could just
index that HTML immediately. But if it’s rendered
on the client, then things get a little
bit trickier, right? And so that’s going
to be the challenge that we’ll be discussing today. But one last turn– you might be wondering, what is
Google Search’s crawler called? Well, we call it Googlebot, and
we’ll be referring to it a lot in this tour. And I think another
detail to note is that I said that a
search crawler is basically just a computer with some
software running on it. Well, obviously, maybe in
the ’90s, that was the case. But nowadays, due to just
the sheer size of the web, Googlebot is comprised of
thousands of machines running all this distributed software
that’s constantly crunching data to understand all of
this continuously expanding information on the web. And to be honest, I think we
sometimes take for granted just how incredible
Google Search really is. For example, I recently learned
that with the Knowledge Graph, which is a database of all the
information we have on the web, it actually maps out how
more than 1 billion things in the real world are connected
and over 70 billion facts between them. It’s kind of amazing. OK. Well, now that we know the
principles of a search crawler, let’s see how these three
different key steps– crawling, rendering, and
indexing– all connect. Because one crucial
thing to understand is the cycle of
how Googlebot works or how it should ideally work. As you can see, we want these
three steps to hand over to one another instantly. And as soon as the
content is fully rendered, we want to index it to
keep the Google Search index as fresh as possible. This sounds simple, right? Well, it would be if
all the content was rendered on the server and
complete when we crawl it. But as you know, if a site
uses client side rendering, then that’s not
going to be the case, just like that Angular
sample I showed you earlier. So what does Googlebot
do in this situation? Well, Googlebot includes
its own renderer, which is able to run
when it encounters pages with JavaScript. But rendering pages at
the scale of the web requires a lot of time and
computational resources. And make no mistake– this is a serious challenge
for search crawlers, Googlebot included. And so we come to
the important truth about Google Search we would
like to share with you today, which is that currently, the
rendering of JavaScript powered websites in Google
Search is actually deferred until Googlebot has the
resources available to process that content. Now, you might be thinking, OK. Well, what does
that really mean? Well, I’ll show you. In reality, Googlebot’s
process looks a bit different. We crawl a page, we fetch the
server side rendered content, and then we run some initial
indexing on that document. But rendering the
JavaScript powered web pages takes processing
power and memory, and while Googlebot is
very, very powerful, it doesn’t have
infinite resources. So if the page has
JavaScript in it, the rendering is
actually deferred until we have the resources
ready to render the client side content. And then we index
the content further. So Googlebot might index a page
before rendering is complete, and the final
render can actually arrive several days later. And when that final
render does arrive, then we perform
another wave of indexing on that client side
rendered content. And this effectively
means that if your site is using a heavy amount of client
side JavaScript for rendering, you could be tripped up at
times when your content is being indexed due to the nature
of this two-phase indexing process. And so ultimately,
what I’m really trying to say is because
Googlebot actually runs two waves of indexing
across your content, it’s possible some
details might be missed. For example, if your site is
a Progressive Web Application, and you’ve built it around
the single page app model, then it’s likely
all your unique URLs share some base
template of resources, which are then filled
in with content by AJAX or fetch requests. And if that’s the
case, consider this– did the initially server side
rendered version of the page have the correct canonical
URL included in it? Because if you’re
relying on that to be rendered by the client,
then we’ll actually completely miss it, because that
second wave of indexing doesn’t check for the
canonical tag at all. Additionally, if
the user requested a URL that doesn’t
exist, and you attempt to use JavaScript
to send the user a 404 page, then we’re actually
going to miss that, too. Now, John will talk more about
these issues later in the talk, but the important thing
to take away right now is that these really
aren’t minor issues. These are real issues that
could affect your indexability, metadata, canonical
tags, HTTP codes. As I mentioned at the
beginning of this talk, these are all really key to
how search crawlers understand the content on your web pages. However, just to be clear,
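
One way to picture the point about canonical tags and 404s is a minimal sketch of a server's initial response. Everything named here (`buildInitialResponse`, `knownPaths`, `/main.js`, `example.com`) is a made-up placeholder; the only idea taken from the talk is that the status code and the canonical tag should already be correct before any client-side JavaScript runs.

```typescript
// Hypothetical shape of what the server sends for the first request.
interface InitialResponse {
  status: number;
  html: string;
}

// Illustrative only: `knownPaths` and `origin` stand in for a real
// site's routing data and host name.
function buildInitialResponse(
  path: string,
  knownPaths: string[],
  origin: string
): InitialResponse {
  if (knownPaths.indexOf(path) === -1) {
    // Return a real 404 status from the server rather than drawing an
    // error view with client-side JavaScript.
    return { status: 404, html: "<!doctype html><title>Not found</title>" };
  }
  // The canonical tag is part of the server-delivered HTML, so it is
  // present during the first wave of indexing.
  const canonical = '<link rel="canonical" href="' + origin + path + '">';
  const html =
    "<!doctype html><html><head>" + canonical + "</head>" +
    '<body><app-root></app-root><script src="/main.js"></script></body></html>';
  return { status: 200, html: html };
}
```

The page body can still be filled in on the client; the sketch only moves the two signals that the talk says are missed in the second wave into the first response.
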
not all web pages on a website necessarily need to be indexed. For example, actually on the
Google I/O schedule website, there is a listing and filter
interface for the sessions, and we want search crawlers
to find the individual session pages. But we discovered the
client side rendered deep links weren’t being indexed
because the canonical tags were rendered in the
client, and the URLs were fragment identifier based. So we implemented a new template
with clean URLs and server side rendered canonical
tags to ensure their session descriptions were properly
indexed, because we care about that content. And to ensure these
documents were crawlable, we added them to the
site map, as well. But what about the
single page app, which allows for filtering sessions? Well, that’s more of a tool
than a piece of content, right? Therefore, it’s not as important
to index the HTML on that page. So ask yourself this– do the pages I care about
from the perspective of content and indexing use
client side rendering, anyway? OK. So now you know– when building
a client side rendered website, you must tread carefully. As the web and the
industry has gotten bigger, so, too, have the teams and
companies become more complex. We now work in a world where
the people building websites aren’t necessarily
the same people promoting or marketing
those websites. And so this challenge is one
that we’re all facing together, as an industry, both from
Google’s perspective and yours, as developers,
because after all, you want your content indexed
by search engines, and so do we. Well, this seems like a good
opportunity to change tracks. So John, do you want to
take over and tell everyone about the Google Search policy
changes and some of the best practices they can
apply so we can meet this challenge together?
JOHN MUELLER: Sure. Thanks, Tom. That was a great summary
of how Search works. Though, I still don’t know
about those pictures of spiders. Kind of scary. But Googlebot, in reality,
is actually quite friendly. Anyway, as Tom
mentioned, the indexing of modern JavaScript powered
websites is a challenge. It’s a challenge both for
Google, as a search engine, and for you all, as
developers of the modern web. And while developments on
our side are still ongoing, we’d like to help you to
tackle this challenge in a more systematic way. So for that, we’ll look
at three things here– the policy change that we
mentioned briefly before, some new tools that are
available to help you diagnose these issues a
little bit better, and lastly, a bunch
of best practices to help you to make better
JavaScript powered websites that also work well in Search. So we’ve talked about client
side rendering briefly and server side
rendering already. Client side rendering
is the traditional state where JavaScript is
processed on the client– that would be the
user’s browser– or on a search engine. For server side
rendering, the server– so your server will
process the JavaScript and serve mostly static
HTML to search engines. Often, this also has
speed advantages. So especially on lower end
devices, on mobile devices, JavaScript can take
a bit of time to run. So this is a good practice. For both of these,
we index the state as ultimately seen in the browser. So that’s what we pick up,
and we try to render pages when we need to do that. There’s a third type
of rendering that we’ve talked about in the past. It starts in the same way
in that pre-rendered HTML is sent to the client. So you have the same
speed advantages there. However, on interaction or
after the initial page load, the server adds
JavaScript on top of that. And as with server
side rendering, our job, as a search
engine, is pretty easy here. We just pick up the
pre-rendered HTML content. We call this hybrid rendering. This is actually our
long-term recommendation. We think this is
probably where things will end up in the long run. However, in practice,
implementing this can still be a bit
tricky, and most frameworks don’t make this easy. A quick call out to Angular,
since we featured them in the beginning as an
example of a page that was hard to pick up. They have built a
hybrid rendering mode with Angular Universal
that helps you to do this a little bit easier. Over time, I imagine
more frameworks will have something
similar to make it easier for you to do this in practice. However, at least at the
moment, if your server isn’t written in
JavaScript, you’re going to be dealing with
kind of double maintenance of controlling and
templating logic, as well. So what’s another option? What’s another way
that JavaScript sites could work well with search? We have another option that
we’d like to introduce. We call it dynamic rendering. In a nutshell, dynamic
rendering is the principle of sending normal, client side
rendered content to users, and sending fully server
side rendered content to search engines and to
other crawlers that need it. This is the policy change
that we talked about before. So we call it dynamic
because your site dynamically detects whether or
not the requester is a search engine
crawler, like Googlebot, and only then sends
the server side rendered content
directly to the client. You can include other
web services here, as well, that can’t deal
with rendering– for example, maybe social media services,
or chat services, anything that tries to extract structured
information from your pages. And for all other requesters,
so your normal users, you would serve your normal
hybrid or client side rendered code. This also gives you the
best of both worlds, and makes it easy
for you to migrate to hybrid rendering for your
users over time, as well. One thing to note– this is not
a requirement for JavaScript sites to be indexed. As you’ll see later, Googlebot
can render most pages already. For dynamic rendering,
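
The detection step behind dynamic rendering might look roughly like the sketch below. It assumes a simple substring match on the user-agent header; `chooseRenderMode` and `DEFAULT_CRAWLER_TOKENS` are hypothetical names, and the token list is something each site would decide for itself.

```typescript
type RenderMode = "prerendered" | "client-side";

// Assumption: matching on "googlebot" is enough for this illustration.
// Other crawlers or link-preview services could be added to the list.
const DEFAULT_CRAWLER_TOKENS: string[] = ["googlebot"];

function chooseRenderMode(
  userAgent: string,
  crawlerTokens: string[] = DEFAULT_CRAWLER_TOKENS
): RenderMode {
  const ua = userAgent.toLowerCase();
  for (let i = 0; i < crawlerTokens.length; i++) {
    if (ua.indexOf(crawlerTokens[i].toLowerCase()) !== -1) {
      // Crawlers get the fully rendered HTML.
      return "prerendered";
    }
  }
  // Everyone else gets the normal client-side rendered app.
  return "client-side";
}
```

A user-agent string can be spoofed, so a check like this only decides which version to serve; it does not prove who is asking.
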
our recommendation is to add a new tool or step
in your server infrastructure to act as a dynamic renderer. This reads your normal,
client side content and sends a pre-rendered version
to search engine crawlers. So how might you implement that? We have two options here
that help you to kind of get started. The first is Puppeteer, which
is a Node.js library, which wraps a headless version of
Google Chrome underneath. This allows you to
render pages on your own. Another option is
Rendertron, which you can run as software as a
service that renders and caches your content on
your side, as well. Both of these are
open source, so you could make your
own version or use something from a third party
that does something similar, as well. For more information
on these, I’d recommend checking out the I/O
session on Headless Chrome. I believe there’s a
recording about that already. Either way, keep
in mind, rendering can be pretty
resource intensive. So we recommend doing this out
of band from your normal web server and implementing
caching as you need it. So let’s take a quick
look at what your server infrastructure might look
like with a dynamic renderer integrated. Requests from Googlebot
come in on the side here. They’re sent to
your normal server, and then perhaps
through a reverse proxy, they’re sent to the
dynamic renderer. There, it requests and renders
the complete final page and sends that back
to the search engines. So without needing to implement
or maintain any new code, this setup could
enable a website that’s designed only for
client side rendering to perform dynamic
rendering of the content to Googlebot and to other
appropriate clients. If you think about it, this
kind of solves the problems that Tom mentioned
before, and now we can be kind of confident
that the important content of our web pages is available
to Googlebot when it performs its initial wave of indexing. So how might you recognize
Googlebot requests? This is actually pretty easy. So the easiest way to do
that is to find Googlebot in the user-agent string. You can do something
similar for other services that you want to serve
pre-rendered content to. And for Googlebot, as
well as some others, you can also do a
reverse DNS look up if you want to be sure
that you’re serving it just to legitimate clients. One thing to kind of
watch out for here is that if you serve adapted
content to smartphone users versus desktop users,
or you redirect users to different URLs, depending
on the device that they use, you must make sure that
dynamic rendering also returns device focused content. In other words, mobile
search engine crawlers, when they go to
your web pages, they should see the mobile
version of the page. And the others should
see the desktop version. If you’re using
responsive design– so if you’re using
the same HTML and just using CSS to conditionally
change the way that content is shown to users,
this is one thing you don’t need to watch out
for, because the HTML is exactly the same. What’s not immediately
clear from the user agents is that
Googlebot is currently using a somewhat older
browser to render pages. It uses Chrome 41, which
was released in 2015. The most visible
implication for developers is that newer
JavaScript versions and coding conventions,
like arrow functions, aren’t supported by Googlebot. And with that, also, any API
that was added after Chrome 41 currently isn’t supported. You can check these on
a site like Can I use. And while you
could theoretically install an older
version of Chrome, we don’t recommend doing that,
for obvious security reasons. Additionally,
there are some APIs that Googlebot doesn’t support
because they don’t provide additional value for Search. We’ll check these out, too. All right, so you
might be thinking, this sounds like a
lot of work, John. I don’t know. Do I really need to do this? So a lot of times, Googlebot
can render pages properly. Why do I really have
to watch out for this? Well, there are a few reasons
to watch out for this. First is if your site is
large and rapidly changing. For example, if you
have a news website, that has a lot of new content
that keeps coming out regularly and requires quick indexing. As Tom showed, rendering
is deferred from indexing. So if you have a
large, dynamic website, then kind of the new
content might take a while to be indexed, otherwise. Secondly, if you rely on modern
JavaScript functionality. For example, if you
have any libraries that can’t be transpiled
back to ES5, then dynamic rendering
can help you there. And that said, we
continue to recommend using proper graceful
degradation techniques, so that even older clients
have access to your content. And finally, there’s a third
reason to also look into this. In particular, if you’re
using social media sites– if your site relies on
sharing through social media or through chat applications. If these services require
access to your page’s content, then dynamic rendering
can help you there, too. So when you might not
use dynamic rendering. I think the main aspect here is
balancing the time and effort needed to implement and to
run this with the gains that are received. So remember, implementation
and maintenance of dynamic rendering can
use a significant amount of server resources. And if you see Googlebot
is able to index your pages properly, then if you’re not
making critical, high frequency changes to your
site, maybe you don’t need to actually implement
anything special. Most sites should be
able to let Googlebot render their pages just fine. Like I mentioned, if Googlebot
can render your pages, then probably you don’t need dynamic
rendering for that site. Let’s take a look at a few
tools to help you figure out what the situation is. When diagnosing
rendering, we recommend doing so incrementally. First, checking the
raw HTTP response, and then checking
the rendered version, either on mobile or on mobile
and desktop, if you serve different content, for example. Let’s take a quick
look at these. So looking at the raw HTTP
response, one way to do that is to use Google Search console. To gain access to
Google Search console and to a few other features
that they have there, you first need to verify
ownership of your website. This is really easy to do. There are a few ways to do that. So I’d recommend doing that,
regardless of what you’re working on. Once you have your
site verified, you can use a tool
called Fetch as Google, which will show the HTTP
response that was received by Googlebot, including
the response code on top and the HTML that was provided
before any rendering was done. This is a great
way to double check what is happening
on your server, especially if you’re using
dynamic rendering to serve different content to Googlebot. Once you’ve checked
the raw response, I recommend checking how the
page is actually rendered. So the tool I use for this
is the mobile friendly test. It’s a really fast
way of checking Google’s rendering of a page. As I mentioned, that
name suggests that it’s made for mobile devices. So as you might know,
over time, our indexing will be primarily focused on
the mobile version of a page. We call this
mobile-first indexing. So it’s good to already start
focusing on the mobile version when you’re testing rendering. We recommend testing a few
pages of each kind of page within your website. So for example, if you
have an e-commerce site, check the home page, some
of the category pages, and some of the detail pages. You don’t need to check every
page on your whole website, because a lot of
times, the templates will be pretty similar. If your pages render
well here, then chances are pretty high that
Googlebot can render your pages for Search, as well. One thing that’s
kind of a downside here is that you just
see the screenshot. You don’t see the
rendered HTML here. What’s one way to
check the HTML? Well, new for I/O– I think we launched
this yesterday. We’ve added a way to review
the HTML after rendering. This is also in the
mobile friendly test. It shows you what was
created after rendering with the mobile Googlebot. It includes all of
the markup for links, for images, for
structured data– any invisible elements
that might be on the page after rendering. So what do you do
if the page just doesn’t render properly at all? We also just launched a
way to get full information about loading issues
from a page, as well. On this part within the
mobile-friendly test, you can see all of the resources
that were blocked for Googlebot. So this could be JavaScript
files or API responses. A lot of times, not everything
needs to be crawled, kind of like Tom mentioned. For example, also, if you have
tracking pixels on a page, Googlebot doesn’t really need
to render those tracking pixels. But if you use an API to pull
in content from somewhere else, and that API endpoint
is blocked by robots.txt, then obviously, we can’t
pull in that content at all. An aggregate list of
all of these issues is also available
in Search Console. So when pages fail in
a browser, usually I check the developer console
for more information, to see more details
on exceptions. And new for I/O, one of
the most requested features from people who make JavaScript
powered sites for Search is also showing the
console log when Googlebot tries to render something. This allows you to check for
all kinds of JavaScript issues. For example, if
you’re using ES6, or if you just have other
issues with the JavaScript when it tries to run. This makes my life
so much easier because I don’t
have to help people with all of these detailed
rendering issues that much. Desktop is also a topic
that still comes up. As you’ve seen in maybe
some of the other sessions, desktop isn’t quite dead. So you can run all
of these diagnostics in the rich results
test, as well. This tool shows a desktop
version of these pages. So now that we’ve seen
how to diagnose issues, what kind of issues have we run
across with modern JavaScript powered sites? What patterns do you need to
watch out for and handle well on your side? So remember Tom mentioned
at the beginning of the talk something about
lazy loading images and being unsure if
they’re indexable? Well, it turns out, they’re
only sometimes indexable. So it was good to look at that. Depending on how lazy
loading is implemented, Googlebot may be able to
trigger it, and with that, may be able to pick up
these images for indexing. For example, if the
images are above the fold, and your lazy loading
loads those images automatically, then Googlebot
will probably see that. However, if you want to
be sure that Googlebot is able to pick up lazy loaded
images, one way to do that is to use a noscript tag. So you can add a noscript tag
around a normal image element, and we’ll be able to pick that
up for image search directly. Another approach is to use
structured data on a page. When we see structured data
that refers to an image, we can also pick that
up for Image Search. As a side note for
images, we don’t index images that are
referenced only through CSS. We currently only
index images that are kind of embedded with
the structured data markup or with image tags. Apart from lazy
loaded images, there are other types of
content that require some kind of interaction
to be loaded. What about tabs that
load the content after you click
on them, or if you have infinite scroll
patterns on a site? Googlebot generally won’t
interact with a page, so it wouldn’t be
able to see these. There are two ways that you can
get this to Googlebot, though. Either you can
pre-load the content and just use CSS to toggle
visibility on and off. That way, Googlebot
can see that content from the preloaded version. Or alternately, you can
just use separate URLs and navigate the user and
Googlebot to those pages individually. Now, Googlebot is a patient
bot, but there are a lot of pages that we have to crawl. So we have to be efficient and
kind of go through pages fairly quickly. When pages are slow
to load or render, Googlebot might miss some
of the rendered content. And since embedded resources are
aggressively cached for Search, rendering timeouts are
really hard to test for. So to limit these
problems, we recommend making performant and
efficient web pages, which you’re hopefully already
doing for users, anyway, right? Anyway, in particular, limit
the number of embedded resources and avoid artificial delays like
timed interstitials, like here. You can test pages with
the usual set of tools and roughly test rendering with
the mobile-friendly testing tool. And while timeouts
here are a little bit different for
indexing, in general, if the pages work in the
mobile-friendly test, they’ll work for
search indexing, too. Additionally, Googlebot
wants to see the page as a new user would see it. So we crawl and render
pages in a stateless way. Any API that tries
to store something locally would not be supported. So if you use any of
these technologies, make sure to use graceful
degradation techniques to allow anyone to
view your pages, even if these APIs
are not supported. And that was it with regards
to critical best practices. Now, it’s time to take a
quick circle back and see what we’ve seen. So first, we recommend checking
for proper implementation of best practices
that we talked about. In particular, lazy loaded
images are really common. Second, test a
sample of your pages with the mobile-friendly test
and use the other testing tools, as well. Remember, you don’t need
to test all of your pages. Just make sure that you have
all of the templates covered. And then finally,
if sites are large
and quickly changing, or you can’t reasonably fix
rendering across a site, then maybe consider using
dynamic rendering techniques to serve Googlebot and other
crawlers a pre-rendered version of your page. And finally, if you do decide
to use dynamic rendering, make sure to double check
the results there, as well. One thing to keep in mind: indexing isn’t the
same as ranking. But generally
speaking, pages do need to be indexed before
their content can appear in Search at all. I don’t know. Tom, do you think that
covers about everything? TOM GREENAWAY: Well, it
was a lot to take in, John. That was some amazing content. But I guess one
question I have, and I think maybe other
people in the audience have this on their
mind, as well, is: is it always going
to be this way, John? JOHN MUELLER: That’s
a great question, Tom. I don’t know. I think things will
never stay the same. So as you mentioned
in the beginning, this is a challenge for
us that’s important. Within Google Search, we
want our search results to reflect the web
as it is regardless of the type of
website that’s used. So our long-term vision is
that you, the developers, shouldn’t need to worry as much
about this for search crawlers. So circling back on
the diagram that Tom showed in the beginning
with deferred rendering, one change we want to make
is to move rendering closer to crawling and indexing. Another change we
want to make is to make Googlebot use a more
modern version of Chrome over time. Both of these will
take a bit of time. I don’t like making
long-term predictions, but I suspect it will be
at least until the end of the year before this works
a little better. And similarly, we
trust that rendering will be more and more common
across all kinds of web services. So at that point,
dynamic rendering is probably less critical
for modern sites. However, the best practices
that we talked about will continue to be
important here, as well. How does that sound, Tom? TOM GREENAWAY: That
sounds really great. I think that covers everything,
and I hope everyone in the room has learned some new
approaches and tools that are useful for making your
modern JavaScript powered websites work well
in Google Search. If you have any questions, we’ll
be in the mobile web sandbox area together with the
Search Console team. And alternatively, you can
always reach out to us online, as well, be it through
Twitter, our live Office Hours Hangouts, or the
Webmaster Help forum. So thanks, everyone,
for your time. JOHN MUELLER: Thank you. [APPLAUSE] [MUSIC PLAYING]
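
The robots.txt point from the diagnostics part of the talk (an API endpoint that is blocked by robots.txt cannot be pulled in during rendering) can be sanity-checked offline. The following is a minimal sketch, not Google's own tooling: it uses Python's standard `urllib.robotparser` to list which of a page's resources a crawler identifying as Googlebot would not be allowed to fetch. The robots.txt rules, the example.com URLs, and the `blocked_urls` helper are hypothetical, and the standard-library parser only approximates Googlebot's actual matching rules.

```python
from typing import List
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: List[str], user_agent: str = "Googlebot") -> List[str]:
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt that blocks every crawler from the /api/ path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /api/
"""

# Hypothetical resources that one product page loads while rendering.
PAGE_RESOURCES = [
    "https://example.com/products/42",
    "https://example.com/api/products/42.json",
    "https://example.com/static/app.js",
]

blocked = blocked_urls(ROBOTS_TXT, PAGE_RESOURCES)
for url in blocked:
    print("blocked for Googlebot:", url)
```

In this made-up setup the page itself and its script are crawlable, but the JSON endpoint that supplies the content is not, which is the situation the talk warns about. The blocked-resources list in the mobile-friendly test remains the authoritative check.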

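The noscript fallback John describes for lazy-loaded images can be sketched as a small server-side template helper. This is only an illustration: the `lazy_image_with_fallback` name, the `lazy` class, and the `data-src` attribute are assumptions standing in for whatever convention a given lazy-loading script expects. The talk itself only says to wrap a normal image element in a noscript tag.

```python
from html import escape


def lazy_image_with_fallback(src: str, alt: str) -> str:
    """Return markup for a lazy-loaded image plus a <noscript> fallback."""
    safe_src = escape(src, quote=True)
    safe_alt = escape(alt, quote=True)
    # Hypothetical convention: a client-side script copies data-src into src.
    lazy_img = f'<img class="lazy" data-src="{safe_src}" alt="{safe_alt}">'
    # A plain image element that is present in the HTML without running any script.
    fallback = f'<noscript><img src="{safe_src}" alt="{safe_alt}"></noscript>'
    return lazy_img + "\n" + fallback


markup = lazy_image_with_fallback("/img/red-shoe.jpg", "Red running shoe")
print(markup)
```

The second line of the output is an ordinary image tag inside noscript, so the image URL is in the markup even if the lazy-loading script never runs.
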
21 comments

  1. Will GoogleBot use Chrome 59 in 2018 ?
    https://www.search-foresight.com/googlebot-chrome-59/
    Because, you know, ES20**.

  2. 23:55 provides a solution of implementing server side rendering only for google bot. That might be a good solution, however, I thought that is considered as search engine cloaking (providing different result to users / bots), which will penalize your SEO… isn't it?

  3. I have a question: we are working on a brand new site, which is built on JS. We blocked it in robots.txt because we are afraid that the bot might index a lot of "empty" pages that lack dynamic rendering. However, I want to test and see how Googlebot will see those pages. But I can't test it until I unblock the robots.txt file, right? I mean, I can't even use GWT's "Fetch as Google" while the site is blocked by robots.txt. So what might be the solution to check how Googlebot will render my site without opening up the robots.txt file?

  4. How can we make sure that Google will not consider dynamic rendering to be cloaking? Previously there was a recommendation not to check for Googlebot.

  5. Google, Please provide a link to the documentation regarding dynamic rendering and the official policy change.

  6. The dynamic rendering is so ridiculous… What makes you think that I'm going to code like a #$%#! just to make your job simpler, when implementing that requires significant infrastructure? Google many times does incredible things, but this… this goes nowhere. I really don't think people are going to implement this, or if they try, they are going to drop it after trying…

  7. Questions: You mention using the mobile friendly tool and the rich results testing tool as rendering test platforms, essentially. Why do this instead of using Fetch and Render in Search Console? In fact the first time I tried to use the rich results tool it told me that the page was not eligible for "rich results known by this test."

  8. You're missing a 'b' in a part of the info 🙂
    "Watch more >Wemasters< sessions from I/O '18 here"
    Just trying to help, keep being awesome and an inspiration! 🙂

    Awesome video <3

  9. On my website I use the fragment #! and it is perfect for the users, I show the content without refreshing the whole page. But now Google does not recommend this and my site has fallen in terms of indexed pages and therefore its positioning too.

    I do not understand why they do not take the content that comes after #!. Google always recommends focusing on users when the site is done, but this is no longer the case. Since in my case the site works perfect for users, they see the content, but now for Google this is insignificant and if now I have to change something from my code it is for Google to interpret it. Contradictory, no?

    Anyway in search engines like Bing or duckduckgo this does not happen, there if they crawl all the content of my web. They say to make use of the API History, which I was trying to do and I can not make it work for my case.

    So, do we focus on users or search engines?

  10. This technical aspect is really important for the following up of website building.Truly thanks.

  11. This presentation just begs the same question over and over: why not make Googlebot better? Shifting the burden onto all these web developers… or just improve Googlebot to handle modern practices? Oh, your indexing bot doesn't know how to read/index pages that a human can reason about? Sounds like your bot could be improved. It uses Chrome 41—why? etc.

    Don't get me wrong, I think web developers should do all they can to improve SEO (especially with JSON-LD structured data), but some of these limitations of Googlebot are just annoying.
