In my recent article on assessing your site for SEO (Search Engine Optimization), I mentioned a few issues I found with my own site as I was going along. Nothing critical that would block Google from indexing my pages (that is, the crawler reading my content and adding it to the search engine's catalog), but still a couple of improvements for me to make.

Adding noindex to my non-content pages #

The first issue was that my "list" pages, which is my general name for pages that exist to help you navigate to content rather than being content themselves, were showing up in searches. On my site, this included the "blog" page (Blogs :: Duncan Mackenzie), the tag pages (such as Web Development :: Duncan Mackenzie), and the tag listing page (Tags :: Duncan Mackenzie). These are important pages, as they let someone browse the site and find related posts, but they tend to be full of keywords from all the post titles and end up competing with the actual content. If you remember from the SEO post, there are two instructions you can give to Google and other search engines: you can ask them to follow or not follow the links on a page (FOLLOW or NOFOLLOW), and you can ask them to index or not index it (INDEX or NOINDEX). You can combine these two values as well, but the default is FOLLOW, INDEX, so you can skip them if that is the desired behavior.
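For example, spelling out that default explicitly would look like this, and is equivalent to omitting the tag entirely:

<meta name="robots" content="index,follow" />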

In this case, what I want is to avoid the pages showing up in search, but I want the search engine to follow the links. Following the links on these listing pages will help it find more of my content pages and understand the relationship between all my posts. Putting either

<meta name="robots" content="noindex" />

or

<meta name="robots" content="noindex,follow" />

would work, but since follow is the default, just noindex is best.

To add this line in Hugo, the static site generator I use to build the site, I updated my theme:

{{ if or (eq .Type "tags") (eq .Type "blog") }}<meta name="robots" content="noindex" />{{ end }}

In my case, there are only two page types where I want to make this change, so a simple if statement works. If you had a more complex setup, you could put a collection of page types into your theme settings, as sketched below.
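Here is a minimal sketch of that settings-driven approach, assuming a hypothetical noindexTypes parameter in a YAML site config (the parameter name is my own invention, not a Hugo built-in):

params:
  noindexTypes: ["blog", "tags"]

and the template check becomes:

{{ if in .Site.Params.noindexTypes .Type }}<meta name="robots" content="noindex" />{{ end }}

Hugo's in function checks whether a value appears in a collection, so the template can stay the same as you add or remove page types in the config.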

Removing these same pages from my sitemap #

Within a couple of days of making the change above, Google added a bunch of warnings about my site to Search Console. The issue was that I had submitted pages for indexing (by having them in my sitemap) that I had also marked as noindex. That does sound like a mistake, but I had mixed feelings. As I mentioned above, I do want these pages to be crawled and their links followed, so submitting them to Google seemed fine. On the other hand, there are links to the blog on every page and tag links at the bottom of each post, so those pages are going to be found either way. I decided it would be best to have a clean report in Search Console, so I needed to remove these list pages from my sitemap. In Hugo, the sitemap is a built-in template, so you can't edit it directly, but you can override it by adding your own sitemap.xml to your layouts folder.

{{ printf "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"yes\" ?>" | safeHTML }}
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:xhtml="http://www.w3.org/1999/xhtml">
  {{ range where .Data.Pages "Type" "not in" (slice "blog" "tags") }}
  <url>
    <loc>{{ .Permalink }}</loc>{{ if not .Lastmod.IsZero }}
    <lastmod>{{ safeHTML ( .Lastmod.Format "2006-01-02T15:04:05-07:00" ) }}</lastmod>{{ end }}{{ with .Sitemap.ChangeFreq }}
    <changefreq>{{ . }}</changefreq>{{ end }}{{ if ge .Sitemap.Priority 0.0 }}
    <priority>{{ .Sitemap.Priority }}</priority>{{ end }}{{ if .IsTranslated }}{{ range .Translations }}
    <xhtml:link
                rel="alternate"
                hreflang="{{ .Lang }}"
                href="{{ .Permalink }}"
                />{{ end }}
    <xhtml:link
                rel="alternate"
                hreflang="{{ .Lang }}"
                href="{{ .Permalink }}"
                />{{ end }}
  </url>
  {{ end }}
</urlset>

Similar to the earlier change, adding a where clause to the loop is what skips these list pages; the rest, including the two xhtml:link blocks, comes straight from Hugo's built-in template (the first link is emitted once per translation, the second is a self-reference for the current page):

{{ range where .Data.Pages "Type" "not in" (slice "blog" "tags") }}

A few days later (most changes to your site take Google a while to react to), the warnings went away, and I took pride in my clean Search Console report.

Getting Google to use my page descriptions #

This one was just a complete mistake on my part. I have been setting the description metadata on all my posts, in the YAML block at the top of the markdown file, and I had just assumed it was turning into the meta description in the HTML output. As I was making the earlier noindex change, I replaced a robots value of noodp, which is no longer supported; it used to indicate that Google should use your page's description instead of one from the Open Directory Project. That led me to wonder how my descriptions were showing up on Google, so I ran some test searches. In every case, Google had inferred a description from the first paragraph or so of content instead of using the description I supplied in my posts. It turns out that my theme was happily auto-generating this summary snippet, because the template had

<meta name="description" content="{{ if .IsHome }}{{ .Site.Params.homeSubtitle }}{{ else }}{{ .Summary | plainify }}{{ end }}" />

The .Summary value can be supplied in the YAML at the top of your posts, but if it is not provided then Hugo generates it from the first 70 words of content. That is an excellent feature, but I had already provided a `description` value on each page, so I changed the template to use that first when it is available, leaving .Summary in there as a fallback.
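For reference, the description is just another field in each post's front matter; a minimal example (the title and text here are made up):

---
title: "Fixing a few SEO issues"
description: "A short, hand-written summary for search engines to show in results."
---

With that in place, the updated template checks .Description first: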

<meta name="description" content="{{ if .IsHome }}{{ .Site.Params.homeSubtitle }}{{ else }}{{ with .Description }}{{ . }}{{ else }}{{ if .IsPage }}{{ .Summary | plainify }}{{ else }}{{ with .Site.Params.description }}{{ . }}{{ end }}{{ end }}{{ end }}{{ end }}" />

Google can still ignore your suggested description, especially if it is short, and generate one from the page content instead, but I prefer to provide my own as a starting point.

Practicing what I preach #

Part of my motivation in fixing these issues was simply to ensure I was following the same best practices I would advise other folks to follow. Google, like other search engines, is remarkably capable of pulling all your content into its index even when you make a few mistakes, but it is still better to run some tests and fix up any issues you see. I'm sure there are a few more problems I haven't noticed yet, so feel free to let me know if anything jumps out at you!