11. RoR: Simple steps to generate your sitemap and connect it to Google

This chapter assumes you’ve linked to Google Search Console (check chapter 10 if you have not).

Sitemaps help with SEO: they tell search engines which pages exist on your site, which makes the site easier to crawl and more visible in search results.

For this chapter, we will use the sitemap_generator gem.

The first thing to do is add it to your Gemfile:

gem 'sitemap_generator'

Then install the gem and generate its config file:

bundle install
rake sitemap:install

Running the commands above creates the file config/sitemap.rb.

Navigate to config/sitemap.rb and let’s start configuring.

# Set the host name for URL creation
SitemapGenerator::Sitemap.default_host = "http://www.yazii.co.uk"
SitemapGenerator::Sitemap.sitemaps_host = "http://sitemap.yazii.co.uk/"
SitemapGenerator::Sitemap.public_path = Rails.root / 'tmp'
SitemapGenerator::Sitemap.adapter = SitemapGenerator::AwsSdkAdapter.new(
  ENV["AWS_SITEMAP_BUCKET"],
  aws_access_key_id: ENV["AWS_ACCESS_KEY_ID"],
  aws_secret_access_key: ENV["AWS_SECRET_ACCESS_KEY"],
  aws_region: ENV["AWS_REGION"]
)
SitemapGenerator::Sitemap.create do
  # Static routes:
  add privacy_path, priority: 0.6, lastmod: Policy::Privacy.current.updated_at, changefreq: 'monthly'
  add terms_and_conditions_path, priority: 0.6, lastmod: Policy::Terms.current.updated_at, changefreq: 'monthly'

  # Dynamic routes:
  JobPost.available.find_each do |posting|
    add ['job', posting.id, posting.slug].join('/'),
        lastmod: posting.updated_at,
        expires: posting.created_at + 30.days,
        priority: 0.7
  end
end

Let’s go through what we have here.

SitemapGenerator::Sitemap.default_host = "http://www.yazii.co.uk"
SitemapGenerator::Sitemap.sitemaps_host = "http://sitemap.yazii.co.uk/"
SitemapGenerator::Sitemap.public_path = Rails.root / 'tmp'

Change default_host to your own site’s host, e.g. http://www.example.com

The sitemaps_host is quite important here. I would suggest something like http://sitemap.example.com, where example.com is your domain. Setting it is optional, but you will want a URL that serves your sitemap AND sits under your own domain name. We will go through this in more detail below.

The public_path can be kept as is; it is taken from the gem’s documentation.

Next is setting up the AwsSdkAdapter. Why use AWS S3 to store the sitemap? One reason is that it is good practice. Another is that Heroku’s filesystem is ephemeral, so files written while the app runs are not kept.

As such, you will want to configure AWS S3 access for this:

SitemapGenerator::Sitemap.adapter = SitemapGenerator::AwsSdkAdapter.new(
  ENV["AWS_SITEMAP_BUCKET"],
  aws_access_key_id: ENV["AWS_ACCESS_KEY_ID"],
  aws_secret_access_key: ENV["AWS_SECRET_ACCESS_KEY"],
  aws_region: ENV["AWS_REGION"]
)

If you’ve followed the previous posts, we already have the AWS access and secret keys required to access S3 (set up with user groups and roles). We will still want to create a new bucket specifically for this use case.

Why do we create a new bucket? It requires public access, which we don’t want on the private buckets already in place. Google Search Console must be able to download the sitemap from this bucket.

We will also choose the bucket name deliberately, so that the sitemap can be reached through your own domain.
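
Before running any sitemap task, it can help to confirm that the four environment variables used in sitemap.rb are actually set. The following is only a small sketch in plain Ruby; the constant and method names are made up for illustration and are not part of the gem.

```ruby
# Variable names taken from the sitemap.rb shown in this chapter.
REQUIRED_SITEMAP_ENV = %w[
  AWS_SITEMAP_BUCKET
  AWS_ACCESS_KEY_ID
  AWS_SECRET_ACCESS_KEY
  AWS_REGION
].freeze

# Returns the names of required variables that are unset or blank.
# `env` defaults to the real environment; pass a Hash to try it out.
def missing_sitemap_env(env = ENV)
  REQUIRED_SITEMAP_ENV.select { |key| env[key].to_s.strip.empty? }
end

missing = missing_sitemap_env
if missing.empty?
  puts "All sitemap variables are set"
else
  warn "Missing: #{missing.join(', ')}"
end
```

On Heroku, any missing values can be set with heroku config:set, for example heroku config:set AWS_SITEMAP_BUCKET=sitemap.example.com.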

Go ahead and log into your AWS S3 console and click to create a new bucket.

Create a new bucket for your sitemap

Make the bucket name the same as your sitemap host from the previous step (without the protocol). In my case it was sitemap.yazii.co.uk – you can see the reason for this in the steps to come.
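
If you want to derive the bucket name from the sitemaps_host value rather than typing it twice, a minimal sketch could look like this. The method name bucket_name_for is hypothetical; only Ruby’s standard URI library is used.

```ruby
require "uri"

# Drops the protocol and trailing slash from a sitemaps_host value,
# leaving the name the S3 bucket should have.
def bucket_name_for(sitemaps_host)
  URI.parse(sitemaps_host).host
end

puts bucket_name_for("http://sitemap.yazii.co.uk/")
# prints: sitemap.yazii.co.uk
```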

Under Configure options you can keep everything as default and click next.

S3 Bucket configuration options

Then, on the Set permissions tab, you will want to allow public access:

Set bucket permissions to allow public access

Uncheck the Block all public access box, then click next and review.

Review bucket configurations and create

Again, we make this bucket public because it will only hold the sitemap, and we want the sitemap to be public. Search engines such as Google need to access this file.

Okay, now we’re getting to the reason why the bucket name looks like a URL.

We will now configure your DNS settings to route requests to S3. From the sitemap.rb file above you can see that sitemaps_host = "http://sitemap.yazii.co.uk/"

We will configure it such that when you open that URL, it will fetch you the sitemap from S3.

Open up your DNS configuration settings – the example used here is with GoDaddy:

Add DNS CNAME record for sitemap

Add a CNAME record with sitemap as the host. This means that a request for sitemap.your_domain.ext is pointed at whatever is in the Points to field.

Set the Points to field to bucket.s3.amazonaws.com

The bucket is named sitemap.yazii.co.uk because S3 ignores the ‘bucket’ part of bucket.s3.amazonaws.com when a request arrives through a CNAME. The request still carries your own domain in its Host header, and S3 uses that value as the bucket name.
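
To make the routing rule a little more concrete, here is a simplified model of how the bucket is picked from the Host header. This is an illustration written for this chapter, not AWS code: it only handles the s3.amazonaws.com suffix and ignores regional endpoints, and the method name is made up.

```ruby
# Simplified model of S3 virtual-hosted-style routing (illustration only).
S3_SUFFIX = ".s3.amazonaws.com"

# Returns the bucket name S3 would look for, given the Host header.
def bucket_from_host(host_header)
  host = host_header.downcase
  if host.end_with?(S3_SUFFIX)
    host.delete_suffix(S3_SUFFIX) # my-bucket.s3.amazonaws.com -> my-bucket
  else
    host                          # custom domain: the domain itself is the bucket name
  end
end

# The browser sends the domain you typed, not the CNAME target.
puts bucket_from_host("sitemap.yazii.co.uk")
# prints: sitemap.yazii.co.uk
```

This is why the bucket has to carry exactly the same name as the sitemap host.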

So, let’s summarise what we have so far. We have an S3 bucket named after the sitemap host.

We then have the CNAME record in the DNS settings.

We configured sitemap.rb to use AWS and to serve the file from the above host:

SitemapGenerator::Sitemap.sitemaps_host = "http://sitemap.yazii.co.uk/"

SitemapGenerator::Sitemap.adapter = SitemapGenerator::AwsSdkAdapter.new(
  ENV["AWS_SITEMAP_BUCKET"],
  aws_access_key_id: ENV["AWS_ACCESS_KEY_ID"],
  aws_secret_access_key: ENV["AWS_SECRET_ACCESS_KEY"],
  aws_region: ENV["AWS_REGION"]
)

Also add an entry to your robots.txt file that references this sitemap:

Sitemap: http://sitemap.yazii.co.uk/sitemap.xml.gz

Make sure it is your sitemaps_host with sitemap.xml.gz added to the end of it.

You will see this URL in the output when you run rake sitemap:refresh (which creates the sitemap if it does not exist).

To run this on Heroku, just type:
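
As a quick sketch, the robots.txt line can be built from the sitemaps_host value so the two never drift apart. The method name is hypothetical, and sitemap.xml.gz is the file name used in this chapter.

```ruby
# Builds the robots.txt entry from the sitemaps_host value.
# A trailing slash on the host is dropped to avoid a double slash.
def robots_sitemap_line(sitemaps_host, filename = "sitemap.xml.gz")
  "Sitemap: #{sitemaps_host.chomp('/')}/#{filename}"
end

puts robots_sitemap_line("http://sitemap.yazii.co.uk/")
# prints: Sitemap: http://sitemap.yazii.co.uk/sitemap.xml.gz
```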

heroku run rake sitemap:refresh

If you don’t want it to notify the search engines, run:

rake sitemap:refresh:no_ping

After running this command, the gem should have generated your sitemap and uploaded it to S3. To test that it is available, open the sitemap URL you’ve defined in your browser.

If the file downloads, that’s great!

Now you want to submit this to Google Search Console:
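
Once the file has downloaded, you may also want to peek inside it. The sketch below unzips a gzipped sitemap and lists the URLs it contains, using only Ruby’s standard library. The method name and the sample data are made up so that the example runs offline; a regex is good enough for a quick look, but it is not a real XML parser.

```ruby
require "zlib"

# Decompresses a gzipped sitemap and returns the URLs found in its <loc> tags.
def sitemap_locs(gzipped_bytes)
  xml = Zlib.gunzip(gzipped_bytes).force_encoding(Encoding::UTF_8)
  xml.scan(%r{<loc>\s*(.*?)\s*</loc>}m).flatten
end

# Stand-in data; in practice you would read the downloaded file instead,
# e.g. sitemap_locs(File.binread("sitemap.xml.gz")).
sample = Zlib.gzip(<<~XML)
  <?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url><loc>http://www.example.com/privacy</loc></url>
    <url><loc>http://www.example.com/job/123/ruby-developer</loc></url>
  </urlset>
XML

puts sitemap_locs(sample)
```

If the list is empty or the URLs point at the wrong host, check default_host in sitemap.rb.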

Enter the sitemap URL and click submit.

Can you enter just the S3 URL? No.

You need to enter a URL which is under your domain, as it’s required for security and verification purposes. This is why we jump through several hoops to make the sitemap available from your domain and name the S3 bucket to match it.

But if you got this far, that’s great! You’re now closer to having a much more SEO friendly site!

Let’s finally go through how to add your pages to the sitemap:

SitemapGenerator::Sitemap.create do
  add privacy_path, priority: 0.6, lastmod: Policy::Privacy.current.updated_at, changefreq: 'monthly'
  add terms_and_conditions_path, priority: 0.6, lastmod: Policy::Terms.current.updated_at, changefreq: 'monthly'
end

Add your static pages in this manner, using the Rails path helpers (such as privacy_path) to specify the routes.

If you’re unsure of the helper name, run rails routes in your terminal, find the row for the controller and action you need, and copy the ‘Prefix’ from this output. Adding _path to the prefix gives you the helper.

For dynamic routes, i.e. pages which are generated based on the content of your database, you will want something like this:

SitemapGenerator::Sitemap.create do
  JobPost.available.find_each do |posting|
    add ['job', posting.id, posting.slug].join('/'),
        lastmod: posting.updated_at,
        expires: posting.created_at + 30.days,
        priority: 0.7
  end
end

Here, .available is a scope which filters the Active Record query so that it returns only available content.

We build each dynamic URL from the attributes of the record. For example, the call could end up as add '/job/123/job_slug_description'.
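
To see what each add call receives without booting Rails, here is a plain Ruby sketch of the same loop. Posting is a stand-in Struct rather than the real JobPost model, select(&:available) plays the role of the .available scope, and THIRTY_DAYS stands in for ActiveSupport’s 30.days. All of these names are assumptions made for the example.

```ruby
# Stand-in for a JobPost record; a real app would use the Active Record model.
Posting = Struct.new(:id, :slug, :created_at, :updated_at, :available, keyword_init: true)

THIRTY_DAYS = 30 * 24 * 60 * 60 # seconds

# Mirrors the arguments passed to `add` in the sitemap block.
def sitemap_entry(posting)
  {
    path: ["job", posting.id, posting.slug].join("/"),
    lastmod: posting.updated_at,
    expires: posting.created_at + THIRTY_DAYS,
    priority: 0.7
  }
end

postings = [
  Posting.new(id: 1, slug: "ruby-developer", created_at: Time.utc(2024, 1, 1),
              updated_at: Time.utc(2024, 1, 5), available: true),
  Posting.new(id: 2, slug: "closed-role", created_at: Time.utc(2023, 6, 1),
              updated_at: Time.utc(2023, 6, 2), available: false)
]

# Only available postings end up in the sitemap.
entries = postings.select(&:available).map { |posting| sitemap_entry(posting) }
entries.each { |entry| puts "#{entry[:path]} expires #{entry[:expires]}" }
```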

The slug is a short description which should contain several useful keywords for SEO (and is optional in this particular case).
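
If your model does not store a slug yet, one rough way to build it from a title is sketched below. The slugify and job_path helpers are hypothetical; in a Rails app, ActiveSupport’s String#parameterize does a similar job.

```ruby
# Hypothetical helper: turns a title into a URL-friendly slug.
def slugify(title)
  title.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-+|-+\z/, "")
end

# Joins the pieces the same way the sitemap block does.
def job_path(id, title)
  ["job", id, slugify(title)].join("/")
end

puts job_path(123, "Senior Ruby Developer (London)")
# prints: job/123/senior-ruby-developer-london
```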

These URLs must also be defined in your routes.rb file as usual. In this case, the route entry is:

  get '/job/:uuid/:slug', to: 'jobs#show', as: 'job'
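
A quick way to sanity check that the paths you add to the sitemap fit this route is to compare them against an equivalent pattern. The regex below is only an approximation written for this example; Rails does its own matching, and the constant and method names are made up.

```ruby
# Rough regex equivalent of the route '/job/:uuid/:slug' (illustration only).
JOB_ROUTE = %r{\A/job/(?<uuid>[^/]+)/(?<slug>[^/]+)\z}

# Returns the captured params, or nil when the path would not fit the route.
def job_params(path)
  match = JOB_ROUTE.match(path)
  match && { uuid: match[:uuid], slug: match[:slug] }
end

p job_params("/job/123/ruby-developer") # both segments present, so params are returned
p job_params("/job/123")                # slug missing, so this returns nil
```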