Generate Sitemaps on read only filesystems like Heroku
To generate sitemaps on read-only filesystems (like Heroku) we generate them into a temporary directory (or any directory with write access) and then upload them to a remote server.
As of 2012-07-12 SitemapGenerator includes some other adapters which you can use if you prefer not to use CarrierWave. The SitemapGenerator::S3Adapter uses fog-aws. You just need to set a few environment variables to configure your S3 key, bucket etc, namely: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, FOG_PROVIDER, FOG_DIRECTORY, FOG_REGION, FOG_PATH_STYLE. Take a look at this issue for more information.
The S3Adapter now supports configurable options so you don't have to use environment variables. The options are:
- :aws_access_key_id,
- :aws_secret_access_key,
- :fog_provider,
- :fog_directory,
- :fog_region,
- :fog_path_style.
Pass them in when you initialize your adapter. You can see the code in this issue.
If you omit the access key and secret access key options, the adapter will attempt to use the local IAM profile.
An easy way to configure Fog is to set these environment variables:
AWS_ACCESS_KEY_ID=XXX
AWS_SECRET_ACCESS_KEY=XXX
FOG_PROVIDER=AWS
FOG_DIRECTORY=your-bucket
FOG_REGION=us-west-2
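A missing variable is an easy mistake to make on a fresh Heroku app, so a small pre-flight check can help. The sketch below is plain Ruby and not part of SitemapGenerator; the helper name `missing_fog_env` is hypothetical, and it assumes all five variables listed above should be present.

```ruby
# Hypothetical helper (not part of SitemapGenerator): returns the names of
# the Fog-related environment variables from this guide that are unset or blank.
def missing_fog_env(env = ENV)
  keys = %w[
    AWS_ACCESS_KEY_ID
    AWS_SECRET_ACCESS_KEY
    FOG_PROVIDER
    FOG_DIRECTORY
    FOG_REGION
  ]
  keys.select { |key| env[key].to_s.strip.empty? }
end

# Example use at the top of sitemap.rb: warn instead of failing later.
missing = missing_fog_env
warn "Missing Fog settings: #{missing.join(', ')}" unless missing.empty?
```

If you rely on the local IAM profile instead of explicit credentials, the two AWS_* names will show up as missing and can be ignored.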
Alternatively, you can pass in some or all of those values when you create your S3Adapter in the sitemap.rb configuration file:
# Replace each placeholder string with your own value
SitemapGenerator::Sitemap.adapter = SitemapGenerator::S3Adapter.new(
  fog_provider: 'AWS',
  aws_access_key_id: '<your-access-key-id>',
  aws_secret_access_key: '<your-access-key>',
  fog_directory: '<your-bucket>',
  fog_region: '<your-aws-region e.g. us-west-2>'
)
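Since only some of the values may be passed explicitly, one way to combine the two approaches is to start from the environment and layer explicit overrides on top. This is a minimal sketch in plain Ruby; `s3_adapter_options` is a hypothetical helper, and the commented-out adapter call assumes the constructor accepts the option names listed above.

```ruby
# Hypothetical helper: builds an options hash from the environment variables
# named in this guide, applies explicit overrides, and drops unset values.
def s3_adapter_options(overrides = {}, env = ENV)
  from_env = {
    aws_access_key_id: env['AWS_ACCESS_KEY_ID'],
    aws_secret_access_key: env['AWS_SECRET_ACCESS_KEY'],
    fog_provider: env['FOG_PROVIDER'],
    fog_directory: env['FOG_DIRECTORY'],
    fog_region: env['FOG_REGION']
  }
  # Overrides win over the environment; nil entries are removed so that
  # omitted credentials stay omitted.
  from_env.merge(overrides).compact
end

# In sitemap.rb (requires the sitemap_generator gem):
# SitemapGenerator::Sitemap.adapter =
#   SitemapGenerator::S3Adapter.new(**s3_adapter_options({ fog_region: 'us-west-2' }))
```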
Once you have Fog working, add the following to the sitemap.rb configuration file:
# Set the host name for URL creation
SitemapGenerator::Sitemap.default_host = "http://example.com"
# pick a place safe to write the files
SitemapGenerator::Sitemap.public_path = 'tmp/'
# store on S3 using Fog (pass in configuration values as shown above if needed)
SitemapGenerator::Sitemap.adapter = SitemapGenerator::S3Adapter.new
# inform the map cross-linking where to find the other maps
SitemapGenerator::Sitemap.sitemaps_host = "http://#{ENV['FOG_DIRECTORY']}.s3.amazonaws.com/"
# pick a namespace within your bucket to organize your maps
SitemapGenerator::Sitemap.sitemaps_path = 'sitemaps/'
If your bucket is in a region other than the default, your sitemaps_host must include the region. For example, for a bucket named your-bucket in the us-west-2 region, the sitemaps_host would be http://s3-us-west-2.amazonaws.com/your-bucket/.
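The two URL shapes used in this guide can be captured in one small function. This is a sketch in plain Ruby, not a SitemapGenerator API: the helper name `sitemaps_host_for` is hypothetical, and it assumes the default region is us-east-1.

```ruby
# Hypothetical helper: returns the sitemaps_host for a bucket, using the two
# URL shapes shown in this guide. Assumes us-east-1 is the default region.
def sitemaps_host_for(bucket, region = nil)
  if region.nil? || region == 'us-east-1'
    "http://#{bucket}.s3.amazonaws.com/"
  else
    "http://s3-#{region}.amazonaws.com/#{bucket}/"
  end
end

puts sitemaps_host_for('your-bucket', 'us-west-2')
# prints http://s3-us-west-2.amazonaws.com/your-bucket/
```

In sitemap.rb the result could be assigned to SitemapGenerator::Sitemap.sitemaps_host, for example with ENV['FOG_DIRECTORY'] and ENV['FOG_REGION'] as the arguments.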
SitemapGenerator can also use CarrierWave to upload to Amazon S3, Rackspace Cloud Files, MongoDB's GridFS, or basically any other store that CarrierWave supports.
# Gemfile
gem 'sitemap_generator', '2.0.1.pre1' # at time of writing
gem 'carrierwave'
gem 'fog-aws' # if you're using S3
Here is an example sitemap file. It generates sitemaps into tmp/sitemaps/. Note that we set the sitemaps_host to the hostname of the server that will be hosting our sitemaps. The full path to the sitemaps then becomes the remote host + the sitemaps path + the sitemap filename. We set the adapter to a WaveAdapter, which is a CarrierWave::Uploader::Base.
SitemapGenerator::Sitemap.default_host = "http://www.example.com"
SitemapGenerator::Sitemap.sitemaps_host = "http://s3.amazonaws.com/sitemap-generator/"
SitemapGenerator::Sitemap.public_path = 'tmp/'
SitemapGenerator::Sitemap.sitemaps_path = 'sitemaps/'
SitemapGenerator::Sitemap.adapter = SitemapGenerator::WaveAdapter.new
SitemapGenerator::Sitemap.create do
  add 'hello_world!'
  add 'another'
end
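To make the "remote host + sitemaps path + filename" rule concrete, the sketch below joins the three parts with plain string handling. It does not call SitemapGenerator; `sitemap_url` is a hypothetical helper used only for illustration.

```ruby
# Hypothetical helper: joins host, path, and filename into one URL,
# tolerating leading or trailing slashes on each part.
def sitemap_url(sitemaps_host, sitemaps_path, filename)
  [sitemaps_host, sitemaps_path, filename]
    .map { |part| part.gsub(%r{\A/+|/+\z}, '') } # trim slashes at both ends
    .reject(&:empty?)                            # skip an empty sitemaps_path
    .join('/')
end

puts sitemap_url('http://s3.amazonaws.com/sitemap-generator/', 'sitemaps/', 'sitemap1.xml.gz')
# prints http://s3.amazonaws.com/sitemap-generator/sitemaps/sitemap1.xml.gz
```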
In this example we are uploading to S3 using Fog. (I didn't have any success using the s3 storage option.) The fog_directory is your S3 bucket name.
# config/initializers/carrierwave.rb
CarrierWave.configure do |config|
  config.cache_dir = "#{Rails.root}/tmp/"
  config.storage = :fog
  config.permissions = 0666
  config.fog_credentials = {
    :provider => 'AWS',
    :aws_access_key_id => 'your key',
    :aws_secret_access_key => 'your secret',
  }
  config.fog_directory = 'bucket name'
end
With all that in place, you should be able to run rake sitemap:refresh and have your sitemaps generated and uploaded!
After running my test with my bucket 'sitemap-generator' my sitemaps were uploaded to https://s3.amazonaws.com/sitemap-generator/sitemaps/sitemap1.xml.gz and https://s3.amazonaws.com/sitemap-generator/sitemaps/sitemap_index.xml.gz successfully.
To make sure that your sitemaps are found by the search engines, include the link to the sitemap_index.xml.gz file in your robots.txt file, by adding the following line:
Sitemap: http://s3.amazonaws.com/sitemap-generator/sitemaps/sitemap_index.xml.gz
And that should be it! This is still in beta and is not well tested at this time.
If you encounter problems, first check the tmp/ directory and make sure the sitemap files were generated correctly (matching the rake output). Then make sure that your S3 bucket is made public and check for any response messages from CarrierWave.
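The first check can be scripted. The sketch below uses only Ruby's standard library; `check_generated_sitemaps` is a hypothetical helper, and the default directory assumes the public_path and sitemaps_path values used in the examples above. It reports whether each gzipped file decompresses and begins with an XML declaration, which is only a rough sanity check and not a full validation.

```ruby
require 'zlib'

# Hypothetical helper: maps each *.xml.gz file in dir to true when it
# decompresses cleanly and starts with an XML declaration, false otherwise.
def check_generated_sitemaps(dir = 'tmp/sitemaps')
  Dir.glob(File.join(dir, '*.xml.gz')).sort.to_h do |path|
    ok = begin
      Zlib::GzipReader.open(path) { |gz| gz.read.lstrip.start_with?('<?xml') }
    rescue Zlib::Error
      false # not gzip data, or a truncated/corrupt archive
    end
    [File.basename(path), ok]
  end
end

check_generated_sitemaps.each do |name, ok|
  puts "#{name}: #{ok ? 'ok' : 'unreadable'}"
end
```

An empty result means nothing was written to the directory, which points at the generation step and not the upload.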
From Issue #69: If you were already using CarrierWave for uploads, make sure to note this line in the carrierwave.rb initializer above:
config.storage = :fog
CarrierWave examples commonly set the storage value in the uploader, like this:
class AvatarUploader < CarrierWave::Uploader::Base
  storage :fog
end
However, in order for sitemap uploads to work properly, this value must be set in the carrierwave.rb initializer.