Creating multilingual sites

Creating a site in multiple languages can be tedious, but Nanoc can nonetheless be useful in making the management of multilingual sites a bit easier. The approach that I will be describing in this guide is opinionated. It is not necessarily the best way, but it is an approach that worked quite well for me. This guide is inspired by the techniques I used for the Myst Online website. Feel free to check out the source for the Myst Online website to see the details about how it is done.

A multilingual site is a site where each page is available in multiple languages. Each language forms some sort of sub-site. For example, the English translation could have pages “About” and “Play,” while the Dutch translation could have matching “Over” and “Speel” pages.

For the Myst Online site, I decided to organize the pages in different languages by creating a top-level directory named after the abbreviated language name (en for English, nl for Dutch, fr for French, etc.). Inside the language-specific directory, each page has a path that is also translated (so not /nl/play, but rather /nl/speel). Here’s an example:

/en
  /about
  /play
/nl
  /over
  /speel
/fr
  /informations
  /jouez

The way these pages are maintained is standard: each page is an individual item in the content/ directory that is compiled to the output/ directory. You’d create these items like you would in an ordinary, monolingual site. Here’s what the content/ directory looks like:

content/
  en/
    about.html
    play.html
  nl/
    over.html
    speel.html
  fr/
    informations.html
    jouez.html

Implementation

One useful function that will be necessary later on is #language_code_of, which returns the language code (e.g. en or nl) for a given item. This function is implemented like this:

def language_code_of(item)
  # "/en/foo" becomes "en"; so does a top-level item such as "/en.html"
  (item.identifier.to_s.match(%r{\A/([a-z]{2})(?:[/.]|\z)}) || [])[1]
end
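
To see what the helper returns, here is a minimal sketch that runs outside Nanoc. The Item struct is a hypothetical stand-in for a real Nanoc item, and the copy of the helper accepts a slash, a dot, or the end of the identifier after the language code, which is an assumption about how a top-level home page is named:

```ruby
# Hypothetical stand-in for a Nanoc item; real items also respond to #identifier.
Item = Struct.new(:identifier)

# Copy of the helper so that this snippet runs on its own.
def language_code_of(item)
  (item.identifier.to_s.match(%r{\A/([a-z]{2})(?:[/.]|\z)}) || [])[1]
end

p language_code_of(Item.new('/en/about.html'))   # "en"
p language_code_of(Item.new('/nl/speel.html'))   # "nl"
p language_code_of(Item.new('/en.html'))         # "en"
p language_code_of(Item.new('/assets/site.css')) # nil
```

Items outside a language directory yield nil, so callers should be prepared for that.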

Once you have the basic content, you can improve the site to make it easier to switch languages. For this to work, each item needs a “canonical identifier” so that it is possible to find translations of a given page. The same item in different languages will always have the same canonical identifier. For the Myst Online website, I chose the English identifier as the canonical identifier, but the choice you make here is arbitrary. For example, the /en/about page has /about as its canonical identifier, and /nl/speel has /play as its canonical identifier. The canonical identifier is probably best stored in a canonical_identifier attribute.

Now, it is possible to find all translations of a given item by finding all items with the same canonical identifier:

def translations_of(item)
  @items.select do |i|
    i[:canonical_identifier] == item[:canonical_identifier]
  end
end
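
Here is a small sketch of the lookup outside Nanoc. FakeItem and the items list are hypothetical stand-ins, and this variant takes the item collection as a parameter instead of reading @items:

```ruby
# Hypothetical stand-in for a Nanoc item: an identifier plus attributes
# that are read with item[:key].
class FakeItem
  attr_reader :identifier

  def initialize(identifier, attributes)
    @identifier = identifier
    @attributes = attributes
  end

  def [](key)
    @attributes[key]
  end
end

ITEMS = [
  FakeItem.new('/en/about.html',        { canonical_identifier: '/about' }),
  FakeItem.new('/en/play.html',         { canonical_identifier: '/play' }),
  FakeItem.new('/nl/over.html',         { canonical_identifier: '/about' }),
  FakeItem.new('/nl/speel.html',        { canonical_identifier: '/play' }),
  FakeItem.new('/fr/informations.html', { canonical_identifier: '/about' })
].freeze

# Same logic as the helper, with the item list passed in explicitly.
def translations_of(item, items)
  items.select { |i| i[:canonical_identifier] == item[:canonical_identifier] }
end

dutch_about = ITEMS.find { |i| i.identifier == '/nl/over.html' }
puts translations_of(dutch_about, ITEMS).map(&:identifier)
# /en/about.html
# /nl/over.html
# /fr/informations.html
```

Note that the result includes the item itself. Items that have no canonical_identifier attribute all compare equal (nil == nil), so it is worth giving every translatable page the attribute.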

One more function is necessary: one that converts a language code into the language name (in the language itself, so it should not return “Dutch” for nl but it should return “Nederlands”). Here’s how this function works:

LANGUAGE_CODE_TO_NAME_MAPPING = {
  'en' => 'English',
  'nl' => 'Nederlands',
  'fr' => 'Français'
}

def language_name_for_code(code)
  LANGUAGE_CODE_TO_NAME_MAPPING[code]
end

For completeness, let’s write a function that returns the language name for a given item as well:

def language_name_of(item)
  language_name_for_code(
    language_code_of(item))
end
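
If a language is missing from the mapping, the lookup returns nil and the link text ends up empty. One possible variation, sketched here with a hypothetical LANGUAGE_NAMES constant, falls back to the bare language code:

```ruby
# Hypothetical mapping for this sketch; extend it with every language on the site.
LANGUAGE_NAMES = {
  'en' => 'English',
  'nl' => 'Nederlands'
}.freeze

# Falls back to the code itself when no name is known, so link text is never blank.
def language_name_for_code(code)
  LANGUAGE_NAMES.fetch(code, code)
end

puts language_name_for_code('nl') # Nederlands
puts language_name_for_code('de') # de
```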

Now, it is possible to link to all translations from a given item. Here’s how it is done (in ERB):

<ul>
  <% translations_of(@item).each do |t| %>
    <li>
      <a href="<%= t.path %>">
        <%= language_name_of(t) %>
      </a>
    </li>
  <% end %>
</ul>

It is best to prevent linking to the active page, so you should check whether the translation t is the same as @item and handle this situation differently. For example:

<ul>
  <% translations_of(@item).each do |t| %>
    <li>
      <% if @item == t %>
        <span class="active">
          <%= language_name_of(t) %>
        </span>
      <% else %>
        <a href="<%= t.path %>">
          <%= language_name_of(t) %>
        </a>
      <% end %>
    </li>
  <% end %>
</ul>

One extra enhancement would be to indicate the language of the link destinations as well as the language of the link text itself. For this, the hreflang and lang attributes are used, respectively. Here’s what the code could look like:

<ul>
  <% translations_of(@item).each do |t| %>
    <li>
      <% if @item == t %>
        <span class="active" lang="<%= language_code_of(t) %>">
          <%= language_name_of(t) %>
        </span>
      <% else %>
        <a href="<%= t.path %>"
           lang="<%= language_code_of(t) %>"
           hreflang="<%= language_code_of(t) %>">
          <%= language_name_of(t) %>
        </a>
      <% end %>
    </li>
  <% end %>
</ul>
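
To check what such a template produces, it can be rendered with Ruby’s standard ERB library outside Nanoc. This is only a sketch: the Page struct, its fields, and the paths are hypothetical stand-ins for real items and helpers, and the template is condensed:

```ruby
require 'erb'

# Hypothetical stand-ins for Nanoc items with their language code and name.
Page = Struct.new(:code, :name, :path)

translations = [
  Page.new('en', 'English', '/en/play/'),
  Page.new('nl', 'Nederlands', '/nl/speel/')
]
current = translations.last # pretend the Dutch page is being rendered

template = ERB.new(<<~'TEMPLATE')
  <ul>
  <% translations.each do |t| %>
    <% if t == current %>
      <li><span class="active" lang="<%= t.code %>"><%= t.name %></span></li>
    <% else %>
      <li><a href="<%= t.path %>" lang="<%= t.code %>" hreflang="<%= t.code %>"><%= t.name %></a></li>
    <% end %>
  <% end %>
  </ul>
TEMPLATE

# The template sees the local variables above through the binding.
html = template.result(binding)
puts html
```

The Dutch entry is rendered as a span, while the English entry becomes a link carrying both lang and hreflang.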

The language of the link text and of the link destinations is now indicated, but the language of the document itself isn’t yet. The html element should get a lang attribute that contains the language code. Here’s what it could look like in the layout:

<html lang="<%= language_code_of(@item) %>">

Redirects

At this point, the site is already a lot friendlier for people who speak different languages. One thing is still missing, though: a landing page that redirects people to the language of their choice. This means that the landing page will require server-side scripting. For the Myst Online site, I used PHP as this is a widely available scripting language for creating websites, but other languages such as Ruby would have worked as well. A good way of redirecting visitors is to check the contents of the Accept-Language HTTP header, find the preferred language, and then redirect them to the appropriate page.

Here’s the PHP code for parsing the header and returning a list of language codes requested by the user agent, sorted by decreasing preference according to their q-values:

// Parse the Accept-Language header
$langs = array();
if(isset($_SERVER['HTTP_ACCEPT_LANGUAGE']))
{
  // Parse language
  // e.g. en-ca,en;q=0.8,en-us;q=0.6,de-de;q=0.4,de;q=0.2
  // The q value may be 0, 1, or a decimal such as 0.8 or 1.0
  preg_match_all(
    '/([a-z]{1,8}(-[a-z]{1,8})?)\s*(;\s*q\s*=\s*(1(?:\.0{0,3})?|0(?:\.[0-9]{0,3})?))?/i',
    $_SERVER['HTTP_ACCEPT_LANGUAGE'],
    $lang);

  if(count($lang[1]) > 0)
  {
    // Create key-value pair
    $langs = array_combine($lang[1], $lang[4]);

    // Use default q value of 1
    foreach ($langs as $lang => $val)
    {
      if ($val === '')
        $langs[$lang] = 1;
    }

    // Sort based on q value
    arsort($langs, SORT_NUMERIC);
  }
}
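
Since the same parsing could be done in Ruby, here is a rough equivalent as a sketch. The function name is made up for this example, and it differs from the PHP code in two ways: it splits on commas instead of using one regular expression, and it drops entries with a q-value of 0, which HTTP defines as “not acceptable”:

```ruby
# Parse an Accept-Language header value into lowercase language tags,
# most preferred first. Ties keep the order in which they were sent.
def parse_accept_language(header)
  return [] if header.nil? || header.strip.empty?

  entries = header.split(',').each_with_index.map do |part, index|
    tag, *params = part.strip.split(';').map(&:strip)
    q_param = params.find { |p| p =~ /\Aq\s*=/i }
    # A missing q parameter means the default quality of 1.
    quality = q_param ? q_param.split('=', 2).last.to_f : 1.0
    [tag.to_s.downcase, quality, index]
  end

  entries
    .reject { |tag, quality, _| tag.empty? || quality <= 0 }
    .sort_by { |_, quality, index| [-quality, index] }
    .map(&:first)
end

p parse_accept_language('en-ca,en;q=0.8,en-us;q=0.6,de-de;q=0.4,de;q=0.2')
# ["en-ca", "en", "en-us", "de-de", "de"]
```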

Once the list of requested language codes is constructed, we can iterate over this list and try to redirect. For each of the requested languages, check whether the site has a translation in this language, and if it does, redirect.

First, though, we need to build the list of codes of all languages the site is translated into. This involves generating PHP code using Ruby code, which is icky, but it does the trick. Each code has to be emitted as a quoted PHP string. Here’s the code:

<%# Find all language codes %>
<%
home         = @items['/en.*']
translations = translations_of(home)
codes        = translations.map { |t| language_code_of(t) }
%>

<%# Build PHP array of language codes %>
$codes = array(<%= codes.map { |c| "'#{c}'" }.join(', ') %>);
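
The generated PHP needs every code as a quoted string literal. A bare en would be read by PHP as a constant name rather than as a string. The following sketch, with a hard-coded list of codes as an assumption, shows the difference in the generated line:

```ruby
codes = %w[en nl fr]

# Joining the raw codes yields bare words, which PHP treats as constants.
unquoted = "$codes = array(#{codes.join(', ')});"

# Wrapping each code in single quotes yields PHP string literals.
quoted_codes = codes.map { |c| "'#{c}'" }
quoted = "$codes = array(#{quoted_codes.join(', ')});"

puts unquoted # $codes = array(en, nl, fr);
puts quoted   # $codes = array('en', 'nl', 'fr');
```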

The redirection code itself is given below. Note that the redirection to the English version of the site serves as a fallback if none of the requested languages is available.

// Show correct site
foreach($langs as $request_lang => $qval)
{
  foreach($codes as $code)
  {
    if(strpos($request_lang, $code) === 0)
      redirect($code);
  }
}
redirect('en');
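
The selection logic can also be sketched in Ruby. The function name and the fallback parameter are hypothetical. Unlike the strpos prefix check, this variant compares only the primary subtag (the part before the first hyphen), so en-ca matches en but a longer tag that merely starts with the same letters does not:

```ruby
# Return the first available language code that matches a requested tag,
# or the fallback when nothing matches.
def pick_language(requested, available, fallback: 'en')
  requested.each do |tag|
    primary = tag.downcase.split('-').first
    return primary if available.include?(primary)
  end
  fallback
end

available = %w[en nl fr]
puts pick_language(%w[de-de fr-be en], available) # fr
puts pick_language(%w[de], available)             # en
```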

The PHP redirect() function still needs to be implemented. This function creates an HTTP redirect to the home page in the given language. The status line differs depending on whether HTTP 1.0 or 1.1 is used. Here is its implementation:

function redirect($lang)
{
  global $base_url;

  // Set HTTP status code
  if ($_SERVER['SERVER_PROTOCOL'] == 'HTTP/1.1')
    header('HTTP/1.1 303 See Other');
  else
    header('HTTP/1.0 302 Moved Temporarily');

  // Set location
  header('Location: ' . $base_url . '/' . $lang . '/');

  // Stop!
  exit();
}
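
For comparison, here is how the same decision might look in Ruby. This is a sketch only: the function is hypothetical, it does not use any web framework, and it simply returns a status, headers, and body triple in the style that Rack applications use:

```ruby
# Build a redirect to the home page of the given language.
# Like the PHP version, it answers 303 for HTTP/1.1 and 302 otherwise.
def redirect_response(lang, base_url:, protocol: 'HTTP/1.1')
  status = protocol == 'HTTP/1.1' ? 303 : 302
  [status, { 'location' => "#{base_url}/#{lang}/" }, []]
end

p redirect_response('nl', base_url: 'https://mystonline.com')
# [303, {"location"=>"https://mystonline.com/nl/"}, []]
```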

The global $base_url variable contains the base URL for the website. For the Myst Online website, this is https://mystonline.com. It is used to build the full redirection URL. You can either hardcode this in PHP, like this:

$base_url = 'https://mystonline.com';

… or you can set the base_url configuration attribute in nanoc.yaml (or config.yaml for older sites) and generate the PHP code for setting it (a bit icky, but DRYer):

$base_url = '<%= @config[:base_url] %>';

That’s the end of this guide. Now, you have a website in multiple languages where every language is given equal attention. For example, a German-speaking person can arrive on the site and be redirected to the German version of the site, and even the URLs will be in German.