How Jekyll works

Update (2026-03-13): This post is almost 14 years old! The latest version of Jekyll when this text was written was 0.11.2, which didn’t include Data Files, a feature for converting CSV/JSON/YAML files into items to include in the website. Even so, I decided to keep the post up because I still find the investigation interesting. I rewrote a few paragraphs for clarity.

The Quotes Generator

I use Tumblr to interact with fandoms I’m interested in. One kind of post I enjoy is quotes from authors, which I use instead of book reviews as a signal to find new books to read. Over time, the posts I reblogged grew into a decent-sized collection of quotes, which I enjoy rereading from time to time, sort of like a “quotation book”.

This post shows the construction of a Generator plugin that updates a website with product information from a JSON file. That inspired me to write something similar for myself: the plugin would pull data from Tumblr, save it to a JSON file, then create a new page showcasing the quotes.

There’s an example of a generator plugin in the GitHub wiki (2026-03 note: it lived in the repository’s wiki before the documentation was moved to the jekyllrb.com website):

module Jekyll
  class CategoryPage < Page
    def initialize(site, base, dir, category)
      @site = site
      @base = base
      @dir = dir
      @name = 'index.html'

      self.process(@name)
      self.read_yaml(File.join(base, '_layouts'), 'category_index.html')
      self.data['category'] = category

      prefix = site.config['category_title_prefix'] || 'Category: '
      self.data['title'] = "#{prefix}#{category}"
    end
  end

  class CategoryPageGenerator < Generator
    safe true

    def generate(site)
      if site.layouts.key? 'category_index'
        dir = site.config['category_dir'] || 'categories'
        site.categories.keys.each do |category|
          site.pages << CategoryPage.new(site, site.source,
                File.join(dir, category), category)
        end
      end
    end
  end
end

After a bit of tinkering, I wrote a script that pulled data from Tumblr’s API and saved it locally as a JSON file. That file was fed into a Generator plugin (available here) responsible for creating a page to showcase the quotes. This process made me curious about how all those Markdown files are handled internally, so I decided to investigate further. My first mental model involved different data structures holding all the generators, converters, &c, that were invoked during the “compilation” of the site, but I still wanted to know how to generate an HTML output file without an existing template, effectively creating it at “compile time”.

Exploration

I started with the Jekyll executable. It’s pretty simple: the command-line parameters, the defaults and the _config.yml (through Jekyll::configuration method) are used to create an options hash and then a new site is instantiated:

# Create the Site
site = Jekyll::Site.new(options)
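
As a rough illustration of how three sources of settings can be layered into a single options hash, here is a minimal sketch. The `DEFAULTS` contents and the `build_options` helper are hypothetical and only show the precedence (defaults, then the config file, then command-line flags); Jekyll’s own merging may differ in its details.

```ruby
require 'yaml'

# Hypothetical defaults for illustration; not Jekyll's actual defaults hash.
DEFAULTS = {
  'source'      => '.',
  'destination' => '_site',
  'auto'        => false,
  'server'      => false
}.freeze

# Later sources win: defaults < config file < command-line flags.
# `config_yaml` is the text of a _config.yml; `cli_overrides` is a Hash.
def build_options(config_yaml, cli_overrides)
  from_file = YAML.safe_load(config_yaml) || {}
  DEFAULTS.merge(from_file).merge(cli_overrides)
end

config_yaml = <<~CONFIG
  destination: public
  auto: false
CONFIG

# Pretend the user passed --auto on the command line.
options = build_options(config_yaml, { 'auto' => true })
puts options.inspect
```

Here the destination comes from the config text, the `auto` flag from the simulated command line, and everything else from the defaults.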

After that, it starts watching the necessary directories if the --auto option was used:

if options['auto']
  require 'directory_watcher'
  puts "Auto-regenerating enabled: #{source} -> #{destination}"
  # ...
else
  puts "Building site: #{source} -> #{destination}"
  # ...
end

The site is built through a call to #process, the main method in the Jekyll::Site class. Finally, it runs the local server if --server was specified.

Now to check how #process actually creates the website; onto lib/jekyll/site.rb:

def process
  self.reset
  self.read
  self.generate
  self.render
  self.cleanup
  self.write
end

Let’s see what each of these methods does:

  • reset: initialize Hashes for the layouts, categories, and tags and Arrays for the posts, pages, and static_files.
  • read: get site data from the filesystem and store it in internal data structures (both the ones created in the previous step and in the Jekyll::Site instance).
  • generate: call each generator’s #generate method.
  • render: call the #render method for each post and page.
  • cleanup: collect all pages, posts, and static_files in a Set and delete everything else (unused files, empty directories).
  • write: call the #write method of each post, page, and static_file, copying them to the destination folder (_site by default).
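
The sequence above can be pictured with a toy model. This is only a sketch: `ToySite`, `ToyPage`, and `ToyQuotesGenerator` are made-up names, a Hash stands in for the filesystem, and the cleanup step is left out for brevity. It shows why a page appended during the generate step is rendered and written like any page found during the read step.

```ruby
# Toy stand-ins, not Jekyll's classes: every name here is hypothetical.
ToyPage = Struct.new(:name, :content, :output)

class ToySite
  attr_reader :pages, :written

  def initialize(sources, generators)
    @sources    = sources     # stands in for files on disk: name => content
    @generators = generators
    @written    = {}          # stands in for the destination folder
  end

  # Same ordering as the steps listed above, minus cleanup.
  def process
    reset
    read
    generate
    render
    write
  end

  def reset
    @pages = []
  end

  def read
    @sources.each { |name, content| @pages << ToyPage.new(name, content) }
  end

  def generate
    @generators.each { |generator| generator.generate(self) }
  end

  def render
    @pages.each do |page|
      page.output = "<html><body>#{page.content}</body></html>"
    end
  end

  def write
    @pages.each { |page| @written[page.name] = page.output }
  end
end

# A generator that adds a page that never existed "on disk".
class ToyQuotesGenerator
  def generate(site)
    site.pages << ToyPage.new('quotes/index.html', '<blockquote>Hi</blockquote>')
  end
end

site = ToySite.new({ 'index.html' => 'Hello' }, [ToyQuotesGenerator.new])
site.process
puts site.written.keys
```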

To recap: to create a generator, I need to write a subclass of Jekyll::Generator with a #generate method, which is called before the pages and posts are rendered. However, I wanted to create the page for the quotes in memory, which wasn’t playing well with Jekyll’s rendering. After reviewing how #process works, it is clear that whatever page I create needs to be included as if it had been encountered during #read.
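
How can writing a subclass be enough for a framework to find a generator? One common Ruby technique is the `inherited` hook, sketched below. This illustrates the general idea and is not quoted from Jekyll’s source; `ToyGenerator`, `registry`, and the Hash used as a site are all hypothetical.

```ruby
# Hypothetical base class; the names are illustrative, not Jekyll's.
class ToyGenerator
  # Every class that inherits from ToyGenerator ends up in this list.
  def self.registry
    @registry ||= []
  end

  # Ruby calls this hook whenever a class inherits from ToyGenerator.
  def self.inherited(klass)
    super
    ToyGenerator.registry << klass
  end

  def generate(_site)
    raise NotImplementedError, "#{self.class} must implement #generate"
  end
end

# Defining the subclass is all it takes to register it.
class QuotesGenerator < ToyGenerator
  def generate(site)
    site[:pages] << 'quotes/index.html'
  end
end

# A plain Hash stands in for the site object.
site = { pages: ['index.html'] }
ToyGenerator.registry.each { |klass| klass.new.generate(site) }
puts site[:pages]
```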

Thus, after creating a string representing a page filled with quotes obtained from quotes.json, I had to create a Page object and add it to Jekyll’s internal data structures. Then #render takes that object and converts it into a complete HTML document, which is written to the filesystem via #write. Here’s (part of) the code from the plugin that implements this sequence:

class QuotesPage < Page
  attr_accessor :content, :data

  def initialize(site, base, dir, name)
    @site = site
    @base = base
    @dir  = dir
    @name = name

    self.process(name)
    self.content = ''
    self.data = {}
  end
end

# In Site#create_quotes_page
quotes_page = QuotesPage.new(self, self.source, 'quotes_page', 'index.html')

# [...]

# `string` contains HTML generated from the quotes.json file
quotes_page.content = string
quotes_page.data["layout"] = self.config['tumblr_quotes_layout']

# [...]

self.pages << quotes_page
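
The step that builds `string` is elided above. A minimal sketch of what it could look like follows, assuming quotes.json holds an array of objects with "text" and "source" keys. That shape, the `quotes_to_html` and `escape_html` helpers, and the markup are all assumptions for illustration; the actual plugin may differ.

```ruby
require 'json'

# Hypothetical contents of quotes.json, using placeholder data.
json = <<~QUOTES
  [
    { "text": "First placeholder quote.", "source": "Author A" },
    { "text": "Less <is> more.", "source": "Author B" }
  ]
QUOTES

HTML_ESCAPES = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;'
}.freeze

# Escape characters that would otherwise be read as markup.
def escape_html(text)
  text.gsub(/[&<>"]/, HTML_ESCAPES)
end

# Turn parsed quotes into one HTML fragment, one blockquote per entry.
def quotes_to_html(quotes)
  quotes.map do |quote|
    text   = escape_html(quote.fetch('text'))
    source = escape_html(quote.fetch('source', 'Unknown'))
    "<blockquote><p>#{text}</p><cite>#{source}</cite></blockquote>"
  end.join("\n")
end

string = quotes_to_html(JSON.parse(json))
puts string
```

Escaping the quote text matters here because the fragment is inserted into the page as-is; the layout named in the page data then wraps it during rendering.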

Check the code for the generator at this gist to understand what I’m saying.

I added a page to Jekyll’s repository wiki based on this post.

References