How Jekyll works
Update (2026-03-13): This post is almost 14 years old! When it was written, the latest version of Jekyll was 0.11.2, which didn’t yet include Data Files, a feature that loads CSV, JSON, and YAML files from the _data directory and exposes them to templates through site.data. Even so, I decided to keep the post up because I still find the investigation interesting. I rewrote a few paragraphs for clarity.
The Quotes Generator
I use Tumblr to interact with fandoms I’m interested in. One kind of post I enjoy is quotes from authors, which I use instead of book reviews as a signal for finding new books to read. Over time, the posts I reblogged grew into a decent-sized collection of quotes that I enjoy rereading from time to time, sort of like a “quotation book”.
A post on generating Jekyll pages from data (see References) shows how to build a Generator plugin that updates a website with product information from a JSON file. It inspired me to write something similar for myself: a plugin that would pull data from Tumblr, save it to a JSON file, and then create a new page showcasing the quotes.
The GitHub wiki had an example of a generator plugin (2026-03 note: this example lived in the repository’s wiki before the documentation was moved to the jekyllrb.com website):
```ruby
module Jekyll
  # A page built in memory for a single category.
  class CategoryPage < Page
    def initialize(site, base, dir, category)
      @site = site
      @base = base
      @dir = dir
      @name = 'index.html'

      self.process(@name)
      # Read _layouts/category_index.html: its front matter becomes
      # self.data and the rest of the file becomes self.content.
      self.read_yaml(File.join(base, '_layouts'), 'category_index.html')
      self.data['category'] = category

      prefix = site.config['category_title_prefix'] || 'Category: '
      self.data['title'] = "#{prefix}#{category}"
    end
  end

  # Adds one CategoryPage per category, but only if the layout exists.
  class CategoryPageGenerator < Generator
    safe true

    def generate(site)
      if site.layouts.key? 'category_index'
        dir = site.config['category_dir'] || 'categories'
        site.categories.keys.each do |category|
          site.pages << CategoryPage.new(site, site.source,
                                         File.join(dir, category), category)
        end
      end
    end
  end
end
```
After a bit of tinkering, I wrote a script that saved a JSON file locally with data pulled from Tumblr’s API; that file was then fed into a Generator plugin (the gist listed in the References) responsible for creating a page to showcase those quotes. This process made me curious about how Jekyll handles all those Markdown files internally, so I decided to investigate further. My first mental model involved various data structures holding all the generators, converters, etc. that get invoked during the “compilation” of the site, but I still wanted to know how to generate an output HTML page without an existing template, effectively creating it at “compile time”.
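The script itself isn’t shown in this post. A minimal sketch of its second half might look like the following. It assumes the Tumblr API response has already been fetched and parsed into a Hash, and that quote posts sit under "response" → "posts" with "type", "text", and "source" keys. Those key names, the helpers extract_quotes and save_quotes, and the array-of-objects layout of quotes.json are assumptions for illustration, not the original script.

```ruby
require 'json'

# Hypothetical helper: keep only the quote posts from an already-parsed
# Tumblr API response. The "response"/"posts"/"type"/"text"/"source" keys
# are assumptions about the API's JSON shape.
def extract_quotes(api_response)
  posts = api_response.fetch('response', {}).fetch('posts', [])
  posts
    .select { |post| post['type'] == 'quote' }
    .map { |post| { 'text' => post['text'], 'source' => post['source'] } }
end

# Hypothetical helper: write the quotes to a local JSON file that the
# generator reads later. The array-of-objects layout is an assumption.
def save_quotes(quotes, path = 'quotes.json')
  File.write(path, JSON.pretty_generate(quotes))
end

# Example with a hand-written response instead of a real API call.
sample = {
  'response' => {
    'posts' => [
      { 'type' => 'quote', 'text' => 'An example quote.', 'source' => 'An Author' },
      { 'type' => 'photo' }
    ]
  }
}
quotes = extract_quotes(sample)
# quotes holds a single { "text" => ..., "source" => ... } entry
```

Keeping the fetch and the page generation in separate steps means the site can be rebuilt from the saved JSON without hitting the API every time.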
Exploration
I started with the Jekyll executable.
It’s pretty simple: the command-line parameters, the defaults, and _config.yml (through the Jekyll::configuration method) are combined into an options hash, and then a new site is instantiated:
```ruby
# Create the Site
site = Jekyll::Site.new(options)
```
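The way those three sources are combined can be pictured as a recursive hash merge in which later layers win: defaults first, then _config.yml, then command-line flags. The sketch below illustrates that precedence; the deep_merge and build_options helpers and the sample values are hypothetical, not Jekyll’s own code.

```ruby
# Recursively merge two hashes; values from `override` win, and nested
# hashes are merged key by key instead of being replaced wholesale.
def deep_merge(base, override)
  base.merge(override) do |_key, old_val, new_val|
    if old_val.is_a?(Hash) && new_val.is_a?(Hash)
      deep_merge(old_val, new_val)
    else
      new_val
    end
  end
end

# Hypothetical layering: defaults < _config.yml < command line.
def build_options(defaults, config_file, command_line)
  deep_merge(deep_merge(defaults, config_file), command_line)
end

defaults     = { 'source' => '.', 'destination' => '_site', 'auto' => false }
config_file  = { 'destination' => 'public' } # e.g. parsed from _config.yml
command_line = { 'auto' => true }            # e.g. from --auto

options = build_options(defaults, config_file, command_line)
# options['destination'] is 'public' and options['auto'] is true
```

A deep merge matters for nested settings: overriding one key inside a nested hash in _config.yml keeps the other defaults in that hash.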
After that, it starts watching the necessary directories if the --auto option was used:
```ruby
if options['auto']
  require 'directory_watcher'
  puts "Auto-regenerating enabled: #{source} -> #{destination}"
  # ...
else
  puts "Building site: #{source} -> #{destination}"
  # ...
end
```
The site is built through a call to #process, the main method in the Jekyll::Site class.
Finally, it runs the local server if --server was specified.
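Jekyll delegated the --auto watching to the directory_watcher gem. To illustrate the idea without relying on that gem’s API, here is a hypothetical polling sketch in plain Ruby: it snapshots modification times with Dir.glob and File.mtime and calls a rebuild block (standing in for site.process) whenever files were added, removed, or modified. This is a swapped-in technique, not how directory_watcher works internally.

```ruby
# Map every regular file under `dir` to its modification time.
def snapshot(dir)
  Dir.glob(File.join(dir, '**', '*'))
     .select { |path| File.file?(path) }
     .to_h { |path| [path, File.mtime(path)] }
end

# Paths that were added, removed, or modified between two snapshots.
def changed_paths(before, after)
  (before.keys | after.keys).reject { |path| before[path] == after[path] }
end

# Poll forever; the block stands in for site.process.
def watch(dir, interval: 1, &rebuild)
  previous = snapshot(dir)
  loop do
    sleep interval
    current = snapshot(dir)
    changes = changed_paths(previous, current)
    rebuild.call(changes) unless changes.empty?
    previous = current
  end
end
```

For example, watch('.') { |changes| puts "#{changes.size} files changed" } would check the current directory once per second. Polling is simple but rescans the whole tree on every tick, which is one reason a dedicated watcher library is attractive.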
Next, to see how #process actually builds the website, on to lib/jekyll/site.rb:
```ruby
def process
  self.reset
  self.read
  self.generate
  self.render
  self.cleanup
  self.write
end
```
Let’s see what each of these methods does:
- reset: initialize Hashes for the layouts, categories, and tags, and Arrays for the posts, pages, and static_files.
- read: load the site’s data from the filesystem and store it in internal data structures (both the ones created in the previous step and those in the Jekyll::Site instance).
- generate: call each generator’s #generate method.
- render: call the #render method of each post and page.
- cleanup: collect the destination paths of all pages, posts, and static_files in a Set and delete everything else in the destination (unused files, empty directories).
- write: call the #write method of each post, page, and static_file, copying them to the destination folder (_site by default).
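The cleanup step boils down to a set difference. A minimal sketch, assuming paths are plain strings and the destination root is passed in explicitly (obsolete_paths and parent_dirs are hypothetical helpers). It keeps every ancestor directory of a file that is about to be written, so those directories are not treated as obsolete.

```ruby
require 'set'

# Every directory between `path` and `root`, nearest first:
# "_site/a/b/x.html" -> ["_site/a/b", "_site/a"].
def parent_dirs(path, root)
  dirs = []
  dir = File.dirname(path)
  until dir == root || dir == File.dirname(dir)
    dirs << dir
    dir = File.dirname(dir)
  end
  dirs
end

# Paths that exist in the destination but will not be written again.
def obsolete_paths(existing_paths, files_to_write, root)
  keep = Set.new(files_to_write)
  files_to_write.each { |file| keep.merge(parent_dirs(file, root)) }
  Set.new(existing_paths) - keep
end
```

The returned set is what would then be passed to something like FileUtils.rm_rf before the write step runs.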
To recap: to create a generator, I need to write a subclass of Jekyll::Generator with a #generate method, which is called before the pages and posts are rendered.
However, I wanted to create the quotes page in memory, without a source file behind it, and that didn’t play well with Jekyll’s rendering.
After reviewing how #process works, it became clear that whatever page I create needs to be included as if it had been encountered during #read.
Thus, after building a string of HTML filled with quotes obtained from quotes.json, I had to create a Page object and add it to Jekyll’s internal data structures (site.pages).
From there, #render takes that object and converts it into a complete HTML document, which #write then writes to the filesystem.
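To make the ordering concrete, here is a toy model in plain Ruby. ToySite, ToyPage, and InMemoryPageGenerator are hypothetical classes, not Jekyll’s, and cleanup is left out. The point it shows: a page pushed into site.pages during generate is rendered and written exactly like one that came from read.

```ruby
ToyPage = Struct.new(:name, :content, :output)

class ToySite
  attr_reader :pages, :written

  def initialize(generators)
    @generators = generators
  end

  # Same order as Jekyll's #process, minus cleanup.
  def process
    reset
    read
    generate
    render
    write
  end

  private

  def reset
    @pages = []
    @written = {}
  end

  # Stand-in for reading source files from disk.
  def read
    @pages << ToyPage.new('about.html', 'About me')
  end

  # Generators run after read and before render.
  def generate
    @generators.each { |generator| generator.generate(self) }
  end

  # Stand-in for converting content and wrapping it in a layout.
  def render
    @pages.each { |page| page.output = "<html>#{page.content}</html>" }
  end

  # Stand-in for writing files to the destination folder.
  def write
    @pages.each { |page| @written[page.name] = page.output }
  end
end

# Creates a page purely in memory, like the quotes page.
class InMemoryPageGenerator
  def generate(site)
    site.pages << ToyPage.new('quotes.html', '<blockquote>...</blockquote>')
  end
end

site = ToySite.new([InMemoryPageGenerator.new])
site.process
# site.written now has entries for both "about.html" and "quotes.html"
```

If the generator instead ran after render, its page would reach write with no output, which is the situation the real plugin has to avoid.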
Here’s (part of) the code from the plugin that implements this sequence:
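```ruby
class QuotesPage < Page
  attr_accessor :content, :data

  # A page with no file on disk behind it: content and data start
  # empty and are filled in by the generator.
  def initialize(site, base, dir, name)
    @site = site
    @base = base
    @dir = dir
    @name = name
    self.process(name)
    self.content = ''
    self.data = {}
  end
end

# In Site#create_quotes_page
quotes_page = QuotesPage.new(self, self.source, 'quotes_page', 'index.html')
# [...]
# `string` contains HTML generated from the quotes.json file
quotes_page.content = string
# The layout name comes from the tumblr_quotes_layout setting in _config.yml
quotes_page.data["layout"] = self.config['tumblr_quotes_layout']
# [...]
# Adding the page here is what lets #render and #write pick it up
self.pages << quotes_page
```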
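The snippet elides how string is built. A hypothetical sketch of that step, assuming quotes.json holds an array of objects with "text" and "source" keys (the real file layout isn’t shown here, and the markup is an arbitrary choice):

```ruby
require 'json'
require 'erb'

# Hypothetical helper: turn the contents of quotes.json into the HTML
# string assigned to quotes_page.content. Text is HTML-escaped, so this
# assumes the quotes are stored as plain text.
def quotes_html(json)
  JSON.parse(json).map do |quote|
    text = ERB::Util.html_escape(quote['text'].to_s)
    source = ERB::Util.html_escape(quote['source'].to_s)
    "<blockquote><p>#{text}</p><footer>#{source}</footer></blockquote>"
  end.join("\n")
end
```

Inside the generator, something like string = quotes_html(File.read(File.join(self.source, 'quotes.json'))) would fill the gap. If the JSON instead stores the HTML that Tumblr returns, escaping would show the tags literally, so for trusted content it would make sense to drop the html_escape calls.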
See the gist listed in the References for the generator’s code.
I added a page to Jekyll’s repository wiki based on this post.
References
- Jekyll on GitHub
- Generating Jekyll Pages From Data (archived on the Wayback Machine)
- Quotes Generator: original code (gist, incomplete)
- Data files in modern Jekyll
- Plugins in modern Jekyll