@dandrake
Last active January 28, 2026 22:14
Basic pandoc usage of Lua filters to remove speaker notes, html comments from output

Motivation: removing speaker notes from slides

I make reveal.js slides with markdown and convert with pandoc. Sometimes I want to post those slides so my students can access them, but I want to strip out the speaker notes because they usually aren't very useful to them.

Solution: use a pandoc filter

I don't want to edit my source files; I just want the output to be correct. So I could post-process the output HTML slides.

But I also want to allow for the possibility of converting to other formats, so the best solution is to have pandoc do the filtering while it converts.

Problem: poor documentation

Pandoc's main page on filters is nice enough, and convinced me that I should try to write a Lua filter. I know the kind of thing I want to find (whatever "speaker notes" becomes when pandoc does its parsing) and what I want to do (omit it entirely from the output), but the documentation never clearly says how one does that.

Here's what I figured out. (Perhaps I'll turn this into some kind of PR for the pandoc documentation...)

How to write the filter

1. Find out how pandoc represents speaker notes

The filters page above almost tells you how to inspect the AST. To see it, convert your document to pandoc's native format:

pandoc --to native your_input_file

That turns this markdown input...

::: notes
my notes for this slide
:::

...into...

Div
    ( "" , [ "notes" ] , [] )
    [ Para
        [ Str "my"
        , Space
        , Str "notes"
        , Space
        , Str "for"
        , Space
        , Str "this"
        , Space
        , Str "slide"
        ]
    ]

Okay, so we want to identify that kind of Div. How? In pandoc's native format (which is Haskell syntax, not the JSON AST), the parenthesized triple is an Attr (attributes) value with three components: (identifier, classes, key-value pairs).

(See these entries in the Lua documentation for Div nodes and Attr nodes.)

Here, we have:

  • "" = identifier (empty string)
  • ["notes"] = classes (list with one class: "notes")
  • [] = key-value attributes (empty list)

We want to get at that "notes" class.
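
To confirm what a Lua filter actually receives, a throwaway debugging filter (a sketch, not part of the final solution) can print each Div's identifier and classes to standard error while leaving the document untouched. It uses only the el.attr fields listed above plus Lua's standard io.stderr:

```lua
-- Debugging filter: report every Div's identifier and classes on stderr.
-- Returning nil tells pandoc to leave the element unchanged.
function Div(el)
  local classes = table.concat(el.attr.classes, ", ")
  io.stderr:write(string.format("Div id=%q classes=[%s]\n",
                                el.attr.identifier, classes))
  return nil
end
```

Run it with --lua-filter as usual; for the notes example above it should report Div id="" classes=[notes].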

2. Remove the element

The documentation also never says how you do this, though again it comes close.

Here's how you can do that in the Lua filter: you return an empty list.

    function Div(el)
      local identifier = el.attr.identifier  -- string ("" in this example)
      local classes = el.attr.classes        -- list of class names
      local attributes = el.attr.attributes  -- key-value pairs
      -- Check if "notes" is among the classes
      if classes:includes("notes") then
        return {}  -- Remove the Div by returning an empty list
      end
    end

Of course, you can shorten that up:

function Div(el)
  if el.attr.classes:includes("notes") then
    return {}
  end
end

Save that into a file and tell pandoc to use it with --lua-filter=your_file.lua.
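
Returning an empty list works because the return value decides what pandoc does with the element: nil leaves it alone, a single element replaces it, and a list of elements is spliced in its place, so an empty list deletes it. As an illustration of the same mechanism (a variation, not something this gist needs), returning the Div's own content keeps the notes text but drops the wrapper:

```lua
-- Variation: unwrap notes Divs instead of deleting them.
-- el.content is the list of blocks inside the Div; returning that
-- list replaces the Div with those blocks.
function Div(el)
  if el.attr.classes:includes("notes") then
    return el.content
  end
end
```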

Bonus: strip HTML comments too

HTML comments seem to be the closest thing Markdown has to comments; they show up in the pandoc AST as:

RawBlock (Format "html") "<!-- comment here -->"

or

RawInline (Format "html") "<!-- another comment -->"

Looking at the reference entries for RawBlock and RawInline, we need to check that the format is html and, to be sure, that the text starts with <!--.

local function starts_with(start, str)
  return str:sub(1, #start) == start
end

function RawBlock(el)
   if el.format == "html" and starts_with('<!--', el.text) then
      return {}
   end
end

function RawInline(el)
   if el.format == "html" and starts_with('<!--', el.text) then
      return {}
   end
end

The starts_with function is shamelessly copied from https://pandoc.org/lua-filters.html#building-images-with-tikz.
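
Pandoc collects every filter function defined in a file, so the notes filter and the comment filters can live together and be passed with a single --lua-filter. The sketch below combines them; the file name in the comment is hypothetical, and is_html_comment is a hypothetical helper that swaps starts_with for a Lua pattern so that leading whitespace before <!-- is also tolerated.

```lua
-- strip-private.lua (hypothetical name): remove speaker notes and
-- HTML comments in one pass.

-- True if a raw element is html and its text begins with "<!--".
-- The pattern allows leading whitespace; "%-" escapes the hyphen,
-- which is otherwise a pattern quantifier.
local function is_html_comment(el)
  return el.format == "html" and el.text:match("^%s*<!%-%-") ~= nil
end

function Div(el)
  if el.attr.classes:includes("notes") then
    return {}  -- empty list: delete the Div
  end
end

function RawBlock(el)
  if is_html_comment(el) then
    return {}
  end
end

function RawInline(el)
  if is_html_comment(el) then
    return {}
  end
end
```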

While we're here: difference between pandoc's markdown and GitHub-flavored markdown?

While working on this, I kept using --from=gfm with pandoc, and getting weird results that I didn't expect. Why is that?

Pandoc enables different extensions and features for GFM than for its own Markdown dialect.

You can just diff the list of extensions used:

diff <(pandoc --list-extensions=markdown) <(pandoc --list-extensions=gfm)

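For me, the key difference is that GFM doesn't include fenced_divs, so a ::: notes block is read as an ordinary paragraph rather than a Div, and the filter has nothing to match.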
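
If a Div filter silently does nothing, the input format is a good suspect. This diagnostic sketch (a hypothetical helper, not part of the filter above) warns on stderr whenever a paragraph starts with the literal text :::, which is what a fenced div becomes when fenced_divs is off:

```lua
-- Diagnostic: warn when a paragraph begins with ":::".
-- Without fenced_divs (e.g. --from=gfm), "::: notes" is parsed as
-- paragraph text, so its first inline is the string ":::".
function Para(el)
  local first = el.content[1]
  if first and first.t == "Str" and first.text:sub(1, 3) == ":::" then
    io.stderr:write("warning: paragraph starts with ':::'; ",
                    "is fenced_divs enabled for this input format?\n")
  end
  return nil  -- leave the paragraph unchanged
end
```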

There's some documentation of the GFM-related bits here.
