I make reveal.js slides with markdown and convert with pandoc. Sometimes I want to post those slides so my students can access them, but I want to strip out the speaker notes because they usually aren't very useful to them.
I don't want to edit my source files; I just want the output to be correct. One option would be to post-process the output HTML slides.
But I also want to allow for the possibility that I'll output to other formats. So the best solution is to tell pandoc to do that filtering while it converts.
Pandoc's main page on filters is nice enough, and convinced me that I should try to write a Lua filter. I know the kind of thing I want to find (whatever "speaker notes" becomes when pandoc does its parsing) and what I want to do (omit it entirely from the output), but the documentation never clearly says how one does that.
Here's what I figured out. (Perhaps I'll turn this into some kind of PR for the pandoc documentation...)
The filters page above almost tells you how to see the AST. To do that, convert your document to the native format:
pandoc --to native your_input_file
That turns this markdown input...
::: notes
my notes for this slide
:::
...into...
Div
( "" , [ "notes" ] , [] )
[ Para
[ Str "my"
, Space
, Str "notes"
, Space
, Str "for"
, Space
, Str "this"
, Space
, Str "slide"
]
]
Okay, so we want to identify that kind of Div. How? In pandoc's native output, the parenthesized triple is an Attr (attributes) tuple with three components: (identifier, classes, key-value pairs).
(See these entries in the Lua documentation for Div nodes and Attr nodes.)
Here, we have:
"" = identifier (empty string)
["notes"] = classes (list with one class: "notes")
[] = key-value attributes (empty list)
We want to get at that "notes" class.
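Before writing the real filter, it helped me to see how those three Attr components look from the Lua side. Here is a small debugging sketch that prints each Div's identifier and classes to stderr and leaves the document untouched. The helper name describe_div is my own invention, not something pandoc provides.

```lua
-- Debugging aid: report each Div's identifier and classes on stderr.
-- `describe_div` is a hypothetical helper name, not a pandoc function.
local function describe_div(el)
  local classes = {}
  for _, class in ipairs(el.attr.classes) do
    classes[#classes + 1] = class
  end
  return string.format("Div id=%q classes=[%s]",
                       el.attr.identifier, table.concat(classes, ", "))
end

function Div(el)
  io.stderr:write(describe_div(el), "\n")
  -- Returning nothing leaves the element unchanged.
end
```

Run against the notes example above, this should report an empty identifier and a single class, notes.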
The documentation also never quite says how to drop an element, though again it comes close.
Here's how you can do that in a Lua filter: check the Div's classes, and return an empty list for the Divs you want removed.
function Div(el)
  local identifier = el.attr.identifier -- string
  local classes = el.attr.classes -- list
  local attributes = el.attr.attributes -- list of key-value pairs
  -- Check if "notes" is in the classes
  if el.attr.classes:includes("notes") then
    return {} -- Remove the Div by returning empty list
  end
end
Of course, you can shorten that up:
function Div(el)
  if el.attr.classes:includes("notes") then
    return {}
  end
end
Save that into a file and tell pandoc to use it with --lua-filter=your_file.lua.
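If you end up wanting to drop more than one kind of Div, the same filter can be driven by a list of class names. This is only a sketch: DROP_CLASSES and should_drop are names I made up, and "handout-skip" is a hypothetical class included purely for illustration.

```lua
-- Classes whose Divs should be removed from the output.
-- "handout-skip" is a hypothetical example; only "notes" comes from my slides.
local DROP_CLASSES = { "notes", "handout-skip" }

-- Return true if any class on the element appears in DROP_CLASSES.
local function should_drop(classes)
  for _, class in ipairs(classes) do
    for _, unwanted in ipairs(DROP_CLASSES) do
      if class == unwanted then
        return true
      end
    end
  end
  return false
end

function Div(el)
  if should_drop(el.attr.classes) then
    return {} -- empty list: the Div is removed
  end
  -- Returning nothing keeps every other Div as it was.
end
```

The plain loops avoid depending on any list helper methods, so the matching logic can also be tried out in a standalone Lua interpreter.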
The same approach can strip comments. HTML comments seem to be the closest thing Markdown has to comments; they show up in the pandoc AST as:
RawBlock (Format "html") "<!-- comment here -->"
or
RawInline (Format "html") "<!-- another comment -->"
Looking at the reference for RawBlock and for RawInline, we need to check for a format of html and, to be sure, check that the text starts with <!--.
local function starts_with(start, str)
  return str:sub(1, #start) == start
end

function RawBlock(el)
  if el.format == "html" and starts_with('<!--', el.text) then
    return {}
  end
end

function RawInline(el)
  if el.format == "html" and starts_with('<!--', el.text) then
    return {}
  end
end
The starts_with function is shamelessly copied from https://pandoc.org/lua-filters.html#building-images-with-tikz.
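Since the two handlers are identical, they can share one function. Here is a sketch of that variant; drop_html_comment is a name I made up, and tolerating leading whitespace before the comment opener is my own assumption rather than something the pandoc documentation calls for.

```lua
-- One handler shared by RawBlock and RawInline.
-- `drop_html_comment` is a hypothetical helper name.
local function drop_html_comment(el)
  -- Assumption: allow leading whitespace before the "<!--".
  -- In a Lua pattern, "-" is a quantifier, so it is escaped as "%-".
  if el.format == "html" and el.text:find("^%s*<!%-%-") then
    return {} -- empty list: the comment is removed
  end
  -- Returning nothing keeps any other raw content untouched.
end

RawBlock = drop_html_comment
RawInline = drop_html_comment
```

Raw HTML that isn't a comment, and raw content in other formats, passes through unchanged.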
While working on this, I kept using --from=gfm with pandoc, and getting weird results that I didn't expect. Why is that?
Pandoc enables different extensions and features for GFM and its own native Markdown.
You can just diff the list of extensions used:
diff <(pandoc --list-extensions=markdown) <(pandoc --list-extensions=gfm)
For me, the key difference seems to be that GFM doesn't enable fenced_divs by default, so the ::: notes block never becomes a Div for the filter to find.
There's some documentation of the GFM-related bits here.