@geotheory
Created December 14, 2025 17:38
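# R rvest workflow to extract nodes between given headings
library(rvest)  # re-exports xml2::read_html(); tidyverse is not needed

u <- '<url>'
h0 <- read_html(u)

# e.g. h2 elements
from_heading <- 'Section 2'
to_heading <- 'Section 4'

# Take every sibling element after the start heading, dropping the end
# heading itself and anything that comes after it
xpath <- sprintf(
  "//h2[normalize-space(.)='%s']
   /following-sibling::*
   [not(self::h2[normalize-space(.)='%s'] or
        preceding-sibling::h2[normalize-space(.)='%s'])]",
  from_heading, to_heading, to_heading
)
nodes <- html_elements(h0, xpath = xpath)  # html_nodes() in rvest < 1.0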
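To try the XPath without a live page, here is a minimal sketch that wraps it in a hypothetical helper, `nodes_between()` (not an rvest function), parameterised on the heading tag, and runs it on a small inline HTML string standing in for `<url>`. It assumes rvest >= 1.0 for `html_elements()` and that both headings are siblings at the same level of the page.

```r
library(rvest)

# Hypothetical helper (not part of rvest): return the sibling elements that
# sit strictly between two headings of the same tag, excluding both headings
nodes_between <- function(doc, from_heading, to_heading, tag = "h2") {
  xpath <- sprintf(
    paste0(
      "//%s[normalize-space(.)='%s']/following-sibling::*",
      "[not(self::%s[normalize-space(.)='%s'] or ",
      "preceding-sibling::%s[normalize-space(.)='%s'])]"
    ),
    tag, from_heading, tag, to_heading, tag, to_heading
  )
  html_elements(doc, xpath = xpath)
}

# Small inline page used in place of a real URL
html <- '<html><body>
  <h2>Section 1</h2><p>a</p>
  <h2>Section 2</h2><p>b</p><ul><li>c</li></ul>
  <h2>Section 3</h2><p>d</p>
  <h2>Section 4</h2><p>e</p>
</body></html>'

doc <- read_html(html)
nodes <- nodes_between(doc, "Section 2", "Section 4")
print(html_name(nodes))
print(html_text(nodes))
```

Here `html_name(nodes)` gives `"p" "ul" "h2" "p"` and `html_text(nodes)` gives `"b" "c" "Section 3" "d"`: intermediate headings such as Section 3 are kept, while both boundary headings are dropped. Filtering the end heading with `self::` rather than requiring it as a following sibling keeps the original behaviour of returning everything after the start heading when the end heading is absent.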