This is a quick, updated version of html5-parser’s benchmark script that includes tests for Selectolax and Markupever.
Note that this also includes tests that walk the whole tree (walk-selectolax and walk-markupever; in the output below these are the second set of html5_parser, selectolax, and markupever timings, run 10 times each). I added them because the initial parsing result for Selectolax seemed unbelievably fast, and I was worried it was doing some lazy parsing. It turns out Selectolax is still the fastest of the three when walking the whole tree. Pretty impressive!
The output below is from a run on Python 3.14.3 on a MacBook M1 Pro, macOS Sequoia 15.7.3. This is not a very clean test (lots running in the background), but it is still useful for a rough sense of relative performance.
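The idea behind the walk tests can be sketched as follows. This is a minimal illustration, not the benchmark script itself: it uses the standard library's `xml.etree.ElementTree` on a small well-formed document as a stand-in for the real parsers, and the helper names `count_nodes` and `time_call` are hypothetical. The point is only that timing "parse, then touch every node" alongside "parse only" would expose a parser that defers work until nodes are accessed.

```python
import time
import xml.etree.ElementTree as ET


def count_nodes(root):
    """Visit every element in the tree and read its tag and text."""
    total = 0
    for element in root.iter():
        # Reading the tag and text touches each node's data, so any
        # deferred (lazy) work would have to happen here.
        _ = element.tag, element.text
        total += 1
    return total


def time_call(func, repeats=10):
    """Return the average wall-clock seconds per call of func()."""
    start = time.perf_counter()
    for _ in range(repeats):
        func()
    return (time.perf_counter() - start) / repeats


# Small well-formed document used as a stand-in for the real HTML file.
DOC = "<html><body>" + "<p>hello <b>world</b></p>" * 1000 + "</body></html>"

parse_only = time_call(lambda: ET.fromstring(DOC))
parse_and_walk = time_call(lambda: count_nodes(ET.fromstring(DOC)))
node_count = count_nodes(ET.fromstring(DOC))

print(f"nodes visited:  {node_count}")
print(f"parse only:     {parse_only:.4f} s")
print(f"parse and walk: {parse_and_walk:.4f} s")
```

To compare real parsers, one would swap in each library's own parse call and tree traversal in place of the `ElementTree` stand-in.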
Testing with HTML file of 15,131,111 bytes
Parsing 100 times with html5-parser
html5-parser took an average of: 0.429 seconds to parse it @ 35280.7 KB/s
Parsing 10 times with html5-parser-to-soup
html5-parser-to-soup took an average of: 3.075 seconds to parse it @ 4920.4 KB/s
Parsing 10 times with html5lib
html5lib took an average of: 7.358 seconds to parse it @ 2056.4 KB/s
Parsing 10 times with BeautifulSoup-with-html5lib
BeautifulSoup-with-html5lib took an average of: 8.536 seconds to parse it @ 1772.6 KB/s
Parsing 10 times with BeautifulSoup-with-lxml
BeautifulSoup-with-lxml took an average of: 4.904 seconds to parse it @ 3085.6 KB/s
Parsing 100 times with selectolax
selectolax took an average of: 0.080 seconds to parse it @ 188549.9 KB/s
Parsing 100 times with markupever
markupever took an average of: 0.234 seconds to parse it @ 64731.7 KB/s
Parsing 10 times with html5_parser
html5_parser took an average of: 0.554 seconds to parse it @ 27324.0 KB/s
Parsing 10 times with selectolax
selectolax took an average of: 0.193 seconds to parse it @ 78574.7 KB/s
Parsing 10 times with markupever
markupever took an average of: 1.084 seconds to parse it @ 13959.7 KB/s
Results are below. They show how much faster html5-parser is than each
specified parser. Note that there are two additional considerations:
what the final tree is and whether the parsing supports the HTML 5
parsing algorithm. The most apples-to-apples comparison is when the
final tree is lxml and HTML 5 parsing is supported by the parser being
compared to. Note that in this case, we have the largest speedup. In
all other cases, speedup is less because of the overhead of building
the final tree in python instead of C or because the compared parser
does not use the HTML 5 parsing algorithm or both.
Parser            |Tree              |Supports HTML 5   |Speedup (factor)  |
===============================================================================
html5lib          |lxml              |yes               |17                |
soup+html5lib     |BeautifulSoup     |yes               |3                 |
soup+lxml.html    |BeautifulSoup     |no                |2                 |
selectolax        |lexbor            |yes               |0.19              |
markupever        |html5ever         |yes               |0.55              |
walk-selectolax   |lexbor            |yes               |0.35              |
walk-markupever   |html5ever         |yes               |1.96              |
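The factors in the table can be reproduced from the averages printed above. The sketch below assumes, based on the arithmetic rather than on the script's source, that each factor is the compared parser's average time divided by an html5-parser baseline: the plain html5-parser run for the lxml, lexbor, and html5ever rows, html5-parser-to-soup for the BeautifulSoup rows, and the html5_parser walk run for the walk rows. It also assumes "KB" in the output means 1000 bytes. The names `ROWS`, `speedups`, and `throughput_kb_s` are illustrative only.

```python
# Average seconds per parse, copied from the run above.
# Each entry is (compared parser time, assumed html5-parser baseline time).
ROWS = {
    "html5lib":        (7.358, 0.429),
    "soup+html5lib":   (8.536, 3.075),
    "soup+lxml.html":  (4.904, 3.075),
    "selectolax":      (0.080, 0.429),
    "markupever":      (0.234, 0.429),
    "walk-selectolax": (0.193, 0.554),
    "walk-markupever": (1.084, 0.554),
}

FILE_SIZE_BYTES = 15_131_111

# A factor above 1 means html5-parser was faster than the compared parser;
# a factor below 1 means the compared parser was faster.
speedups = {name: other / base for name, (other, base) in ROWS.items()}


def throughput_kb_s(size_bytes, seconds):
    """Throughput in KB/s, assuming 1 KB = 1000 bytes."""
    return size_bytes / 1000 / seconds


for name, factor in speedups.items():
    print(f"{name:<16} {factor:.2f}")

# Uses the rounded 0.429 s average, so it only approximates the
# throughput figure printed in the output.
print(f"html5-parser: {throughput_kb_s(FILE_SIZE_BYTES, 0.429):.1f} KB/s")
```

Read this way, the 0.19 for selectolax means it took roughly a fifth of html5-parser's time on the same file, while walk-markupever at 1.96 took about twice as long as the html5-parser walk.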