When the PDF already has an extractable text layer (no OCR needed), Microsoft's MarkItDown is the fastest option.
Install:
pip install 'markitdown[pdf]'
Run:
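# -print0 / read -d '' keep paths with spaces intact
find ./folder -type f -name '*.pdf' -print0 | while IFS= read -r -d '' i; do
  out="output/${i%.pdf}.md"
  mkdir -p "$(dirname "$out")"   # make sure the target directory exists
  markitdown "$i" -o "$out"
done
For more complex PDFs, use marker instead.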
Install (on macOS, pin v1.8.0, because a recent bug in newer versions makes it about 20x slower):
pip install marker-pdf==1.8.0
Run (adjust the flags to your requirements):
marker --disable_image_extraction --output_dir ./output/ --pdftext_workers 2 --disable_ocr ./
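
Going back to the MarkItDown loop, the output name comes from the shell expansion ${i%.pdf}, which strips the .pdf suffix before .md is appended. The sketch below isolates that mapping so it can be checked without converting anything. md_path_for is a hypothetical helper name, not part of markitdown or marker, and dropping the leading "./" is an assumption made only to keep the printed path tidy (it still points at the same file).

```shell
# Hypothetical helper (not part of markitdown): map a PDF path to the
# Markdown path the conversion loop would write under an output root.
# The body runs in a subshell, so its variables do not leak to the caller.
md_path_for() (
  pdf=$1
  out_root=${2%/}      # tolerate a trailing slash on the output root
  rel=${pdf#./}        # drop a leading "./" as produced by find
  printf '%s/%s.md\n' "$out_root" "${rel%.pdf}"   # strip .pdf, append .md
)

md_path_for ./folder/reports/q1.pdf output     # prints output/folder/reports/q1.md
md_path_for "./folder/my file.pdf" output/     # prints output/folder/my file.md
```

Because the input directory structure is preserved under the output root, the matching subdirectory has to exist before markitdown writes to it.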