Alan Ralph

Wearer Of Many Hats


๐Ÿ› ๏ธ Please note that this site is a work-in-progress as I play around & experiment โ€” things may change appearance between visits. ๐Ÿ› ๏ธ

What's so hard about PDF text extraction?

As it turns out, it can be very hard indeed, depending on how the PDF was created.

I have a love-hate relationship with PDF files. While I love them as an archival format, not all PDFs are created equal. When I was doing print pre-press work, for instance, text could get mangled if we didn't have the matching fonts available.


If you'd like to comment, send me an email.