What’s so hard about PDF text extraction?

As it turns out, it can be very hard indeed, depending on how the PDF was created.

I have a love-hate relationship with PDF files. While I love them as an archival format, not all PDFs are created equal. When I was doing print pre-press work, for instance, text could come out mangled if we did not have the matching fonts available.

