What's so hard about PDF text extraction?
As it turns out, it can be very hard indeed, depending on how the PDF was created.
I have a love-hate relationship with PDF files. While I love them as an archival format, not all PDFs are created equal. When I was doing print pre-press work, for instance, text could get mangled depending on whether we had the matching fonts available.
If you'd like to comment, send me an email.