Jim King's Inside PDF blog is a great way to keep up with what's happening with PDF (especially standardization). Last week he detailed why extracting text from a PDF is so difficult:

PDF specifies text content of pages as glyphs not characters. That is, one of the appearances for an "a" is chosen by the creator of the PDF file by choosing a font from which the "a" glyph can be taken. PDF page contents do not specify characters such as just the Latin letter "a".  

The rub comes when we want to work with characters not glyphs. Unicode is widely used because it is a character encoding technology not a glyph encoding one. In fact, for many purposes, such as searching for text strings, we do not want to search by appearance but we want to search by the Latin letters (or commonly by the Unicode encoding of characters).
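To make the glyph-versus-character distinction concrete, here is a minimal Python sketch. It is a toy model, not a real PDF parser: the font names, glyph codes, and the `extract_text` helper are all hypothetical, and the per-font tables are only loosely modeled on the optional ToUnicode mapping a PDF font can carry. The point is that the same word can be stored as entirely different code sequences depending on the font, so a search has to go through a glyph-to-character mapping first.

```python
# Hypothetical per-font tables mapping glyph codes to Unicode characters.
# A real PDF may or may not supply this information for each font.
FONT_TO_UNICODE = {
    "F1": {0x01: "c", 0x02: "a", 0x03: "t"},
    "F2": {0x41: "a", 0x42: "c", 0x43: "t"},
}

# Page content modeled as (font name, glyph codes) runs.
# Note that no characters are stored here, only codes into a font.
PAGE_RUNS = [
    ("F1", [0x01, 0x02, 0x03]),  # draws "cat" with font F1
    ("F2", [0x42, 0x41, 0x43]),  # draws "cat" with font F2
]


def extract_text(runs, font_maps):
    """Translate glyph codes to characters, one run at a time.

    Glyph codes with no known mapping become U+FFFD (the Unicode
    replacement character), since the character cannot be recovered.
    """
    parts = []
    for font, codes in runs:
        mapping = font_maps.get(font, {})
        parts.append("".join(mapping.get(code, "\ufffd") for code in codes))
    return " ".join(parts)


if __name__ == "__main__":
    text = extract_text(PAGE_RUNS, FONT_TO_UNICODE)
    print(text)
    print(text.count("cat"))
```

Both runs draw the same word, yet their raw code sequences have nothing in common. If a font's table is missing, the sketch can only emit replacement characters, which is roughly why text extraction fails on some files.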

Of course, if you need to extract text from a PDF, there are tools that can do that for you.