Text won't copy from a PDF: causes, format conversion, and expert advice

Have you ever needed to paste text from a PDF document into another program for editing, only to find that the text in the PDF file would not copy? What should you do in that situation? Few users know that there is not one but several simple ways to resolve it. First, let's look at some typical cases and their causes, and then find the most suitable solution for each. Note right away that changing the original document's format is not always advisable.

Why won't text copy from a PDF?

Most experts consider restrictions built into the files themselves to be the first and main reason why the contents of PDF documents cannot be copied.

Copy restrictions in the file structure

These can be passwords required to open the file, restrictions on copying, or even protection against printing the contents. Another equally common reason text will not copy from a PDF is damage to the file itself or a violation of its original structure. Less often, the user is working in an application that is simply not well suited to extracting text from a PDF document. For example, Adobe Acrobat offers considerably more options than the free Adobe Reader, so if the text will not copy in Reader, first try the same operation in Acrobat; it may give the desired result. In most cases, alas, this does not help, because the content is protected against copying and the password is stored inside the file itself. We will look at how to get around such restrictions a little later, but first let's consider one more situation that also confuses many users.
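If you want to see what kind of protection a file carries before reaching for special tools, the sketch below may help. It assumes the restrictions are stored the way the PDF specification (ISO 32000-1) describes them: an /Encrypt entry in the trailer and a signed 32-bit permission value /P, in which bit 5 (counting from 1 at the low-order end) allows copying and extracting text and bit 3 allows printing. The function names are hypothetical, and the /Encrypt check is a rough byte search, not a real PDF parser.

```python
import re
from typing import Dict

# Permission bits of the /P entry in the encryption dictionary
# (ISO 32000-1, Table 22). Bit 1 is the lowest-order bit.
PERMISSION_BITS: Dict[str, int] = {
    "print": 3,
    "modify": 4,
    "copy_extract": 5,
    "annotate": 6,
    "fill_forms": 9,
    "extract_for_accessibility": 10,
    "assemble": 11,
    "print_high_quality": 12,
}


def is_probably_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristic: an encrypted file references an /Encrypt dictionary in its trailer."""
    return re.search(rb"/Encrypt\b", pdf_bytes) is not None


def decode_permissions(p_value: int) -> Dict[str, bool]:
    """Translate the signed /P integer into named allow/deny flags."""
    unsigned = p_value & 0xFFFFFFFF  # /P is stored as a signed 32-bit integer
    return {name: bool(unsigned & (1 << (bit - 1)))
            for name, bit in PERMISSION_BITS.items()}


if __name__ == "__main__":
    flags = decode_permissions(-3904)  # this value clears every permission bit above
    print("copying allowed:", flags["copy_extract"])  # copying allowed: False
```

In this model the flags record what the author allowed, and it is the viewing application that honours them, which fits the observation that some programs simply ignore them.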

Why does PDF text copy as garbled characters?

Now suppose the original document has no copy protection and everything seems normal. Yet for some reason, when the content is pasted into another editor, it comes out as garbled characters ("hieroglyphs"). This usually happens because the original uses an encoding different from the standard one. In this situation experts most often suggest the simplest approach, which does not even require changing the document's original format: since the text is copied in the wrong encoding, the encoding needs to be changed.

Re-save PDF File

The easiest way to do this is to open the File menu of any PDF editor, choose “Save As...”, click the Settings button in the save dialog, and select a different encoding. Usually it is enough to switch from the original standard to UTF-8. When you reopen the document, the text can be copied and pasted into any other text editor unchanged. You can also transcode the file with an online service such as Decoder.
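The "hieroglyphs" effect itself is easy to reproduce: bytes written in one encoding are read back as another. The sketch below uses Russian text stored as UTF-8 and misread as Windows-1251 (cp1251); the helper name is hypothetical. It only repairs text that was decoded with a known wrong code page, and it does not fix a PDF whose fonts lack a usable character map, which needs re-saving, conversion, or OCR instead.

```python
def repair_mojibake(garbled: str, wrong_encoding: str, right_encoding: str = "utf-8") -> str:
    """Undo a wrong decode: turn the characters back into the original bytes,
    then decode those bytes with the encoding they were really written in."""
    return garbled.encode(wrong_encoding).decode(right_encoding)


if __name__ == "__main__":
    original = "Копировать текст"
    # Simulate the problem: UTF-8 bytes interpreted as Windows-1251.
    garbled = original.encode("utf-8").decode("cp1251")
    print(garbled)
    print(repair_mojibake(garbled, "cp1251"))  # Копировать текст
```

The repair works only because every wrongly decoded character maps back to exactly one byte; if the garbled text has already been edited or re-saved in yet another encoding, that round trip may no longer be possible.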

How can you bypass copy restrictions in the file itself?

Now let's see what can be done to get around the various restrictions and locks.

PDF Password Remover

If the text will not copy from a PDF no matter what, you can resort to a "pirate" method: remove the restrictions or delete the passwords with the PDF Password Remover program. If that does not work, try a specialized website such as PDFPirate or FreeMyPDF and remove the protection there. Keep in mind, however, that for some official documents this technique is illegal.

Opening a PDF file in Word

Another simple technique that eliminates many problems with PDF documents that need editing is not to copy the content from a "viewer" or PDF editor at all, but to open the file directly in the program you plan to edit it in.


For text documents, the easiest option is the versatile Microsoft Word: open the document in it by selecting the appropriate file type. If the document opens without problems, it can be edited and saved in the desired format.

How do you convert PDF text to Word?

But suppose the original document will not open in text editors (anything can happen), and the text will not copy in "native" PDF editors either.

Copy PDF file to clipboard

In this case, to get the content into a Word document, try copying not the text but the whole document to the clipboard in the PDF editor, then paste it into Word. This method is, of course, far from convenient: the pasted content will be an image, so the material cannot be edited.

Here the best solution is to convert the original document to another format. Plenty of converter programs are available online, for example PDF to Word Converter. In most of them it is enough to specify the source file and the target format. Such tools can convert PDF not only to Word: there are also programs for converting to Excel.
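Dedicated converters are far more elaborate, but the core of getting text out of a PDF can be sketched in a few lines of Python. This is an illustration under strong assumptions, not a substitute for a converter: it finds stream bodies with a byte search, inflates Flate-compressed ones with zlib, and collects the strings passed to the Tj and TJ text-showing operators, decoding them as Latin-1. That only works for simple fonts with a standard encoding; when a font uses a custom encoding without a usable character map, the extracted bytes are exactly the kind of garbage described earlier. All function names are hypothetical.

```python
import re
import zlib
from typing import List

# A literal string such as (Hello) with backslash escapes; nested parentheses are not handled.
_STRING = rb"\((?:\\.|[^\\()])*\)"
_STRING_RE = re.compile(rb"\(((?:\\.|[^\\()])*)\)", re.S)
_SHOW_RE = re.compile(
    rb"\[((?:" + _STRING + rb"|[^\[\]()])*)\]\s*TJ"  # array form: [(Hel) -20 (lo)] TJ
    rb"|(" + _STRING + rb")\s*Tj",                   # single string: (Hello) Tj
    re.S,
)
_STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.S)


def _stream_bodies(pdf_bytes: bytes) -> List[bytes]:
    """Return every stream body, inflated when it is Flate-compressed."""
    bodies = []
    for match in _STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj ignores trailing bytes such as the EOL before 'endstream'
            bodies.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            bodies.append(raw)  # not zlib data: keep the raw bytes
    return bodies


def _unescape(s: bytes) -> bytes:
    # Only the \(, \) and \\ escapes are handled in this sketch.
    return re.sub(rb"\\([()\\])", rb"\1", s)


def extract_simple_text(pdf_bytes: bytes) -> List[str]:
    """Collect the strings shown by Tj/TJ, decoded as Latin-1 (simple fonts only)."""
    pieces = []
    for body in _stream_bodies(pdf_bytes):
        for show in _SHOW_RE.finditer(body):
            operand = show.group(1) if show.group(1) is not None else show.group(2)
            strings = _STRING_RE.findall(operand)
            pieces.append(b"".join(_unescape(s) for s in strings).decode("latin-1"))
    return pieces
```

Calling `extract_simple_text(open("file.pdf", "rb").read())` on a simple file returns one string per text-showing operator; real converters additionally handle font encodings, character maps, layout, and the other stream filters.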

Problems with the text itself in PDF documents

Sometimes the text content of the original file was created by scanning a printed document. In that case the text is stored as an image, and copying or printing restrictions may have been imposed on it as well. What should you do then?

Using an optical character recognition system

In this case, optical character recognition (OCR) systems come to the rescue. Almost all experts agree that the ABBYY FineReader package is the best option. The program is not free, of course, but on the Runet you can find already activated (cracked) versions or builds with an activation key.

ABBYY FineReader program

In the program's start window, choose to convert a PDF file or image to a Word document. The system recognizes the text in the image on its own and sends it to Word, where the new document can be edited and saved.
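Whether a given PDF is actually a scan, and therefore needs OCR rather than one of the earlier methods, can be estimated with a rough heuristic, sketched below under our own assumptions: a scanned PDF usually contains image XObjects (/Subtype /Image) but no text objects (the BT operator) in its content streams. The helper names are hypothetical and the byte search is not a real PDF parser; a scan that already carries an OCR text layer will be reported as containing text, which is the desired answer.

```python
import re
import zlib
from typing import List

_STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.S)
# Image XObjects are stream objects, which may not be stored inside object
# streams, so their dictionaries can be found with a plain byte search.
_IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")
# BT opens a text object in a content stream.
_TEXT_OBJECT_RE = re.compile(rb"(?:^|\s)BT\s")


def _inflated_streams(pdf_bytes: bytes) -> List[bytes]:
    """Return every stream body, inflated when it is Flate-compressed."""
    out = []
    for match in _STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            out.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            out.append(raw)  # not zlib data (e.g. a JPEG image): keep as-is
    return out


def looks_scanned(pdf_bytes: bytes) -> bool:
    """Heuristic: images are present but no stream opens a text object."""
    has_images = _IMAGE_RE.search(pdf_bytes) is not None
    has_text = any(_TEXT_OBJECT_RE.search(s) for s in _inflated_streams(pdf_bytes))
    return has_images and not has_text
```

Because binary image data is searched too, a stray "BT" byte sequence could occasionally make a scan look like text; for a quick triage before launching a recognition program, that trade-off is usually acceptable.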

Converting to other formats

Finally, if you need to convert text to other, less common formats, the usual recommendation is again to use converters, choosing either narrowly targeted programs (for example, PDF to JPEG for converting to image files) or universal applications that support several formats, one of which will be the one you need. Online services are sometimes an option, but they are inconvenient because they are slow and limit the size (or number) of uploaded files.
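As an illustration of the narrowly targeted route, here is a sketch that renders PDF pages to JPEG files by calling pdftoppm from the Poppler utilities. The choice of pdftoppm is ours, not the article's; the sketch assumes the tool is installed and on PATH, and the function names are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional


def build_pdftoppm_command(pdf_path: str, output_prefix: str, dpi: int = 150,
                           first_page: Optional[int] = None,
                           last_page: Optional[int] = None) -> List[str]:
    """Assemble a pdftoppm call that writes one JPEG per page."""
    cmd = ["pdftoppm", "-jpeg", "-r", str(dpi)]
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # first page to render
    if last_page is not None:
        cmd += ["-l", str(last_page)]   # last page to render
    cmd += [pdf_path, output_prefix]
    return cmd


def pdf_to_jpeg(pdf_path: str, output_prefix: str, dpi: int = 150) -> None:
    """Run the conversion, failing early if Poppler is not installed."""
    if shutil.which("pdftoppm") is None:
        raise RuntimeError("pdftoppm (Poppler) was not found on PATH")
    subprocess.run(build_pdftoppm_command(pdf_path, output_prefix, dpi), check=True)
```

For example, `pdf_to_jpeg("report.pdf", "page", dpi=300)` would write files named after the `page` prefix plus a page number. Keeping the command builder separate makes it easy to inspect or log the exact call before anything runs.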

Conclusion

To sum up, several key points stand out. First, changing the original format is not always necessary: the text can be copied in a more capable program (as with “Acrobat” versus “Reader”), or the file can be opened directly in the text-editing program into which you need to paste the material, as with Word. Second, to remove passwords and restrictions, it is best to use special applications (even if this looks illegal). Third, most converters ignore restrictions during conversion, so using them looks very promising. Fourth, do not neglect text recognition systems, which sometimes perform even better than all the previous options. Fifth, some believe conversion can also be done with virtual printers, but this option suits only cases where the original text needs to be converted to graphics.

Source: https://habr.com/ru/post/C41187/

