Hello everyone, I'm new to programming, so thanks in advance. I run the program in PyCharm; the conversion starts and seems to work without problems (Parsing Page... -> Creating Page... etc.). But when I open the directory where the file was saved to check whether the conversion worked, I see what is shown in the attached picture: the docx content appears as pictures, in pieces, not as text. I was wondering if you have any idea why this is happening and how to fix it.
Hi, welcome. From the screenshot, I guess the "text" you see is not real text. Can you select it, then copy and paste it as text? It would be great if you could upload the PDF (the one page that failed is enough) so I can test it.
Sorry, one limitation of pdf2docx is that it can process text-based PDFs only. Your PDF page consists of multiple image pieces, which are not OCR-ed but copied to the docx directly. The screenshot below shows the images in the PDF.
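A quick way to tell in advance whether a PDF is text-based is to look at how much text can be extracted from each page. The sketch below is only an illustration, not part of pdf2docx: the helper names `image_only_pages` and `extract_page_texts` and the `min_chars` threshold are made up for this example, and it assumes a PyMuPDF version that provides `Page.get_text()`.

```python
def image_only_pages(page_texts, min_chars=20):
    """Return the indices of pages with (almost) no extractable text.

    page_texts is a list with one string per page. min_chars is an
    arbitrary threshold chosen for this example, not a pdf2docx setting.
    """
    return [i for i, text in enumerate(page_texts)
            if len(text.strip()) < min_chars]


def extract_page_texts(pdf_path):
    """Read the plain text of every page with PyMuPDF (assumed installed)."""
    import fitz  # PyMuPDF; imported here so image_only_pages works without it

    with fitz.open(pdf_path) as doc:
        return [page.get_text() for page in doc]


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        suspicious = image_only_pages(extract_page_texts(sys.argv[1]))
        print("Pages that look like scanned images:", suspicious)
```

Pages reported by this check would most likely end up as pictures in the docx, so they would need an OCR step before conversion.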
OK, I will keep that in mind. Thanks for your quick reply. I don't know if you have already made a user interface, but I have made a basic, user-friendly interface for your program; here is the code.
from tkinter import Button, Entry, Label, Tk
from tkinter.filedialog import askdirectory, askopenfilename

from pdf2docx import Converter

root = Tk()
root.title('PDF_2_Docx Converter')
root.geometry('500x500')
root.config(bg='grey')


def pdf_file_location():
    # Ask for the source PDF and show its path in the entry.
    filename = askopenfilename()
    if filename:  # an empty string means the dialog was cancelled
        file_path_pdf_entry.delete(0, 'end')
        file_path_pdf_entry.insert(0, filename)


def docx_folder_location():
    # Ask for the output folder; the result is always named New_DOCX.docx.
    folder = askdirectory()
    if folder:
        file_path_docx_entry.delete(0, 'end')
        file_path_docx_entry.insert(0, folder + '/New_DOCX.docx')


def convert_button_function():
    # Convert all pages of the selected PDF to the chosen docx path.
    cv = Converter(file_path_pdf_entry.get())
    try:
        cv.convert(file_path_docx_entry.get(), start=0, end=None)
    finally:
        cv.close()  # release the PDF even if the conversion fails


# Labels
label1 = Label(text='PDF to Docx', font='Impact 40', bg='white', fg='#1E90FF')
label1.grid(column=2, row=1, sticky='n', pady=50, padx=120)

# Entries
# PDF file entry
file_path_pdf_entry = Entry(border=5)
file_path_pdf_entry.grid(ipadx=90, ipady=4, padx=20, sticky='nw', column=2, pady=1, row=2)
# Docx file entry
file_path_docx_entry = Entry(border=5)
file_path_docx_entry.grid(column=2, ipady=4, ipadx=90, padx=20, sticky='nw', pady=70, row=3)

# Buttons
# Convert button
converter_button = Button(text='Convert', bg='#1E90FF', fg='white', font='impact 20', border=5,
                          command=convert_button_function)
converter_button.grid(padx=175, sticky='s', ipady=5, ipadx=10, column=2, row=4)
select_pdf_file = Button(text='Select PDF file', fg='black', bg='white', border=3,
                         command=pdf_file_location)
select_pdf_file.grid(column=2, sticky='ne', row=2, pady=6, padx=60)
select_new_file_folder = Button(text='Select new file folder', fg='black', bg='white', border=3,
                                command=docx_folder_location)
select_new_file_folder.grid(column=2, sticky='ne', row=3, pady=74, padx=26)

root.mainloop()
Much appreciated. It's a good idea; I'll put a GUI into the backlog.
Would you like to improve it a bit more, e.g. convert multiple PDF files under a user-defined folder in batch mode? After that, please submit a PR so I can merge your work into this library to benefit more people.
By batch mode, do you mean saving it as a batch file and running it? I can also make a Windows exe. Which do you prefer?
With your user interface, one can convert one file at a time. But one might need to convert lots of PDF files; in that case, it's more convenient to put all the PDF files in a folder, select that folder and convert them all in one go.
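To make the folder idea concrete, here is a rough sketch of what such a batch mode could look like. The helper names `plan_batch` and `convert_folder` are invented for this example and are not part of pdf2docx; the conversion itself reuses the same `Converter` calls as the GUI script above. Note that `glob("*.pdf")` is case-sensitive on some systems, so files ending in `.PDF` might be skipped.

```python
from pathlib import Path


def plan_batch(pdf_folder, out_folder=None):
    """Pair every *.pdf in pdf_folder with a .docx output path.

    If out_folder is not given, the docx files go next to the PDFs.
    """
    src = Path(pdf_folder)
    dst = Path(out_folder) if out_folder else src
    return [(pdf, dst / (pdf.stem + ".docx"))
            for pdf in sorted(src.glob("*.pdf"))]


def convert_folder(pdf_folder, out_folder=None):
    """Convert every PDF in the folder; return a list of (pdf, error) failures."""
    from pdf2docx import Converter  # imported here so plan_batch works without it

    failed = []
    for pdf, docx in plan_batch(pdf_folder, out_folder):
        cv = None
        try:
            cv = Converter(str(pdf))
            cv.convert(str(docx), start=0, end=None)
        except Exception as exc:  # keep going with the remaining files
            failed.append((pdf, exc))
        finally:
            if cv is not None:
                cv.close()
    return failed
```

In the GUI, the "Select PDF file" button could then be replaced by a folder dialog whose result is passed to `convert_folder`, with any failures shown to the user at the end.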