How can I extract data from a PDF that is written in Hindi and uses WinAnsiEncoding?


I have a PDF that contains data in Hindi. It has different blocks that store the same type of information. I have to extract the data from the PDF and store it in CSV/Excel format so that it can be used for further processing.
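For the CSV step described above, here is a minimal sketch. It assumes the text of each block has already been extracted as Unicode, one dict per block; the function name `write_blocks_to_csv` and the field names in the usage example are hypothetical, not taken from the PDF in question. The `utf-8-sig` encoding writes a byte order mark, which generally helps Excel detect UTF-8 and display Devanagari correctly.

```python
import csv


def write_blocks_to_csv(blocks, path, fieldnames):
    """Write one row per extracted block to a CSV file.

    `blocks` is a list of dicts keyed by `fieldnames` (hypothetical
    column names chosen by the caller).
    """
    # utf-8-sig prepends a BOM so spreadsheet tools can detect UTF-8.
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(blocks)
```

Usage might look like `write_blocks_to_csv([{"name": "राम", "village": "दिल्ली"}], "out.csv", ["name", "village"])`.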

I have tried OCR and different tools and Python libraries (such as Tesseract and pdfminer), but I have not been able to get satisfactory results. (Somewhere or other there is a problem with the Hindi 'matra'.)
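One possible cause of the matra problem (an assumption, not something confirmed in this thread) is that the text comes out in visual order, with the short-i matra (ि) placed before the consonant it belongs to, whereas Unicode stores it after the consonant. This is common with legacy Hindi fonts that are mapped onto Latin codes. The sketch below only handles that one reordering; the function name `reorder_short_i` is hypothetical, and other matras, nukta and reph would need their own rules.

```python
import re

# Assumption: the input is already Devanagari Unicode, but the short-i
# matra (U+093F) appears BEFORE its consonant cluster (visual order).
# A cluster here is: zero or more (consonant + virama) pairs, then a consonant.
_VISUAL_SHORT_I = re.compile(
    "\u093f((?:[\u0915-\u0939]\u094d)*[\u0915-\u0939])"
)


def reorder_short_i(text):
    """Move a leading short-i matra to after its consonant cluster."""
    return _VISUAL_SHORT_I.sub("\\1\u093f", text)
```

For example, a visually ordered "ि" + "क" becomes "कि". Text that is already in correct Unicode order should not be passed through this function, since a matra followed by another consonant would then be moved incorrectly.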

Please help me with this. I have been stuck for 3-4 days.

Hi,

 

Are you trying to extract data from a PDF form, or to convert a normal PDF to Excel?

 

Thanks,

Abhishek


