How can I extract data from a PDF written in Hindi that uses WinAnsiEncoding?


I have a PDF that contains data in Hindi. It has different blocks that store the same type of information. I have to extract the data from the PDF and store it in CSV/Excel format so that it can be used for further processing.
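For the storage step only, here is a minimal sketch. It assumes the blocks have already been extracted as correct Unicode strings, one dictionary per block. The function name `write_blocks` and the field names in the usage note are hypothetical, not taken from the PDF in question. The file is written as UTF-8 with a byte order mark (`utf-8-sig`), which Excel generally needs in order to display Devanagari correctly when opening a CSV directly.

```python
import csv


def write_blocks(path, blocks, fieldnames):
    """Write one CSV row per extracted block.

    `blocks` is a list of dicts keyed by `fieldnames` (hypothetical names;
    use whatever fields the PDF blocks actually contain).
    """
    # utf-8-sig prepends a BOM so spreadsheet tools detect the encoding;
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(blocks)
```

It could be called as `write_blocks("out.csv", [{"name": "राम", "address": "दिल्ली"}], ["name", "address"])`.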

I have tried OCR and different tools and Python libraries (such as Tesseract and pdfminer), but I have not been able to get satisfactory results. Somewhere or other there is a problem with the Hindi matras (vowel signs).
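One possible cause of matra problems, offered as an assumption rather than a diagnosis of this file: Hindi PDFs that declare WinAnsiEncoding often use a legacy (non-Unicode) Hindi font, in which the short "i" matra (ि) is stored before the consonant it belongs to, whereas Unicode stores it after. If the extracted text has already been mapped to Devanagari code points but is still in that visual order, a reordering pass like the sketch below may help. The function name `fix_short_i` is hypothetical, and this handles only that single matra.

```python
import re

I_MATRA = "\u093f"                   # DEVANAGARI VOWEL SIGN I (ि)
VIRAMA = "\u094d"                    # DEVANAGARI SIGN VIRAMA (्)
CONSONANT = "[\u0915-\u0939]\u093c?"  # क..ह, optionally followed by a nukta

# A consonant cluster: zero or more "consonant + virama" pairs, then a consonant.
CLUSTER = f"(?:{CONSONANT}{VIRAMA})*{CONSONANT}"
MISPLACED = re.compile(f"{I_MATRA}({CLUSTER})")


def fix_short_i(text):
    """Move a short-i matra that precedes a consonant cluster to after it.

    Apply this only to text that is known to be in visual (legacy-font)
    order; running it on text that is already correct Unicode would
    corrupt words in which a consonant follows ि.
    """
    return MISPLACED.sub(lambda m: m.group(1) + I_MATRA, text)
```

For example, the visual-order sequence ि + क is rewritten as क + ि, and a conjunct such as स्थ is kept together so the matra lands after the whole cluster.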

Please help me with this. I have been stuck for 3-4 days.

Hi,

Are you trying to extract data from a PDF form, or to convert a normal PDF to Excel?

Thanks,

Abhishek


