OCR Challenge
Posted: 08 Jun 2021, 14:28
I am having a lot of trouble extracting the data I need from PDF documents.
Problems:
1. OCR does not always work: it either can't read the data or returns the wrong number.
2. Each PDF that I need to obtain data from has a different layout, or puts the dimensions in a different area.
2.A. Sometimes one of the dimensions is missing; in that case I need an error to pop up.
I need to obtain the dimensions of the panel, the bolt spacing, the module thickness and the weight. For example, in this PDF the values would be: long edge = 2024, short edge = 1024, long boltA = 1424, long boltB = "", short bolt = 984, thickness = 40, weight = 20.3 kg
For the next PDF: long edge = 2094, short edge = 1038, long boltA = 1400, long boltB = 1276, short bolt = 989
Here is a GitHub link to the PDFs. Test your skills and help me with my problem. Thank you for your time.
https://github.com/piterat/solarPanelPdfs
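To make the "error when a dimension is missing" requirement concrete, here is a rough sketch of the check that could run after the OCR step. It assumes the values have already been pulled out of a PDF into a dict by some extraction code that is not shown. The field names, the `MissingDimensionError` class and the `validate_panel` function are all hypothetical names made up for illustration. Treating long boltB as optional is an assumption based on the first example, where it is blank. The plausibility rules (bolt spacing smaller than the matching edge) are also an assumption, meant as a cheap way to catch some wrongly read numbers.

```python
REQUIRED = ("long_edge", "short_edge", "long_bolt_a", "short_bolt",
            "thickness", "weight_kg")
OPTIONAL = ("long_bolt_b",)  # blank in the first example, so assumed optional


class MissingDimensionError(ValueError):
    """Raised when a required value was not found in a PDF."""


def _is_blank(value):
    return value is None or str(value).strip() == ""


def validate_panel(raw):
    """Return the values as floats (None for a blank optional field) or raise."""
    # 1. Report every missing required field at once.
    missing = [name for name in REQUIRED if _is_blank(raw.get(name))]
    if missing:
        raise MissingDimensionError("missing: " + ", ".join(missing))

    # 2. Convert everything that is present to a number.
    panel = {}
    for name in REQUIRED + OPTIONAL:
        value = raw.get(name)
        if _is_blank(value):
            panel[name] = None
            continue
        try:
            panel[name] = float(value)
        except (TypeError, ValueError):
            raise ValueError(f"{name} is not a number: {value!r}")

    # 3. Assumed sanity rules to flag some misread numbers.
    problems = []
    if panel["short_edge"] > panel["long_edge"]:
        problems.append("short_edge is larger than long_edge")
    if panel["long_bolt_a"] >= panel["long_edge"]:
        problems.append("long_bolt_a is not smaller than long_edge")
    if panel["long_bolt_b"] is not None and panel["long_bolt_b"] >= panel["long_edge"]:
        problems.append("long_bolt_b is not smaller than long_edge")
    if panel["short_bolt"] >= panel["short_edge"]:
        problems.append("short_bolt is not smaller than short_edge")
    if problems:
        raise ValueError("implausible values: " + "; ".join(problems))
    return panel


# Values from the first example above, as strings the way OCR would return them.
first_pdf = {"long_edge": "2024", "short_edge": "1024", "long_bolt_a": "1424",
             "long_bolt_b": "", "short_bolt": "984", "thickness": "40",
             "weight_kg": "20.3"}
print(validate_panel(first_pdf))
```

A check like this does not fix the OCR itself. It only makes sure that a missing or obviously wrong value stops the run instead of silently ending up in the results.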