pyPdf throws exception: pyPdf.utils.PdfReadError: EOF marker not found

I don't need to fix pyPdf. I need the EOF error to cause the "except" block to execute and skip on to the next file, but it doesn't work. It still causes the program to stop running.

Background: Batch OCR program for PDFs; Python, pyPdf, Adobe PDF OCR error: unsupported filter /LZWDecode ... the saga continues.

I've got 10,000 PDFs in a folder. Some are OCRd, some are not. Can't tell 'em apart. Step 1 is to figure out which ones are not OCRd and OCR them (see the other threads for details).

So I'm using pyPdf. It gives exceptions related to unrecognized characters and unsupported filters when I try to read the text. I guesstimated that if it throws an exception, it's got text in it, and it doesn't go in the list. Problem solved, right? So:

    from pyPdf import PdfFileWriter, PdfFileReader
    import sys, os, pyPdf, re

    path = r'c:\users\homer\documents\my pdfs'
    filelist = os.listdir(path)

    has_text_list = []
    does_not_have_text_list = []

    for pdf_name in filelist:
        pdf_file
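
The rest of the loop isn't shown, so the following is only a guess at what might be going on, not a diagnosis. One common reason an "except" block never runs is that the failing call sits outside the "try" (with pyPdf, the EOF marker is checked when the reader is constructed, not when the text is extracted). Another is that the "except" names a narrower exception type than the one raised. Here is a minimal sketch of the skip-on-error pattern, assuming all the PDF work is wrapped in a single function. The names classify_pdfs, extract_text and fake_extract are hypothetical, and fake_extract is a stand-in so the example runs without pyPdf or any real files. Unreadable files go into their own list here rather than being counted as "has text".

```python
def classify_pdfs(pdf_paths, extract_text):
    """Sort paths into (has_text, no_text, skipped) using extract_text(path)."""
    has_text, no_text, skipped = [], [], []
    for pdf_path in pdf_paths:
        try:
            # Everything that can fail (opening the file, constructing the
            # reader, extracting text) happens inside this one call, so it
            # is all covered by the try block.
            text = extract_text(pdf_path)
        except Exception as exc:
            # Deliberately broad: skip the file whatever the error type is.
            skipped.append((pdf_path, repr(exc)))
            continue
        if text.strip():
            has_text.append(pdf_path)
        else:
            no_text.append(pdf_path)
    return has_text, no_text, skipped


def fake_extract(pdf_path):
    """Hypothetical stand-in for real PDF text extraction."""
    if pdf_path == "broken.pdf":
        raise ValueError("EOF marker not found")
    return "some text" if pdf_path == "ocrd.pdf" else ""


has_text, no_text, skipped = classify_pdfs(
    ["ocrd.pdf", "scan.pdf", "broken.pdf"], fake_extract
)
```

With pyPdf, extract_text would open the file, build the PdfFileReader, and join extractText() from each page, all inside that one function, so an EOF error raised while the reader is being constructed ends up in skipped instead of stopping the batch.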