combine_tessdata Fails to Create Temporary File micr.traineddata.__tmp__ During Training #4395

Open
jo-walker opened this issue Mar 6, 2025 · 3 comments

@jo-walker

Current Behavior

During MICR font training for check validation, the combine_tessdata.exe step fails with the following error:
Failed to create a temporary file micr.traineddata.__tmp__

Here’s the relevant log snippet from my script:
2025-03-05 19:52:23,933 - ERROR - combine_tessdata failed: Failed to create a temporary file micr.traineddata.__tmp__
2025-03-05 19:52:23,935 - ERROR - MICR font training failed: Command '['C:\Program Files\Tesseract-OCR\combine_tessdata.exe', '-o', 'micr.traineddata', 'training.unicharset', 'training.inttemp', 'training.pffmtable', 'training.shapetable', 'training.normproto']' returned non-zero exit status 1.
2025-03-05 19:52:23,937 - ERROR - Processing failed: Training failed: Command '['C:\Program Files\Tesseract-OCR\combine_tessdata.exe', '-o', 'micr.traineddata', 'training.unicharset', 'training.inttemp', 'training.pffmtable', 'training.shapetable', 'training.normproto']' returned non-zero exit status 1.

The training process halts at this step, and the micr.traineddata file is not generated.
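To make the failure easier to reproduce, it may help to record which files exist in the training directory right before the call. Below is a minimal Python sketch that only reports whether the output file and its temporary sibling are present. The helper name `describe_output_state` is hypothetical, the temporary name is taken from the issue title, and what either result implies about the cause is not established in this thread.

```python
from pathlib import Path


def describe_output_state(training_dir, output_name="micr.traineddata"):
    """Report whether the output file and its temp sibling currently exist."""
    directory = Path(training_dir)
    output = directory / output_name
    # Temp name as given in the issue title: "<output>.__tmp__".
    temp = directory / (output_name + ".__tmp__")
    return {
        "output_exists": output.is_file(),
        "temp_exists": temp.is_file(),
    }
```

Calling `describe_output_state(".")` from the training directory immediately before `combine_tessdata.exe` runs gives a small dictionary that can be written to the same log as the error.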

The script sets the TMP and TEMP environment variables to the training directory before running combine_tessdata.exe, but the error persists.
Manual execution of the combine_tessdata command in the training directory also fails with the same error:
C:\Users\jotam\projects\ocr-trans\check_validation\training_model\training>"C:\Program Files\Tesseract-OCR\combine_tessdata.exe" -o micr.traineddata training.unicharset training.inttemp training.pffmtable training.shapetable training.normproto
Failed to create a temporary file micr.traineddata.__tmp__
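For reference, the TMP/TEMP override described above can be sketched roughly as follows. This is an illustration and not the actual script from the repository: `run_with_temp_dir` is a hypothetical helper that runs a command inside the training directory with both variables pointing at that directory, and returns the exit code together with the combined output.

```python
import os
import subprocess


def run_with_temp_dir(command, training_dir):
    """Run command in training_dir with TMP and TEMP set to that directory."""
    env = os.environ.copy()
    env["TMP"] = str(training_dir)
    env["TEMP"] = str(training_dir)
    result = subprocess.run(
        command,
        cwd=training_dir,
        env=env,
        capture_output=True,  # keep stdout/stderr so the message can be logged
        text=True,
    )
    return result.returncode, result.stdout + result.stderr
```

Since the error is the same with and without the override, the temporary file in the message is apparently not placed according to TMP or TEMP.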

The full log from my script is on GitHub:
https://github.com/jo-walker/ocr-reader/blob/main/check_validation/check_processing.log#L1314

Expected Behavior

The training should complete successfully, generating the micr.traineddata file in the specified training directory.

Suggested Fix

No response

tesseract -v

tesseract v5.5.0.20241111
leptonica-1.85.0
libgif 5.2.2 : libjpeg 8d (libjpeg-turbo 3.0.4) : libpng 1.6.44 : libtiff 4.7.0 : zlib 1.3.1 : libwebp 1.4.0 : libopenjp2 2.5.2
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Found libarchive 3.7.7 zlib/1.3.1 liblzma/5.6.3 bz2lib/1.0.8 liblz4/1.10.0 libzstd/1.5.6
Found libcurl/8.11.0 Schannel zlib/1.3.1 brotli/1.1.0 zstd/1.5.6 libidn2/2.3.7 libpsl/0.21.5 libssh2/1.11.0

Operating System

Windows 11

Other Operating System

No response

uname -a

No response

Compiler

No response

CPU

No response

Virtualization / Containers

No response

Other Information

No response

@amitdo
Collaborator

amitdo commented Mar 17, 2025

The issue is probably caused by the way Leptonica handles temp files on Windows.

@zdenop
Contributor

zdenop commented Mar 17, 2025

@amitdo : Are you sure Leptonica is used for overwriting tessdata components in langdata models (i.e. no image I/O operation is involved)?

@zdenop
Contributor

zdenop commented Mar 17, 2025

@jo-walker : It appears that the temporary file is not being deleted automatically. This issue requires further investigation. As a workaround, you will need to overwrite one component at a time and delete the temporary file after each step. For example, you should run a sequence of commands like this:

"C:\Program Files\Tesseract-OCR\combine_tessdata.exe" -o micr.traineddata training.unicharset
del micr.traineddata.__tmp__
"C:\Program Files\Tesseract-OCR\combine_tessdata.exe" -o micr.traineddata training.inttemp
del micr.traineddata.__tmp__
"C:\Program Files\Tesseract-OCR\combine_tessdata.exe" -o micr.traineddata training.pffmtable
del micr.traineddata.__tmp__
"C:\Program Files\Tesseract-OCR\combine_tessdata.exe" -o micr.traineddata training.shapetable
del micr.traineddata.__tmp__
"C:\Program Files\Tesseract-OCR\combine_tessdata.exe" -o micr.traineddata training.normproto
del micr.traineddata.__tmp__
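Since the training is driven from a Python script, the same sequence could be scripted. The following is a minimal sketch under assumptions: `overwrite_components` and its `run` parameter are hypothetical names (not part of Tesseract), the executable path and component names are copied from the commands above, and the temporary file is assumed to be created in the working directory.

```python
import subprocess
from pathlib import Path

# Path and component names copied from the commands in this thread.
COMBINE = r"C:\Program Files\Tesseract-OCR\combine_tessdata.exe"
COMPONENTS = [
    "training.unicharset",
    "training.inttemp",
    "training.pffmtable",
    "training.shapetable",
    "training.normproto",
]


def overwrite_components(traineddata, components, workdir=".",
                         run=subprocess.run, combine=COMBINE):
    """Overwrite one component at a time, deleting the temp file after each."""
    temp = Path(workdir) / (traineddata + ".__tmp__")
    for component in components:
        # Raises subprocess.CalledProcessError if the tool exits non-zero.
        run([combine, "-o", traineddata, component], cwd=workdir, check=True)
        # Equivalent of `del micr.traineddata.__tmp__`; a missing file is fine.
        temp.unlink(missing_ok=True)
```

A call such as `overwrite_components("micr.traineddata", COMPONENTS, workdir=training_dir)` would replace the five manual steps. The `run` parameter only exists so the loop can be exercised without the real executable.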

amitdo removed the leptonica label Mar 17, 2025