Analyzing the influence of hyperparameters on the efficiency of OCR model for pre-reform handwritten texts
- Authors: Sherstnev P.A., Kozhin K.D., Pyataeva A.V.
- Affiliation: Artificial Intelligence Center of Siberian Federal University
- Issue: No. 3 (2025)
- Pages: 70–79
- Section: COMPUTER GRAPHICS AND VISUALIZATION
- URL: https://pediatria.orscience.ru/0132-3474/article/view/688124
- DOI: https://doi.org/10.31857/S0132347425030071
- EDN: https://elibrary.ru/GRLAPG
- ID: 688124
Abstract
The article examines how hyperparameters affect the efficiency of optical character recognition models for pre-reform handwritten text, using 19th-century handwritten reports of the governors of the Yenisei province as a case study. It presents a comparative analysis of model configurations with different architectural components, including normalization modules, feature extraction blocks, and predictors. Particular attention is paid to the role of input image resolution and hidden layer size in balancing prediction accuracy against computational cost. The results identify key parameters for developing optical character recognition systems adapted to historical texts with non-standard orthography and complex structure. Directions for further research include evaluating synthetic methods for extending the training data and analyzing alternative architectures such as transformers.

About the authors
P. Sherstnev
Artificial Intelligence Center of Siberian Federal University
Corresponding author.
Email: sherstpasha99@gmail.com
ORCID iD: 0000-0003-2816-9433
Russia, Akademika Kirenskogo 26, k. 1, Krasnoyarsk, 660074
K. Kozhin
Artificial Intelligence Center of Siberian Federal University
Email: kozhin-sfu@yandex.ru
ORCID iD: 0009-0003-4966-2427
Russia, Akademika Kirenskogo 26, k. 1, Krasnoyarsk, 660074
A. Pyataeva
Artificial Intelligence Center of Siberian Federal University
Email: anna4u@list.ru
ORCID iD: 0000-0002-0140-263X
Russia, Akademika Kirenskogo 26, k. 1, Krasnoyarsk, 660074
