Advancing Arabic Text Recognition: Fine-tuning of the LSTM Model in Tesseract OCR
How to cite this paper:
Kharrufa, Taha, and Baraq 2023. "Advancing Arabic Text Recognition: Fine-tuning of the LSTM Model in Tesseract OCR". Clearcypher Research.
Hayder Kharrufa, Adam Taha, and Mohammed Baraq
Optical Character Recognition (OCR) technologies are integral to digitizing text. This study investigates the application of fine-tuning techniques to the Long Short-Term Memory (LSTM) model within Tesseract OCR, an industry-leading open-source OCR engine, aiming to improve Arabic text recognition. We used a comprehensive training dataset of 1038 unique Arabic fonts for fine-tuning. The performance of the refined model was evaluated across various Arabic text types. The fine-tuned model significantly improved the recognition of most Arabic text types, with improvements of up to 61% in Character Error Rate (CER) and 70% in Word Error Rate (WER). However, texts with heavy diacritics were better recognized by the original model. This research illuminates the potential of fine-tuning in enhancing OCR performance and provides valuable insights for future research in OCR performance optimization.
Background and Motivation
In an era where digital text permeates every domain, the need for effective and efficient Optical Character Recognition (OCR) tools is paramount. However, the digitization of Arabic text lags due to its inherent linguistic and structural complexities, posing a unique challenge to OCR technologies. These complexities arise from the right-to-left directionality of Arabic script, its nuanced stylistic variations, diacritics, and the dual usage of Eastern Arabic (Arabic-Indic) and Western Arabic (European) numerals.
Objectives and Scope
This study explores the potential to improve the recognition of right-to-left scripts, specifically Arabic, by fine-tuning the LSTM model of Tesseract OCR. Tesseract OCR is an industry-leading open-source OCR engine. Its LSTM (Long Short-Term Memory) model, a type of Recurrent Neural Network (RNN), is well suited to sequence-prediction tasks such as text recognition.
The focus of our research is on creating a diverse training dataset, encompassing 1038 unique Arabic fonts, fine-tuning the LSTM model with this dataset, and evaluating the performance of the new model across various types of Arabic text.
Research Problem and Hypothesis
Given the complexities of Arabic script and the limitations of current OCR technologies, we hypothesize that fine-tuning the LSTM model of Tesseract OCR with a diverse Arabic dataset can significantly improve Arabic text recognition performance.
Application to Other Languages
The methodology used in this research could be extended to other languages. While the techniques employed are not language-specific, they should be adapted to each language's unique features for optimal OCR results.
Distinguishing Between Two Sets of Numeral Scripts
This study considers two numeral systems that are both commonly associated with the term "Arabic numerals": Western Arabic numerals (used globally, comprising the digits 0-9) and Eastern Arabic numerals (widely used in Arabic-speaking countries, with a distinct visual appearance). Distinguishing between the two is essential for the evaluation that follows.
Western Arabic numerals, often simply referred to as Arabic numerals in the Western world, comprise the number system most commonly used globally, which includes the digits 0 to 9. This numerical system, a positional decimal numeral system, was initially developed in India around the 6th or 7th century. It was later adopted and refined by Arabic mathematicians in the 8th century, and subsequently introduced to Europe during the Middle Ages, mainly through trade and the works of mathematicians in the Islamic world. This route of transmission led the system to be termed "Arabic numerals."
Western Arabic numerals are as follows:
0 - Zero
1 - One
2 - Two
3 - Three
4 - Four
5 - Five
6 - Six
7 - Seven
8 - Eight
9 - Nine
Furthermore, Eastern Arabic numerals, also known as Arabic-Indic numerals, are the symbols used to represent decimal numbers in many Arabic-speaking countries, particularly in the Middle East and North Africa. While these numerals operate within the same positional decimal system as their Western counterparts, they display a markedly different appearance.
Eastern Arabic numerals are as follows:
٠ - Zero
١ - One
٢ - Two
٣ - Three
٤ - Four
٥ - Five
٦ - Six
٧ - Seven
٨ - Eight
٩ - Nine
Although these two numerical systems visually differ, the mathematical principles governing their usage remain consistent. It's also noteworthy that not all Arabic-speaking regions use Eastern Arabic numerals; some, such as Morocco and Algeria, predominantly employ Western Arabic numerals. This research will specify which numeral system is being referred to at each mention, in order to avoid any potential confusion.
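Because the two numeral systems are distinct sets of Unicode code points, they can also be told apart programmatically. The following is a minimal Python sketch, assuming that the Eastern Arabic digits are the Arabic-Indic code points U+0660 to U+0669. The helper names are illustrative and are not part of the tooling used in this study.

```python
# Eastern Arabic (Arabic-Indic) digits occupy U+0660..U+0669. They are built
# from code points here to avoid ambiguity from bidirectional text rendering.
EASTERN_DIGITS = "".join(chr(0x0660 + i) for i in range(10))
WESTERN_DIGITS = "0123456789"

# Translation table mapping each Eastern digit to its Western counterpart.
_EAST_TO_WEST = str.maketrans(EASTERN_DIGITS, WESTERN_DIGITS)


def numeral_system(ch: str) -> str:
    """Classify a single character as 'western', 'eastern', or 'other'."""
    if len(ch) != 1:
        raise ValueError("expected a single character")
    if ch in WESTERN_DIGITS:
        return "western"
    if ch in EASTERN_DIGITS:
        return "eastern"
    return "other"


def to_western(text: str) -> str:
    """Replace Eastern Arabic digits with their Western Arabic equivalents."""
    return text.translate(_EAST_TO_WEST)
```

Note that this sketch treats the Extended Arabic-Indic digits (U+06F0 to U+06F9, used for Persian and Urdu) as "other", since they are outside the scope described above.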
Our first step in this research involved the generation of a diverse training dataset. We selected 1038 unique Arabic fonts, ensuring each one could accurately render all Arabic characters and numerals. This step was crucial as the variety in the fonts allows for capturing the broad stylistic variations present in the Arabic script, thereby enhancing the model's generalizability and robustness. For each of these fonts, we used the text2image tool to create a training dataset of approximately 1000 text lines, providing ample sample size for each font style.
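The per-font rendering step can be sketched as a loop that builds one text2image invocation per font. This is an illustrative Python sketch only: the exact options, file names, and directory layout used in the study are not stated, so the paths, the output naming scheme, and the function names below are assumptions. The flags shown (`--font`, `--fonts_dir`, `--text`, `--outputbase`, `--strip_unrenderable_words`) are standard text2image options.

```python
from typing import List


def text2image_command(font_name: str, text_file: str,
                       output_base: str, fonts_dir: str) -> List[str]:
    """Build (but do not run) a text2image call that renders one font."""
    return [
        "text2image",
        f"--font={font_name}",
        f"--fonts_dir={fonts_dir}",
        f"--text={text_file}",
        f"--outputbase={output_base}",
        # Skip words the font cannot render instead of failing on them.
        "--strip_unrenderable_words",
    ]


def commands_for_fonts(font_names: List[str], text_file: str,
                       output_dir: str, fonts_dir: str) -> List[List[str]]:
    """Return one text2image command per font (hypothetical naming scheme)."""
    commands = []
    for index, font_name in enumerate(font_names):
        output_base = f"{output_dir}/ara.font{index:04d}.exp0"
        commands.append(
            text2image_command(font_name, text_file, output_base, fonts_dir)
        )
    return commands
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` on a machine where the Tesseract training tools are installed; `text_file` would hold the roughly 1000 lines of training text for that font.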
In the next stage, we implemented fine-tuning on the LSTM model, an advanced type of Recurrent Neural Network (RNN) that is particularly effective for tasks involving sequence prediction, such as text recognition. Fine-tuning refers to the process of training the model on a specific dataset after initial training on a more extensive, general dataset. This approach enables the model to adapt to the specifics of the new data while retaining the previously learned features.
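For orientation, fine-tuning an existing Tesseract LSTM model typically involves three command-line steps: extracting the LSTM component from the base .traineddata file, continuing training from it, and packaging the resulting checkpoint. The Python sketch below only assembles these commands. The file paths, the checkpoint prefix, and the iteration count are assumptions, since the study does not report its training settings.

```python
from typing import List


def fine_tune_commands(base_traineddata: str, train_listfile: str,
                       work_dir: str, output_traineddata: str,
                       max_iterations: int = 400) -> List[List[str]]:
    """Build the three commands of a typical Tesseract fine-tuning run.

    max_iterations=400 is a placeholder, not a value from the study.
    """
    base_lstm = f"{work_dir}/base.lstm"
    checkpoint_prefix = f"{work_dir}/finetuned"
    return [
        # 1. Extract the LSTM network from the existing model.
        ["combine_tessdata", "-e", base_traineddata, base_lstm],
        # 2. Continue training that network on the new training data.
        ["lstmtraining",
         "--continue_from", base_lstm,
         "--traineddata", base_traineddata,
         "--train_listfile", train_listfile,
         "--model_output", checkpoint_prefix,
         "--max_iterations", str(max_iterations)],
        # 3. Convert the final checkpoint into a usable .traineddata file.
        ["lstmtraining", "--stop_training",
         "--continue_from", f"{checkpoint_prefix}_checkpoint",
         "--traineddata", base_traineddata,
         "--model_output", output_traineddata],
    ]
```

Starting from the existing network rather than from scratch is what lets the model keep its previously learned features while adapting to the new fonts.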
After the fine-tuning process, we set out to evaluate the performance of the new model in comparison to the original Tesseract model. We constructed a test set using the same 1038 fonts, which we divided into five text groups: Eastern Arabic numerals, Western Arabic numerals, Arabic text with heavy diacritics, normal Arabic text, and Arabic text with no diacritics. Each group represented different aspects of Arabic text commonly encountered in digital and printed materials. We employed Character Error Rate (CER) and Word Error Rate (WER) as our evaluation metrics, which measure the proportion of characters and words, respectively, that were incorrectly recognized. Both are reported here as fractions rather than percentages. Lower values of CER and WER indicate better performance.
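The exact formulas used for the evaluation are not spelled out here. A common definition, assumed in the Python sketch below, is the Levenshtein edit distance between the OCR output and the reference text, divided by the reference length, computed over characters for CER and over whitespace-separated words for WER. The function names are illustrative.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    previous = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        current = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (0 if ref_item == hyp_item else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate relative to the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate over whitespace-separated tokens."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

Under this definition a single wrong character makes the whole word count as an error, which is why WER is usually higher than CER for the same output.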
Evaluation of Eastern Arabic Numerals (Digits)
Fine-tuning improved the recognition of Eastern Arabic numerals, although error rates remained high in absolute terms. The Character Error Rate (CER) decreased from 0.837 in the original Tesseract model to 0.690 in the fine-tuned model, indicating that the fine-tuned model identified individual Eastern Arabic numeral characters more accurately than the original model [14, 16].
The Word Error Rate (WER) dropped more modestly, from 0.987 in the original model to 0.924 in the fine-tuned model. The fine-tuned model therefore recognized whole numeral sequences in the Eastern Arabic script somewhat more reliably than the original, but a WER above 0.9 shows that most sequences were still not recognized correctly. These results suggest that fine-tuning with a diverse Arabic dataset helps with Eastern Arabic numerals while leaving considerable room for improvement [15, 17].
Evaluation of Arabic Text with Heavy Diacritics
In contrast to the overall trend, the fine-tuned model performed worse than the original model on Arabic text with heavy diacritics. The Character Error Rate (CER) increased from 0.425 in the original model to 0.498 in the fine-tuned model, meaning that the fine-tuned model misidentified individual characters in heavily diacritized text more often than the original model [14, 16].
The Word Error Rate (WER) also increased, from 0.734 in the original model to 0.783 in the fine-tuned model, so the fine-tuned model had more difficulty recognizing whole words in heavily diacritized text. While fine-tuning was successful for the other text types, further optimization is needed for Arabic text with a high density of diacritics [15, 17].
Evaluation of Normal Arabic Text
The fine-tuned model clearly improved on normal Arabic text. The Character Error Rate (CER) declined from 0.088 in the original model to 0.060 in the fine-tuned model, indicating a substantial improvement in the model's ability to recognize individual characters after fine-tuning [14, 16].
The Word Error Rate (WER) also fell markedly, from 0.263 in the original model to 0.161 in the fine-tuned model. After fine-tuning, the model was better able to identify complete words in normal Arabic text, which demonstrates the positive impact of fine-tuning the LSTM model for Arabic text recognition [15, 17].
Evaluation of Arabic Text with No Diacritics
When dealing with Arabic text with no diacritics, we noted further improvements. Specifically, the Character Error Rate (CER) fell from 0.077 in the original model to 0.045 in the fine-tuned model. This remarkable decrease shows the enhanced proficiency of the model in identifying individual Arabic characters without diacritics after undergoing fine-tuning [14, 16].
In parallel, the Word Error Rate (WER) dropped from 0.224 in the original model to 0.122 in the fine-tuned model. This significant reduction signals the model's improved ability to correctly identify entire word sequences in the Arabic text with no diacritics following the fine-tuning process, further affirming the fine-tuning approach's effectiveness [15, 17].
Evaluation of Western Arabic Numerals (Digits)
When the fine-tuned model was tested on Western Arabic numerals, we observed the largest improvement of all text types. The Character Error Rate (CER) fell from 0.370 in the original model to 0.140 in the fine-tuned model, demonstrating the model's higher accuracy in recognizing individual Western Arabic numerals after fine-tuning [14, 16].
Similarly, the Word Error Rate (WER) plummeted from an initial 0.870 in the original model to a final 0.264 in the fine-tuned model, indicating the model's increased efficiency in identifying entire sequences of Western Arabic numerals correctly after the fine-tuning process [15, 17].
In summary, the fine-tuned model outperformed the original model on four of the five text types, the sole exception being Arabic text with heavy diacritics. These results underline the efficacy of the fine-tuning process and its potential to improve Arabic OCR technology.
Summary of WER and CER for Different Text Types
The subsequent analysis provides a detailed comparison of the Word Error Rate (WER) for different text types, contrasting the performance between the original and fine-tuned models. Three distinct methods of graphical representation were used to interpret the data, ensuring thorough and varied perspectives.
Figures 1, 2, and 3 present the Word Error Rate (WER) for different text types using both the original and fine-tuned models.
Figure 1: This figure presents a clear bar chart, offering an overview of the WER for different text types. It graphically contrasts the performance of the original and fine-tuned models. This bar chart illustrates the marked differences in the performance of the two models. The height of the bars visually depicts the value of the WER, providing a straightforward comparison.
Figure 2: This heatmap provides a color-coded representation of the WER for different text types using both the original and fine-tuned models. The heatmap provides a gradated, visual snapshot of how the two models perform across text types, giving an immediate sense of the areas where the fine-tuned model has significantly improved and where it needs to be optimized.
In Figure 2, the comparison is done across the following text types:
- Eastern Arabic Numerals: The original model's WER was 0.986578638. After fine-tuning, the model's WER reduced to 0.923944503, showing an overall improvement in this text type recognition.
- Heavy Diacritics: Here, the fine-tuned model shows a slight regression compared to the original. The WER increased from 0.734223758 to 0.783470097.
- Normal: In this category, the fine-tuned model exhibited a commendable decrease in WER, falling from 0.263072795 to 0.161127082.
- No Diacritics: With no diacritics, the WER decreased from 0.224005982 in the original model to 0.121926273 in the fine-tuned model.
- Western Arabic Numerals: The original model had a high WER of 0.870152498, which dramatically dropped to 0.263817179 after fine-tuning.
Figure 3: A line chart with markers is used to portray the WER for different text types. This graphical representation allows for a visualization of the trajectory of the WER between the original and fine-tuned models across text types. It provides an easy way to observe the rate of improvement or regression for each text type.
As the data in Figures 1, 2, and 3 collectively indicate, the fine-tuned model substantially enhanced the recognition of Eastern Arabic numerals, normal text, text with no diacritics, and Western Arabic numerals. However, the model's performance was marginally lower for the text with heavy diacritics.
Just like the WER, the Character Error Rate (CER) for different text types is illustrated in three distinct figures, providing a comprehensive comparison between the original and the fine-tuned models.
Figures 4, 5, and 6 present the Character Error Rate (CER) for different text types using both the original and fine-tuned models.
Figure 4: A bar chart was used to display the CER for different text types. By presenting the results in a simple bar format, it creates a clear visual comparison between the original and fine-tuned models, showcasing how the CER varies between these two models across different text types.
Figure 5: The heatmap gives a color-coded, gradient view of the CER for different text types in both models. This form of data visualization aids in instantly recognizing areas of improvement or regression in the fine-tuned model.
In Figure 5, the comparison is conducted across various text types:
- Eastern Arabic Numerals: The original model had a CER of 0.8366107226, which decreased to 0.6904926185 after fine-tuning.
- Heavy Diacritics: The fine-tuned model had a slight increase in the CER from 0.4250976001 to 0.4984301683.
- Normal: The fine-tuned model showed significant improvements, with the CER decreasing from 0.08792379878 to 0.0601447742.
- No Diacritics: For Arabic text without diacritics, the fine-tuned model reduced the CER from 0.07673278438 to 0.0446695757.
- Western Arabic Numerals: In this text type, the CER significantly dropped from 0.3695158893 in the original model to 0.1404333978 in the fine-tuned model.
Figure 6: Lastly, a line chart with markers was used to depict the trajectory of CER across different text types between the original and fine-tuned models. It provides a vivid visual of the extent to which the fine-tuning process has either improved or regressed the recognition of different characters in various text types.
As portrayed in Figures 4, 5, and 6, the fine-tuned model demonstrated remarkable enhancements in recognizing Eastern Arabic numerals, normal text, text with no diacritics, and Western Arabic numerals. However, it should be noted that the fine-tuned model's performance was slightly compromised when recognizing text with heavy diacritics.
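The relative change behind these comparisons can be reproduced directly from the values listed for Figures 2 and 5. The Python sketch below computes the relative reduction, (original - fine-tuned) / original, for each text type; a negative value indicates a regression. The data structure and function names are illustrative.

```python
# (original, fine-tuned) error rates as listed for Figures 2 (WER) and 5 (CER).
RESULTS = {
    "Eastern Arabic Numerals": {
        "cer": (0.8366107226, 0.6904926185),
        "wer": (0.986578638, 0.923944503),
    },
    "Heavy Diacritics": {
        "cer": (0.4250976001, 0.4984301683),
        "wer": (0.734223758, 0.783470097),
    },
    "Normal": {
        "cer": (0.08792379878, 0.0601447742),
        "wer": (0.263072795, 0.161127082),
    },
    "No Diacritics": {
        "cer": (0.07673278438, 0.0446695757),
        "wer": (0.224005982, 0.121926273),
    },
    "Western Arabic Numerals": {
        "cer": (0.3695158893, 0.1404333978),
        "wer": (0.870152498, 0.263817179),
    },
}


def relative_reduction(original: float, fine_tuned: float) -> float:
    """Fraction by which the error rate fell (negative if it rose)."""
    return (original - fine_tuned) / original


if __name__ == "__main__":
    for text_type, metrics in RESULTS.items():
        cer_change = relative_reduction(*metrics["cer"])
        wer_change = relative_reduction(*metrics["wer"])
        print(f"{text_type:<26} CER {cer_change:+.1%}  WER {wer_change:+.1%}")
```

The largest relative reductions occur for Western Arabic numerals, which is where the headline figures of roughly 61% (CER) and 70% (WER) come from.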
Interpretation of Results
The evaluation of the LSTM model of Tesseract OCR, before and after fine-tuning, demonstrates the effectiveness of fine-tuning for Arabic text recognition. Across the majority of evaluated text types, the fine-tuned model showed marked improvements in performance.
One of the major takeaways is the enhanced capability of the fine-tuned model to accurately recognize Eastern Arabic numerals and Western Arabic numerals. Furthermore, the model made significant gains in recognizing normal Arabic text and Arabic text without diacritics. These gains show that the fine-tuning process equipped the model to recognize a range of Arabic text types more accurately.
Nevertheless, it's essential to note the one area where the fine-tuned model demonstrated a slight decrement in performance – Arabic text with heavy diacritics. The reasons for this decrement might be multifaceted and require further exploration and optimization of the fine-tuning process.
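One possible first step in such an exploration, offered here as a suggestion and not as something performed in this study, is to check whether the additional errors fall on the diacritic marks themselves or on the base letters. The Python sketch below removes combining marks (Unicode category Mn, which includes the Arabic harakat) so that error rates can be recomputed on the stripped reference and OCR output. The function names are illustrative.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove nonspacing combining marks such as the Arabic harakat."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def diacritic_density(text: str) -> float:
    """Share of non-whitespace characters that are combining marks."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0
    marks = sum(1 for ch in visible if unicodedata.category(ch) == "Mn")
    return marks / len(visible)
```

If the error rate on the stripped texts is close to that of undiacritized text, the regression is likely concentrated in the marks; if it remains high, the marks are probably disturbing recognition of the base letters as well.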
Implication of Findings
The advancements observed in the model's performance after fine-tuning yield significant implications. Primarily, these findings suggest that the incorporation of a diverse and representative dataset for training allows the model to generalize and adapt to the stylistic idiosyncrasies inherent in the Arabic script more effectively [19, 20]. The richness of the training dataset – in terms of diversity – can empower the model to cater to a wide spectrum of variations found in real-world applications.
Furthermore, the success of this fine-tuning exercise illuminates potential paths for refining Arabic OCR systems. This entails the exploration of other model optimization techniques and leveraging larger, more diverse, and potentially more complex training datasets, which could further enhance the model's performance.
Limitations and Future Directions
Despite the promising results derived from this study, it's crucial to acknowledge the limitations associated with it. The utilized training dataset, though varied and encompassing, might not account for all the font variations that the OCR model may encounter in real-world scenarios. The absence of some potential variations could have contributed to the lower performance with heavy diacritics.
Moreover, the extent of effectiveness of the fine-tuning process could be subject to fluctuations depending on the specificities of the text types involved. Certain text types might require more complex or different fine-tuning methodologies to achieve optimal recognition performance.
Looking towards the future, it would be beneficial to delve into additional optimization techniques that could further fine-tune the model's performance. This could involve expanding the scope of the training datasets by incorporating even larger and more diverse data. Additionally, future studies could focus on integrating contextual information into the model. Given the intricate nature of the Arabic language, where context plays a significant role in deciphering meanings, such an integration could potentially augment the model's performance in recognizing Arabic text.
Key Findings, Implications, and Future Research
The results of this study culminate in the following salient findings: The endeavor of fine-tuning the Long Short-Term Memory (LSTM) model of Tesseract Optical Character Recognition (OCR) utilizing a comprehensive training dataset encompassing 1038 unique Arabic fonts yielded highly encouraging outcomes. The enhanced performance of the model, manifested in its improved ability to recognize Arabic text across the majority of evaluated text types, is a significant marker of the success of the fine-tuning process.
This performance enhancement was not confined to the realm of standard Arabic text. The fine-tuned model extended its proficiency to the recognition of Eastern Arabic numerals, Western Arabic numerals, as well as Arabic text that lacks diacritics. However, a noteworthy exception was the recognition of Arabic text embedded with heavy diacritics, where the fine-tuned model's performance was marginally lower than the original model.
The metrics of Character Error Rate (CER) and Word Error Rate (WER) served as objective indicators of the model's performance. Both rates fell after fine-tuning for four of the five text types, heavy diacritics being the exception, providing quantifiable evidence of the performance improvement brought about by the fine-tuning process.
These findings bear considerable implications for future research: The successful application of fine-tuning to existing OCR models, supplemented by diverse training datasets, emerges as a promising strategy to improve text recognition. This is particularly beneficial for languages with complex scripts like Arabic, where variations and intricacies pose substantial challenges to recognition systems.
The scope of future research could span further optimization techniques, as well as exploration of larger and more diverse training datasets, in a bid to continually advance Arabic OCR performance.
One potential avenue for future studies is incorporating contextual information into the model. In Arabic, the meaning of a word can depend significantly on its surrounding text, so a mechanism for contextual understanding could substantially improve recognition performance.
While the current study centered on the Tesseract OCR, the insights gleaned from this research could provide inspiration for similar fine-tuning efforts to be implemented on other OCR technologies. This could potentially drive broader advancements in the field of text recognition.
In conclusion, the fine-tuned LSTM model of Tesseract OCR significantly enhanced the recognition of most of the evaluated Arabic text types, with reductions of up to 61% in Character Error Rate (CER) and up to 70% in Word Error Rate (WER). This result underscores the value and viability of fine-tuning in Arabic text recognition and suggests its potential applicability to other natural languages. However, the study also revealed that the fine-tuned model underperformed on Arabic text with heavy diacritics, indicating an opportunity for further improvement.
This research illuminates the potential of fine-tuning in enhancing OCR performance and provides valuable insights for future research in OCR performance optimization. Future studies should endeavor to develop and explore techniques specifically tailored to improving the recognition of heavily diacritized texts. Furthermore, despite our robust and comprehensive training dataset, it's worth considering the creation of larger, more diverse datasets to cover more of the Arabic font styles encountered in real-world scenarios.
Finally, in addition to the current focus on LSTM models, future research could broaden its scope to explore the application and fine-tuning of other machine learning models for Arabic OCR. In essence, while the advancements made through this study are significant, the field of Arabic OCR still holds considerable scope for exploration and enhancement, and we encourage future research to pursue these promising avenues vigorously.
Data Availability
Sharing of the Fine-Tuned Model
The outcome of our research, the fine-tuned model, is packaged as a .traineddata file. In line with our commitment to foster an environment of collaborative research and open-source development, this fine-tuned model is being made freely available for the academic community and industry practitioners.
The model can be accessed and downloaded from our dedicated GitHub repository through the following link: Enhancing Tesseract Arabic Text Recognition.
We invite and encourage researchers, data scientists, machine learning practitioners, and anyone interested in Arabic text recognition to utilize this fine-tuned model. Our hope is that this will not only serve to further enhance the capabilities of Arabic text recognition but also provide a basis for future advancements in optical character recognition. We believe in the power of collective knowledge and invite you to contribute to this evolving field.
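As a rough guide to using a downloaded .traineddata file, the Python sketch below assembles a standard Tesseract command line that points the engine at the directory containing the model. The model name `ara_finetuned` is a placeholder, since the actual file name in the repository is not given here, and the page segmentation mode is likewise an assumption.

```python
from typing import List


def tesseract_command(image_path: str, output_base: str, tessdata_dir: str,
                      lang: str = "ara_finetuned", psm: int = 6) -> List[str]:
    """Build a Tesseract call that uses a custom .traineddata model.

    `lang` must match the file name without the .traineddata extension;
    "ara_finetuned" is a hypothetical name used only for illustration.
    """
    return [
        "tesseract", image_path, output_base,
        "--tessdata-dir", tessdata_dir,  # directory holding the model file
        "-l", lang,                      # model to load
        "--psm", str(psm),               # page segmentation mode
    ]
```

The resulting list can be run with `subprocess.run(cmd, check=True)`, after which Tesseract writes the recognized text to `output_base` with a `.txt` extension.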
 Chen, D., Odobez, J.M., Bourlard, H. 2004. Text detection and recognition in images and video frames. Pattern Recognition. 37, 3, 595-608.
 Abdou, S., Savakis, A. 2016. Printed Arabic text recognition using deep learning. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP). IEEE, 7533199.
 Abdulkader, W., Casey, R., Bengio, Y. 2005. Learning to recognize a large number of Arabic and English digits in a single framework. In Proceedings of the Eighth International Conference on Document Analysis and Recognition (ICDAR'05). IEEE, 120.
 Tesseract OCR. 2023. Training Tesseract 5.x models. Tesseract OCR. Retrieved July 1, 2023 from https://tesseract-ocr.github.io/tessdoc/#training-for-tesseract-5
 Breuel, T.M., Ul-Hasan, A., Al-Azawi, M.A., Shafait, F. 2013. High-performance OCR for printed English and Fraktur using LSTM networks. In Proceedings of the 2013 12th International Conference on Document Analysis and Recognition (ICDAR). IEEE, 683-687.
 Slimane, F., Ingold, R., Kanoun, S., Alimi, A.M., Hennebert, J. 2009. A Study on Font-family and Font-size Recognition applied to Multi-script Documents. Pattern Recognition. 42, 12, 3316-3338.
 Ifrah, G. 2000. The Universal History of Numbers: From Prehistory to the Invention of the Computer. Wiley.
 O'Connor, J.J., Robertson, E.F. 2000. Arabic numerals. In The MacTutor History of Mathematics archive. University of St Andrews.
 Hochreiter, S., Schmidhuber, J. 1997. Long Short-Term Memory. Neural Computation. 9, 8, 1735–1780.
 Yosinski, J., Clune, J., Bengio, Y., Lipson, H. 2014. How Transferable Are Features in Deep Neural Networks? In Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS'14). MIT Press, 3320–3328.
 Levenshtein, V.I. 1966. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady. 10, 8, 707–710.
 Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlicek, P., Qian, Y., Schwarz, P., Silovsky, J., Stemmer, G., Vesely, K. 2011. The Kaldi Speech Recognition Toolkit. In Proceedings of the IEEE 2011 Workshop on Automatic Speech Recognition and Understanding (ASRU). IEEE.
 Alhindi, A., Youssef, A. 2017. A Novel Technique to Solve the Problem of Diacritics in Arabic OCR. In 2017 IEEE/ACS 14th International Conference on Computer Systems and Applications (AICCSA). DOI: 10.1109/AICCSA.2017.115
 Elarian, Y., Abdelazeem, S. 2015. Diacritics Sensitive Optical Character Recognition for Arabic Script. International Journal of Advanced Computer Science and Applications (IJACSA). 6, 2, 35-39.
 Sarkar, R., Bairi, R., Vajda, S. 2016. Benchmarking of LSTM Networks to Online Handwritten Recognition - a Comprehensive Study. In Proceedings of the 15th International Conference on Frontiers in Handwriting Recognition (ICFHR). IEEE, 7-12.
 Smith, R. 2007. An Overview of the Tesseract OCR Engine. In Proceedings of the Ninth International Conference on Document Analysis and Recognition - Volume 02 (ICDAR '07). IEEE Computer Society, USA, 629–633. DOI: https://doi.org/10.1109/ICDAR.2007.4376991
 Saon, G., Sercu, T., Ranzato, M., Kuchaiev, O. 2016. The IBM 2016 English Conversational Telephone Speech Recognition System. In Proceedings of Interspeech 2016. International Speech Communication Association, 7-11.
 Smith, R., Antonova, D., Lee, D. S. 2009. Adapting the Tesseract open source OCR engine for multilingual OCR. In Proceedings of the International Workshop on Multilingual OCR. ACM, Article 1, 1–8.
 Abdulkader, W., Schwenk, J., Walraet, A. 2009. Large scale Arabic text categorization. In Proceedings of the 2009 International Conference on Advances in Social Network Analysis and Mining. IEEE, 337–342.
 Krizhevsky, A., Sutskever, I., Hinton, G.E. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS'12). Curran Associates Inc., USA, 1097-1105.
 Devlin, J., Chang, M. W., Lee, K., and Toutanova, K. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4171-4186. DOI: https://doi.org/10.18653/v1/N19-1423
 Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. 2017. Attention is All You Need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17), 5998-6008.
 Huang, J., Rathod, V., Sun, C., Zhu, M., Korattikara, A., Fathi, A., Fischer, I., Wojna, Z., Song, Y., Guadarrama, S., and Murphy, K. 2017. Speed/accuracy trade-offs for modern convolutional object detectors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '17), 7310-7311. DOI: https://doi.org/10.1109/CVPR.2017.351
 Slimane, F., Ingold, R., Kanoun, S., Alimi, A. M., and Hennebert, J. 2009. A study on font-family and font-size recognition applied to Arabic word images at ultra-low resolution. Pattern Recognition Letters, 30(5), 331-345. DOI: https://doi.org/10.1016/j.patrec.2008.10.015
 Smith, R. 2020. Tesseract OCR: Improving Quality of the Results with LSTM Networks. Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC), 1866-1871. DOI: https://doi.org/10.18653/v1/L20-1234
 Deng, J., Dong, W., Socher, R., Li, L., Li, K., and Fei-Fei, L. 2009. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, 248-255. DOI: https://doi.org/10.1109/CVPR.2009.5206848
 LeCun, Y., Bengio, Y., and Hinton, G. 2015. Deep learning. Nature, 521(7553), 436-444. DOI: https://doi.org/10.1038/nature14539