Abstract
This paper presents a method for retrieving words from Kannada document images, based on Histogram of Oriented Gradients (HOG) features and morphological filters. A large dataset of 50000 words is created from 250 document pages belonging to different categories. Each preprocessed document image is segmented into words using simple morphological filters. For every word image, the histogram channels are computed over rectangular cells (i.e., R-HOG) to capture its gradients. In parallel, morphological erosion, opening, top-hat and bottom-hat transformations are applied to each word, and the densities of the resulting images are estimated. The HOG and morphological features are then fused. The cosine distance is used to measure the similarity between a query word and a candidate word, and the relevance of each candidate is estimated by ranking the candidates by distance. Correctly matched words are then selected at a threshold of 98%. The experimental results confirm the efficiency of the proposed method, with an average precision of 91.23%, an average recall of 84.78% and an average F-measure of 89.47%.
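The matching stage described above (feature fusion, cosine distance, distance ranking and thresholding) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fusion by plain concatenation, the toy feature values, and the reading of the 98% threshold as a minimum cosine similarity of 0.98 are all assumptions made for illustration. The HOG and morphological density vectors are taken as already computed.

```python
import math
from typing import Dict, List, Sequence, Tuple


def fuse(hog: Sequence[float], morph: Sequence[float]) -> List[float]:
    """Fuse two descriptors by concatenation (assumed fusion scheme)."""
    return list(hog) + list(morph)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cos(angle between a and b); values near 0 mean similar."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an all-zero vector as dissimilar to everything
    return 1.0 - dot / (norm_a * norm_b)


def rank_candidates(
    query: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Sort candidate words by increasing cosine distance to the query."""
    scored = [(name, cosine_distance(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1])


def select_matches(
    ranked: List[Tuple[str, float]], min_similarity: float = 0.98
) -> List[str]:
    """Keep candidates whose cosine similarity reaches the threshold
    (assumed interpretation of the 98% threshold)."""
    return [name for name, dist in ranked if 1.0 - dist >= min_similarity]


if __name__ == "__main__":
    # Toy feature values, purely hypothetical.
    query = fuse([0.2, 0.8, 0.1], [0.5, 0.3])
    candidates = {
        "word_a": fuse([0.2, 0.8, 0.1], [0.5, 0.3]),
        "word_b": fuse([0.8, 0.1, 0.6], [0.1, 0.9]),
        "word_c": fuse([0.21, 0.79, 0.12], [0.48, 0.31]),
    }
    ranked = rank_candidates(query, candidates)
    print(ranked)
    print(select_matches(ranked))
```

Ranking by distance before thresholding keeps the two decisions separate: the rank order expresses relevance, while the threshold decides which of the ranked words are reported as matches.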
© 2017 Springer Nature Singapore Pte Ltd.
Cite this paper
Hangarge, M., Veershetty, C., Rajmohan, P., Mukarambi, G. (2017). Word Retrieval from Kannada Document Images Using HOG and Morphological Features. In: Santosh, K., Hangarge, M., Bevilacqua, V., Negi, A. (eds) Recent Trends in Image Processing and Pattern Recognition. RTIP2R 2016. Communications in Computer and Information Science, vol 709. Springer, Singapore. https://doi.org/10.1007/978-981-10-4859-3_7
DOI: https://doi.org/10.1007/978-981-10-4859-3_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-4858-6
Online ISBN: 978-981-10-4859-3
eBook Packages: Computer Science (R0)