Document layout analysis is our second exercise. Using the three images above, our program needs to draw bounding boxes around the following:
- Individual characters
- Individual words
- Lines
- Paragraphs
- Paragraphs with their margins
I used a bottom-up approach for this exercise: I started by detecting and boxing the letters, then moved up to words, lines, paragraphs, and finally paragraphs with margins. I wrote one function for each objective and found an appropriate kernel size for each function by trial and error. Every objective follows the same simple steps:
- Load the images.
- Assign the output images.
- Convert the images to grayscale.
- Clean the images with Otsu's thresholding method, using the inverted binarized image.
- Assign the kernel size (one or two kernels, depending on the objective).
- Apply morphological operations (dilation, erosion, closing, and opening).
- Find the contours.
- Box the contours. (I added an offset for the word and letter objectives because the morphological operations I used shift the region that the rectangle bounds.)
- Write the image to a file.
While doing the exercise, I had doubts about my algorithm because it seemed very simple and not very dynamic, so at first I only finished boxing letters, words, and lines for the first image (with very messy code). When I heard that my classmates had done the same, I went on to code the objectives I was still missing.
The big limitations of my program are large fonts and colored images. For example, the word function detects each letter of a heading as a word because of the letter's size. (Remember that I use a hardcoded kernel size for each function.) I can only guarantee high detection accuracy for the example images.
Here are the resulting images for example 1:
![]()
Letters

![]()
Words

![]()
Lines

![]()
Paragraphs

![]()
Paragraphs with Margin