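Original image:

Goal:

I want to split the text into separate paragraphs by placing a bounding box around each one (as shown above).

I tried to do this with traditional computer vision methods using OpenCV.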

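First, I drew character-level bounding boxes.
Next, I converted the image to grayscale and binarized it.
Then I applied dilation.
Finally, I placed bounding boxes on the dilated image.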

This is what I got:

import cv2
import numpy as np

# Morphological transformation
# im_bw is the binarized image from the previous step
kernel = np.ones((3, 4), np.uint8)
dilation = cv2.dilate(im_bw, kernel)
cv2.imwrite('dilated.png', dilation)

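# Draw bounding rectangles
# Find contours on the dilated image so that nearby characters are grouped
ret, thresh = cv2.threshold(dilation, 127, 255, cv2.THRESH_BINARY)
# [-2:] works with both OpenCV 3.x (three return values) and 4.x (two)
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2:]

for c in contours:
    rect = cv2.boundingRect(c)
    # Skip boxes narrower or shorter than 50 px
    if rect[2] < 50 or rect[3] < 50:
        continue

    print(cv2.contourArea(c))
    x, y, w, h = rect
    cv2.rectangle(im_new, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite('sample_res_inner.jpg', im_new)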

Because the images are scans and the spacing between lines is very small, I cannot segment the text into paragraphs this way.
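
To make the spacing problem concrete, here is a rough diagnostic sketch that measures the blank gaps between text rows. The function name `row_gaps` and the synthetic page are hypothetical, and it assumes a binary image with white text on a black background; it is not part of my pipeline above.

```python
import numpy as np


def row_gaps(bw):
    """Return the heights of the blank row runs between text rows.

    bw is a 2-D array in which text pixels are nonzero.
    """
    has_ink = (bw > 0).any(axis=1)
    gaps, run, seen_text = [], 0, False
    for ink in has_ink:
        if ink:
            # A blank run that ends at a text row is a gap between lines.
            if seen_text and run:
                gaps.append(run)
            run, seen_text = 0, True
        elif seen_text:
            run += 1
    return gaps


# Synthetic page: four 10 px text lines; the gap between lines is 4 px and
# the gap between the two "paragraphs" is 6 px.
bw = np.zeros((60, 100), np.uint8)
for top in (5, 19, 35, 49):
    bw[top:top + 10, 10:90] = 255
gaps = row_gaps(bw)
```

When the paragraph gap is only slightly larger than the line gap, as in this synthetic case, a kernel tall enough to merge the lines of one paragraph will easily merge neighbouring paragraphs too.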

How can I get the result I want?

How to detect text blocks in a scanned document image
