[Advanced OpenCV] (8) Camera Operations: Recognizing Document Content

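Table of Contents

Camera Operations
- 1. Opening the camera
- 2. Preprocessing the frame
- 3. Contour detection
- 4. Contour approximation
- 5. Perspective transform
  - 5.1 Defining order_point(pts)
  - 5.2 Defining four_point_transform(image, pts)
  - 5.3 Applying the code
- 6. Closing the image windows
- 7. Complete code
Summary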

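Camera Operations

This post shows how to use a camera feed to detect a document and extract it.

Approach:

- Open the camera.
- Find and draw the contours in each captured frame.
- Pick out the contour that belongs to the document. Documents are usually rectangular, so we approximate each contour with a polygon and keep one that can be described by four points.
- Once the document is found, apply a perspective transform to show it on its own as a flat, upright image.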

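1. Opening the camera

Call **cv2.VideoCapture()** with a device index. Index 0 opens the default camera (usually the computer's built-in one); index 1 typically opens a second camera, such as an external one.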

cap = cv2.VideoCapture(0) # open the camera
if not cap.isOpened(): # failed to open
    print("Cannot open camera")
    exit()

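2. Preprocessing the frame

Read a frame with **cap.read()**, convert it to grayscale, apply a Gaussian blur to suppress noise, and then run Canny edge detection: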

def cv_show(name,img):
    cv2.imshow(name,img)
    cv2.waitKey(60)
    
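while True:
    flag = 0 # marks whether a document was detected in this frame
    ret,image = cap.read() # ret is True when a frame was read successfully
    if not ret: # check before touching image, which is None on failure
        print("Cannot read from camera")
        break
    orig = image.copy() # unannotated copy for the perspective transform
    cv_show("image",image)

    gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY) # convert to grayscale
    # preprocessing
    gray = cv2.GaussianBlur(gray,(5,5),0) # Gaussian blur
    edged = cv2.Canny(gray,75,200) # Canny edge detection
    cv_show('1',edged)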

Note: every step below runs inside the main while True loop.

3. Contour detection

Find the contours with cv2.findContours(), keep the three largest by area, and draw them on the frame:

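# contour detection
# [-2] selects the contour list in both OpenCV 3.x (three return values) and 4.x (two return values)
cnts = cv2.findContours(edged.copy(),cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE)[-2]
cnts = sorted(cnts,key=cv2.contourArea,reverse=True)[:3] # keep the 3 largest contours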
image_contours = cv2.drawContours(image,cnts,-1,(0,255,0),2)
cv_show("image_contours",image_contours)

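4. Contour approximation

Loop over the detected contours and approximate each one with **cv2.approxPolyDP()**. Because a document is rectangular, a contour whose approximation has exactly four points (and a large enough area) is taken to be the document: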

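# loop over the contours
for c in cnts:
    # compute the contour approximation
    peri = cv2.arcLength(c,True) # perimeter of the contour

    approx = cv2.approxPolyDP(c,0.05 * peri,True) # contour approximation
    # c is the input point set
    # epsilon is the maximum distance allowed between the original contour and its approximation; it controls the accuracy
    # True means the curve is closed
    area = cv2.contourArea(approx)

    # keep the contour when it has 4 points and is large enough
    if area > 20000 and len(approx) == 4:
        screenCnt = approx
        flag = 1
        print(peri,area)
        print('Document detected')
        break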

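5. Perspective transform

First, write the perspective transform as functions so it is easy to call (locate the four corner points, then map them onto a rectangle):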

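5.1 Defining order_point(pts):

This function arranges four given points (usually contour points or corners detected in an image) in a fixed order: top-left (tl), top-right (tr), bottom-right (br), and bottom-left (bl).

Process:

First, compute the coordinate sum s = x + y of each point. np.argmin(s) picks the point with the smallest sum, which is the top-left corner, and **np.argmax(s)** picks the point with the largest sum, which is the bottom-right corner.
Then, compute diff = y - x for each point (np.diff(pts, axis=1) subtracts the x column from the y column). np.argmin(diff) picks the top-right corner (large x, small y), and np.argmax(diff) picks the bottom-left corner (small x, large y).
Output: rect is a NumPy array of shape (4, 2) holding the four points in the order top-left, top-right, bottom-right, bottom-left.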

def order_point(pts):
    rect = np.zeros((4,2),dtype="float32")
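    s = pts.sum(axis=1) # x + y for each point
    rect[0] = pts[np.argmin(s)] # top-left: smallest x + y
    rect[2] = pts[np.argmax(s)] # bottom-right: largest x + y
    diff = np.diff(pts,axis=1) # y - x for each point
    rect[1] = pts[np.argmin(diff)] # top-right: smallest y - x
    rect[3] = pts[np.argmax(diff)] # bottom-left: largest y - x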
    return rect
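A short worked example may help here. The four input points below are made up and deliberately given out of order; the function is the one defined above, repeated so the snippet runs on its own.

```python
import numpy as np

def order_point(pts):
    rect = np.zeros((4, 2), dtype="float32")
    s = pts.sum(axis=1)                 # x + y
    rect[0] = pts[np.argmin(s)]         # top-left
    rect[2] = pts[np.argmax(s)]         # bottom-right
    diff = np.diff(pts, axis=1)         # y - x
    rect[1] = pts[np.argmin(diff)]      # top-right
    rect[3] = pts[np.argmax(diff)]      # bottom-left
    return rect

# Illustrative corners in arbitrary order: br, tl, bl, tr.
pts = np.array([[300, 260], [60, 40], [40, 220], [340, 80]], dtype="float32")
rect = order_point(pts)
print(rect)
# sums  x + y: 560, 100, 260, 420  -> tl = (60, 40),  br = (300, 260)
# diffs y - x: -40, -20, 180, -260 -> tr = (340, 80), bl = (40, 220)
```

The sum and difference heuristic assumes the document is roughly upright in the frame; for a quadrilateral rotated close to 45 degrees the sums or differences of two corners can tie, and the ordering may then be wrong.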

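5.2 Defining four_point_transform(image, pts):

This function takes the points ordered by order_point and applies a perspective transform to the input image so that those four points map onto a rectangle.

Input: image is the image to transform, and pts holds the coordinates of the four points detected in it.
Process:

First, call order_point(pts) to get the four points in a fixed order (tl, tr, br, bl).
Then, compute the width and height of the quadrilateral they form, taking the longer of each pair of opposite sides so the output image is large enough to hold the whole region.
Next, define a target rectangle dst whose corners sit at (0,0), (maxwidth-1,0), (maxwidth-1,maxheight-1), and (0,maxheight-1) in the output image.
Use OpenCV's **cv2.getPerspectiveTransform(rect, dst)** to compute the perspective transform matrix M.
Finally, apply **cv2.warpPerspective(image, M, (maxwidth, maxheight))** to the input image to get the transformed image.

Output: warped is the image after the perspective transform.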

def four_point_transform(image,pts):
    # order the input points
    rect = order_point(pts)
    (tl,tr,br,bl) = rect
    # compute the output width and height from the side lengths
    widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
    widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
    maxwidth = max(int(widthA),int(widthB))
    heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
    heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
    maxheight = max(int(heightA), int(heightB))
    # corner positions in the output image
    dst = np.array([[0,0],[maxwidth - 1,0],
                    [maxwidth - 1,maxheight - 1],[0,maxheight - 1]],dtype="float32")

    M = cv2.getPerspectiveTransform(rect,dst)
    warped = cv2.warpPerspective(image,M,(maxwidth,maxheight))
    # return the transformed image
    return warped

5.3 Applying the code

After the perspective transform, binarize the image so the content shows up more clearly:

if flag == 1:
    # show the detected document contour
    image_contours = cv2.drawContours(image,[screenCnt],0,(0,255,0),2)
    cv_show("image",image_contours)
    # perspective transform
    warped = four_point_transform(orig,screenCnt.reshape(4,2))
    cv_show("warped",warped)
    # binarization (frames from cv2.VideoCapture are BGR, so convert with COLOR_BGR2GRAY)
    warped = cv2.cvtColor(warped,cv2.COLOR_BGR2GRAY)
    ref = cv2.threshold(warped,220,255,cv2.THRESH_BINARY)[1]
    # ref = cv2.threshold(warped, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
    cv_show("ref",ref)

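6. Closing the image windows

When capturing is finished, release the camera with **cap.release()** and close the display windows. Both calls go outside the main loop: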

cap.release() # release the capture device
cv2.destroyAllWindows() # close the image windows
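As written, the main loop only ends when a frame cannot be read, so the cleanup calls are rarely reached. One possible pattern, sketched here with hypothetical names (run_capture_loop, should_quit, and a FakeCapture class that stands in for cv2.VideoCapture so the snippet runs without a camera), is to pass in a quit condition and release the device in a finally block.

```python
class FakeCapture:
    """Stand-in for cv2.VideoCapture so the sketch runs without a camera."""
    def __init__(self, frames):
        self.frames = list(frames)
        self.released = False

    def read(self):
        # Mimics cap.read(): (True, frame) while frames remain, else (False, None).
        if self.frames:
            return True, self.frames.pop(0)
        return False, None

    def release(self):
        self.released = True


def run_capture_loop(cap, handle_frame, should_quit):
    """Read frames until reading fails or should_quit() is true, then release."""
    try:
        while True:
            ret, frame = cap.read()
            if not ret:
                break
            handle_frame(frame)
            if should_quit():
                break
    finally:
        cap.release()  # runs even if handle_frame raises


seen = []
cap = FakeCapture(["frame0", "frame1", "frame2"])
run_capture_loop(cap, seen.append, should_quit=lambda: len(seen) >= 2)
print(seen, cap.released)
```

With a real camera, cap would be cv2.VideoCapture(0) and should_quit could be something like lambda: cv2.waitKey(1) & 0xFF == ord('q'), followed by cv2.destroyAllWindows() after the loop returns.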

7. Complete code

import cv2
import numpy as np

def cv_show(name,img):
    cv2.imshow(name,img)
    cv2.waitKey(60)

def order_point(pts):
    rect = np.zeros((4,2),dtype="float32")
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]
    rect[2] = pts[np.argmax(s)]
    diff = np.diff(pts,axis=1)
    rect[1] = pts[np.argmin(diff)]
    rect[3] = pts[np.argmax(diff)]
    return rect

def four_point_transform(image,pts):
    # order the input points
    rect = order_point(pts)
    (tl,tr,br,bl) = rect
    # compute the output width and height from the side lengths
    widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
    widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
    maxwidth = max(int(widthA),int(widthB))
    heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
    heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
    maxheight = max(int(heightA), int(heightB))
    # corner positions in the output image
    dst = np.array([[0,0],[maxwidth - 1,0],
                    [maxwidth - 1,maxheight - 1],[0,maxheight - 1]],dtype="float32")

    M = cv2.getPerspectiveTransform(rect,dst)
    warped = cv2.warpPerspective(image,M,(maxwidth,maxheight))
    # return the transformed image
    return warped

if __name__ == '__main__':
    cap = cv2.VideoCapture(0) # open the camera
    if not cap.isOpened(): # failed to open
        print("Cannot open camera")
        exit()

    while True:
        flag = 0 # marks whether a document was detected in this frame
        ret,image = cap.read() # ret is True when a frame was read successfully
        if not ret: # check before touching image, which is None on failure
            print("Cannot read from camera")
            break
        orig = image.copy() # unannotated copy for the perspective transform
        cv_show("image",image)

        gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY) # convert to grayscale
        # preprocessing
        gray = cv2.GaussianBlur(gray,(5,5),0) # Gaussian blur
        edged = cv2.Canny(gray,75,200) # Canny edge detection
        cv_show('1',edged)

        # contour detection
        # [-2] selects the contour list in both OpenCV 3.x and 4.x
        cnts = cv2.findContours(edged.copy(),cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE)[-2]

        cnts = sorted(cnts,key=cv2.contourArea,reverse=True)[:3] # keep the 3 largest contours
        image_contours = cv2.drawContours(image,cnts,-1,(0,255,0),2)
        cv_show("image_contours",image_contours)

        # loop over the contours
        for c in cnts:
            # compute the contour approximation
            peri = cv2.arcLength(c,True) # perimeter of the contour

            approx = cv2.approxPolyDP(c,0.05 * peri,True) # contour approximation
            # c is the input point set
            # epsilon is the maximum distance allowed between the original contour and its approximation
            # True means the curve is closed
            area = cv2.contourArea(approx)

            # keep the contour when it has 4 points and is large enough
            if area > 20000 and len(approx) == 4:
                screenCnt = approx
                flag = 1
                print(peri,area)
                print('Document detected')
                break
        if flag == 1:
            # show the detected document contour
            image_contours = cv2.drawContours(image,[screenCnt],0,(0,255,0),2)
            cv_show("image",image_contours)
            # perspective transform
            warped = four_point_transform(orig,screenCnt.reshape(4,2))
            cv_show("warped",warped)
            # binarization (camera frames are BGR, so convert with COLOR_BGR2GRAY)
            warped = cv2.cvtColor(warped,cv2.COLOR_BGR2GRAY)
            ref = cv2.threshold(warped,220,255,cv2.THRESH_BINARY)[1]
            # ref = cv2.threshold(warped, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
            cv_show("ref",ref)

    cap.release() # release the capture device
    cv2.destroyAllWindows() # close the image windows