Using OCR Text Recognition in UI Automation

2023-04-24 00:05:19

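After using Airtest's image recognition for a while, I found that it is not very accurate on some text. My guess is that text has relatively few distinctive image features, so it is easy to match the wrong region.

I also remember seeing someone on the forum use OCR for this. I can't recall which post it was, but I think it used Tencent Cloud's API.

I tried that approach. Several cloud platforms offer OCR APIs. Tencent Cloud's starts charging after a certain number of calls, so I looked for an alternative and found the iFlytek (科大訊飛) API, which is completely free and works fine.

The principle is simple: upload an image to the API, and the backend processes it and returns the recognized text together with its position coordinates.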
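To make the "text plus coordinates" idea concrete, here is a minimal sketch of walking such a result. The sample dictionary is hand-written for illustration only. Its shape (data, block, line, word, with each word carrying a content string and a location made of top_left and right_bottom corners) is inferred from the fields that the code later in this post reads, not copied from iFlytek's documentation. The helper name iter_words is hypothetical.

```python
def iter_words(response):
    """Yield (content, location) for every word inside blocks of type 'text'."""
    blocks = (response.get("data") or {}).get("block") or []
    for block in blocks:
        if block.get("type") != "text":
            continue
        for line in block.get("line") or []:
            for word in line.get("word") or []:
                content = word.get("content")
                if content is not None:
                    yield content, word.get("location")


# Hand-written sample for illustration, NOT a real API response.
sample = {
    "data": {
        "block": [
            {
                "type": "text",
                "line": [
                    {
                        "word": [
                            {
                                "content": "姓名",
                                "location": {
                                    "top_left": {"x": 100, "y": 200},
                                    "right_bottom": {"x": 180, "y": 240},
                                },
                            }
                        ]
                    }
                ],
            }
        ]
    }
}

# List every recognized word with its bounding box.
for content, location in iter_words(sample):
    print(content, location)
```

Because the helper falls back to empty lists when a key is missing, a response without any recognized text simply yields nothing instead of raising.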

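So all you need to do is save a screenshot of the device screen, read it in, convert it to base64, and send it to the iFlytek API.

Wait for the result to come back as a JSON string, parse it to find the text you are looking for, take its position coordinates, and compute the center point.

Tap.

Done.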
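The two purely computational steps above (encoding the image and computing the tap point) can be sketched on their own, without a device or a network call. The function names encode_image_body and box_center are hypothetical, and the corner keys top_left and right_bottom are assumed from the code further down in this post.

```python
import base64
import urllib.parse


def encode_image_body(image_bytes):
    """Base64-encode raw image bytes and wrap them in a form-encoded body."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return urllib.parse.urlencode({"image": b64}).encode("utf-8")


def box_center(location):
    """Return the (x, y) center of a box described by its two corners."""
    x1 = int(location["top_left"]["x"])
    y1 = int(location["top_left"]["y"])
    x2 = int(location["right_bottom"]["x"])
    y2 = int(location["right_bottom"]["y"])
    return ((x1 + x2) / 2, (y1 + y2) / 2)


# Placeholder bytes stand in for a real screenshot file.
body = encode_image_body(b"not really an image")

center = box_center({"top_left": {"x": 100, "y": 200},
                     "right_bottom": {"x": 180, "y": 240}})
print(center)  # (140.0, 220.0)
```

Keeping these steps as small pure functions makes them easy to check before wiring them to a real screenshot and a real tap.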

Here is the code. The demo code on the iFlytek site has problems if you copy it verbatim, so I modified it a bit.

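import base64
import hashlib
import json
import time
import urllib.parse
import urllib.request

# Airtest API used for the screenshot and the tap
from airtest.core.api import snapshot, touch
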
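def OCR_getPos(target):

    # Take a screenshot of the device. This code treats the return value of
    # snapshot() as the path of the saved image; check this against your Airtest version.
    filePath = snapshot()
    # Read the image and base64-encode it for upload.
    with open(filePath, 'rb') as f:
        file_content = f.read()
    base64_image = base64.b64encode(file_content)
    body = urllib.parse.urlencode({'image': base64_image}).encode(encoding='utf-8')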

    url = 'http://webapi.xfyun.cn/v1/service/v1/ocr/general'
    api_key = '1e90ca2d09d7213bf6770f34e6d2e70b'  # replace with your own api_key
    param = {"language": "cn|en", "location": "true"}

    x_appid = "c23538b5"  # replace with your own appid; this one is just random typing
    x_param = base64.b64encode(json.dumps(param).replace(' ', '').encode(encoding="utf-8"))
    x_param_b64_str = x_param.decode('utf-8')
    x_time = str(int(time.time()))  # current time in whole seconds
    # Checksum is the MD5 of api_key + timestamp + base64-encoded param.
    string = api_key+x_time+x_param_b64_str
    string = string.encode('utf-8')
    x_checksum = hashlib.md5(string).hexdigest()
    x_header = {'X-Appid': x_appid,
                'X-CurTime': x_time,
                'X-Param': x_param_b64_str,
                'X-CheckSum': x_checksum}
    # Passing a body makes this a POST request.
    req = urllib.request.Request(url, body, x_header)
    result = urllib.request.urlopen(req)
    result = result.read().decode()
    jsonObject = json.loads(result)
    location = None
    # Pick the block of type 'text'; a missing 'data' or 'block' means nothing was recognized.
    text_block = None
    try:
        for block in jsonObject.get('data').get('block'):
            if block.get('type') == 'text':
                text_block = block
    except (AttributeError, TypeError):
        pass
    if text_block is None:
        print('no words')
        return
    # Scan every recognized word; if several contain the target, the last one wins.
    for line in text_block.get('line') or []:
        for word in line.get('word') or []:
            content = word.get('content')
            if content is not None and target in content:
                location = word.get('location')
                print(location)
    if location:
        # Tap the center of the bounding box given by its two corners.
        x1 = int(location.get('top_left').get('x'))
        y1 = int(location.get('top_left').get('y'))
        x2 = int(location.get('right_bottom').get('x'))
        y2 = int(location.get('right_bottom').get('y'))
        width = x2 - x1
        height = y2 - y1
        center_x = x1 + width/2
        center_y = y1 + height/2
        pos = [center_x, center_y]
        touch(pos)
    else:
        print('target text not found: ' + target)
    print(result+'\n')
    print(text_block)

if __name__ == '__main__':
    OCR_getPos('姓名')
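
The request signing is the part that is easiest to get wrong, so it can help to pull it out into a small function and check it in isolation. The sketch below mirrors what the code above does (MD5 over api_key + timestamp + base64-encoded param); it has not been verified against iFlytek's documentation. The name build_auth_headers and the now parameter are hypothetical, added only so the result can be reproduced in a test.

```python
import base64
import hashlib
import json
import time


def build_auth_headers(appid, api_key, param, now=None):
    """Build the X-* headers the same way the function above does."""
    # Use the supplied timestamp if given, otherwise the current time in seconds.
    cur_time = str(int(time.time() if now is None else now))
    param_b64 = base64.b64encode(
        json.dumps(param).replace(" ", "").encode("utf-8")
    ).decode("utf-8")
    checksum = hashlib.md5(
        (api_key + cur_time + param_b64).encode("utf-8")
    ).hexdigest()
    return {
        "X-Appid": appid,
        "X-CurTime": cur_time,
        "X-Param": param_b64,
        "X-CheckSum": checksum,
    }


# Placeholder credentials; replace with your own appid and api_key.
param = {"language": "cn|en", "location": "true"}
headers = build_auth_headers("your_appid", "your_api_key", param, now=1700000000)
print(headers["X-CurTime"], headers["X-Param"])
```

With a fixed timestamp the headers are deterministic, which makes it easy to compare them with what another client or a demo script produces for the same inputs.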
