Converting images to Word with Python: pytesseract & Tesseract OCR

Author: admin

October 24, 2022

OCR stands for Optical Character Recognition. In plain terms, it turns the text shown in an image into machine-readable text.

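Tesseract is an open-source OCR engine whose development was long sponsored by Google. Besides its high accuracy, Tesseract is very flexible: it can be trained to recognize any font (as long as the style of the font stays consistent), and it can recognize any Unicode character. The pytesseract module is a Python wrapper around Tesseract.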
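To make the word "wrapper" concrete, here is a rough sketch of calling the Tesseract executable directly from Python. It is only an illustration of the idea, not how pytesseract is actually implemented. The function names `build_tesseract_command` and `ocr_via_cli` are made up for this example, and the code assumes the `tesseract` executable is installed and on your PATH.

```python
import subprocess


def build_tesseract_command(image_path, lang="eng"):
    """Build the argument list for one Tesseract command-line call.

    Passing `stdout` as the output base makes Tesseract print the
    recognized text instead of writing a file; `-l` selects the language.
    """
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_via_cli(image_path, lang="eng"):
    """Run the Tesseract executable and return the recognized text."""
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,  # collect stdout/stderr instead of printing them
        check=True,           # raise CalledProcessError if Tesseract fails
    )
    return result.stdout.decode("utf-8")
```

Calling `ocr_via_cli('scan.png', 'chi_sim')` (with a hypothetical file `scan.png`) would run roughly the same command you could type in a terminal. pytesseract saves you from handling these details yourself.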

1. Install the tesseract-ocr engine

https://github.com/tesseract-ocr/tesseract/wiki
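After installing, it can be handy to check that Python can actually find the engine. The following is a small sketch using only the standard library; the helper names `find_tesseract` and `describe_installation` are invented for this example.

```python
import shutil


def find_tesseract(executable="tesseract"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(executable)


def describe_installation(path):
    """Turn the lookup result into a short status message."""
    if path is None:
        return "tesseract not found on PATH"
    return f"tesseract found at {path}"


print(describe_installation(find_tesseract()))
```

If the executable is not on PATH (this often happens on Windows), pytesseract lets you set `pytesseract.pytesseract.tesseract_cmd` to the full path of the executable instead.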

2. Install the Python modules

pip install pytesseract

# Needed for opening and handling images
pip install pillow

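import pytesseract
from PIL import Image
# Open the image (replace the placeholder with the path to your image)
image = Image.open('path/to/image')
# Convert the text in the image to a string.
# lang='chi_sim' selects Simplified Chinese; the matching
# language data must be installed alongside Tesseract.
code = pytesseract.image_to_string(image, lang='chi_sim')
# Print the recognized text
print(code)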
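The code above only prints the recognized text. To end up with an actual Word file, one option is to write the string returned by `image_to_string` into a .docx document. The sketch below assumes the third-party python-docx package (`pip install python-docx`), which the steps above do not install. The helper names `split_paragraphs` and `save_as_docx` are made up for this example.

```python
import re


def split_paragraphs(text):
    """Split OCR output into paragraphs on blank lines, dropping empty chunks."""
    chunks = re.split(r"\n\s*\n", text)
    # strip() also removes trailing whitespace left at the end of the OCR output
    return [chunk.strip() for chunk in chunks if chunk.strip()]


def save_as_docx(text, output_path):
    """Write the text to a .docx file, one Word paragraph per OCR paragraph."""
    # Assumption: python-docx is installed. It is imported here so that
    # split_paragraphs stays usable without it.
    from docx import Document

    document = Document()
    for paragraph in split_paragraphs(text):
        document.add_paragraph(paragraph)
    document.save(output_path)
```

With the variable from the example above, `save_as_docx(code, 'result.docx')` would write the recognized text to a file named `result.docx` (a hypothetical output name).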