Regular expressions in Python

Data mining, Python, regular expressions

2022-02-25 08:53:41

I want to extract the values from the following text:

Paﬁent Name : Thomas Joseph MRNO : DQ026151?
Doctor : Haneef M An : 513! Gandar : Male
Admission Data : 19-Feb-2V'3‘¥T12:2'$ PM Bill No : IDOGIII.-H-17
Discharge Date : 22-Feb-20$? 1D:5‘F AM Bill Dale : E2-Feb-2017

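I want to extract only the value for each field name. For example, Thomas Joseph is the value of the field Patient Name. The same applies to the other field names, and the output should be saved to Excel.

I need Python code for the above.

My attempt: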

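import re
import pytesseract as pt  # assumption: `pt` is the pytesseract module

text = pt.image_to_string(img1)  # img1 is the scanned image, loaded earlier
print(text)
s = re.findall(r'\s:\s(\w+)', text)
print(s)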

2 Answers

It may not be perfect, but it nearly does the job.

import re

# `data` holds the OCR text shown in the question.
re.findall(r'(?<=: )\w{2}-\w{3}-\d{4}|(?<=: )\d{2}-\w{3}-\w{2}|(?<=: )\s?\w+\s?\w+\s?\w+', data)

# ['Thomas Joseph MRNO', 'DQ026151', 'Haneef M An', '513', 'Male', '19-Feb-2V', 'IDOGIII', '22-Feb-20', 'E2-Feb-2017']
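
One caveat with this pattern: \s? also matches a newline, so when the whole multi-line OCR string is passed in at once, the third alternative can run from Male across the line break into Admission Data. A minimal sketch of one way around this, assuming the OCR text is available as a single string, is to apply the same pattern to each line separately. The helper name find_values is made up for illustration, and the sample string uses the spelling "Pafient" from the second answer.

```python
import re

# The answer's pattern, unchanged, split over three lines for readability.
PATTERN = re.compile(
    r'(?<=: )\w{2}-\w{3}-\d{4}'
    r'|(?<=: )\d{2}-\w{3}-\w{2}'
    r'|(?<=: )\s?\w+\s?\w+\s?\w+'
)

def find_values(text):
    """Apply the pattern to each line on its own and collect the matches."""
    values = []
    for line in text.splitlines():
        values.extend(PATTERN.findall(line))
    return values

data = """Pafient Name : Thomas Joseph MRNO : DQ026151?
Doctor : Haneef M An : 513! Gandar : Male
Admission Data : 19-Feb-2V'3‘¥T12:2'$ PM Bill No : IDOGIII.-H-17
Discharge Date : 22-Feb-20$? 1D:5‘F AM Bill Dale : E2-Feb-2017"""

print(find_values(data))
# ['Thomas Joseph MRNO', 'DQ026151', 'Haneef M An', '513', 'Male',
#  '19-Feb-2V', 'IDOGIII', '22-Feb-20', 'E2-Feb-2017']
```

Matching per line reproduces the list quoted above. The values are still unlabeled, and the first one carries the next field name (MRNO) along with it.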

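As @spacedman correctly mentioned, this would get answered faster on StackOverflow. Still, you can build a dictionary of field names and values as follows. There may be a better way, but this is a quick workaround.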

# Python 3 version
st = '''Pafient Name : Thomas Joseph MRNO : DQ026151?
Doctor : Haneef M An : 513! Gandar : Male
Admission Data : 19-Feb-2V'3‘¥T12:2'$ PM Bill No : IDOGIII.-H-17
Discharge Date : 22-Feb-20$? 1D:5‘F AM Bill Dale : E2-Feb-2017'''
# Join the lines and append a sentinel so the last field has an end marker.
st = st.replace('\n', '') + '<eof>'
# Labels in order of appearance; 'PM Bill No' keeps the trailing 'PM' out of the Admission Data value.
words = ['Pafient Name','MRNO','Doctor','An','Gandar','Admission Data','PM Bill No','Discharge Date','Bill Dale','<eof>']
# Each value is the text between one label and the next; only the first colon (the separator) is removed.
print({words[i]: st[st.index(words[i]) + len(words[i]):st.index(words[i + 1])].replace(':', '', 1).strip() for i in range(len(words) - 1)})
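
The same idea can also be written as a single regular expression, and the question's last step (saving the output for Excel) can be approximated with the standard csv module. The following is only a sketch under stated assumptions: the field labels are known in advance and spelled exactly as the OCR produced them, each label is followed by a colon, and a CSV file that Excel can open is an acceptable stand-in for a native workbook. The names LABELS, extract_fields and to_csv_text are hypothetical.

```python
import csv
import io
import re

# Field labels exactly as they appear in the OCR output (misspellings included).
LABELS = ['Pafient Name', 'MRNO', 'Doctor', 'An', 'Gandar',
          'Admission Data', 'Bill No', 'Discharge Date', 'Bill Dale']

def extract_fields(text, labels=LABELS):
    """Return {label: value}; a value runs from 'label :' up to the next 'label :' or the end."""
    alternation = '|'.join(re.escape(label) for label in labels)
    pattern = re.compile(
        rf'\b(?P<label>{alternation})\s*:\s*(?P<value>.*?)\s*(?=\b(?:{alternation})\s*:|$)',
        re.DOTALL,  # let a value continue across a line break if needed
    )
    return {m.group('label'): m.group('value') for m in pattern.finditer(text)}

def to_csv_text(record):
    """Serialize one record as CSV text: a header row followed by a value row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(record))
    writer.writeheader()
    writer.writerow(record)
    return buffer.getvalue()

text = """Pafient Name : Thomas Joseph MRNO : DQ026151?
Doctor : Haneef M An : 513! Gandar : Male
Admission Data : 19-Feb-2V'3‘¥T12:2'$ PM Bill No : IDOGIII.-H-17
Discharge Date : 22-Feb-20$? 1D:5‘F AM Bill Dale : E2-Feb-2017"""

fields = extract_fields(text)
for label, value in fields.items():
    print(f'{label}: {value}')
print(to_csv_text(fields))
```

Writing the returned text to a file with a .csv extension gives something Excel can open. If pandas and an Excel writer such as openpyxl are installed, pandas.DataFrame([fields]).to_excel(...) would produce a real .xlsx file instead. The extracted values are still raw OCR output (for example DQ026151? and 513!), so any cleanup of stray characters remains a separate step.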
