Python Regular Expressions: A Usage Summary and Detailed Guide

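Summary: This article summarizes how to use regular expressions in Python. Regular expressions are a powerful text-processing tool for matching, finding and replacing text patterns. It introduces the basic concepts and syntax of Python regular expressions, including pattern matching, escape characters and special characters, and it walks through the most common functions of the re module, such as match, search and findall. The goal is to help readers pick up Python regular expressions quickly and process text more efficiently.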

The most detailed official guide (the Python Regular Expression HOWTO):

https://docs.python.org/zh-cn/3.11/howto/regex.html#simple-patterns

Cheat sheets and references

https://zhuanlan.zhihu.com/p/658261452

https://blog.csdn.net/Java_ZZZZZ/article/details/130862224

https://docs.python.org/zh-cn/3.8/library/re.html

Using the re module

re.findall(): find all substrings that match a pattern

import re
# With matches
txt = "ai aiThe rain in Spain"
x = re.findall("ai", txt)
print(x)  # ['ai', 'ai', 'ai', 'ai']
# No matches: an empty list is returned
txt = "adafda dafasdf"
x = re.findall("ai", txt)
print(x)  # []

s='中国人adfadsfasfasdfsdaf中国万岁\n'
print(re.findall(r"\w",s))

s='中国人adfads----fasfasdfsdaf中国万岁\n'
print(re.findall(r"\w",s))
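# For str patterns, \w matches Unicode word characters by default: letters of any script
# (including Chinese characters), digits and the underscore "_". Only with the re.ASCII flag
# is it limited to [A-Za-z0-9_].
# \w does not match the dash "-", which is why this test string contains dashes.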

s='中国人adfads----fasfasdfsdaf中国万岁\n'
print(re.findall(r"\w",s,re.A)) # re.A (re.ASCII) limits \w to [A-Za-z0-9_], so Chinese characters are no longer matched

\d+

import re 
result = re.findall(r'\d+','123acb567def98')
# \d matches one digit and + means "one or more", so \d+ stands for \d\d\d... of unknown length.
# re.findall() collects every non-overlapping substring of that form (all of them, not just one)
# and returns them as a list of strings. It does not report positions; use re.finditer() for that.
print(result)  # ['123', '567', '98']

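What re.findall() returns changes once the pattern contains parentheses. A minimal sketch, using a made-up sample string (`text` and the other variable names are only illustrative):

```python
import re

text = "x=1, y=22, z=333"  # invented sample data

# Without groups, findall returns the whole matches
whole = re.findall(r"\w=\d+", text)
# With one group, it returns only the text captured by that group
values = re.findall(r"\w=(\d+)", text)
# With several groups, it returns one tuple of groups per match
pairs = re.findall(r"(\w)=(\d+)", text)

print(whole)   # ['x=1', 'y=22', 'z=333']
print(values)  # ['1', '22', '333']
print(pairs)   # [('x', '1'), ('y', '22'), ('z', '333')]
```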

\d

import re 
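result = re.findall(r'\d','123acb567def98')
# The pattern is a single \d, so each digit is its own match.
# Again re.findall() returns all of the matches, not just the first.
print(result)  # ['1', '2', '3', '5', '6', '7', '9', '8']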

^

import re
 
s = "https://blog.csdn.net/weixin_44799217http"
ret = re.findall(r"^http", s)  # ^ anchors the match at the start of the string
print(ret)  # ['http']: the "http" at the end of the string is not matched

re.finditer(): get the positions of the matches

\d+

import re 
p = re.compile(r"\d+")  # finditer() yields Match objects; span() is (start, end), group() the matched text
s = '123acb567def98'
for m in p.finditer(s):
    print(m.span(), m.group())

\d

import re 
p = re.compile(r"\d")
# Same loop, but now every single digit is a separate match
s = '123acb567def98'
for m in p.finditer(s):
    print(m.span(), m.group())

import re 
p = re.compile("[a-z]+")
# Same loop with [a-z]+: each run of lowercase letters is one match
s = '123acb567def98'
for m in p.finditer(s):
    print(m.span(), m.group())

import re
p = re.compile("[a-z]")
for m in p.finditer('a1b2c3d4'):
    print(m.start(), m.group())

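Note that \d is not exactly the same as [0-9]: for str patterns, \d matches any Unicode decimal digit (for example the full-width digits "１２３"), whereas [0-9] matches only the ASCII digits 0 to 9. With the re.ASCII flag, \d behaves like [0-9].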
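A small check of the \d vs [0-9] difference, assuming a sample string that mixes ASCII digits with full-width digits (the string is invented for illustration):

```python
import re

text = "price: 123 / １２３"  # ASCII digits, then full-width digits

# \d matches any Unicode decimal digit for str patterns
unicode_digits = re.findall(r"\d+", text)
# [0-9] only covers the ASCII range
ascii_digits = re.findall(r"[0-9]+", text)
# With re.ASCII, \d is restricted to [0-9]
ascii_flag = re.findall(r"\d+", text, re.ASCII)

print(unicode_digits)  # ['123', '１２３']
print(ascii_digits)    # ['123']
print(ascii_flag)      # ['123']
```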

re.match(): check whether the beginning of a string matches a pattern

re.match() tries to match the pattern only at the start of the string. It returns a Match object if the match succeeds and None otherwise.

import re
result = re.match(r'1[35678]\d{9}','13111111111')
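print(result.group())   # match succeeds: 13111111111
result = re.match(r'1[35678]\d{9}','12111111111')
print(result)     # None: the second digit is 2
result = re.match(r'1[35678]\d{9}','121111111112')
print(result)     # None: the second digit is 2 (re.match does not check the length, so a 12th digit alone would not cause a failure)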

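Because re.match() only anchors the start, the phone-number pattern above still accepts a string with extra trailing digits. A sketch of two ways to require the whole string to match, using a hypothetical 12-digit input:

```python
import re

PHONE = r'1[35678]\d{9}'       # the pattern from the example above
twelve_digits = '131111111112'  # hypothetical input: valid prefix, one digit too many

# re.match only checks that the string starts with an 11-digit match
prefix_only = re.match(PHONE, twelve_digits)
# re.fullmatch requires the entire string to match
whole_string = re.fullmatch(PHONE, twelve_digits)
# Appending $ has a similar effect with re.match
anchored = re.match(PHONE + r'$', twelve_digits)

print(prefix_only.group())  # 13111111111
print(whole_string)         # None
print(anchored)             # None
```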

Note that matching starts at the beginning of the string:

import re 
result=re.match("hello","hello world")
print(result)

import re 
result=re.match("hello123","hello world")
print(result)

The match has to start at the very beginning of the string; otherwise re.match() returns None:

import re 
result=re.match("hello","qhello world")
print(result)

print(re.match('super','insuperable'))

import re
print(re.match('www', 'www.runoob.com').span())  # matches at the start: (0, 3)
print(re.match('com', 'www.runoob.com'))         # not at the start: None

re.fullmatch(): the whole string must match

import re 
  
string = 'geeks'
pattern = 'g...s'
  
print(re.fullmatch(pattern, string))

import re 
  
string = 'geeks'
pattern = 'g..s'
  
print(re.fullmatch(pattern, string))

Difference between re.match() and re.fullmatch()

import re 
  
string = 'geeks'
pattern = 'g...'
  
print(re.fullmatch(pattern, string))

import re 
# Note the difference: re.match() accepts the prefix 'geek', while re.fullmatch() above returns None
string = 'geeks'
pattern = 'g...'
  
print(re.match(pattern, string))

re.search()

re.search() scans through the string and returns the first match it finds; if there is no match anywhere in the string, it returns None.

import re
 
ret = re.search(r"\d+", "阅读次数为 9999")
print(ret.group())

print(re.search('super','superstition').span())
print(re.search('super','superstition').group())

print(re.search('super','insuperable').span())

import re
txt = "The rain in Spain"
x = re.search(r"\s", txt)
print("The first white-space character is located in position:", x.start())

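For comparison, a sketch of re.match() and re.search() on the same string ("insuperable", reused from the examples above):

```python
import re

text = "insuperable"

# re.match only tries position 0, so it fails here
m = re.match("super", text)
# re.search scans forward and returns the first occurrence anywhere
s = re.search("super", text)
# Anchoring with ^ makes re.search behave like re.match
anchored = re.search("^super", text)

print(m)         # None
print(s.span())  # (2, 7)
print(anchored)  # None
```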

re.split(): split a string at every match of a regular expression

import re
txt = "The rain in Spain"
x = re.split(r"\s", txt)
print(x)  # ['The', 'rain', 'in', 'Spain']
# maxsplit=1: split at the first match only
x = re.split(r"\s", txt, maxsplit=1)
print(x)  # ['The', 'rain in Spain']

st = "abc1def23mn4xyz"
result = re.split(r"\d+",st)
print(result)  # ['abc', 'def', 'mn', 'xyz']

st = "abc1def23mn4xyz"
result = re.split(r"[a-z]+",st)
print(result)  # ['', '1', '23', '4', '']: matches at the start and end produce empty strings

string = "Hello,World,Python"
result = string.split(",")  # str.split with a comma as the separator
print(result)  # ['Hello', 'World', 'Python']
result = re.split(r",",string)
print(result)  # same result with re.split

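One reason to prefer re.split() over str.split() is that the separator can be a pattern. A minimal sketch with made-up input strings:

```python
import re

line = "apple, banana;cherry  orange"  # invented sample with mixed delimiters

# str.split takes one fixed separator; re.split takes a pattern, so commas,
# semicolons and runs of whitespace can all be handled in one call
parts = re.split(r"[,;\s]+", line)

# Wrapping the pattern in a capturing group keeps the separators in the result
with_seps = re.split(r"([,;])", "a,b;c")

print(parts)      # ['apple', 'banana', 'cherry', 'orange']
print(with_seps)  # ['a', ',', 'b', ';', 'c']
```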

re.sub(): replace the matched substrings

import re
txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt)
print(x)  # The9rain9in9Spain
# count=2: replace only the first two matches
x = re.sub(r"\s", "9", txt, count=2)
print(x)  # The9rain9in Spain

import re 
st = "abc1def23mn4xyz"
result = re.sub(r"\d+","_",st)
print(result)

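Besides a fixed string, the replacement passed to re.sub() can refer to capture groups or be a function. A sketch with invented sample data:

```python
import re

dates = "2023-07-01 and 2024-12-25"  # made-up input

# \1, \2, \3 in the replacement refer to the pattern's capture groups,
# here used to reorder YYYY-MM-DD into DD/MM/YYYY
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", dates)

# A function receives each Match object and returns the replacement text
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "a1b23c4")

print(swapped)  # 01/07/2023 and 25/12/2024
print(doubled)  # a2b46c8
```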

Syntax notes

\s

import re 
p = re.compile(r"\s+")
# \s+ matches runs of whitespace: the two spaces, the newline \n and the tab \t
s = "瓦房店分12打发打发的==大  发的\n是方\t法"
for m in p.finditer(s):
    print(m.span(), m.group())

Difference between * and +

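The difference between the two quantifiers can be sketched like this (the sample text is made up):

```python
import re

text = "ac abc abbc"

# * means "zero or more" of the preceding item, so "ac" (no b) also matches
star = re.findall(r"ab*c", text)
# + means "one or more", so at least one b is required
plus = re.findall(r"ab+c", text)

print(star)  # ['ac', 'abc', 'abbc']
print(plus)  # ['abc', 'abbc']
```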

The ? operator

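In Python regex, ? has two common roles: making the preceding item optional, and turning a quantifier into its non-greedy form. A minimal sketch with invented strings:

```python
import re

# ? after an item makes it optional (zero or one occurrence)
colors = re.findall(r"colou?r", "color colour colouur")

# ? after a quantifier (*?, +?, ??) makes it non-greedy: it matches as little as possible
html = "<b>bold</b><i>italic</i>"
greedy = re.findall(r"<.+>", html)
lazy = re.findall(r"<.+?>", html)

print(colors)  # ['color', 'colour']
print(greedy)  # ['<b>bold</b><i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```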

{m,n}

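A short sketch of the {m,n} repetition syntax and its variants {m} and {m,}, on a made-up string:

```python
import re

text = "a aa aaa aaaa"

# {m,n}: between m and n repetitions (greedy, so it takes as many as allowed)
two_to_three = re.findall(r"a{2,3}", text)
# {m}: exactly m repetitions; the \b word boundaries stop it matching inside longer runs
exactly_two = re.findall(r"\ba{2}\b", text)
# {m,}: at least m repetitions
at_least_three = re.findall(r"a{3,}", text)

print(two_to_three)    # ['aa', 'aaa', 'aaa']
print(exactly_two)     # ['aa']
print(at_least_three)  # ['aaa', 'aaaa']
```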

Case 1 (matching Chinese characters)

import re 
p = re.compile("[\u4e00-\u9fa5]+")
# \u4e00-\u9fa5 is the range of common CJK ideographs; + joins consecutive characters into one match
s = "瓦房店分12打发打发的==大法！！！发的是方法"
for m in p.finditer(s):
    print(m.span(), m.group())

Case 2

Case 3

Case 4

Case 5

Case 6

Selecting the strings in a list that match a pattern

import re
mylist = ["dog", "cat", "wildcat", "thundercat", "cow", "hooo"]
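r = re.compile(".*cat")  # match() anchors at the start, so ".*" lets "cat" appear anywhere
newlist = list(filter(r.match, mylist))  # keep the items for which r.match() finds a match
print(newlist)  # ['cat', 'wildcat', 'thundercat']
# r.match serves as the predicate: filter() drops every item for which it returns None.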

Alternatively, the pandas string methods described next make this kind of filtering very convenient.
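Without pandas, a list comprehension with search() is an equivalent alternative to filter() with match(). A sketch reusing mylist from the example above:

```python
import re

mylist = ["dog", "cat", "wildcat", "thundercat", "cow", "hooo"]

# search() looks anywhere in the string, so the leading ".*" is not needed
pattern = re.compile("cat")
newlist = [item for item in mylist if pattern.search(item)]

print(newlist)  # ['cat', 'wildcat', 'thundercat']
```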

Regular expressions in pandas

pandas.str.match (element-wise matching)

example1

import numpy as np
import pandas as pd
a = np.array(['A0','A1','A2','A3','A4','B0','B1','C0'])
pd.Series(a).str.match(r'A[0-2]')

example2

s = pd.Series(['zzzz', 'zzzd', 'zzdd', 'zddd', 'dddn', 'ddnz', 'dnzn', 'nznz',
'znzn', 'nznd', 'zndd', 'nddd', 'ddnn', 'dnnn', 'nnnz', 'nnzn', 'nznn', 'znnn',
'nnnn', 'nnnd', 'nndd', 'dddz', 'ddzn', 'dznn',  'znnz', 'nnzz', 'nzzz', 'zzzn',
                'zznn', 'dddd', 'dnnd'])
#print(s.str.endswith("dd"))
#print("*"*50)
#print(s[s.str.endswith("dd")])
#print("*"*50)
print("*"*50)
print(s.str.match(".*dd$"))
print(s[s.str.match(".*dd$")])

pandas.str.extract

Note: extract returns only the text matched by the parentheses (capture groups) in the pattern.

example1

import pandas as pd 
ele= ["Toy Story (1995)",
     "GoldenEye (1995)",
    "Four Rooms (1995)",
    "Get Shorty (1995)",
      "Copycat (1995)"]
df = pd.DataFrame({"movie_title":ele})
print(df)
df['just_movie_titles'] = df['movie_title'].str.extract(r'(.+?) \(')  # non-greedy: everything before " ("
df

example 2

import pandas as pd
df = pd.DataFrame({"col1":["1/1/100 'BA1", "1/1/102Packe", "1/1/102 'to_"]})
df["col2"]=df['col1'].str.extract(r'(\d+/\d+/\d+)', expand=True)
df

example3

# importing pandas as pd 
import pandas as pd 
  
# importing re for regular expressions 
import re 
  
# Creating the Series 
sr = pd.Series(['New_York', 'Lisbon', 'Tokyo', 'Paris', 'Munich']) 
  
# Creating the index 
idx = ['City 1', 'City 2', 'City 3', 'City 4', 'City 5'] 
  
# set the index 
sr.index = idx 
  
# Print the series 
print(sr)
# extract groups having a vowel followed by 
# any character 
result = sr.str.extract(pat = '([aeiou].)') 
  
# print the result 
print(result)

example4

import pandas as pd 
s = pd.Series(['a1', 'b2', 'c3'])
s.str.extract(r'([ab])(\d)')

With expand=True the result is always a DataFrame, even for a single group:

s.str.extract(r'[ab](\d)', expand=True)

Named groups (?P<name>...) set the column names of the result:

s.str.extract(r'(?P<letter>[ab])(?P<digit>\d)')

s.str.extract(r'(\d)')

s.str.extract(r'([ab])')

pandas.str.split

example1

import pandas as pd
temp = pd.DataFrame({'ticker' : ['spx 5/25/2001 p500', 'spx 5/25/2001 p600', 'spx 5/25/2001 p700']})
temp2 = temp.ticker.str.split(' ')
print(temp2)
temp2.str[-1]

Extracting one field from each value in a column (here the result replaces the gene column)

import pandas as pd
df = pd.DataFrame({ 'gene':["1 // foo // blabla",
                                   "2 // bar // lalala",
                                   "3 // qux // trilil",
                                   "4 // woz // hohoho"], 'cell1':[5,9,1,7], 'cell2':[12,90,13,87]})
print(df)
df['gene'] = df['gene'].str.split('//').str[1]  # the pieces keep their surrounding spaces, e.g. ' foo '
df

pandas.str.extract can produce the same result as pandas.str.split:

import pandas as pd
df = pd.DataFrame({ 'gene':["1 // foo // blabla",
                                   "2 // bar // lalala",
                                   "3 // qux // trilil",
                                   "4 // woz // hohoho"], 'cell1':[5,9,1,7], 'cell2':[12,90,13,87]})
print(df)
df["gene"] = df["gene"].str.extract(r"\/\/([a-z ]+)\/\/")
print(df)
df["gene"] = df["gene"].str.strip()
df

example2

import pandas as pd
df = pd.DataFrame({'Scenario':['HI','HI','HI','HI','HI','HI'],
                   'Savings':['Total_FFC_base0','Total_FFC_savings1','Total_FFC_saving2',
                              'Total_FFC_savings3','Total_site_base0','Total_site_savings1'],
                    'PC1':[0.12,0.15,0.12,0.17,0.12,0.15],
                    'PC2':[0.13,0.12,0.14,0.15,0.15,0.15]})
print(df)
df[['Savings', 'EL']] = df['Savings'].str.extract(r'_(?P<Savings>.*)_.*(?P<EL>\d+)')
df

import pandas as pd
df = pd.DataFrame({'Scenario':['HI','HI','HI','HI','HI','HI'],
                   'Savings':['Total_FFC_base0','Total_FFC_savings1','Total_FFC_saving2',
                              'Total_FFC_savings3','Total_site_base0','Total_site_savings1'],
                    'PC1':[0.12,0.15,0.12,0.17,0.12,0.15],
                    'PC2':[0.13,0.12,0.14,0.15,0.15,0.15]})
print(df)
df['Savings'].str.extract('(.*)_(.*)_(.*)')

df['Savings'].str.extract(r'(.*)_(.*)_(.*)\d')

df['Savings'].str.extract('(.*)')

df['Savings'].str.extract(r'(\d+)')
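# Only the text inside the parentheses is returned; the parts outside them act as markers that anchor the match and are not included in the result.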

Worked example 1

import numpy as np 
import pandas as pd 
ele = np.array(['CD1C_P14_S91', 'CD1C_P14_S96', 'CD1C_P3_S12', 'CD141_P7_S22',
       'CD141_P7_S24', 'CD1C_P4_S36', 'CD141_P7_S7', 'CD141_P8_S27',
       'CD141_P8_S31', 'CD141_P9_S72', 'pDC_P10_S73', 'pDC_P10_S74',
       'pDC_P10_S83', 'pDC_P13_S56', 'pDC_P13_S59', 'pDC_P13_S70',
       'pDC_P14_S76', 'pDC_P14_S78', 'pDC_P14_S87', 'pDC_P14_S89',
       'pDC_P14_S90', 'pDC_P14_S91', 'pDC_P14_S92', 'pDC_P3_S14',
       'pDC_P3_S16', 'pDC_P3_S17', 'pDC_P3_S18', 'pDC_P3_S1',
       'pDC_P3_S21', 'pDC_P3_S2', 'pDC_P3_S4', 'pDC_P3_S5', 'pDC_P4_S28',
       'pDC_P4_S29', 'pDC_P4_S30', 'pDC_P4_S36', 'pDC_P4_S37',
       'pDC_P4_S40', 'pDC_P4_S42', 'pDC_P4_S43', 'pDC_P4_S45',
       'pDC_P4_S46', 'pDC_P4_S48', 'pDC_P7_S15', 'pDC_P7_S16',
       'pDC_P7_S17', 'pDC_P7_S1', 'pDC_P7_S21', 'pDC_P7_S22', 'pDC_P7_S3',
       'pDC_P7_S7', 'pDC_P8_S26', 'pDC_P8_S28', 'pDC_P8_S32',
       'pDC_P8_S34', 'pDC_P8_S39', 'pDC_P8_S40', 'pDC_P8_S42',
       'pDC_P8_S44', 'pDC_P8_S46', 'pDC_P8_S47', 'pDC_P9_S52',
       'pDC_P9_S54', 'pDC_P9_S61', 'pDC_P9_S63', 'pDC_P9_S65',
       'pDC_P9_S71', 'DoubleNeg_P10_S73', 'DoubleNeg_P10_S76',
       'DoubleNeg_P10_S79', 'DoubleNeg_P10_S80', 'DoubleNeg_P10_S81',
       'DoubleNeg_P10_S84', 'DoubleNeg_P10_S86', 'DoubleNeg_P13_S49',
       'DoubleNeg_P13_S53', 'DoubleNeg_P13_S64', 'DoubleNeg_P13_S67',
       'DoubleNeg_P14_S74', 'DoubleNeg_P14_S78', 'DoubleNeg_P14_S81',
       'DoubleNeg_P14_S82', 'DoubleNeg_P14_S83', 'DoubleNeg_P14_S87',
       'DoubleNeg_P14_S90', 'DoubleNeg_P14_S92', 'DoubleNeg_P14_S95',
       'DoubleNeg_P3_S1', 'DoubleNeg_P3_S20', 'DoubleNeg_P3_S24',
       'DoubleNeg_P3_S3', 'DoubleNeg_P3_S5', 'DoubleNeg_P3_S7',
       'DoubleNeg_P4_S29', 'DoubleNeg_P4_S30', 'DoubleNeg_P4_S35',
       'DoubleNeg_P4_S39', 'DoubleNeg_P4_S42', 'DoubleNeg_P4_S45',
       'DoubleNeg_P4_S46', 'DoubleNeg_P7_S11', 'DoubleNeg_P7_S13',
       'DoubleNeg_P7_S14', 'DoubleNeg_P7_S16', 'DoubleNeg_P7_S24',
       'DoubleNeg_P7_S2', 'DoubleNeg_P7_S3', 'DoubleNeg_P7_S5',
       'DoubleNeg_P7_S7', 'DoubleNeg_P7_S8', 'DoubleNeg_P8_S25',
       'DoubleNeg_P8_S30', 'DoubleNeg_P8_S38', 'DoubleNeg_P8_S41',
       'DoubleNeg_P8_S42', 'DoubleNeg_P8_S43', 'DoubleNeg_P8_S44',
       'DoubleNeg_P9_S64', 'DoubleNeg_P9_S66', 'CD1C_P13_S57',
       'CD1C_P13_S63', 'CD1C_P14_S85'])
               
df = pd.DataFrame({"cell":ele})
df

Test 1 (extract uppercase letters only)

df["cell"].str.extract(r"([A-Z]+)")

Test 2 (extract uppercase and lowercase letters)

df["cell"].str.extract(r"([A-Za-z]+)")

Test 3 (combining letters and digits)

df["cell"].str.extract(r"([A-Za-z]+\d+[A-Za-z]+)")
# Only CD1C-style names (letters, digits, letters) match; the CD141, pDC and DoubleNeg rows give NaN

Test 4 (using split)

print(df["cell"].str.split("_").str[0])
print(df["cell"].str.split("_").str[0].value_counts())

Test 5 (using a regular expression)

# [a-zA-Z0-9]+ matches a run of letters and digits, i.e. the part before the first underscore
print(df["cell"].str.extract(r"([a-zA-Z0-9]+)"))
print(df["cell"].str.extract(r"([a-zA-Z0-9]+)").value_counts())

pandas.str.fullmatch()

Testing shows that pandas.str.match() and pandas.str.fullmatch() behave differently.

For example:

import numpy as np
import pandas as pd
# str.match only needs the beginning of each string to match
a = np.array(['A0','A11','A2-','A3','A4','B0','B1','C0'])
pd.Series(a).str.match(r'A[0-2]')

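pandas.str.match() only requires the beginning of each string to match; whatever follows is ignored, so A11 and A2- return True just like A0.

The same data gives a different result with pandas.str.fullmatch():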

# str.fullmatch requires the whole string to match: only A0 returns True
pd.Series(a).str.fullmatch(r'A[0-2]')

This difference is easy to overlook, so keep it in mind.
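pandas' str.contains, str.match and str.fullmatch broadly correspond to re.search, re.match and re.fullmatch. A sketch of that correspondence in plain re, on a small hypothetical list similar to the one above:

```python
import re

values = ['A0', 'A11', 'A2-', 'xA1', 'B0']  # hypothetical sample values
pattern = re.compile(r'A[0-2]')

# The three checks differ only in where the pattern has to match
rows = [
    (v,
     pattern.search(v) is not None,     # anywhere (like Series.str.contains)
     pattern.match(v) is not None,      # at the start (like Series.str.match)
     pattern.fullmatch(v) is not None)  # the whole string (like Series.str.fullmatch)
    for v in values
]

for value, anywhere, start, whole in rows:
    print(f"{value:4} search={anywhere!s:5} match={start!s:5} fullmatch={whole}")
```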
