Remove the part of a string after a given character

Tags: data mining, r, data cleaning

2021-09-27 23:48:02

I have a dataset like the one shown below, and I want to delete everything that follows the © character. How can I do this in R?

data_clean_phrase <- c("Copyright © The Society of Geomagnetism and Earth", 
"© 2013 Chinese National Committee ")

data_clean_df <- as.data.frame(data_clean_phrase)

2 Answers

For example (using @ as a stand-in for ©):

 rs<-c("copyright @ The Society of mo","I want you to meet me @ the coffeshop")
 s<-gsub("@.*","",rs)
 s
 [1] "copyright "             "I want you to meet me "

Or, if you want to keep the @ character:

 s<-gsub("(@).*","\\1",rs)
 s
 [1] "copyright @"             "I want you to meet me @"

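Edit: if you want to delete everything from the last @ onward instead, use the same approach with a suitable regular expression. The leading (.*) is greedy, so it consumes as much as possible and the literal @ ends up matching the last occurrence. For example: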

rs<-c("copyright @ The Society of mo located @ my house","I want you to meet me @ the coffeshop")
s<-gsub("(.*)@.*","\\1",rs)
s
[1] "copyright @ The Society of mo located " "I want you to meet me "

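Because each of these patterns runs to the end of the string, it can match at most once per element, so sub and gsub give the same result here.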
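Applied to the data from the question, the same idea might look like the sketch below. It assumes the copyright sign is written as the escape "\u00a9" (so the script does not depend on the file's encoding) and that the data frame is built with stringsAsFactors = FALSE. The column names without_sign and with_sign are made up for illustration.

```r
# The question's data, with the copyright sign written as "\u00a9" (assumption:
# an escape is used so the script does not depend on the source file encoding).
data_clean_phrase <- c("Copyright \u00a9 The Society of Geomagnetism and Earth",
                       "\u00a9 2013 Chinese National Committee ")
data_clean_df <- data.frame(data_clean_phrase, stringsAsFactors = FALSE)

# Drop the sign and everything after it.
# (without_sign / with_sign are hypothetical column names.)
data_clean_df$without_sign <- sub("\u00a9.*", "", data_clean_df$data_clean_phrase)

# Keep the sign but drop everything after it.
data_clean_df$with_sign <- sub("(\u00a9).*", "\\1", data_clean_df$data_clean_phrase)

# Optional: strip the trailing space left behind by the first variant.
data_clean_df$without_sign <- trimws(data_clean_df$without_sign)

print(data_clean_df)
```

Rows that contain no copyright sign are left unchanged, because sub returns its input as-is when the pattern does not match.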

For completeness: you can also use the stringr package to extract the part you want.

library(stringr)
data_clean_phrase <- c("Copyright © The Society of Geomagnetism and Earth", 
                       "© 2013 Chinese National Committee ")

str_extract(data_clean_phrase, "^(.*?©)") # up to and including the first ©
str_extract(data_clean_phrase, "^.*(?=(©))") # up to, but excluding, the last ©

Note: I used str_extract here; str_remove would work as well.
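A str_remove version could look roughly like the following. This is a sketch, not part of the original answer: the variable names removed and kept are hypothetical, and the copyright sign is again written as "\u00a9" to sidestep encoding issues.

```r
library(stringr)

data_clean_phrase <- c("Copyright \u00a9 The Society of Geomagnetism and Earth",
                       "\u00a9 2013 Chinese National Committee ")

# Remove the sign and everything after it.
removed <- str_remove(data_clean_phrase, "\u00a9.*")

# Keep the sign: the lookbehind (?<=...) only matches text that follows it,
# so the sign itself is not part of what gets removed.
kept <- str_remove(data_clean_phrase, "(?<=\u00a9).*")

print(removed)
print(kept)
```

One practical difference between the two functions: when a string contains no ©, str_extract returns NA, whereas str_remove returns the string unchanged.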
