I have a dataset like the one below. I want to remove all characters after the © character. How can I do this in R?
data_clean_phrase <- c("Copyright © The Society of Geomagnetism and Earth",
"© 2013 Chinese National Committee ")
data_clean_df <- as.data.frame(data_clean_phrase)
For example (using @ in place of ©):
rs<-c("copyright @ The Society of mo","I want you to meet me @ the coffeshop")
s<-gsub("@.*","",rs)
s
[1] "copyright " "I want you to meet me "
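Applied to the question's own data frame, the same pattern might look like the sketch below. It assumes the column is named data_clean_phrase (which is what as.data.frame() produces from the question's vector) and writes © as the escape \u00a9 so the script does not depend on the source file's encoding; sub() is enough because the pattern can only match once per string.

```r
# The question's data; "\u00a9" is the escape for the copyright sign ©
data_clean_phrase <- c("Copyright \u00a9 The Society of Geomagnetism and Earth",
                       "\u00a9 2013 Chinese National Committee ")
data_clean_df <- as.data.frame(data_clean_phrase)

# Drop the © and everything after it, in place in the data frame column
data_clean_df$data_clean_phrase <- sub("\u00a9.*", "", data_clean_df$data_clean_phrase)
data_clean_df$data_clean_phrase
```

The first result keeps the space that came before ©, and the second becomes an empty string because © is its first character; wrap the call in trimws() if you want surrounding whitespace removed.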
Or, if you want to keep the @ character:
s<-gsub("(@).*","\\1",rs)
s
[1] "copyright @" "I want you to meet me @"
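If you would rather not use a regular expression at all, a fixed-string variant is possible: find the position of the first marker with regexpr(..., fixed = TRUE) and cut with substr(). This is a sketch of an alternative technique, not part of the original answer; the third test string is a hypothetical example added to show that strings without the marker pass through unchanged.

```r
rs <- c("copyright @ The Society of mo",
        "I want you to meet me @ the coffeshop",
        "no marker here")

# Position of the first "@" in each string (-1 when there is none);
# fixed = TRUE treats "@" literally, and as.integer() drops the
# match attributes that regexpr() attaches to its result
pos <- as.integer(regexpr("@", rs, fixed = TRUE))

# Cut just before the marker, or up to and including it;
# strings without a marker are returned as they are
drop_marker <- ifelse(pos > 0, substr(rs, 1, pos - 1), rs)
keep_marker <- ifelse(pos > 0, substr(rs, 1, pos), rs)
```

Because the marker is matched as a literal string, characters that are special in regular expressions (such as . or +) need no escaping with this approach.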
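Edit: if you want to remove everything from the last @ onward, you just need the appropriate regular expression, following the previous example. The greedy (.*) captures as much text as it can, so the @ that follows it is the last one in the string. Example: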
rs<-c("copyright @ The Society of mo located @ my house","I want you to meet me @ the coffeshop")
s<-gsub("(.*)@.*","\\1",rs)
s
[1] "copyright @ The Society of mo located " "I want you to meet me "
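To compare the first-occurrence and last-occurrence cases side by side, the sketch below runs both on the same vector. The anchored pattern "@[^@]*$" is an alternative I am adding for illustration: it matches an @ that is followed by no further @ up to the end of the string, which should give the same result as the greedy capture group.

```r
rs <- c("copyright @ The Society of mo located @ my house",
        "I want you to meet me @ the coffeshop")

# From the first @ onward: the match starts at the leftmost "@"
first_at <- sub("@.*", "", rs)

# From the last @ onward, two equivalent patterns:
# the greedy "(.*)" keeps everything before the final "@"
last_greedy <- sub("(.*)@.*", "\\1", rs)
# "[^@]*$" allows no further "@" between the match and the end
last_anchored <- sub("@[^@]*$", "", rs)
```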
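Given the match we are looking for, sub and gsub both give you the same answer: because .* runs to the end of the string, there is only ever one match to replace.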
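A small check of that claim, on a made-up string with two @ characters: sub() replaces only the first match and gsub() replaces all of them, so they agree only when the pattern consumes the rest of the string.

```r
x <- "a @ b @ c"

# "@.*" runs to the end of the string, so there is only one match to replace
to_end_sub  <- sub("@.*", "", x)   # "a "
to_end_gsub <- gsub("@.*", "", x)  # "a "

# A pattern that stops short of the end can match again, and the two differ
single_sub  <- sub("@", "", x)     # "a  b @ c"
single_gsub <- gsub("@", "", x)    # "a  b  c"
```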
For completeness: you can also use the stringr package to extract the part you want.
library(stringr)
data_clean_phrase <- c("Copyright © The Society of Geomagnetism and Earth",
"© 2013 Chinese National Committee ")
str_extract(data_clean_phrase, "^(.*?©)") # including the ©
str_extract(data_clean_phrase, "^.*(?=(©))") # excluding the ©
Note: I chose str_extract, but you could also use str_remove, with a pattern that matches the part to drop instead of the part to keep.
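A possible str_remove() version, as a sketch: the pattern now describes what to delete. Keeping the © uses a lookbehind, (?<=©), which the ICU regex engine behind stringr supports; © is again written as the escape \u00a9 to avoid depending on file encoding.

```r
library(stringr)

# The question's data; "\u00a9" is the escape for ©
data_clean_phrase <- c("Copyright \u00a9 The Society of Geomagnetism and Earth",
                       "\u00a9 2013 Chinese National Committee ")

# Remove © and everything after it
without_mark <- str_remove(data_clean_phrase, "\u00a9.*")

# Keep ©: the lookbehind makes the match start just after the ©
with_mark <- str_remove(data_clean_phrase, "(?<=\u00a9).*")
```

str_remove() deletes only the first match (str_remove_all() deletes every match), but since these patterns run to the end of the string, the two behave the same here.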