RegExp.exec() occasionally returns null

javascript regex
2021-03-16 16:21:01

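This is seriously driving me crazy, and I have already spent a disproportionate amount of time trying to figure out what is going on here. So please give me a hand =)

I need to do some RegExp matching on strings in JavaScript. Unfortunately, it behaves very strangely. This code: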

var rx = /(cat|dog)/gi;
var w = new Array("I have a cat and a dog too.", "There once was a dog and a cat.", "I have a cat and a dog too.", "There once was a dog and a cat.","I have a cat and a dog too.", "There once was a dog and a cat.","I have a cat and a dog too.", "There once was a dog and a cat.","I have a cat and a dog too.", "There once was a dog and a cat.","I have a cat and a dog too.", "There once was a dog and a cat.","I have a cat and a dog too.", "There once was a dog and a cat.");

for (var i in w) {
    var m = null;
    m = rx.exec(w[i]);
    if(m){
        document.writeln("<pre>" + i + "\nINPUT: " + w[i] + "\nMATCHES: " + m.slice(1) + "</pre>");
    }else{
        document.writeln("<pre>" + i + "\n'" + w[i] + "' FAILED.</pre>");
    }
}

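returns "cat" and "dog" for the first two elements, as it should, but then some exec() calls start returning null, and I don't understand why.

I posted a Fiddle here, where you can run and edit the code.

So far I have tried this in Chrome and Firefox.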

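4 Answers

Oh, here it is. Because you defined your regex as global, it first matches cat, and then on the second pass of the loop it matches dog. So basically you just need to reset your regex (its internal pointer). See this: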

var w = new Array("I have a cat and a dog too.", "I have a cat and a dog too.", "I have a cat and a dog too.", "I have a cat and a dog too.");

for (var i in w) {
    var rx = /(cat|dog)/gi;
    var m = null;
    m = rx.exec(w[i]);
    if(m){
        document.writeln("<p>" + i + "<br/>INPUT: " + w[i] + "<br/>MATCHES: " + w[i].length + "</p>");
    }else{
        document.writeln("<p><b>" + i + "<br/>'" + w[i] + "' FAILED.</b><br/>" + w[i].length + "</p>");
    }
    document.writeln(m);
}
Wow, "the regex's internal pointer"? Can you recommend a resource on that? Thanks!
2021-04-24 16:21:01
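Wow... I have been writing JavaScript intensively for the past 14 years, and RegExp more and more intensively for the past 8, and this really shocked me. Would I understand this better if I were more proficient in Perl?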
2021-05-11 16:21:01

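A regex object has a lastIndex property, which is updated when you run exec. So when you execute the regex on e.g. "I have a cat and a dog too.", lastIndex is set to 12. The next time you run exec on the same regex object, it starts looking from index 12. So you have to reset the lastIndex property between runs.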
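To make the mechanism concrete, here is a minimal sketch that records lastIndex around each exec() call on one shared global regex. The names sharedRx, sentences and trace are illustrative and not taken from the question.

```javascript
// One global regex shared across several inputs, as in the question.
const sharedRx = /(cat|dog)/gi;
const sentences = [
  "I have a cat and a dog too.",
  "There once was a dog and a cat.",
  "I have a cat and a dog too.",
  "There once was a dog and a cat.",
];

// For each input, record where the search started, what matched,
// and where lastIndex ended up afterwards.
const trace = sentences.map((text) => {
  const startedAt = sharedRx.lastIndex;
  const m = sharedRx.exec(text);
  return { startedAt, match: m ? m[1] : null, lastIndex: sharedRx.lastIndex };
});

console.log(trace);
// [ { startedAt: 0,  match: "cat", lastIndex: 12 },
//   { startedAt: 12, match: "dog", lastIndex: 20 },
//   { startedAt: 20, match: null,  lastIndex: 0  },
//   { startedAt: 0,  match: "dog", lastIndex: 20 } ]
```

The third call starts at index 20, past both animals in that sentence, so it returns null. A failed exec() sets lastIndex back to 0, which is why the following call succeeds again.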

Thanks for the explanation! Setting myRe.lastIndex = 0; before subsequent uses helped a lot.
2021-04-25 16:21:01
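Agreed, this should be the accepted answer. It reuses the same regex object and explains the internal mechanism. The OP should consider changing it.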
2021-05-06 16:21:01
I think this should be the accepted answer, because it follows the best practice of reusing the same regex object.
2021-05-13 16:21:01

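Two things:

  1. The reset mentioned above is only needed when the g (global) flag is used. To solve the problem, I recommend simply assigning 0 to the lastIndex member of the RegExp object. This performs better than destroying and recreating the object.
  2. Be careful when using the in keyword to iterate over an Array object, because it can produce unexpected results with some libraries. Sometimes you should check something like isNaN(i), or, if you know the array has no holes, use a classic for loop.

The code could be: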

var rx = /(cat|dog)/gi;
var w = ["I have a cat and a dog too.", "There once was a dog and a cat.", "I have a cat and a dog too.", "There once was a dog and a cat.","I have a cat and a dog too.", "There once was a dog and a cat.","I have a cat and a dog too.", "There once was a dog and a cat.","I have a cat and a dog too.", "There once was a dog and a cat.","I have a cat and a dog too.", "There once was a dog and a cat.","I have a cat and a dog too.", "There once was a dog and a cat."];

for (var i in w)
 if(!isNaN(i))        // Optional, check it is an element if Array could have some odd members.
  {
   var m = null;
   m = rx.exec(w[i]); // Run
   rx.lastIndex = 0;  // Reset
   if(m)
    {
     document.writeln("<pre>" + i + "\nINPUT: " + w[i] + "\nMATCHES: " + m.slice(1) + "</pre>");
    } else {
     document.writeln("<pre>" + i + "\n'" + w[i] + "' FAILED.</pre>");
    }
  }
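The for...in caveat from point 2 can be illustrated with a small sketch. It assumes a hypothetical library that adds an enumerable method to Array.prototype; the name demoLast is made up for this example.

```javascript
// Hypothetical library code: adds an enumerable method to every array.
Array.prototype.demoLast = function () {
  return this[this.length - 1];
};

const words = ["cat", "dog"];

// for...in visits enumerable property names (as strings), inherited ones included.
const forInKeys = [];
for (const key in words) {
  forInKeys.push(key);
}

// The isNaN() check from the answer filters out the non-numeric key.
const numericKeys = forInKeys.filter((key) => !isNaN(key));

// A classic index loop only ever sees the array elements.
const indexLoopKeys = [];
for (let idx = 0; idx < words.length; idx++) {
  indexLoopKeys.push(idx);
}

console.log(forInKeys);     // ["0", "1", "demoLast"]
console.log(numericKeys);   // ["0", "1"]
console.log(indexLoopKeys); // [0, 1]

delete Array.prototype.demoLast; // undo the demo-only prototype change
```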
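Still, it is better not to use the g flag when you don't want it. There is no point in creating a regex that specifically updates lastIndex only to reset it after every execution.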
2021-05-08 16:21:01
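As a rough illustration of the comment above, the same pattern without the g flag can be reused across inputs with no reset, because exec() on a non-global regex always searches from the start and leaves lastIndex at 0. The names plainRx, animalSentences and firstMatches are illustrative only.

```javascript
const plainRx = /(cat|dog)/i; // same pattern as in the question, minus the g flag
const animalSentences = [
  "I have a cat and a dog too.",
  "There once was a dog and a cat.",
  "I have a cat and a dog too.",
  "There once was a dog and a cat.",
];

// Take the first match of each sentence; no lastIndex bookkeeping is needed.
const firstMatches = animalSentences.map((text) => {
  const m = plainRx.exec(text);
  return m ? m[1] : null;
});

console.log(firstMatches);      // ["cat", "dog", "cat", "dog"]
console.log(plainRx.lastIndex); // 0
```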
This should be the accepted answer. Setting rx.lastIndex = 0 is much better than recreating the RegExp object inside the loop.
2021-05-17 16:21:01
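You may well want to do a global search within one item, and then reset and reuse the regex on the next item. I think the OP's code is just an example to illustrate what they did not understand.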
2021-05-20 16:21:01
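The use case described in the last comment could look roughly like this: a global regex collects every match within one item and is reset before the next item. This is only a sketch; findAllAnimals and animalRx are hypothetical names.

```javascript
const animalRx = /(cat|dog)/gi;

// Collect every match in one string, reusing the same global regex.
function findAllAnimals(text) {
  animalRx.lastIndex = 0; // start each new input from the beginning
  const found = [];
  let m;
  while ((m = animalRx.exec(text)) !== null) {
    found.push(m[1]);
  }
  return found;
}

console.log(findAllAnimals("I have a cat and a dog too."));     // ["cat", "dog"]
console.log(findAllAnimals("There once was a dog and a cat.")); // ["dog", "cat"]
```

The loop ends on a failed exec(), which already sets lastIndex back to 0, so the explicit reset mainly guards against a previous scan that was abandoned halfway.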

I had a similar problem when using only /g, and the solution suggested here did not work for me in Firefox 3.6.8. I got my script working with

var myRegex = new RegExp("my string", "g");

I am adding this in case someone else runs into the same problem with the solution above.

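This "bug" was fixed in ES5. Originally, a regex literal was instantiated only once, so there was no need to store it in a variable. The formerly concise while(/a/g.exec(text)) {...} now has to be written as regex = /a/g; while(regex.exec(text)) {...}. This change probably broke a lot of code on the web, but it is less error-prone. On the other hand, when you want to reset lastIndex after every execution, the correct solution is always to remove the g flag.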
2021-05-09 16:21:01
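A small sketch of the ES5 behavior described in the comment above: each evaluation of a regex literal yields a new object, so a scanning loop has to keep its regex in a variable. The names makeLiteral, storedRx, sampleText and positions are illustrative.

```javascript
// In ES5 and later, evaluating a regex literal creates a fresh RegExp each time.
function makeLiteral() {
  return /a/g;
}
const literalA = makeLiteral();
const literalB = makeLiteral();
console.log(literalA === literalB); // false

// Because of that, the regex used in a scanning loop is stored once,
// so that lastIndex can advance between iterations.
const sampleText = "banana";
const storedRx = /a/g;
const positions = [];
let hit;
while ((hit = storedRx.exec(sampleText)) !== null) {
  positions.push(hit.index);
}
console.log(positions); // [1, 3, 5]
```

Writing the literal directly in the loop condition would create a new regex with lastIndex 0 on every iteration, so the loop would keep finding the first match and never terminate.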