
Commenting regular expressions

IT javascript regex comments

2021-03-15 12:41:03

I am trying to comment regular expressions in JavaScript.

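There seem to be plenty of resources on how to remove comments from code using regular expressions, but none on how to comment the regular expressions themselves in JavaScript so that they are easier to understand.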

5 Answers

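Unfortunately, JavaScript doesn't have a verbose mode for regular expression literals like some other languages do. You may find this interesting, though.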

In lieu of any external libraries, your best bet is to use a normal string and comment that:

var r = new RegExp(
    '('      + // start capture
    '[0-9]+' + // match one or more digits
    ')'        // end capture
); 
r.test('9'); //true

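@plemarquand Escaping becomes a pain. When you embed one language inside a string, you have to account for that string's special characters. For example, /s\/\d+/ becomes 's\\/\\d+'. So you need to be careful when building patterns dynamically. It is basically the same trouble you run into on the server side with a language soup.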

2021-04-28 12:41:03
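To make the escaping problem from the comment above concrete, here is a small sketch comparing a regex literal with the string passed to new RegExp. The variable names are illustrative only.

```javascript
// Regex literal: `\/` escapes the delimiter, `\d` is the digit class.
const fromLiteral = /s\/\d+/;

// String form: each backslash must itself be escaped inside the string literal.
const fromString = new RegExp('s\\/\\d+');

// Forgetting to double the backslash silently changes the pattern:
// '\d' in a plain string is just 'd', so this matches "s/" followed by one or more "d".
const broken = new RegExp('s/\d+');

console.log(fromLiteral.test('s/42'));  // true
console.log(fromString.test('s/42'));   // true
console.log(broken.test('s/42'));       // false
console.log(broken.test('s/dd'));       // true
```

The broken variant does not throw, which is what makes this kind of mistake easy to miss.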

Interesting, but the obvious limitation is that it doesn't let you build regex literals.

2021-05-09 12:41:03

What can you gain from regex literals that you can't get with the "new RegExp()" syntax?

2021-05-12 12:41:03
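One possible middle ground, offered as a sketch and not as something proposed in the answers here, is to write each commented piece as its own regex literal and join the source strings. joinRegExp is a hypothetical helper name.

```javascript
// Hypothetical helper: concatenates the sources of several regex literals
// and compiles the result into a single RegExp.
function joinRegExp(parts, flags) {
  return new RegExp(parts.map((part) => part.source).join(''), flags);
}

const numberPattern = joinRegExp([
  /\+?/,      // optional + sign
  /(\d*)/,    // integer part
  /(\.\d+)/,  // decimal part
]);

console.log(numberPattern.source);         // \+?(\d*)(\.\d+)
console.log(numberPattern.test('+3.14'));  // true
```

Each piece has to be a valid regular expression on its own, so a group cannot open in one piece and close in another.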

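Although JavaScript doesn't natively support multi-line, commented regular expressions, it's easy enough to build something that accomplishes the same thing: use a function that takes a (multi-line, commented) string and returns a regular expression built from that string with the comments and newlines removed.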

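The following snippet imitates the behavior of the x ("extended") flag found in other flavors, which ignores all whitespace characters in a pattern as well as comments, which are denoted with #: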

function makeExtendedRegExp(inputPatternStr, flags) {
  // Remove everything between the first unescaped `#` and the end of a line
  // and then remove all unescaped whitespace
  const cleanedPatternStr = inputPatternStr
    .replace(/(^|[^\\])#.*/g, '$1')
    .replace(/(^|[^\\])\s+/g, '$1');
  return new RegExp(cleanedPatternStr, flags);
}


// The following switches the first word with the second word:
const input = 'foo bar baz';
const pattern = makeExtendedRegExp(String.raw`
  ^       # match the beginning of the line
  (\w+)   # 1st capture group: match one or more word characters
  \s      # match a whitespace character
  (\w+)   # 2nd capture group: match one or more word characters
`);
console.log(input.replace(pattern, '$2 $1'));

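Ordinarily, to represent a backslash in a JavaScript string, you must double-escape each literal backslash, e.g. str = 'abc\\def'. But regular expressions often use many backslashes, and double-escaping makes the pattern a lot less readable. So when writing a JavaScript string with many backslashes, it's a good idea to use a String.raw template literal, which lets a single typed backslash represent a literal backslash without additional escaping.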
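As a quick illustration of the point above, this sketch builds the same pattern string both ways; the variable names are made up for the example.

```javascript
// Ordinary string literal: every backslash must be doubled.
const escaped = '^(\\w+)\\s(\\w+)';

// String.raw template literal: backslashes are kept exactly as typed.
const raw = String.raw`^(\w+)\s(\w+)`;

console.log(escaped === raw);  // true
console.log('foo bar baz'.replace(new RegExp(raw), '$2 $1'));  // bar foo baz
```

Backticks and `${` still have their usual meaning inside a template literal, so patterns containing them need extra care.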

Just as with the standard x modifier, to match an actual # in the string, escape it first, e.g.

foo\#bar     # comments go here


// this function is exactly the same as the one in the first snippet

function makeExtendedRegExp(inputPatternStr, flags) {
  // Remove everything between the first unescaped `#` and the end of a line
  // and then remove all unescaped whitespace
  const cleanedPatternStr = inputPatternStr
    .replace(/(^|[^\\])#.*/g, '$1')
    .replace(/(^|[^\\])\s+/g, '$1');
  return new RegExp(cleanedPatternStr, flags);
}


// The following switches the first word with the second word:
const input = 'foo#bar baz';
const pattern = makeExtendedRegExp(String.raw`
  ^       # match the beginning of the line
  (\w+)   # 1st capture group: match one or more word characters
  \#      # match a hash character
  (\w+)   # 2nd capture group: match one or more word characters
`);
console.log(input.replace(pattern, '$2 $1'));

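Note that to match a literal space character (and not just any whitespace character) while using the x flag in any environment (including the one above), you have to escape the space with a \ first, e.g.: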

^(\S+)\ (\S+)   # capture the first two words

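If you need to match space characters frequently, this can get a bit tedious and make the pattern harder to read, much as double-escaped backslashes do. One possible (non-standard) modification that permits unescaped space characters is to strip only the whitespace at the beginning and end of each line, plus the spaces before a # comment: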

function makeExtendedRegExp(inputPatternStr, flags) {
  // Remove the first unescaped `#`, any preceding unescaped spaces, and everything that follows
  // and then remove leading and trailing whitespace on each line, including linebreaks
  const cleanedPatternStr = inputPatternStr
    .replace(/(^|[^\\]) *#.*/g, '$1')
    .replace(/^\s+|\s+$|\n/gm, '');
  console.log(cleanedPatternStr);
  return new RegExp(cleanedPatternStr, flags);
}


// The following switches the first word with the second word:
const input = 'foo bar baz';
const pattern = makeExtendedRegExp(String.raw`
  ^             # match the beginning of the line
  (\w+) (\w+)   # capture the first two words
`);
console.log(input.replace(pattern, '$2 $1'));

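In several other languages (notably Perl), there's a special x flag. When it is set, the regular expression ignores any whitespace and comments inside it. Sadly, JavaScript regular expressions do not support the x flag.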
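A small sketch of what "not supported" means in practice: passing an unknown flag to the RegExp constructor throws a SyntaxError. supportsFlag is a hypothetical helper written for this example.

```javascript
// Hypothetical helper: reports whether the current engine accepts a given flag.
function supportsFlag(flag) {
  try {
    new RegExp('', flag);
    return true;
  } catch (err) {
    // Unknown or duplicated flags are rejected with a SyntaxError.
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

console.log(supportsFlag('g'));  // true
console.log(supportsFlag('x'));  // false in engines without an x flag
```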

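Lacking syntax, the only way to gain readability is convention. Mine is to add a comment before the tricky regular expression that contains it written out as if you had the x flag. Example: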

/*
  \+?     #optional + sign
  (\d*)   #the integer part
  (       #begin decimal portion
     \.
     \d+  #decimal part
  )
 */
var re = /\+?(\d*)(\.\d+)/;

For more complex examples, you can see what I've done with the technique here and here.

+1 for "#read above, I'm not repeating this crap" (yes, someone does follow your links).

2021-05-11 12:41:03

In 2021 we can do this using template literals, which have String.raw() applied to them.

VerboseRegExp `
    (
        foo*                  // zero or more foos
        (?: bar | baz )       // bar or baz
        quux?                 // maybe a quux
    )
    \s \t \r \n \[ \] \/ \`   // invisible whitespace is ignored ...
    [ ]                       // ... unless you put it in a character class
`
`gimy`                        // flags go here

// returns the RegExp /(foo*(?:bar|baz)quux?)\s\t\r\n\[\]\/\`[ ]/gimy

The implementation of VerboseRegExp:

const VerboseRegExp = (function init_once () {
    const cleanupregexp = /(?<!\\)[\[\]]|\s+|\/\/[^\r\n]*(?:\r?\n|$)/g
    return function first_parameter (pattern) {
        return function second_parameter (flags) {
            flags = flags.raw[0].trim()
            let in_characterclass = false
            const compressed = pattern.raw[0].replace(
                cleanupregexp,
                function on_each_match (match) {
                    switch (match) {
                        case '[': in_characterclass = true; return match
                        case ']': in_characterclass = false; return match
                        default: return in_characterclass ? match : ''
                    }
                }
            )
            return flags ? new RegExp(compressed, flags) : new RegExp(compressed)
        }
    }
})()

See Verbose Regular Expressions in JavaScript for what .raw[0] does.
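As a brief sketch of the mechanism: a tag function receives the template's string chunks as its first argument, and that array's raw property holds the same chunks with backslashes left unprocessed. With no ${} substitutions there is exactly one chunk, so raw[0] is the whole pattern as typed. showChunks is a hypothetical tag written for this example.

```javascript
// Hypothetical tag function: returns the first chunk in both its
// processed ("cooked") and unprocessed ("raw") form.
function showChunks(strings) {
  return { cooked: strings[0], raw: strings.raw[0] };
}

const result = showChunks`\d+\t`;

console.log(result.raw.length);           // 5: backslash, d, +, backslash, t
console.log(result.cooked.length);        // 3: d, +, and a tab character
console.log(JSON.stringify(result.raw));  // "\\d+\\t"
```

The cooked form has already lost the backslash before d, which is why a regex builder needs the raw form.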

I would suggest putting a regular comment above the line with the regular expression in order to explain it.

You will have much more freedom.

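@dystroy Hah :) I would never write a regex for email validation; I was illustrating that regular expressions get really unreadable really fast. This one is from regular-expressions.info/email.html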

2021-04-27 12:41:03

How would a regular comment above the line help in this case: (?:[a-z0-9!#$%&'*+/=?^_{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])$

That is a regular expression for email.

2021-04-29 12:41:03

@BenjaminGruenbaum You do know that your email regex is probably flawed, right?

2021-05-09 12:41:03

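You put a multi-line comment above it and explain it chunk by chunk (the same way Explosion Pills suggests, but above the regex, which is much more convenient if you need to modify it).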

2021-05-10 12:41:03

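+1 for email + regex. But back to the topic: I have been using above-regex comments for a long time and find them quite effective. They leave room to fully describe the intent and how it is implemented.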

2021-05-11 12:41:03
