JavaScript regex to select a quoted string but not escaped quotes

Original string:

some text "some \"string\"right here "

I want to get:

"some \"string\"right here"

I am using the following regex:

/\"(.*?)\"/g

Best answer

Parsing the string correctly with a parser

With a JavaScript regex, it is impossible to reliably start matching at the correct double quote. You will either start the match at an escaped quote, or you will fail to match a real double quote that follows a literal \ (an escaped backslash). Thus, the safest way is to use a parser. Here is a sample one:

var s = "some text \\\"extras\" some \\\"string \\\" right\" here \"";
// s holds: some text \"extras" some \"string \" right" here "
console.log("Incorrect (with regex): ", s.match(/"([^"\\]*(?:\\.[^"\\]*)*)"/g));
var res = [];           // complete quoted substrings found so far
var tmp = "";           // the quoted substring currently being built
var in_quotes = false;  // true while inside a quoted substring
var in_entity = false;  // true when the previous char was an unescaped backslash
for (var i = 0; i < s.length; i++) {
  if (s[i] === '\\' && in_entity === false) { // a backslash starts an escape sequence
    in_entity = true;
    if (in_quotes === true) {
      tmp += s[i];
    }
  } else if (in_entity === true) { // escaped char: consume it, it cannot open or close a quote
    in_entity = false;
    if (in_quotes === true) {
      tmp += s[i];
    }
  } else if (s[i] === '"' && in_quotes === false) { // start a new match
    in_quotes = true;
    tmp += s[i];
  } else if (s[i] === '"' && in_quotes === true) { // append the closing quote and add the match to results
    tmp += s[i];
    res.push(tmp);
    tmp = "";
    in_quotes = false;
  } else if (in_quotes === true) { // append a char to the match
    tmp += s[i];
  }
}
console.log("Correct results: ", res);
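The same state machine can be wrapped in a function so it is easier to reuse and test. This is only a sketch: the name `extractQuoted` and its variable names are hypothetical and not part of the original answer, and it assumes that a backslash escapes exactly one following character and that an unterminated quote at the end of the input should be dropped.

```javascript
// Hypothetical helper (not from the original answer): returns every
// double-quoted substring, quotes included, skipping escaped characters.
function extractQuoted(input) {
  const results = [];
  let current = "";
  let inQuotes = false;
  let escaped = false; // true when the previous char was an unescaped backslash

  for (const ch of input) {
    if (escaped) {
      // The char after a backslash never opens or closes a quote.
      escaped = false;
      if (inQuotes) current += ch;
    } else if (ch === "\\") {
      escaped = true;
      if (inQuotes) current += ch;
    } else if (ch === '"') {
      current += ch;
      if (inQuotes) {
        // Closing quote: the match is complete.
        results.push(current);
        current = "";
      }
      inQuotes = !inQuotes;
    } else if (inQuotes) {
      current += ch;
    }
  }
  return results; // an unterminated trailing quote is silently discarded
}

// The string from the question (String.raw keeps the backslashes as typed).
const fromQuestion = extractQuoted(String.raw`some text "some \"string\"right here "`);
console.log(fromQuestion); // one match: "some \"string\"right here "

// The tricky string from the sample above, where an escaped quote comes first.
const tricky = extractQuoted(String.raw`some text \"extras" some \"string \" right" here "`);
console.log(tricky); // one match: " some \"string \" right"
```

Returning the quotes as part of each match mirrors the sample above; stripping them with `slice(1, -1)` would be a small change if only the inner text is needed.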
  
 

Not-so-safe regex approach

It is not possible to match the string you need with a lazy dot-matching pattern (.*?), since it stops at the first " it meets, whether or not that quote is escaped. If you know your string will never have an escaped quote before a quoted substring, and if you are sure there is no literal \ right before a double quote (these conditions are very strict, but they are what makes the regex safe to use), you can use

/"([^"\\]*(?:\\.[^"\\]*)*)"/g

Pattern details:

" - match a quote
([^"\\]*(?:\\.[^"\\]*)*) - capturing group 1, matching:
    [^"\\]* - 0+ chars other than \ and "
    (?:\\.[^"\\]*)* - zero or more sequences of:
        \\. - any escaped symbol
        [^"\\]* - 0+ chars other than \ and "
" - trailing quote

JS demo:

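var re = /"([^"\\]*(?:\\.[^"\\]*)*)"/g;
var str = `some text "some \\"string\\"right here " some text "another \\"string\\"right here "`;
var res = [];
var m; // declared explicitly so the loop does not create an implicit global
while ((m = re.exec(str)) !== null) {
   res.push(m[1]); // group 1: the text between the quotes
}
if (typeof document !== "undefined") { // browser only, just for demo
   document.body.innerHTML = "<pre>" + JSON.stringify(res, null, 4) + "</pre>";
}
console.log(res); // or another result demo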
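To make the difference concrete, here is a small side-by-side sketch that runs the lazy pattern from the question and the pattern above on the question's input. The variable names are only illustrative, and the input is assumed to satisfy the strict conditions stated above (no escaped quote before the quoted substring).

```javascript
// The input from the question; String.raw keeps the backslashes as typed.
const input = String.raw`some text "some \"string\"right here "`;

const lazy = /"(.*?)"/g;                        // pattern from the question
const unrolled = /"([^"\\]*(?:\\.[^"\\]*)*)"/g; // pattern from the answer

// With the g flag, match() returns the full matches (quotes included).
const lazyMatches = input.match(lazy);
const unrolledMatches = input.match(unrolled);

console.log(lazyMatches);     // two pieces: "some \"   and   "right here "
console.log(unrolledMatches); // one match:  "some \"string\"right here "
```

The lazy pattern treats every " as a delimiter, so it splits the quoted text at the escaped quotes, while the second pattern consumes each backslash together with the character after it.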
  
 
