JavaScript RegEx Syntax


I'm writing C# code to parse JavaScript tokens, and my knowledge of JavaScript is not 100%.

One thing that threw me is that JavaScript regular expressions are not enclosed in quotes. How does a parser detect where they start and where they end? It looks like they start with a / and can contain nearly any character after that.

Note that I'm not asking about the syntax needed to match characters, which is what all my Google searches turn up. I just want to know the rules for determining where a regular expression starts and where it ends.
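For the start of a literal, the answer depends on context rather than on the characters themselves. ES5 section 7 defines two lexical goal symbols: InputElementDiv, used where a leading / or /= is a division operator, and InputElementRegExp, used everywhere else. A tokenizer that does not run a full parser often approximates this by looking at the previous significant token. The sketch below is such a heuristic, not the spec's rule; the PrevKind enum, the SlashContext class, and the keyword list are hypothetical, and it can guess wrong after ) or } (for example `if (x) /re/.test(s)`).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical kinds for the previous significant (non-whitespace, non-comment) token.
enum PrevKind { None, Identifier, Keyword, Number, String, Punctuator }

static class SlashContext
{
    // ES5 keywords after which an expression (and therefore a regex) may begin.
    static readonly HashSet<string> ExprKeywords = new HashSet<string>
    {
        "return", "typeof", "instanceof", "in", "new", "delete",
        "void", "throw", "case", "do", "else"
    };

    // Heuristic: true if a '/' here should start a regular expression literal,
    // false if it should be read as division (or /=).
    public static bool SlashStartsRegex(PrevKind kind, string prevText)
    {
        switch (kind)
        {
            case PrevKind.None:
                return true;                       // start of input
            case PrevKind.Identifier:
            case PrevKind.Number:
            case PrevKind.String:
                return false;                      // a value just ended: division
            case PrevKind.Keyword:
                return ExprKeywords.Contains(prevText); // "this", "true", ... fall through to division
            default:
                // After ')', ']' or '}' a value usually just ended, so guess division.
                // This is exactly where the heuristic can be wrong.
                return prevText != ")" && prevText != "]" && prevText != "}";
        }
    }
}
```

A real parser avoids the guesswork by asking the tokenizer for a regex only when the grammar expects an expression, which is what the two goal symbols express.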

I would consider the following regexp a reasonable approximation:

    /(\\/|[^/])+/([a-zA-Z])*
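To see how far that approximation goes, here is the same pattern used from C# through System.Text.RegularExpressions. The class name RegexLiteralApprox and the helper FirstMatch are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexLiteralApprox
{
    // The approximation as a .NET pattern: '/', then one or more of
    // (escaped slash "\/" or any non-slash character), then '/', then ASCII-letter flags.
    static readonly Regex Literal = new Regex(@"/(\\/|[^/])+/([a-zA-Z])*");

    // Returns the first substring that looks like a regex literal, or null if none.
    public static string FirstMatch(string source)
    {
        Match m = Literal.Match(source);
        return m.Success ? m.Value : null;
    }
}
```

On `var r = /ab\/c/gi;` it returns `/ab\/c/gi`. On `var r = /[/]/;` it returns only `/[/`, because an unescaped / inside a character class is legal but ends the match early, and on `a / b / c` it returns `/ b /` because it cannot tell division from a regex. `[^/]` also matches line terminators, so unlike the grammar it can run across lines.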

The rules are formally defined in section 7.8.5 of the ECMAScript 5 specification:

    RegularExpressionLiteral ::
        / RegularExpressionBody / RegularExpressionFlags

    RegularExpressionBody ::
        RegularExpressionFirstChar RegularExpressionChars

    RegularExpressionChars ::
        [empty]
        RegularExpressionChars RegularExpressionChar

    RegularExpressionFirstChar ::
        RegularExpressionNonTerminator but not one of * or \ or / or [
        RegularExpressionBackslashSequence
        RegularExpressionClass

    RegularExpressionChar ::
        RegularExpressionNonTerminator but not one of \ or / or [
        RegularExpressionBackslashSequence
        RegularExpressionClass

    RegularExpressionBackslashSequence ::
        \ RegularExpressionNonTerminator

    RegularExpressionNonTerminator ::
        SourceCharacter but not LineTerminator

    RegularExpressionClass ::
        [ RegularExpressionClassChars ]

    RegularExpressionClassChars ::
        [empty]
        RegularExpressionClassChars RegularExpressionClassChar

    RegularExpressionClassChar ::
        RegularExpressionNonTerminator but not one of ] or \
        RegularExpressionBackslashSequence

    RegularExpressionFlags ::
        [empty]
        RegularExpressionFlags IdentifierPart
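One consequence of RegularExpressionFirstChar excluding * and / is that a literal can never be empty and never start with *: where a regex is allowed, // and /* begin comments instead (the ES5 note in 7.8.5 says // starts a single-line comment rather than an empty regex). A minimal sketch of that first decision, with hypothetical names SlashStart and SlashClassifier, could look like this.

```csharp
using System;

enum SlashStart { LineComment, BlockComment, RegexLiteral }

static class SlashClassifier
{
    // Given the index of a '/' in a position where a regex literal is allowed,
    // decide what it starts. Only one character of lookahead is used; whether
    // the literal is actually well formed must be checked by scanning the rest.
    public static SlashStart Classify(string src, int slashIndex)
    {
        if (slashIndex < 0 || slashIndex >= src.Length || src[slashIndex] != '/')
            throw new ArgumentException("slashIndex must point at a '/'");

        char next = slashIndex + 1 < src.Length ? src[slashIndex + 1] : '\0';
        if (next == '/') return SlashStart.LineComment;   // FirstChar may not be '/'
        if (next == '*') return SlashStart.BlockComment;  // FirstChar may not be '*'
        return SlashStart.RegexLiteral;
    }
}
```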

Full specification

Here is some quick and dirty code that might get you started.

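    using System.Collections.Generic;
    using System.Text;

    // Character cursor with a stack of saved positions for backtracking.
    class CharStream
    {
        private readonly Stack<int> _states;
        private readonly string _input;
        private readonly int _length;
        private int _index;   // index of the last consumed character (-1 = none yet)

        public char Current
        {
            get { return _input[_index]; }
        }

        public CharStream(string input)
        {
            _states = new Stack<int>();
            _input = input;
            _length = input.Length;
            _index = -1;
        }

        // Advances to the next character; returns false at end of input.
        public bool Next()
        {
            if (_index + 1 >= _length)
                return false;
            _index++;
            return true;
        }

        // Consumes the next character only if it equals c.
        public bool ExpectNext(char c)
        {
            if (_index + 1 >= _length)
                return false;
            if (_input[_index + 1] != c)
                return false;
            _index++;
            return true;
        }

        // Un-consumes the current character.
        public bool Back()
        {
            if (_index < 0)
                return false;
            _index--;
            return true;
        }

        // Saves the current position so a failed production can rewind.
        public void PushState()
        {
            _states.Push(_index);
        }

        // Discards the last saved position after a production succeeds,
        // so the stack stays balanced for enclosing productions.
        public void DropState()
        {
            _states.Pop();
        }

        // Rewinds to the last saved position and returns default(T)
        // (null for string), so callers can write "return cs.PopState<string>();".
        public T PopState<T>()
        {
            _index = _states.Pop();
            return default(T);
        }
    }

    static class RegexLiteralParser
    {
        // RegularExpressionLiteral :: / RegularExpressionBody / RegularExpressionFlags
        public static string ParseRegularExpressionLiteral(CharStream cs)
        {
            string body, flags;
            cs.PushState();
            if (!cs.ExpectNext('/'))
                return cs.PopState<string>();
            if ((body = ParseRegularExpressionBody(cs)) == null)
                return cs.PopState<string>();
            if (!cs.ExpectNext('/'))
                return cs.PopState<string>();
            if ((flags = ParseRegularExpressionFlags(cs)) == null)
                return cs.PopState<string>();
            cs.DropState();
            return "/" + body + "/" + flags;
        }

        // RegularExpressionBody :: RegularExpressionFirstChar RegularExpressionChars
        static string ParseRegularExpressionBody(CharStream cs)
        {
            string firstChar, chars;
            cs.PushState();
            if ((firstChar = ParseRegularExpressionFirstChar(cs)) == null)
                return cs.PopState<string>();
            if ((chars = ParseRegularExpressionChars(cs)) == null)
                return cs.PopState<string>();
            cs.DropState();
            return firstChar + chars;
        }

        // RegularExpressionChars :: [empty] | RegularExpressionChars RegularExpressionChar
        // Never fails: it stops at the first character that is not a RegularExpressionChar.
        static string ParseRegularExpressionChars(CharStream cs)
        {
            var sb = new StringBuilder();
            string @char;
            while ((@char = ParseRegularExpressionChar(cs)) != null)
                sb.Append(@char);
            return sb.ToString();
        }

        // Stubs still to be written: each should return the matched text,
        // or null (after rewinding) if its production does not match.
        static string ParseRegularExpressionFirstChar(CharStream cs) { return null; }
        static string ParseRegularExpressionChar(CharStream cs) { return null; }
        static string ParseRegularExpressionBackslashSequence(CharStream cs) { return null; }
        static string ParseRegularExpressionNonTerminator(CharStream cs) { return null; }
        static string ParseRegularExpressionClass(CharStream cs) { return null; }
        static string ParseRegularExpressionClassChars(CharStream cs) { return null; }
        static string ParseRegularExpressionClassChar(CharStream cs) { return null; }
        static string ParseRegularExpressionFlags(CharStream cs) { return null; }
    }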

As for how to find the end of the literal: the trick is to recursively follow the productions I have listed. Consider the production RegularExpressionBody. Reading it tells me that it requires a RegularExpressionFirstChar followed by RegularExpressionChars. Notice that RegularExpressionChars is either [empty] or RegularExpressionChars RegularExpressionChar, so it is defined in terms of itself. Once that production terminates with [empty], the only valid next character is the closing /. If it is not found, this is not a valid literal.
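The procedure described above (follow RegularExpressionBody until RegularExpressionChars bottoms out at [empty], then require the closing /) can be sketched as a standalone scanner over a string. This is a minimal sketch, not a complete lexer: the class RegexLiteralScanner and method Scan are hypothetical names, it assumes the caller has already decided that the / starts a regex rather than a division, and it approximates IdentifierPart for flags as letters, digits, $ and _.

```csharp
using System;

static class RegexLiteralScanner
{
    // LineTerminator per ES5 7.3: LF, CR, LINE SEPARATOR, PARAGRAPH SEPARATOR.
    static bool IsLineTerminator(char c) =>
        c == '\n' || c == '\r' || c == '\u2028' || c == '\u2029';

    // Approximation of IdentifierPart (no Unicode escapes or combining marks).
    static bool IsFlagChar(char c) => char.IsLetterOrDigit(c) || c == '$' || c == '_';

    // Scans a RegularExpressionLiteral starting at src[start] == '/'.
    // Returns the index just past the flags, or -1 if the text is not valid.
    public static int Scan(string src, int start)
    {
        if (start < 0 || start >= src.Length || src[start] != '/') return -1;
        int i = start + 1;

        // RegularExpressionFirstChar: not '*' and not '/' (the body cannot be empty).
        if (i >= src.Length || src[i] == '*' || src[i] == '/') return -1;

        // RegularExpressionChars: consume RegularExpressionChar until only '/' can follow.
        while (true)
        {
            if (i >= src.Length || IsLineTerminator(src[i])) return -1;
            char c = src[i];
            if (c == '/') break;                                      // [empty]: body ends
            if (c == '\\') { if (!ScanBackslash(src, ref i)) return -1; }
            else if (c == '[') { if (!ScanClass(src, ref i)) return -1; }
            else i++;                                                 // plain NonTerminator
        }

        i++;                                                          // closing '/'
        while (i < src.Length && IsFlagChar(src[i])) i++;             // RegularExpressionFlags
        return i;
    }

    // BackslashSequence: '\' followed by any non-line-terminator character.
    static bool ScanBackslash(string src, ref int i)
    {
        if (i + 1 >= src.Length || IsLineTerminator(src[i + 1])) return false;
        i += 2;
        return true;
    }

    // Class: '[' ClassChars ']'. A '/' inside the class does not end the literal.
    static bool ScanClass(string src, ref int i)
    {
        i++;                                                          // skip '['
        while (true)
        {
            if (i >= src.Length || IsLineTerminator(src[i])) return false;
            char c = src[i];
            if (c == ']') { i++; return true; }
            if (c == '\\') { if (!ScanBackslash(src, ref i)) return false; }
            else i++;
        }
    }
}
```

For example, `Scan("x = /a/i.test(y)", 4)` returns 8, so the literal is `src.Substring(4, 8 - 4)`, i.e. `/a/i`. Unlike the one-line regexp approximation, `/[/]/g` is scanned as a single literal because the class production is followed explicitly.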

