Below are some useful regular expressions for matching C-like primitive data values.
Before presenting them, I’ll first introduce some named patterns:
/* universal character name */
UCN     (\\u[0-9a-fA-F]{4}|\\U[0-9a-fA-F]{8})
/* exponent part of a floating point number */
EXP     ([Ee][-+]?[0-9]+)
/* length part of an integer number */
ILEN    ([Uu](L|l|LL|ll)?|(L|l|LL|ll)[Uu]?)
With these named patterns as building blocks, we can describe more complex tokens.
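To make the building-block idea concrete, here is a minimal Python sketch of how a `{NAME}` reference could be expanded into its definition. It assumes lex/flex-style references; the `DEFINITIONS` table and the `expand` helper are hypothetical names introduced only for this illustration, and real scanner generators do this expansion internally.

```python
import re

# Named patterns transcribed from the definitions above.
DEFINITIONS = {
    "UCN": r"(\\u[0-9a-fA-F]{4}|\\U[0-9a-fA-F]{8})",
    "EXP": r"([Ee][-+]?[0-9]+)",
    "ILEN": r"([Uu](L|l|LL|ll)?|(L|l|LL|ll)[Uu]?)",
}

def expand(pattern):
    """Replace each {NAME} reference with its definition.

    Repetition counts such as {4} or {1,3} are left alone because a
    name must start with a letter or an underscore.
    """
    return re.sub(
        r"\{([A-Za-z_][A-Za-z0-9_]*)\}",
        lambda m: DEFINITIONS[m.group(1)],  # callable avoids escape processing
        pattern,
    )

if __name__ == "__main__":
    print(expand(r"[0-9]+{EXP}[flFL]?"))
```

The definitions are already wrapped in parentheses, so a trailing `?` after a reference applies to the whole expanded group.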
For integer numbers:
/* integer in octal form */
0[0-7]*{ILEN}?
/* integer in decimal form */
[1-9][0-9]*{ILEN}?
/* integer in hexadecimal form */
0[Xx][0-9a-fA-F]+{ILEN}?
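As a quick way to try the integer patterns, the sketch below transcribes them into Python's `re` syntax, with `{ILEN}` spliced in by string concatenation. The `classify_integer` helper and the `INTEGER_PATTERNS` table are names made up for this example, not part of any scanner.

```python
import re

# Length suffix, transcribed from the ILEN named pattern.
ILEN = r"([Uu](L|l|LL|ll)?|(L|l|LL|ll)[Uu]?)"

INTEGER_PATTERNS = {
    "octal": r"0[0-7]*" + ILEN + r"?",
    "decimal": r"[1-9][0-9]*" + ILEN + r"?",
    "hexadecimal": r"0[Xx][0-9a-fA-F]+" + ILEN + r"?",
}

def classify_integer(text):
    """Return the name of the first pattern matching the whole text, or None."""
    for name, pattern in INTEGER_PATTERNS.items():
        if re.fullmatch(pattern, text):
            return name
    return None

if __name__ == "__main__":
    for sample in ["0", "017", "42UL", "0x1Fu", "089"]:
        print(sample, classify_integer(sample))
```

Note that a lone `0` is classified as octal, because the decimal pattern requires a leading non-zero digit.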
For floating point numbers:
/* floating point number in decimal form */
([0-9]*\.[0-9]+|[0-9]+\.){EXP}?[flFL]?
[0-9]+{EXP}[flFL]?
/* floating point number in hexadecimal form */
0[Xx]([0-9a-fA-F]*\.[0-9a-fA-F]+|[0-9a-fA-F]+\.?)[Pp][-+]?[0-9]+[flFL]?
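The floating point patterns can be exercised the same way. This is a sketch in Python's `re` syntax, with `{EXP}` substituted by concatenation; `is_float_literal` is a hypothetical helper used only here.

```python
import re

# Exponent part, transcribed from the EXP named pattern.
EXP = r"([Ee][-+]?[0-9]+)"

FLOAT_PATTERNS = [
    # decimal form with a point; the exponent is optional
    r"([0-9]*\.[0-9]+|[0-9]+\.)" + EXP + r"?[flFL]?",
    # decimal form without a point; the exponent is required
    r"[0-9]+" + EXP + r"[flFL]?",
    # hexadecimal form; the binary exponent is always required
    r"0[Xx]([0-9a-fA-F]*\.[0-9a-fA-F]+|[0-9a-fA-F]+\.?)[Pp][-+]?[0-9]+[flFL]?",
]

def is_float_literal(text):
    """Return True when the whole text matches one of the patterns."""
    return any(re.fullmatch(p, text) for p in FLOAT_PATTERNS)

if __name__ == "__main__":
    for sample in ["3.14", ".5f", "1.", "1e10", "0x1.8p3", "42"]:
        print(sample, is_float_literal(sample))
```

The second decimal pattern is what keeps a plain `42` out of the floating point category: without a point, the exponent is mandatory.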
For character and string literals:
/* character literal */
\'([^'\\]|\\['"?\\abfnrtv]|\\[0-7]{1,3}|\\[Xx][0-9a-fA-F]+|{UCN})+\'
/* string literal */
L?\"([^"\\]|\\['"?\\abfnrtv]|\\[0-7]{1,3}|\\[Xx][0-9a-fA-F]+|{UCN})*\"
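Both literal patterns share the same set of escape sequences, which the Python sketch below factors into one string. The names `ESCAPE`, `CHAR_LITERAL`, `STRING_LITERAL`, and `matches` are assumptions of this example. The quotes are written unescaped because Python's `re` does not need `\'` or `\"`.

```python
import re

# Universal character name, transcribed from the UCN named pattern.
UCN = r"(\\u[0-9a-fA-F]{4}|\\U[0-9a-fA-F]{8})"

# Escape sequences common to both literals: simple, octal, hex, and UCN.
ESCAPE = r"""\\['"?\\abfnrtv]|\\[0-7]{1,3}|\\[Xx][0-9a-fA-F]+|""" + UCN

# One or more characters between single quotes.
CHAR_LITERAL = r"'([^'\\]|" + ESCAPE + r")+'"
# Zero or more characters between double quotes, with an optional L prefix.
STRING_LITERAL = r'L?"([^"\\]|' + ESCAPE + r')*"'

def matches(pattern, text):
    """Return True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

if __name__ == "__main__":
    print(matches(CHAR_LITERAL, r"'\n'"))
    print(matches(STRING_LITERAL, r'"a\"b"'))
```

Because the character literal uses `+`, it accepts multi-character constants such as `'ab'` but rejects the empty `''`, while the string literal uses `*` and accepts `""`.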
The following regular expression matches identifiers, so it does not describe a primitive data value. However, because it is central to most compilers and closely related to primitive data types and values, we present it here:
/* identifier */
([_a-zA-Z]|{UCN})([_a-zA-Z0-9]|{UCN})*
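For completeness, here is the identifier pattern as a small Python sketch, with `{UCN}` substituted by concatenation. `IDENTIFIER` and `is_identifier` are hypothetical names for this illustration.

```python
import re

# Universal character name, transcribed from the UCN named pattern.
UCN = r"(\\u[0-9a-fA-F]{4}|\\U[0-9a-fA-F]{8})"

# A letter, underscore, or UCN, followed by any number of letters,
# digits, underscores, or UCNs.
IDENTIFIER = r"([_a-zA-Z]|" + UCN + r")([_a-zA-Z0-9]|" + UCN + r")*"

def is_identifier(text):
    """Return True when the whole text matches the identifier pattern."""
    return re.fullmatch(IDENTIFIER, text) is not None

if __name__ == "__main__":
    for sample in ["_foo1", r"caf\u00e9", "9lives"]:
        print(sample, is_identifier(sample))
```

The pattern does not exclude keywords; a scanner typically handles those with separate rules or a lookup table.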