kmarius/jsregexp

jsregexp

Provides ECMAScript regular expressions for Lua 5.1, 5.2, 5.3, 5.4 and LuaJIT. Uses libregexp from Fabrice Bellard's QuickJS.

Installation

To install jsregexp globally with luarocks, run

sudo luarocks install jsregexp

To install jsregexp for a different Lua version (in this case Lua 5.1 or LuaJIT), run

sudo luarocks --lua-version 5.1 install jsregexp

To install jsregexp locally for your user, run

luarocks --local --lua-version 5.1 install jsregexp

This places the compiled module in $HOME/.luarocks/lib/lua/5.1, so $HOME/.luarocks/lib/lua/5.1/?.so must be added to package.cpath.
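If you would rather not change LUA_CPATH in your shell, package.cpath can also be extended from Lua before the first require. A minimal sketch, assuming the default ~/.luarocks tree, a Lua 5.1/LuaJIT install and that HOME is set; the variable names are illustrative:

```lua
-- Per-user module directory used by `luarocks --local --lua-version 5.1`
-- (assumes the default ~/.luarocks tree and that HOME is set)
local home = os.getenv("HOME") or ""
local local_cpath = home .. "/.luarocks/lib/lua/5.1/?.so"

-- Prepend the pattern once so the user-local build is found first
if not package.cpath:find(local_cpath, 1, true) then
  package.cpath = local_cpath .. ";" .. package.cpath
end

-- From here on, require("jsregexp") also searches ~/.luarocks/lib/lua/5.1
```

Alternatively, luarocks path (run with the same --lua-version) prints shell commands that set LUA_PATH and LUA_CPATH for the local tree.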

Running make in the project's root compiles the module jsregexp.so (tested on Linux only).

Usage

This module provides two functions:

jsregexp.compile(regex, flags?)
jsregexp.compile_safe(regex, flags?)

that take an ECMAScript regular expression as a string and an optional string of flags. The most notable flags are:

  • "d": provide tables with begin/end indices of match groups in match objects
  • "i": case-insensitive search
  • "g": match globally
  • "n": enable named groups (this flag does not exist in JavaScript; it must be set manually when named groups are needed)
  • "u": Unicode (UTF-16) mode; set implicitly when non-ASCII characters are detected in the pattern string

The complete list of flags can be found in the JavaScript reference.
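A short sketch of how the flag string changes matching, using only compile, test and the flag fields of the RegExp object; the patterns are arbitrary examples:

```lua
local jsregexp = require("jsregexp")

-- Without "i" the match is case-sensitive; with "i" it is not
local strict = jsregexp.compile("hello")
local relaxed = jsregexp.compile("hello", "i")
local strict_hit = strict:test("HELLO")    -- no match: case differs
local relaxed_hit = relaxed:test("HELLO")  -- match

-- Several flags go into one string; each one sets a field on the object
local both = jsregexp.compile("o", "gi")
print(strict_hit, relaxed_hit, both.global, both.ignore_case, both.multiline)
```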

On success, compile and compile_safe return a RegExp object. On failure, compile throws an error, while compile_safe returns nil and an error message.
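The two failure modes can be handled as follows. A minimal sketch; the deliberately broken pattern is just one example of a syntax error:

```lua
local jsregexp = require("jsregexp")

local bad_pattern = "(unclosed"  -- syntax error: missing ")"

-- compile raises a Lua error, so wrap it in pcall to recover
local ok, err1 = pcall(jsregexp.compile, bad_pattern)
if not ok then
  print("compile failed: " .. tostring(err1))
end

-- compile_safe reports the failure through its return values instead
local re, err2 = jsregexp.compile_safe(bad_pattern)
if not re then
  print("compile_safe failed: " .. err2)
end
```

compile_safe suits user-supplied patterns, where a syntax error is an expected outcome rather than a bug.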

RegExp object

Each RegExp object re has the following fields:

re.last_index   -- the position at which re:exec or re:test will search for the next match (see notes below)
re.source       -- the regexp string
re.flags        -- a string representing the active flags
re.dot_all      -- is the dot_all flag set?
re.global       -- is the global flag set?
re.has_indices  -- is the indices flag set?
re.ignore_case  -- is the ignore_case flag set?
re.multiline    -- is the multiline flag set?
re.sticky       -- is the sticky flag set?
re.unicode      -- is the unicode flag set?

Calling tostring on a RegExp object returns a representation of the form "/<source>/<flags>".
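Inspecting a compiled object: a sketch that reads only the fields listed above (the pattern and flags are arbitrary):

```lua
local jsregexp = require("jsregexp")

local re = jsregexp.compile("a+b", "gi")

print(re.source)      -- the pattern as passed to compile
print(re.flags)       -- string of active flags, here containing "g" and "i"
print(re.global, re.ignore_case, re.multiline)
print(tostring(re))   -- "/" .. re.source .. "/" .. re.flags
```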

The RegExp object re has the following methods corresponding to JavaScript regular expressions:

re:exec(str)                      -- returns the next match of re in str (see notes below)
re:test(str)                      -- returns true if the regex matches str (see notes below)
re:match(str)                     -- returns a list of all matches or nil if no match
re:match_all(str)                 -- returns a closure that repeatedly calls re:exec, to be used in for-loops
re:match_all_list(str)            -- returns a list of all matches
re:search(str)                    -- returns the 1-based index of the first match of re in str, or -1 if no match
re:split(str, limit?)             -- splits str at each match of re, returning at most limit pieces
re:replace(str, replacement)      -- replaces the first match of re in str with replacement (every match, if global)
re:replace_all(str, replacement)  -- replaces every match of re in str with replacement

For the documentation of the behaviour of each of these functions, see the JavaScript reference.
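A sketch of the stateless helpers side by side. The patterns and inputs are arbitrary; replacement strings are kept literal (no $ substitutions) so the example depends only on the behaviour described in the list above:

```lua
local jsregexp = require("jsregexp")

-- search: 1-based position of the first match, or -1
local email = jsregexp.compile("\\w+@\\w+\\.com")
local pos = email:search("mail bob@example.com")  -- "bob" starts at 6
local miss = email:search("no address here")      -- -1
local ok = email:test("mail bob@example.com")     -- true

-- match with "g": all matches, or nil when there are none
local digits = jsregexp.compile("\\d+", "g")
local all = digits:match("a1b22c333")
local none = digits:match("abc")

-- split at a comma followed by optional whitespace
local parts = jsregexp.compile(",\\s*"):split("a, b,c")

-- replace touches the first match only, unless the regexp is global
local once = jsregexp.compile("o"):replace("foo boo", "0")
local every = jsregexp.compile("o", "g"):replace("foo boo", "0")

print(pos, miss, ok, #all, none, table.concat(parts, "|"), once, every)
```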

Note: Each RegExp object has a field last_index, the position at which the next call to exec or test starts searching. Both methods update last_index after each call. Before reusing them for a fresh search (for example on a new string), reset last_index to 1.
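The stateful behaviour of exec shows up when looping until no match is left. A sketch assuming exec returns nil once no further match exists (mirroring JavaScript's null); the pattern and input are arbitrary:

```lua
local jsregexp = require("jsregexp")

local re = jsregexp.compile("\\d+", "g")
local str = "a1b22c333"

-- Each exec call starts at re.last_index and moves it past the match found
local found = {}
local m = re:exec(str)
while m do
  found[#found + 1] = { text = m[0], index = m.index }  -- match and 1-based start
  m = re:exec(str)
end
for _, f in ipairs(found) do
  print(f.text, f.index)
end

-- Rewind explicitly before starting a new search from the beginning
re.last_index = 1
local again = re:exec("x7")
print(again[0])
```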

Note: Because the underlying regexp engine works with UTF-16 rather than UTF-8, the input string is converted to UTF-16 when necessary. Calling exec or test repeatedly on non-ASCII strings can therefore introduce a large overhead, since the conversion is repeated on every call. The match* methods only need to convert the input once, so they are usually the better choice.
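For non-ASCII input, a sketch of the single-conversion approach with match_all_list. It assumes the matched text in each match object comes back as UTF-8, which is what a Lua caller would expect; the pattern and input are arbitrary:

```lua
local jsregexp = require("jsregexp")

local re = jsregexp.compile("[^ ]+", "g")
local str = "héllo wörld"  -- UTF-8 input containing non-ASCII characters

-- A single call converts the input once and returns every match object
local words = {}
for _, m in ipairs(re:match_all_list(str)) do
  words[#words + 1] = m[0]
end
print(table.concat(words, ","))
```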

Match object

A match object m returned by exec and the match* functions has the following fields:

m[0]             -- the full match
m[i]             -- match group i
m.input          -- the input string
m.capture_count  -- number of capture groups
m.index          -- start of the full match (1-based)
m.groups         -- table of the named groups and their content
m.indices        -- table of begin/end indices of all match groups (if "d" flag is set)
m.indices.groups -- table of named groups and their begin/end indices (if "d" flag is set)

Calling tostring on a match object returns the full match m[0].
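A sketch that exercises the match-object fields, combining the "n" flag with named groups; the date pattern and input are only examples:

```lua
local jsregexp = require("jsregexp")

-- "n" enables named groups such as (?<year>...)
local re = jsregexp.compile("(?<year>\\d{4})-(?<month>\\d{2})", "n")
local str = "date: 2024-05"

local m = re:exec(str)
if m then
  print(m[0], m[1], m[2])             -- full match, then groups 1 and 2
  print(m.groups.year, m.groups.month)
  print(m.index)                      -- 1-based start of the match
  print(m.input == str, tostring(m) == m[0])
end
```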

Example

local jsregexp = require("jsregexp")

-- compile_safe returns nil and an error message instead of throwing
local re, err = jsregexp.compile_safe("(\\w)\\w*", "g")
if not re then
	print(err)
	return
end

local str = "Hello World"

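-- match_all yields one match object per iteration
for match in re:match_all(str) do
	print(match) -- tostring(match) gives the full match m[0]
	for j, group in ipairs(match) do
		print(j, group) -- ipairs starts at 1, so only the capture groups are listed
	end
end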