Regular Expression Blueprints Documentation

Introduction

What are regular expressions?

Regular expressions are a compact pattern language used to search and match text. They are a powerful and yet infuriatingly difficult tool. This documentation will gently introduce you to how regular expressions work and take you through a few examples.

Cower before Regular Expressions!

When running a regular expression you provide the regex program with a block of text to search on and a pattern with which to search. The regex matcher returns a multi-dimensional array of match groups. Consider the following:

The Fool wander'd forth, his mind ablaze.
"Stop!" shouted a man "For I am the Knight of Pentacles, and this is MY land!"
"Your land?" he questioned "How can this be your land when the Queen hangs no Star in her keep?"
The Knight of Pentacles grew angry, and lowered his pike to do battle.
Search String
(Queen|Fool|Knight)(?:.+) (?:his|her) ((?:\w+)(?:['])?(?:\w+))
Search Pattern
Group 0
  [Full Match] Fool wander'd forth, his mind
  [1] Fool
  [2] mind
Group 1
  [Full Match] Queen hangs no Star in her keep
  [1] Queen
  [2] keep
Group 2
  [Full Match] Knight of Pentacles grew angry, and lowered his pike
  [1] Knight
  [2] pike
Search Results

Here you can see a typical search string, a search pattern and a search result.
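
If you'd like to poke at the same search outside the editor, here is a minimal sketch using Python's built-in re module. It is not how the plugin is implemented. The names SEARCH_TEXT, PATTERN and find_groups are made up for illustration, and the list of dictionaries only imitates the layout of the results shown above.

```python
import re

# The four lines of the search string, joined with line breaks.
SEARCH_TEXT = (
    "The Fool wander'd forth, his mind ablaze.\n"
    '"Stop!" shouted a man "For I am the Knight of Pentacles, and this is MY land!"\n'
    '"Your land?" he questioned "How can this be your land when the Queen hangs no Star in her keep?"\n'
    "The Knight of Pentacles grew angry, and lowered his pike to do battle."
)

PATTERN = r"(Queen|Fool|Knight)(?:.+) (?:his|her) ((?:\w+)(?:['])?(?:\w+))"


def find_groups(pattern, text):
    """Return one dict per match: key 0 is the full match, 1..n are the capture groups."""
    results = []
    for match in re.finditer(pattern, text):
        row = {0: match.group(0)}
        for index, value in enumerate(match.groups(), start=1):
            row[index] = value
        results.append(row)
    return results


results = find_groups(PATTERN, SEARCH_TEXT)
for number, row in enumerate(results):
    print(f"Group {number}: {row}")
```

The second line of the search string produces no result because it contains no standalone his or her after Knight.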

The pattern is the most difficult part to understand. It looks like nonsense. In fact it is: regular expressions are nonsense. Nobody should have to learn them. That's why I made this plugin.

Here's what the above setup looks like in blueprint form:

Incomparable beauty

The pattern-builder nodes are generally self-explanatory, but in order to know how to build the above results you will still need to understand regex and how it works. You still need to be able to visualise the pattern behind the nodes. So let's break it all down.

Queen|Fool|Knight

What this essentially says is "I want Queen OR Fool OR Knight." | is our OR delimiter, and so this node produces Queen|Fool|Knight.
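
As a quick sanity check, the OR behaviour can be tried in plain Python. This is only a sketch with Python's re module, not the plugin's node, and the sample sentence is invented.

```python
import re

# Alternation: match any one of the listed words.
OR_PATTERN = r"Queen|Fool|Knight"

sample = "The Fool met the Knight, but the Page stayed home."
found = re.findall(OR_PATTERN, sample)
print(found)  # ['Fool', 'Knight']
```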

( )

This node captures a single instance of the pattern provided to it. We already know that we have Queen|Fool|Knight, and this turns it into (Queen|Fool|Knight). The parentheses denote a capture group, which means the contents will end up in our results array.

Contrast this node with CaptureAll, which produces ( )+. Unlike CaptureOne, it will continue to capture as many instances of the pattern as possible, each as a new result group, until it runs out of instances to match.

(?:.+)

Let's speed things up a little and grab the next two nodes. MatchAnything produces .+. On its own this is a fairly useless term and must be wrapped in a group. For our purposes however we don't actually want to keep the results of this part of the pattern, so we wrap it in a Discard node, which gives us (?:.+).

In this instance the period . means match any one character (by default, anything except a line break), and the plus + modifies it to match one or more. The discard group first defines it as a group by using parentheses ( ) and then marks the group as non-capturing with ?:, so its contents are left out of the results.
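
The difference between a captured and a discarded group is easy to see side by side. The following is a small Python sketch, assuming the standard re module rather than the plugin's nodes. The text matched is identical in both cases and only the reported groups change.

```python
import re

line = "The Fool wander'd forth, his mind ablaze."

# Same pattern twice: once capturing the .+ part, once discarding it with ?:
captured = re.search(r"(Fool)(.+)", line)
discarded = re.search(r"(Fool)(?:.+)", line)

print(captured.groups())   # ('Fool', " wander'd forth, his mind ablaze.")
print(discarded.groups())  # ('Fool',)
print(captured.group(0) == discarded.group(0))  # True
```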

Group modifiers seem to appear almost randomly in regex, but it is possible to learn them by rote. Pretend that it's some kind of arcane magic adventure with word treasure at the end and it'll be a lot more fun.

(?:his|her)

Our next group is another Or node. We also discard this result. Because .+ is greedy, the MatchAnything group first grabs as much of the line as it can, then gives characters back until this group can match. In practice that means the previous group swallows everything up to the last his or her on the line.
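
To see how the MatchAnything group and the Or group share a line, here is a hedged Python sketch. The sentence is invented, the word pattern is simplified to \w+, and the lazy variant .+? is not something this documentation uses. It is only there for comparison.

```python
import re

# A made-up line with two candidate words after "his".
line = "The Knight drew his sword and raised his shield high."

# Greedy .+ takes as much as it can, so the LAST "his" wins.
greedy = re.search(r"(Knight)(?:.+) (?:his|her) (\w+)", line)

# Lazy .+? takes as little as it can, so the FIRST "his" wins.
lazy = re.search(r"(Knight)(?:.+?) (?:his|her) (\w+)", line)

print(greedy.group(2))  # shield
print(lazy.group(2))    # sword
```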

((?:\w+)(?:['])?(?:\w+))

Oh boy, here I go regexing again. SimpleMatchAnyWholeWord is an example of where regex both shines and becomes an unfathomable nightmare. It's enough to say that (?:\w+)(?:['])?(?:\w+) is set up to match any whole word with an optional single apostrophe inside it, such as don't, omg or whywouldyouwritethis. Because each \w+ needs at least one character, a one-letter word such as a will not match.

Then, because we don't want the individual results such as don, ' or t, we discard them. But we still want the whole word, so we wrap it in a CaptureOne group again, reversing the discard behaviour but getting the accumulated results as one single group instead. This has the effect of matching the first word after the last result, e.g. his pike, her keep, his mind.
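
The whole-word pattern can be tested on its own. This is a rough sketch with Python's re module, and the list of candidate words is just for illustration.

```python
import re

# Outer parentheses capture the whole word; the inner groups are discarded.
WORD = r"((?:\w+)(?:['])?(?:\w+))"

for candidate in ["don't", "omg", "whywouldyouwritethis", "a"]:
    match = re.fullmatch(WORD, candidate)
    # A single letter fails: both \w+ terms need at least one character each.
    print(candidate, "->", match.groups() if match else None)
```

Each successful match reports exactly one group holding the whole word, never the separate pieces.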

(Queen|Fool|Knight)(?:.+) (?:his|her) ((?:\w+)(?:['])?(?:\w+))

Finally, we have to append it all together into its final form. My pattern wouldn't match unless I put a couple of spaces between the patterns. This is because we were matching on words with \w, which never matches whitespace. In short: we know words are delimited by spaces and the whitespace still needs to be accounted for to allow the pattern to match, so we can just hardcode it in instead of messing around with more nodes and making our pattern more complicated than it needs to be.
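
Here is a small Python sketch of why the hardcoded spaces matter, again assuming the re module as a stand-in for the plugin. The variable names are illustrative only.

```python
import re

line = "The Knight of Pentacles grew angry, and lowered his pike to do battle."

with_spaces = r"(Queen|Fool|Knight)(?:.+) (?:his|her) ((?:\w+)(?:['])?(?:\w+))"
without_spaces = r"(Queen|Fool|Knight)(?:.+)(?:his|her)((?:\w+)(?:['])?(?:\w+))"

# With the spaces, "his" is followed by a space and then the word we want.
print(re.findall(with_spaces, line))     # [('Knight', 'pike')]

# Without them, \w+ would have to start right after "his", where a space sits.
print(re.findall(without_spaces, line))  # []
```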

This brings us full circle back to our results array:

Group 0
  [Full Match] Fool wander'd forth, his mind
  [1] Fool
  [2] mind
Group 1
  [Full Match] Queen hangs no Star in her keep
  [1] Queen
  [2] keep
Group 2
  [Full Match] Knight of Pentacles grew angry, and lowered his pike
  [1] Knight
  [2] pike

This result can be printed to the screen or the output log by using the PrintRegularExpressionResult node, but its actual structure is:

TArray <FRegexResult>
    { TMap <int32, FString> }
Result structure

To access the results, you first need to loop over the TArray part of the results, and on each array member access the TMap row you're interested in by index.
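
As a rough analogy only, the same access pattern looks like this in Python, with a list standing in for the TArray and a dictionary standing in for each TMap. The data is hard-coded from the results above and the variable names are made up.

```python
# Stand-in for TArray<FRegexResult>: each member maps a row index to a string.
results = [
    {0: "Fool wander'd forth, his mind", 1: "Fool", 2: "mind"},
    {0: "Queen hangs no Star in her keep", 1: "Queen", 2: "keep"},
    {0: "Knight of Pentacles grew angry, and lowered his pike", 1: "Knight", 2: "pike"},
]

pairs = []
for result in results:      # loop over the array part
    who = result[1]         # row 1: the first capture group
    what = result[2]        # row 2: the second capture group
    pairs.append((who, what))

print(pairs)  # [('Fool', 'mind'), ('Queen', 'keep'), ('Knight', 'pike')]
```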

Your results will pretty much always look the same as long as your pattern structure doesn't change. I absolutely recommend parameterizing patterns. I do not recommend ever using a dynamic pattern structure. This way rows 1 and 2 will always contain the results we want.
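
One way to picture parameterizing is a fixed template whose slots are filled in from lists. The sketch below is Python, not Blueprint, and build_pattern is a hypothetical helper. The point is that the group layout never changes, whatever words are passed in.

```python
import re


def build_pattern(subjects, pronouns):
    """Fill the two Or slots; the capture groups always stay at rows 1 and 2."""
    subject_part = "|".join(re.escape(word) for word in subjects)
    pronoun_part = "|".join(re.escape(word) for word in pronouns)
    return rf"({subject_part})(?:.+) (?:{pronoun_part}) ((?:\w+)(?:['])?(?:\w+))"


pattern = build_pattern(["Queen", "Fool", "Knight"], ["his", "her"])
match = re.search(
    pattern,
    "The Knight of Pentacles grew angry, and lowered his pike to do battle.",
)
print(match.group(1), match.group(2))  # Knight pike
```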

Quite often FullMatch is actually the result you want. It is always row 0. However, in this case we're only interested in parts of our result and FullMatch has returned things we don't want.

With a bit of the magic of programming we can use our results thusly:

The Knight has a pike, 
but the Queen has a keep, 
but the Fool has a mind.
Not quite a haiku.
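
For the curious, the text above can be assembled from rows 1 and 2 along these lines. This is a Python sketch with the results hard-coded. The reversed order and the wording are simply copied from the output above, not taken from the plugin's demo actor.

```python
results = [
    {0: "Fool wander'd forth, his mind", 1: "Fool", 2: "mind"},
    {0: "Queen hangs no Star in her keep", 1: "Queen", 2: "keep"},
    {0: "Knight of Pentacles grew angry, and lowered his pike", 1: "Knight", 2: "pike"},
]

lines = []
for position, row in enumerate(reversed(results)):
    # The first line starts the sentence; later lines continue it.
    prefix = "The" if position == 0 else "but the"
    lines.append(f"{prefix} {row[1]} has a {row[2]}")

poem = ",\n".join(lines) + "."
print(poem)
```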

This is a fairly silly example of how regex works, but by now you should be able to see the unlimited power of regular expressions. They're highly useful for extracting text for various programmatic functions or cleaning up dynamic text input. They can be used for everything from string sanitisation to pulling prices or data out of paragraphs of text.

My regular expression doesn't work!

A natural step in the development of any regular expression pattern is complete and utter bewilderment. You are not alone.

I am intending to add a real-time, in-graph preview of what a regex pattern is doing, but until I do I highly recommend the tool found at this website: https://regex101.com/

For sanity's sake I leave it open when I'm working on regex and I test everything there first. It also has an insightful summary on what each part of your pattern is doing, which can really help you start to understand how regex works.

Don't forget to check the plugin content folder, as there is a demo map and actor there that you can look at to find all of the examples used in this documentation.

If you're still stuck, jump on our Discord server (https://discord.gg/PxnTuxs) and we'll help you out. Tips and gratitude-donations made out to [email protected] are much appreciated.

Other useful nodes in this plugin

Replace Diacritics

ReplaceDiacritics

This is not a pattern node and should be used on its own. ReplaceDiacritics strips a string of diacritical marks such as accents and other marks favoured by death-metal bands. In every possible case it will replace the accented or otherwise umlauted version of a letter with its nearest low-ASCII match. This can be useful if you'd like to use a user's character name as the index in a TMap or something similar. Do it, there's no reason they should be special.
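
To give a feel for the idea, here is one common way to do this in Python using Unicode decomposition. This is an assumption about the general technique, not a description of how the node works internally, and replace_diacritics is an illustrative name. Letters that have no decomposed form, such as ø, pass through this sketch unchanged.

```python
import unicodedata


def replace_diacritics(text):
    """Split accented letters into base letter + mark, then drop the marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(replace_diacritics("Motörhead"))   # Motorhead
print(replace_diacritics("Señor Café"))  # Senor Cafe
```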

Initial Capitals

InitialCapitals

InitialCapitals will take a string and convert the first letter to uppercase while leaving any other mixed case letters alone. You can also do this per-word. Useful stuff. This is also not a pattern node and is standalone.
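
The described behaviour can be sketched in a few lines of Python. The function name and the per_word flag are hypothetical, and splitting on single spaces is an assumption about what counts as a word.

```python
def initial_capitals(text, per_word=False):
    """Uppercase the first letter only; every other letter keeps its case."""
    if not per_word:
        return text[:1].upper() + text[1:]
    return " ".join(word[:1].upper() + word[1:] for word in text.split(" "))


print(initial_capitals("the knight of pENTACLES"))
# The knight of pENTACLES
print(initial_capitals("the knight of pENTACLES", per_word=True))
# The Knight Of PENTACLES
```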

PrintStringPlus

PrintStringPlus was born out of frustration at my own short-sightedness. I literally needed to make the on-screen debug text bigger. This node exposes a number of already-present arguments that PrintString doesn't, as well as adding features such as telling you how many times a particular key has been called.

If left as None, Key will be ignored and the node will output to a new line each time it is called, like an ordinary PrintString node. However, if a Key is provided, new instances of output will replace earlier on-screen instances using that key. Count Occurrences will cause the on-screen output to also note the number of times each key has been used. The Text parameter can change, but as long as the Key remains the same that data will be grouped and counted.
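
The keyed behaviour is easier to follow as a toy model. The Python class below is purely illustrative. The class name, the method signature and the "(x2)" display format are all assumptions, and nothing here draws to the screen.

```python
class KeyedDebugLog:
    """Toy model of keyed debug messages; not the plugin's implementation."""

    def __init__(self):
        self.lines = []    # un-keyed messages, one new line per call
        self.keyed = {}    # key -> latest text (newer text replaces older)
        self.counts = {}   # key -> number of times the key has been used

    def print_string_plus(self, text, key=None, count_occurrences=False):
        if key is None:
            self.lines.append(text)
            return text
        self.keyed[key] = text
        self.counts[key] = self.counts.get(key, 0) + 1
        if count_occurrences:
            return f"{text} (x{self.counts[key]})"
        return text


log = KeyedDebugLog()
log.print_string_plus("Loading")
log.print_string_plus("Loading")
log.print_string_plus("Health: 90", key="health", count_occurrences=True)
shown = log.print_string_plus("Health: 75", key="health", count_occurrences=True)

print(log.lines)  # ['Loading', 'Loading']
print(log.keyed)  # {'health': 'Health: 75'}
print(shown)      # Health: 75 (x2)
```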