Tokenizing a String : Using Regular Expressions

Posted on

In my previous blog post I gave some PL/SQL that performed the tokenising of a string. Check out this blog post here.

Thanks also to the people who sent me links examples of how to tokenise a string using the MODEL clause. Yes there are lots of examples of this out there on the interest.

While performing the various searches on the internet I did come across some examples of using Regular Expressions to extract the tokens. The following example is thanks to a blog post by Tanel Poder

I’ve made some minor changes to it to remove any of the special characters we want to remove.

column token format a40

define separator=” “

define mystring=”$My OTN LA Tour (2014?) will consist of Panama, CostRica and Mexico.”

define myremove=”\?|\#|\$|\.|\,|\;|\:|\&|\(|\)|\-“;

 

SELECT regexp_replace(REGEXP_REPLACE(

                 REGEXP_SUBSTR( ‘&mystring’||’&separator’, ‘(.*?)&separator’, 1, LEVEL )

                           , ‘&separator$’, ”), ‘&myremove’, ”) TOKEN

FROM

    DUAL

CONNECT BY

    REGEXP_INSTR( ‘&mystring’||’&separator’, ‘(.*?)&separator’, 1, LEVEL ) > 0

ORDER BY

LEVEL ASC

/

When we run this code we get the following output.

Token fun 3

So we have a number of options open to use to tokenise strings using SQL and PL/SQL, using a number of approaches including substring-ing, using pipelined functions, using the Model clause and also using Regular Expressions.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s