How to match across multiple lines using regex?

I’m trying to do a regular expression search and replace for things like this: doc: text text next line next line</a:documentation> What I can’t figure out is how to do this multi-line regular expression match. If I use “dot matches all” then my match is not limited to just what’s shown but covers everything between the first doc: and the last </a:documentation>. If I do e.g. “doc:(.)</a:documentation>” then it only matches cases where it all happens to be on one line. Is this possible? What bit of regex fu am I missing? Thanks, E. -- Eliot Kimber Senior Solutions Architect "Bringing Strategy, Content, and Technology Together" Main: 512.554.9368 www.reallysi.com www.rsuitecms.com

At 2014-01-11 13:17 -0600, Eliot Kimber wrote:
I’m trying to do a regular expression search and replace for things like this:
doc: text text next line next line</a:documentation>
What I can’t figure out is how to do this multi-line regular expression match. If I use “dot matches all” then my match is not limited to just what’s shown but covers everything between the first doc: and the last </a:documentation>.
If I do e.g. “doc:(.)</a:documentation>” then it only matches cases where it all happens to be on one line.
Is this possible? What bit of regex fu am I missing?
I think it is the use of "\n". Try this to include all of the lines: doc:((.|\n)*?):documentation (note that I'm guessing what your boundaries are because of a mailer problem) I hope this helps. . . . . . . . . Ken -- Public XSLT, XSL-FO, UBL & code list classes: Melbourne, AU May 2014 | Contact us for world-wide XML consulting and instructor-led training | Free 5-hour lecture: http://www.CraneSoftwrights.com/links/udemy.htm | Crane Softwrights Ltd. http://www.CraneSoftwrights.com/x/ | G. Ken Holman mailto:gkholman@CraneSoftwrights.com | Google+ profile: http://plus.google.com/+GKenHolman-Crane/about | Legal business disclaimers: http://www.CraneSoftwrights.com/legal |
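A minimal sketch of Ken's suggestion, for readers who want to try it outside the editor. The sample text and the `<a:documentation>` start tag are assumptions (the original boundaries were garbled by the mailer), the closing boundary is written out in full, and Python's `re` module stands in for Oxygen's regex engine, which may differ in details.

```python
import re

# Hypothetical sample: two documentation blocks, each spanning several lines.
SAMPLE = (
    "<a:documentation>doc: first text\n"
    "next line\n"
    "next line</a:documentation>\n"
    "<element/>\n"
    "<a:documentation>doc: second text\n"
    "more text</a:documentation>\n"
)

# With no flags set, "." does not match a newline, so the alternation
# (.|\n) lets the group cross line boundaries. The trailing ? makes the
# * quantifier lazy, so each match stops at the nearest closing tag.
PATTERN = re.compile(r"doc:((.|\n)*?)</a:documentation>")

# group(1) is the outer group: everything between the two boundaries.
matches = [m.group(1) for m in PATTERN.finditer(SAMPLE)]
for body in matches:
    print(repr(body))
```

Each block is reported separately, with its line breaks preserved inside the captured text.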

That works. But here’s another question: why is the outer group with the ? operator required? I tried just (.|\n)*, which should match anything between my start and end strings, but it does not, where ((.|\n)*?) does. Thanks, E.
_______________________________________________ oXygen-user mailing list oXygen-user@oxygenxml.com http://www.oxygenxml.com/mailman/listinfo/oxygen-user

Hi Eliot, The ? operator makes it a lazy match (first best match) instead of a greedy match (longest possible match). Without the ? operator the expression is greedy and will match a lot more content than expected. You can use either (.|\n)*? or .*? with "dot matches all", but in either case the ? operator is the one that limits the search to the first best match. Regards, Adrian Adrian Buza oXygen XML Editor and Author Support Tel: +1-650-352-1250 ext.202 Fax: +40-251-461482 support@oxygenxml.com http://www.oxygenxml.com
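The greedy versus lazy behaviour described above can be seen in a short sketch. The sample text is hypothetical, Python's `re` module is used only as a stand-in for the editor's engine, and the group is written as non-capturing, `(?:.|\n)`, purely so that `re.findall` returns whole matches.

```python
import re

# Hypothetical sample with two separate documentation blocks.
SAMPLE = (
    "doc: first\nblock</a:documentation>\n"
    "<other/>\n"
    "doc: second\nblock</a:documentation>\n"
)

# Greedy: * takes as much as it can, then backs off to the LAST end tag.
greedy = re.findall(r"doc:(?:.|\n)*</a:documentation>", SAMPLE)

# Lazy: *? takes as little as it can, stopping at the NEAREST end tag.
lazy = re.findall(r"doc:(?:.|\n)*?</a:documentation>", SAMPLE)

print(len(greedy))  # 1: a single match spanning both blocks
print(len(lazy))    # 2: one match per block
```

The only difference between the two patterns is the `?` after `*`, which is what keeps each match confined to a single block.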

Thanks, that clarifies things. I did look in the online help but didn’t find any specific guidance on regular expressions beyond “Oxygen uses Perl regular expressions”. I think at least a brief topic on regular expression syntax would be helpful, with a pointer to more information elsewhere. Cheers, E.

Hi, Starting with v15.2 we'll include in the Oxygen user manual a link to the Perl 5 regular expression documentation: http://perldoc.perl.org/perlre.html#Regular-Expressions However, there are probably some more useful Java regex tutorials out there: http://www.vogella.com/tutorials/JavaRegularExpressions/article.html Regards, Adrian

Hi, I think Ken misread the problem but solved it anyway. Shouldn't the "dot matches all" take care of the line breaks, while the ? (correctly placed) in the expression prevents it from being too greedy? So, something like "doc:.+?</a:documentation>"? Cheers, Wendell Wendell Piez | http://www.wendellpiez.com XML | XSLT | electronic publishing Eat Your Vegetables _____oo_________o_o___ooooo____ooooooo_^
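Wendell's variant can be sketched the same way. Here Python's `re.DOTALL` flag plays the role of the "dot matches all" option; the sample text is hypothetical, and whether a given tool also accepts the inline `(?s)` form shown at the end is an assumption to verify against that tool's documentation.

```python
import re

# Hypothetical sample: two multi-line documentation blocks.
SAMPLE = (
    "<a:documentation>doc: first\nblock</a:documentation>\n"
    "<a:documentation>doc: second\nblock</a:documentation>\n"
)

# DOTALL lets "." match newlines; +? is the lazy form of +, so each
# match ends at the nearest closing tag instead of the last one.
pattern = re.compile(r"doc:.+?</a:documentation>", re.DOTALL)
found = pattern.findall(SAMPLE)

# The same option written inline at the start of the pattern.
inline = re.findall(r"(?s)doc:.+?</a:documentation>", SAMPLE)

print(found == inline)  # True
print(len(found))       # 2
```

Note that `.+?` requires at least one character between the boundaries, whereas `(.|\n)*?` also accepts an empty body.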
participants (4)
- Eliot Kimber
- G. Ken Holman
- Oxygen XML Editor Support
- Wendell Piez