[Solved] RegExp how to get multiline text before two words?

Post by Evgeniy »

Code:

Code: Select all

import re

test="""
<test> text
text
text
</test>


<test> text
text
text
</test>"""
print(re.search(r'<test>(.|\n)*<\/test>',test).group())

This returns:

Code: Select all

<test> text
text
text
</test>


<test> text
text
text
</test>
instead of:

Code: Select all

<test> text
text
text
</test>
Last edited by Evgeniy on Mon Sep 27, 2021 12:16 pm, edited 3 times in total.

Post by openBrain »

Evgeniy wrote: Fri Sep 24, 2021 12:55 pm Code:

Code: Select all

test="""
<test> text
text
text
</test>


<test> text
text
text
</test>"""
print(re.search(r'<test>(.|\n)*<\/test>',test).group())
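What you're looking for is called "non-greedy" matching. By default * is greedy and matches as much as possible, so it runs up to the last </test>; adding ? makes it stop at the first one. It should be (not tested):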

Code: Select all

print(re.search(r'<test>(.|\n)*?<\/test>',test).group())

Post by Evgeniy »

The "non-greedy" variant that worked:

Code: Select all

print(re.search(r'<test>((.|\n)*?)<\/test>',test).group())

Code: Select all

<test> text
text
text
</test> 

Post by openBrain »

Evgeniy wrote: Fri Sep 24, 2021 4:46 pm "non greedy" worked variant:
Ideally, you should mark the inner parentheses as non-capturing with (?:...):

Code: Select all

print(re.search(r'<test>((?:.|\n)*?)<\/test>',test).group())
But it is still simpler to use a character class:

Code: Select all

print(re.search(r'<test>([\s\S]*?)<\/test>',test).group())
Or a flag:

Code: Select all

print(re.search(r'<test>(.*?)<\/test>',test,re.DOTALL).group())
Last edited by openBrain on Mon Sep 27, 2021 7:23 am, edited 1 time in total.

Post by Evgeniy »

openBrain wrote: Fri Sep 24, 2021 5:29 pm Optimally you should tell the inner parentheses are non capturing with :

Code: Select all

print(re.search(r'<test>((? :.|\n)*?)<\/test>',test).group())
This example does not work:
raise source.error("unknown extension ?" + char,
re.error: unknown extension ? at position 8


OK, but how do I write the expression to get the result without "<test>" and "</test>"?
I can use:

Code: Select all

print(re.search(r'<test>((.|\n)*?)<\/test>',test).group().replace("<test>","").replace("</test>",""))
But maybe there is a better way?

Post by openBrain »

Evgeniy wrote: Mon Sep 27, 2021 7:21 am This is not worked example:
Sorry, there was an extra space in the regex because I was typing on a mobile. I fixed it in the original post.
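
For anyone hitting the same error: after "(?" the regex parser expects an extension character such as ":", so a space there is rejected when the pattern is compiled. A minimal sketch of that, with bad_pattern, good_pattern and failed as illustrative names only:

```python
import re

bad_pattern = r'<test>((? :.|\n)*?)</test>'   # stray space after "(?"
good_pattern = r'<test>((?:.|\n)*?)</test>'   # correct non-capturing group

failed = False
try:
    re.compile(bad_pattern)
except re.error as exc:
    # The error is raised at compile time, before any text is searched.
    failed = True
    print("invalid pattern:", exc)

# The corrected pattern compiles without complaint.
compiled = re.compile(good_pattern)
print(failed)
```").group(1) == "x\ny"
</test>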
But maybe there is a better way?
Just use the groups correctly. ;)

Code: Select all

print(re.search(r'<test>([\s\S]*?)<\/test>',test).group(1))

Post by Evgeniy »

Thanks. Now the question is definitely solved.