Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX ODT substitution when many HTML tags in string #32515

Open
wants to merge 5 commits into
base: 18.0
Choose a base branch
from

Conversation

thomas-Ngr
Copy link
Contributor

FIX ODT substitution : if a string contains many times the same HTML tag, only the first opening tag and the last closing tag are used.

To reproduce :

  • create a translation string containing two times the same HTML tag, : I <b>contain</b> two <b>bold</b> tags
    two_strong_tags_string

  • create an ODT template with this translation string and upload it for any object
    two_strong_tags_tempalte

  • generate the ODT

  • see that the string is incorrectly rendered
    two_strong_tags_result

This PR fixes this behavior and adds a test to lock the correct behavior.

As a side enhancement, the PR reverses the orders of parameters in test assertEquals() method since they were not in the right order.

@thomas-Ngr thomas-Ngr closed this Jan 3, 2025
@thomas-Ngr thomas-Ngr reopened this Jan 6, 2025
@thomas-Ngr thomas-Ngr force-pushed the 18_fix_many_tags_in_string branch from 72693f2 to 78bf506 Compare January 6, 2025 08:44
const FIND_TAGS_REGEX = '/<([A-Za-z0-9]+)(?:\s([A-Za-z]+(?:\-[A-Za-z]+)?(?:=(?:".*?")|(?:[0-9]+))))*(?:(?:\s\/>)|(?:>(.*)<\/\1>))/s';
const FIND_ENCODED_TAGS_REGEX = '/&lt;([A-Za-z]+)(?:\s([A-Za-z]+(?:\-[A-Za-z]+)?(?:=(?:".*?")|(?:[0-9]+))))*(?:(?:\s\/&gt;)|(?:&gt;(.*)&lt;\/\1&gt;))/';
const FIND_TAGS_REGEX = '/<([A-Za-z0-9]+)(?:\s([A-Za-z]+(?:\-[A-Za-z]+)?(?:=(?:".*?")|(?:[0-9]+))))*(?:(?:\s\/>)|(?:>(((?!<\1>).)*)<\/\1>))/s';
const FIND_ENCODED_TAGS_REGEX = '/&lt;([A-Za-z]+)(?:\s([A-Za-z]+(?:\-[A-Za-z]+)?(?:=(?:".*?")|(?:[0-9]+))))*(?:(?:\s\/&gt;)|(?:&gt;(((?!&lt;\1&gt;).)*)&lt;\/\1&gt;))/';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note, FIND_ENCODED_TAGS_REGEX seems to be never used into all dolibarr source code, maybe a code cleanup @eldy to simplify the code ?

@@ -46,8 +46,8 @@ class Odf
public $userdefined=array();

const PIXEL_TO_CM = 0.026458333;
const FIND_TAGS_REGEX = '/<([A-Za-z0-9]+)(?:\s([A-Za-z]+(?:\-[A-Za-z]+)?(?:=(?:".*?")|(?:[0-9]+))))*(?:(?:\s\/>)|(?:>(.*)<\/\1>))/s';
const FIND_ENCODED_TAGS_REGEX = '/&lt;([A-Za-z]+)(?:\s([A-Za-z]+(?:\-[A-Za-z]+)?(?:=(?:".*?")|(?:[0-9]+))))*(?:(?:\s\/&gt;)|(?:&gt;(.*)&lt;\/\1&gt;))/';
const FIND_TAGS_REGEX = '/<([A-Za-z0-9]+)(?:\s([A-Za-z]+(?:\-[A-Za-z]+)?(?:=(?:".*?")|(?:[0-9]+))))*(?:(?:\s\/>)|(?:>(((?!<\1(\s.*)?>).)*)<\/\1>))/s';
Copy link
Contributor Author

@thomas-Ngr thomas-Ngr Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(((?!<\1(\s.*)?>).)*)
This groups matches any character (last .) as much as we want (last*) EXCEPT (?!) the first capturing group between opening and closing chevron (<\1>). It allows any other chars between the chevrons, if a space separates them from the tag((\s.*)?) : this is a lazy way to allow html attributes.

NB - In case of intricated tags <b class="truc">quatre<i>troc</i></b>, we want the match to take the outer tags, because _getDataFromHtml() is called recursively on the tag inner content.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants