@automattic/interpolate-components check #13634

Open
matzeeable opened this issue Jan 23, 2025 · 4 comments
Labels
enhancement Adding or requesting a new feature.

Comments

@matzeeable

Describe the problem

We use POST /api/translations/(string: project)/(string: component)/(string: language)/autotranslate/ with auto_source=mt and engines=deepl. This works well, but strings that contain variables can trigger check errors.
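For readers who want to reproduce the call, here is a minimal sketch of how such a request could be built with the Python standard library. Only auto_source and engines come from the description above; the host name, project and component slugs, token, and the values for mode, filter_type and threshold are assumptions chosen for illustration. The request is only constructed, not sent.

```python
import json
import urllib.request


def build_autotranslate_request(base_url, project, component, language, token):
    """Build (but do not send) a POST request for the autotranslate endpoint."""
    url = f"{base_url}/api/translations/{project}/{component}/{language}/autotranslate/"
    payload = {
        "mode": "translate",      # assumption: store results as translations
        "filter_type": "todo",    # assumption: only touch unfinished strings
        "auto_source": "mt",      # from the issue: use machine translation
        "engines": ["deepl"],     # from the issue: DeepL only
        "threshold": 80,          # assumption: illustrative quality threshold
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical host, slugs and token, for illustration only.
request = build_autotranslate_request(
    "https://weblate.example.com", "my-project", "my-component", "sk", "wlu_example"
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it; that step is left out so the sketch has no side effects.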

In our case, we use @automattic/interpolate-components, which allows interpolating React components into strings in the frontend.

Example source string:

This template is {{strong}}%smachine translated{{/strong}} into your language {{languages/}} and has not yet been checked by a human translator.

which DeepL translates into Slovak as:

Táto šablóna je {{strong}}%strojovo preložená{{/strong}} do vášho jazyka {{jazyky/}} a zatiaľ nebola skontrolovaná ľudským prekladateľom.

In this case, {{languages/}} should not be translated.
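For this particular kind of error, a deterministic repair may already be enough. The sketch below is not part of Weblate or of interpolate-components; restore_placeholders is a hypothetical helper, and it rests on a strong assumption: the machine translation keeps the placeholders in the same order and number as the source, so each translated placeholder can be overwritten with the source placeholder at the same position. The pattern is deliberately looser than a strict name match so that translated names with non-ASCII letters are still found.

```python
import re

# Matches {{name}}, {{/name}} and {{name/}}, whatever characters the name uses.
PLACEHOLDER_RE = re.compile(r"\{\{/?[^{}/]+/?\}\}")


def restore_placeholders(source, target):
    """Overwrite placeholders in target with those from source, by position.

    Returns None when the counts differ, because a positional mapping
    would then be a guess.
    """
    source_tokens = PLACEHOLDER_RE.findall(source)
    target_tokens = PLACEHOLDER_RE.findall(target)
    if len(source_tokens) != len(target_tokens):
        return None
    replacements = iter(source_tokens)
    return PLACEHOLDER_RE.sub(lambda match: next(replacements), target)


source = (
    "This template is {{strong}}%smachine translated{{/strong}} into your "
    "language {{languages/}} and has not yet been checked by a human translator."
)
target = (
    "Táto šablóna je {{strong}}%strojovo preložená{{/strong}} do vášho jazyka "
    "{{jazyky/}} a zatiaľ nebola skontrolovaná ľudským prekladateľom."
)
fixed = restore_placeholders(source, target)
print(fixed)
```

If a language legitimately reorders placeholders, this positional approach would put them in the wrong places, which is one reason a human review or an LLM-based fix may still be preferable.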

Image

We already work around this with a custom check that raises an error in such cases:

Image

Describe the solution you would like

With OpenAI's gpt-4o-mini, for example, I can successfully fix the translation: https://chatgpt.com/share/67920307-bbe0-8002-a653-4a6db846c721

Image

Prompt:

You are tasked with proofreading a translation from English to Slovak. The original English text is:

\`\`\`
This template is {{strong}}%smachine translated{{/strong}} into your language {{languages/}} and has not yet been checked by a human translator.
\`\`\`

The Slovak translation provided is:

\`\`\`
Táto šablóna je {{strong}}%strojovo preložená{{/strong}} do vášho jazyka {{jazyky/}} a zatiaľ nebola skontrolovaná ľudským prekladateľom.
\`\`\`

While the translation itself is accurate and should remain unchanged, the interpolation variables (enclosed in double curly brackets `{{` and `}}`) are incorrectly translated. Your task is to correct only the interpolation variables in the Slovak translation without altering any other part of the text.

Return only the fixed translation text in plain text format and nothing else.

The feature request

It would be great to be able to define so-called "Automatic Fixers" next to "Checks". A fixer would let you configure an LLM and a prompt per rule; when the corresponding check fails, Weblate would automatically try to fix the translation. In our case, I can imagine the following flow:

  1. Navigate to "Checks"
  2. Click on "Add automatic fixer"
  3. Select @automattic/interpolate-components as the check to fix
  4. Select OpenAI's gpt-4o-mini
  5. Fill the textarea "Prompt" with the following prompt:
You are tasked with proofreading a translation from %sourceLanguage% to %targetLanguage%. The original %sourceLanguage% text is:

%sourceText%

The %targetLanguage% translation provided is:

%targetText%

While the translation itself is accurate and should remain unchanged, the interpolation variables (enclosed in double curly brackets `{{` and `}}`) are incorrectly translated. Your task is to correct only the interpolation variables in the %targetLanguage% translation without altering any other part of the text.

Return only the fixed translation text in plain text format and nothing else.
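To make the %name% variables in the proposed prompt concrete, here is a minimal sketch of how such a template could be filled in before it is sent to a model. The function render_prompt and the shortened template are hypothetical; nothing here is an existing Weblate feature. The substitution runs in a single pass, so a percent sign inside the inserted strings (such as %s in the source text) is not interpreted again.

```python
import re

# Shortened version of the proposed prompt, for illustration only.
PROMPT_TEMPLATE = (
    "You are tasked with proofreading a translation from %sourceLanguage% "
    "to %targetLanguage%. The original %sourceLanguage% text is:\n\n"
    "%sourceText%\n\n"
    "The %targetLanguage% translation provided is:\n\n"
    "%targetText%\n\n"
    "Return only the fixed translation text in plain text format and nothing else."
)


def render_prompt(template, values):
    """Replace each %name% with values[name]; unknown names are left untouched."""
    def substitute(match):
        return values.get(match.group(1), match.group(0))

    return re.sub(r"%(\w+)%", substitute, template)


prompt = render_prompt(
    PROMPT_TEMPLATE,
    {
        "sourceLanguage": "English",
        "targetLanguage": "Slovak",
        "sourceText": "This template is {{strong}}%smachine translated{{/strong}} "
                      "into your language {{languages/}}.",
        "targetText": "Táto šablóna je {{strong}}%strojovo preložená{{/strong}} "
                      "do vášho jazyka {{jazyky/}}.",
    },
)
print(prompt)
```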

Describe alternatives you have considered

No response

Screenshots

No response

Additional context

No response

@nijel
Member

nijel commented Jan 23, 2025

The check for "@automattic/interpolate-components" is not part of Weblate; have you considered contributing it? If it implemented placeholder highlighting, machine translation via DeepL would also handle the placeholders correctly and no fixes would be needed.

@matzeeable
Author

Here is the content of the custom checks.py:

import re

from weblate.checks.base import TargetCheck

# Matches {{name}}, {{/name}} and {{name/}} and captures the part inside the braces.
# This is a simple version and might need more robust handling for edge cases.
TAG_RE = re.compile(r"\{\{(/?[a-zA-Z0-9_]+/?)\}\}")


class AutomatticInterpolateComponentsCheck(TargetCheck):
    """Flag translations whose interpolation placeholders differ from the source."""

    check_id = "automatic_interpolate_components"
    name = "@automattic/interpolate-components mismatch"
    description = "Interpolation placeholders are inconsistent between source and translation."

    def check_single(self, source, target, unit):
        # The check fails (returns True) when the sorted placeholder lists differ.
        return self.get_tags(source) != self.get_tags(target)

    @staticmethod
    def get_tags(text):
        # Sort the tags so that a translation may reorder placeholders freely.
        return ", ".join(sorted(TAG_RE.findall(text)))
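
To see what this check reports for the example from the description, the tag extraction can be run outside Weblate. The snippet below copies only the get_tags logic into a standalone function, so it needs no Weblate installation; the strings are the ones quoted earlier in this issue.

```python
import re


def get_tags(text):
    """Return the sorted placeholder names of text as one comma-separated string."""
    matches = re.findall(r"\{\{(/?[a-zA-Z0-9_]+/?)\}\}", text)
    return ", ".join(sorted(matches))


source = (
    "This template is {{strong}}%smachine translated{{/strong}} into your "
    "language {{languages/}} and has not yet been checked by a human translator."
)
target = (
    "Táto šablóna je {{strong}}%strojovo preložená{{/strong}} do vášho jazyka "
    "{{jazyky/}} a zatiaľ nebola skontrolovaná ľudským prekladateľom."
)

print(get_tags(source))                      # /strong, languages/, strong
print(get_tags(target))                      # /strong, jazyky/, strong
print(get_tags(source) != get_tags(target))  # True, so the check fails
```

Because the tags are sorted before comparison, the check tolerates reordered placeholders and only reports added, removed or renamed ones.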

@nijel
Member

nijel commented Jan 23, 2025

Thanks, let's focus here on making this check work and integrating it with machine translation. I've created #13639 to track AI usage for fixing failing checks.

@nijel nijel changed the title from "Allow to fix automatic translations (DeepL) when it runs into a check error" to "@automattic/interpolate-components check" Jan 23, 2025
@nijel nijel added the enhancement Adding or requesting a new feature. label Jan 23, 2025
@matzeeable
Author

Test cases of interpolate-components: https://github.com/Automattic/interpolate-components/blob/master/test/test.jsx
Regular expression: https://github.com/Automattic/interpolate-components/blob/fd0bebbdf81450e14df64b888f724335cd51ce2f/src/tokenize.es6#L30C43-L30C68

Regular expression playground: https://regex101.com/r/l5hbjW/1

Is this helpful? Unfortunately, I have no real experience with Python. I found format.py, where all those checks live, but I do not understand how to combine this with machine translation.
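
As a rough illustration of what "combining with machine translation" could mean, here is a sketch of the general mask-and-restore technique: placeholders are swapped for opaque tokens before the text goes to the engine and swapped back afterwards. This is not Weblate's implementation and says nothing about how format.py or the DeepL integration work internally. All names are hypothetical, fake_translate merely stands in for a real engine, and the approach assumes the engine returns the opaque tokens unchanged.

```python
import re

PLACEHOLDER_RE = re.compile(r"\{\{/?[a-zA-Z0-9_]+/?\}\}")
MASK_RE = re.compile(r'<x id="(\d+)"/>')


def mask_placeholders(text):
    """Replace each placeholder with an opaque numbered token."""
    originals = []

    def replace(match):
        originals.append(match.group(0))
        return f'<x id="{len(originals) - 1}"/>'

    return PLACEHOLDER_RE.sub(replace, text), originals


def unmask_placeholders(text, originals):
    """Put the original placeholders back in place of the numbered tokens."""
    return MASK_RE.sub(lambda match: originals[int(match.group(1))], text)


def fake_translate(text):
    # Stand-in for a real engine; it only rewrites one phrase for the demo.
    return text.replace("into your language", "do vášho jazyka")


source = "Translated {{strong}}into your language{{/strong}} {{languages/}}"
masked, originals = mask_placeholders(source)
translated = unmask_placeholders(fake_translate(masked), originals)
print(masked)
print(translated)
```

Since the engine never sees the name "languages", it cannot translate it. A production version would also have to handle an engine that drops or invents tokens, which this sketch does not.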
