Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migration for release 4.8.0 fails for unicode characters #1047

Open
mortbauer opened this issue Feb 21, 2024 · 11 comments
Open

Migration for release 4.8.0 fails for unicode characters #1047

mortbauer opened this issue Feb 21, 2024 · 11 comments

Comments

@mortbauer
Copy link
Contributor

mortbauer commented Feb 21, 2024

The release 4.8.0 includes pull request Introduce Message Formatting with Actiontext Trix Editor #1013

the migration 20230215085312_migrate_message_body_to_action_text.rb will fail if the message includes unicode characters like emojis.

for example the following text:

"Lieber Oli, \\n\\nvielen Dank! \\n\\nAndi benötigt keinen Account. Wir bestellen gemeinsam, da wir zusammenleben. \\nSollen wir ein Beitrittsformular für ihn ausfüllen? \\n\\nWenn ja, könntest Du mir noch ein leeres zukommen lassen? \\n\\nDanke schön! 😊 \\n\\nLiebe Grüße\\nJosef"
@mortbauer mortbauer changed the title Migration for release 4. Migration for release 4.8.0 may fail Feb 21, 2024
@mortbauer mortbauer changed the title Migration for release 4.8.0 may fail Migration for release 4.8.0 fails for unicode characters Feb 21, 2024
@lentschi
Copy link
Contributor

lentschi commented Feb 25, 2024

@mortbauer
I haven't been able to reproduce this error so far.
Where exactly does it occur - directly on migration or in some view after the migration has been executed?
What's the error's stack trace?

How exactly do you set the string before running the migration? (I tried setting it manually in mariadb, through the rails console and even by rolling back, sending a message containing emojis and re-migrating, but the error never occurred. I also tried various levels of escape characters, but to no avail.)

@mortbauer
Copy link
Contributor Author

Happens directly in the migrations; well just happened for 3 foodcoops out of 83 so it really because of special characters like in the example posted above.
To set this easiest is to directly alter it in the database, you could also send a message with that text ;)

@lentschi
Copy link
Contributor

To set this easiest is to directly alter it in the database, you could also send a message with that text

Like I said, I did both (set it directly in the db and send a message with that text), but running the migration afterwards did not throw an error for me. I tested on f4a4528 with the local docker dev setup.

@mortbauer
Copy link
Contributor Author

No idea.
To be honest I do not wanna waste more time on this, I had this error in production and I fixed it so I wanted to share. Will not go deeper here and try to recreate a problem I don't have anymore (since I never will have those migrations again)

@yksflip
Copy link
Member

yksflip commented Mar 6, 2024

Hm I cannot reproduce the error either. I can also understand that you @mortbauer don't want to spend more time on this.
I'm not sure how to proceed, there might be a edge case (maybe database encoding?) that leads to this error.
I'm glad you shared your fix, as far as I understand it renders chars that are not in ascii as its ordinal number, right? I wouldn't like that to happen in this migration very much, as it actually can handle unicode (most of the time... :D)

Would it be okay to close this issue then until someone else encounters this error again or you find time & motivation to provide more information how we might reproduce the error?

@mortbauer
Copy link
Contributor Author

Hm, understood, I simply close this issuel, we still have it documented then in case sombody else stumbles over it.

@mortbauer
Copy link
Contributor Author

Ok so this was actually not only affecting migrations but also creating new messages. And they reason is that the characterset was not utf8mb but utf8mb3. I changed this now all over the place, set it for the rails database connection encoding but I also had to manually run ALTER TABLE action_text_rich_texts CONVERT TO CHARACTER SET 'utf8mb4'; for every table in every db, see https://stackoverflow.com/a/1294156 for more background.

@mortbauer mortbauer reopened this Mar 8, 2024
@mortbauer
Copy link
Contributor Author

I think we should mention this at least in known issues or something, because it is nasty!

@lentschi
Copy link
Contributor

lentschi commented Mar 8, 2024

@mortbauer Nice find 👍

I investigated a little too - here's what I found out:

  1. If I bring up a local docker env in the version before this migration has even been executed ...
  2. then run alter table messages convert to character set utf8mb3 collate utf8mb3_unicode_ci; in the mariadb container ...
  3. and finally try to send the text impossible without utf8mb4 🙂 through MessagesController#create

-> I immediately get the following error:
Mysql2::Error: Incorrect string value: '\xF0\x9F\x99\x82\x0D\x0A...' for column development.messages.body at row 1

So the data couldn't even be stored in the db (If I then upgrade to the latest master, all migrations run without errors even if the database's default collation is utf8mb3.)

So the only way this can occur is, if messages.body contains unicode characters encoded as HTML entities (which is probably something the client decides - older clients surely have done that; newer ones still might do so.) The current version of simple_format then converts them to real emojis as it assumes all systems can handle those by now leading to the error in utf8mb3.
The foodsoft prior to this change wasn't even able to display the representation of these HTML entities though - they were always just displayed as plain text.

I think we should mention this at least in known issues or something, because it is nasty!

I agree with that - it's a nasty pitfall.
We may also consider 'fixing' it by adding :collation => :utf8mb4_general_ci to the CreateActionTextTables migration for action_text_rich_texts...? Or is that to database specific a thing to have it in a rails migration?

@yksflip
Copy link
Member

yksflip commented Mar 11, 2024

wow thanks, great investigation!
I think with the rails7 upgrade 3d81dd6 the schema.rb was changed to contain charset and collation. I think it's not to database specific if we rely on the db to handle unicode. Would it make sense to have a Migration for all tables to the correct collation? Maybe this could be combined with your cleanup migration @lentschi ?

I'll put a note to the release notes, definitely a nasty pitfall!

@yksflip yksflip moved this to Prioritised in Foodsoft roadmap Mar 11, 2024
@RayOei
Copy link

RayOei commented Jan 10, 2025

I may add: when working on the migration of the v3.4 FCAdam database to v4.9.0 I also noticed that the migration doesn't update the character set and collating. Originally this was latin and latin1_swedish_ci which was the MariaDB default before v.11.6.0. This has been changed to utf8mb4 with utf8mb4_general_ci (although the latter differs per distribution apparently). I had to manually modify all tables and the database default to utf8mb4 and utf8mb4_general_ci. Some columns needed to be changed on top of that, as the conversion may loose data and MariaDB doesn't change those with an ALTER TABLE by default, so it seems.
I have no idea how many users are still on old databases, it may be useful to have this added to a migration file regardless? But assuming most, if not all, work on v4.9.0 by now the issue may already have dissipated? I am also not sure this fits in the normal ActiveRecord scheme migration model (I am not too familiar with it) as the current v.4.9.0 schema is correct.

Edit: The last sentence may not be correct. When somebody works with an older MariaDB with older defaults for the collating those would be set, I think. So forcing this may still be a good idea.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Prioritised
Development

Successfully merging a pull request may close this issue.

4 participants