Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix online user location when there is unicode in the URL #71

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

bsalamat
Copy link

The location of some online users is not shown correctly if the URL of the location has unicode characters. It is shown only as viewing a thread or a forum without specifying which thread or forum.
Each UTF-8 character in the URL is converted to a few ascii letters which cause some PHP string search functionality including str_replace to fail.
This PR fixed the issue.

@frostschutz
Copy link
Owner

that's odd, it works fine for me and I have unicode characters (日本語 - Japanese) in thread and forum titles (mybb-1.6, php-7.2). I'll test with mybb-1.8 later...

@bsalamat
Copy link
Author

that's odd, it works fine for me and I have unicode characters (日本語 - Japanese) in thread and forum titles (mybb-1.6, php-7.2). I'll test with mybb-1.8 later...

It has been a while since the first time I encountered this problem, but I remember that it was not happening for all users. It was happening for some users. IIRC, it was happening when there was & and maybe other args in the URL. Could you try to add ?page=3&highlight=test at the end of a URL with unicode letters and check if it still works for you.
I have patched my production website, but I will check my test server later this evening and try to reproduce it.

@frostschutz
Copy link
Owner

Looking at my sessions table in more detail, there were some entries that looked like affected users, but it also happened to threads with no UTF-8 in the suject. Somehow somewhere, URLs are slipping past. But that may be a separate issue.

@frostschutz
Copy link
Owner

OK, the URL slip-past happens when browsing the text-only archive like https://community.mybb.com/archive/index.php?thread-123.html

I still can't reproduce errors when browsing the regular forum. Can you confirm your issue is unrelated to archive browsing?

@bsalamat
Copy link
Author

I got some time to reproduce the problem. The problem is consistently reproducible and it does not need the URL to have extra arguments. It happens for both forums and threads that I tested.
One thing to note is that my forum is in Persian. Persian is an RTL language. RTL languages cause more problems than just unicode encoding. This problem may happen only to RTL languages.

@frostschutz
Copy link
Owner

Hm, copying random strings from Persian Wikipedia does not reproduce the issue for me either. Is your database encoding utf8 or utf8mb4?

Is it possible for you to provide the output of base64_encode($url) and base64_encode($location) after $location = get_current_location()? (base64 is an attempt to preserve encoding issues)

@bsalamat
Copy link
Author

bsalamat commented Jul 19, 2019

My database encoding is actually latin1, but table encoding for threads and forums is utf8.

Here are encoded versions of an example URL that reproduces the issue:

base64_encode($url): 2YXbjNiy2KfZhi3YtNin2YbYsy3Yr9ixLdin2K7YsC3ZiNuM2LLYp9uMLdiq2YjYsduM2LPYqtuMLdii2YXYsduM2YPYpw==

base64_encode($location):
L3BmL3Nob3d0aHJlYWQucGhwP2dvb2dsZV9zZW9fdGhyZWFkPSVkOSU4NSVkYiU4YyVkOCViMiVkOCVhNyVkOSU4Ni0lZDglYjQlZDglYTclZDklODYlZDglYjMtJWQ4JWFmJWQ4JWIxLSVkOCVhNyVkOCVhZSVkOCViMC0lZDklODglZGIlOGMlZDglYjIlZDglYTclZGIlOGMtJWQ4JWFhJWQ5JTg4JWQ4JWIxJWRiJThjJWQ4JWIzJWQ4JWFhJWRiJThjLSVkOCVhMiVkOSU4NSVkOCViMSVkYiU4YyVkOSU4MyVkOCVhNw==

On the public website, that thread is: https://www.academiacafe.com/pf/Thread-%D9%85%DB%8C%D8%B2%D8%A7%D9%86-%D8%B4%D8%A7%D9%86%D8%B3-%D8%AF%D8%B1-%D8%A7%D8%AE%D8%B0-%D9%88%DB%8C%D8%B2%D8%A7%DB%8C-%D8%AA%D9%88%D8%B1%DB%8C%D8%B3%D8%AA%DB%8C-%D8%A2%D9%85%D8%B1%DB%8C%D9%83%D8%A7

@bsalamat
Copy link
Author

To be clear, the encoded URL and location are from my local machine, not from the public website. I provided the link to the public website just in case it might be needed.

@frostschutz
Copy link
Owner

OK, that explains a lot. Your $location is urlencoded (%hex) instead of UTF-8. No wonder str_replace() doesn't work. This might trace back to $_SERVER['QUERY_STRING'] not being UTF-8 but urlencoded in your case. This is unexpected. But not the first time such an issue appears with nginx/PHP vs. Apache/PHP.

Is there anything special about your webserver/php configuration?

@frostschutz
Copy link
Owner

No, that's not right either... scratches head

@frostschutz
Copy link
Owner

test.php

<?php

header("Content-Type: text/plain");

print_r($_SERVER['QUERY_STRING']);

Calling in browser directly as test.php?var=ひらがな: var=%E3%81%B2%E3%82%89%E3%81%8C%E3%81%AA

Calling as Test-ひらがな with a RewriteRule ^Test\-([^./]+)$ test.php?var=$1: var=ひらがな

That's... confusing.

That's... confusing.

@frostschutz
Copy link
Owner

calling as Test-ひらがな?カタカナ is var=ひらがな&%E3%82%AB%E3%82%BF%E3%82%AB%E3%83%8A that's just wow.

with [B] flag added to the RewriteRule it's var=%e3%81%b2%e3%82%89%e3%81%8c%e3%81%aa (lowercase instead uppercase hex, I remember that being an issue at some point in the past, too...)

It seems for years I've been relying on some Apache rewrite oddity without noticing, haha. Well, it's MyBB's own get_current_location() that relies on it, but they don't expect odd characters after all...

I'll try to come up with a fix for this, ugh.

@bsalamat
Copy link
Author

Interesting findings! The last one (Test-ひらがな?カタカナ is var=ひらがな&%E3%82%AB%E3%82%BF%E3%82%AB%E3%83%8A) is really odd!

I don't use any particularly interesting configuration in my nginx with respect to character encoding.
This PR fixes the particular problem of online user location for me, but given you recent findings, there may be other oddities not covered by this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants