Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potentially harfmul SynchronizedObject usages on iOS platform #4282

Open
qwwdfsad opened this issue Nov 28, 2024 · 22 comments
Open

Potentially harfmul SynchronizedObject usages on iOS platform #4282

qwwdfsad opened this issue Nov 28, 2024 · 22 comments
Labels

Comments

@qwwdfsad
Copy link
Collaborator

qwwdfsad commented Nov 28, 2024

We have a report in public Slack about very slow CMP application: https://kotlinlang.slack.com/archives/C0346LWVBJ4/p1732321550585009
Luckily, the user provided an instrument profile trace, and, it's visible that some huge chunks of work are spent in our synchronization.

Notably, a good chunk (41%) of the time between 02:01 and 02:10 is spent within
kfun:kotlinx.coroutines.internal.LimitedDispatcher.obtainTaskOrDeallocateWorker#internal -> kfun:kotlinx.atomicfu.locks.SynchronizedObject#lock(){}. It requires further investigation, but it is worth taking a look

Instruments file for the history: Hang iPhone SE CMP 1.7.0.trace.zip

@qwwdfsad qwwdfsad changed the title Potentially slow SynchronizedObject usages Potentially harfmul SynchronizedObject usages on iOS platform Nov 28, 2024
@qwwdfsad
Copy link
Collaborator Author

Waiting for the specific coroutines version, as atomicfu got a lock upgrade recently

@dkhalanskyjb
Copy link
Collaborator

Will this issue be considered solved if LimitedDispatcher is rewritten to be lock-free, or are all usages of locks to be investigated?

@qwwdfsad
Copy link
Collaborator Author

Would be nice to understand the root of the issue (it seems like it's LimitedDispatcher peculiarity) and address it instead. I don't think avoiding locks on K/N is justified

@dkhalanskyjb
Copy link
Collaborator

I'll double-check, but after a brief refresher on the source code, I see no fault in LimitedDispatcher. Maybe that's the iOS priority inversion issue we've been hearing about. The lock in LimitedDispatcher is only taken for a short time, with no potential for spin-locking on it that I've noticed.

@qwwdfsad
Copy link
Collaborator Author

Might be the case with allocation-spamming lock either, I've asked the user about coroutines version

@creativedrewy
Copy link

I'm the OP from that Slack conversation, and was asked to provide the details here. I was just able to generate the freeze yet again, and here are the set of Kotlin & related dependencies I'm using. To answer the primary question asked above, I'm using kotlinx-coroutines-core version 1.9.0:

Screenshot 2024-12-02 at 10 45 21 AM

@creativedrewy
Copy link

creativedrewy commented Dec 9, 2024

As an update, I have updated my codebase to Kotlin 2.1 and Compose Multiplatform 1.7. Coroutines are still version 1.9.

The original genesis for this issue was my app freezing, but thankfully that has not happened. However, the phone is still getting hot, and profiling with Time Profiler shows large amounts of SynchronizedObject calls. Attached is an image of the profiling. I will provide the updated trace file in the Kotlin Slack.

Screenshot 2024-12-09 at 12 29 35 PM

@dkhalanskyjb
Copy link
Collaborator

Thanks! The culprit seems to be https://github.com/Kotlin/kotlinx-atomicfu/blob/bad63e743ac905a298386446a82a88225c2f71fc/atomicfu/src/nativeMain/kotlin/kotlinx/atomicfu/locks/Synchronized.kt#L13-L66. There is a work in progress to replace them with a more efficient implementation on the atomicfu side, so the problem is expected to be solved in future releases. It's not going to be very soon, though, as writing efficient mutex implementations takes a lot of effort.

It doesn't seem like there's evidence of coroutines using the mutex incorrectly, which was the original topic of this issue. Can this be closed?

@creativedrewy
Copy link

I suppose, but a few follow up questions:

  • Might I be able to provide my app's codebase to Jetbrains privately so you could use it for testing? I feel like this all especially pushes the limits of atomicfu through extensive use of SharedFlow - required for a websocket. It could be useful to benchmark if the updates improve things.
  • Is there anything in the interim I can do to mitigate this? I can't go live with intermittent app freezes.
  • If this issue is closed, is there any place I can still track updates that will improve this functionality?

Thanks!

@dkhalanskyjb
Copy link
Collaborator

Is there anything in the interim I can do to mitigate this? I can't go live with intermittent app freezes.

I thought this no longer happened with newer Kotlin and Compose versions? Is there a reason not to upgrade?

Might I be able to provide my app's codebase to Jetbrains privately so you could use it for testing?

Yes, this would be excellent, thanks! You could share a link / an archive / whatever with me in the kotlinlang Slack (Dmitry Khalanskiy [JB]) or add me to your project on github. I'll share your project with other teams internally if you do.

If this issue is closed, is there any place I can still track updates that will improve this functionality?

Good point, I don't think there's a public Github issue in atomicfu for tracking this. Let's leave this one open, then, and I'll close it if we do open an atomicfu issue.

@creativedrewy
Copy link

Thank you for the reply! I've sent you my app's codebase on the Koltinlang Slack.

Unfortunately, it is still happening with Compose 1.7 and Kotlin 2.1. I just profiled my app in release mode and was able to obtain the freeze. It appears most likely to happen when scrolling lists; I'll be scrolling, navigating back and forth, scroll some more, and then freeze.

What's interesting is that the "long" usage of SynchronizedObject.lock() shows up on more threads now - worker threads, child threads and the Main thread.

The file is too big to attach here, so I will send it to you as well.

@creativedrewy
Copy link

As another update, as I was scrolling through the same list that can sometimes cause the freezing, I got this Thread Performance Checker message:

Thread Performance Checker: Thread running at User-interactive quality-of-service class waiting on a lower QoS thread running at Default quality-of-service class. Investigate ways to avoid priority inversions
PID: 57512, TID: 2040891
Backtrace
=================================================================
3   DRiP Haus                           0x0000000106148f0c _ZN12SkMessageBusIN5skgpu27UniqueKeyInvalidatedMessageEjLb1EE5Inbox4pollEPN12skia_private6TArrayIS1_Lb0EEE + 148
4   DRiP Haus                           0x0000000106147818 _ZN15GrResourceCache13purgeAsNeededEv + 52
5   DRiP Haus                           0x000000010622bd9c _ZN17GrMtlRenderTargetC1EP8GrMtlGpu7SkISize5sk_spI15GrMtlAttachmentES5_NS_7WrappedENSt3__117basic_string_viewIcNS7_11char_traitsIcEEEE + 288
6   DRiP Haus                           0x000000010622c0f4 _ZN17GrMtlRenderTarget23MakeWrappedRenderTargetEP8GrMtlGpu7SkISizeiPU21objcproto10MTLTexture11objc_object + 468
7   DRiP Haus                           0x0000000106222424 _ZN8GrMtlGpu25onWrapBackendRenderTargetERK21GrBackendRenderTarget + 204
8   DRiP Haus                           0x0000000106139238 _ZN5GrGpu23wrapBackendRenderTargetERK21GrBackendRenderTarget + 156
9   DRiP Haus                           0x000000010614068c _ZN15GrProxyProvider23wrapBackendRenderTargetERK21GrBackendRenderTarget5sk_spIN5skgpu16RefCntedCallbackEE + 92
10  DRiP Haus                           0x00000001061e4a40 _ZN10SkSurfaces23WrapBackendRenderTargetEP18GrRecordingContextRK21GrBackendRenderTarget15GrSurfaceOrigin11SkColorType5sk_spI12SkColorSpaceEPK14SkSurfacePropsPFvPvESD_ + 484
11  DRiP Haus                           0x000000010653e4a0 org_jetbrains_skia_Surface__1nMakeFromBackendRenderTarget + 124
12  DRiP Haus                           0x0000000105ed55e0 kfun:org.jetbrains.skia.Surface.Companion#makeFromBackendRenderTarget(org.jetbrains.skia.DirectContext;org.jetbrains.skia.BackendRenderTarget;org.jetbrains.skia.SurfaceOrigin;org.jetbrains.skia.SurfaceColorFormat;org.jetbrains.skia.ColorSpace?;org.jetbrains.skia.SurfaceProps?){}org.jetbrains.skia.Surface? + 756
13  DRiP Haus                           0x000000010697c568 kfun:androidx.compose.ui.window.MetalRedrawer.draw#internal + 7044
14  DRiP Haus                           0x000000010697e994 kfun:androidx.compose.ui.window.MetalRedrawer.<init>$lambda$1#internal + 472
15  DRiP Haus                           0x0000000106980d8c kfun:androidx.compose.ui.window.MetalRedrawer.$<init>$lambda$1$FUNCTION_REFERENCE$1.invoke#internal + 72
16  DRiP Haus                           0x0000000106980e5c kfun:androidx.compose.ui.window.MetalRedrawer.$<init>$lambda$1$FUNCTION_REFERENCE$1.$<bridge-DNN>invoke(){}#internal + 72
17  DRiP Haus                           0x0000000105b4c710 kfun:kotlin.Function0#invoke(){}1:0-trampoline + 100
18  DRiP Haus                           0x0000000106981d34 kfun:androidx.compose.ui.window.DisplayLinkProxy.handleDisplayLinkTick#internal + 152
19  DRiP Haus                           0x0000000106981de8 kfun:androidx.compose.ui.window.DisplayLinkProxy.$imp:handleDisplayLinkTick#internal + 144
20  QuartzCore                          0x0000000189dc34ac _ZN2CA7Display15DisplayLinkItem9dispatch_ERNS_8SignPost8IntervalILNS2_11CAEventCodeE835322056EEE + 44
21  QuartzCore                          0x0000000189dc46d4 _ZN2CA7Display11DisplayLink14dispatch_itemsEyyy + 804
22  QuartzCore                          0x0000000189dc42a0 _ZN2CA7Display11DisplayLink8callbackEP15_CADisplayTimeryyybPv + 632
23  QuartzCore                          0x0000000189ecab9c _ZL22display_timer_callbackP12__CFMachPortPvlS1_ + 336
24  CoreFoundation                      0x00000001803b84e4 __CFMachPortPerform + 172
25  CoreFoundation                      0x00000001803eeddc __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE1_PERFORM_FUNCTION__ + 56
26  CoreFoundation                      0x00000001803ee3ac __CFRunLoopDoSource1 + 496
27  CoreFoundation                      0x00000001803e89bc __CFRunLoopRun + 2176
28  CoreFoundation                      0x00000001803e7d28 CFRunLoopRunSpecific + 572
29  GraphicsServices                    0x000000018e7cdbc0 GSEventRunModal + 160
30  UIKitCore                           0x00000001852bafdc -[UIApplication _run] + 868
31  UIKitCore                           0x00000001852bec54 UIApplicationMain + 124
32  SwiftUI                             0x00000001c4b04524 OUTLINED_FUNCTION_70 + 500
33  SwiftUI                             0x00000001c4b043c4 OUTLINED_FUNCTION_70 + 148
34  SwiftUI                             0x00000001c4816108 OUTLINED_FUNCTION_2 + 92
35  DRiP Haus                           0x00000001047a5e80 $s9DRiP_Haus6iOSAppV5$mainyyFZ + 40
36  DRiP Haus                           0x00000001047a5f30 main + 12

@dkhalanskyjb
Copy link
Collaborator

Thread running at User-interactive quality-of-service class waiting on a lower QoS thread running at Default quality-of-service class. Investigate ways to avoid priority inversions

We've heard that mutex implementations like our one can suffer from QoS problems (Kotlin/kotlinx-atomicfu#462), and this looks like solid proof. Thanks for the codebase! I'm sure it will be invaluable in our investigation.

@stefanhaustein
Copy link

@creativedrewy Would it be feasible for you to verify that this updated file for compose-multiplatform-core ios addresses the problem: https://gist.github.com/stefanhaustein/5a36e66672390a8b314e63f46e7baefe

(I have a slightly better pull request for atomicfu but unfortunately compose used their own version of synchronizedobject...)

@creativedrewy
Copy link

@stefanhaustein thanks for the reply! I would need to clone the compose core/runtime repositories locally and consume in my app in order to replace with this implementation, correct?

@stefanhaustein
Copy link

stefanhaustein commented Jan 8, 2025

Yes, exactly (probably via publishing the modified compose core repro to local maven and redirecting your dependency accordingly)

P.S: Relevant part of my notes:

  • Ensure Java 17
  • ./gradlew :mpp:publishComposeJbToMavenLocal -Pcompose.platforms=all

@creativedrewy
Copy link

Sadly we don't have any of this infrastructure in place currently; we're a small team and glancing at the onboarding docs this appears to be a significant effort to get setup.

Would you be able to provide a sample configured codebase or binary that I could drop in to my repo?

@stefanhaustein
Copy link

Yeah, I fell for the generic androidx onboarding doc as well and wasted half a day setting things up as described there... ¯\(ツ)

More relevant documentation for setting up compose-multiplatform-core locally is here: https://github.com/JetBrains/compose-multiplatform-core/blob/jb-main/MULTIPLATFORM.md

Unfortunately, I have not created a fork, so it's a bit tricky to share my changed project... But I really didn't change much (if anything) apart from replacing SynchronizedObject... (I can double-check tomorrow when back at my desk)

Basically I did the following steps:

  1. Clone the compose-multiplatform-core repository and check that my setup is correct (using the command line / gradle command from the multiplatform doc)
  2. Publish the project to local maven: ./gradlew :mpp:publishComposeJbToMavenLocal -Pcompose.platforms=all
  3. Check that my depending project still works when pointed to the local cmc project
  4. Change something simple and "obvious" in cmc and re-publish to verify that I am actually using the local project
  5. Actually change SynchronizedObject, re-publish and check again...

@stefanhaustein
Copy link

P.S.: Turns out I did a fork... https://github.com/stefanhaustein/compose-multiplatform-core

It has some extra half-cooked logging but should just work "out of the box" on ios via the local maven repro

@creativedrewy
Copy link

@stefanhaustein what published local version number are you using, and which dependencies should be updated to utilize the local build? For example, I have updated my compose dependency:

compose = "1.7.1-local"

Which is the local version number I have specified, but changes to said locally published code do not appear to show in the app.

@stefanhaustein
Copy link

stefanhaustein commented Jan 10, 2025

I did not set any number and then was using

force("org.jetbrains.compose.runtime:runtime:0.0.0-SNAPSHOT")

in the "multiplatform-compose" project that contains some benchmarks for "multiplatform-compose-core".

See stefanhaustein/compose-multiplatform@c6dc8b7#diff-a9b09c9df054c9c227aae1b0573c11551987717eb2cd4dc158cfc3f47b15b3c5R13

(the change has some noise but should convey the idea O:)

@stefanhaustein
Copy link

@creativedrewy Were you able to make any progress here? It would be really great if we could get some more insights into addressing this problem...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants