-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: ensure child node is a child of parent before removing #1438
Conversation
Thank you for submitting!!!! <3 🥳 |
This is a very targeted fix and may have other unintended consequences or just masks the problem even more. The root cause is usually that some external extensions or scripts messed with the parts of the DOM that glimmer-vm is managing/in charge of. In general that really is unsound and could cause things to fail in numerous ways. Maybe we can try to fail more silently as this PR tries to do, but if anything else goes wrong the error will probably be even more confusing |
@@ -65,7 +65,6 @@ export function move(bounds: Bounds, reference: Nullable<SimpleNode>): Nullable< | |||
} | |||
|
|||
export function clear(bounds: Bounds): Nullable<SimpleNode> { | |||
let parent = bounds.parentElement(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would keep this
@@ -75,7 +74,7 @@ export function clear(bounds: Bounds): Nullable<SimpleNode> { | |||
while (true) { | |||
let next = current.nextSibling; | |||
|
|||
parent.removeChild(current); | |||
current.parentNode?.removeChild(current); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
..and then in here, guard with if (parent === current.parentNode)
. In the else branch, we may want to add two different development mode warnings for the case where it is null and for the case it is not null but different (ie the node has been moved)
Fixes glimmerjsGH-1401 If `current` isn't a child of `parent` calling `parent.removeChild(current)` will result in the error: `Failed to execute 'removeChild' on 'Node': The node to be removed is not a child of this node`. This commonly occurs when the DOM is externally modified by a browser, such as when translating a page. The `nextSibling` may not be a child node of `parent`.
@chancancode Thank you for the review! I agree with everything you said, it is a targeted fix. I'm happy to do more research into edge cases this change might cause too. FWIW, in our Ember app (app.ynab.com), since upgrading Ember and adopting Glimmer in the last 3 months, we've seen this error over 76k+ times. It requires each user to refresh and figure out which extension or browser usage causes the error. From our perspective, it seems like a very common use case for end users of Glimmer to externally modify the DOM. Having Glimmer handle that condition would be immensely helpful. Our Ember app is 9+ years old and has a pretty robust test suite. It might not mean anything but all our tests are passing with this change but I'd be happy to do more exploratory testing to ensure this change doesn't have unintended consequences. I appreciate your time. |
@chancancode I agree with we could have special error handler for it, to be able (in background) send "mutated" node to logging apps to figure out source of breakage. |
I'm wanting to run this against a couple additional test suites -- but it looks like I first need to update glimmer in ember, here: emberjs/ember.js#20530 |
I had idea (long time ago) to introduce addon to track DOM mutations outside re-renders, using https://developer.mozilla.org/en-US/docs/Web/API/MutationObserver api, and emit errors/messages if DOM mutated outside of ember's (glimmer's) render loop. |
FWIW, we've been running this fix in production in app.ynab.com via |
apologies I've been so slow to get this shipped :( I think the changes look good, but I emberjs is already behind glimmer, and I've had it on my todo to get it up to date -- but work has been bogging me down with 🙃 tasks |
Mind sharing more about how you achieved this? I thought Ah – I'm guessing you patched in |
@gilest We used I ran
Then ran Then did the same thing with |
Interesting. Tried the simplified version of the patch with an - parent.removeChild(current);
+ if (parent === current.parentNode) {
+ current.parentNode.removeChild(current);
+ } Before patch On a translated page (Google Chrome), clicking a link to initiate a transition throws the reported error:
After patch Following the same reproduction throws in the
So the node count or node references still get out of sync once page DOM is maniuplated (at least in this reproduction). |
Yeah again I am to the extent that we can be a little less anal and silently ignore errors that “didn’t end up mattering”, it seems ok. But I think it really is unrealistic to expect that this can be “fixed”. I am not trying to be difficult here but let’s try to engage on the actual issue here. If Google translate or any other extensions just go in and mutate the DOM without regard then these tools are just not designed to play nicely with SPAs period. I don’t imagine they work on apps like Google Maps/Earth or Notion either? If there are documentation or even blog posts describing the heuristics of how these tools choose to mutate the DOM and how we are expected to work with them nicely then let’s enumerate what those constraints are and add tests for them specifically. But it is impossible to just say “let’s be more resilient against external things mutating our DOM”, it’s just way too underspecified. Imagine if someone is randomly mutating variables in your code. How do you be resilient against that? You can’t. Now let’s talk specifics about what is going on here. When rendering, say, an It seems like when translating, Google translate have the habit of removing/merging/replacing the text nodes rather than mutating their values in place. Sometimes you get lucky and those text nodes are in between rangers we care about so nobody cares. But if they remove a text node that we happen to be tracking as a boundary, then you are in for a bad time. Imagine it looking like this: Result:
{{#if this.isLoading}}[START]
loading… ({{this.est}} seconds left)
[END]{{/if}}
Please be patient. When the condition changes from true to false, we know we need to let current = bounds.start;
while (true) {
parent.removeChild(current);
if (current === bounds.end) break;
current = current.nextChild;
} Now what if someone messed with the DOM so either the start or end node, are already gone when the condition flips? Well let’s see, if the start node is gone we will immediately get an error, which currently breaks the app. We can try to silently discard the error. But then we wouldn’t have cleared the DOM and the screen would be stale and stuck on the loading thing anyway and probably cause other undefined behavior elsewhere (like destructors never running). If the start is there but the end is gone? Well then we won’t error the same way but we will clear too much DOM and eventually reach the end of the parent’s child nodes list. Which will cause other errors down the road elsewhere again. Sometimes things work out because the {{/if}} happens to be at the end of the parent anyway, or for any number of reasons. Or perhaps the user clicked away to another page and you need to clear a bigger piece of DOM higher up anyway. But just as likely maybe all you have accomplished is that you make the error go away on your bugsnag dashboard, but your user is now staring at a subtly broken app anyway. Again I want to stress that I am not unwilling to fix this because I don’t understand the problem or the pain. But I hope you see why this problem is fundamental because of the broken behavior of whatever is doing the DOM mutation and you really can’t fix it on our side. One thing can and should do is add better ways to recover from errors, like #1462. But even then, all that would do is to allow you to capture this error higher up, clear that bigger section of the DOM and render an error message in place of it when this does happen. And in fact, in light of that, I think suppressing the error here is actually bad, because it would propagate the error condition elsewhere rather than letting you catch and recover from it. So I think we will have to revert this whenever that lands. In which case, maybe using patch package is a better interim solution? I personably wouldn’t use that word, but if it works in the specific variant of the issue you are running into, that’s great. But I really hope everyone on this thread and finding this thread in the future can take a few minutes to try and understand the interaction before deciding how to proceed. |
And - to be extra clear - this is a very generic symptom (node unexpectedly gone when it comes time to remove it) with many possible causes. If we find a case where this is caused by our internal bookkeeping being messed up or the tracking has gone awry somehow, it’s a very different issue and we should definitely fix that. And that’s why this assertion is here. But I don’t think that’s what happening in this case. |
Appreciate your response @chancancode. My findings support what you were suggesting (not simple or even maybe possible to fix) which is why I posted them. |
@chancancode Thank you for the detailed and thoughtful response. Very much appreciated and that makes sense why you wouldn't want to suppress the error. Thanks for linking to that error recovery PR. It's nice to know that's on the horizon as that is ultimately what I'm after here. The problem I was trying to fix was when our users hit this error, the app stops working all together and there isn't a way for us to recovery without telling users to refresh the page. For our app, ignoring this error has been ok through all the exploratory testing we've done but I'm sure there are some edge cases where it's leaving lingering nodes in the app. I can see how that could be more problematic for other apps though and why we wouldn't want to implement this PR. I'm happy to close this PR and just continue using I appreciate your time! |
@shama Sounds good, I think it's good that your patch and the In Skylight we do have code for rendering an "red box of doom" on the page that uses vanilla DOM code instead of hbs/glimmer ("Sorry something went wrong [Reload]"), that we call when we detect unrecoverable rendering errors. Or.. you can try to do a hacked version of error recovery thing, the official version need to do this much more carefully and make sure we don't corrupt any internal states, call destructors, etc. But if you are looking for an imperfect solution in the meantime, you may be able to do a good enough version. Internally the error recovery code, does something verrrrrry loosely like this: const BOUNDARIES = new WeakMap();
export function didError(element) {
let boundary;
while (!boundary && element) {
boundary = BOUNDARIES.get(element);
if (boundary) {
break;
} else {
element = element.parentElement;
}
}
if (boundary) {
setImmediate(() => boundary.noError = true);
}
}
export default class ErrorBoundary {
@tracked noError = true;
<template>{{#if this.noError}}<div {{did-insert this.register}} {{will-destroy this.unregister}}>{{yield}}</div>{{else}}{{yield to="else"}}{{/if}}</template>
register = (element) => BOUNDARIES.set(element, this);
unregister = () => BOUNDARIES.delete(element);
} Because the This code is definitely not robust and it's still very deep in the undefined-behavior-land, but it sounds like we are already there 🤷🏼 |
Fixes GH-1401
If
current
isn't a child ofparent
callingparent.removeChild(current)
will result in the error:Failed to execute 'removeChild' on 'Node': The node to be removed is not a child of this node
. This commonly occurs when the DOM is externally modified by a browser, such as when translating a page. ThenextSibling
may not be a child node ofparent
.current.parentNode?.removeChild(current)
ensures we are only removing a child node of a parent node.