
"paste-html": error when pasting content in which an A tag is wrapped in a text-formatting tag (e.g. STRONG) #3350

Open
lazharichir opened this issue Dec 18, 2019 · 14 comments

Comments

@lazharichir

Do you want to request a feature or report a bug?

Bug

What's the current behavior?

Go to https://hungerlyyak.htmlpasta.com/ and copy the content. The HTML is below:

<p>The most important thing is to copy this text <strong><a href="https://example.com/">and this A link</a></strong> that is wrapped in STRONG tags.</p>
<p>Open your console and see what it says when you paste.</p>

Then go to the paste-html example, open your developer console, and paste the content into the Slate editor. This triggers an error:

Uncaught Error: The <text> hyperscript tag can only contain text content as children.

This happens because the <strong> tag is deserialized into a text node (with bold: true), and an Element is then nested inside it ({ type: "link", url: "https://example.com/", children: [{ text: "and this A link" }] }), which is a nesting that Slate's constraints forbid.

Slate: 0.56.9
Browser: All
OS: Mac

What's the expected behavior?

I think the bold formatting should carry over and apply to the Text leaves nested within the Link Element.
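A minimal sketch of that expected behavior, using simplified local types instead of Slate's real node types (`TextLeaf`, `ElementNode`, `Child` and `applyMarks` are hypothetical names, not Slate APIs). Rather than wrapping an Element inside a text node, the marks from the text-formatting tag are pushed down onto every Text leaf inside the Element, and the Element itself is kept. In a real deserializer, something along these lines could stand in for the `jsx('text', attrs, child)` mapping in the TEXT_TAGS branch.

```typescript
// Simplified stand-ins for Slate's Text and Element nodes (assumption: not Slate's own types)
type TextLeaf = { text: string; [mark: string]: unknown };
type ElementNode = { type: string; children: Child[]; [key: string]: unknown };
type Child = TextLeaf | ElementNode;

// Marks produced by a TEXT_TAGS entry, e.g. { bold: true } for <strong>
type Marks = Record<string, unknown>;

const isTextLeaf = (node: Child): node is TextLeaf =>
  typeof (node as TextLeaf).text === "string";

// Strings become marked leaves, leaves get the marks merged in, and elements are
// kept intact while the marks are pushed down (recursively) onto their own leaves.
function applyMarks(children: Array<Child | string>, marks: Marks): Child[] {
  return children.map((child): Child => {
    if (typeof child === "string") {
      return { ...marks, text: child };
    }
    if (isTextLeaf(child)) {
      return { ...child, ...marks };
    }
    return { ...child, children: applyMarks(child.children, marks) };
  });
}

// The link from this issue, wrapped in <strong>
const link: ElementNode = {
  type: "link",
  url: "https://example.com/",
  children: [{ text: "and this A link" }],
};
console.log(JSON.stringify(applyMarks([link], { bold: true })));
// [{"type":"link","url":"https://example.com/","children":[{"text":"and this A link","bold":true}]}]
```

The design choice here is to keep the link as an Element and move the mark inward, so neither the link nor the bold formatting is lost.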

@lazharichir
Author

lazharichir commented Dec 18, 2019

For anyone facing this issue: as a workaround for now, I am removing any "blocky" Element nested inside a text element.

// Text
// (assumes `children` is declared with `let` earlier in deserialize)
if (TEXT_TAGS[nodeName]) {

	const attr = TEXT_TAGS[nodeName](el)

	// Check for potential conflicts in the children array
	const nonTextChild = children.find(child => !Text.isText(child))

	if (nonTextChild) {
		// Reduce the children to leaf nodes only (removing whatever Block Elements we have in between)
		children = children.reduce((acc: Descendant[], cur: Descendant) => {
			if (Text.isText(cur)) {
				// Keep text leaves as they are
				acc.push(cur)
			} else {
				// Replace the Element (e.g. the link) with the text leaves it contains
				const leaves = Array.from(Node.texts(cur)).map(([leaf]) => leaf)
				acc.push(...leaves)
			}
			return acc
		}, [])
	}

	return children.map(child => {
		// normalizeText is my own helper, not part of Slate
		child.text = normalizeText(child.text)
		return jsx(`text`, attr, child)
	})
}

@MontoyaAndres

The issue persists with the latest version of Slate, including in the official example: https://github.com/ianstormtaylor/slate/blob/master/site/examples/paste-html.js

@MontoyaAndres

What worked great for me was to take the previous example and write the deserialize function like this:

import { Node, Text } from "slate";
import { jsx } from "slate-hyperscript";

import { ELEMENT_TAGS, TEXT_TAGS } from "../constants/slate";

export const deserialize = (el: HTMLElement | ChildNode) => {
  if (el.nodeType === 3) {
    // Text node: return its string content
    return el.textContent;
  } else if (el.nodeType !== 1) {
    // Neither text nor element (e.g. a comment): ignore it
    return null;
  } else if (el.nodeName === "BR") {
    return "\n";
  }

  const { nodeName } = el;
  let parent = el;

  if (
    el.nodeName === "PRE" &&
    el.childNodes[0] &&
    el.childNodes[0].nodeName === "CODE"
  ) {
    parent = el.childNodes[0];
  }
  const children: Node[] = Array.from(parent.childNodes)
    .map(deserialize)
    .flat();

  if (el.nodeName === "BODY") {
    return jsx("fragment", {}, children);
  }

  if (ELEMENT_TAGS[nodeName]) {
    const attrs = ELEMENT_TAGS[nodeName](el);
    return jsx("element", attrs, children);
  }

  if (TEXT_TAGS[nodeName]) {
    const attrs = TEXT_TAGS[nodeName](el);

    return children
      .find((child) => Text.isText(child))
      ?.map((child: Node) => jsx("text", attrs, child));
  }

  return children;
};

Compared with @lazharichir's code, I only changed this part:

if (TEXT_TAGS[nodeName]) {
    const attrs = TEXT_TAGS[nodeName](el);

    return children
      .find((child) => Text.isText(child))
      ?.map((child: Node) => jsx("text", attrs, child));
  }

It only returns what Text.isText(child) considers valid.

@davidgolden

@MontoyaAndres's solution worked for me; however, I would suggest:

return children
    .filter(child => Text.isText(child))
    .map(child => jsx("text", attrs, child));

@MontoyaAndres

Why filter instead of find?

@davidgolden

Maybe I'm misunderstanding your logic, but I think the goal is to return only the valid children of the text node. find() returns only the first matching child (or undefined if there isn't one), whereas filter() always returns an array of every matching child, so filter() seems cleaner to me.
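A small standalone illustration of the difference, using plain objects in place of Slate nodes (`isTextLike` is a hypothetical stand-in for `Text.isText`). `find()` yields at most one child, so calling `.map` on its result does not iterate over the children, whereas `filter()` keeps every match in an array.

```typescript
type Child = { text?: string; type?: string };

// Hypothetical stand-in for Slate's Text.isText
const isTextLike = (c: Child): boolean => typeof c.text === "string";

const children: Child[] = [{ text: "one" }, { type: "link" }, { text: "two" }];

// find() stops at the first match and returns a single child (or undefined)
const first = children.find(isTextLike);
console.log(first); // { text: 'one' }

// filter() returns an array containing every match (possibly empty)
const all = children.filter(isTextLike);
console.log(all.length); // 2

// With no text children at all, find() gives undefined while filter() gives []
const none: Child[] = [{ type: "link" }];
console.log(none.find(isTextLike), none.filter(isTextLike)); // undefined []
```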

@MontoyaAndres

Interesting, thanks!

@nikita-nikita-nikita

In my case I just did this and it works, but my content only contained {type: "link"} elements, so it may not work as well in other cases.

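 if (TEXT_TAGS[nodeName]) {
    const attrs = TEXT_TAGS[nodeName](el);
    return children.map(child => {
      if (SlateElement.isElement(child)) {
        // Pass the element's own children along; jsx() would otherwise replace them with []
        return jsx('element', child, child.children)
      }
      return jsx('text', attrs, child)
    })
  }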

@rohankeskar19

There also needs to be a check for !data.types.includes("application/x-slate-fragment") in paste-html.ts to prevent it from converting Slate blocks into HTML.

@dylans
Collaborator

dylans commented Sep 13, 2021

This should be fixed in Slate 0.66.0.

Regarding this:

There also needs to be a check for !data.types.includes("application/x-slate-fragment") in paste-html.ts to prevent it from converting Slate blocks into HTML.

Usually the way I handle this is to never reach a paste HTML handler if application/x-slate-fragment is already present. That said, some would argue that data should only be treated as a Slate fragment if it comes from your own domain, and not from someone else's Slate-based editor, which likely has a different AST.
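A minimal sketch of that guard, assuming the paste handler receives a `DataTransfer`-like object; only `types` and `getData`, both members of the standard `DataTransfer` interface, are used. `ClipboardLike`, `PasteSource` and `pickPasteSource` are hypothetical names. The HTML path is taken only when no `application/x-slate-fragment` entry is present; the origin check mentioned above is not attempted.

```typescript
// Only the two DataTransfer members this sketch needs
interface ClipboardLike {
  readonly types: readonly string[];
  getData(format: string): string;
}

type PasteSource =
  | { kind: "slate-fragment" }
  | { kind: "html"; html: string }
  | { kind: "plain" };

// Decide which paste path to take, checking for a Slate fragment before HTML
function pickPasteSource(data: ClipboardLike): PasteSource {
  if (data.types.includes("application/x-slate-fragment")) {
    // Leave it to the editor's default fragment handling
    return { kind: "slate-fragment" };
  }
  const html = data.getData("text/html");
  if (html) {
    // Only now is it safe to run the HTML deserializer
    return { kind: "html", html };
  }
  return { kind: "plain" };
}
```

In a `withHtml`-style plugin, the HTML deserializer would then run only for the `"html"` case, and everything else would fall through to the original `insertData`.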

@prerakd

prerakd commented Feb 3, 2022

I am still seeing this issue in the paste-html example. Has the example code been updated for the latest version?

@NourIM

NourIM commented Jun 7, 2023

Can we have a proper solution here, please? We can't ignore part of the text just because of HTML order!
I'm thinking about writing code that checks whether any links in the content are wrapped in a mark, rewrites the HTML as below, and only then lets it go through the deserialization process. But I hate it!

<p>The most important thing is to copy this text <strong><a href="https://example.com/">and this A link</a></strong> that is wrapped in STRONG tags.</p>
<p>Open your console and see what it says when you paste.</p>

To

<p>The most important thing is to copy this text <strong></strong><a href="https://example.com/"><strong>and this A link</strong></a><strong></strong> that is wrapped in STRONG tags.</p>
<p>Open your console and see what it says when you paste.</p>

@mp3846

mp3846 commented Oct 7, 2023

This one preserves both the link and the formatting applied by its wrapping tag. For other non-text elements we can do something similar (AI helps a lot with generating regexes).

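import { Transforms } from 'slate'

// Move a mark tag that wraps a link inside the link, then drop the empty mark tags left behind
const resolveWrappingOrders = (html) => {
	const orderPattern =/<(i|b|s|u|em|del|strong)>(.*?)<a(.*?)href="([^"]+)"(.*?)>(.*?)<\/a>(.*?)<\/\1>/g
	const spacePattern =/<\s*(i|b|s|u|em|del|strong)\s*>(\s|&nbsp;|&ensp;|&emsp;|&thinsp;)*<\/\s*(i|b|s|u|em|del|strong)\s*>/g
	return html
		.replace(orderPattern, '<$1>$2</$1><a$3href="$4"$5><$1>$6</$1></a><$1>$7</$1>')
		.replace(spacePattern, '')
}

const withHtml = (editor) => {
	const { insertData } = editor
	// existing code (isInline / isVoid overrides, etc.)
	editor.insertData = (data) => {
		const html = data.getData('text/html')
		if (html) {
			// Fix the nesting order first, then deserialize with the example's deserialize()
			const parsed = new DOMParser().parseFromString(resolveWrappingOrders(html), 'text/html')
			const fragment = deserialize(parsed.body)
			Transforms.insertFragment(editor, fragment)
			return
		}
		// rest: fall back to the default handling
		insertData(data)
	}
	return editor
}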

@quadrifolia

This HTML gets correctly deserialized:

<p>
  <em>text</em>
  <a href="#">
    <em>link title</em>
  </a>
  <em>text</em>
</p>

This does not:

<p>
  <em>text</em>
  <em>
    <a href="#">link title</a>
  </em>
  <em>text</em>
</p>

The problem here is that <em> is one of the TEXT_TAGS, so the deserialize function runs into this:

if ((TEXT_TAGS as any)[nodeName]) {
    const attrs = (TEXT_TAGS as any)[nodeName](el);
    return children.map((child) => jsx('text', attrs, child));
}

I changed it to:

if ((TEXT_TAGS as any)[nodeName]) {
    const attrs = (TEXT_TAGS as any)[nodeName](el);

    if (el.children.length > 0) {
        // The text tag wraps other elements (e.g. <em><a>...</a></em>): walk its DOM children
        return Array.from(el.childNodes).map((child: any) => {
            // 3 === DOM Node.TEXT_NODE (avoids a clash with Slate's `Node` import)
            if (child.nodeType === 3) {
                return jsx('text', attrs, child.textContent);
            }
            // Deserialize nested elements (such as the link) on their own
            return deserialize(child);
        });
    }

    return jsx('text', attrs, el.textContent);
}

and it worked. The complete HTML is displayed, including the nested link with href and link title.
