
Wrong parse [rt.cpan.org #55629] #12

Open
oalders opened this issue Aug 24, 2020 · 0 comments


oalders commented Aug 24, 2020

Migrated from rt.cpan.org#55629 (status was 'open')

Requestors:

From [email protected] on 2010-03-16 15:09:51:

HTML:
<iframe/**/src="http://mail.ru"  name="poc iframe jacking"  width="100%"
height="100%" scrolling="auto" frameborder="no"></iframe>

$parser = HTML::Parser->new(
 api_version => 3,
 start_h => [ sub{
   my ($Self, $Text, $Tag, $Attr) = @_;
   print "Tag is: ".$Tag;
 }, "self, text, tagname, attr" ]
);
$parser->ignore_elements( qw( iframe ));
$parser->ignore_tags( qw( iframe ));

output:
Tag is: iframe/**/src="http://mail.ru"


From [email protected] on 2010-03-18 13:51:31:

On Tue Mar 16 11:09:51 2010, NIKOLAS wrote:
> HTML:
> <iframe/**/src="http://mail.ru"  name="poc iframe jacking"  width="100%"
> height="100%" scrolling="auto" frameborder="no"></iframe>
> 
> $parser = HTML::Parser->new(
>  api_version => 3,
>  start_h => [ sub{
>    my ($Self, $Text, $Tag, $Attr) = @_;
>    print "Tag is: ".$Tag;
>  }, "self, text, tagname, attr" ]
> );
> $parser->ignore_elements( qw( iframe ));
> $parser->ignore_tags( qw( iframe ));
> 
> output:
> Tag is: iframe/**/src="http://mail.ru"

HTML: <script/src="ya.ru"> is parsed wrongly in the same way.


From [email protected] on 2010-04-04 20:38:08:

I don't understand what rules you propose that HTML::Parser should follow to parse this kind of 
bogus HTML.  You think it should treat "/**/" and "/" as whitespace?

From [email protected] on 2010-06-01 07:13:54:

Here are three regular expressions that, applied to the input text, correct
these problems:
s{(/\*)}{ $1}g;
s{(\*/)}{$1 }g;
s{(<[^/\s<>]+)/}{$1 /}g;

You will probably find a better architectural solution.
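
For reference, here is a minimal sketch of how the three substitutions above behave on the two samples from this ticket. The helper name normalize_tag_separators is hypothetical and not part of HTML::Parser; the sketch only pre-processes the raw markup and does not call the parser itself.

```perl
use strict;
use warnings;

# Hypothetical helper (not an HTML::Parser API): applies the three
# substitutions proposed in this ticket to raw markup before parsing.
sub normalize_tag_separators {
    my ($html) = @_;
    $html =~ s{(/\*)}{ $1}g;            # insert a space before every "/*"
    $html =~ s{(\*/)}{$1 }g;            # insert a space after every "*/"
    $html =~ s{(<[^/\s<>]+)/}{$1 /}g;   # insert a space between a tag name and a following "/"
    return $html;
}

# The two samples reported above (first one shortened to its start tag).
print normalize_tag_separators('<iframe/**/src="http://mail.ru">'), "\n";
# prints: <iframe /**/ src="http://mail.ru">
print normalize_tag_separators('<script/src="ya.ru">'), "\n";
# prints: <script /src="ya.ru">
```

After the rewrite the tag name is followed by whitespace, which is presumably what lets the parser report it as plain iframe or script. Note that the substitutions run over the whole input, so they would also add spaces around any "/*" and "*/" in text, script, or style content. That is one reason to treat this as a workaround rather than a fix.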

