Some trivial rippers amount to finding one or more CSS selectors that match all of the elements on the page you want to rip, plus perhaps a naming scheme, a rate limit, etc.
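To make that core step concrete, here is a minimal Python sketch of "collect the media URLs of every element matching some CSS patterns". It is not based on the app's actual code. `SimpleSelectorCollector` and `select_attribute` are hypothetical names. To stay within the standard library, it uses `html.parser` and understands only `tag`, `.class` and `tag.class` selectors; a real implementation would use a full CSS selector engine.

```python
from html.parser import HTMLParser


class SimpleSelectorCollector(HTMLParser):
    """Collects one attribute from elements that match simple selectors.

    Only "tag", ".class" and "tag.class" are understood; this stands in for
    a real CSS selector engine.
    """

    def __init__(self, selectors, attribute):
        super().__init__()
        self.rules = [self._parse(s) for s in selectors]
        self.attribute = attribute
        self.found = []

    @staticmethod
    def _parse(selector):
        # "img.full" -> ("img", "full"), ".thumb" -> (None, "thumb"), "img" -> ("img", None)
        tag, _, cls = selector.strip().partition(".")
        return (tag.lower() or None, cls or None)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        for want_tag, want_cls in self.rules:
            if want_tag is not None and tag != want_tag:
                continue
            if want_cls is not None and want_cls not in classes:
                continue
            value = attrs.get(self.attribute)
            if value:
                self.found.append(value)
            break  # record each element at most once, even if several rules match


def select_attribute(html, selectors, attribute):
    """Return the `attribute` value of every matched element, in document order."""
    collector = SimpleSelectorCollector(selectors, attribute)
    collector.feed(html)
    collector.close()
    return collector.found


page = (
    '<div class="gallery">'
    '<img class="full" src="a.jpg"><img src="b.jpg">'
    '<a class="full" href="c.png">c</a>'
    "</div>"
)
print(select_attribute(page, ["img.full"], "src"))  # ['a.jpg']
```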
We should look at some simple examples to see what they have in common, then design a JSON format and algorithm that make this more declarative. There's a good chance this removes much of the boilerplate involved in adding and maintaining rippers, and it could make it even easier for people to contribute their own.
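As a possible starting point for the JSON format, here is one shape it could take, sketched in Python. All key names (`url_pattern`, `selectors`, `attribute`, `filename_template`, `delay_seconds`) and the `RipperDescription` class are hypothetical. The only claim is that the properties mentioned above (selectors, a naming scheme, a rate limit) plus a URL pattern might be enough to describe a trivial ripper.

```python
import json
import re
from dataclasses import dataclass

# Hypothetical description format; every key name here is an assumption.
EXAMPLE_JSON = r"""
{
  "name": "examplegallery",
  "url_pattern": "^https?://(www\\.)?example\\.com/gallery/\\d+",
  "selectors": ["div.gallery img.full", "figure img"],
  "attribute": "src",
  "filename_template": "{name}_{index:03d}{ext}",
  "delay_seconds": 2
}
"""


@dataclass(frozen=True)
class RipperDescription:
    name: str
    url_pattern: re.Pattern  # which page URLs this description handles
    selectors: tuple         # CSS selectors for the elements to rip
    attribute: str           # attribute holding the media URL
    filename_template: str   # naming scheme, filled in per downloaded item
    delay_seconds: float     # rate limit: pause between requests

    @classmethod
    def from_json(cls, text):
        raw = json.loads(text)
        missing = {"name", "url_pattern", "selectors"} - set(raw)
        if missing:
            raise ValueError(f"missing required keys: {sorted(missing)}")
        return cls(
            name=raw["name"],
            url_pattern=re.compile(raw["url_pattern"]),
            selectors=tuple(raw["selectors"]),
            attribute=raw.get("attribute", "src"),
            filename_template=raw.get("filename_template", "{index:03d}{ext}"),
            delay_seconds=float(raw.get("delay_seconds", 1.0)),
        )

    def handles(self, url):
        return self.url_pattern.match(url) is not None

    def filename_for(self, index, media_url):
        # Reuse the extension from the last path segment of the media URL, if any.
        last = media_url.split("?", 1)[0].rsplit("/", 1)[-1]
        ext = last[last.rfind("."):] if "." in last else ""
        return self.filename_template.format(name=self.name, index=index, ext=ext)


desc = RipperDescription.from_json(EXAMPLE_JSON)
print(desc.handles("https://example.com/gallery/42"))                  # True
print(desc.filename_for(7, "https://cdn.example.com/i/abc.jpg?w=800"))  # examplegallery_007.jpg
```

Validating required keys at load time, as this sketch does, would let a malformed contributed description fail when it is loaded rather than partway through a rip.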
I think this could be a huge productivity win for keeping up with (gestures broadly) the ever-changing internet.
metaprime changed the title from "[Proposal] Add a DeclarativeRipper class, constructed from json description of how to find content on a page with a few properties" to "[Proposal] Add a (meta) DeclarativeRipperLogic class, constructed from json description of how to find content on a page with a few properties" on Jan 7, 2025.
It occurs to me that the current approach of finding all the ripper classes and trying to construct each one makes the program easily extensible, but we'd need to add at least one other mode of creating and loading rippers.
The JSON descriptions would have to be instantiated as DeclarativeRipperLogic instances at runtime and selected by a different kind of query than "try to construct each one", which might actually be much more efficient. That's probably not a problem. It might mean, though, that such a ripper can only be initialized for one type of URL at a time, so we'll have to think about how that scales to queues with many items to rip. (Are these heavy instances? Do they all live in a queue, or do the URLs live in a queue and get dispatched at runtime?)
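One way the "different kind of query" and the URLs-in-a-queue option could look, sketched in Python under stated assumptions: descriptions are small immutable objects loaded once, a registry selects the first one whose URL pattern matches, and the queue holds plain URLs that are dispatched at runtime. `RipperDescription` (trimmed to the fields needed here), `DescriptionRegistry`, `dispatch` and the `rip` callback are all hypothetical names, not existing code, and this is only one answer to the open questions above.

```python
import re
import time
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class RipperDescription:
    # Hypothetical parsed JSON description, trimmed to the fields used here.
    name: str
    url_pattern: re.Pattern
    delay_seconds: float = 1.0


class DescriptionRegistry:
    """Selects a description by matching the URL, instead of trying to
    construct every ripper class in turn."""

    def __init__(self, descriptions):
        self._descriptions = list(descriptions)

    def find(self, url):
        # First registered description whose pattern matches wins.
        for desc in self._descriptions:
            if desc.url_pattern.search(url):
                return desc
        return None


def dispatch(urls, registry, rip, sleep=time.sleep):
    """Drain a queue of plain URLs, looking up the shared description for
    each one at runtime. Returns the URLs no description handles."""
    queue = deque(urls)
    unsupported = []
    while queue:
        url = queue.popleft()
        desc = registry.find(url)
        if desc is None:
            unsupported.append(url)
            continue
        rip(desc, url)
        sleep(desc.delay_seconds)  # per-description rate limit after each rip
    return unsupported


registry = DescriptionRegistry([
    RipperDescription("examplegallery", re.compile(r"^https?://example\.com/gallery/"), 2.0),
])
leftover = dispatch(
    ["https://example.com/gallery/1", "https://unknown.example/x"],
    registry,
    rip=lambda desc, url: print(f"{desc.name}: {url}"),
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(leftover)  # ['https://unknown.example/x']
```

In this sketch the lookup is a linear scan in registration order, so overlapping patterns resolve to whichever description was registered first. Because the descriptions are immutable, one instance per site can be shared by every queued URL, which keeps queue entries as cheap as plain strings; with many descriptions, an index keyed on hostname could replace the scan.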
An interesting design problem, in part because of how the app currently works.