Blocklist size growth #4830

Open
friendly-bits opened this issue Jan 10, 2025 · 8 comments

@friendly-bits

Hi Hagezi,

First of all: thanks for the great work!

Second: I am a contributor to the adblock-lean project (implementing adblocking on OpenWrt). adblock-lean recommends your blocklists to users and currently includes 4 pre-defined presets, each one intended for devices with a certain memory capacity (64MiB/128MiB/256MiB/512MiB+).

A few months ago when we came up with these presets, we were able to more or less perfectly balance them based on a selection of your blocklists.

However, it seems that in the past few months the domain count in some of the blocklists has grown significantly. For example, the combination of Pro and tif.mini (which is included in our preset intended for devices with 128MiB of memory) grew from ~250k domains a few months ago to ~311k domains now (numbers after deduplication). This is already borderline too much for these devices. Since adblock-lean (like many other adblockers) implements automatic blocklist updates and enables them by default, we may soon be getting into territory where users start getting OOMs and dnsmasq crashes.
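
The deduplicated total quoted above can be estimated with a short script. This is a minimal sketch and not adblock-lean's actual processing: it assumes plain one-domain-per-line lists with optional `#` comments, it deduplicates by exact match only, and the function names and sample domains are hypothetical.

```python
def parse_domains(text):
    """Return the set of domains in a plain one-domain-per-line list."""
    domains = set()
    for line in text.splitlines():
        # Drop trailing comments and surrounding whitespace; normalise case.
        line = line.split("#", 1)[0].strip().lower()
        if line:
            domains.add(line)
    return domains


def combined_count(*lists):
    """Count unique domains across several lists (exact-match deduplication)."""
    merged = set()
    for text in lists:
        merged |= parse_domains(text)
    return len(merged)


# Tiny stand-ins for the real lists (assumed sample data).
pro = "ads.example.com\ntracker.example.net\n# comment line\n"
tif_mini = "tracker.example.net\nphish.example.org\n"
print(combined_count(pro, tif_mini))  # 3
```

Running the same count on each release would show how close a preset is getting to its memory budget.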

So I feel that this needs to be addressed.

I am not entirely sure, but it seems to me that the main contributors to the domain count growth are the TIF lists. My current idea for dealing with the situation on the 128MiB routers is to downgrade them from Pro to Pro.mini. However, this is not ideal because it effectively trades ad blocking for harmful-domain blocking, which is not necessarily the best tradeoff.

So we would like to ask:

  1. Is it possible to keep the domain count in blocklists more or less constant?
  2. Is it possible to have another TIF blocklist, smaller than the current 'tif.mini'?

Thank you again! Your work is very highly appreciated by many people, including us and our users.

@friendly-bits friendly-bits added the question Further information is requested label Jan 10, 2025
@hagezi
Owner

hagezi commented Jan 10, 2025

Hi @friendly-bits,

I have made some optimisations to the lists, and some malicious NRDs (normal to ultimate) are now also blocked. Furthermore, I am being flooded with active malicious domains at the moment. They are springing up like mushrooms.

For adblockers that have problems with the list size, the mini versions are available, e.g. Pro mini + TIF mini

Alternatively, use the Light, Normal or Pro alone. A useful combination would be to use the Pro alone and also use a Secure DNS such as Quad9 as an upstream.

It is impossible to keep the lists compatible with all adblockers. I don't want the effectiveness of the normal lists to suffer, especially since they don't contain dead domains or domains that will never be queried anyway. That's why there is the mini option. With the mini version, the size remains more or less constant because it only contains domains from current Top 1M lists. The exception is the TIF mini, which has become significantly larger due to the growing volume of malware, scam, phishing and malicious NRDs.
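
To illustrate why a list restricted to a popularity ranking stays roughly bounded in size, here is a rough sketch. It is not the actual build pipeline, which is not described in this thread. It assumes that a blocklist entry is kept when the entry itself, or one of its parent domains, appears in a Top 1M style list; that matching rule, the function names and the sample data are all assumptions.

```python
def parent_domains(domain):
    """Yield the domain and each parent, e.g. a.b.c -> a.b.c, b.c, c."""
    labels = domain.split(".")
    for i in range(len(labels)):
        yield ".".join(labels[i:])


def filter_by_popularity(blocklist, top_domains):
    """Keep entries that are a popular domain or a subdomain of one."""
    top = set(top_domains)
    return [d for d in blocklist if any(p in top for p in parent_domains(d))]


# Assumed sample data, not taken from any real list.
blocklist = ["ads.example.com", "tracker.rare-site.test", "example.net"]
top = ["example.com", "example.net"]
print(filter_by_popularity(blocklist, top))
# ['ads.example.com', 'example.net']
```

Under this assumption the output can only grow as far as the popularity list allows, whereas an unfiltered threat feed has no such ceiling.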

Somewhere you have to make compromises if the “AdBlocker” cannot cope with large lists. It is impossible to achieve the effectiveness of the Pro with half of the domains.
The Mini versions are designed to cover almost all popular ads and tracking - especially since all Mini versions include Light. They are less effective with popup ads and block almost no TIF domains.

What is the target value for desired combinations?

@hagezi
Owner

hagezi commented Jan 10, 2025

@friendly-bits For the TIF.mini I could deactivate the NRDs, some of which are already included in the Pro (only from phishing/scam feeds). I'll test that out.

@hagezi
Owner

hagezi commented Jan 10, 2025

I'll have another look at what I can do with the Pro without limiting its effectiveness.

@friendly-bits
Author

friendly-bits commented Jan 10, 2025

Thank you for the super prompt reply. In order to not make you wait, I'll write a short reply now and follow up with a bit more details later, if needed.

The target maximum domain count for the 128MiB routers is <300k entries total (that includes the Pro list and the tif.mini list, deduplicated, since adblock-lean deduplicates the blocklists by default). Currently we have ~311k. Of course, we prefer the actual count to be a bit smaller than 300k, in order to have some headroom. So let's say 250k total.
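
The numbers above translate into a simple budget check. This is a minimal sketch using only the figures from this comment (a hard limit of 300k, a preferred target of 250k, a current count of ~311k); the function name and the status strings are hypothetical.

```python
def budget_status(count, hard_limit, target):
    """Classify a deduplicated domain count against a preset's budget."""
    if count >= hard_limit:
        # The stated limit is "<300k", so reaching it already counts as over.
        return "over limit"
    if count > target:
        return "within limit, low headroom"
    return "ok"


print(budget_status(311_000, 300_000, 250_000))  # over limit
print(budget_status(280_000, 300_000, 250_000))  # within limit, low headroom
print(budget_status(250_000, 300_000, 250_000))  # ok
```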

I also want to better articulate our request. We are not asking you to make blocklists which fit adblock-lean's specific needs. We can and we do pick a combination of available blocklists which fits the various target devices of our users. The fundamental issue is that the size is growing over time, so a user who picked the then-optimal preset (which even had enough headroom for size fluctuations) 5 months ago may suddenly get a misbehaving router after adblock-lean pulls the updated blocklist. So our request is to keep the blocklist sizes constant. I'm not sure what the best strategy for this would be. Maybe just having a small selection of blocklists with fixed sizes (+/- some margin). Maybe when the largest of these lists overflows, add a new blocklist with next size gradation. There is no need to make too many lists available because the standard memory sizes are doubled. So we are talking about 4 or 5 size gradations which would cover everything from 64MiB to 1024MiB+ devices.
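
The size-gradation idea can be sketched as follows. Only the 250k target for 128MiB devices comes from this thread. The other budgets are extrapolated by assuming that the domain budget doubles whenever memory doubles, which is an unverified simplification, and the function names are hypothetical.

```python
def size_tiers(base_mib=64, base_budget=125_000, steps=5):
    """Build (memory in MiB, domain budget) tiers, doubling both at each step.

    base_budget=125_000 is an assumption: half of the 250k target for 128MiB.
    """
    return [(base_mib * 2**i, base_budget * 2**i) for i in range(steps)]


def pick_tier(tiers, memory_mib):
    """Return the largest domain budget whose tier fits the device, or None."""
    fitting = [budget for mem, budget in tiers if mem <= memory_mib]
    return max(fitting) if fitting else None


tiers = size_tiers()
print(tiers)
# [(64, 125000), (128, 250000), (256, 500000), (512, 1000000), (1024, 2000000)]
print(pick_tier(tiers, 128))  # 250000
```

With five doubling steps, the tiers span 64MiB to 1024MiB, which matches the 4 or 5 gradations mentioned above.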

I am pretty sure that this will help not only adblock-lean but virtually every adblocker running on devices with limited memory capacity.

@hagezi
Owner

hagezi commented Jan 10, 2025

@friendly-bits Thank you, I understand that. I can try to optimize the lists, but it is almost impossible to keep TIF lists somehow constant. Large jumps are almost normal in these lists - depending on what the feeds deliver. For a small amount of RAM, I recommend not using TIF lists in order to have enough headroom.

@friendly-bits
Author

We may have to give up on TIF lists for the smaller memory capacities then. If there is no choice then we will do it. That said, this would be unfortunate. If it's a technical issue then I think it might be solvable with a bit of automation. I could contribute a shell+awk script which would automate that if this helps.

@hagezi
Owner

hagezi commented Jan 10, 2025

@friendly-bits I've made a few changes, let's see where we end up with the next release in a few hours. I am currently also struggling with the size of the TIF full and am trying to get this under control.

@friendly-bits
Author

Thank you! And I am serious about the offer to help. So please feel free to ping me if/when you are interested. I helped implement blocklist processing in adblock-lean, which (AFAIK) currently has the fastest and most memory efficient processing among the available adblockers for that platform.
