-
Notifications
You must be signed in to change notification settings - Fork 762
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Concurrency To Download Vendor List #4005
base: master
Are you sure you want to change the base?
Use Concurrency To Download Vendor List #4005
Conversation
…ncurrency behaviour; add sample configuratio in the sample
@@ -284,12 +292,6 @@ var vendorList2Expected = testExpected{ | |||
vendorPurposes: map[int]bool{1: false, 2: true, 3: true}, | |||
} | |||
|
|||
var vendorListFallbackExpected = testExpected{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this code block is not used at all so i deleted it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you need to delete it to accomplish your goal in this PR ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but why we need to keep it if the code is not used at all? there is no comment or description associated with the deleted test case.
i used a practice called grooming as you go
- so if you happen to touch the related code piece as part of the solution, then you just do a little bit grooming to avoid large bulk of grooming/refactoring in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I see no problem with that.
config/config.go
Outdated
@@ -240,10 +240,21 @@ type Privacy struct { | |||
LMT LMT | |||
} | |||
|
|||
type VendorListFetcher struct { | |||
// Max concurrent request for downloading the latest version of the vendor list from a specific major version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the word "request" should be spelled in the plural form.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks !
config/config.go
Outdated
// If it is 0 or negative, then means there is no limit for the max concurrency. | ||
MaxConcurrencyInitFetchLatestVersion int `mapstructure:"max_concurrency_init_fetch_latest_version"` | ||
|
||
// Max concurrent request for downloading every specific version of the vendor list from its first version to its latest version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the word "request" should be spelled in the plural form.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks !
Enabled bool `mapstructure:"enabled"` | ||
HostVendorID int `mapstructure:"host_vendor_id"` | ||
DefaultValue string `mapstructure:"default_value"` | ||
Enabled bool `mapstructure:"enabled"` | ||
HostVendorID int `mapstructure:"host_vendor_id"` | ||
DefaultValue string `mapstructure:"default_value"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the vertical alignment was intentional.
Why change the indentation style of those lines ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sebhtml do you mind elaborating more about vertical alignment was intentional
? Changing the indentation is automatically done by the go formatter ( i am using MS Visual Studio ) , which i believe it is easier to read.
but i am curious about other technical aspects related to it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The types are vertically aligned:
|
v
Timeouts GDPRTimeouts `mapstructure:"timeouts_ms"`
NonStandardPublishers []string `mapstructure:"non_standard_publishers,flow"`
NonStandardPublisherMap map[string]struct{}
TCF2 TCF2 `mapstructure:"tcf2"`
AMPException bool `mapstructure:"amp_exception"`
see
prebid-server/config/config.go
Line 258 in 0a5b937
Timeouts GDPRTimeouts `mapstructure:"timeouts_ms"` |
I don't know if my comment is relevant though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be go fmt's decision; they actually look wrong before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bsardo let me know your opinion
config/config.go
Outdated
v.SetDefault("gdpr.vendorlist_fetcher.max_concurrency_init_fetch_latest_version", 1) // by default one download at a time | ||
v.SetDefault("gdpr.vendorlist_fetcher.max_concurrency_init_fetch_latest_version", 1) // by default one download at a time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this line repeated twice ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, will change it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks !
gdpr/vendorlist-fetching.go
Outdated
tsStart := time.Now() // For logging how long this takes | ||
for _, v := range versions { | ||
specVersion := v.specVersion | ||
firstVersion := v.firstListVersion | ||
wgLatestVersion.Go(func() error { | ||
latestVersion := saveOne(ctx, client, urlMaker(specVersion, 0), saver) | ||
for i := firstVersion; i < latestVersion; i++ { | ||
currentVersion := i | ||
wgSpecificVersion.Go(func() error { | ||
saveOne(ctx, client, urlMaker(specVersion, currentVersion), saver) | ||
return nil | ||
}) | ||
} | ||
return nil | ||
}) | ||
} | ||
|
||
wgLatestVersion.Wait() | ||
wgSpecificVersion.Wait() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi !
I don't know if it matters.
The WaitGroup wgSpecificVersion
will likely wait for more Goroutines than wgLatestVersion
.
Globally, you start Goroutines with a limit of conf.MaxConcurrencyInitFetchLatestVersion + conf.MaxConcurrencyInitFetchSpecificVersion
concurrent Go routines.
So why have 2 wait groups instead of just 1 wait group ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sebhtml , good thought, however the reason i did it is to have fine granular control of two different behaviours:
- fetching the latest version
- fetching each specific version.
plus provide sequential execution as default behaviour so the decision of tuning the controlling parameters is determined by the host.
by using only one wait group, there are two thoughts behind it
-
conf.MaxConcurrencyInitFetchLatestVersion
andconf.MaxConcurrencyInitFetchSpecificVersion
loses its meaning since one wait group wont be able to control two types of behaviour precisely. -
lets say you use one wait group and the corresponding config is
conf.MaxConcurrencyInitFetch
, and then you set the value to1
( which is supposed to mean it is a sequential execution ) with the following implementation
var wg errorgroup.Group
for _, v := range versions {
specVersion := v.specVersion
firstVersion := v.firstListVersion
wg.Go(func() error {
latestVersion := saveOne(ctx, client, urlMaker(specVersion, 0), saver)
for i := firstVersion; i < latestVersion; i++ {
currentVersion := i
wg.Go(func() error {
saveOne(ctx, client, urlMaker(specVersion, currentVersion), saver)
return nil
})
}
return nil
})
}
then you have deadlock because one go routine is waiting for another go routine while only 1 go routine is available
let me know if you have different idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see. You have a good point. I therefore resolve this thread.
gdpr/vendorlist-fetching_test.go
Outdated
saverMutex.Lock() | ||
*s = append(*s, vi) | ||
saverMutex.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If *s = append(*s, vi)
panics, I think that the saverMutex
will never be unlocked.
I think that deferring can avoid that:
saverMutex.Lock()
defer saverMutex.Unlock()
*s = append(*s, vi)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
vendorlist_fetcher: | ||
max_concurrency_init_fetch_latest_version: 2 | ||
max_concurrency_init_fetch_specific_version: 20 | ||
timeouts_ms: | ||
init_vendorlist_fetches: 30000 | ||
active_vendorlist_fetch: 30000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are those new configuration keys optional ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if init_vendorlist_fetches
is 0 - then you literally set 0 sec as the timeout of preloading the vendor list, what you end up seeing the preload will have context deadline exceeded/timeout
active_vendorlist_fetch
is more for fetching the vendor list on the run - same idea as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @zhongshixi
I am not asking what are those new configuration keys for.
I am asking whether or not the Prebid Server software works if you don't provide them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sebhtml
for vendor list max concurrent request
v.SetDefault("gdpr.vendorlist_fetcher.max_concurrency_init_fetch_latest_version", 1) // by default one fetch at a time
v.SetDefault("gdpr.vendorlist_fetcher.max_concurrency_init_fetch_specific_version", 1) // by default one fetch at a time
we have default settings, which is literally the same as before ( sequential download )
for init_vendorlist_fetches
and active_vendorlist_fetch
, here is the default, if you do not set them to a reason value, you wont be able to preload the vendor list due to context deadline issue. The server can still spin up and run perfectly, but you do not have a prefetched vendor list
v.SetDefault("gdpr.timeouts_ms.init_vendorlist_fetches", 0)
v.SetDefault("gdpr.timeouts_ms.active_vendorlist_fetch", 0)
let me know if i answered your question
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it's not backward-compatible.
I think that it should be backward-compatible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sebhtml i am not following
- even without my change the issue
context deadline issue
still happen, because you can not really set 0 timeout , the default setting is using0
as timeout, i talked to many people who host prebid server, they all configure the following value to more than 0 so we do not immediately timeout when the fetchiing request is fired
v.SetDefault("gdpr.timeouts_ms.init_vendorlist_fetches", 0)
v.SetDefault("gdpr.timeouts_ms.active_vendorlist_fetch", 0)
- the default logic before my PR change uses sequential execution, the default setting added in this PR also leads to sequential execution
v.SetDefault("gdpr.vendorlist_fetcher.max_concurrency_init_fetch_latest_version", 1) // by default one fetch at a time
v.SetDefault("gdpr.vendorlist_fetcher.max_concurrency_init_fetch_specific_version", 1) // by default one fetch at a time
in summary, i do not think this PR incurs any compatibility issue. So i would like to ask if could you point out exactly which part of the code logic is not backwards compatible, maybe i am actually missing something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how to rephrase it. So I let it go.
for i := firstVersion; i < latestVersion; i++ { | ||
currentVersion := i | ||
wgSpecificVersion.Go(func() error { | ||
saveOne(ctx, client, urlMaker(specVersion, currentVersion), saver) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to help reviewer understand the reason behind reason that i do not use channel or wait group+mutex is because the implementation of saveOne
is already mutex safe due to its using sync.Map
underneath, there is no need to add unnecessary new lock mechanism
Problem
The current implementation of downloading vendor list in preload stage is executed sequentially for individual vendor file - it costs 5 to 10 seconds to download all ~300 vendor files on my local machine ( Apple M1 Pro, 32GB memory).
Solution
the solution is to optimize the preload procedure by applying concurrency which is configurable by the end user. The end user can tune the number depending on their own spec of the machine.
by default, the concurrency is disabled so it wont impact anything. It is end-user's responsibility to provide a number to tune the performance.
in my
pbs.yaml
config , i set the following value which i observed almost 5x to 10x performance improvement ( ~500 to 600 ms ), any number more than 20 formax_concurrency_init_fetch_specific_version
, the performance increase is diminishingyou are welcome to play around with the solution on your own setting and i highly recommend you see the number yourself.