Bugfix: Grouped store operations by partition key #7
base: master
This is a wonderful extension, but I'm worried about performance with very large data sets, meaning several tens of thousands of rows. I could imagine a parameter that enables this operation; what do you think?
I am not sure what you mean by "parameter that enables this operation": either the caller of the library does this manually or the library does it. Any batch operation that spans multiple partitions will fail with an error. I can see that the LINQ GroupBy clause could be a performance pain point for very large datasets, and I could take a stab at swapping it out for a manual grouping. Do you think the performance gain would be worth the loss in code readability?
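For reference, the two grouping styles discussed here can be compared side by side. This is only a minimal sketch: `Row` and `GroupingSketch` are hypothetical names, not types from this library, and the code in the PR may look different.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a table entity; not a type from this library.
public record Row(string PartitionKey, string RowKey);

public static class GroupingSketch
{
    // LINQ version: one group per partition key.
    public static Dictionary<string, List<Row>> WithLinq(IEnumerable<Row> rows) =>
        rows.GroupBy(r => r.PartitionKey)
            .ToDictionary(g => g.Key, g => g.ToList());

    // Manual version: a single pass that fills a dictionary of lists.
    public static Dictionary<string, List<Row>> Manual(IEnumerable<Row> rows)
    {
        var groups = new Dictionary<string, List<Row>>();
        foreach (var row in rows)
        {
            if (!groups.TryGetValue(row.PartitionKey, out var list))
            {
                list = new List<Row>();
                groups[row.PartitionKey] = list;
            }
            list.Add(row);
        }
        return groups;
    }
}
```

Both versions make one pass over the input and keep one list per partition key, so any difference between them should be small next to the network calls.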
You are right that either the caller or the library should do this. I would prefer to have two APIs: one that does not group automatically, so it fails with an error on mixed partitions but is optimized for huge datasets, and a second convenience function that does the grouping. That way the caller can decide. It could also be encoded in a storage operation flag or something similar.
I see your point now, let me think for a second.
I will have an update on this early next week. I did a little testing and refactoring and found one error in my PR, and then I got a factor of 7 speedup on the HugeDemoTest with the latest code. I just need to clean it up.
Please note that I have NOT removed the timer code yet; I just wanted to let you see the progress. The testing has shown that the LINQ GroupBy clause has zero effect on performance. However, the foreach loop that actually makes the connections to table storage can be parallelized, and that gives a huge performance boost. My test case with 20,000 rows (up from 2,000)
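The parallelization mentioned in the last comment could look roughly like the sketch below. It is not the code from the PR: `ParallelSketch`, `SendAll`, the `sendBatch` delegate, and the `maxParallel` default are all assumptions, and the delegate stands in for whatever call actually submits one batch to table storage.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelSketch
{
    // sendBatch is a hypothetical stand-in for the call that submits one batch.
    // Returns the total number of items handed to sendBatch.
    public static int SendAll(
        IEnumerable<IReadOnlyList<string>> batches,
        Action<IReadOnlyList<string>> sendBatch,
        int maxParallel = 8)
    {
        var sent = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        // Each batch is independent, so the loop body can run concurrently.
        Parallel.ForEach(batches, options, batch =>
        {
            sendBatch(batch);
            Interlocked.Add(ref sent, batch.Count); // thread-safe counter update
        });

        return sent;
    }
}
```

This only works if batches do not depend on each other, which holds when each batch targets a single partition. Capping the degree of parallelism keeps the number of simultaneous connections bounded.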
Batch operations will fail when they are on different partitions. When using custom filters this is more of an issue.
This bugfix will simply separate the batch operations into different batches based on the partition key.
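As an illustration of the idea, a minimal sketch of the split might look like this. `Operation` and `BatchSplitter` are hypothetical names rather than types from this library. The 100-item cap per batch reflects the Azure Table Storage limit for entity group transactions and is an addition here; the description above does not say whether the PR enforces it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a pending store operation.
public record Operation(string PartitionKey, string RowKey);

public static class BatchSplitter
{
    // Assumption: cap each batch at the table storage transaction limit.
    public const int MaxBatchSize = 100;

    // Splits operations into batches that each contain a single partition key.
    public static List<List<Operation>> Split(IEnumerable<Operation> operations)
    {
        var batches = new List<List<Operation>>();
        foreach (var group in operations.GroupBy(o => o.PartitionKey))
        {
            var items = group.ToList();
            // Cut each partition's operations into chunks of at most MaxBatchSize.
            for (var i = 0; i < items.Count; i += MaxBatchSize)
            {
                var count = Math.Min(MaxBatchSize, items.Count - i);
                batches.Add(items.GetRange(i, count));
            }
        }
        return batches;
    }
}
```

With 150 operations on one partition and 100 on another, this yields three batches (100, 50, and 100), each safe to submit as a single batch operation.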