Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix: Grouped store operations by partition key #7

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

petero-dk
Copy link
Contributor

Batch operations will fail when they are on different partitions. When using custom filters this is more of an issue.

This bugfix will simply separate the batch operations into different batches based on the partition key.

@dei79
Copy link
Member

dei79 commented Jun 7, 2019

This is a wonderful extension but I'm worried about the performance with huge amount of data sets, meaning multiple 10K. Because I see a parameter which enables this operation, what do you think?

@petero-dk
Copy link
Contributor Author

I am not sure what you mean by "parameter that enables this operation" either the caller of the library should do this manually or the library should do this. Any batch operation that spans multiple partitions will fail with an error.

I can see that the Linq groupby clause could be a pain point for performance for very large datasets, I could probably take a stab at changing that out for a manual grouping. Do you think the performance increase would be worth it compared to the readability of the code?

@dei79
Copy link
Member

dei79 commented Jun 7, 2019

You are right in the opinion either the caller or the library should do this. I would like to prefer to have two APIs one which is not doing this grouping automatically and which runs into an error but which is optimized for huge datasets and another one more as comfort function which is doing this stuff so the caller can decide. It could be also encoded in the storage operation flag or so.

@petero-dk
Copy link
Contributor Author

I see your point now, let me think for a second

@petero-dk
Copy link
Contributor Author

I will have an update on this early next week. I did a little testing and refactoring and found one error in my PR and then I got a factor 7 increase in speed on the HugeDemoTest with the latest code. I just need to clean it up

@petero-dk
Copy link
Contributor Author

Please note that I have NOT removed the timer code yet. I just wanted to let you see the progress. The testing has shown that the LINQ groupby clause has zero effect on performance. However the foreach loop that actually makes the connections to tablestorage can be parallelized and that gives a huge performance boost

My test case with 20.000 rows (up from 2000)

Executing Demo Cases (WW Cloud) with Parallel connections

CoreHelpers.WindowsAzure.Storage.Table.Demo.DemoCases.UC10CreateHugeAmountOfDemoEntries
RunTime 00:00:05.67
RunTime 00:00:02.11

Executed Write Operations:
        mergeOrInserOperation: 200
        delete: 200
Executed Query Operations:
        Query: 20



Executing Demo Cases (WW Cloud) without Parallel connections

CoreHelpers.WindowsAzure.Storage.Table.Demo.DemoCases.UC10CreateHugeAmountOfDemoEntries
RunTime 00:01:03.29
RunTime 00:00:57.95

Executed Write Operations:
        mergeOrInserOperation: 200
        delete: 200
Executed Query Operations:
        Query: 20





Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants