Azure Setup #18
Currently, running it on
So basically we have two options here:
From what I gather, the relationship group will scale by having distinct subsets of workloads, meaning VM 1 gets WET files 1-100, VM 2 gets files 101-200, etc. Is my assumption correct? How does this distribution work for auto-scaling? ENV vars?
Second point of interest: some algorithms run faster than others; can both methods adapt to that?
Third point: is there a limit for auto-scaling? (Yes, I think it's 20; please confirm.)
Fourth point: we need to have an intermediate NodeJS service doing authentication before starting the algorithm.
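The static-partitioning assumption above can be sketched as a few lines of Node. `INSTANCE_INDEX` and `INSTANCE_COUNT` are hypothetical variable names, not something Azure sets automatically; they would have to be injected per instance. Note that this kind of fixed split sits uneasily with auto-scaling, since freshly added instances need new, unused indices.

```javascript
// Sketch: partitioning WET files across VM instances via environment
// variables. INSTANCE_INDEX / INSTANCE_COUNT are hypothetical names that
// would have to be set per instance (e.g. in the container config).
const index = parseInt(process.env.INSTANCE_INDEX || '0', 10);
const count = parseInt(process.env.INSTANCE_COUNT || '1', 10);
const FILES_PER_INSTANCE = 100; // illustrative chunk size

// Instance 0 gets WET files 1-100, instance 1 gets 101-200, etc.
function filesForInstance(idx, perInstance = FILES_PER_INSTANCE) {
  const first = idx * perInstance + 1;
  const last = (idx + 1) * perInstance;
  return { first, last };
}

console.log(`instance ${index} of ${count}:`, filesForInstance(index));
```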
The problem I am facing here is that our algorithms use Java underneath, and I was not able to find a way to use them with the Web App Linux (Preview). I tried using the Docker container image with it, but it seems not to be working. Here https://docs.microsoft.com/en-us/azure/app-service-web/app-service-linux-intro they also have not written anything about running a Java app. With the same image on a Linux VM it works fine. I have mailed them about this; let's see what they reply.

About the other things: currently our plan is to have different algorithms running on different VMs, plus one VM for the main application (get the WET files, parse them, pass them to the algorithms, and store the results back to the DB). I think the bottleneck for our application will not be the main application but the VMs on which the algorithms run. So we have to add an auto-scaling policy among those VMs based upon CPU usage, e.g. if a VM's CPU usage rises above 80%, add one more instance. On top of these VMs we then need a load balancer, so that the main application has a single address to send the data to and the load balancer underneath takes care of forwarding it to the associated VMs.

Second point: currently the main application sends the data to all the algorithms and waits for their replies. It doesn't process the next query until it gets replies from all the algorithms within a particular time. @SANDR00, can you confirm this?

Third point: I am not sure which service we will be using. Currently it looks like a mix of both services, so we have to check the scaling feature of both.

Fourth point: I am not sure about it. Do you mean something like a handshake mechanism between the main application and the algorithm applications?
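The "send to all algorithms, wait for replies within a time limit" behaviour described above can be sketched in Node like this. The endpoint URLs are placeholders (with the load balancer in place there would be a single address instead), and the payload shape is an assumption; it only illustrates the fan-out-with-timeout pattern using Node 18+'s built-in fetch.

```javascript
// Sketch of the described fan-out: the main application posts a payload
// to every algorithm endpoint in parallel and waits at most `timeoutMs`
// per call before giving up on that algorithm. URLs are placeholders.
const ALGORITHM_URLS = [
  'http://algorithm-vm-1/extract', // placeholder addresses; behind a
  'http://algorithm-vm-2/extract', // load balancer this would collapse
];                                 // to a single address

async function callAlgorithms(payload, timeoutMs = 5000) {
  const calls = ALGORITHM_URLS.map((url) =>
    fetch(url, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
      signal: AbortSignal.timeout(timeoutMs), // give up on slow algorithms
    })
      .then((res) => res.json())
      .catch((err) => ({ error: String(err) })) // a timeout must not sink the batch
  );
  // Only continue with the next query once every call has settled.
  return Promise.all(calls);
}
```

Because every rejection is caught and turned into an `{ error }` object, one slow or dead algorithm VM cannot block the whole pipeline, which matches the behaviour the comment describes.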
Yes, but we can add multiple calls easily and "scale" that up; it's just config.
The limit for instances for scaling out
Regarding former group4's Open IE algorithm: it works on an app service, but is currently public. I'm thinking of a way to implement a static API key via environment variables. Does that answer your question about ENV vars?
@simonzachau Do you have an estimate for the time needed to process one GB of WET files using the app service setup with 10 containers? Is processing the output of the unstructured data group in a reasonable amount of time feasible?
Having estimates is absolutely critical, otherwise we will most probably run aground and be able to process maybe 0.1% of the data set - who knows? I'm worried that 10 containers are not nearly enough for what we are doing here. If the app service doesn't allow for more, we must go another route (the self-managed VMs). |
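The estimate being asked for is simple arithmetic once a per-GB measurement exists. The numbers below (minutes per GB, data set size) are illustrative placeholders, not measurements; only the formula is real.

```javascript
// Back-of-envelope estimate: total hours to process the data set, assuming
// throughput scales linearly with container count. All inputs are
// hypothetical placeholders until real measurements exist.
function estimateHours(dataSetGb, minutesPerGb, containers) {
  const totalMinutes = (dataSetGb * minutesPerGb) / containers;
  return totalMinutes / 60;
}

// e.g. hypothetically 30 min/GB, 10 containers, 1000 GB of WET files:
console.log(estimateHours(1000, 30, 10)); // 50 hours
```

Plugging a measured minutes-per-GB figure into this immediately shows whether 10 containers are anywhere near enough, or whether the self-managed-VM route is needed.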
@SANDR00 is your chain able to send requests to former group4's Open IE app service? If not, please tell me what's missing so I can help you get this going. We need this in order to provide statistics. |
Today: @SANDR00 and I tried to integrate former group4's Open IE into the callChain in order to get some statistics for the app service solution. Somehow, the service actually isn't called (at least no console.log in the algorithm is printed), although it works if we change
@simonzachau I will check and fix it up!
@ansjin any progress? |
We had all algorithms running on Azure before it went down, and already sent data to them and got responses. As of now, we are dependent on G1 and G2 to fill the DB on Azure (when it's up again); then we will take their data, call our 5 machines with it, and push everything to the DB. I hope that does not cost too much money.
About the algorithms: all the algorithms were deployed on Azure (before the money ran out), with each algorithm on a single VM. This was tested based upon the data given by team 2 to @SANDR00 locally. @SANDR00 was able to push the relationships and the date-events data locally to the DB. The next step was to scale up those algorithms; @simonzachau and I were trying to scale up, but before we could really test the scaling the money ran out. For the time being we will not scale up the algorithms; we will just use a single VM for each algorithm. Hopefully that doesn't consume all the money.
Small-Scale-Up testing: |
Looks good - how does google's interface compare to azure? 😛 |
@kordianbruck it does what it should and doesn't need extra clicks to show you what's running / consuming money xD |
@kordianbruck The required things are easily available without much hassle, as they are in Azure, plus the $300 of free money :D
Yeeeeeeeea - I know - what a bummer. Come join the seminar next semester again, we will hopefully have the G-Sponsorship by then 😬 |
Can we close that? Or is there something left to do? |
No, I think you guys are done with this. Autoscaling is working right? |
With Kubernetes, yes. Done from my point of view. The only thing still ongoing is the queue setup, I think.
as currently discussed at MusicConnectionMachine/RelationshipsG4#34