This repository has been archived by the owner on Apr 12, 2019. It is now read-only.

Azure Setup #18

Closed
Henni opened this issue Apr 4, 2017 · 23 comments

Comments

@Henni
Contributor

Henni commented Apr 4, 2017

as currently discussed at MusicConnectionMachine/RelationshipsG4#34

@simonzachau
Contributor

Currently, running it on virtual machines (classic) is fine for debugging and testing. By Monday we should be able to push big data through our chain, so scalability is necessary. That points to Web App on Linux (preview) for scaling out.

@kordianbruck
Contributor

kordianbruck commented Apr 7, 2017

So basically we have two options here:

  1. Set up an automated way to create static VMs and initialize them with the Dockerfile using cloud-init
  2. Use the autoscaling feature of Azure

From what I gather, the relationship group will scale by having distinct subsets of workloads, meaning VM 1 gets WET files 1-100, VM 2 gets 101-200, etc. Is my assumption correct? How does this distribution work for autoscaling? ENV vars?
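A minimal sketch of the static split described above, assuming each VM is told its index and chunk size through environment variables. The variable names `WORKER_INDEX` and `FILES_PER_WORKER` are hypothetical and not something the project defines.

```python
import os


def wet_file_range(worker_index: int, files_per_worker: int) -> range:
    """Return the 1-based WET file numbers assigned to a 1-based worker index."""
    first = (worker_index - 1) * files_per_worker + 1
    return range(first, first + files_per_worker)


def range_from_env(env=os.environ) -> range:
    # WORKER_INDEX and FILES_PER_WORKER are hypothetical variable names.
    index = int(env.get("WORKER_INDEX", "1"))
    size = int(env.get("FILES_PER_WORKER", "100"))
    return wet_file_range(index, size)
```

With a chunk size of 100, worker 1 gets files 1-100 and worker 2 gets 101-200. A fixed split like this assumes the number of VMs is known up front, which is why it fits static VMs more naturally than instances that come and go under autoscaling.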

Second point of interest: Some algorithms run faster than others. Can both methods adapt to that?

Third point: Is there a limit for autoscaling? (Yes, I think it's 20; please confirm.)

Fourth point: We need an intermediate NodeJS layer doing authentication before starting the algorithm.

@ansjin
Member

ansjin commented Apr 7, 2017

The problem I am facing is that our algorithms use Java underneath, and I was not able to find a way to run them on Web App on Linux (preview). I tried deploying the Docker container image to it, but it does not seem to work. The documentation at https://docs.microsoft.com/en-us/azure/app-service-web/app-service-linux-intro also says nothing about running a Java app.

But the same image works fine on a Linux VM. I have mailed them about this; let's see what they reply.

About the other points:

Currently our plan is to have different algorithms running on different VMs and one VM for the main application (get the WET files, parse them, pass them to the algorithms, and store the results back to the DB).

I think the bottleneck for our application will not be the main application but the VMs on which the algorithms run. So we have to add an autoscaling policy for those VMs based on CPU usage: for example, if a VM's CPU usage rises above 80%, add one more instance. On top of these VMs we then need a load balancer, so that the main application has a single address to send the data to and the load balancer underneath takes care of forwarding it to the associated VMs.
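The scale-out rule above can be written down as a small decision function. This is only an illustration of the rule, not Azure's API: in practice the rule would be configured in the platform's autoscale settings. The function name and the `max_instances` parameter are assumptions made for the sketch.

```python
def desired_instances(current: int, avg_cpu_percent: float,
                      max_instances: int,
                      scale_out_above: float = 80.0) -> int:
    """Return the instance count after applying a CPU-based scale-out rule.

    Adds one instance when average CPU exceeds the threshold, without going
    past the platform limit. Scaling back in is not covered by this sketch.
    """
    if avg_cpu_percent > scale_out_above and current < max_instances:
        return current + 1
    return current
```

For example, `desired_instances(1, 85.0, max_instances=10)` asks for a second instance, while a pool that is already at its limit stays where it is.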

Second point: Currently the main application sends the data to all the algorithms and waits for their replies. It doesn't process the next query until it gets replies from all the algorithms within a particular time. @SANDR00 can you confirm this?
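A rough sketch of that fan-out-and-wait behaviour, assuming that replies arriving after the time limit are dropped (the thread does not say what actually happens to them). `call_all` and its parameters are hypothetical names; in the real chain each callable would presumably be an HTTP request to an algorithm VM.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def call_all(algorithms, text, timeout_s):
    """Send `text` to every algorithm and collect the replies that arrive in time.

    `algorithms` maps a name to a callable taking the text (an assumption
    standing in for the real network calls).
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(algorithms)))
    futures = {pool.submit(fn, text): name for name, fn in algorithms.items()}
    done, _not_done = wait(futures, timeout=timeout_s)
    # Do not block on stragglers; their replies are dropped (Python 3.9+).
    pool.shutdown(wait=False, cancel_futures=True)
    results = {}
    for fut in done:
        if fut.exception() is None:  # skip algorithms that failed
            results[futures[fut]] = fut.result()
    return results
```

Because every algorithm is called in parallel, one query takes roughly as long as the slowest algorithm or the time limit, whichever is shorter.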

Third point: I am not sure which service we will be using. Currently it looks like a mix of both services, so we have to check the scaling features of both.

Fourth point: I am not sure about this. Do you mean something like a handshake mechanism between the main application and the algorithm applications?

@Sandr0x00

Yes, but we can add multiple calls easily.
We use a queue and at the moment we do one call per algorithm at a time.

But we can "scale" that up. It's just config.
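A minimal sketch of a queue with a configurable number of calls in flight, to illustrate what scaling up via config could look like. This is not the project's queue implementation; `run_queue`, `call_algorithm`, and `concurrency` are hypothetical names.

```python
import queue
import threading


def run_queue(items, call_algorithm, concurrency=1):
    """Process `items` with at most `concurrency` calls in flight at once."""
    work = queue.Queue()
    for item in items:
        work.put(item)
    results = []
    lock = threading.Lock()

    def worker():
        # Each worker pulls items until the queue is empty.
        while True:
            try:
                item = work.get_nowait()
            except queue.Empty:
                return
            reply = call_algorithm(item)
            with lock:
                results.append((item, reply))

    threads = [threading.Thread(target=worker) for _ in range(concurrency)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With `concurrency=1` this matches one call per algorithm at a time; raising the number is the only change needed to send several calls in parallel, at the cost of results arriving out of order.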

@simonzachau
Contributor

simonzachau commented Apr 8, 2017

The instance limit for scaling out Web App on Linux (preview) machines is 10, whether you turn on automatic scaling or set it manually. Regarding your question yesterday about whether automatic scaling works: I tried to test it, but it's hard to see how many instances are actually running (there's only a number for the average over a set period of time). I also started an Azure performance test, which ran better with more instances. In case automatic scaling does not work, we can manually scale out to the maximum (the mentioned 10 instances).

Currently, I went with the S3 plan for scaling up, so when scaling out to the maximum we have 10x S3. Since each algorithm can have its own app service (given that @ansjin is able to get former group3's algorithms to run on app services), we can give each app service the power its algorithm needs (e.g. the coreference algorithm might need 10 instances, while a cheaper one might be fine with fewer). As far as I understood from the Microsoft presentation, the instances are load balanced, so requests that we send to our algorithm services are automatically distributed. This is one of the advantages an app service has over manually configured VMs. Our main app is currently not scaled out.

Former group4's Open IE algorithm works on an app service, but it is currently public. I'm thinking of implementing a static API key via environment variables. Does that answer your question about ENV vars?
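A minimal sketch of a static API key check, assuming the key is stored in an environment variable on the app service and sent along with each request. The variable name `API_KEY` and the function name are hypothetical; how the key travels with the request (header, query parameter) is left open here.

```python
import hmac
import os


def is_authorized(presented_key, env=os.environ):
    """Check a request's key against the one configured in the environment."""
    expected = env.get("API_KEY")  # hypothetical variable name
    if not expected or presented_key is None:
        return False  # fail closed if no key is configured or none was sent
    # Constant-time comparison avoids leaking the key through timing.
    return hmac.compare_digest(presented_key.encode(), expected.encode())
```

The same value would have to be configured on the caller's side, so the main application and each algorithm service share one secret.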

@vviro

vviro commented Apr 8, 2017

@simonzachau Do you have an estimate for the time needed to process one GB of WET files using the app service setup with 10 instances? Is it feasible to process the output of the unstructured data group in a reasonable amount of time?

@simonzachau
Contributor

@vviro Unfortunately, we don't have statistics yet. @SANDR00 worked on the WET files, but as far as I know he hasn't called our app service yet?

@vviro

vviro commented Apr 8, 2017

Having estimates is absolutely critical, otherwise we will most probably run aground and be able to process maybe 0.1% of the data set - who knows? I'm worried that 10 containers are not nearly enough for what we are doing here. If the app service doesn't allow for more, we must go another route (the self-managed VMs).
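Once a single measurement exists, a back-of-the-envelope extrapolation is simple. The sketch below assumes perfectly linear scaling across instances, which is optimistic, and deliberately contains no numbers, since none have been measured in this thread. All names are hypothetical.

```python
def estimated_hours(total_gb: float, seconds_per_gb_per_instance: float,
                    instances: int) -> float:
    """Extrapolate total processing time in hours, assuming linear scaling."""
    if instances < 1:
        raise ValueError("instances must be at least 1")
    total_seconds = total_gb * seconds_per_gb_per_instance / instances
    return total_seconds / 3600
```

Timing one GB on one instance and plugging that in with `instances=10` would show quickly whether the app service limit is workable or whether self-managed VMs are needed.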

@simonzachau
Contributor

@SANDR00 is your chain able to send requests to former group4's Open IE app service? If not, please tell me what's missing so I can help you get this going. We need this in order to provide statistics.

@simonzachau
Contributor

Today @SANDR00 and I tried to integrate former group4's Open IE into the callChain in order to get some statistics for the app service solution. Somehow the service isn't actually called (at least, no console.log in the algorithm is printed), although it works if we change algorithms/openie-stanford/app.js to start up with a static text variable. Our effort can be found on the adapt-format-of-stanford-open-ie branch. @ansjin could you please have a look at how it differs from former group3's algorithms and whether you have an idea what's missing?

@ansjin
Member

ansjin commented Apr 8, 2017

@simonzachau I will check and fix it up!

@kordianbruck
Contributor

@ansjin any progress?

@Sandr0x00

We had all algorithms running on Azure before it went down, and we had already sent data to them and got responses. As of now, we depend on G1 and G2 to fill the DB on Azure (when it's up again); then we will take their data, call our 5 machines with it, and push everything to the DB. I hope that does not cost too much money.
To save money, we will not call our algorithms again before then. They are all tested and working.

@ansjin
Member

ansjin commented Apr 10, 2017

@kordianbruck

About the algorithms: All the algorithms were deployed on Azure (before the money ran out), with each algorithm on a single VM. This setup was tested with the data that team 2 gave to @SANDR00 locally. @SANDR00 was able to push the relationships and the date events data to the DB locally.

The next step was to scale up those algorithms. @simonzachau and I were trying to do that, but the money ran out before we could really test the scaling.

For the time being we will not scale up the algorithms; we will just use a single VM for each algorithm. Hopefully that doesn't use up all the money.

@ansjin
Member

ansjin commented Apr 11, 2017

Currently running all the algorithms on Google Compute Engine.

[Screenshot: capture2]

Also, there is a limitation that we can have no more than 8 cores (or virtual CPUs) running at the same time, so we will not be able to test the scalability. Later this will be moved to Azure, when it is back.

@ansjin
Member

ansjin commented Apr 13, 2017

Small-Scale-Up testing:
#59 (comment)

@kordianbruck
Contributor

Looks good - how does Google's interface compare to Azure's? 😛

@simonzachau
Contributor

@kordianbruck it does what it should and doesn't need extra clicks to show you what's running / consuming money xD

@ansjin
Member

ansjin commented Apr 13, 2017

@kordianbruck The things you need are easy to find, without as much hassle as in Azure, and there's the $300 of free credit :D

@kordianbruck
Contributor

Yeeeeeeeea - I know - what a bummer. Come join the seminar next semester again, we will hopefully have the G-Sponsorship by then 😬

@Sandr0x00

Can we close that? Or is there something left to do?
@kordianbruck @ansjin @simonzachau

@kordianbruck
Contributor

No, I think you guys are done with this. Autoscaling is working, right?

@simonzachau
Contributor

With Kubernetes, yes. Done from my point of view. The only thing still in progress is the queue setup, I think.
