I created this repo to keep track of system administration tasks and workflows for virtual machines that currently run Ubuntu 14.04.2 LTS. If I start managing machines that use a different OS, I will update this README accordingly.
If you use these notes to create a similar setup, I recommend working through the sections in roughly this order:
- Using Volumes (if you need to format the file systems that your data nodes are going to use)
- Install Java
- Install Scala
- Install HDFS
- Install Spark
- Install Hadoop (Optional; I ended up not using Hadoop, just HDFS)
- Using HDFS (to create the /user directory and a user account)
Use the commands in Misc to create any non-spark, non-hdfs users that you need.
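
As a rough aid for following the order above, the sketch below reports which tools are already on the `PATH` of a machine, in the recommended install order. It is only an illustration: `check_tool` is a hypothetical helper name, and it assumes that the Java, Scala, HDFS, and Spark installs expose the `java`, `scala`, `hdfs`, and `spark-submit` executables on the `PATH`, which depends on how each one was installed.

```shell
#!/bin/sh
# check_tool is a hypothetical helper (not part of this repo).
# It prints "<name>: found" if the command is on PATH, else "<name>: missing".
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

# Assumed executable names, listed in the recommended install order.
for tool in java scala hdfs spark-submit; do
    check_tool "$tool"
done
```

A tool reported as missing may still be installed but not on the `PATH`, so treat the output as a hint about which section to visit next rather than a definitive check.
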
Questions and feedback on these instructions are welcome!