One Script Deploy

One script to rule them all: CDP / CDP Private Cloud / CDH / HDP !

Given some machines, this script sets up all prerequisites, then installs and configures a fully secured cluster and loads data into it.

Example of a basic launch

Table of tested Deployments

| SUCCESS | Type | Versions | OS | DB | KDC | TLS | Comment |
|---|---|---|---|---|---|---|---|
| | basic | CDP 7.3.1.0 & CM 7.13.1.0 | RHEL 8.10 | Postgres 14 | MIT KDC | True | |
| | full | CDP 7.3.1.0 & CM 7.13.1.0 | RHEL 8.10 | Postgres 14 | MIT KDC | True | |
| | all-services | CDP 7.3.1.0 & CM 7.13.1.0 & CFM 2.2.9.0 & CSA 1.14.0.0 | RHEL 8.10 | Postgres 14 | MIT KDC | True | |
| | streaming | CDP 7.3.1.0 & CM 7.13.1.0 & CFM 2.2.9.0 & CSA 1.14.0.0 | RHEL 8.10 | Postgres 14 | MIT KDC | True | |
| | streaming-with-efm | CDP 7.3.1.0 & CM 7.13.1.0 & CFM 2.2.9.0 & EFM 2.2.99.0 & CSA 1.14.0.0 | RHEL 8.10 | Postgres 14 | MIT KDC | True | |
| | observability | CDP 7.3.1.0 & CM 7.13.1.0 & Observability 3.5.3 | RHEL 8.10 | Postgres 14 | MIT KDC | True | |
| | pvc | CDP 7.3.1.0 & CM 7.13.1.0 & PvC 1.5.4 | RHEL 8.8 | Postgres 14 | Free IPA | True | |
| | all-services-pvc | CDP 7.3.1.0 & CM 7.13.1.0 & PvC 1.5.4 | RHEL 8.8 | Postgres 14 | Free IPA | True | |
| | pvc-oc | CDP 7.3.1.0 & CM 7.13.1.0 & PvC 1.5.4 | CENTOS 7.9 | Postgres 14 | Free IPA | True | |
| | all-services-pvc-oc | CDP 7.3.1.0 & CM 7.13.1.0 & PvC 1.5.4 | CENTOS 7.9 | Postgres 14 | Free IPA | True | |
| | basic | CDP 7.1.9.14 & CM 7.11.3.9 | CENTOS 8.8 | Postgres 14 | MIT KDC | True | On AWS with setup-cluster-on-cloud |
| | full | CDP 7.1.9.14 & CM 7.11.3.9 | CENTOS 8.8 | Postgres 14 | MIT KDC | True | On AWS with setup-cluster-on-cloud |
| | streaming | CDP 7.1.9.14 & CM 7.11.3.9 & CFM 2.1.7 | CENTOS 8.8 | Postgres 14 | MIT KDC | True | On AWS with setup-cluster-on-cloud |
| | pvc | CDP 7.1.9.14 & CM 7.11.3.9 & PvC 1.5.4 | CENTOS 8.8 | Postgres 14 | Free IPA | True | On AWS with setup-cluster-on-cloud |
| | cdp-719 | CDP 7.1.9.1015 & CM 7.11.3.26 | RHEL 8.10 | Postgres 14 | MIT KDC | True | |
| | cdp-719-basic | CDP 7.1.9.1015 & CM 7.11.3.26 | RHEL 8.10 | Postgres 14 | MIT KDC | True | |
| | cdp-719-all-services | CDP 7.1.9.1015 & CM 7.11.3.26 | RHEL 8.10 | Postgres 14 | MIT KDC | True | |
| | cdp-719-streaming | CDP 7.1.9.1015 & CM 7.11.3.26 & CFM 2.1.7.1000 | RHEL 8.10 | Postgres 14 | MIT KDC | True | |
| | cdp-719-pvc | CDP 7.1.9.1015 & CM 7.11.3.26 & PvC 1.5.4 | RHEL 8.8 | Postgres 14 | Free IPA | True | Use --os-version=8.8 |
| | cdp-717 | CDH 7.1.7.2026 & CM 7.6.7 | CENTOS 7.9 | Postgres 14 | MIT KDC | True | |
| | cdh-5 | CDH 5.16 & CM 5.16.2 | CENTOS 7.6 | Postgres 12 | MIT KDC | False | |
| | cdh-6 | CDH 6.3.4 & CM 6.3.4 | CENTOS 7.6 | Postgres 12 | MIT KDC | False | |
| | hdp-3 | HDP 3.1.5.6091 & Ambari 2.7.5.0 | CENTOS 7.6 | MySQL | MIT KDC | False | |
| | hdp-2 | HDP 2.6.5.0 & Ambari 2.6.2.2 | Postgres 12 | MySQL | MIT KDC | False | |

Successful installations are marked with a ✓ in the SUCCESS column.

Requirements

It requires Ansible with minimum version 2.10 and maximum version 2.16.
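
The version window above can be checked before launching anything. The function below is a minimal sketch, not part of this repository: `ansible_version_ok` is a hypothetical helper name, and it assumes that any 2.16.x patch release still counts as within the maximum. Extracting the version from `ansible --version` is left out on purpose, since its output format differs between releases.

```shell
# Hypothetical helper: return 0 when a version string such as "2.15.3"
# lies within the supported range 2.10 .. 2.16.
# Assumption: any 2.16.x patch release counts as within the maximum.
ansible_version_ok() {
    local version="$1" major minor
    major="${version%%.*}"      # text before the first dot
    minor="${version#*.}"       # drop the major part
    minor="${minor%%.*}"        # keep only the minor number
    [ "$major" -eq 2 ] && [ "$minor" -ge 10 ] && [ "$minor" -le 16 ]
}

if ansible_version_ok "2.15.3"; then
    echo "2.15.3 is supported"
fi
```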

Run the script requirements.sh to install all requirements before launching the full script.

It takes one parameter: the type of machine you are running it on, which can be rhel, debian, suse or mac. For example, on macOS you would run:

./requirements.sh mac
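
If you launch from several kinds of machines, the parameter can be derived instead of typed. This is only a sketch: `machine_type` is a hypothetical helper, and the mapping from distribution IDs (for example centos to rhel, ubuntu to debian) is an assumption about what requirements.sh expects, not something the script documents.

```shell
# Hypothetical helper: map a kernel name (as printed by `uname -s`) and an
# os-release ID to one of the four values accepted by requirements.sh.
# The distribution-to-type mapping below is an assumption.
machine_type() {
    local kernel="$1" os_id="$2"
    case "$kernel" in
        Darwin) echo mac; return 0 ;;
    esac
    case "$os_id" in
        rhel|centos|rocky|almalinux|fedora) echo rhel ;;
        debian|ubuntu)                      echo debian ;;
        sles|opensuse*)                     echo suse ;;
        *) echo "unsupported system: $kernel/$os_id" >&2; return 1 ;;
    esac
}

# Possible usage on the launching machine:
#   os_id=$( . /etc/os-release 2>/dev/null && echo "$ID" )
#   ./requirements.sh "$(machine_type "$(uname -s)" "$os_id")"
```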

Installation

Command line tool

To install a cluster (the default is a 10-node CDP 7 cluster with Kerberos and TLS enabled):

export PAYWALL_USER=  # Your Paywall user from Cloudera to access archive.cloudera.com
export PAYWALL_PASSWORD=  # Your Paywall password from Cloudera to access archive.cloudera.com
export LICENSE_FILE=   # Your license file from Cloudera
export CLUSTER_NAME=   # A name of your choice (ex: cloudera-test)
export NODES=   # *Space* separated list of nodes (ex: "node1 node2 node3"). Provide as many nodes as the installation type you are launching needs (10 by default).
./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --nodes="${NODES}"

N.B.: This assumes that a passwordless connection exists from your machine to all cluster nodes. If it does not, provide a password with --node-password or a private key file with --node-key.
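
To find out beforehand whether the passwordless assumption holds, you can probe each node with ssh in batch mode, which fails instead of prompting for a password. A minimal sketch, where `unreachable_nodes` is a hypothetical helper and the 5-second timeout is an arbitrary choice:

```shell
# Hypothetical helper: print every node of a space separated list that
# cannot be reached over SSH without a password prompt.
unreachable_nodes() {
    local node
    for node in $1; do
        # BatchMode=yes disables password prompts, so a missing key fails fast.
        ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" true 2>/dev/null \
            || echo "$node"
    done
}

# Possible usage before launching setup-cluster.sh:
#   unreachable_nodes "${NODES}"
```

An empty output means every node accepted the connection; any node printed needs --node-password or --node-key.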

Configuration

Many more configurations are available, see them all with:

./setup-cluster.sh --help

NEW: Script to create Instances and launch creation

It currently works only with AWS.

Note: This requires you to install Terraform and AWS CLI on the machine where you will launch this script.
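
A quick way to confirm both tools are installed is to look them up on your PATH. This is a sketch with a hypothetical helper name (`missing_tools`); it only checks that the `terraform` and `aws` commands exist, not their versions or credentials.

```shell
# Hypothetical helper: print each command name that is not found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools terraform aws)
[ -z "$missing" ] || echo "Please install: $missing" >&2
```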

You also need to create a key pair in AWS in advance, have its private key available locally, and have your AWS access key and secret access key:

export ACCESS_KEY="" # Your AWS Access Key
export SECRET_ACCESS_KEY="" # Your AWS Secret Access Key
export KEY_PAIR_NAME="" # Name of a key pair in AWS that will be set to access your machines
export PRIVATE_KEY_PATH="" # Local private key path to use to access your machines
export WHITELIST_IP="" # Your IP, so only this IP will be able to access your machines
export PAYWALL_USER=  # Your Paywall user from Cloudera to access archive.cloudera.com
export PAYWALL_PASSWORD=  # Your Paywall password from Cloudera to access archive.cloudera.com
export LICENSE_FILE=   # Your license file from Cloudera
export CLUSTER_NAME=   # A name of your choice (ex: cloudera-test)
./setup-cluster-on-cloud.sh \
    --cloud-provider="AWS" \
    --aws-access-key=${ACCESS_KEY} \
    --aws-secret-access-key=${SECRET_ACCESS_KEY} \
    --aws-key-pair-name=${KEY_PAIR_NAME} \
    --private-key-path=${PRIVATE_KEY_PATH} \
    --whitelist-ip=${WHITELIST_IP} \
    --os-version=8.7 \
    --setup-etc-hosts=false \
    \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD}

All parameters above must be left as shown, as they are appropriate for AWS machines. After these parameters, you can add any other parameter accepted by the script setup-cluster.sh.

The script uses Terraform to provision your machines, sets up connectivity and then launches setup-cluster.sh with pre-configured parameters to create the desired cluster.

Examples

!!! Special No license or Paywall Cluster : CDP 7 - Basic 6 nodes !!!

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --cluster-type=basic \
    --nodes-base="${NODES}"

CDP 7 - Full 10 nodes with almost all services (Kerberos / TLS)

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --nodes-base="${NODES}"

CDP 7 - Basic 6 nodes (Kerberos / TLS)

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=basic \
    --nodes-base="${NODES}"

CDP 7 - Basic encrypted 6 nodes (Kerberos / TLS) (You can specify 1 or 2 nodes for KTS)

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=basic-enc \
    --nodes-kts=<Dedicated Node(s) for KTS> \
    --nodes-base="${NODES}"

CDP 7 - Basic 6 nodes with Free IPA on a dedicated node (Kerberos / TLS) (Any CDP cluster can use Free IPA by adding --free-ipa=true and providing a node with --node-ipa=)

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=basic \
    --free-ipa=true \
    --node-ipa=<One node dedicated to IPA> \
    --nodes-base="${NODES}"

CDP 7 - Streaming (Kerberos / TLS)

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=streaming \
    --nodes-base="${NODES}"

CDP 7 - All services (Kerberos / TLS)

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=all-services \
    --nodes-kts=<Dedicated Node for KTS> \
    --nodes-base="${NODES}"

CDP 7 - 9 nodes with 3 dedicated for PvC with ECS (Kerberos / TLS / FreeIPA)

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=pvc \
    --nodes-ecs=<Space separated list of 3 nodes> \
    --node-ipa=<One node dedicated to IPA> \
    --nodes-base="${NODES}"

CDP 7 - Basic 6 nodes for PvC with OpenShift (Experiences installed on a provided OCP cluster) (Kerberos / TLS / FreeIPA)

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=pvc-oc \
    --kubeconfig-path=<Path to your kubeconfig file> \
    --oc-tar-file-path=<Path to your oc.tar file downloaded from Red Hat> \
    --node-ipa=<One node dedicated to IPA> \
    --nodes-base="${NODES}"

CDP 7 - All services with PvC on ECS (Kerberos / TLS / FreeIPA)

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=all-services-pvc \
    --nodes-kts=<Dedicated Node for KTS> \
    --node-ipa=<Dedicated Node for IPA> \
    --nodes-ecs=<Space separated list of 3 nodes> \
    --nodes-base="${NODES}"

CDP 7 - Observability cluster (requires an existing cluster to plug into; it creates a cluster of 6 nodes) (Kerberos / TLS)

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=observability \
    --altus-key-id=<ALTUS key ID provided by Cloudera> \
    --altus-private-key=<path to ALTUS private key provided by Cloudera> \
    --cm-base-url=<http://<CM host to connect to OBSERVABILITY>:<Port>> \
    --tp-host=<Host in base cluster that will have Telemetry Publisher installed> \
    --nodes-base="${NODES}"

CDP 7.1.8 - Full 10 nodes with almost all services (Kerberos / TLS)

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cdh-version='7.1.8.1' \
    --cm-version='7.7.3-33365545' \
    --nodes-base="${NODES}"

CDP 7 - Unsecure

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --kerberos=false \
    --tls=false \
    --nodes-base="${NODES}"

CDH 6 (Kerberos)

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=cdh6 \
    --nodes-base="${NODES}"

CDH 5 (Kerberos)

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=cdh5 \
    --nodes-base="${NODES}"

HDP 3 (Kerberos)

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=hdp3 \
    --nodes-base="${NODES}"

HDP 2 (Kerberos)

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=hdp2 \
    --nodes-base="${NODES}"

Output

CM & Ambari

At the end, CM or Ambari (depending on your installation) should be available at the URL of the first node, over http or https and on the matching port. This depends on the TLS parameter, which is false by default for HDP and true by default for CDP.
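
As an illustration of how the URL varies, the sketch below builds it from the manager type, the TLS setting and the first node. The helper name `manager_url` is hypothetical, and the ports are the usual product defaults (7180/7183 for Cloudera Manager, 8080/8443 for Ambari), which is an assumption: this README does not state which ports the installation configures.

```shell
# Hypothetical helper: print the expected admin URL.
#   $1 = cm | ambari, $2 = true | false (TLS), $3 = first node of the cluster
# Ports are product defaults and an assumption about this installation.
manager_url() {
    local manager="$1" tls="$2" host="$3" scheme port
    if [ "$tls" = "true" ]; then scheme=https; else scheme=http; fi
    case "$manager:$scheme" in
        cm:http)      port=7180 ;;
        cm:https)     port=7183 ;;
        ambari:http)  port=8080 ;;
        ambari:https) port=8443 ;;
        *) echo "unknown manager: $manager" >&2; return 1 ;;
    esac
    echo "$scheme://$host:$port"
}

manager_url cm true node1
```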

During the installation, you can also follow its progress by connecting to CM or Ambari.

N.B.: It is recommended not to interfere with the cluster until the Ansible installation is done.

Users and Data

At the end of the installation, if it completed successfully, users and their keytabs are created on the machines. The keytabs and the krb5.conf are also retrieved to your local computer under /tmp/.

Moreover, it is also possible to launch some random data generation into various systems.

All default passwords are Cloudera1234

Details on Installation

This section describes in detail the steps performed during the installation, in order. Each one can be skipped and hence launched separately.

Architecture

Once you have gathered all the previous requirements, you can launch the script. A run mainly consists of 5 steps:

  • Prepare your machines

  • Launch the installation from the first node of your cluster using appropriate ansible playbook and files

  • Do post-install configuration (mainly for CDP)

  • Create users on your cluster

  • Load some data into your cluster

Each step can be skipped (see command line help).
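
For instance, to re-run a single step on an existing cluster, you can disable the four others. The sketch below only prints the command rather than running it. It uses the step flags named in the sections that follow (--setup, --install, --post-install, --user-creation, --data-load); the helper `only_step_flags` is hypothetical, and a real run will likely need the other parameters (nodes, license, and so on) as well.

```shell
# Hypothetical helper: print the flags that enable exactly one step
# and disable the four others.
only_step_flags() {
    local wanted="$1" step flags=""
    for step in setup install post-install user-creation data-load; do
        if [ "$step" = "$wanted" ]; then
            flags="$flags --$step=true"
        else
            flags="$flags --$step=false"
        fi
    done
    echo "${flags# }"   # strip the leading space
}

# Print (do not run) a command that would only reload data.
echo ./setup-cluster.sh --cluster-name=cloudera-test $(only_step_flags data-load)
```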

Scripts

This group of scripts, coordinated by the main script setup-cluster.sh, configures the machines provided and launches a CDP (or HDP, CDH) installation with Ansible. Finally, some extra configuration steps can be run and random data can be generated into different services.

All of this is driven from your own machine.

This script relies on Ansible scripts that must be accessible from your machine (if they are not, please set up an internal web server and provide its URL on the command line).

The Ansible scripts also rely on the Cloudera repository to access CDP, CM, HDP, Ambari, etc. (if it is not accessible, please set up an internal web server and provide its URL on the command line).

This script also relies on a GitHub repository to load data (if it is not accessible, please set up an internal web server and provide its URL on the command line).

Setup Machines

This step uses Playbook hosts_setup.

Unless you set parameter --setup to false, it prepares all machines by setting up passwordless SSH and pushing the required files to them.

N.B.: This step only needs to be done once and can then be bypassed if you reuse the same machines.

Ansible Installation

This step uses Playbook ansible_install_preparation and then runs commands directly on the first node to launch the Ansible installation there.

The first playbook can be skipped by setting parameter --install to false (it is true by default).

It cleans up the first node, creates the directory ~/deployment/ansible-repo/, downloads the Ansible repository into it as a zip, and adds the files for your installation.

Then, the Ansible command corresponding to the installation is launched directly on the first node.

Post Installation

This step uses Playbook post_install.

If you install a CDP cluster and leave parameter --post-install set to true, it performs some extra steps, such as setting no unlogin on CM and fixing various potential bugs.

User Creation

This step uses Playbook user_creation.

Unless you explicitly set parameter --user-creation to false, and provided the installation completed successfully, the users defined in the extra_vars of user_creation are created.

They are present on all nodes with their /home directory containing their keytabs.

Their keytabs are also fetched into your /tmp directory along with the krb5.conf, allowing you to kinit directly from your computer.
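
A local kinit could then look like the sketch below. The file names are assumptions: this README only says the files land under /tmp, so `/tmp/<user>.keytab` and `/tmp/krb5.conf` are hypothetical paths, as is the helper name `kinit_from_tmp`. It also assumes the MIT Kerberos client, which reads the `KRB5_CONFIG` environment variable.

```shell
# Hypothetical helper: obtain a ticket for a user from the fetched files.
# Assumed layout: /tmp/krb5.conf and /tmp/<user>.keytab (not confirmed here).
kinit_from_tmp() {
    local user="$1" keytab="/tmp/$1.keytab"
    [ -r "$keytab" ] || { echo "keytab not found: $keytab" >&2; return 1; }
    # KRB5_CONFIG points the MIT client at the fetched configuration.
    KRB5_CONFIG=/tmp/krb5.conf kinit -kt "$keytab" "$user"
}

# Possible usage (replace the name with one of the created users):
#   kinit_from_tmp some-user && klist
```

If the principal in the keytab differs from the plain user name, `klist -kt` on the keytab shows the exact principal to pass to kinit.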

Data Loading

This step uses Playbook data_load.

If you leave parameter --data-load set to true, a data loading step starts (currently only on CDP, HDP 2 and CDH 5) to generate data into existing services of the platform: HDFS, HBase, Hive, etc.

It is based on the random-datagen project.

Note that this step is fully extensible: you can add new files specifying how data should be generated in the folder playbooks/data_load/generate_data/models

N.B.: This step also creates the required Ranger policies, which are likewise extensible by adding policies in playbooks/data_load/ranger_policies/push_policies/policies

Extension

Once you are familiar with these scripts, you can easily tune them using command-line parameters to provide your own cluster files and repositories.

Cluster Definition

To provide a quick new definition of a cluster:

  1. Copy the directory ansible-cdp and give the copy a new name, for example: ansible-cdp-configured

  2. Make all your modifications in the files of your copied directory

  3. Launch script with argument: --cluster-type=ansible-cdp-configured (It will automatically take files under ansible-cdp-configured/ directory)
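
The three steps above can be sketched as follows. `new_cluster_definition` is a hypothetical helper, not part of the repository; it simply wraps the copy and refuses to overwrite an existing definition.

```shell
# Hypothetical helper: copy an existing cluster definition directory
# to a new name, without overwriting anything.
new_cluster_definition() {
    local src="$1" dst="$2"
    [ -d "$src" ] || { echo "source directory not found: $src" >&2; return 1; }
    [ ! -e "$dst" ] || { echo "refusing to overwrite: $dst" >&2; return 1; }
    cp -R "$src" "$dst"
}

# Possible usage from the repository root:
#   new_cluster_definition ansible-cdp ansible-cdp-configured
#   (edit the files under ansible-cdp-configured/)
#   ./setup-cluster.sh --cluster-type=ansible-cdp-configured ...
```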

User Creation & Data Loading

These steps can be launched independently, and you can configure them to create more users or to load more and different data.

Look at extra_vars.yml inside the playbooks folder to learn more about the possibilities.
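
To locate these files quickly, a plain `find` is enough. A small sketch, where `list_extra_vars` is a hypothetical helper and the default directory name `playbooks` is taken from the text above:

```shell
# Hypothetical helper: list every extra_vars.yml under a directory
# (defaults to ./playbooks), sorted for stable output.
list_extra_vars() {
    find "${1:-playbooks}" -type f -name 'extra_vars.yml' | sort
}

# Possible usage from the repository root:
#   list_extra_vars
```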

Private Cloud

Private Cloud setup (on ECS or OC) can also be launched independently on a running cluster.

The configuration of a Private Cloud cluster can also be launched independently (use --install-pvc=false with --pvc=true to configure your PvC without re-installing it).

In extra_vars.yml you can list the CDWs, CDEs and CMLs that will be provisioned for you, as well as the rights you expect your users to have.

Limitations & Known Bugs

  • TLS is not set for HDP & CDH clusters

  • Data loading is not made for HDP 3 & CDH 6 clusters

  • Free IPA is only available for CDP clusters

Please feel free to contribute and help solve and implement the TODOs listed in TODOs.adoc.