OZ-429: Migrate Superset from Ozone (#22)
enyachoke authored Jun 24, 2024
1 parent 8166bda commit c117010
Showing 16 changed files with 771 additions and 119 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -253,4 +253,5 @@ docker/data/parquet/
!docker/sqls/postgres_restore/.gitkeep
!docker/data/parquet/.gitkeep
data/
debezium-connect/
debezium-connect/
scripts/distro
31 changes: 26 additions & 5 deletions README.md
@@ -60,7 +60,9 @@ export OPENMRS_DB_NAME=openmrs; \
export EXPORT_DESTINATION_TABLES_PATH=$DISTRO_PATH/distro/configs/analytics/dsl/export/tables/; \
export EXPORT_SOURCE_QUERIES_PATH=$DISTRO_PATH/distro/configs/analytics/dsl/export/queries; \
export EXPORT_OUTPUT_PATH=./data/parquet; \
export EXPORT_OUTPUT_TAG=h1;
export EXPORT_OUTPUT_TAG=h1; \
export SUPERSET_CONFIG_PATH=$DISTRO_PATH/configs/superset/ ; \
export SUPERSET_DASHBOARDS_PATH=$DISTRO_PATH/configs/superset/assets
```
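
The Superset services in this commit mount `${SUPERSET_CONFIG_PATH}` and `${SUPERSET_DASHBOARDS_PATH}` as volumes, so it can help to check that both are exported before starting the stack. The following is a minimal sketch, not part of the repository: `missing_vars` is a hypothetical helper name, and the check assumes that an empty value is never intended for these two variables.

```bash
# Sketch: print the name of every listed variable that is unset or empty.
# `missing_vars` is a hypothetical helper, not something shipped in scripts/.
missing_vars() {
  for _name in "$@"; do
    # Look up the value of the variable whose name is stored in $_name.
    eval "_value=\${$_name:-}"
    if [ -z "$_value" ]; then
      echo "$_name"
    fi
  done
}

# Usage: warn before running `docker compose` if a Superset path is missing.
missing="$(missing_vars SUPERSET_CONFIG_PATH SUPERSET_DASHBOARDS_PATH)"
if [ -n "$missing" ]; then
  echo "Missing required variables: $missing" >&2
fi
```

The helper only reports names; whether to abort or fall back to a default is left to the caller.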

**Note**: `gateway.docker.internal` is a special DNS name that resolves to the host machine from within containers. It is only available on Mac and Windows. On Linux, use the Docker host IP, which is `172.17.0.1` by default.
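
The host choice described in the note could be scripted roughly as follows. This is a sketch under stated assumptions: `docker_host_address` is a hypothetical helper, `172.17.0.1` is only the default bridge address and may differ on your machine, and environments that report `Linux` but run Docker Desktop (for example WSL) may need a different value.

```bash
# Sketch: pick the address containers should use to reach the host.
# `docker_host_address` is a hypothetical helper, not part of the repository.
# An optional first argument overrides the detected kernel name (useful for testing).
docker_host_address() {
  case "${1:-$(uname -s)}" in
    Linux*) echo "172.17.0.1" ;;              # assumed default Docker bridge IP
    *)      echo "gateway.docker.internal" ;; # Mac and Windows
  esac
}

# Usage: point the database hosts at the machine running Docker.
export OPENMRS_DB_HOST="$(docker_host_address)"
export ODOO_DB_HOST="$(docker_host_address)"
```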
@@ -89,7 +91,7 @@ Which will start ;
To start the complete streaming and flattening suite, including Superset as the BI tool, run:

```bash
docker compose -f docker-compose-db.yaml -f docker-compose-data-pipelines-local.yaml -f docker-compose-superset.yaml up -d --build
docker compose -f docker-compose-db.yaml -f docker-compose-data-pipelines-local.yaml -f docker-compose-superset.yaml -f docker-compose-superset-ports.yaml up -d --build
```

This will start the following services:
@@ -120,7 +122,7 @@ In cases where you have multiple instances of Ozone deployed in remote locations

To start this stack, run:

`docker compose -f docker-compose-db.yaml -f docker-compose-superset.yaml -f docker-compose-minio.yaml -f docker-compose-drill.yaml up -d --build`
`docker compose -f docker-compose-db.yaml -f docker-compose-superset.yaml -f docker-compose-superset-ports.yaml -f docker-compose-minio.yaml -f docker-compose-drill.yaml up -d --build`
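
Because this commit moves the `8088:8088` port mapping into `docker-compose-superset-ports.yaml`, publishing Superset on the host becomes a matter of including or omitting that file. The sketch below builds the command for this stack either way; `compose_command` is a hypothetical helper, and it only prints the command rather than running it.

```bash
# Sketch: assemble the `docker compose` command for the remote-locations stack.
# `compose_command` is a hypothetical helper, not part of the repository.
# $1: "true" (default) to include the ports override that publishes 8088.
compose_command() {
  files="docker-compose-db.yaml docker-compose-superset.yaml"
  if [ "${1:-true}" = "true" ]; then
    files="$files docker-compose-superset-ports.yaml"
  fi
  files="$files docker-compose-minio.yaml docker-compose-drill.yaml"

  cmd="docker compose"
  # File names contain no spaces, so plain word splitting is safe here.
  for f in $files; do
    cmd="$cmd -f $f"
  done
  echo "$cmd up -d --build"
}

# Usage: print both variants.
compose_command true
compose_command false
```

Omitting the ports file is presumably useful when Superset is reached through Traefik rather than directly on the host port.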


### Usage with external databases
@@ -158,7 +160,7 @@ export ODOO_DB_HOST=gateway.docker.internal; \
export OPENMRS_DB_HOST=gateway.docker.internal
```

`docker compose -f docker-compose-db.yaml -f docker-compose-streaming-common.yaml docker-compose-superset.yaml up -d --build`
`docker compose -f docker-compose-db.yaml -f docker-compose-streaming-common.yaml -f docker-compose-superset.yaml -f docker-compose-superset-ports.yaml up -d --build`

**Note**: We still need the `docker-compose-db.yaml` file, as it starts the PostgreSQL database for Superset. If you don't need Superset, you can omit `docker-compose-db.yaml` and `docker-compose-superset.yaml`.

@@ -172,7 +174,26 @@ In cases where you have multiple instances of Ozone deployed in remote locations

To start this stack, run:

`docker compose -f docker-compose-db.yaml -f docker-compose-superset.yaml -f docker-compose-minio.yaml -f docker-compose-drill.yaml up -d --build`
`docker compose -f docker-compose-db.yaml -f docker-compose-superset.yaml -f docker-compose-superset-ports.yaml -f docker-compose-minio.yaml -f docker-compose-drill.yaml up -d --build`

### Running with helper scripts
The examples above run the services manually. We have also included helper scripts, located in the `scripts` folder, to simplify the process. The scripts assume you have an Ozone instance running locally. If you don't, follow the instructions [here](#to-run) to start the services.
To run the services using the helper scripts:

```bash
cd scripts
```
Fetch the Ozone Pro Distro

```bash
./fetch-distro.sh 1.0.0-SNAPSHOT
```
Start the project with streaming pipelines

```bash
./start.sh
```


#### Parquet export using an OpenMRS database backup

22 changes: 19 additions & 3 deletions docker/.env
@@ -35,13 +35,18 @@ SECRET_KEY=000000OOOO00000_ThisSampleSecretShouldBeReplaced_00000OOOO000000
SUPERSET_DB=superset
SUPERSET_DB_USER=superset
SUPERSET_DB_PASSWORD=superset
ADMIN_PASSWORD=password
SUPERSET_ADMIN_USERNAME=admin
SUPERSET_ADMIN_PASSWORD=password
SUPERSET_LOAD_EXAMPLES=no
DATABASE_DIALECT=postgresql
DATABASE_PORT=5432
ENABLE_PROXY_FIX=True
REDIS_HOST=redis
REDIS_PORT=6379
SUPERSET_CLIENT_SECRET=
SUPERSET_CLIENT_ID=superset
ANALYTICS_DATASOURCE_NAME=PostgreSQL
ENABLE_OAUTH=false

ANALYTICS_DB_USER=analytics
ANALYTICS_DB_PASSWORD=password
@@ -56,12 +61,17 @@ ODOO_ANALYTICS_TABLES='databasechangelog,account_account,product_category,sale_o
ANALYTICS_KAFKA_URL=kafka:9092

# Kafka
CREATE_TOPICS=openmrs.openmrs.appointment_service,openmrs.openmrs.appointment_service_type,openmrs.openmrs.care_setting,openmrs.openmrs.concept,openmrs.openmrs.concept_name,openmrs.openmrs.concept_reference_map,openmrs.openmrs.concept_reference_source,openmrs.openmrs.concept_reference_term,openmrs.openmrs.conditions,openmrs.openmrs.encounter,openmrs.openmrs.encounter_diagnosis,openmrs.openmrs.encounter_type,openmrs.openmrs.location,openmrs.openmrs.form,openmrs.openmrs.obs,openmrs.openmrs.order_type,openmrs.openmrs.orders,openmrs.openmrs.patient,openmrs.openmrs.patient_appointment,openmrs.openmrs.patient_appointment_provider,openmrs.openmrs.patient_identifier,openmrs.openmrs.patient_identifier_type,openmrs.openmrs.patient_program,openmrs.openmrs.program,openmrs.openmrs.person,openmrs.openmrs.person_attribute,openmrs.openmrs.person_attribute_type,openmrs.openmrs.person_name,openmrs.openmrs.person_address,openmrs.openmrs.visit_type,openmrs.openmrs.visit,openmrs.openmrs.visit_attribute,openmrs.openmrs.visit_attribute_type,odoo.public.sale_order,odoo.public.sale_order_line,odoo.public.res_partner,odoo.public.product_product,odoo.public.product_template,odoo.public.ir_model_data
CREATE_TOPICS=openmrs.openmrs.appointment_service,openmrs.openmrs.appointment_service_type,openmrs.openmrs.care_setting,openmrs.openmrs.concept,openmrs.openmrs.concept_set,openmrs.openmrs.concept_answer,openmrs.openmrs.concept_name,openmrs.openmrs.concept_reference_map,openmrs.openmrs.concept_reference_source,openmrs.openmrs.concept_reference_term,openmrs.openmrs.conditions,openmrs.openmrs.encounter,openmrs.openmrs.encounter_diagnosis,openmrs.openmrs.encounter_type,openmrs.openmrs.location,openmrs.openmrs.location_tag_map,openmrs.openmrs.location_tag,openmrs.openmrs.form,openmrs.openmrs.obs,openmrs.openmrs.order_type,openmrs.openmrs.orders,openmrs.openmrs.patient,openmrs.openmrs.patient_appointment,openmrs.openmrs.patient_appointment_provider,openmrs.openmrs.patient_identifier,openmrs.openmrs.patient_identifier_type,openmrs.openmrs.patient_program,openmrs.openmrs.program,openmrs.openmrs.person,openmrs.openmrs.person_attribute,openmrs.openmrs.person_attribute_type,openmrs.openmrs.person_name,openmrs.openmrs.person_address,openmrs.openmrs.visit_type,openmrs.openmrs.visit,openmrs.openmrs.visit_attribute,openmrs.openmrs.visit_attribute_type,odoo.public.sale_order,odoo.public.sale_order_line,odoo.public.res_partner,odoo.public.product_product,odoo.public.product_template,odoo.public.ir_model_data

# Postgres
POSTGRES_USER=postgres
POSTGRES_PASSWORD=password
POSTGRES_DB_HOST=postgresql
SUPERSET_DB=superset
SUPERSET_DB_USER=superset
SUPERSET_DB_PASSWORD=password
ENABLE_OAUTH=false


# Flink
JOB_MANAGER_PROCESS_MEMORY=1000m
@@ -91,13 +101,19 @@ ANALYTICS_BUCKET=analytics
DEFAULT_BUCKETS=backups
MINIO_DOMAIN=

# Traefik domains
# Traefik
SUPERSET_DOMAIN=
MINIO_DOMAIN=
SUPERSET_CERT_RESOLVER=letsencrypt
MINIO_CERT_RESOLVER=letsencrypt


# Kafka Connect
CONNECT_MYSQL_HOSTNAME=mysql
CONNECT_MYSQL_SERVER_ID=5001
SUPERSET_HOME=

ZOOKEEPER_URL=zookeeper:2181

#Keycloak
KEYCLOAK_HOSTNAME=
7 changes: 0 additions & 7 deletions docker/docker-compose-db.yaml
@@ -11,8 +11,6 @@ services:
MYSQL_ROOT_PASSWORD: ${MYSQL_ROOT_PASSWORD}
MYSQL_USER: ${MYSQL_USER}
MYSQL_PASSWORD: ${MYSQL_PASSWORD}
ports:
- "3306:3306"
healthcheck:
test: mysqladmin ping -h 127.0.0.1 -u $$MYSQL_USER --password=$$MYSQL_PASSWORD
volumes:
@@ -46,11 +44,6 @@ services:
ODOO_DB_NAME: ${CONNECT_ODOO_DB_NAME}
volumes:
- ${POSTGRES_DATADIR:-postgresql-data}:/var/lib/postgresql/data
- "${SQL_SCRIPTS_PATH:-./sqls/postgresql}:/docker-entrypoint-initdb.d"
ports:
- "5432:5432"


volumes:
mysql-data:
postgresql-data: ~
2 changes: 1 addition & 1 deletion docker/docker-compose-minio.yaml
@@ -22,7 +22,7 @@ services:
- "traefik.docker.network=web"
- "traefik.http.routers.minio.rule=Host(`${MINIO_DOMAIN}`)"
- "traefik.http.routers.minio.tls=true"
- "traefik.http.routers.minio.tls.certresolver=letsencrypt"
- "traefik.http.routers.minio.tls.certresolver=${MINIO_CERT_RESOLVER}"
- "traefik.http.routers.minio.entrypoints=websecure"
- "traefik.http.middlewares.minio-redirect-web-secure.redirectscheme.scheme=https"
- "traefik.http.routers.minio.middlewares=superset-redirect-web-secure"
4 changes: 4 additions & 0 deletions docker/docker-compose-superset-ports.yaml
@@ -0,0 +1,4 @@
services:
superset:
ports:
- "8088:8088"
155 changes: 78 additions & 77 deletions docker/docker-compose-superset.yaml
@@ -1,103 +1,104 @@
x-superset-volumes: &superset-volumes
- ./superset/docker:/app/docker
- ./superset/docker/pythonpath:/app/pythonpath
- ./superset/config/datasources:/app/datasources
- ${SUPERSET_HOME:-superset_home}:/app/superset_home
x-superset-environment: &superset-environment
- DATABASE_HOST=${POSTGRES_DB_HOST}
- DATABASE_DB=${SUPERSET_DB}
- DATABASE_USER=${SUPERSET_DB_USER}
- DATABASE_PASSWORD=${SUPERSET_DB_PASSWORD}
- SECRET_KEY=${SECRET_KEY}
- ANALYTICS_DATASOURCE_NAME=Postgres
- ANALYTICS_DB_NAME=${ANALYTICS_DB_NAME}
- ANALYTICS_DB_USER=${ANALYTICS_DB_USER}
- ANALYTICS_DB_PASSWORD=${ANALYTICS_DB_PASSWORD}
- ANALYTICS_DB_HOST=${POSTGRES_DB_HOST}
- FLASK_ENV=production
- SUPERSET_ENV=production
- DATABASE_DIALECT=${DATABASE_DIALECT}
- DATABASE_PORT=${DATABASE_PORT}
- ENABLE_PROXY_FIX=${ENABLE_PROXY_FIX}
- REDIS_HOST=${REDIS_HOST}
- REDIS_PORT=${REDIS_PORT}
- ADMIN_PASSWORD=${ADMIN_PASSWORD}
version: '3.8'
services:

superset:
command: ["/app/docker/docker-bootstrap.sh", "app-gunicorn"]
networks:
ozone-analytics:
web:
build:
context: superset/
environment: *superset-environment
restart: on-failure
depends_on:
redis:
condition: service_started
postgresql:
condition: service_started
superset-init:
condition: service_completed_successfully
ports:
- "8088:8088"
volumes: *superset-volumes
environment: &superset-env
- DATABASE_HOST=${POSTGRES_DB_HOST}
- DATABASE_DB=${SUPERSET_DB}
- DATABASE_USER=${SUPERSET_DB_USER}
- DATABASE_PASSWORD=${SUPERSET_DB_PASSWORD}
- SECRET_KEY=${SECRET_KEY}
- ADMIN_USERNAME=${SUPERSET_ADMIN_USERNAME}
- ADMIN_PASSWORD=${SUPERSET_ADMIN_PASSWORD}
- ANALYTICS_DB_PASSWORD=${ANALYTICS_DB_PASSWORD}
- ANALYTICS_DB_NAME=${ANALYTICS_DB_NAME}
- ANALYTICS_DB_USER=${ANALYTICS_DB_USER}
- ANALYTICS_DB_HOST=${ANALYTICS_DB_HOST}
- ANALYTICS_DATASOURCE_NAME=${ANALYTICS_DATASOURCE_NAME}
- SUPERSET_PUBLIC_URL=https://${SUPERSET_HOSTNAME}
- KEYCLOAK_URL=https://${KEYCLOAK_HOSTNAME}
- SUPERSET_CLIENT_SECRET=${SUPERSET_CLIENT_SECRET}
- SUPERSET_CLIENT_ID=${SUPERSET_CLIENT_ID}

image: &superset-image mekomsolutions/superset-pro
labels:
- "traefik.enable=true"
- "traefik.docker.network=web"
- "traefik.http.routers.superset.rule=Host(`${SUPERSET_DOMAIN}`)"
- "traefik.http.routers.superset.tls=true"
- "traefik.http.routers.superset.tls.certresolver=letsencrypt"
- "traefik.http.routers.superset.entrypoints=websecure"
- "traefik.http.middlewares.superset-redirect-web-secure.redirectscheme.scheme=https"
- "traefik.http.routers.superset-web.middlewares=superset-redirect-web-secure"
- "traefik.http.routers.superset-web.rule=Host(`${SUPERSET_DOMAIN}`)"
- "traefik.http.routers.superset-web.entrypoints=web"
superset-worker:
traefik.enable: "true"
traefik.http.routers.superset.rule: "Host(`${SUPERSET_HOSTNAME}`)"
traefik.http.routers.superset.entrypoints: "websecure"
traefik.http.services.superset.loadbalancer.server.port: 8088
networks:
ozone-analytics:
build:
context: superset/
environment: *superset-environment
restart: on-failure
- web
- ozone-analytics
restart: unless-stopped
volumes:
- ${SUPERSET_CONFIG_PATH}/:/etc/superset/

superset-worker:
command: "celery --app=superset.tasks.celery_app:app worker --pool=gevent -Ofair -n worker1@%h --loglevel=INFO"
depends_on:
redis:
condition: service_started
postgresql:
condition: service_started
superset-init:
condition: service_completed_successfully
command: ["/app/docker/docker-bootstrap.sh", "worker"]
volumes: *superset-volumes
redis:
image: redis
restart: on-failure
environment: *superset-env
image: *superset-image
restart: unless-stopped
volumes:
- redis:/data
- ${SUPERSET_CONFIG_PATH}/:/etc/superset/
networks:
ozone-analytics:

- ozone-analytics
superset-init:
networks:
ozone-analytics:
build:
context: superset/
environment: *superset-environment
restart: on-failure
command: "/etc/superset/superset-init.sh"
depends_on:
postgresql:
condition: service_healthy
redis:
condition: service_started
command: ["/app/docker/docker-init.sh"]
volumes: *superset-volumes
- postgresql
- redis
environment: *superset-env
image: *superset-image
restart: on-failure
volumes:
- ${SUPERSET_CONFIG_PATH}/:/etc/superset/
- ${SUPERSET_DASHBOARDS_PATH}/:/dashboards/
networks:
- ozone-analytics

redis:
image: redis:7
restart: unless-stopped
volumes:
- redis-data:/data
networks:
- ozone-analytics

postgresql:
environment:
# Analytics
ANALYTICS_DB_NAME: ${ANALYTICS_DB_NAME}
ANALYTICS_DB_USER: ${ANALYTICS_DB_USER}
ANALYTICS_DB_PASSWORD: ${ANALYTICS_DB_PASSWORD}
# Superset
SUPERSET_DB: ${SUPERSET_DB}
SUPERSET_DB_USER: ${SUPERSET_DB_USER}
SUPERSET_DB_PASSWORD: ${SUPERSET_DB_PASSWORD}
volumes:
- "${SQL_SCRIPTS_PATH}/postgresql/create_db.sh:/docker-entrypoint-initdb.d/create_db.sh"
- "${SQL_SCRIPTS_PATH}/postgresql/analytics:/docker-entrypoint-initdb.d/db/analytics"
- "${SQL_SCRIPTS_PATH}/postgresql/superset:/docker-entrypoint-initdb.d/db/superset"
networks:
- ozone-analytics

volumes:
redis: ~
superset_home:
external: false
redis-data: ~
networks:
ozone-analytics:
web:
web:
external: true
name: web
name: web
Binary file added scripts/.mvn/wrapper/maven-wrapper.jar
Binary file not shown.
18 changes: 18 additions & 0 deletions scripts/.mvn/wrapper/maven-wrapper.properties
@@ -0,0 +1,18 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
distributionUrl=https://repo.maven.apache.org/maven2/org/apache/maven/apache-maven/3.9.6/apache-maven-3.9.6-bin.zip
wrapperUrl=https://repo.maven.apache.org/maven2/org/apache/maven/wrapper/maven-wrapper/3.1.1/maven-wrapper-3.1.1.jar
16 changes: 16 additions & 0 deletions scripts/destroy.sh
@@ -0,0 +1,16 @@
#!/usr/bin/env bash
set -e

source utils.sh

# Export the DISTRO_PATH variable
setupDirs

# Export the paths variables to point to distro artifacts
exportEnvs

setTraefikIP

setTraefikHostnames

docker compose -p ozone-analytics -f ../docker/docker-compose-db.yaml -f ../docker/docker-compose-migration.yaml -f ../docker/docker-compose-streaming-common.yaml -f ../docker/docker-compose-kowl.yaml -f ../docker/docker-compose-superset.yaml down -v