C2C-181: Allow creating arbitrary buckets in Minio Alongside analytics bucket (#8)

* C2C-181: Allow creating arbitrary buckets in Minio Alongside analytics bucket

* Remove entry point

* Set snapshot mode to when_needed

* align Parquet docs

* Reorganize drill and External db titles

* Address review

* Disable ANALYTICS_BUCKET override
enyachoke authored Nov 30, 2023
1 parent 8c83abb commit ae8b655
Showing 8 changed files with 88 additions and 28 deletions.
65 changes: 55 additions & 10 deletions README.md
@@ -121,17 +121,52 @@ To start this stack run;

`docker compose -f docker-compose-db.yaml -f docker-compose-superset.yaml -f docker-compose-minio.yaml -f docker-compose-drill.yaml up -d --build`

### Services coordinates
| Service | Access| Credentials|
| ------------ | ------------ |------------ |
| Kowl | http://localhost:8282 | |
| Flink | http://localhost:8084 | |
| Superset | http://localhost:8088 | admin/password|
| Minio | http://localhost:9000 |minioadmin/minioadmin123|
| Drill | http://localhost:8047 | |

### Usage with external databases
To simplify the setup of the project, we have included OpenMRS and Analytics databases for easy testing. In production, the OpenMRS and Analytics databases will be external to the project. To use external databases, set the following environment variables:

| Variable|Description |
|---|----|
|CONNECT_MYSQL_HOSTNAME|The project uses Kafka Connect to capture OpenMRS changes; set this to the host of the source OpenMRS MySQL database|
|CONNECT_MYSQL_PORT|The port the source OpenMRS MySQL database is listening on|
|CONNECT_MYSQL_USERNAME|The username of a user in the source OpenMRS MySQL database with the privileges `SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT`|
|CONNECT_MYSQL_PASSWORD|The password of `CONNECT_MYSQL_USERNAME`|
|ANALYTICS_DB_HOST|The host of the analytics sink PostgreSQL database|
|ANALYTICS_DB_PORT|The port the analytics sink PostgreSQL database is listening on|
|ANALYTICS_DB_NAME|The name of the analytics sink database|
|ANALYTICS_DB_USER|The username for writing into the analytics sink database|
|ANALYTICS_DB_PASSWORD|The password of `ANALYTICS_DB_USER`|
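
If you still need to create the `CONNECT_MYSQL_USERNAME` user, a hypothetical helper like the one below (not part of the repository) prints the MySQL statements that grant the privileges required by the Debezium MySQL connector (`SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT`). The user name and password are yours to choose; `debezium` in the example is only an illustration, and `'%'` (any host) should be narrowed where possible.

```bash
# Hypothetical helper: print MySQL statements that create a Kafka Connect
# user with the privileges the Debezium MySQL connector needs.
# $1 = user name, $2 = password (must not contain single quotes).
print_connect_grants() {
  user="$1"
  password="$2"
  # '%%' is printf's escape for a literal '%', i.e. "any host" in MySQL.
  printf "CREATE USER '%s'@'%%' IDENTIFIED BY '%s';\n" "$user" "$password"
  printf "GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO '%s'@'%%';\n" "$user"
}
```

Pipe the output into the `mysql` client as an administrative user, for example `print_connect_grants debezium 'change-me' | mysql -h 127.0.0.1 -u root -p`.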

Example for source and sink databases listening on the current host. The example assumes a Linux host, where `172.17.0.1` is the default IP address of the host's `docker0` bridge interface.

```bash
export CONNECT_MYSQL_HOSTNAME=172.17.0.1 && \
export CONNECT_MYSQL_PORT=3306 && \
export CONNECT_MYSQL_USERNAME=root && \
export CONNECT_MYSQL_PASSWORD=3cY8Kve4lGey && \
export ANALYTICS_DB_HOST=172.17.0.1 && \
export ANALYTICS_DB_PORT=5432 && \
export ANALYTICS_DB_NAME=analytics && \
export ANALYTICS_DB_USER=analytics && \
export ANALYTICS_DB_PASSWORD=password
```
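
A missing variable is easy to overlook in a long `export` chain. The following is a hypothetical preflight helper (not part of the repository) that reports each external-database variable that is unset or empty and returns non-zero in that case:

```bash
# Hypothetical preflight check: report external-database variables that are
# unset or empty. Returns 1 if any are missing, 0 otherwise.
check_external_db_env() {
  missing=0
  for name in CONNECT_MYSQL_HOSTNAME CONNECT_MYSQL_PORT CONNECT_MYSQL_USERNAME \
              CONNECT_MYSQL_PASSWORD ANALYTICS_DB_HOST ANALYTICS_DB_PORT \
              ANALYTICS_DB_NAME ANALYTICS_DB_USER ANALYTICS_DB_PASSWORD; do
    # Indirect lookup that also works in POSIX sh: read the variable named by $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: `check_external_db_env && docker compose -f docker-compose-db.yaml -f docker-compose-data-pipelines-external.yaml -f docker-compose-superset.yaml up -d --build`.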

`docker compose -f docker-compose-db.yaml -f docker-compose-data-pipelines-external.yaml -f docker-compose-superset.yaml up -d --build`

### Parquet export using an OpenMRS database backup
**Note**: The `docker-compose-db.yaml` file is still needed because it starts the PostgreSQL database used by Superset. If you don't need Superset, you can omit both `docker-compose-db.yaml` and `docker-compose-superset.yaml`.

### Drill-backed analytics server

If you have multiple instances of Ozone deployed in remote locations, you may want to process data on-site with the streaming and flattening pipelines but ship the data to a central repository for analytics. This setup provides a solution that uses:
* [Minio](https://min.io/ "Minio") - An S3 compatible object storage server.
* [Drill](https://drill.apache.org/ "Drill") - A Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud Storage.
* [Superset](https://superset.apache.org/ "Superset") - Data exploration and data visualization tool.
* [Superset Worker](https://superset.apache.org/docs/intro "Superset Worker") - Runs Superset background tasks.

To start this stack, run:

`docker compose -f docker-compose-db.yaml -f docker-compose-superset.yaml -f docker-compose-minio.yaml -f docker-compose-drill.yaml up -d --build`
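
Once the stack is up, Drill can also be queried outside Superset through its REST API (`POST /query.json` on port 8047). The sketch below assumes the storage plugin is named `s3`, as in `docker/drill/storage-plugins-override.conf`; the `encounters` directory and the `DRILL_URL` variable are hypothetical.

```bash
# Sketch: send a SQL query to Drill's REST API (POST /query.json).
# DRILL_URL is a hypothetical variable defaulting to the Drill address above.
DRILL_URL="${DRILL_URL:-http://localhost:8047}"

# Build the JSON request body. Double quotes in the SQL are escaped so the
# body stays valid JSON; backslashes and newlines are not handled.
drill_query_body() {
  escaped=$(printf '%s' "$1" | sed 's/"/\\"/g')
  printf '{"queryType":"SQL","query":"%s"}' "$escaped"
}

# POST the query and print Drill's JSON response.
# Example (hypothetical path): drill_query 'SELECT * FROM s3.`encounters` LIMIT 10'
drill_query() {
  curl --fail -s -X POST -H "Content-Type: application/json" \
    -d "$(drill_query_body "$1")" "$DRILL_URL/query.json"
}
```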

#### Parquet export using an OpenMRS database backup

- Copy the OpenMRS database dump to `./docker/sqls/mysql`
- `cd` into `docker/` and run the following commands:
@@ -150,7 +185,7 @@ docker compose -f docker-compose-export.yaml up
```
:bulb: The data folder can be found at `./docker/data/parquet`

### Parquet export against an existing production deployment
#### Parquet export against an existing production deployment

- Run the batch ETL job to transform the data
```bash
@@ -161,3 +196,13 @@ docker compose -f docker-compose-migration.yaml -f docker-compose-batch-etl.yaml
docker compose -f docker-compose-export.yaml up
```
:bulb: The data folder can be found at `./docker/data/parquet`

### Services coordinates
| Service | Access| Credentials|
| ------------ | ------------ |------------ |
| Kowl | http://localhost:8282 | |
| Flink | http://localhost:8084 | |
| Superset | http://localhost:8088 | admin/password|
| Minio | http://localhost:9000 |minioadmin/minioadmin123|
| Drill | http://localhost:8047 | |

12 changes: 9 additions & 3 deletions docker/.env
@@ -78,14 +78,20 @@ MYSQL_PASSWORD=password
DEBEZIUM_VERSION=1.9

# Minio
UID=1000
GID=1001

MINIO_ROOT_USER=minioadmin
MINIO_ROOT_PASSWORD=minioadmin123
BUCKET=openmrs-data
ANALYTICS_BUCKET=analytics
DEFAULT_BUCKETS=backups
MINIO_DOMAIN=

# Traefik domains
SUPERSET_DOMAIN=c2c-analytics.traefik.me
MINIO_DOMAIN=c2c-analytics-minio.traefik.me
SUPERSET_DOMAIN=
MINIO_DOMAIN=

# Kafka Connect
CONNECT_MYSQL_HOSTNAME=mysql
CONNECT_MYSQL_SERVER_ID=5001
SUPERSET_HOME=
18 changes: 6 additions & 12 deletions docker/docker-compose-minio.yaml
@@ -14,7 +14,7 @@ services:
MINIO_NOTIFY_WEBHOOK_ENABLE: on
MINIO_NOTIFY_WEBHOOK_ENDPOINT: http://minio-webhook:3000
volumes:
- storage-minio:/data
- ${MINIO_DATA_PATH:-storage-minio}:/data
command: server --address ":9099" --console-address ":9000" /data
restart: unless-stopped
labels:
@@ -28,31 +28,25 @@ services:
- "traefik.http.routers.minio.middlewares=superset-redirect-web-secure"
- "traefik.http.services.minio.loadbalancer.server.port=9000"

createbuckets:
minio-buckets:
networks:
ozone-analytics:
image: minio/mc:RELEASE.2023-10-24T21-42-22Z
build: ./minio-buckets
depends_on:
- minio
environment:
MINIO_ROOT_USER: ${MINIO_ROOT_USER}
MINIO_ROOT_PASSWORD: ${MINIO_ROOT_PASSWORD}
DEFAULT_BUCKET: ${BUCKET}
entrypoint: >
/bin/sh -c "
/usr/bin/mc config host add myminio http://minio:9099 $${MINIO_ROOT_USER} $${MINIO_ROOT_PASSWORD};
/usr/bin/mc mb myminio/$${DEFAULT_BUCKET};
/usr/bin/mc event add myminio/$${DEFAULT_BUCKET} arn:minio:sqs::_:webhook --event put;
exit 0;
"
DEFAULT_BUCKETS: ${DEFAULT_BUCKETS}
ANALYTICS_BUCKET: analytics
minio-webhook:
networks:
ozone-analytics:
build: ./minio-webhook
environment:
ACCESS_KEY: ${MINIO_ROOT_USER}
ACCESS_SECRET: ${MINIO_ROOT_PASSWORD}
DATA_BUCKET: ${BUCKET}
DATA_BUCKET: analytics


volumes:
2 changes: 1 addition & 1 deletion docker/docker-compose-superset.yaml
@@ -2,7 +2,7 @@ x-superset-volumes: &superset-volumes
- ./superset/docker:/app/docker
- ./superset/docker/pythonpath:/app/pythonpath
- ./superset/config/datasources:/app/datasources
- superset_home:/app/superset_home
- ${SUPERSET_HOME:-superset_home}:/app/superset_home
x-superset-environment: &superset-environment
- DATABASE_HOST=${POSTGRES_DB_HOST}
- DATABASE_DB=${SUPERSET_DB}
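
With this commit, both the Minio data directory and the Superset home can be overridden: the compose files use `${MINIO_DATA_PATH:-storage-minio}` and `${SUPERSET_HOME:-superset_home}`. Compose falls back to the default when the variable is unset or empty (the same rule as POSIX shell), so the empty `SUPERSET_HOME=` in `docker/.env` keeps the named volume. The hypothetical helper below mirrors both expressions to show which mount source would be used:

```bash
# Hypothetical helper: print the volume source each service would mount,
# mirroring the ${VAR:-default} expressions in the compose files.
# The default applies when the variable is unset OR empty.
effective_volumes() {
  printf 'minio: %s:/data\n' "${MINIO_DATA_PATH:-storage-minio}"
  printf 'superset: %s:/app/superset_home\n' "${SUPERSET_HOME:-superset_home}"
}
```

To use a host directory, set the variable to a path such as `/srv/minio` (hypothetical); a bare name would be read as a named volume, which Compose expects to be declared under `volumes:`.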
2 changes: 1 addition & 1 deletion docker/drill/storage-plugins-override.conf
@@ -1,7 +1,7 @@
"storage": {
s3: {
"type": "file",
"connection": "s3a://openmrs-data/",
"connection": "s3a://analytics/",
"config": {
"fs.s3a.connection.ssl.enabled": "false",
"fs.s3a.path.style.access": "true"
4 changes: 4 additions & 0 deletions docker/minio-buckets/Dockerfile
@@ -0,0 +1,4 @@
FROM minio/mc:RELEASE.2023-09-02T21-28-03Z
ADD entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
10 changes: 10 additions & 0 deletions docker/minio-buckets/entrypoint.sh
@@ -0,0 +1,10 @@
#!/bin/sh
/usr/bin/mc config host add myminio http://minio:9099 ${MINIO_ROOT_USER} ${MINIO_ROOT_PASSWORD}
IFS=","
for v in $DEFAULT_BUCKETS
do
/usr/bin/mc mb -p myminio/$v
done
/usr/bin/mc mb -p myminio/analytics
/usr/bin/mc event add -p myminio/analytics arn:minio:sqs::_:webhook --event put
exit 0;
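
The new `docker/minio-buckets/entrypoint.sh` splits `DEFAULT_BUCKETS` on commas (`IFS=","`), creates each bucket, and then always creates the `analytics` bucket with its webhook event. A hypothetical dry-run helper that prints the `mc` commands instead of running them can be used to check a `DEFAULT_BUCKETS` value without a running Minio. Because only commas separate entries, write values like `backups,archive` without spaces; unlike the entrypoint, this sketch also skips empty entries such as the one in `a,,b`.

```bash
# Hypothetical dry run of the bucket loop in docker/minio-buckets/entrypoint.sh:
# print the mc commands for a comma-separated bucket list instead of running them.
plan_buckets() {
  (
    # Split on commas only; the subshell keeps the IFS change local.
    IFS=","
    for bucket in $1; do
      # Skip empty entries, e.g. from "a,,b".
      if [ -n "$bucket" ]; then
        echo "mc mb -p myminio/$bucket"
      fi
    done
  )
  # The entrypoint always creates the analytics bucket and its webhook event.
  echo "mc mb -p myminio/analytics"
  echo "mc event add -p myminio/analytics arn:minio:sqs::_:webhook --event put"
}
```

For example, `plan_buckets "$DEFAULT_BUCKETS"` with the `.env` value `backups` prints the command for `backups` followed by the two `analytics` commands.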
3 changes: 2 additions & 1 deletion docker/setup-connect/setup-connect.sh
@@ -38,5 +38,6 @@ curl --fail -i -X PUT -H "Accept:application/json" -H "Content-Type:application/
"timestampConverter.format.time": "HH:mm:ss",
"timestampConverter.format.date": "YYYY-MM-dd",
"timestampConverter.format.datetime": "yyyy-MM-dd HH:mm:ss",
"timestampConverter.debug": "false"
"timestampConverter.debug": "false",
"snapshot.mode": "when_needed"
}'
