Merge pull request #47 from CuteChuanChuan/develop
Provide README
CuteChuanChuan authored Oct 27, 2023
2 parents a21f274 + 21f3b1f commit b8184aa
Showing 13 changed files with 218 additions and 93 deletions.
113 changes: 113 additions & 0 deletions .github/workflows/cloud_build_deploy_run.yaml
@@ -0,0 +1,113 @@
# This workflow builds and pushes a Docker container to Google Artifact Registry and deploys it on Cloud Run when a commit is pushed to the "develop" branch
#
# Overview:
#
# 1. Authenticate to Google Cloud
# 2. Authenticate Docker to Artifact Registry
# 3. Build a docker container
# 4. Publish it to Google Artifact Registry
# 5. Deploy it to Cloud Run
#
# To configure this workflow:
#
# 1. Ensure that the required Google Cloud APIs are enabled (Cloud Run, Artifact Registry)
#
# 2. Create and configure Workload Identity Federation for GitHub (https://github.com/google-github-actions/auth#setting-up-workload-identity-federation)
#
# 3. Ensure the required IAM permissions are granted
#
# Cloud Run
# roles/run.admin
# roles/iam.serviceAccountUser (to act as the Cloud Run runtime service account)
#
# Artifact Registry
# roles/artifactregistry.admin (project or repository level)
#
# NOTE: You should always follow the principle of least privilege when assigning IAM roles
#
# 4. Create GitHub secrets for WIF_PROVIDER and WIF_SERVICE_ACCOUNT
#
# 5. Change the values for the GAR_LOCATION, SERVICE and REGION environment variables (below).
#
# For more support on how to run this workflow, please visit https://github.com/marketplace/actions/deploy-to-cloud-run
#
# Further reading:
# Cloud Run IAM permissions - https://cloud.google.com/run/docs/deploying
# Artifact Registry IAM permissions - https://cloud.google.com/artifact-registry/docs/access-control#roles
# Container Registry vs Artifact Registry - https://cloud.google.com/blog/products/application-development/understanding-artifact-registry-vs-container-registry
# Principle of least privilege - https://cloud.google.com/blog/products/identity-security/dont-get-pwned-practicing-the-principle-of-least-privilege

name: Build and Deploy to Cloud Run

on:
  push:
    branches: [ "develop" ]

env:
  PROJECT_ID: comment-detector-400115
  GAR_LOCATION: asia-east1
  REPOSITORY: comment-detector
  SERVICE: server
  REGION: asia-east1

jobs:
  deploy:
    # Add 'id-token' with the intended permissions for workload identity federation
    permissions:
      contents: 'read'
      id-token: 'write'

    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Google Auth
        id: auth
        uses: 'google-github-actions/auth@v0'
        with:
          token_format: 'access_token'
          workload_identity_provider: '${{ secrets.WIF_PROVIDER }}' # e.g. - projects/123456789/locations/global/workloadIdentityPools/my-pool/providers/my-provider
          service_account: '${{ secrets.WIF_SERVICE_ACCOUNT }}' # e.g. - [email protected]

      # NOTE: Alternative option - authentication via credentials json
      # - name: Google Auth
      #   id: auth
      #   uses: 'google-github-actions/auth@v0'
      #   with:
      #     credentials_json: '${{ secrets.GCP_CREDENTIALS }}'
      # BEGIN - Docker auth and build (NOTE: If you already have a container image, these Docker steps can be omitted)

      # Authenticate Docker to Google Cloud Artifact Registry
      - name: Docker Auth
        id: docker-auth
        uses: 'docker/login-action@v1'
        with:
          username: 'oauth2accesstoken'
          password: '${{ steps.auth.outputs.access_token }}'
          registry: '${{ env.GAR_LOCATION }}-docker.pkg.dev'

      - name: Build and Push Container
        run: |-
          cd src/server
          SHORT_SHA=$(echo "${{ github.sha }}" | cut -c 1-6)
          echo "SHORT_SHA=$SHORT_SHA" >> "$GITHUB_ENV" # expose the tag to later steps; shell substitution is not evaluated inside `with:` values
          docker build -t server:$SHORT_SHA --platform linux/amd64 -f DockerfileDashboard .
          docker tag server:$SHORT_SHA "${{ env.GAR_LOCATION }}-docker.pkg.dev/${{ env.PROJECT_ID }}/${{ env.REPOSITORY }}/${{ env.SERVICE }}:$SHORT_SHA"
          # e.g. asia-east1-docker.pkg.dev/comment-detector-400115/comment-detector
          docker push "${{ env.GAR_LOCATION }}-docker.pkg.dev/${{ env.PROJECT_ID }}/${{ env.REPOSITORY }}/${{ env.SERVICE }}:$SHORT_SHA"
      # END - Docker auth and build

      - name: Deploy to Cloud Run
        id: deploy
        uses: google-github-actions/deploy-cloudrun@v0
        with:
          service: ${{ env.SERVICE }}
          region: ${{ env.REGION }}
          image: ${{ env.GAR_LOCATION }}-docker.pkg.dev/${{ env.PROJECT_ID }}/${{ env.REPOSITORY }}/${{ env.SERVICE }}:${{ env.SHORT_SHA }} # tag comes from GITHUB_ENV set in the build step

      # If required, use the Cloud Run url output in later steps
      - name: Show Output
        run: echo ${{ steps.deploy.outputs.url }}
1 change: 1 addition & 0 deletions .gitignore
@@ -8,6 +8,7 @@
*.log
.idea/
trying/
tests/testing_html/*

*/.DS_Store
.env
191 changes: 101 additions & 90 deletions README.md
@@ -1,91 +1,102 @@
# Comment-Detector
Personal Project (AppWorks School #21 Data Engineering)

# Data
## Source
- PTT - Gossiping
- PTT - HatePolitics

## Extraction
- requests + Beautiful Soup


# Tools and Skills
## <u>Database: MongoDB (NoSQL)</u>
### Objective:
### Why use this?

## <u>Schedule: Airflow</u>
### Objective:
### Why use this?

## <u>CDC: Kafka</u>
### Objective:
### Why use this?
### Steps:
1. downloading the tutorial repository from [Kafka edu](https://github.com/mongodb-university/kafka-edu.git)
2. running Docker: ```docker-compose -p mongo-kafka up -d --force-recreate```
3. opening a shell in the container: ```docker exec -it mongo1 /bin/bash```
4. creating the connector config: ```nano simplesource.json```
```json
{
"name": "mongo-simple-source",
"config": {
"connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
"connection.uri": "mongodb://mongo1",
"database": "Tutorial1",
"collection": "orders"
}
}
```
5. registering the connector: ```cx simplesource.json```
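
As a quick sanity check, the change events can be read back off the topic the connector produces. A minimal sketch, assuming the `kafka-python` package, the default topic naming `<database>.<collection>`, and a broker reachable on `localhost:9092` (adjust for the docker-compose setup):
```python
# Hypothetical consumer sketch -- broker address and topic name depend on the setup.
import json

from kafka import KafkaConsumer

# With default topic naming, the source connector publishes change events
# to "<database>.<collection>", i.e. "Tutorial1.orders" for the config above.
consumer = KafkaConsumer(
    "Tutorial1.orders",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    print(message.value)  # each value is one change-stream event document
```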

### Reference:
1. [MongoDB Quickstart](https://www.mongodb.com/docs/kafka-connector/current/quick-start/)
2. [Kafka Connector Tutorial Setup](https://www.mongodb.com/docs/kafka-connector/current/tutorials/tutorial-setup/#std-label-kafka-tutorials-docker-setup)
3. [Getting Started with the MongoDB Kafka Source Connector](https://www.mongodb.com/docs/kafka-connector/master/tutorials/source-connector/)


## <u>Dashboard: Plotly Dash</u>
### Objective:
1. Create interactive interface for users to explore this product
2. Create dashboards to demonstrate the product
### URL: http://3.106.78.149:8000/
### Why use this?
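
For orientation, a minimal Dash app sketch (hypothetical layout, assuming Dash 2.x; not the project's actual dashboard code):
```python
# Hypothetical sketch -- page title and layout are placeholders.
from dash import Dash, html

app = Dash(__name__)
app.layout = html.Div([html.H1("Comment Detector")])

if __name__ == "__main__":
    app.run(debug=True)  # serves the dashboard locally on port 8050 by default
```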

## <u>Middleware - WSGI (Web Server Gateway Interface) server: gunicorn</u>
### Objective:
### Why use this?

## <u>Middleware - ASGI (Asynchronous Server Gateway Interface) server: uvicorn</u>
### Objective:
### Why use this?

## <u>Process manager: pm2</u>
### Objective: managing the Streamlit process
### Why use this?
1. Dashboards need to be available for users.
2. Currently, Streamlit does not support running with gunicorn.
### Steps:
1. installing npm: ```sudo apt install npm```
2. installing pm2: ```sudo npm install pm2 -g```
3. creating the .sh script: ```vim start_streamlit.sh```
```shell
#!/bin/bash
gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker
```
4. making the .sh executable: ```chmod +x start_streamlit.sh```
5. activating the virtual environment: ```source ./crawler/bin/activate```
6. running the script: ```pm2 start start_streamlit.sh```

## <u>API: FastAPI</u>
### Objective:
### Why use this?

## <u>Cache: Redis</u>
### Objective:
### Why use this?

## <u>Rate limiter: Redis</u>
### Objective:
### Why use this?

## Table of Contents
* [Introduction](#introduction)
* [Architecture](#architecture)
* [Data](#data)
* [Feature](#feature)
* [Tools](#tools)
* [Monitoring](#monitoring)
* [Clip](#clip)
* [Contact](#contact)


## Introduction
#### A dashboard offering users comprehensive & insightful data about PTT (Taiwan's largest forum)
#### Users can form judgments about cyber warriors (網軍) and people manipulating public opinion (帶風向)
#### Platform: [https://comment-detector.com](https://comment-detector.com)
![Homepage](readme-img/Homepage.png)


## Architecture
![Architecture](readme-img/Architecture.png)

### Compute Engine #1:
- Aim 1: Executing web crawling every 10 minutes, orchestrated by Apache Airflow on Docker (see the DAG sketch below)
- Aim 2: Cleaning and extracting data
- On: GCP Compute Engine
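
A minimal sketch of how such a schedule could look (hypothetical DAG, assuming Airflow 2.x; the project's real DAG and task names are not shown in this commit):
```python
# Hypothetical DAG sketch -- dag/task names and the crawl logic are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def crawl_and_clean() -> None:
    """Placeholder for fetching new PTT pages, cleaning them, and loading into MongoDB."""
    ...

with DAG(
    dag_id="ptt_crawler",
    start_date=datetime(2023, 10, 1),
    schedule_interval="*/10 * * * *",  # every 10 minutes
    catchup=False,
) as dag:
    PythonOperator(task_id="crawl_and_clean", python_callable=crawl_and_clean)
```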

### Compute Engine #2:
- Aim: Deploying Redis as a cache system storing data updated by Python scripts scheduled with APScheduler (see the sketch below)
- On: GCP Compute Engine
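
A minimal sketch of the cache-refresh pattern (hypothetical job, assuming the `apscheduler` and `redis` packages; the key name, TTL, and the computed data are placeholders):
```python
# Hypothetical cache-refresh sketch -- key name, TTL, and stats computation are placeholders.
import json

import redis
from apscheduler.schedulers.blocking import BlockingScheduler

r = redis.Redis(host="localhost", port=6379)

def refresh_cache() -> None:
    stats = {"articles": 0}  # placeholder for a real MongoDB aggregation
    r.set("dashboard:stats", json.dumps(stats), ex=15 * 60)  # 15-minute TTL

scheduler = BlockingScheduler()
scheduler.add_job(refresh_cache, "interval", minutes=10)
refresh_cache()  # warm the cache once before the first interval fires
scheduler.start()
```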

### Database
- Aim: Storing cleaned data and providing data for platform
- On: MongoDB Atlas

### Dashboard (Application)
- Aim: Retrieving data from MongoDB and presenting organized data to users (see the read-path sketch below)
- On: container image deployed to Cloud Run
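
A minimal read-path sketch (hypothetical, assuming the `pymongo` package; the URI, database, collection, and field names are placeholders):
```python
# Hypothetical read-path sketch -- names below do not come from the repository.
import os

from pymongo import MongoClient

client = MongoClient(os.environ["MONGODB_URI"])
collection = client["ptt"]["gossip"]

# e.g. the ten most recent articles for a dashboard panel
for doc in collection.find().sort("created_at", -1).limit(10):
    print(doc["title"])
```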

## Data
### Source
- PTT - Gossiping, which has the largest number of users.
- PTT - HatePolitics, which is highly related to politics.

### ETL
- Extract: web crawling (requests + Beautiful Soup); see the sketch after this list
- Transform: Python (data cleaning and extraction)
- Load: MongoDB
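
A minimal sketch of the Extract step (hypothetical helper, not the project's actual crawler; the CSS selectors match PTT's public page structure):
```python
# Hypothetical crawler sketch -- function name and return shape are illustrative.
import requests
from bs4 import BeautifulSoup

def fetch_board_index(board: str = "Gossiping") -> list[dict]:
    url = f"https://www.ptt.cc/bbs/{board}/index.html"
    # Gossiping is age-gated; the "over18" cookie skips the consent page.
    resp = requests.get(url, cookies={"over18": "1"}, timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    articles = []
    for entry in soup.select("div.r-ent div.title a"):
        articles.append({
            "title": entry.text.strip(),
            "link": "https://www.ptt.cc" + entry["href"],
        })
    return articles

if __name__ == "__main__":
    for article in fetch_board_index():
        print(article["title"], article["link"])
```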

## Feature
- Trend (趨勢分析):
  - Provides statistics on data volume
  - Shows trending keywords and articles
- Keywords (關鍵字分析):
  - After a user enters a keyword of interest, the dashboard shows popular articles related to that keyword
  - Ranks the top 20 commenters by comment count and maps the relationships among them (Concurrency Analysis)
- Commenter (留言者分析):
  - After a user enters a commenter of interest, the dashboard shows that commenter's active hours
  - Aggregates all of that commenter's comments into a word cloud
- Open Data API (開源資料 API):
  - Provides additional information such as IP addresses and authors (see the usage sketch below)
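
A minimal usage sketch, with the endpoint shape taken from the stress-test script in this repository (the response schema is not documented here, so the JSON is simply printed):
```python
# Endpoint path and query parameter come from src/stress_test/stress_test_api.js.
import requests

resp = requests.get(
    "https://comment-detector.org/ipaddress/49.217.49.133",
    params={"target_collection": "gossip"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```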



## Tools
| Category | Tool/Technique |
|----------------|---------------------------------|
| Database | MongoDB |
| Data Pipeline | Airflow |
| Dashboard | Plotly Dash |
| Backend | FastAPI |
| Cache system | Redis |
| Autoscaling | Cloud Run |
| Load Balancing | Cloud Load Balancing |
| Monitoring | Cloud Monitoring, Cloud Logging |
| Others | GCP Compute Engine |


## Monitoring
#### Overall
- ![overall](readme-img/monitoring-overall.png)
#### Airflow
- ![airflow](readme-img/monitoring-airflow.png)
#### Dashboard
- ![dashboard](readme-img/monitoring-dashboard.png)

## Clip
#### Trend Analysis (趨勢分析)
- ![Trend](readme-img/demo-trend.gif)
#### Keyword Analysis (關鍵字分析)
- ![Keyword](readme-img/demo-keyword.gif)
#### Commenter Analysis (留言者分析)
- ![Commenter](readme-img/demo-commenter.gif)
#### APIs
- ![ipaddress](readme-img/demo-api.gif)




## Contact
Raymond Hung [email protected]
Binary file added readme-img/Architecture.png
Binary file added readme-img/Homepage.png
Binary file added readme-img/demo-api.gif
Binary file added readme-img/demo-commenter.gif
Binary file added readme-img/demo-keyword.gif
Binary file added readme-img/demo-trend.gif
Binary file added readme-img/monitoring-airflow.png
Binary file added readme-img/monitoring-dashboard.png
Binary file added readme-img/monitoring-overall.png
6 changes: 3 additions & 3 deletions src/stress_test/stress_test_api.js
@@ -2,14 +2,14 @@ import http from "k6/http";
import { check, sleep } from "k6";

export default function () {
const res = http.get("https://comment-detector.org/ipaddress/49.217.49.133?target_collection=gossip");
const res = http.get("https://comment-detector.org");
check(res, { 'status was 200': (r) => r.status === 200 });
// sleep(1);
}

export const options = {
vus: 1000,
duration: '30s',
vus: 5000,
duration: '180s',
};

// export const options = {
