CuteChuanChuan/__SideProject__Comment-Detector

Comment-Detector

Introduction

A dashboard that gives users comprehensive, insightful data about PTT, Taiwan's largest online forum.

With it, users can form their own judgments about cyber warriors (網軍) and accounts that manipulate public opinion (帶風向).

Architecture

Compute Engine #1:

  • Aim 1: Running the web crawler every 10 minutes, orchestrated by Apache Airflow on Docker
  • Aim 2: Cleaning the crawled data and extracting fields from it
  • On: GCP Compute Engine

Compute Engine #2:

  • Aim: Running Redis as a cache that holds data refreshed by Python scripts scheduled with APScheduler
  • On: GCP Compute Engine
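
The refresh-then-read pattern described for Compute Engine #2 can be sketched roughly as below. This is a minimal illustration, not the project's code: `InMemoryCache` is a stand-in that mimics only the `set`/`get` shape of a Redis client so the example runs without a server, and the key name `trend:latest`, the payload layout, and the function names are all assumptions. In a real deployment, a scheduler such as APScheduler would call the refresh function periodically, at whatever interval the project uses.

```python
import json
import time


class InMemoryCache:
    """Stand-in for a Redis client (assumption): only set/get are modelled."""

    def __init__(self):
        self._store = {}

    def set(self, name, value):
        self._store[name] = value

    def get(self, name):
        # Like Redis, return None when the key does not exist.
        return self._store.get(name)


def refresh_trend_cache(cache, compute_trend, key="trend:latest"):
    """Recompute the data and overwrite the cached copy (hypothetical job body)."""
    payload = {"updated_at": time.time(), "data": compute_trend()}
    # Serialize to JSON because a Redis value is a flat string/bytes blob.
    cache.set(key, json.dumps(payload))


def read_trend(cache, key="trend:latest"):
    """Read the cached data; None means the cache has not been filled yet."""
    raw = cache.get(key)
    return None if raw is None else json.loads(raw)["data"]
```

The point of the split is that the dashboard only ever calls the cheap read path, while the expensive computation runs on the scheduler's clock.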

Database

  • Aim: Storing cleaned data and providing data for platform
  • On: MongoDB Atlas

Dashboard (Application)

  • Aim: Retrieving data from MongoDB and presenting it to users in organized form
  • On: container image managed by Cloud Run

Data

Source

  • PTT - Gossiping, the board with the largest number of users.
  • PTT - HatePolitics, a board closely tied to politics.

ETL

  • Extract: web crawling (requests + Beautiful Soup)
  • Transform: Python (data cleaning and extraction)
  • Load: MongoDB
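
As an illustration of the Transform step, here is a minimal sketch of cleaning one scraped comment into a document ready for loading. It assumes the Extract step has already pulled four raw strings per comment (tag, user id, content, and an IP/time string); the field names, the tag labels, and the `clean_comment` helper are hypothetical and not taken from the project. Because PTT comment timestamps carry no year, the year is passed in by the caller.

```python
import re
from datetime import datetime

# Assumed mapping from PTT comment markers to labels (labels are illustrative).
PUSH_TAGS = {"推": "push", "噓": "boo", "→": "neutral"}

# Matches an optional IPv4 address followed by "MM/DD HH:MM".
IP_TIME_RE = re.compile(
    r"(?:(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\s+)?"
    r"(?P<month>\d{2})/(?P<day>\d{2})\s+(?P<hour>\d{2}):(?P<minute>\d{2})"
)


def clean_comment(raw: dict, year: int) -> dict:
    """Normalize one raw comment (hypothetical field names) into a clean document."""
    content = raw["content"].strip()
    if content.startswith(":"):
        # Drop the leading separator that precedes the comment text.
        content = content[1:].strip()

    match = IP_TIME_RE.search(raw["ipdatetime"])
    ip = match.group("ip") if match else None
    commented_at = None
    if match:
        commented_at = datetime(
            year,
            int(match["month"]),
            int(match["day"]),
            int(match["hour"]),
            int(match["minute"]),
        )

    return {
        "tag": PUSH_TAGS.get(raw["tag"].strip(), "unknown"),
        "commenter": raw["userid"].strip(),
        "content": content,
        "ip": ip,
        "commented_at": commented_at,
    }
```

The resulting dictionaries could then be inserted into MongoDB as-is, since every value is a string, `None`, or a `datetime`.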

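Features

  • Trend (trend analysis):
    • Shows data-volume statistics
    • Presents trending keywords and articles
  • Keywords (keyword analysis):
    • After the user enters a keyword of interest, the dashboard shows popular articles related to that keyword
    • Ranks the top 20 commenters by number of comments and shows the relationships among them (Concurrency Analysis)
  • Commenter (commenter analysis):
    • After the user enters a commenter of interest, the dashboard shows that commenter's active hours
    • Gathers all of that commenter's comments and summarizes them as a word cloud
  • Open data API:
    • Provides further information, such as IP addresses and authors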
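
The commenter statistics listed above can be approximated with a few standard-library counters. This is a sketch under assumptions: the comment fields (`article_id`, `commenter`, `commented_at`) and the sample data are made up, and the "relationships" among commenters are interpreted here as how many articles two commenters both commented on, which may differ from what the project computes.

```python
from collections import Counter
from datetime import datetime
from itertools import combinations

# Made-up sample comments; the field names are assumptions, not the project's schema.
SAMPLE = [
    {"article_id": "A", "commenter": "u1", "commented_at": datetime(2023, 5, 1, 9, 0)},
    {"article_id": "A", "commenter": "u2", "commented_at": datetime(2023, 5, 1, 9, 5)},
    {"article_id": "A", "commenter": "u1", "commented_at": datetime(2023, 5, 1, 22, 0)},
    {"article_id": "B", "commenter": "u1", "commented_at": datetime(2023, 5, 2, 21, 0)},
    {"article_id": "B", "commenter": "u2", "commented_at": datetime(2023, 5, 2, 21, 1)},
    {"article_id": "B", "commenter": "u3", "commented_at": datetime(2023, 5, 2, 21, 2)},
]


def top_commenters(comments, n=20):
    """Return the n commenters with the most comments as (commenter, count) pairs."""
    return Counter(c["commenter"] for c in comments).most_common(n)


def co_commenting_pairs(comments, commenters):
    """For each pair of the given commenters, count articles both commented on."""
    wanted = set(commenters)
    users_by_article = {}
    for c in comments:
        if c["commenter"] in wanted:
            users_by_article.setdefault(c["article_id"], set()).add(c["commenter"])
    pairs = Counter()
    for users in users_by_article.values():
        # Sort so each pair has one canonical (a, b) ordering.
        for a, b in combinations(sorted(users), 2):
            pairs[(a, b)] += 1
    return pairs


def active_hours(comments, commenter):
    """Return a 24-slot list with the commenter's comment count per hour of day."""
    hours = Counter(
        c["commented_at"].hour for c in comments if c["commenter"] == commenter
    )
    return [hours.get(h, 0) for h in range(24)]
```

For example, `co_commenting_pairs(SAMPLE, ["u1", "u2"])` counts the articles where both `u1` and `u2` left comments, and the pair counts could serve as edge weights in a relationship graph.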

Tools

| Category | Tool/Technique |
| --- | --- |
| Database | MongoDB |
| Data Pipeline | Airflow |
| Dashboard | Plotly Dash |
| Backend | FastAPI |
| Cache system | Redis |
| Autoscaling | Cloud Run |
| Load Balancing | Cloud Load Balancing |
| Monitoring | Cloud Monitoring, Cloud Logging |
| Others | GCP Compute Engine |

Monitoring

  • Overall [image]
  • Airflow [image]
  • Dashboard [image]

Clip

  • Trend analysis [image: Trend]
  • Keyword analysis [image: Keyword]
  • Commenter analysis [image: Commenter]
  • APIs [image: ipaddress]

Contact

Raymond Hung [email protected]

About

Personal Project (AppWorks School #21 Data Engineering)
