Organizations

@VQAssessment @Q-Future

teowu/README.md
  • 👋 Hi, I’m Teo Wu (officially Haoning Wu), working on LMMs at Rhymes AI, closely advised by Dongxu Li and Junnan Li. Prior to this, I have been a PhD candidate (preparing for my thesis defense) at Nanyang Technological University 🇸🇬, supervised by Prof. Weisi Lin. I obtained my B.S. degree in computer science from Peking University (北京大学).

  • I am currently focusing on LMM pre-training and evaluation (video, longer context, and better instruction-following). See our LongVideoBench, the first video benchmark for LMMs proven improvable given more input frames (>=256). I have led the video and long-context training of Aria (Model, Paper, GitHub), an excellent open-source native MoE LMM with abilities matching GPT-4o-mini/Gemini-1.5-Flash with only 3.9B activated parameters.

  • 🌱 I have also been the lead of the Q-Future project (Visual Evaluation with LMMs📹), which has produced 7 first-authored papers accepted at top conferences and journals including ICML, ICLR, NeurIPS, TPAMI, CVPR, ECCV and ACMMM. The flagship scorer, OneAlign, has been downloaded more than 238K times (as of Jul 25, 2024) on HuggingFace.

  • Prior to LMMs, my PhD topic was video quality assessment, a traditional area that tries to gauge the quality scores (and more) of videos. Among my 6 papers published in that area (in ECCV, ICCV, TPAMI, etc.), the two representative works are FAST-VQA and DOVER, which have been the most-used baselines in that field.

  • 📫 Reach me by e-mail: [email protected]/[email protected], Twitter: Twitter

  • Google Scholar

Pinned

  1. rhymes-ai/Aria (Public)

    Codebase for Aria - an Open Multimodal Native MoE

    Jupyter Notebook, 961 stars, 80 forks

  2. longvideobench/LongVideoBench (Public)

    [NeurIPS 2024 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.

    Python, 82 stars, 2 forks

  3. Q-Future/Co-Instruct (Public)

    ④[ECCV 2024 Oral, Comparison among Multiple Images!] A study on open-ended multi-image quality comparison: a dataset, a model and a benchmark.

    73 stars, 4 forks

  4. Q-Future/Q-Align (Public)

    ③[ICML 2024] [IQA, IAA, VQA] All-in-one Foundation Model for visual scoring. Can efficiently fine-tune to downstream datasets.

    Python, 330 stars, 22 forks

  5. Q-Future/Q-Instruct (Public)

    ②[CVPR 2024] Low-level visual instruction tuning, with a 200K dataset and a model zoo for fine-tuned checkpoints.

    Python, 210 stars, 10 forks

  6. Q-Future/Q-Bench (Public)

    ①[ICLR 2024 Spotlight] (GPT-4V/Gemini-Pro/Qwen-VL-Plus + 16 open-source MLLMs) A benchmark for multi-modality LLMs (MLLMs) on low-level vision and visual quality assessment.

    Jupyter Notebook, 251 stars, 13 forks