Part of ML projects focused on SMS spam classification with Python.

KwonNayeon/sms-spam-classifier

SMS Spam Detection Project

Part of my data science portfolio: a machine learning system for binary classification of SMS messages.

Project Overview

This project develops a spam detection system using ML techniques. The current focus is on establishing strong baseline models and evaluation metrics.

Motivation

This project builds on prior work in text analysis (e.g., Word Cloud Visualization, Travel Blog Analysis) and classification (e.g., SME Closure Prediction). The goal is to build a solid foundation before moving on to more sophisticated techniques: strong baseline models and robust evaluation metrics come first, so that the core challenges of classification are well understood.

🛠 Tech Stack

Current

  • Python 3.9.13
  • Data Processing: Pandas, NumPy
  • Machine Learning: Scikit-learn
  • NLP: NLTK
  • Data Visualization: Matplotlib, Seaborn

📊 Project Structure

/data-science-consulting-solutions
│
├── README.md                     # Project overview and basic information
├── LICENSE                       # License file for the project
├── requirements.txt              # Python package dependencies
├── vs_code_setup.md              # VS Code setup guide
├── notebooks/                    # Jupyter notebooks
│   ├── 01_exploratory_analysis/  # Exploratory data analysis
│   ├── 02_modeling/              # Model building and training
│   └── 03_evaluation/            # Model evaluation
├── src/                          # Source code
│   ├── data/                     # Data processing
│   ├── models/                   # ML models
│   └── utils/                    # Utility functions
├── tests/                        # Unit tests
└── docs/                         # Documentation

🚧 Current Progress

  • Implemented initial baseline models (Logistic Regression, Random Forest)
  • Basic text preprocessing and feature extraction
  • Initial model evaluation completed
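
As a rough illustration of what a baseline of this kind can look like, the sketch below chains basic text normalization, TF-IDF features, and Logistic Regression in a scikit-learn pipeline. It is not the code from this repository: the `normalize` and `build_baseline` helpers, the placeholder tokens, the n-gram range, and the toy messages are all assumptions made for the example.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def normalize(text: str) -> str:
    """Lowercase the text and replace URLs and digit runs with placeholder tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " urltoken ", text)
    text = re.sub(r"\d+", " numtoken ", text)
    return re.sub(r"\s+", " ", text).strip()


def build_baseline() -> Pipeline:
    """TF-IDF features (unigrams and bigrams) feeding a logistic regression classifier."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(preprocessor=normalize, ngram_range=(1, 2))),
        ("clf", LogisticRegression(max_iter=1000)),
    ])


# Toy messages for illustration only; they are not taken from the dataset.
messages = [
    "WIN a free prize now, call 0800123456",
    "Claim your free cash reward at http://example.com",
    "Urgent! You have won a free holiday, text WIN to 80086",
    "Are we still meeting for lunch today?",
    "I'll call you when I get home",
    "Can you send me the notes from class?",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = spam, 0 = ham

model = build_baseline()
model.fit(messages, labels)
predictions = model.predict(["Free prize waiting, call 0800123456"])
```

Swapping the `clf` step for `RandomForestClassifier` gives the second baseline with the same features. On the real data, the model should be scored on a held-out split rather than on the training messages.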

📝 Next Steps

  • Improve model performance by addressing class imbalance
  • Enhance text preprocessing techniques
  • Implement feature engineering
  • Document findings and insights
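
One common way to address class imbalance is to weight each class inversely to its frequency. The sketch below computes such weights in plain Python using the formula `n_samples / (n_classes * class_count)`, which is the heuristic scikit-learn applies when an estimator is given `class_weight="balanced"`. The `balanced_class_weights` helper and the label counts are hypothetical and do not reflect the actual dataset distribution.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label distribution, for illustration only.
labels = ["ham"] * 8 + ["spam"] * 2
weights = balanced_class_weights(labels)
print(weights)
```

In practice, passing `class_weight="balanced"` to `LogisticRegression` or `RandomForestClassifier` has the same effect without computing the weights by hand. Resampling the training set is an alternative worth comparing.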

📁 Dataset

  • Using the UCI SMS Spam Collection Dataset from Kaggle
  • Binary classification: spam vs ham (non-spam) messages
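
A minimal loading sketch with pandas follows. It assumes the Kaggle CSV layout in which the label sits in a column named `v1` and the message text in `v2`, and that the file is encoded as latin-1; check these against the downloaded file. The `load_sms` helper and the two sample rows are made up for the example.

```python
import io

import pandas as pd


def load_sms(source) -> pd.DataFrame:
    """Read the CSV and return the columns `label`, `text`, and `is_spam`."""
    # Assumption: label in column v1, message in column v2, latin-1 encoding.
    raw = pd.read_csv(source, encoding="latin-1", usecols=["v1", "v2"])
    df = raw.rename(columns={"v1": "label", "v2": "text"})
    # Encode the binary target: 1 for spam, 0 for ham.
    df["is_spam"] = (df["label"] == "spam").astype(int)
    return df


# In-memory stand-in for the real file so the sketch runs without a download.
sample = io.BytesIO(b"v1,v2\nham,See you at lunch\nspam,WIN a free prize now\n")
df = load_sms(sample)
print(df)
```

With the real file, pass its path instead of the in-memory buffer, e.g. `load_sms("spam.csv")` if that is where the download was saved.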

🔧 Setup

  1. Create a virtual environment
python -m venv spam_detector_env
  2. Activate the virtual environment
# Windows
spam_detector_env\Scripts\activate
# Mac/Linux
source spam_detector_env/bin/activate
  3. Install dependencies
pip install numpy pandas scikit-learn nltk matplotlib seaborn jupyter
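
To confirm that the environment is ready, a short check like the one below can be run inside the activated virtual environment. It is only a sketch: the `missing_packages` helper is hypothetical, and the list of import names is taken from the Tech Stack section above. Note that scikit-learn is imported as `sklearn`.

```python
import importlib.util


def missing_packages(import_names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


# Import names, not pip names: scikit-learn is imported as `sklearn`.
required = ["numpy", "pandas", "sklearn", "nltk", "matplotlib", "seaborn"]
missing = missing_packages(required)
print("Missing:", ", ".join(missing) if missing else "none")
```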

This project is part of my journey to become a data scientist who solves real-world problems through data-driven solutions.
