SMS Spam Detection Project

Part of my data science portfolio: a machine learning system for binary classification of SMS messages.

Project Overview

This project develops a spam detection system using ML techniques. The current focus is on establishing strong baseline models and evaluation metrics.

Motivation

This project builds on my prior work in text analysis (e.g., Word Cloud Visualization, Travel Blog Analysis) and classification (e.g., SME Closure Prediction). The aim is to start with strong baseline models and robust evaluation metrics, and to understand the core challenges of text classification, before moving on to more sophisticated techniques.

🛠 Tech Stack

Current

Python 3.9.13
Data Processing: Pandas, NumPy
Machine Learning: Scikit-learn
NLP: NLTK
Data Visualization: Matplotlib, Seaborn

📊 Project Structure

/data-science-consulting-solutions
│
├── README.md                     # Project overview and basic information
├── LICENSE                       # License file for the project
├── requirements.txt              # Python package dependencies
├── vs_code_setup.md              # VS Code setup guide
├── notebooks/                    # Jupyter notebooks
│   ├── 01_exploratory_analysis/  # Exploratory data analysis
│   ├── 02_modeling/              # Model building and training
│   └── 03_evaluation/            # Model evaluation
├── src/                          # Source code
│   ├── data/                     # Data processing
│   ├── models/                   # ML models
│   └── utils/                    # Utility functions
├── tests/                        # Unit tests
└── docs/                         # Documentation

🚧 Current Progress

Implemented initial baseline models (Logistic Regression, Random Forest)
Implemented basic text preprocessing and feature extraction
Completed initial model evaluation
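
As a rough illustration of what a baseline of this kind can look like, here is a minimal sketch that chains TF-IDF features into Logistic Regression with scikit-learn. The toy messages, the `build_baseline` helper, and the choice of TF-IDF are assumptions made for the example; they are not taken from the project's notebooks.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy messages for illustration only; not taken from the real dataset.
messages = [
    "WIN a FREE prize now, text CLAIM to 80000",
    "URGENT! You have won a free cash prize, call now",
    "Free entry in a weekly prize draw, text WIN now",
    "Are we still meeting for lunch today?",
    "I'll call you when I get home",
    "Can you send me the notes from class?",
    "See you at the gym later",
    "Thanks for dinner last night",
]
labels = [1, 1, 1, 0, 0, 0, 0, 0]  # 1 = spam, 0 = ham


def build_baseline():
    """Hypothetical helper: TF-IDF features followed by Logistic Regression."""
    return make_pipeline(
        TfidfVectorizer(lowercase=True, stop_words="english"),
        LogisticRegression(max_iter=1000),
    )


model = build_baseline()
model.fit(messages, labels)

# Predict on the training messages just to show the API shape;
# a real evaluation would use a held-out test split.
predictions = model.predict(messages)
spam_probability = model.predict_proba(["free prize, call now"])[0, 1]
```

Swapping `LogisticRegression` for `RandomForestClassifier` in the same pipeline would give the second baseline without changing the feature extraction step.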

📝 Next Steps

Improve model performance by addressing class imbalance
Enhance text preprocessing techniques
Implement feature engineering
Document findings and insights
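
One common way to address class imbalance is to weight each class inversely to its frequency. The sketch below computes such weights by hand, using the same heuristic that scikit-learn documents for `class_weight="balanced"`. The `balanced_class_weights` helper and the 8:2 label split are illustrative assumptions, not project results.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Hypothetical helper: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Made-up label distribution: ham is the majority class.
labels = ["ham"] * 8 + ["spam"] * 2
weights = balanced_class_weights(labels)
# The minority class (spam) receives the larger weight.
```

In practice, passing `class_weight="balanced"` to `LogisticRegression` or `RandomForestClassifier` applies this weighting without any manual computation.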

📁 Dataset

Using the UCI SMS Spam Collection Dataset from Kaggle
Binary classification: spam vs ham (non-spam) messages
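
Loading the data might look like the sketch below. It assumes the Kaggle CSV layout, with the label in a column named `v1` and the message text in `v2`; the in-memory sample and the `load_sms` helper are stand-ins for illustration, so check the column names against the file you actually download.

```python
import io

import pandas as pd

# Stand-in for the downloaded CSV; the v1/v2 column names are an assumption.
SAMPLE_CSV = """v1,v2
ham,"Ok, see you at 5"
spam,"WINNER!! Claim your free prize now"
ham,Running late
"""


def load_sms(source, **read_csv_kwargs):
    """Hypothetical helper: read the label/text columns and add a 0/1 target."""
    df = pd.read_csv(source, usecols=["v1", "v2"], **read_csv_kwargs)
    df = df.rename(columns={"v1": "label", "v2": "text"})
    df["is_spam"] = (df["label"] == "spam").astype(int)  # spam = 1, ham = 0
    return df


df = load_sms(io.StringIO(SAMPLE_CSV))
```

With the real file, pass its path instead of the `StringIO` object. If pandas raises a decoding error, an explicit `encoding` argument such as `encoding="latin-1"` may be needed.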

🔧 Setup

Create virtual environment

python -m venv spam_detector_env

Activate virtual environment

# Windows
spam_detector_env\Scripts\activate
# Mac/Linux
source spam_detector_env/bin/activate

Install dependencies

pip install numpy pandas scikit-learn nltk matplotlib seaborn jupyter

This project is part of my journey to become a data scientist who solves real-world problems through data-driven solutions.

KwonNayeon/sms-spam-classifier
