Author: Jiachuan Deng
Uses a CNN to handle textual data, combined with metadata, to build a regression model for predicting commodity prices.
Data set is from Kaggle: https://www.kaggle.com/c/mercari-price-suggestion-challenge/data
Pre-trained word vectors (GloVe) are used: http://nlp.stanford.edu/data/glove.6B.zip
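As a rough illustration of how the pre-trained vectors can be read, the sketch below parses the GloVe text format, where each line is a word followed by space-separated floats. The function name load_glove and the in-memory sample are assumptions for illustration and are not taken from dataprocessing.py.

```python
import io


def load_glove(lines):
    """Parse GloVe text-format lines into a {word: [float, ...]} dict.

    Hypothetical helper: not part of this repository's code.
    """
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Tiny in-memory stand-in for one of the text files inside glove.6B.zip.
sample = io.StringIO("shirt 0.1 -0.2 0.3\nphone 0.5 0.0 -1.5\n")
embeddings = load_glove(sample)
print(len(embeddings), embeddings["phone"])
```

With a real file, the same function could be called on an open file handle, for example open(path, encoding="utf-8"), where path points at an extracted GloVe text file.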
DPLModel4textualRegression.pdf: detailed description of our model, in paper format
config.ini: set your data path and model parameters in this file
dataprocessing.py: script to process data into a format that can be fed directly into the model
minibatcher.py: helper script to load data in batches
network.py: defines the model
run_model.py: main executable code to train/test the model
runme.py: use this script to run everything together
The model can be built on either an RNN structure or a CNN structure. Only the CNN code is provided here, since it has lower complexity and performs better.
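To make the CNN structure more concrete, here is a minimal pure-Python sketch of a forward pass in the style of the referenced sentence-classification CNN: each filter slides over windows of word vectors, a ReLU is applied, the maximum over all positions is kept, and the pooled text features are concatenated with metadata features before a linear output layer. All names (conv_max_pool, predict_price) and all numbers are hypothetical; the actual layers in network.py may differ.

```python
def conv_max_pool(embedded, filt, bias):
    """Apply one filter over word windows, then ReLU and max-over-time pooling.

    embedded: list of word vectors (one list of floats per word)
    filt:     list of weight rows, one row per word in the window
    """
    width = len(filt)  # number of consecutive words the filter covers
    best = 0.0         # ReLU output is never below zero
    for start in range(len(embedded) - width + 1):
        window = embedded[start:start + width]
        act = bias + sum(
            w * x
            for row, vec in zip(filt, window)
            for w, x in zip(row, vec)
        )
        best = max(best, act)
    return best


def predict_price(embedded, meta, filters, out_weights, out_bias):
    """Concatenate pooled text features with metadata, then apply a linear layer."""
    text_features = [conv_max_pool(embedded, f, b) for f, b in filters]
    features = text_features + meta
    return out_bias + sum(w * x for w, x in zip(out_weights, features))


# Toy input: three words with 2-dimensional embeddings and one metadata feature.
embedded = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
meta = [0.5]
filters = [
    ([[1.0, 0.0], [0.0, 1.0]], 0.0),      # width-2 filter
    ([[-1.0, -1.0], [-1.0, -1.0]], 0.5),  # always negative here, so ReLU gives 0
]
price = predict_price(embedded, meta, filters, out_weights=[2.0, 1.0, 4.0], out_bias=1.0)
print(price)
```

In practice the filter and output weights would be learned by gradient descent in a deep learning framework; the fixed values above only show how the data flows through the layers.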
Convolutional Neural Networks for Sentence Classification: https://arxiv.org/abs/1408.5882
Non-Linear Text Regression with a Deep Convolutional Neural Network: http://anthology.aclweb.org/P/P15/P15-2030.pdf
Empirical Bayes method for categorical features: http://helios.mm.di.uoa.gr/~rouvas/ssi/sigkdd/sigkdd.vol3.1/barreca.pdf
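The empirical Bayes idea for categorical features can be sketched as follows: each category is replaced by its mean target value, shrunk toward the global mean when the category has few samples. The weight n / (n + m) used below is one common choice and is an assumption here, as are the function name shrunken_means and the value of m; the exact weighting used in this repository is not shown in this file.

```python
from collections import defaultdict


def shrunken_means(categories, targets, m=5.0):
    """Encode each category as a blend of its own target mean and the global mean.

    m is a hypothetical smoothing strength: larger m pulls rare categories
    more strongly toward the global mean.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for cat, y in zip(categories, targets):
        total[cat] += y
        count[cat] += 1
    prior = sum(targets) / len(targets)  # global mean of the target
    encoded = {}
    for cat, n in count.items():
        lam = n / (n + m)  # trust the category mean more as n grows
        encoded[cat] = lam * (total[cat] / n) + (1.0 - lam) * prior
    return encoded, prior


categories = ["a", "a", "a", "b"]
prices = [10.0, 20.0, 30.0, 100.0]
encoded, prior = shrunken_means(categories, prices, m=1.0)
print(encoded, prior)
```

A category that never appears in the training data has no entry in the returned dict; using the returned global mean as its value, for example encoded.get(cat, prior), is a natural fallback.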
Run command: python3 runme.py