Based on Seq2seq (sequence-to-sequence) learning with an attention mechanism, specifically local attention.
The data is preprocessed in Data_Pre-Processing.ipynb.
Dataset source: https://www.kaggle.com/snap/amazon-fine-food-reviews
import json

with open('Processed_Data/Amazon_Reviews_Processed.json') as file:
    for json_data in file:
        saved_data = json.loads(json_data)

        vocab2idx = saved_data["vocab"]
        embd = saved_data["embd"]
        train_batches_text = saved_data["train_batches_text"]
        test_batches_text = saved_data["test_batches_text"]
        val_batches_text = saved_data["val_batches_text"]
        train_batches_summary = saved_data["train_batches_summary"]
        test_batches_summary = saved_data["test_batches_summary"]
        val_batches_summary = saved_data["val_batches_summary"]
        train_batches_true_text_len = saved_data["train_batches_true_text_len"]
        val_batches_true_text_len = saved_data["val_batches_true_text_len"]
        test_batches_true_text_len = saved_data["test_batches_true_text_len"]
        train_batches_true_summary_len = saved_data["train_batches_true_summary_len"]
        val_batches_true_summary_len = saved_data["val_batches_true_summary_len"]
        test_batches_true_summary_len = saved_data["test_batches_true_summary_len"]

        break

idx2vocab = {v: k for k, v in vocab2idx.items()}
hidden_size = 300
learning_rate = 0.001
epochs = 5
max_summary_len = 16  # summary_max_len used in Data_Pre-Processing, plus 1 for the <EOS> marker
D = 5  # D determines the local attention window size
window_len = 2*D + 1
l2 = 1e-6
import tensorflow as tf

embd_dim = len(embd[0])

tf_text = tf.placeholder(tf.int32, [None, None])                  # batch of texts (word indices)
tf_embd = tf.placeholder(tf.float32, [len(vocab2idx), embd_dim])  # pre-trained embedding matrix
tf_true_summary_len = tf.placeholder(tf.int32, [None])            # true (unpadded) summary lengths
tf_summary = tf.placeholder(tf.int32, [None, None])               # batch of target summaries (word indices)
tf_train = tf.placeholder(tf.bool)                                # training flag (enables dropout)
Dropout is used for regularization (https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf).
embd_text = tf.nn.embedding_lookup(tf_embd, tf_text)
embd_text = tf.layers.dropout(embd_text,rate=0.3,training=tf_train)
More info:
https://dl.acm.org/citation.cfm?id=1246450,
https://www.bioinf.jku.at/publications/older/2604.pdf,
https://en.wikipedia.org/wiki/Long_short-term_memory
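For reference, the gates computed in the LSTM function below follow the standard LSTM cell equations, where w_0..w_3, u_0..u_3 and b_0..b_3 are the slices w[0..3], u[0..3], b[0..3] of the weight tensors defined in the code, and ⊙ denotes elementwise multiplication:

$$
\begin{aligned}
i_t &= \mathrm{sigmoid}(x_t w_0 + h_{t-1} u_0 + b_0) && \text{(input gate)}\\
f_t &= \mathrm{sigmoid}(x_t w_1 + h_{t-1} u_1 + b_1) && \text{(forget gate)}\\
o_t &= \mathrm{sigmoid}(x_t w_2 + h_{t-1} u_2 + b_2) && \text{(output gate)}\\
\tilde{c}_t &= \tanh(x_t w_3 + h_{t-1} u_3 + b_3) && \text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$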
def LSTM(x, hidden_state, cell, input_dim, hidden_size, scope):
    with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
        w = tf.get_variable("w", shape=[4, input_dim, hidden_size],
                            dtype=tf.float32,
                            trainable=True,
                            initializer=tf.glorot_uniform_initializer())
        u = tf.get_variable("u", shape=[4, hidden_size, hidden_size],
                            dtype=tf.float32,
                            trainable=True,
                            initializer=tf.glorot_uniform_initializer())
        b = tf.get_variable("bias", shape=[4, 1, hidden_size],
                            dtype=tf.float32,
                            trainable=True,
                            initializer=tf.zeros_initializer())

        input_gate = tf.nn.sigmoid(tf.matmul(x, w[0]) + tf.matmul(hidden_state, u[0]) + b[0])
        forget_gate = tf.nn.sigmoid(tf.matmul(x, w[1]) + tf.matmul(hidden_state, u[1]) + b[1])
        output_gate = tf.nn.sigmoid(tf.matmul(x, w[2]) + tf.matmul(hidden_state, u[2]) + b[2])
        cell_ = tf.nn.tanh(tf.matmul(x, w[3]) + tf.matmul(hidden_state, u[3]) + b[3])

        cell = forget_gate*cell + input_gate*cell_
        hidden_state = output_gate*tf.tanh(cell)

        return hidden_state, cell
(https://maxwell.ict.griffith.edu.au/spl/publications/papers/ieeesp97_schuster.pdf)
More Info: https://machinelearningmastery.com/develop-bidirectional-lstm-sequence-classification-python-keras/
A bi-directional LSTM encoder has a forward encoder and a backward encoder. The forward encoder encodes a text sequence from start to end, and the backward encoder encodes the text sequence from end to start. The final output is a combination (in this case, a concatenation) of the forward-encoded text and the backward-encoded text.
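Concretely, for each source position the forward and backward hidden states are concatenated along the feature axis, so with hidden_size = 300 every encoder state later seen by the attention mechanism is a 600-dimensional vector:

$$h^{enc}_s = [\overrightarrow{h}_s \,;\, \overleftarrow{h}_s] \in \mathbb{R}^{2\cdot\mathrm{hidden\_size}}$$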
S = tf.shape(embd_text)[1]  # text sequence length
N = tf.shape(embd_text)[0]  # batch size

i = 0
hidden = tf.zeros([N, hidden_size], dtype=tf.float32)
cell = tf.zeros([N, hidden_size], dtype=tf.float32)
hidden_forward = tf.TensorArray(size=S, dtype=tf.float32)

# shape of embd_text: [N,S,embd_dim]
embd_text_t = tf.transpose(embd_text, [1, 0, 2])
# shape of embd_text_t: [S,N,embd_dim]

def cond(i, hidden, cell, hidden_forward):
    return i < S

def body(i, hidden, cell, hidden_forward):
    x = embd_text_t[i]
    hidden, cell = LSTM(x, hidden, cell, embd_dim, hidden_size, scope="forward_encoder")
    hidden_forward = hidden_forward.write(i, hidden)
    return i+1, hidden, cell, hidden_forward

_, _, _, hidden_forward = tf.while_loop(cond, body, [i, hidden, cell, hidden_forward])
i = S-1
hidden = tf.zeros([N, hidden_size], dtype=tf.float32)
cell = tf.zeros([N, hidden_size], dtype=tf.float32)
hidden_backward = tf.TensorArray(size=S, dtype=tf.float32)

def cond(i, hidden, cell, hidden_backward):
    return i >= 0

def body(i, hidden, cell, hidden_backward):
    x = embd_text_t[i]
    hidden, cell = LSTM(x, hidden, cell, embd_dim, hidden_size, scope="backward_encoder")
    hidden_backward = hidden_backward.write(i, hidden)
    return i-1, hidden, cell, hidden_backward

_, _, _, hidden_backward = tf.while_loop(cond, body, [i, hidden, cell, hidden_backward])
Merge Forward and Backward Encoder Hidden States
hidden_forward = hidden_forward.stack()    # [S,N,hidden_size]
hidden_backward = hidden_backward.stack()  # [S,N,hidden_size]

encoder_states = tf.concat([hidden_forward, hidden_backward], axis=-1)  # [S,N,2*hidden_size]
encoder_states = tf.transpose(encoder_states, [1, 0, 2])                # [N,S,2*hidden_size]
encoder_states = tf.layers.dropout(encoder_states, rate=0.3, training=tf_train)

# concatenation of the forward and backward hidden states at the last position
final_encoded_state = tf.layers.dropout(
    tf.concat([hidden_forward[-1], hidden_backward[-1]], axis=-1),
    rate=0.3, training=tf_train)
Given a sequence of encoder states (H_s) and the decoder hidden state (H_t) at the current timestep t, the attention score for each source position is computed as:

Score = (H_s · W_a) · transpose(H_t)

where W_a is a trainable weight matrix (corresponding to the "general" scoring function in https://nlp.stanford.edu/pubs/emnlp15_attn.pdf).
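As a quick shape sanity check, here is a minimal NumPy sketch of the same score computation (toy sizes and random values, purely for illustration; the variables with the _np suffix are not part of the model). The attention_score function below does the equivalent with TensorFlow ops:

import numpy as np

N_, S_, H_ = 2, 7, 6                               # toy batch size, source length, 2*hidden_size
encoder_states_np = np.random.randn(N_, S_, H_)    # H_s for every source position
decoder_hidden_np = np.random.randn(N_, H_)        # H_t for the current decoder timestep
Wa_np = np.random.randn(H_, H_)                    # trainable weight matrix W_a

projected = (encoder_states_np.reshape(N_*S_, H_) @ Wa_np).reshape(N_, S_, H_)  # H_s.W_a
scores_np = np.einsum('nsh,nh->ns', projected, decoder_hidden_np)               # one score per (batch, position)
print(scores_np.shape)  # (2, 7)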
def attention_score(encoder_states, decoder_hidden_state, scope="attention_score"):
    with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
        Wa = tf.get_variable("Wa", shape=[2*hidden_size, 2*hidden_size],
                             dtype=tf.float32,
                             trainable=True,
                             initializer=tf.glorot_uniform_initializer())

    encoder_states = tf.reshape(encoder_states, [N*S, 2*hidden_size])
    encoder_states = tf.reshape(tf.matmul(encoder_states, Wa), [N, S, 2*hidden_size])

    decoder_hidden_state = tf.reshape(decoder_hidden_state, [N, 2*hidden_size, 1])

    return tf.reshape(tf.matmul(encoder_states, decoder_hidden_state), [N, S])
Based on: https://nlp.stanford.edu/pubs/emnlp15_attn.pdf
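In the notation of that paper, the align function below implements the predictive ("local-p") variant of local attention: the decoder state first predicts a window position, and each attention score is then scaled by a Gaussian centered on that position. Concretely, with S the source length and W_p, V_p the trainable parameters Wp and Vp defined below, this implementation computes

$$p_s = \big(S - (2D+1)\big)\cdot\mathrm{sigmoid}\!\left(\tanh(H_t W_p)\,V_p\right), \qquad p_t = p_s + D,$$

$$a_t(s) = \mathrm{softmax}_s\!\left(\mathrm{score}(H_t, H_s)\cdot\exp\!\left(-\frac{(s-p_t)^2}{2\sigma^2}\right)\right), \qquad \sigma = \frac{D}{2},$$

a slight variation on equation (10) of the paper, which applies the Gaussian factor to the already-normalized alignment weights rather than to the raw scores.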
def align(encoder_states, decoder_hidden_state, scope="attention"):
    with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
        Wp = tf.get_variable("Wp", shape=[2*hidden_size, 125],
                             dtype=tf.float32,
                             trainable=True,
                             initializer=tf.glorot_uniform_initializer())
        Vp = tf.get_variable("Vp", shape=[125, 1],
                             dtype=tf.float32,
                             trainable=True,
                             initializer=tf.glorot_uniform_initializer())

    positions = tf.cast(S-window_len, dtype=tf.float32)  # maximum valid starting position of the attention window

    # Predict the (soft) starting position of the attention window
    ps = positions*tf.nn.sigmoid(tf.matmul(tf.tanh(tf.matmul(decoder_hidden_state, Wp)), Vp))

    # pt = center of the attention window (the whole window spans 2*D+1 positions)
    pt = ps + D
    pt = tf.reshape(pt, [N])

    i = 0
    gaussian_position_based_scores = tf.TensorArray(size=S, dtype=tf.float32)
    sigma = tf.constant(D/2, dtype=tf.float32)

    def cond(i, gaussian_position_based_scores):
        return i < S

    def body(i, gaussian_position_based_scores):
        score = tf.exp(-((tf.square(tf.cast(i, tf.float32)-pt))/(2*tf.square(sigma))))
        # Gaussian position-based weighting (equation (10) in https://nlp.stanford.edu/pubs/emnlp15_attn.pdf)
        gaussian_position_based_scores = gaussian_position_based_scores.write(i, score)
        return i+1, gaussian_position_based_scores

    i, gaussian_position_based_scores = tf.while_loop(cond, body, [i, gaussian_position_based_scores])

    gaussian_position_based_scores = gaussian_position_based_scores.stack()
    gaussian_position_based_scores = tf.transpose(gaussian_position_based_scores, [1, 0])
    gaussian_position_based_scores = tf.reshape(gaussian_position_based_scores, [N, S])

    scores = attention_score(encoder_states, decoder_hidden_state)*gaussian_position_based_scores
    scores = tf.nn.softmax(scores, axis=-1)

    return tf.reshape(scores, [N, S, 1])
with tf.variable_scope("decoder",reuse=tf.AUTO_REUSE):
SOS = tf.get_variable("sos", shape=[1,embd_dim],
dtype=tf.float32,
trainable=True,
initializer=tf.glorot_uniform_initializer())
# SOS represents starting marker
# It tells the decoder that it is about to decode the first word of the output
# I have set SOS as a trainable parameter
Wc = tf.get_variable("Wc", shape=[4*hidden_size,embd_dim],
dtype=tf.float32,
trainable=True,
initializer=tf.glorot_uniform_initializer())
SOS = tf.tile(SOS,[N,1]) #now SOS shape: [N,embd_dim]
inp = SOS
hidden=final_encoded_state
cell=tf.zeros([N, 2*hidden_size], dtype=tf.float32)
decoder_outputs=tf.TensorArray(size=max_summary_len, dtype=tf.float32)
outputs=tf.TensorArray(size=max_summary_len, dtype=tf.int32)
for i in range(max_summary_len):
inp = tf.layers.dropout(inp,rate=0.3,training=tf_train)
attention_scores = align(encoder_states,hidden)
encoder_context_vector = tf.reduce_sum(encoder_states*attention_scores,axis=1)
hidden,cell = LSTM(inp,hidden,cell,embd_dim,2*hidden_size,scope="decoder")
hidden_ = tf.layers.dropout(hidden,rate=0.3,training=tf_train)
concated = tf.concat([hidden_,encoder_context_vector],axis=-1)
linear_out = tf.nn.tanh(tf.matmul(concated,Wc))
decoder_output = tf.matmul(linear_out,tf.transpose(tf_embd,[1,0]))
# produce unnormalized probability distribution over vocabulary
decoder_outputs = decoder_outputs.write(i,decoder_output)
# Pick out most probable vocab indices based on the unnormalized probability distribution
next_word_vec = tf.cast(tf.argmax(decoder_output,1),tf.int32)
next_word_vec = tf.reshape(next_word_vec, [N])
outputs = outputs.write(i,next_word_vec)
next_word = tf.nn.embedding_lookup(tf_embd, next_word_vec)
inp = tf.reshape(next_word, [N, embd_dim])
decoder_outputs = decoder_outputs.stack()
outputs = outputs.stack()
decoder_outputs = tf.transpose(decoder_outputs,[1,0,2])
outputs = tf.transpose(outputs,[1,0])
filtered_trainables = [var for var in tf.trainable_variables() if
                       not("Bias" in var.name or "bias" in var.name
                           or "noreg" in var.name)]

regularization = tf.reduce_sum([tf.nn.l2_loss(var) for var
                                in filtered_trainables])
with tf.variable_scope("loss"):
epsilon = tf.constant(1e-9, tf.float32)
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
labels=tf_summary, logits=decoder_outputs)
pad_mask = tf.sequence_mask(tf_true_summary_len,
maxlen=max_summary_len,
dtype=tf.float32)
masked_cross_entropy = cross_entropy*pad_mask
cost = tf.reduce_mean(masked_cross_entropy) + \
l2*regularization
cross_entropy = tf.reduce_mean(masked_cross_entropy)
# Comparing predicted sequence with labels
comparison = tf.cast(tf.equal(outputs, tf_summary),
tf.float32)
# Masking to ignore the effect of pads while calculating accuracy
pad_mask = tf.sequence_mask(tf_true_summary_len,
maxlen=max_summary_len,
dtype=tf.bool)
masked_comparison = tf.boolean_mask(comparison, pad_mask)
# Accuracy
accuracy = tf.reduce_mean(masked_comparison)
all_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
optimizer = tf.contrib.opt.NadamOptimizer(
learning_rate=learning_rate)
gvs = optimizer.compute_gradients(cost, var_list=all_vars)
capped_gvs = [(tf.clip_by_norm(grad, 5), var) for grad, var in gvs] # Gradient Clipping
train_op = optimizer.apply_gradients(capped_gvs)
import pickle
import random

with tf.Session() as sess:  # Start Tensorflow Session

    display_step = 100
    patience = 5

    load = input("\nLoad checkpoint? y/n: ")
    print("")
    saver = tf.train.Saver()

    if load.lower() == 'y':

        print('Loading pre-trained weights for the model...')

        saver.restore(sess, 'Model_Backup/Seq2seq_summarization.ckpt')
        sess.run(tf.global_variables())
        sess.run(tf.tables_initializer())

        with open('Model_Backup/Seq2seq_summarization.pkl', 'rb') as fp:
            train_data = pickle.load(fp)

        covered_epochs = train_data['covered_epochs']
        best_loss = train_data['best_loss']
        impatience = 0

        print('\nRESTORATION COMPLETE\n')

    else:
        best_loss = 2**30
        impatience = 0
        covered_epochs = 0

        init = tf.global_variables_initializer()
        sess.run(init)
        sess.run(tf.tables_initializer())

    epoch = 0
    while (epoch+covered_epochs) < epochs:

        print("\n\nSTARTING TRAINING\n\n")

        batches_indices = [i for i in range(0, len(train_batches_text))]
        random.shuffle(batches_indices)

        total_train_acc = 0
        total_train_loss = 0

        for i in range(0, len(train_batches_text)):

            j = int(batches_indices[i])

            cost, prediction,\
                acc, _ = sess.run([cross_entropy,
                                   outputs,
                                   accuracy,
                                   train_op],
                                  feed_dict={tf_text: train_batches_text[j],
                                             tf_embd: embd,
                                             tf_summary: train_batches_summary[j],
                                             tf_true_summary_len: train_batches_true_summary_len[j],
                                             tf_train: True})

            total_train_acc += acc
            total_train_loss += cost

            if i % display_step == 0:
                print("Iter "+str(i)+", Cost= " +
                      "{:.3f}".format(cost)+", Acc = " +
                      "{:.2f}%".format(acc*100))

            if i % 500 == 0:

                idx = random.randint(0, len(train_batches_text[j])-1)

                text = " ".join([idx2vocab.get(vec, "<UNK>") for vec in train_batches_text[j][idx]])
                predicted_summary = [idx2vocab.get(vec, "<UNK>") for vec in prediction[idx]]
                actual_summary = [idx2vocab.get(vec, "<UNK>") for vec in train_batches_summary[j][idx]]

                print("\nSample Text\n")
                print(text)
                print("\nSample Predicted Summary\n")
                for word in predicted_summary:
                    if word == '<EOS>':
                        break
                    else:
                        print(word, end=" ")
                print("\n\nSample Actual Summary\n")
                for word in actual_summary:
                    if word == '<EOS>':
                        break
                    else:
                        print(word, end=" ")
                print("\n\n")

        print("\n\nSTARTING VALIDATION\n\n")

        total_val_loss = 0
        total_val_acc = 0

        for i in range(0, len(val_batches_text)):

            if i % 100 == 0:
                print("Validating data # {}".format(i))

            cost, prediction,\
                acc = sess.run([cross_entropy,
                                outputs,
                                accuracy],
                               feed_dict={tf_text: val_batches_text[i],
                                          tf_embd: embd,
                                          tf_summary: val_batches_summary[i],
                                          tf_true_summary_len: val_batches_true_summary_len[i],
                                          tf_train: False})

            total_val_loss += cost
            total_val_acc += acc

        avg_val_loss = total_val_loss/len(val_batches_text)

        print("\n\nEpoch: {}\n\n".format(epoch+covered_epochs))
        print("Average Training Loss: {:.3f}".format(total_train_loss/len(train_batches_text)))
        print("Average Training Accuracy: {:.2f}".format(100*total_train_acc/len(train_batches_text)))
        print("Average Validation Loss: {:.3f}".format(avg_val_loss))
        print("Average Validation Accuracy: {:.2f}".format(100*total_val_acc/len(val_batches_text)))

        if (avg_val_loss < best_loss):
            best_loss = avg_val_loss
            save_data = {'best_loss': best_loss, 'covered_epochs': covered_epochs+epoch+1}
            impatience = 0
            with open('Model_Backup/Seq2seq_summarization.pkl', 'wb') as fp:
                pickle.dump(save_data, fp)
            saver.save(sess, 'Model_Backup/Seq2seq_summarization.ckpt')
            print("\nModel saved\n")

        else:
            impatience += 1

        if impatience > patience:
            break

        epoch += 1
Load checkpoint? y/n: n
STARTING TRAINING
Iter 0, Cost= 2.086, Acc = 0.00%
Sample Text
i 'm not a big pretzel eater , but i love this little <UNK> nibblers . i like the low fat snack and how it fills you up .
Sample Predicted Summary
municipality jackass municipality mongolian seats han han mongolian hah sus sus wat hah casbah dynasty province
Sample Actual Summary
great pretzels
Iter 100, Cost= 0.985, Acc = 35.58%
Iter 200, Cost= 0.914, Acc = 33.33%
Iter 300, Cost= 0.928, Acc = 36.11%
Iter 400, Cost= 0.943, Acc = 35.19%
Iter 500, Cost= 0.676, Acc = 42.71%
Sample Text
we <UNK> this one , but the flavor could have been a tad stronger . very yummy tho , we will totally purchase again !
Sample Predicted Summary
delicious !
Sample Actual Summary
very good !
Iter 600, Cost= 0.878, Acc = 35.24%
Iter 700, Cost= 0.949, Acc = 33.04%
Iter 800, Cost= 1.074, Acc = 34.65%
Iter 900, Cost= 0.831, Acc = 44.21%
Iter 1000, Cost= 0.911, Acc = 36.36%
Sample Text
tried this hoping for something better than the thick salsa that everyone else makes and it was great ! after making our own it gets time consuming so this is a good alternative .
Sample Predicted Summary
great
Sample Actual Summary
great salsa
Iter 1100, Cost= 1.081, Acc = 23.33%
Iter 1200, Cost= 1.018, Acc = 32.73%
Iter 1300, Cost= 0.902, Acc = 35.87%
Iter 1400, Cost= 0.946, Acc = 31.07%
Iter 1500, Cost= 0.798, Acc = 42.31%
Sample Text
i had a coupon for this so it was a good value . otherwise it is to expense for what you get . my box had a couple of opened cereals in it so i did n't get the full value of all ...
Sample Predicted Summary
good
Sample Actual Summary
good value
Iter 1600, Cost= 0.871, Acc = 33.33%
Iter 1700, Cost= 0.943, Acc = 40.00%
Iter 1800, Cost= 0.876, Acc = 40.20%
Iter 1900, Cost= 0.973, Acc = 37.25%
Iter 2000, Cost= 0.978, Acc = 29.73%
Sample Text
my 4 dogs all had allergies and are just fine now that i switched to <UNK> the <UNK> one smell abit but <UNK> they still love it <UNK> the dried <UNK> canned r terrific <UNK> nooo grani !
Sample Predicted Summary
<UNK> !
Sample Actual Summary
great food
Iter 2100, Cost= 0.907, Acc = 37.04%
Iter 2200, Cost= 0.928, Acc = 34.31%
Iter 2300, Cost= 0.906, Acc = 31.25%
Iter 2400, Cost= 0.903, Acc = 37.00%
Iter 2500, Cost= 0.811, Acc = 33.01%
Sample Text
the chocolate was a little crumbly , but the taste is very good . my hubby has <UNK> , and it is gluten free , so it is an excellent bar to stock in the pantry for whenever he does n't have time for breakfast .
Sample Predicted Summary
great
Sample Actual Summary
yum
Iter 2600, Cost= 0.839, Acc = 34.62%
Iter 2700, Cost= 0.927, Acc = 37.07%
Iter 2800, Cost= 0.853, Acc = 36.73%
Iter 2900, Cost= 0.805, Acc = 40.00%
Iter 3000, Cost= 0.855, Acc = 35.51%
Sample Text
tea came packaged as expected , delivered quickly and with stash you can not go wrong . individually wrapped and stays fresh and very flavorful . highly recommended for the earl gray tea lover .
Sample Predicted Summary
delicious tea
Sample Actual Summary
great tea
Iter 3100, Cost= 0.854, Acc = 36.63%
STARTING VALIDATION
Validating data # 0
Validating data # 100
Validating data # 200
Validating data # 300
Epoch: 0
Average Training Loss: 0.907
Average Training Accuracy: 35.42
Average Validation Loss: 0.865
Average Validation Accuracy: 36.65
Model saved
STARTING TRAINING
Iter 0, Cost= 0.808, Acc = 34.34%
Sample Text
quaker oatmeal squares has been our family favorite for a couple of years now . ca n't get enough of it . just the right sweetness and crunch .
Sample Predicted Summary
great
Sample Actual Summary
favorite cereal
Iter 100, Cost= 1.036, Acc = 34.26%
Iter 200, Cost= 0.934, Acc = 33.03%
Iter 300, Cost= 0.972, Acc = 35.85%
Iter 400, Cost= 0.926, Acc = 32.35%
Iter 500, Cost= 0.738, Acc = 41.05%
Sample Text
great taste , nice smell , great <UNK> < br / > if you mix it with fresh ment you will get fantastic <UNK> < br / > i will buy it again .
Sample Predicted Summary
great
Sample Actual Summary
the best
Iter 600, Cost= 0.858, Acc = 41.24%
Iter 700, Cost= 0.905, Acc = 36.45%
Iter 800, Cost= 0.795, Acc = 35.05%
Iter 900, Cost= 0.806, Acc = 37.50%
Iter 1000, Cost= 0.795, Acc = 35.64%
Sample Text
i bought about 5 different kinds of <UNK> when i first got my coffee maker , which i love by the way , and i 'd have to say that this was my favorite one out of them all . it has the perfect balance of everything , i was really surprised .
Sample Predicted Summary
great
Sample Actual Summary
excellent stuff
Iter 1100, Cost= 0.825, Acc = 39.42%
Iter 1200, Cost= 0.743, Acc = 38.78%
Iter 1300, Cost= 0.813, Acc = 41.84%
Iter 1400, Cost= 0.933, Acc = 29.66%
Iter 1500, Cost= 0.978, Acc = 33.61%
Sample Text
i really wanted to like this , as it was organic , and came in a glass bottle , but there was hardly any flavor at all . i could barely smell it , and even when i poured a generous amount on my dish , it imparts little to no truffle <UNK> . my truffle salt is much more potent .
Sample Predicted Summary
good
Sample Actual Summary
weak
Iter 1600, Cost= 0.778, Acc = 45.10%
Iter 1700, Cost= 0.855, Acc = 38.83%
Iter 1800, Cost= 0.815, Acc = 41.58%
Iter 1900, Cost= 0.853, Acc = 37.62%
Iter 2000, Cost= 1.003, Acc = 32.74%
Sample Text
i love milk chocolate and do n't like dark <UNK> . my husband is the opposite , so i always buy him the dark stuff and it 's safe for him , haha ! until i happened to try this one . it 's awesome !
Sample Predicted Summary
<UNK> !
Sample Actual Summary
it 's good ! !
Iter 2100, Cost= 0.817, Acc = 37.74%
Iter 2200, Cost= 0.977, Acc = 33.33%
Iter 2300, Cost= 0.840, Acc = 35.96%
Iter 2400, Cost= 0.749, Acc = 31.58%
Iter 2500, Cost= 0.885, Acc = 31.73%
Sample Text
the best thing about this coffee is the sweet smell , just like a blueberry muffin . the taste is good , not as sweet as i was expecting but it was good nonetheless . its a nice treat when you 're craving something sweet but it wo n't replace my morning donut shop coffee : )
Sample Predicted Summary
delicious
Sample Actual Summary
smells yummy : )
Iter 2600, Cost= 0.887, Acc = 32.73%
Iter 2700, Cost= 0.780, Acc = 44.94%
Iter 2800, Cost= 0.899, Acc = 35.71%
Iter 2900, Cost= 0.797, Acc = 38.24%
Iter 3000, Cost= 1.061, Acc = 33.33%
Sample Text
this tea is wonderful , one bag will make three cups for most people . i like my tea very strong so these were perfect . i bet they will be good for making a good ice tea .
Sample Predicted Summary
great tea
Sample Actual Summary
one bag 3 cups
Iter 3100, Cost= 0.769, Acc = 37.86%
STARTING VALIDATION
Validating data # 0
Validating data # 100
Validating data # 200
Validating data # 300
Epoch: 1
Average Training Loss: 0.863
Average Training Accuracy: 36.40
Average Validation Loss: 0.837
Average Validation Accuracy: 37.30
Model saved
STARTING TRAINING
Iter 0, Cost= 0.959, Acc = 35.85%
Sample Text
really good bars . you could cut this baby in 1/2 and have 2 snacks out of it ! i bought 1 at the store first to see if i liked them and paid lots more for it . i do n't eat alot of meat so this caught my eye . i now have them on auto delivery ! !
Sample Predicted Summary
great !
Sample Actual Summary
great bars !
Iter 100, Cost= 0.792, Acc = 33.33%
Iter 200, Cost= 0.781, Acc = 35.29%
Iter 300, Cost= 0.825, Acc = 40.74%
Iter 400, Cost= 0.793, Acc = 40.19%
Iter 500, Cost= 0.860, Acc = 31.07%
Sample Text
i always buy my coffee from amazon as the prices are cheaper and i love all the coffee . best price on line .
Sample Predicted Summary
great coffee
Sample Actual Summary
peggy
Iter 600, Cost= 0.990, Acc = 28.57%
Iter 700, Cost= 0.736, Acc = 41.41%
Iter 800, Cost= 0.826, Acc = 33.68%
Iter 900, Cost= 0.904, Acc = 35.24%
Iter 1000, Cost= 0.858, Acc = 35.71%
Sample Text
i am very pleased with this product and the company sent it on a timely basis , well packed to prevent breakage .
Sample Predicted Summary
great
Sample Actual Summary
good stuff
Iter 1100, Cost= 0.999, Acc = 30.36%
Iter 1200, Cost= 0.726, Acc = 44.79%
Iter 1300, Cost= 0.798, Acc = 36.73%
Iter 1400, Cost= 0.712, Acc = 48.98%
Iter 1500, Cost= 0.884, Acc = 38.18%
Sample Text
i love this cereal , easy to eat out of the box . sweet but not too sweet and very crunchy . since frosted mini wheats have gelatin in them they are not an option for vegans or vegetarians , these are the best replacement .
Sample Predicted Summary
<UNK>
Sample Actual Summary
love it .
Iter 1600, Cost= 0.819, Acc = 44.12%
Iter 1700, Cost= 0.958, Acc = 32.11%
Iter 1800, Cost= 0.800, Acc = 37.96%
Iter 1900, Cost= 0.649, Acc = 40.82%
Iter 2000, Cost= 0.900, Acc = 36.21%
Sample Text
i have been buying these bars ( without chocolate ) for years and have recently found them with the chocolate . our family of six loves them . they are a great bar to hold you over and give you extended fuel , made with great ingredients to boot . i highly recommend you try a box !
Sample Predicted Summary
great !
Sample Actual Summary
<UNK> !
Iter 2100, Cost= 0.767, Acc = 36.19%
Iter 2200, Cost= 0.676, Acc = 37.62%
Iter 2300, Cost= 0.871, Acc = 40.00%
Iter 2400, Cost= 0.687, Acc = 39.60%
Iter 2500, Cost= 0.898, Acc = 36.27%
Sample Text
this is top notch almond syrup . we put it in lemonade and strawberries . great for many <UNK> < br / > also use in baking recipes .
Sample Predicted Summary
great
Sample Actual Summary
soooo yummy
Iter 2600, Cost= 0.796, Acc = 40.74%
Iter 2700, Cost= 0.775, Acc = 46.00%
Iter 2800, Cost= 0.856, Acc = 40.19%
Iter 2900, Cost= 0.954, Acc = 35.85%
Iter 3000, Cost= 0.831, Acc = 35.40%
Sample Text
this is very good coffee at a good price ... it is an old product that has been on the market since i was quite young .
Sample Predicted Summary
coffee
Sample Actual Summary
good stuff !
Iter 3100, Cost= 0.754, Acc = 36.27%
STARTING VALIDATION
Validating data # 0
Validating data # 100
Validating data # 200
Validating data # 300
Epoch: 2
Average Training Loss: 0.840
Average Training Accuracy: 37.26
Average Validation Loss: 0.818
Average Validation Accuracy: 38.42
Model saved
STARTING TRAINING
Iter 0, Cost= 0.822, Acc = 36.36%
Sample Text
the chocolate covered figs were delicious and presented beautifully in the package . great for a gift for someone who has everything .
Sample Predicted Summary
delicious
Sample Actual Summary
figs
Iter 100, Cost= 0.734, Acc = 37.86%
Iter 200, Cost= 0.837, Acc = 41.18%
Iter 300, Cost= 0.717, Acc = 34.91%
Iter 400, Cost= 0.797, Acc = 38.61%
Iter 500, Cost= 0.718, Acc = 32.38%
Sample Text
one of my favorite flavors of <UNK> , it used to be called twisted tornado , now called fruit twist either way i ca n't stop myself from eating its so flavorful = )
Sample Predicted Summary
great !
Sample Actual Summary
yum
Iter 600, Cost= 0.846, Acc = 40.59%
Iter 700, Cost= 0.676, Acc = 43.75%
Iter 800, Cost= 0.882, Acc = 39.22%
Iter 900, Cost= 0.803, Acc = 36.54%
Iter 1000, Cost= 0.718, Acc = 40.40%
Sample Text
i found this product to be a nice tasting pepper blend and would recommend it to all of those who enjoy the fresh flavor of ground pepper .
Sample Predicted Summary
good taste
Sample Actual Summary
peppercorn mix
Iter 1100, Cost= 0.749, Acc = 41.24%
Iter 1200, Cost= 0.821, Acc = 38.10%
Iter 1300, Cost= 0.883, Acc = 39.81%
Iter 1400, Cost= 0.961, Acc = 29.91%
Iter 1500, Cost= 1.130, Acc = 33.96%
Sample Text
bought the popper about two years ago and have been enjoying the delicious fresh buttery salty ( as i want ) best popcorn ever . love it and it 's a staple snack in our house . would never <UNK> corn again .
Sample Predicted Summary
great
Sample Actual Summary
top notch
Iter 1600, Cost= 0.855, Acc = 35.24%
Iter 1700, Cost= 0.701, Acc = 38.61%
Iter 1800, Cost= 0.865, Acc = 35.64%
Iter 1900, Cost= 0.868, Acc = 39.62%
Iter 2000, Cost= 0.849, Acc = 40.78%
Sample Text
i love sour stuff . this is n't too sour but still gets the job done . good chewy candy . arrived faster than expected too .
Sample Predicted Summary
good
Sample Actual Summary
mmmmm
Iter 2100, Cost= 0.951, Acc = 32.73%
Iter 2200, Cost= 0.875, Acc = 31.68%
Iter 2300, Cost= 0.866, Acc = 42.20%
Iter 2400, Cost= 0.725, Acc = 46.32%
Iter 2500, Cost= 0.793, Acc = 35.71%
Sample Text
i had not tried this tea before but i was hoping it was similar to one i tried while in england . i was not disappointed . the pack of 6 makes it a very good value as well .
Sample Predicted Summary
tea tea
Sample Actual Summary
love this tea !
Iter 2600, Cost= 0.864, Acc = 34.82%
Iter 2700, Cost= 0.853, Acc = 38.10%
Iter 2800, Cost= 0.694, Acc = 40.40%
Iter 2900, Cost= 1.020, Acc = 34.26%
Iter 3000, Cost= 0.782, Acc = 43.00%
Sample Text
extremely disappointing . frankly , i think plain old lipton tea is smoother and less bitter . when brewed , i could hardly recognize it as green tea . it tasted more like a very poor earl gray .
Sample Predicted Summary
sad
Sample Actual Summary
not good at all
Iter 3100, Cost= 0.756, Acc = 35.64%
STARTING VALIDATION
Validating data # 0
Validating data # 100
Validating data # 200
Validating data # 300
Epoch: 3
Average Training Loss: 0.820
Average Training Accuracy: 38.18
Average Validation Loss: 0.801
Average Validation Accuracy: 39.24
Model saved
STARTING TRAINING
Iter 0, Cost= 0.821, Acc = 39.00%
Sample Text
love this tea . i do not like the plain sleepytime but adding the vanilla is a great move ! highly recommend it . looking forward to trying the honey sleepy time !
Sample Predicted Summary
love !
Sample Actual Summary
love it
Iter 100, Cost= 0.725, Acc = 37.00%
Iter 200, Cost= 0.805, Acc = 39.29%
Iter 300, Cost= 0.838, Acc = 41.23%
Iter 400, Cost= 0.713, Acc = 49.07%
Iter 500, Cost= 0.722, Acc = 37.86%
Sample Text
the product arrived quickly . all bags and chips were in place ... and safe ; <UNK> these chips are delicious and only four ww points !
Sample Predicted Summary
delicious
Sample Actual Summary
yum !
Iter 600, Cost= 0.941, Acc = 34.82%
Iter 700, Cost= 0.678, Acc = 42.00%
Iter 800, Cost= 0.607, Acc = 47.47%
Iter 900, Cost= 0.679, Acc = 41.94%
Iter 1000, Cost= 0.763, Acc = 48.60%
Sample Text
this is a light to medium roast , wish it was slightly stronger , but the flavor is good and i am having it every morning using 2 6 oz . <UNK> pumps to make it as strong as possible .
Sample Predicted Summary
very good
Sample Actual Summary
i like it !
Iter 1100, Cost= 0.671, Acc = 44.44%
Iter 1200, Cost= 0.810, Acc = 39.81%
Iter 1300, Cost= 0.899, Acc = 31.78%
Iter 1400, Cost= 0.865, Acc = 39.42%
Iter 1500, Cost= 0.809, Acc = 36.54%
Sample Text
i expected a little more flavor as i usually like green mountain <UNK> < br / > next time i 'll look for a french roast !
Sample Predicted Summary
good
Sample Actual Summary
too weak
Iter 1600, Cost= 0.873, Acc = 39.45%
Iter 1700, Cost= 0.882, Acc = 38.14%
Iter 1800, Cost= 0.953, Acc = 34.86%
Iter 1900, Cost= 0.961, Acc = 33.66%
Iter 2000, Cost= 0.774, Acc = 35.92%
Sample Text
i use this sauce on pork ribs , after baking them at 300 degrees for 3 hours . the sweet taste of honey along with the tomato is heavenly .
Sample Predicted Summary
great sauce
Sample Actual Summary
the best
Iter 2100, Cost= 0.744, Acc = 39.13%
Iter 2200, Cost= 0.697, Acc = 41.58%
Iter 2300, Cost= 0.869, Acc = 34.26%
Iter 2400, Cost= 0.867, Acc = 31.48%
Iter 2500, Cost= 0.784, Acc = 38.14%
Sample Text
excellent < a <UNK> '' http : <UNK> '' > kellogg 's cereal in a cup , favorite assortment pack , 1.5 - <UNK> <UNK> cups ( pack of 60 ) < <UNK> >
Sample Predicted Summary
good
Sample Actual Summary
kelloggs
Iter 2600, Cost= 0.653, Acc = 45.45%
Iter 2700, Cost= 0.713, Acc = 46.73%
Iter 2800, Cost= 0.777, Acc = 39.05%
Iter 2900, Cost= 0.795, Acc = 38.10%
Iter 3000, Cost= 0.802, Acc = 41.12%
Sample Text
this is a good product . the honey tastes great , and it 's very convenient and <UNK> . my local <UNK> store was trying to sell this to me for twice the price as amazon , so i 'm pretty sure this is a good buy .
Sample Predicted Summary
great
Sample Actual Summary
honey !
Iter 3100, Cost= 0.773, Acc = 45.54%
STARTING VALIDATION
Validating data # 0
Validating data # 100
Validating data # 200
Validating data # 300
Epoch: 4
Average Training Loss: 0.804
Average Training Accuracy: 39.03
Average Validation Loss: 0.786
Average Validation Accuracy: 40.62
Model saved
- Beam Search
- Pointer Mechanisms
- BLEU/ROUGE evaluation
- Implement Testing
- Complete Training and Optimize Hyperparameters