Supporting-SLR-Using-DL-Based-Language-Models

The repository contains code scripts for replicating experiments in the paper "Supporting Systematic Literature Reviews Using Deep-Learning-Based Language Models". In this paper, we address the tedious process of identifying relevant primary studies during the conduct phase of a Systematic Literature Review. For this purpose, we use deep learning architectures in the form of the two language models BERT and S-BERT to learn embedded representations and cluster on them to semi-automate this phase, and thus support the entire SLR process.

The methodology is mainly divided into three parts : Extracting embeddings using Language models such as BERT and SBERT, Weightage schemes to obtain document-level representations from weighting important sentences of the document,Clustering on embeddings(weighted/unweighted) to obtain clusters of relevant and non-relevant documents

Setup Instructions:

Clone the repository
Create a python virtual environment and activate it. (Please follow the link https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/ for more information on virtual environment.

python3 -m pip install virtualenv
python3 -m venv /path/to/new/virtual/environment
source environment/bin/activate

Install all the required dependencies using the file: requirements.txt

pip install -r requirements.txt

config.py - Contains all the necessary configuration settings needed to run the experiments.

CSVFILEPATH - Path for the csv file of the dataset containg title, abstract and ground truth labels to classify the documents as relevant or non-relevant to SLR in study.
RESULTS_FILENAME - Filename for saving predictions.
TABLE_FILENAME - Filename for displaying metric results.
PRETRAINED_MODEL_NAME - [bert, sbert, baseline] possible values and the models available. Can replicate the code similarly by using other pre-trained models with respective libraries.
WEIGHTED - [True, False] - True to use the weighted scheme
MODE - [PREDICT, ANALYSE]: PREDICT - To make predictions and ANALYSE - To obtain results from predictions.
LEVEL - [sentence, paragraph]: TO obtain embeddings at sentence or document level respectively.

main.py - Main entry point after defining the desired configurations.

python3 main.py

embeddings.py - Contains scripts to make use of the three models, namely 'bert-base-cased', SentenceTransformer('paraphrase-distilroberta-base-v1') and baseline model TfidfTransformer.
weightage.py - Methods implementing the weightage scheme mentioned in the paper.
preprocessing.py - For handling the data cleaning and pre-processing.
result_visualiser.py - To compile and store the prediction results in terms of metrics such as Fowlkes-Mallow Index and Adjusted Rand Index score. Also displays other information such as number of clusters , additional documents as part of the cluster.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
NLBSE__Supporting_SLR_Using_DL_Based_Language_Models__Copy_.pdf		NLBSE__Supporting_SLR_Using_DL_Based_Language_Models__Copy_.pdf
Predictions_dataset1.csv		Predictions_dataset1.csv
Predictions_dataset4.csv		Predictions_dataset4.csv
README.md		README.md
Results_SLR_dataset1.csv		Results_SLR_dataset1.csv
Results_SLR_dataset4.csv		Results_SLR_dataset4.csv
clustering.py		clustering.py
config.py		config.py
dataset1.csv		dataset1.csv
dataset4.csv		dataset4.csv
embeddings.py		embeddings.py
main.py		main.py
preprocessing.py		preprocessing.py
req.txt		req.txt
result_visualizer.py		result_visualizer.py
weightage.py		weightage.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Supporting-SLR-Using-DL-Based-Language-Models

Setup Instructions:

About

Releases

Packages

Languages

SharanyaMohan-30/Supporting-SLR-Using-DL-Based-Language-Models

Folders and files

Latest commit

History

Repository files navigation

Supporting-SLR-Using-DL-Based-Language-Models

Setup Instructions:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages