EXTRACTIVE TEXT SUMMARIZATION USING DEEP NEURAL NETWORKS (RBM and BERT)
Introduction
Automatic text summarization is one of the major tasks in natural language processing. The goal is to generate a concise yet relevant summary of a longer text. The need for text summarization extends across multiple domains, from news articles to technical reports and reviews. With data on the World Wide Web growing exponentially every day, short summaries that capture the essence of the relevant information are in huge demand, since they spare readers the time-consuming process of going through the entire text.
There are broadly two categories of automatic text summarization:
- Extractive Text Summarization: As the name suggests, this approach extracts key sentences and phrases from the original text to generate a summary. This means the summary is composed entirely of material taken from the original, longer text.
- Abstractive Text Summarization: In contrast to the approach above, abstractive text summarization generates a summary that may use words and phrases not found in the original text. The abstractive approach is more challenging than the extractive one.
In our project, we have worked on various techniques for extractive text summarization. We chose three baseline models: LexRank, TextRank, and Pysummarization. We then implemented a Restricted Boltzmann Machine (RBM) model for extractive text summarization from scratch and compared its performance with the above baselines and with a BERT-based extractive summarizer.
Dataset
We have used the BBC News Summary dataset for this project, specifically its technology news articles. This subset contains four hundred BBC articles from 2004 to 2005, and for each article a corresponding human-generated summary is also available. The size of this reference summary is approximately 40% of the size of the original news article.
Baseline Models
The baselines we used are described below:
- LexRank: LexRank is an unsupervised approach to text summarization based on graph-based centrality scoring of sentences. The importance of a sentence is computed from eigenvector centrality in a graph representation of the sentences, where intra-sentence cosine similarity is used to form the adjacency matrix. For more details, refer to the original paper [2].
- TextRank: TextRank is another graph-based model for text summarization, based on Google’s PageRank algorithm for ranking web pages. The basic ideas of the two algorithms are similar; more details can be found in the original paper [3].
- Pysummarization: The third baseline we have used is extractive summarization with the pysummarization Python library, which uses an LSTM-based encoder/decoder and sequence-to-sequence learning. A minimal usage sketch for these baselines follows this list.
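The blog does not name the exact implementations behind our LexRank and TextRank baselines, so the snippet below is only a sketch of how such graph-based baselines can be run, assuming the sumy library; the input file path is hypothetical.

```python
# Sketch of running graph-based extractive baselines with the sumy library
# (an assumption; the exact implementation we used is not reproduced here).
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lex_rank import LexRankSummarizer
from sumy.summarizers.text_rank import TextRankSummarizer

def baseline_summary(text, summarizer_cls, sentence_count=5):
    # Split the raw article into sentences and build sumy's document object.
    parser = PlaintextParser.from_string(text, Tokenizer("english"))
    summarizer = summarizer_cls()
    # The summarizer scores sentences and returns the top-ranked ones.
    return " ".join(str(s) for s in summarizer(parser.document, sentence_count))

article = open("bbc_tech_article.txt").read()  # hypothetical input file
print(baseline_summary(article, LexRankSummarizer))
print(baseline_summary(article, TextRankSummarizer))
```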
RESTRICTED BOLTZMANN MACHINE (RBM) for Extractive Summarization
An RBM is a generative artificial neural network that uses a set of nodes to learn the distribution of its input data. Every kind of data carries hidden information that is not easy to capture with handcrafted features alone; the RBM model maps the input data into such a complex representation. The RBM model basically has two kinds of layers:
- Visible Layer: This layer corresponds to the input layer of the model. The numerical data is fed into the visible layer, and the number of nodes in this layer corresponds to the number of features in the data.
- Hidden Layer: This is the next layer in the model. It consists of nodes that receive the processed input from the visible layer.
Architecture for RBM Model
In our case, the RBM model learns a complex representation of the input feature matrix. The construction of this sentence feature matrix, and the corresponding layer sizes, are described below.
- Number of Visible Units: For each sentence in the document, we have extracted 9 features. Therefore, the number of units in the visible layer is 9.
- Number of Hidden Units: Each 9-dimensional feature vector is converted into a 9-dimensional complex feature vector, sometimes called a latent representation. Therefore, the number of units in the hidden layer is 9.
Methodology
- Perform basic text preprocessing on the input documents.
- Extract the 9 features from the preprocessed documents and create the sentence feature matrix.
- Pass the sentence feature matrix as input to the RBM Model to generate the complex representation of the sentence feature matrix.
- After obtaining the complex sentence feature matrix, we rank the sentences in the document using the Jaccard similarity concept; the top-ranked sentences are finally used to generate the summary. A sketch of these steps follows this list.
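The exact training script is not reproduced in this blog; the following is only a minimal sketch of the RBM step and of the Jaccard similarity used in the ranking, under simple assumptions: the 9 sentence features are scaled to [0, 1], the RBM has 9 visible and 9 hidden units as described above, and it is trained with one-step contrastive divergence (CD-1). The hyperparameter values are illustrative only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SentenceRBM:
    """RBM with 9 visible and 9 hidden units, trained with CD-1 (illustrative)."""

    def __init__(self, n_visible=9, n_hidden=9, lr=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0.0, 0.1, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)  # visible-unit biases
        self.b_h = np.zeros(n_hidden)   # hidden-unit biases
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def train(self, data, epochs=200):
        # data: (n_sentences, 9) sentence feature matrix scaled to [0, 1]
        for _ in range(epochs):
            h_prob = self.hidden_probs(data)                        # positive phase
            h_sample = (self.rng.random(h_prob.shape) < h_prob) * 1.0
            v_recon = self.visible_probs(h_sample)                  # one Gibbs step
            h_recon = self.hidden_probs(v_recon)
            n = len(data)
            # Contrastive-divergence updates
            self.W += self.lr * (data.T @ h_prob - v_recon.T @ h_recon) / n
            self.b_v += self.lr * (data - v_recon).mean(axis=0)
            self.b_h += self.lr * (h_prob - h_recon).mean(axis=0)

    def transform(self, data):
        # Complex (latent) representation of the sentence feature matrix.
        return self.hidden_probs(data)

def jaccard(sent_a, sent_b):
    # Word-level Jaccard similarity between two sentences.
    a, b = set(sent_a.lower().split()), set(sent_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0
```

Here the output of transform is the complex sentence feature matrix, and the Jaccard similarity supports the sentence ranking described in the last step; the exact scoring rule we used is not reproduced here.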
BERT Model
BERT stands for Bidirectional Encoder Representations from Transformers. The BERT model pre-trains deep bidirectional representations from unlabeled text, reading each sentence from both the left and the right context. 'Bidirectional' signifies that the model gains information from both directions during training.
The importance of the bidirectional feature in the BERT model can be understood from the following example. Consider two sentences: 1) A date is a nutritious dry fruit. 2) Joe took Alexandria out on a date. The word 'date' occurs in both sentences, so the model must gain information from both directions to avoid misinterpreting it.
Architecture for BERT Model
There are two variants of the model:
- BERT Base: 12 layers (transformer blocks), 12 attention heads, and 110 million parameters
- BERT Large: 24 layers (transformer blocks), 16 attention heads, and 340 million parameters
Methodology
- Python provides a package named bert-extractive-summarizer, which can be used to build the BERT-based extractive summarizer.
- After creating the BERT model, parameters such as min_length and max_length are used to specify the minimum and maximum size of the summary; a usage sketch follows this list.
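A minimal sketch of this step, assuming the bert-extractive-summarizer package (imported as summarizer), is shown below. The input file path and parameter values are hypothetical.

```python
# Sketch of BERT-based extractive summarization with bert-extractive-summarizer.
from summarizer import Summarizer

article = open("bbc_tech_article.txt").read()  # hypothetical input file

model = Summarizer()  # loads a pretrained BERT model on first use
# As described above, min_length and max_length bound the size of the summary.
summary = model(article, min_length=60, max_length=500)
print(summary)
```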
Results
The extractive summaries generated by the three baseline models (LexRank, TextRank, and Pysummarization), the RBM model implemented from scratch, and the BERT-based extractive summarizer are evaluated against the human-generated reference summaries using the ROUGE metric.
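A sketch of how such an evaluation can be computed with the rouge Python package is shown below; the example summaries are toy strings, and this is not necessarily the exact evaluation script we used.

```python
# Sketch of ROUGE evaluation, assuming the `rouge` Python package.
from rouge import Rouge

# Toy summaries; in the actual pipeline these are lists with one entry per document.
model_summaries = ["uk broadband speeds rose sharply in 2004"]
reference_summaries = ["broadband speeds in the uk rose sharply during 2004"]

rouge = Rouge()
scores = rouge.get_scores(model_summaries, reference_summaries, avg=True)
# scores["rouge-1"] holds recall ("r"), precision ("p"), and F-measure ("f").
print(scores["rouge-1"])
```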
We have evaluated the performance of each of these models, averaged over 100 documents of the dataset; the corresponding recall and F-Measure values obtained are tabulated below:
The document number vs. F-Measure plots for all five models are shown below.
Analysis
First, let us understand what the recall and precision scores tell us about a model-generated summary.
Recall is the number of words common to the model-generated summary and the reference summary, divided by the size of the reference summary. It tells us how much of the reference summary the model-generated summary has covered.
Precision is the number of words common to the model-generated summary and the reference summary, divided by the size of the model-generated summary. Thus, precision gives us an idea of the relevance and conciseness of the model-generated summary.
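As a toy illustration of these definitions, the snippet below computes word-overlap recall, precision, and F-Measure in a simplified, ROUGE-1-like way; the reported results were obtained with the ROUGE metric, not with this toy function.

```python
# Toy unigram-overlap recall / precision / F-Measure, mirroring the definitions above.
def overlap_scores(model_summary, reference_summary):
    model_words = model_summary.lower().split()
    ref_words = reference_summary.lower().split()
    common = set(model_words) & set(ref_words)
    recall = len(common) / len(ref_words)       # coverage of the reference summary
    precision = len(common) / len(model_words)  # relevance / conciseness of the output
    f_measure = 2 * precision * recall / (precision + recall) if common else 0.0
    return recall, precision, f_measure

# Example: 4 shared words, reference has 6 words, model summary has 5 words
# -> recall ≈ 0.67, precision = 0.80, F-Measure ≈ 0.73
print(overlap_scores("the phone sales grew rapidly",
                     "phone sales grew rapidly in 2004"))
```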
- We get low F-Measure scores for the LexRank and Pysummarization models even though their recall is high; the low precision indicates that the summaries they generate are not very relevant or concise.
- The TextRank model gives us the lowest recall value and hence the poorest coverage of the reference summaries.
- Comparing the BERT and RBM models, the BERT model lags behind the RBM model in terms of F-Measure.
Conclusion
From the above analysis and the tabulated results, it is clear that the RBM model gives us the best performance for extractive text summarization, as the summaries it generates score consistently well on both recall and F-Measure.
Blog Authors and Contribution
Aniket Chauhan (linkedin.com/in/aniket-chauhan-65805b148/)
LexRank baseline, Text preprocessing, Feature extraction (RBM), Dataset handling, RBM from scratch, ROUGE metric evaluations, Literature Survey.
Divisha Bisht (linkedin.com/in/divisha-bisht-3a24111aa/)
TextRank baseline, Text preprocessing, Feature extraction (RBM), Analysis of results, RBM from scratch, result visualization plots, Literature Survey.
Purudewa Pawar (linkedin.com/in/purudewa-pawar-2ba955154/)
Pysummarization baseline, Text preprocessing, Feature extraction (RBM), RBM from scratch, BERT based summarizer, Literature Survey.
Acknowledgment
We extend our gratitude to Professor Dr. Tanmoy Chakraborty and our TA Vivek Reddy for their constant support and guidance throughout this project, carried out as part of the Machine Learning (PG) Course 2020.
- Professor: linkedin.com/in/tanmoy-chakraborty-89553324
- Prof. Website: faculty.iiitd.ac.in/~tanmoy/
- Teaching Fellow: Ms. Ishita Bajaj
- Teaching Assistants: Pragya Srivastava, Shiv Kumar Gehlot, Chhavi Jain, Vivek Reddy, Shikha Singh, and Nirav Diwan.
References
1] Liu, Y. (2019). Fine-tune BERT for extractive summarization. arXiv preprint arXiv:1903.10318.
2] Erkan, G., Radev, D. R. (2004). Lexrank: Graph-based lexical centrality as salience in text summarization. Journal of artificial intelligence research, 22, 457–479.
3] Mihalcea, R., Tarau, P. (2004, July). Textrank: Bringing order into text. In Proceedings of the 2004 conference on empirical methods in natural language processing (pp. 404–411).
4] Wang, S., Jiang, J. (2015). Learning natural language inference with LSTM. arXiv preprint arXiv:1512.08849.
5] Verma, S., Nidhi, V. Extractive summarization using deep learning.
6] Miller, D. (2019). Leveraging BERT for extractive text summarization on lectures. arXiv preprint arXiv:1906.04165.
7] Hands-on Guide to Extractive Text Summarization With BERTSum.
8] Corpora Evaluation and System Bias Detection in Multi-document Summarization.
9] Natural Language Toolkit (NLTK) for Python.