This project demonstrates extractive text summarization techniques using Python's Natural Language Toolkit (NLTK) and Gensim. Extractive summarization identifies the most important sentences from a given text, allowing users to quickly understand the essence of the content. This repository contains two implementations:
- Gensim-based Summarization
- Custom NLTK-based Summarization
- Extractive Summarization using Gensim: Uses Gensim's `summarize()` function to generate concise summaries.
- Custom Summarization using NLTK: Tokenizes the text, removes stopwords, and scores sentences based on word frequency to generate summaries.
- Comparison of Two Methods: Easily compare Gensim’s model with a custom NLTK-based approach.
- Customizable Parameters: Modify summary length and scoring logic according to your needs.
Make sure you have Python installed. Then, follow the steps below:
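- Clone this repository:
  git clone https://github.com/sanchitc05/Text-Summarization-NLP.git
  cd Text-Summarization-NLP
- Install the required packages. The `gensim.summarization` module was removed in Gensim 4.0, so a 3.x release is needed for `summarize()`:
  pip install nltk "gensim<4.0"
- Download NLTK resources (run in a Python session):
  import nltk
  nltk.download('punkt')
  nltk.download('stopwords')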
- Add your input text to the `text` variable in the code.
- Run the script to see summaries generated by both Gensim and the custom implementation:
  python summarization.py
Example Output:
Gensim Summary:
[Generated Summary]
Custom Summary:
[Generated Summary]
Text-Summarization-NLP/
│
├── summarization.py # Main script containing summarization functions
├── README.md # Documentation file
└── requirements.txt # List of dependencies (optional)
- Description: Uses Gensim's built-in `summarize()` function to extract a summary.
- Parameter:
  - `text`: Input text to be summarized.
- Returns: A summary containing roughly 20% of the original text's sentences (Gensim's `ratio` of 0.2).
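
As a minimal sketch of the Gensim path, the wrapper below calls `summarize()` with an explicit `ratio`. The function name `gensim_summary` is hypothetical and may not match the one in `summarization.py`. It assumes a Gensim 3.x install, because `gensim.summarization` is not available from Gensim 4.0 onward.

```python
def gensim_summary(text, ratio=0.2):
    """Return an extractive summary via Gensim's TextRank-based summarize().

    Hypothetical wrapper for illustration; not necessarily the repository's code.
    """
    try:
        # gensim.summarization exists only in Gensim 3.x; it was removed in 4.0.
        from gensim.summarization import summarize
    except ImportError as exc:
        raise ImportError(
            "gensim.summarization is unavailable; install a 3.x release, "
            'for example: pip install "gensim<4.0"'
        ) from exc
    # ratio is the fraction of the original sentences to keep in the summary.
    return summarize(text, ratio=ratio)
```

Passing a larger `ratio` (for example `0.4`) keeps more sentences. Gensim expects input with more than one sentence and logs a warning for very short texts.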
- Description: Custom implementation using NLTK. It builds a frequency table of the non-stopword words, scores each sentence by the frequencies of its words, and selects the three highest-scoring sentences.
- Parameter:
  - `text`: Input text to be summarized.
- Returns: A summary consisting of the top-scoring sentences.
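
The frequency-based scoring described above can be sketched roughly as follows. To keep the example dependency-free, regex splitting stands in for NLTK's `sent_tokenize` and `word_tokenize`, and a tiny hand-written set stands in for `stopwords.words('english')`. The names `custom_summary`, `split_sentences`, `tokenize`, and `top_n` are hypothetical, and returning the chosen sentences in their original order is an assumption rather than confirmed behavior of the script.

```python
import heapq
import re
from collections import Counter

# Tiny illustrative stand-in for nltk.corpus.stopwords.words("english").
STOPWORDS = {
    "a", "an", "the", "is", "are", "was", "were", "be", "of", "to", "and",
    "in", "on", "it", "this", "that", "for", "with", "as", "by", "from",
}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and return its word tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def custom_summary(text, top_n=3):
    """Pick the top_n sentences ranked by summed word frequency."""
    sentences = split_sentences(text)
    # Frequency table over all non-stopword tokens in the text.
    freq = Counter(
        w for s in sentences for w in tokenize(s) if w not in STOPWORDS
    )
    # A sentence's score is the sum of the frequencies of its non-stopwords.
    scores = {
        i: sum(freq[w] for w in tokenize(s) if w not in STOPWORDS)
        for i, s in enumerate(sentences)
    }
    top = heapq.nlargest(top_n, scores, key=scores.get)
    # Emit the selected sentences in their original order (an assumption).
    return " ".join(sentences[i] for i in sorted(top))
```

Summing raw frequencies favors long sentences. Dividing each score by the sentence's token count is one common adjustment if shorter sentences should compete.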
- Quick Overview: Get concise summaries from lengthy articles or documents.
- Easy to Implement and Compare: Offers two summarization approaches for comparison.
- Highly Customizable: Modify stopword lists, sentence selection logic, and summary ratio to suit your needs.
- Scalable: Can be integrated into larger NLP applications such as chatbots, recommendation systems, or content summarizers.
Contributions are welcome! Feel free to submit issues or pull requests to improve this project.
- Fork this repository.
- Create a new branch:
  git checkout -b feature-name
- Make your changes and commit:
  git commit -m "Add new feature"
- Push to your branch:
  git push origin feature-name
- Submit a pull request.
This project is licensed under the MIT License. See the LICENSE file for more details.
For further inquiries, please contact:
Sanchit Chauhan
GitHub Profile
LinkedIn Profile
Email: sanchitchauhan005@gmail.com