NLP project for text summarization using extractive summarization techniques.

sanchitc05/Text-Summarization-NLP

Text-Summarization-NLP

This project demonstrates extractive text summarization techniques using Python's Natural Language Toolkit (NLTK) and Gensim. Extractive summarization identifies the most important sentences from a given text, allowing users to quickly understand the essence of the content. This repository contains two implementations:

  1. Gensim-based Summarization
  2. Custom NLTK-based Summarization

Features

  • Extractive Summarization using Gensim: Uses Gensim's summarize() function to generate concise summaries.
  • Custom Summarization using NLTK: Tokenizes the text, removes stopwords, and scores sentences based on word frequency to generate summaries.
  • Comparison of Two Methods: Easily compare Gensim’s model with a custom NLTK-based approach.
  • Customizable Parameters: Modify summary length and scoring logic according to your needs.

Installation

Make sure you have Python installed. Then, follow the steps below:

  1. Clone this repository:

    git clone https://github.com/sanchitc05/Text-Summarization-NLP.git
    cd Text-Summarization-NLP
  2. Install the required packages. Gensim removed its summarization module in version 4.0, so pin an earlier release:

    pip install nltk "gensim<4.0"
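  3. Download NLTK resources (run these statements in a Python session):

    import nltk
    nltk.download('punkt')
    nltk.download('punkt_tab')  # used in place of 'punkt' on NLTK 3.9 and later
    nltk.download('stopwords')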

Usage

  1. Add your input text to the text variable in the code.
  2. Run the script to see summaries generated by both Gensim and the custom implementation:

    python summarization.py

Example Output:

Gensim Summary:
[Generated Summary]

Custom Summary:
[Generated Summary]

Project Structure

Text-Summarization-NLP/  
│  
├── summarization.py      # Main script containing summarization functions  
├── README.md             # Documentation file  
└── requirements.txt      # List of dependencies (optional)

Functions Explained

1. extractive_summarization_gensim(text)

  • Description: Uses Gensim's built-in summarize() function to extract a summary.
  • Parameter:
    • text: Input text to be summarized.
  • Returns: A summary of roughly 20% of the original text, which matches the default ratio=0.2 of summarize().
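
As a rough illustration of the function described above, the sketch below wraps Gensim's summarize(). It is not the repository's code: the ratio keyword on the wrapper and the RuntimeError raised when the import fails are assumptions added here. It also assumes a Gensim 3.x installation, since gensim.summarization does not exist in Gensim 4.0 and later.

```python
def extractive_summarization_gensim(text, ratio=0.2):
    """Summarize `text` with Gensim's TextRank-based summarize().

    Assumption: Gensim 3.x is installed (gensim.summarization was
    removed in Gensim 4.0). The `ratio` argument is an illustrative
    addition, not necessarily part of the repository's function.
    """
    try:
        # Imported lazily so the failure mode is a clear message.
        from gensim.summarization import summarize
    except ImportError as exc:
        raise RuntimeError(
            "gensim.summarization is unavailable; install gensim<4.0"
        ) from exc
    # ratio is the fraction of sentences to keep (Gensim's default is 0.2).
    return summarize(text, ratio=ratio)
```

Gensim's summarizer expects input with several sentences; it raises an error on single-sentence input and warns when the text is very short.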

2. extractive_summarization_custom(text)

  • Description:
    Custom implementation using NLTK. It creates a frequency table of words, scores each sentence, and selects the top 3 most relevant sentences.
  • Parameter:
    • text: Input text to be summarized.
  • Returns: A summary consisting of the top-scoring sentences.
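
The frequency-based scoring described above can be sketched as follows. This is a dependency-free illustration rather than the repository's implementation: regular expressions and a tiny hand-written stopword set stand in for NLTK's tokenizers and stopwords corpus, and the helper names, the top_n parameter, and the choice to return the selected sentences in their original order are all assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword set; NLTK's English stopword list is much larger.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "in",
    "is", "it", "of", "on", "that", "the", "this", "to", "was", "with",
}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and pull out word-like tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def extractive_summarization_custom(text, top_n=3):
    sentences = split_sentences(text)
    # Frequency table over all non-stopword tokens in the text.
    freq = Counter(
        w for s in sentences for w in tokenize(s) if w not in STOPWORDS
    )
    # A sentence's score is the sum of its words' frequencies;
    # stopwords are absent from the table and contribute 0.
    scores = [sum(freq[w] for w in tokenize(s)) for s in sentences]
    # Pick the top_n highest-scoring sentences, then restore document order.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_n]
    return " ".join(sentences[i] for i in sorted(top))
```

Summing raw frequencies tends to favor long sentences, which is one reason the scoring logic is worth adjusting for your own texts.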

Benefits

  • Quick Overview: Get concise summaries from lengthy articles or documents.
  • Easy to Implement and Compare: Offers two summarization approaches for comparison.
  • Highly Customizable: Modify stopword lists, sentence selection logic, and summary ratio to suit your needs.
  • Reusable: Can be integrated into larger NLP applications such as chatbots, recommendation systems, or content summarizers.
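
To make the customization points above concrete, here is a small sketch of two possible adjustments: a length-normalized score with a caller-supplied stopword set, and a helper that turns a summary ratio into a sentence count. Both functions are hypothetical names introduced for illustration and are not part of summarization.py.

```python
import re
from collections import Counter


def score_sentences(sentences, stopwords, normalize=False):
    """Return one frequency-based score per sentence (hypothetical helper)."""
    # Tokenize each sentence and drop the caller's stopwords.
    tokenized = [
        [w for w in re.findall(r"[a-z']+", s.lower()) if w not in stopwords]
        for s in sentences
    ]
    freq = Counter(w for words in tokenized for w in words)
    scores = []
    for words in tokenized:
        total = sum(freq[w] for w in words)
        # Dividing by word count keeps long sentences from winning on length alone.
        scores.append(total / len(words) if normalize and words else total)
    return scores


def sentences_to_keep(n_sentences, ratio=0.2):
    """Translate a summary ratio into a sentence count, keeping at least one."""
    return max(1, round(n_sentences * ratio))
```

With normalize=True a short sentence made of frequent words can outrank a long one, so the two settings may select different summaries from the same text.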

Contributing

Contributions are welcome! Feel free to submit issues or pull requests to improve this project.

Steps to Contribute:

  1. Fork this repository.
  2. Create a new branch:
    git checkout -b feature-name
  3. Make your changes and commit:
    git commit -m "Add new feature"
  4. Push to your branch:
    git push origin feature-name
  5. Submit a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for more details.


Acknowledgments

  • NLTK for Natural Language Processing tools.
  • Gensim for efficient text summarization models.

Contact

For further inquiries, please contact:
Sanchit Chauhan
GitHub Profile LinkedIn Profile

Email: sanchitchauhan005@gmail.com
