Welcome to the Urdu Text Summarization repository powered by m-BART! ✨
This project is based on a multilingual variant of the BART model, designed to generate concise and coherent summaries of Urdu text. It uses a fine-tuned m-BART model and offers a Flask-based web application as a simple GUI for interaction.
m-BART (Multilingual BART) is an extension of the BART model pretrained on a large-scale multilingual dataset. BART is built for sequence-to-sequence tasks, and m-BART extends this for multilingual applications.
This model has been trained on ~67,000 Urdu news articles and is optimized specifically for Urdu summarization tasks.
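As a rough sketch of what inference with the fine-tuned checkpoint looks like (the generation parameters and the loading classes below are illustrative assumptions, not the repo's exact code):

```python
def summarize(text, tokenizer, model, max_summary_len=128):
    """Generate a summary for one Urdu document with an mBART-style seq2seq model."""
    # Tokenize the source article, truncating long inputs to the encoder limit.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    # Beam search tends to give more coherent abstractive summaries than greedy decoding.
    summary_ids = model.generate(
        **inputs,
        num_beams=4,
        max_length=max_summary_len,
        early_stopping=True,
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)

# Typical loading (requires the transformers package; commented out to keep
# this sketch dependency-free -- the ckpt path matches the repo's convention):
# from transformers import MBartForConditionalGeneration, AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("ckpt")
# model = MBartForConditionalGeneration.from_pretrained("ckpt")
```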
You can manually download the model checkpoints and place them in a folder named `ckpt` inside the cloned repo. If the checkpoints are missing or placed incorrectly, the model and tokenizer are downloaded automatically from the Hugging Face Hub and saved locally in the `ckpt` directory.
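The fallback can be sketched like this (a minimal illustration: the Hub id `facebook/mbart-large-cc25` is a placeholder, and the actual app may check for different files):

```python
import os

def resolve_checkpoint(ckpt_dir="ckpt", hub_id="facebook/mbart-large-cc25"):
    """Return the local ckpt directory if it holds model files, else the Hub id.

    Passing the returned value to from_pretrained() either loads the local
    checkpoint or triggers a download from the Hugging Face Hub, after which
    the app can persist it locally with save_pretrained(ckpt_dir).
    """
    # Treat the directory as a valid checkpoint only if a config file exists.
    if os.path.isfile(os.path.join(ckpt_dir, "config.json")):
        return ckpt_dir
    return hub_id
```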
You can run this project using Docker for quick setup and deployment.
Build the Docker image:
```
docker build -t urdu-summarizer .
```

Run the container. If you have checkpoints locally, you can optionally mount your local `ckpt/` directory to avoid downloading the model files:

```
docker run -v $(pwd)/ckpt:/app/ckpt -p 5000:5000 urdu-summarizer
```

Visit http://localhost:5000 in your browser.
Using pip:

```
pip install -r requirements.txt
```

Using conda:

```
conda env create -f environment.yaml
conda activate urdu-summarizer
```

- Start the app:

  ```
  python app.py
  ```

- Open your browser and go to http://localhost:5000.
- Paste your Urdu text to get instant summaries.
The notebooks directory contains three Jupyter notebooks. These notebooks can be used to:
- Load and run inference on the model.
- Fine-tune m-BART on your own Urdu dataset.
- Push the trained model to the Hugging Face Model Hub.
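The upload step in the third notebook presumably relies on transformers' built-in `push_to_hub` helper; a minimal sketch (the repo id here is a placeholder, not the project's real Hub repository):

```python
def push_checkpoint(model, tokenizer, repo_id="your-username/urdu-mbart-summarizer"):
    """Upload a fine-tuned model and its tokenizer to the Hugging Face Hub.

    Requires authentication beforehand, e.g. via `huggingface-cli login`.
    """
    model.push_to_hub(repo_id)      # creates the Hub repo on first push
    tokenizer.push_to_hub(repo_id)  # tokenizer files go to the same repo
    return repo_id
```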
For any questions or suggestions, please reach out at [email protected].
