A FastAPI-based service built on top of Microsoft's MarkItDown library that provides endpoints for converting various document formats to markdown and extracting YouTube video transcripts. This project extends MarkItDown's capabilities with additional features specifically focused on YouTube transcript extraction and web content conversion.
- Convert various file formats to Markdown
- Extract YouTube video transcripts with timestamps
- Convert web page content from a URL
- Support for multiple document formats, including:
  - PDF
  - HTML
  - DOCX
  - XLSX
  - PPTX
  - Images
  - Audio files (WAV, MP3)
  - ZIP files
  - Wikipedia pages
  - YouTube pages and videos
- Clone the repository.
- Create a .env file with the following variables:

  PROXY_URL=your_proxy_url_if_needed
  OPENAI_API_KEY=your_openai_api_key

- Build and run using Docker:
docker build -t document-converter .
docker run -p 8000:8000 document-converter

Or install locally:
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
uvicorn main:app --host 0.0.0.0 --port 8000

POST /convert/
Content-Type: multipart/form-data
file: <file>

Converts uploaded files to Markdown.
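For clients that cannot install third-party packages, the upload can be built with the standard library alone. This is a minimal sketch, not part of the project: `build_convert_request` is a hypothetical helper, the base URL assumes the default local deployment on port 8000, and the field name `file` comes from the endpoint description above. It shows what the `multipart/form-data` body looks like on the wire.

```python
import urllib.request
import uuid


def build_convert_request(filename: str, content: bytes,
                          base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST to /convert/.

    Hypothetical helper. The filename is inserted unescaped, so this sketch
    assumes a simple name without quotes or line breaks.
    """
    boundary = uuid.uuid4().hex  # random delimiter that will not occur in the payload
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        # One part carrying the uploaded file under the "file" form field
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        content,
        b"\r\n",
        f"--{boundary}--\r\n".encode(),  # closing delimiter ends the form
    ])
    return urllib.request.Request(
        f"{base_url}/convert/",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

To send it, read the file bytes, pass the request to `urllib.request.urlopen(req)`, and decode the response with `json.load`.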
POST /convert-url/
Content-Type: application/x-www-form-urlencoded
url=https://example.com

Converts web page content to Markdown.
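The `url` field is sent form-encoded, so a target URL that itself contains `&` or `=` has to be percent-encoded. Otherwise the server would split it into separate form fields. The sketch below uses a hypothetical helper, `build_form_request`, and assumes the default local address. It builds such a request with the standard library.

```python
import urllib.parse
import urllib.request


def build_form_request(endpoint: str, target_url: str,
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build (but do not send) an application/x-www-form-urlencoded POST.

    Hypothetical helper; works for /convert-url/ and /youtube/ alike.
    """
    # urlencode percent-encodes the value, so "&", "=" and "?" inside
    # target_url cannot be mistaken for form-field separators.
    body = urllib.parse.urlencode({"url": target_url}).encode("ascii")
    return urllib.request.Request(
        f"{base_url}{endpoint}",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

The same helper covers the transcript endpoint, for example `build_form_request("/youtube/", "https://www.youtube.com/watch?v=video_id&t=30s")`. With curl, `--data-urlencode "url=..."` performs the equivalent encoding.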
POST /youtube/
Content-Type: application/x-www-form-urlencoded
url=https://www.youtube.com/watch?v=video_id

Returns:
- Full transcript without timestamps
- Timestamped transcript
- Markdown formatted transcript
- Structured transcript with timing information
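The four return formats can all be derived from one list of caption segments. This is a sketch under stated assumptions, not the service's actual code. It assumes segments are dicts with `text`, `start`, and `duration` (in seconds), the shape returned by youtube-transcript-api's classic `get_transcript`. The output keys and the Markdown layout are hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as H:MM:SS, or M:SS for positions under an hour."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def build_transcript_views(segments: list[dict]) -> dict:
    """Derive the four transcript views from caption segments (hypothetical keys)."""
    cleaned = [(s["start"], s["duration"], s["text"].strip()) for s in segments]
    return {
        # Plain text: all captions joined, no timing information
        "full_transcript": " ".join(text for _, _, text in cleaned),
        # One line per caption, prefixed with its start time
        "timestamped_transcript": "\n".join(
            f"[{format_timestamp(start)}] {text}" for start, _, text in cleaned
        ),
        # Markdown: a heading plus a bullet per caption with a bold timestamp
        "markdown_transcript": "## Transcript\n\n" + "\n".join(
            f"- **{format_timestamp(start)}** {text}" for start, _, text in cleaned
        ),
        # Structured: explicit start/end times for programmatic use
        "structured_transcript": [
            {"start": start, "end": start + duration, "text": text}
            for start, duration, text in cleaned
        ],
    }
```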
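Python (requests):
import requests

BASE_URL = "http://localhost:8000"  # default address from the run commands above

# Convert URL
response = requests.post(
    f"{BASE_URL}/convert-url/",
    data={"url": "https://example.com"},  # sent as application/x-www-form-urlencoded
    timeout=60,
)
response.raise_for_status()  # surface HTTP errors instead of printing an error body as data
print(response.json())

# Get YouTube transcript
response = requests.post(
    f"{BASE_URL}/youtube/",
    data={"url": "https://www.youtube.com/watch?v=video_id"},
    timeout=60,
)
response.raise_for_status()
print(response.json())

# Convert file (multipart/form-data upload in the "file" field)
with open("document.pdf", "rb") as f:
    response = requests.post(f"{BASE_URL}/convert/", files={"file": f}, timeout=300)
response.raise_for_status()
print(response.json())

curl:
# Convert URL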
curl -X POST http://localhost:8000/convert-url/ \
-H "Content-Type: application/x-www-form-urlencoded" \
-d "url=https://example.com"
# Get YouTube Transcript
curl -X POST http://localhost:8000/youtube/ \
-H "Content-Type: application/x-www-form-urlencoded" \
-d "url=https://www.youtube.com/watch?v=video_id"
# Convert File
curl -X POST http://localhost:8000/convert/ \
-F "file=@document.pdf"

The project's key dependencies are listed in requirements.txt:
fastapi[standard]
uvicorn
pydub
speechrecognition
youtube-transcript-api
python-dotenv
openai
python-multipart
# Dependencies used by the document converters:
mammoth
markdownify
pandas
pdfminer.six
python-pptx
puremagic
requests
beautifulsoup4
charset-normalizer

The project includes a Dockerfile that sets up all necessary dependencies and the runtime environment:
FROM alpine:3.19
# Set environment variables and locale
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
PATH="/app/.venv/bin:$PATH" \
LANG=C.UTF-8 \
LC_ALL=C.UTF-8 \
PYTHONIOENCODING=UTF-8
# Install system dependencies and Python
RUN apk add --no-cache \
python3 \
py3-pip \
python3-dev \
ffmpeg \
exiftool \
build-base \
pango-dev \
cairo-dev \
jpeg-dev \
zlib-dev \
gcc \
musl-dev \
libffi-dev \
git
WORKDIR /app
# Create the virtual environment (its bin directory is already on PATH via ENV above)
RUN python3 -m venv .venv
# Install requirements, then youtube-dl and upgraded copies of several runtime packages
COPY requirements.txt .
RUN . .venv/bin/activate && \
pip install --no-cache-dir -r requirements.txt && \
pip install --no-cache-dir --upgrade \
youtube-dl \
youtube-transcript-api \
pydub \
speechrecognition \
python-dotenv \
openai \
python-multipart
# Copy application code
COPY . .
# Expose the port
EXPOSE 8000
# Run uvicorn from the virtual environment (path is relative to WORKDIR /app)
CMD [".venv/bin/uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
All endpoints return JSON: the requested data on success, or an error message with an appropriate HTTP status code when a request fails.
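On the client side, success and error responses can be separated by status code before the body is used. This is a minimal sketch that assumes errors use FastAPI's default `{"detail": ...}` body (as produced by `HTTPException`); the README does not confirm the exact error shape. `parse_api_response` and `ConversionError` are hypothetical names.

```python
import json

_INVALID = object()  # sentinel: body was not valid JSON


class ConversionError(Exception):
    """Raised for non-2xx responses (hypothetical client-side type)."""

    def __init__(self, status: int, detail: str):
        super().__init__(f"HTTP {status}: {detail}")
        self.status = status
        self.detail = detail


def parse_api_response(status: int, body: bytes):
    """Return the decoded JSON payload, or raise ConversionError on HTTP errors."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers json.JSONDecodeError and UnicodeDecodeError
        payload = _INVALID
    if 200 <= status < 300:
        if payload is _INVALID:
            raise ValueError("expected a JSON response body")
        return payload
    # Prefer FastAPI's "detail" field; fall back to the raw body text
    if isinstance(payload, dict) and "detail" in payload:
        detail = str(payload["detail"])
    else:
        detail = body.decode("utf-8", errors="replace")
    raise ConversionError(status, detail)
```

With requests, call it as `parse_api_response(response.status_code, response.content)`.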
- For YouTube transcripts, the API tries several retrieval methods, including:
  - English only
  - Auto-generated captions
  - Multiple language variants
  - The list of available transcripts
- Proxy support is available through the PROXY_URL environment variable
- OpenAI integration, configured through the OPENAI_API_KEY environment variable, is available for enhanced processing capabilities
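The multi-method transcript retrieval described above follows a fallback-chain pattern: try each strategy in order and return the first non-empty result. This sketch is an illustration only. The strategy functions in the usage example are stand-ins for the real youtube-transcript-api calls, which are not shown. `fetch_with_fallbacks` and `TranscriptUnavailable` are hypothetical names.

```python
from typing import Callable, Sequence


class TranscriptUnavailable(Exception):
    """Raised when every retrieval strategy fails (hypothetical)."""


def fetch_with_fallbacks(
    video_id: str,
    strategies: Sequence[tuple[str, Callable[[str], list]]],
) -> tuple[str, list]:
    """Return (strategy_name, segments) from the first strategy that succeeds."""
    errors = []
    for name, fetch in strategies:
        try:
            segments = fetch(video_id)
        except Exception as exc:  # each strategy may fail for its own reason
            errors.append(f"{name}: {exc}")
            continue
        if segments:  # treat an empty transcript as a miss and keep going
            return name, segments
        errors.append(f"{name}: empty transcript")
    # Report every failure so the caller can see why nothing worked
    raise TranscriptUnavailable("; ".join(errors) or "no strategies configured")
```

Ordering the strategies from most to least preferred (for example English first, then auto-generated captions, then other languages) means the cheapest or best-quality source wins whenever it exists.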