I specialize in Large Language Models (LLMs) and Natural Language Processing (NLP), with a focus on advancing the state of the art in downstream tasks such as semantic understanding, timeline summarization, and grammatical error correction. My work bridges cutting-edge research (4 ACL publications) and production-grade systems (ByteDance, Apple), making AI accessible and practical. Last but not least, my first name "Geonsik" is pronounced "Gun-Shik" (/kʌn.ɕik/). You can also just call me "GS" 🙋🏻‍♂️
- Fine-tuning & Optimization: LoRA-based PEFT, 4-bit quantization, hyperparameter tuning with W&B
- Model Deployment: vLLM serving, LangChain pipelines, ChromaDB integration, AWS Bedrock inference (a minimal vLLM sketch follows this list)
- Models: Mistral, Llama 2/3, FLAN-T5, GPT family, OpenAI GPT-OSS-20B
- Structured Outputs: the instructor library with Pydantic models for schema-validated, typed LLM responses (see the sketch after this list)
- Research: Incremental clustering algorithms using LLM-based pairwise classification
- Grammatical Error Correction (GEC): Sequence-to-sequence & sequence tagging approaches
- Timeline Summarization (TLS): Event detection, clustering, and narrative construction
- Semantic Understanding: Word Sense Disambiguation (WSD), Words-in-Context (WiC)
- Email Classification: Topic-based email classification with RAG-enriched semantic search
- Transfer Learning: Encoder-only vs. decoder-only architectures for semantic tasks
- Scalable Web Applications: Flask, Streamlit, Bootstrap, Docker containerization, LAMP stack
- Microservices Architecture: GEC system with separate API and web interface modules, email processing pipelines
- Model Serving: Production-grade deployment of transformer models and LLMs for real-time inference
- RAG Pipelines: Retrieval-Augmented Generation with vector embeddings for semantic search
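As referenced in the deployment bullet above, here is a minimal sketch of offline batch inference with vLLM. The model name and sampling settings are illustrative placeholders, not a record of any production configuration.

```python
from vllm import LLM, SamplingParams

# Load the model once; vLLM handles batching and paged attention internally.
# Serving a 13B model assumes a GPU with sufficient memory.
llm = LLM(model="meta-llama/Llama-2-13b-chat-hf")
params = SamplingParams(temperature=0.7, max_tokens=128)

# generate() accepts a batch of prompts and returns one RequestOutput per prompt.
outputs = llm.generate(["Summarize today's AI news in one sentence."], params)
print(outputs[0].outputs[0].text)
```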
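And a minimal sketch of the structured-output pattern from the same list, binding an LLM response to a Pydantic schema via instructor. The `EmailTopic` schema and model name are hypothetical, for illustration only.

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

class EmailTopic(BaseModel):
    # Hypothetical schema invented for this example.
    topic: str
    confidence: float

# Patch the OpenAI client so responses are parsed and validated against the schema.
client = instructor.from_openai(OpenAI())

result = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=EmailTopic,
    messages=[{"role": "user", "content": "Classify: 'Your March invoice is attached.'"}],
)
print(result.topic, result.confidence)  # typed, validated access to the response
```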
- Novel approach leveraging LLMs for incremental event clustering and timeline construction from text streams; outperforms the previous state of the art (SOTA) on 4 TLS benchmarks. A toy sketch of the clustering idea follows this entry.
- Tech Stack: PyTorch · vLLM · Llama-2-13B · LangChain · ChromaDB
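The sketch below is a toy illustration of the general pattern named in this entry, incremental clustering driven by pairwise same-event judgments. It is not the paper's actual algorithm; a keyword-overlap function stands in for the LLM classifier.

```python
from typing import Callable

def incremental_cluster(events: list[str], same_event: Callable[[str, str], bool]) -> list[list[str]]:
    """Attach each incoming event to the first cluster whose representative
    the pairwise classifier links it to; otherwise start a new cluster."""
    clusters: list[list[str]] = []
    for event in events:
        for cluster in clusters:
            if same_event(cluster[0], event):  # compare against the cluster representative
                cluster.append(event)
                break
        else:  # no break: no existing cluster matched
            clusters.append([event])
    return clusters

# Toy stand-in for an LLM pairwise judgment: shared keywords imply the same event.
toy_judge = lambda a, b: bool(set(a.lower().split()) & set(b.lower().split()))
print(incremental_cluster(
    ["quake hits city", "city quake death toll rises", "election results announced"],
    toy_judge,
))
```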
- Comprehensive evaluation framework demonstrating that encoder-only models outperform decoder-only LLMs on word-meaning comprehension tasks. A minimal LoRA fine-tuning sketch follows this entry's tech stack.
- Tech Stack: PyTorch · HuggingFace Transformers · LoRA · PEFT · WandB
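Since the tech stack above lists LoRA and PEFT, here is a minimal, generic adapter setup with Hugging Face's peft library. The base model and hyperparameters are placeholders, not the configuration used in the paper.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder base model; any causal LM from the Hub works the same way.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor applied to the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attach adapters to attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```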
- End-to-end email processing pipeline with a Streamlit web UI for Gmail integration, intelligent topic classification using AWS Bedrock LLMs, and AI-generated email thread summaries. Features incremental processing, LLM-powered topic attribute generation, RAG-enriched classification with semantic search (a minimal retrieval sketch follows the tech stack), and complete project lifecycle management (create, view, delete).
- Tech Stack: Python · Streamlit · AWS Bedrock · OpenAI GPT-OSS-20B · instructor · Pydantic · Gmail API · FAISS · Amazon Titan Embeddings · ChromaDB · LangChain
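A minimal sketch of the RAG-enrichment step described in this entry: retrieve semantically similar, already-labeled emails to ground the classification prompt. It uses ChromaDB's in-memory client and default embedder; the pipeline itself uses Amazon Titan Embeddings and FAISS per the tech stack, and the sample emails here are invented.

```python
import chromadb

# In-memory collection; Chroma embeds documents with its default model.
client = chromadb.Client()
emails = client.create_collection("emails")
emails.add(
    ids=["e1", "e2"],
    documents=["Invoice for March attached.", "Team offsite moved to Friday."],
    metadatas=[{"topic": "billing"}, {"topic": "scheduling"}],
)

# Retrieve the most similar labeled email to enrich the classifier's prompt.
hits = emails.query(query_texts=["Please pay the attached bill"], n_results=1)
print(hits["documents"][0], hits["metadatas"][0])  # nearest neighbor + its topic label
```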
LLM Agent Evaluation [Code]
- Research toolkit for analyzing LLM agent trajectories on software engineering tasks.
- Tech Stack: Jupyter · Python · Agent Frameworks
Algorithm Practice [Code]
- Self-contained archive of LeetCode solutions demonstrating strong algorithmic foundations.
- Tech Stack: Python · Data Structures · Algorithms
- From Moments to Milestones: Incremental Timeline Summarization Leveraging Large Language Models
  Qisheng Hu, Geonsik Moon, Hwee Tou Ng
  ACL 2024 (Main Conference) | [Code | Paper]
- Are Decoder-Only Language Models Better than Encoder-Only Language Models in Understanding Word Meaning?
  Muhammad Reza Qorib, Geonsik Moon, Hwee Tou Ng
  ACL 2024 (Findings) | [Code | Paper]
- ALLECS: A Lightweight Language Error Correction System
  Muhammad Reza Qorib, Geonsik Moon, Hwee Tou Ng
  EACL 2023 (System Demonstrations) | [Code | Paper]
- WAMP: Writing, Annotation, and Marking Platform
  Geonsik Moon, Muhammad Reza Qorib, Daniel Dahlmeier, Hwee Tou Ng
  IJCNLP-AACL 2023 (System Demonstrations) | [Code | Paper]
- 🌐 Website: gsmoon97.github.io
- 💼 LinkedIn: linkedin.com/in/gsmoon97
- 🎓 Google Scholar: si3AXV8AAAA
- 🔬 ORCID: 0009-0001-5646-466X
- 📍 Based in: New York, NY

