Turn ArXiv Papers into High-Fidelity Technical Schematics. A specialized Anthropic Skill that architects professional diagrams for research papers, optimized for Nano Banana Pro.
🌐 Official Website: https://wilsonwukz.github.io/paper-visualizer-skill/
Figure 1: The "Golden Schema" generated via Anthropic Console (Claude 3.5 Sonnet). Note the precise recursive structure of the Encoder/Decoder stacks and the detailed "Multi-Head Attention" insets.
"Why can't AI draw my architecture correctly?"
Researchers and Engineers often struggle to visualize complex systems. While standard generative AI excels at art, it fundamentally fails at Scientific Logic and Topological Consistency, often producing "hallucinated" connections or gibberish text.
Paper Visualizer bridges this gap. It acts as a Structural Architect middleware that:
- Decodes the PDF: Reads the raw academic text to extract the logical topology (e.g., Is it a cyclic loop? A parallel stream? A hierarchical tree?).
- Visual Tokenization: Translates abstract concepts (e.g., "Residual Connection") into concrete visual tokens (e.g., "Curved bypass arrow with (+) symbol").
- Strict Layout Enforcement: Outputs a structured, coordinate-based prompt that forces Nano Banana Pro to obey physical laws.
- 6 Cognitive Layout Engines: Automatically selects the best visual topology for your paper:
Linear Pipeline(for CNNs/Preprocessing)Parallel Dual-Stream(for Transformers/Siamese Networks)Central Hub(for Agents/RL)Cyclic Loop(for Optimization/GANs)Hierarchical Stack(for FPNs/UNets)Matrix Grid(for Ablation Studies)
- Typography Guardrails: Enforces sans-serif hierarchy rules to minimize text artifacts, ensuring that main labels (e.g., "ENCODER") remain legible.
- Nano Banana Pro Optimized: Specifically tuned to leverage Nano Banana Pro's strengths in text rendering and structural adherence.
This skill supports different aesthetic outputs based on the configuration passed to Nano Banana Pro.
(See Figure 1 above)
- Pipeline: Claude 3.5 Sonnet → Nano Banana Pro
- Style: Clean, Academic, White Background. Perfect for Paper Submissions (LaTeX).
Figure 2: The same Transformer architecture rendered with a "Sci-Fi/High-Tech" aesthetic via GPT-4o logic. Ideal for Conference Slides, Posters, and Pitch Decks.
We strictly evaluate this skill across different environments to ensure robustness.
| Environment | Logic Model | Logic Adherence | Detail Insets | Log Output |
|---|---|---|---|---|
| Anthropic Console | Claude 3.5 Sonnet | Excellent | Perfect | View Log |
| ChatGPT Web | GPT-4o | Very Good | Good | View Log |
Observation: Claude 3.5 Sonnet tends to follow the "Detail Inset" (Zone 7 & 8) instructions more strictly, making it the recommended engine for complex architectures.
- Download the core skill file:
skills/visual-architect/SKILL.md. - Add it to your Project Knowledge (Claude Desktop / Cursor) or System Instructions.
- Trigger: "Generate a visual schema for this paper's methodology."
This skill forces the LLM to output a structured JSON-like Markdown block, bypassing its usual "chatty" nature:
[LAYOUT CONFIGURATION]
* Selected Layout: Parallel Dual-Stream
* Composition Logic: Left column = Encoder... Right column = Decoder...
[ZONE 1: INPUT]
* Visual Structure: A stack of 3 realistic paper icons...
...