I'm trying to offload some layers to an NVIDIA GeForce RTX 2080 Ti.
Args:
If I use
Let me know if you need a complete log.
Why can't I use partial GPU offloading?
Answered by LostRuins on Jan 22, 2026
Please try lowering the context size first, and also enable --flashattention.
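For a rough sense of why a smaller context frees up VRAM for offloaded layers, here is a minimal sketch of the usual KV cache size estimate (one K and one V tensor per layer, per token). The model shape below is hypothetical and not taken from the question; real usage also depends on the backend, cache quantization and compute buffers, so treat the numbers as ballpark only.

```python
def kv_cache_bytes(n_ctx, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Approximate KV cache size in bytes.

    Each layer stores one K and one V vector per token, hence the factor 2.
    bytes_per_elem=2 assumes an fp16 cache (an assumption, not a measured value).
    """
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem


# Hypothetical model shape, chosen only for illustration.
MODEL = dict(n_layers=32, n_kv_heads=8, head_dim=128)

for n_ctx in (32768, 8192, 4096):
    gib = kv_cache_bytes(n_ctx, **MODEL) / 2**30
    print(f"context {n_ctx:>6}: ~{gib:.2f} GiB KV cache")
```

The estimate scales linearly with context size, so under these assumptions each halving of the context halves the cache and leaves that much more memory for layers.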
Yes, please enable flash attention and try