PS: No matter what you want to build, just ask the AI in Google Search. It will explain everything. It only takes 24 hours of "red-eye" hardcore coding, and you're good to go.
Actually, that's not true. You can get CUDA without AVX2 simply by downloading the oldpc version. No recompile needed: just download it and run it with CUDA. https://github.com/LostRuins/koboldcpp/releases/download/v1.105.4/koboldcpp-oldpc.exe
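If you're not sure whether your CPU lacks AVX2 (and therefore whether the oldpc build is the one to grab), a quick check could help. This is a minimal Python sketch that assumes a Linux machine, where the kernel lists CPU feature flags in /proc/cpuinfo; on Windows you would need a different probe. The helper names and the "standard build"/"oldpc build" labels are hypothetical and only encode the rule from this reply, not KoboldCpp's own selection logic.

```python
from pathlib import Path

# Sketch only: the "no AVX2 -> oldpc build" rule mirrors the reply above;
# it is not KoboldCpp's own build-selection logic.


def read_cpu_flags(cpuinfo_path: str = "/proc/cpuinfo") -> set[str]:
    """Return the CPU feature flags listed in /proc/cpuinfo (Linux, x86)."""
    for line in Path(cpuinfo_path).read_text().splitlines():
        # The relevant line looks like "flags\t\t: fpu vme ... avx ..."
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def suggest_build(cpu_flags: set[str]) -> str:
    """Prefer the oldpc build when the CPU lacks AVX2 (e.g. Sandy Bridge)."""
    return "standard build" if "avx2" in cpu_flags else "oldpc build"


if __name__ == "__main__":
    if not Path("/proc/cpuinfo").exists():
        print("No /proc/cpuinfo here; this sketch only covers Linux.")
    else:
        flags = read_cpu_flags()
        for feature in ("avx", "avx2", "fma"):
            print(f"{feature}: {'yes' if feature in flags else 'no'}")
        print("Suggested:", suggest_build(flags))
```

On a Sandy Bridge chip like the i7-2600, this should report avx but neither avx2 nor fma, since AVX2 and FMA3 only arrived with Haswell.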
Thanks, it worked. This (older) version seems a bit more memory-intensive, and if there isn't enough memory, it falls back to the CPU backend (without CUDA).
I'm here to brag! I’ve successfully forced the latest KoboldCPP to run on my legendary Intel Core i7-2600 using FULL CUDA acceleration.
The real challenge? Modern builds have completely dropped CUDA support for CPUs without AVX2; such builds aren't even included in the distribution anymore. While the model might have barely crawled before, I've achieved a breakthrough by manually recompiling the CUDA backend specifically for this "impossible" hardware combo.
The Achievement: Full CUDA Bridge
I manually rebuilt koboldcpp_cublas.dll to bridge the gap between a CPU from 2011 and a modern GPU. This isn't just a "launch": it's a fully optimized CUDA implementation running on a Sandy Bridge rig.
My Custom Build Flags (CMake):
- CMAKE_CUDA_ARCHITECTURES "75" (targeting the RTX 20-series Turing core).
- LLAMA_AVX2=OFF (eliminating the AVX2 instructions my i7-2600 can't handle).
- LLAMA_FMA=OFF (stripping out the FMA instructions, which the CPU also lacks).
The Payoff (Qwen 7B GGUF):
The chat is lightning fast. All the heavy lifting is done via CUDA, bypassing the CPU's limitations entirely.
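The three build flags listed above could be passed to CMake as cache variables at configure time. Here is a minimal Python sketch that only assembles and prints the commands; it assumes a standard `cmake -S <src> -B <build>` out-of-source build. The variable names come from the post, but the option that actually switches on the CUDA backend varies between KoboldCpp versions, so it is deliberately not guessed here; check the CMakeLists.txt of the version you build. The helper name is hypothetical.

```python
import shlex

# Cache variables as quoted in the post. The CUDA-enabling option itself is
# version-specific and intentionally omitted; see your CMakeLists.txt.
BUILD_FLAGS = {
    "CMAKE_CUDA_ARCHITECTURES": "75",  # compute capability 7.5 = Turing (RTX 20-series)
    "LLAMA_AVX2": "OFF",  # Sandy Bridge has AVX but no AVX2
    "LLAMA_FMA": "OFF",  # ...and no FMA3 either
}


def cmake_configure_cmd(source_dir: str, build_dir: str, flags: dict[str, str]) -> list[str]:
    """Build a `cmake -S <src> -B <build> -DNAME=VALUE ...` configure command."""
    cmd = ["cmake", "-S", source_dir, "-B", build_dir]
    cmd += [f"-D{name}={value}" for name, value in flags.items()]
    return cmd


if __name__ == "__main__":
    configure = cmake_configure_cmd(".", "build", BUILD_FLAGS)
    build = ["cmake", "--build", "build", "--config", "Release"]
    # Only prints the commands; pass each list to subprocess.run(..., check=True) to execute.
    print(shlex.join(configure))
    print(shlex.join(build))
```

A different GPU would need its own CMAKE_CUDA_ARCHITECTURES value (for example, 86 for RTX 30-series Ampere cards); the AVX2/FMA switches are what matter for the CPU side.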
The Benchmark Quirk:
Interestingly, clicking the "Run Benchmark" button in the koboldcpp-launcher.exe GUI still causes an instant crash. It seems the GUI's internal hardware-probing logic isn't aware of my custom CUDA DLL and tries to call AVX2 instructions anyway. However, running the benchmark from the command line with the --benchmark flag works perfectly.
My Pro-Config:
--quantkv 1 --flashattention --gpulayers 999 --noavx2
Who says Sandy Bridge is dead? With a custom-compiled CUDA bridge, this 15-year-old chip is crushing LLMs at modern speeds!
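For scripted launches, the Pro-Config flags and the command-line --benchmark workaround mentioned above could be assembled like this. It's a minimal Python sketch: the flags are copied from the post, while the executable name, the model file name, and the use of --model to pass the GGUF file are assumptions for illustration, and the helper name is hypothetical.

```python
import shlex

# Flags copied from the "Pro-Config" above.
PRO_CONFIG = ["--quantkv", "1", "--flashattention", "--gpulayers", "999", "--noavx2"]


def kobold_cmd(exe: str, model: str, benchmark: bool = False) -> list[str]:
    """Assemble a KoboldCpp command line, optionally adding --benchmark."""
    cmd = [exe, "--model", model, *PRO_CONFIG]
    if benchmark:
        # CLI benchmark: the workaround for the GUI "Run Benchmark" crash.
        cmd.append("--benchmark")
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; adjust to your own build and model.
    cmd = kobold_cmd("koboldcpp.exe", "model.gguf", benchmark=True)
    print(shlex.join(cmd))
    # To actually launch it: subprocess.run(cmd, check=True)
```

--gpulayers 999 simply asks for more layers than a 7B model has, so effectively every layer is offloaded to the GPU.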