High-Performance CUDA Flex: Sandy Bridge (i7-2600) + RTX 2060 Super #1914

PavSuslo · 2026-01-07T10:59:07Z

I'm here to brag! I’ve successfully forced the latest KoboldCPP to run on my legendary Intel Core i7-2600 using FULL CUDA acceleration.

The real challenge? As far as I could tell, the standard builds no longer ship a CUDA backend for CPUs without AVX2. The model barely crawled on this machine before, so I manually recompiled the CUDA backend specifically for this "impossible" hardware combo.

The Achievement: Full CUDA Bridge

I manually rebuilt koboldcpp_cublas.dll to bridge the gap between a CPU from 2011 and a modern GPU. This isn't just a "launch": it's a CUDA build with full GPU offload running on a Sandy Bridge rig.

My Custom Build Flags (CMake):

CMAKE_CUDA_ARCHITECTURES=75 (targets the Turing architecture, compute capability 7.5, used by the RTX 20 series).
LLAMA_AVX2=OFF (disables AVX2 code paths; the i7-2600 supports AVX but not AVX2).
LLAMA_FMA=OFF (disables fused multiply-add code paths; Sandy Bridge has no FMA3).
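
For anyone who wants to script this, here is a minimal sketch of how those three options might be passed to a CMake configure step. Only the three option names and values come from the post. The source and build directory names are placeholders, and any other option a real KoboldCPP CUDA build needs (for example, whatever switch enables the CUDA backend) is left out on purpose because the post does not name it.

```python
import shlex

# The three settings quoted in the post; nothing else about the build is assumed.
BUILD_OPTIONS = {
    "CMAKE_CUDA_ARCHITECTURES": "75",  # Turing (RTX 20 series), compute capability 7.5
    "LLAMA_AVX2": "OFF",               # Sandy Bridge has AVX but not AVX2
    "LLAMA_FMA": "OFF",                # Sandy Bridge has no FMA3
}


def cmake_configure_command(source_dir=".", build_dir="build", options=None):
    """Return the argv list for a CMake configure step with -D definitions.

    source_dir and build_dir are placeholder paths, not taken from the post.
    """
    if options is None:
        options = BUILD_OPTIONS
    argv = ["cmake", "-S", source_dir, "-B", build_dir]
    argv += [f"-D{name}={value}" for name, value in options.items()]
    return argv


if __name__ == "__main__":
    # Print a copy-pasteable command line instead of running it.
    print(shlex.join(cmake_configure_command()))
```

The function only builds the command; running it is left to the reader so the sketch has no side effects.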

The Payoff (Qwen 7B GGUF):

Prompt Processing: 1056.46 T/s (Full GPU offload + Flash Attention).
Text Generation: 22.87 T/s.
The chat is lightning fast. All the heavy lifting is done via CUDA, bypassing the CPU's limitations entirely.
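
To put those rates in perspective, here is a rough back-of-the-envelope sketch that converts them into wall-clock time. The two throughput figures are the ones reported above. The 2048-token prompt and 256-token reply are made-up example sizes, and the estimate ignores model load time and any other overhead.

```python
# Throughput figures reported in the post (tokens per second).
PROMPT_TPS = 1056.46
GEN_TPS = 22.87


def estimate_seconds(prompt_tokens, output_tokens,
                     prompt_tps=PROMPT_TPS, gen_tps=GEN_TPS):
    """Return (prompt_seconds, generation_seconds, total_seconds)."""
    prompt_s = prompt_tokens / prompt_tps
    gen_s = output_tokens / gen_tps
    return prompt_s, gen_s, prompt_s + gen_s


if __name__ == "__main__":
    # Hypothetical request size, chosen only for illustration.
    prompt_s, gen_s, total_s = estimate_seconds(2048, 256)
    print(f"prompt: {prompt_s:.1f} s, generation: {gen_s:.1f} s, total: {total_s:.1f} s")
```

For that example request, prompt processing takes roughly 1.9 s and generation roughly 11.2 s, so generation speed dominates the wait for replies of any real length.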

The Benchmark Quirk:

Interestingly, clicking the "Run Benchmark" button in the koboldcpp-launcher.exe GUI still causes an instant crash. It seems the GUI's internal hardware-probing logic isn't aware of my custom CUDA DLL and tries to call AVX2 instructions anyway. However, running the benchmark through the command line with the --benchmark flag works perfectly.

My Pro-Config:

--quantkv 1 --flashattention --gpulayers 999 --noavx2
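
As a sketch of how this config could be wrapped in a small launcher script: the flags below are exactly the ones listed above, plus --benchmark from the previous section. The executable name and model path are placeholders, and passing the model via --model is my assumption about how you would point KoboldCPP at a GGUF file; check --help on your build before relying on it.

```python
import shlex

# Flags copied from the post.
POST_FLAGS = ["--quantkv", "1", "--flashattention", "--gpulayers", "999", "--noavx2"]


def launch_argv(executable, model_path, benchmark=False):
    """Build the argv list for a KoboldCPP launch.

    executable and model_path are placeholders supplied by the caller;
    the --model flag is an assumption, not something stated in the post.
    """
    argv = [executable, "--model", model_path, *POST_FLAGS]
    if benchmark:
        # Command-line benchmark, which the post reports as working.
        argv.append("--benchmark")
    return argv


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(shlex.join(launch_argv("koboldcpp.exe", "model.gguf", benchmark=True)))
```

The function only assembles the argument list; hand it to subprocess.run if you want the script to start the program.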

Who says Sandy Bridge is dead? With a custom-compiled CUDA bridge, this 15-year-old chip is crushing LLMs at modern speeds!

PavSuslo (Author) · 2026-01-07T11:01:55Z

PS: No matter what you want to build, just ask AI on Google Search. It will explain everything. It only takes 24 hours of "red-eye" hardcore coding, and you're good to go.

0 replies

LostRuins (Maintainer) · 2026-01-07T11:32:10Z

Actually, that's not true. You can get CUDA without AVX2 simply by downloading the oldpc version. No recompile needed: just download it and run with CUDA.

https://github.com/LostRuins/koboldcpp/releases/download/v1.105.4/koboldcpp-oldpc.exe

1 reply

patrickzel Jan 12, 2026

You are right. However, building on your own machine with the correct CUDA architecture is actually faster. I increased my prompt processing by 40%, which was the biggest issue for me.

I use an Intel Core i5-3570K with an RTX 3060.
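
A small sketch of what "the correct CUDA architecture" means for the two GPUs mentioned in this thread: the CMAKE_CUDA_ARCHITECTURES value is the card's compute capability with the dot removed. The table covers only these two cards, and the helper name is made up for illustration; look up other cards in NVIDIA's compute capability list.

```python
# Compute capability (major, minor) for the GPUs mentioned in this thread.
COMPUTE_CAPABILITY = {
    "RTX 2060 Super": (7, 5),  # Turing
    "RTX 3060": (8, 6),        # Ampere
}


def cuda_architecture(gpu_name):
    """Return the CMAKE_CUDA_ARCHITECTURES value for a known GPU name."""
    major, minor = COMPUTE_CAPABILITY[gpu_name]
    return f"{major}{minor}"


if __name__ == "__main__":
    for name in COMPUTE_CAPABILITY:
        print(f"{name}: CMAKE_CUDA_ARCHITECTURES={cuda_architecture(name)}")
```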

PavSuslo (Author) · 2026-01-07T15:14:13Z

Thanks, it worked. It seems this (older) version is a bit more memory-intensive, and if there isn't enough memory, it falls back to the CPU backend (without CUDA).

0 replies
