fix: optimizations to try and reduce deadlocks #1256

CuzImClicks · 2026-01-04T21:06:31Z

Description

Uses Notify per chunk and stops reference counting pending writes. This pr still deadlocks but its not triggered from flying around but can be triggered by just standing still for 20min? 😭

Testing

Please follow our Coding Guidelines

Acquire write lock only once

…sync_io_lock

tn-lorenz · 2026-01-04T23:36:01Z

Could this be a ThreadPool becoming full, because we have a memory leak somewhere in the code? I think this might look the same as a dead lock.

…sync_io_lock

sprinox · 2026-01-06T14:13:57Z

If no reference counting is used, then with the current PR implementation, it’s possible for two save tasks to be issued for the same chunk. After the first save task finishes writing, the chunk gets unlocked. At that point, a reader thread may read it immediately without waiting for the second save task to complete, which can result in chunk data being rolled back.

CuzImClicks · 2026-01-07T23:25:28Z

@sprinox

i added back the reference counting but its still deadlocking. What i noticed was that it deadlocks when im standing still and just coding while not loading/generating chunks. I unbounded all of the channels but that also didnt really fix anything. this deadlock seems to be really stubborn. but i definitely think using CancellationToken is better than condvar just because CancellationToken was meant for this exact purpose of waiting or completing instantly if a permit has been deposited.

i moved the write task onto its own thread but it would probably be better to just move all write tasks for all levels together onto one thread instead of each their own. i found that even though its async io its still better to have them on their own thread and cause less slow ticks.

2026-01-08 00:09:14 [INFO] (33) Write took 71.524089ms
2026-01-08 00:09:29 [INFO] (33) io write thread receive chunks size 2731
2026-01-08 00:09:29 [INFO] (33) Write took 33.541264ms
2026-01-08 00:09:45 [INFO] (33) io write thread receive chunks size 2731
2026-01-08 00:09:45 [INFO] (33) Write took 107.093064ms
2026-01-08 00:10:00 [INFO] (33) io write thread receive chunks size 2731
2026-01-08 00:10:00 [INFO] (33) Write took 57.036952ms
2026-01-08 00:10:15 [INFO] (33) io write thread receive chunks size 2731
2026-01-08 00:10:15 [INFO] (33) Write took 12.304892ms
2026-01-08 00:10:31 [INFO] (33) io write thread receive chunks size 2731
2026-01-08 00:10:31 [INFO] (33) Write took 14.138839ms

if you have any more ideas on how this could be improved i would appreciate your advice!

but i made some improvements to the tick_data stuff and eliminated a lot of per tick allocation. i found that per tick the random tick calculation actually takes the most time...

for i in 0..chunk.section.sections.len() {
    for _ in 0..random_tick_speed {
        let r = rng.random::<u32>();
        let x_offset = (r & 0xF) as i32;
        let y_offset = ((r >> 4) & 0xF) as i32 - 32;
        let z_offset = (r >> 8 & 0xF) as i32;

        let random_pos = BlockPos::new(
            chunk_x_base + x_offset,
            i as i32 * 16 + y_offset,
            chunk_z_base + z_offset,
        );

        let block_id = chunk
            .section
            .get_block_absolute_y(x_offset as usize, random_pos.0.y, z_offset as usize)
            .unwrap_or(Block::AIR.default_state.id);

        if has_random_ticks(block_id) {
            random_ticks.push(ScheduledTick {
                position: random_pos,
                delay: 0,
                priority: TickPriority::Normal,
                value: (),
            });
        }
    }
}

sprinox · 2026-01-08T08:29:56Z

I agree that CancellationToken is better, but I can't figure out a safe implementation.

About deadlock, it seldom occurs in my pc. If anyone uses Linux and can reproduce deadlock, it will be a great help to enable the task dump to print all current task and find which one is deadlock.

I commented the code because I use Windows. By using it, you can get output like
#1086 (comment)

I suggest calling the dump function if shutdown takes longer than 300 seconds. I removed it earlier because it wouldn’t pass CI, so it needs to be added back locally.

sprinox · 2026-01-08T08:35:27Z

There is also some resources leaking.

I mentioned it at my previous pr.

This PR does not touch entities or entity chunks. I’m not sure what mechanism is currently used for entities, but it causes them to keep being processed even when the player is far away. As a result, chunks are repeatedly requested, temporarily loaded, and then unloaded by the scheduling thread, which severely impacts performance. So, when testing performance, try freezing the game tick.

CuzImClicks · 2026-01-08T22:15:50Z

bro i have been sitting here with two accounts for over an hour and this shit just wont deadlock

CuzImClicks · 2026-01-09T10:45:36Z

Could this be a ThreadPool becoming full, because we have a memory leak somewhere in the code? I think this might look the same as a dead lock.

We definitely have a memory leak which i was able to observe yesterday but it still didnt deadlock

CuzImClicks · 2026-01-09T16:54:22Z

@sprinox are you deadlocking on this pr on windows? Because i cant get it to deadlock on linux rn

sprinox · 2026-01-09T16:56:40Z

@sprinox are you deadlocking on this pr on windows? Because i cant get it to deadlock on linux rn

I have not test this pr. but I met deadlock on master branch.

sprinox · 2026-01-10T08:41:06Z

pumpkin-world/src/chunk_system.rs

-        let (send_gen, recv_gen) = crossfire::mpmc::bounded_blocking(gen_thread_count + 5);
-        let io_lock = Arc::new((Mutex::new(HashMapType::default()), Condvar::new()));
-        for _ in 0..oi_read_thread_count {
+        let (send_gen, recv_gen) = crossfire::mpmc::unbounded_blocking();


dont use unbounded. It will be filled with too many tasks which can't be cancelled.

The sender blocks tho when the channel is filled... i made it unbounded to see if it fixes deadlocks

sprinox · 2026-01-10T08:45:45Z

pumpkin-world/src/chunk_system.rs

    }

-    fn work(mut self, level: Arc<Level>) {
+    async fn work(mut self, level: Arc<Level>) {


i still believe that schedule thread should not be async. it is a cpu-heavy task. you can use blocking_lock to lock tokio mutex in normal thread.

also if it should be changed to async, please change blocking channel to async channel

it doesnt really matter if its async or not... since its on its own thread and i wanted to make it async so i can see on what part of execution it gets stuck. i might try and put all of the generation thread onto their multi threaded tokio runtime at some point but i havent started with that yet

In my opinion, sync function is much easier to debug than async. If sync function deadlock, we can simply pause at lldb or gdb, and see which line of code it stuck. But this don't work at async function. So I think async is really hard to debug.

Besides, async is much slower at recursion.

…sync_io_lock

- actually put write tasks onto their own threads - only one entity chunks thread per dimension

…sync_io_lock

NailLegProcessorDivide and others added 5 commits December 27, 2025 17:27

Reuse TickData allocations between cycles

f70a3c1

Move work to its own thread Make IOLock async to prevent more deadlocks

3962fb9

Acquire write lock only once

remove reference counting of chunk writes

828bfbd

Merge branch 'master' of https://github.com/Pumpkin-MC/Pumpkin into a…

c87e91e

…sync_io_lock

cargo clippy

d1dad1b

CuzImClicks added 4 commits January 5, 2026 14:17

Merge branch 'master' of https://github.com/Pumpkin-MC/Pumpkin into a…

c05c17c

…sync_io_lock

Merge branch 'master' of https://github.com/Pumpkin-MC/Pumpkin into a…

ad42711

…sync_io_lock

switch to cancellation token

562e947

improve speed of tick data and add ability to change random tick speed

8a8fad5

CuzImClicks and others added 3 commits January 6, 2026 18:20

Merge branch 'tickdata' into async_io_lock

0a241eb

optimize even more

14cd8d8

added reference counted writes back

9537363

track time that write takes

a6c33ef

CuzImClicks changed the title ~~Fix: Stop Reference Counting Writes to Reduce Deadlocks~~ fix: optimizations to try and reduce deadlocks Jan 9, 2026

sprinox reviewed Jan 10, 2026

View reviewed changes

CuzImClicks and others added 5 commits January 11, 2026 20:50

Merge branch 'master' of https://github.com/Pumpkin-MC/Pumpkin into a…

068737a

…sync_io_lock

made debug saves 50s but put writers onto a single thread (not working)

5e41684

Merge branch 'Pumpkin-MC:master' into async_io_lock

3519c03

make the send and recv chunk channel async aswell

5327124

Merge branch 'master' of https://github.com/Pumpkin-MC/Pumpkin into a…

8cd6c18

…sync_io_lock

CuzImClicks and others added 4 commits January 14, 2026 12:23

- explicitly initialize tokio runtime with num_cpus

f2ce9e0

- actually put write tasks onto their own threads - only one entity chunks thread per dimension

Merge branch 'master' into async_io_lock

b138da1

Merge branch 'master' of https://github.com/Pumpkin-MC/Pumpkin into a…

4179887

…sync_io_lock

clippy .-.

d32786b

Snowiiii closed this Jan 25, 2026

Uh oh!

fix: optimizations to try and reduce deadlocks #1256

fix: optimizations to try and reduce deadlocks #1256

Uh oh!

Conversation

CuzImClicks commented Jan 4, 2026

Description

Testing

Uh oh!

tn-lorenz commented Jan 4, 2026

Uh oh!

sprinox commented Jan 6, 2026

Uh oh!

CuzImClicks commented Jan 7, 2026

Uh oh!

sprinox commented Jan 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sprinox commented Jan 8, 2026

Uh oh!

CuzImClicks commented Jan 8, 2026

Uh oh!

CuzImClicks commented Jan 9, 2026

Uh oh!

CuzImClicks commented Jan 9, 2026

Uh oh!

sprinox commented Jan 9, 2026

Uh oh!

sprinox Jan 10, 2026

Choose a reason for hiding this comment

Uh oh!

CuzImClicks Jan 10, 2026

Choose a reason for hiding this comment

Uh oh!

sprinox Jan 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

CuzImClicks Jan 10, 2026

Choose a reason for hiding this comment

Uh oh!

sprinox Jan 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

sprinox commented Jan 8, 2026 •

edited

Loading

sprinox Jan 10, 2026 •

edited

Loading

sprinox Jan 10, 2026 •

edited

Loading