Commit 7e72ec2
fix: race condition in shared runtime services (#37825)
There is a singleton SplitMongoModuleStore instance that is returned (wrapped in a MixedModuleStore) whenever we call the ubiquitous modulestore() function. During initialization, SplitMongoModuleStore sets up a small handful of XBlock runtime services that are intended to be shared globally: i18n, fs, cache. When we get an individual block back from the store using get_item(), SplitMongoModuleStore creates a SplitModuleStoreRuntime using SplitMongoModuleStore.create_runtime(). These runtimes are intended to be modified on a per-item, and later per-user, basis (using prepare_runtime_for_user()).

Prior to this commit, the create_runtime() method was assigning the globally shared SplitMongoModuleStore.services dict directly to the newly instantiated SplitModuleStoreRuntime. This meant that even though each block had its own _services dict, they were all in fact pointing to the same underlying object. This exposed us to the risk of multiple threads contaminating each other's SplitModuleStoreRuntime services when deployed under load in multithreaded mode. We believe this led to a race condition that caused student submissions to be mis-scored in some cases.

This commit makes a copy of the SplitMongoModuleStore.services dict for each SplitModuleStoreRuntime. The baseline global services are still shared, but per-item and per-user services are now better isolated from each other.

This commit also includes a small modification to the PartitionService, which up until this point had relied on the (incorrect) shared-instance behavior. The details are provided in the comments in PartitionService.__init__().

It's worth noting that the historical rationale for having a singleton ModuleStore instance is that the ModuleStore used to be extremely expensive to initialize: at one point, the init process required reading entire XML-based courses into memory, or pre-computing complex field inheritance caches. This is no longer the case, and SplitMongoModuleStore initialization is in the 1-2 ms range, with most of that being PyMongo's connection setup. We should try to fully remove the global singleton in the Verawood release cycle in order to make this kind of bug less likely.
1 parent 474dc71 commit 7e72ec2
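
To make the failure mode concrete, here is a minimal illustrative sketch (not code from this commit; the class and service names are simplified stand-ins) of how handing the same dict to every runtime lets one request's per-user service leak into another's:

# Illustrative sketch only; simplified stand-in names.
class Runtime:
    def __init__(self, services):
        # Buggy version: aliases the shared dict instead of copying it.
        self._services = services

SHARED_SERVICES = {"i18n": "i18n-service", "fs": "fs-service", "cache": "cache-service"}

runtime_a = Runtime(SHARED_SERVICES)
runtime_b = Runtime(SHARED_SERVICES)

# A thread handling user A attaches a user-specific service...
runtime_a._services["partitions"] = "partitions-for-user-A"

# ...and a thread handling user B now sees user A's service, because
# both _services attributes point at the same underlying object:
assert runtime_b._services["partitions"] == "partitions-for-user-A"
assert runtime_a._services is runtime_b._services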

File tree: 2 files changed (+46 −3 lines)

xmodule/modulestore/split_mongo/split.py

Lines changed: 5 additions & 1 deletion

@@ -3283,7 +3283,11 @@ def create_runtime(self, course_entry, lazy):
         """
         Create the proper runtime for this course
         """
-        services = self.services
+        # A single SplitMongoModuleStore may create many SplitModuleStoreRuntimes,
+        # each of which will later modify its internal dict of services on a per-item and often per-user basis.
+        # Therefore, it's critical that we make a new copy of our baseline services dict here,
+        # so that each runtime is free to add and replace its services without impacting other runtimes.
+        services = self.services.copy()
         # Only the CourseBlock can have user partitions. Therefore, creating the PartitionService with the library key
         # instead of the course key does not work. The XBlock validation in Studio fails with the following message:
         # "This component's access settings refer to deleted or invalid group configurations.".

xmodule/partitions/partitions_service.py

Lines changed: 41 additions & 2 deletions

@@ -99,8 +99,47 @@ class PartitionService:
     with a given course.
     """

-    def __init__(self, course_id, cache=None, course=None):
-        self._course_id = course_id
+    def __init__(self, course_id: CourseKey, cache=None, course=None):
+        """Create a new PartitionService. This is user-specific."""
+
+        # There is a surprising amount of complexity in how to save the
+        # course_id we were passed in this constructor.
+        if course_id.org and course_id.course and course_id.run:
+            # This is the normal case, where we're instantiated with a CourseKey
+            # that has org, course, and run information. It will also often have
+            # a version_guid attached in this case, and we will want to strip
+            # that off in most cases.
+            #
+            # The reason for this is that the PartitionService is going to get
+            # recreated for every runtime (i.e. every block that's created for a
+            # user). Say you do the following:
+            #
+            # 1. You query the modulestore's get_item() for block A.
+            # 2. You update_item() for a different block B.
+            # 3. You publish block B.
+            #
+            # When get_item() was called, a SplitModuleStoreRuntime was created
+            # for block A and it was given a CourseKey that had the version_guid
+            # encoded in it. If we persist that CourseKey with the version_guid
+            # intact, then it will be incorrect after B is published, and any
+            # future access checks on A will break because it will try to query
+            # for a version of the course that is no longer published.
+            #
+            # Note that we still need to keep the branch information, or else
+            # this wouldn't work right in preview mode.
+            self._course_id = course_id.replace(version_guid=None)
+        else:
+            # If we're here, it means that the CourseKey we were sent doesn't
+            # have an org, course, and run. A much less common (but still legal)
+            # way to query by CourseKey involves a version_guid-only query, i.e.
+            # everything is None but the version_guid. In this scenario, it
+            # doesn't make sense to remove the one identifying piece of
+            # information we have, so we just assign the CourseKey without
+            # modification. We *could* potentially query the modulestore
+            # here and get the more normal form of the CourseKey, but that would
+            # be much more expensive and require database access.
+            self._course_id = course_id
+
         self._cache = cache
         self.course = course

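
A sketch of the two CourseKey shapes this constructor distinguishes, assuming the opaque-keys library's CourseLocator (the hex version id below is made up for illustration):

from opaque_keys.edx.locator import CourseLocator

# Normal case: org/course/run are present, possibly with a pinned version.
key = CourseLocator(org="edX", course="DemoX", run="2024",
                    branch="draft-branch",
                    version_guid="519665f6223ebd6980884f2b")
stripped = key.replace(version_guid=None)
assert stripped.version_guid is None        # pinned version dropped
assert stripped.branch == "draft-branch"    # branch kept for preview mode

# version_guid-only case: nothing else identifies the course, so the
# service stores the key unmodified.
vg_only = CourseLocator(version_guid="519665f6223ebd6980884f2b")
assert not (vg_only.org and vg_only.course and vg_only.run)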
