Case API v.2: Upsert on `external_id` #37118

kaapstorm · 2025-11-07T18:37:20Z

Technical Summary

Some context: SUPPORT-25970

(SaaS/CommCare ticket pending)

Adds upsert functionality on external_id for Case API v.2 Bulk Create/Update Cases.

See the update to documentation for details and warnings.

Curious to hear your thoughts on this @esoergel

Safety Assurance

Safety story

This change is limited to the Case API v.2
Change in behavior is only encountered if users do not follow existing documentation.

Automated test coverage

Yes

QA Plan

QA planned. Ticket pending.

Rollback instructions

This PR can be reverted after deploy with no further considerations

Labels & Review

Risk label is set correctly
The set of people pinged as reviewers is appropriate for the level of risk of the change

esoergel · 2025-11-10T12:36:34Z

corehq/apps/hqcase/api/updates.py

+                        domain,
+                        data['external_id'],
+                        raise_multiple=True,
+                    )  # Raises CommCareCase.MultipleObjectsReturned


This should return a 4xx error, as it's a client issue, not server

Or is that handled somewhere higher up?

I see it is handled higher up, as you've demonstrated in tests. Carry on!

esoergel · 2025-11-10T12:49:57Z

docs/api/cases-v2.rst

+If a case update includes a value for "external_id" and does not include
+a value for "case_id", then the "create" field may be omitted, and the


I went through a range of reactions in response to this PR - first was disapproval ("it removes useful protection against creating duplicates!"), then I thought about the external_id constraint and about where this work was coming from and I realized it does make a lot of sense - maybe this integration doesn't have perfect knowledge of the state of our system.

Then I read your caveats in this doc and see that you too have some concerns. An alternative idea would be to require an explicit opt-in to this behavior. The check described here isn't super intuitive. Ideally we'd make create a ternary operator, but that's hard in JSON - know if we can verify that it's present and set to null? Alternatively, it could be a second flag - create_or_update: true (or upsert: true, whichever term is more broadly understood). Then the check could still require that the client specify one of the two flags. That'd let us be sure of the behavior the client expects, without impacting existing integrations.

An alternative idea would be to require an explicit opt-in to this behavior. ... Ideally we'd make create a ternary operator, but that's hard in JSON.

"create": null

"create_or_update": true

"upsert": true

I like the idea of making the behavior opt-in.

Adding another flag feels clumsy, and using "create" as a ternary value feels unintuitive. That's why I settled on neither. But you make a very good point, that we need some way to ensure that this behavior is what the client wants. If I had to choose, I would go with "create": null. It avoids having to draw up logic tables like

create create_or_update Action

True (absent) Create

True True Upsert

True False Create

False (absent) Update

False True Update?

False False Update?

Oh yeah I was imagining that we'd require you to provide either create or create_or_update, but disallow both. I agree that it's clumsy - could use an enumerable but it's too late since this is already in-use - that'd have to be a v3 change. I also like "create": null - from a quick look at the code it doesn't seem too hard to implement either, basically just change

is_creation = data.pop('create', None)

to

is_creation = data.pop('create', ...)

and weave that in to what you've already done here.

Other thought - this applies only to bulk updates, right? Does the same issue also apply to individual updates? Think we need to carve out a similar exemption there?

I was wondering the same thing. I settled on implementing this for bulk updates only, because YAGNI. But moving this change into _get_individual_update() might offer more consistent functionality.

What do you think of 77c8fec ?

Adding my 2 cents, my vote is for "create": null . It seems like the cleanest and most robust design. It's explicit, which is better than the original "omit the flag" approach. But more importantly, it avoids the state-management complexity of adding a new "upsert": true flag . The API shouldn't require people to cross-validate two separate flags. Centralizing all three write-modes (true, false, null) onto the single "create" key is much cleaner.

I settled on implementing this for bulk updates only, because YAGNI
That is compelling, but I don't think I have strong opinions either way here (yet!)

esoergel · 2025-11-11T12:31:18Z

corehq/apps/hqcase/api/updates.py

+            if 'create' not in data:
                raise UserError("A 'create' flag is required for each update.")
-            updates.append(_get_individual_update(domain, data, user, is_creation))
+            updates.append(_get_individual_update(domain, data, user, is_creation=True))


I think I prefer either

Suggested change

updates.append(_get_individual_update(domain, data, user, is_creation=True))

updates.append(_get_individual_update(domain, data, user, is_creation=data.pop('create')))

or making the other caller of _get_individual_update insert create into data, so there's only one way to pass it in.

esoergel · 2025-11-11T12:48:28Z

corehq/apps/hqcase/api/updates.py

+        # in bulk and individual updates to override `is_creation` for
+        # upsert on external_id when `data['create']` is `None` and
+        # `data["external_id"]` is set and data["case_id"] is not set.
+        if data['create'] is None:


Another possible way of doing this is making a JsonCaseUpsert (a sibling of JsonCaseCreation and JsonCaseUpdate). The new logic here is working at a different abstraction level than the other parts of the function.

The main thing that gives me pause is that this adds logic to look up cases by external ID, but the JsonCase* classes already support that, and they do the lookup in bulk. See test_update_by_external_id for reference. The main distinction between the two classes is how they handle case IDs and external IDs (and which fields are required).

I'm trying to think of a really clean way to do this and am coming up short - the set of required fields for an upsert varies depending on whether it's an update or an insert, so maybe dispatching to one of the two is the best course of action. Or maybe JsonCaseUpsert.get_caseblock does something like:

def get_caseblock(self, case_db): case_id = case_db.get_by_external_id(self.external_id) update_class = JsonCaseCreation if not case_id else JsonCaseUpdate update = update_class.wrap(self) return update.get_caseblock(self, case_db)

Again, I'm not certain that would come out ideal, but I'll pass it back to you to get your take.

corehq/apps/hqcase/views.py

kaapstorm · 2025-11-20T19:53:43Z

When you have a chance @esoergel could you check that the last commits, specifically --

-- implement the changes we discussed. (I'm also happy to squash this branch into a handful of commits and open a new PR if you think that will be easier to review.)

esoergel · 2025-11-21T11:02:10Z

corehq/apps/hqcase/tests/test_case_api_bulk_get.py

+    def test_get_bulk_duplicate_external_id(self):
+        result = get_bulk(self.domain, self.web_user, external_ids=['test2'])
+
+        # When multiple cases have the same external_id, only one is


Huh, I guess if you try to get the same case twice, once by case ID and once by external ID, that'd probably show up twice, yeah?

esoergel · 2025-11-21T11:08:00Z

corehq/apps/api/urls.py

        messaging_events, name="api_messaging_event_detail"),
    url(r'v0\.6/case/bulk-fetch/$', case_api_bulk_fetch),
    url(r'case/v2/bulk-fetch/$', case_api_bulk_fetch, name='case_api_bulk_fetch'),
    # match v0.6/case/ AND v0.6/case/e0ad6c2e-514c-4c2b-85a7-da35bbeb1ff1/ trailing slash optional


In retrospect, I wonder if I could've just added an additional URL pattern instead of a complicated regex

esoergel · 2025-11-21T11:10:34Z

corehq/apps/api/urls.py

+    url(r'v0\.6/case/ext/(?P<external_id>[\w\-]+)/$', case_api),
+    url(r'case/v2/ext/(?P<external_id>[\w\-]+)/$', case_api),


Does this need to go above the other two lines? Guessing not, because it has an extra slash? Does the [\w\-]+ character set match what we allow for external ID elsewhere? Do we allow ., for instance?

🤔 We could extend the pattern ~~and then URL-decode the value~~ to allow any Unicode characters in an external ID. (Edit: Isn't it already URL decoded?)

esoergel · 2025-11-21T11:12:08Z

corehq/apps/hqcase/tests/test_case_api_bulk_get.py

+        case_block = CaseBlock(
+            case_id=case_id,
+            case_type='player',
+            case_name='Irina Levitina',


esoergel · 2025-11-21T11:12:59Z

corehq/apps/hqcase/tests/test_case_api_bulk_get.py

        assert res.status_code == 200
        assert res.json()['case_id'] == self.case_ids[0]

+    def test_get_single_case_by_external_id(self):


Nit: this file is called test_case_api_bulk_get, but this functionality is not part of that

esoergel · 2025-11-21T16:49:05Z

corehq/apps/hqcase/views.py

+        elif data['case_id'] != case['case_id']:
+            return JsonResponse(
+                {'error': 'The given value of "case_id" does not match the '
+                          'existing value for the case identified by '
+                          f'external_id = "{external_id}". "case_id" is '
+                          'read-only and cannot be modified.'},
+                status=400,
+            )


Good idea to call this out specifically

esoergel · 2025-11-21T17:06:29Z

corehq/apps/hqcase/tests/test_api_updates.py

+
+
+@sharded
+class TestIndividualUpdate(TestCase):


For what this test covers, I don't think it needs to be a DB test - it doesn't actually submit any forms or create cases, so the tearDown is superfluous, and it looks like _get_individual_update doesn't actually even use the domain arg (must be leftover from some old code shuffling). All it needs user for is for the user ID, so that could easily be mocked, or switch the arg to user_id

esoergel · 2025-11-21T17:08:54Z

corehq/apps/hqcase/tests/test_api_updates.py

+        FormProcessorTestUtils.delete_all_cases(self.domain)
+        FormProcessorTestUtils.delete_all_xforms(self.domain)


This likewise doesn't actually submit any forms

esoergel · 2025-11-21T17:13:31Z

corehq/apps/hqcase/tests/test_views.py

+@privilege_enabled(privileges.API_ACCESS)
+@flag_enabled("API_THROTTLE_WHITELIST")
+@patch("corehq.apps.hqcase.api.updates.validate_update_permission", MagicMock())
+class TestCaseUpdateView(TestCase):


Nit: test_case_update_api.py does view level testing with the django test client (against the case update API) - might be able to share setup by moving some of this there.

esoergel · 2025-11-21T17:39:05Z

corehq/apps/hqcase/api/updates.py

+            if 'create' not in data:
                raise UserError("A 'create' flag is required for each update.")
+            is_creation = data.pop('create')
+            if is_creation is None:


I still feel like this earlier comment applies:
#37118 (comment)

This does a DB hit in a loop, and this function otherwise doesn't hit the DB. If you look at the current implementation of handle_case_update you can see the data pipeline I'd imagined for updates:

Wrap the raw (mostly) JSON provided by the user, which does some basic validation

Initialize a state object case_db that will perform all necessary lookups as optimally as possible (interacting with the wrapped updates to know what to fetch)

Perform a location validation step that requires this state

Get the caseblocks from the updates, using the case_db to do whatever ID lookups are necessary. This may also error if a case is missing.

The location validation step was a later addition - if I were to redo it, the DB lookups would happen on the stateful DB class, and the actual checks would happen on the update class, either in a new validate method or as part of get_caseblock.

I think ideally this would somehow fit into that pattern, where this function only wraps the JSON in an update class that will require a case_db instance to fully evaluate the caseblock. That way it fits the pattern of the other flows through this code, and uses the same bulk ID lookup.

Thanks @esoergel . Could you take a look at a1616f4 and let me know if that's in line with what you have in mind?

(Also, how do you feel about dropping the domain parameter of _get_individual_update(), _get_upsert_update() and _get_bulk_updates()?)

Could you take a look at a1616f4 and let me know if that's in line with what you have in mind?

Yeah I love it - that's great!

dropping the domain parameter

Yeah sounds good to me - YAGNI and all that. Made a mention of it in this comment actually. Could also switch user to user_id while you're at it and save some trouble with test mocks.

esoergel

I have a bunch of open comments from my last review, but I think they're all pretty nitpicky and small. I like the big picture approach overall

esoergel · 2025-12-22T15:44:54Z

corehq/apps/hqcase/api/updates.py

+    if 'case_id' in data:
+        raise UserError("UPSERT does not allow case_id to be specified")
+    if not data.get('external_id'):
+        raise UserError("UPSERT requires external_id to be specified")


I think these two are redundant with validation done in JsonCaseUpsert, right?

esoergel · 2025-12-22T15:50:02Z

corehq/apps/hqcase/api/updates.py

+            if 'create' not in data:
                raise UserError("A 'create' flag is required for each update.")
+            is_creation = data.pop('create')
+            if is_creation is None:


Could you take a look at a1616f4 and let me know if that's in line with what you have in mind?

Yeah I love it - that's great!

dropping the domain parameter

Yeah sounds good to me - YAGNI and all that. Made a mention of it in this comment actually. Could also switch user to user_id while you're at it and save some trouble with test mocks.

esoergel · 2025-12-22T16:04:41Z

corehq/apps/hqcase/api/updates.py

+    """Handles UPSERT operations where create/update is determined at lookup time."""
+    external_id = jsonobject.StringProperty(required=True)
+
+    # Required because we might create


Think this is fine from a user's perspective? We could in theory be a little more flexible and only require it if an update does turn out to be a create, though maybe there's no real need. Like, if they know it's not a create, then maybe they shouldn't be upserting. Is that about what you were thinking?

esoergel · 2025-12-22T16:27:47Z

corehq/apps/hqcase/api/updates.py

+    case_type = jsonobject.StringProperty(required=True)
+    owner_id = jsonobject.StringProperty(required=True)
+
+    _is_case_creation = None  # Determined when get_case_id() is called


This introduces a temporal coupling - attempting to access _is_case_creation before it's set could cause confusing issues. You could avoid this by raising an error, something like this:

Suggested change

_is_case_creation = None # Determined when get_case_id() is called

_raw_is_case_creation = ... # Determined when get_case_id() is called

@property

def _is_case_creation(self):

if self._raw_is_case_creation is ...:

raise Exception("_is_case_creation has not yet been initialized")

return self._raw_is_case_creation

Though I think this currently depends on that ordering - ids_to_find sees None for all upserts, and get_caseblock sees the final value. Another approach might be to use two different things for those - in fact, I see there's already a is_new_case property that just returns this value. Here's an idea:

class JsonCaseCreation: is_new_case = True class JsonCaseUpdate: is_new_case = True class JsonCaseUpsert: def is_new_case(self): ... # Something like the suggestion above ids_to_find = { update.external_id for update in self.updates if not isinstance(update, JsonCaseCreation) and not update.case_id } | {...}

kaapstorm added 3 commits November 7, 2025 17:28

Add upsert on external_id to _get_bulk_updates()

ec8bb03

Tests

34a70e7

Documentation

c7e8be2

kaapstorm added the product/invisible Change has no end-user visible impact label Nov 7, 2025

esoergel reviewed Nov 10, 2025

View reviewed changes

kaapstorm added 2 commits November 10, 2025 20:49

Move upsert logic into _get_individual_update()

77c8fec

Update tests

b5c6837

esoergel reviewed Nov 11, 2025

View reviewed changes

kaapstorm added 6 commits November 19, 2025 13:58

Pythonic asserts

3413aa6

Add tests for get_bulk()

92ed620

Add GET endpoint for external_id

ac5532e

Implement PUT for external_id

0397732

Move UPSERT logic back to _get_bulk_updates()

08a8cff

Add and update tests

f3f38b8

dimagimon added the Risk: High Change affects files that have been flagged as high risk. label Nov 20, 2025

github-advanced-security bot found potential problems Nov 20, 2025

View reviewed changes

corehq/apps/hqcase/views.py Show resolved Hide resolved

corehq/apps/hqcase/views.py Dismissed Show dismissed Hide dismissed

kaapstorm marked this pull request as ready for review November 20, 2025 02:29

esoergel reviewed Nov 21, 2025

View reviewed changes

kaapstorm added 4 commits December 16, 2025 11:55

isort

0ba43b9

JsonCaseUpsert

a1616f4

Add tests

7ed69d5

Pythonic asserts

e66b190

esoergel reviewed Dec 22, 2025

View reviewed changes

		If a case update includes a value for "external_id" and does not include
		a value for "case_id", then the "create" field may be omitted, and the

`create`	`create_or_update`	Action
True	(absent)	Create
True	True	Upsert
True	False	Create
False	(absent)	Update
False	True	Update?
False	False	Update?

	updates.append(_get_individual_update(domain, data, user, is_creation=True))
	updates.append(_get_individual_update(domain, data, user, is_creation=data.pop('create')))

		url(r'v0\.6/case/ext/(?P<external_id>[\w\-]+)/$', case_api),
		url(r'case/v2/ext/(?P<external_id>[\w\-]+)/$', case_api),

		FormProcessorTestUtils.delete_all_cases(self.domain)
		FormProcessorTestUtils.delete_all_xforms(self.domain)

-    _is_case_creation = None  # Determined when get_case_id() is called
+    _raw_is_case_creation = ...  # Determined when get_case_id() is called
+    @property
+    def _is_case_creation(self):
+        if self._raw_is_case_creation is ...:
+            raise Exception("_is_case_creation has not yet been initialized")
+        return self._raw_is_case_creation

Uh oh!

Case API v.2: Upsert on external_id #37118

Are you sure you want to change the base?

Case API v.2: Upsert on external_id #37118

Uh oh!

Conversation

kaapstorm commented Nov 7, 2025 • edited by atlassian bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Technical Summary

Safety Assurance

Safety story

Automated test coverage

QA Plan

Rollback instructions

Labels & Review

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

kaapstorm commented Nov 20, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kaapstorm Dec 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

esoergel left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Case API v.2: Upsert on `external_id` #37118

Case API v.2: Upsert on `external_id` #37118

kaapstorm commented Nov 7, 2025 •

edited by atlassian bot

Loading

kaapstorm Dec 31, 2025 •

edited

Loading