BCDA-8777: Update alert destinations for workflows and lambdas by laurenkrugen-navapbc · Pull Request #1175 · CMSgov/bcda-app

laurenkrugen-navapbc · 2025-08-04T20:22:37Z

🎫 Ticket

https://jira.cms.gov/browse/BCDA-8777

🛠 Changes

Admin Lambdas:

Updated slack alerts to be triggered in the handle function
All status updates will go to operations channel

Workflows:

Deploy related workflows have had destination channels updated to deploy channel

General:

Added slack utils package and constants for slack messages to reduce duplicate code

ℹ️ Context

Alerting channels are currently getting status and informational messages that don't pertain to alerts. To better manage alerts and critical infrastructure related monitoring, our alerting channel should be free of messages that do not indicate an issue. Deploy related updates will only be in the deploy channel and admin lambdas will report status updates to the operations channel, including run failures, since those are expected.

🧪 Validation

Integration tests are passing, workflows in expected channels. Looking to verify:

lambdas that run on a schedule (non-admin) route failures to the alerts channel but route success to operations
admin lambdas route all statuses to operations
deploy workflows and related workflows route only to deploy channel
Starting.. messages removed to reduce noise

.github/workflows/build-and-package-all.yml

laurenkrugen-navapbc · 2025-08-05T19:33:02Z

bcda/lambda/admin_aco_deny/main.go

 	SlackToken string
 }

-type Notifier interface {


Seems like this was only in use so it could be mocked during tests. I moved the slack postMessage function to the handler call to simplify things so this was no longer needed.

laurenkrugen-navapbc · 2025-08-05T19:35:11Z

bcda/lambda/admin_aco_deny/main.go

 }

-func handleACODenies(ctx context.Context, conn PgxConnection, data payload, notifier Notifier) error {
-	_, _, err := notifier.PostMessageContext(ctx, slackChannel, slack.MsgOptionText(


the notifier.PostMessageContext + the error check has been removed for each lambda and replaced with SendSlackMessage. Mostly done for readability and removing duplicate lines.

carlpartridge · 2025-08-05T20:22:52Z

bcda/slack/slack.go

+	color := "danger"
+	if status {
+		color = "good"
+	}


Nit: Dont think we need to have Slack in the name of the function (SendSlackMessage -> SendMessage).

Nit: Not a big fan of status (bool) param type, it seems to exist just to differentiate between the color? I know its more verbose but Id prefer a SendSuccess(...) function and a SendFailure(...) function but interested to hear what other people think. I think at the very least we could change the param name to something more appropriate.

Overall this feels like a fairly thin wrapper around the actual slack client and its interface. To some degree I feel like its a bit excessive but it does also slim various use cases down a fair amount. I think this concern is shown by what happens if we need to add a bunch of customization/configuration to the messages we want to send. This wrapper then gets built up until we more or less just have the slack client all over again. However as our use case is now, this does save us quite a bit of verbose references elsewhere.

Part of the goal of notifier was to decouple the thing we want to happen (notification of success/failure) from the implementation tool (slack). If/when we need to change implementations it should be a bit more straightforward. I dont think the notifier was a good implementation of this, but maybe theres a good middle ground? One possibility would be to change this slack_utils package to a notifier package, with a slack implementation/wrapper.

yeah, I agree that the wrapper is pretty thin and if we needed something more verbose in the future, then we would have to modify it to suit the customizations, etc. I mostly created it because the notifier statements with the error handling were repetitive and added a lot of noise to the functions.

I ultimately decided against making it a little more flexible for future changes because 1) I don't really see this changing much in the future and 2) this wrapper is so light that a modification to it would have been equivalent in lift to modifying every notifier statement that we had or reverting it would been similar.

I did the green/red bool because I removed the informational "starting.." messages but if we want the neutral, warning, etc, we can always updated. I'd be happy to break it out into two functions, or I can modify so the color is optional.

Just adding my 2 cents here, but we wouldn't want a whole lot of customization even in the future. If we do need it at a particular time, we can update it then. Doing it now feels a bit premature optimization and against the principle of Yagni

Yeah, I assumed we weren't going to be needing a lot of changes in the future which is why I went with the bool red/green strategy.

@carlpartridge - what are we thinking in terms of leaving this wrapper in here? Seemed like a good way to enforce consistent slack messages and clean up redundant code, but I am open if we feel strongly that this isn't the best approach.

I think things are fine as is. I think one or some of the following could help though:

rename bool var (status -> success_msg? color?)

change bool var to string and just pass in the color option

default bool value to success instead of danger

create a SendSuccess() and a SendFailure() function that handle the bool

instead of passing a bool in pass in an options object, with color as one of the options (again this is rewriting the slack package so in addition to being more work is probably not worth it)

carlpartridge · 2025-08-05T20:36:30Z

bcda/lambda/admin_create_aco/main.go

-		err = handleCreateACO(ctx, conn, data, id, slackClient)
+		err = handleCreateACO(ctx, conn, data, id)
 		if err != nil {
+			slUtls.SendSlackMessage(slackClient, slUtls.OperationsChannel, fmt.Sprintf("%s: Create ACO lambda in %s env.", slUtls.FailureMsg, os.Getenv("ENV")), false)


One concern about moving notification messages out of the business logic (eg handleCreateACO) is that we can no longer test/verify that we are sending notifications as part of the process. Im not sure if that matters too much overall.

My general thinking around lambdas is that the handler should do as little as possible since its very difficult to properly test. Basically it should be setting up the requirements/environment in which to run the actual business logic. This allows us to verify as much as possible with unit tests and such.

that's fair - there's definitely a gap in testing with slack if we are to make changes in the future, some even outside the scope of testing here. Ex: while moving some of these alerts to a different channel, the slackbot didn't have access to the channel yet. The error was logged, but that was it.

possible solutions:

revert these changes altogether and put the notifiers back in

do some error handling for the slack messages (which didn't exist before - just logging it like it is now)

manual review every so often of alerts to verify that they aren't missed from our critical services

nothing; if modifications are made to these lambdas, reviewers and PR owner should be validating that integration tests work, and lambda runs aren't logging errors, etc

I believe the reason we only log an error when slack fails to send a message is that we dont want to fail the function on an issue with slack. Overall I see the slack messages in these lambda contexts as a nice to have. Keeps us more aware of whats happening, but losing some isnt the end of the world.

yup, and we will still get logs if it fails - I was able to debug because it failed to send the message to the new slack channel yesterday. Let me know if you'd like me to revert -- I made a call to keep things as they are because the impact would be super low and I don't expect many changes to this in the future. The mock we used in testing also assumed a happy path, so I felt like it was no risk to remove it.

I'm open to any changes!

bcda/lambda/admin_aco_deny/main.go

carlpartridge

I would like some tests for slack_utils, other than that things look good!

laurenkrugen-navapbc · 2025-08-07T20:36:34Z

I would like some tests for slack_utils, other than that things look good!

sounds good, adding tests now!

bcda/slack/slack.go

laurenkrugen-navapbc added 5 commits August 4, 2025 11:37

update alerting for admin lambdas

21d08dc

update deploy workflow slack alerts for failure

8ac9d39

test create aco message

f897253

update lambdas and archive and clean job

8ee535a

update clean and archive alerting channel

fbfbbbd

laurenkrugen-navapbc commented Aug 5, 2025

View reviewed changes

.github/workflows/build-and-package-all.yml Show resolved Hide resolved

laurenkrugen-navapbc commented Aug 5, 2025

View reviewed changes

laurenkrugen-navapbc added 2 commits August 5, 2025 12:37

remove unused constants, clean up conditional

a3cd07c

linting

07cbba7

laurenkrugen-navapbc requested a review from a team August 5, 2025 19:40

laurenkrugen-navapbc marked this pull request as ready for review August 5, 2025 19:40

laurenkrugen-navapbc requested a review from a team as a code owner August 5, 2025 19:40

carlpartridge reviewed Aug 5, 2025

View reviewed changes

bhagatparwinder reviewed Aug 7, 2025

View reviewed changes

bcda/lambda/admin_aco_deny/main.go Outdated Show resolved Hide resolved

laurenkrugen-navapbc and others added 2 commits August 7, 2025 10:35

name updates

733d4df

Merge branch 'main' into lauren/BCDA-8777

eb45e76

carlpartridge reviewed Aug 7, 2025

View reviewed changes

laurenkrugen-navapbc added 2 commits August 7, 2025 15:50

add slack test

b901e23

linting

ecd272d

carlpartridge previously approved these changes Aug 8, 2025

View reviewed changes

michaeljvaldes reviewed Aug 8, 2025

View reviewed changes

bcda/slack/slack.go Outdated Show resolved Hide resolved

rename slack package for consistency

05a6c87

laurenkrugen-navapbc dismissed carlpartridge’s stale review via 05a6c87 August 8, 2025 16:47

michaeljvaldes approved these changes Aug 8, 2025

View reviewed changes

carlpartridge approved these changes Aug 8, 2025

View reviewed changes

Merge branch 'main' into lauren/BCDA-8777

126a46e

laurenkrugen-navapbc merged commit 5b3a227 into main Aug 8, 2025
11 of 12 checks passed

laurenkrugen-navapbc deleted the lauren/BCDA-8777 branch August 8, 2025 17:52

Conversation

laurenkrugen-navapbc commented Aug 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🎫 Ticket

🛠 Changes

ℹ️ Context

🧪 Validation

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

carlpartridge left a comment

Choose a reason for hiding this comment

Uh oh!

laurenkrugen-navapbc commented Aug 7, 2025

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

laurenkrugen-navapbc commented Aug 4, 2025 •

edited

Loading