Skip to content

Conversation

@ArjunNarendra
Copy link
Contributor

  • Slim down test suite to its essentials
  • Move common source code logic into core.py
  • Add support for SqlAlchemy

dependencies = [
"azure-identity>=1.13.0",
"azure-core>=1.24.0",
"aiohttp>=3.8.0",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this required for all implementations? Otherwise it should go in the optional dependency that requires it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this! Both psycopg3 and sqlalchemy require it because they support asynchronous token retrieval via get_entra_conninfo_async, so aiohttp should go into their respective optional dependencies.

connection_factory=entra_connection_factory
host=SERVER,
database=DATABASE,
connection_factory=SyncEntraConnection
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to remove the Sync naming everywhere for psycopg2 since we aren't implementing async.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense - done

try:
# We pass in the AsyncEntraConnection class to enable Entra authentication for the
# PostgreSQL database by acquiring an Azure access token, extracting a username from the token, and using
# the token itself (with the PostgreSQL scope) as the password.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment isn't clear and seems wrong.

"We pass in AsyncEntraConnection to enable Entra...by aquiring an Azure access token .. and using the token itself as the password". This is confusing. The latter part talks about the implementation details of how AsyncEntraConnection work. This might confuse users who are just looking to use the functionality - reading this quickly I might think I need to get a token and pass it in as the password. Suggest rewording to focus on the required parameter, and a brief high level description of how the pool uses the parameter, potentially with a link to psycopg documentation about that capability.

e.g. (needs better wording):

 # We pass in the AsyncEntraConnection class to enable Entra authentication for accessing the
 # database. This is the connection class that will be used whenever the pool creates a new connection,
 # and this connection type will manage the Entra authentication token to ensure a valid token is used
 # when connecting.

Copy link
Contributor Author

@ArjunNarendra ArjunNarendra Oct 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reworded this comment to focus more on what the code does rather than how it does it. I made this change in all relevant places across the three drivers. I also added links to the documentation for the user to learn more about these concepts.

import logging
import json
import base64
import jwt
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I implemented JWT parsing directly is:

  • A) adding dependencies as a library is always a disadvantage. The more libraries your particular library brings in, the more potential for conflicting versions with the user's environment, duplication with other libraries the user might be using, more contribution to code bloat, and more dependency management. For this reason, when implementing a library, the ideal is to use as few dependencies as possible. This of course needs to be balanced with hand-rolling code, which is often not ideal. However:
  • B) The JWT token content and the way to parse it out is very stable. This isn't a moving target - it's a bit complex of logic, but it's a standardized algorithm that won't change.

Given those two above I decided that hand-rolling the implementation (or I should say, having GPT roll it, which I have high confidence in its capability of doing), saves us a dependency and doesn't have high risk of producing less effective code than the library or a lot of code that needs maintenance. I still believe that to be the correct decision for this library.

Copy link
Contributor Author

@ArjunNarendra ArjunNarendra Oct 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand - thanks for the thorough explanation. Hand-rolling the library code makes sense in this case, although it was not intuitive to me at first. I have went back to using the hand-rolled code over using the jwt library.

"azure-identity>=1.13.0",
"azure-core>=1.24.0",
"aiohttp>=3.8.0",
"PyJWT>=2.0.0"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment in src/azurepg_entra/core.py.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep I got rid of both of these.


For example:

from azure_pg_entra import AsyncEntraConnection
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix import; make sure all examples work. Most robust way to do this is to copy out the examples and run them as code in e.g. a notebook to ensure correctness before PR is merged.

Also - similar comment above to file naming.

Also - I think SyncEntraConnection isn't right, it doesn't follow the naming convention of the connection pooling itself. It's ConnectionPool and AsyncConnectionPool not SyncConnectionPool and AsyncConnectionPool. Let's follow the naming.

I would split the two classes up into two files, entra_connection.py and async_entra_connection.py.

Copy link
Contributor Author

@ArjunNarendra ArjunNarendra Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually removed the examples because I thought the examples in the samples folder were sufficient (although I verified the examples in the samples folder). I have also modified the class and file naming according to your suggestions.

"""
SQLAlchemy integration for Azure PostgreSQL with Entra ID authentication.

This module provides seamless integration between SQLAlchemy and Azure Entra ID
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-seamless

overused marketing word

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got rid of it.

logger.debug("User and password already present, skipping Entra authentication")
except Exception as e:
logger.error(f"Failed to get Entra credentials: {e}")
raise
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs specific error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got rid of it because we do error handling inside the method

if not has_password and "password" in entra_creds:
cparams["password"] = entra_creds["password"]

logger.debug(f"Provided Entra credentials for user: {entra_creds.get('user', 'unknown')}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we add debug logging here, we should add it to the other implementations as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed debug logging here. I was only using it in one other place in core.py.

# For async engines, we need to handle the async credential fetching
try:
# Try to get the current event loop
asyncio.get_running_loop()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels out of band.

Is there no way to add an async event handler? I see engine.sync_engine, is there an engine.async_engine that would allow us to do a async def provide_token_async? We don't want to be managing loops and thread pools in the function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. As we discussed, I will simplify this and use get_entra_conninfo even in the async case.

- name: Run tests
working-directory: ./python
run: |
pytest tests/ -v --tb=short No newline at end of file
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consolidate this into a script that can be run locally to give devs the confidence that CI will pass.

Consider "scripts to rule them all" - see pgts or other Matt/Rob repositories for examples of modified implementation.

Need to consider how to handle these scripts and GitHub Actions based on the different languages in this project. Can we run workflows just on changes to the language folders? Should there be scripts/ under each language? Design challenge.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added GitHub actions for python and dotnet that are triggered when changes are made to their respective folders. I have also added corresponding scripts for developers to run before they push up a PR.

python/README.md Outdated
print(f"Pool connection as: {user} at {time}")
print(f"Connected as: {user} at {time}")
finally:
pool.close()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should move back to with instead of try/catch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

python/README.md Outdated
print(f"Pool connection as: {user} at {time}")
print(f"Async connected as: {user} at {time}")
finally:
await pool.close()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above - use with

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

## SQLAlchemy Integration

SQLAlchemy integration uses psycopg3 as the backend driver with automatic Entra ID authentication.
SQLAlchemy integration uses psycopg3 as the backend driver with automatic Entra ID authentication through event listeners.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth linking to the sqlalchemy docs for the event hooks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

print(f"Async SQLAlchemy connected as: {user} at {time}")

# Async ORM usage
from sqlalchemy.ext.asyncio import async_sessionmaker
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to have this type of usage as an example; move to samples and have brief samples in README to give reader context, and link out to samples for more broad treatment.

Copy link
Contributor Author

@ArjunNarendra ArjunNarendra Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the example with sessionmaker to samples and focused more on how to use/enable the Entra libraries in README and saved the more rehearsed aspects like querying the database to the samples.

UsernameExtractionError,
ScopePermissionError,
) as e:
print(repr(e))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

print

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

TokenDecodeError,
UsernameExtractionError,
ScopePermissionError,
) as e:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

catch bare Exception

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


This function registers an event listener that automatically provides
Entra ID credentials for each database connection if they are not already set.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarify here and elsewhere that although this is async, event handlers don't provide async so the fetch token call will still be sync

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 55 to 56
) as e:
print(repr(e))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

# Example usage and test runner


class TestEnableEntraAuthentication:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, think through a new testing strategy that can reduce mocking code that should be targeted by tests.

Ideas:

  • docker Postgres
  • mock at the last possible moment - get_entra_token could be patched to return test db password to mimick token generation

@@ -0,0 +1,117 @@
param(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scripts should be in scripts/

@lossyrob lossyrob merged commit ff71687 into main Oct 16, 2025
2 checks passed
@lossyrob lossyrob deleted the user/an/python-revisions branch October 16, 2025 18:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants