-
Notifications
You must be signed in to change notification settings - Fork 1
Update Python support #6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
python/pyproject.toml
Outdated
| dependencies = [ | ||
| "azure-identity>=1.13.0", | ||
| "azure-core>=1.24.0", | ||
| "aiohttp>=3.8.0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this required for all implementations? Otherwise it should go in the optional dependency that requires it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this! Both psycopg3 and sqlalchemy require it because they support asynchronous token retrieval via get_entra_conninfo_async, so aiohttp should go into their respective optional dependencies.
| connection_factory=entra_connection_factory | ||
| host=SERVER, | ||
| database=DATABASE, | ||
| connection_factory=SyncEntraConnection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename to remove the Sync naming everywhere for psycopg2 since we aren't implementing async.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense - done
| try: | ||
| # We pass in the AsyncEntraConnection class to enable Entra authentication for the | ||
| # PostgreSQL database by acquiring an Azure access token, extracting a username from the token, and using | ||
| # the token itself (with the PostgreSQL scope) as the password. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment isn't clear and seems wrong.
"We pass in AsyncEntraConnection to enable Entra...by aquiring an Azure access token .. and using the token itself as the password". This is confusing. The latter part talks about the implementation details of how AsyncEntraConnection work. This might confuse users who are just looking to use the functionality - reading this quickly I might think I need to get a token and pass it in as the password. Suggest rewording to focus on the required parameter, and a brief high level description of how the pool uses the parameter, potentially with a link to psycopg documentation about that capability.
e.g. (needs better wording):
# We pass in the AsyncEntraConnection class to enable Entra authentication for accessing the
# database. This is the connection class that will be used whenever the pool creates a new connection,
# and this connection type will manage the Entra authentication token to ensure a valid token is used
# when connecting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reworded this comment to focus more on what the code does rather than how it does it. I made this change in all relevant places across the three drivers. I also added links to the documentation for the user to learn more about these concepts.
python/src/azurepg_entra/core.py
Outdated
| import logging | ||
| import json | ||
| import base64 | ||
| import jwt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I implemented JWT parsing directly is:
- A) adding dependencies as a library is always a disadvantage. The more libraries your particular library brings in, the more potential for conflicting versions with the user's environment, duplication with other libraries the user might be using, more contribution to code bloat, and more dependency management. For this reason, when implementing a library, the ideal is to use as few dependencies as possible. This of course needs to be balanced with hand-rolling code, which is often not ideal. However:
- B) The JWT token content and the way to parse it out is very stable. This isn't a moving target - it's a bit complex of logic, but it's a standardized algorithm that won't change.
Given those two above I decided that hand-rolling the implementation (or I should say, having GPT roll it, which I have high confidence in its capability of doing), saves us a dependency and doesn't have high risk of producing less effective code than the library or a lot of code that needs maintenance. I still believe that to be the correct decision for this library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand - thanks for the thorough explanation. Hand-rolling the library code makes sense in this case, although it was not intuitive to me at first. I have went back to using the hand-rolled code over using the jwt library.
python/pyproject.toml
Outdated
| "azure-identity>=1.13.0", | ||
| "azure-core>=1.24.0", | ||
| "aiohttp>=3.8.0", | ||
| "PyJWT>=2.0.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment in src/azurepg_entra/core.py.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep I got rid of both of these.
|
|
||
| For example: | ||
|
|
||
| from azure_pg_entra import AsyncEntraConnection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix import; make sure all examples work. Most robust way to do this is to copy out the examples and run them as code in e.g. a notebook to ensure correctness before PR is merged.
Also - similar comment above to file naming.
Also - I think SyncEntraConnection isn't right, it doesn't follow the naming convention of the connection pooling itself. It's ConnectionPool and AsyncConnectionPool not SyncConnectionPool and AsyncConnectionPool. Let's follow the naming.
I would split the two classes up into two files, entra_connection.py and async_entra_connection.py.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually removed the examples because I thought the examples in the samples folder were sufficient (although I verified the examples in the samples folder). I have also modified the class and file naming according to your suggestions.
| """ | ||
| SQLAlchemy integration for Azure PostgreSQL with Entra ID authentication. | ||
|
|
||
| This module provides seamless integration between SQLAlchemy and Azure Entra ID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-seamless
overused marketing word
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got rid of it.
| logger.debug("User and password already present, skipping Entra authentication") | ||
| except Exception as e: | ||
| logger.error(f"Failed to get Entra credentials: {e}") | ||
| raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs specific error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got rid of it because we do error handling inside the method
| if not has_password and "password" in entra_creds: | ||
| cparams["password"] = entra_creds["password"] | ||
|
|
||
| logger.debug(f"Provided Entra credentials for user: {entra_creds.get('user', 'unknown')}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we add debug logging here, we should add it to the other implementations as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed debug logging here. I was only using it in one other place in core.py.
| # For async engines, we need to handle the async credential fetching | ||
| try: | ||
| # Try to get the current event loop | ||
| asyncio.get_running_loop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels out of band.
Is there no way to add an async event handler? I see engine.sync_engine, is there an engine.async_engine that would allow us to do a async def provide_token_async? We don't want to be managing loops and thread pools in the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. As we discussed, I will simplify this and use get_entra_conninfo even in the async case.
.github/workflows/pr.yml
Outdated
| - name: Run tests | ||
| working-directory: ./python | ||
| run: | | ||
| pytest tests/ -v --tb=short No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consolidate this into a script that can be run locally to give devs the confidence that CI will pass.
Consider "scripts to rule them all" - see pgts or other Matt/Rob repositories for examples of modified implementation.
Need to consider how to handle these scripts and GitHub Actions based on the different languages in this project. Can we run workflows just on changes to the language folders? Should there be scripts/ under each language? Design challenge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added GitHub actions for python and dotnet that are triggered when changes are made to their respective folders. I have also added corresponding scripts for developers to run before they push up a PR.
python/README.md
Outdated
| print(f"Pool connection as: {user} at {time}") | ||
| print(f"Connected as: {user} at {time}") | ||
| finally: | ||
| pool.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should move back to with instead of try/catch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
python/README.md
Outdated
| print(f"Pool connection as: {user} at {time}") | ||
| print(f"Async connected as: {user} at {time}") | ||
| finally: | ||
| await pool.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above - use with
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| ## SQLAlchemy Integration | ||
|
|
||
| SQLAlchemy integration uses psycopg3 as the backend driver with automatic Entra ID authentication. | ||
| SQLAlchemy integration uses psycopg3 as the backend driver with automatic Entra ID authentication through event listeners. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth linking to the sqlalchemy docs for the event hooks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| print(f"Async SQLAlchemy connected as: {user} at {time}") | ||
|
|
||
| # Async ORM usage | ||
| from sqlalchemy.ext.asyncio import async_sessionmaker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to have this type of usage as an example; move to samples and have brief samples in README to give reader context, and link out to samples for more broad treatment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the example with sessionmaker to samples and focused more on how to use/enable the Entra libraries in README and saved the more rehearsed aspects like querying the database to the samples.
| UsernameExtractionError, | ||
| ScopePermissionError, | ||
| ) as e: | ||
| print(repr(e)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| TokenDecodeError, | ||
| UsernameExtractionError, | ||
| ScopePermissionError, | ||
| ) as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
catch bare Exception
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
|
||
| This function registers an event listener that automatically provides | ||
| Entra ID credentials for each database connection if they are not already set. | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarify here and elsewhere that although this is async, event handlers don't provide async so the fetch token call will still be sync
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| ) as e: | ||
| print(repr(e)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same
| # Example usage and test runner | ||
|
|
||
|
|
||
| class TestEnableEntraAuthentication: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, think through a new testing strategy that can reduce mocking code that should be targeted by tests.
Ideas:
- docker Postgres
- mock at the last possible moment - get_entra_token could be patched to return test db password to mimick token generation
…s and exception handling
| @@ -0,0 +1,117 @@ | |||
| param( | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scripts should be in scripts/
core.py