Context
I maintain a CI/CD deploy script (deploy.py) that orchestrates Lakehouse, Notebook, SemanticModel, DataPipeline, and VariableLibrary deployments. The script is 240 lines of Python, most of which are workarounds for CLI gaps. Here are the feature requests, ordered by impact.
1. fab cp --recursive — Bulk/recursive copy to OneLake
This is the single biggest pain point.
Copying a directory tree (30 dbt files across nested folders) to a Lakehouse requires creating every subdirectory individually with fab mkdir, then copying each file one-by-one with fab cp. I parallelize with ThreadPoolExecutor(max_workers=8) to make it tolerable — 40+ individual CLI invocations for 30 files.
```python
# Current workaround — 40+ individual CLI calls for 30 files
for d in sorted(dirs):
    subprocess.run(["fab", "mkdir", f"{LAKEHOUSE}/Files/{d}"])

def copy_file(f):
    fab(["cp", f, f"{LAKEHOUSE}/Files/{f.parent}/", "-f"])

with ThreadPoolExecutor(max_workers=8) as executor:
    executor.map(copy_file, files)
```

Request:

```shell
fab cp --recursive ./dbt/ "myWorkspace/data.Lakehouse/Files/dbt/" -f
```

One command, auto-creates directories, handles the full tree.
Bonus: fab sync that only uploads changed files (checksum-based).
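The `fab sync` idea can be prototyped client-side today; a minimal sketch of the checksum side, assuming a simple path-to-sha256 manifest persisted between runs (`changed_files` and `build_manifest` are hypothetical helpers, not fab features):

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file's contents for change detection."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def build_manifest(local_dir: Path) -> dict:
    """Map each file's relative path to its current checksum."""
    return {str(f.relative_to(local_dir)): sha256_of(f)
            for f in local_dir.rglob("*") if f.is_file()}

def changed_files(local_dir: Path, manifest: dict) -> list[Path]:
    """Return files whose checksum differs from the last-uploaded manifest."""
    return [f for f in sorted(local_dir.rglob("*"))
            if f.is_file()
            and manifest.get(str(f.relative_to(local_dir))) != sha256_of(f)]
```

Only the files returned by `changed_files` would need a `fab cp` call; after a successful upload, persist `build_manifest(local_dir)` for the next run.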
2. fab create --wait — Wait for item provisioning
After fab create for a Lakehouse with schemas enabled, the item isn't immediately usable. I have to time.sleep(60) and hope that's enough.
```python
fab(["create", LAKEHOUSE, "-P", "enableSchemas=true"])
print("New lakehouse — waiting 60s for provisioning...")
time.sleep(60)  # fragile
```

Request:

```shell
fab create "myWorkspace/data.Lakehouse" -P enableSchemas=true --wait
# Blocks until fully provisioned, exits non-zero if provisioning fails
```

3. fab deploy --item-types as a CLI flag
To deploy a subset of item types (critical because deploy order matters — Lakehouse before Notebook before SemanticModel), I must write a temporary YAML config file, run deploy, then delete it.
```python
def fab_deploy(item_types):
    tmp = root / "_fab_deploy_tmp.yml"
    tmp.write_text(yaml_content)  # build YAML string manually
    try:
        fab(["deploy", "--config", tmp.name, "-f"])
    finally:
        tmp.unlink(missing_ok=True)
```

Request:

```shell
fab deploy --workspace "myWorkspace" --item-types Notebook,Lakehouse -f
```

4. Token resolution in content files
$workspace.id and $items.* tokens are only resolved through parameter.yml find/replace. For semantic models with OneLake URLs in .bim files, I have to mutate the file, deploy, then git checkout to restore:
```python
bim_path.write_text(bim_text.replace(source_ws_id, WS_ID).replace(source_lh_id, target_lh_id))
try:
    fab_deploy(["SemanticModel"])
finally:
    subprocess.run(["git", "checkout", str(bim_path)])  # restore original
```

Request: Allow $workspace.id and $items.Type.Name.$id tokens directly in content files, resolved at deploy time. At minimum, support token embedding within strings (e.g., inside a OneLake URL) — the current parameter.yml requires replace_value to start with $.
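Until then, the mutate-and-`git checkout` dance can at least be made exception-safe without depending on git, by restoring the original bytes in-process; a sketch (the `patched_file` helper is mine, not a fab or git feature):

```python
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def patched_file(path: Path, replacements: dict):
    """Temporarily apply string replacements to a file, restoring it on exit."""
    original = path.read_text()
    patched = original
    for old, new in replacements.items():
        patched = patched.replace(old, new)
    path.write_text(patched)
    try:
        yield path
    finally:
        path.write_text(original)  # restore even if the deploy raised
```

Usage would be `with patched_file(bim_path, {source_ws_id: WS_ID, source_lh_id: target_lh_id}): fab_deploy(["SemanticModel"])`, which also keeps the working tree clean if the deploy is interrupted.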
5. fab refresh for Semantic Models
Refreshing a semantic model requires manually switching to the PowerBI API:
```python
sm_id = get_item_id(SEMANTIC_MODEL)
fab(["api", "-A", "powerbi", "-X", "post", f"groups/{WS_ID}/datasets/{sm_id}/refreshes"])
```

Request:

```shell
fab refresh "myWorkspace/myModel.SemanticModel"
```

Consistent with fab job run for notebooks.
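A blocking `fab refresh --wait` would presumably poll the refresh history the same way scripts do today. A sketch of just the classification step, assuming the Power BI REST refresh-history payload, where `status` is `Unknown` while a refresh is still running (the polling loop and the `refresh_state` helper are my own):

```python
def refresh_state(history: list[dict]) -> str:
    """Classify the most recent entry of a Power BI refresh history.

    The refresh-history API reports 'Unknown' while a refresh is still
    in progress, then 'Completed', 'Failed', or 'Disabled'.
    """
    if not history:
        return "none"
    status = history[0].get("status", "Unknown")
    return {"Unknown": "running", "Completed": "done", "Failed": "failed"}.get(status, status)
```

A wrapper would GET `groups/{WS_ID}/datasets/{sm_id}/refreshes?$top=1` in a loop until this returns something other than `"running"`.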
6. First-class Variable Library commands
Creating/updating Variable Libraries requires raw fab api calls with base64-encoded JSON payloads:
```python
vl_definition = {
    "format": "VariableLibraryV1",
    "parts": [
        {"path": "variables.json",
         "payload": base64.b64encode(json.dumps(vl_variables).encode()).decode(),
         "payloadType": "InlineBase64"},
    ],
}
fab(["api", "-X", "post",
     f"workspaces/{WS_ID}/variableLibraries/{vl_id}/updateDefinition",
     "-i", json.dumps({"definition": vl_definition})])
```

Request:

```shell
fab set "myWorkspace/deploy_config.VariableLibrary" -q "variables.download_limit" -i "60"
```

7. --json output on all commands
Checking if a pipeline has an active schedule requires parsing stdout for the string "True":
```python
result = subprocess.run(["fab", "job", "run-list", PIPELINE, "--schedule"],
                        capture_output=True, text=True)
has_active_schedule = "True" in result.stdout  # fragile
```

Request: --json flag on all commands for machine-readable output.
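For comparison, here is what consuming `--json` output could look like; the list-of-objects shape and the `enabled` field are assumptions about hypothetical output, not the CLI's actual format:

```python
import json

def has_active_schedule(run_list_json: str) -> bool:
    """Parse hypothetical `fab job run-list --schedule --json` output.

    Assumes a JSON array of schedule objects with a boolean 'enabled' field.
    """
    schedules = json.loads(run_list_json)
    return any(s.get("enabled", False) for s in schedules)
```

The substring check breaks if a schedule name or timestamp happens to contain "True"; a structured parse cannot.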
8. Accept workspace ID everywhere
Deploy config requires workspace name, but CI/CD tracks workspace ID (stable across renames). I make an extra API call just to resolve ID → name:
```python
result = subprocess.run(["fab", "api", "-X", "get", f"workspaces/{WS_ID}"], ...)
ws = json.loads(result.stdout)["text"]["displayName"]
```

Request: Accept workspace GUID anywhere a workspace name is accepted.
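Until then, the lookup can at least be isolated and cached; a sketch of the parsing step for the payload shape shown above (`display_name` is a hypothetical helper; wrapping the full fab call in `functools.lru_cache` would pay the cost once per run):

```python
import json

def display_name(api_stdout: str) -> str:
    """Extract the display name from `fab api -X get workspaces/<id>` output.

    Assumes the response wrapper shape used above: {"text": {"displayName": ...}}.
    """
    return json.loads(api_stdout)["text"]["displayName"]
```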
9. Better error messages
All discovered through trial and error:
| Issue | Current behavior | Should be |
|---|---|---|
| `item_type_in_scope` (singular typo) | Silently ignored, deploys everything | Error: "did you mean item_types_in_scope?" |
| `is_regex: true` (YAML boolean) | "not of type string" | Accept both, or suggest quotes |
| `fab job run Notebook` without `-i '{}'` | Silently does nothing | Warning: "Notebooks require -i '{}'" |
| Token embedded in URL in parameter.yml | Silently unresolved | Error: "Token must be the entire value" |
10. fab job run-sch --if-not-scheduled
An idempotent flag to create a schedule only if none exists, for CI/CD:
```shell
fab job run-sch "myWorkspace/myPipeline.DataPipeline" \
  --type cron --interval 30 --enable --if-not-scheduled
```

What the ideal deploy would look like
```shell
fab create myWS/data.Lakehouse -P enableSchemas=true --wait
fab deploy --item-types Notebook --workspace-id $WS_ID -f
fab set myWS/run.Notebook -q lakehouse -i '...'
fab cp --recursive ./dbt/ myWS/data.Lakehouse/Files/dbt/
fab job run myWS/run.Notebook -i '{}'
fab deploy --item-types SemanticModel,DataPipeline --workspace-id $WS_ID -f
fab refresh myWS/myModel.SemanticModel
fab job run-sch myWS/myPipeline.DataPipeline --type cron --interval 30 --enable --if-not-scheduled
```

8 commands instead of a 240-line Python script.