
Commit 8f27203

Merge pull request #144 from sw360/feat/purl-qualifier-search
PURL qualifier-based search
2 parents def93be + 8153bbd commit 8f27203

15 files changed with 684 additions and 77 deletions

ChangeLog.md

Lines changed: 6 additions & 0 deletions
@@ -5,6 +5,12 @@
 
 # CaPyCli - Clearing Automation Python Command Line Tool for SW360
 
+## NEXT
+
+* `bom map`: The options `--dbx` and `-all` were replaced by `--matchmode`.
+* `bom map`: new `--matchmode` options `full-search` (report all best matches) and
+`qualifier-match` (consider PackageURL qualifiers). See `Readme_Mapping.md`.
+
 ## 2.9.1
 
 * `bom map` will provide the `purl` from SW360 in the output BOM's components

Readme_Mapping.md

Lines changed: 32 additions & 6 deletions
@@ -23,32 +23,58 @@ informs about the mapping result:
 
 * **`INVALID` (0)** => Invalid SBOM entry, could not get processed
 * **`FULL_MATCH_BY_ID` (1)** => Full match by identifier
-* **`FULL_MATCH_BY_HASH` (2)** => Full match by source file hash
+* **`FULL_MATCH_BY_HASH` (2)** => Full match by source or binary file hash
 * **`FULL_MATCH_BY_NAME_AND_VERSION` (3)** => Full match by name and version
 * **`MATCH_BY_FILENAME` (4)** => Match by source code filename
 * **`GOOD_MATCH_FOUND`** == `MATCH_BY_FILENAME` => successfully found a sufficiently good match
 * **`MATCH_BY_NAME` (5)** => Component found, but no version match
 * **`SIMILAR_COMPONENT_FOUND` (6)** => Component with similar name found, no version check done
 * **`NO_MATCH` (100)** => Component was not found
 
-In general you can say that the lower the number, the better the match.
+We consider lower numbers as better matches. By default, CaPyCli will stop the
+search when a "good" match (match code between 1 and 4) is found and add this
+release to the output BOM. If there are multiple good matches in SW360, the
+output thus depends on the order the results are returned by SW360 (or found in
+the CaPyCli cache).
+
+The "bom map --matchmode full-search" option allows to change that behaviour so that
+CaPyCli will always search through all releases in the API answer or cache, and
+report *all best* matches found. If there are matches by ID, other matches are
+ignored; matches by (source or binary) file hash will win over matches by name
+and version etc.
 
 ## Notes on id mapping / PackageURL mapping
 
-CaPyCli supports mapping releases by the PackageURL. As encoding of a
-PackageURL is not unique (some characters *may* use URL encoding, qualifiers
+CaPyCli supports mapping **releases** by the PackageURL. As encoding of a
+PackageURL is not unique (some characters may be percent-encoded, qualifiers
 can be given in random order etc.), we can't just do a string comparison, but
 instead *all* SW360 releases with PackageURLs (using external id `package-url`)
 are retrieved and decoded. When your input BOM specifies a `purl` field, then
 the PackageURL is compared field by field (type, namespace, name, version) for
 a `FULL_MATCH_BY_ID`.
 
-Also, components will be mapped by PackageURL and if a match is found, the
+Also, **components** will be mapped by PackageURL and if a match is found, the
 `capycli:componentId` property will be added to the output BOM item. Components
 can be identified directly by their external id `package-url` or as fallback
 also by the `package-url`s of their releases.
 
-PackageURL subpath and qualifiers are currently ignored during PURL matching.
+PackageURL **qualifiers** (like `?distro=alpine-3.21&package-id=3a23`) will be
+considered when using `bom map --matchmode qualifier-match`. In some cases,
+qualifiers are essential for correct mapping, but many scanners also include
+non-essential qualifiers in their SBOMs. And the distinction might be
+challenging: while `distro` is crucial for correct mapping of Alpine packages
+(same package release can have different patches in different Alpine releases),
+but for Debian, `distro` is unnecessary since package versions are already
+unique. So we use the following rules to balance accuracy and practicality:
+
+* Only the qualifiers specified in the input BOM are considered during matching,
+qualifiers only present in SW360 releases are ignored. So you can control
+matching by removing the unwanted qualifiers in your SBOM.
+* If one or more SW360 releases are found where *all* qualifiers specified in the
+input BOM match, *only* these releases are added to the output BOM. Otherwise,
+qualifiers will be ignored, so all release matches will be added.
+
+PackageURL subpath is currently ignored during PURL matching.
 
 ## Example 1: Very Simple, Full Match
 
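The Readme_Mapping.md change above contrasts the default behaviour (stop at the first "good" match) with `--matchmode full-search` (keep all best matches). The following is a minimal sketch of that difference, not CaPyCli's implementation: it assumes every SW360 release already carries a single match code, whereas CaPyCli computes the codes check by check inside `map_bom_item`. The helpers `first_good_match` and `all_best_matches` are hypothetical names.

```python
from typing import List, Tuple

# Match codes from the table in Readme_Mapping.md; lower numbers are better matches.
FULL_MATCH_BY_ID = 1
FULL_MATCH_BY_HASH = 2
FULL_MATCH_BY_NAME_AND_VERSION = 3
MATCH_BY_FILENAME = 4
GOOD_MATCH_FOUND = MATCH_BY_FILENAME
NO_MATCH = 100

# Hypothetical input: (release id, match code) in the order SW360 returned them.
Candidate = Tuple[str, int]


def first_good_match(candidates: List[Candidate]) -> List[str]:
    """Default mode: stop at the first release with a good match (code 1 to 4).

    Handling of weaker matches (codes 5 and 6) is left out of this sketch.
    """
    for release_id, code in candidates:
        if FULL_MATCH_BY_ID <= code <= GOOD_MATCH_FOUND:
            return [release_id]
    return []


def all_best_matches(candidates: List[Candidate]) -> List[str]:
    """full-search mode: scan every release and keep all with the lowest code."""
    best = min((code for _, code in candidates), default=NO_MATCH)
    if best == NO_MATCH:
        return []
    return [release_id for release_id, code in candidates if code == best]


releases = [
    ("r1", FULL_MATCH_BY_NAME_AND_VERSION),
    ("r2", FULL_MATCH_BY_HASH),
    ("r3", FULL_MATCH_BY_HASH),
]
print(first_good_match(releases))  # ['r1']
print(all_best_matches(releases))  # ['r2', 'r3']
```

With this input the default strategy returns `r1` only because it happens to be listed first, while the full search prefers the two hash matches. That is the order dependence the Readme describes.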
capycli/bom/create_components.py

Lines changed: 1 addition & 1 deletion
@@ -399,7 +399,7 @@ def update_release(self, cx_comp: Component, release_data: Dict[str, Any]) -> No
 bom_purl = packageurl.PackageURL.from_string(
 data["externalIds"][repository_type])
 sw360_purls = PurlUtils.get_purl_list_from_sw360_object(release_data)
-id_match = PurlUtils.contains(sw360_purls, bom_purl)
+id_match = PurlUtils.contains(sw360_purls, bom_purl, compare_qualifiers=True)
 except ValueError:
 pass
 if not id_match:
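The body of `PurlUtils.contains(..., compare_qualifiers=True)` is not part of this page. To illustrate why the Readme says PackageURLs cannot be compared as plain strings, here is a simplified, stdlib-only sketch. `SimplePurl`, `parse_purl` and `purl_matches` are hypothetical helpers; the parser drops the subpath and does not implement every encoding or type-specific rule of the PURL specification (CaPyCli itself decodes with `packageurl.PackageURL.from_string`). The optional qualifier check follows the Readme rule that only qualifiers present in the BOM PURL are compared.

```python
from typing import Dict, NamedTuple, Optional
from urllib.parse import parse_qsl, unquote


class SimplePurl(NamedTuple):
    """Decoded PackageURL fields (hypothetical helper type)."""
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]
    qualifiers: Dict[str, str]


def parse_purl(purl: str) -> SimplePurl:
    """Simplified PURL decoder; drops the subpath and ignores type-specific rules."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a PackageURL: {purl!r}")
    rest = purl[len("pkg:"):].split("#", 1)[0]  # subpath is ignored
    rest, _, query = rest.partition("?")
    # Qualifier order no longer matters once they are decoded into a dict.
    qualifiers = {key.lower(): value for key, value in parse_qsl(query)}
    # The version follows the last "@" that comes after the last "/".
    at = rest.rfind("@")
    if at > rest.rfind("/"):
        rest, version = rest[:at], rest[at + 1:]
    else:
        version = ""
    parts = rest.split("/")
    return SimplePurl(
        type=parts[0].lower(),
        namespace="/".join(unquote(p) for p in parts[1:-1]) or None,
        name=unquote(parts[-1]),
        version=unquote(version) or None,
        qualifiers=qualifiers,
    )


def purl_matches(bom: SimplePurl, sw360: SimplePurl, compare_qualifiers: bool = False) -> bool:
    """Compare type, namespace, name and version; optionally check BOM qualifiers."""
    if (bom.type, bom.namespace, bom.name, bom.version) != (
            sw360.type, sw360.namespace, sw360.name, sw360.version):
        return False
    if not compare_qualifiers:
        return True
    # Only qualifiers given in the BOM PURL are checked; extra SW360 ones are ignored.
    return all(sw360.qualifiers.get(k) == v for k, v in bom.qualifiers.items())


# Percent-encoded and plain spellings decode to the same fields.
print(parse_purl("pkg:npm/%40angular/core@16.0.0") == parse_purl("pkg:npm/@angular/core@16.0.0"))  # True
```

Whether CaPyCli's `compare_qualifiers=True` uses exactly this subset rule when creating components cannot be seen in this diff.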

capycli/bom/map_bom.py

Lines changed: 88 additions & 36 deletions
@@ -6,6 +6,7 @@
 # SPDX-License-Identifier: MIT
 # -------------------------------------------------------------------------------
 
+import copy
 import json
 import logging
 import os
@@ -58,6 +59,8 @@ def __init__(self) -> None:
 self.mode = MapMode.ALL
 self.purl_service: Optional[PurlService] = None
 self.no_match_by_name_only = True
+self.full_search = False
+self.qualifier_match = False
 
 def is_id_match(self, release: Dict[str, Any], component: Component) -> bool:
 """Determines whether this release is a match via identifier for the specified SBOM item"""
@@ -194,39 +197,51 @@ def map_bom_item(self, component: Component, check_similar: bool, result_require
 # first check: unique id
 if release["Sw360Id"] in result_release_ids or self.is_id_match(release, component):
 self.add_match_if_better(result, release, MapResult.FULL_MATCH_BY_ID)
-break
-
-# second check: name AND version
-if (component.name and release.get("Name")):
-if release["ComponentId"] in result_component_ids:
-name_match = True
+if self.full_search:
+continue
 else:
-name_match = component.name.lower() == release["Name"].lower()
-version_exists = "Version" in release
-if (name_match
-and version_exists and component.version
-and (component.version.lower() == release["Version"].lower())):
-self.add_match_if_better(result, release, MapResult.FULL_MATCH_BY_NAME_AND_VERSION)
 break
-else:
-name_match = False
 
-# third check unique(?) file hashes
+# second check unique(?) file hashes
 cmp_hash = CycloneDxSupport.get_source_file_hash(component)
 if (("SourceFileHash" in release)
 and cmp_hash
 and release["SourceFileHash"]):
 if (cmp_hash.lower() == release["SourceFileHash"].lower()):
 self.add_match_if_better(result, release, MapResult.FULL_MATCH_BY_HASH)
-break
+if self.full_search:
+continue
+else:
+break
 
 cmp_hash = CycloneDxSupport.get_binary_file_hash(component)
 if (("BinaryFileHash" in release)
 and cmp_hash
 and release["BinaryFileHash"]):
 if (cmp_hash.lower() == release["BinaryFileHash"].lower()):
 self.add_match_if_better(result, release, MapResult.FULL_MATCH_BY_HASH)
-break
+if self.full_search:
+continue
+else:
+break
+
+# third check: name AND version
+if (component.name and release.get("Name")):
+if release["ComponentId"] in result_component_ids:
+name_match = True
+else:
+name_match = component.name.lower() == release["Name"].lower()
+version_exists = "Version" in release
+if (name_match
+and version_exists and component.version
+and (component.version.lower() == release["Version"].lower())):
+self.add_match_if_better(result, release, MapResult.FULL_MATCH_BY_NAME_AND_VERSION)
+if self.full_search:
+continue
+else:
+break
+else:
+name_match = False
 
 # fourth check: source filename
 cmp_src_file = CycloneDxSupport.get_ext_ref_source_file(component)
@@ -235,7 +250,10 @@ def map_bom_item(self, component: Component, check_similar: bool, result_require
 and release["SourceFile"]):
 if cmp_src_file.lower() == release["SourceFile"].lower():
 self.add_match_if_better(result, release, MapResult.MATCH_BY_FILENAME)
-break
+if self.full_search:
+continue
+else:
+break
 
 # fifth check: name and ANY version
 if name_match:
@@ -299,8 +317,10 @@ def get_release_details(href: str) -> Optional[Dict[str, Any]]:
 release = get_release_details(href)
 if release:
 self.add_match_if_better(result, release, MapResult.FULL_MATCH_BY_ID)
-# If we have release matches by PURL, we're done
-return result
+if not self.full_search:
+return result
+# If we have release matches by PURL, we're done
+return result
 
 if result.component_hrefs:
 components += result.component_hrefs
@@ -343,29 +363,38 @@ def get_release_details(href: str) -> Optional[Dict[str, Any]]:
 self.add_match_if_better(result, release, MapResult.FULL_MATCH_BY_ID)
 break
 
-# second check: name AND version (we don't need to check the name
-# again as we checked it when compiling component list)
-version_exists = "Version" in release
-if (version_exists
-and ((component.version or "").lower() == release.get("Version", "").lower())):
-self.add_match_if_better(result, release, MapResult.FULL_MATCH_BY_NAME_AND_VERSION)
-break
-
-# third check unique(?) file hashes
+# second check unique(?) file hashes
 cmp_hash = CycloneDxSupport.get_source_file_hash(component)
 if (("SourceFileHash" in release)
 and cmp_hash
 and release["SourceFileHash"]):
 if (cmp_hash.lower() == release["SourceFileHash"].lower()):
 self.add_match_if_better(result, release, MapResult.FULL_MATCH_BY_HASH)
-break
+if self.full_search:
+continue
+else:
+break
 
 cmp_hash = CycloneDxSupport.get_binary_file_hash(component)
 if (("BinaryFileHash" in release)
 and cmp_hash
 and release["BinaryFileHash"]):
 if (cmp_hash.lower() == release["BinaryFileHash"].lower()):
 self.add_match_if_better(result, release, MapResult.FULL_MATCH_BY_HASH)
+if self.full_search:
+continue
+else:
+break
+
+# third check: name AND version (we don't need to check the name
+# again as we checked it when compiling component list)
+version_exists = "Version" in release
+if (version_exists
+and ((component.version or "").lower() == release.get("Version", "").lower())):
+self.add_match_if_better(result, release, MapResult.FULL_MATCH_BY_NAME_AND_VERSION)
+if self.full_search:
+continue
+else:
 break
 
 # fifth check: name and ANY version
@@ -506,6 +535,9 @@ def update_bom_item(self, component: Optional[Component], match: Dict[str, Any])
 name=match.get("Name", ""),
 version=match.get("Version", ""))
 else:
+# copy component so we don't overwrite the input component
+component = copy.deepcopy(component)
+
 # always overwrite the following properties
 name = match.get("Name", "")
 if name:
@@ -730,7 +762,9 @@ def map_bom_commons(self, component: Component) -> MapResult:
 # search release and component by purl which is independent of the component cache.
 if component.purl:
 result.component_hrefs = self.external_id_svc.search_components_by_purl(component.purl)
-result.release_hrefs = self.external_id_svc.search_releases_by_purl(component.purl)
+r = self.external_id_svc.search_releases_by_purl(component.purl, self.qualifier_match)
+result.release_hrefs = r["hrefs"]
+result.release_hrefs_results = r["results"]
 
 return result
 
@@ -815,9 +849,14 @@ def show_help(self) -> None:
 print(" all = default, write everything to resulting SBOM")
 print(" found = resulting SBOM shows only components that were found")
 print(" notfound = resulting SBOM shows only components that were not found")
-print(" --dbx relaxed Debian version handling: *completely* ignore Debian revision,")
-print(" so SBOM version 3.1 will match SW360 version 3.1-3.debian")
-print(" -all also report matches for name, but different version")
+print(" --matchmode MATCHMODE matching mode, comma separated list of:")
+print(" full-search = report best matches, don't abort on first match (recommended)")
+print(" all-versions = also report matches for name, but different version")
+print(" qualifier-match = consider qualifiers for PURL matching")
+print(" ignore-debian = ignore Debian revision in version comparison, so SBOM")
+print(" version 3.1 will match SW360 version 3.1-3.debian")
+print(" -all deprecated, please use --matchmode all-versions")
+print(" --dbx deprecated, please use --matchmode ignore-debian")
 
 def run(self, args: Any) -> None:
 """Main method()"""
@@ -849,16 +888,29 @@ def run(self, args: Any) -> None:
 if args.verbose:
 self.verbosity = 2
 
-if args.dbx:
+if not args.matchmode:
+args.matchmode = ""
+
+if "ignore-debian" in args.matchmode or args.dbx:
+if args.dbx:
+print_yellow("bom map --dbx is deprecated, use --matchmode ignore-debian instead")
 print_text("Using relaxed debian version checks")
 self.relaxed_debian_parsing = True
 
 if args.mode:
 self.mode = args.mode
 
-if args.all:
+if "all-versions" in args.matchmode or args.all:
+if args.all:
+print_yellow("bom map -all is deprecated, use --matchmode all-versions instead")
 self.no_match_by_name_only = False
 
+if "full-search" in args.matchmode:
+self.full_search = True
+
+if "qualifier-match" in args.matchmode:
+self.qualifier_match = True
+
 print_text("Loading SBOM file", args.inputfile)
 try:
 sbom = CaPyCliBom.read_sbom(args.inputfile)
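The new `run()` code enables each mode with a substring test such as `"full-search" in args.matchmode`. That works for the documented values, but a misspelling like `full-searches` would also contain `full-search` and enable it silently. A stricter alternative, sketched here as a hypothetical helper (`parse_matchmode` and `KNOWN_MATCHMODES` are not part of CaPyCli), splits the comma-separated value described in the new help text, rejects unknown entries, and folds the deprecated `--dbx` and `-all` flags into their documented replacements.

```python
from typing import FrozenSet, Optional

# Values listed in the new `bom map` help text.
KNOWN_MATCHMODES = frozenset({"full-search", "all-versions", "qualifier-match", "ignore-debian"})


def parse_matchmode(value: Optional[str], dbx: bool = False, all_flag: bool = False) -> FrozenSet[str]:
    """Turn a comma-separated --matchmode value into a validated set of modes."""
    modes = {item.strip() for item in (value or "").split(",") if item.strip()}
    unknown = modes - KNOWN_MATCHMODES
    if unknown:
        raise ValueError("unknown --matchmode value(s): " + ", ".join(sorted(unknown)))
    # Deprecated flags map onto their documented --matchmode replacements.
    if dbx:
        modes.add("ignore-debian")
    if all_flag:
        modes.add("all-versions")
    return frozenset(modes)


modes = parse_matchmode("full-search, qualifier-match", dbx=True)
print(sorted(modes))  # ['full-search', 'ignore-debian', 'qualifier-match']
```

Whether CaPyCli's argument parser already restricts the accepted values elsewhere is not visible in this diff.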

capycli/common/capycli_bom_support.py

Lines changed: 1 addition & 0 deletions
@@ -60,6 +60,7 @@ class CycloneDxSupport():
 CDX_PROP_COMPONENT_ID = "capycli:componentId"
 CDX_PROP_FILENAME = "siemens:filename"
 CDX_PROP_MAPRESULT = "capycli:mapResult"
+CDX_PROP_MAPRESULT_BY_ID = "capycli:mapResultById"
 CDX_PROP_SW360_HREF = "capycli:sw360Href"
 CDX_PROP_SW360_URL = "capycli:sw360Url"
 CDX_PROP_REL_STATE = "capycli:releaseMainlineState"

capycli/common/map_result.py

Lines changed: 22 additions & 0 deletions
@@ -7,12 +7,20 @@
 # -------------------------------------------------------------------------------
 
 from typing import Any, List, Optional
+from enum import Enum
 
 from cyclonedx.model.component import Component
 
 from capycli.common.capycli_bom_support import CycloneDxSupport
 
 
+class MapResultByIdQualifiers(Enum):
+FULL_MATCH = "qualifiers-full-match"
+IGNORED = "qualifiers-ignored"
+UNKNOWN = "qualifiers-unknown-match"
+NO_QUALIFIER_MAPPING = ""
+
+
 class MapResult:
 """Result of mapping a SBOM item to the list of releases"""
 
@@ -50,8 +58,22 @@ def __init__(self, component: Optional[Component] = None) -> None:
 self.result: str = MapResult.NO_MATCH
 self._component_hrefs: List[str] = []
 self._release_hrefs: List[str] = []
+self._release_hrefs_results: List[str] = []
 self.releases: List[Any] = []
 
+@property
+def release_hrefs_results(self) -> list[str]:
+return self._release_hrefs_results
+
+@release_hrefs_results.setter
+def release_hrefs_results(self, value: list[str]) -> None:
+self._release_hrefs_results = value
+if not self.input_component or not value:
+return
+CycloneDxSupport.update_or_set_property(
+self.input_component, CycloneDxSupport.CDX_PROP_MAPRESULT_BY_ID,
+" ".join(value))
+
 @property
 def component_hrefs(self) -> List[str]:
 return self._component_hrefs
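The new `MapResultByIdQualifiers` enum and the `release_hrefs_results` setter record how qualifiers influenced a PURL match. However, the code that produces `r["results"]` lives in the external-id service, which this page does not show. The sketch below applies the two qualifier rules from `Readme_Mapping.md` to releases that are assumed to already match on type, namespace, name and version. The function `select_by_qualifiers`, the release hrefs, and the choice of enum value in each branch are assumptions for illustration; the sketch never returns `UNKNOWN`.

```python
from enum import Enum
from typing import Dict, List, Tuple


class MapResultByIdQualifiers(Enum):
    # Values copied from capycli/common/map_result.py in this commit.
    FULL_MATCH = "qualifiers-full-match"
    IGNORED = "qualifiers-ignored"
    UNKNOWN = "qualifiers-unknown-match"
    NO_QUALIFIER_MAPPING = ""


def select_by_qualifiers(
    bom_qualifiers: Dict[str, str],
    candidates: Dict[str, Dict[str, str]],
) -> Tuple[List[str], MapResultByIdQualifiers]:
    """Apply the Readme qualifier rules to releases that already match on
    type, namespace, name and version.

    candidates maps a (hypothetical) release href to its SW360 PURL qualifiers.
    """
    if not bom_qualifiers:
        # Assumption: nothing to compare, so qualifiers play no role.
        return list(candidates), MapResultByIdQualifiers.NO_QUALIFIER_MAPPING
    # Rule 1: only qualifiers given in the BOM are compared; SW360-only ones are ignored.
    full = [
        href for href, quals in candidates.items()
        if all(quals.get(key) == value for key, value in bom_qualifiers.items())
    ]
    # Rule 2: keep only full qualifier matches if there are any...
    if full:
        return full, MapResultByIdQualifiers.FULL_MATCH
    # ...otherwise ignore qualifiers and keep every release match.
    return list(candidates), MapResultByIdQualifiers.IGNORED


releases = {
    "releases/1": {"distro": "alpine-3.20"},
    "releases/2": {"distro": "alpine-3.21", "arch": "x86_64"},
}
hrefs, status = select_by_qualifiers({"distro": "alpine-3.21"}, releases)
print(hrefs, status.value)  # ['releases/2'] qualifiers-full-match
```

If the service returns strings like these enum values, the setter shown above would join them with spaces into the `capycli:mapResultById` property of the input component.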
