Commit 7d5c64b
correctly handle corner case barrier rdd errors (#993)
We currently have an optimization that if input dfs have the same number
of partitions as num_workers, we skip the repartition. This can be
problematic in cases when the df has ancestors with a different number
of partitions before a shuffle boundary, which is incompatible with
barrier rdd logic. There are two fixes:
- In the case of the algos that collect the barrier rdd results to get
model training results, the default path is taken and is retried with a
hash repartitioning (introducing a shuffle boundary) if a barrier error
is encountered.
- In the case of knn and dbscan which return the barrier rdd after
converting to a df, there is no opportunity to retry since the calling
user code would encounter this error. In this case, the only option is
to repartition the input in all cases.
---------
Signed-off-by: Erik Ordentlich <[email protected]>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>1 parent bf4ad1d commit 7d5c64b
File tree
4 files changed
+62
-7
lines changed- python
- src/spark_rapids_ml
- tests
4 files changed
+62
-7
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1094 | 1094 | | |
1095 | 1095 | | |
1096 | 1096 | | |
| 1097 | + | |
| 1098 | + | |
| 1099 | + | |
| 1100 | + | |
| 1101 | + | |
| 1102 | + | |
1097 | 1103 | | |
1098 | 1104 | | |
1099 | 1105 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1192 | 1192 | | |
1193 | 1193 | | |
1194 | 1194 | | |
1195 | | - | |
1196 | | - | |
1197 | | - | |
1198 | | - | |
1199 | | - | |
1200 | 1195 | | |
1201 | 1196 | | |
1202 | 1197 | | |
1203 | 1198 | | |
1204 | | - | |
| 1199 | + | |
| 1200 | + | |
| 1201 | + | |
| 1202 | + | |
| 1203 | + | |
| 1204 | + | |
| 1205 | + | |
| 1206 | + | |
| 1207 | + | |
| 1208 | + | |
| 1209 | + | |
| 1210 | + | |
| 1211 | + | |
| 1212 | + | |
| 1213 | + | |
| 1214 | + | |
| 1215 | + | |
| 1216 | + | |
| 1217 | + | |
| 1218 | + | |
| 1219 | + | |
1205 | 1220 | | |
1206 | 1221 | | |
1207 | 1222 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
628 | 628 | | |
629 | 629 | | |
630 | 630 | | |
631 | | - | |
| 631 | + | |
| 632 | + | |
| 633 | + | |
| 634 | + | |
| 635 | + | |
| 636 | + | |
632 | 637 | | |
633 | 638 | | |
| 639 | + | |
| 640 | + | |
634 | 641 | | |
635 | 642 | | |
636 | 643 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
282 | 282 | | |
283 | 283 | | |
284 | 284 | | |
| 285 | + | |
| 286 | + | |
| 287 | + | |
| 288 | + | |
| 289 | + | |
| 290 | + | |
| 291 | + | |
| 292 | + | |
| 293 | + | |
| 294 | + | |
| 295 | + | |
| 296 | + | |
| 297 | + | |
| 298 | + | |
| 299 | + | |
| 300 | + | |
| 301 | + | |
| 302 | + | |
| 303 | + | |
| 304 | + | |
| 305 | + | |
| 306 | + | |
| 307 | + | |
| 308 | + | |
| 309 | + | |
| 310 | + | |
| 311 | + | |
285 | 312 | | |
286 | 313 | | |
287 | 314 | | |
| |||
0 commit comments