megaboom place stopped working [ERROR GPL-0307] RePlAce divergence detected. Re-run with a smaller max_phi_cof value. #6465

Open
oharboe opened this issue Jan 3, 2025 · 8 comments
Labels
gpl Global Placement

Comments

@oharboe
Collaborator

oharboe commented Jan 3, 2025

Describe the bug

The-OpenROAD-Project/megaboom#225

Tried a PLACE_DENSITY sweep with higher and lower values, no luck.

image

untar and run https://drive.google.com/file/d/1xbTQs6K922zpV9Qw7jSq4Il2Tgjg8ab-/view?usp=sharing

OpenROAD v2.0-17966-g1ffb8502d 
Features included (+) or not (-): +GPU +GUI +Python
This program is licensed under the BSD-3 license. See the LICENSE file for details.
Components of this program may be licensed under more restrictive licenses which must be honored.
global_placement -skip_io -density 0.54 -pad_left 0 -pad_right 0
[INFO GPL-0002] DBU: 1000
[INFO GPL-0003] SiteSize: (  0.054  0.270 ) um
[INFO GPL-0004] CoreBBox: (  2.052  2.160 ) ( 1247.994 1247.940 ) um
[INFO GPL-0006] NumInstances:           1072964
[INFO GPL-0007] NumPlaceInstances:      1046339
[INFO GPL-0008] NumFixedInstances:           72
[INFO GPL-0009] NumDummyInstances:        26553
[INFO GPL-0010] NumNets:                1076855
[INFO GPL-0011] NumPins:                3743460
[INFO GPL-0012] DieBBox:  (  0.000  0.000 ) ( 1250.000 1250.000 ) um
[INFO GPL-0013] CoreBBox: (  2.052  2.160 ) ( 1247.994 1247.940 ) um
[INFO GPL-0016] CoreArea:            1552169.625 um^2
[INFO GPL-0017] NonPlaceInstsArea:   937003.689 um^2
[INFO GPL-0018] PlaceInstsArea:      127453.578 um^2
[INFO GPL-0019] Util:                    20.719 %
[INFO GPL-0020] StdInstsArea:        127453.578 um^2
[INFO GPL-0021] MacroInstsArea:           0.000 um^2
[INFO GPL-0031] FillerInit:NumGCells:   2801620
[INFO GPL-0032] FillerInit:NumGNets:    1076855
[INFO GPL-0033] FillerInit:NumGPins:    3743460
[INFO GPL-0023] TargetDensity:            0.540
[INFO GPL-0024] AvrgPlaceInstArea:        0.122 um^2
[INFO GPL-0025] IdealBinArea:             0.226 um^2
[INFO GPL-0026] IdealBinCnt:            6881038
[INFO GPL-0027] TotalBinArea:        1552169.625 um^2
[INFO GPL-0028] BinCnt:      2048   2048
[INFO GPL-0029] BinSize: (  0.609  0.609 )
[INFO GPL-0030] NumBins: 4194304
[NesterovSolve] Iter:    1 overflow: 1.006 HPWL: 7705769250
[NesterovSolve] Iter:   10 overflow: 1.001 HPWL: 7068976680
[NesterovSolve] Iter:   20 overflow: 0.994 HPWL: 6657066255
[NesterovSolve] Iter:   30 overflow: 0.986 HPWL: 6113800038
[NesterovSolve] Iter:   40 overflow: 0.982 HPWL: 5754923358
[NesterovSolve] Iter:   50 overflow: 0.978 HPWL: 5642920248
[NesterovSolve] Iter:   60 overflow: 0.975 HPWL: 5630791832
[NesterovSolve] Iter:   70 overflow: 0.973 HPWL: 5628620695
[NesterovSolve] Iter:   80 overflow: 0.971 HPWL: 5648920206
[deleted]
[NesterovSolve] Iter:  480 overflow: 0.204 HPWL: 17020286863
[NesterovSolve] Iter:  490 overflow: 0.163 HPWL: 16820153887
[NesterovSolve] Iter:  500 overflow: 0.152 HPWL: 45295649114
[ERROR GPL-0307] RePlAce divergence detected. Re-run with a smaller max_phi_cof value.
Error: global_place_skip_io.tcl, 12 GPL-0307
openroad> 
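
The log above shows HPWL falling slowly through iteration 490 and then jumping sharply at iteration 500, just before GPL-0307 fires. A minimal sketch of a log scan that surfaces this kind of jump follows. It is a post-processing helper only and is not how gpl detects divergence; the function names and the `ratio` and `max_gap` thresholds are assumptions chosen for illustration.

```python
import re

# Lines copied from the log above; iterations between 80 and 480 were elided there.
LOG = """\
[NesterovSolve] Iter:   70 overflow: 0.973 HPWL: 5628620695
[NesterovSolve] Iter:   80 overflow: 0.971 HPWL: 5648920206
[NesterovSolve] Iter:  480 overflow: 0.204 HPWL: 17020286863
[NesterovSolve] Iter:  490 overflow: 0.163 HPWL: 16820153887
[NesterovSolve] Iter:  500 overflow: 0.152 HPWL: 45295649114
"""

LINE_RE = re.compile(
    r"\[NesterovSolve\] Iter:\s*(\d+) overflow:\s*([\d.]+) HPWL:\s*(\d+)"
)


def parse_nesterov(text):
    """Return (iteration, overflow, hpwl) tuples, one per NesterovSolve line."""
    return [(int(i), float(o), int(h)) for i, o, h in LINE_RE.findall(text)]


def hpwl_jumps(samples, ratio=2.0, max_gap=10):
    """Flag adjacent reports whose HPWL grows by more than `ratio`.

    Pairs more than `max_gap` iterations apart are skipped so that elided
    stretches of the log do not produce false alarms. Both thresholds are
    hypothetical, not values taken from gpl.
    """
    flagged = []
    for (prev_it, _, prev_h), (it, _, h) in zip(samples, samples[1:]):
        if it - prev_it <= max_gap and h > ratio * prev_h:
            flagged.append((it, round(h / prev_h, 2)))
    return flagged


samples = parse_nesterov(LOG)
jumps = hpwl_jumps(samples)
print(jumps)
```

Running the same scan over a full log from a PLACE_DENSITY sweep would show whether each run blows up at a similar iteration or overflow level.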

Expected Behavior

Placement should succeed, or fail with an actionable error message.

Environment

OpenROAD v2.0-17966-g1ffb8502d

To Reproduce

See above

Relevant log output

No response

Screenshots

No response

Additional Context

No response

@oharboe
Collaborator Author

oharboe commented Jan 3, 2025

@maliberty @jeffng-or A standalone test case of failure on megaboom main

@maliberty maliberty added the gpl Global Placement label Jan 3, 2025
@gudeh
Contributor

gudeh commented Jan 3, 2025

Hi! We have recently noticed divergences at stage 3-1 (skip io), and we modified gpl to reduce the effect of a bivariate normal distribution adjustment made with macros (PR #6438).

Here is the design without any changes, diverging at iteration 500:
image

I tried removing the bivariate effect locally for this megaboom package, but the divergence still happens. I will run it again to make sure it is not a false-positive detection.
image

Something that caught my attention is that gpl is not using the space available on the left and at the bottom to place cells.

@mikesinouye
Contributor

I am also observing this issue on two different designs on two different non-public PDKs. Both designs are large (3.7 / 4.6M instances). One has macros, one does not. I have not seen this issue previously with these designs.

We are using an OpenROAD build from December 11th: 8495fc8. We have not changed our TCL parameterization of OR within that timeframe. If there has been a regression, it may have occurred before 12/11.

These designs are private and large so they would be difficult to share, but if the megaboom testcase is not sufficient to identify the issue let me know.

@gudeh
Contributor

gudeh commented Jan 6, 2025

Hi @mikesinouye, at which stage exactly do you get the error? Is it at the skip io stage (3-1), or during global placement itself (3-3)?

@mikesinouye
Contributor

Hey @gudeh, we use our own custom flow instead of ORFS, but it is the second/final iteration of global placement with set pins/macros etc. I believe it would best align with ORFS 3-3.

I noticed that the recently enabled resizer in gpl is causing large area swings, and in these cases causing the percentage of overlap to regress:

[NesterovSolve] Iter:  600 overflow: 0.289 HPWL: 67302121984
[INFO GPL-0100] Timing-driven iteration 4/6, virtual: false.
[INFO GPL-0101] Iter: 602, overflow: 0.284, keep rsz at: 0.3
[INFO GPL-0106] Timing-driven: worst slack -3.13e-09
[INFO GPL-0103] Timing-driven: weighted 457358 nets.
[INFO GPL-0107] Timing-driven: RSZ delta area:     171131.849683
[INFO GPL-0108] Timing-driven: new target density: 1.1083497
[INFO GPL-0100] Timing-driven iteration 5/6, virtual: false.
[INFO GPL-0101] Iter: 608, overflow: 0.204, keep rsz at: 0.3
[INFO GPL-0106] Timing-driven: worst slack -3.44e-09
[INFO GPL-0103] Timing-driven: weighted 457357 nets.
[INFO GPL-0107] Timing-driven: RSZ delta area:     135257.596875
[INFO GPL-0108] Timing-driven: new target density: 1.4269857
[NesterovSolve] Iter:  610 overflow: 0.206 HPWL: 136988748908
[NesterovSolve] Iter:  620 overflow: 0.169 HPWL: 127833675226
[NesterovSolve] Iter:  630 overflow: 0.168 HPWL: 120356659720
[NesterovSolve] Iter:  640 overflow: 0.169 HPWL: 112627453336
[NesterovSolve] Iter:  650 overflow: 0.169 HPWL: 105231071433
[NesterovSolve] Iter:  660 overflow: 0.169 HPWL: 99857577668
[NesterovSolve] Iter:  670 overflow: 0.168 HPWL: 96970269792
[NesterovSolve] Iter:  680 overflow: 0.168 HPWL: 93953985280
[NesterovSolve] Iter:  690 overflow: 0.168 HPWL: 91624357017
[NesterovSolve] Iter:  700 overflow: 0.168 HPWL: 89226007808
[NesterovSolve] Iter:  710 overflow: 0.168 HPWL: 86916387174
[NesterovSolve] Iter:  720 overflow: 0.169 HPWL: 84213743906
[NesterovSolve] Iter:  730 overflow: 0.170 HPWL: 81640217219
[NesterovSolve] Iter:  740 overflow: 0.171 HPWL: 78962074863
[NesterovSolve] Iter:  750 overflow: 0.172 HPWL: 76398076537
[NesterovSolve] Iter:  760 overflow: 0.172 HPWL: 73899545408
[NesterovSolve] Iter:  770 overflow: 0.171 HPWL: 71603526945
[NesterovSolve] Iter:  780 overflow: 0.167 HPWL: 69535734015
[NesterovSolve] Iter:  790 overflow: 0.161 HPWL: 67765426870
[NesterovSolve] Iter:  800 overflow: 0.150 HPWL: 66489127872
[INFO GPL-0100] Timing-driven iteration 6/6, virtual: false.
[INFO GPL-0101] Iter: 804, overflow: 0.143, keep rsz at: 0.3
[INFO GPL-0106] Timing-driven: worst slack -3.01e-09
[INFO GPL-0103] Timing-driven: weighted 457357 nets.
[INFO GPL-0107] Timing-driven: RSZ delta area:     -112784.908768
[INFO GPL-0108] Timing-driven: new target density: 1.1612902
[NesterovSolve] Iter:  810 overflow: 0.607 HPWL: 640895754879
[NesterovSolve] Iter:  820 overflow: 0.254 HPWL: 755318003477
[NesterovSolve] Iter:  830 overflow: 0.239 HPWL: 499468799259
[NesterovSolve] Iter:  840 overflow: 0.229 HPWL: 337789212007
[NesterovSolve] Iter:  850 overflow: 0.219 HPWL: 257770740234
[NesterovSolve] Iter:  860 overflow: 0.216 HPWL: 211072788780
[NesterovSolve] Iter:  870 overflow: 0.209 HPWL: 179756633409
[NesterovSolve] Iter:  880 overflow: 0.203 HPWL: 157765978747

After the final timing-driven non-virtual resizing, the overflow goes from 0.150 to 0.607, which seems unexpected to me.
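
The regression described above can be pulled out of a log mechanically. The sketch below is a hypothetical helper, not part of OpenROAD: it collects the target density reported by each GPL-0108 line and flags any rise in overflow between adjacent NesterovSolve reports. The function names and the `min_rise` threshold are assumptions.

```python
import re

# A subset of the lines from the log above.
LOG = """\
[NesterovSolve] Iter:  600 overflow: 0.289 HPWL: 67302121984
[INFO GPL-0108] Timing-driven: new target density: 1.1083497
[INFO GPL-0108] Timing-driven: new target density: 1.4269857
[NesterovSolve] Iter:  610 overflow: 0.206 HPWL: 136988748908
[NesterovSolve] Iter:  800 overflow: 0.150 HPWL: 66489127872
[INFO GPL-0107] Timing-driven: RSZ delta area:     -112784.908768
[INFO GPL-0108] Timing-driven: new target density: 1.1612902
[NesterovSolve] Iter:  810 overflow: 0.607 HPWL: 640895754879
"""

OVERFLOW_RE = re.compile(r"\[NesterovSolve\] Iter:\s*(\d+) overflow:\s*([\d.]+)")
DENSITY_RE = re.compile(r"GPL-0108\] Timing-driven: new target density:\s*([\d.]+)")


def target_densities(text):
    """Return every target density reported by a GPL-0108 line, in order."""
    return [float(d) for d in DENSITY_RE.findall(text)]


def overflow_regressions(text, min_rise=0.1):
    """Return (iteration, previous, current) where overflow rose by more than min_rise.

    min_rise is a hypothetical threshold used only to ignore small wobbles.
    """
    reports = [(int(i), float(o)) for i, o in OVERFLOW_RE.findall(text)]
    return [
        (it, prev, cur)
        for (_, prev), (it, cur) in zip(reports, reports[1:])
        if cur - prev > min_rise
    ]


densities = target_densities(LOG)
regressions = overflow_regressions(LOG)
print(densities)
print(regressions)
```

On these lines the scan flags only the report at iteration 810, and every target density printed by GPL-0108 is above 1.0. Whether that density value is related to the regression is not established in this thread.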

@maliberty
Member

It is particularly odd since the RSZ delta area is negative, suggesting we removed more logic than we added, which should tend to reduce overflow.

@gudeh
Contributor

gudeh commented Jan 6, 2025

Indeed, that's a big jump after the last timing-driven iteration. I would need to look at it in debug mode; it is unfortunate that it is a private PDK. I believe @mikesinouye's issue is different from the megaboom one tracked in this GH issue.

Either way, you can remove the non-virtual iterations with the new gpl Tcl option keep_resize_below_overflow. The current default is 0.3; if you set it to 0 you should get only virtual timing-driven iterations, meaning the rsz work is undone.
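
As a rough model of the behaviour described above (a reading of this comment and of the "keep rsz at: 0.3" log lines, not gpl's actual code), a timing-driven iteration keeps the resizer changes only when the current overflow is below the threshold. The function name and the strict `<` comparison are assumptions.

```python
def keeps_resize(overflow, keep_resize_below_overflow=0.3):
    """Model: True means a non-virtual iteration (resizer changes are kept).

    Assumes a strict 'below' comparison; the real boundary behaviour in gpl
    is not stated in this thread.
    """
    return overflow < keep_resize_below_overflow


# (iteration, overflow) pairs taken from the GPL-0101 lines quoted earlier.
reported = [(602, 0.284), (608, 0.204), (804, 0.143)]

with_default = [keeps_resize(ov) for _, ov in reported]
with_zero = [keeps_resize(ov, 0.0) for _, ov in reported]

print(with_default)  # all three fall below 0.3, matching "virtual: false" in the log
print(with_zero)     # with the threshold at 0, none of them would be kept
```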

@gudeh
Contributor

gudeh commented Jan 6, 2025

Concerning megaboom, I am investigating the issue and should be able to share new insights soon.
