Skip to content

Latest commit

 

History

History
137 lines (121 loc) · 8.13 KB

0089-2023-06-05.md

File metadata and controls

137 lines (121 loc) · 8.13 KB

5 Jun 2023

Previous journal: Next journal:
0088-2023-05-28.md 0090-2023-06-06.md

Busy few days recently!

On June 1st I started preparing a submission for Tiny Tapeout "tt03p5" (an experimental round).

I took solo_squash and adapted it to the submission template to create: tt03p5-solo-squash.

I was going to make solo_squash a submodule of the submission repo, but while the gds GHA supports this, the docs and wokwi_test GHAs do not yet.

Changes that the tt03p5 submission brings to the original solo_squash as well as some just in the copy of solo_squash.v:

  1. Fixed a couple of WIDTH lint warnings flagged by Verilator.
  2. Created de0nano target; tested on DE0-Nano via Quartus.
  3. tt_um_algofoogle_solo_squash wrapper in top.v: Interfaces the main solo_squash module to the tt03p5 busses.
  4. Added a test of (a prototype of) the LZC (lzc24) that I want to use for Raybox. Not needed for this design, but it fits easily enough. Note that the input to the LZC comes from internal counters of solo_squash packed into one 24-bit input: {offset[3:0], v, h}
  5. In top.v:
    1. Registered the RGB outputs.
    2. Made inputs synchronous for metastability avoidance.
    3. Made all of the available bidirectional ports (uio_out) into outputs for debug/test purposes. Bits 0..4 are the LZC output (can express a value from 0..24), bit 5 is LZC "all zeroes" signal, and bits 6 and 7 debug1 and debug2.
    4. Made ui_in[4] select the function of debug1/2:
      • select=0: debug1 is UNregistered green signal; debug2 is UNregistered red signal
      • select=1: debug1 is internal visible signal; debug2 is UNregistered blue signal
    5. uo_out[6] is col0 flag (i.e. h counter is 0).
    6. uo_out[7] is row0 flag (i.e. v counter is 0).

NOTE: When the ASIC is made, all of ui_in is expected to default to being pulled low at power-on, so we should be able to wire up buttons that short those inputs high when pressed.

I managed to get the submission ready about 2am on the morning of Saturday, June 3rd, adding in the LZC and other debug/test signal stuff in the remaining time after that, and getting the submission in to Uri on time for inclusion with tinytapeout-03p5.

Other stuff along the way:

  • I wrote a testbench and some basic tests. Tests were a bit tricky:
    • RGB outputs are registered, which means they trail by 1 clock, while HSYNC and VSYNC are not, so there's a bit of cross-over between loops that are trying to get timing exactly right.
    • It seems that the VCD shows various signals transitioning 1 picosecond later than I expect. Not sure why there's that subtle difference (which matches the `timescale resolution in the Verilog, I think). Probably something to do with my settle_step?
    • Maybe the trick is to write one big loop that just waits for the time when certain events are expected to happen (whether subject to a delay or not), and otherwise clocks along for as many clock ticks as we want to verify.
    • In some other cases I resort to just counting pixels by colour and decide whether the result matches what we expect, rather than whether it's exactly at the right time.
  • I was able to add in USE_POWER_PINS and make gate-level tests work, i.e. GL=yes make
  • I don't yet have tests for the debug signals or input buttons.
  • I was able to do the TT build locally for rapid iteration:
    • Clone repo and set up tools:
      git clone [email protected]:algofoogle/tt03p5-solo-squash
      cd tt03p5-solo-squash
      git clone [email protected]:tinytapeout/tt-support-tools tt
      cd tt
      git checkout tt03p5
      cd ..
      pip install -r tt/requirements.txt
      ./tt/tt_tool.py --create-user-config
    • Harden:
      export OPENLANE_IMAGE_NAME=efabless/openlane:cb59d1f84deb5cedbb5b0a3e3f3b4129a967c988-amd64
      ./tt/tt_tool.py --harden
      NOTE: core area warning is OK. Just make sure fanout warnings are not excessive. Try to work out other setup/hold/slew violations (but I had none).
    • Check results:
      ./tt/tt_tool.py --print-warnings
      ./tt/tt_tool.py --print-stats  # 36.14%, 14178um wire length
      mkdir -p JUNK
      ./tt/tt_tool.py --print-cell-category > JUNK/cells.md
      ./tt/tt_tool.py --create-png
  • I came up with what I think is a better method for generating the preview PNG.

Possible solo_squash stuff for TT04:

  • Async register overrides; allow setting them during safe regions like VBLANK and HBLANK, in a holding buffer, and then latch them in at the right time for the design.
  • A dedicated signal marking the start of (or the whole duration of) VBLANK.
  • Similar for HBLANK? Together these could be used for external sync, IRQs, or to update registers. This might even allow an external controller to fake more than one ball in different parts of the screen. What about read registers, too, to help this?
  • Better image design, and RGB222 option.
  • Get rid of LZC.
  • Better sound options?
  • Different resolution/timing options?
  • Time at start-up for a controller to set initial registers, instead of via input pins.
  • Text, sprite, or other overlay
  • Learn how we'd go about doing an R2R DAC in a digital design that's compatible with Caravan.
  • Simple "playfield" as a DFFRAM or OpenRAM grid (just to test the concept). Could be the start of a Breakout clone?

Other stuff:

  • KLayout layer properties files that suit different printing targets: https://github.com/mattvenn/klayout_properties
  • VGA 800x600 56Hz timing is interesting. It uses a 36.0MHz clock and a H line counter of exactly 1024. I wonder if we could hack other VGA modes with different timing, and different base clocks.
  • If Raybox 640x480 is too optimistic and we can't get a good route that runs fast enough, we could do 400x300 by using the above 800x600@56 mode, and then we only need an 18MHz clock (which means timing for external RAM is even more generous: ~55ns). Other option is maybe 320x200, which could work with a 12.5MHz clock, and would produce probably a reasonably compact layout...?
  • If we use a 50MHz core, but just clock out 25MHz or even 12.5MHz to VGA, will this help the design set up external memory timing in a more structured way?

Next:

  • Test full tt03p5 (inc. wrapper) on FPGA
  • Make a few incremental solo_squash improvements
  • Make Raybox a little more modular so we can enable/disable different parts of the design if we want to get ready for tougher constraints.