Demo HW design of MicroBlaze using DDR3 RAM on Arty A7

The folders ArtyA7_MicroBlaze_demo_hw_2023.1 and ArtyA7_MicroBlaze_demo_hw_2024.1 contain the HW design project created by the tutorial, in Vivado 2023.1 and Vivado 2024.1 respectively.

I tested the design on the Arty A7-35 (which is no longer in production). I expect it to also work on the Arty A7-100.

Important

If you have Arty A7-100, you need to change the board to A7-100 in Vivado in Tools|Settings|General|Project device.

No other changes in the design should be necessary.

Memory read speed benchmarking app

The folder MicroBlaze_DDR_speed_test_sw_2023.1 is a Vitis 2023.1 workspace of the benchmarking app, which runs on the HW design created by the tutorial.

You can also open this workspace in Vitis Classic 2024.1.
It will ask you if you want to update it to version 2024.1. After the update, the app will compile and work the very same way as it does in Vitis 2023.1.
This app cannot be built correctly in Vitis 2024.1 (i.e., in the new Vitis Unified IDE), because Vitis Unified 2024.1 doesn't correctly read the MicroBlaze parameters from the XSA file.

Important

You need an oscilloscope in order to make use of the app.
The app provides no output to the console.

It drives the Arty A7 pin marked A0 high before the testing loop is executed. The pin is driven low after the loop finishes. Testing loops are repeated indefinitely.
You, therefore, need to measure the duration of a positive pulse created on the pin A0 down to tens of microseconds. Even a cheap scope should be able to do that.

The amount of data read from memory in the testing loop is set by the macro BUFF_WORDS (i.e., the number of 32b words) in main.cpp. The value of the macro must be divisible by 4, because the testing loop reads four words per iteration.
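
The divisibility rule can be checked at compile time. The following is a minimal sketch, not code from main.cpp: the value 7680 is only an illustrative placeholder, and the helpers buff_bytes and loop_iterations are hypothetical names introduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative placeholder only; the real BUFF_WORDS is defined in main.cpp.
#ifndef BUFF_WORDS
#define BUFF_WORDS 7680
#endif

// The testing loop reads four words per iteration and runs BUFF_WORDS/4 times,
// so a value that is not a multiple of 4 would leave the last words unread.
static_assert(BUFF_WORDS % 4 == 0, "BUFF_WORDS must be divisible by 4");

// Hypothetical helper: size of the test buffer in bytes.
inline std::size_t buff_bytes(std::size_t words) {
    return words * sizeof(std::uint32_t);
}

// Hypothetical helper: number of iterations of the four-word loop.
inline std::size_t loop_iterations(std::size_t words) {
    assert(words % 4 == 0);
    return words / 4;
}
```

With such a static_assert next to the macro, an invalid buffer size would fail the build instead of silently shortening the measured read.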

Important

The app published in this repository was compiled for Arty A7-35. It won't load on Arty A7-100.

If you have Arty A7-100, you need to generate the Vivado outputs with the board Arty A7-100 set in the Vivado project and export the hardware specification (File|Export|Export Hardware, select "Include Bitstream").
Then, in Vitis, you right-click the "system" project in the Explorer, select "Update Hardware Specification" and specify the .xsa file generated by Vivado. With the HW specification updated, you must re-build the Vitis workspace.

Tip

The app will, by default, run with instruction and data cache disabled, in order to measure DDR3 SDRAM read speed.
To do the measurement with caches enabled, comment out the following line in main.cpp:

//Comment out this macro in order to run the test with instruction and data cache enabled
#define CACHES_DISABLED

Compilers tend to optimize away loops that read data and do nothing with it.
To remove any dependency on compiler optimization, I wrote the critical piece of the benchmarking code in MicroBlaze assembly.
This is an excerpt from main.cpp:

volatile uint32_t buff[BUFF_WORDS];

/* Sequentially read content of buff into a register */
asm volatile (
    "xor r0, r0, r0      \n\t"  //make sure r0 is zero
    "addi r10, %0        \n\t"  //load address of buff to r10
    "addi r11, r0, %1    \n\t"  //load value of BUFF_WORDS/4 to r11
    "addi r13, r0, -1    \n\t"  //load value -1 to r13
    "1:                  \n\t"  //label for branching
    //Load four 32b words from memory:
    "lwi  r12, r10, 0    \n\t"  //load 32b word from address r10 to r12
    "lwi  r12, r10, 4    \n\t"  //load 32b word from address r10+4 to r12
    "lwi  r12, r10, 8    \n\t"  //load 32b word from address r10+8 to r12
    "lwi  r12, r10, 12   \n\t"  //load 32b word from address r10+12 to r12
    "addi r10, r10, 4*4  \n\t"  //increment address in r10 to next 4 words
    "add  r11, r11, r13  \n\t"  //decrement counter in r11 (r13 == -1)
    "bgti r11, 1b        \n\t"  //if r11 > 0 then branch backward to label 1
    :                                   //no output operands
    : "m" (buff), "i" (BUFF_WORDS/4)    //input operands
    : "r0","r10","r11","r12","r13","cc" //clobbered registers + CPU condition codes
);
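
For readers less familiar with MicroBlaze assembly, the access pattern of the loop above can be sketched in plain C++. This is only an illustration under stated assumptions: the function read_buffer_sequentially is a hypothetical name and is not part of main.cpp, and a compiler is free to generate different instructions for it, which is exactly why the benchmark itself uses assembly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical reference function, not part of main.cpp.
// Reads `words` 32-bit values through a volatile pointer, four per iteration,
// mirroring the structure of the assembly loop. Returns the last value read.
inline std::uint32_t read_buffer_sequentially(const volatile std::uint32_t* buf,
                                              std::size_t words) {
    assert(words >= 4 && words % 4 == 0);
    std::uint32_t last = 0;                 // plays the role of r12
    for (std::size_t i = words / 4; i > 0; --i) {  // counter, like r11
        last = buf[0];                      // word at byte offset 0
        last = buf[1];                      // word at byte offset 4
        last = buf[2];                      // word at byte offset 8
        last = buf[3];                      // word at byte offset 12
        buf += 4;                           // advance by four words (16 bytes)
    }
    return last;
}
```

Each loaded value overwrites the previous one, just as every lwi in the assembly writes to r12; only the memory traffic matters for the measurement.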

Important

Even though the critical benchmarking loop is written in assembly, do compile the whole project in the Release Configuration (i.e., with optimization set to -O2 or -O3) so the surrounding code is optimized as well.

Measurements

I made the following measurements on Arty A7-35 using the exact HW design and SW app published in this repository:

| Test buffer size | Duration with caches disabled | Duration with caches enabled |
|---|---|---|
| 30 kB (fits into the 32 kB cache) | 7.24 ms | 0.088 ms |
| 50 kB | 12.05 ms | 0.689 ms |
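
The measured pulse widths can be converted into an average read throughput. The sketch below assumes that 1 kB means 1024 bytes and reports MB/s with 1 MB = 10^6 bytes; the helper throughput_mb_per_s is a hypothetical name introduced for illustration.

```cpp
#include <cassert>

// Hypothetical helper: average read throughput in MB/s (1 MB = 10^6 bytes).
// `bytes` is the test buffer size, `duration_ms` the measured pulse width.
inline double throughput_mb_per_s(double bytes, double duration_ms) {
    assert(duration_ms > 0.0);
    const double megabytes = bytes / 1e6;
    const double seconds = duration_ms / 1e3;
    return megabytes / seconds;
}
```

Under these assumptions, calling throughput_mb_per_s(30 * 1024, 7.24) gives roughly 4.2 MB/s for the uncached 30 kB case, and throughput_mb_per_s(30 * 1024, 0.088) gives roughly 349 MB/s for the same buffer with caches enabled.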