All of the JTAG utilities I've been mentioning are quite handy if you need to load a bitstream onto a board from one of several workstations. But JTAG is capable of much more, including powerful on-chip debug features.
One of the often-overlooked hard IP blocks in Xilinx FPGAs is BSCAN. This primitive (usually described in the FPGA's configuration user guide) connects a JTAG data register for certain special instructions to FPGA fabric.
Xilinx 6 and 7 series FPGAs each contain four BSCANs, one connected to each of the four JTAG instructions USER1...USER4. These are very rarely used by user designs, but Xilinx utilities like ChipScope and the in-system SPI programming cores use them to communicate with the FPGA without needing additional connections.
The primitive is named BSCAN_SPARTAN6 in Spartan-6 and BSCANE2 in 7 series. As far as I can tell, the two are functionally equivalent.
BSCAN_SPARTAN6 #(
    .JTAG_CHAIN(1)                  // respond to USER1
)
user1_bscan (
    .SEL(instruction_active),
    .TCK(tck),
    .CAPTURE(state_capture_dr),
    .RESET(state_reset),
    .RUNTEST(state_runtest),
    .SHIFT(state_shift_dr),
    .UPDATE(state_update_dr),
    .DRCK(tck_gated),
    .TMS(tms),
    .TDI(tdi),
    .TDO(tdo)
);
The JTAG_CHAIN parameter selects which of the four user instructions (USER1 through USER4) this instance responds to. The interesting ports, with some notes:
- SEL goes high whenever USERx is loaded into the instruction register, regardless of the test state machine's current state.
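- CAPTURE, RESET, RUNTEST, SHIFT, and UPDATE are one-hot flags that go high when the test state machine is in the corresponding state (Capture-DR, Test-Logic-Reset, Run-Test/Idle, Shift-DR, and Update-DR respectively). While the state machine is in the IR shift path, all of the flags are held low.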
- TMS is of little practical use since the state machine is already implemented for you.
- TCK provides direct access to the JTAG clock. (Be sure to create a timing constraint for any signals clocked by this net.) In my experience the Xilinx tools often do not recognize this signal as a clock and use high-skew local routing; manual insertion of a BUFG/BUFH is advised for optimal results.
- TDI and TDO are connected to the corresponding JTAG pins when in the SHIFT-DR state. You can connect any fabric logic you want to them.
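To make the flag behavior above a bit more concrete, here is a rough C++ model of the standard IEEE 1149.1 TAP state machine with BSCAN-style flags derived from its state. This is a behavioral sketch only, not Xilinx's implementation: the names (TapState, NextState, FlagsFor, Walk) are made up for illustration, SEL is ignored (the model assumes USERx is already loaded), and real hardware may differ in exact edge timing.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// The 16 states of the IEEE 1149.1 TAP controller.
enum class TapState : uint8_t {
    TestLogicReset, RunTestIdle,
    SelectDR, CaptureDR, ShiftDR, Exit1DR, PauseDR, Exit2DR, UpdateDR,
    SelectIR, CaptureIR, ShiftIR, Exit1IR, PauseIR, Exit2IR, UpdateIR
};

// Next state on a rising TCK edge, given the sampled TMS value.
TapState NextState(TapState s, bool tms) {
    switch (s) {
    case TapState::TestLogicReset: return tms ? TapState::TestLogicReset : TapState::RunTestIdle;
    case TapState::RunTestIdle:    return tms ? TapState::SelectDR : TapState::RunTestIdle;
    case TapState::SelectDR:       return tms ? TapState::SelectIR : TapState::CaptureDR;
    case TapState::CaptureDR:      return tms ? TapState::Exit1DR  : TapState::ShiftDR;
    case TapState::ShiftDR:        return tms ? TapState::Exit1DR  : TapState::ShiftDR;
    case TapState::Exit1DR:        return tms ? TapState::UpdateDR : TapState::PauseDR;
    case TapState::PauseDR:        return tms ? TapState::Exit2DR  : TapState::PauseDR;
    case TapState::Exit2DR:        return tms ? TapState::UpdateDR : TapState::ShiftDR;
    case TapState::UpdateDR:       return tms ? TapState::SelectDR : TapState::RunTestIdle;
    case TapState::SelectIR:       return tms ? TapState::TestLogicReset : TapState::CaptureIR;
    case TapState::CaptureIR:      return tms ? TapState::Exit1IR  : TapState::ShiftIR;
    case TapState::ShiftIR:        return tms ? TapState::Exit1IR  : TapState::ShiftIR;
    case TapState::Exit1IR:        return tms ? TapState::UpdateIR : TapState::PauseIR;
    case TapState::PauseIR:        return tms ? TapState::Exit2IR  : TapState::PauseIR;
    case TapState::Exit2IR:        return tms ? TapState::UpdateIR : TapState::ShiftIR;
    case TapState::UpdateIR:       return tms ? TapState::SelectDR : TapState::RunTestIdle;
    }
    return TapState::TestLogicReset;  // unreachable; keeps compilers quiet
}

// One-hot flags as described in the port list (assumed mapping, SEL not modeled).
struct BscanFlags { bool capture, reset, runtest, shift, update; };

BscanFlags FlagsFor(TapState s) {
    return {
        s == TapState::CaptureDR,
        s == TapState::TestLogicReset,
        s == TapState::RunTestIdle,
        s == TapState::ShiftDR,
        s == TapState::UpdateDR
    };
}

// Clock a sequence of TMS values into the model and return the final state.
TapState Walk(TapState s, std::initializer_list<bool> tms_bits) {
    for (bool tms : tms_bits) s = NextState(s, tms);
    return s;
}
```

Starting from Test-Logic-Reset, the TMS sequence 0, 1, 0, 0 lands in Shift-DR (SHIFT high); the sequence 1, 1, 0, 0 from Run-Test/Idle lands in Shift-IR, where every flag in this model is low.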
Given this core plus libjtaghal on the PC side, we have a solid framework for building an on-chip debug system! The first step is to decide what sort of data to move over the link. Since my framework is NoC based, raw NoC frames seemed the natural choice. This would create a sort of layer-3 VPN encapsulating RPC/DMA transactions within JTAG scan operations.
After some experimenting with protocols I came up with one that seemed to work reasonably well. USER1 is the status/control register, USER2 is the RPC data register, and USER3 is the DMA data register. USER4 is left free for future expansion.
The FPGA side of the link is a module called JtagDebugController. It exposes RPC and DMA ports to the NoC; my current convention calls for addresses in subnet c000/2 to be routed to the debug bridge.
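For readers unfamiliar with the prefix notation, the check "is this address in c000/2?" can be sketched as below. I'm assuming 16-bit NoC addresses and CIDR-style semantics (the top 2 bits must match c000, so the range is c000 through ffff); the function names are hypothetical and not part of the actual codebase.

```cpp
#include <cassert>
#include <cstdint>

// Mask with the top `prefix_len` bits of a 16-bit address set (0 <= prefix_len <= 16).
constexpr uint16_t PrefixMask(unsigned prefix_len) {
    return prefix_len == 0 ? 0
                           : static_cast<uint16_t>(0xFFFFu << (16 - prefix_len));
}

// True if addr falls inside base/prefix_len.
constexpr bool InSubnet(uint16_t addr, uint16_t base, unsigned prefix_len) {
    return (addr & PrefixMask(prefix_len)) == (base & PrefixMask(prefix_len));
}

// Assumed convention from the text: c000/2 belongs to the debug bridge.
constexpr bool RoutesToDebugBridge(uint16_t addr) {
    return InSubnet(addr, 0xc000, 2);
}
```

With a 2-bit prefix the mask works out to c000, so a quarter of the 16-bit address space is reserved for the debug side.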
I'm deliberately not describing the actual on-wire protocol in depth because it's still in flux; when I get closer to a stable release I'll document it somewhere.
The PC side of the link is a C++ application using libjtaghal called "nocswitch". Example usage:
$ ./x86_64-linux-gnu/nocswitch --server localhost --port 50100 --lport 50101
Emulated NoC switch [SVN rev 1253:1254M] by Andrew D. Zonenberg.
License: 3-clause ("new" or "modified") BSD.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Connected to JTAG daemon at localhost:50100
Querying adapter...
Remote JTAG adapter is a Dev board JTAG (232H) (serial number "FTWOON60", userid "FTWOON60", frequency 10.00 MHz)
Initializing chain...
Scan chain contains 1 devices
Device 0 is a Xilinx XC6SLX25 stepping 2
Virtual TAP status register is 1000adba
Valid NoC endpoint detected
This spawns a nocswitch listening on localhost:50101 connecting to a jtagd at localhost:50100.
Once nocswitch is running, it constantly polls the status register on USER1, waiting for the "new RPC message" or "new DMA message" bit to be set. (This generates a lot of traffic on the nocswitch-jtagd link and uses a decent amount of CPU on the host; my custom 8-port ICE will include FPGA-based polling and an onboard nocswitch alongside the jtagd instances to avoid this problem.)
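The general shape of such a polling loop might look something like the sketch below. Everything specific here is hypothetical: the bit positions are invented placeholders (the real status register layout isn't documented in this post), and the status read and the two handlers are passed in as callables rather than using any real libjtaghal call.

```cpp
#include <cassert>
#include <cstdint>

// HYPOTHETICAL bit assignments, for illustration only.
constexpr uint32_t kStatusRpcPending = 1u << 0;
constexpr uint32_t kStatusDmaPending = 1u << 1;

struct Pending { bool rpc; bool dma; };

// Decode which message types the (assumed) status word says are waiting.
inline Pending Decode(uint32_t status) {
    return Pending{ (status & kStatusRpcPending) != 0,
                    (status & kStatusDmaPending) != 0 };
}

// Poll the status register max_polls times, servicing whatever is pending.
// read_status stands in for a scan of the USER1 data register.
// Returns the number of messages serviced.
template <typename ReadStatus, typename OnRpc, typename OnDma>
unsigned PollLoop(ReadStatus read_status, OnRpc on_rpc, OnDma on_dma,
                  unsigned max_polls) {
    unsigned serviced = 0;
    for (unsigned i = 0; i < max_polls; ++i) {
        const Pending p = Decode(read_status());
        if (p.rpc) { on_rpc(); ++serviced; }  // would scan the RPC register (USER2)
        if (p.dma) { on_dma(); ++serviced; }  // would scan the DMA register (USER3)
    }
    return serviced;
}
```

Every iteration costs a full scan operation even when nothing is pending, which is where the link traffic and host CPU load come from.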
Client applications can then connect to nocswitch via a TCP-based protocol. The nocswitch assigns an address in c000/2 to each client in a manner somewhat reminiscent of DHCP; client applications (on the same machine or elsewhere on the LAN) can then send and receive NoC packets directly to the device under test. Multiple clients are fully supported; the nocswitch performs layer-2 switching between clients and the DUT as needed.
Nocswitch is able to switch frames from one client to another as well as just to the DUT; this permits a client to send messages to a NoC address without caring about whether it's a core in the SoC, a PC-side unit test, or even an RTL simulation (my mechanism for doing the latter will be described in a future post).
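As a toy illustration of the two ideas above (handing out addresses to clients, then switching frames by destination), consider the following sketch. It is not the real nocswitch code: the class name, the sequential allocation policy, the lack of address release, and the "anything that isn't a known client goes to the DUT" rule are all simplifying assumptions of mine.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Sentinel meaning "forward over JTAG to the device under test".
constexpr int kDutPort = -1;

class ToySwitch {
public:
    // Assign the next unused address in c000/2 to a client.
    // Returns 0 once the pool (c000 through ffff) is exhausted.
    uint16_t AddClient(int client_id) {
        if (next_addr_ == 0) return 0;         // counter wrapped past ffff
        const uint16_t addr = next_addr_++;
        clients_[addr] = client_id;
        return addr;
    }

    // Pick the output for a frame: a connected client if the destination
    // address belongs to one, otherwise the DUT.
    int Route(uint16_t dest) const {
        const auto it = clients_.find(dest);
        return it != clients_.end() ? it->second : kDutPort;
    }

private:
    uint16_t next_addr_ = 0xc000;              // first address in c000/2
    std::map<uint16_t, int> clients_;          // NoC address -> client id
};
```

The sender never needs to know which kind of endpoint owns an address; it just addresses the frame and the lookup decides where the frame goes.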
From a test case author's perspective, the NOCSwitchInterface class implements the RPCAndDMAInterface class and supports the usual complement of operations:
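// server, port, nameserver, and rxm are declared elsewhere in the test case
printf("Connecting to nocswitch server...\n");
NOCSwitchInterface iface;
iface.Connect(server, port);

// Look up the NoC address of the Ethernet core by name
uint16_t eaddr = nameserver.ForwardLookup("eth0");
printf("eth0 is at %04x\n", eaddr);

// Send an ETH_RESET RPC call to that address
printf("Resetting interface...\n");
iface.RPCFunctionCall(eaddr, ETH_RESET, 0, 0, 0, rxm);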
Finally, here's a sneak peek at what's coming in future posts:
- Hardware cosimulation, including a workaround for ISim's lack of Verilog PLI support
- Splash, my build system inspired by Google Blaze
- RED TIN, my internal logic analyzer (ChipScope/SignalTap replacement with lots of features useful in my work, like state machine decoding, RLE, and time-scale compression)
- A look at both the hardware and software sides of the infrastructure for my dev board farm (batch scheduling, distributed build, automated testing, managed power distribution, and more). Hooking a single board up to a single JTAG dongle works fine if you only have one device but becomes a lot more of a pain to maintain when you have over twenty dev boards with more on the way!