Saturday, April 8, 2017

STARSHIPRAIDER: Preparing for high-speed I/O characterization

In my previous post, I characterized the STARSHIPRAIDER I/O circuit for high voltage fault transient performance, but was unable to adequately characterize the high speed data performance because my DSO (Rigol DS1102D) only has 100 MHz of bandwidth.

Although I did have some ideas on how to improve the performance of the current I/O circuit, it was already faster than I could measure so I had no way to know if my improvements were actually making it any better. Ideally I'd just buy an oscilloscope with several GHz of bandwidth, but I'm not made of money and those scopes tend to be in the "request a quote" price range.

The obvious solution was to build one. I already had a proven high-speed sampling architecture from my TDR project so all I had to do was repackage it as an oscilloscope and make it faster still.

The circuit was beautifully simple: an output from the FPGA drives a 50 ohm trace to an SMA connector, then a second SMA connector drives the positive input of an ADCMP572 through a 3 dB attenuator (to keep my signal within range). The negative input is driven by a cheap 12-bit I2C DAC. The comparator output is then converted from CML to LVDS and fed to the host FPGA board. Finally, a 3.3V CML output from the FPGA drives the latch enable input on the comparator.

The "ADC" algorithm is essentially the same as on my TDR. I like to think of it as an equivalent-time version of a flash ADC: rather than 256 comparators digitizing the signal once, I digitize the signal 256 times with one comparator (and of course 256 different reference voltages). The post-processing to turn the comparator outputs into 8-bit ADC codes is the same.
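The threshold sweep described above can be sketched in a few lines of Python. This is only an illustration, not the actual firmware: the ideal `comparator` model, the 0 to 3.3V reference span, and the way the DAC is mapped onto 256 levels are all assumptions made for the example.

```python
def comparator(v_in, v_ref):
    """Ideal comparator model: 1 if the input is above the reference."""
    return 1 if v_in > v_ref else 0


def digitize(waveform, v_min=0.0, v_max=3.3, levels=256):
    """Equivalent-time 'flash ADC' using a single comparator.

    The reference is stepped through every threshold, and the whole
    (repetitive) waveform is captured once per threshold. Counting how
    many thresholds each sample exceeded gives a thermometer code, which
    is the same post-processing a real flash ADC performs.

    v_min, v_max and levels are assumed values for illustration.
    """
    step = (v_max - v_min) / levels
    codes = [0] * len(waveform)
    for k in range(1, levels):            # 255 thresholds give codes 0..255
        v_ref = v_min + k * step          # one DAC setting per pass
        for i, v in enumerate(waveform):  # re-capture the waveform
            codes[i] += comparator(v, v_ref)
    return codes
```

The tradeoff is the usual one for equivalent-time capture: one comparator instead of hundreds, in exchange for needing a repetitive signal and many passes over it.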

Unlike the TDR, however, I also do equivalent-time sampling in the time domain. The FPGA generates the sampling and PRBS clocks with different PLL outputs (at 250 MHz / 4 ns period), and sweeps the relative phase in 100 ps steps to produce an effective resolution of 10 Gsps / 100 ps timebase.
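As a rough sketch of how the phase sweep turns into a 100 ps timebase, the snippet below interleaves one capture per phase offset into a single time-ordered record. The `captures[p][n]` layout and the function name are hypothetical, chosen just for this example.

```python
CLOCK_PERIOD_PS = 4000   # 250 MHz capture clock (4 ns)
PHASE_STEP_PS = 100      # phase increment between captures
NUM_PHASES = CLOCK_PERIOD_PS // PHASE_STEP_PS   # 40 phase offsets


def interleave(captures):
    """Merge per-phase captures into one equivalent-time record.

    captures[p][n] is assumed to hold sample n taken with the sampling
    clock delayed by p phase steps. Returns (time_ps, value) tuples in
    time order, spaced PHASE_STEP_PS apart.
    """
    depth = len(captures[0])
    record = []
    for n in range(depth):
        for p in range(NUM_PHASES):
            t = n * CLOCK_PERIOD_PS + p * PHASE_STEP_PS
            record.append((t, captures[p][n]))
    return record
```

Each real sample is still taken at 250 MHz; it is only because the test pattern repeats that the 40 offset captures can be stitched into a record with 100 ps spacing.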

Without further ado, here's a picture of the board. Total BOM cost including connectors and PCB was approximately $50.

Oscilloscope board (yes, it's PMOD form factor!)
After some initial firmware development I was able to get some preliminary eye renders off the board. They were, to say the least, not ideal.

250 Mbps: very bumpy rise
500 Mbps: significant eye closure even with increased drive strength

I spent quite a while tracking down other bugs before dealing with the signal integrity issues. For example, a low-frequency pulse train showed up with a very uneven duty cycle:

Duty cycle distortion
Someone suggested that I try a slow rise time pulse to show the distortion more clearly. Not having a proper arbitrary waveform generator, I made do with a squarewave and R-C lowpass filter.

Ever seen breadboarded passives interfacing to edge-launch SMA connectors before?
It appeared that I had jump discontinuities in my waveform every two blocks (color coding)
I don't have an EE degree, but I can tell this looks wrong!

Interestingly enough, two blocks (of 32 samples each) were concatenated into a single JTAG transfer. These two were read in one clock cycle and looked fine, but the junction to the next transfer seemed to be skipping samples.

As it turned out, I had forgotten to clear a flag which led to me reading the waveform data before it was done capturing. Since the circular buffer was rotating in between packets, some samples never got sent.

The next bug required zooming into the waveform a bit to see. The samples captured on the first few (the number seemed to vary across bitstream builds) of my 40 clock phases were showing up shifted by 4 ns (one capture clock).

Horizontally offset samples

I traced this issue to a synchronizer between clock domains having variable latency depending on the phase offset of the source and destination clocks. This is an inherent issue in clock domain crossing, so I think I'm just going to have to calibrate it out somehow. For the short term I'm manually measuring the number of offset phases each time I recompile the FPGA image, and then correcting the data in post-processing.
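The post-processing correction might look something like the sketch below. It assumes the affected phases show up one capture clock late (the post only says "shifted", so the direction is a guess) and reuses a hypothetical `captures[p][n]` layout with one row per clock phase; `offset_phases` is the manually measured count for the current bitstream.

```python
def realign(captures, offset_phases, shift=1):
    """Undo a whole-clock offset on the first `offset_phases` phases.

    Assumption: rows 0..offset_phases-1 are delayed by `shift` capture
    clocks, so their sample for time n sits at index n + shift. Those
    rows lose their leading samples and the remaining rows lose their
    tail, so every row ends up the same length and index-aligned.
    """
    fixed = []
    for p, row in enumerate(captures):
        if p < offset_phases:
            fixed.append(row[shift:])             # drop the late lead-in
        else:
            fixed.append(row[:len(row) - shift])  # trim to matching length
    return fixed
```

If the offset turns out to go the other way, swapping the two branches gives the mirrored correction.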

The final issue was a hardware bug. I was terminating the incoming signal with a 50Ω resistor to ground. Although this had good AC performance, at DC the current drawn from a high-level input was quite significant (66 mA at 3.3V). Since my I/O pins can't drive this much, the line was dragged down.

I decided to rework the input termination to replace the 50Ω terminator with split 100Ω resistors to 3.3V and ground. This should have about half the DC current draw, and is Thevenin equivalent to a 50Ω terminator to 1.65V. As a bonus, the mid-level termination will also allow me to AC-couple the incoming signal if that becomes necessary.
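The arithmetic behind the rework is easy to check. The helper names below are made up for this example; the resistor and rail values are the ones from the post.

```python
def split_termination(v_rail, r_high, r_low):
    """Thevenin equivalent of a resistor divider between v_rail and ground."""
    v_th = v_rail * r_low / (r_high + r_low)
    r_th = r_high * r_low / (r_high + r_low)
    return v_th, r_th


def driver_current(v_in, v_th, r_th):
    """DC current the driver must source (positive) or sink (negative)."""
    return (v_in - v_th) / r_th


# Original termination: a single 50 ohm resistor to ground.
i_old = driver_current(3.3, 0.0, 50.0)

# Reworked termination: 100 ohms to 3.3V and 100 ohms to ground.
v_th, r_th = split_termination(3.3, 100.0, 100.0)
i_new_high = driver_current(3.3, v_th, r_th)
i_new_low = driver_current(0.0, v_th, r_th)
```

With these numbers the driver sources about 33 mA at a 3.3V high level instead of 66 mA, at the cost of also having to sink about 33 mA at a 0V low level.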

Mill out trace from ground via to on-die 50Ω termination resistor

Remove soldermask from ground via and signal trace

Add 100Ω 0402 low-side terminator
Add 100Ω 0402 high-side terminator, plus jumper trace to 3.3V bulk decoupling cap

Add 10 nF high speed decoupling cap to help compensate for inductance of long feeder trace
I cleaned off all of the flux residue and ran a second set of eye loopback tests at 250 and 500 Mbps. The results were dramatically improved:

Post-rework at 250 Mbps
Post-rework at 500 Mbps
While not perfect, the new eye openings are a lot cleaner. I hope to tweak my input stage further to reduce probing artifacts, but for the time being I think I have sufficient performance to compare multiple STARSHIPRAIDER test circuits and see how they stack up at relatively high speeds.

Next step: collect some baseline data for the current STARSHIPRAIDER characterization board, then use that to inform my v0.2 I/O circuit!

Sunday, February 5, 2017

STARSHIPRAIDER: Input buffer rev 0.1 design and characterization

Working as an embedded systems pentester is a lot of fun, but it comes with some annoying problems. There are so many tools that I can never seem to find the right one. Need to talk to a 3.3V UART? I almost invariably have an FTDI cable configured for 5 or 1.8V on my desk instead. Need to dump a 1.8V flash chip? Most of our flash dumpers won't run below 3.3V. Need to sniff a high-speed bus? Most of the Saleae Logic analyzers floating around the lab are too slow to keep up with fast signals, and the nice oscilloscopes don't have a lot of channels. And everyone's favorite jack-of-all-trades tool, the Bus Pirate, is infamous for being slow.

As someone with no shortage of virtual razors, I decided that this yak needed to be shaved! The result was an ongoing project I call STARSHIPRAIDER. There will be more posts on the project in the coming months so stay tuned!

The first step was to decide on a series of requirements for the project:
  • 32 bidirectional I/O ports split into four 8-pin banks.
    This is enough to sniff any commonly encountered embedded bus other than DRAM. Multiple banks are needed to support multiple voltage levels in the same target.
  • Full support for 1.2 to 5V logic levels.
    This is supposed to be a "Swiss Army knife" embedded systems debug/testing tool. This voltage range encompasses pretty much any signalling voltage commonly encountered in embedded devices.
  • Tolerance to +/- 12V DC levels.
    Test equipment needs to handle some level of abuse. When you're reverse engineering a board it's easy to hook up ground to the wrong signal, probe a power rail, or even do both at once. The device doesn't have to function in this state (shutting down for protection is OK) but needs to not suffer permanent damage. It's also OK if the protection doesn't handle AC sources - the odds of accidentally connecting a piece of digital test equipment to a big RF power amplifier are low enough that I'm not worried.
  • 500 Mbps input/output rate for each pin.
    This was a somewhat arbitrary choice, but preliminary math indicated it was feasible. I wanted something significantly faster than existing tools in the class.
  • Ethernet-based interface to host PC.
    I've become a huge fan of Ethernet and IPv6 as the communications interface for my projects. It doesn't require any royalties or license fees, scales from 10 Mbps to >10 Gbps and supports bridging between different link speeds, supports multi-master topologies, and can be bridged over a WAN or VPN. USB and PCIe, the two main alternatives, can do few if any of these.
  • Large data buffer.
    Most USB logic analyzers have very high peak capture rates, but the back-haul interface to the host PC can't keep up with extended captures at high speed. Commodity DRAM is so cheap that there's no reason to not stick a whole SODIMM of DDR3 in the instrument to provide an extremely deep capture buffer.
  • Multiple virtual instruments connected to a crossbar.
    Any nontrivial embedded device contains multiple buses of interest to a reverse engineer. STARSHIPRAIDER needs to be able to connect to several at once (on arbitrary pins), bridge them out to separate TCP ports, and allow multiple testers to send test vectors to them independently.
The brain of the system will be fairly straightforward high-speed digital. It will be a 6-8 layer PCB with an Artix-7 FPGA in FGG484 package, a SODIMM socket for 4GB of DDR3-800, a KSZ9031 Gigabit Ethernet PHY, a TLK10232 10 Gbit Ethernet PHY, and an SFP+ cage, plus some sort of connector (most likely a Samtec Q-strip) for talking to the I/O subsystem on a separate board.

The challenging part of the design, from an architectural perspective, seemed to be the I/O buffer and input protection circuit, so I decided to prototype it first.

STARSHIPRAIDER v0.1 I/O buffer design

A block diagram of the initial buffer design is shown above. The output buffer will be discussed in a separate post once I've had a chance to test it; today we'll be focusing on the input stage (the top half of the diagram).

During normal operation, the protection relay is closed. The series resistor has insignificant resistance compared to the input impedance of the comparator (an ADCMP607), so it can be largely ignored. The comparator checks the input signal against a threshold (chosen appropriately for the I/O standard in use) and sends a differential signal to the host board for processing. But what if something goes wrong?

If the user accidentally connects the probe to a signal outside the acceptable voltage range, a Schottky diode connected to the +5V or ground rail will conduct and shunt the fault current safely into the power rails. The series resistor limits this current to a safe level (within the diode's peak rating). After a short time (about 150 µs with my current relay driver), the protection relay opens and breaks the circuit.

The relay is controlled by a Silego GreenPAK4 mixed-signal FPGA, running a small design written in Verilog and compiled with my open-source toolchain. The code for the GreenPAK design is on GitHub.

All well and good in theory... but does it work? I built a characterization board containing a single I/O buffer and loaded with test points and probe connectors. You can grab the KiCAD files for this on GitHub as well. Here's a quick pic after assembly:

STARSHIPRAIDER I/O characterization board
Initial test results were not encouraging. Positive overvoltage spikes were clamped to +8V and negative spikes were clamped to -1V - well outside the -0.5 to +6V absolute max range of my comparator.
Positive transient response

Negative transient response


After a bit of review of the schematics, I found two errors. The "5V" ESD diode I was using to protect the high side had a poorly controlled Zener voltage and could clamp as high as 8V or 9V. The Schottky on the low side was able to survive my fault current but the forward voltage increased massively beyond the nominal value.

I reworked the board to replace the series resistor with a larger one (39 ohms) to reduce the maximum fault current, replaced the low-side Schottky with one that could handle more current, and replaced the Zener with an identical Schottky clamping to the +5V rail.
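For a sense of scale, here is a back-of-the-envelope estimate of the DC fault current through the new 39 ohm resistor at the +/- 12V design limits. The 0.5V Schottky forward drop is an assumed round number (the real drop rises with current, as noted above), and the rails are treated as ideal, so treat the results as ballpark figures only.

```python
def fault_current(v_fault, r_series=39.0, v_rail_high=5.0,
                  v_rail_low=0.0, v_f=0.5):
    """Approximate DC current (amps) through the series resistor.

    v_f is an ASSUMED diode forward drop; rails are assumed ideal.
    Positive values flow into the +5V clamp, negative values are
    pulled out of the ground clamp. Returns 0 if neither diode conducts.
    """
    if v_fault > v_rail_high + v_f:
        return (v_fault - v_rail_high - v_f) / r_series
    if v_fault < v_rail_low - v_f:
        return (v_fault - v_rail_low + v_f) / r_series
    return 0.0


i_pos = fault_current(12.0)    # probe touched to +12V
i_neg = fault_current(-12.0)   # probe touched to -12V
```

Under these assumptions the negative fault is the harder case, since the full 12V (less one diode drop) appears across the resistor rather than 12V minus the 5V rail.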

Testing this version gave much better results. There was still a small amount of ringing (less than five nanoseconds) a few hundred mV past the limit, but the comparator's ESD diodes should be able to safely dissipate this brief pulse.

Positive transient response, after rework
Negative transient response, after rework
Now it was time to test the actual signal path. My first iteration of the test involved cobbling together a signal path from an FPGA board through the test platform and to the oscilloscope without any termination. The source of the signal was a BNC-to-minigrabber flying lead test clip! Needless to say, results were less than stellar.

PRBS31 eye at 80 Mbps through protection circuit with flying leads and no terminator
After ordering some proper RF test supplies (like an inline 50 ohm BNC terminator), I got much better signal quality. The eye was very sharp and clear at 100 Mbps. It was visibly rounded at 200 Mbps, but rendering a squarewave at that rate requires bandwidth much higher than the 100 MHz of my oscilloscope so results were inconclusive.

PRBS31 eye at 100 Mbps through protection circuit with proper cabling
PRBS31 eye at 200 Mbps, limited by oscilloscope bandwidth
I then hooked the protection circuit up to the comparator to test the entire inbound signal chain. While the eye looked pretty good at 100 Mbps (plotting one leg of the differential since my scope was out of channels), at 200 Mbps horrible jitter appeared.

PRBS31 eye at 100 Mbps through full input buffer
PRBS31 eye at 200 Mbps through full input buffer
After quite a bit of scratching my head and fumbling with datasheets, I plotted the clock reference I was triggering on and realized my oscilloscope was the problem. The jitter was visible in this clock as well, suggesting that it was inherent in the oscilloscope's trigger circuit. This isn't too surprising considering I'm really pushing the limits of this scope - I need a better one to do this kind of testing properly.

PRBS31 eye at 200 Mbps plus 200 MHz sync clock
At this point I've done about all of the input stage testing I can do with this oscilloscope. I'm going to try and rig up a BER tester on the FPGA so I can do PRBS loopback through the protection stage and comparator at higher speeds, then repeat for the output buffer and the protection run in the opposite direction.
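A software model of such a loopback test might look like the sketch below. It is not the planned FPGA design: it assumes the commonly used PRBS31 polynomial x^31 + x^28 + 1 (the post does not say which one is in use) and a checker that is already bit-aligned with the transmitter and shares its seed.

```python
def prbs31(seed=0x7FFFFFFF):
    """Generate PRBS31 bits from a 31-bit Fibonacci LFSR.

    Polynomial x^31 + x^28 + 1 is an assumption; the seed must be nonzero.
    """
    state = seed & 0x7FFFFFFF
    if state == 0:
        raise ValueError("seed must be nonzero")
    while True:
        bit = ((state >> 30) ^ (state >> 27)) & 1   # taps 31 and 28
        state = ((state << 1) | bit) & 0x7FFFFFFF
        yield bit


def count_bit_errors(received, seed=0x7FFFFFFF):
    """Compare received bits against a locally generated reference.

    Assumes the receiver is already synchronized to the transmitter.
    Returns (error_count, bits_checked).
    """
    reference = prbs31(seed)
    errors = sum(1 for rx in received if rx != next(reference))
    return errors, len(received)
```

Dividing the error count by the number of bits checked gives the bit error rate. A real checker would also need to lock onto the incoming stream before it starts counting.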

I still have more work to do on the protection circuit as well... while it's fine at 100 Mbps, the 2x 10pF Schottky diode parasitic capacitance is seriously degrading my rise times (I calculated an RC filter -3dB point of around 200 MHz, so higher harmonics are being chopped off). I have some ideas on how I can cut this down considerably, but that will require a board respin and another blog post!
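For reference, the first-order RC corner frequency is f = 1 / (2πRC). The post doesn't say which resistance went into the calculation, so the snippet below assumes it is just the 39 ohm series resistor working against the two 10 pF diode capacitances in parallel, which lands close to the quoted figure.

```python
import math


def rc_cutoff_hz(r_ohms, c_farads):
    """-3 dB corner frequency of a first-order RC lowpass."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)


R_SERIES = 39.0          # ohms (assumed to be the only resistance)
C_DIODES = 2 * 10e-12    # farads, two 10 pF Schottky diodes in parallel

f_c = rc_cutoff_hz(R_SERIES, C_DIODES)
print(f"{f_c / 1e6:.0f} MHz")
```

Any source impedance in series with the resistor would push the corner lower still.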