Sunday, May 8, 2016

Open Verilog flow for Silego GreenPak4 programmable logic devices

I've written a couple of posts in the past few months but they were all for the blog at work so I figured I'm long overdue for one on Silicon Exposed.

So what's a GreenPak?


Silego Technology is a fabless semiconductor company located in the SF Bay area, which makes (among other things) a line of programmable logic devices known as GreenPak. Their 5th generation parts were just announced, but I started this project before that happened so I'm still targeting the 4th generation.

GreenPak devices are kind of like itty bitty PSoCs - they have a mixed signal fabric with an ADC, DACs, comparators, voltage references, plus a digital LUT/FF fabric and some typical digital MCU peripherals like counters and oscillators (but no CPU).

It's actually an interesting architecture - FPGAs (including some devices marketed as CPLDs) are a 2D array of LUTs connected via wires to adjacent cells, and true (product term) CPLDs are a star topology of AND-OR arrays connected by a crossbar. GreenPak, on the other hand, is a star topology of LUTs, flipflops, and analog/digital hard IP connected to a crossbar.

Without further ado, here's a block diagram showing all the cool stuff you get in the SLG46620V:

SLG46620V block diagram (from device datasheet)
They're also tiny (the SLG46620V is a 20-pin 0.4mm pitch STQFN measuring 2x3 mm, and the lower gate count SLG46140V is a mere 1.6x2 mm) and probably the cheapest programmable logic device on the market - $0.50 in low volume and less than $0.40 in larger quantities.

The Vdd range of GreenPak4 is huge, more like what you'd expect from an MCU than an FPGA! It can run on anything from 1.8 to 5V, although performance is only specified at 1.8, 3.3, and 5V nominal voltages. There's also a dual-rail version that trades one of the GPIO pins for a second power supply pin, allowing you to interface to logic at two different voltage levels.

To support low-cost/space-constrained applications, they even have the configuration memory on die. It's one-time programmable and needs external Vpp to program (presumably Silego didn't want to waste die area on charge pumps that would only be used once) but has a SRAM programming mode for prototyping.

The best part is that the development software (GreenPak Designer) is free of charge and provided for all major operating systems including Linux! Unfortunately, the only supported design entry method is schematic entry and there's no way to write your design in a HDL.

While schematics may be fine for quick tinkering on really simple designs, they quickly get unwieldy. The nightmare of a circuit shown below is just a bunch of counters hooked up to LEDs that blink at various rates.

Schematic from hell!
As if this wasn't enough of a problem, the largest GreenPak4 device (the SLG46620V) is split into two halves with limited routing between them, and the GUI doesn't help the user manage this complexity at all - you have to draw your schematic in two halves and add "cross connections" between them.

The icing on the cake is that schematics are a pain to diff and collaborate on. Although GreenPak schematics are XML based, which is a touch better than binary, who wants to read a giant XML diff and try to figure out what's going on in the circuit?

This isn't going to be a post on the quirks of Silego's software, though - that would be boring. As it turns out, there's one more exciting feature of these chips that I didn't mention earlier: the configuration bitstream is 100% documented in the device datasheet! This is unheard of in the programmable logic world. As Nick of Arachnid Labs says, the chip is "just dying for someone to write a VHDL or Verilog compiler for it". As you can probably guess by from the title of this post, I've been busy doing exactly that.

Great! How does it work?


Rather than wasting time writing a synthesizer, I decided to write a GreenPak technology library for Clifford Wolf's excellent open source synthesis tool, Yosys, and then make a place-and-route tool to turn that into a final netlist. The post-PAR netlist can then be loaded into GreenPak Designer in order to program the device.

The first step of the process is to run the "synth_greenpak4" Yosys flow on the Verilog source. This runs a generic RTL synthesis pass, then some coarse-grained extraction passes to infer shift register and counter cells from behavioral logic, and finally maps the remaining logic to LUT/FF cells and outputs a JSON-formatted netlist.

Once the design has been synthesized, my tool (named, surprisingly, gp4par) is then launched on the netlist. It begins by parsing the JSON and constructing a directed graph of cell objects in memory. A second graph, containing all of the primitives in the device and the legal connections between them, is then created based on the device specified on the command line. (As of now only the SLG46620V is supported; the SLG46621V can be added fairly easily but the SLG46140V has a slightly different microarchitecture which will require a bit more work to support.)

After the graphs are generated, each node in the netlist graph is assigned a numeric label identifying the type of cell and each node in the device graph is assigned a list of legal labels: for example, an I/O buffer site is legal for an input buffer, output buffer, or bidirectional buffer.

Example labeling for a subset of the netlist and device graphs
The labeled nodes now need to be placed. The initial placement uses a simple greedy algorithm to create a valid (although not necessarily optimal or even routable) placement:
  1. Loop over the cells in the netlist. If any cell has a LOC constraint, which locks the cell to a specific physical site, attempt to assign the node to the specified site. If the specified node is the wrong type, doesn't exist, or is already used by another constrained node, the constraint is invalid so fail with an error.
  2. Loop over all of the unconstrained cells in the netlist and assign them to the first unused site with the right label. If none are available, the design is too big for the device so fail with an error.
Once the design is placed, the placement optimizer then loops over the design and attempts to improve it. A simulated annealing algorithm is used, where changes to the design are accepted unconditionally if they make the placement better, and with a random, gradually decreasing probability if they make it worse. The optimizer terminates when the design receives a perfect score (indicating an optimal placement) or if it stops making progress for several iterations. Each iteration does the following:
  1. Compute a score for the current design based on the number of unroutable nets, the amount of routing congestion (number of nets crossing between halves of the device), and static timing analysis (not yet implemented, always zero).
  2. Make a list of nodes that contributed to this score in some way (having some attached nets unroutable, crossing to the other half of the device, or failing timing).
  3. Remove nodes from the list that are LOC'd to a specific location since we're not allowed to move them.
  4. Remove nodes from the list that have only one legal placement in the device (for example, oscillator hard IP) since there's nowhere else for them to go.
  5. Pick a node from the remainder of the list at random. Call this our pivot.
  6. Find a list of candidate placements for the pivot:
    1. Consider all routable placements in the other half of the device.
    2. If none were found, consider all routable placements anywhere in the device.
    3. If none were found, consider all placements anywhere in the device even if they're not routable.
  7. Pick one of the candidates at random and move the pivot to that location. If another cell in the netlist is already there, put it in the vacant site left by the pivot.
  8. Re-compute the score for the design. If it's better, accept this change and start the next iteration.
  9. If the score is worse, accept it with a random probability which decreases as the iteration number goes up. If the change is not accepted, restore the previous placement.
After optimization, the design is checked for routability. If any edges in the netlist graph don't correspond to edges in the device graph, the user probably asked for something impossible (for example, trying to hook a flipflop's output to a comparator's reference voltage input) so fail with an error.

The design is then routed. This is quite simple due to the crossbar structure of the device. For each edge in the netlist:
  1. If dedicated (non-fabric) routing is used for this path, configure the destination's input mux appropriately and stop.
  2. If the source and destination are in the same half of the device, configure the destination's input mux appropriately and stop.
  3. A cross-connection must be used. Check if we already used one to bring the source signal to the other half of the device. If found, configure the destination to route from that cross-connection and stop.
  4. Check if we have any cross-connections left going in this direction. If they're all used, the design is unroutable due to congestion so fail with an error.
  5. Pick the next unused cross-connection and configure it to route from the source. Configure the destination to route from the cross-connection and stop.
Once routing is finished, run a series of post-PAR design rule checks. These currently include the following:
  • If any node has no loads, generate a warning
  • If an I/O buffer is connected to analog hard IP, fail with an error if it's not configured in analog mode.
  • Some signals (such as comparator inputs and oscillator power-down controls) are generated by a shared mux and fed to many loads. If different loads require conflicting settings for the shared mux, fail with an error.
If DRC passes with no errors, configure all of the individual cells in the netlist based on the HDL parameters. Fail with an error if an invalid configuration was requested.

Finally, generate the bitstream from all of the per-cell configuration and write it to a file.

Great, let's get started!

If you don't already have one, you'll need to buy a GreenPak4 development kit. The kit includes samples of the SLG46620V (among other devices) and a programmer/emulation board. While you're waiting for it to arrive, install GreenPak Designer.

Download and install Yosys. Although Clifford is pretty good at merging my pull requests, only my fork on Github is guaranteed to have the most up-to-date support for GreenPak devices so don't be surprised if you can't use a bleeding-edge feature with mainline Yosys.

Download and install gp4par. You can get it from the Github repository.

Write your HDL, compile with Yosys, P&R with gp4par, and import the bitstream into GreenPak Designer to program the target device. The most current gp4par manual is included in LaTeX source form in the source tree and is automatically built as part of the compile process. If you're just browsing, there's a relatively recent PDF version on my web server.

If you'd like to see the Verilog that produced the nightmare of a schematic I showed above, here it is.

Be advised that this project is still very much a work in progress and there are still a number of SLG46620V features I don't support (see the manual for exact details).

I love it / it segfaulted / there's a problem in the manual!

Hop in our IRC channel (##openfpga on Freenode) and let me know. Feedback is great, pull requests are even better,

You're competing with Silego's IDE. Have they found out and sued you yet?

Nope. They're fully aware of what I'm doing and are rolling out the red carpet for me. They love the idea of a HDL flow as an alternative to schematic entry and are pretty amazed at how fast it's coming together.

After I reported a few bugs in their datasheets they decided to skip the middleman and give me direct access to the engineer who writes their documentation so that I can get faster responses. The last time I found a problem (two different parts of the datasheet contradicted each other) an updated datasheet was in my inbox and on their website by the next day. I only wish Xilinx gave me that kind of treatment!

They've even offered me free hardware to help me add support for their latest product family, although I plan to get GreenPak4 support to a more stable state before taking them up on the offer.

So what's next?


Better testing, for starters. I have to verify functionality by hand with a DMM and oscilloscope, which is time consuming.

My contact at Silego says they're going to be giving me documentation on the SRAM emulation interface soon, so I'm going to make a hardware-in-loop test platform that connects to my desktop and the Silego ZIF socket, and lets me load new bitstreams via a scriptable interface. It'll have FPGA-based digital I/O as well as an ADC and DAC on every device pin, plus an adjustable voltage regulator for power, so I can feed in arbitrary mixed-signal test waveforms and write PC-based unit tests to verify correct behavior.

Other than that, I want to finish support for the SLG46620V in the next month or two. The SLG46621V will be an easy addition since only one pin and the relevant configuration bits have changed from the 46620 (I suspect they're the same die, just bonded out differently).

Once that's done I'll have to do some more extensive work to add the SLG46140V since the architecture is a bit different (a lot of the combinatorial logic is merged into multi-function blocks). Luckily, the 46140 has a lot in common architecturally with the GreenPak5 family, so once that's done GreenPak5 will probably be a lot easier to add support for.

My thanks go out to Clifford Wolf, whitequark, the IRC users in ##openfpga, and everyone at Silego I've worked with to help make this possible. I hope that one day this project will become mature enough that Silego will ship it as an officially supported extension to GreenPak Designer, making history by becoming the first modern programmable logic vendor to ship a fully open source synthesis and P&R suite.

Tuesday, October 27, 2015

New GPG key

Hi everyone,

I've been busy lately and haven't had a chance to post much. There will be a pretty good sized series coming up in a month or two (hopefully) on my next-gen FPGA cluster and JTAG stuff but I'm holding off until I have something better to write about.

In the meantime, I've decided that my circa 2009 GPG key is long overdue for replacement so I've issued a new one and am posting the fingerprints in multiple public locations (this being one).

The new key fingerprint is:
859B A7BA DE9C 0BD5 EC01  FF36 3461 7AB9 B31C 7D7C

Verification message signed with my old key:
http://thanatos.virtual.antikernel.net/unlisted/new-key-notes.txt.asc

Saturday, May 23, 2015

Graduating, TDR prototype, lab move, and a conference talk

So, it's been a busy couple of months and I haven't had time to post anything. Here's a few quick updates:

I successfully defended my Ph.D thesis few weeks ago and will be graduating next weekend. You can download the thesis and browse the code if you're so inclined. I plan to continue developing the project in my spare time and using it as the basis for future embedded gadgets so expect more posts on it over the coming months.

The TDR board came in and I assembled the first prototype. It had a few bugs (which I'll detail in a future post) but after reworking them all of the major subsystems seem functional and I'm working on the firmware. Expect another post over the summer once I've made more progress.
TDR prototype during bring-up.
This was my improvised jig for holding probes on small test points.
Now that I'm done with school I'm moving across the country in a week to start work at my new job. My lab is currently living in 125 cardboard boxes weighing just shy of 1900 pounds. (This doesn't count my desktop computer and monitors, test equipment, microscope bench, or the servers and rack as they're all still in use and being packed up over the next few days.) The lab will be totally down for 1-2 months.

The current state of my lab
Finally, I will be speaking on CPLD reverse engineering at REcon this June. The topic of the talk is the XC2C32A, which my regular readers may remember from a previous post. I'll be describing the reverse engineering process, how far I got, and showcasing bitstreams generated by my toolchain running on actual silicon. If any you are going, by all means introduce yourself :)

Wednesday, February 4, 2015

How to lose my business permanently

This post is a bit different from my usual ones in that it discusses the business side of the semiconductor industry, not the technical side. The issue has been getting more and more problematic lately so I figured I'd write up a few quick thoughts on it.

Let's suppose you're a large semiconductor company who is currently making a large amount of money selling chips to a couple of major customers. You've decided that your business is too big and you have no desire to get new customers, now or in the future. In fact, you don't even want these companies to use your products in new designs. What are some ways you can get rid of these pesky engineers trying to throw money at you?
  1. Make your parts hard to find. Ask major distributors like Digi-Key and AVnet to discontinue stocking them.
  2. If someone does manage to find an authorized sales partner, pester them with questions even if they're just looking for a budgetary price quote for a feasibility study. Ask for a project name, description, business plan, names of the team members, color of the soldermask, logo, and anything else you can think of. If it looks like an initial proof of concept that the customer isn't yet confident will become a high-volume product, or a one-off test fixture/lab tool, badger them by asking about annual sales volume and volume ramp-up dates until they lose interest and buy from a competitor.
  3. Just in case anyone actually succeeds in buying your part, make it useless to them. Keep the datasheet locked up in a steel vault in your corporate headquarters. Promise would-be customers that you'll let them see it if they sign away their firstborn son and sacrifice a golden lamb on an altar made of FPGAs, but hide the actual NDA contract behind so many redirect pages and broken links that nobody can actually sign it, much less see the actual datasheet. Bonus points if your chip is something commodity like a gigabit Ethernet PHY that has nothing even remotely sensitive in the datasheet.
If you follow these rules properly, congratulations! I'll do my part to further your goals by making sure you will never get design wins in any projects I'm involved in, especially high-volume ones for large companies. Your shareholders will be overjoyed.

Tuesday, January 20, 2015

TDR updates

A few months ago, I wrote about a project I had been thinking of for a while but not had time to work on: a time-domain reflectometer (TDR) for testing twisted pair Ethernet cables.

TDR background


The basic theory of operation is simple: Send a pulse down a transmission line and measure the reflected voltage over time to get a plot of impedance discontinuities over time. Unfortunately, doing this with sufficient temporal resolution (sub-nanosecond) requires extremely high analog sampling rates, and GHz A/D converters are (to say the least) not cheap: the least expensive 1 GSA/s ADC on Digi-Key is the HMCAD1520, which sells for $120 each at the time of this writing. Higher sampling rates cost even more, the 1.5 GSa/s ADC081500CIYB is listed at $347.

One possible architecture would consist of a pre-amplifier for each channel, a 4:1 RF mux, and a single high-speed ADC sampled by an FPGA. This would work, but seemed quite expensive and I wanted to explore lower-cost options.

ADC architecture


After thinking about the problem for a while, I realized that the single most expensive component in a classical TDR was probably the ADC - but there was no easy way to make it cheaper. What if I could eliminate the ADC entirely?

I drew inspiration from the successive-approximation-register (SAR) ADC architecture, which essentially converts a DAC into an ADC by binary searching. The basic operating algorithm is as follows:
  • For each point T in time
    • vstart = 0
    • vend = Vref
    • Set DAC to (Vstart + Vend) / 2
    • Compare Vin against Vdac
    • If Vin > Vdac
      • Set current bit of sample to 1
      • Set Vstart to Vdac, update ADC, repeat
    • else
      • Set current bit of sample to 0
      • Set Vend to Vdac, update ADC, repeat
The problem with SAR for high speeds is that N-bit analog resolution at M samples per second requires a DAC that can run at O(M log N) samples per second - hardly an improvement!

In order to work around this problem, I began to think about ways to represent the data generated by a SAR ADC. I ended up modeling a simplified SAR ADC which performed a linear, rather than binary, search. We can represent the intermediate data as a matrix of 2^N rows by M columns, one for each of M data points.

The sampling algorithm for this simplified ADC works as follows:
  • For each point T in time 
    • Set DAC to 0
    • Compare Vin against Vdac
    • If Vin > Vdac
      • Set column[Vdac] to 1
      • Increment Vdac
      • Repeat comparison
    • Otherwise stop and capture the next sample
Once we have this matrix, we can simply sum the number of 1s in each column to calculate the corresponding sample value.

While this approach will clearly work, it is exponentially slower than the conventional SAR ADC since it requires 2^N samples instead, instead of N, for N-bit precision. So why is it useful?

Now consider what happens if we acquire the data from a transposed version of the same matrix:
  • For each Vdac from 0 to Vref
    • For each point T in time
      • Compare Vin against Vdac
      • If Vin > Vdac
        • Set row Vin of column T to 1
    • Increment Vref
    • Go back in time and loop over the signal again
This version clearly captures the same data, since matrix[T][V] is set to true iff sample T is less than V. We simply switch the inner and outer loops.

It also has a very interesting property for cost optimization: Since it only updates the DAC after sampling the entire signal, we can now use a much slower (and cheaper) DAC than with a conventional SAR. In addition, the comparator can now update at the sampling frequency instead of 2^N times the sampling frequency.

There's just one problem: It requires time travel! Why are we wasting our time analyzing a circuit that can't actually be built?

Well, as it turns out we can solve this problem too - with "parallel universes". Since the impedance of the cable is (hopefully) fairly constant over time, if we send multiple pulses they should return identical reflection waveforms. We can thus send out a pulse, test one candidate Vdac value against this waveform, then increment Vdac and repeat.

The end result is that with a cheap SPI DAC, a high-speed comparator, and an FPGA with a high-speed digital input we can digitize a repetitive signal to arbitrary analog precision, with sampling rate limited only by comparator bandwidth and FPGA input buffer performance!

Pulse generation

The first step in any TDR, of course, is to generate the pulse.

I spent a while looking over FPGAs and ended up deciding on the Xilinx Kintex-7 series, specifically the XC7K70T. The -1 speed grade can do 1.25 Gbps LVDS in the high-performance I/O banks (matching the -2 and -3 speed Artix-7 devices) and the higher speed grades can go up to 1.6 Gbps.

The pulse is generated by using the OSERDES of the FPGA to produce a single-cycle LVDS 1 followed by a long series of 0s. The resulting LVDS pulse is fed into a Micrel SY58603U LVDS-to-CML buffer. This slightly increases the amplitude of the output pulse and sharpens the rise time up to 80ps.

The resulting pulse is then sent through the RJ45 connector onto the cable being tested.

Output buffer

Input preamplifier

The reflected signal coming off the differential pair is AC coupled with a pair of capacitors to prevent bus fights between the (unequal) common-mode voltages of the output buffer and the preamplifier. It is then fed into a LMH6881 programmable-gain preamplifier.

This is by far the most pricey analog component I've used in a design: nearly $10 for a single amplifier. But it's a very nice amplifier (made on a SiGe BiCMOS process)- very high linearity, 2.4 GHz bandwidth, and gain from 6 to 26 dB programmable over SPI in 0.25 dB steps.

Input preamplifier
The optional external terminator (R25) is intended to damp out any reflections coming off of the preamplifier if they present a problem; during the initial assembly I plan to leave it unpopulated. Since this is my first high-speed mixed signal design I'm trying to make it easy to tweak if I screwed up somehow :)

The output of the preamplifier is a differential signal with 2.5V common mode offset.

Differential to single-ended conversion

The next step is to convert the differential voltage into a single-ended voltage that we can feed into the comparator. I use an AD8045 unity gain voltage feedback amplifier for this, configured to compute CH1_VDIFF = (CH1_BUF_P - CH1_BUF_N)  + 2.5V.

Digitization

The single-ended voltage is compared against the DAC output (AFE_VREF) using one half of an LMH7322 dual comparator.

The output supply of the comparator is driven by a 2.5V supply to produce LVDS-compatible differential output voltage levels.

PCB layout

The board was laid out in KiCAD using the new push-and-shove router. All of the differential pairs were manually length-matched to 0.1mm or better.

The upper left corner of the board contains four copies of the AFE. The AD8045s are on the underside of the board because the pinout made routing easier this way. Hopefully the impedance discontinuities from the vias won't matter at these signal speeds...

AFE layout, front side
AFE layout, back side
The rest of the board isn't nearly as complex: the lower left has a second RJ45 connector and a RGMII PHY for interfacing to the host PC, the power supply is at the upper right, and the FPGA is bottom center.

The power supply is divided into two regions, digital and analog. The digital supply is on the far right side of the board, safely isolated from the AFE. It uses an LTC3374 to generate an 1.0V 4A rail for the FPGA core, a 1.2V 2A rail for the FPGA transceivers and Ethernet PHY, a 1.8V 1A rail for digital I/O, and a 2.5V rail for the CML buffers and Ethernet analog logic.

The analog supply was fairly close to the AFE so I put a guard ring around it just to be safe. It consists of a LTC3122 boost converter to push the 5V nominal input voltage up to 6V, followed by a 5V LDO to give a nice clean analog rail. I ran the output of the LDO through a pi filter just to be extra safe.

The TDR subsystem didn't use any of the four 6.6 Gbps GTX serial transceivers on the FPGA because they are designed to recover their clock from the incoming signal and don't seem to support use of an external reference clock. It seemed a shame to waste them, though, so I broke them (as well as 20 0.95 Gbps LVDS channels) out to a Samtec Q-strip header for use as high-speed GPIO.

Without further ado, here's the full layout. I could have made the board quite a bit smaller in the vertical axis but I needed to keep a constant 100mm high so it would fit in the card guides on my Eurocard rack.

The board is at fabs for quotes now and I'll make another post once the boards come back.

Layer 1
Layer 2 (ground)
Layer 3 (power)
Layer 4, flipped to make text readable

Thursday, January 15, 2015

GERALD microscope control system

While my optical microscopes are capable of sufficient resolution for imaging larger-process ICs, taking massive die images (this one, for a comparatively small 3.2mm^2 die, is about 0.6 gigapixels) has been beyond my capability because I have better things to do with my time than sit in the lab for a week turning the stage knob a little, clicking a button to take a picture, turning the knob a little...

John has a computer-controlled microscope that he uses for large imaging jobs and it seemed like it'd be a good idea to make one of my own. The first step was to write some control software. John's user interface is very much a "programmer's GUI" and I figured something with a bit more eye candy wouldn't be too hard to do.

The result was a system called GERALD. (I couldn't think of a name for it, asked my girlfriend for suggestions, and this was the first she came up with...)

The prototype is using my old AmScope microscope because I didn't want to take my main Olympus out of service for an extended period of time during development. Yes, I'm aware that the "support" structure under the stage isn't very rigid... this is a software development testbed and I won't be using this microscope in the final deployment so there's no sense wasting time machining nice aluminum brackets. I just have to be careful not to move around too much when using it ;)

Prototype GERALD system on my desk. The breadboarded MCU is generating step/direction pulses from USB-serial commands and sending them to the stepper controller under the textbook.

The camera in use is an AmScope MD1900. It uses a proprietary USB protocol and only has Windows driver support, and I try to run a 100% Linux shop. This was a problem... In keeping with John's unofficial motto "Open source by any means necessary", I corrected the problem ;)

A bit of Wireshark and libusb coding later, I had a basic working driver. The protocol is closely related (but not identical) to the MU800 that John reversed a few months ago, which helped me get my feet in the door. There's a bit of trivial obfuscation (XORing control transactions with 0x55aa or 0x5aa5) which I fail to understand... The average user isn't going to notice anything, and anyone with the skills to reverse engineer USB transactions or a kernel driver will see right through it, so why bother?

I plan to try making a kernel V4L2 driver in the future but for now it works so it's not a huge priority.


The current GERALD system is very much a WIP, most basic features are there but automated image capture and some other things aren't implemented yet.

GERALD UI overview
The basic UI layout is modeled in large part on the FEI Versa FIB's control system, which is currently my favorite control system for either optical or electron microscopes.

The upper left panel is an overview of the current camera feed, scaled down to fit. The upper right view, meanwhile, shows the center of the feed at native (1:1 pixel) resolution. Both views have a footer that includes a scale bar, date/time, the objective in use, and magnification. (The magnification is the actual value based on calibration of the camera and my 24" 1080p display.)

The lower right view is a webcam pointed at the sample stage. I haven't had the time to make a bracket to hold it in the right spot so it's just sitting on my desk for now. This is the equivalent of a SEM chamber cam and I hope to eventually use it to avoid crashing the sample into the objective in full-remote-control operation.

The lower left view is a navigation display showing the current sample. Unlike the Versa's nav-cam (a single static image taken of the stage when the sample is loaded) my navigation view is a composite of actual microscope images. Every time a new video frame comes in when the stage isn't moving (to avoid motion blur) it is plotted in the navigation view at the current physical position of the stage.

As of now the navigation view is non-interactive; all it does is show the current field of view on the sample. My plan is to support clicking to move the stage to a specific point, as well as drawing to define a rectangular area for step-and-repeat imaging.

In typical use I envision the user moving to the upper left and bottom right corner of the sample manually with the joystick, then selecting an objective and drawing a box around the sample to initiate a high-resolution imaging run.

In order for all of this to work properly, the system must be calibrated. The first step is to calibrate the camera so that it knows how large each pixel is. I've done this manually in the past by taking pictures of a calibration slide and counting pixels, but it's high time I automated the process.

GERALD pixel size calibration
The algorithm is quite simple and is designed to work with the pattern on my calibration slide (10um pitch short lines and 50um pitch long lines) only. As of now the are limited sanity checks so if there's no calibration slide in view the results can be somewhat strange :)
  • Convert the image to grayscale using NTSC color weights
  • Compute a gray-level histogram
  • Median-filter the histogram to smooth out spikes and find peaks
  • The distribution should be approximately bimodal (white lines on dark background). Take the mean of the peaks and threshold the image to binary.
  • Do a median filter on the binary image to smooth out noise
  • Scan across the image horizontally, one scan line at a time. Note the start and end locations of each white area. If the width is too small (less than two pixels) or too large (more than 10% of the screen) discard it. Otherwise save the width and centroid as a slice down a potential line.
  • For each slice, check if the next row down has a line slice within a couple of pixels. If so, add it to the growing polyline and remove it from the list of candidates.
  • Fit a line segment to the points in each polyline using least-squares, set its width to the mean of all slices used
  • Extend each line segment until it hits the edge of the image. If the projected line gets within a very short distance of BOTH endpoints of a second segment, the two are collinear so merge them. (This will smooth out gaps in the detected lines from dust specks etc.) The resulting lines are plotted in green in the calibration view.
  • Project the lines onto the X axis and sort them from left to right.
  • For each line from left to right, measure the perpendicular distance to the next one over (displayed in red in the calibration view).
  • Compute the median of these line lengths. This is considered to be the pitch of the lines, in pixels.
  • Find the median width of the lines.
  • If the pitch is less than three times the line width, we're looking at fine-pitch lines (10 μm pitch). Otherwise the fine pitch lines were too small to resolve and we're looking at coarse pitch (50 μm).
  • Compute the physical size of one camera pixel from the known physical pitch and pixel pitch.
Now that that the camera has been calibrated, we know how big a camera pixel is - but we don't have the scaling and rotation factors needed to transform between camera and stage coordinates. This is the second half of the calibration.

Rather than trying to compute a full transformation matrix, the current code simply represents a motor step for each motor as a 2-vector describing the distance (in nanometers, referenced to the camera axes) the stage moves during one motor step. We can then easily compute the distance moved by any number of motor steps as a linear combination of these two 2-vectors.

This algorithm is built on top of the same CV code used for the camera calibration.
  • Find the rightmost line on the calibration slide.
  • Locate the midpoint of it. (This is marked with a blue dot in the debug overlay.)
  • Check if this keypoint is in the left 1/4 of the camera's FOV.
  • If not, move left 50 steps, take a picture, and repeat until it is.
  • Record the position of the keypoint.
  • Move right, 50 steps at a time, until the keypoint is in the right 1/4 of the FOV.
  • Record the 2D distance the keypoint moved and divide by the number of steps taken to get the X axis step vector.
  • Repeat for the Y axis, except using 1/3 as the threshold instead of 2/3 since the Y axis FOV is smaller than X.

The system isn't quite finished but it's coming along nicely. Hopefully I'll have time in the next couple of months to finish the software, make a PCB for the control circuit, and machine brackets to hold all of the parts onto my Olympus scope.

FPGA cluster updates


Although the "raised floor" design for my FPGA cluster looked cool, it really didn't scale. My entire desk was full, there was very limited room for new hardware, and the boards kept getting dusty. To make matters worse, long wires were needed to connect everything and it was difficult to manage them all.

Original FPGA cluster

I ended up moving forward with the plan I came up with a few months ago and my FPGA cluster is now living on the 19" rack in my living room.

The first step was to laser-cut acrylic frames for each board (or several boards, if they were small enough) that would slide into the card guides.

In the photo below you can see nodes lx9mini0 and lx9mini2 (lx9mini1 was being used for something else at the time so I put it on another card later on) on a clear 1/16" acrylic sheet cut to standard Eurocard dimensions. The clear faceplate was later replaced with an opaque black one because I think it looks better that way.

Spartan-6 FPGA boards on a 3U blade card

My existing USB hubs weren't well suited to the rackmount form factor so I built a new one. This is a ten-port USB 2.0 hub with two front-panel ports (for keyboard and mouse) and eight back-side ports for internal connections. It consists of three 4-port Cypress hub chips in a tree, plus the associated PMICs. For extra fun I threw on an XC2C128 CPLD with a SPI headers so that I could potentially toggle power to individual ports remotely over SPI, but as of now this functionality isn't being used.

As with my previous hub designs all ports are overcurrent protected. I also added external ESD clamp diodes (RCLAMP0514M) to each data line after an killing a previous hub by zapping it while plugging my phone in to charge.

The port indicator LEDs are off in the idle state, green when a device has enumerated successfully, blinking green when a device is detected but fails to enumerate, and red for an overcurrent fault.

3U x 4HP USB hub blade


I went with my original plan of racking the Atlys boards in one of my empty 1U cases and the AC701 in another. The boards are screwed into standoffs which are attached to the case by cyanoacrylate adhesive.

Nodes atlys0 and atlys1 being installed in a 1U server case. JTAG and Ethernet cables are not yet installed.

I then installed my PDU board, the BeagleBone Black, the USB hub, and all of the smaller FPGA boards I was using for my research in the rack. Since I was moving the second 24-port switch from my desk to the rack I hooked the two together with a short run of multimode fiber. Fiber was overkill for the application but I had SFP ports sitting around unused and I was running out of copper interfaces...

The standard I've decided on for all new designs in the near future is as follows:
  • Boards are to be 100mm tall (3U Eurocard height)
  • Component keepouts along the top and bottom edges as specified by the Eurocard standard for card guides
  • Faceplate mounting holes on the front panel as specified by the Eurocard standard
  • 5V DC center-high power via standard 5.5mm barrel jack on the back edge
  • Network jacks and indicator LEDs on the front edge
  • JTAG, USB, and other I/O on the back edge

My current rack
The current FPGA cluster
Back side of the FPGA cluster midway through rack install. The USB cable coming out the bottom was temporary for testing until I could find a right-angle one.
All of the cards have nice shiny black faceplates except my 4-port switch prototype from a few years ago - the laser cutter on campus was unavailable the day I wanted to get the faceplate made and I never got around to doing it.

I still have quite a bit more gear to rack - several Spartan-6 boards I'm not actively using in my research, a Raspberry Pi, a Parallella, and a large number of PIC MCU development boards of various types. This will fill the current card rack and then some, so I left 3U of empty space below this one for expansion. The Parallella runs hot so I ordered a 1U fan tray which will be installed later today.

The 1U blank above the card rack is there for airflow reasons (the fan will blow upwards and exhaust air needs some space to blow out) but I can still fit stuff that doesn't block airflow. Early next week I'll be replacing it with a 16-port LC-LC patch panel so that I can terminate fiber runs to various devices on the rack (such as the AC701 board, which currently has a 2m run of multimode going around the side of the rack because there's no good way to get fiber from back to front).