VectorBlox: AI Accelerator for PolarFire FPGA

VectorBlox is an AI/ML inference accelerator platform for Microchip PolarFire FPGAs and SoCs. Networks are implemented in software on the accelerator, so a new AI model can be deployed without reprogramming the FPGA. The platform is optimized for edge AI applications with power consumption under 5 W.

1. VectorBlox Overview

Supported Operators

The table below lists supported operators and their known limitations in VectorBlox.

| Operator | Known Limitations |
| --- | --- |
| ABS | |
| ADD | Fused activation function = [0, NONE, RELU, RELU6] |
| ARG_MAX | Axis = [-1] |
| ARG_MIN | Axis = [-1] |
| AVERAGE_POOL_2D | Fused activation function = [0, NONE, RELU, RELU6], Padding = [SAME, VALID] |
| CONCATENATION | Axis = [-4, -3, -2, -1], Fused activation function = [0, NONE, RELU, RELU6] |
| CONV_2D | Fused activation function = [0, NONE, RELU, RELU6], Padding = [SAME, VALID] |
| DEPTHWISE_CONV_2D | Fused activation function = [0, NONE, RELU, RELU6], Padding = [SAME, VALID] |
| DEQUANTIZE | |
| DIV | Fused activation function = [0, NONE, RELU, RELU6], Others: Input 2 must be a constant |
| ELU | |
| EQUAL | |
| EXP | |
| EXPAND_DIMS | Axis = [-4, -3, -2, -1] |
| FULLY_CONNECTED | Fused activation function = [0, NONE, RELU, RELU6] |
| GATHER | Axis = [-4, -3, -2, -1] |
| GELU | |
| GREATER | Axis = [-4, -3, -2, -1] |
| GREATER_EQUAL | |
| HARD_SWISH | |
| LEAKY_RELU | |
| LESS | |
| LESS_EQUAL | |
| LOG | |
| LOGISTIC | |
| MAXIMUM | |
| MAX_POOL_2D | Fused activation function = [0, NONE, RELU, RELU6], Padding = [SAME, VALID] |
| MEAN | |
| MINIMUM | |
| MUL | Fused activation function = [0, NONE, RELU, RELU6] |
| NEG | |
| NOT_EQUAL | |
| PACK | Axis = [-4, -3, -2, -1] |
| PAD | |
| PADV2 | |
| POW | |
| PRELU | |
| QUANTIZE | |
| REDUCE_MAX | Axis = [-4, -3, -2, -1] |
| REDUCE_MIN | Axis = [-4, -3, -2, -1] |
| REDUCE_PROD | Axis = [-4, -3, -2, -1] |
| RELU | |
| RELU6 | |
| RELU_0_TO_1 | |
| RELU_N1_TO_1 | |
| RESHAPE | |
| RESIZE_BILINEAR | |
| RESIZE_NEAREST_NEIGHBOR | |
| RSQRT | |
| SILU | |
| SLICE | |
| SOFTMAX | Dim = [-3, -2, -1] |
| SPLIT | Axis = [-4, -3, -2, -1] |
| SPLIT_V | Axis = [-4, -3, -2, -1] |
| SQUEEZE | Axis = [-4, -3, -2, -1] |
| STRIDED_SLICE | |
| SUB | Fused activation function = [0, NONE, RELU] |
| SUM | Axis = [-4, -3, -2, -1] |
| TANH | |
| TILE | |
| TRANSPOSE | |
| TRANSPOSE_CONV | Fused activation function = [0, NONE, RELU, RELU6], Padding = [SAME, VALID] |
| UNPACK | Axis = [-4, -3, -2, -1] |
| CAST | Others: Cast inputs from INT8 or UINT8 to INT32 |
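
Before compiling a model, it can help to check its operators against the table. The following is a minimal sketch of such a check, not part of the VectorBlox SDK: the `SUPPORTED` dictionary copies only a few rows from the table, and the `check_op` helper and its attribute names (`fused_activation`, `padding`, `axis`, `dim`) are hypothetical names chosen for illustration.

```python
# Hypothetical pre-flight check. SUPPORTED holds only a SUBSET of the table
# above; extend it with the remaining rows before relying on the result.
FUSED = {0, "NONE", "RELU", "RELU6"}

SUPPORTED = {
    "ABS": {},
    "ADD": {"fused_activation": FUSED},
    "ARG_MAX": {"axis": {-1}},
    "CONV_2D": {"fused_activation": FUSED, "padding": {"SAME", "VALID"}},
    "SOFTMAX": {"dim": {-3, -2, -1}},
    "SUB": {"fused_activation": {0, "NONE", "RELU"}},  # no RELU6 listed for SUB
}


def check_op(name, **attrs):
    """Return a list of problems for one operator; an empty list means it passes."""
    if name not in SUPPORTED:
        return [f"{name}: operator not found in the supported set"]
    problems = []
    for key, value in attrs.items():
        allowed = SUPPORTED[name].get(key)
        # Attributes without a listed limitation are not restricted here.
        if allowed is not None and value not in allowed:
            problems.append(f"{name}: {key}={value!r} is outside the listed values")
    return problems


if __name__ == "__main__":
    print(check_op("ADD", fused_activation="RELU6"))
    print(check_op("SUB", fused_activation="RELU6"))
    print(check_op("ARG_MAX", axis=0))
```

In practice the operator names and attributes would come from a parsed .tflite file; that parsing step is left out here to keep the sketch dependency-free.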

2. CoreVectorBlox IP

CoreVectorBlox is a neural network accelerator IP core for PolarFire FPGAs.

Architecture

graph TB
    subgraph SYSTEM["System Level"]
        HOST["Mi-V Soft Processor"]
        CORE["CoreVectorBlox"]
        DDR["DDR Memory"]
        NVM["Non-Volatile Memory"]
    end
    
    subgraph CORE_INTERNAL["CoreVectorBlox Internal"]
        CTRL["Control Registers<br/>AXI4-Lite"]
        MCU["Microcontroller<br/>RISC-V"]
        MXP["MXP Vector Processor"]
        CNN["CNN Accelerator"]
    end
    
    HOST -->|Control| CTRL
    CTRL -->|Commands| MCU
    MCU -->|Vector Commands| MXP
    MCU -->|Convolution Commands| CNN
    CORE -->|AXI4 Master| DDR
    NVM -->|Load at Boot| DDR
    DDR -->|Read BLOB| CORE

Components

  1. Control Registers: Control and status management through an AXI4-Lite slave interface
  2. Microcontroller: RISC-V-based soft processor that parses the network BLOB and controls the vector processor
  3. MXP Vector Processor: Vector processor for general neural network layers
  4. CNN Accelerator: Dedicated accelerator for convolutional layers
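
The division of work between the two compute units can be pictured with a toy model. This is only an illustration of the component list, not the actual firmware logic: the `target_unit` and `dispatch` helpers are hypothetical, and the rule that any operator with "CONV" in its name counts as a convolutional layer is an assumption made for this sketch.

```python
# Toy model of how the microcontroller might split a network between units.
# ASSUMPTION: operators whose name contains "CONV" are the convolutional
# layers; the real mapping is decided by the SDK and firmware.
CNN = "CNN Accelerator"
MXP = "MXP Vector Processor"


def target_unit(op_name):
    """Pick the unit that would receive commands for this operator."""
    return CNN if "CONV" in op_name else MXP


def dispatch(ops):
    """Group a network's operators by target unit, preserving their order."""
    plan = {CNN: [], MXP: []}
    for op in ops:
        plan[target_unit(op)].append(op)
    return plan


if __name__ == "__main__":
    print(dispatch(["CONV_2D", "RELU", "DEPTHWISE_CONV_2D", "SOFTMAX"]))
```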

MXP Vector Processor Architecture

[Figure: Detailed architecture of the CoreVectorBlox MXP Vector Processor]

Memory Components

CoreVectorBlox stores the following three BLOBs (Binary Large Objects) in memory:

  1. Firmware BLOB: Firmware common to all networks
  2. Network BLOB: BLOB compiled by VectorBlox SDK for each network
  3. Network I/O: Network input/output data
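
One way to picture the three items is as regions placed one after another in DDR. The sketch below computes start offsets for such a layout. It rests on assumptions that the text does not confirm: that the regions are packed contiguously, that each start is aligned to a fixed boundary, and the example sizes and base address, which are made up. The `layout` helper is hypothetical.

```python
def layout(regions, base=0, align=8):
    """Place regions back to back, rounding each start up to `align` bytes.

    `regions` is a list of (name, size_in_bytes). Returns (offsets, end),
    where offsets maps each name to its start address.
    ASSUMPTION: contiguous, aligned packing; the real requirements may differ.
    """
    offsets = {}
    addr = base
    for name, size in regions:
        addr = (addr + align - 1) // align * align  # round up to the boundary
        offsets[name] = addr
        addr += size
    return offsets, addr


if __name__ == "__main__":
    # Sizes are invented for illustration only.
    regions = [("firmware_blob", 10), ("network_blob", 5), ("network_io", 3)]
    offsets, end = layout(regions, base=0, align=8)
    for name, start in offsets.items():
        print(f"{name}: 0x{start:08x}")
    print(f"end: 0x{end:08x}")
```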

3. VectorBlox SDK

The compilation process is as follows:

  • .pth (PyTorch model) → .onnx (ONNX model) → .tflite (TensorFlow Lite model) → .vnnx (VectorBlox model)

Each stage uses a different tool: torch.onnx.export() converts .pth to .onnx, onnx2tf --output_integer_quantized_tflite converts .onnx to .tflite, and the vnnx_compile command converts .tflite to .vnnx.
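
The first two stages can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the helper names `export_to_onnx` and `onnx2tf_command` are hypothetical, the model is assumed to be an already instantiated `torch.nn.Module`, and the `-i` and `-o` options are onnx2tf's input-file and output-folder options. The arguments of `vnnx_compile` are not listed in this post, so that stage is left to the SDK documentation.

```python
def export_to_onnx(model, example_input, onnx_path):
    """Stage 1: export an instantiated PyTorch module to ONNX.

    If the .pth file holds only a state_dict, create the model class first
    and load the weights into it before calling this function.
    """
    import torch  # imported lazily so the rest of the sketch runs without PyTorch

    model.eval()  # export in inference mode
    torch.onnx.export(model, example_input, str(onnx_path))


def onnx2tf_command(onnx_path, out_dir):
    """Stage 2: argument list that asks onnx2tf for integer-quantized TFLite."""
    return [
        "onnx2tf",
        "-i", str(onnx_path),   # input ONNX file
        "-o", str(out_dir),     # output folder for the generated .tflite files
        "--output_integer_quantized_tflite",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so no tools need to be installed.
    print(" ".join(onnx2tf_command("model.onnx", "tflite_out")))
```

To run the second stage for real, the list can be passed to `subprocess.run(..., check=True)` once onnx2tf is installed.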

