Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Posts

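Converting ONNX Conv to Linalg: conv_2d_nchw_fchw (Korean)

2 minute read

Published: February 04, 2026

A step-by-step explanation of how to convert the ONNX dialect Conv operation to the Linalg dialect conv_2d_nchw_fchw. Covers input/attribute/output mapping, pattern structure design, and the implementation process in detail.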

Converting ONNX Conv to Linalg: conv_2d_nchw_fchw

2 minute read

Published: February 04, 2026

We explain step by step how to convert the ONNX dialect Conv operation to the Linalg dialect conv_2d_nchw_fchw. We detail the input/attribute/output mapping, the pattern structure design, and the implementation process.

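[TIR][Schedule] FuseReductionEpilogue: Expression-Based Generalization Implementation (Korean)

4 minute read

Published: January 23, 2026

Moving away from explicit pattern matching, we generalized fuse_reduction_epilogue to handle arbitrary epilogue expressions. Removing per-pattern branching logic and introducing a unified, expression-based approach greatly improved extensibility and maintainability.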

[TIR][Schedule] FuseReductionEpilogue: Expression-Based Generalization

5 minute read

Published: January 23, 2026

We generalized fuse_reduction_epilogue so that it can handle arbitrary epilogue expressions instead of relying on hard-coded pattern matching. By removing pattern-specific branching logic and introducing an expression-driven unified pipeline, we significantly improved extensibility and maintainability.

Bufferization for Mixed Linalg and ONNX Operations (Korean)

3 minute read

Published: January 14, 2026

Solves the problem that arises when linalg and krnl must be bufferized at the same time in ONNX-MLIR. Explains in detail the IR lowering process that combines One-Shot Bufferization with Krnl Lowering.

Bufferization for Mixed Linalg and ONNX Operations

3 minute read

Published: January 14, 2026

We solve the problem of simultaneously bufferizing linalg and krnl in ONNX-MLIR. We detail the IR lowering process that mixes One-Shot Bufferization and Krnl Lowering.

[TIR][Schedule] FuseReductionEpilogue: Clipping Pattern Support Implementation (Korean)

3 minute read

Published: January 08, 2026

We extended the scope of fuse_reduction_epilogue, a TIR schedule primitive in TVM, to automatically detect and optimize the Clipping (min(max(x, lower), upper)) pattern. Fusing Clipping operations, which appear frequently in deep learning models as ReLU6 or Bounded ReLU, into the reduction block improves memory bandwidth efficiency.

[TIR][Schedule] FuseReductionEpilogue: Clipping Pattern Support Implementation

3 minute read

Published: January 08, 2026

We extended the TVM TIR schedule primitive fuse_reduction_epilogue to automatically detect and optimize Clipping (min(max(x, lower), upper)) patterns. By fusing Clipping operations, which appear frequently in deep learning models as ReLU6 and Bounded ReLU, into reduction blocks, we improved memory bandwidth efficiency.

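Detailed Stage-by-Stage Pipeline with useLinalgPath Enabled and End-to-End Validation (Korean)

3 minute read

Published: January 07, 2026

Covers the 3-stage pipeline that runs when the --use-linalg-path option is used in ONNX-MLIR, along with an analysis of driver.cpp for End-to-End validation. Explains in detail the conversion from ONNX to Linalg, to Affine/SCF, and finally to the LLVM Dialect.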

Detailed Pipeline Stages with useLinalgPath Enabled and End-to-End Validation

3 minute read

Published: January 07, 2026

This post covers the 3-stage pipeline executed when using the --use-linalg-path option in ONNX-MLIR and the driver.cpp analysis for End-to-End validation. We detail the transformation process from ONNX to Linalg, Affine/SCF, and finally to LLVM Dialect.

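[Paper Review] WideSA's Routing-Aware PLIO Allocation Algorithm (Korean)

2 minute read

Published: December 10, 2025

WideSA is a mapping scheme for achieving high AIE array utilization on Versal ACAP. Its routing-aware PLIO allocation algorithm builds data input/output paths between PLIO ports and AIE cores and raises the compilation success rate.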

[Paper Review] WideSA: Routing-Aware PLIO Allocation Algorithm

3 minute read

Published: December 10, 2025

WideSA uses a routing-aware PLIO allocation algorithm to solve routing problems that occur during high AIE utilization. Through this algorithm, it constructs data input/output paths between PLIO ports and AIE cores, improving compilation success rate.

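Building the ONNXToLinalg Pipeline: Implementing MatMul Operation Conversion (Korean)

2 minute read

Published: December 09, 2025

Covers building a pipeline that converts the ONNX Dialect to the Linalg Dialect in ONNX-MLIR. Explains step by step, from infrastructure setup through the concrete conversion logic for the MatMul operation to the details of the IR transformation.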

ONNXToLinalg Pipeline Construction: MatMul Operation Conversion Implementation

3 minute read

Published: December 09, 2025

This post covers the process of building a pipeline to convert ONNX Dialect to Linalg Dialect in ONNX-MLIR. We walk step by step from infrastructure setup through the specific conversion logic for MatMul operations to the detailed IR transformation process.

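Introducing the Linalg Dialect in ONNX-MLIR: Compilation Flow and Optimization Benefits (Korean)

1 minute read

Published: December 08, 2025

Examines how introducing the Linalg Dialect into ONNX-MLIR changes the compilation flow and what optimization benefits it brings. Analyzes the limitations of the existing Krnl-based flow and the structured operations and advanced transformations that Linalg provides.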

ONNX-MLIR Linalg Dialect Integration: Compilation Flow and Optimization Benefits

1 minute read

Published: December 08, 2025

This post explores the changes in compilation flow and optimization benefits that can be achieved by introducing Linalg Dialect into ONNX-MLIR. We analyze the limitations of the existing Krnl-based flow and the structured operations and advanced transformation capabilities provided by Linalg.

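[Paper Review] MaxEVA's AI Engine Kernel Placement Strategy and Communication Methods (Korean)

2 minute read

Published: December 07, 2025

The MaxEVA framework uses a sophisticated placement strategy that maximizes utilization of the Versal AI Engine array and improves efficiency by minimizing DMA usage in communication between MatMul kernels and adder trees.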

[Paper Review] MaxEVA: AI Engine Kernel Placement Strategy and Communication Methods

2 minute read

Published: December 07, 2025

The MaxEVA framework employs a sophisticated placement strategy that maximizes Versal AI Engine array utilization and minimizes DMA usage in communication between MatMul kernels and adder trees, enhancing overall efficiency.

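VectorBlox vnnx_tflite.py Fix: Problem 1 - Resolving the TRANSPOSE Operation Issue (Korean)

3 minute read

Published: December 06, 2025

Fixed a compilation failure that occurs when vnnx_tflite.py processes TRANSPOSE operations. Explains how to detect a TRANSPOSE that was incorrectly converted by TFLite optimization and restore it to the original RESHAPE operation.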

VectorBlox vnnx_tflite.py Fix: Problem 1 - TRANSPOSE Operation Issue Resolution

3 minute read

Published: December 06, 2025

Fixed compilation failure issues when processing TRANSPOSE operations in vnnx_tflite.py. Explains how to detect incorrectly converted TRANSPOSE operations due to TFLite optimization and restore them to original RESHAPE operations.

VectorBlox vnnx_tflite.py Fix: Problem 3 - SLICE/STRIDED_SLICE 5D Tensor Handling (Korean)

2 minute read

Published: December 06, 2025

Fixed a compilation failure that occurs when vnnx_tflite.py processes SLICE/STRIDED_SLICE operations that slice 5-dimensional tensors. Explains how to safely convert 5 dimensions to 4 dimensions.

VectorBlox vnnx_tflite.py Fix: Problem 3 - SLICE/STRIDED_SLICE 5D Tensor Processing

3 minute read

Published: December 06, 2025

Fixed compilation failure issues when processing SLICE/STRIDED_SLICE operations on 5-dimensional tensors in vnnx_tflite.py. Explains how to safely convert 5-dimensional tensors to 4-dimensional tensors.

VectorBlox vnnx_tflite.py Fix: Problem 2 - Resolving the RESHAPE Operation Issue (Korean)

2 minute read

Published: December 06, 2025

Fixed multi-axis squeeze and single-axis squeeze problems that occur when vnnx_tflite.py processes RESHAPE operations. Explains how to detect reshape patterns that the VectorBlox SDK does not support ahead of time and handle them as NOP operations.

VectorBlox vnnx_tflite.py Fix: Problem 2 - RESHAPE Operation Issue Resolution

3 minute read

Published: December 06, 2025

Fixed multi-axis squeeze and single-axis squeeze issues when processing RESHAPE operations in vnnx_tflite.py. Explains how to pre-detect reshape patterns not supported by VectorBlox SDK and process them as NOP operations.

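VectorBlox vnnx_tflite.py Fix: Problem 5 - Resolving the Constant Tensor Buffer Struct Compatibility Issue (Korean)

1 minute read

Published: December 06, 2025

Fixed a struct.pack error that occurs when vnnx_tflite.py serializes the buffer field of constant tensors. Explains how to unify every tensor's buffer into the [buffer_id, offset] array format to ensure compatibility with the C struct.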

VectorBlox vnnx_tflite.py Fix: Problem 5 - Constant Tensor Buffer Structure Compatibility Issue Resolution

2 minute read

Published: December 06, 2025

Fixed struct.pack errors when serializing buffer fields of constant tensors in vnnx_tflite.py. Explains how to unify all tensor buffers to [buffer_id, offset] array format to ensure compatibility with C structures.

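VectorBlox vnnx_tflite.py Fix: Problem 4 - Resolving the INT8 Constant ADD/SUB Operation Issue (Korean)

1 minute read

Published: December 06, 2025

Fixed uninitialized quantization parameters that occur when vnnx_tflite.py processes ADD/SUB operations involving constants. Explains how to change the code so that all quantization parameters are initialized even when multi_input=False.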

VectorBlox vnnx_tflite.py Fix: Problem 4 - INT8 Constant ADD/SUB Operation Issue Resolution

2 minute read

Published: December 06, 2025

Fixed quantization parameter uninitialization issues when processing ADD/SUB operations with constant operands in vnnx_tflite.py. Explains how to modify the code to initialize all quantization parameters even when multi_input=False.

What is onnx-mlir? (Korean)

3 minute read

Published: December 05, 2025

onnx-mlir is an open-source compiler that efficiently converts ONNX models to native code. This post takes an in-depth look at the technical details of MLIR, ONNX, and onnx-mlir, which combines the two.

What is onnx-mlir?

4 minute read

Published: December 05, 2025

onnx-mlir is an open-source compiler that efficiently converts ONNX models to native code. This post covers the technical details of MLIR, ONNX, and their combination in onnx-mlir.

VectorBlox vnnx_tflite.py: TFLite → VNNX Conversion Pipeline Analysis (Korean)

2 minute read

Published: December 04, 2025

vnnx_tflite.py is the core module that converts TensorFlow Lite (INT8) models to VectorBlox's VNNX format. This post summarizes the overall flow and internal structure (generate_vnnx_from_json_subgraphs, update_offsets, vbx.sim.Model simulation) and previews the patches I made to support recent AI models.

VectorBlox vnnx_tflite.py: TFLite → VNNX Conversion Pipeline Analysis

3 minute read

Published: December 04, 2025

vnnx_tflite.py is a core module that converts TensorFlow Lite (INT8) models to VectorBlox VNNX format. This post covers the overall flow and internal structure (generate_vnnx_from_json_subgraphs, update_offsets, vbx.sim.Model simulation), and previews custom patches made to support the latest AI models.

VectorBlox VNNX Conversion Issues: Removing the Clip and ScatterND Operators (Korean)

1 minute read

Published: December 04, 2025

Incompatible operators come up when converting ONNX models to the VectorBlox VNNX format. This post covers how to remove and replace the Clip and ScatterND operators.

VectorBlox VNNX Conversion Issues: Removing Clip and ScatterND Operators

1 minute read

Published: December 04, 2025

When converting ONNX models to VectorBlox VNNX format, some operators are incompatible. This post covers methods to remove and replace Clip and ScatterND operators.

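VectorBlox: AI Accelerator for PolarFire FPGA (Korean)

3 minute read

Published: December 04, 2025

VectorBlox is an AI/ML inference accelerator platform for Microchip's PolarFire FPGAs. It supports TensorFlow Lite INT8 networks, and its software-based implementation lets AI models be deployed without reprogramming the FPGA. It offers power efficiency under 5W, and its overlay design allows dynamic switching between multiple networks.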

VectorBlox: AI Accelerator for PolarFire FPGA

3 minute read

Published: December 04, 2025

VectorBlox is an AI/ML inference accelerator platform for Microchip PolarFire FPGAs. It supports TensorFlow Lite INT8 networks and enables AI model deployment without FPGA reprogramming through software-based implementation. With power efficiency under 5W and overlay design, it can dynamically switch between multiple networks.