Does anybody use clang-cl?

DSP, Plugin and Host development discussion.

Post

I've been using clang-cl for Windows C++ development and have had only positive experiences so far.
It integrates nicely with Visual Studio using this plugin.
It works well with the libraries I needed (TBB, Eigen, stb_image_write) and with the VS debugger. The generated assembly is good, and the resulting code is almost always faster than MSVC's.
Also, cross-compilation helps to reduce bugs and non-standard behavior, which is really nice if you develop mostly on Windows.

What is your experience with alternative compilers on Windows?

Post

I tried it briefly recently; so far so good. VS integration is smooth with the plug-in you mentioned, and performance is pretty much the same as with the MS compiler (at least on my plug-ins). Linking is faster, but that is probably because of the "Whole Program Optimization" setting I use with the MS linker, which takes a lot of time on release builds.

Clang seems a bit smarter about function inlining; I had to switch from "inline" to "__forceinline" to get the same result on both builds.
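
For readers who build the same sources with several compilers, the switch from "inline" to "__forceinline" can be hidden behind a macro. The following is a minimal sketch, not code from the poster: the macro name PLUGIN_FORCE_INLINE and the functions soft_clip and process_sample are hypothetical, and it assumes that clang-cl, like MSVC, defines _MSC_VER and accepts __forceinline.

```cpp
// Hypothetical macro name, chosen for illustration only.
#if defined(_MSC_VER)
    // Taken by MSVC and by clang-cl (assumed to define _MSC_VER as well).
    #define PLUGIN_FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
    // GCC and regular Clang spell the same request as an attribute.
    #define PLUGIN_FORCE_INLINE inline __attribute__((always_inline))
#else
    // Unknown compiler: fall back to a plain inline hint.
    #define PLUGIN_FORCE_INLINE inline
#endif

// Hypothetical per-sample helper: cubic soft clipper, saturating at +/-2/3.
PLUGIN_FORCE_INLINE float soft_clip(float x)
{
    if (x > 1.0f) return 2.0f / 3.0f;
    if (x < -1.0f) return -2.0f / 3.0f;
    return x - x * x * x / 3.0f;
}

// Hypothetical caller; the helper above should be inlined here on any compiler.
float process_sample(float in, float gain)
{
    return soft_clip(in * gain);
}
```

Whether forcing the inline actually helps is something to measure per project; the macro only makes the request consistent across builds.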

Post

I've recently started experimenting with LLVM and its MSVC-compatible front-end, clang-cl, and it's pretty good; it has some good static code analysis features. LLVM 8 even generates slightly (5%) faster 64-bit code than Intel C++ Compiler 18, at least in my audio plugin projects.

Post

The only issue with LLVM is that when AVX2 is targeted, it lags considerably behind Intel C++ Compiler.

Post

Aleksey Vaneev wrote: Tue Jul 09, 2019 7:05 am
"The only issue with LLVM is that when AVX2 is targeted, it lags behind Intel C++ Compiler considerably."
Clang does not use FMA instructions by default. It also needs more "unsafe" options to get better optimizations; the Intel compiler enables them by default.

Code: Select all

-mfma -Xclang -Ofast
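
As background on why FMA is treated as an opt-in: besides requiring a CPU that has the instruction, a fused multiply-add rounds once instead of twice, so results can differ in the last bits from a separate multiply and add. A minimal sketch using the standard std::fma function (the helper name square_rounding_error is hypothetical):

```cpp
#include <cmath>

// Hypothetical helper: returns the rounding error of a*a in double precision.
// The plain product is rounded to double first; std::fma then evaluates
// a*a - p with a single rounding, which exposes what the first rounding lost.
double square_rounding_error(double a)
{
    volatile double p = a * a;   // rounded product; volatile forces it to be stored as a double
    return std::fma(a, a, -p);   // exact a*a minus p, rounded once
}
```

For a = 1 + 2^-30 the exact square is 1 + 2^-29 + 2^-60, and the last term does not fit into a double, so the helper returns 2^-60. Code that silently switches between fused and unfused arithmetic can therefore produce slightly different output on different builds.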
I get decent auto-vectorization for my floating point code, and that includes some crazy stuff, such as:

Code: Select all

// 14 accumulators: the 10 pairwise products of a0..a3, then 4 products with ar.
double normal_prod_local[10 + 4] = {};
for (size_t i = 0; i < size; i++)
{
	double a0 = p0[i];
	double a1 = p1[i];
	double a2 = p2[i];
	double a3 = p3[i];
	double ar = p_r[i];

	normal_prod_local[0] += a0 * a0;

	normal_prod_local[1] += a1 * a0;
	normal_prod_local[2] += a1 * a1;

	normal_prod_local[3] += a2 * a0;
	normal_prod_local[4] += a2 * a1;
	normal_prod_local[5] += a2 * a2;

	normal_prod_local[6] += a3 * a0;
	normal_prod_local[7] += a3 * a1;
	normal_prod_local[8] += a3 * a2;
	normal_prod_local[9] += a3 * a3;

	normal_prod_local[10] += ar * a0;
	normal_prod_local[11] += ar * a1;
	normal_prod_local[12] += ar * a2;
	normal_prod_local[13] += ar * a3;
}

Code: Select all

00007FF655281760  vmovupd     ymm14,ymmword ptr [r11+rbx*8]  
00007FF655281766  vmovupd     ymm15,ymmword ptr [r14+rbx*8]  
00007FF65528176C  vmovupd     ymm2,ymmword ptr [rsi+rbx*8]  
00007FF655281771  vmovupd     ymm3,ymmword ptr [rax+rbx*8]  
00007FF655281776  vmovupd     ymm4,ymmword ptr [r10+rbx*8]  
00007FF65528177C  vfmadd231pd ymm0,ymm14,ymm14  
00007FF655281781  vfmadd231pd ymm13,ymm15,ymm14  
00007FF655281786  vfmadd231pd ymm11,ymm2,ymm14  
00007FF65528178B  vfmadd231pd ymm9,ymm3,ymm14  
00007FF655281790  vfmadd231pd ymm5,ymm4,ymm14  
00007FF655281795  vfmadd231pd ymm12,ymm15,ymm15  
00007FF65528179A  vfmadd231pd ymm10,ymm2,ymm15  
00007FF65528179F  vfmadd231pd ymm8,ymm3,ymm15  
00007FF6552817A4  vmovapd     ymm14,ymmword ptr [rsp+80h]  
00007FF6552817AD  vfmadd231pd ymm14,ymm4,ymm15  
00007FF6552817B2  vmovapd     ymmword ptr [rsp+80h],ymm14  
00007FF6552817BB  vfmadd231pd ymm1,ymm2,ymm2  
00007FF6552817C0  vfmadd231pd ymm7,ymm3,ymm2  
00007FF6552817C5  vmovapd     ymm14,ymmword ptr [rsp+0A0h]  
00007FF6552817CE  vfmadd231pd ymm14,ymm4,ymm2  
00007FF6552817D3  vmovapd     ymmword ptr [rsp+0A0h],ymm14  
00007FF6552817DC  vmovapd     ymm2,ymmword ptr [rsp+0C0h]  
00007FF6552817E5  vfmadd231pd ymm2,ymm4,ymm3  
00007FF6552817EA  vmovapd     ymmword ptr [rsp+0C0h],ymm2  
00007FF6552817F3  vfmadd231pd ymm6,ymm3,ymm3  
00007FF6552817F8  add         rbx,4  
00007FF6552817FC  cmp         rdi,rbx  
00007FF6552817FF  jne         (07FF655281760h) 
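
In the listing above, each vfmadd231pd updates a ymm register holding four doubles, and the loop counter advances by 4 per iteration, so every accumulator is kept as four partial sums. That reorders the additions compared to the source loop, which is presumably why the "unsafe" floating-point options are needed before the compiler will do it. A scalar sketch of the same idea for a single accumulator (the function name sum_squares_4lanes is hypothetical, and the final combination order is an assumption, not taken from the listing):

```cpp
#include <cstddef>

// Hypothetical illustration: sum of squares with four independent partial
// sums, mirroring the four double lanes of one ymm accumulator.
double sum_squares_4lanes(const double* p, std::size_t size)
{
    double lane[4] = {0.0, 0.0, 0.0, 0.0};
    std::size_t i = 0;

    // Main loop: four elements per iteration, one per lane.
    for (; i + 4 <= size; i += 4)
    {
        for (int k = 0; k < 4; ++k)
            lane[k] += p[i + k] * p[i + k];
    }

    // Horizontal sum of the lanes (order assumed for illustration).
    double total = (lane[0] + lane[1]) + (lane[2] + lane[3]);

    // Scalar tail for the remaining 0..3 elements.
    for (; i < size; ++i)
        total += p[i] * p[i];

    return total;
}
```

With general floating-point input the result can differ in the last bits from a strictly sequential sum, because the additions happen in a different order.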

Post

I've tried various -m options such as -mavx2, -msse3 and -mfma, and while the code size changes a bit, the performance stays the same. So Clang probably cannot see how to optimize the code further, whereas ICC can.
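
When experimenting with such options, it can help to confirm which instruction sets the compiler actually considers enabled for a given translation unit. A minimal sketch using predefined macros follows; the function name enabled_simd_features is hypothetical, and the macros __AVX2__, __FMA__ and __SSE3__ follow the GCC/Clang convention, so other compilers may not define all of them.

```cpp
#include <string>

// Hypothetical helper: reports which SIMD feature macros were defined when
// this translation unit was compiled. It reflects compiler flags only and
// says nothing about the CPU the binary later runs on.
std::string enabled_simd_features()
{
    std::string s;
#if defined(__AVX2__)
    s += "AVX2 ";
#endif
#if defined(__FMA__)
    s += "FMA ";
#endif
#if defined(__SSE3__)
    s += "SSE3 ";
#endif
    if (s.empty())
        s = "baseline";
    return s;
}
```

Logging this string once at plug-in startup, or checking it in a debug build, shows whether a flag change reached the compiler at all before comparing timings.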
