Unity is telling me clip() is expensive on Android/iOS. I have a hard time understanding why, because from what I understand, it only prevents the rendering pipeline from writing to the backbuffer, which should save time, not burn it. So what does clip() do that is so expensive? Also, what about clip() vs discard? Is there a difference in cost? And if I should not use clip(), what should I use to do texture clipping?
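For reference, this is the kind of usage I mean — a minimal cutout fragment shader sketch, using the usual Unity Cg conventions (`_MainTex` and `_Cutoff` are the standard property names):

```hlsl
fixed4 frag (v2f i) : SV_Target
{
    fixed4 col = tex2D(_MainTex, i.uv);
    // clip(x) discards the fragment entirely when x is negative,
    // i.e. when alpha falls below the cutoff threshold.
    clip(col.a - _Cutoff);
    return col;
}
```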
Alright, that settles that... But what about performance? Why does Unity tell me it's costly, when what I know of rendering tells me it should save me time instead?
It's mainly an issue on some architectures with Z-buffer writes. With clip()/discard()/alpha test, the hardware is no longer dealing with opaque triangles, so it can break rendering optimizations. But I can tell you that I saw a tremendous speedup using clip() on an iPad 2... (with transparent geometry, no Z writes). I'm saying this because it's supposed to be a big no-no to use clip() on iOS devices.
Technically, it is still an opaque triangle... Just not drawn. Are you saying that on some devices there's an optimization that uses the Z-buffer to keep occluded pixels from being drawn, but that clip() - while still writing to the Z-buffer - breaks that optimization and forces all triangles to be drawn anyway? It's weird, because I was sure the rendering order was from the furthest to the nearest, which would make that optimization rather pointless.
I'll stop you right there because you're on the right track but it's not a Z-buffer. I think it suffices to say that discarding fragments of opaque geometry is very expensive on tile-based deferred renderers. If you really want to know more, then you should read about how modern mobile graphics hardware works, probably starting with this article.
I've already read this page, and it doesn't enlighten me one bit about why clipping opaque geometry would mess with tile-based rendering. Also, are you saying this issue only applies to deferred rendering?
"Tile-based deferred" GPU architectures are a completely different aspect from Unity's "deferred render path". I agree it's all very confusing at first
It is an issue only for tile-based deferred renderers. This is a type of graphics chipset commonly used in mobile devices. As Metaleap points out, the deferred rendering path in Unity is a completely different thing.
From the Technology section of the article on PowerVR: alpha testing, clip() and discard() are fragment shader operations. All of the speed advantage of TBDRs comes from culling hidden surfaces at the geometry level. Once you introduce the ability to discard individual fragments, geometry-level culling no longer works.
I have to say, that's an ingenious way of getting away with as little overdraw as possible... But also a terrible technology for video games. Cutout alpha is massively used for trees, foliage, clothing and so on, since it is a lot faster than normal transparency. Now... does someone have a list of the mobile devices that work in this mode? And is there a less costly alternative?
Normal transparency is a lot faster on mobile than clip() is. It does mean you get alpha sorting issues, like you normally would with alpha blending, though.
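For example, the alpha-blended alternative is just render state, no clip() in the shader - a minimal ShaderLab sketch using Unity's standard transparent-queue setup:

```shaderlab
SubShader
{
    Tags { "Queue"="Transparent" "RenderType"="Transparent" }
    Pass
    {
        // Standard alpha blending; Unity sorts transparent objects back-to-front.
        Blend SrcAlpha OneMinusSrcAlpha
        // Transparent geometry should not write depth,
        // which also keeps it off the TBDR problem path.
        ZWrite Off

        // ... fragment shader returns color with its alpha intact,
        // no clip() call needed.
    }
}
```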
I don't have a list, but most chipsets work with TBDR as far as I know. A typical exception is the Tegra family from Nvidia. I've even seen listings of chipsets that can render in both ways.

There was a time when clip/discard was also not recommended for desktop graphics; the recommendation was to use alpha testing instead. Pretty soon clip/discard became faster than alpha testing, though, and the recommendation reversed. In DirectX 11 alpha testing is not even available anymore, and clip/discard is your only choice.

I haven't tested usage of clip() on mobile devices yet, but TBDR can be surprising when it comes to performance. Running a 200,000-poly, mostly opaque scene on a Nexus 5 is no problem at all. Add the simplest possible FXAA and the frame rate goes out the window. Full-screen effects are a big no for TBDR. Alpha blending on complex geometry can also be a killer. I see the Nexus 5 contains an Adreno 330. This is actually one of those chipsets that can do both TBDR and classical direct rendering.
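For reference, the legacy fixed-function alpha test mentioned above was expressed as ShaderLab render state, while the modern replacement lives inside the fragment shader - a sketch, using the conventional `_Cutoff` property name:

```shaderlab
// Legacy fixed-function alpha test (ShaderLab state; removed in DX11-class APIs):
AlphaTest Greater [_Cutoff]

// Modern equivalent, written in the fragment shader instead:
//     clip(col.a - _Cutoff);
// Both reject fragments whose alpha is at or below the cutoff,
// but clip() makes the cost explicit in shader code.
```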