CUDA, Unity, Unreal, Hair simulation - and learning new lessons alongside lessons learned
I'm starting a new series of posts. It's been a while, I know, but like most of us, life happens - especially work projects. LOL As for me, a quick re-introduction may be in order. I started as a research scientist (physics and mathematics), then changed over to multimedia/C++ programming at Microsoft - first supporting Microsoft's partners and customers in the developer community, then as a software design engineer in both the Consumer Division and an advanced research products group. I went into independent consulting for a while and supplemented that with writing (usually about coding, or as a combination programmer/writer), got truly excited about the possibilities of embedded video and augmented reality in books, and about Unity shaders (Cg/HLSL), during my master's work, and recently ended up working on TressFX, where I had my mind blown by the wonders and power of the GPU, and by the tangled terror that is something as huge as Unreal Engine.
So you might as well know now that I'm not a hardcore gamer (and lots of people who love graphics-level programming aren't), but I'm always fascinated by the rendering, the performance, even the beauty, of graphics. I am a huge fan of anime and manga (and books in general), though that doesn't mean I love it all. I generally prefer series that have some adventure and humor to them - even if the beauty of the art varies. My favorites include Mob Psycho 100, Noragami, Kakuriyo, and plenty of better-known titles like Black Clover and Demon Slayer. In fact, my love of anime/manga and art/books was what drew me toward NPR (non-photorealistic rendering) and AR/VR, which in turn drew me into shader coding.
This time I'm going to delve into a series of experiments, loosely described as a project. The goal is to understand how parallel programming (compute shaders and the like) can help you speed up your code. It's easiest to cover this topic by looking at a real project, investigating how it works, and asking how it could be improved. Most of the experiments/prototypes will likely fail the 'real-time' test, but as with the best of coding, lessons will be learned and valuable insights gained.
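To make that goal concrete before we dig in, here's a minimal sketch of the core idea, using CUDA as a stand-in for any compute API: a loop whose iterations are independent can become a kernel where each GPU thread handles one iteration. (The buffer size and names here are mine, just for illustration, not from any project discussed below.)

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// CPU version: one thread walks every element in order.
void scaleSerial(float* data, int n, float k) {
    for (int i = 0; i < n; ++i) data[i] *= k;
}

// GPU version: thousands of threads each touch one element.
__global__ void scaleKernel(float* data, int n, float k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= k;
}

int main() {
    const int n = 1 << 20;
    float* d = nullptr;
    cudaMallocManaged(&d, n * sizeof(float));  // unified memory keeps the demo short
    for (int i = 0; i < n; ++i) d[i] = 1.0f;

    scaleKernel<<<(n + 255) / 256, 256>>>(d, n, 2.0f);
    cudaDeviceSynchronize();

    printf("d[0] = %f (expect 2.0)\n", d[0]);
    cudaFree(d);
    return 0;
}
```

Of course, real workloads (like hair simulation) are rarely this embarrassingly parallel - and that gap is exactly what these experiments will explore.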
So I'm going to start with hair simulation and shading, since the last year and a half of my life has been immersed in that - namely AMD's TressFX (version 4.1), which was released in December 2019 both as a standalone release (with samples built on AMD's Cauldron engine framework) and as an engine integration into Unreal 4.22.3 (not a plugin; it was necessary to tap into the engine's rendering pipeline directly).
There was also some fun shader work I did for FEMFX, a Finite Element Model collision method with some spectacular deforming and fracturing collision effects. TressFX 4.1 has many improvements as well, including simulation speed-ups, better documentation, and an improved Maya exporter (hair export for engine consumption). But I'm not going to explicitly promote FEMFX or TressFX; those interested can check them out for themselves, or (for hair) wait until Unreal 4.24's experimental hair rendering/simulation is more mature and ready for production. Both are worth checking out, though - and some of the engineers and artists involved with both projects are awesome folks! (Some aren't, but isn't that the way? And I have no love for AMD in general and am happy to no longer be there. But that doesn't change my opinion about the amazing work done on TressFX 4.1 and FEMFX.)
In fact, one fun experiment will be looking at how we might take advantage of Unity's plugin system - more challenging when you need access to the rendering pipeline, but in my opinion easier than other engines/frameworks if you just want to write some custom shaders.
No, I'm going to talk about the research I discovered during my journey, such as Marschner 3-lobe hair rendering and possibly 5-lobe extensions (PBR stuff), as well as my fascination with hair cohesion. TressFX - unknown to many, though likely not surprising once you see it - followed the particle-system approach to simulation: working out vertex positions from realistic forces, then building a strip of triangles for each hair. Some of this knowledge got lost along the way (perhaps during ATI's acquisition by AMD), and I spent more hours than I can remember tracking down original papers and presentations, since actual documentation was sparse. So I'm going to talk about things like Per-Pixel Linked Lists (PPLL), which aren't just a TressFX hair color shader but have deeper roots; LibWetHair (a non-real-time hair simulation system that does truly realistic hair cohesion and movement); and the cleverly designed engine interface that lets you write DirectX/Vulkan-independent code and is also useful for understanding how rendering APIs like DirectX 12 and Vulkan work. And I'm going to talk about optimizing your application (using available tools), about parallel programming (CUDA, OpenCL, DirectCompute) and how fun that can be, and about the issues with parallelizing algorithms. We might even get into machine learning, too.
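Since PPLL will come up often, here's a minimal sketch of the idea - in CUDA rather than the HLSL pixel shaders TressFX actually uses, and with struct and buffer names of my own invention. Each transparent fragment atomically grabs a node from a shared pool and pushes it onto a linked list for its pixel, so overlapping fragments can be sorted and blended later:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

struct PPLLNode {
    unsigned color;  // packed RGBA of this fragment
    float    depth;  // kept so a later pass can sort back-to-front
    int      next;   // index of the previous head; -1 terminates the list
};

__device__ int gNodeCount;  // bump allocator into the node pool (zero-initialized)

// Called once per fragment. headPtrs holds one list head per pixel (init -1).
__device__ void pushFragment(int pixel, unsigned color, float depth,
                             int* headPtrs, PPLLNode* pool, int poolSize) {
    int node = atomicAdd(&gNodeCount, 1);             // allocate a node
    if (node >= poolSize) return;                     // pool full: drop fragment
    int oldHead = atomicExch(&headPtrs[pixel], node); // become the new head
    pool[node] = { color, depth, oldHead };
}

// Stand-in for the rasterizer: every thread pushes one dummy fragment.
__global__ void buildLists(int* headPtrs, PPLLNode* pool, int poolSize) {
    int pixel = blockIdx.x * blockDim.x + threadIdx.x;
    pushFragment(pixel, 0xFFFFFFFFu, 0.5f, headPtrs, pool, poolSize);
}

int main() {
    const int numPixels = 256 * 256, poolSize = numPixels * 4;
    int* headPtrs; PPLLNode* pool;
    cudaMalloc(&headPtrs, numPixels * sizeof(int));
    cudaMalloc(&pool, poolSize * sizeof(PPLLNode));
    cudaMemset(headPtrs, 0xFF, numPixels * sizeof(int)); // every int becomes -1
    buildLists<<<numPixels / 256, 256>>>(headPtrs, pool, poolSize);
    cudaDeviceSynchronize();
    printf("lists built: %s\n", cudaGetErrorString(cudaGetLastError()));
    cudaFree(headPtrs); cudaFree(pool);
    return 0;
}
```

A second 'resolve' pass then walks each pixel's list, sorts the nearest few fragments by depth, and blends them - that's what gives hair its order-independent transparency, and we'll dig into the real shaders in later posts.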
What's this got to do with augmented reality? After all, that is why I started this blog. Well, this series won't deal much with OpenCV or ARCore/ARKit at this point - that's true. But understanding how to optimize code using profilers, how to take advantage of parallel programming, and how to wade through issues like engine integration and overhead in order to speed up your code - all of that helps when you try to go beyond 'make it run' to 'make it run fast'.
And while I want to do many little experiments, especially as programming in CUDA is a lot of fun, it's good to have an overarching goal. In this case, looking at something as difficult as hair simulation and rendering, and seeing what works in those arenas and what doesn't, gives you a better feel than overly simplified code examples ever could.
And all of this is a long, complicated, tough process - again, something overly simple code examples and short blog posts never cover. Real programming is difficult and can take a long time, even when you're using someone else's library or engine. Pushing past current limitations is also a long, arduous process. So that's why I want this grounded - and since I just spent a good chunk of my life on a hair rendering and simulation system, that's where we'll begin.
So I will be forking the current TressFX 4.1 release on GitHub and exploring Unreal and Unity integration/plugin strategies, but I'll also be looking at other solutions, algorithms, and libraries, and even doing some pure CUDA work when exploring ideas. All the code experiments will be on my GitHub page.
That will all start next post. I'm going to do this in bite-size chunks, since even forking and cloning a complicated GitHub repository that depends on other repositories deserves some explanation. Each post will have its own theme, so it can stand by itself but still fit the long-term goal: exploring optimization and parallelization.
So get ready - and if you want to prep, get yourself a GitHub account, an Epic Games developer account (for access to the Unreal Engine GitHub repository), a copy of the latest Unity, and, while you're at it, the CUDA Toolkit. (If your graphics card isn't CUDA capable, I will also try to discuss OpenCL, but I recommend CUDA since it's well documented, well supported, and easy to use.) As for ML (machine learning, e.g. TensorFlow), that may come later. But learning to optimize and take advantage of the GPU is a key need, even for AI/ML projects.
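Once the toolkit is installed, a quick sanity check like the one below confirms the driver and runtime actually see your GPU (compile with something like `nvcc check.cu -o check`; this is just my own setup-check sketch, not part of any project mentioned above):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess || count == 0) {
        printf("No CUDA device found: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // List each device with its compute capability and memory size.
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Device %d: %s, compute capability %d.%d, %zu MB\n",
               i, prop.name, prop.major, prop.minor,
               prop.totalGlobalMem / (1024 * 1024));
    }
    return 0;
}
```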
Images are clipped from TressFX 4.1 documentation, sample code, and/or blog posts; FEMFX sample code assets and sample scenes; and Columbia University's LibWetHair. Please see their websites (AMD GPUOpen, http://www.cs.columbia.edu/cg/liquidhair/) for more information, and respect their copyright/trademark and/or licensing requirements.
My opinions are my own, and no one else's. Anything I talk about is either my own opinion and/or research, or public knowledge. As for copyright, all rights are reserved unless explicitly noted; this applies to my own work, and any papers, websites, libraries, engines, or similar that I reference belong to their original copyright/license holders. Always remember to cite your references, and observe copyright and licensing when using or referencing any code, words, images, or anything else that isn't your original work. 'Nuff said, I hope. Let's get coding!