  • A guide to reducing RAM usage in V-Ray

    Introduction and scope:

    We’ve all been there: countless hours of preparation, and it’s finally time to press the render button and call it a day.
    Not this day, though: the scene refuses to render, or it starts and then crashes badly after stressing your hard drives to breaking point.
    These are the clear signs of a scene asking too much of your computer’s RAM.
    The grapevine is rife with guides suggesting tricks, and you’ve likely already tried them all, to no avail.

    We (Vlado and I) thought it useful to outline the three key functionalities V-Ray offers to (considerably!) lower memory occupation, explaining in detail the methods by which they achieve the goal.
    If this works, by the end you should have a much better grasp of what is happening in your particular scene, and most importantly a few good ideas on how to solve its memory problems, once and for all.

    The document is split across multiple posts, to keep this thread a bit more readable.


    Without further ado, then, let’s dive in.

    Index

    A) V-Ray Proxy
    Gotcha #1: Indirect Rays
    Gotcha #2: Worst-case scenarios, and the value of Proxies
    Gotcha #3: Progressive Algorithms and GI Engines
    Gotcha #4: Optimize for Instancing, and many objects into a single Proxy
    Gotcha #5: A voxel per face
    B) VRayHDRI and Tiled Textures
    Gotcha #1: Indirect Rays (again!) and a thought on chance
    Gotcha #2: Progressive Algorithms (again!)
    Gotcha #3: Making tiled textures
    C) Tiled, Bufferless Rendering
    Gotcha #1: your DCC’s own frame buffer
    D) Parting Thoughts: Limitless!
    Lele
    Trouble Stirrer in RnD @ Chaos
    ----------------------
    emanuele.lecchi@chaos.com

    Disclaimer:
    The views and opinions expressed here are my own and do not represent those of Chaos Group, unless otherwise stated.

  • #2
    A) V-Ray Proxy:

    Proxies in V-Ray are written in a very particular format, one that partitions the saved geometry into a voxel grid based on triangle count, much like an acceleration structure does before the raytracing phase begins.

    Below, a pine tree with its triangles colored by parent voxel: the regular structure is fairly visible despite the topological complexity. A target of 5000 triangles per voxel was specified during conversion to proxy.


    This is done for a number of reasons, but the one of interest to us right now is that, when saved like so, proxies can be loaded at rendertime one voxel at a time, and unloaded once the dynamic memory limit has been reached.
    A special case exists for when the “Dyn. Mem Limit” value is set to 0, in which case all the available system RAM is used for proxy loading, and no unloading ever takes place until the render ends.

    So, once rendering has started, whenever a ray of any type (direct, shadow, GI) hits a proxy, the corresponding voxel and its associated geometry are loaded into RAM, fitted into the acceleration structure, and traced against.
    While this is a huge memory saver in principle, there are a few points worth considering when using it in practice.
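
    To make the load/unload mechanics concrete before we get to those, here’s a minimal, self-contained Python sketch of the idea (a toy model assumed for illustration, not V-Ray’s actual code): voxels are loaded on first hit, and the least recently used ones are unloaded once a byte budget is exceeded, with a budget of 0 meaning “load everything, never unload”.

    Code:
from collections import OrderedDict

class VoxelCache:
    """Toy model of on-demand voxel loading with least-recently-used
    unloading. Purely illustrative; not how V-Ray is implemented."""

    def __init__(self, budget_bytes):
        self.budget = budget_bytes     # 0 = use all available RAM, never unload
        self.resident = OrderedDict()  # voxel id -> size in bytes
        self.used = 0

    def load_from_disk(self, voxel_id):
        # Stand-in for reading one voxel's triangles from the .vrmesh file.
        return 5000 * 36               # ~5000 tris at an assumed 36 bytes each

    def hit(self, voxel_id):
        """Called when a ray of any type (direct, shadow, GI) touches
        geometry inside this voxel."""
        if voxel_id in self.resident:
            self.resident.move_to_end(voxel_id)  # mark as recently used
            return
        size = self.load_from_disk(voxel_id)
        # Unload the stalest voxels once the budget would be exceeded.
        while self.budget and self.resident and self.used + size > self.budget:
            _, freed = self.resident.popitem(last=False)
            self.used -= freed
        self.resident[voxel_id] = size
        self.used += size

cache = VoxelCache(budget_bytes=400 * 2**20)  # a 400 MB "Dyn. Mem Limit"
for voxel in (3, 7, 3, 12, 99):               # voxels hit by successive rays
    cache.hit(voxel)
print(f"{cache.used / 2**20:.2f} MB of voxels resident")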

    • #3
      Gotcha #1: Indirect Rays


      Because of the nature of GI (and, to an extent, of a dome light), this kind of selective loading is often impossible for individual meshes: statistically, if we bounce light on an object we are likely to hit it a few times over, in different places, forcing the loading of nearly all its geometry.
      That is surely the case for the tree above, but much less so when the proxy is, for example, a big terrain with plenty of potential self-occlusion.
      Mileage will vary, but here is an artificial example illustrating the point, with the pine tree above.

      Two example renders of the tree above, with sun and sky (in a dome light), no GI:


      In fact, there was hardly any difference in RAM occupation under sun and sky in the open: each needle likely depends on a few surrounding it, even with a simple diffuse shader, and the shadow rays have to traverse pretty much the whole tree canopy to hit the ground or the trunk. Both renders wanted 2.3 GB of RAM, a sure signal that the whole tree, or close to it, had been loaded each time.

      So, does this work at all? The quickest way to verify it is to enclose part of the tree canopy in a simple box, so as to hide it from all rays, and then check the amount of RAM used in the rendering process: any static geometry, existing as actual triangles in the scene, would load into RAM regardless.


      And sure enough, RAM usage has dropped to 400 megabytes, a net saving of 1.9GB.

      While the example is very artificial in nature, it illustrates well both the better and the more problematic aspects of the approach. At times the box will be a wall, a mountain, or some other occluder, and proxies will aggressively cut down RAM usage; at other times the geometry will be in full view, with light bouncing all over it, and proxies will only be able to delay the loading of the whole geometric set until after the render has started, with smaller savings in memory occupation.
      Either way, if the scene is big enough to warrant a certain variation across the screen, unloading may still happen later in the rendering phase, even for geometry in full view.
      Further still, as we’ll see in a moment, there are savings to be had just by virtue of converting geometry to the Proxy format.
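
      Incidentally, if you want to take similar RAM measurements on your own scenes, besides the Performance Monitor a few lines of Python with the third-party psutil module will do. The process name below is just an example; adjust it to your DCC:

      Code:
import psutil  # third party: pip install psutil

def ram_mb(process_name="3dsmax.exe"):
    """Sum the resident memory of every process with the given name:
    roughly what Performance Monitor's working-set counter shows."""
    total = 0
    for p in psutil.process_iter(["name", "memory_info"]):
        if p.info["name"] == process_name:
            total += p.info["memory_info"].rss
    return total / 2**20

baseline = ram_mb()
input("Start the render, then press Enter to sample again...")
print(f"render delta: {ram_mb() - baseline:.0f} MB")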

      • #4
        Gotcha #2: Worst-case scenarios, and the value of Proxies

        In worst-case scenarios, where the whole proxy has to be loaded in RAM, proxied geometry will still take up less memory than its static counterpart (i.e. geometry existing as triangles in the scene), and all the more so when the Embree option “Conserve Memory” is active.
        For very high polycounts, the occupation of video RAM becomes important as well, and proxies allow for previews which take up a minute fraction of the RAM needed for the whole geometry.
        Below I’ll detail some operations with the usual scene. The numbers were gathered with either the Performance Monitor in Windows or GPU-Z.

        First, the baseline: opening Max (2017), which we’ll treat as the zero point for both RAM and VRAM usage.

        Opening the proxy file (proxy preview mode, 2 lights): 109 MB (RAM), 63 MB (VRAM)
        Rendering - “Conserve Memory” OFF: 2343 MB (geo only), 2452 MB (with Max)
        Rendering - “Conserve Memory” ON: 1381 MB (geo only), 1490 MB (with Max)

        Opening the Mesh file (tree plus 2 lights): 1646 MB (RAM), 1081 MB (VRAM)
        Rendering - “Conserve Memory” OFF: 4233 MB (geo only), 5869 MB (with Max)
        Rendering - “Conserve Memory” ON: 3759 MB (geo only), 5405 MB (with Max)


        As stated, loading the fully proxied geometry with Embree’s “Conserve Memory” active provides the biggest overall RAM savings: first in the DCC application, where the geometry doesn’t need to be loaded into system and video RAM, and then at rendertime, where it can fit nearly four (!) times the geometry budget of a Max Editable Mesh into the same amount of system RAM.
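
        For the record, the “nearly four times” figure falls straight out of the numbers above:

        Code:
# Peak RAM including Max, in MB, from the measurements above.
mesh_off = 5869    # Editable Mesh, "Conserve Memory" OFF
mesh_on  = 5405    # Editable Mesh, "Conserve Memory" ON
proxy_on = 1490    # Proxy, "Conserve Memory" ON

print(f"{mesh_off / proxy_on:.2f}x")  # 3.94x against the default mesh setup
print(f"{mesh_on / proxy_on:.2f}x")   # 3.63x even like for like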

        • #5
          Gotcha #3: Progressive Algorithms and GI Engines

          It can be deduced from the behavior above that any rendering process which considers the screen in its entirety, at once, will work counter to the piecewise loading of proxies.
          In fact, the Light Cache (which also performs 100 GI bounces by default!) and progressive rendering both make the loading job a lot less fluid.
          Often they are responsible for the apparently inexplicable, endless delay at render start: seemingly only a few rays have been cast, yet the projected rendertimes shoot through the roof. V-Ray is trying to load nigh on all the proxies in the scene, and with the LC there’s also a good chance it’ll be loading most of their geometry too.
          It’s also why renders at times seem able to start, but invariably hang after some of the LC (or progressive BF) has slowly advanced: the forced loading of most of the geometry filled the RAM up, as fast as the data could be read from disk or network.

          As such, the best way to optimize for RAM usage is bucket rendering.
          Smaller buckets see less geometry at once, and with some luck, scene variation will mean geometry far from the buckets is not loaded, or can be unloaded later in the render process (the toy sketch at the end of this post illustrates the point).
          Smaller buckets do incur a rendertime penalty (due to the filtering of pixels at bucket edges), which is however much smaller than that of swapping to disk when out of RAM.
          Buckets also tie in nicely with the optimization explained in point C): tiled, bufferless rendering.

          Further, if it can be afforded, it’s best to limit the number of GI bounces to the minimum viable amount, ideally with the Brute Force engine.
          As the Light Cache works on the whole screen at once, and traces 100 bounces by default, it may well force the very loading we’re trying to avoid.
          Should the LC not be discarded in favour of BF (Adaptive Lights, and the dome light, depend on it, and rendering is overall much faster than with BF), the next best choice is to reduce its bounces.
          The lower number of rays, times the lower number of bounces, may in certain cases avoid the dreaded load explosion.
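
          And here’s the promised toy sketch of why buckets bound the working set while progressive passes don’t (all numbers made up for illustration):

          Code:
# Which proxy voxels the rays of each screen region can touch (assumed).
buckets = {
    "b0": {1, 2, 3},
    "b1": {3, 4},
    "b2": {7, 8, 9, 10},
    "b3": {10, 11},
}

# A progressive pass samples the whole frame, so it needs the union at once:
progressive = set().union(*buckets.values())

# Bucket rendering needs one region's voxels at a time; the peak is the max:
peak_bucket = max(len(v) for v in buckets.values())

print(f"progressive: {len(progressive)} voxels at once")  # 9
print(f"buckets: at most {peak_bucket} voxels at once")   # 4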

          • #6
            Gotcha #4: Optimize for Instancing, and many objects into a single Proxy

            Now, there is an important catch here: proxy exporters offer an option called “Optimize for instancing”, which forces the creation of a single voxel containing the whole mesh.
            This has performance benefits when the mesh is sure to be wholly loaded in RAM and there is memory to spare, but it also entirely disables the ability of proxies to partially load (and unload!) their geometry.
            When this is coupled with multiple, topologically complex geometries exported as a single proxy, it can have devastating effects on RAM usage, forcing massive amounts of geometry to load as soon as the first ray hits any part of the proxy’s bounding box.
            So, if you proxy many objects into one, make sure you set a face-count budget for the voxels to adhere to, so that V-Ray can still piece-load the proxy.
            Feel free to use the single-voxel variant if instead you are proxying many individual geometries with little to no impact on RAM usage, and would love them quickly loaded, but not residing in RAM while you work on the scene.


            • #7
              Gotcha #5: A voxel per face

              It would be tempting to think that lowering the faces-per-voxel count well below its default would provide greater benefits in the selective-loading department.
              Unfortunately, past a point, the cost of working through the voxel structure becomes higher than any potential benefit.
              This is very situational: depending on topology and triangle counts, the threshold below which the cost exceeds the benefit will vary. In general, the default of 20000 triangles is a good tradeoff between performance and cost, and can be left at its value.
              For particular situations, like the pine above, lowering the value (to 5000, in this specific case) may provide benefits, particularly if one expects plenty of occlusion to happen (i.e. after a scatter operation, where other proxies of different geometry would cover this one).
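
              As rough arithmetic on the tradeoff (triangle count assumed), the number of voxels to index and walk grows inversely with the budget:

              Code:
import math

tris = 1_000_000  # an assumed, reasonably dense asset
for budget in (20000, 5000, 500, 50):
    voxels = math.ceil(tris / budget)
    print(f"{budget:>6} tris/voxel -> {voxels:>6} voxels to index and walk")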

              • #8
                B) VRayHDRI and Tiled Textures:

                The proxy loading mechanism would be quite limited in scope if it weren’t paired with a way to load and unload the textures associated with the proxies (and not only those, of course) in a similarly piecewise manner.
                Besides the other benefits of using a VRayHDRI loader with standard bitmaps (enhanced filtering options and filtering memory consumption, enhanced color mapping capabilities, enhanced mapping options, and so on), there are great RAM savings to be had when the VRayHDRI loader is coupled with tiled textures.

                These can be either .TX or .EXR; both formats share two key characteristics: the image contains atlases of the texture (reduced, high-quality versions of the main size), and it is written, and read, in tiles of NxN pixels (64 is often used as the default).
                At rendertime, the VRayHDRI loader figures out which two sizes of the texture it needs, and which tiles of those, based on visibility (under the same set of constraints which apply to proxies), and performs a high-quality interpolation between them.
                Textures have their own memory budget, separate from that of proxies (and dynamic geometry in general), and undergo the same unloading process as proxies do, as the RAM budget fills up.
                The default of 4000MB may seem small, but since each halving of resolution quarters the memory occupation, it is often plenty when coupled with tiled textures.
                Here too, a value of 0 will allow V-Ray to use all the available RAM, including, in case, what was freed up by the proxies.

                The choice of mip-map resolutions can be biased (in 3ds Max) using the Blur parameter in the map’s coordinates section: values lower than 1 will force higher resolutions to be loaded.
                Lowering the parameter to 0.01 will to all intents and purposes load the highest-resolution map and perform no filtering at all, which would be incredibly bad.
                If you decide to change the Blur value, be considerate, and test: different map resolutions, viewing conditions and, most importantly, perception will make the sweet spot vary from case to case, so don’t be afraid of trying solo renders of the geometry in question to find it.
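
                As a mental model only (the exact selection logic is V-Ray’s own; the formula below is an assumption for illustration, not V-Ray code), think of classic mip level selection with Blur acting as a multiplier on the filter footprint:

                Code:
import math

def mip_level(texels_per_pixel, blur=1.0, coarsest=11):
    """Classic mip selection for a 2048px texture: level 0 is full res,
    each level up halves it. Blur is treated as scaling the filter
    footprint (an assumption, not V-Ray's actual formula)."""
    lod = math.log2(max(texels_per_pixel * blur, 1.0))
    return min(coarsest, round(lod))

# One screen pixel covering about 8 texels of a 2048px map:
print(mip_level(8.0))             # level 3: the 256px atlas is enough
print(mip_level(8.0, blur=0.5))   # level 2: 512px, sharper, more RAM
print(mip_level(8.0, blur=0.01))  # level 0: full res, no filtering. Bad!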

                In the example of the tree above, if I were to make it very tiny on screen, V-Ray would load only the lowest resolutions (1x1, 2x2, 4x4 pixels) and forget entirely about the full-size texture, performing interpolation only between the sizes it chose to load.
                Instead, if I were to cover the tree with the box, but see it up close, it would perhaps load the original-size texture, and yet only those 64x64 pixel tiles of it which in some way contribute to what is being rendered.

                In the example of a terrain, no matter the original texture size, V-Ray will only load the resolution(s) which make sense at the distance it is seen from, and only the tiles its direct and indirect rays touch.
                One could comfortably create a 64k-square texture, convert it to tiled EXR (with one of the utilities we provide, for example), and then happily render away at 4k resolution, safe in the knowledge that the full-size texture will never be loaded in its entirety, although parts of it will be loaded at the original resolution where needed.
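
                Working the 64k example through, under assumed storage (three half-float channels, 6 bytes per pixel):

                Code:
# A 64k-square tiled texture: full size vs what might actually load.
side, tile, bpp = 65536, 64, 6  # 64px tiles, 3 channels at 16-bit half float

full_gb = side * side * bpp / 2**30
print(f"full resolution: {full_gb:.0f} GB")  # ~24 GB: hopeless to keep whole

# Say the render ends up touching ~2000 full-res tiles, plus every mip
# from 4096px down in its entirety (numbers assumed for illustration):
tiles = 2000 * tile * tile * bpp
mips = sum((side >> k) ** 2 * bpp for k in range(4, 17))  # 4096px .. 1px
print(f"resident instead: {(tiles + mips) / 2**20:.0f} MB")  # ~175 MB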

                • #9
                  Gotcha #1: Indirect Rays (again!) and a thought on chance


                  The same set of circumstances that applied to the selective loading of proxies applies here.
                  There is one important difference, though: regardless of which part of the texture is loaded, the resolution at which it is loaded will assuredly be equal to or lower than that of the original texture.
                  While put this way it may seem a puny gain, it is not: a 2048 input texture will have 11 smaller-resolution atlases, each half the size of the previous, so there is only a 1-in-12 chance that the full resolution will be picked.
                  With each smaller atlas taking up a quarter of the RAM of the one above it, chance alone can provide massive RAM savings.
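
                  Turning that thought on chance into numbers, with the same uniform-pick simplification as above:

                  Code:
# Expected footprint of a 2048px texture if each of its 12 resolutions
# (2048px down to 1px) were picked with equal probability. Each step
# down costs a quarter of the level above it.
sizes = [0.25**k for k in range(12)]             # normalized: full size = 1.0
expected = sum(sizes) / len(sizes)
print(f"expected: {expected:.3f} of full size")  # ~0.111, a ~9x saving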

                  Notice that setting Blur to 0.01 will force only the highest map resolution to ever load, and will deactivate filtering, negating the memory savings, degrading image quality, and increasing render times.
                  Should it not be clear yet: DON’T EVER set Blur to 0 (or anywhere near it).

                  • #10
                    Gotcha #2: Progressive Algorithms (again!)

                    As above, progressive methods, and many GI bounces, will force the loading of more textures and tiles, though again only at the needed resolution, rather than the original one.
                    So, if in extreme need, make sure you render with buckets, or even bufferless, as point C) illustrates.
                    Regardless, RAM savings are still guaranteed, even if made somewhat less flexible by the progressive nature of the algorithms.


                    • #11
                      Gotcha #3: Making tiled textures

                      V-Ray Next includes the tool to convert standard textures to the tiled .TX format, and the bitmap-to-VRayHDRI script has been updated to make use of it.

                      For previous V-Ray versions, the tool needs to be downloaded manually (for free) from our site.
                      You’ll want to download the most recent build of “makeTx”.
                      This tool batch-converts most texture formats, at 8, 16 and 32 bits per channel, into the corresponding tiled version, fully respecting bit depth.
                      To do so, 8-bit textures are converted to .tx files, and higher bit depths to .exr.

                      Should you have trouble grabbing that tool, despair not!
                      There is an executable in the “bin” folder of your V-Ray installation called img2tiledexr.exe, which will do a very similar job, with one important difference.
                      When run, it offers an interface which can convert single or multiple inputs at once (with 10 formats supported, including .psd and .hdr), but only into tiled EXRs.
                      This means that all 8-bit textures will end up as 16-bit ones, occupying twice as much RAM at the highest resolution.
                      While this is surely undesirable at first sight, it is not that big an issue most of the time, as the loading of the lower-resolution mipmaps, and of the individual tiles, will more than make up for the doubled bit depth.
                      In Max, and likely in other DCCs, there is a convenient option to swap all bitmap loaders to VRayHDRI ones, converting all textures to tiled EXR in the process, using img2tiledexr.exe.
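
                      By way of example, here is how one could batch a folder through makeTx from Python. The folder path is made up, and the flags are those documented for OpenImageIO’s maketx; double-check them against the build you download:

                      Code:
import subprocess
from pathlib import Path

MAKETX = "maketx"           # or the full path to the downloaded build
SRC = Path(r"C:\textures")  # assumed source folder

# Convert every TIFF and PNG to a tiled, mip-mapped .tx next to the source,
# with 64px tiles to match the tile size discussed earlier.
for tex in sorted(SRC.glob("*.tif")) + sorted(SRC.glob("*.png")):
    out = tex.with_suffix(".tx")
    subprocess.run([MAKETX, "--tile", "64", "64", str(tex), "-o", str(out)],
                   check=True)
    print("wrote", out)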

                      EDIT 09-07-19: I made a simple script to convert textures on disk to tiled formats: a simple two-click solution which should be easy to expand and adapt to one’s specific needs. Attached below.
                      EDIT 13-09-19: I updated the script to v0.02 to allow for a custom makeTX.exe path, should it not be found in the bin folder of the V-Ray Next install root. It will also remember its settings (ini file saved in the plugcfg user folder).
                      EDIT 24-06-20: Here’s the proper post with the script and goodies.

                      • #12
                        C) Tiled, Bufferless Rendering:

                        With two technologies able to read big domains bit by bit, there has to be one able to write the output of that process bit by bit.
                        It’s called Tiled, or Bufferless, rendering.

                        Normally, V-Ray allocates megabytes of frame buffer to display the image while it renders (25 per channel at 4096x2160px, or around 3 megabytes per megapixel, per channel), along with the associated Render Elements (or AOVs, as one prefers). In this mode, instead, it writes each bucket out as it completes, only ever allocating the memory needed for the area covered by the buckets themselves (so: smaller buckets, smaller RAM usage), rather than for the whole image.
                        If the render is high resolution, and/or the number of render elements grows (a few assorted masks, a few light/texture/material selects, a denoiser, a few diagnostic ones...), the RAM allocation can become significant, and writing to tiled EXRs can most definitely help sidestep the issue.
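
                        To gauge the sums involved, some back-of-the-envelope arithmetic (assuming 32-bit float RGB per element; exact per-channel sizes will vary with what each element stores):

                        Code:
# Full-frame buffers vs live buckets, rough arithmetic.
w, h = 4096, 2160
bytes_px = 3 * 4   # RGB at 32-bit float (assumed)
elements = 1 + 8   # beauty plus, say, 8 render elements

full = w * h * bytes_px * elements / 2**20
print(f"full buffers held in RAM: {full:.0f} MB")  # ~911 MB for the duration

# Bufferless: only the buckets being worked on are resident, for example
# 32 threads each rendering a 48px bucket:
live = 48 * 48 * bytes_px * elements * 32 / 2**20
print(f"live buckets only: {live:.1f} MB")         # ~7.6 MB, flushed as done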

                        The specifics of turning the feature on are well explained in the manual, so I’ll redirect you there.

                        • #13
                          Gotcha #1: your DCC’s own frame buffer


                          While V-Ray can turn its own frame buffer off, to write tiled EXR images as it renders, it cannot always do the same with your DCC application of choice.
                          In the case of Max, for example, one needs to set the resolution of the Max frame buffer (i.e. the nominal render resolution) as low as is comfortable, and then specify the correct render resolution in the V-Ray frame buffer settings.
                          While not ideal, this avoids Max allocating a chunk of RAM to nothing at all.
                          Be wary of the specifics for your own DCC application in the respective V-Ray documentation (e.g. Max, Maya, and so on).

                          • #14
                            D) Parting Thoughts: Limitless!

                            As you may guess, V-Ray is equipped with all that’s needed to render essentially a limitless amount of geometry and textures, and at a limitless resolution.

                            Putting hype to one side (the maximum resolution is currently 50k pixels square), the amount of data one can pack into a given amount of RAM, while still getting renders done within acceptable times, is quite enormous.
                            Beyond the best-case scenarios, even in terribly sub-optimal conditions one can render nearly four times more geometry within the same amount of RAM, just through judicious proxy conversion of selected scene geometry.
                            Tiled textures also provide RAM savings by pure virtue of their presence in a scene, and tiled output will save RAM just by being enabled.

                            It will still take a modicum of user wisdom to be wholly effective, along with perhaps a compromise or two in some aspects of scene setup, but the system has been extensively battle-tested in production, and has pushed frames out with remarkable consistency and speed, time and again.

                            Our hope is that you too may begin to enjoy these benefits, and tackle bigger jobs with a bit more ease and peace of mind.

                            • #15
                              Thank you will never be enough for these wisdom pills. This is my checklist for the scenes to come.
