NUMA machines. Updated!

  • NUMA machines. Updated!

    Hi guys,

    It looks like there is a trend lately for people to buy multi-CPU NUMA machines (e.g. 4x AMD Opteron with 48 or 64 cores). However, please keep in mind that these machines are NOT particularly suitable for rendering. They are powerful machines, useful for web servers and databases where they can efficiently run many simultaneous processes, but they are no good when a single application (like 3ds Max) needs to access a large amount of data from all the processors at the same time.

    For such machines, it is more efficient to run several copies of 3ds Max that render separate frames (if you are rendering an animation), with each copy limited to a given NUMA node, or to run several DR render servers, each on its own NUMA node. But in that case, you might as well just get separate machines.
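A minimal sketch of the "separate frames per NUMA node" idea, assuming you drive the renders from a script. It only splits an animation's frame range into contiguous chunks and pairs each chunk with a node through the Windows `start /node` switch. The `render_job.bat` name and its two frame arguments are hypothetical placeholders for whatever actually launches your render; they are not part of V-Ray or 3ds Max.

```python
def split_frames(first, last, parts):
    """Split the inclusive frame range [first, last] into contiguous chunks."""
    total = last - first + 1
    if parts < 1 or total < 1:
        raise ValueError("need at least one part and one frame")
    parts = min(parts, total)          # never create empty chunks
    base, extra = divmod(total, parts)
    chunks = []
    start = first
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # earlier chunks absorb the remainder
        chunks.append((start, start + size - 1))
        start += size
    return chunks


def per_node_commands(first, last, nodes, render_cmd="render_job.bat"):
    """Build one 'start /node N ...' line per NUMA node.

    render_cmd is a hypothetical wrapper that takes a first and last frame.
    """
    chunks = split_frames(first, last, len(nodes))
    return [
        f"start /node {node} {render_cmd} {a} {b}"
        for node, (a, b) in zip(nodes, chunks)
    ]


if __name__ == "__main__":
    for line in per_node_commands(0, 99, [0, 1, 2, 3]):
        print(line)
```

Each copy still needs enough RAM for the whole scene, so the number of copies is limited by memory as much as by the number of nodes.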

    If you get such hardware and find out that it renders slower than a simple i7, there is very little that we can do to help you. It would be best to test the configuration before spending money on it.

    Best regards,
    Vlado
    I only act like I know everything, Rogers.

  • #2
    Heya

    That's great info to have! Thanks, Vlado!

    Would disabling NUMA help, then? Would the next-gen 12/16-core Xeons in 2- or 4-CPU configurations also be slower because of NUMA? Should we just invest in i7s, or can we simply disable NUMA? I know I could disable NUMA in the BIOS...

    Thanks, by e.
    CGI - Freelancer - Available for work

    www.dariuszmakowski.com - come and look


    • #3
      We have a dual Xeon machine that performs well with NUMA enabled (but I haven't tested if disabling it changes anything). But at the same time, I've seen reports where some specific scenes that take up more RAM might render a bit slower. I don't know about the new processors as we haven't tested them.

      I know that 4x AMD are really bad though.

      Best regards,
      Vlado
      Last edited by vlado; 20-06-2013, 02:04 AM.
      I only act like I know everything, Rogers.


      • #4
        Hi Vlado,

        Thank you for this info! I was getting pretty excited about a 4x AMD Opteron workstation, but now I know that it simply does not work. So I will stick with my small farm approach with three i7-3930K machines for DR rendering instead.
        i-9 7980XE at stock, G. Skill RipjawsV 64GB RED, MSI GeForce GTX 1080 Ti GAMING X 11GB, http://fractalmind.eu


        • #5
          Reposting additional info from our developers posted in this thread:
          http://forums.chaosgroup.com/showthr...highlight=numa

          Originally posted by vlado
          It might just work fine; Intel processors seem to handle NUMA better than AMD processors. We did some changes to the nightly builds, but you will have to test if they actually help.

          Best regards,
          Vlado

          Originally posted by Vasil Minkov
          Hello,

          One possible approach to take advantage of a NUMA system is to start several processes, each running on its own NUMA node(s).

          Recent V-Ray versions include some functionality that helps to restrict the number of CPU cores used by the process:

          - it is now possible to restrict the number of computation threads by setting the VRAY_NUM_THREADS=N environment variable.
          By default V-Ray creates one computation thread per CPU core.
          - VRaySpawner can now launch several slave processes, running on different NUMA nodes.
          - several slave processes can run on the same server machine, using a range of listening ports. The port range is selectable on the render client.

          I'll give more details on these in the next posts.

          Originally posted by Vasil Minkov
          VRaySpawner.exe is now extended with the following command line options:
          • -numa[=N]
          • -node=node1[,node2][,node3]...
          • -port=port1[,port2][,port3]...


          If neither the "-numa" nor the "-node" option is given, vrayspawner acts as usual: it creates one slave process that uses all available CPU cores.

          When the "-numa" and/or "-node" options are present on the command line, vrayspawner launches several slave processes, such that
          every process runs on its own NUMA node(s) and listens on a different slave port.

          Ideally, one slave process per NUMA node should be used for best performance. Running too many slave processes on the same
          machine may eat all of your memory, so it may be useful to run fewer processes, distributed over the available NUMA nodes.


          Description of the options:
          • -port=port1[,port2][,port3]...
            select listening ports for the slave process(es). If the number of processes is greater than
            the number of ports given, the last port is auto-incremented. The default listening port is 20204 for 3ds Max.
            The render client should be set to use the corresponding port range.
          • -node=node1[,node2][,node3]...
            select NUMA nodes that will be used for the slave process(es). Default - use all available NUMA nodes.
          • -numa[=N]
            select the number of slave processes to start. If N is not given, N=0 or N>=M, one process per NUMA node will be created.
            Here "M" is the number of selected nodes using "-node", or the number of all available NUMA nodes if no "-node" is used.
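The port rule described above (explicit ports first, then the last port auto-incremented) can be sketched as follows. This is only an illustration of the rule as stated in the post, not VRaySpawner's actual code; the function name `assign_ports` is made up for the example.

```python
DEFAULT_PORT = 20204  # default listening port for 3ds Max, as stated in the post


def assign_ports(num_processes, ports=None):
    """Return one listening port per slave process.

    Ports given on the command line are used in order; if there are more
    processes than ports, the last port is incremented for each extra process.
    """
    ports = list(ports) if ports else [DEFAULT_PORT]
    assigned = ports[:num_processes]
    while len(assigned) < num_processes:
        assigned.append(assigned[-1] + 1)
    return assigned


if __name__ == "__main__":
    print(assign_ports(8))                  # default port, eight processes
    print(assign_ports(3, [30000, 40000]))  # two explicit ports, three processes
```

The resulting list is what the render client's port range has to cover.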


          Examples (assuming you use 3ds Max 2012 and a system with 8 NUMA nodes):
          • vrayspawner2012.exe -numa
            spawns eight 3dsmax.exe processes, each running on a single NUMA node and using listening ports 20204-20211
          • vrayspawner2012.exe -numa=4
            spawns four 3dsmax.exe processes, each running on two NUMA nodes and using listening ports 20204-20207
          • vrayspawner2012.exe -node=3,5,6 -port=30000,40000
            spawns three 3dsmax.exe processes, running on NUMA nodes 3, 5 & 6 and using listening ports 30000, 40000 & 40001
          • vrayspawner2012.exe -node=2,3,4,5,6 -numa=3
            spawns three 3dsmax.exe processes, running on nodes (2,3), (4,5) & (6) and using listening ports 20204, 20205 & 20206
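To predict how a given "-numa" / "-node" combination spreads nodes over processes, here is a small sketch that reproduces the groupings shown in the examples. The real distribution logic inside VRaySpawner is not documented in this thread, so the rule used here (contiguous groups, with the earlier groups taking the extra nodes) is an assumption inferred from those examples, and `group_nodes` is a made-up name.

```python
def group_nodes(nodes, num_processes=0):
    """Distribute the selected NUMA nodes over slave processes.

    nodes: the nodes selected with -node (or all available nodes).
    num_processes: the N from -numa=N; 0 or N >= M means one process per node.
    """
    nodes = list(nodes)
    m = len(nodes)
    if m == 0:
        raise ValueError("at least one NUMA node is required")
    n = num_processes
    if n <= 0 or n >= m:
        n = m
    base, extra = divmod(m, n)
    groups = []
    i = 0
    for k in range(n):
        size = base + (1 if k < extra else 0)  # assumed: earlier groups get the remainder
        groups.append(tuple(nodes[i:i + size]))
        i += size
    return groups


if __name__ == "__main__":
    print(group_nodes(range(8), 4))
    print(group_nodes([2, 3, 4, 5, 6], 3))
```

The number of groups returned is also the number of listening ports the render client has to be pointed at.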

          Originally posted by Vasil Minkov
          One possible approach to using 3ds Max as a client on a NUMA machine is to start several DR slaves on the same machine, using VRaySpawner as described in the previous post.
          The following examples assume 8 NUMA nodes with 6 CPU cores each:

          1. Using single-threaded client
          vrayspawner2012.exe -numa
          or
          vrayspawner2012.exe -numa -node=N - to restrict the number of processes and reduce the memory usage
          Then launch 3ds max and make the following settings:
          • turn off multithreading (Customize -> Preferences -> Rendering -> Multi-threading)
          • turn on DR and select DR servers: Rendering -> Settings -> Distributed rendering -> Settings. Add "localhost" or "127.0.0.1" server with a port range of 20204-20211 for 8 DR slave processes.

          In this case the client process will use a single computation thread so that it does not interfere too much with the render servers.

          2. Using multi-threaded client
          Another approach could be to launch the client process on a dedicated NUMA node. As with the example above, first start vrayspawner:
          vrayspawner2012.exe -numa -node=1,2,3,4,5,6,7 - reserve node 0 for the client process and spawn 7 slave processes
          or
          vrayspawner2012.exe -numa=3 -node=1,2,3,4,5,6,7 - reserve node 0 for the client process and spawn three slave processes using nodes (1,2,3), (4,5) & (6,7)
          Then launch 3dsmax.exe from the command line, or using a batch file:
          set VRAY_NUM_THREADS=6
          start /node 0 /affinity 3f 3dsmax.exe

          This will start 3ds Max using six computation threads, running on the first six CPU cores of node 0 (the hexadecimal mask 3f has its six lowest bits set). If node 0 on your system has a different number and/or topology of CPU cores, "VRAY_NUM_THREADS=6" and "/affinity 3f" should be modified accordingly.
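If node 0 has a different core count, the matching "/affinity" value can be worked out like this. A minimal sketch, assuming the cores you want are contiguous; the helper names are made up, and the two generated lines simply mirror the batch file above.

```python
def affinity_mask(num_cores, first_core=0):
    """Hex mask (no 0x prefix) with num_cores consecutive bits set."""
    if num_cores < 1 or first_core < 0:
        raise ValueError("num_cores must be >= 1 and first_core >= 0")
    return format(((1 << num_cores) - 1) << first_core, "x")


def client_launch_lines(node, num_cores, exe="3dsmax.exe"):
    """Batch-file lines that start the client on one node with matching threads."""
    return [
        f"set VRAY_NUM_THREADS={num_cores}",
        f"start /node {node} /affinity {affinity_mask(num_cores)} {exe}",
    ]


if __name__ == "__main__":
    for line in client_launch_lines(0, 6):
        print(line)
```

When "/node" and "/affinity" are combined, Windows interprets the mask relative to the selected node, so the mask starts at bit 0 regardless of which node is chosen.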

          Finally, launching VRaySpawner can be automated using a simple Windows shortcut. It is also possible to use an .ini configuration file instead of command-line options for VRaySpawner.
          Technical Support
          Chaos Group


          • #6
            Will there be an issue with V-Ray 3.0 licenses and NUMA rendering?
            CGI - Freelancer - Available for work

            www.dariuszmakowski.com - come and look


            • #7
              Originally posted by DADAL
              Will there be an issue with V-Ray 3.0 licenses and NUMA rendering?
              You need only one license per machine, no matter how many instances of V-Ray are running on it, and no matter whether they run in render mode or DR mode. (This differs from V-Ray 2.x, which takes out a DR license for each instance of 3ds Max used as a slave, regardless of whether these instances are running on the same machine or not.)

              Best regards,
              Vlado
              I only act like I know everything, Rogers.


              • #8
                Oh, that's great. Good to know, thanks Vlado!
                CGI - Freelancer - Available for work

                www.dariuszmakowski.com - come and look


                • #9
                  Just to add a clarification: since each instance of VRaySpawner.exe needs a different port number, this is feasible only with V-Ray 2.x nightly builds. Official builds of V-Ray 2.x don't have the option to change the port number.
                  Technical Support
                  Chaos Group


                  • #10
                    So, would this be equivalent to splitting up the processors into different virtual machines and running an instance of 3ds Max & V-Ray in each VM? I've had some advice from others that I would see better performance from my dual 24-core machine if it were split into 3 or 4 VMs. But that approach seems more expensive as far as licenses go.


                    • #11
                      Are those 24 cores physical or logical? If they are logical, it's not worth the effort. If they are physical, then using VMs might be more efficient (and might be the best choice if you want to render through Backburner). If it's only for DR, then running the V-Ray DR spawner in NUMA mode might be just as efficient.

                      Best regards,
                      Vlado
                      I only act like I know everything, Rogers.


                      • #12
                        Windows System Information reports that it has 2 physical 12-core processors, each set up as 24 logical cores. I will give the V-Ray spawner a go and see if I can make it work with Backburner.


                        • #13
                          Wondering if this is still relevant for the latest V-Ray? Or is this more of a 3ds Max issue?
                          2 x Xeon 2696-V3 machine with 72 threads in total - 18 physical cores per CPU
                          Available for remote work.
                          My LinkedIn: https://www.linkedin.com/in/olegbudeanu/


                          • #14
                            The latest version of V-Ray supports NUMA processors, so this thread is a bit outdated.
                            That processor should work fine with V-Ray; you can test it with V-Ray Benchmark if you like.
                            Svetlozar Draganov | Senior Manager 3D Support | contact us
                            Chaos & Enscape & Cylindo are now one!


                            • #15
                              Originally posted by svetlozar.draganov
                              The latest version of V-Ray supports NUMA processors, so this thread is a bit outdated.
                              That processor should work fine with V-Ray; you can test it with V-Ray Benchmark if you like.
                              Yep, it's working completely fine.
                              Just wondering if it would be faster if I split it into two virtual machines and used them for distributed rendering. Curious if anyone has tested this.
                              Available for remote work.
                              My LinkedIn: https://www.linkedin.com/in/olegbudeanu/
