AMD Threadripper proposes to redefine the high-end desktop CPU

Quizzical Member Epic Posts: 17,964
edited May 18 in Hardware
For clarity, "high-end desktop" (HEDT) is Intel's terminology for the consumer versions of their platforms that tend to have more cores than their "normal" desktops but no integrated GPU.  The cheapest CPUs in the HEDT lineup have generally been around $300 or more.  The generations of it have been Nehalem, Gulftown, Sandy Bridge-E, Ivy Bridge-E, Haswell-E, and most recently, Broadwell-E.

Yesterday AMD announced Threadripper, which is basically a two-die Ryzen-based solution with up to 16 Zen cores.  This is really AMD's first credible shot at the HEDT market since Intel split their lines to invent it in 2008.

One traditional problem with HEDT parts is that having so many cores means that the cores can't clock all that high.  Thus, the HEDT parts typically trail behind the normal consumer quad core CPUs in single-threaded performance.  Intel has commonly had the HEDT market use older CPU cores and older process nodes than the normal consumer market, in part because more cores mean larger dies, and that requires more mature process nodes to get acceptable yields.

There have at times been efforts at making a 2-socket desktop, in which you use two separate CPUs.  I don't mean two cores; I mean two entirely separate chips that have their own separate CPU socket, memory pool, and so forth.  The two CPUs can communicate over some bus; Intel has called it QPI in recent years.  Spreading the cores among two sockets means that you can double the number of cores in the system without creating cooling problems or causing clock speed problems from so many cores so near each other.

The problem with the two-socket approach is that for many things, it just doesn't work very well.  If a thread running on one CPU needs memory attached to the other CPU, it has to go across the QPI link to get it.  For occasional accesses, that's fine, but if half of your memory accesses have to go over QPI, that can give you a huge bottleneck in a hurry.  As most programs don't specify which memory pool to use in a two-socket system, if threads are migrating from one CPU to the other a lot without releasing and reallocating memory, you can expect a whole lot of memory accesses to go over QPI.

Intel did push two-socket for high end desktops as recently as Skulltrail, which was basically two Core 2 Quad CPUs.  But due to creating bottlenecks in what was then the front-side bus, many programs performed worse on two CPUs than they did on one.  After that, Intel relegated the multi-socket approach to servers and used single CPUs for their HEDT platforms.

AMD's proposal is to have a two-socket HEDT system with all its benefits, including double the memory bus width and capacity.  But instead of having a giant QPI (or HyperTransport, as AMD has traditionally used) bottleneck, put the two physical CPUs in the same socket, connected by an interposer.  That way, you get enormous bandwidth connecting the two CPUs, rather than a big bottleneck.

So why can't they just add a ton of bandwidth to traditional two-socket systems and fix the problem that way?  One issue is pin count.  You've only got so much room for I/O coming out of your package.  Pins take space, and if you want to add more pins, you have to have bigger, more expensive chips, with all of the attendant drawbacks.  An interposer can allow massively smaller "pins", allowing far more of them and very wide bus widths to connect multiple dies in the same package.  That doesn't let you get massive bandwidth to anything outside of the package, but it does let you connect two CPUs in the same package, or a CPU to a GPU in the same package, or as we've already seen with Fiji, a GPU to HBM.

You might ask, what's the difference between this and the multi-die server chips AMD has had in the past, including Magny-Cours and Valencia?  If one die is going to burn 130 W all by itself, what is a two-die package supposed to burn?  Clock speeds for the server chips were far too low to be appropriate for consumer use.  Ryzen 7 tops out at 95 W, in contrast, with a 65 W consumer version that has all eight cores active.  That leaves plenty of space to add more cores without taking a huge hit to clock speeds.  At minimum, they could probably at least match Ryzen 7 1700 clock speeds in 130 W, or go higher if they're willing to burn more power.  Using an interposer makes it at least possible to spread out the dies a little, which can help with cooling.

But why stop at two dies?  Yesterday AMD also announced Epyc, their new server line.  The top end part has four dies, for 32 cores total.  That will mean considerable drops in clock speeds as compared to Ryzen 7, so it would be a stupid part for consumer use.  AMD will offer a two-socket version of Epyc, but they're really trying to make a single-socket alternative to what would previously be two-socket Xeon systems.  Now that AMD finally has competitive CPU cores for the first time since Conroe arrived in 2006, the massive room available to undercut Xeon prices while still making a hefty profit means that Intel's server division should be scared.

Comments

  • SomethingUnusual Member Uncommon Posts: 471
    Good read man.

    You raise half of an interesting question that I've always wondered about: the absolute need for lower power consumption of the logic circuits. If the same power standard from a few years back (say, anywhere from 115 W to upwards of 230 W) applied today, wouldn't the processing power have to be, at the very least theoretically, higher? Then again, there isn't that much need for extreme power at the expense of energy consumption, and the more recent dies still outperform their predecessors as expected by Moore's Law, which also includes power consumption in the theory.

    I'm down for the multi-socket boards. Server boards have always had them, but usually with no PCIe bus, high performance rendering chipsets, or the like without some serious extra cash, and even then they wouldn't translate well into a usable, say, gaming machine.
    With the streaming and multitasking commonly seen today, multi-CPU options would make a lot of difference to these users, more so than a bloated number of cores. Even with thread lanes, information in serial has to pass through a lot of encoders/decoders, and parallel starts looking sexier on heavy workloads.

    Death stalks me... Well, figuratively that is. I get killed and people take my stuff.

  • Torval Member Legendary Posts: 13,412
    That is huge. There are some software licenses that are priced by socket, databases for example. If they can jam more cores on one die then that will rock server licensing fees. On the other hand if Oracle, Microsoft, and other enterprise vendors still calculate them separately then it won't be that big of a deal with regards to licensing.

    That aside, the scope and approach of what they're doing is exciting.
    Centuries ago, in primitive times, before the dawn of civilization, there were things that would be inconceivable to us today; such things as poverty, disease, violence, senility, and love.
  • Quizzical Member Epic Posts: 17,964
    edited May 19
    Threadripper itself is a single-socket product.  The server versions will come in one-socket and two-socket variants, and are branded as Epyc.

    A point that I want to emphasize is that having multiple dies in Threadripper or Epyc doesn't just add more cores.  It adds more memory channels and PCI Express lanes, too.  It's 32 PCI Express 3.0 lanes and two channels of DDR4 per die.

    Thus, Ryzen with a single die has two channels of DDR4 and 32 lanes of PCI Express, Threadripper will have four channels of DDR4 and 64 lanes of PCI Express, and the big Epyc will have 8 channels of DDR4 and 128 lanes of PCI Express.  The PCI Express is all of the connectivity coming off of the chip, so if you want SATA ports, ethernet, or whatever, that uses up some of the PCI Express connectivity.
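To make the scaling concrete, here is a minimal C sketch of the per-die arithmetic. The per-die figures (two DDR4 channels, 32 PCI Express 3.0 lanes) are the ones given in this post; the struct and function names are made up for illustration.

```c
#include <assert.h>

/* Per-die figures as stated in the post. */
enum { DDR4_CHANNELS_PER_DIE = 2, PCIE_LANES_PER_DIE = 32 };

/* Hypothetical helper type: totals for a package built from several dies. */
struct package_totals {
  int ddr4_channels;
  int pcie_lanes;
};

/* Every die brings its own memory channels and lanes, so totals scale linearly. */
struct package_totals package_totals_for(int dies) {
  assert(dies > 0);
  struct package_totals t;
  t.ddr4_channels = dies * DDR4_CHANNELS_PER_DIE;
  t.pcie_lanes = dies * PCIE_LANES_PER_DIE;
  return t;
}
```

Calling it with 1, 2, and 4 dies gives 2/32, 4/64, and 8/128 channels and lanes, which lines up with Ryzen, Threadripper, and the big Epyc respectively.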

    But that does mean that as compared to Ryzen 5 and 7, Threadripper will have double the memory bus width (and hence possible capacity) and about double the PCI Express lanes.  If you were to build a 2-socket Ryzen system with one die per socket for the sake of adding more memory capacity and bandwidth and adding more PCI Express connectivity, Threadripper will get you all of those benefits from a single socket.  So this isn't like the Core 2 Quad, which also had two CPU dies per socket, but only added more CPU cores and that's it.
  • Slapshot1188 Boca Raton, FL Member Epic Posts: 6,959
    How will this affect gaming?

    "I should point out that no other company has shipped out a beta on a disc before this." - Official Mortal Online Lead Community Moderator

    Starvault's reponse to criticism related to having a handful of players as the official "test" team for a supposed MMO: "We've just have another 10ish folk kind enough to voulenteer added tot the test team" (SIC) This explains much about the state of the game :-)

  • Ridelynn Fresno, CA Member Epic Posts: 5,896
    Torval said:
    That is huge. There are some software licenses that are priced by socket, databases for example. If they can jam more cores on one die then that will rock server licensing fees. On the other hand if Oracle, Microsoft, and other enterprise vendors still calculate them separately then it won't be that big of a deal with regards to licensing.

    That aside, the scope and approach of what they're doing is exciting.
    There is some enterprise software licensed that way (SQL Server comes to mind immediately, maybe some variants of Server), but this has a long way to go before it challenges the current paradigm of the datacenter, and even if it does end up breaking into the datacenter market in a big way, it's nothing for the licensing to just change over to per-core (or some other metric).
  • Cleffy San Diego, CA Member Rare Posts: 5,586
    The servers look like they have a 4k pin design.
    It is an elegant way to offer scalability. AMD is just making one die and packaging it in varying quantities. I wonder if AMD's APUs will do something similar: use the 8-core die and a separate GPU.
  • SomethingUnusual Member Uncommon Posts: 471
    How will this affect gaming?

    Better server stability and throughput. Home users not so much. Single-thread/single core usage is still a programming norm. Later coming development using OpenCL api and the like will change that, but still a long ways out for common practice. 


  • Quizzical Member Epic Posts: 17,964
    How will this affect gaming?

    Better server stability and throughput. Home users not so much. Single-thread/single core usage is still a programming norm. Later coming development using OpenCL api and the like will change that, but still a long ways out for common practice. 
    Threadripper is an HEDT chip, not a server chip.  The server version based on the same die is Epyc.  Motherboards built for Epyc will have server-focused features, while those built for Threadripper will have desktop-focused features.  For example, AMD has said that they didn't disable ECC memory on Ryzen, but it relies on motherboard support.  Motherboards for Ryzen generally won't bother to support ECC, and I'd expect the same to be true of Threadripper and for the same reasons.

    I don't see OpenCL as being terribly relevant to Threadripper, either.  OpenCL is built around the capabilities of GPUs, with a few things (mainly pipes) thrown in for the benefit of FPGAs.  You can run OpenCL code on a CPU, and that especially makes sense for debugging FPGA code.  But you don't need Threadripper for that.

    If the concern is scaling CPU code to use many CPU cores, then I don't see OpenCL as useful there.  It's built for a different threading paradigm entirely from what CPUs use, and some CPU programming languages have plenty mature tools for threading your code--and tools that expose the greater versatility that CPUs have rather than the more restricted model OpenCL follows.

    If the concern is using AVX to really take advantage of what CPUs can do with SIMD, then OpenCL is at least more plausible.  That makes it possible to write your code for what happens in one AVX lane, then have the compiler automatically scale it.  You can blow things up in a hurry if you use any instructions that don't have an AVX version, or if you need to constantly change data sizes (e.g., mixing floats and doubles).  But while OpenCL isn't ideal for this, neither are any of the other options available (OpenMP, intrinsics, etc.).

    But it might be more of a hardware problem than a software one.  It's not just that some instructions don't have an AVX version, and so they're problematic to use at all if you're trying to exploit AVX.  It's also that CPUs simply don't have a good way to pass data back and forth among AVX lanes.  GPUs can put some instructions in some but not all shaders, and then the threads in a warp can get automatically unpacked to route through shaders over the course of multiple clock cycles and repacked without it being all that bad.  GPUs can use local memory to pass data back and forth among threads pretty efficiently, but CPUs simply don't have any analogous cache for that.

    As for the original question of how Threadripper affects gaming, it really doesn't very much for most people.  If you feel the need to have a 16-core desktop for non-gaming reasons, Threadripper will let you have it all in one socket, and with decent single-threaded performance.  So such a desktop could double as a capable gaming rig.  Today, if you want more than 10 cores, you have to either use multiple sockets (which often creates a QPI bottleneck) or else pay a fortune and still accept poor single-threaded performance.  For example, you can get a 16-core Xeon E5-2697A v4 today, but it costs about $3000 and has a stock clock speed of only 2.6 GHz with max single-core turbo of 3.6 GHz.
  • Slapshot1188 Boca Raton, FL Member Epic Posts: 6,959
    Thanks for the info guys.

    Guess my wait for the revolutionary next gaming chip continues...

  • Quizzical Member Epic Posts: 17,964
    Thanks for the info guys.

    Guess my wait for the revolutionary next gaming chip continues...
    If you're waiting for a CPU that is going to revolutionize gaming, don't hold your breath.  CPUs are pretty mature now, and that makes further gains hard to come by.  The focus now seems to be getting the same performance as before while using less power.  That's not going to revolutionize gaming unless you'd regard smaller form factors as being revolutionary.  You can also add more CPU cores, but that has long since ceased to be revolutionary for gaming.
  • SomethingUnusual Member Uncommon Posts: 471
    I see a problem with relying on the compilers too much, just bad coding practice. But a topic for another discussion altogether. 

    As for work applications I'll certainly be checking this chip out, I do a ton of multi-tasking. 

    Can you clarify more about extended vector instructions? This is relatively new tech still and I'm a bit confused on word size relevant to thread paralleling. 

    @Quizzical



  • Heretique Member Uncommon Posts: 1,386
    One day AMD will get it right, for gaming and production there is no reason to go with AMD at the moment. I'd really like AMD to hit Intel hard or surpass them. Need that competition going or else we'll end up with super high priced chips again.

    Originally posted by salsa41
    are you have problem ?

  • Quizzical Member Epic Posts: 17,964
    edited May 22
    I see a problem with relying on the compilers too much, just bad coding practice. But a topic for another discussion altogether. 

    As for work applications I'll certainly be checking this chip out, I do a ton of multi-tasking. 

    Can you clarify more about extended vector instructions? This is relatively new tech still and I'm a bit confused on word size relevant to thread paralleling. 

    @Quizzical


    Using vector instructions (SSE or AVX) to do more in a single instruction is an entirely different thing from thread-level parallelism.  It's possible to use either one without the other, or to use both at once.

    Let's suppose that you have code that looks something like this:

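    float foo[4] = {1.0f, 2.0f, 3.0f, 4.0f};      // placeholder values
    float bar[4] = {10.0f, 20.0f, 30.0f, 40.0f};  // placeholder values
    float baz[4];
    for (int i = 0; i < 4; i++) {
      baz[i] = foo[i] + bar[i];  // four independent scalar adds
    }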

    What SSE and AVX can do is that if you've packed foo and bar into the vector registers, they can do all four of the floating point adds in a single instruction rather than taking four instructions.  With its SIMD directives, OpenMP will try to convert the above code to do exactly that.  OpenCL would take an approach more like:

    __kernel void floatadd(const __global float * restrict foo, const __global float * restrict bar, __global float * restrict baz) {
      int gid = get_global_id(0);
      baz[gid] = foo[gid] + bar[gid];
    }

    One problem with this is putting data into the vector registers.  If all you want is four adds like above, then it takes four operations to put the four components of foo into a vector register, four more operations to do the same for bar, and if you need to unpack the components of baz later, four more operations to unpack them.  That overwhelms the advantages you get by doing the four adds in one operation rather than four.

    If, on the other hand, you have hundreds of consecutive operations where you're doing exactly the same thing to each component of the vector, sometimes you can pack the data into vector registers once, do all of the operations using AVX instructions so that it takes one instruction instead of four, and then unpack the data at the end.  That can provide huge savings because it's one instruction instead of four for hundreds of consecutive things.
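As a rough illustration of the "pack once, do many operations, unpack once" pattern, here is a minimal C sketch using SSE intrinsics, with a plain scalar fallback for compilers that don't target SSE. The function name and the particular chain of operations are made up for the example. It also assumes the four inputs sit next to each other in memory, which is the easy case where a load is a single instruction; values scattered around memory would have to be gathered one at a time first.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics; x86 only */
#endif

/* Computes out[i] = ((a[i] + b[i]) * a[i] - b[i]) * b[i] for four floats. */
void chain4(const float *a, const float *b, float *out) {
  assert(a != NULL && b != NULL && out != NULL);
#if defined(__SSE__)
  __m128 va = _mm_loadu_ps(a);    /* pack four floats into one register */
  __m128 vb = _mm_loadu_ps(b);
  __m128 v = _mm_add_ps(va, vb);  /* one instruction covers all four lanes */
  v = _mm_mul_ps(v, va);
  v = _mm_sub_ps(v, vb);
  v = _mm_mul_ps(v, vb);
  _mm_storeu_ps(out, v);          /* unpack once at the end */
#else
  /* Scalar fallback: same arithmetic, one element at a time. */
  for (int i = 0; i < 4; i++) {
    out[i] = ((a[i] + b[i]) * a[i] - b[i]) * b[i];
  }
#endif
}
```

With only four operations in the chain the savings are small; the point of the paragraph above is that the same structure pays off when the chain is hundreds of operations long.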

    But what you can do with this is really restricted.  If you need to take the cosine of something in the middle of your code, that breaks up the AVX instructions.  The compiler would have to unpack the data from vector registers, take the cosine of one component at a time, and then repack it into vector registers.  If that only happens once in a chain of hundreds of instructions, then maybe you can shrug and ignore it as inconsequential.  But if it happens for a lot of instructions, the extra overhead of packing and unpacking the data can easily make using AVX slower than not using it.
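Here is a hedged C sketch of what that unpack/repack detour looks like. To keep it self-contained, `scalar_only` is a hypothetical stand-in (a small table lookup) for any operation that has no vector instruction, such as the cosine mentioned above; it is not a real cosine. The function names are made up, and a scalar fallback is included for non-SSE targets.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics; x86 only */
#endif

/* Hypothetical stand-in for an operation with no vector instruction
   (the post's example is cosine).  Assumes 0 <= x < 4. */
static float scalar_only(float x) {
  static const float table[4] = {1.0f, 0.5f, 0.25f, 0.125f};
  return table[(int)x & 3];
}

/* out[i] = scalar_only(a[i] + b[i]) * b[i] for four floats. */
void add_lookup_mul4(const float *a, const float *b, float *out) {
  assert(a != NULL && b != NULL && out != NULL);
#if defined(__SSE__)
  float lanes[4];
  __m128 vb = _mm_loadu_ps(b);
  __m128 v = _mm_add_ps(_mm_loadu_ps(a), vb);  /* vector add */
  _mm_storeu_ps(lanes, v);                     /* unpack to scalars */
  for (int i = 0; i < 4; i++) {
    lanes[i] = scalar_only(lanes[i]);          /* one lane at a time */
  }
  v = _mm_loadu_ps(lanes);                     /* repack */
  _mm_storeu_ps(out, _mm_mul_ps(v, vb));       /* vector multiply */
#else
  for (int i = 0; i < 4; i++) {
    out[i] = scalar_only(a[i] + b[i]) * b[i];
  }
#endif
}
```

Two vector instructions of useful math end up surrounded by a store, a four-iteration scalar loop, and a reload, which is the overhead the paragraph above is describing.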

    Mixing data sizes will screw up AVX, too.  If you've got 128-bit vector registers, then you can pack four floats or two doubles into a register.  If you need to add floats to doubles at some point, then it has to unpack the floats, cast them to doubles, repack them as doubles, and then do the addition.  You can get the same issue with different sizes of integer data types, too.
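A small C sketch of the float-plus-double case, using SSE2 intrinsics with a scalar fallback. The function name is made up for the example. Four floats fill one 128-bit register, but once widened to doubles they need two registers, so the data has to be split and converted before the adds can happen.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics; x86 only */
#endif

/* out[i] = (double)f[i] + d[i] for i in 0..3. */
void add_floats_to_doubles4(const float *f, const double *d, double *out) {
  assert(f != NULL && d != NULL && out != NULL);
#if defined(__SSE2__)
  __m128 vf = _mm_loadu_ps(f);                       /* four floats, one register */
  __m128d lo = _mm_cvtps_pd(vf);                     /* widen floats 0 and 1 */
  __m128d hi = _mm_cvtps_pd(_mm_movehl_ps(vf, vf));  /* shift floats 2 and 3 down, widen */
  _mm_storeu_pd(out, _mm_add_pd(lo, _mm_loadu_pd(d)));          /* two adds */
  _mm_storeu_pd(out + 2, _mm_add_pd(hi, _mm_loadu_pd(d + 2)));  /* two more */
#else
  for (int i = 0; i < 4; i++) {
    out[i] = (double)f[i] + d[i];
  }
#endif
}
```

What would have been one four-wide add for matching types becomes a shuffle, two conversions, and two two-wide adds.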

    In CPU terms, this is all happening inside of a single thread.  Scaling to use multiple threads is a different issue entirely.  OpenCL will try to handle that scaling for you, so that if you have components that you might think of as being 1024 bits wide, it can be 8 threads that each process 128 bits on an architecture with SSE, or 4 threads that each process 256 bits on an architecture with AVX, or whatever.  So you can write the code once and have it automatically packed into the maximum width SSE or AVX instruction.
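The width bookkeeping here is just division. A tiny C sketch, with hypothetical function names, of how many elements fit in a register and how many register-wide chunks a block of data splits into:

```c
#include <assert.h>

/* How many elements of element_bits fit in one vector register. */
int lanes_per_register(int register_bits, int element_bits) {
  assert(element_bits > 0 && register_bits % element_bits == 0);
  return register_bits / element_bits;
}

/* How many register-wide chunks are needed to cover total_bits of data. */
int chunks_needed(int total_bits, int register_bits) {
  assert(register_bits > 0 && total_bits >= 0);
  return (total_bits + register_bits - 1) / register_bits;  /* round up */
}
```

A 128-bit register holds four 32-bit floats or two 64-bit doubles, and 1024 bits of data splits into 8 chunks at 128 bits wide or 4 chunks at 256 bits wide.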

    GPUs have a totally different cache hierarchy, so they're able to handle either of these situations cleanly.  Let's consider Nvidia's Pascal architecture for concreteness.  Pascal has threads in warps of 32, and its shaders also come in sets of 32, which are divided into two sets of 16.  In order to have all of the threads in a warp execute some instruction, it grabs a set of 16 shaders and has half of the threads start on that instruction in one clock cycle and the other half in the next clock cycle.

    The two different sets of shaders have some instructions in common, but also have some instructions that the other set doesn't have.  If you want to do a bit shift, for example, only half of the shaders can do it, so the instruction will have to go on that half of the shaders.  A floating point multiply is present in all of the shaders, so if that's what is called for, the scheduler can freely put it on either set of shaders as available.  The scheduler will figure out which instructions have to go on which shaders and schedule them accordingly.

    Nvidia also has special function units for things like trig functions, logarithms, exponentials, and some other things.  That allows an instruction to be usable even though it's only laid out in eight shaders.  What I'm pretty sure happens is that, to use the special function unit, it basically claims the set of shaders that includes it for four clock cycles instead of two, and puts 8 threads through it per clock cycle.  Thus, using cosine as in the above example is as bad as two instructions rather than one, but that sure beats taking a boatload of instructions to pack and unpack the data.
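Taking the description in the last few paragraphs at face value (it is a simplified model, not a statement about how the hardware is actually wired), the cycle counts fall out of dividing the warp size by the number of units handling the instruction per clock. A minimal C sketch with a hypothetical function name:

```c
#include <assert.h>

enum { WARP_SIZE = 32 };  /* threads per warp, as stated in the post */

/* Clock cycles to push a whole warp through units_per_cycle execution units. */
int issue_cycles(int units_per_cycle) {
  assert(units_per_cycle > 0 && WARP_SIZE % units_per_cycle == 0);
  return WARP_SIZE / units_per_cycle;
}
```

With a set of 16 shaders that is 2 cycles for an ordinary instruction, and with 8 special function units it is 4 cycles, which is where "as bad as two instructions rather than one" comes from.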

    (continued in next post)
  • Quizzical Member Epic Posts: 17,964
    So why don't CPUs just do that?  There are some drawbacks to the way GPUs do it.  For one, for reasons I haven't explained, making register writes work properly means that all instructions have to take the same number of clock cycles, which means slowing everything down to the speed of the slowest instruction.  On a CPU, that would be absolutely lethal to performance.

    To cover up this latency, GPUs need to have an enormous number of threads resident.  It's not just the 32 threads in a warp.  You might want to have 8 or so warps handled by the same set of shaders to cover up latency.  In a lot of situations, the scheduler will say, let's see if we can do the next instruction for this thread... nope, it's not ready.  Well, how about this other thread?  It will try to pick some warp that is ready to go, and bounces around between warps every single clock cycle.  GPUs commonly need tens of thousands of threads resident simultaneously to properly exploit the hardware.
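A toy C simulation of that "pick whichever warp is ready" idea may help show why so many warps need to be resident. Everything here is an assumption for illustration: one issue slot per clock cycle, a fixed latency before a warp can issue again, and a scheduler that simply takes the first ready warp. Real GPU schedulers are more involved.

```c
#include <assert.h>

enum { MAX_WARPS = 64 };  /* arbitrary cap for this toy model */

/* Toy model: one issue slot per cycle.  After a warp issues, it is not
   ready again until `latency` cycles later.  Returns how many of the
   total_cycles issue slots were actually used. */
int issued_instructions(int warps, int latency, int total_cycles) {
  assert(warps > 0 && warps <= MAX_WARPS && latency > 0 && total_cycles >= 0);
  int ready_at[MAX_WARPS] = {0};  /* every warp is ready at cycle 0 */
  int issued = 0;
  for (int cycle = 0; cycle < total_cycles; cycle++) {
    for (int w = 0; w < warps; w++) {
      if (ready_at[w] <= cycle) {  /* first ready warp takes this slot */
        ready_at[w] = cycle + latency;
        issued++;
        break;
      }
    }
  }
  return issued;
}
```

In this model, with a latency of 8 cycles, 8 resident warps keep every one of 64 issue slots busy, while 2 warps fill only 16 of them and a single warp only 8. The hardware sits idle the rest of the time, which is the latency that extra warps are there to cover.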

    And it's not just tens of thousands of threads in total, but tens of thousands of threads, each processing the same instructions on their own data, and with very little data of their own loaded at a time.  Modern GPU architectures tend to have a cap of around 256 32-bit registers per thread, and some less recent ones had a cap of around 64 registers per thread.  Unlike CPUs, where you have a few registers, then larger L1 cache, then larger still L2 cache, then even larger L3 cache, on a GPU, registers are your main cache.  Many GPUs have more register space than all other on-die caches added together.  For example, AMD's Fiji chip had 16 MB of registers--and needed them, so that, for example, you could have 65536 threads resident simultaneously, each of which had 256 bytes all to itself.
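The Fiji figures can be checked with a little arithmetic. This C sketch assumes 4-byte (32-bit) registers and takes the 16 MB register file as 16 × 1024 × 1024 bytes; the function name is made up for the example.

```c
#include <assert.h>

enum { BYTES_PER_REGISTER = 4 };  /* 32-bit registers */

/* How many threads can be resident if each one gets registers_per_thread registers. */
long threads_resident(long register_file_bytes, long registers_per_thread) {
  assert(register_file_bytes >= 0 && registers_per_thread > 0);
  return register_file_bytes / (registers_per_thread * BYTES_PER_REGISTER);
}
```

At 64 registers (256 bytes) per thread, a 16 MB register file holds 65536 threads, matching the figure above. If every thread instead used the full 256 registers (1024 bytes), only 16384 threads would fit, so heavy register use directly cuts into how many threads are available to hide latency.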

    GPUs are throughput optimized, not latency optimized.  If all you care about is when some total amount of processing is done, you can play games like scheduling whatever warps are ready and bouncing back and forth between them without worrying that it will take a long time for any particular thread to finish its processing.  That makes a ton of sense for graphics, where it doesn't matter when one particular pixel is finished, but only when the entire frame is done.  And there are some non-graphical but embarrassingly parallel workloads where it also works well.  But it completely fails if you can't scale to more than several hundred threads, or if you need different threads to execute completely independent code at the same time.
  • Quizzical Member Epic Posts: 17,964
    Heretique said:
    One day AMD will get it right, for gaming and production there is no reason to go with AMD at the moment. I'd really like AMD to hit Intel hard or surpass them. Need that competition going or else we'll end up with super high priced chips again.
    If you want a lot of CPU cores and don't want to pay a fortune for them, that's an awfully compelling reason to go with AMD today.  Ryzen 5 and 7 made AMD quite nearly the only viable option if you want 6 or 8 CPU cores in your desktop unless you don't mind paying twice as much for about the same performance from Intel.  Threadripper will likely extend that upper bound to 16 cores.

    AMD can't hang with Intel on single-threaded performance.  But Epyc will probably mean AMD can offer the fastest 1-socket or 2-socket servers for a whole lot of workloads.  For things that scale well to many CPU cores but don't scale well to multiple sockets, it's likely that AMD will often be able to offer the fastest servers, period.  And that's ignoring the price tag, even, where I'd expect AMD to undercut Intel's Xeon prices considerably.
  • SomethingUnusual Member Uncommon Posts: 471
    A lot more clarity than I needed, thanks. So could this be solved elsewhere? If compiled instructions are lopped into a single size instruction that puts a major disadvantage in several potential applications. My first thought goes to a compiled database. I don't need a database to be all one word size, but I still need capability to do the math between the stored values. Would this mean in the case of vectoring my database that I now need external methods of pulling that information, converting it, calculating it, then re-registering as the new word format?


  • Keller Utrecht Member Uncommon Posts: 383
    Isn't it true that for gaming one does not need more than 4 cores, and you're better off investing in a good video card + i5 or Ryzen 5? Streamers and video editing would benefit from 4+ cores, but for the average player Threadripper is just overkill?
  • SomethingUnusual Member Uncommon Posts: 471
    Keller said:
    Isn't it true that for gaming one does not need more than 4 cores, and you're better off investing in a good video card + i5 or Ryzen 5? Streamers and video editing would benefit from 4+ cores, but for the average player Threadripper is just overkill?
    More of a workstation chip. In gaming there would be very little difference as stated. 


  • Quizzical Member Epic Posts: 17,964
    A lot more clarity than I needed, thanks. So could this be solved elsewhere? If compiled instructions are lopped into a single size instruction that puts a major disadvantage in several potential applications. My first thought goes to a compiled database. I don't need a database to be all one word size, but I still need capability to do the math between the stored values. Would this mean in the case of vectoring my database that I now need external methods of pulling that information, converting it, calculating it, then re-registering as the new word format?
    I haven't dealt much with databases, but I'd be surprised if doing computations is commonly a big bottleneck for database work.  If you can't fit the whole database in memory, then it's likely that storage I/O is the bottleneck.  If you're having to spread it across multiple servers, the network is likely to be the bottleneck.  Even for a database that fits in memory on a single server, accessing memory could easily be the bottleneck, whether for reasons of latency or throughput.

    SSE and AVX instructions only help when doing computations is the bottleneck, as they're simply a faster way to do computations in certain situations.  They won't help you load data from memory or hard drives any faster.  More threads can help to cover up latency when that's what is limiting your performance, as I expect would sometimes happen.

    The issues of packing and unpacking data in the SSE or AVX vector registers only matter if you're actually using them.  Most code doesn't, as either it's not of the constrained structure that can benefit or else performance is fast enough that further optimization doesn't matter.  Or both.  If you want to add a float to a double as ordinary scalars, it will have to cast the float to a double first, but that's extremely fast as it's just moving bits around in some fixed pattern.
  • SomethingUnusual Member Uncommon Posts: 471
    edited May 22
    Quizzical said:
    A lot more clarity than I needed, thanks. So could this be solved elsewhere? If compiled instructions are lopped into a single size instruction that puts a major disadvantage in several potential applications. My first thought goes to a compiled database. I don't need a database to be all one word size, but I still need capability to do the math between the stored values. Would this mean in the case of vectoring my database that I now need external methods of pulling that information, converting it, calculating it, then re-registering as the new word format?
    I haven't dealt much with databases, but I'd be surprised if doing computations is commonly a big bottleneck for database work.  If you can't fit the whole database in memory, then it's likely that storage I/O is the bottleneck.  If you're having to spread it across multiple servers, the network is likely to be the bottleneck.  Even for a database that fits in memory on a single server, accessing memory could easily be the bottleneck, whether for reasons of latency or throughput.

    SSE and AVX instructions only help when doing computations is the bottleneck, as they're simply a faster way to do computations in certain situations.  They won't help you load data from memory or hard drives any faster.  More threads can help to cover up latency when that's what is limiting your performance, as I expect would sometimes happen.

    The issues of packing and unpacking data in the SSE or AVX vector registers only matter if you're actually using them.  Most code doesn't, as either it's not of the constrained structure that can benefit or else performance is fast enough that further optimization doesn't matter.  Or both.  If you want to add a float to a double as ordinary scalars, it will have to cast the float to a double first, but that's extremely fast as it's just moving bits around in some fixed pattern.
    Not a common practice, but active databases are certainly used. Arcade machines are a minor example, score and leader boards, and when a power outage happens at the arcade you get pissed when your high score goes away and the machine resets. 

    And thanks again! It's been nice to have some good conversations today.


  • Ridelynn Fresno, CA Member Epic Posts: 5,896
    edited May 22
    Keller said:
    Isn't it true that for gaming one does not need more than 4 cores, and you're better off investing in a good video card + i5 or Ryzen 5? Streamers and video editing would benefit from 4+ cores, but for the average player Threadripper is just overkill?
    For the average gamer, a Core i5 is overkill, and for 95% of what they do, they wouldn't notice a bit if they dropped to a Core i3 (or AMD). But that hasn't stopped people from buying (or recommending) Core i7's.
  • Cleffy San Diego, CA Member Rare Posts: 5,586
    edited May 22
    I think an i3 would be a poor investment now. We have been moving into 4 threaded games for quite some time and i3s are just lagging. I doubt you will ever need more than a Core i5 that is less than 5 years old.
    As a developer you need to develop for the widest audience, not for a specific platform. What that means today is supporting as many threads as you can. A 4c4t is still going to be able to process 16 threads. An 8c16t is just going to process it more efficiently. Given the range of systems you have to develop for, I do see developers taking on a many threaded approach from now on while making it workable on common systems. It really doesn't make sense to poorly utilize the hardware in a PS4 or XBox One. Eight core machines are poised to be incredibly common with ARM and AMD chips.
  • Ridelynn Fresno, CA Member Epic Posts: 5,896
    Cleffy said:
    I think an i3 would be a poor investment now. We have been moving into 4 threaded games for quite some time and i3s are just lagging.
    While I tend to agree:

    I'm not recommending anyone buy an i3. I'm just saying that for most people, that's still plenty of CPU, but a lot of people buy a lot more CPU than they need, sometimes legitimately, but often not, citing either "futureproofing" or benchmarks on some corner cases that they never run, or very rarely encounter at best.

    It's all a matter of budget: of course you'd buy something else if you had more money, but that isn't always an option.

    A desktop i3 is a 4-thread chip (2c4t), and for gaming purposes it seems to hold its own in non-extreme cases.

    The only reason I haven't recommended an i3 over FX-x3xx in those budget cases is because I think the AMD chips present a better value once you're in that budget range, even though for gaming purposes, they tend to lag some behind even the i3. Ryzen 3's and APUs may change that calculation a good deal I expect.

  • Gdemami Member Epic Posts: 10,629
    Ridelynn said:
    Ryzen 3's and APUs may change that calculation a good deal I expect.


    Looking at the price and spec of Ryzen 1400, I cannot see what AMD can offer to be competitive with $65 G4560.
  • Ridelynn Fresno, CA Member Epic Posts: 5,896
    edited May 22
    Gdemami said:
    Ridelynn said:
    Ryzen 3's and APUs may change that calculation a good deal I expect.


    Looking at the price and spec of Ryzen 1400, I cannot see what AMD can offer to be competitive with $65 G4560.
    I don't know if AMD will have a straight CPU that's competitive, but you still need a GPU to do anything gaming-wise with that Pentium. Add in a $100 GPU, and you're up around $165. AMD hasn't introduced their Zen lineup that is intended to compete in the budget arena yet.

    The Ryzen 5's aren't competitive there, but they aren't really meant to be. I would expect a Ryzen-based APU to be competitive though - something with, say, a couple of Zen cores and a few Vega GPU units, could be extremely competitive with Intel's inexpensive Pentium, if you look at total system cost.