

A power outage is estimated to have destroyed 6 exabytes of NAND.

Quizzical | Member, Legendary | Posts: 22,099
https://www.anandtech.com/show/14596/toshiba-western-digital-nand-production-partially-halted-by-power-outage

I didn't call it 6 EB in the title because people wouldn't recognize what that means.  Anyway, the estimate is that about two weeks' worth of the global supply of NAND was destroyed, or half of what Toshiba and Western Digital will produce in a quarter.  This is likely to result in higher SSD prices than there would be otherwise for a while, though nothing like the really high hard drive prices that happened some years ago due to flooding in Thailand.  This affects only one of the four major NAND flash vendors.

Apparently a 13-minute power outage is all that it took to do that.  If an outage shuts down a bunch of equipment partway through the various stages of producing chips, that could destroy all of the chips that were in process at the time.
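As a rough cross-check of the estimates in the post, the sketch below works out what they imply about overall supply. The 6 EB, two-week, and half-a-quarter figures come from the post; the 13-week quarter and all variable names are assumptions made for illustration, and the results are only as good as the original estimates.

```python
# Back-of-the-envelope check of the supply figures quoted in the post.
LOST_EB = 6.0                 # estimated NAND destroyed, in exabytes (from the post)
WEEKS_OF_GLOBAL_SUPPLY = 2.0  # the loss is described as about two weeks of global supply
FRACTION_OF_QUARTER = 0.5     # ...and as half of a quarter of Toshiba/WD output
WEEKS_PER_QUARTER = 13.0      # assumption: a quarter is 13 weeks

# Implied global production rate and quarterly totals.
global_eb_per_week = LOST_EB / WEEKS_OF_GLOBAL_SUPPLY
global_eb_per_quarter = global_eb_per_week * WEEKS_PER_QUARTER
toshiba_wd_eb_per_quarter = LOST_EB / FRACTION_OF_QUARTER

# Implied share of global output made by Toshiba and Western Digital.
implied_share = toshiba_wd_eb_per_quarter / global_eb_per_quarter

print(f"Implied global output per quarter: {global_eb_per_quarter:.0f} EB")
print(f"Implied Toshiba/WD output per quarter: {toshiba_wd_eb_per_quarter:.0f} EB")
print(f"Implied Toshiba/WD share of global output: {implied_share:.0%}")
```

Taken at face value, the two estimates are consistent with Toshiba and Western Digital together producing a bit under a third of the world's NAND.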

Comments

  • centkin | Member, Rare | Posts: 1,526
    They should have invested in some serious UPS gear.  I mean if you are that power critical, you should have backups of backups of power.
  • Quizzical | Member, Legendary | Posts: 22,099
    centkin said:
    They should have invested in some serious UPS gear.  I mean if you are that power critical, you should have backups of backups of power.
    It's not really that simple.  For starters, I'm sure that they do have considerable backup capabilities.  Otherwise, every time the lights flicker, they'd have this sort of disaster.  This time it was a 13-minute power outage, which was presumably long enough for their backups to run out of power.

    Second, I'm not sure just how much power they use at that site, but I'd be shocked if it's not well into the megawatt range.  Having battery backups capable of cranking out some megawatts of power for very long times would cost a fortune.  At some point, you're better off just accepting a risk of a considerable production loss.

    Third, fabs are clean room facilities that must be kept pristine.  A speck of dust too small to be seen by the unaided human eye is plenty enough to do serious damage, and possibly make an entire chip unusable.  So some power generation options simply aren't viable.

    I once read a story (in a comment section, so possibly apocryphal) about a fab decades ago that had a bunch of generators on hand in case of a lengthy power outage.  When the big power outage finally hit, they fired up the generators, and it looked like everything was good.  Then not long after that, they noticed that their yields were awful.  After investigating, they found that fumes from the generators had caused a lot more damage than just letting the power go out would have.
  • centkin | Member, Rare | Posts: 1,526
    Actually when you need that much power backup, a lot of times it is easier to pump water uphill or such.  You can set things up to have a goodly amount of power for any specific time you want.  It is the kind of thing power companies do when they have excess supply.

    Battery banks are there for things to last long enough to put something like the above into action.
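To put a rough size on the pumped-water idea, the sketch below estimates how much water would have to be held uphill to carry a load through an outage, using the standard potential-energy relation E = m * g * h. The 13 minutes comes from the original post and the 100 MW load is the per-fab figure quoted later in the thread; the 100 m height difference and 80% conversion efficiency are purely illustrative assumptions, as are the function and constant names.

```python
G = 9.81                # gravitational acceleration in m/s^2
WATER_DENSITY = 1000.0  # kg per cubic meter


def water_volume_m3(load_mw, minutes, head_m, efficiency):
    """Cubic meters of water needed to supply load_mw for the given minutes.

    head_m is the height the water falls; efficiency is the fraction of the
    water's potential energy that ends up as electricity (both assumptions).
    """
    energy_joules = load_mw * 1e6 * minutes * 60
    mass_kg = energy_joules / (G * head_m * efficiency)
    return mass_kg / WATER_DENSITY


# Hypothetical case: 100 MW for 13 minutes, 100 m head, 80% efficiency.
volume = water_volume_m3(load_mw=100, minutes=13, head_m=100, efficiency=0.8)
print(f"Water needed: about {volume:,.0f} cubic meters")
```

Under those assumptions the answer is on the order of 100,000 cubic meters of water per fab, and the figure scales linearly with load and outage length.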
  • Ridelynn | Member, Epic | Posts: 7,060
    Seems these outages always happen conveniently when nand starts to get too cheap
  • Vrika | Member, Epic | Posts: 6,425
    edited June 2019
    centkin said:
    Actually when you need that much power backup, a lot of times it is easier to pump water uphill or such.  You can set things up to have a goodly amount of power for any specific time you want.  It is the kind of thing power companies do when they have excess supply.

    Battery banks are there for things to last long enough to put something like the above into action.
    I think hydro power plants don't have high enough power output. If their backup power couldn't function for 13 minutes then they must either use insanely high amounts of power or their backup systems failed.


    EDIT: On second thought, it's also possible that they didn't have backup for anything longer than evening out a power spike. I remember reading about one industrial location where, instead of building backups, they had built next to a major high voltage line transferring electricity from several hydro dams. That high voltage line's last outage had been in the 1970s.
     
  • Ridelynn | Member, Epic | Posts: 7,060
    edited June 2019
    The odds that they didn't have backup beyond UPS are almost 0. I'm certain they did in some form or fashion.

    And no, it almost certainly wouldn't be hydro storage. Almost all backup power globally is in the form of diesel generator sets, with various forms of UPS storage (flywheel, lead acid, lithium, etc.) in front to give you enough time to start and transfer load over. Lithium batteries are making inroads, but they have a very long way to go to be competitive versus a diesel set for standby power. UPS power on this scale is typically measured in seconds, not minutes.

    A plant this size would certainly be megawatts in size. Maybe tens of megawatts. Probably not hundreds of megawatts though. Hydro tends to be in the range of 50-2,000 MW, and it's only really cost effective if you have the right location and can look at it over decades of service. A chip fab isn't going to be looking at decades for a power source. It is possible they pay for two independent utility feeds (and that isn't uncommon for larger facilities as a form of backup), and one of those utility feeds sources from Hydro, but that isn't the same thing as WD or Toshiba owning and operating the dam themselves.

    That being said - even with a backup, no power source is 100% reliable. Typical utility is already 96-98%, add a UPS and you're up to 98-99%, add a diesel and you're at 99.5%. You can keep paying a whole lot of money to keep adding to the decimal point, but you will never hit 100%.

    With that though, given the market conditions right now, I tend to think it was an "accident", rather than a real accident. The players in this game are just too shady and have been pulling this for too long for me to really think it was anything else.
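The reliability percentages in the post above are easier to compare when converted into expected hours without power per year. The sketch below does only that conversion; it treats each percentage as the fraction of the year that power is available, which is an assumption about what the poster meant, and the labels and function name are made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year


def downtime_hours_per_year(availability_percent):
    """Expected hours without power per year at a given availability."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


# Percentages quoted in the post (range endpoints); labels are illustrative.
scenarios = [
    ("utility only, low end", 96.0),
    ("utility only, high end", 98.0),
    ("utility + UPS, high end", 99.0),
    ("utility + UPS + diesel", 99.5),
]

for label, percent in scenarios:
    hours = downtime_hours_per_year(percent)
    print(f"{label:>24}: {percent:5.1f}% -> {hours:6.1f} hours/year")
```

Even 99.5% still leaves about 44 hours a year without power, and each further step toward 100% removes fewer hours than the one before, which is the diminishing-returns point the post makes.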
  • Octagon7711 | Member, Legendary | Posts: 8,967
    Ridelynn said:
    Seems these outages always happen conveniently when nand starts to get too cheap
    When I first read that the first thing I did was to wonder what really happened.  It sounded like a good cover story.  

    "We all do the best we can based on life experience, point of view, and our ability to believe in ourselves." - Naropa      "We don't see things as they are, we see them as we are."  SR Covey

  • Quizzical | Member, Legendary | Posts: 22,099
    If they thought that there was too much NAND on the market and they wanted to respond by reducing their production, they're quite capable of doing so in a far more orderly manner.  Here's a story about Micron announcing that they're going to reduce NAND production by 10% in the latter half of this year:

    https://www.anandtech.com/show/14594/micron-shipments-of-3d-qlc-for-ssds-nearly-double-qoq-as-wafer-starts-cut-again

    And they're going to do it without halfway producing and then ruining a bunch of wafers, or screwing up equipment so that you have to take a ton of effort to recalibrate it, or a number of other expenses that Toshiba and Western Digital are going to have to deal with.

    An intentional "accident" makes no sense at all unless it's covered by insurance--in which case, it would be criminal fraud.  And I'd be skeptical about insurance companies being willing to underwrite this at all, or at least not without a ton of measures to detect and catch fraud.
  • Scot | Member, Legendary | Posts: 13,322
    I would have thought an emergency generator would be the solution here, considering how much is wasted with what seems to me to be a fairly short power outage.

     25 Agrees

    You received 25 Agrees. You're posting some good content. Great!

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

    Now Doesn't That Make You Feel All Warm And Fuzzy Inside? :P

  • k61977 | Member, Rare | Posts: 1,325
    Scot said:
    I would have thought an emergency generator would be the solution here, considering how much is wasted with what seems to me to be a fairly short power outage.
    This is what I was thinking.  They should have diesel-powered backup generators on site.  Really doubt they rely on something like UPS only.  Maybe someone stole the diesel and they couldn't start the generators.
  • Scot | Member, Legendary | Posts: 13,322
    k61977 said:
    Scot said:
    I would have thought an emergency generator would be the solution here, considering how much is wasted with what seems to me to be a fairly short power outage.
    This is what I was thinking.  They should have diesel-powered backup generators on site.  Really doubt they rely on something like UPS only.  Maybe someone stole the diesel and they couldn't start the generators.
    Taking my lead from the conspiracy thread about companies I just posted to, maybe they sold the diesel and intend to claim the insurance? That thread is as barking as that. :)


  • GeezerGamer | Member, Epic | Posts: 8,826
    The problem with emergency generators is that there is still a momentary interruption during cut over. That's what ruins the equipment.
  • Torval | Member, Legendary | Posts: 19,960
    edited July 2019
    The problem with emergency generators, is there is still a momentary interruption during cut over. That's what ruins the equipment. 
    During my navy stint we had a really freaking huge battery array for emergency power. That is what would immediately kick in when the reactor scrammed (emergency shutdown) or turbine generators went offline. The battery would carry the load for a short time until we could get to periscope depth and start the diesel generator. That was 30 years ago. I would think modern systems would have a similar redundancy scheme. Maybe not.

    The power outage was 3 hours. Our batteries couldn't carry the load that long but the diesel generators could easily do so. We just couldn't do much else because the "outboard" electric propulsion system was teeny tiny compared to the ship. I don't quite get how they haven't designed their system with these scenarios in mind. It smacks of poor engineering.
    Fedora - A modern, free, and open source Operating System. https://getfedora.org/

    traveller, interloper, anomaly, iteration


  • Hashbrick | Member, Rare | Posts: 1,851
    Could it have been prevented?  Probably but with companies this massive and this size with so many plants and facilities, there is going to be the step child.  The one facility that gets zero attention, no upgrades, just there and neglected, produce til broken.
    [[ DEAD ]] - Funny - I deleted my account on the site using the cancel account button.  Forum user is separate and still exists with no way of deleting it. Delete it admins. Do it, this ends now.
  • Vrika | Member, Epic | Posts: 6,425
    edited July 2019
    Torval said:
    The problem with emergency generators, is there is still a momentary interruption during cut over. That's what ruins the equipment. 
    During my navy stint we had a really freaking huge battery array for emergency power. That is what would immediately kick in when the reactor scrammed (emergency shutdown) or turbine generators went offline. The battery would carry the load for a short time until we could get to periscope depth and start the diesel generator. That was 30 years ago. I would think modern systems would have a similar redundancy scheme. Maybe not.

    The power outage was 3 hours. Our batteries couldn't carry the load that long but the diesel generators could easily do so. We just couldn't do much else because the "outboard" electric propulsion system was teeny tiny compared to the ship. I don't quite get how they haven't designed their system with these scenarios in mind. It smacks of poor engineering.

    A large fab uses a really large amount of power. Using Google, I was able to find a figure of 100 MW.

    That area has 5 fabs. Assuming 100 MW per fab, they'd need 500 MW of backup power.

    Backup power is usually generated with large diesel generators, for example these 7 m x 3 m x 2.5 m generators (image not preserved).

    To generate 500 MW they'd need to have 200 generators like this running simultaneously. 
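The generator count above can be reproduced with a short calculation. The 5 fabs, 100 MW per fab, and 200 generators come from the post; the 2.5 MW rating per generator is not stated there and is simply what 500 MW divided by 200 units implies, so treat it and the function name as assumptions.

```python
import math


def generators_needed(total_load_mw, unit_rating_mw):
    """Smallest whole number of generators whose combined rating covers the load."""
    return math.ceil(total_load_mw / unit_rating_mw)


FABS = 5                # from the post
MW_PER_FAB = 100        # figure the poster found via Google
UNIT_RATING_MW = 2.5    # assumption: implied by 500 MW / 200 generators

total_load_mw = FABS * MW_PER_FAB
count = generators_needed(total_load_mw, UNIT_RATING_MW)
print(f"{total_load_mw} MW needs {count} generators of {UNIT_RATING_MW} MW each")
```

This counts only units running at full rating, with no spares and no allowance for units that fail to start, so a real installation would presumably need more.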
     
  • Torval | Member, Legendary | Posts: 19,960
    Vrika said:
    Torval said:
    The problem with emergency generators, is there is still a momentary interruption during cut over. That's what ruins the equipment. 
    During my navy stint we had a really freaking huge battery array for emergency power. That is what would immediately kick in when the reactor scrammed (emergency shutdown) or turbine generators went offline. The battery would carry the load for a short time until we could get to periscope depth and start the diesel generator. That was 30 years ago. I would think modern systems would have a similar redundancy scheme. Maybe not.

    The power outage was 3 hours. Our batteries couldn't carry the load that long but the diesel generators could easily do so. We just couldn't do much else because the "outboard" electric propulsion system was teeny tiny compared to the ship. I don't quite get how they haven't designed their system with these scenarios in mind. It smacks of poor engineering.

    A large fab uses a really large amount of power. Using Google, I was able to find a figure of 100 MW.

    That area has 5 fabs. Assuming 100 MW per fab, they'd need 500 MW of backup power.

    Backup power is usually generated with large diesel generators, for example these 7 m x 3 m x 2.5 m generators (image not preserved).

    To generate 500 MW they'd need to have 200 generators like this running simultaneously.
    The scale would require a non-standard solution for sure. Something to consider is that power requirements aren't static ratings. Is that 100 MW a peak draw? Does it use 100 MW over a year, a month, or an hour? That sort of power is huge, and it would surprise me if it was designed to handle multiple peak loads simultaneously even with its normal power source.

    Still if the stakeholders are okay with the risk and rate of failure compared to the cost of supplying backup and redundancy then there isn't much more to say.

    It would be interesting to see an analysis of market conditions and financial factors compared to these power failure events. Could there be market manipulation, or are we inferring something false due to frequency illusion (Baader-Meinhof)? Both are plausible.
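On the question in the post above of whether 100 MW is a peak draw or a usage "over an hour": a megawatt is a rate, so the amount of energy a backup system must deliver is the load multiplied by how long the outage lasts, usually expressed in megawatt-hours. The sketch below shows that conversion. The 100 MW and 500 MW loads and the 13-minute outage are the thread's figures, whether they are peak or average draw is unknown, and the function name is hypothetical.

```python
def energy_mwh(power_mw, duration_minutes):
    """Energy in megawatt-hours delivered by a constant load over a duration."""
    return power_mw * duration_minutes / 60


# One fab at the quoted 100 MW running for a full hour.
print(f"100 MW for 60 min: {energy_mwh(100, 60):.1f} MWh")

# All five fabs (500 MW) through the 13-minute outage from the original post.
print(f"500 MW for 13 min: {energy_mwh(500, 13):.1f} MWh")
```

If the real load varies over time, the constant power_mw here would have to be replaced by the average draw during the outage.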


  • Tiller | Member, Epic | Posts: 8,876
    centkin said:
    They should have invested in some serious UPS gear.  I mean if you are that power critical, you should have backups of backups of power.
    lol yeah that might have run a light bulb for 10 min, not production equipment, let alone all the servers and computers. Hell, one full server rack can consume enough power to run 4 houses; think up to six 50-amp cables for one full rack. :D
    SWG Bloodfin vet
    Elder Jedi/Elder Bounty Hunter

  • Quizzical | Member, Legendary | Posts: 22,099
    Remember also that this isn't just trying to keep light bulbs on.  As I said above, one fab decades ago did have a bunch of diesel generators to keep everything running in case of a power loss.  The exhaust from those generators overwhelmed the air filters for the clean rooms and caused much worse loss than merely losing power would have.  I'm sure that's a solvable problem, but the question is at what cost.

    In the comments to the article linked from my original post, one of the AnandTech contributors says that GlobalFoundries has 3 minutes' worth of backup power on hand to keep things running in case of a power outage.  An outage longer than that just means that a bunch of parts are fried.  As they're a pure-play foundry that produces custom chips for customers, not a commodity like NAND, they don't even benefit from higher prices resulting from a shortage due to a bunch of production being destroyed.
  • Quizzical | Member, Legendary | Posts: 22,099
    Torval said:

    Still if the stakeholders are okay with the risk and rate of failure compared to the cost of supplying backup and redundancy then there isn't much more to say. 
    That right there is the key.  No matter how much redundancy and backup you have, a sufficiently big disaster can still overwhelm it and cause an outage.  While you obviously need some sort of backup so that a brief flicker doesn't fry everything, the extra cost per extra unit of reliability increases greatly as you get closer to 100% reliability.  At some point, you have to say, this amount of backup is good enough and we'll live with the remaining risk.
  • jackwl89 | Newbie, Common | Posts: 6
    A friend of mine lost his newly built gaming rig due to power outage, pretty nasty stuff. Now he uses UPS devices 
  • Vrika | Member, Epic | Posts: 6,425
    jackwl89 said:
    A friend of mine lost his newly built gaming rig due to power outage, pretty nasty stuff. Now he uses UPS devices 
    If he lost his gaming rig due to an outage then he's entitled to get it repaired under warranty. Components must be durable enough for normal use, and that includes an occasional power outage.

    If he lost the rig because there was a power spike in the network then the warranty doesn't cover that.
     
  • Quizzical | Member, Legendary | Posts: 22,099
    jackwl89 said:
    A friend of mine lost his newly built gaming rig due to power outage, pretty nasty stuff. Now he uses UPS devices 
    To "lose" a gaming rig is ambiguous, so I will assume that what you really meant is that he misplaced it and couldn't find it because the lights were out.  </sarcasm>

    More seriously, even if something is covered by warranty, it's better not to need the warranty.  It can also be hard to figure out what broke when a computer doesn't work.
  • Gdemami | Member, Epic | Posts: 12,175
    Ridelynn said:
    Seems these outages always happen conveniently when nand starts to get too cheap
    ....yeah, no doubt it was intentional to boost sales of tinfoil hats.
  • Bloodaxes | Member, Epic | Posts: 4,371
    DMKano said:
    jackwl89 said:
    A friend of mine lost his newly built gaming rig due to power outage, pretty nasty stuff. Now he uses UPS devices 
    Did he just lose the power supply? Motherboard?

    Saying "lost a gaming rig" sounds way worse - like the whole thing was a total loss - which is not going to happen.
    I can't speak for his rig, but when I left mine plugged in during a thunderstorm it booted up at first. What I noticed was complete freezing with a hissing noise every couple of minutes, after which it had to be hard rebooted. Took it to the store and he took pity on me (act of god is not covered in warranties afaik) and switched my psu, motherboard and cpu for free and sent the ram overseas for replacement (Took a month to receive back...). When I got the ram back, the same thing was happening and they told me the ram they sent me was faulty and had to resend it overseas (another month "yay"...).

    After that, the rig would run normally but once in a blue moon it would freeze yet again. So, I tried switching the GPU (it was either that or the hard drive) and it worked. I'm not sure which component was at fault but an almost complete replacement was necessary for mine to work again. I even switched my keyboard and mouse for an upgrade as an excuse on the whole situation :P

  • Vrika | Member, Epic | Posts: 6,425
    edited July 2019
    Bloodaxes said:
    DMKano said:
    jackwl89 said:
    A friend of mine lost his newly built gaming rig due to power outage, pretty nasty stuff. Now he uses UPS devices 
    Did he just lose the power supply? Motherboard?

    Saying "lost a gaming rig" sounds way worse - like the whole thing was a total loss - which is not going to happen.
    I can't speak for his rig, but when I left mine plugged in during a thunderstorm it booted up at first. What I noticed was complete freezing with a hissing noise every couple of minutes, after which it had to be hard rebooted. Took it to the store and he took pity on me (act of god is not covered in warranties afaik) and switched my psu, motherboard and cpu for free and sent the ram overseas for replacement (Took a month to receive back...). When I got the ram back, the same thing was happening and they told me the ram they sent me was faulty and had to resend it overseas (another month "yay"...).

    After that, the rig would run normally but once in a blue moon it would freeze yet again. So, I tried switching the GPU (it was either that or the hard drive) and it worked. I'm not sure which component was at fault but an almost complete replacement was necessary for mine to work again. I even switched my keyboard and mouse for an upgrade as an excuse on the whole situation :P
    Thunderstorms can do that because there's a huge electric spike. The energy from lightning always goes somewhere, and if it goes inside your computer it'll break everything in its path.

    A power outage is much less destructive because normally there's only a sudden power loss, and the device only has to be built so that it never breaks itself even if it suddenly loses all power.
     