Dell R220 / Lifecycle Controller (LCC) / Firmware Upgrade Failure

This week I identified issues with the Lifecycle Controller (LCC) BIOS-level interface for updating Dell PowerEdge R220 servers. Having attempted various resolutions revolving around documentation and different network connections – we went down the vendor support route. They have confirmed the issue and found a workaround, but failed to provide a solution moving forwards. This issue affects updating R220 system firmware from the LCC over FTP.

In terms of replicating the issue it boiled down to: boot the machine (god how I hate the blank screen while it decides what it is doing – call me old school but POST screens should be a screen, not blank?), press F10 for the LCC, then Firmware Update, then Update from FTP. The FTP server is, as usual, ftp.dell.com. Test the network connection – resolution is good. Click on NEXT to connect.

CRITICAL
UNABLE TO DECOMPRESS THE CATALOG FILE. (SUP0506)
RECOMMENDED ACTION:
SELECT A REPOSITORY WITH A VALID CATALOG FILE,
AND RETRY THE OPERATION.


The spelling of catalogue being insult to injury, even to this dyslexic oaf.

So – checking over the ftp.dell.com – that is one HELL of a directory. Seemingly no rhyme or reason, and the listing alone taking an age to download. There is no issue with the service. Curious.
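For anyone wanting to rule out the repository side themselves, a minimal Python sketch along these lines can pull the catalog and check that it decompresses cleanly. The path `catalog/Catalog.xml.gz` and both function names are assumptions for illustration, not something Dell support supplied; adjust them to whatever your repository actually serves.

```python
import gzip
import io
import zlib
from ftplib import FTP


def catalog_decompresses(raw: bytes) -> bool:
    """Return True if raw is a complete, valid gzip stream."""
    if not raw:
        # An empty download is not a usable catalog.
        return False
    try:
        gzip.decompress(raw)
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers a bad gzip header or CRC, EOFError a truncated
        # file, zlib.error corrupt compressed data.
        return False


def fetch_catalog(host: str = "ftp.dell.com",
                  path: str = "catalog/Catalog.xml.gz") -> bytes:
    """Download the catalog over anonymous FTP (host and path are assumptions)."""
    buf = io.BytesIO()
    with FTP(host, timeout=30) as ftp:
        ftp.login()  # anonymous login
        ftp.retrbinary(f"RETR {path}", buf.write)
    return buf.getvalue()
```

Calling `catalog_decompresses(fetch_catalog())` from a machine on the same network as the server gives a yes/no answer on whether the file itself is sound. If it is, the SUP0506 error points at the box rather than the repository.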

I had a stock of six R220s on the bench – and tested each – same outcome.

The support team very diligently pushed this along. This was not a priority at the time, so progress was slower than it could have been.

Solutions went from the latest firmware over USB, to entire DVDs of firmware to update the entire box (unlike the nice monthly HP ProLiant ones which booted, scanned the hardware, and presented current versions against what is currently installed via a nice enough if unresponsive GUI – this one appears to boot, and then just runs through DOS-based scripts, running EXE after EXE attempting to patch hardware that may or may not be there… classy… and slow). No joy. Various permutations continued until we provided them the means to use the iDRAC (equivalent to an iLO – an out-of-band management card / access – so you can see the screen from power on, and essentially control the box from off) on the box in question.

Watching them work remotely from a terminal was encouraging. It is good to see people failing in the same way that you were. However I would have expected a little less dithering / more finesse from the vendor.

On day two they regressed the firmware back to a version prior to the one we had on all of our stock of R220s. It was then possible to use the FTP update tool to patch the firmware… although, of course, taking care not to update the BIOS or iDRAC code, as this would essentially break it again.

Glad as I was – this posed more questions than it answered really – and some of which came as a real surprise.

Their Level3 support seemed to know what it was straight away – which begs the question why there is not a fix for this already.

 

How far back did they regress the firmware after updating to the most current and beta releases?

iDRAC Firmware Version 1.56.56 (Build 03) Release date: 23 Apr 2014

Lifecycle Controller Firmware 1.4.2.12 Release date: 17 Nov 2014
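With a stock of boxes to check, comparing each reported Lifecycle Controller version against the rolled-back one above can be scripted. This is only a sketch: treating anything newer than 1.4.2.12 as affected is my assumption from this one case, not a confirmed boundary from Dell, and the function names are made up for illustration.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '1.4.2.12' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# The Lifecycle Controller version that worked after the rollback in this post.
BASELINE_LC = parse_version("1.4.2.12")


def newer_than_baseline(installed: str,
                        baseline: Tuple[int, ...] = BASELINE_LC) -> bool:
    """Return True if the installed version is newer than the baseline.

    Assumption: newer than the baseline means 'likely affected'. The post
    only establishes that this one older version worked.
    """
    return parse_version(installed) > baseline
```

For example, `newer_than_baseline("1.4.2.12")` returns `False`, so that box would be left alone.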

 

Is there a reason why the current and pre-release ones do not work?

They are investigating this at this time.

 

Has this been recreated within their test environment?

It has been now.

 

Have they seen other occurrences of this?

Yes.

 

What is the easiest way to fix this for an existing stock of R220s?

They are looking to find the root cause, and then fix the current firmware. Once regressed to a version to patch, then upgrading to the fixed current should work.

 

Does this affect other PowerEdge servers?

Apparently the R220 is the only one that is affected.

 

If you are having issues with something similar – it is something to consider. In which case I hope you found this useful.

I am surprised that this – while having been reported previously – was not something they were aware of enough to make that the go-to solution.

It is clear that not many people are using the OpenManage Server Administrator (OMSA) installation to manage things like lifecycle and updates from a single point… or if they are, it doesn’t work for R220s.

Regretfully they are our most frequent customer deployment – and we *are* looking to deploy OMSA to complement our existing monitoring in Paessler’s delicious PRTG.

12 Responses to “Dell R220 / Lifecycle Controller (LCC) / Firmware Upgrade Failure”

  • My good friend. I have this same exact issue and I have had a support ticket open with Dell since September 21st. They have no resolution for it and are “working on it”. No ETA. Did you ever find out what the issue was? Also, what is your support ticket number, so maybe the two support reps can work together to get this resolved?

  • Brad – hello.

    Yes, we did, of sorts – hence the post. That and the shock that if we have 10+ in stock not racked up – of differing ages – all with the same issue… then this is going to be a wider, if not global issue. I could not find anything on it online – so thought it was worth putting down the experience should someone (you in this case) have similar… as a goal that has been met.

    We went through the support hoops of applying most current firmware. Then the beta version. Then the 64bit version. Then the September boot-from-patch-o-matic-ISO full of firmware. Still no dice. They attempted to recreate the issue in their lab, and discovered an issue with one of the files on the FTP server and resolved that. Still no luck. They approached L3 support who advised them to roll it back. After a huge song and dance they enabled full iDRAC enterprise license on the host so they could remote work. Recreated the problem. Went through all the obvious steps. Repeatedly. Then rolled the firmware back to an OLD version. The old version connected without issue.

    SO – yes the problem has been identified, and yes there are means to resolve. HOWEVER if you do update – it ceases to function so the solution is moot.

    They are able to recreate the issue there – which means there must be an awful lot of PowerEdge R220s that are not going to be able to update, again, ever – unless they have manual intervention and have the firmware retrograded.

    I am not happy to share the ticket reference – however I am happy to share the surname of the long suffering support engineer – her surname is Thampi and would be UK support. While the solution was poor and took an age, their professionalism and diligence (almost) excuses the amount of time it took – and a lot of blaming the network connection.

    How have things panned out for you?

    I will update this once they have a real resolution moving forwards.

  • Dear Anthony,

    We have two R220’s with the same exact issue. I pointed our Dell Representative to this webpage so they could get an idea of what is going on hoping that maybe they can work on this in unison. I was not given half the support/information that you were given. I was asked to install the OME on a management node which is something I have no intentions of using or setting up at this time.

    So it looks like you were able to update everything with the Dell Repository all in one ISO. I might just have them send me that to update all my hardware firmware and worry about the LCC later. I will let you know if I find out any other information (I have this saved as a favorite).

    Also I greatly appreciate you posting this article. Hopefully Dell gets the hint that this is a huge issue that needs to be resolved. Please post an update if you find or hear of a resolution and I will do the same.

    For an entire week I also had to fight with Dell informing them that this was not a network issue on my side. It was a rather frustrating experience.

  • I am sorry to hear of your experience with the Dell Enterprise Support team. We seem to get through to the same people, so I assume they are UK specific – they certainly seem to work the same hours despite location. Their support was not the fastest – however this was partly down to it not being an issue or priority – more of a concern… I was not updating them very often. However they were exceptionally diligent in its pursuit which was appreciated.

    I was sent a lot of files to patch with. A few of which I managed to install. None of which worked. The ISO was a last ditch attempt. It was a bit of a blunt instrument in approach, but for any hardware it located, it patched if the version on the disk was higher. Nice. However – worth noting that the BIOS release version reported at POST on the machines this was run on was unchanged after the patch CD had run. It certainly does seem to bring everything else up to speed though (PERC, NIC, etc.).

    Network issues. This was something I had been presented as a cause. More than once. I had tried this from an internal LAN range (where you could understand this), and an external allocation with the firewall off. A packet capture showed this up to be a standard FTP connection. I supplied times and IPs for them to check their logs for access. My hope was this would show a request for a file and it being downloaded, underlining a local issue.

    On requesting means to enable a temporary enterprise license on the iDRAC – we left them to it. They reported back that the issue could be resolved by rolling the BIOS back to an old version, and then the FTP issue did not occur.

    The titles above were my questions following this “solution” and the answers that followed.

    While it is great there is a means to resolve the FTP update issue, there is no advantage to this if using it breaks it again.

    You don’t have to be in an enterprise environment for this to be an issue.

  • On follow up yesterday I got the following this morning:

    Apparently, the bug is already visible to the LC development guys and they are working on it. I have sent them an email on what workaround can be suggested till the next release is out and I am awaiting a response from them.

    Got an update from the LC team that they are still root causing this issue however there is a suspicion on the role of the server BIOS in this matter.
    The workaround for this is what we have currently done, get down to 1.4 versions for the LC. Please don’t attempt to downgrade the BIOS.
    I will get back to you once I get the fix from the Product Team.

    …likewise, Brad, or in fact anyone else – if you have any updates – do let me know.

  • I spoke with our Dell reps and they said that since this is an ongoing issue they were going to replace our r220’s with r320s. However they said since it was past the return period they have to get approval for the transaction. This is frustrating since I put in a ticket about 3 days after I received the servers which tells them that I had a problem since day 1. However, it took 3-4 weeks for Dell to finally realize there was an issue on their side which pushed me over the return window. Then our Dell Representative asked us if we could buy the 320s and keep the r220s for spares or repurpose them because they didn’t want to jump through all the hoops. I am at my wits’ end right now with Dell over this issue.

    Please let me know if Dell ends up resolving this issue. If it gets resolved before Dell ships me the new r320’s then I might just stick with what we have. I appreciate all the help.

  • Brad – hello.

    The Lifecycle Controller (LCC) is something that is key to remote management tie-ins (such as OMSA), and ‘what the name suggests’ in terms of asset tracking. Otherwise it is something that slows up a box restart that even in the most ideal of situations always takes far longer than you would like it to.

    What surprises and concerns me the most is that it fell to me to highlight this. It has essentially been an issue on this platform for some time: anyone who has a newer BIOS, or has updated from an older one, will be unable to update again. While – as a group – we are pretty much Dell / Cisco / APC all the way, this is apparently “news to Dell” – to the point you found this article as opposed to your engineer saying “oh it’s this issue”. This is kinda odd if you ask me and highlights both failures in support knowledge and communication – and obviously in release / testing.

    The inability to update the firmware from the LCC is no great loss. If you absolutely have to update I believe it can still be done from the OS if the correct applications are installed? However we were looking to have that “from boot” knowledge that this was one more vector covered, another door bolted, one less hardware failure lurking.

    We won’t be moving to another vendor – it is not a deal breaker – however I do think it’s piss poor from a leading enterprise vendor.

    I don’t believe you would find this with HP kit – and it shouldn’t have to be a case of paying a premium for something that works as advertised.

    I trust you advised your Dell Representative where they could place their suggestion. Why on earth would you want to BUY R320’s when you spec’d and bought R220’s that you now cannot use? They seem to need an introduction to the clue stick!

  • We have waited over 8 weeks to get a replacement server from Dell and are still waiting for approval. I am flabbergasted with the service that we are currently getting from Dell. They agreed the issue was on their side and there was nothing they could do to fix it at the moment. Agreed to give us replacement servers. Now we have been delayed almost 2.5 months due to a Dell issue.

  • Brad – I do not know what to say.

    I recently had a follow-up call regarding the matter. This was the usual Customer Service fluff – which while it has its place, specifically with non-engineering staff – I do not appreciate someone without a full grasp of the situation (not the engineer I had been dealing with) calling up, not knowing what we did, who we are, or how this could feasibly cause us an issue. We are a big Dell hardware user (we only have 24 racks of the stuff on this site, but this is *not* a DC – we have 10 other locations in the UK that are) – the concept of ‘do you have many’ – or still trying to get mileage out of ‘we must have tested with different network hardware’… then on being challenged on it not being a network issue countered with ‘I am just guessing’… not ideal.

    I raised this on Twitter – and they went back to the engineer – who, to be fair, has done a fantastic job… however it’s others trying to pick up the ball and run with it and meet customer service as opposed to engineering goals that has caused me pain on this.

    As apparently this is not a priority (a BIOS fault that went unnoticed / undiagnosed for over a year?) – this will be released Q1 2016.

    Otherwise they state that a version 1.4 BIOS will work – just DO NOT ATTEMPT TO REGRESS IT … which is as good as not knowing or not having a solution IMHO.

    They have however issued a new link to the CD for updating to current firmware. A small victory.

    Realistically this for us is an obstacle to introducing a new policy / best practice. It is not preventing us from using hardware. However it is showing clear chinks in the QA, BIOS code, functionality – that have gone undiagnosed, and unfixed, for over a year. It also makes me question who – if anyone – is using the FTP BIOS tools.

    How is this preventing you from using these hosts?

    Please do keep me abreast of any changes / updates regarding this.

  • Same issue here on 2 x R220’s

  • Just received our R320 replacement servers in just under 11 weeks. Took forever but we did get the R320’s and they worked flawlessly.
