Monday, September 5, 2016

The weird fuses in laptop batteries

I called them "externally triggerable fuses" in my first post. The datasheets call them "Combined Thermal Fuse/Resistor" or "Fuse-Resistance protector".
Chances are the first time you see one of these will be in a smart battery. It sure was for me.

As an over-current protection device (i.e. a regular fuse) they're probably only going to trigger during catastrophic short-circuit scenarios. Their main purpose is to give the microcontroller a way to physically break the circuit if it detects a potentially dangerous condition, like the cells overheating, and can't stop it by shutting the FETs off. The problem is that some firmware counts the sudden disappearance/reappearance of cell voltages among the conditions that warrant doing this, so if you re-cell a pack you might end up with a blown fuse.
Replacing or jumping them is a pain, so it's a good idea to connect the Reset pin of the controller to ground before doing anything of the sort, to keep it from overreacting.


The rectangular one




The Cyntec 12AH3 / 12AG3 (datasheet) or similar devices will look like large capacitors at first sight, though the fact that they have 4 terminals might make you think "current sense resistor". Nope!

The black plastic cap attaches to the ceramic body with glue and you can tease it off with a pair of side-cutters, pliers or tweezers. Here's what you see under it:





Left: intact, Right: blown

What you have is a rectangle of "solder" (probably some special alloy) with a resistor/heating element under it in this configuration:


As you can see in the images above, when the device is triggered the thin layer of solder heats up, flows onto the center pad and breaks the connection. The heater is about 10 ohms, so at a 12.6 V battery voltage that's roughly 16 watts (P = V²/R) dissipated on that tiny surface area (or more if the charger voltage is used).
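As a rough sanity check of the heater figure, the dissipated power follows from P = V²/R. The sketch below is only that arithmetic. The 10 ohm value is the approximate heater resistance mentioned above (a real part may differ), and the 20 V case is a hypothetical charger voltage picked for illustration, not a measured one.

```python
def heater_power_watts(voltage_v: float, resistance_ohm: float) -> float:
    """Power dissipated in a resistive heater: P = V^2 / R."""
    if resistance_ohm <= 0:
        raise ValueError("resistance must be positive")
    return voltage_v ** 2 / resistance_ohm


if __name__ == "__main__":
    # 12.6 V battery voltage across an (approximately) 10 ohm heater.
    print(round(heater_power_watts(12.6, 10.0), 1))
    # Hypothetical 20 V charger voltage, for illustration only.
    print(round(heater_power_watts(20.0, 10.0), 1))
```

Either way it is a lot of power for a part that size, which is why the solder link lets go quickly.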

To check this fuse, look for continuity along the longer sides and for the heater resistance between either contact and Pin 4.
To jump it, carefully dab solder onto the broken connections. Use low temperature and thin solder. You can probably shave away at it afterwards and the fuse MAY be able to blow again.. unreliably.. and completely out of spec.
The best course of action is of course to replace it with a new one once the repair is confirmed.

If the controller is blowing the fuse repeatedly you have an issue in your data or a hardware fault in the controller board or your cell connections.

The 3-legged one






The SEFUSE D6X / D6T (datasheet) is a weird looking thing that doesn't really resemble any other component.

It's the same deal with a different physical construction.





To jump it you break off the cap (if there is one) to find a resin coated square. Then you carefully scrape away at the coating roughly in the middle until you find a tiny via. This is the center point you see in the diagram above. Now you just need to reconnect it to the two leads where the fusible material broke the connection so tin the via and connect it to both leads (ignore the lead for the heater).

Here's a pretty terrible example:




As with the previous one, it's best to get a new one once the repair is confirmed.

Tuesday, August 30, 2016

Adding the M37512 with Panasonic/IBM firmware

Just "adding" because this battery controller is already public. You have the datasheet(pdf) which tells you the pin combination to enter the Boot ROM and most of the command set (how was the actual read command missed? weird). Then there are open-source flasher tools like this one. You can also use Google to find the passwords because you WILL need passwords (at least with this firmware) and that is after you set the correct pins to the correct states to enter the boot rom. Overkill? Yeah, overkill.

But since it's all out there it's just a matter of coding up a tool for SMBusb.




Quote:
 "Normal microcomputer mode is entered when the microcomputer is reset with pulling CNVSS pin low. In this case, the CPU starts operating using the control program in the User ROM area. When the microcomputer is reset by pulling the P24/SDA2/RXD pin high, the CNVss pin high, the CPU starts operating using the control program in the Boot ROM area"

After setting the pins to desired state and resetting the chip you get:

$ smbusb_scan -w 0x16
------------------------------------
             smbusb_scan
------------------------------------
SMBusb Firmware Version: 1.0.1
Scanning for command writability..
Scan range: 00 - ff
Skipping: None
------------------------------------
[0] ACK, Byte writable, Word writable, Block writable, >Block writable
[1] ACK, Byte writable, Word writable, Block writable, >Block writable
[2] ACK, Byte writable, Word writable, Block writable, >Block writable
[3] ACK, Byte writable, Word writable, Block writable, >Block writable
[4] ACK, Byte writable, Word writable, Block writable, >Block writable
[5] ACK, Byte writable, Word writable, Block writable, >Block writable
[6] ACK, Byte writable, Word writable, Block writable, >Block writable
*repeat for all commands*


Going at this blind would've been pretty terrible. This chip is waiting for the correct passwords and ACKing literally everything until it gets them.
Entering the correct passwords scoured from the internet:


$ smbusb_comm -a 0x16 -c 0xFF -w CDAB -b
$ smbusb_comm -a 0x16 -c 0xCF -w 3412 -b
$ smbusb_scan -w 0x16 -e 10
------------------------------------
             smbusb_scan
------------------------------------
SMBusb Firmware Version: 1.0.1
Scanning for command writability..
Scan range: 00 - 10
Skipping: None
------------------------------------
[0] ACK, Byte writable, Word writable, Block writable, >Block writable
[1] ACK
[2] ACK
[3] ACK
[4] ACK
[5] ACK
*snip*


It still ACKs every command but it's exposing the documented Boot ROM interface now. Just don't scan it too much, because writing the wrong thing to the wrong command will hang the controller and/or the entire bus, which the SMBusb won't like too much either. (The Boot ROM in this chip has zero error handling.)

Some coding later:


$ smbusb_m37512flasher -w b0 -p b0
------------------------------------
        smbusb_m37512flasher
------------------------------------
SMBusb Firmware Version: 1.0.1
------------------------------------
Erasing flash block starting at 0xe000 ...
Done!
Writing memory 0xe000-0xffff ...
Done!
Verifying 0xe000-0xffff ...
Verified OK!

The tool is now a part of SMBusb.

I haven't done research into modification or resetting for this controller yet. Maybe in the future!

Monday, August 29, 2016

Hacking the R2J240 with LGC firmware


The second battery controller I looked at was the Renesas R2J240-10F020. It's a complete black box with very little information available except for some outtakes from the datasheet on Chinese developer forums. There is very little resemblance to the M37512, an older Mitsubishi/Renesas microcontroller used in earlier battery packs that's fairly well documented.


The first thing you notice is that this chip has the analog frontend integrated (unlike the M37512 or the bq8030) because there's no separate chip for measuring voltages and such. Cells are connected directly to this chip so it's a one-chip solution for building smart batteries.

SBS Report

$ smbusb_sbsreport
SMBusb Firmware Version: 1.0.1
-------------------------------------------------
Manufacturer Name:          LGC
Device Name:                LNV-42T4911
Device Chemistry:           LION
Serial Number:              41291
Manufacture Date:           2010.01.25

Manufacturer Access:        6001
Battery Mode:               e000
*snip*


Probing around



I started out by measuring voltages on all the pins. Just going by logic I was expecting some sort of differentiation on the various sides of the chip.

To summarize my findings after the first pass:
  • 1-12 is the "main microcontroller side": it has the SMBus pins, VCC (and probably RESET and others)
  • 25-36 is connected to current sensing and exposes various built-in voltage regulators
  • 37-48 appears to be mainly unused with a couple of pins at 3.3 V, GPIO side?
  • 13-24 has many pins connected directly to "high voltage" from the cells.
I took a 1k resistor connected to ground and started poking the pins with it to find reset. It should be possible to pull reset low through a 1k resistor, but it's unlikely to work on VCC and it shouldn't lead to a complete reset on an unrelated pin. It's also possible to rule out most pins through visual inspection and measurement. So long story short: Pin #12 is Reset.

Next I wanted to see if there's something like a Boot pin that's going to get me a different mode when pulled either low or high during reset so I started up a continuous command scan and started poking at the pins again.

Pulling Pin #4 (also connected to Test Point 1 on the other side of the PCB) low during reset gave me this:

$ smbusb_scan -w 0x16
------------------------------------
             smbusb_scan
------------------------------------
SMBusb Firmware Version: 1.0.1
Scanning for command writability..
Scan range: 00 - ff
Skipping: None
------------------------------------
*snip*
[f0] ACK, Byte writable
[f1] ACK
[f2] ACK
[f3] ACK
[f4] ACK
[f5] ACK
[f6] ACK
[f7] ACK
[f8] ACK
[f9] ACK
[fa] ACK, Byte writable, Word writable, Block writable
[fb] ACK, Byte writable, Word writable, Block writable
[fc] ACK, Byte writable, Word writable, Block writable, >Block writable
[fd] ACK, Byte writable, Word writable, Block writable, >Block writable
[fe] ACK
[ff] ACK

The chip was ACKing on every command. A deliberate attempt at confusing any would-be attacker perhaps? The write scan however reveals that the chip is actually exposing some real functionality on some of the commands and that a couple of them violate SMBus protocol.

Pin #4 appears to be BOOT (active-low).



Mapping

Mapping out the protocol took a while, especially because it doesn't correspond to the standard SMBus protocol, but I was eventually able to figure out how to read and write RAM and erase blocks of memory-mapped flash.
Just writing to the appropriate address in RAM (after the flash blocks have been erased) writes the flash memory, which is convenient.

There are several partitions of flash mapped into RAM and I'm sure I haven't found all of them. The ones I did find are included as address and length presets in the flasher tool.



$ smbusb_r2j240flasher -d eep2.bin -p df2
------------------------------------
        smbusb_r2j240flasher
------------------------------------
SMBusb Firmware Version: 1.0.1
------------------------------------
Dumping memory 0x3400-0x37ff ...
Done!
$ xxd eep2.bin
0000000: 0000 0000 0000 0000 0000 ffff ffff ffff  ................
0000010: 4c4e 562d 3432 5434 3739 3700 0000 0000  LNV-42T4797.....
*snip*

$ smbusb_r2j240flasher -d eep3.bin -p df3
------------------------------------
        smbusb_r2j240flasher
------------------------------------
SMBusb Firmware Version: 1.0.1
------------------------------------
Dumping memory 0xc000-0xdfff ...
Done!
$ xxd eep3.bin
0000000: 0100 0700 b801 b801 1100 0203 0201 01e3  ................
0000010: e6fe e3ae 7000 e0e4 0cc8 0038 3150 14f0  ....p......81P..
0000020: 1530 2a4c 4743 0031 3100 0000 0000 0000  .0*LGC.11.......
0000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000040: 0000 004c 4e56 2d34 3254 3439 3131 0000  ...LNV-42T4911..
0000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000060: 0000 004c 494f 4e01 2d01 2d30 07fa 1031  ...LION
*snip*


In this particular battery pack the static information was stored in df3 and the dynamic in df2, df1 was empty.
Another battery stored dynamic info in df1 so this is going to differ between firmwares/packs.
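A quick way to tell which partition holds the static data is to look for readable strings in each dump, which is what the xxd output above shows by eye. A minimal sketch, assuming nothing about the layout beyond "printable ASCII runs are interesting"; the sample bytes are the first two rows of the df2 dump above and the function name is made up for illustration:

```python
import re


def printable_strings(data: bytes, min_len: int = 4):
    """Return (offset, text) for each run of printable ASCII at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]


# First 32 bytes of eep2.bin as shown in the xxd output above.
DF2_HEAD = bytes.fromhex(
    "00000000000000000000ffffffffffff"
    "4c4e562d343254343739370000000000"
)

if __name__ == "__main__":
    for offset, text in printable_strings(DF2_HEAD):
        print(hex(offset), text)
```

Running it over whole dumps of every preset gives a quick overview of where names, chemistry strings and the like are stored in a given pack.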

Just like the bq8030 the static area is protected by a checksum on this controller/firmware as well. I took a shot at it just for kicks and it was pretty simple so I included it in the flasher tool.


$ smbusb_r2j240flasher -w eep3_f.bin -p df3 --fix-lgc-static-checksum --execute
------------------------------------
        smbusb_r2j240flasher
------------------------------------
SMBusb Firmware Version: 1.0.1
------------------------------------
Erasing flash block starting at 0xc000 ...
Done!
Fixing LGC static checksum..
Done!
Writing memory 0xc000-0xdfff ...
Done!
Verifying 0xc000-0xdfff ...
Verified OK!
Exiting Boot ROM and starting firmware.

$ smbusb_sbsreport
SMBusb Firmware Version: 1.0.1
-------------------------------------------------
Manufacturer Name:          LGC
Device Name:                Karosium000
Device Chemistry:           LION
Serial Number:              41291
Manufacture Date:           2010.01.25
*snip*


Reset

Pretty much the same procedure as with the bq8030. Map and modify the dynamic area. Eventually you'll find the error flag. As with the bq8030 the dynamic area isn't checksummed in this controller/firmware either.

Helpful tips:
  • Again, multiple log entries. The number of 0x00 bytes at the beginning of the section determines how many there are. Patch the duplicated data in all of them.
  • You can decrease the number of log entries to 1 while mapping, which makes the job a lot easier.
  • The real cycle count is stored encoded. No idea how. With this particular firmware it was at 0x78-79. Zeroing out the bytes still decreases the cycle count to 5 but the precise algorithm/obfuscation? No clue.
  • Please don't ask me to fix flash dumps :-)
  • Good luck!
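For the first tip, counting the leading 0x00 bytes of a dynamic section is trivial to script. This is a minimal sketch with a made-up function name. Whether the count equals the number of log entries or merely encodes it is not something I can confirm, so treat the result as a starting point for mapping. The sample bytes are the first row of the df2 dump shown earlier.

```python
def leading_zero_bytes(section: bytes) -> int:
    """Count the 0x00 bytes at the very start of a section."""
    count = 0
    for value in section:
        if value != 0x00:
            break
        count += 1
    return count


# First row of eep2.bin from the xxd output in the Mapping section.
DF2_FIRST_ROW = bytes.fromhex("00000000000000000000ffffffffffff")

if __name__ == "__main__":
    print(leading_zero_bytes(DF2_FIRST_ROW))
```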
Notes

No disassembly (or disassembler) for this chip. I don't really know what architecture it's based on. If I had to guess I'd say an extended version of the MELPS 7700, an old Mitsubishi architecture that Renesas inherited, because loading the dump into IDA with that core produces something that starts to make sense but then fails on invalid instructions. I could be completely wrong though.

If anyone wants to tackle this they could probably find a nice, easy way of getting into the Boot ROM using just SMBus commands.

Sunday, August 28, 2016

Hacking the bq8030 with SANYO firmware

As mentioned in the previous article the bq8030 is the blank version of the bq20z90. If you bought some from Aliexpress they'd come up with the TI Boot ROM and you could use the flashing tool included in SMBusb to upload firmware and eeprom(data flash) to it.
Theoretically you could turn it into a bq20z90 by downloading the firmware from one and uploading that. (The procedure for accessing the Boot ROM on those chips is documented in datasheets and application notes.)



So how would you even start with a BQ8030 running proprietary firmware?

Google. Lots of Google.
Apparently they sell this tool for them:




Now with a SPECIAL! price of ONLY 3 THOUSAND US DOLLARS!! WHAT AN AMAZING DEAL!!!

I gathered everything I could find about this device and while it wasn't much it did provide clues that came in handy later on in the process. Especially this screenshot of the software that comes with it:




There was no way I could figure everything out based on just that but I did take notice of the function bar on the bottom.

Those could very well be SMBus commands right there.. would they have done that? Surely not.
Not really expecting much I tried a word write of 0x0214 to command 0x71 aand.. nothing obvious happened. So I moved on to poking at other things but eventually came back for a second look and that's when I realized:

Command scan starting at 0x70 before sending command

$ smbusb_scan -w 0x16 -b 0x70
------------------------------------
             smbusb_scan
------------------------------------
SMBusb Firmware Version: 1.0.0
Scanning for command writability..
Scan range: 70 - ff
Skipping: None
------------------------------------
[71] ACK, Byte writable, Word writable
[72] ACK



And after


$ smbusb_comm -a 16 -c 71 -w 0x0214 
$ smbusb_scan -w 0x16 -b 0x70
------------------------------------
             smbusb_scan
------------------------------------
SMBusb Firmware Version: 1.0.0
Scanning for command writability..
Scan range: 70 - ff
Skipping: None
------------------------------------
[71] ACK, Byte writable, Word writable
[72] ACK
[73] ACK


So this actually unlocks an extra command which disappears again when an SBS command is issued (or when doing a full command scan starting from 0.)
The command however is not writable. Reading it returns:

$ smbusb_comm -a 16 -c 73 -r 2
023d

Interesting but insufficient.
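Spotting a single extra ACK between two scans by eye is easy to miss (I did, the first time), so the before/after comparison can be automated. A small sketch that parses the `[xx] ACK ...` lines of smbusb_scan output as they appear above and reports commands that only show up in the second scan. The function names are made up for illustration and the parser assumes nothing about the output beyond those lines.

```python
import re

# Matches scan result lines such as "[71] ACK, Byte writable, Word writable".
SCAN_LINE = re.compile(r"^\[([0-9a-fA-F]+)\]\s+ACK")


def acked_commands(scan_output: str) -> set:
    """Collect the command numbers that ACKed in one scan."""
    commands = set()
    for line in scan_output.splitlines():
        match = SCAN_LINE.match(line.strip())
        if match:
            commands.add(int(match.group(1), 16))
    return commands


def newly_unlocked(before: str, after: str) -> list:
    """Commands present in the second scan but not in the first."""
    return sorted(acked_commands(after) - acked_commands(before))


BEFORE = """[71] ACK, Byte writable, Word writable
[72] ACK"""

AFTER = """[71] ACK, Byte writable, Word writable
[72] ACK
[73] ACK"""

if __name__ == "__main__":
    print([hex(c) for c in newly_unlocked(BEFORE, AFTER)])
```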

Brick wall meet impatience

I couldn't really get any further with just that information so I started looking at the hardware instead. Having found slides from a TI presentation revealing the connection between the BQ8030 and bq20z90 I opened up the datasheet for the latter (since there's no public datasheet for the former).


Ok, nothing straightforward. No obvious BOOT pin as one would expect with a device that's not meant to be tampered with. But maybe pulling some pin high or low during reset will get me somewhere.

After the first pass: no, not really. So maybe we have to set multiple pins into multiple states for it to work. Or maybe there's no such combination at all.
How about I try to abuse N/C pins instead? I have no logical explanation as to why I came to this decision. Maybe I saw a presentation somewhere about blackbox chips and N/C pins years and years and years ago but I could just be imagining things. Either way, about 5 minutes of poking at PIN #28 with a resistor connected to 3.3 V in hand and triggering RESET at random intervals while running a continuous command scan:

$ smbusb_scan -w 0x16
------------------------------------
             smbusb_scan
------------------------------------
SMBusb Firmware Version: 1.0.1
Scanning for command writability..
Scan range: 00 - ff
Skipping: None
------------------------------------
[0] ACK, Byte writable, Word writable, Block writable
[1] ACK
[2] ACK
[3] ACK
[4] ACK, Byte writable, Word writable, Block writable
[5] ACK, Byte writable, Word writable, Block writable
[6] ACK, Byte writable, Word writable
[7] ACK, Byte writable, Word writable
[8] ACK
[9] ACK, Byte writable, Word writable
[a] ACK, Byte writable, Word writable


Wow, that worked?
Umm.. ok.. let's just reset for now..

$ smbusb_sbsreport
SMBusb Firmware Version: 1.0.1
-------------------------------------------------
Manufacturer Name:          ERROR
Device Name:                ERROR
Device Chemistry:           ERROR
Serial Number:              4294967287
Manufacture Date:           1980.00.00


Uh-oh.. Well that's not good!
It seems we're stuck in the Boot ROM. Is the chip fried? It's at this point that I coded up the flash tool to try and read the flash contents. (I wasn't really bothered by the chip dying as this was one of 2 sacrificial controller boards I kept just for messing around with.)
And the results? Apparently we can corrupt (ideally just) the first couple of blocks of flash if we bully PIN #28 while the chip is trying to start up. The good news though? (If we're lucky) We get 99% of the firmware, and thanks to Charlie Miller we have a disassembler(zip) for it.

Did messing with Pin #28 even have an effect? Could it just have been the erratic resetting of the chip that triggered the malfunction? Did I short VCELL+ to Pin28 while messing about? Was there high voltage on VCELL+? Was it just ESD?
No idea. But I did manage to reproduce the result on another chip using the same procedure. So when in doubt and you have nothing to lose, act like a caveman, I guess?
The only good thing about this method is that even if you have 0 knowledge about whether there even IS a method for entering the Boot ROM in the firmware let alone what it is there's still a high chance that you'll get in. How much of the firmware survives is another question.

Disassembly

A couple of hours of staring at unfamiliar assembly code later, here are the relevant parts for entering the Boot ROM with annotations:

cmd_handle_71:
    ..      
    calls       smb_ACK
    ..
    calls       smbSlaveRecvWord
    move        a, (i3,0x1A)
    or          a, (i3,0x1B)
    jeq         check_71_pass
    move        r2, (i3,0x1B)
    add         r2, (i3,0x19) ; smb_word_LSB
    move        r3, (i3,0x1A)
    addc        r3, (i3,0x18) ; smb_word_MSB
    or          a, r3, r2
    jeq         accesslevel_oreq_40
    move        a, #0
    move        (i3,0x1A), a
    move        (i3,0x1B), a
   
check_71_pass:
    ..
    move        i1l, (i3,0x19) ; smb_word_LSB
    move        i1h, (i3,0x18) ; smb_word_MSB
    cmp         i1h, #2
    jne         wrong_pass
    cmp         i1l, #0x14  ; is 71 0214?
    jne         wrong_pass
    ..
    jeq         accesslevel_oreq_80


This is the first password check. Seem familiar? It's the one we saw in the screenshot above: 0x0214 to 0x71. It sets an access flag that gets checked later on. Basically if (smbSlaveRecvWord(0x71) == 0x0214) { access_level |= 0x80 }; But wait.. it can set two different access flags depending on whatever (i3,0x1A) and (i3,0x1B) are. Hrmm.. Well, I don't know what those are and can't find where they're set, so let's assume the first jeq will not jump once we've given the correct first password, because that would make sense. We can also see that it checks the word we send against those mystery bytes somehow (it adds the two together and tests the result for zero) and if it likes what it sees it sets access flag 0x40 and the mystery bytes to 0.
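To make the control flow easier to follow, here is a toy model of the 0x71 handler as I read the listing above. It is a sketch under stated assumptions, not the firmware's actual logic: the `mystery` attribute stands in for the bytes at (i3,0x1A)/(i3,0x1B), how they get set is unknown, clearing them on the success path is my guess (the listing only shows it on the failure path), and the 0x1234 value in the example is arbitrary.

```python
class Cmd71Model:
    """Toy model of cmd_handle_71, reconstructed from the annotated disassembly."""

    def __init__(self):
        self.access_level = 0
        self.mystery = 0  # 16-bit value at (i3,0x1A)/(i3,0x1B); origin unknown

    def write_word(self, word: int) -> None:
        word &= 0xFFFF
        if self.mystery != 0:
            # add/addc then "or a, r3, r2": the 16-bit sum must come out as zero.
            if (self.mystery + word) & 0xFFFF == 0:
                self.access_level |= 0x40
                self.mystery = 0  # assumed; not visible in the listing
                return
            self.mystery = 0  # wrong answer: the mystery bytes are cleared
        # check_71_pass: compare against the first password.
        if word == 0x0214:
            self.access_level |= 0x80


if __name__ == "__main__":
    model = Cmd71Model()
    model.write_word(0x0214)
    model.mystery = 0x1234       # pretend the firmware set the mystery bytes
    model.write_word(0xEDCC)     # 0x1234 + 0xEDCC == 0x10000
    print(hex(model.access_level))
```

The useful takeaway is that the second word is not a fixed password: it only has to cancel out whatever the mystery bytes hold.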

A little bit further up we find the entry point for the Boot ROM:

cmd_handle_70:
    *snip*
    move        r3, access_level
    and         r3, #0x40
    cmp         r3, #0        ; don't even bother if access
    jeq         cmd_handle_71 ; flag 0x40 is missing          
    *snip*   
    calls       smbSlaveRecvWord
    move        r2, (i3,0x19) ; smb_word_LSB
    move        r3, (i3,0x18) ; smb_word_MSB
    cmp         r3, #5
    jne         wrong_pass
    cmp         r2, #0x17      ; is 70 0517?
    jne         wrong_pass
    *snip* (prepare leaving the firmware safely)
    calls       bootrom_execute

So now we know pretty much what we need to do.

1. Send 0x0214 to 0x71
2. ???
3. Send 0x0517 to 0x70
4. Profit

And we've made the educated guess that Step 2 is really "Send 0x???? to 0x71", so we're pretty much done with the disassembly: 16 bits is well within the realm of bruteforceability, and since I had another sacrificial board as well as a battery pack running SANYO firmware I had everything I needed to attempt it.
As it turns out there's another mandatory step between 1 and 2, and it was sheer luck that I left it in my brute force loop: 0x73, the command unlocked by sending the first password, needs to be read before entering the second password. Which is...*drumroll*

0xFDC3

After realizing that the first unlocked command is important (why else would they have made it mandatory?) it's not that surprising that adding the number it returned (0x023d) to the bruteforced value gives a nice round result: 0x10000, which is probably what the adding in the assembly and the mystery numbers are all about.

So to sum it all up:

1. Send 0x0214 to 0x71
2. Read Word X from 0x73
3. Send (0x10000 - X) to 0x71
4. Send 0x0517 to 0x70

Actually, sending the correct word in Step 3 unlocks several extra commands, not just 0x70 for the Boot ROM entry, but they all disappear as soon as you send an unrelated command, much the same way 0x73 does after the first password.

We don't really care about those though because we already have what we wanted:

$ smbusb_comm -a 16 -c 71 -w 0214
$ smbusb_comm -a 16 -c 73 -r 2

023d
$ smbusb_comm -a 16 -c 71 -w fdc3
$ smbusb_comm -a 16 -c 70 -w 0517

$ smbusb_bq8030flasher -p prg.bin -e eep.bin
------------------------------------
        smbusb_bq8030flasher
------------------------------------
SMBusb Firmware Version: 1.0.1
PEC is ENABLED
TI Boot ROM version 3.1
------------------------------------
Reading program flash
.............................................................
.............................................................
*snip*
.................................................
Done! 
Reading eeprom(data) flash
...................................................
Done!

$ xxd eep.bin
0000000: ffff 0031 076c 00c8 ffff 11f8 19e0 0355  ...1.l.........U
0000010: 0853 414e 594f 0030 3820 20ff ffff 0407  .SANYO.08  .....
0000020: 0b49 424d 2d34 3254 3532 3531 2020 2020  .IBM-42T5251   
0000030: 044c 494f 4e20 ffff ffff ffff ffff ffff  .LION 

*snip*


Huzzah!


Reset


To actually remove the permanent failure flag we need to look at the eeprom area.
The file is 2048 bytes and it has two sections.

The first 1024 bytes contains the static data (the beginning of which you can see in the hex dump above). It contains all the data set by the manufacturer that never changes during the lifetime of the battery. Design capacity/voltage, serial and model numbers, default settings, etc.
This part is protected by a checksum somewhere which you'll need to find and fix if you want to modify anything in there.

The second part contains the dynamic data. Basically the "log" of the battery with current remaining capacity and similar things that get updated as the battery is cycled.  Also, the failure flag.

You pretty much just need to start mapping out the values and then zeroing or FF-ing out the ones that you can't map to anything to see if that fixes it or breaks something else. There's no checksum on the dynamic area so you are free to modify this section all you want. Repeat until desired outcome is reached. That's what I did.
Some helpful tips:
  • On my specific battery the log starts at 0x500 and has several entries that all need to be modified (mostly duplicate data)
  • Battery capacity is stored as the remaining capacity reported through SBS divided by 2.
  • Cycle count is stored as CycleCount-1 (eg.: SBS value: 223, Eeprom byte: 222)
  • Remaining Capacity Alarm is stored as-is. A good place to start mapping.
  • It's a good idea to reset the cycle counter. I don't want to start conspiracy theories but... at least with this specific model there's been a lot that died inexplicably around the 200 cycle mark. Coincidence? Probably, but it can't hurt.
  • Please don't ask me to fix eeprom dumps :-) 
  • Good luck!
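The two encodings from the tips above can be written down as small helpers for use while mapping. This is a sketch for my specific SANYO firmware only. The function names are made up, the field widths and byte order are deliberately left out because I'm not stating them here, and how the firmware rounds an odd capacity value is an assumption (integer division).

```python
def stored_capacity(sbs_remaining_capacity: int) -> int:
    """Value expected in the eeprom for a given SBS remaining capacity (SBS / 2)."""
    if sbs_remaining_capacity < 0:
        raise ValueError("capacity cannot be negative")
    return sbs_remaining_capacity // 2  # rounding of odd values is an assumption


def stored_cycle_count(sbs_cycle_count: int) -> int:
    """Value expected in the eeprom for a given SBS cycle count (CycleCount - 1)."""
    if sbs_cycle_count < 1:
        raise ValueError("no known stored value for a cycle count below 1")
    return sbs_cycle_count - 1


def sbs_cycle_count(stored: int) -> int:
    """Inverse of stored_cycle_count."""
    return stored + 1


if __name__ == "__main__":
    # Example from the tips: SBS reports 223 cycles, the eeprom byte is 222.
    print(stored_cycle_count(223))
```

Searching the dynamic area for these derived values (rather than the raw SBS numbers) is what makes the fields findable in the first place.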
And the result:


It took the estimate a charge cycle to normalize. This particular battery lasts 2 hours on constant high CPU load after external recharging and clearing of the fail flag. Not bad for a 10 year old battery in a 12 year old machine and since the other choice was THROW IT AWAY AND BUY A NEW ONE  I consider this a win :)

PSA: DO NOT WRITE PROGRAM FLASH. You don't need to rewrite the firmware to recover a battery and there's an issue affecting some platforms/boards that WILL result in a brick if you do, especially if you're using the outdated Windows builds. https://github.com/karosium/smbusb/issues/10
You just need to read/write the data/eeprom flash.

Friday, August 26, 2016

SMBusb - Hacking smart batteries

Having gone deep down the rabbit hole of researching smart laptop battery controllers I've ended up reverse engineering a couple of them used in ThinkPad batteries. Looking around, there's very little software available for working with battery controllers in general and most of it costs hundreds or even thousands of dollars. Usually the chips' datasheets aren't even publicly available (aside from a few outtakes on Chinese developer forums).


So why would you want to mess with a smart battery controller anyway?



Consider the case of one ThinkPad X100e I purchased a few months ago. Battery dead. Querying the controller reveals it's had 43 charge/discharge cycles so the cells are practically new! And yet the controller was in permanent lockout mode due to a single overdischarge condition getting logged. Trying to charge the battery in the laptop resulted in a rapidly flashing charge LED indicating charge failure.

Some people will say "Well that's by design, a single overdischarge turns Li-Ion cells into potential fire-bombs!"

If that was true there would've been a lot more cases of batteries setting houses on fire back in the early 2000s as "0V deep discharge recovery" used to be a feature in some laptop battery pack controllers back then. It's not nearly as bad as some urban legends would have you believe.
Clarification: True 0V discharge kills Li-ion cells and you shouldn't attempt to recover them. The old controllers tried to fix over-discharged cells and they succeeded in the majority of cases because you rarely have a true 0V scenario unless you have a shorted cell or something. The new controllers on the other hand happily commit suicide at over-discharge voltages where cells are still easily and safely recoverable with little to no capacity loss.
 
Take the battery pack I mentioned above for example. After recharging the cells externally while carefully monitoring their temperature and reprogramming the controller the pack was fine. It ended up with 100% the design capacity lasting 3.5 hours (it's a power hungry AMD system with an ATI GPU)

For the manufacturer, taking no chances from a legal standpoint is understandable but sometimes they can go a little overboard. A 3.5hr battery ending up in a landfill is not the best outcome.

Another reason would be for re-celling the pack. You have some options when the battery of your laptop dies: replace the whole shebang because it's too slow anyway, replace the battery with a factory one for $100+, buy a Chinese knockoff for $20-50 or re-cell with good brand cells for $20-50ish.

As demonstrated in my previous article about laptop batteries, keeping the original controller board has certain advantages such as temperature monitoring and an extra safety feature. It's also a good bet that brand cells will survive more cycles without capacity loss than ChangJiang ones. (I've actually recently tested a 6-cell LG pack at 540 cycles and it held a 1.5 A load for 80 minutes, while a ChangJiang pack at 230 cycles petered out at 50 minutes. That's a sample size of two, and the CJ pack was lower capacity to begin with (4.4 Ah vs 5.2 Ah), but still...)


Ok, so how do you talk to these controllers?


SMBus, which is I2C's impatient cousin with nasty hard timeouts and bulk transfer modes not standard in I2C along with a RESTART condition in addition to the START and STOP you normally deal with (... or just 1-Wire in more exotic hardware but let's not think about that).

How nasty are the timing constraints? You can implement I2C through a USB serial dongle's flow control pins. You can't implement SMBus that way because your code, the Windows API, driver, USB stack and controller firmware response overhead added up is over the limit (especially if your serial thing uses only USB 1.1). You can't keep the clock idle for over 35 milliseconds or the controller drops the conversation.


But don't go TOO fast because SMBus can't go over 100 kHz (I2C can do 400 kHz). All that rushing and it can't keep up the pace in the long run, figures..

Unfortunately this makes it cumbersome to implement a dumb SMBus interface on a non real-time system like a PC alone (not counting the actual interface between the EC/SMC and the SMBus devices in the system which is usually not accessible directly and/or isn't usable for more involved operations)

Did I mention that different controllers seem to have slightly different timing requirements?

The most well supported interface for speaking with smart battery packs is the Texas Instruments EV-2300. It's a fairly expensive piece of kit but it has the advantage of being able to work with TI's own software which has a myriad of options for the publicly available TI battery controllers running the TI firmware but more on that later.


 

What's next?


Once reliable SMBus communication is established you will have access to a battery controller that talks the SBS Standard! It's a standard so we're basically done now. We just look at the standard documentation on how to clear any failure condition and reprogram the controller to our desired parameters, easy!

Ha! Yeah, no. The SBS standard only defines the very basic functionality for a smart battery controller. The controller will have something sometimes referred to as a "sealed" mode which is where it's only going to service the limited SBS standard requests. This will be the standard mode for the battery as that's all that laptops need. No touchy on any of the settings that really matter and certainly no clearing of any error conditions.

They'll sometimes have a command to "unseal" and enter full access mode, where some or all settings are modifiable with a mixture of standard and non-standard commands. Most of the time this unseal command will be password protected or worse. Some TI controllers are like this, while other controllers will only implement a sealed mode for SBS compliance and a completely proprietary bootloader mode to modify flash memory content directly (which may or may not also be entered in a similar way). It's really a mixed bag.

What's similar in all SBS microcontrollers is that getting into the Boot ROM where you can access and modify flash memory is hindered by the firmware that's already running on the device.


TI


Texas Instruments is a BIG player in the smart battery world. They sell their battery management microcontrollers and analog frontends to pretty much everyone making battery packs. Some companies buy the whole package, that is, the microcontrollers with TI-developed firmware. The security model mentioned above (sealed/unsealed/Boot ROM) is actually used mostly by the TI firmware. These pre-programmed micros in the "bq" line usually have a letter in their model number, such as the bq20z90. What's somewhat less known is that the bq20z90 is also sold WITHOUT the TI firmware as the bq8030(DBT). At the very least SONY and Sanyo have opted to buy the bq8030 with the accompanying (super-secret) SDK and develop their own firmware for their battery packs with it.

Did they not trust TI's expertise? Was the firmware too expensive to license? Did the TI firmware not do everything they wanted? Who knows! But the fact is production bq8030s are pretty much complete black boxes as they run a proprietary firmware and don't work at all with TI's own software. Other bq series are sold similarly as both pre-programmed and unprogrammed versions. The bq8050 would be another example though I don't know what the pre-programmed counterpart's model number is.

The EV-2300 will still talk to the chips running 3rd party firmware but the TI software will not work for anything other than simple SBS reporting.
The ONE thing the pre-programmed and unprogrammed chips will always have in common though is the TI Boot ROM. The challenge? Actually getting into it if there's already firmware on the chip. With the TI firmware the method for doing so is fully documented: you need to send two passwords to two SMBus commands. You can even find default passwords that will work in case the pack producer forgot (or didn't care) to change them, as with one particular series of Apple MacBook battery packs. (Charlie Miller did a LOT of work on this, including making a CPU module (zip) for IDA to disassemble the proprietary Xemics CoolRISC core and modifying the TI firmware. Go check it out (pdf).)

With a Sanyo or SONY or other 3rd-party firmware, however, you're on your own! The procedure to access the Boot ROM is not public and neither are the passwords (if there even are passwords). The latter would also be true for most battery packs using the TI pre-programmed chips, as pack manufacturers tend not to use the default passwords.

 

SMBusb


So the obstacles to hacking smart batteries are numerous, but let's take it one step at a time. First up: the interface. Introducing SMBusb, a USB SMBus interface based on the Cypress FX2LP CY7C68013A (datasheet) USB microcontroller, or more specifically the dev board that's available all across eBay for around $5 shipped.

This one here:

Search for: FX2LP board
But all FX2LP-based boards should work as long as their firmware isn't pre-programmed into the EEPROM.



It's open source as far as firmware and software goes, comes with a library so you can easily use it in your own projects and includes a few tools to aid SMBus and smart battery hacking such as:

$ smbusb_scan -a
------------------------------------
             smbusb_scan  
------------------------------------
SMBusb Firmware Version: 1.0.0
Scanning for addresses..
Skipping: a0 a1
------------------------------------
[0] ACK
[16] ACK
[17] ACK

Which allows scanning for available devices on the bus and analyzing the command set they expose:

$ smbusb_scan -w 0x16
  * snip *
[2f] ACK, Byte writable, Word writable, Block writable 
[30] ACK 
[31] ACK 
[32] ACK 
[33] ACK 
[35] ACK, Byte writable, Word writable 
[37] ACK 
[38] ACK 
[39] ACK, Byte writable, Word writable 
  * snip *

and

$ smbusb_sbsreport
SMBusb Firmware Version: 1.0.0
-------------------------------------------------
Manufacturer Name:          LGC
Device Name:                LNV-42T4911
Device Chemistry:           LION
Serial Number:              41291
Manufacture Date:           2010.01.25

Manufacturer Access:        6001
Remaining Capacity Alarm:   561 mAh(/10mWh)
Remaining Time Alarm:       10 min
Battery Mode:               e000
At Rate:                    0 mAh(/10mWh)
At Rate Time To Full:       65535 min
At Rate Time To Empty:      65535 min
At Rate OK:                 1
Temperature:                23.05 degC
Voltage:                    21 mV (*)
Current:                    0 mA
Average Current:            0 mA
Max Error:                  0 %
Relative State Of Charge    0 %
Absolute State Of Charge    0 %
Remaining Capacity:         0 mAh(/10mWh)
Full Charge Capacity:       5616 mAh(/10mWh)
Run Time To Empty:          65535 min
Average Time To Empty:      65535 min
Average Time To Full:       65535 min
Charging Current:           0 mA
Charging Voltage:           0 mV
Cycle Count:                529
Manufacturer Data:          fffffff7


(*):  This controller is just a bare board with no cells connected.
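The values in a report like this come from raw 16-bit SBS words. Below is a minimal sketch of three of the conversions, assuming the word layouts from the Smart Battery Data Specification (Temperature in 0.1 K, ManufactureDate packed as year/month/day bit fields, BatteryMode flags in the top bits); double-check them against the spec. The raw inputs are back-computed from the report above rather than read from the pack, and the function names are made up.

```python
def decode_temperature(raw):
    """Temperature word: units of 0.1 K, returned in degrees Celsius."""
    return round(raw / 10.0 - 273.15, 2)


def decode_manufacture_date(raw):
    """ManufactureDate word: (year - 1980) * 512 + month * 32 + day."""
    year = 1980 + (raw >> 9)
    month = (raw >> 5) & 0x0F
    day = raw & 0x1F
    return f"{year:04d}.{month:02d}.{day:02d}"


def decode_battery_mode(raw):
    """Top three BatteryMode flags (bits 15, 14 and 13)."""
    return {
        "capacity_in_10mWh": bool(raw & 0x8000),    # CAPACITY_MODE
        "charger_mode": bool(raw & 0x4000),         # CHARGER_MODE
        "alarm_mode": bool(raw & 0x2000),           # ALARM_MODE
    }


print(decode_temperature(2962))          # raw value implied by 23.05 degC
print(decode_manufacture_date(0x3C39))   # raw value implied by 2010.01.25
print(decode_battery_mode(0xE000))       # Battery Mode from the report
```

If the CAPACITY_MODE bit means what the spec says, a Battery Mode of e000 would put the capacity fields in units of 10 mWh, which is presumably why the report labels them "mAh(/10mWh)".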

And also a couple of flashing tools for BQ8030 and R2J240 chips. (Note that there are NO passwords or methods included for entering the Boot ROM on already programmed chips)


Bonus: When you don't need the SMBusb (and you don't feel like learning fx2lib to do something interesting with the dev board) you can still use it as a 16-channel logic analyzer with sigrok. (You may need to compile sigrok from scratch with the VID/PID added in, but it'll work.)
And chances are you already have an FX2LP device lying around in your drawer in the form of an Altera ByteBlaster clone or one of the many USB logic analyzers that use it. You only need to access the hardware I2C pins for SMBusb to work, and an easy point to get at those is the onboard EEPROM, which is usually a nice big SO-8.

So where is this project?

Right here: http://www.karosium.com/p/smbusb.html

Sunday, May 1, 2016

Notes on the Gobi2000 in a Thinkpad X100e

The Qualcomm Gobi2000 is an internal Mini PCI Express 3G modem. It was used in a variety of machines including the Lenovo Thinkpad X100e.
It's a bit unintuitive to set up (to say the least) so I decided to write this post. Maybe it'll help someone else out there.

So right off the bat: unless you're a Verizon customer, do NOT just install the driver from Lenovo's site, because it will default to loading Verizon's firmware. See: https://support.lenovo.com/en/en/documents/migr-75433

If you already did, you can uninstall and follow the instructions at the link above, or you can open the file C:\ProgramData\QUALCOMM\QDLService2k\Options2k.txt and edit it.

The text file contains the paths to 3 binary blobs which get loaded into the card's RAM when the driver loads: apps.mbn, amss.mbn and UQCN.mbn.
You just need to change which directory the images are loaded from. \1 is the Verizon firmware. For a full list see the support link above.
Note that to load the generic UMTS firmware you need to load apps.mbn and amss.mbn from \UMTS and UQCN.mbn from \6, because consistency was not the developers' strong suit.
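The directory swap can be sketched in a few lines. This assumes the file simply holds one full image path per line and that the numbered and UMTS folders sit side by side under a common parent; neither is confirmed here, so look at your own Options2k.txt first. The function name and the C:\Images paths used to try it out are hypothetical.

```python
import ntpath  # Windows path rules, usable on any OS


def retarget_to_generic_umts(lines):
    """Point each image path at the generic UMTS firmware folders.

    UQCN.mbn is taken from the "6" folder and the other two images from
    "UMTS", as described above. Lines that are not .mbn paths are kept.
    """
    result = []
    for line in lines:
        path = line.strip()
        name = ntpath.basename(path)
        if not name.lower().endswith(".mbn"):
            result.append(path)
            continue
        new_dir = "6" if name.lower() == "uqcn.mbn" else "UMTS"
        # Drop the current firmware folder (e.g. "1") and swap in the new one.
        parent = ntpath.dirname(ntpath.dirname(path))
        result.append(ntpath.join(parent, new_dir, name))
    return result


example = [r"C:\Images\1\Apps.mbn", r"C:\Images\1\UQCN.mbn"]
for new_path in retarget_to_generic_umts(example):
    print(new_path)
```

Keep a backup of the original file before editing it by hand or by script.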

Also, unless you're on a CDMA network that doesn't use a SIM card and requires an activation procedure instead, do NOT install "Mobile Broadband Activation" either.

DO install "Hotkey Features Integration" (http://support.lenovo.com/en/en/downloads/migr-74261). As far as I could tell it's the only way to enable the WWAN antenna. Fn+F5 brings up the radio power controls. It's also a separate executable, so you can create a shortcut for it if you like. The naming is pretty straightforward; look through the installation folder in Program Files.

A note on all the abbreviations you can come across while setting up a 3G connection:
UMTS is the name of the whole 3G network technology, WCDMA is the name of the radio access layer used by UMTS networks. From a setup perspective they're pretty much interchangeable. HSDPA or HSUPA are high speed data services offered on a UMTS network. HSDPA offers faster download and HSUPA offers faster upload speeds. Together they are often referred to as HSPA. HSPA+ and Advanced HSPA+ are further iterations of this technology that offer even faster speeds.
And of course GPRS and EDGE are considerably slower data services running on a 2G GSM network.

Unlike Huawei or ZTE modems the Gobi2000 doesn't have a straightforward way to set which network and data service it should prefer. The automatic selection will be fine in most cases but sometimes you just want to force the setting. Like let's say the 3G network's signal is really weak and flaky but it's still faster than the 2G network that has good signal.
Apparently there is a method with the slow and clumsy Lenovo Access Connections tool but from what I can tell the setting doesn't persist.

I'm fairly certain the Lenovo tool just issues a manual network selection AT command. You can do that yourself (you can even add it as a pre-connection command to the Windows mobile broadband connection; it's there on one of the properties pages).
You can tell the modem to connect to a network by name, but it's a better idea to use the network ID because carriers will sometimes name their 2G and 3G networks differently.

AT+COPS=1,2,"#####",0 (where ##### is the network id) will connect to the 2G network of the carrier

AT+COPS=1,2,"#####",2 will connect to the 3G network

To get the network ID you can issue AT+COPS=? to do a network scan (it will take a while). The 5-digit numbers in the list (6 digits in some countries) are the network IDs.
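Picking the IDs out of the scan reply and building the selection commands can be scripted. A minimal sketch, assuming the reply follows the usual 3GPP layout of (status, long name, short name, numeric ID, access technology) groups; the sample reply is invented (00101 is a test network ID) and the function names are hypothetical.

```python
import re

# One "(stat,"long","short","numeric",act)" group from an AT+COPS=? reply.
_ENTRY = re.compile(r'\((\d),"([^"]*)","([^"]*)","(\d+)"(?:,(\d))?\)')

# Invented example reply; a real modem lists whatever it can see.
SAMPLE_REPLY = (
    '+COPS: (2,"Example Net","EXNET","00101",2),'
    '(1,"Example Net","EXNET","00101",0),,(0,1,2,3,4),(0,1,2)'
)


def parse_cops_scan(reply):
    """Return the networks listed in an AT+COPS=? reply."""
    networks = []
    for stat, long_name, _short, network_id, act in _ENTRY.findall(reply):
        networks.append({
            "stat": int(stat),
            "name": long_name,
            "id": network_id,
            "act": int(act) if act else None,  # 0 = 2G, 2 = 3G
        })
    return networks


def select_command(network_id, act):
    """Build the manual selection command for a network ID and technology."""
    return f'AT+COPS=1,2,"{network_id}",{act}'


for net in parse_cops_scan(SAMPLE_REPLY):
    print(net["name"], net["id"], select_command(net["id"], net["act"]))
```

The same carrier typically shows up once per technology, so the last field is what tells the 2G and 3G entries apart.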

You can query the current operating mode with AT$QCSYSMODE?

On the 2G network I got simply GSM as the reply
On the 3G network WCDMA + HSDPA + HSUPA (which Windows listed as HSPA in the network list)
On another 3G network simply WCDMA (which Windows listed as UMTS)

You can also query the current network with AT+COPS?
You'll get something like +COPS: 1,0,"Network-Name",2
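That reply is easy to take apart if you want to check the result from a script. A small sketch, assuming the fields are mode, name format, operator and access technology in that order, with 0 meaning 2G and 2 meaning 3G as in the selection commands above; the function name is made up.

```python
import re

_COPS_REPLY = re.compile(r'\+COPS:\s*(\d)(?:,(\d),"([^"]*)"(?:,(\d))?)?')


def _to_int(text):
    return int(text) if text is not None else None


def parse_cops_query(reply):
    """Split an AT+COPS? reply into its fields, or return None if it doesn't match."""
    match = _COPS_REPLY.match(reply.strip())
    if match is None:
        return None
    mode, name_format, operator, act = match.groups()
    return {
        "mode": int(mode),              # 1 = manual selection
        "format": _to_int(name_format),
        "operator": operator,
        "act": _to_int(act),            # 0 = 2G, 2 = 3G
    }


print(parse_cops_query('+COPS: 1,0,"Network-Name",2'))
```

A reply of ERROR (no SIM, radio off) simply returns None here.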

Note that the modem will return ERROR for most commands if it doesn't see a SIM card or if the radio is powered off.


UPDATE: It turns out that you have to use the terrible, slow, memory-hog Lenovo Access Connections tool anyway. The built-in Windows connection manager just gives a nondescript connection failed error every time :-(

Monday, April 18, 2016

[Random Teardowns] A couple of laptop batteries (aftermarket vs original)

Device: Original Lenovo 42T5225 (ThinkPad R61) and Chinese clone of 42T4788 (ThinkPad X100e)
Origin: Various
Reason for teardown: Broken
Impressions: I've recently opened up a few laptop batteries and decided to take some pictures. Keep reading if you've ever wondered just how different the Chinese aftermarket batteries are compared to the originals.


The original battery controller PCB is conformally coated while the clone is bare. Fair enough.



The original is based on the Mitsubishi/Renesas M37512 microcontroller (datasheet)


The clone on the other hand is based on the SINO WEALTH SH79F32 (datasheet)

Nothing terrible so far, right?

Well, if you take another look at the image with the original microcontroller, there are two things worth mentioning on there. That component with the two long leads to the right of the micro is an externally triggerable fuse. It can blow during an overcurrent condition like any regular fuse OR the micro can trigger it in case of another hazard condition like, say, the cells overheating. Which brings us to the small flex cable temperature sensor stuck to the cell on the left side.

Now let's see what the clone has in terms of overcurrent or overtemp protection.


Oh...

Zero for two.


So the most the micro could do is (attempt to) shut the FETs off in case of an issue. (If it even noticed it before the cells set themselves ablaze.)

Lastly, let's take a look at the cells themselves.

The original has Japanese made Panasonic cells
The clone has Chinese made ChangJiang (CJ) cells
No surprises here. And it should be pointed out that these aren't necessarily bad or more hazardous than any of the high-end cells. If anything they're just unlikely to handle as many cycles before capacity loss renders them unusable. That said, a fuse and temperature monitoring still wouldn't hurt.