Thursday, May 9, 2013

Reverse Engineering GM ECMs

This is the tale of how I wrote a Perl disassembler for the Motorola 6801 CPU to learn how the fuel injection computer that I installed in my Jeep works.

Almost 200 pages of source code and data make up the program that runs late-'80s GM fuel-injected engines. About 15 years ago, DIYers extracted the machine code, disassembled it, and reverse engineered the code and hundreds of parameters.

Complete ASDZ Source for 1227747 (pdf)

Inside a 1227747 GM ECM
Fuel injection computers interface with fuel pump relays, fuel injectors, ignition modules, coolant, throttle, oxygen, and pressure sensors. They employ complex algorithms to properly meter fuel under myriad engine and environmental conditions.

Only recently has my 1986 Jeep Grand Wagoneer benefited from this technology, when I retrofitted an '80s GM 1227747 ECM, throttle body, and all the fixin's. While I use nice GUI software to tune the parameters, I wanted to peer into the code that executes on a 1 MHz Motorola 6800-family CPU with a paucity of memory.

On a whim I decided to write a disassembler in Perl with all the features that were lacking from other offerings. Plus, writing one is a great way to learn about a CPU.

Source code repository

CPUs in the 6800 family are simple. They have two accumulator registers (A, B) and one index register (X), as well as a stack pointer (SP) and a program counter (PC). A status register, or condition code register (CCR), stores the status results of computation: carry, zero, overflow, and so on. The MOS 6502 (Apple //e) and related MOS 6510 (Commodore 64) are even simpler, designed as a low-cost alternative by some of the same engineers.

MC6800 Assembly

Here's some assembly language from the GM ECM.
LDX #$D4D9 ; O2 SENS VOLTAGE BIAS FOR COLD OP'S TBL
LDAA $00E3 ; START UP COOL
COMA
JSR $FB36 ; 2d LK UP, WITH UPPER LIMIT
STAA $0055 ; SAVE o2 SENSOR VOLTAGE BIAS RESULT

In the example above,
  • The mnemonic LDX loads the operand, the value 0xD4D9, indicated by the # prefix, into the index register X.
  • Then, mnemonic LDAA loads the A accumulator with a value stored in memory location 0x00E3. 
  • COMA performs a one's complement on register A, inverting every bit.
  • JSR jumps to the subroutine at location 0xFB36 in memory. After it returns, STAA stores the result, which the subroutine leaves in the A register, to memory location 0x0055.

Mnemonics 

Assembly mnemonics represent simple instructions for the CPU to carry out. Operands like #$D4D9, $00E3, $FB36, and $0055 are addresses or values used by the instruction. The way an operand is written indicates the addressing mode of the instruction.

Addressing Modes

The 6800 family instructions can have Immediate Mode (value) operands like #$D4D9, Direct Mode operands referring to the first 256 bytes of memory like $0055, Extended Mode operands that can reference any memory location such as $FB36, and Indexed Mode operands, which use the X register as a memory pointer and are indicated by ",X" in assembly language. There's also Relative Mode, used by branch instructions: the operand is a signed offset, positive or negative, to the branch target. Inherent Mode has no operand; the registers involved are implied, as in ABX (add B to X).

Writing a Disassembler

Assembling

When assembly language is assembled into machine code, the mnemonic and addressing mode determine the opcode to use. A given mnemonic has an associated opcode for each addressing mode. For example, ADDA is a mnemonic which can have immediate, direct, index, and extended addressing modes:
  • ADDA #$04 becomes opcode 8B, operand 04
  • ADDA $0004 becomes opcode 9B, operand 04
  • ADDA $04,X becomes opcode AB, operand 04
  • ADDA $4004 becomes opcode BB, operands 40 04
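
The mapping in the list above can be sketched as a nested Perl hash keyed by mnemonic and addressing mode. This is only an illustration: the names %OPCODE_FOR and encode are hypothetical and not taken from the disassembler's source, and the table holds just the four ADDA encodings listed above.

```perl
use strict;
use warnings;

# Hypothetical table: mnemonic -> addressing mode -> opcode byte.
# Only the four ADDA encodings listed above are included.
my %OPCODE_FOR = (
    adda => { immed => 0x8B, direct => 0x9B, index => 0xAB, extend => 0xBB },
);

# Return the machine-code bytes for one instruction as a hex string.
# $operand is numeric; extended mode emits it high byte first.
sub encode {
    my ($mnemonic, $mode, $operand) = @_;
    my $opcode = $OPCODE_FOR{lc $mnemonic}{$mode};
    die "no $mode form of $mnemonic in table\n" unless defined $opcode;
    my @bytes = $mode eq 'extend'
        ? ($opcode, ($operand >> 8) & 0xFF, $operand & 0xFF)
        : ($opcode, $operand & 0xFF);
    return join ' ', map { sprintf '%02X', $_ } @bytes;
}

print encode('ADDA', 'immed',  0x04),   "\n";   # 8B 04
print encode('ADDA', 'extend', 0x4004), "\n";   # BB 40 04
```

A disassembler needs the same table turned around, from opcode back to mnemonic and mode.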

Disassembly 

Disassembly is the process of converting machine code into assembly language. With the 6800 series, any given opcode may have up to 2 additional bytes for operands. A 6801/3 instruction set reference (pdf) provides information on how many bytes per opcode and what addressing mode is involved.

So when disassembling 8B 04, one translates 8B to the mnemonic ADDA, then expects a single operand byte afterwards, which is formatted as #$nn, in this case #$04. To keep the code simple, I use a hash table generated from a configuration file that lists each opcode with its mnemonic, number of bytes, and addressing mode. Here's an excerpt:

###################################
# Accumulator and Memory operations
###################################

# Add to A
8B,2,adda,immed
9B,2,adda,direct
AB,2,adda,index
BB,3,adda,extend


Formatting operand output is based entirely on the number of bytes and the addressing mode. For example, 8-bit immediate mode format is #$NN and 16-bit is #$NNNN.
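
Here is a minimal sketch of that table-driven approach, assuming the second field of each table line is the total instruction length in bytes (opcode included). The sub names load_table, format_operand, and disassemble are hypothetical, and the choice to print direct-mode operands as four digits simply follows the $0004 and $0055 notation used earlier in this post.

```perl
use strict;
use warnings;

# Excerpt of the opcode table in the format shown above.
my $TABLE = <<'END';
# Add to A
8B,2,adda,immed
9B,2,adda,direct
AB,2,adda,index
BB,3,adda,extend
END

# Build opcode => { len, mnemonic, mode }, skipping comments and blanks.
sub load_table {
    my ($text) = @_;
    my %table;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*#/ or $line !~ /\S/;
        my ($op, $len, $mnemonic, $mode) = split /,/, $line;
        $table{ hex $op } = { len => $len, mnemonic => $mnemonic, mode => $mode };
    }
    return \%table;
}

# Format an operand from its addressing mode and size in bytes.
sub format_operand {
    my ($mode, $value, $nbytes) = @_;
    return sprintf('#$%0' . (2 * $nbytes) . 'X', $value) if $mode eq 'immed';
    return sprintf('$%02X,X', $value)                     if $mode eq 'index';
    return sprintf('$%04X', $value);    # direct and extend
}

# Decode a list of byte values into lines of assembly text.
# Assumes the input is not truncated in the middle of an instruction.
sub disassemble {
    my ($table, @bytes) = @_;
    my @out;
    my $i = 0;
    while ($i < @bytes) {
        my $entry = $table->{ $bytes[$i] }
            or die sprintf "unknown opcode %02X\n", $bytes[$i];
        my $nbytes = $entry->{len} - 1;          # operand bytes after the opcode
        my $value  = 0;
        $value = ($value << 8) | $bytes[$i + $_] for 1 .. $nbytes;
        push @out, sprintf '%-5s %s', $entry->{mnemonic},
            format_operand($entry->{mode}, $value, $nbytes);
        $i += $entry->{len};
    }
    return @out;
}

print "$_\n" for disassemble(load_table($TABLE), 0x8B, 0x04, 0xBB, 0x40, 0x04);
```

Keeping the instruction set in data rather than code means a new opcode is one more line in the table, with no change to the decoding loop.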

Relative Addressing

A little extra challenge is provided by the relative addressing mode used by branch instructions. The operand is a signed 8-bit offset, counted in bytes forward or backward from the address of the next instruction, which for a two-byte branch is the opcode address plus 2. The variable $a1 is read in as a signed value, then the absolute address is calculated from $addr, the address of the branch opcode:

my $calcAddr = $addr + 2 + $a1;
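
For illustration, the sign conversion and the address calculation can be sketched as two small subs. The names signed8 and branch_target are hypothetical, the wrap to 16 bits is an added safeguard rather than something the post describes, and the addresses in the usage lines are made up.

```perl
use strict;
use warnings;

# Interpret one byte (0..255) as a signed two's-complement offset (-128..127).
sub signed8 {
    my ($byte) = @_;
    return $byte >= 0x80 ? $byte - 0x100 : $byte;
}

# Absolute target of a two-byte branch whose opcode sits at $addr.
# The offset counts from the address of the next instruction ($addr + 2);
# the result is wrapped to the 16-bit address space.
sub branch_target {
    my ($addr, $offset_byte) = @_;
    return ($addr + 2 + signed8($offset_byte)) & 0xFFFF;
}

printf "%04X\n", branch_target(0xD620, 0x0A);   # D62C, a forward branch
printf "%04X\n", branch_target(0xD620, 0xFE);   # D620, a branch to itself
```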

Labels

I wanted to automatically generate labels. Labels are assembly language conveniences that name addresses; jump and branch instructions can refer to them, which makes code much easier to read and maintain. Here's an example from rosettacode.org:

outeee   =   $e1d1      ROM: console putchar routine
        .or  $0f00
;-----------------------------------------------------;
main    ldx  #string    Point to the string
        bra  puts         and print it
outs    jsr  outeee     Emit a as ascii
        inx             Advance the string pointer
puts    ldaa ,x         Load a string character
        bne  outs       Print it if non-null
        bra  main       else restart
;=====================================================;
string  .as  "SPAM",#13,#10,#0
        .en

Without labels, bra puts would be replaced by a relative value. Same with bne outs and bra main.  Likewise, jsr outeee would be replaced with jsr $e1d1 which is harder to understand.

To generate labels, I disassembled the code in two passes. The first pass identifies address references. The second pass converts opcodes and operands into assembly, replacing raw address references (such as $d62c) with labels (_Ld62c) and prefixing the referenced addresses with those labels. Like this:

        bmi   _Ld62c
        ldab  D05
        andb  #$80
        aba  
_Ld62c: ldab  D10


As you probably noticed, I also converted direct mode references to labels of the form DNN.
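
The two passes can be sketched as follows, starting from instructions that have already been decoded so the example stays short. Everything here is an assumption for illustration: the @code structure, the sub names label_for and render, and every address except the $d62c target are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical pre-decoded instructions: address, mnemonic, and either a
# code address the instruction refers to (target) or a literal operand.
my @code = (
    { addr => 0xD625, mnemonic => 'bmi',  target  => 0xD62C },
    { addr => 0xD627, mnemonic => 'ldab', operand => 'D05'  },
    { addr => 0xD629, mnemonic => 'andb', operand => '#$80' },
    { addr => 0xD62B, mnemonic => 'aba' },
    { addr => 0xD62C, mnemonic => 'ldab', operand => 'D10'  },
);

sub label_for { return sprintf '_L%04x', $_[0] }

sub render {
    my @code = @_;

    # Pass 1: record every address used as a branch or jump target.
    my %is_target;
    for my $ins (@code) {
        $is_target{ $ins->{target} } = 1 if defined $ins->{target};
    }

    # Pass 2: emit text, labelling referenced lines and replacing raw
    # target addresses with the matching label.
    my @lines;
    for my $ins (@code) {
        my $label   = $is_target{ $ins->{addr} } ? label_for($ins->{addr}) . ':' : '';
        my $operand = defined $ins->{target}  ? label_for($ins->{target})
                    : defined $ins->{operand} ? $ins->{operand}
                    :                           '';
        my $line = sprintf '%-7s %-5s %s', $label, $ins->{mnemonic}, $operand;
        $line =~ s/\s+$//;    # drop trailing padding
        push @lines, $line;
    }
    return @lines;
}

print "$_\n" for render(@code);
```

The first pass has to finish before any text is emitted, because a backward branch near the end of the code can refer to an address that would otherwise already have been printed without its label.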

Variables

The first several hundred bytes of the GM ECM binary file are filled with parameters. Rather than using generic labels for them, my disassembler reads a list of addresses and label (variable) names. Here's an excerpt of the config file:

d290,TPS_FiltCoef1,FILT COEF TPS
d291,TPS_FiltCoef2,FILT COEF TPS
d292,DiffTPSforPE,DIFF TPS REQ FOR PWR ENRICH WHILE IN PE
d293,IAC_BPW,usec ADDED TO BPW WHILE IAC IS OPENING

So, for example, any reference to $D293 is converted to IAC_BPW, the Idle Air Control Base Pulse Width. I did this for all the addresses referenced in the bin itself. It makes the code more readable.

        ldaa  IAC_BPW ; ADDED TO BPW WHILE IAC IS OPENING
_Ld79f: adda  D52     ; BPW,LSB
        bcs   _Ld7a7  ; BR IF NO OVERFLOW
        adda  D53     ; BPW,MSB
        bcc   _Ld7a9  ; BR IF NO OVERFLOW
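
A minimal sketch of the name lookup, assuming each config line holds an address, a name, and a free-text comment separated by commas. The sub names load_symbols and name_for are hypothetical, as is the fallback to a generic _Lxxxx label for addresses that are not in the list.

```perl
use strict;
use warnings;

# Excerpt of the address list shown above: address, name, comment.
my $SYMBOLS = <<'END';
d290,TPS_FiltCoef1,FILT COEF TPS
d291,TPS_FiltCoef2,FILT COEF TPS
d292,DiffTPSforPE,DIFF TPS REQ FOR PWR ENRICH WHILE IN PE
d293,IAC_BPW,usec ADDED TO BPW WHILE IAC IS OPENING
END

# Build address => { name, comment }. The comment may itself contain
# commas, so split into at most three fields.
sub load_symbols {
    my ($text) = @_;
    my %symbols;
    for my $line (split /\n/, $text) {
        next unless $line =~ /\S/;
        my ($addr, $name, $comment) = split /,/, $line, 3;
        $comment = '' unless defined $comment;
        $comment =~ s/\s+$//;
        $symbols{ hex $addr } = { name => $name, comment => $comment };
    }
    return \%symbols;
}

# Prefer a configured name; otherwise fall back to a generic label.
sub name_for {
    my ($symbols, $addr) = @_;
    my $sym = $symbols->{$addr};
    return $sym ? $sym->{name} : sprintf('_L%04x', $addr);
}

my $symbols = load_symbols($SYMBOLS);
print name_for($symbols, 0xD293), "\n";   # IAC_BPW
print name_for($symbols, 0xD79F), "\n";   # _Ld79f
```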


There are commented disassembly files out there. Adding those comments (as above) to my disassembly using descriptive variable names should reduce the time it takes me to understand the code.

3 comments:

  1. Nice work! The venerable 6800 and its brethren found their place in automotive applications, where the simple but flexible instruction set allowed it to do almost anything fairly well, as long as it didn't involve super high speeds. I found one the other day in a Buick LeSabre power seat controller, during a water damage diagnosis. I am aware that GM made frequent use of these processors in the 80s and 90s, although the personal computer market leaned much more heavily to the similar 6502, and the fancier Z-80 and 8086.

    I am honored that you chose an excerpt from my rosettacode entry to illustrate your desire to implement label generation in your dis-assembler.

    1. Thanks for the kind words and the interesting info. I grew up on a C64 so have some familiarity with the derivative MOS 6510. I was quite glad to find your excellent rosettacode.org site and had to share it. :)

    2. Rosettacode.org is most definitely NOT mine ... I'm just a junior member who likes to tinker around in a few of the easier tasks. There are some heavy hitters over there sharing some very complex and powerful pieces of code ... _way_ out of my league!

      I _am_ proud of one of my ascii art designs. It is also 100% valid c++ code, which can be copied and pasted to http://codepad.org/ to put it through its paces:

      http://rosettacode.org/wiki/99_Bottles_of_Beer#Bottled_Version

      A thoroughly enjoyable waste of about nine hours of my life!

      Cheers!
