Almost 200 pages of source code and data make up the program that runs late-'80s GM fuel-injected engines. Some 15 years ago, DIYers retrieved the machine code, disassembled it, and reverse engineered both the code and its hundreds of parameters.
Complete ASDZ Source for 1227747 (pdf)
Inside a 1227747 GM ECM
Only recently has my 1986 Jeep Grand Wagoneer benefited from this technology, when I retrofitted it with an '80s GM 1227747 ECM, throttle body, and all the fixin's. While I use nice GUI software to tune the parameters, I wanted to peer into the code that executes on a 1 MHz Motorola 6800-family CPU with a paucity of memory.
On a whim I decided to write a disassembler in Perl with all the features that were lacking from other offerings. Plus, writing one is a great way to learn about a CPU.
Source code repository
CPUs in the 6800 family are simple. They have two accumulator registers (A, B), one index register (X), a stack pointer (SP), and a program counter (PC). A status register, the condition code register (CCR), stores flags describing the result of the last computation: carry, zero, overflow, and so on. The MOS 6502 (Apple //e) and the related MOS 6510 (Commodore 64) are even simpler, designed as a low-cost alternative by some of the same engineers.
MC6800 Assembly
Here's some assembly language from the GM ECM.
    LDX  #$D4D9   ; O2 SENS VOLTAGE BIAS FOR COLD OP'S TBL
    LDAA $00E3    ; START UP COOL
    COMA
    JSR  $FB36    ; 2d LK UP, WITH UPPER LIMIT
    STAA $0055    ; SAVE o2 SENSOR VOLTAGE BIAS RESULT
In the example above,
- The mnemonic LDX loads its operand into the index register X. The # prefix marks the operand as a literal value, here 0xD4D9.
- Then LDAA loads the A accumulator with the value stored at memory location 0x00E3.
- COMA performs a one's complement (it inverts every bit) on register A.
- JSR jumps to the subroutine at location 0xFB36 in memory. After it returns, STAA stores the result, returned in the A register, to memory location 0x0055.
Mnemonics
Assembly mnemonics represent simple instructions for the CPU to carry out. Operands like #$D4D9, $00E3, $FB36, and $0055 are addresses or values used by the instruction. The way an operand is written indicates the addressing mode of the instruction.
Addressing Modes
The 6800 family instructions can have Immediate Mode (value) operands like #$D4D9, Direct Mode operands referring to the first 256 bytes in memory like $0055, Extended Mode operands that can reference any memory location such as $FB36, and Indexed Mode operands which use the X register as a memory pointer, indicated by ",X" in assembly language. There's also Relative Mode, used for branch instructions, where the operand specifies a relative offset, positive or negative, to jump to. Inherent Mode has no operand; the registers involved are implied, as in ABX, add B to X.
Writing a Disassembler
Assembling
When assembly language is assembled into machine code, the mnemonic and addressing mode determine the opcode to use. A given mnemonic has an associated opcode for each addressing mode. For example, ADDA is a mnemonic which can have immediate, direct, indexed, and extended addressing modes:
- ADDA #$04 becomes opcode 8B, operand 04
- ADDA $0004 becomes opcode 9B, operand 04
- ADDA $04,X becomes opcode AB, operand 04
- ADDA $4004 becomes opcode BB, operands 40 04
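The mnemonic-plus-mode lookup can be sketched as a small table. This is only an illustration in Python (the disassembler described here is written in Perl and works in the opposite direction); the names OPCODES, OPERAND_BYTES, and assemble are hypothetical, and only the four ADDA encodings listed above are included.

```python
# Hypothetical sketch: map (mnemonic, addressing mode) to an opcode byte.
# Only the four ADDA encodings from the list above are included.
OPCODES = {
    ("ADDA", "immediate"): 0x8B,
    ("ADDA", "direct"): 0x9B,
    ("ADDA", "indexed"): 0xAB,
    ("ADDA", "extended"): 0xBB,
}

# Number of operand bytes that follow the ADDA opcode in each mode.
OPERAND_BYTES = {"immediate": 1, "direct": 1, "indexed": 1, "extended": 2}


def assemble(mnemonic, mode, value):
    """Return the machine-code bytes for one instruction."""
    opcode = OPCODES[(mnemonic, mode)]
    size = OPERAND_BYTES[mode]
    # The 6800 family stores 16-bit operands high byte first (big-endian).
    return bytes([opcode]) + value.to_bytes(size, "big")
```

Calling assemble("ADDA", "extended", 0x4004) yields the three bytes BB 40 04, matching the last entry in the list.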
Disassembly
Disassembly is the process of converting machine code into assembly language. With the 6800 series, any given opcode may be followed by up to 2 additional bytes of operands. A 6801/3 instruction set reference (pdf) provides information on how many bytes follow each opcode and what addressing mode is involved.
So when disassembling 8B 04, one translates 8B to the mnemonic ADDA, then expects a single byte afterwards, which is formatted as #$nn, in this case #$04. To simplify this process and keep the code simple, I use a hashtable generated from a configuration file that lists all the opcodes and their mnemonics, number of bytes, and addressing mode. Here's an excerpt:
# Accumulator and Memory operations
# Add to A
Formatting operand output is based entirely on the number of bytes and the addressing mode. For example, 8-bit immediate mode format is #$NN and 16-bit is #$NNNN.
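A table-driven decoder along these lines might look like the following. It is a rough Python sketch, not the Perl implementation: the in-memory table layout, the function names, and the handful of opcodes chosen are assumptions for illustration, and the real configuration file format is not reproduced here.

```python
# Hypothetical opcode table: opcode -> (mnemonic, operand bytes, mode).
# A real table would be loaded from a configuration file and cover
# the whole instruction set; this one holds only a few entries.
OPCODE_TABLE = {
    0x43: ("COMA", 0, "inherent"),
    0x8B: ("ADDA", 1, "immediate"),
    0x9B: ("ADDA", 1, "direct"),
    0xAB: ("ADDA", 1, "indexed"),
    0xBB: ("ADDA", 2, "extended"),
    0xCE: ("LDX", 2, "immediate"),
}


def format_operand(mode, nbytes, value):
    """Format an operand using only its addressing mode and byte count."""
    if mode == "inherent":
        return ""
    digits = nbytes * 2  # two hex digits per operand byte
    text = f"${value:0{digits}X}"
    if mode == "immediate":
        return "#" + text
    if mode == "indexed":
        return text + ",X"
    return text  # direct and extended


def disassemble(code):
    """Yield (offset, assembly text) for each instruction in code."""
    pos = 0
    while pos < len(code):
        # An opcode missing from the table raises KeyError in this sketch.
        mnemonic, nbytes, mode = OPCODE_TABLE[code[pos]]
        value = int.from_bytes(code[pos + 1:pos + 1 + nbytes], "big")
        operand = format_operand(mode, nbytes, value)
        yield pos, f"{mnemonic} {operand}".rstrip()
        pos += 1 + nbytes
```

Because the byte count comes from the table, the loop never needs per-instruction logic to know where the next opcode starts.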
Relative Addressing
A little extra challenge is provided by the relative addressing mode used by branch instructions. The operand is a signed 8-bit offset, positive or negative, measured from the address of the instruction that follows the branch. Since a branch occupies two bytes (opcode plus operand), that is the branch's own address plus 2. The variable $a1 is read in as a signed value, then the absolute address is calculated:
my $calcAddr = $addr + 2 + $a1;
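The same calculation, including the conversion of the raw operand byte to a signed value, can be sketched in Python. The function name branch_target is hypothetical, and the sketch assumes addr is the address of the branch opcode, as the + 2 above implies.

```python
def branch_target(addr, operand_byte):
    """Absolute target of a 2-byte relative branch whose opcode is at addr."""
    # Reinterpret the raw byte (0..255) as a signed value (-128..127).
    offset = operand_byte - 0x100 if operand_byte & 0x80 else operand_byte
    # The offset counts from the instruction after the branch (addr + 2);
    # mask the result to the 16-bit address space.
    return (addr + 2 + offset) & 0xFFFF
```

For example, an operand of $FE (that is, -2) makes a branch jump to itself, since the offset cancels the two bytes of the branch instruction.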
Labels
I wanted to automatically generate labels, which are assembly language conveniences that identify addresses and can be used by jump and branch instructions, making code much easier to read and maintain. Here's an example from rosettacode.org:
    outeee   =   $e1d1      ROM: console putchar routine
            .or  $0f00
    ;-----------------------------------------------------;
    main    ldx  #string    Point to the string
            bra  puts       and print it
    outs    jsr  outeee     Emit a as ascii
            inx             Advance the string pointer
    puts    ldaa ,x         Load a string character
            bne  outs       Print it if non-null
            bra  main       else restart
    ;=====================================================;
    string  .as  "SPAM",#13,#10,#0
            .en
Without labels, bra puts would be replaced by a relative value. Same with bne outs and bra main. Likewise, jsr outeee would be replaced with jsr $e1d1 which is harder to understand.
To generate labels, I disassembled the code in two passes. The first pass identifies address references. The second pass converts opcodes and operands into assembly, while replacing raw address references ($e1d1) with labels (LE1D1), and prefixing referenced addresses with labels. Like this:
_Ld62c: ldab D10
As you probably noticed, I also converted direct mode references to labels of the form DNN.
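The two-pass idea can be sketched as follows. This is a simplified Python illustration, not the Perl code: it assumes the instructions have already been decoded into hypothetical (address, mnemonic, mode, operand) tuples, with relative branch operands already converted to absolute targets, and the label spelling is only an approximation of the forms shown above.

```python
# Addressing modes whose operand is a memory address worth labelling.
ADDRESS_MODES = ("direct", "extended", "relative")


def label_for(address):
    """Generic label: DNN for the first 256 bytes, LNNNN elsewhere."""
    return f"D{address:02X}" if address < 0x100 else f"L{address:04X}"


def collect_labels(instructions):
    """Pass 1: record every address that some instruction refers to."""
    return {
        operand: label_for(operand)
        for _addr, _mnemonic, mode, operand in instructions
        if mode in ADDRESS_MODES
    }


def render(instructions, labels):
    """Pass 2: emit assembly, substituting labels for raw addresses."""
    lines = []
    for addr, mnemonic, mode, operand in instructions:
        # Prefix the line with a label only if it is a referenced address.
        prefix = labels[addr] + ":" if addr in labels else ""
        if mode in ADDRESS_MODES:
            text = f"{mnemonic} {labels[operand]}"
        elif mode == "immediate":
            text = f"{mnemonic} #${operand:02X}"
        elif mode == "indexed":
            text = f"{mnemonic} ${operand:02X},X"
        else:  # inherent: no operand
            text = mnemonic
        lines.append(f"{prefix:<8}{text}")
    return lines
```

The first pass has to finish before any output is written, because a branch may refer to an address that appears later in the listing.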
Variables
The first several hundred bytes of the GM ECM binary file are filled with parameters. Rather than using generic labels, my disassembler reads a list of addresses and label (variable) names. Here's an excerpt of the config file:
d290,TPS_FiltCoef1,FILT COEF TPS
d291,TPS_FiltCoef2,FILT COEF TPS
d292,DiffTPSforPE,DIFF TPS REQ FOR PWR ENRICH WHILE IN PE
d293,IAC_BPW,usec ADDED TO BPW WHILE IAC IS OPENING
So, for example, any reference to $D293 is converted to IAC_BPW, the Idle Air Control Base Pulse Width. I did this for all the addresses referenced in the bin itself. It makes the code more readable.
ldaa IAC_BPW ; ADDED TO BPW WHILE IAC IS OPENING
_Ld79f: adda D52 ; BPW,LSB
bcs _Ld7a7 ; BR IF NO OVERFLOW
adda D53 ; BPW,MSB
bcc _Ld7a9 ; BR IF NO OVERFLOW
There are commented disassembly files out there. Adding those comments (as above) to my disassembly using descriptive variable names should reduce the time it takes me to understand the code.
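Reading such a list and preferring its names over generic labels could look roughly like this. The sketch is in Python rather than Perl, the function names load_symbols and symbol_for are hypothetical, and the generic fallback label format is an assumption; the sample lines are the four config entries shown above.

```python
CONFIG = """\
d290,TPS_FiltCoef1,FILT COEF TPS
d291,TPS_FiltCoef2,FILT COEF TPS
d292,DiffTPSforPE,DIFF TPS REQ FOR PWR ENRICH WHILE IN PE
d293,IAC_BPW,usec ADDED TO BPW WHILE IAC IS OPENING
"""


def load_symbols(text):
    """Parse 'hexaddress,name,comment' lines into {address: (name, comment)}."""
    symbols = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first two commas only, so a comment may contain commas.
        fields = line.split(",", 2)
        address = int(fields[0], 16)
        name = fields[1]
        comment = fields[2] if len(fields) > 2 else ""
        symbols[address] = (name, comment)
    return symbols


def symbol_for(address, symbols):
    """Return the configured variable name, or a generic label if none exists."""
    if address in symbols:
        return symbols[address][0]
    return f"L{address:04X}"
```

With the sample above loaded, symbol_for(0xD293, load_symbols(CONFIG)) returns IAC_BPW, while an address missing from the list falls back to a generic label.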
Nice work! The venerable 6800 and its brethren found their place in automotive applications, where the simple but flexible instruction set allowed it to do almost anything fairly well, as long as it didn't involve super high speeds. I found one the other day in a Buick LeSabre power seat controller, during a water damage diagnosis. I am aware that GM made frequent use of these processors in the 80s and 90s, although the personal computer market leaned much more heavily to the similar 6502, and the fancier Z-80 and 8086.
I am honored that you chose an excerpt from my rosettacode entry to illustrate your desire to implement label generation in your dis-assembler.
Thanks for the kind words and the interesting info. I grew up on a C64 so have some familiarity with the derivative MOS 6510. I was quite glad to find your excellent rosettacode.org site and had to share it. :)
Rosettacode.org is most definitely NOT mine ... I'm just a junior member who likes to tinker around in a few of the easier tasks. There are some heavy hitters over there sharing some very complex and powerful pieces of code ... _way_ out of my league!
I _am_ proud of one of my ascii art designs. It is also 100% valid c++ code, which can be copied and pasted to http://codepad.org/ to put it through its paces:
A thoroughly enjoyable waste of about nine hours of my life!