Code Data Logger (Emulation)


Purpose of the Code Data Logger

The Code/Data Logger makes it much easier to reverse-engineer NES ROMs. The basic idea behind it is that a normal NES disassembler cannot distinguish between code (which is executed) and data (which is read). The Code/Data Logger keeps track of what is executed and what is read while the game is played, and then you can save this information into a .cdl file, which is essentially a mask that tells which bytes in the ROM are code and which are data. The file can be used in conjunction with a suitable disassembler to disassemble only the actual game code, resulting in a much cleaner source code where code and data are properly separated.

Uses for the Code Data Logger

  • Creating a re-assemblable source code file using the cdl file to know exactly what each byte is used for (data or code)

  • Finding unused resources in a ROM file more easily: look for unmapped bytes after completing a whole game (e.g. cutting-room-floor content)

  • The Exodus emulator will properly identify data as pointers and even form them into a table/array!

  • Identify bytes that are being played as a sound

  • Identify bytes that are being used as tilesets/sprites

  • Know which bytes are unused to look for hidden gems, unused game assets

  • Mark which byte is the first byte of an opcode to make more accurate disassembly listings

  • Share files to get multiple people to help map out a rom
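As an illustration of the "unmapped bytes" use case, here is a minimal C++ sketch that scans a CDL mask for runs of bytes that were never logged. It assumes the mask has already been loaded into memory and that a value of 0 means "never accessed"; the function name `findUnloggedRanges` and the `minLength` parameter are hypothetical, not part of any emulator.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Returns half-open [start, end) offsets of runs of mask bytes equal to 0
// (never logged as code or data) that are at least minLength bytes long.
std::vector<std::pair<std::size_t, std::size_t>>
findUnloggedRanges(const std::vector<std::uint8_t>& cdl, std::size_t minLength)
{
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    std::size_t i = 0;
    while (i < cdl.size()) {
        if (cdl[i] != 0) {
            ++i;
            continue;
        }
        const std::size_t start = i;
        while (i < cdl.size() && cdl[i] == 0) {
            ++i;  // extend the run of unlogged bytes
        }
        if (i - start >= minLength) {
            ranges.emplace_back(start, i);
        }
    }
    return ranges;
}
```

A range reported here only means the play session never touched those bytes; it may still be padding or content on a path that was not played.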

CDL versus Trace Logging

  • CDL is different from simple trace logging: with CDL you only care about how the ROM is mapped out and about producing a disassembly of it, whereas a trace logger gives you insight into what the registers and other state contain at the time each instruction is logged. [1]

How Code/Data Logger works

  • The CDL is built up in real time, so you need to play the ROM in an emulator from start to finish to map it out. Exercise as much of the game as possible: dying, alternate paths, 2 players, secret areas, etc.

CDL Tools

Emulators that support CDL creation

Disassemblers that support CDL input

  • IDA Pro has a script which imports the cdl file and uses it to mark code and data bytes.

Some complete cdl files

Format of the Code Data Logger

CDL files are just a mask of the ROM; that is, they are of the same size as the ROM, and each byte represents the corresponding byte of the ROM. The bits of each mask byte record how that ROM byte was accessed:

  • The CDL format needs to be specific for the target system.

    • Details such as the width of each data access would be essential for Genesis: instead of just 1 bit for data, use 2 bits, so you can tell whether the data was accessed as a byte, word, or long.
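To make the 2-bit idea concrete, here is a small C++ sketch of one possible encoding. The bit layout (bit 0 for code, bits 1-2 for access width), the `DataWidth` enum and both helper functions are hypothetical illustrations, not the format of any existing emulator.

```cpp
#include <cstdint>

// Hypothetical layout: bit 0 = code, bits 1-2 = widest data access seen.
enum class DataWidth : std::uint8_t { None = 0, Byte = 1, Word = 2, Long = 3 };

constexpr std::uint8_t kCodeBit = 0x01;
constexpr std::uint8_t kWidthShift = 1;
constexpr std::uint8_t kWidthMask = 0x03 << kWidthShift;  // bits 1-2

// Records a data access in a mask byte, keeping the widest width seen so far.
inline std::uint8_t logDataAccess(std::uint8_t entry, DataWidth width)
{
    const auto previous = static_cast<std::uint8_t>((entry & kWidthMask) >> kWidthShift);
    const auto incoming = static_cast<std::uint8_t>(width);
    const std::uint8_t widest = previous > incoming ? previous : incoming;
    return static_cast<std::uint8_t>((entry & ~kWidthMask) | (widest << kWidthShift));
}

// Extracts the recorded access width from a mask byte.
inline DataWidth dataWidth(std::uint8_t entry)
{
    return static_cast<DataWidth>((entry & kWidthMask) >> kWidthShift);
}
```

Keeping only the widest access is one design choice among several; a format could equally reserve one bit per width so that mixed accesses to the same byte remain visible.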

FCEUX (NES) Format

For PRG ROM (8 bits in a byte, most significant bit first):

Bit | Name | Meaning
7 (0x80) | Unused | Unused bit
6 (0x40) | Audio | The byte was used as audio data
5 (0x20) | Indirect Data | Whether indirectly accessed as data (e.g. as the destination of an LDA ($nn),Y instruction)
4 (0x10) | Indirect Code | Whether indirectly accessed as code (e.g. as the destination of a JMP ($nnnn) instruction)
3-2 (0x0C) | ROM Bank (2 bits) | Into which ROM bank it was mapped when last accessed: 00 = $8000-$9FFF, 01 = $A000-$BFFF, 10 = $C000-$DFFF, 11 = $E000-$FFFF
1 (0x02) | Data | The byte was accessed (read) as data
0 (0x01) | Code | The byte was executed as code
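The PRG layout above can be unpacked with a few masks and shifts. The following C++ sketch assumes the rows are listed from the most significant bit down, so that Code is bit 0 and the unused bit is bit 7; the `PrgCdlEntry` struct and the function names are illustrative only.

```cpp
#include <cstdint>

// One decoded PRG ROM mask byte (field names are illustrative).
struct PrgCdlEntry {
    bool code;           // bit 0: executed as code
    bool data;           // bit 1: read as data
    std::uint8_t bank;   // bits 2-3: bank window when last accessed (0-3)
    bool indirectCode;   // bit 4: reached indirectly as code
    bool indirectData;   // bit 5: reached indirectly as data
    bool audio;          // bit 6: used as audio data
};

inline PrgCdlEntry decodePrg(std::uint8_t b)
{
    PrgCdlEntry e;
    e.code = (b & 0x01) != 0;
    e.data = (b & 0x02) != 0;
    e.bank = static_cast<std::uint8_t>((b >> 2) & 0x03);
    e.indirectCode = (b & 0x10) != 0;
    e.indirectData = (b & 0x20) != 0;
    e.audio = (b & 0x40) != 0;
    return e;
}

// CPU address where the 8 KiB window for this bank value starts:
// 0 -> $8000, 1 -> $A000, 2 -> $C000, 3 -> $E000.
inline std::uint16_t bankBase(const PrgCdlEntry& e)
{
    return static_cast<std::uint16_t>(0x8000 + e.bank * 0x2000);
}
```

For example, a mask byte of 0x0D decodes as code that was last mapped into the $E000-$FFFF window.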

For CHR ROM (most significant bit first):

Bit | Name | Meaning
7-2 | Unused | Unused bits
1 (0x02) | Read | Whether it was read programmatically using port $2007 (e.g. Argus_(J).nes checks if the bankswitching works by reading the same byte of CHR data before and after switching)
0 (0x01) | Drawn | Whether it was drawn on screen (rendered by PPU at runtime)

Bizhawk Format (Multi-system)

The Bizhawk CodeDataLogger supports multiple emulation cores but always follows a fairly common structure:

image alt text

Number in Screenshot | Name of Section | Example Value
1 | File identifier | "BIZHAWK-CDL-2"
2 | Platform name | "Gen", "GB"
3 | Number of blocks (e.g. ROM, WRAM) | 3
4 | Name of block | MD Cart (Megadrive cartridge)
5 | Number of bytes in block | 4MB
6 | Block data | 00, 01, 04, 40

Implementation of a Code Data Logger

  • The emulator creates a large byte array that is equal in size to the ROM.

    • Each byte in the array corresponds to the byte at the same offset in the ROM, so the array is a map of the ROM.

      • Each bit in the byte represents how that data was accessed. For instance, bit 0 = code, bit 1 = data, bit 7 = first byte of opcode.

      • Other bits can represent how the data was accessed: directly, indirectly, indirectly for a jump table, etc. [1]

    • So after loading the ROM, allocate a byte array the same size as the loaded ROM and set all bytes to 0.

      • Every time an opcode is executed, set the code bit for the bytes of that instruction.

      • Every time a memory address is read as data, set the data bit for that byte.

  • Difficulties implementing CDL are:

    • Many games access the same data in different ways, for example using a single long read to fetch two words (such as x and y coordinates). [2]
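The steps above can be sketched as a small C++ class. This is a minimal illustration rather than any emulator's actual code: the class name, the method names and the flag values (0x01 code, 0x02 data, 0x80 first byte of opcode) are assumptions chosen for the example, and the CPU core is assumed to report accesses as ROM offsets.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

class CodeDataLogger {
public:
    // Example flag values only; real formats differ per emulator.
    enum Flag : std::uint8_t { Code = 0x01, Data = 0x02, OpcodeStart = 0x80 };

    // Allocate one mask byte per ROM byte, all initially 0 (never accessed).
    explicit CodeDataLogger(std::size_t romSize) : mask_(romSize, 0) {}

    // Called when the CPU executes an instruction of `length` bytes.
    void logExecute(std::size_t romOffset, std::size_t length)
    {
        for (std::size_t i = 0; i < length && romOffset + i < mask_.size(); ++i) {
            mask_[romOffset + i] |= Code;
        }
        if (length > 0 && romOffset < mask_.size()) {
            mask_[romOffset] |= OpcodeStart;
        }
    }

    // Called when the CPU reads a ROM byte as data.
    void logRead(std::size_t romOffset)
    {
        if (romOffset < mask_.size()) {
            mask_[romOffset] |= Data;
        }
    }

    const std::vector<std::uint8_t>& mask() const { return mask_; }

private:
    std::vector<std::uint8_t> mask_;
};
```

The bits are set with a bitwise OR rather than toggled, so a byte that is executed a thousand times still ends up with the code bit set exactly once.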

Bizhawk CDL Tool

Since Bizhawk is fully open source we can inspect how the CDL was implemented for this multi-emulator system.

CDL.cs (Showing cdl statistics window)

CDL.cs contains the functionality for interacting with the CodeDataLogger window seen in the screenshot below:

image alt text

The list is made up of a number of columns showing the statistics collected in the CDL:

Name | Displayed Information
CDL File @ | Address of the block in the cdl file
Domain | Name of the block
% | Percentage of the block that has been mapped (accessed in some way either as code or data)
Mapped | Number of mapped bytes
Size | Total number of bytes in the block
0x01.. 0x80 | Percentage or number of bytes accessed as a certain type depending on bit (see format table below)

Format of byte access values

Console | 0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80
GameBoy | ExecFirst | ExecOperand | Data | - | - | - | - | -
Megadrive | Exec68k | - | Data68k | ExecZ80First | ExecZ80Operand | DataZ80 | DMASource | -
SNES | ExecFirst | ExecOperand | CPUData | DMAData | - | - | - | BRR
SMS/GG | ExecFirst | ExecOperand | Data | - | - | - | - | -

ExecFirst: The opcode (first byte of an instruction to be executed)

ExecOperand: An operand byte (any byte of the instruction after the first)

Calculating Statistics

In order to calculate the statistics shown in the list, Bizhawk uses the following logic:

  • First loop over every byte in the block and for each byte:

    • Count the number of times this byte value occurs

    • Since each byte can only have a value from 0 to 255, a 256-entry array holding the count for each possible byte value is enough.

  • Next, with the count of each byte value in hand, work out how many bytes have each bit set to 1

    • Create a new array of length 8, where each element is the number of bytes with that bit set.

    • For each possible byte value (0 to 255), use bitwise arithmetic to test each of the 8 bits in turn

    • If a bit is set, add the number of times this byte value appeared to the total for that bit

      • To check if the lowest bit is set: (byteValue & 0x01) != 0

      • To check if the highest bit is set: (byteValue & 0x80) != 0
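The two passes described above can be sketched in C++ as follows. Bizhawk itself is written in C#, so this is a translation of the idea rather than its code; `countBits` and `countMapped` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns, for each bit 0-7, how many bytes in the block have that bit set.
std::array<std::size_t, 8> countBits(const std::vector<std::uint8_t>& block)
{
    // Pass 1: histogram of how often each byte value (0-255) occurs.
    std::array<std::size_t, 256> histogram{};
    for (std::uint8_t value : block) {
        ++histogram[value];
    }

    // Pass 2: for each possible value, credit its count to every bit it has set.
    std::array<std::size_t, 8> totals{};
    for (std::size_t value = 0; value < 256; ++value) {
        for (std::size_t bit = 0; bit < 8; ++bit) {
            if ((value & (std::size_t{1} << bit)) != 0) {
                totals[bit] += histogram[value];
            }
        }
    }
    return totals;
}

// A byte counts as mapped if any access bit is set.
std::size_t countMapped(const std::vector<std::uint8_t>& block)
{
    std::size_t mapped = 0;
    for (std::uint8_t value : block) {
        if (value != 0) {
            ++mapped;
        }
    }
    return mapped;
}
```

The histogram keeps the second pass at a fixed 256 x 8 iterations regardless of block size, which matters when the block is a multi-megabyte cartridge ROM.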

CodeDataLog.cs (.cdl file read/write)

The CodeDataLog class contains logic for saving and importing CDL files.

  • The SaveInternal method on this class is what actually writes the cdl data to a file on disk

    • It iterates over the class as a collection of key/value pairs (kvp), where the key is the block name (e.g. Cartridge ROM, WRAM, etc.)

    • The value is the byte array of this block. Before it is written to the file, the length of the block is written so the file can be easily parsed when read.

  • The Load method on this class is what actually reads in the cdl data from a cdl file.
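As a rough illustration of the layout described above (identifier, platform name, block count, then name, length and data for each block), here is a C++ sketch that serializes blocks into a byte buffer. The encoding details are assumptions: strings are written with a single length byte followed by their characters (which is what .NET's BinaryWriter produces for strings shorter than 128 bytes) and integers as 32-bit little-endian. Any padding or extra fields the real SaveInternal writes are not modelled, so the output should not be treated as byte-compatible with Bizhawk.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Block name -> mask bytes for that block (mirrors the key/value structure).
using Blocks = std::map<std::string, std::vector<std::uint8_t>>;

// Appends a 32-bit value in little-endian byte order.
inline void putInt32(std::vector<std::uint8_t>& out, std::uint32_t v)
{
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<std::uint8_t>(v >> shift));
    }
}

// Appends a length-prefixed string (assumed: one length byte, short strings only).
inline void putString(std::vector<std::uint8_t>& out, const std::string& s)
{
    if (s.size() >= 128) {
        throw std::length_error("sketch only handles strings shorter than 128 bytes");
    }
    out.push_back(static_cast<std::uint8_t>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
}

std::vector<std::uint8_t> serializeCdl(const std::string& platform, const Blocks& blocks)
{
    std::vector<std::uint8_t> out;
    putString(out, "BIZHAWK-CDL-2");                              // file identifier
    putString(out, platform);                                     // e.g. "GB"
    putInt32(out, static_cast<std::uint32_t>(blocks.size()));     // number of blocks
    for (const auto& kvp : blocks) {
        putString(out, kvp.first);                                // block name
        putInt32(out, static_cast<std::uint32_t>(kvp.second.size()));  // block length
        out.insert(out.end(), kvp.second.begin(), kvp.second.end());   // block data
    }
    return out;
}
```

Writing the length before the data is what lets a reader skip or allocate each block without scanning for a terminator.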

CDL.cs, by contrast, handles the user-facing side: opening and saving a cdl file and displaying the statistics in the list view.

View Source for CDL.cs

Gambatte.ICodeDataLog.cs (Gameboy Gambatte in Bizhawk)

  • Gambatte.ICodeDataLog.cs (public partial class Gameboy : ICodeDataLogger) is only useful as an entry point

  • NewCDL - creates the memory blocks for this CDL (for Game Boy it creates ROM, WRAM, CartRAM, HRAM)

  • CDCallbackProc - this is called every time an instruction accesses memory

ICodeDataLog (Key value pair of block name to block array)

LibGambatte.cs

  • CDLog_Flags - the different types of byte access as enum (ExecFirst, ExecOperand, Data)

  • Gambatte_setcdcallback (calls into the native code of gambatte)

Gambatte.cpp (Entry point in the native dll)

The entry point sets the CodeData Logger callback on the cpu object.

void GB::setCDCallback(CDCallback cdc) { p_->cpu.setCDCallback(cdc); }

Cpu.h (Game Boy CPU implementation, a Z80-like core)

This just seems to forward the CodeDataLogger callback to the memory object:

void setCDCallback(CDCallback cdc) { memory.setCDCallback(cdc); }

Memory.h (GB Memory object)

void setCDCallback(CDCallback cdc) { this->cdCallback = cdc; }

The memory read and write functions invoke the callback, which calls back into the C# code to log the access as data:

if(cdCallback) { CDMapResult map = CDMap(P); if(map.type != eCDLog_AddrType_None) cdCallback(map.addr,map.type,eCDLog_Flags_Data); }
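To show what the receiving end of such a callback might do, here is a self-contained C++ sketch. In Bizhawk the receiver is the C# CDCallbackProc; the code below is not a translation of it. The `AddrType` values, the `CdlBlocks` struct and `onAccess` are hypothetical, and the flag values assume the Game Boy flags occupy the first three bits (0x01, 0x02, 0x04).

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-ins for the address types and access flags.
enum AddrType { AddrNone = 0, AddrRom, AddrWram, AddrCartRam, AddrHram };
enum AccessFlags : std::uint8_t { ExecFirst = 0x01, ExecOperand = 0x02, Data = 0x04 };

struct CdlBlocks {
    // One mask array per memory block that is being logged.
    std::map<AddrType, std::vector<std::uint8_t>> blocks;

    // Receives (address within block, block type, access flags) and
    // ORs the flags into the matching mask byte.
    void onAccess(std::int32_t addr, AddrType type, std::uint8_t flags)
    {
        auto it = blocks.find(type);
        if (it == blocks.end() || addr < 0) {
            return;  // block not logged, or invalid address
        }
        std::vector<std::uint8_t>& block = it->second;
        const auto index = static_cast<std::size_t>(addr);
        if (index < block.size()) {
            block[index] |= flags;
        }
    }
};
```

Because the native side has already mapped the CPU address to a block and an offset, the receiver needs no knowledge of bank switching.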

References

  1. Tomaitheous - details about the format of CDL and implementation details

  2. r57shell - difficulties of implementing CDL

  3. http://www.fceux.com/web/help/fceux.html?CodeDataLogger.html
