Figure 1: Crash while accessing an address pointed to by register ESI
Backtracing the ESI register value of 0xfb3e meant stepping back hundreds of instructions and led to the sequence of instructions shown in Figure 2.
Figure 2: Register ESI getting populated by pop si and xchg si,bp
Two instructions populate the ESI register, both operating on the 16-bit subregister SI while completely ignoring the upper 16 bits of ESI. If we look closely at the result of the pop si instruction in Figure 2, the upper 16 bits of the ESI register appear to be zeroed out. This looked like a bug in the emulation of pop r16 instructions, so we quickly wrote proof-of-concept code to verify it (Figure 3).
Figure 3: Proof-of-concept for pop r16
Running the resulting binary natively and under TTD instrumentation, as shown in Figure 4, confirmed our suspicion that pop r16 instructions are emulated differently in TTD than they execute on a real CPU.
Figure 4: Running the code natively and with TTD instrumentation
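To make the difference concrete, the following is a minimal Python model of the two behaviors described above, not the proof-of-concept from Figure 3. The function names and the starting ESI value are hypothetical; only the 16-bit value 0xfb3e comes from the trace discussed earlier.

```python
def pop_r16_native(reg32: int, stack_word: int) -> int:
    """Behavior on a real CPU: only the low 16 bits (SI) are replaced."""
    return (reg32 & 0xFFFF0000) | (stack_word & 0xFFFF)


def pop_r16_zeroing(reg32: int, stack_word: int) -> int:
    """Behavior observed under TTD: the upper 16 bits end up zeroed."""
    return stack_word & 0xFFFF


if __name__ == "__main__":
    esi = 0x12345678   # hypothetical value of ESI before `pop si`
    word = 0xFB3E      # 16-bit value popped from the stack
    print(hex(pop_r16_native(esi, word)))   # 0x1234fb3e
    print(hex(pop_r16_zeroing(esi, word)))  # 0xfb3e
```

The two results differ only in the upper half of the register, which is why the divergence can stay hidden until some later instruction uses the full 32-bit value as an address.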
We reported this issue and the fuzzing results to the TTD team at Microsoft.
Given one instruction emulation bug (an instruction sequence that produces different results under native and TTD execution), we decided to fuzz TTD to find similar bugs. We created a rudimentary harness that executes a random sequence of instructions and records the resulting values. Running this harness on a real CPU and under TTD instrumentation gave us two sets of results. Any difference between the results, or any partially missing results, points to a likely instruction emulation bug.
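The idea of comparing two executions of the same random instruction sequence can be sketched in Python as follows. This is a toy model under stated assumptions, not the harness we used: the three-instruction "ISA", the register names, and every function name here are hypothetical, and the "emulated" side simply injects the pop r16 behavior described earlier.

```python
import random

MASK32 = 0xFFFFFFFF
REGS = ("eax", "esi")
OPS = ("mov32", "add32", "pop16")


def run(program, zeroing_pop16):
    """Execute a toy program and return the final register state."""
    state = {reg: 0 for reg in REGS}
    for op, reg, imm in program:
        if op == "mov32":
            state[reg] = imm & MASK32
        elif op == "add32":
            state[reg] = (state[reg] + imm) & MASK32
        elif op == "pop16":
            low = imm & 0xFFFF
            if zeroing_pop16:
                state[reg] = low  # injected emulation bug
            else:
                state[reg] = (state[reg] & 0xFFFF0000) | low
    return state


def diverges(program):
    """True if the reference and the buggy model disagree on the result."""
    return run(program, False) != run(program, True)


def random_program(rng, length=8):
    return [(rng.choice(OPS), rng.choice(REGS), rng.getrandbits(32))
            for _ in range(length)]


def fuzz(seed=0, iterations=200):
    """Collect every random program whose two executions disagree."""
    rng = random.Random(seed)
    programs = (random_program(rng) for _ in range(iterations))
    return [p for p in programs if diverges(p)]


if __name__ == "__main__":
    findings = fuzz()
    print(f"{len(findings)} diverging programs")
```

In this sketch every finding necessarily contains a pop16, because that is the only place where the two models differ. A real harness has no such shortcut and has to minimize each diverging sequence to isolate the offending instruction.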
PUSH segment Instruction Emulation Discrepancy
Figure 5: Proof-of-concept for push segment
This new bug was fairly similar to the original pop r16 bug, but with a push segment instruction. It also came with a twist. Our fuzzer was running on an Intel-based machine, and one of us verified the bug locally, but the other was not able to reproduce it. Interestingly, the failed reproduction happened on an AMD-based CPU, tipping us off to the possibility that the push segment instruction is implemented differently on Intel and AMD CPUs.
Looking at both the Intel and AMD CPU specifications, the Intel specification goes into detail about how recent processors implement the push segment register instruction:
If the source operand is a segment register (16 bits) and the operand size is 64-bits, a zero-extended value is pushed on the stack; if the operand size is 32-bits, either a zero-extended value is pushed on the stack or the segment selector is written on the stack using a 16-bit move. For the last case, all recent Intel Core and Intel Atom processors perform a 16-bit move, leaving the upper portion of the stack location unmodified. (Intel spec Vol. 2B 4-517)
We reported the discrepancy to AMD PSIRT, who concluded that this is not a security vulnerability. It seems that sometime around 2007, Intel and AMD CPUs started implementing the push segment instruction differently, and TTD emulation followed the old way.
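The two behaviors that the Intel text allows for a 32-bit operand size can be modeled with a small Python sketch. The stack is a plain bytearray, and the function names, the 0xAA filler bytes, and the selector value 0x002B are hypothetical choices for illustration. The sketch does not claim which behavior any particular CPU or TTD version exhibits.

```python
def push_seg_zero_extended(stack: bytearray, esp: int, selector: int) -> int:
    """Write the selector zero-extended to 32 bits (all 4 bytes overwritten)."""
    esp -= 4
    stack[esp:esp + 4] = (selector & 0xFFFF).to_bytes(4, "little")
    return esp


def push_seg_16bit_move(stack: bytearray, esp: int, selector: int) -> int:
    """Write only 2 bytes; the upper half of the stack slot keeps stale data."""
    esp -= 4
    stack[esp:esp + 2] = (selector & 0xFFFF).to_bytes(2, "little")
    return esp


def pushed_dword(push, selector=0x002B):
    """Push onto a stack pre-filled with 0xAA and read back the 32-bit slot."""
    stack = bytearray(b"\xAA" * 8)  # stale bytes left by earlier stack use
    esp = push(stack, len(stack), selector)
    return int.from_bytes(stack[esp:esp + 4], "little")


if __name__ == "__main__":
    print(hex(pushed_dword(push_seg_zero_extended)))  # 0x2b
    print(hex(pushed_dword(push_seg_16bit_move)))     # 0xaaaa002b
```

Both variants move the stack pointer by 4 bytes. They differ only in whether the upper 2 bytes of the slot are overwritten, so the discrepancy is visible only to code that later reads the full 32-bit slot.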
lodsb/lodsw Instruction Emulation Discrepancy
The lodsb and lodsw instructions are not emulated correctly in either 32-bit or 64-bit code. Both clear the upper bits of the register (rax/eax), whereas the real instructions modify only their respective widths (i.e., lodsb overwrites only 1 byte, lodsw only 2 bytes).
Figure 6: Proof-of-concept for lodsb/lodsw
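As a rough illustration of the lodsb/lodsw discrepancy, the Python model below contrasts a load that replaces only the low byte or word of rax with one that clears everything above it. It is a sketch, not the proof-of-concept from Figure 6. The function names and the sample rax value are hypothetical, the exact set of bits cleared by the emulator is an assumption, and the increment of the source index register is left out.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def lods_native(rax: int, value: int, width_bytes: int) -> int:
    """Real CPU: replace only the low `width_bytes` bytes (AL or AX)."""
    mask = (1 << (8 * width_bytes)) - 1
    return (rax & ~mask & MASK64) | (value & mask)


def lods_clearing(rax: int, value: int, width_bytes: int) -> int:
    """Assumed emulator behavior: the bits above the loaded width are cleared."""
    mask = (1 << (8 * width_bytes)) - 1
    return value & mask


if __name__ == "__main__":
    rax = 0x1122334455667788                    # hypothetical value before the load
    print(hex(lods_native(rax, 0xAB, 1)))       # lodsb: 0x11223344556677ab
    print(hex(lods_clearing(rax, 0xAB, 1)))     # lodsb: 0xab
    print(hex(lods_native(rax, 0xABCD, 2)))     # lodsw: 0x112233445566abcd
    print(hex(lods_clearing(rax, 0xABCD, 2)))   # lodsw: 0xabcd
```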
There are additional instruction emulation bugs pending fixes from Microsoft.
While pursuing our work on the CPU emulator, we accidentally stumbled on another bug, this time not in the emulator but in the WinDbg extension exposed by TTD: TTDAnalyze.dll.
This extension leverages the debugger's data model to let a user work with the trace file interactively. It does so by exposing a TTD data model namespace under certain parts of the data model, such as the current process (@$curprocess), the current thread (@$curthread), and the current debugging session (@$cursession).
Figure 7: TTD query types
As an example, the @$cursession.TTD.Calls method allows a user to query all call locations captured within the trace. It takes as input either an address or a case-insensitive symbol name with support for regex. The symbol name can be given either as a string (with quotes) or as a parsed symbol name (without quotes). The latter is only applicable when the symbols are fully resolved (e.g., private symbols), as the data model supports converting private symbols into an ObjectTargetObject object, making them consumable by the dx evaluation expression parser.
The bug in question directly affects the Calls method exposed under @$cursession.TTD.Calls because it uses a fixed, static buffer to capture the results of the symbol query. In Figure 8, we illustrate this by passing in two similar regex strings that produce inconsistent results.
Figure 8: TTD Calls query
When we query C* and Create*, the C* query results do not include the other Create APIs that were clearly captured in the trace. Under the hood, TTDAnalyze executes the examine debugger command "x KERNELBASE!C*" with a custom output capture to process the results. This output capture truncates any captured data larger than 64 KB.
If we look at the disassembly of the global buffer and the output capture routine in TTDAnalyze (SHA256 CC5655E29AFA87598E0733A1A65D1318C4D7D87C94B7EBDE89A372779FF60BAD) prior to the fix, we can see the following (Figure 9 and Figure 10):
Figure 9: TTD implementation disassembly
Figure 10: TTD implementation disassembly
The capture for the examine command is capped at 64 KB. When the returned data exceeds this limit, truncation is performed at address 0x180029960. Naturally, querying symbols starting with C* typically yields a large volume of results, not just those beginning with Create*, leading to the observed truncation of the data.
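The effect of a capped output capture can be reproduced with a short Python sketch. This is a simplified model, not the TTDAnalyze implementation. The class and function names, the filler symbol names, and the assumption that results arrive in sorted order are all hypothetical. Only the 64 KB limit and the KERNELBASE module name come from the analysis above.

```python
CAPTURE_LIMIT = 64 * 1024  # the 64 KB cap described above


class FixedCapture:
    """Collects output text but silently drops anything past the limit."""

    def __init__(self, limit=CAPTURE_LIMIT):
        self.limit = limit
        self.data = bytearray()
        self.truncated = False

    def write(self, text):
        chunk = text.encode()
        room = self.limit - len(self.data)
        if len(chunk) > room:
            self.truncated = True
        self.data += chunk[:room]


def examine(symbols, prefix):
    """Toy stand-in for `x KERNELBASE!<prefix>*` routed through the capture."""
    capture = FixedCapture()
    for name in sorted(symbols):  # assumed output order
        if name.startswith(prefix):
            capture.write(f"KERNELBASE!{name}\n")
    return capture.data.decode().splitlines(), capture.truncated


# Hypothetical symbol table: many C* names that sort before the Create* ones.
SYMBOLS = [f"CFiller{i:05d}" for i in range(5000)] + ["CreateFileW", "CreateProcessW"]

if __name__ == "__main__":
    broad, broad_cut = examine(SYMBOLS, "C")
    narrow, narrow_cut = examine(SYMBOLS, "Create")
    print(broad_cut, "KERNELBASE!CreateFileW" in broad)     # True False
    print(narrow_cut, "KERNELBASE!CreateFileW" in narrow)   # False True
```

The broad query loses matches that the narrow query returns, which mirrors the inconsistency between C* and Create*. A capture that grows with its input, or that at least reports the truncation to the caller, would avoid the silent data loss.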
The analysis presented in this blog post highlights the critical nature of accuracy in instruction emulation—not just for debugging purposes, but also for ensuring robust security analysis. The observed discrepancies, while subtle, underscore a broader security concern: even minor deviations in emulation behavior can misrepresent the true execution of code, potentially masking vulnerabilities or misleading forensic investigations.
From a security perspective, the work emphasizes several key takeaways:
Reliability of Debugging Tools: TTD and similar frameworks are invaluable for reverse engineering and incident response. However, any inaccuracies in emulation, such as those revealed by the incorrect emulation of the pop r16, push segment, or lods* instructions, can compromise the fidelity of the analysis. This raises important questions about trust in our debugging tools when they are used to analyze potentially malicious or critical code.
Impact on Threat Analysis: The ability to replay a process's execution with high fidelity is crucial for uncovering hidden behaviors in malware or understanding complex exploits. Instruction emulation bugs may inadvertently alter the execution path or state, leading to incomplete or skewed insights that could affect the outcome of a security investigation.
Collaboration and Continuous Improvement: The discovery of these bugs, followed by their detailed documentation and reporting to the relevant teams at Microsoft and AMD, highlights the importance of a collaborative approach to security research. Continuous testing, fuzzing, and cross-platform comparisons are essential in maintaining the integrity and security of our analysis tools.
In conclusion, this exploration not only sheds light on the nuanced challenges of CPU emulation within TTD, but also serves as a call to action for enhanced scrutiny and rigorous validation of debugging frameworks. By ensuring that these tools accurately mirror native execution, we bolster our security posture and improve our capacity to detect, analyze, and respond to sophisticated threats in an ever-evolving digital landscape.
We extend our gratitude to the Microsoft Time Travel Debugging team for their readiness and support in addressing the issues we reported. Their prompt and clear communication not only resolved the bugs but also underscored their commitment to keeping TTD robust and reliable. We further appreciate that they have made TTD publicly available—a resource invaluable for both troubleshooting and advancing Windows security research.