Sunday, June 7, 2026

Reading the Wire — Protobuf Without a Map

Moin! ๐Ÿ‘‹

Protobuf turns up constantly in DFIR work. Android apps, iOS apps, Chrome internals, sync engines, health databases — wherever Google's tooling reaches, Protobuf follows. And yet it is one of those formats where a lot of examiners open the hex, see a wall of binary, and move on. That is understandable. Without the schema, Protobuf does not announce what it is holding.

This post is a deep dive into what Protobuf actually is at the wire level — byte by byte — and what you can and cannot recover from it without access to the original .proto definition. Where tool output is shown, it comes from crush — an open-source DFIR workbench I develop in my personal time. The forensic concepts apply regardless of which tool you use.

What Is Protobuf?

Protocol Buffers (Protobuf) is a binary serialisation format developed by Google. The design goal is compact, fast, schema-driven serialisation — the opposite of a human-readable format like JSON or XML. A .proto file defines the message structure: field names, types, and field numbers. The compiled schema is used by both the writer (to encode) and the reader (to decode). Without the schema, you get the wire format — and the wire format is deliberately sparse.

That sparseness is the core forensic problem. The wire format encodes field numbers and wire types, but not field names, not semantic types beyond a handful of primitives, and not the structure of nested messages. A value of 1 in a varint field could be a boolean true, an enum value, an integer count, or a Unix timestamp in seconds — the wire format cannot tell you which.

What makes Protobuf interesting for forensics is that it is self-delimiting and robust. A decoder that does not know the schema can still walk the byte stream, identify field boundaries, and extract raw values. That is exactly what schema-less decode tools — including protoc --decode_raw and the crush BLOB Inspector — do.

The Wire Format — From the Ground Up

A serialised Protobuf message is a sequence of fields. There is no message header, no length prefix for the overall message, no magic bytes. The stream starts immediately with the first field. Each field has two components: a tag and a payload.

The Tag Byte

The tag encodes two things in a single varint: the field number and the wire type. It is constructed as:

tag = (field_number << 3) | wire_type

The three low-order bits carry the wire type (values 0–5). The remaining bits carry the field number. There are six wire types in use:

Wire Type Value Used For
Varint 0 int32, int64, uint32, uint64, sint32, sint64, bool, enum
64-bit 1 fixed64, sfixed64, double
Length-delimited 2 string, bytes, embedded messages, packed repeated fields
Start group 3 proto2 only — deprecated; group and contents silently skipped by crush
End group 4 proto2 only — deprecated; consumed as part of group skip
32-bit 5 fixed32, sfixed32, float

So a tag byte of 0x08 decodes as: 0x08 = 0b00001000 → low 3 bits = 000 = wire type 0 (Varint), remaining bits = 00001 = field number 1. A tag of 0x12 = 0b00010010 → wire type 2 (Length-delimited), field number 2.

Tags themselves are encoded as varints, which matters when field numbers exceed 15 (the tag no longer fits in a single byte).

Varints — The Core Encoding

Varint is the most important encoding to understand, because it is used for both tags and for all integer-typed field values. The encoding is variable-length little-endian with a continuation bit:

  • Each byte contributes 7 bits of value data.
  • The most significant bit (MSB) is the continuation flag: 1 means another byte follows; 0 means this is the last byte.
  • Bytes are in little-endian order — the first byte holds the least significant 7 bits.

Let us decode 0x96 0x01 step by step:

Byte 1: 0x96 = 1001 0110
        MSB = 1 → continuation, more bytes follow
        Value bits: 001 0110 = 0x16 = 22 (least significant 7 bits)

Byte 2: 0x01 = 0000 0001
        MSB = 0 → final byte
        Value bits: 000 0001 = 0x01 (next 7 bits)

Assembled (little-endian 7-bit groups):
  [ 000 0001 ] [ 001 0110 ]
   bits 13–7     bits 6–0

Result: 0b 0000001 0010110 = 0x96 & 0x7F | (0x01 << 7)
      = 22 | 128 = 150

So 0x96 0x01 is the varint encoding of 150. A single-byte varint (MSB = 0) encodes values 0–127 directly. Values from 128–16383 require two bytes. The practical maximum for a 64-bit varint is 10 bytes.

A Complete Field Walkthrough

Take this 7-byte sequence:

08 96 01 12 03 74 65 73

Breaking it down field by field:

Field 1:
  Tag:   0x08 = wire type 0 (Varint), field number 1
  Value: 0x96 0x01 → varint → 150

Field 2:
  Tag:   0x12 = wire type 2 (Length-delimited), field number 2
  Length: 0x03 → 3 bytes follow
  Data:  0x74 0x65 0x73 → UTF-8 → "tes"

Without a schema, we know: field 1 holds the integer 150, field 2 holds 3 bytes that happen to be valid UTF-8. We do not know if field 1 is a count, a status code, an enum, or a timestamp. We do not know if field 2 is a string, a serialised sub-message, or arbitrary bytes. That ambiguity is inherent — and unavoidable.

ZigZag Encoding — Signed Integers

Varint is efficient for small positive integers. Negative integers are a problem: in two's complement, -1 is 0xFFFFFFFF for int32, which as a varint requires 10 bytes. Protobuf solves this with two approaches:

int32 / int64: negative values are sign-extended to 64 bits and then varint-encoded. -1 encodes as 10 bytes. Inefficient, but rarely used for fields that are expected to hold negative values.

sint32 / sint64: ZigZag encoding maps signed integers to unsigned integers such that small-magnitude values — both positive and negative — produce small varints. The mapping is:

ZigZag(n) = (n << 1) ^ (n >> 31)   // sint32
ZigZag(n) = (n << 1) ^ (n >> 63)   // sint64

 0 → 0
-1 → 1
 1 → 2
-2 → 3
 2 → 4
-3 → 5

To decode: if the raw varint value is n, the ZigZag-decoded signed value is (n >> 1) ^ -(n & 1).

The forensic implication: if a schema-less decoder shows a varint value that seems implausibly large (e.g., a field holding 4294967295 where you expected something small), the field may be a sint32 that holds -1. Without the schema, you cannot tell — you can only note the raw value and be aware of the possibility.

64-bit and 32-bit Fixed-Width Fields

Wire type 1 (64-bit) and wire type 5 (32-bit) are fixed-width little-endian. They are used for double, float, fixed64, sfixed64, fixed32, and sfixed32. Because the width is fixed, no varint encoding is involved — the bytes are read directly.

A forensically common case: Unix timestamps stored as double (wire type 1, 8 bytes, IEEE 754 double precision) or as fixed64 (wire type 1, 8 bytes, unsigned 64-bit integer). Both look identical at the wire level. A schema-less decoder will show you the raw 8 bytes and typically interpret them as a 64-bit integer — it cannot know whether you should read it as a double instead.

Length-Delimited Fields

Wire type 2 covers strings, byte arrays, embedded sub-messages, and packed repeated fields. The format is:

[tag: varint] [length: varint] [payload: length bytes]

A packed repeated field is a wire type 2 field where the payload itself is a concatenated sequence of varints or fixed-width values — no tags between them. This is how arrays of integers are efficiently encoded. In a schema-less decoder, a packed repeated field is indistinguishable from a byte string or an embedded message without attempting to parse the payload.

An embedded message is also wire type 2. Its payload is itself a valid Protobuf message. A schema-less decoder can recurse into it and decode it as a nested message — this is what protoc --decode_raw does, and what crush does in its Protobuf decode mode. However, whether a length-delimited payload actually is a valid sub-message, or just happens to parse as one, cannot be determined from the wire format alone.

What You Get Without a Schema

With a solid understanding of the wire format, the limits of schema-less decoding become precise. The table below reflects what is recoverable from the wire format alone — and what remains genuinely ambiguous even after decoding:

What you can recover What remains ambiguous
Field numbers Field names
Wire types (varint / 64-bit / length-delimited / 32-bit) Enum value names and meaning
All numeric interpretations of a varint simultaneously — uint64, int64, sint64 (ZigZag), bool (if 0/1), Unix timestamp (if in range), Chrome/WebKit ยตs timestamp (if in range) Which of those interpretations the application actually wrote
All numeric interpretations of fixed-width fields — uint64/int64/double/timestamps for wire type 1; uint32/int32/float/timestamp for wire type 5 Which numeric type was intended
Nested message structure (recursive decode) Whether a length-delimited field is a string, bytes, sub-message, or packed repeated — crush applies a nested-first heuristic to choose the most plausible interpretation (described below)
Message field boundaries Whether optional fields are absent vs. set to their default value

The practical upshot: schema-less decoding gives you structure and all plausible numeric interpretations. What it cannot give you is the correct one. Context — from the application, the schema, or known app behaviour — is always the final arbiter.

Finding the Schema

Before treating a Protobuf blob as permanently opaque, it is worth checking whether the schema is publicly available — but realistically, for most apps encountered in casework, it is not. Protobuf schemas are internal implementation details. Unless the application is open source, or the developer has explicitly published the .proto files, you are working without one.

The exceptions worth checking:

  • Chrome / Chromium: Chrome sync, history, and IndexedDB schemas are in the open-source Chromium repository. The sync protocol definitions are under components/sync/protocol/ — this is one of the most complete publicly available Protobuf schema sets relevant to forensics.
  • Android (AOSP): system-level Android components that are part of AOSP sometimes have schemas in the source. This does not extend to Google apps distributed separately through the Play Store — including Device Health Services, which is a Google app, not an AOSP component.
  • Open-source apps: if the app is open source, the .proto files are usually in the repository. Signal Desktop is a notable example.
  • Embedded descriptors in APKs: some apps ship compiled Protobuf descriptors (.pb files) inside the APK or app bundle. Extracting and decompiling the APK and searching for descriptor.pb or .proto.bin files is worth attempting — but success depends entirely on whether the developer included them, and many do not.

For everything else — which is most things — schema-less decode is what you have. The goal then shifts: extract as much structure as possible from the wire format, surface all plausible interpretations, and document the basis for any interpretation that ends up in a report.

Protobuf in crush

Unlike SQLite or binary plist, Protobuf cannot be detected automatically from the data itself — there are no magic bytes, no file extension convention, no self-describing header. Whether a blob is Protobuf or not is something only the examiner can know, based on context: the app, the table, the column name, prior research. crush therefore does not attempt to auto-classify Protobuf in the file panel or database viewer. Instead, it provides two explicit entry points.

The first is the BLOB Inspector, accessible by right-clicking any binary value in the SQLite viewer or LevelDB viewer and selecting Inspect Value… (or Inspect Key…). In Auto mode the inspector attempts several format detections via magic bytes — binary plist, PNG, JPEG, and others. If none match, it attempts a Protobuf heuristic decode. You can also select Protobuf mode manually if you already know what you are looking at. Importantly, the BLOB Inspector goes beyond what protoc --decode_raw produces: for every numeric field it surfaces all plausible interpretations inline — ZigZag-decoded values, Cocoa and Unix timestamps, bool — directly in the output. No separate decode pass required.

The second is the dedicated Protobuf Viewer, invoked explicitly from the context menu on any file or blob. This opens the full Tree Viewer interface — an expandable tree structure designed for working through larger or more complex messages field by field. The Tree Viewer shows all interpretations for every field including uint64 and uint32; the BLOB Inspector omits these from the inline annotations since they already appear as the primary field value, keeping the output less cluttered for quick triage.

The BLOB Inspector

Protobuf blobs surface in crush anywhere binary data appears: SQLite BLOB columns, LevelDB values, or any file opened in the hex viewer. The BLOB Inspector handles all of these through a consistent workflow.

To invoke it: right-click any cell or record that contains binary data and select Inspect Value… (or Inspect Key… in the LevelDB viewer). The inspector opens as a dialog with a mode selector at the top.

In Auto mode, the inspector checks for magic bytes first — binary plist (62706C69 73743030), PNG (89504E47), JPEG (FFD8FF), and others. If none match, the blob is displayed in hex. Protobuf mode must be selected manually — there are no magic bytes to detect it on.

The schema-less output surfaces field numbers, wire types, nested message structure, and — beyond what protoc --decode_raw produces — all plausible numeric interpretations inline. Here is a real example: a Biome SEGB entry payload from an iOS acquisition, opened in the BLOB Inspector with Protobuf mode selected manually:

1 {
  1: "/app/inFocus"
  2 {
    1 [varint]: 2
    # sint64 (zigzag): 1
    2 [varint]: 6584185901589580638
    # sint64 (zigzag): 3292092950794790319
  }
}
2 [fixed64]: 4739523391776620544
# double: 743887831.0
# Cocoa timestamp: 2024-07-28 19:30:31 UTC
3 [fixed64]: 4739523391826952192
# double: 743887837.0
# Cocoa timestamp: 2024-07-28 19:30:37 UTC
4 {
  1 {
    1 [varint]: 2
    # sint64 (zigzag): 1
    2 [varint]: 6584185901589580638
    # sint64 (zigzag): 3292092950794790319
  }
  3: "com.apple.MobileSMS"
}
5: "2E565DD6-9B69-448D-B374-AAB614442F03"
7 {
  1: <>
  2 {
    1 {
      1 [varint]: 2
      # sint64 (zigzag): 1
      2 [varint]: 0
      # sint64 (zigzag): 0
      # bool: false
    }
    3: "com.apple.SpringBoard.transitionReason.homescreen"
  }
  3 [varint]: 6002
  # sint64 (zigzag): 3001
}
7 {
  1: <>
  2 {
    1 {
      1 [varint]: 2
      # sint64 (zigzag): 1
      2 [varint]: 0
      # sint64 (zigzag): 0
      # bool: false
    }
    3: "14.0"
  }
  3 [varint]: 6007
  # sint64 (zigzag): -3004
}
7 {
  1: <>
  2 {
    1 {
      1 [varint]: 2
      # sint64 (zigzag): 1
      2 [varint]: 0
      # sint64 (zigzag): 0
      # bool: false
    }
    3: "1262.400.41.2.3"
  }
  3 [varint]: 6008
  # sint64 (zigzag): 3004
}
8 [fixed64]: 4739523391831073926
# double: 743887837.491349
# Cocoa timestamp: 2024-07-28 19:30:37 UTC
10 [varint]: 18446744073709537216
# int64: -14400
# sint64 (zigzag): 9223372036854768608

Several things are immediately readable without a schema. Fields 2, 3, and 8 are fixed64 — the BLOB Inspector detects the double value falls in the Cocoa epoch range and surfaces the decoded timestamp alongside the raw bytes: 2024-07-28 19:30:31 through 19:30:37 UTC. Field 4.3 contains "com.apple.MobileSMS" — the app in focus. Field 5 is a UUID identifying this entry. Field 7 is a repeated embedded message — it appears three times, each carrying a string value: the homescreen transition reason, an iOS version ("14.0"), and an app build version ("1262.400.41.2.3"). All three surface as plain strings without any schema knowledge.

What remains uncertain: the large varint in field 1.2.2 (6584185901589580638) does not resolve to anything meaningful — likely an internal identifier or hash. Field 10 decodes as int64: -14400; appearing consistently across entries, this looks like a UTC offset in seconds (−4 hours / EDT), but that requires contextual confirmation. The small varints typed as sint64 (zigzag): 1 throughout are plausibly type or state flags — the schema would be needed to confirm. Field 7.1 is empty bytes in all three repeated blocks.

This is a realistic picture of what schema-less decode gives you: timestamps, app identifiers, version strings, and transition reasons surface immediately; internal identifiers and small integers require inference or schema cross-reference.

The Protobuf Viewer — Navigating the Tree

The Protobuf Viewer is invoked via the context menu on any file or blob — not on an individual cell value. It presents the same schema-less decode and the same multi-interpretation output as the BLOB Inspector, but in an expandable tree structure rather than a flat inline view. The difference is navigation, not capability: the BLOB Inspector shows everything at once, annotations inline; the Tree Viewer lets you work through the message field by field and expand only what you need.

Each field in the tree is expandable. Expanding a numeric field reveals every valid interpretation for its wire type — the same set described in the Interpretation Rules tables below. Take field 10 from the SEGB payload above. In the flat BLOB Inspector output you already see the annotations inline, but in the Tree Viewer it renders as:

10 ▼ varint
    uint64:               18446744073709537216
    int64:                -14400
    sint64 (ZigZag):      9223372036854768608
    bool:                 —  (not 0 or 1, not shown)
    Unix timestamp (s):   —  (out of range, not shown)

The three interpretations are visible simultaneously. int64: -14400 is the one that makes contextual sense — a UTC offset in seconds. The ZigZag value is a large meaningless number, the timestamp check fails. The examiner decides; the tree just ensures nothing is hidden. For a large message with many repeated fields, being able to collapse and expand individual subtrees makes this significantly more practical than scrolling through a flat output.

Tree Viewer showing a Protobuf message with a varint field expanded, displaying all interpretations

Interpretation Rules

The following tables document exactly which interpretations crush surfaces per wire type, and under what conditions. These apply to the Tree Viewer; the BLOB Inspector shows the same set minus uint64 and uint32, which are already shown as the primary field value.

Wire Type 0 — varint

Interpretation Condition
uint64 always
int64 only when value ≥ 2⁶³ (negative as int64)
sint64 (ZigZag) always
bool only when value = 0 or 1
Unix timestamp (s) 946 684 800 ≤ value ≤ 4 102 444 800 (2000–2100)
Chrome/WebKit timestamp (ยตs) 12 591 158 400 000 000 ≤ value ≤ 15 778 800 000 000 000

Wire Type 1 — fixed64 (8 bytes)

Interpretation Condition
uint64 always
int64 only when negative
double when not NaN and not ±inf
Cocoa timestamp double not NaN/inf AND 0 < double ≤ 3 155 673 600 (2001–2101)
Unix timestamp (double, s) double not NaN/inf AND 946 684 800 ≤ double ≤ 4 102 444 800
Unix timestamp (uint64, s) 946 684 800 ≤ uint64 ≤ 4 102 444 800
Chrome/WebKit timestamp (ยตs) 12 591 158 400 000 000 ≤ uint64 ≤ 15 778 800 000 000 000

Wire Type 5 — fixed32 (4 bytes)

Interpretation Condition
uint32 always
int32 only when negative
float when not NaN and not ±inf
Unix timestamp (uint32, s) 946 684 800 ≤ uint32 ≤ 4 102 444 800

Loading a Schema

If you have located the .proto definition for the data you are examining, the Protobuf Viewer can load it directly — either as a raw .proto source file or as a compiled FileDescriptorSet (.pb), produced with protoc --descriptor_set_out. With a schema loaded, field numbers resolve to names, enum values carry their labels, and type ambiguity is eliminated. Schema loading is only available in the Protobuf Viewer, not in the BLOB Inspector - but I am actively working on the BLOB inspector of crush - so look out for updates ;-).

Protobuf in Non-Obvious Places

Protobuf does not only appear as standalone values in databases or LevelDB stores. A few places worth checking on acquisitions where it is easy to miss:

  • SQLite BLOB columns: any column typed as BLOB in an app database is a candidate. Android health apps, sync clients, and messaging apps (including Signal, which has historically stored some internal state as serialised Protobuf) are worth checking.
  • LevelDB values: as covered in Deep Dive #3, Chrome Sync Data is almost entirely Protobuf. Chrome IndexedDB values often contain Protobuf-serialised JavaScript objects depending on the web application.
  • Raw files in app containers: some apps write Protobuf directly to files — no database wrapper. Common in Android under /data/data/<package>/files/ or /data/data/<package>/cache/. These have no extension convention.
  • Apple Biome / SEGB files (iOS): this one surprises people. Apple's Biome framework stores behavioural telemetry — app usage, screen time, notifications, location visits — in SEGB files under /private/var/mobile/Library/Biome/. SEGB is a proprietary Apple format with no public specification, but its structure has been reverse-engineered: each entry has a fixed binary header (magic bytes, timestamps, flags) followed by a Protobuf-encoded payload. The header is Apple-proprietary; the payload follows the standard Protobuf wire format exactly.

    In crush, opening a SEGB file in the SEGB viewer triggers Decode from Table automatically — the viewer knows the SEGB structure and parses header fields and entry boundaries without any manual intervention. What it does not do automatically is decode the Protobuf payload inside each entry, because there is no way to detect Protobuf without context. To examine the payload, right-click the entry and open it in the BLOB Inspector, then select Protobuf mode manually. The real SEGB payload shown earlier in this post came from exactly that workflow.

    The takeaway: Protobuf is not just an Android or Chrome concern. It is embedded inside Apple's own proprietary formats, one layer down. If you are already examining SEGB files, you are already working with Protobuf — just with a header in front of it. Cross-referencing the output against Chris Vance's research on Biome (https://blog.d204n6.com/2022/09/ios-16-now-you-c-it-now-you-dont.html) gives partial field mappings for several known stream types, which is the closest thing to a public schema available.

  • Network captures: not a crush use case, but Protobuf is common in gRPC traffic and some REST APIs that use application/x-protobuf. Worth keeping in mind for network forensics.

A Note on Heuristic Detection

Unlike binary plists (magic bytes 62706C697374) or PNG (magic bytes 89504E47), Protobuf has no magic bytes. Any sequence of bytes that is internally consistent with the wire format grammar will pass a Protobuf heuristic check. This means false positives are possible — particularly with short blobs, or with data that happens to contain valid-looking varint sequences.

The heuristic crush applies when parsing a blob as Protobuf is conservative: the entire byte sequence to parse without encountering any wire type value outside 0–5 — values in that range have no defined meaning and indicate either corrupt data or a format that is not Protobuf. Any length-delimited field claiming a length that would exceed the remaining bytes also aborts the parse immediately.

Wire types 3 and 4 — the deprecated proto2 group start and end markers — are handled differently. Rather than treating them as parse errors, crush skips group fields entirely: the group and its contents are silently consumed and parsing continues with the next field. Group fields do not appear in the Tree Viewer or BLOB Inspector output. This is the correct forensic trade-off: aborting on wire type 3 would cause an entire blob to be reported as non-Protobuf simply because it contains a legacy field type that still appears in older proto2-serialised data and some Google-internal formats. Silent skip means you see the fields that can be decoded; you do not lose the whole message. The limitation is worth knowing: if a blob contains exclusively group fields, the output will be empty — which looks identical to a failed parse.

There is a subtler heuristic problem specific to length-delimited fields. A wire-type-2 payload can be decoded as a UTF-8 string, as raw bytes, or as a nested Protobuf message — and the wire format alone cannot tell you which interpretation is correct. An earlier version of the crush decoder checked for valid UTF-8 first: if the payload was valid UTF-8, it was displayed as a string and the nested decode was never attempted. This produced a silent data loss: a payload that was simultaneously valid UTF-8 and a valid nested message was always shown as a flat string, with its internal structure invisible.

The current decoder reverses the order: for every length-delimited field, a nested Protobuf parse is attempted first. Only if that parse yields no entries — or raises an error — does the decoder fall back to UTF-8 string or raw bytes display. Forensically, this is the correct priority: a nested message that happens to be valid UTF-8 has structure worth surfacing; a flat string that happens to fail the nested parse is just a string.

If the auto-detected Protobuf output looks structurally inconsistent — deeply nested messages where none are expected, or field numbers that jump around implausibly — it is worth switching to Hex mode and checking whether this is actually Protobuf or something else that happens to parse.

Further Reading

Happy examining. ๐Ÿข

Saturday, May 16, 2026

Beyond the C — SEGB and Biome Forensics with crush

Moin! ๐Ÿ‘‹

This post walks through the SEGB viewer in crush, which received a complete forensic overhaul in the current release. The short version: Protobuf payloads are decoded automatically, and a built-in SQL editor lets you query every record — including nested and repeated Protobuf fields — without writing a single line of Python first.

What Is SEGB — and Why Should I Care?

SEGB is Apple's binary log format used by the Biome service. The name comes from the four-byte magic header at the start of every file: 53 45 47 42 — ASCII for SEGB. Apple has never publicly documented the format; the community named it informally, and the name stuck.

Biome emerged in iOS 15 alongside knowledgeC.db, and by iOS 16 it had largely replaced it. Today there are well over 130 defined Biome streams on a modern iOS device, each tracking a specific type of user activity in its own folder — application foreground and background events, device unlock events, Safari browsing history, Siri interactions, messaging activity, airplane mode toggles, and much more. Every stream stores its records in SEGB files. Every SEGB file payload is almost always Protobuf.

There are two versions of the format in the wild. SEGB v1 was used in iOS 15 and iOS 16. SEGB v2 arrived with iOS 17 and remains in use through current iOS versions — the internal structure changed significantly, with the addition of a trailer section that must be parsed to read the file correctly. crush handles both versions automatically; you do not need to know which one you are looking at before opening the file.

Two resources from the community worth reading alongside this post:

The open-source Python library that the crush SEGB parser builds on is ccl-segb by Alex Caithness and the CCL Solutions Group — full credit to them for that work.

Where to Find SEGB Files

A full filesystem acquisition is required — logical or backup-level imaging does not expose Biome data.

iOS:

  • Primary user streams: /private/var/mobile/Library/Biome/streams/public/
  • System streams: /private/var/db/biome/streams/

macOS:

  • /private/var/db/biome/streams/

These are the well-known paths — but SEGB files are not limited to them. On a full filesystem image you will find additional SEGB data in locations such as Library/DuetExpertCenter/, Library/PersonalizationPortrait/, and inside app group containers under Containers/Shared/AppGroup/<AppID>/. crush detects SEGB files by their magic bytes automatically — so when you open a full filesystem image, files in these locations show up in the file system panel when filtering for type:SEGB without any manual searching.

In the well-known paths each stream lives in its own subdirectory named after the data it tracks — _DKEvent.App.inFocus, _DKEvent.device.Unlocked, _DKEvent.Safari.History, Device.Display.Backlight, and so on. Within each stream folder you will find a local subfolder (records from this device) and a remote subfolder (records synced from the user's other Apple devices on the same Apple ID). There is also a tombstone folder containing expired record versions. crush reads all of them.

The SEGB files themselves have no file extension and are named with a large integer — a Cocoa epoch timestamp in microseconds, which gives you a rough creation time from the filename alone before you open anything.

For this post I am using SEGB files from an iPhone acquisition, specifically from the App.inFocus stream:

/filesystem1/private/var/mobile/Library/Biome/streams/public/_DKEvent.App.inFocus/local/

Opening an SEGB File in crush

SEGB files appear as plain files in the crush file panel — no extension, no special icon. To open one, double-click it. crush reads the magic header, determines whether the file is v1 or v2, parses the records, decodes the Protobuf payloads, and opens the viewer. The detected version and total record count are shown in the Properties panel.

No extraction needed — crush reads directly from inside ZIP or TAR archives or from a folder on disk. All connections are read-only.

Filesystem Panel - Showing SEGB in well-known folders



The SEGB Table

The main view is the SEGB table — the same name used in the SQL editor. Each row is one record from the file. The columns present depend on whether the file is v1 or v2; some are common to both.

Columns common to v1 and v2:

Column Description
Index Sequential record number, 0-based. Order of appearance in the file.
Offset Byte offset where the payload data begins, after the record header. Useful for correlating with raw hex examination.
State Entry lifecycle state: Written = active record, Deleted = marked for deletion, Unknown = empty or unrecognised. Deleted records are colour-coded red.
CRC Stored CRC32 value written into the entry header at creation time.
CRC Calc CRC32 calculated over the current payload bytes at parse time.
CRC Passed True if the CRC stored by the system matches the CRC crush calculates over the parsed payload bytes — confirming that what crush read matches what was written. False warrants closer examination: it may indicate corruption, an unknown format variant, or a parse issue.
Payload Size Size of the raw Protobuf payload in bytes.
Payload Schema-less wire-format decode of the raw payload bytes, rendered as human-readable text. Cocoa timestamps are decoded to ISO datetimes automatically, nested messages are expanded inline, repeated fields collected into arrays. Raw bytes remain accessible via the Blob Inspector.

v1 only:

Column Description
Timestamp 1 First Cocoa absolute time double from the 32-byte record header (bytes 8–15). Exact semantics not formally documented by Apple; in practice often identical to Timestamp 2, possibly reflecting record creation and commit time.
Timestamp 2 Second Cocoa absolute time double from the 32-byte record header (bytes 16–23). May differ from Timestamp 1 in some entry states.

v2 only:

Column Description
Creation Cocoa absolute time from the trailer entry at the end of the file. Records when the entry was created.
Trailer Offset Byte offset of this entry's trailer record within the trailer table at the end of the file.
Entry End Offset End offset of the entry's data area, relative to the start of the entry region (after the 32-byte file header). Used internally to calculate entry length.

SEGB table showing columns for a v2 file


Deleted Records

A record marked Deleted has an empty payload — the content is gone. What remains is the record itself: its index, offset, timestamp, and state. That is still forensically meaningful. A deleted entry proves that something was recorded here and was subsequently removed. The timestamp tells you when. The offset tells you exactly where in the file the entry sits. That is often enough to establish that activity occurred, even without the payload content.

Cell Preview and BLOB Inspector

Selecting any row in the SEGB table populates an inline preview panel below the table with the decoded content of the selected cell — the same human-readable Protobuf rendering shown in the Payload column, but with more space to read it. This works across all viewers in crush: SQLite, LevelDB, Realm, and SEGB.

For deeper inspection, double-clicking a cell or using the context menu opens the shared Blob Inspector dialog. The Blob Inspector always receives the raw bytes of the cell. For a Payload cell where crush has already decoded the Protobuf, the inspector additionally shows the decoded value alongside the raw bytes — so you can cross-reference the rendered text with the underlying hex without switching tools. From there you can switch to any other decode mode: Hex, UTF-8, Base64, and so on.


SCREENSHOT: Blob Inspector open on a Protobuf payload from an SEGB record


The SQL Editor — Querying the Biome

This is the feature that makes the crush SEGB viewer genuinely different from a basic file parser.

When you open an SEGB file, crush creates a backing SQLite database in memory and loads all decoded records into a table named SEGB. The SQL editor — with autocomplete for table and column names — lets you query that table directly. Two payload columns are available:

Column Content
Payload Human-readable rendered text — same as the table column
Payload JSON Protobuf fields as JSON — queryable with json_extract

The Payload JSON column is where the real power is. Every Protobuf field is stored by its field number as a JSON key. SQLite's json_extract function lets you reach into any field — including nested messages and repeated fields.

crush decodes payloads schema-less at the wire level, so field numbers match the raw Protobuf field numbers — not named keys. The meaning of each field varies by stream and is not publicly documented by Apple, but is consistent within a stream across devices and OS versions. Community tools like iLEAPP carry explicit field mappings for the most forensically relevant streams and are a good reference when you need to know what a specific field contains.

The SQL editor is primarily an analysis and exploration tool. The goal is not to replace a dedicated parser — it is to understand the data well enough to build one. Working through the fields with json_extract, identifying which field number carries the bundle ID, which carries the timestamps, and how nested or repeated fields are structured, is exactly the groundwork needed before writing a parser for iLEAPP or any other tool. crush lets you do that interactively, without writing a single line of Python first.

For the App.inFocus stream some key fields are:

json_extract path Meaning
$.1.1Activity / stream path (e.g. /app/inFocus)
$.2Time Start — when the app came into focus
$.3Time End — when the app left focus
$.4.3Bundle ID (e.g. com.apple.Preferences)
$.5Action GUID
$.7[0].2Transition — how the app was reached (e.g. com.apple.SpringBoard.transitionReason.homescreen)
$.8Time Write — when this record was written to the SEGB file

For a detailed breakdown of the App.inFocus stream structure — including the record header layout and protobuf field analysis — D20 Forensics covered this in depth: iOS 16 — Now You 'C' It, Now You Don't. For the App.inFocus stream specifically, the payload field structure is consistent across SEGB v1 and v2, making it a useful cross-reference regardless of which version you are examining.

Some example queries for App.inFocus:

-- App focus timeline: bundle ID, start, end, and how the app was reached
SELECT "Index",
       json_extract("Payload JSON", '$.4.3') AS bundle_id,
       json_extract("Payload JSON", '$.2')   AS time_start,
       json_extract("Payload JSON", '$.3')   AS time_end,
       json_extract("Payload JSON", '$.7[0].2') AS transition
FROM SEGB
WHERE State = 'Written'
ORDER BY time_start;

-- All focus events for a specific app
SELECT * FROM SEGB
WHERE json_extract("Payload JSON", '$.4.3') = 'com.apple.Preferences';

-- Deleted records — payload is empty, but index, offset and timestamp remain
SELECT "Index", "Offset", "State"
FROM SEGB
WHERE State = 'Deleted';

SQL editor with a json_extract query running against the SEGB table, results shown below


A Practical Workflow

A useful approach for Biome examination in crush:

  1. Navigate to the relevant Biome stream folder.
  2. Open an SEGB file. Check the detected version and record count in the Properties panel before going further.
  3. Scan the Payload column — the decoded Protobuf text is often enough to identify records of interest without writing any SQL.
  4. Deleted records have an empty payload — but the record itself remains, including index, offset, timestamp, and state, which can be enough to establish that activity occurred.
  5. Switch to the SQL editor and use json_extract on Payload JSON to isolate specific apps, time ranges, or field values. Autocomplete covers table and column names.
  6. For any record where the decoded text is ambiguous, double-click the Payload cell to open the raw bytes in the Blob Inspector.
  7. Do not skip the remote subfolder — data from the user's other Apple devices may cover time ranges not present in the local files, or contain records that were already deleted locally.

Further Reading

Happy examining. ๐Ÿข

Sunday, May 10, 2026

Reading the CURRENT — LevelDB Forensics with crush

Moin! ๐Ÿ‘‹

LevelDB is everywhere — you just don't see it. If your workflow starts and ends with SQLite, you're walking past a surprising amount of data on every acquisition. Chrome, Electron-based desktop apps, iOS, Android — LevelDB shows up constantly in real cases, often unnoticed, and often holding data that other tools quietly skip over.

With the current release, the crush LevelDB viewer has grown significantly. This post walks through what it can do — and more importantly, what it means for your investigation.

What Is LevelDB — and Why Should I Care?

LevelDB is an on-disk key-value store developed by Google. Unlike SQLite, it does not present itself as a single file with a recognisable header — instead, a LevelDB database occupies an entire folder, containing a mix of files with .log, .ldb, and metadata files named MANIFEST-######, CURRENT, and LOG. If you have ever opened an acquisition and seen a folder full of files like that and moved on, there is a good chance you left data on the table.

The forensic value of LevelDB depends on what the application chose to store there — and that varies enormously. In Chrome and Chromium-based browsers you will find browsing history fragments, sync metadata, IndexedDB entries from web applications, session state, and local storage data. Electron apps like Signal Desktop, Discord, or WhatsApp Desktop use LevelDB for app configuration, cached user data, and UI state.

On top of that, there is a structural property worth understanding: deleted records are not immediately gone. Every write to LevelDB — including deletions — is appended as a new entry with a sequence number. A deletion creates a "tombstone" entry: the key remains, but the value is empty. In .ldb files, superseded or deleted records are eventually cleaned out during compaction — but until that happens, they can still be read. In .log files, the active write log, everything is there. A tombstone without a value is not nothing: the key alone can tell you that something existed and was deliberately removed.

Alex Caithness from CCL wrote the definitive primer on this in 2020 — if you have not read it yet, it is required reading before going further:

The open-source Python library that came out of that research — ccl_chrome_indexeddb — is also the foundation that the crush LevelDB viewer builds on. Full credit to Alex and the CCL team for that work.

Where to Find LevelDB

LevelDB shows up more often than most people expect — and the pattern is consistent enough that you can learn to spot it without memorising paths.

On iOS and Android, look for any Chromium-based browser (Chrome, Edge, Brave) and any app built on a Chromium WebView or Electron runtime. The LevelDB stores typically live under the app's data container, in subdirectories named IndexedDB, Local Storage/leveldb, or Sync Data/LevelDB. The folder name is usually your first hint.

On desktop systems — Windows, Linux, macOS — the same logic applies, but the surface area is larger than most people assume. Every Chromium-based browser and every Electron app on the system has its own LevelDB stores. On Linux alone, a standard workstation running Chrome, Signal Desktop, Discord, and Slack will have dozens of LevelDB databases under ~/.config/. It is easy to treat LevelDB as a mobile-only concern. It is not.

The reliable indicator is always the folder structure: a directory containing a CURRENT file, one or more MANIFEST-###### files, and a mix of .log and .ldb or .sst files is a LevelDB database — regardless of what the folder itself is named.

Example LevelDB folder content


For this post, I am using the Spotify Client App from an iPhone acquisition (Public iOS 17 Image from Josh Hickman) as the example throughout. The path to the LevelDB looks like this:

/private/var/mobile/Containers/Data/Application/E81161FB-2689-4DF6-B98B-C8F95B056440/Library/Application Support/PersistentCache/Users/7jlf2hrgerxu6n0pp8l35r5h0-user/primary.ldb/

Opening a LevelDB Database in crush

LevelDB databases don't announce themselves — there is no single file to click on. A LevelDB database is a plain directory, typically named LevelDB (or sometimes leveldb), containing a mix of .log, .ldb, .sst, and metadata files. There is no special label or icon that distinguishes it from any other folder in the file panel. The .sst (Sorted String Table) extension is functionally identical to .ldb — it is an older naming convention still used by some non-Chrome LevelDB implementations. crush handles both.

To open it in crush, right-click the folder in the file panel and select Open. A single left-click only selects the folder — the right-click menu is what triggers crush to read the directory structure and open the LevelDB viewer. That is the step that is easy to miss the first time.

No extraction needed — crush reads directly from inside ZIP or TAR archives, or from a folder on disk. 

File panel showing right-click context menu on a LevelDB folder, with


The Overview Tab

The first tab is the Overview. Before looking at any records, this is where I orient myself.

It shows the MANIFEST metadata for the database: the comparator in use, the last sequence number, the log number, and the files that make up the database grouped by compaction level. crush parses all MANIFEST-* files present in the directory — not just the current one. The active MANIFEST is labelled (current); older ones expose compaction history from before the last recovery, potentially referencing file numbers that are no longer on disk. The CURRENT pointer itself is shown as a separate entry, and where present, the prev_log_number field identifies the WAL log file that preceded a recovery — useful context when you are trying to reconstruct the write history of a database that has been through a crash or a clean shutdown cycle.

The last sequence number tells you how many write operations have been performed on this database across its lifetime — including updates and deletions, not just inserts. A low number (like 0 or close to it) means the database is fresh or barely touched. A high number is a signal that there has been a lot of activity — and potentially a lot of deleted data worth looking at.

The compaction level of the files tells you how mature the database is. Fresh data lives in .log files and level-0 .ldb files; as the database ages and compaction runs, data moves to higher levels and old or deleted records get cleaned out. A database where everything is still at level 0 has had very little compaction — which is good for us, because deleted records are more likely to still be present.


Overview tab showing MANIFEST metadata, all MANIFEST files, and file list by level


The Files Tab

The Files tab gives you a per-file breakdown: filename, type (Log or Ldb), compaction level, size in bytes, and counts of live, deleted, and unknown records per file. For .ldb and .sst table files, the tab also shows the Smallest Key and Largest Key boundaries sourced directly from the MANIFEST VersionEdit entries — decoded as UTF-8 where possible, hex otherwise. At a glance, you can see which files cover which key ranges without opening a single record.

Files containing deleted records are colour-coded red. That is your first triage signal — if a file is red, it is worth a closer look. A file with a high deleted record count relative to its live records tells its own story.

The distinction between Live, Deleted, and Unknown is worth understanding:

  • Live — the record is the current version of a key and has a value.
  • Deleted — a tombstone entry. The key is present, the value is empty. Something was here.
  • Unknown — records in .ldb files where the state cannot be determined without reading the full compaction context. These are worth inspecting manually.
Files tab showing per-file record counts, size, key ranges, with red-coded files containing deleted records

The Records Tab — Live and Deleted in Context

The Records tab shows all records from all files in a single table. This is where the examination happens.

The filter toolbar lets you narrow down to All / Live / Deleted / Unknown records. Deleted records appear inline in red, directly alongside the live data — not in a separate view. That matters: seeing what is gone in the context of what is still there is often where the interesting questions start.

Each row shows:

  • The sequence number — the order in which this record was written. Lower sequence numbers are older writes.
  • The offset — the byte position within the source file where this record begins, displayed in hex. This is your anchor for cross-referencing in a hex editor and for court testimony: if you need to point to exactly where in the evidence file a record lives, this is how you do it. The offset is included in CSV exports.
  • A UTF-8 text preview and a hex preview of the key.
  • A UTF-8 text preview and a hex preview of the value.
  • The state — Live, Deleted, or Unknown.

Selecting any row feeds the full key and value bytes into a split hex pane below the table — separate Key and Value tabs, so you can inspect each independently without them running together. For records from .ldb or .sst table files, a third Internal Key tab is also available, showing the complete LevelDB internal key: the user key plus its 8-byte sequence number and type suffix. This is the raw on-disk representation — useful when you need to verify exactly what LevelDB stored, or when the type byte tells you something about how the record was written.

Records tab showing mixed live and deleted records, with the split hex pane below


Search and Filter

The Search box filters rows case-insensitively across all columns — key text, value text, state. It combines with the state filter buttons, so you can ask questions like "show me only deleted records whose key contains this prefix" in two clicks. For databases with thousands of records, this is how you find signal in the noise quickly.

The LOG Files

LevelDB writes its own operational log to files named LOG and LOG.old — plain text, not to be confused with the binary .log write-ahead log files. crush surfaces these in dedicated tabs, one per file that exists, with the full content displayed in a monospace view and a Find toolbar for searching within the log. This is where LevelDB records its own compaction events, file opens, and error conditions — useful background when you are trying to understand what happened to a database and when.

BLOB Inspector — When the Value Is Protobuf

In the LevelDB used as the example here, roughly 95% of the values are Protobuf-encoded binary data. That is typical for LevelDBs — the apps use Protobuf extensively for serialising entity data before it goes into the LevelDB store.

Right-clicking any row in the Records table offers Inspect Key…, Inspect Value…, and for table-file records, Inspect Internal Key… — all opening the shared BLOB Inspector dialog.

In Auto mode, the inspector tries to detect the format automatically. For valid Protobuf data it will decode it and display the wire-decoded fields in protoc --decode_raw style — field numbers, wire types, and values, rendered as formatted text. PNG, JPEG, GIF, binary plists, XML, Android Binary XML, and JSON are also detected automatically in Auto mode via magic bytes or content sniffing.


BLOB Inspector open on a Protobuf value, showing decoded fields


If auto-detection does not find a match, you can manually select from the available decode modes: Hex, UTF-8, Latin-1, Base64, Plist, XML, JSON, Protobuf (schema-less), Android Binary XML, or Image. This covers the formats you are most likely to encounter in LevelDB values across Chrome, Electron apps, and Android databases.

The inspector opens as a non-modal window — the rest of the UI stays fully interactive while it is open, and you can have multiple inspectors open at the same time. That matters when you are comparing values across records without constantly opening and closing the dialog.

The BLOB Inspector is shared across all viewers in crush — SQLite, LevelDB, Realm. Any improvement to the inspector (a new decode mode, a UI fix) benefits all three at once. If you have been following the SQLite and Realm deep dives, the workflow here is identical: right-click, inspect, decode.

Further Reading

Happy examining. ๐Ÿข

Tuesday, May 5, 2026

Object by Object — RealmDB Forensics with crush

Moin! ๐Ÿ‘‹

After the SQLite deep dive with crush, the next stop is Realm. This post walks through what the crush Realm viewer can do — and as before, what the features actually mean for an investigation. I am using the Android app de.formel1 — the German Formula 1 news app — as the example throughout. I enjoy watching F1, and it turns out the app stores some of its data in its Realm database.

With v0.7.0, the crush Realm viewer has been significantly extended and improved. This post walks through what it can do — and as before, what the features actually mean for an investigation.

What is Realm — and Why Should I Care?

Realm is an object-oriented database developed for mobile platforms. Unlike SQLite which stores data in tables with rows and columns, Realm stores data as objects — think classes with properties rather than spreadsheet rows. It was originally developed by Realm Inc., acquired by MongoDB in 2019, and has seen adoption in a range of Android and iOS apps.

Forensically, Realm matters because it shows up in real cases — and most tools either support it poorly or not at all. If you have never encountered a .realm file in an acquisition, you may simply not have noticed: the file has no extension-based prominence and the format is far less documented than SQLite.

For a thorough introduction to Realm and its physical structure, Damien Attoe's The Realm Files series is the best starting point available:

Finding the Database

Realm databases use the .realm extension and are typically found under the app's data directory. For the Formula 1 news app it is:

/data/data/de.formel1/files/

The extension alone is not always reliable — some apps store Realm files without it or under an unexpected name. A more robust identifier is the magic bytes: at offset 16 in the file, Realm stores the 4-byte value 54 2D 44 42 — the ASCII string "T-DB". If you see that signature, you are looking at a Realm database regardless of the filename.

In crush, .realm files are now labelled Realm in the VFS tree panel — the same way SQLite files show their type at a glance. Previously they showed no label at all, which made them easy to overlook when browsing a large acquisition.

The Header

The Header tab is my first stop when opening a Realm file — before looking at any data. It shows the fundamental structural information of the database: the file format version, the positions of both top-level references in the file, and which one is currently active.

This matters because everything else in the viewer builds on this foundation. The active Top Ref determines which snapshot is the current state of the database; the inactive one points to the previous commit. Knowing where both live in the file is the starting point for understanding what the database currently contains — and what it contained before.


The Schema Tab

The Schema tab shows every table (class in Realm terminology) as an expandable node listing all column names with their Realm type: int, bool, string, date, link, and so on.

This is important context before looking at data. Which tables exist? What properties do they have? Are there link columns that reference other tables — and can I join across them? Tables without decoded data still appear as leaf entries so nothing is hidden.


In earlier versions of the crush Realm parser, every column was labelled col_0, col_1, … regardless of the actual schema names. That is now fixed — column names are read directly from the Realm spec node.

Top Refs — Deleted Data and Schema Diff

Before diving into tables, the Top Refs tab is worth checking first. This is specific to Realm's internal architecture and has direct forensic relevance.

Realm maintains two top-level references — Top Ref 0 and Top Ref 1 — as part of its copy-on-write architecture. At any point, one of them is active (the current state of the database) and the other represents the previous commit. The Top Refs tab shows both references, which one is currently active, and key settings and metadata for each.

Beyond the low-level header comparison, the tab now also shows a Diff — schema section. This is where it gets forensically useful: crush compares the schema of both Top Refs and lists tables that exist only in the active ref (added since the last snapshot), tables that exist only in the inactive ref (deleted since the last snapshot), and tables present in both but with a changed row count. A drop in row count between the inactive and active ref is a direct indicator that records were deleted between the two commits.


Freed Data — Inspecting Unallocated Space

In addition to active and previous snapshots, crush also exposes freed data directly. Unlike the Tables tab, this is not reconstructed into rows — it is a view into unallocated regions of the Realm file.

When objects are deleted in Realm, their space is marked as free but not immediately overwritten. These regions may still contain residual data from previously stored objects. The Freed Data tab surfaces these regions as individual entries with:

  • Offset — where the free region is located in the file
  • Size — the length of the region
  • Source — whether the region originates from the active or inactive Top Ref
  • Content — decoded where possible (currently it looks for null terminated strings)

Selecting an entry reveals the raw data, including a full hex view at the bottom of the tab. This keeps the representation close to the underlying file structure and avoids introducing assumptions about object boundaries or schema correctness.

In practice, these regions can still contain recognizable artefacts such as strings, timestamps, or fragments of previously stored objects. However, interpretation requires care: data may be partially overwritten, misaligned, or no longer internally consistent.

What this view provides is access to data that is not present in either the active or previous snapshot — and would otherwise remain invisible.


The Tables Tab — Active and Previous Snapshot

The Tables tab shows decoded table data from both Top Refs — not just the active one. This mirrors how the SQLite WAL viewer exposes superseded page versions: the idea is the same, Realm just implements it differently at the format level.

Tables from the inactive (previous) ref can be shown by activating the Show diff to prev ref checkbox. In the temporary SQLite database (more on this in the next section) the previous refs are stored as _prev_<tablename>, which makes cross-snapshot comparisons straightforward. The example data in the image is synthetically generated — I do not have real test data for this scenario in the Formula 1 app.


Deleted = Only existent in the prev ref
Prev Version = In both refs - but different content
Added = Only in the active ref
Everything else = identical in both refs

A few things that are now decoded correctly that were previously broken:

  • Timestamps — Realm timestamp columns (type 8) were previously fed through the wrong decoder, producing incorrect Unix-epoch-1970 dates. They are now decoded correctly and displayed as YYYY-MM-DD HH:MM:SS UTC.
  • Strings — format 24 stores strings in fixed-width inline entries. The previous logic produced garbage values; strings now decode correctly.
  • Booleans — nullable booleans (Realm's 2-bit encoding: 0=False, 1=True, ≥2=NULL) were previously returned as raw integers 0/1/2. They now show as True, False, or None.
  • NULL-only columns — columns with no payload were silently dropped. They now appear in the table with all-None values so the column is visible.

The SQL Bar and Cross-Snapshot Queries

When the Tables tab opens, crush loads the decoded Realm data — from both snapshots — into a temporary in-memory SQLite database. The full SQL bar is available: SELECT, WHERE, ORDER BY, aggregates, and more. The temp database is deleted automatically when the tab is closed.

Because both the active and previous snapshot tables are present in the same database, cross-snapshot comparisons are possible directly in SQL. For example, to find rows that exist in the previous snapshot but not in the active one — i.e. deleted records:

SELECT p.*
FROM _prev_class_Evidence p
LEFT JOIN class_Evidence e ON p._objkey = e._objkey
WHERE e._objkey IS NULL

Cross-Table JOINs via _objkey

Beyond cross-snapshot queries, Realm link columns store ObjKey values — the internal identifiers Realm uses to reference objects across tables. Each table in the temporary SQLite database receives a leading _objkey column containing the Realm ObjKey for every row, making cross-table joins straightforward:

SELECT ae.dt_published [Published Date], ae.title [Title], a.share_link, e.last_name [Author]
FROM class_Article a
JOIN class_AuthorEDP e ON a.author = e._objkey
JOIN class_ArticleEDP ae ON a.edp = ae._objkey

ObjKeys are not shown in the table grid — they are only present in the SQL database — so the visual view stays clean while the join capability is fully available.


Inspect Cell

As with the SQLite viewer, right-clicking any cell and selecting "Inspect cell…" opens the content in a dedicated window with auto-detection. If the content is a known format — a binary plist, JSON — it opens directly in the appropriate viewer. Otherwise the hex view is available as fallback.

Realm BLOB columns occasionally contain nested serialized data — Inspect Cell is the fastest way to find out what is in there without leaving crush.

Further Reading

Happy examining. ๐Ÿข