Tuesday, May 5, 2026

Object by Object — RealmDB Forensics with crush

Moin! 👋

After the SQLite deep dive with crush, the next stop is Realm. I am using the Android app de.formel1 — the German Formula 1 news app — as the example throughout. I enjoy watching F1, and it turns out the app stores some of its data in a Realm database.

With v0.7.0, the crush Realm viewer has been significantly extended and improved. This post walks through what it can do — and as before, what the features actually mean for an investigation.

What is Realm — and Why Should I Care?

Realm is an object-oriented database developed for mobile platforms. Unlike SQLite, which stores data in tables with rows and columns, Realm stores data as objects — think classes with properties rather than spreadsheet rows. It was originally developed by Realm Inc., acquired by MongoDB in 2019, and has seen adoption in a range of Android and iOS apps.

Forensically, Realm matters because it shows up in real cases — and most tools either support it poorly or not at all. If you have never encountered a .realm file in an acquisition, you may simply not have noticed: the file has no extension-based prominence and the format is far less documented than SQLite.

For a thorough introduction to Realm and its physical structure, Damien Attoe's The Realm Files series is the best starting point available.

Finding the Database

Realm databases use the .realm extension and are typically found under the app's data directory. For the Formula 1 news app it is:

/data/data/de.formel1/files/

The extension alone is not always reliable — some apps store Realm files without it or under an unexpected name. A more robust identifier is the magic bytes: at offset 16 in the file, Realm stores the 4-byte value 54 2D 44 42 — the ASCII string "T-DB". If you see that signature, you are looking at a Realm database regardless of the filename.
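If you need to triage a large acquisition for mislabelled Realm files, this signature check is easy to script. A minimal sketch in Python (the function name and CLI handling are mine, not part of crush):

import sys

REALM_MAGIC = b"T-DB"   # 54 2D 44 42
MAGIC_OFFSET = 16

def is_realm_file(path: str) -> bool:
    """Return True if the file carries the Realm signature at offset 16."""
    try:
        with open(path, "rb") as f:
            f.seek(MAGIC_OFFSET)
            return f.read(4) == REALM_MAGIC
    except OSError:
        return False

if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(f"{path}: {'Realm' if is_realm_file(path) else 'not Realm'}")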

In crush, .realm files are now labelled Realm in the VFS tree panel — the same way SQLite files show their type at a glance. Previously they showed no label at all, which made them easy to overlook when browsing a large acquisition.

The Header

The Header tab is my first stop when opening a Realm file — before looking at any data. It shows the fundamental structural information of the database: the file format version, the positions of both top-level references in the file, and which one is currently active.

This matters because everything else in the viewer builds on this foundation. The active Top Ref determines which snapshot is the current state of the database; the inactive one points to the previous commit. Knowing where both live in the file is the starting point for understanding what the database currently contains — and what it contained before.


The Schema Tab

The Schema tab shows every table (class in Realm terminology) as an expandable node listing all column names with their Realm type: int, bool, string, date, link, and so on.

This is important context before looking at data. Which tables exist? What properties do they have? Are there link columns that reference other tables — and can I join across them? Tables without decoded data still appear as leaf entries so nothing is hidden.


In earlier versions of the crush Realm parser, every column was labelled col_0, col_1, … regardless of the actual schema names. That is now fixed — column names are read directly from the Realm spec node.

Top Refs — Deleted Data and Schema Diff

Before diving into tables, the Top Refs tab is worth checking first. This is specific to Realm's internal architecture and has direct forensic relevance.

Realm maintains two top-level references — Top Ref 0 and Top Ref 1 — as part of its copy-on-write architecture. At any point, one of them is active (the current state of the database) and the other represents the previous commit. The Top Refs tab shows both references, which one is currently active, and key settings and metadata for each.

Beyond the low-level header comparison, the tab now also shows a Diff — schema section. This is where it gets forensically useful: crush compares the schema of both Top Refs and lists tables that exist only in the active ref (added since the last snapshot), tables that exist only in the inactive ref (deleted since the last snapshot), and tables present in both but with a changed row count. A drop in row count between the inactive and active ref is a direct indicator that records were deleted between the two commits.
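Conceptually, the schema diff is a set comparison. A minimal sketch of the idea, assuming both schemas have already been parsed into {table name: row count} dictionaries; the names are illustrative, not crush internals:

def diff_top_refs(active: dict, inactive: dict) -> None:
    """Compare {table_name: row_count} maps from both Top Refs."""
    for name in sorted(active.keys() - inactive.keys()):
        print(f"added since last snapshot:   {name}")
    for name in sorted(inactive.keys() - active.keys()):
        print(f"deleted since last snapshot: {name}")
    for name in sorted(active.keys() & inactive.keys()):
        if active[name] != inactive[name]:
            print(f"row count changed: {name} ({inactive[name]} -> {active[name]})")

# Made-up example: class_Message lost two rows between the commits.
diff_top_refs({"class_Article": 40, "class_Message": 10},
              {"class_Article": 40, "class_Message": 12, "class_Draft": 1})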


Freed Data — Inspecting Unallocated Space

In addition to active and previous snapshots, crush also exposes freed data directly. Unlike the Tables tab, this is not reconstructed into rows — it is a view into unallocated regions of the Realm file.

When objects are deleted in Realm, their space is marked as free but not immediately overwritten. These regions may still contain residual data from previously stored objects. The Freed Data tab surfaces these regions as individual entries with:

  • Offset — where the free region is located in the file
  • Size — the length of the region
  • Source — whether the region originates from the active or inactive Top Ref
  • Content — decoded where possible (currently it looks for null-terminated strings)

Selecting an entry reveals the raw data, including a full hex view at the bottom of the tab. This keeps the representation close to the underlying file structure and avoids introducing assumptions about object boundaries or schema correctness.

In practice, these regions can still contain recognizable artefacts such as strings, timestamps, or fragments of previously stored objects. However, interpretation requires care: data may be partially overwritten, misaligned, or no longer internally consistent.
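If you want to poke at such a region outside of crush, a simple carving pass over the raw bytes already goes a long way. A small sketch of the idea - my own illustration, not how crush decodes freed data internally:

import re

def carve_strings(region: bytes, min_len: int = 4) -> list:
    """Split a freed region on NUL bytes and keep printable-ASCII runs."""
    pattern = re.compile(rb"[ -~]{%d,}" % min_len)
    found = []
    for chunk in region.split(b"\x00"):
        found.extend(m.group().decode("ascii") for m in pattern.finditer(chunk))
    return found

# Hypothetical residue from a partially overwritten object:
print(carve_strings(b"\x00\x17Max Verstappen\x00\x00\x03a\xff\x00race_report\x00"))
# -> ['Max Verstappen', 'race_report']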

What this view provides is access to data that is not present in either the active or previous snapshot — and would otherwise remain invisible.


The Tables Tab — Active and Previous Snapshot

The Tables tab shows decoded table data from both Top Refs — not just the active one. This mirrors how the SQLite WAL viewer exposes superseded page versions: the idea is the same, Realm just implements it differently at the format level.

Tables from the inactive (previous) ref can be shown by activating the Show diff to prev ref checkbox. In the temporary SQLite database (more on this in the next section) the previous refs are stored as _prev_<tablename>, which makes cross-snapshot comparisons straightforward. The example data in the image is synthetically generated — I do not have real test data for this scenario in the Formula 1 app.


Deleted = only present in the previous ref
Prev Version = present in both refs, but with different content
Added = only present in the active ref
Everything else = identical in both refs

A few things that are now decoded correctly that were previously broken:

  • Timestamps — Realm timestamp columns (type 8) were previously fed through the wrong decoder, producing incorrect Unix-epoch-1970 dates. They are now decoded correctly and displayed as YYYY-MM-DD HH:MM:SS UTC.
  • Strings — format 24 stores strings in fixed-width inline entries. The previous logic produced garbage values; strings now decode correctly.
  • Booleans — nullable booleans (Realm's 2-bit encoding: 0=False, 1=True, ≥2=NULL) were previously returned as raw integers 0/1/2. They now show as True, False, or None.
  • NULL-only columns — columns with no payload were silently dropped. They now appear in the table with all-None values so the column is visible.

The SQL Bar and Cross-Snapshot Queries

When the Tables tab opens, crush loads the decoded Realm data — from both snapshots — into a temporary in-memory SQLite database. The full SQL bar is available: SELECT, WHERE, ORDER BY, aggregates, and more. The temp database is deleted automatically when the tab is closed.

Because both the active and previous snapshot tables are present in the same database, cross-snapshot comparisons are possible directly in SQL. For example, to find rows that exist in the previous snapshot but not in the active one — i.e. deleted records:

SELECT p.*
FROM _prev_class_Evidence p
LEFT JOIN class_Evidence e ON p._objkey = e._objkey
WHERE e._objkey IS NULL

Cross-Table JOINs via _objkey

Beyond cross-snapshot queries, Realm link columns store ObjKey values — the internal identifiers Realm uses to reference objects across tables. Each table in the temporary SQLite database receives a leading _objkey column containing the Realm ObjKey for every row, making cross-table joins straightforward:

SELECT ae.dt_published [Published Date], ae.title [Title], a.share_link, e.last_name [Author]
FROM class_Article a
JOIN class_AuthorEDP e ON a.author = e._objkey
JOIN class_ArticleEDP ae ON a.edp = ae._objkey

ObjKeys are not shown in the table grid — they are only present in the SQL database — so the visual view stays clean while the join capability is fully available.


Inspect Cell

As with the SQLite viewer, right-clicking any cell and selecting "Inspect cell…" opens the content in a dedicated window with auto-detection. If the content is a known format — a binary plist, JSON — it opens directly in the appropriate viewer. Otherwise the hex view is available as fallback.

Realm BLOB columns occasionally contain nested serialized data — Inspect Cell is the fastest way to find out what is in there without leaving crush.


Happy examining. 🐢

Friday, May 1, 2026

Crush-forensics new version - 0.6.0 - is out

Moin! 👋

crush-forensics v0.6.0 is out.

github.com/kalink0/crush-forensics

No dramatic new format support this time, no big UI overhaul. This release is about going deeper, not wider.

Forensic Integrity — Now Tested

One thing that has been on my mind since the start of this project: how do you actually prove that a forensic tool does not touch the evidence?

The answer for crush is now a dedicated forensic test suite — currently 14 tests that verify the tool is safe to run on real evidence. These are not standard unit tests: each one is framed as a forensic property guarantee, grouped into five categories: Source Immutability, No Side Effects, Read-only Media Compatibility, Known-output Verification, and Reproducibility.

Every test run also produces a forensic_audit.html — a self-contained, printable report with SHA-256 hashes of the reference corpus.

SQLite WAL Forensics

The SQLite viewer has been significantly extended in v0.6.0, with a focus on WAL archaeology — recovering data from the Write-Ahead Log that would otherwise be invisible to standard tools.

When a -wal companion is present, crush now shows a full frame inventory classifying every frame as Active, Superseded, Uncommitted, or WAL slack — the last category being salt-mismatch frames from a previous WAL cycle, per Sanderson's terminology. Superseded and uncommitted frames are colour-coded amber and blue so the examiner immediately sees whether overwritten or in-flight data exists.

Enabling the Show WAL history toggle appends rows decoded from non-active frames directly below the live table data, colour-coded, with a row count showing how many additional records were recovered. Double-clicking any WAL frame opens the raw page bytes in the hex viewer.

On a practical note: crush now also discovers -wal and -shm companions when a .db file is opened directly from the filesystem — previously these were silently skipped when not opening from inside an archive or folder.

More SQLite Improvements

Beyond WAL forensics, the SQLite viewer gained a number of improvements that make day-to-day examination faster.

Timestamp decoding is now built in: right-click any column header and decode it as Unix seconds, Unix milliseconds, Unix microseconds, Mac Absolute Time, Windows FILETIME, or Chrome/WebKit time — displayed as YYYY-MM-DD HH:MM:SS UTC, with sorting still working correctly on the underlying numeric value.

Two new combo entries give quick access to schema and configuration information: DB Structure lists all tables, views, indexes, and triggers with their definitions; DB Info shows 28 PRAGMA settings with decoded values and descriptions. Database views are now browsable like tables. The SQL bar now accepts PRAGMA statements, shows syntax highlighting, and supports running only a highlighted selection — useful for stepping through a complex query piece by piece.

Paste & Decode

A small but useful addition: Tools → Paste & Decode… lets you paste raw hex, base64, or plain text and open it immediately in any supported viewer — without saving anything to disk. If you have been copying BLOB values out of a hex editor or a network capture and opening temporary files just to inspect them, this should save a few steps.

Export as .logarchive

iOS diagnostics nodes now have an "Export as .logarchive…" right-click action. crush assembles the correct logarchive layout — diagnostics tree plus the uuidtext/ sibling — and copies the result to a location of your choice. The output is a standard .logarchive folder that can be opened directly in Console.app or processed with the log CLI. As I do not have a Mac test system, this functionality is not tested yet.

Parallel Unified Log Conversion

Multi-Log Studio now splits large logarchives across multiple unifiedlog_iterator processes — one per physical core by default — and streams entries into the viewer as each chunk finishes. On a typical 200 MB acquisition this gives roughly a 25% reduction in wall time. Unified Log support is still alpha, and there is more work ahead, but it is moving in the right direction.
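The fan-out pattern itself is simple. A rough sketch of the idea using Python's standard library; the chunk list and the iterator invocation are placeholders, not the actual crush command line:

import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed
from os import cpu_count

def convert_chunk(chunk_path: str) -> str:
    # Placeholder invocation - the real unifiedlog_iterator flags differ.
    result = subprocess.run(["unifiedlog_iterator", chunk_path],
                            capture_output=True, text=True, check=True)
    return result.stdout

chunks = ["chunk_00", "chunk_01", "chunk_02", "chunk_03"]  # hypothetical split
with ThreadPoolExecutor(max_workers=cpu_count()) as pool:
    futures = {pool.submit(convert_chunk, c): c for c in chunks}
    for future in as_completed(futures):     # stream results as chunks finish
        entries = future.result()
        print(f"{futures[future]}: {len(entries.splitlines())} entries decoded")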

File Format Database

All entries in the built-in file format database have been reviewed and refined — descriptions improved, forensic context expanded, reference URLs added. Five new formats were added: Apple Keychain, Android Keystore, iOS Backup (iTunes/Finder), Windows Prefetch, and Gzip.

What's Next

The SQLite viewer is the first to receive this level of dedicated depth and tooling — and it won't be the last. The plan is to work through all parsers and viewers systematically, one at a time, done properly. A dedicated deep-dive post on the SQLite viewer is also in the works, with the same planned for other parsers and viewers down the line.

As always: feedback, issues, and pull requests are welcome on GitHub.

Just keep swimming. 🐢

What Hides in the WAL — SQLite Forensics with crush

Moin! 👋

SQLite is everywhere in digital forensics. iOS, Android, macOS, Windows — virtually every app that needs to store structured data uses it. Messages, call logs, browser history, location data, app state — if it is on a mobile device, there is a good chance it lives in a SQLite database. If you want to go deeper on the file format itself, the official SQLite file format documentation is the right starting point.

With v0.6.0, the crush SQLite viewer has grown significantly. This post walks through what it can do — and more importantly, what the features actually mean for your investigation.

Opening a Database

crush opens SQLite databases directly from inside ZIP or TAR archives, from folders, or as standalone files. No extraction needed. When you select a SQLite file, the SQLite viewer opens automatically. The combo box at the top lets you switch between tables, views, and the generated entries described below.

One important detail: crush opens all SQLite connections in read-only mode. This matters because closing a WAL-mode database that was opened with a read-write connection triggers an automatic checkpoint — which commits and clears the Write-Ahead Log, potentially destroying forensic data. More on WAL below.

The Summary View

The default view when opening a database is the Summary. It shows all tables and views with their row counts, and a status label giving the full schema object count — tables, views, indexes, and triggers at a glance.

This is useful for orientation before diving in: how many tables are there, which ones have data, and are there any views worth looking at?

DB summary overview in crush

DB Structure

The DB Structure entry in the combo box lists every schema object in the database — tables, views, indexes, and triggers — with structural details: column definitions for tables, CREATE SQL for views, index definitions, and trigger signatures.

When I open an unfamiliar database, this is where I start, directly after taking a look at the summary. What tables are there? What do the columns tell me about what the app stores? I also pay attention to indexes — they often hint at what the app queries frequently — and triggers, which can automatically update timestamps or delete records without the user doing anything. Understanding the schema is the foundation for everything that follows.

DB structure view in crush

DB Info — PRAGMA Settings

The DB Info entry shows 28 SQLite PRAGMA settings with their current values and plain-English descriptions. PRAGMA statements are essentially configuration queries — they reveal things like the page size, the journal mode, the encoding, and the application ID.

The journal mode is particularly relevant: a value of WAL means the database uses a Write-Ahead Log, which has significant forensic implications (see below). The application_id PRAGMA can sometimes identify the application that created the database. The user_version field is used by many apps to track database schema versions — useful for understanding which version of an app created or last modified the database.
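The same quick look is possible outside of crush, since PRAGMAs can be queried over any SQLite connection. A minimal sketch (the database name is a placeholder):

import sqlite3

con = sqlite3.connect("file:evidence.db?mode=ro", uri=True)  # read-only!
for pragma in ("page_size", "journal_mode", "encoding",
               "application_id", "user_version"):
    value = con.execute(f"PRAGMA {pragma}").fetchone()[0]
    print(f"{pragma:15} = {value}")
con.close()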


DB Info PRAGMA view in crush


Timestamp Decoding

Timestamps in SQLite databases are stored as raw numbers — but the epoch and unit vary wildly between apps and platforms. Right-clicking any column header and selecting "Decode column as timestamp" lets you tell crush how to interpret the values.

Supported formats:

  • Unix seconds — seconds since 1970-01-01 UTC (most common on Android)
  • Unix milliseconds — same epoch, millisecond precision
  • Unix microseconds — same epoch, microsecond precision
  • Mac Absolute Time — seconds since 2001-01-01 UTC (Apple's CoreData default)
  • Windows FILETIME — 100-nanosecond intervals since 1601-01-01 UTC
  • Chrome / WebKit time — microseconds since 1601-01-01 UTC (Chrome history databases)

The decoded values are displayed as YYYY-MM-DD HH:MM:SS UTC and the column header shows the active format as a suffix — for example created_at [mac abs]. Sorting still works correctly because the raw numeric value is preserved internally. Select "Clear timestamp format" to revert to the raw value.

If you are unsure which format a column uses, a quick sanity check is to look at the magnitude of the values: Unix seconds for recent dates are around 1.7 billion; Mac Absolute Time values for recent dates are around 750 million; Chrome time values are around 13 quadrillion.
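For reference, the arithmetic behind these epochs is simple enough to verify by hand. A small sketch with made-up raw values:

from datetime import datetime, timezone, timedelta

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MAC_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)   # CoreData epoch
NT_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)    # FILETIME / WebKit epoch

def unix_seconds(v): return UNIX_EPOCH + timedelta(seconds=v)
def mac_absolute(v): return MAC_EPOCH + timedelta(seconds=v)
def filetime(v):     return NT_EPOCH + timedelta(microseconds=v / 10)  # 100 ns
def webkit(v):       return NT_EPOCH + timedelta(microseconds=v)

print(unix_seconds(1_700_000_000))     # 2023-11-14 22:13:20+00:00
print(mac_absolute(750_000_000))       # 2024-10-07 13:20:00+00:00
print(webkit(13_350_000_000_000_000))  # a date in 2024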

The following pictures show one column converted and the other in its original state.

Timestamp decoding in crush SQLite viewer

Just to mention it: it is also possible to convert timestamps directly in the SELECT query — this is basic SQLite syntax (reference: https://sqlite.org/lang_datefunc.html).

Inspect Cell

BLOB cells are common in SQLite databases — and they often contain the most interesting data. Right-clicking any cell and selecting "Inspect cell…" opens a dedicated window that tries to auto-detect the content format. If it recognises something — a binary plist, XML, base64 (more formats to come) — it opens the data directly in the appropriate viewer. If auto-detection does not find a match, I can manually select from the available formats or fall back to the hex view.

In practice I use this constantly. A BLOB that looks like noise in the table view often turns out to be a bplist or base64 payload — and Inspect Cell gets me there in two clicks without touching the filesystem.

Inspect cell in crush SQLite viewer

Not many formats are supported by the Inspect Cell command yet — but more are on the way.

The SQL Bar

The SQL bar lets you run custom queries directly against the open database — useful when you want to filter, join, or aggregate data beyond what the table view shows. A few things worth knowing:

  • PRAGMA statements are accepted alongside SELECT and WITH
  • Selected text only — if you highlight a fragment of a query and press F5, only the selection is executed. Useful for debugging a complex query step by step
  • Syntax highlighting — keywords, strings, numbers, and comments are highlighted
  • Status feedback appears below the input field — red on error, normal on success
  • The pane between the SQL bar and results table is resizable


SQL bar in crush SQLite viewer


WAL Forensics — Going Deeper

This is where things get forensically interesting. SQLite has two journal modes: rollback (the default) and WAL (Write-Ahead Log). In WAL mode, changes are not written directly to the database file — they are first written to a separate -wal file and later checkpointed (merged) back into the main database. Most modern iOS and Android apps use WAL mode.

The forensic value of the WAL comes from what survives in it. For a detailed explanation of the WAL format and its forensic implications, I highly recommend Sanderson Forensics' write-up on WAL forensics — it is the reference text on this topic. The book by him and other great people in our field is also a standard reference on SQLite forensics, including WAL files — definitely worth a read in my opinion, and the information still holds up after all these years.

Sanderson, Paul. SQLite Forensics. Independently Published, 2018. ISBN: 978-1980293071.

In crush, when a -wal companion is present alongside a database, a new WAL Frames entry appears in the combo box. It shows a full frame inventory with every frame classified:

  • Active — the current version of a page, reflected in the live database view
  • Superseded — an older version of a page that has since been overwritten by a newer frame. The older version of the data may still be readable here
  • Uncommitted — frames that were written but never committed — the transaction was rolled back or the app crashed
  • WAL slack — frames from a previous WAL cycle (salt mismatch). After a checkpoint, new WAL writes start from the beginning of the file, overwriting old frames — but until that happens, old frames from the previous cycle remain

Superseded and uncommitted frames are colour-coded amber and blue so you immediately see whether overwritten or in-flight data exists. The Table column shows which schema object owns each page.
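The salt test that separates the current WAL cycle from WAL slack is easy to reproduce. A simplified sketch of the frame walk - full classification additionally needs checksum verification and commit tracking, which I leave out here; the file name is a placeholder:

import struct

def wal_frames(path: str):
    """Walk a -wal file and flag frames whose salts do not match the header."""
    with open(path, "rb") as f:
        # 32-byte WAL header: magic, version, page size, ckpt seq, salts, cksums
        magic, version, page_size, ckpt, salt1, salt2, _, _ = \
            struct.unpack(">8I", f.read(32))
        while True:
            frame_header = f.read(24)
            if len(frame_header) < 24:
                break
            pgno, commit, fs1, fs2, _, _ = struct.unpack(">6I", frame_header)
            f.seek(page_size, 1)            # skip the page image itself
            current = (fs1, fs2) == (salt1, salt2)
            yield pgno, commit != 0, current

for pgno, is_commit, current in wal_frames("evidence.db-wal"):
    label = "current cycle" if current else "WAL slack (salt mismatch)"
    print(f"page {pgno:6}  commit={is_commit}  {label}")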


SQLite WAL frame inventory in crush


Enabling the Show WAL history checkbox in the table toolbar appends rows decoded from Superseded, Uncommitted, and WAL-slack frames directly below the live data — colour-coded, with a row count showing how many additional records were recovered. Double-clicking any WAL frame row opens the raw page bytes in the hex viewer.

The DB Info view also shows a WAL summary when a companion is present: total frames, active / superseded / uncommitted / WAL-slack counts — a quick health check before diving into the frame inventory.

Paste & Decode

One more addition worth mentioning in the context of SQLite work: Tools → Paste & Decode… lets you paste raw hex, base64, or plain text and open it in any supported viewer — including the SQLite viewer — without saving anything to disk. If you have been copying BLOB values out of a cell and creating temporary files just to inspect them, this saves a few steps.

A Note on WAL and Evidence Handling

To close with something practical: if you open a WAL-mode database with a standard SQLite tool in read-write mode, the automatic checkpoint will commit and clear the WAL before you see anything. This is silent and irreversible on your working copy.

crush prevents this by opening all connections with the mode=ro URI flag. If you are working with SQLite databases outside of crush, always use PRAGMA query_only = ON or open with a read-only URI to protect the WAL. The forensic test suite in v0.6.0 includes a dedicated test that verifies this behaviour on every build.
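For anyone scripting this outside of crush, both safeguards look like this with Python's standard sqlite3 module (the file name is a placeholder):

import sqlite3

# Read-only URI open: the WAL is read, but never checkpointed or cleared.
con = sqlite3.connect("file:working_copy.db?mode=ro", uri=True)

# Belt and braces for connections you do not fully control:
con.execute("PRAGMA query_only = ON")

print(con.execute("PRAGMA journal_mode").fetchone()[0])  # e.g. 'wal'
con.close()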

Happy examining. 🐢

Saturday, April 25, 2026

Introducing crush: A DFIR Workbench for Surfing Through Data Formats

Moin! 👋

Today I want to share something a little different from the usual artifact analysis posts — I am releasing crush-forensics v0.5.0, a digital forensic analysis workbench I have been building for a while.

You can find it on GitHub: github.com/kalink0/crush-forensics

The Problem

When I work with acquisitions — especially mobile ones — I often find myself wanting to quickly look at a specific file without firing up a full forensic platform. Maybe I want to check a PLIST, peek into a SQLite database, or just confirm what a binary blob actually contains. Opening the whole acquisition in a heavy tool just to answer a quick question felt like overkill.

At the same time, reaching for a hex editor or writing a one-off script every time also gets old fast. I wanted something in the middle: a lightweight, dedicated workbench that knows about the file formats we actually work with in DFIR — and that lets me navigate directly inside ZIP and TAR archives without extracting anything to disk first.

That is what crush is.

Why "crush"?

I am a big Finding Nemo fan. Crush is the laid-back sea turtle who surfs the East Australian Current — and that is exactly the vibe I wanted for this tool: just riding through data formats, going with the flow. 🐢

The idea of surfing through file formats felt like the right metaphor — you open an archive, navigate the structure, and glide from a PLIST to a SQLite DB to a hex view without fighting the current. Dude.

What Can It Do?

crush is a Python-based GUI application (built with PySide6). It supports:

  • Opening and navigating inside ZIP and TAR archives directly — no extraction to disk needed
  • Opening single files and folders
  • Export files/folders and open them directly in external software
  • Hex Viewer
  • SQLite Viewer
  • Text Viewer with syntax highlighting and encoding detection
  • JSON Viewer (collapsible tree)
  • XML Viewer (collapsible tree)
  • PLIST and BPLIST Viewer
  • SEGB v1 and v2 Viewer
  • ABX Viewer (Android Binary XML)
  • LevelDB Viewer (Chrome LevelDB / Android app databases)
  • Image Viewer
  • Media Viewer (audio and video)
  • Multi-Log Studio — multi-source log analysis with format auto-detection, including Apple Unified Log / .tracev3 / .logarchive, syslog, and more (note: Unified Log support is currently alpha — decoding can be slow, this is actively being worked on)
  • Protobuf Viewer — schema-less, with optional schema decoding
  • PDF text extraction
  • Realm Database Viewer — header, schema/class extraction, top-ref comparison, table/column data decoding

Android ABX viewer in crush (Linux)



iOS SQLite viewer in crush (Windows)


Built-in Data Format Database

One feature I am particularly happy with is the built-in data format database. Crush identifies forensically relevant formats by magic bytes and extension, and surfaces the information directly in the UI for every selected file — including formats that do not have a dedicated viewer yet.

For each format it shows:

  • Full name and abbreviation
  • Category (database, configuration, log, ...)
  • Forensic relevance — what an investigator is likely to find here
  • Relevant platforms (iOS, Android, macOS, Windows, ...)
  • Magic bytes with offset and description
  • Links to format specs and relevant forensic research

Format Reference — the built-in data format database in crush


So for example, when you open an ABX file, crush identifies it via its magic bytes, opens it in the ABX viewer, and right there in the UI you can see that this is an Android Binary XML file used for system and app settings — plus links to the AOSP source and relevant research from CCL Solutions. No more alt-tabbing to a browser to remember what a format is.


Integrity Mode

Crush also has an optional integrity mode for auditability. When enabled:

  • Records SHA-256 hashes when files are opened or exported
  • Hashes ZIP/TAR/file sources on open (folders are not hashed)
  • Writes hashes to the log
  • Creates a crush-export-hashes.txt file next to exported data

You can toggle it via the status badge in the bottom right of the UI. It can also be turned off for faster opening of large ZIP/TAR sources. A small but useful addition for anyone who needs to demonstrate that what they examined is what they got.


Integrity Mode dialog in crush

Installation

v0.5.0 ships with pre-built binaries for Windows, Linux, and macOS — available directly from the GitHub releases page. No Python environment needed, no dependency wrangling — just download and run.

One honest caveat: the macOS binary is currently untested as I do not have a macOS system available. If you are on macOS and give it a try, please let me know how it goes via GitHub issues or direct message — that feedback would be really valuable.

If you prefer to run from source (e.g. for development), the README has full instructions, including the platform-specific system dependencies and how to download the Unified Log parser binaries.

Credits

Crush builds on some excellent work from the DFIR community. Bundled third-party modules include:

  • ccl_bplist by CCL Solutions Group — binary plist parsing
  • ccl_segb by CCL Solutions Group — SEGB parsing
  • ccl_leveldb by CCL Solutions Group — LevelDB / Chrome LevelDB parsing
  • macos-UnifiedLogs by Mandiant — Apple Unified Log parsing

Thank you to everyone in the community whose research and open source work made this possible.

Nightly Builds

If you want to live on the edge — every night a fresh build is automatically created from the latest main branch, including the most recent features and fixes. Changes are tracked in the changelog.md in the repo. Not recommended for production casework, but great if you want to try out what is coming next or help with testing.

What's Next?

There is still a lot I want to add — more decoders, better search, maybe some lightweight timeline capabilities. I will keep building as I run into things I wish it could do during actual casework.

If you try it out, I would love to hear what features would be most useful to you. Open an issue on GitHub, or reach out to me on Mastodon, Bluesky, or LinkedIn.

Until next time — happy forensicating! 🔍

GitHub: github.com/kalink0/crush-forensics

Saturday, March 14, 2026

xLEAPP - Helper scripts for pulling/cloning and creation of Windows exe

Hey there,

I was asked if there is an easy way to have the latest xLEAPP (ALEAPP and iLEAPP) version on Windows with all the most recent parsers from the main repositories.

I created batch scripts for this - they can be found in my github repo "useful_scripts" at https://github.com/kalink0/useful_scripts/tree/master/windows/leapp-build

The start_update*.bat files pull the most current main branch from the original repositories (https://github.com/abrignoni/ALEAPP.git and https://github.com/abrignoni/iLEAPP.git), create a venv, and then install all necessary requirements.


After this, the user can run ALEAPP/iLEAPP by executing the batch file. At every execution the batch file asks whether it should fetch the newest files from the repos; if not, it simply executes xLEAPP the way the user wants it to.

Additionally, I created create_exe*.bat files. These create Windows executables via pyinstaller, based on the existing clone of the repo on the machine. The executables can then be used on other machines without requiring an installed Python or git, or an active Internet connection.

There are two versions of each batch file: one for ALEAPP, one for iLEAPP.

Python and git must be installed on the machines that execute the script.

Perhaps these scripts are also useful for other people — so feel free to use them.


Bubbly now with GUI - New Release 1.2.4

 Hey there, I did a bit of work on my small application Bubbly.

What's new?

First and most important:

I implemented a GUI for Bubbly itself AND a GUI for the CSV to Bubbly JSON converter — now you are able to open a CSV, map the columns easily, and convert the timestamps into the necessary format. So it is much easier to get the necessary input file to use in Bubbly.

Additionally, I added a parser for the Android Romeo dating app SQLite database (already in version 1.1.0) — you can directly input the SQLite table of this Android app and create a bubble view of all the chats.

New version (executable for Windows and Linux) can be found at https://github.com/kalink0/bubbly/releases/tag/v1.2.4

Next step is to allow other file formats for the Bubbly Converter to make it more versatile.

Saturday, February 21, 2026

Samsung Trash Provider App - Traces of deleted files never hurt

This post shows the results of my research on the Samsung Trash Provider app on Samsung Android devices.

Android Version Tested:
Android 16

Test device:
Samsung Galaxy S23

Database location:
[...]/com.samsung.android.providers.trash/databases/trash.db


What is this app about?

The Samsung Trash Provider app is not an app a user directly interacts with. It is used by other apps (e.g. the My Files and Gallery apps) to offer trash bin functionality. In earlier One UI/Android versions this worked differently: every single app had its own trash implementation.
This means it is still possible that trash data is also stored by the individual app.

What's in the databases?


In the database "trash.db" is one interesting table, it is called "trashes". With the following interesting columns.

  • _data - Current file path
  • original_path - The original File path
  • _id - Some file id
  • media_id - an additional id, empty for my data
  • sec_media_id - ID in the corresponding Samsung app; was filled for the Gallery app and matched the ID there in my data
  • is_cloud - always 1 in my data, but I did not have any cloud sync active - so the meaning is not known at the moment
  • volume_name - The name of the volume the data is stored on
  • title - In my data always the file name with file extensions
  • _display_name - What is shown in the apps - identical to file name in my data
  • _size - Size of file in bytes
  • mime_type - Recognized MIME type of the file
  • media_type - 1 = Image, 2 = Audio, 3 = Video, Pretty sure there are more types but I only have these in my data atm
  • datetaken - if media was taken on the phone this value is set - I didn't find another case where it was filled
  • date_expires - the date/time this file will be deleted completely from trash
  • date_deleted - the date/time the file was moved to trash
  • user_id - which user account deleted it (0 = main user, 150 = secure folder context)
  • extra - additional info from the source app, JSON based - can hold exif data (location, timestamp taken etc.)


Example SELECT query:

SELECT
    _id [File ID],
    _data [Trash File Path],
    original_path [Original File Path],
    title [File Title],
    _display_name [File Name],
    _size [File Size],
    mime_type [MIME Type],
    delete_package_name [App Context],
    is_cloud [Cloud?],
    user_id [User ID],
    strftime('%Y-%m-%d %H:%M:%S.', "date_deleted"/1000, 'unixepoch') || printf('%03d', "date_deleted"%1000) [Deletion Timestamp],
    strftime('%Y-%m-%d %H:%M:%S.', "date_expires"/1000, 'unixepoch') || printf('%03d', "date_expires"%1000) [Expiration Timestamp],
    extra [Extra Info JSON]
FROM trashes
        

Location of the deleted files

The files are stored in the media part of the Android system under Android/.Trash
Full path can be e.g. /data/media/0/Android/.Trash/[...]

The pattern in this is as follows:

[appcontext]/[file_uid]/[timestamp_ms]/[old_file_path]/.!%#@$/[old_file_name]

Example:

/storage/emulated/0/Android/.Trash/com.sec.android.app.myfiles/f00f4be8-7dad-4996-aedb-6edc4b9f42d3/1763651528662/storage/emulated/0/Music/Telegram/.!%#@$/deleted_file.ogg


With this info, even if we do not have any database access, we can tell a bit about the file. From the example above:

  • App Context - The file was deleted via the Samsung My Files app
  • Old File Path - The original File path was /storage/emulated/0/Music/Telegram/
  • Old File Name - The original File Name was deleted_file.ogg
This fits what I've done on my test phone.
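Because the pattern is stable, pulling these components out of a path can be automated. A minimal sketch, assuming the pattern above holds (including the numeric segment, which in my data looks like an epoch-milliseconds value):

def parse_trash_path(path: str) -> dict:
    """Split a Samsung .Trash path into its components."""
    head, _, original_name = path.partition("/.!%#@$/")
    rest = head.split("/Android/.Trash/", 1)[1]
    app_context, file_uid, ts_ms, old_dir = rest.split("/", 3)
    return {"app_context": app_context, "file_uid": file_uid,
            "timestamp_ms": ts_ms, "original_path": "/" + old_dir + "/",
            "original_name": original_name}

p = parse_trash_path("/storage/emulated/0/Android/.Trash/"
                     "com.sec.android.app.myfiles/"
                     "f00f4be8-7dad-4996-aedb-6edc4b9f42d3/1763651528662/"
                     "storage/emulated/0/Music/Telegram/.!%#@$/deleted_file.ogg")
print(p["app_context"], p["original_path"] + p["original_name"])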


Additionally Good to know

Based on the values for date_expires and date_deleted, one can say that by default the files will stay 31 days in the trash and then will be deleted automatically.

But in my test data the database entry was kept — so the file was gone, but the trace of it was still there.

When I recovered one file (undeleted via the GUI), the entry was immediately deleted from the database and the file was moved back to its original location.


Conclusion

1. The Samsung Trash Provider app stores info on deleted files — including files from other apps like Samsung Gallery and My Files.
2. The files are kept in the trash for 31 days.
3. Even if we only have the files in the default trash folder location (e.g. /data/media/0/Android/.Trash/), we can say a few things about them — e.g. in which app context a file was deleted, its original path, and its original file name.


I've created an ALEAPP parser for this. It is part of an already open PR (#665) waiting to be merged into the ALEAPP repo.

Sunday, February 15, 2026

BUBBLES!!! BUBBLES! [gibbering] MY BUBBLES! - Chats (export/backups) need bubbles

In the last few days I took some time to develop a few things. And it has to do with bubbles ;-)

I developed "Bubbly" - a small tool that parses different Messenger Export and Backup formats and visualize the data in HTML Bubble View. The data is filterable, searchable, taggable and media files are supported.

The tool can be found at https://github.com/kalink0/bubbly. It is a Python-based CLI application, and I have built versions for Linux and Windows on GitHub.

How do the bubbles look?

The Overview HTML

When running the tool, an overview page is created that shows the generated thread views. One can also decide to put all chat threads into one view — then only a single entry is shown on the overview page.



The Chat Thread HTMLs

The created thread view that shows the data with the bubbles is divided into three parts.

1. The header with general info:

This part can be collapsed. From here one can load or export created tags and go back to the overview page.

2. The control panel:


This contains the filter and search functionality.

3. MY BUBBLES!


Tagging is possible via the button in the bubbles (only three predefined tags at the moment).


Current implemented parsers

Currently I have the following parsers implemented:

1. WhatsApp Chat Export

This parser parses the Chat Export from the iOS and Android WhatsApp app.

Because the output differs based on the system language, currently only German and English are fully supported. But I am working on a more robust implementation.

2. Wire Messenger Backup

Parses Wire backup files — only unencrypted backups are supported at the moment.

3. Telegram Desktop Chat Export

Parses chat exports from the Telegram Desktop app.

4. Threema Backup Parser

Parses backups of the Threema messenger. But: the data needs to be decrypted before parsing — there is no decryption feature.

5. Generic JSON Parser

I've built a generic JSON parser with a simple JSON design; a mapping script is also in my repo. The main idea: just create the simple JSON for a messenger, and Bubbly does the rest to get the bubbles.

Additional functionality

1. Branding - it is possible to give a logo that is shown in the top left corner
2. General case information - case number and name of the creator are necessary to create an HTML
3. Logging - the files read and written are documented, including the settings used (not the media files)
4. Config File usage
5. CLI interactive mode


Why did I develop it?

Main reasons:

- Easy to use and to hand over to other people — especially the bubble view and the media file support in a directly usable HTML.

- Fast to implement new parsers for not so well known formats/messengers (Exports and Backups)

- Basic filtering, searching and tagging functionality

Well, I needed a fast way to visualize the basic information from exports and backups of different messengers. That is what I focused on. And the bubble view is the one that everyone — including non-technical people — knows and understands, so I wanted this view.


The limitations - or - what is Bubbly NOT

Bubbly is not a "Forensic Chat Analysis Tool". It focus on the data from Exports and Backups and gives additional possibility to create bubble view of other messengers with mostly less functionality. I don't plan to support any parsing of the well known messengers like Signal, WhatsApp, Telegram on Android and iOS with full functionality support for all they have (e.g. statuses, answering messages, forwarding messages, reacting etc.) with Bubbly. There are other tools out there to do this. 

I want to keep it as simple as possible.

Also, there won't be any export functionality — I thought about it at first, but the simple platform used and the reliance on web browsers limit the functionality. Tagging, and export/import of tags, is possible — so one can share them at least.

I tested Bubbly with data sets of around 20,000 messages per thread (incl. media files) — without any problem — but there will be a limit for sure ;-)


Saturday, January 10, 2026

Inside Android Samsung DHMS: Extracting Device Health, Thermal, and App-Control Data from Android - Part 1

As mentioned in my last post, I am currently researching Samsung system apps on Android.

In this post, I share some of my findings on the Samsung Device Health Management Service (SDHMS).

As the name suggests, SDHMS is responsible for monitoring the overall health of the device — including power consumption, temperature, and system load. It detects abnormal behavior (“anomalies”) in running apps, packages, and processes.

To do this, SDHMS records system health data. Some of this data may be valuable for forensic analysis.

App path: /data/data/com.sec.android.sdhms


Anomalies

The first database I examined was anomaly.db.

This database stores information about detected anomalies. On my test device, no anomalies were triggered, but one table stood out as particularly interesting to me:

Table "config_history": 

This table has four columns:
  • time - Timestamp of the entry, stored as Unix epoch time in milliseconds
  • config_key - The key of the configuration that was loaded
  • config_version - The version of the loaded configuration
  • reason - The reason why the entry was generated
In my test data, most entries in the reason column are BOOT_COMPLETED. I checked the last five entries, and they indeed correspond to my device reboots.

My results span approximately 1.5 years back, covering the full time span the device was in use. Additionally, I found entries with SYSPACKAGE_REMOVED, which likely indicate that a system package was removed from monitoring. This is less relevant for my current analysis.

From this, I was able to reconstruct a history of device (re)boots for my test device.

Table "anomaly_history":

On my test device, this table does not contain any data.

Based on the available columns, it appears that this table would record information about apps or packages that triggered an anomaly, including the type of anomaly and the timestamp of the event.

Generating meaningful test data for this table is challenging. Nevertheless, it seems designed to provide historical insight into anomalous app behavior when anomalies are detected.

Table "current_config":

I am confident that this table contains the current configuration or a reference to the config file.

  • time - Timestamp of the entry in Unix epoch milliseconds
  • config_key - Key of the configuration loaded
  • config_version - Version of the loaded configuration
  • config_data - The configuration data itself

The config_key and config_version correspond exactly to those referenced in the config_history table.

The config_data column holds a large number of characters — 7,413 for my device — which appears to be Base64-encoded.

Using CyberChef, it was possible to decode it:

1. From Base64

2. Protobuf Decode

3. JSON -> results in readable, structured data

The decoded content contains a lot of information, such as descriptions of anomaly types configured on the device. While it may not be immediately relevant to my current analysis, knowing how to read and decode this data will perhaps be useful in other cases.
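The same three steps can also be scripted, for example with the blackboxprotobuf library for the schema-less protobuf decode. A sketch, assuming config_b64 holds the value of the config_data column:

import base64
import json
import blackboxprotobuf  # pip install blackboxprotobuf

config_b64 = "..."  # placeholder: content of the config_data column

raw = base64.b64decode(config_b64)                       # step 1: from Base64
message, typedef = blackboxprotobuf.decode_message(raw)  # step 2: protobuf decode
print(json.dumps(message, indent=2, default=str))        # step 3: readable JSON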

Thermal Log

The next database I examined was thermal_log, which — as the name suggests — is responsible for thermal and system performance logging.

Table "NETSTAT": 

As the name implies, this table stores network usage statistics.

For my test device, data spans approximately 5 days.

  • start_time - The start timestamp of measurement time window, Unix epoch time ms
  • end_time - The end timestamp of measurement time window, Unix epoch time ms
  • package_name - Name of the package that was measured
  • uid - The App ID on the device
  • net_usage - Bytes transferred in the measurement time window
This table allows you to track network activity per app over time.

Table "CPUSTAT": 

The CPUSTAT table seems to store data on CPU Usage statistics.

Again, for my test device, the data covers roughly 5 days.

  • start_time - The start timestamp of measurement time window, Unix epoch time ms
  • end_time - The end timestamp of measurement time window, Unix epoch time ms
  • uptime - Uptime in seconds
  • cputime - Total CPU time used by process since last boot
  • process_name - Name of the process measured
  • uid - Package ID related to the process
  • pid - Process ID
  • process_usage - CPU time used in measurement time windows

Note: CPU time values are scaled and depend on the number of cores. In general, the higher the process_usage value, the higher the CPU utilization and load generated by the process.

Table "TEMPERATURE":

This table contains temperature sensor data for the device.

For my test device, data covers roughly 5 days.

  • timestamp - The start timestamp of measurement time window, Unix epoch time ms
  • skin_temp - Chassis Temperature
  • ap_temp - Processor Temperature
  • bat_temp - Battery Temperature
  • usb_temp - USB Temperature
  • chg_temp - Charging IC Temperature
  • pa_temp - Pulse Amplifier Temperature (Cell Radio)
  • wifi_temp - WiFi Temperature

Notes on values:

  • Temperatures are stored in degrees Celsius × 10, so divide by 10 to get °C.
  • On devices with different regional settings, values may instead be stored in degrees Fahrenheit.
  • Not all devices provide all sensor data, and some columns may aggregate readings from multiple sensors depending on the Hardware.
In general, this table allows you to get an overview of the device’s thermal state.

Conclusion

In the anomaly.db database, we can extract information such as when the device was rebooted. On my test device this data goes back for 1.5 years - the complete usage time frame of the device.

The thermal_log database provides detailed insights into system activity, including:

  • CPU load generated by apps and processes
  • Network usage per app
  • Device temperature across multiple sensors

On my test device, this data spans roughly 5 days.

What's next?

There is still additional data stored by the app that I haven’t covered in this post. For example:

  • Databases containing battery usage per app for the last 7 days — these require some mapping of numeric package IDs to package names.
  • Log files that record e.g. which app was in use and which SIM card (slot) was active.

I will cover these findings and share further insights in a follow-up post on SDHMS.

I created parsers for ALEAPP for the results shared here - PR is open.

Sunday, January 4, 2026

Android Samsung My Files App: Exploring the File Operations History

I’ve been exploring Samsung apps on Android, and I’d like to share some of my findings. Let’s start with the File Operations History maintained by the Samsung My Files app.

The My Files app keeps a database that logs actions performed on files and folders throughout the system. This database provides a detailed view of user and/or system activity, capturing events such as file creation, movement, copying, and deletion.

Android Version Tested:
Android 16

Test device:
Samsung Galaxy S23

App Version tested:
15.4.01.16

Database Location:
[...]/com.sec.android.app.myfiles/databases/OperationHistory.db


The database can exist both system-wide and per user, for example:
  • /data/data/[...] — system-wide

  • /data/user/0/[...] — default user

  • /data/user/150/[...] — secure folder

By analyzing this database, you can reconstruct file operations done on the system.

The database structure

The most interesting tables in the database are:

1. operation_history:
This table stores summary information about each file operation, including the date and time, type of operation, status, and the number of files or folders involved.

2. operation_history_data
This table contains the detailed information for each operation. It includes one entry per file or folder, specifying the file or folder name, as well as the source and destination paths for the operation.


Table operation_history

  • _id - The primary key, iterating number to identify the operation
  • mDate - Time and date in UTC when the operation was done. Not totally clear if it is the start or the end timestamp
  • mOperationType - Type of Operation - human readable, e.g. MOVE, MOVE_TO_TRASH, RENAME, COPY, DELETE
  • mItemCount - Number of Items the operation is done on. This is the aggregated number of folder and files
  • mFolderCount - Number of folders the operation is done on
  • mPageType - Location in the App from where the operation was triggered.
  • mOperationResult - Result of the operation, not always filled with any data.

Table operation_history_data

  • _id - The primary key, iterating number to identify the file/folder used in operation
  • operation_id - The foreign key, with this we know which operation from the operation_history is related to this line
  • src_path - Source path of the file or folder. Has different meanings depending on the operation, e.g. for CREATE_FOLDER it is the folder in which the new folder is created
  • src_file_id - Always identical with src_path in my test data
  • dst_path - Destination path of the file or folder. E.g. the file that is created. Can be empty for operations - e.g. for DELETE or EMPTY_TRASH
  • dst_file_id - Always identical with the dst_path in my test data
  • file_type - Decimal number - meaning currently not totally clear; it seems to be based on MIME type or extension. Current test data shows the following mapping:
                10 -  Image (JPEG)
                11 - Image (JPG)
                14 - Image (PNG)
                19 - Image (WEBP)
                53 - Image (HEIC)
                100 - Audio (MP3)  
                102 - Audio (M4A)
                107 - Audio (OPUS)             
                200 - Video (MP4)
                311 - Document (CSV)
                312 - Document (PDF)
                315 - Document (DOCX)
                330 - Document (TXT)
                400 - Archive (APK)
                410 - Archive (ZIP)
                411 - Archive (RAR)
                506 - (VCF)
                526 - (TORRENT)
                12289 - Folder
        
            This leads me to believe it is something like (ranges shown below; see the sketch after this list):
                10-99 = Images
                1xx = Audio
                2xx = Video
                3xx = Documents
                4xx = Archives
                5xx = Others
                12289 = Folder
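Under that assumption, mapping a file_type value to a category is straightforward. A small sketch - note the ranges are my inference from the test data above, not a documented specification:

def file_type_category(file_type: int) -> str:
    """Map a My Files file_type value to a coarse category (inferred ranges)."""
    if file_type == 12289:
        return "Folder"
    for low, high, category in ((10, 99, "Image"), (100, 199, "Audio"),
                                (200, 299, "Video"), (300, 399, "Document"),
                                (400, 499, "Archive"), (500, 599, "Other")):
        if low <= file_type <= high:
            return category
    return f"Unknown ({file_type})"

for ft in (11, 107, 312, 410, 506, 12289):
    print(ft, "->", file_type_category(ft))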

Getting and interpreting the data

1. Get an Operations Overview

SELECT
mDate [Operation Date],
_id [Operation ID],
mOperationType [Operation Type],
mItemCount [# of Files],
mFolderCount [# of Folder],
mPageType [Page Type],
mOperationResult [Result]
FROM operation_history;

2. Get list of all operations and the corresponding files / folders.

SELECT
oh.mdate [Operation Date],
ohd.operation_id [Operation ID],
ohd.src_path [Source Path],
ohd.src_file_id [Source File ID],
ohd.dst_path [Destination Path],
ohd.dst_file_id [Destination File ID],
oh.mOperationType [Operation Type],
oh.mItemCount [# of Files],
oh.mFolderCount [# of Folders],
oh.mPageType [Page Type],
oh.mOperationResult [Result],
oh.mMemoryFullCapacity [Storage Capacity Info]
FROM operation_history_data ohd
JOIN operation_history oh ON oh._id = ohd.operation_id

3. Are there any parsers out there?

I also looked into existing tools for parsing this data. Currently, ALEAPP includes a parser for the File Operations History, but it only supports Android versions below 13 and analyzes only the operation_history table.

After reviewing the code, I concluded that it would be possible to adapt the parser to support newer Android versions as well. I plan to do this in the future, but for now, I want to focus on my analysis of Samsung apps without interrupting the workflow by refactoring an existing parser.

Meaning of the data / Conclusion

The entries in the File Operations History reflect the file operations performed by the Samsung My Files app. They provide a useful overview of what actions were taken on files and folders within the system.

The data is retained for a long period—in my test dataset, I was able to trace operations going back about one and a half years. I did not observe any triggers that would automatically delete entries from the database.


What the data does not show:

It is important to note that the database does not log individual files inside a folder when an operation is performed on the folder itself.

Example:

If I decide to move a folder to a new location, I select the folder in the UI and click “Move.” The File Operations History will store an entry for this folder and the move operation—but it will not create entries for each file inside the folder.

However, it may be possible to reconstruct some file-level operations by looking back at earlier entries. For example, if a file was created in that folder 30 minutes before the move and was not subsequently deleted based on entries in the database, you could infer that it was included in the move operation. But I would always double-check this with additional data, if possible in any way.

Additionally, the database only logs operations performed by Samsung apps, specifically the My Files app or calls to this app. Any operations done through third-party file managers will not appear in the File Operations History.

Future work:

I have stored my SQL queries for extracting data from the database in my GitHub repository for later use, such as building a parser:
https://github.com/kalink0/useful_scripts/blob/master/sql/Android/Samsung%20Apps/sMyFiles.sql

I also plan to create additional posts on Samsung apps, sharing my findings step by step.

Sunday, December 7, 2025

Samsung Core Services - Module "ai search" and what data it holds

Today I will share my results from an analysis I did on a Samsung device, and especially the Samsung Core Services.

What does ChatGPT say about SCS?

Samsung Core Services (SCS) is a system service on Samsung devices that provides essential background functionality for Samsung apps and One UI features. It supports services like search indexing, metadata and image processing, entity/phrase extraction, and suggestions, enabling apps and system features to work seamlessly.

Sounds interesting - so I took a look into the data.

Testing was done on a Samsung device with Android 16 installed.

The data of Samsung Core Services can be found at the (user) data folder under ../com.samsung.android.scs/

You can find the usual app structure in the folder, with e.g. the subfolders "databases" and "files".

I took a look into the databases, but nothing interesting came up — it doesn't look like any user-generated or user-dependent data is in them.

So as the next step I looked into the files folder.

There are two subfolders.

1. zip -> empty - so nothing to look at

2. aisearch -> okay — ai + search — what could this be? And there are files and folders in there.


aisearch

The following folders can be found in the folder aisearch:


bnlp -> could mean "Natural Language Processing" — the B could stand for "Bayesian". My master's studies are a bit far away (yeah — I am getting older :-D), but in this context it is possible.

There are two files in it. One is noun_list.txt, which contains a list of words that can be found on an Android device (see next pic).


I see whatsapp, telegram, instagram, chatgpt — yep, these are some of the installed apps on the device. But to be honest, not very useful I think, at least at the moment.

I also took a look into the client, indexes and log folders. I see structures and some log files — which modules are running and when, and also how many files were processed. This looks like data for text processing.

I found traces that Apache Lucene is used under the hood — now I am certain that what I see has something to do with text indexing of files, e.g. to give a user the possibility to search the device and the files on it.

Okay, one folder left - raw_txt - what is in it? Some sample files perhaps?

Surprise — there are txt files, and the file names seem to lead to existing or formerly existing PDF files on the device.


Now - that is interesting.

One example file name: 0_732749_CV Marco.pdf.txt

Pattern of the file names:

[uid]_[internal_file_id]_[file_name].pdf.txt

Where:

uid = the ID of the user the original file is related to — 0 for the main user, or e.g. 150 for the secure folder.

internal_file_id = I did not find any trace of where this ID leads — it seems to be some internal ID for the service; I double-checked in the databases.

file_name = the name of the file the moment it was indexed
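The pattern is regular enough to parse automatically. A minimal sketch based on the file names I observed:

import re

# [uid]_[internal_file_id]_[file_name].pdf.txt
NAME_RE = re.compile(r"^(?P<uid>\d+)_(?P<file_id>\d+)_(?P<original>.+)\.txt$")

match = NAME_RE.match("0_732749_CV Marco.pdf.txt")
if match:
    print("user id:      ", match.group("uid"))       # 0 = main user
    print("internal id:  ", match.group("file_id"))   # 732749
    print("original file:", match.group("original"))  # CV Marco.pdf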


But does the content of the files come from real files on the device? Let's see:



-> Yes — in this example it contains the text of the PDF file. The picture above shows the start of the AGB (General Terms and Conditions — the fine print nobody reads but everyone agrees to, like the snooze button of contracts) of a German provider — identical to the content of the original PDF.

I can also see traces of PDF files that are no longer on the device.

So it is possible to find data of deleted (at least PDF) files in this folder?! — nice.

The timestamps of the files seem to relate to their index time — close to the creation time (I know this at least for my CV), but not exactly identical.

I don't have any information on when exactly the Samsung Core Services and its ai search module index a file, but I know that the file must have existed on the device the moment it was indexed — and the text content must have been in it, based on the data I see on my test device.

Conclusion

The most interesting part for me is the files in the subfolder raw_txt. These files contain the textual content of PDF files on the test device — even for deleted ones.

The deletion of some files was about 2 weeks ago - but I didn't test for how long the data is stored in the raw_txt folder.

There is only the file name of the corresponding original file — at least I didn't find any info on the original file path or e.g. a hash value of the original file.

There is also only the creation date of the index file in the raw_txt folder. From that I can say that the original file must have existed on the device at the time the index file was created — or at least at the moment the indexing was executed, which should be close to the creation time of the index file — but not more.

There are some metadata available in the app context of the Samsung Core Services - I didn't go very deep into this stuff.

This was a really quick dive into the data — I did not do any reversing of the exact functionality of the Samsung Core Services and ai search, or any other modules in it. I also did not take a deep look into the config file in the shared_prefs folder — it is possible that service/module settings are stored there, so we could see what data to expect. As always, I will put this on my way-too-long to-do list. ;-)

Thx for your time and have a nice day.