Sunday, December 7, 2025

Samsung Core Services - Module "ai search" and its value for digital forensics analysis

Today I will share the results of an analysis I did on a Samsung device, focusing on Samsung Core Services.

What does ChatGPT say about SCS?

Samsung Core Services (SCS) is a system service on Samsung devices that provides essential background functionality for Samsung apps and One UI features. It supports services like search indexing, metadata and image processing, entity/phrase extraction, and suggestions, enabling apps and system features to work seamlessly.

Sounds interesting - so I took a look into the data.

Testing was done on a Samsung device with Android 16 installed.

The data of Samsung Core Services can be found in the (user) data folder under ../com.samsung.android.scs/

The folder has the usual app structure, e.g. with the subfolders "databases" and "files".

I took a look into the databases, but nothing interesting came up - it doesn't look like any user-generated or user-dependent data is in them.

So as a next step I looked into the files folder.

There are two subfolders.

1. zip -> empty - so nothing to look at

2. aisearch -> okay - ai + search - what could this be? And there are files and folders in there.


aisearch

The following folders can be found in the folder aisearch:


bnlp -> could mean "Natural Language Processing" - the B could stand for "Bayesian". My master studies are a while back (yeah - I am getting older :-D) - but in this context it is possible.

There are two files in it. One is noun_list.txt, which contains a list of words that can be found on an Android device (see next pic).


I see whatsapp, telegram, instagram, chatgpt - yep, these are some of the apps installed on the device. But, to be honest, I don't think this is very useful, at least at the moment.

I also took a look into the client, indexes and log folders. I see structures and some log files - which modules ran and when, and how many files were processed. This looks like data for text processing.

I found traces that Apache Lucene is used under the hood - now I am certain that what I see has to do with text indexing of files, e.g. to let a user search on the device and also within the files on the device.
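
If you want to check an extraction for Lucene indexes yourself, here is a minimal sketch. It is not how I found the traces above, just one common indicator: a Lucene index directory normally contains a commit file named segments_N. The function name and the folder you pass in are my own assumptions for illustration.

```python
from pathlib import Path
from typing import List


def find_lucene_index_dirs(root: str) -> List[str]:
    """Return directories below `root` that contain a 'segments_*' file.

    Assumption: a 'segments_N' commit file marks a Lucene index directory.
    This is a heuristic and does not validate the index itself.
    """
    hits = set()
    for path in Path(root).rglob("segments_*"):
        if path.is_file():
            hits.add(str(path.parent))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical location of the extracted aisearch folder.
    for directory in find_lucene_index_dirs("aisearch/indexes"):
        print(directory)
```

A hit only tells you where to look next. Opening the index itself would need a Lucene-aware tool.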

Okay, one folder left - raw_txt - what is in it? Some sample files perhaps?

Surprise - there are txt files - the file names seem to point to existing or formerly existing pdf files on the device.


Now - that is interesting.

One example file name: 0_732749_CV Marco.pdf.txt

Pattern of the file names:

[uid]_[internal_file_id]_[file_name].pdf.txt

Where:

uid = the id of the user the original file belongs to - 0 for the main user or e.g. 150 for the secure folder.

internal_file_id = I did not find any trace of where this id leads to - it seems to be some internal id of the service - I double checked in the databases

file_name = the name of the file the moment it was indexed
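
The pattern above can be split with a few lines of Python. This is a minimal sketch based only on the file names I saw on my test device. The regular expression, the function name and the field names are my own assumptions, not anything taken from the service.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: <uid>_<internal_file_id>_<original file name>.txt
_RAW_TXT_NAME = re.compile(r"^(?P<uid>\d+)_(?P<file_id>\d+)_(?P<name>.+)\.txt$")


class RawTxtName(NamedTuple):
    uid: int
    internal_file_id: int
    original_name: str


def parse_raw_txt_name(file_name: str) -> Optional[RawTxtName]:
    """Split a raw_txt file name into its parts, or return None if it does not fit."""
    match = _RAW_TXT_NAME.match(file_name)
    if match is None:
        return None
    return RawTxtName(int(match["uid"]), int(match["file_id"]), match["name"])


if __name__ == "__main__":
    print(parse_raw_txt_name("0_732749_CV Marco.pdf.txt"))
    # RawTxtName(uid=0, internal_file_id=732749, original_name='CV Marco.pdf')
```

Only the first two underscores are treated as separators, so original file names that contain underscores themselves stay intact.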


But does the content of the files come from real files on the device? Let's see:



-> Yes - in this example it contains the text of the pdf file. The picture above shows the start of the AGB (General Terms and Conditions - the fine print nobody reads but everyone agrees to, like the snooze button of contracts) of a German provider - identical to the content of the original pdf.

I can also see traces of pdf files that are not on the device anymore.

So it is possible to find data of deleted files, at least pdf files, in this folder?! - nice.
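
To spot such leftovers quickly, you can compare the names in raw_txt with the file names that are still present in the extraction. This is a minimal sketch and assumes the file name pattern [uid]_[internal_file_id]_[file_name].pdf.txt. The function names are hypothetical. Since the index files only carry a file name and no path, the comparison is by name only, so a renamed original will also show up here.

```python
from typing import Iterable, List


def original_name(index_file_name: str) -> str:
    """Strip the assumed '<uid>_<id>_' prefix and the '.txt' suffix."""
    parts = index_file_name.split("_", 2)
    name = parts[2] if len(parts) == 3 else index_file_name
    return name[:-4] if name.endswith(".txt") else name


def index_files_without_original(
    index_names: Iterable[str], present_names: Iterable[str]
) -> List[str]:
    """Return index file names whose original file name is not in present_names."""
    present = set(present_names)
    return [n for n in index_names if original_name(n) not in present]


if __name__ == "__main__":
    # Example data: one original still exists, one does not.
    index_names = ["0_732749_CV Marco.pdf.txt", "0_1_AGB.pdf.txt"]
    present_names = ["CV Marco.pdf"]
    print(index_files_without_original(index_names, present_names))
    # ['0_1_AGB.pdf.txt']
```

In practice, present_names would come from a file listing of the extraction.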

The timestamps of the files seem to relate to their index time - it is close to the creation time of the original file (I know this at least for my cv) - but not exactly identical.

I don't have any information on when exactly Samsung Core Services and its ai search module index a file, but I know that the file must have existed on the device the moment it was indexed. And the text content must have been in it - at least based on the data I see on my test device.
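
For a quick overview of these timestamps, a small listing helps. This is a minimal sketch under two assumptions: the extraction preserved the file system timestamps, and the modification time is good enough as a stand-in for the index time (a real creation time is not available portably from Python). The function name is hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Tuple


def list_index_times(raw_txt_dir: str) -> List[Tuple[str, str]]:
    """Return (file name, modification time as UTC ISO 8601) for each txt file."""
    rows = []
    for path in sorted(Path(raw_txt_dir).glob("*.txt")):
        mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        rows.append((path.name, mtime.isoformat()))
    return rows


if __name__ == "__main__":
    # Hypothetical location of the extracted raw_txt folder.
    for name, mtime in list_index_times("aisearch/raw_txt"):
        print(mtime, name)
```

The times are printed in UTC, so they need to be converted before comparing them with local times shown by other tools.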

Conclusion

The most interesting part for me is the files in the subfolder raw_txt. These files contain the textual content of pdf files on the test device - even for deleted ones.

The deletion of some files was about 2 weeks ago - but I didn't test for how long the data is stored in the raw_txt folder.

There is only the file name of the corresponding original file - at least I didn't find any info on the original file path or e.g. a hash value of the original file.

There is also only the creation date of the index file in the raw_txt folder - from that I can only say that the original file must have existed on the device when the index file was created, or at least at the moment the indexing was executed, which should be close to the creation time of the index file - but not more.

There is some metadata available in the app context of Samsung Core Services - I didn't go very deep into this stuff.

This was a really quick dive into the data - I did not do any reversing of the exact functionality of Samsung Core Services and ai search or any other modules in it. I also did not take a deep look into the config file in the shared_prefs folder - it is possible that service/module settings are stored there, which could tell us what data to expect. As always - I will put this on my way too long to-do list. ;-)

Thx for your time and have a nice day.

