Data Representation & Files Examination
Data Representation
Computers, on the other hand, use a different system called the binary system, which has only two values (0 and 1). The reason for using this system has nothing to do with the number of fingers.
The smallest unit in the binary system is a bit, which could take either the value 0 or 1.
The next unit is Byte (B), which equals 8 bits.
A Kilobyte (KB) is 1024 bytes, and a Megabyte (MB) is 1024 Kilobytes.
A Gigabyte (GB) consists of 1024 Megabytes, and a Terabyte (TB) is 1024 Gigabytes. Nowadays, most personal computer storage disks range from 500 GB up to 2 TB for gaming and business laptops.
It is crucial to be able to differentiate between the terms Kilobyte and Kilobit.
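One kilobyte is 1024 bytes, and since each byte equals 8 bits, that is 1024 * 8 = 8192 bits.
On the other hand, one kilobit equals 1024 bits under the same binary convention (data-rate figures usually use the decimal convention, where a kilobit is 1000 bits).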
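The arithmetic above can be checked with a short Python sketch. It assumes the binary (1024) convention used in this section; the function names are illustrative only.

```python
BITS_PER_BYTE = 8
KILO = 1024  # binary convention, as used in this section


def kilobytes_to_bits(kilobytes):
    """Convert kilobytes to bits: kilobytes -> bytes -> bits."""
    return kilobytes * KILO * BITS_PER_BYTE


def kilobits_to_bits(kilobits):
    """Convert kilobits to bits (no factor of 8 involved)."""
    return kilobits * KILO


print(kilobytes_to_bits(1))  # 8192
print(kilobits_to_bits(1))   # 1024
```

The factor of 8 between the two results is exactly the difference between a kilobyte and a kilobit.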
File identification
The first step in identifying a file’s type is to look at its extension, such as .PDF or .DOC. Identifying the file type by its extension is a common technique in the Windows operating system. For example, .PDF files are opened by Adobe Reader on Windows.
Note that if a JPEG file’s extension is changed to .PDF, the file will not open on Windows. The reason is that Windows will try to open the file using Adobe Reader, which cannot read the JPEG file contents. Another thing to note is that for each file type there should be a reader that knows how to read its contents.
Not all operating systems rely on the file extension to identify the file type. For example, Linux relies on the file signature to determine its type.
File structure
Regardless of the operating system, every file has a specific structure to arrange its components; those components are file name, size, signature, contents, etc.
The file structure is universal and is the same on any operating system.
In order to open a specific file, the operating system will use a specific reader; this reader knows where to find the file name or size within the different file components.
For example, PDF files can be viewed with Adobe Reader or Foxit Reader.
Metadata
Metadata in general is defined as “Data describing other data.”
The things we write on an envelope are a good example of metadata.
The sender’s and receiver’s addresses are NOT part of the original letter we’re sending. They are metadata: data used to describe the original data we’re sending, i.e., the letter.
So, file metadata is the data that describes the file itself and is used by the OS and its applications to make opening, recognizing, and processing that file easier.
Metadata is found in different locations, but as a starting point, the three locations you need to start looking for metadata when analyzing a file are:
• MFT records
• File header
• Magic number
MFT Attributes
MFT stands for Master File Table. It is used by the NTFS file system to store the metadata necessary to retrieve files from NTFS partitions. Each file has one or more MFT records.
Note that the data itself is an attribute. MFT records can be used when searching for files within the file system. It is also worth mentioning that those records can be used as evidence to prove the existence of lost or deleted files.
Directory Snoop is a great tool to perform the required task.
It allows the examination of both NTFS and FAT32 disks at a low level, allowing the investigator to examine the MFT records and other system-related files.
Another great tool for NTFS attribute examination is DiskExplorer for NTFS from Runtime Software.
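To make the idea of an MFT record more concrete, here is a minimal Python sketch that decodes the fixed leading fields of a single file record. It assumes the record layout as commonly documented for NTFS (a "FILE" signature followed by little-endian fields, with flag bit 0x0001 meaning "in use"). The function name, field names and sample record are made up for illustration, and a real tool would also apply the update-sequence fixups and walk the attributes.

```python
import struct

# Leading fields of an NTFS file record, little-endian, as commonly documented:
# signature, update-sequence offset and count, $LogFile sequence number,
# sequence number, hard-link count, first-attribute offset, flags,
# used size, allocated size.
RECORD_HEADER = struct.Struct("<4sHHQHHHHII")

FLAG_IN_USE = 0x0001     # record currently describes a live file
FLAG_DIRECTORY = 0x0002  # record describes a directory


def parse_record_header(record):
    """Decode the fixed leading fields of one MFT file record."""
    (signature, _usa_offset, _usa_count, _lsn, sequence, link_count,
     first_attr_offset, flags, used_size, allocated_size) = (
        RECORD_HEADER.unpack_from(record))
    if signature != b"FILE":
        raise ValueError("record does not start with the FILE signature")
    return {
        "sequence": sequence,
        "link_count": link_count,
        "first_attribute_offset": first_attr_offset,
        "in_use": bool(flags & FLAG_IN_USE),
        "is_directory": bool(flags & FLAG_DIRECTORY),
        "used_size": used_size,
        "allocated_size": allocated_size,
    }


# Synthetic record (made-up values) whose in-use flag is cleared,
# as would be expected for the record of a deleted file.
deleted_record = RECORD_HEADER.pack(b"FILE", 48, 3, 0, 2, 1, 56, 0, 416, 1024)
print(parse_record_header(deleted_record))
```

A record that still parses correctly but has its in-use flag cleared is the kind of trace that can indicate a file once existed.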
File headers
As the name suggests, a file header is a unique identification section found at the beginning (head) of every file.
The header usually contains data used by the application that opens the file.
The header can contain things like name, author, date of creation, and size, or data that helps perform error detection and correction before opening the file.
Different files have different headers.
A final note worth mentioning is that most file formats have a header and a trailer.
Headers and trailers can be checked using any hex editor.
For example, when a .txt file is analyzed in Hex Workshop, there is nothing but the ASCII content.
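A hex editor view can be approximated in a few lines of Python. The sketch below is only an illustration (the function name and sample text are made up): it prints an offset, the bytes in hex, and the printable ASCII characters, which is roughly what a hex editor shows for the start of a file.

```python
def hex_dump(data, width=16):
    """Return hex-editor-style lines: offset, hex bytes, printable ASCII."""
    pad = width * 3 - 1  # width of a full row of "xx " groups without the last space
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part.ljust(pad)}  {ascii_part}")
    return lines


# A plain text file has no header or trailer: the dump is just its characters.
sample = b"Hello, forensics!\n"
for line in hex_dump(sample):
    print(line)
```

To inspect a real file, the bytes could be obtained with something like `open(path, "rb").read(64)`, where `path` is whatever file is under examination.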
Magic number
A magic number is another method used by applications (mostly on Unix/Linux) to try to identify a file without needing to read the whole header.
A magic number is a unique string, usually at the beginning of the file, which can be used to identify the type of the file.
A list of magic numbers can also be found on most Linux systems in: /usr/share/file/magic
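As a rough illustration of magic-number checks, the Python sketch below compares the first bytes of a file against a handful of well-known signatures and flags a mismatch between extension and content, such as a JPEG renamed to .PDF. The table is deliberately tiny, and the function names and the alias table are assumptions made for this example, not part of any standard tool.

```python
import os

# A few well-known signatures found at the very start of a file.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"PK\x03\x04", "zip"),
]

# Extensions that name the same type differently (illustrative only).
EXTENSION_ALIASES = {"jpg": "jpeg"}


def identify_by_magic(header):
    """Return the type whose signature starts the header, or None."""
    for magic, file_type in SIGNATURES:
        if header.startswith(magic):
            return file_type
    return None


def extension_matches_content(filename, header):
    """True only if the extension agrees with the detected signature."""
    extension = os.path.splitext(filename)[1].lower().lstrip(".")
    extension = EXTENSION_ALIASES.get(extension, extension)
    detected = identify_by_magic(header)
    return detected is not None and extension == detected


jpeg_header = b"\xff\xd8\xff\xe0\x00\x10JFIF"
print(identify_by_magic(jpeg_header))                       # jpeg
print(extension_matches_content("photo.pdf", jpeg_header))  # False
```

Only the first few bytes are needed, which is why this check is cheaper than parsing the whole header.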
Metadata Types
Metadata relevant to forensic investigations can be categorized into four main types:
• System metadata
• Substantive metadata
• Embedded metadata
• External metadata
System metadata files are usually generated by the file system.
Substantive metadata contains information on the modifications over a document.
Embedded metadata is usually embedded by applications that edit or create files within the file itself.
External metadata is normally created separately by file management software to keep track of the managed files
System Metadata
System metadata is created, edited and used by the operating system for many purposes.
The operating system’s file system is one of the main components that heavily relies on metadata to keep track of the files it manages.
Other drives, such as disk drives (CD, DVD) and removable devices (flash disks and external hard disks), also rely on the use of system metadata.
Storage devices in general (fixed and removable) use system metadata to track the addresses of the contained files and how they are stored.
From an investigation perspective, the system metadata can be used, as mentioned before, to track a file that doesn’t exist anymore (removed, deleted, moved).
When conducting an investigation, we are mainly interested in four attributes within the metadata. The modified, accessed and created entries are usually referred to as MAC. Additionally, NTFS disks add another entry called Entry Modified (EM).
EM stores the last time the MFT entry was modified.
MAC and EM analysis are essential in the process of crime timeline construction and analysis.
It is crucial for the investigator to remember NOT to alter the MAC and EM entries when analyzing.
Create Metadata usually describes when the file was first created.
It is important to note here that the date you find doesn’t always indicate the date on which the data was originally created.
The Create attribute refers to the date on which the file was created, NOT when the data was created.
As an example, if file1.txt was first created in 2015 and we copy it to another disk in 2017, the new file (file2.txt) will have a create date of 2017.
The Access attribute refers to when the file was last opened, moved or copied.
The modify attribute reports when the content of the file was last changed.
The Entry Modified attribute is another attribute found only in NTFS. It tells when the other attributes were last modified.
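For a quick look at these attributes, the Python sketch below reads the timestamps that the standard `os.stat` call exposes. It is only an illustration under some assumptions: the function name and dictionary keys are made up, the meaning of `st_ctime` depends on the platform (creation time on Windows, last metadata change on most Unix systems), and the NTFS Entry Modified value is not available through this call.

```python
import os
from datetime import datetime, timezone


def mac_times(path):
    """Return the timestamps os.stat reports for a path, as UTC datetimes."""
    info = os.stat(path)  # reads metadata only; the file content is not opened

    def to_utc(timestamp):
        return datetime.fromtimestamp(timestamp, tz=timezone.utc)

    return {
        "modified": to_utc(info.st_mtime),
        "accessed": to_utc(info.st_atime),
        # Platform dependent: creation time on Windows,
        # last metadata change on most Unix systems.
        "created_or_changed": to_utc(info.st_ctime),
    }


for name, value in mac_times(".").items():
    print(f"{name:>20}: {value.isoformat()}")
```

Converting everything to UTC keeps the values comparable when evidence comes from machines in different time zones.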
More advanced frameworks can save the investigator time by gathering the metadata for all the files that fall within the search category, then listing and analyzing them. An example of such a tool is The Sleuth Kit.
It is important to note that constructing a crime’s timeline, as we saw in a previous chapter, requires metadata, logs, network traffic and application-level data analysis. Examining the metadata alone won’t be enough.
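The idea of combining several sources into one timeline can be sketched in Python as follows. Everything here is hypothetical: the event tuples, source labels and descriptions are invented purely to show the merge-and-sort step, and a real investigation would first normalize time zones and clock drift.

```python
from datetime import datetime


def build_timeline(*sources):
    """Merge event lists from several sources and sort them by timestamp."""
    events = [event for source in sources for event in source]
    return sorted(events, key=lambda event: event[0])


# Invented sample events: (timestamp, source, description).
metadata_events = [
    (datetime(2017, 3, 1, 10, 5), "metadata", "file2.txt created"),
]
log_events = [
    (datetime(2017, 3, 1, 10, 0), "log", "user logged in"),
    (datetime(2017, 3, 1, 10, 7), "log", "removable disk disconnected"),
]

for timestamp, source, description in build_timeline(metadata_events, log_events):
    print(timestamp.isoformat(), source, description)
```

In the merged output the file creation sits between the two log entries, which is context the metadata alone would not provide.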
DMS
Document management systems (DMS) tend to create many metadata records and files to help keep track of and manage the stored files.
It is always recommended to go through the DMS documentation before gathering data.
Even so, you may well encounter a product-specific feature, format or functionality.
Embedded Metadata