ITM Web Conf.
Volume 40, 2021International Conference on Automation, Computing and Communication 2021 (ICACC-2021)
|Number of page(s)||6|
|Published online||09 August 2021|
File Fragment Classification using Content Based Analysis
Ramrao Adik Institute Of Technology, 400706 Nerul, Navi Mumbai India
One of the major components in Digital Forensics is the extraction of files from a criminal’s hard drives. To achieve this, several techniques are used. One of these techniques is using file carvers. File carvers are used when the system metadata or the file table is damaged but the contents of the hard drive are still intact. File carvers work on the raw fragments in the hard disk and reconstruct files by classifying the fragments and then reassembling them to form the complete file. Hence the classification of file fragments has been an important problem in the field of digital forensics. The work on this problem has mainly relied on finding the specific byte sequences in the file header and footer. However, classification based on header and footer is not reliable as they may be modified or missing. In this project, the goal is to present a machine learningbased approach for content-based analysis to recognize the file types of file fragments. It does so by training a Feed-Forward Neural Network with a 2-byte sequence histogram feature vector which is calculated for each file. These files are obtained from a publicly available file corpus named Govdocs1. The results show that content-based analysis is more reliable than relying on the header and footer data of files.
© The Authors, published by EDP Sciences, 2021
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.