Home | Issues | Profile | History | Submission | Review
Vol: 57(71) No: 4 / December 2012 

Towards Efficient Structural Mapping of Files in Data Forensics
Ciprian Pungila
Department of Informatics, West University of Timisoara, Faculty of Mathematics and Informatics, Vasile Pirvan no. 4, Timisoara 300223, Romania, e-mail: cpungila@info.uvt.ro


Keywords: data forensics, structural mapping, aho-corasick, pattern matching, xml, regular expressions

Abstract
Structural mapping in digital forensics helps the data recovery process by constructing potential layouts of files on disks where file-systems are lost or damaged, based on the layout of the file and on the particular features of each file format. The approach is particularly useful on fragmented file-systems, where data may be split over several areas. We are presenting a new and efficient approach for performing structural mapping of files in the data forensics process, by analyzing a database of file types used in the freeware software TrID and proposing an efficient model, based on regular expression matching, for achieving our goal. The proposed method is based on an extended model of the Aho-Corasick automaton that supports different types of constraints and also employs an XML-to-RegEx parsing engine that converts signatures from their native format in the TrID software to the regular-expression format used in our custom-built automaton. Experimental results performed in real-world scenarios with damaged file-systems have shown that our approach can significantly improve accuracy by building a potential map of fragments for the lost files, on top of which different heuristics may be employed for the final reconstruction of the documents, with very little impact on performance compared to the original automaton implementation, making this method a powerful and useful instrument for achieving better data recovery as part of forensic investigations.

References
[1] Data Recovery Pricing, available at: http://iomega.com/data-recovery/ datarecoveryfaq- prices.html
[2] S. Holewinski and G. Andrzejewski, Data Recovery from Solid-State Drives, available at: http://www.gillware.com/docs/SSD whitepaper.pdf, 2009.
[3] S.J.J. Kloet, Measuring and Improving the Quality of File Carving Methods, Master\'s Thesis, 2007
[4] G. Richard III, V. Roussev, Scalpel: A Frugal, High Performance File, Carver Digital Forensics Research Workshop, 2005.
[5] Scalpel, available at: http://www.digitalforensicssolutions.com/Scalpel/
[6] Foremost, available at: http://foremost.sourceforge.net/
[7] A. Pal, H.T. Sencar and N. Memon, Detecting File Fragmentation Using Sequential Hypothesis Testing, Digital Forensic Research Workshop (DFRWS), Fall 2008
[8] N. Memon and A. Pal, Automated reassembly of file fragmented images using greedy algorithms, IEEE Transactions on Image Processing, February 2006, p.385-393
[9] A. Aho and M. Corasick, Efficient string matching: An aid to bibliographic search, CACM, 18, 6, 1975, 333-340
[10] L. Wang and X. Xu, Parallel Software Development with Intel Threading Analysis Tools, Intel Technology Journal, vol. 11, issue 04, November 15, 2007
[11] C. Pungila, A Highly-Efficient Memory Compression Approach for GPU-Based Virus Signature Matching, Proceedings of the Information Security Conference (ISC), Lecture Nodes in Computer Science, 2012
[12] C. Pungila, Hybrid Compression of the Aho-Corasick Automaton for Static Analysis In Intrusion Detection Systems, CISIS 2012
[13] M. Nordin et al., A Filtering Algorithm for Efficient Retrieving of DNA Sequence, International Journal of Computer Theory and Engineering, Vol. 1, No. 2, June 2009
[14] B. Soewito and N. Weng, Methodology for Evaluating DNA Pattern Searching Algorithms on Multiprocessor, Bioinformatics and Bioengineering, BIBE 2007
[15] S. Sahni, Highly compressed Aho-Corasick automata for efficient intrusion detection, IEEE Symposium on Computers and Communications, 2008, p.298-303
[16] C. Pungila, A Bray-Curtis Weighted Automaton for Detecting Malicious Code Through System-Call Analysis, Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), Timisoara, Romania, 2009
[17] C. Pungila, A Model for Energy-Efficient Household Maintenance
Through Behavioral Analysis of Electrical Appliances, The First International Workshop on Engineering Low-Carbon Business (ELCB’10), Shanghai, ICEBE 2010
[18] M. Oono, M. Fuketa, Y. Inada, Y. Murakami and J.-i. Aoe, A compression method using link-trie structure for natural language dictionaries, Computing & Informatics, ICOCI 2006, p.1-4
[19] D.P. Scarpazza, O. Villa and F. Petrini, High-Speed String Searching against Large Dictionaries on the Cell/B.E. Processor, Parallel and Distributed Processing (IPDPS), 2008
[20] G. Richard III, V. Roussev, L. Marziale, In-Place File Carving, Science Direct, 2007
[21] X. Zha, S. Sahni, Fast In-Place File Carving For Digital Forensics, e-Forensics, Vol. 56, Springer, 2010, p. 141-158
[22] R. Boyer and J. Moore, A fast string searching algorithm, CACM,
20, 10, 1977, 262-272
[23] S. Wu and U. Manber, Agrep - a fast algorithm for multi-pattern searching, Technical Report, Department of Computer Science, University of Arizona, 1994
[24] TrIDScan, available at: http://mark0.net/soft-tridscan-e.html
[25] Y. Miretsky, A. Das, C.P. Wright and E. Zadok, Avfs: An On-Access Anti-Virus File System, Proceedings of the 13th USENIX Security Symposium, 2004.