herdProtect [Fuzzy]

The herdProtect fuzzy engine is an experimental anti-malware scanner that uses static and dynamic fuzzy matching to find new malware variants, including those that some anti-virus engines do not detect. It generates a set of fuzzy signatures from a file's structure and active behaviors and compares them with the signatures of known malware variants, which lets it quickly discover new files in the wild that resemble existing ones.
One such fuzzy hash the engine uses is a context triggered piecewise hash (CTPH), also known as ssdeep, which is designed to find nearly identical file content based solely on the structure of a file on disk. Homologous files share identical sets of bits in the same order. Because such files are not completely identical, traditional techniques such as cryptographic hashing cannot be used to identify them. CTPH is a technique for constructing hash signatures by combining a number of traditional hashes whose boundaries are determined by the context of the input. These signatures can be used to identify modified versions of known files even if data has been inserted, modified, or deleted in the new files.
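The idea of context-determined boundaries can be illustrated with a minimal Python sketch. This is a toy, not the herdProtect engine or the real ssdeep algorithm: the function name toy_ctph, the 7-byte window, the byte-sum "rolling hash", and the fixed block size are all assumptions chosen for brevity. Real ssdeep uses a different rolling hash, derives the block size from the file length, and emits two signatures (one at the block size and one at twice that size).

```python
# Toy context triggered piecewise hash (illustrative only, not ssdeep).
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
WINDOW = 7              # bytes of context examined at each position (assumed)
FNV_OFFSET = 0x811C9DC5  # FNV-1a 32-bit offset basis
FNV_PRIME = 0x01000193   # FNV-1a 32-bit prime


def toy_ctph(data: bytes, block_size: int = 64) -> str:
    """Return a toy piecewise signature: one base64 character per piece."""
    signature = []
    piece_hash = FNV_OFFSET
    for i, byte in enumerate(data):
        # "Traditional" hash of the current piece (FNV-1a, 32-bit).
        piece_hash = ((piece_hash ^ byte) * FNV_PRIME) & 0xFFFFFFFF
        # Context value: sum of the last WINDOW bytes. It depends only on
        # nearby content, so boundaries realign after an insertion.
        context = sum(data[max(0, i - WINDOW + 1): i + 1])
        if context % block_size == block_size - 1:
            # Trigger: close the piece, emit one character, start a new piece.
            signature.append(B64[piece_hash % 64])
            piece_hash = FNV_OFFSET
    # Emit a character for the trailing piece.
    signature.append(B64[piece_hash % 64])
    return "".join(signature)
```

Because each boundary depends only on the few bytes before it, inserting or changing data in one place alters the signature characters for the affected pieces, while the characters for pieces before and after the change stay the same. That locality is what lets two modified copies of a file produce mostly matching signatures.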

How does this work?

An example of calculating such a fuzzy match follows. Two known adware files (both web browser extensions) have completely different cryptographic hashes; however, comparing their CTPH signatures yields a 97% match based solely on the structure of the files. In addition, the fuzzy match takes into account the dynamic behaviors of the files, such as both being BHOs (Browser Helper Objects) that use the same CLSIDs and both making similar network connections.
Example:
File #1  deal-boat.dll
SHA-1:  09b26053d282af1236e3738ea811f6b46501a036
CTPH:   12288:/uU1oTd6pitTRSR50tPCD/Hac1Azj/Pq56Tob17bHM06Yh+C1:mUSd6pitTRSD0tPCDC1zjnjTobdblD+I
File #2
SHA-1:  67f2a145bf4770236ac2e45caa81c33c73e4b115
CTPH:   12288:/uU1oTd6pitTRSR50tPCD/Hac1Azj/dqN6Tob17THM06Yh+CD:mUSd6pitTRSD0tPCDC1zj1rTobdTlD+2
Hash analysis:
SHA-1 match:  0%
CTPH match:   97%
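
The comparison above can be approximated with a short Python sketch. It is a rough stand-in, not the engine's scoring and not ssdeep's own comparison (which uses a weighted edit distance), so it will not reproduce the 97% figure exactly. The helper names split_signature and rough_similarity are hypothetical, and difflib.SequenceMatcher is swapped in as the similarity measure. The hashes are the ones listed in the example.

```python
from difflib import SequenceMatcher

# Values copied from the example above.
FILE1_SHA1 = "09b26053d282af1236e3738ea811f6b46501a036"
FILE2_SHA1 = "67f2a145bf4770236ac2e45caa81c33c73e4b115"
FILE1_CTPH = ("12288:/uU1oTd6pitTRSR50tPCD/Hac1Azj/Pq56Tob17bHM06Yh+C1"
              ":mUSd6pitTRSD0tPCDC1zjnjTobdblD+I")
FILE2_CTPH = ("12288:/uU1oTd6pitTRSR50tPCD/Hac1Azj/dqN6Tob17THM06Yh+CD"
              ":mUSd6pitTRSD0tPCDC1zj1rTobdTlD+2")


def split_signature(sig: str):
    """Split 'blocksize:chunk:double_chunk' into its three parts."""
    block_size, chunk, double_chunk = sig.split(":", 2)
    return int(block_size), chunk, double_chunk


def rough_similarity(sig_a: str, sig_b: str) -> int:
    """Return a 0-100 similarity estimate for two CTPH signatures."""
    size_a, chunk_a, _ = split_signature(sig_a)
    size_b, chunk_b, _ = split_signature(sig_b)
    if size_a != size_b:
        # Simplification: signatures with different block sizes are treated
        # as not comparable in this sketch.
        return 0
    return round(100 * SequenceMatcher(None, chunk_a, chunk_b).ratio())


if __name__ == "__main__":
    # A cryptographic hash either matches exactly or not at all.
    print("SHA-1 identical:", FILE1_SHA1 == FILE2_SHA1)
    print("Rough CTPH similarity:", rough_similarity(FILE1_CTPH, FILE2_CTPH))
```

The two SHA-1 digests share nothing useful, while the CTPH signatures differ in only a handful of characters, which is why a string-similarity measure over the signatures scores them as close.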