
The performance of hashing for similar function detection


Imagine that you reverse engineered a piece of malware in painstaking detail, only to find that the malware author released a slightly modified version the next day. You wouldn't want to redo all that hard work.

One way to avoid this is to use code comparison techniques to identify pairs of functions in the old and new versions that are "the same" (in quotes because, as we'll see, this is a somewhat nebulous concept).

That LEV did not blow PIC out of the water is very telling. It suggests that there is a fundamental limit to how well syntactic similarity based on instruction bytes can perform. Getting past that limit likely means looking beyond the raw bytes: for example, we might be able to train a model to understand that omitting the frame pointer does not change the meaning of a function, and so should not be counted as a difference.
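To make the frame-pointer example concrete, here is a minimal sketch contrasting two kinds of syntactic comparison on instruction bytes. It assumes, for illustration only, that "syntactic similarity based on instruction bytes" means a plain byte-level edit distance, and it uses SHA-256 as a stand-in for exact-match hashing; it is not the post's implementation of LEV or PIC. The function names (`levenshtein`, `similarity`, `exact_hash`) and the two x86-64 byte sequences are hypothetical: both encode a function that returns 0, one with a `push rbp; mov rbp, rsp ... pop rbp` frame and one without.

```python
import hashlib


def levenshtein(a: bytes, b: bytes) -> int:
    """Dynamic-programming edit distance between two byte strings."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,             # delete byte x
                cur[j - 1] + 1,          # insert byte y
                prev[j - 1] + (x != y),  # substitute (free if bytes match)
            ))
        prev = cur
    return prev[-1]


def similarity(a: bytes, b: bytes) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical bytes."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def exact_hash(code: bytes) -> str:
    """Exact-match hash: any byte change yields a different digest."""
    return hashlib.sha256(code).hexdigest()


# Hypothetical x86-64 bodies of a function that returns 0.
# 55 = push rbp; 48 89 e5 = mov rbp, rsp; 31 c0 = xor eax, eax; 5d = pop rbp; c3 = ret
with_fp = bytes.fromhex("554889e531c05dc3")
without_fp = bytes.fromhex("31c0c3")

print(exact_hash(with_fp) == exact_hash(without_fp))  # False
print(levenshtein(with_fp, without_fp))               # 5
print(round(similarity(with_fp, without_fp), 3))      # 0.375
```

Both functions behave identically, yet the exact hash treats them as unrelated and the byte-level similarity only reaches 0.375, because the five frame-pointer bytes count as edits. This is the kind of purely syntactic difference that a semantics-aware model could, in principle, learn to ignore.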
