Technical in-depth description of the scan process

Hello,

Can anyone provide me with a link to, or write up, an in-depth technical description of how a virus scanner actually works when scanning a file for viruses?

Just the phrase: “It looks for known patterns in the file” is not enough :slight_smile:

I mean, the number of known viruses today is huge, yet it takes the scanner just a few milliseconds to scan a file for all of them.
How exactly does this work?

It looks for known patterns in the file very very fast.

:wink: ;D ;D

:-X no comment :-[

As my question is still unanswered, I ping it to the top.

I think you can ping away, but I would have thought a detailed answer would be classed as ‘commercial in confidence,’ so you wouldn’t get one, especially not in a public forum.

I also don’t think you’ll get a detailed answer - because it’s basically the “know how” behind the antivirus scanners (in addition to a team working hard on the detections).

ok, ok

Let’s try it another way :wink:
How does CLAM AV do the fast scanning for viruses in the files?
This program is open source and can be analysed by everyone. I am just not technically savvy enough to understand the code.
Any new tries?

Well, good question…
But wouldn’t we get a good answer by asking the Clam team/developers… I think the Alwil team won’t have ‘time’ to look at that code… and if they do and discover ‘anything’, they won’t tell, eh? :wink:

I really think there is a basic concept of how virus detection works that is common to all virus scanners.
I cannot imagine that every vendor reinvents the wheel completely…

My question is also not directly targeted at Alwil, but at everyone who considers themselves an antivirus expert and really knows how it works behind the scenes.

OK, I agree, but this basic concept won’t be just comparing the file’s code with the signature patterns, will it?

Sorry… I’m not among ‘them’…

Well, the simplest thing you can do is use hashes: hash the whole file and match the result against a database of hashes of known infected files (a binary search over the sorted database is quite fast).
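
To make the hash idea concrete, here is a minimal sketch in Python. It is only an illustration: the sample "signatures" are made-up byte strings, the names `KNOWN_BAD` and `is_known_bad` are hypothetical, and the choice of SHA-256 is an assumption, not something any particular scanner is known to use.

```python
import bisect
import hashlib

# Hypothetical signature database: digests of made-up "infected" files,
# kept sorted so that a binary search can be used for lookups.
KNOWN_BAD = sorted(
    hashlib.sha256(sample).digest()
    for sample in (
        b"fake-malware-sample-1",
        b"fake-malware-sample-2",
        b"fake-malware-sample-3",
    )
)


def is_known_bad(data: bytes) -> bool:
    """Hash the whole file content and binary-search the sorted database."""
    digest = hashlib.sha256(data).digest()
    i = bisect.bisect_left(KNOWN_BAD, digest)
    return i < len(KNOWN_BAD) and KNOWN_BAD[i] == digest
```

A lookup costs roughly log2(N) comparisons, so even a database with millions of hashes needs only a couple of dozen steps per file. The obvious limitation is that changing a single byte of the file changes the hash, which is why the pattern-based approaches come next.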
More complex, you can use a multi-string search algorithm (e.g. Aho-Corasick: http://en.wikipedia.org/wiki/Aho-Corasick_algorithm) to look for many known patterns in a single pass over the file.
Then, you add some algorithmic detections…
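
For anyone following along, here is a rough Python sketch of the Aho-Corasick idea mentioned above. It is a textbook-style version written for readability, not how any real scanner implements it; the function names `build_automaton` and `scan` are made up for this example. All patterns are compiled into one automaton, and the file is then read once, byte by byte, no matter how many patterns there are.

```python
from collections import deque


def build_automaton(patterns):
    """Build a trie over all patterns, then add failure links (BFS)."""
    goto = [{}]   # goto[state][byte] -> next state
    fail = [0]    # fail[state] -> state to fall back to on a mismatch
    out = [[]]    # out[state] -> patterns that end at this state
    for pat in patterns:
        state = 0
        for byte in pat:
            if byte not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][byte] = len(goto) - 1
            state = goto[state][byte]
        out[state].append(pat)

    # States directly below the root keep fail = 0; process the rest by depth.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for byte, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and byte not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(byte, 0)
            # Patterns ending at the fallback state also end here.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def scan(data, automaton):
    """Return (offset, pattern) for every pattern occurrence in data."""
    goto, fail, out = automaton
    state, hits = 0, []
    for pos, byte in enumerate(data):
        while state and byte not in goto[state]:
            state = fail[state]
        state = goto[state].get(byte, 0)
        for pat in out[state]:
            hits.append((pos - len(pat) + 1, pat))
    return hits
```

Usage would look like `scan(file_bytes, build_automaton([b"pattern-one", b"pattern-two"]))`. The point is that the scan time grows with the size of the file plus the number of matches, not with the number of signatures, which is why a huge signature set does not automatically make scanning slow.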

Thanks igor,

now we are heading in the right direction.
The Aho-Corasick hint was the missing link I was searching for.
Googling the term together with some other keywords turned up exactly what I was looking for.

It helped me to better understand what is really going on.