Machine Learning Anti-Virus

The biggest challenge in cyber security today is identifying zero-day, never-before-seen threats to a system.


Traditional anti-virus vendors attempt to do this with a signature-based approach, which combines heuristics with hashes of previously discovered malware to estimate whether an incoming file could be malicious. This method requires frequent database updates as new malware samples are identified, and it commonly misses new threats during the lag between a sample's first appearance and the release of a matching signature. Second, zero-day malware can evade this detection method by being polymorphic or metamorphic, that is, able to modify its own code so that it no longer matches a known signature.
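To make that weakness concrete, here is a minimal sketch of the exact-hash part of a signature-based check (it ignores the heuristic rules real products also apply). The signature database, the function names, and the sample bytes are hypothetical, chosen only for illustration. Flipping a single byte stands in for the far more elaborate rewriting that polymorphic or metamorphic malware performs; even that trivial change is enough to defeat an exact hash match.

```python
import hashlib

# Hypothetical signature database: SHA-256 digests of previously identified samples.
# A real vendor updates a database like this as new malware is discovered.
KNOWN_MALWARE_SHA256: set[str] = set()


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a file's contents as a hex string."""
    return hashlib.sha256(data).hexdigest()


def add_signature(data: bytes) -> None:
    """Record a newly identified malware sample in the signature database."""
    KNOWN_MALWARE_SHA256.add(sha256_hex(data))


def is_known_malware(data: bytes) -> bool:
    """Exact-match lookup: only files whose hash is already stored are flagged."""
    return sha256_hex(data) in KNOWN_MALWARE_SHA256


# Placeholder bytes standing in for a captured malware sample.
sample = b"MZ\x90\x00example-payload"
add_signature(sample)

# A "mutated" variant: one byte flipped, so the hash changes completely.
variant = bytearray(sample)
variant[-1] ^= 0xFF
variant = bytes(variant)

print(is_known_malware(sample))   # True
print(is_known_malware(variant))  # False
```

The lookup is fast and has no false positives for stored hashes, but it can only recognize files it has already seen byte for byte.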

To solve this challenge, SparkCognition has taken a revolutionary approach to malware detection.


Using advanced data science and machine learning algorithms, we have developed the first anti-virus solution of its kind, dubbed Machine-Learning Anti-Virus (MLAV).

As files are downloaded from a server, our tools expand each one into a feature space of thousands of individual data points. These features are examined individually and combined in distinctive patterns to create something akin to the DNA of each downloaded file. These pieces of DNA are then passed to our proprietary classification and text-processing techniques, which analyze them individually. The results are combined in an ensemble to determine whether the downloaded file’s DNA is more similar to that of a malicious entity (e.g., a Trojan, adware, or ransomware) or a benign one (e.g., Spotify.exe). Because this method analyzes the DNA of every downloaded file and does not depend on known hashes or virus profiles, it can identify malware that might slip through the cracks of more rigid signature-based systems.
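The general shape of such a pipeline can be sketched in miniature. This is not SparkCognition's proprietary method: the features (a byte-value histogram, Shannon entropy, and printable strings), the two toy models, the `SUSPICIOUS_TOKENS` list, the thresholds, and the weights are all hypothetical stand-ins. The sketch only illustrates the three stages described above: extract features from raw bytes, score them with independent models (one numeric, one text-based), and combine the scores in a soft-voting ensemble.

```python
import math
import re
from collections import Counter

# Hypothetical vocabulary a toy text-processing model treats as suspicious.
SUSPICIOUS_TOKENS = {"VirtualAlloc", "CreateRemoteThread", "WriteProcessMemory", "cmd.exe"}


def byte_histogram(data: bytes) -> list[float]:
    """256 features: the relative frequency of each byte value."""
    counts = Counter(data)
    n = len(data) or 1
    return [counts.get(b, 0) / n for b in range(256)]


def shannon_entropy(hist: list[float]) -> float:
    """Entropy in bits per byte (0 to 8); packed or encrypted code tends to score high."""
    return -sum(p * math.log2(p) for p in hist if p > 0)


def printable_strings(data: bytes, min_len: int = 5) -> list[str]:
    """Runs of printable ASCII at least min_len long, similar to the Unix `strings` tool."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]


def extract_features(data: bytes) -> dict:
    """Expand raw bytes into a (small) feature set."""
    hist = byte_histogram(data)
    return {"histogram": hist, "entropy": shannon_entropy(hist), "strings": printable_strings(data)}


def entropy_model(features: dict) -> float:
    """Toy numeric model: logistic score centred on a hypothetical 7 bits/byte."""
    return 1.0 / (1.0 + math.exp(-3.0 * (features["entropy"] - 7.0)))


def strings_model(features: dict) -> float:
    """Toy text model: fraction of the suspicious vocabulary found in the file's strings."""
    text = " ".join(features["strings"])
    hits = sum(1 for token in SUSPICIOUS_TOKENS if token in text)
    return hits / len(SUSPICIOUS_TOKENS)


def ensemble_score(data: bytes, weights: tuple[float, float] = (0.5, 0.5)) -> float:
    """Soft voting: a weighted average of each model's probability of 'malicious'."""
    features = extract_features(data)
    scores = (entropy_model(features), strings_model(features))
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def classify(data: bytes, threshold: float = 0.5) -> str:
    return "malicious" if ensemble_score(data) >= threshold else "benign"


benign = b"Hello from a harmless text file. " * 20
dropper = bytes(range(256)) * 8 + b"VirtualAlloc WriteProcessMemory CreateRemoteThread cmd.exe"
print(classify(benign))   # benign
print(classify(dropper))  # malicious
```

In a production system the individual models would be trained on large labelled corpora rather than hand-tuned, and the feature space would be far larger; the ensemble step lets models that look at different aspects of a file compensate for one another's blind spots.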

MLAV is only part of the picture: it is one component of SparkCognition’s SparkSecure cyber-security software. SparkSecure is designed to analyze terabytes of inbound or outbound traffic data and identify potential threats and malicious activity at the server level. MLAV is one of dozens of cognitive pipelines in SparkSecure, all of which use machine learning to detect malicious intent from internal or external sources. In my next post, I’ll discuss how outbound port traffic can be analyzed to identify malware already established on your server.

For questions or thoughts, please contact our Product Manager via the following channels:

Twitter: @Moorethinking