Mining Dark Web Malware: Connecting the Dots
Tiversa is a venture-funded cyber intelligence company based in downtown Pittsburgh that provides computer security services to an array of government agencies and commercial customers. To develop new client opportunities, Tiversa wants to mine its datastore to learn what it should be looking for. For example, data may be leaking from health insurers, causing highly sensitive medical information to appear on the Dark Web. If so, Tiversa could approach insurance companies and offer specific cybersecurity services. The problem is that so much data is being collected that Tiversa needs tools to mine it, both historically and on a daily basis.
Over a period of nine weeks, the MSIT eBusiness Technology practicum team focused on mining Tiversa’s malware datastore and prototyped a software solution to search and visualize its malware data. Beyond the requirements, the solution could also identify offenders spreading malware (to assist federal law enforcement agencies in apprehending cyber criminals) and detect and observe the behavior of zero-day infections (to help Tiversa provide information on the spread of unidentified infections to antivirus software clients such as Norton and McAfee).
The solution provided three significant features. First, it optimized query performance on a 150 GB malware dataset with 700 million rows, cutting query times from 15 minutes to a few seconds through indexing, normalization, partitioning, and pre-computing. Second, it enabled identification of offending IPs from data trends through analysis of field patterns, detection of unusual user behavior, and correlation and classification across behaviors, using similarity matrices for geographic distances and infections. Third, it enabled visualization of geographic and temporal data for Tiversa and its clients through a responsive, dynamic interface and web services.
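To make the idea of a similarity matrix for geographic distances concrete, here is a minimal sketch. It is not the team's implementation: the function names, the exponential decay, the `scale_km` parameter, and the sample coordinates are all assumptions chosen for illustration. It computes great-circle (haversine) distances between the geolocated positions of IPs and maps each distance to a similarity score between 0 and 1.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used by the haversine formula


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def similarity_matrix(points, scale_km=1000.0):
    """Pairwise similarity exp(-distance / scale_km); 1.0 means same location.

    The exponential decay and the default scale are illustrative assumptions.
    """
    return [[math.exp(-haversine_km(p, q) / scale_km) for q in points]
            for p in points]


if __name__ == "__main__":
    # Hypothetical geolocations for three IPs: (latitude, longitude).
    ip_locations = [(40.44, -79.99), (40.71, -74.01), (51.51, -0.13)]
    for row in similarity_matrix(ip_locations):
        print(["%.3f" % value for value in row])
```

A matrix like this could be combined with a second matrix built from shared infections, so that IPs that are both geographically close and infected by the same malware stand out as candidates for correlation.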
Deployable in three months and serving as a development baseline for future products and services, the solution gave Tiversa the capabilities to capture 30% market share, reach 200K users, and reduce operational costs by 43%. Technologies used included Java, Bootstrap, HTML5, D3.js, Highcharts.js, and jQuery. MSIT eBusiness program faculty member Sujata Telang, consulting faculty Alex Hauptmann (Principal Systems Scientist in the Language Technologies Institute), and Tiversa’s project coordinator Anju Chopra managed the seven-member student team. Congratulations to the MSIT eBusiness student team for earning second place, along with a prize of $12,000, at the 2016 Practicum Competition.