Detecting data exfiltration on computer networks: a machine learning approach

Samanwickrama, WD; Wickramasinghe, Indika; Perera, LDRD

URI: http://repository.wyb.ac.lk/handle/1/3609

Date: 2018

Abstract:

Data exfiltration is the unauthorized transfer of sensitive information from a target computer to a location that a threat actor controls. Because networked enterprises constantly exchange data in both directions, differentiating data exfiltration from normal network traffic has become a daunting task. Existing traditional security controls are increasingly ineffective at detecting such attempts. Machine learning (ML) can be used effectively to bridge this gap. However, choosing a suitable ML algorithm for a given problem depends on various factors, such as the characteristics of the available data and the context of the problem. This work employs ML, in particular the Local Outlier Factor (LOF) algorithm, to monitor data exfiltration on computer networks. The experimental setup of the research approach and very early-stage results are presented. In this experiment, a network testbed was set up with a private VLAN consisting of 10 PCs installed with general-purpose applications such as MS Office; the PCs have access to the Internet via a proxy server. Typical perimeter defenses are configured as usual: no inbound connections are allowed to the internal network, and only HTTP (port 80) and HTTPS (port 443) traffic is allowed outbound. During the experiment, benign users used the internal PCs, at their own discretion, to mimic normal user activities such as word processing, browsing the Internet and email. The attacker resides externally (on the Internet) and targets a victim on the private network. Using a remote administration toolkit (RAT), the attacker stealthily captured a screenshot of the victim's desktop and stole a file from her hard disk. All data was transferred via encrypted web traffic (HTTPS) to bypass the firewall and intrusion detection systems. During a 2.5-hour monitoring period, about one million packets from the victim's PC were captured using tshark (the terminal-oriented version of Wireshark) to perform the analysis described in this work. The experimental results are encouraging.
The proposed method successfully isolates the exfiltrated data flows (RAT traffic) from the rest of the legitimate web traffic, even though that traffic is encrypted and uses the same channel (HTTPS). However, to generalize these findings, extensive validation is needed; this is left as future work.
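The abstract does not state which flow features were used or how LOF was parameterised, so the following is only a minimal, stdlib-only sketch under assumed choices, not the authors' pipeline. It assumes packets were exported from tshark as comma-separated fields (`frame.time_epoch`, `ip.src`, `tcp.srcport`, `ip.dst`, `tcp.dstport`, `frame.len`), groups the victim's packets into TCP connections, summarises each connection with four hypothetical features (bytes sent, bytes received, packet count, duration), standardises them, and scores each connection with LOF. The victim address `VICTIM`, the helper names, the feature set, `k=3` and the demo traffic are all invented for illustration.

```python
import csv
import math

# Hypothetical victim address; the testbed's real addresses are not published.
VICTIM = "10.0.0.5"

# Assumed input: one CSV line per packet, e.g. as produced by
#   tshark -r capture.pcap -T fields -E separator=, -e frame.time_epoch \
#       -e ip.src -e tcp.srcport -e ip.dst -e tcp.dstport -e frame.len


def flow_features(lines, victim=VICTIM):
    """Group the victim's TCP packets into connections and summarise each one.

    A connection is keyed by (local port, remote IP, remote port), so separate
    sessions stay apart even if they all go through the same proxy address.
    Hypothetical features: bytes sent, bytes received, packet count, duration.
    """
    flows = {}
    for row in csv.reader(lines):
        if len(row) != 6 or "" in row:
            continue  # skip non-TCP packets (empty port fields) and malformed rows
        ts, src, sport, dst, dport, length = row
        ts, length = float(ts), int(length)
        if src == victim:
            key, direction = (sport, dst, dport), "out"
        elif dst == victim:
            key, direction = (dport, src, sport), "in"
        else:
            continue  # packet does not involve the victim
        f = flows.setdefault(key, {"out": 0, "in": 0, "pkts": 0, "t0": ts, "t1": ts})
        f[direction] += length
        f["pkts"] += 1
        f["t0"], f["t1"] = min(f["t0"], ts), max(f["t1"], ts)
    keys = list(flows)
    vectors = [[flows[k]["out"], flows[k]["in"], flows[k]["pkts"],
                flows[k]["t1"] - flows[k]["t0"]] for k in keys]
    return keys, vectors


def standardize(vectors):
    """Z-score each feature column so no single unit dominates the distances."""
    cols = list(zip(*vectors))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(v, means, stds)] for v in vectors]


def lof_scores(points, k):
    """Local Outlier Factor of each point. Simplified: exactly k neighbours with
    ties broken by index, whereas the original definition keeps all tied points."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
           for i in range(n)]
    k_dist = [dist[i][knn[i][-1]] for i in range(n)]  # distance to k-th neighbour
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[o], dist[i][o]) for o in knn[i]]
        lrd.append(1.0 / max(sum(reach) / k, 1e-12))  # guard against duplicate points
    # LOF: mean density of the neighbours relative to the point's own density.
    return [sum(lrd[o] for o in knn[i]) / (k * lrd[i]) for i in range(n)]


def demo_lines():
    """Synthetic packets (made up): eight browsing sessions plus one bulk upload."""
    lines, t = [], 0.0
    for c in range(8):  # benign browsing: small requests, larger responses
        for _ in range(3):
            lines.append(f"{t},{VICTIM},{50000 + c},93.184.216.{c},443,{200 + 5 * c}")
            t += 0.1
        for _ in range(5):
            lines.append(f"{t},93.184.216.{c},443,{VICTIM},{50000 + c},{1400 - 10 * c}")
            t += 0.1
    for _ in range(40):  # hypothetical exfiltration: bulk upload over HTTPS
        lines.append(f"{t},{VICTIM},51000,203.0.113.7,443,1500")
        t += 0.1
    for _ in range(4):
        lines.append(f"{t},203.0.113.7,443,{VICTIM},51000,100")
        t += 0.1
    return lines


if __name__ == "__main__":
    keys, vectors = flow_features(demo_lines())
    scores = lof_scores(standardize(vectors), k=3)
    for key, score in sorted(zip(keys, scores), key=lambda ks: -ks[1]):
        print(key, round(score, 2))
```

In this toy data the upload-heavy connection to `203.0.113.7` receives by far the largest score, while the browsing sessions stay near 1, which is how LOF separates a sparse point from a dense cluster without labels. For real captures, scikit-learn's `sklearn.neighbors.LocalOutlierFactor` offers an optimised implementation with an equivalent `n_neighbors` parameter.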
