Academic Papers - 2016

Accurate spear phishing campaign attribution and early detection

In Proceedings of the 23rd ACM Conference on Computer and Communications Security (ACM CCS 2016)
In this paper, we introduce four categories of email profiling features that capture various characteristics of spear phishing emails. Building on these features, we implement and evaluate an affinity graph-based semi-supervised learning model for campaign attribution and detection.

Insights into Rooted and Non-Rooted Android Mobile Devices with Behavior Analytics

In Proceedings of the 31st ACM/SIGAPP Symposium on Applied Computing (ACM SAC 2016)
We present the first quantitative analysis of mobile devices that compares rooted devices to non-rooted devices. We attempt to map high-level assumptions about the characteristics of users who root their devices to the low-level data at our disposal.

Measuring PUP Prevalence and PUP Distribution through Pay-Per-Install Services

In Proceedings of the 25th USENIX Security Symposium (USENIX Security 2016)
We perform the first systematic study of PUP prevalence and its distribution through pay-per-install (PPI) services, which link advertisers that want to promote their programs with affiliate publishers willing to bundle their programs with offers for other software.

NG-DBSCAN: Scalable Density-Based Clustering for Arbitrary Data

In Proceedings of the VLDB Endowment, Vol. 10, No. 3, 2016
A scalable, distributed implementation of the DBSCAN clustering algorithm. What sets NG-DBSCAN apart is that it scales while working with arbitrary data and arbitrary distance functions.
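To make "arbitrary data and distance functions" concrete, here is a minimal sketch of plain, single-machine textbook DBSCAN that takes the distance as a callable. It is not NG-DBSCAN and does not reproduce the paper's distributed approach. The `dbscan` and `jaccard` helpers and the sample token sets are illustrative assumptions.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")
NOISE = -1


def dbscan(points: Sequence[T], dist: Callable[[T, T], float],
           eps: float, min_pts: int) -> List[int]:
    """Textbook DBSCAN: one cluster id per point, NOISE (-1) for outliers."""
    labels: List[Optional[int]] = [None] * len(points)

    def neighbours(i: int) -> List[int]:
        # Brute-force range query (includes i itself); O(n) per call.
        return [j for j in range(len(points))
                if dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbours(j)
            if len(reach) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reach)
        cluster += 1
    return [label if label is not None else NOISE for label in labels]


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard distance between two non-empty sets."""
    return 1 - len(a & b) / len(a | b)


# Hypothetical non-vector data: sets of tokens.
points = [frozenset("abc"), frozenset("abd"), frozenset("abcd"),
          frozenset("xyz"), frozenset("xyw"), frozenset("xyzw"),
          frozenset("q")]
labels = dbscan(points, jaccard, eps=0.5, min_pts=3)
```

The brute-force neighbour query makes this sketch quadratic in the number of points, which is the kind of cost a scalable implementation has to avoid.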

Partially Supervised Graph Embedding for Positive Unlabelled Feature Selection

In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI 2016)
We propose to encode the weakly supervised information in PU learning tasks into pairwise constraints between training instances. Violations of the pairwise constraints are measured and incorporated into a partially supervised graph embedding model.

Generating Graph Snapshots from Streaming Edge Data

In Proceedings of the 25th International World Wide Web Conference (WWW), 2016
We study the problem of determining the proper aggregation granularity for a stream of time-stamped edges. To this end, we propose ADAGE and demonstrate its value in automatically finding the appropriate aggregation intervals on edge streams for belief propagation to detect malicious files and machines.

Improving population estimation from mobile calls: a clustering approach

In Proceedings of the 21st IEEE Symposium on Computers and Communication (ISCC 2016)
We use distributed and scalable clustering techniques to estimate population, including mobility, from mobile phone call data.

Content-Agnostic Malware Detection in Heterogeneous Malicious Distribution Graph

In Proceedings of the 25th ACM International Conference on Information and Knowledge Management (CIKM 2016)
We propose a novel Bayesian label propagation model to unify multi-source information, including content-agnostic features of different node types and topological information of the heterogeneous network. Our approach does not need to examine the source code or inspect the dynamic behaviour of a binary. Instead, it estimates the maliciousness of a given file through a semi-supervised label propagation procedure, which has linear time complexity w.r.t. the number of nodes and edges. The evaluation on 567 million real-world download events shows that our approach detects malware efficiently and with high accuracy.
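As a rough illustration of semi-supervised label propagation on a download graph, the sketch below uses simple neighbour averaging with clamped seed labels. It is not the Bayesian model from the paper. The `propagate` function, the node names, and the 0.5 prior are all hypothetical. Each sweep visits every node and every edge once in each direction, so a sweep costs time linear in nodes plus edges.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def propagate(edges: Iterable[Tuple[Hashable, Hashable]],
              seeds: Dict[Hashable, float],
              sweeps: int = 20,
              prior: float = 0.5) -> Dict[Hashable, float]:
    """Averaging label propagation; scores near 1.0 mean 'likely malicious'."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Seed nodes keep their known label; all others start at the prior.
    score = {node: seeds.get(node, prior) for node in adj}
    for _ in range(sweeps):
        nxt = {}
        for node, nbrs in adj.items():
            if node in seeds:
                nxt[node] = seeds[node]  # clamp known labels
            else:
                nxt[node] = sum(score[m] for m in nbrs) / len(nbrs)
        score = nxt
    return score


# Hypothetical file-to-machine download events.
edges = [("bad.exe", "host1"), ("unknown.exe", "host1"),
         ("good.exe", "host2"), ("other.exe", "host2")]
seeds = {"bad.exe": 1.0, "good.exe": 0.0}
scores = propagate(edges, seeds)
```

In this toy graph, `unknown.exe` drifts toward the malicious label because it shares a machine with `bad.exe`, without its content ever being inspected.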

PSBS: Practical Size-Based Scheduling

IEEE Transactions on Computers, 2016
Size-based scheduling algorithms can perform disastrously with skewed workloads and incorrect size information. PSBS is a scheduling discipline that performs very well even when job sizes are incorrect.

Efficient Routing for Cost Effective Scale-Out Data Architectures

In Proceedings of the IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS'16)
In the context of large-scale data architectures, we propose an efficient technique to speed up the routing of a large number of real-time queries while minimizing the number of machines that each query touches (query span).

Related News

Privacy, Identity, and Trust

Consumers and corporations are driven to engage in a digital world that they cannot adequately trust. We are developing paradigms to enable online commerce and facilitate machine learning in ways that provide privacy and protect user identities, by leveraging such concepts as local differential privacy, federated machine learning, identity brokering, and blockchain technology.

Social Good

Where possible, we want to investigate how existing technology or telemetry could be used to address key issues affecting vulnerable populations. In addition, we want to develop new techniques to try to solve specific problems in the areas of abuse, scams, and child online safety.

Robust and Fair Machine Learning, Data Mining, and Artificial Intelligence

The tremendous growth in the learning capacity of machine learning methods has yet to be matched by a corresponding growth in our ability to understand these models. Equally troubling, our ability to build robust machine learning models has not kept pace with research on adversarial attacks against machine learning. As we increasingly hand over decision making to automated machine learning and AI systems, we must find ways to audit the life-altering decisions made by these systems for fairness, safety, robustness to adversaries, and the preservation of privacy of any personally identifiable information over which they operate.
