Academic Papers - 2015

Foreebank: Syntactic Analysis of Customer Support Forums

In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015)
We present a new treebank of English and French technical forum content which has been annotated for grammatical errors and phrase structure. This double annotation allows us to empirically measure the effect of errors on parsing performance. While it is slightly easier to parse the corrected versions of the forum sentences, the errors are not the main factor in making this kind of text hard to parse.
This paper introduces the Foreebank data set, which was created for training parsers of user-generated content.

TrackAdvisor: Taking Back Browsing Privacy from Third-Party Trackers

In Proceedings of the Passive and Active Measurement Conference (PAM), New York, 2015
A study that aims to accurately measure how widespread third-party tracking is online and to raise public awareness of its potential privacy risks.

Soothsayer: Predicting Capacity Usage in Backup Storage Systems

IEEE 23rd International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems

The Dropper Effect: Insights into Malware Distribution with Downloader Graph Analytics

In Proceedings of the 22nd ACM Conference on Computer and Communications Security (ACM CCS 2015)
We introduce the downloader-graph abstraction, which captures the download activity on end hosts, and we explore the growth patterns of benign and malicious graphs.

Access Prediction for Knowledge Workers in Enterprise Data Repositories

In Proceedings of the 17th International Conference on Enterprise Information Systems (ICEIS 2015)
The data that knowledge workers need for their work is stored across an increasing number of repositories and grows significantly every year. It is therefore unreasonable to expect knowledge workers to efficiently search for and identify what they need across a myriad of locations where hundreds of thousands of items or more can be created daily. This paper describes a system that observes user activity and trains models to predict which items a user will access, helping knowledge workers discover content.

Are You at Risk? Profiling Organizations and Individuals Subject to Targeted Attacks

In Proceedings of the 19th International Conference on Financial Cryptography and Data Security (FC 2015)
Considering the Standard Industrial Classification (SIC) code taxonomy, organization sizes, and the public profiles of individuals as potential risk factors, we design case-control studies to calculate odds ratios reflecting the degree of association between the identified risk factors and the receipt of targeted attacks.
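The odds ratio at the heart of such a case-control design can be sketched as below. This is a minimal illustration, assuming a single binary risk factor summarized in a 2x2 table; the counts in the usage line are hypothetical, and the Woolf (log-based) 95% confidence interval is a standard textbook choice, not necessarily the one used in the paper.

```python
import math

def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio for a 2x2 case-control table, with a Woolf (log) confidence interval.

    a: cases (targeted) that have the risk factor
    b: controls (not targeted) that have the risk factor
    c: cases without the risk factor
    d: controls without the risk factor
    """
    if min(a, b, c, d) == 0:
        # The log-based interval is undefined when any cell is empty.
        raise ValueError("empty cell: apply a continuity correction first")
    ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    low = math.exp(math.log(ratio) - z * se)
    high = math.exp(math.log(ratio) + z * se)
    return ratio, low, high

# Hypothetical counts: 40 of 100 targeted organizations have the factor,
# versus 20 of 100 non-targeted organizations.
ratio, low, high = odds_ratio(40, 20, 60, 80)
print(f"OR = {ratio:.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

An odds ratio above 1 whose interval excludes 1 indicates that the factor is positively associated with being targeted; a ratio of exactly 1 indicates no association.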

Harbormaster: Policy Enforcement for Containers

In Proceedings of the 7th IEEE International Conference on Cloud Computing Technology and Science (CloudCom'15)
We present Harbormaster, a system that improves the security of running Docker containers on shared infrastructure. Harbormaster enforces policies on container management operations, allowing administrators to implement the principle of least privilege.

Mind Your Blocks: On the Stealthiness of Malicious BGP Hijacks

2015 Network and Distributed System Security (NDSS) Symposium
In this paper, we analyze 18 months of data collected by SpamTracer, an infrastructure built specifically to answer the following question: are intentional, stealthy BGP hijacks routinely taking place on the Internet? The identification of what we believe to be more than 2,000 malicious hijacks leads to a positive answer.

Needles in a Haystack: Mining Information from Public Dynamic Analysis Sandboxes for Malware Intelligence

In Proceedings of the 24th USENIX Security Symposium (USENIX Security 2015)
We propose a novel methodology to automatically identify malware development cases.

Efficient and Self-Balanced ROLLUP Aggregates for Large-Scale Data Summarization

In Proceedings of the IEEE 4th International Congress on Big Data (BigData Congress 2015)
The ROLLUP primitive summarizes large, complex datasets by computing aggregates at successive levels of a hierarchy of dimensions. We develop an efficient implementation for Apache Pig.
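For readers unfamiliar with the primitive, the semantics of ROLLUP (as in SQL's `GROUP BY ROLLUP`) can be sketched on a single machine as follows. This only illustrates what gets computed, assuming a sum aggregate and dictionary rows with hypothetical field names; it does not reflect the paper's parallel, self-balanced Pig implementation.

```python
from collections import defaultdict

def rollup(rows, dims, measure):
    """Sum `measure` over every prefix of `dims`, from the finest grouping
    down to the grand total. Rolled-up dimensions are reported as None,
    as in SQL's GROUP BY ROLLUP (so the data itself should not contain None)."""
    results = {}
    for level in range(len(dims), -1, -1):
        prefix = dims[:level]
        padding = (None,) * (len(dims) - level)
        totals = defaultdict(int)
        for row in rows:
            key = tuple(row[d] for d in prefix) + padding
            totals[key] += row[measure]
        results.update(totals)
    return results

# Hypothetical click counts keyed by (year, month, day).
rows = [
    {"year": 2015, "month": 1, "day": 1, "clicks": 3},
    {"year": 2015, "month": 1, "day": 2, "clicks": 4},
    {"year": 2015, "month": 2, "day": 1, "clicks": 5},
]
for key, total in rollup(rows, ["year", "month", "day"], "clicks").items():
    print(key, total)
```

With d dimensions, ROLLUP produces d + 1 groupings, and the coarsest ones (down to the single grand-total key) aggregate over every input row.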

Cutting the Gordian Knot: A Look Under the Hood of Ransomware Attacks

In Proceedings of the 12th Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA 2015)
We present the results of a long-term study of ransomware attacks that have been observed in the wild between 2006 and 2014.

Monte Carlo Strength Evaluation: Fast and Reliable Password Checking

In Proceedings of the 22nd ACM Conference on Computer and Communications Security (ACM CCS 2015)
A scalable method for checking password strength that reflects the effort state-of-the-art attackers would need to guess a given password.
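The general Monte Carlo idea can be sketched as follows: sample passwords from an attacker's probabilistic model, then estimate a password's guess number (roughly, how many more likely passwords the attacker would try first) by weighting each more likely sample by the inverse of its probability. This is a minimal sketch, assuming a toy model given as a hypothetical dictionary of password probabilities; a realistic evaluation would sample from a trained model such as an n-gram or PCFG model.

```python
import bisect
import random

def sample_probabilities(model, n, rng):
    """Draw n passwords from a toy model (dict: password -> probability)
    and return the model probability of each sampled password."""
    passwords = list(model)
    drawn = rng.choices(passwords, weights=[model[p] for p in passwords], k=n)
    return [model[p] for p in drawn]

def build_estimator(sample_probs):
    """Precompute sorted probabilities and cumulative 1/(n*p) weights,
    so each strength query costs a single binary search."""
    n = len(sample_probs)
    probs = sorted(sample_probs, reverse=True)
    cumulative = [0.0]  # cumulative[k] = sum of 1/(n*p) over the k most likely samples
    for p in probs:
        cumulative.append(cumulative[-1] + 1.0 / (n * p))
    negated = [-p for p in probs]  # ascending order, as bisect requires

    def guess_number(p_target):
        # k = number of samples strictly more likely than the target
        k = bisect.bisect_left(negated, -p_target)
        return cumulative[k]

    return guess_number

# Hypothetical toy model; "dragon" has exactly three more likely passwords.
model = {"123456": 0.4, "password": 0.3, "qwerty": 0.2, "dragon": 0.1}
estimate = build_estimator(sample_probabilities(model, 100_000, random.Random(0)))
print(estimate(model["dragon"]))  # approximately 3.0
```

Each sample contributes 1/(n·p) only if it is more likely than the target; in expectation these contributions sum to exactly the number of passwords more likely than the target, which is why a modest sample can suffice even when the model's password space is far too large to enumerate.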

The Attack of the Clones: A Study of the Impact of Shared Code on Vulnerability Patching

In Proceedings of the 36th IEEE Symposium on Security and Privacy (SP '15)

All your Root Checks are Belong to Us: The Sad State of Root Detection

In Proceedings of the 13th ACM International Symposium on Mobility Management and Wireless Access (MobiWac 2015)
We analyzed security-focused applications as well as BYOD solutions that check for evidence that a device is “rooted”.

HFSP: Bringing Size-Based Scheduling To Hadoop

IEEE Transactions on Cloud Computing, 2015
HFSP is a Hadoop scheduler inspired by the Fair Sojourn Protocol (FSP). Like FSP, HFSP improves scheduling in terms of both service time and fairness.

Demystifying the IP Blackspace

18th International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2015)
In this paper, we explore the misuse and abuse of the IP blackspace, a portion of the Internet IP address space that should not be used. We show that the IP blackspace is sometimes mistakenly used to host web services, such as websites. We also show that cybercriminals exploit the blackspace to host malicious servers and launch attacks.

Scalable k-NN Based Text Clustering

In Proceedings of the 2015 IEEE International Conference on Big Data (IEEE BigData 2015)
We use distributed and scalable clustering techniques to cluster text data based on the edit distance metric.
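The edit-distance metric and the k-nearest-neighbor relation that such clustering builds on can be sketched as follows. This is a minimal single-machine sketch, assuming Levenshtein distance and brute-force neighbor search over a hypothetical list of short log-like strings; it deliberately does not attempt the distributed, scalable techniques the paper is about.

```python
def edit_distance(s, t):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn s into t."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                 # delete cs
                            curr[j - 1] + 1,             # insert ct
                            prev[j - 1] + (cs != ct)))   # substitute (free if equal)
        prev = curr
    return prev[-1]

def k_nearest(texts, k):
    """Brute-force k-NN: for each text, the indices of the k other texts
    with the smallest edit distance (ties broken by index)."""
    neighbors = []
    for i, a in enumerate(texts):
        dists = sorted((edit_distance(a, b), j) for j, b in enumerate(texts) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

# Hypothetical short texts forming two natural groups.
texts = ["error 404", "error 403", "login failed", "login failure"]
print(k_nearest(texts, 1))
```

Brute force needs n(n-1) distance computations, each costing O(|s|·|t|), which is what makes a distributed, scalable approach necessary for large text collections.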

Related News


Privacy, Identity, and Trust

Consumers and corporations are driven to engage in a digital world that they cannot adequately trust. We are developing paradigms to enable online commerce and facilitate machine learning in ways that provide privacy and protect user identities, by leveraging such concepts as local differential privacy, federated machine learning, identity brokering, and blockchain technology.


Risk Measurement and Mitigation

Cyber incidents are unavoidable. As digitalization marches on, online security weak spots proliferate and digital footprints grow more prominent. The endless stream of digital assets is increasingly lucrative for an evolving set of well-equipped and skillful attackers. Combining risk analytics and risk prediction can improve security posture by guiding appropriate countermeasures. Risk analytics can identify the key actors that correlate with, or cause, risk; risk prediction can forecast which elements in the ecosystem will be attacked or infected.


Systems Security: Internet of Things, Mobile, Cloud, Virtualization

There is a continual need to secure systems of many kinds, including traditional endpoints, mobile devices, cloud, IoT, and virtual hosts. The ongoing evolution of these computing platforms creates new threats, but also new opportunities to better secure these systems. Furthermore, the widespread deployment of trusted hardware brings new opportunities, but also a set of hardware-level threats that are not easily mitigated. The escalating cost of data breaches keeps the defense of sensitive data a priority, and enterprises are becoming increasingly open to adopting new classes of defenses and encryption-based solutions to prevent serious breaches.
