Avast Team, Sep 3, 2020

Lorenzo leads the Systems Security Research Lab where he specializes in the intersection of program analysis and machine learning for systems security. 

At CyberSec&AI Connected this October 8th, Lorenzo will present his talk ‘Intriguing Properties of Adversarial ML Attacks in the Problem Space’. We spoke with him to get some insight into his presentation and to discuss trends, developments, and news in the adversarial machine learning field.

Could you give us a little insight into your upcoming CyberSec&AI Connected talk? 

Machine learning has recently gained popularity thanks to the increase in computational resources and its democratization through software development frameworks and online courses. To date, AI-powered techniques have shown solid results in both well-known and emerging domains, ranging from image classification, speech recognition, and recommendation systems to games and specific computer security tasks. AI is influencing, and will increasingly drive, several key aspects of our society, such as healthcare, human resources, and transportation.

Although promising, AI research has often neglected the adversarial component. For instance, adversarial manipulation of test data (evasion attacks) causes mispredictions. Recent research on adversarial ML has investigated problem-space attacks, which focus on generating real evasive objects in domains where, unlike images, there is no clear inverse mapping from the feature space back to the problem space (e.g., software). However, the design, comparison, and real-world implications of problem-space attacks remain under-explored.

We propose a novel formalization for adversarial ML evasion attacks in the problem space, which includes the definition of a comprehensive set of constraints on available transformations, preserved semantics, robustness to preprocessing, and plausibility. We shed light on the relationship between feature space and problem space (realizable attacks), and we introduce the concept of side-effect features as a by-product of the inverse feature-mapping problem. This enables us to define and prove necessary and sufficient conditions for the existence of problem-space attacks. We further demonstrate the expressive power of our formalization by using it to describe several attacks from the related literature across different domains.
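To make the idea of side-effect features a little more concrete, here is a toy sketch in Python. It is not the authors' attack or formalization: it assumes a made-up vocabulary of API-call tokens, a binary bag-of-features mapping `phi`, and a hypothetical `transplant` transformation that adds the tokens of a code "gadget" to an app. The attacker wants a specific feature-space perturbation (`delta`), but the real gadget also carries an extra token, so the resulting feature vector picks up side-effect features (`eta`) on top of `delta`.

```python
# Toy sketch (hypothetical): side-effect features that appear when a desired
# feature-space change is realized through a problem-space transformation.
# Not the authors' code; vocabulary, objects, and gadget are made up.

# Hypothetical vocabulary of API-call tokens defining the feature space.
VOCAB = ["sendSMS", "getDeviceId", "openConnection", "readContacts", "setWallpaper"]


def phi(obj: set[str]) -> list[int]:
    """Feature mapping: binary bag-of-features over VOCAB."""
    return [1 if tok in obj else 0 for tok in VOCAB]


def transplant(obj: set[str], gadget: set[str]) -> set[str]:
    """Hypothetical problem-space transformation: add a code gadget's tokens."""
    return obj | gadget


malware = {"sendSMS", "getDeviceId"}
desired = {"setWallpaper"}                   # feature the attacker wants to add
gadget = {"setWallpaper", "openConnection"}  # the real gadget also brings openConnection

x = phi(malware)
x_adv = phi(transplant(malware, gadget))

# delta: the perturbation the attacker intended in feature space.
delta = [d - o for d, o in zip(phi(malware | desired), x)]
# eta: side-effect features, i.e. whatever changed beyond delta.
eta = [a - o - d for a, o, d in zip(x_adv, x, delta)]

print("delta:", delta)  # delta: [0, 0, 0, 0, 1]
print("eta:  ", eta)    # eta:   [0, 0, 1, 0, 0]
```

In this toy setting the adversarial feature vector decomposes as `x + delta + eta`: the extra `openConnection` feature is a side effect of realizing `delta` with an actual code fragment, something a purely feature-space attack never has to account for.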

Building on our formalization, we propose a novel problem-space attack on Android malware that overcomes past limitations. Experiments on a dataset of 170K Android apps from 2017 and 2018 show that it is practically feasible to evade a state-of-the-art malware classifier as well as its hardened version. Our results demonstrate that “adversarial-malware as a service” is a realistic threat: we automatically generate thousands of realistic and inconspicuous adversarial apps at scale, and on average it takes only a few minutes to generate each one. Yet, of the 1600+ papers on adversarial ML published in the past six years, only about 40 focus on malware, and many of those remain confined to the feature space.

I hope to show how our formalization of problem-space attacks can help pave the way to more principled research in this domain. We are responsibly releasing the code and dataset of our novel attack to other researchers, to encourage future work on defenses in the problem space.

This year’s event will examine critical issues around AI for privacy and security. What aspects of this theme are you looking forward to discussing with other speakers and panelists at the conference?

I would very much like to identify and prioritize the research directions we, as a community, should focus on with respect to the challenges of using AI in security contexts (and of securing AI systems, too). It would be really exciting to reason as a community, with no academic/industry split, about questions such as: Which problems matter from both a theoretical and a practical standpoint? There’s a great focus on adversarial ML, but is concept drift, for instance, a solved problem? Is it related to adversarial ML? What role would explainable AI play in this context?

How fast-moving an area is adversarial ML? Is it difficult for those in academia and industry to keep pace with the developments and potential of the field?

The field is moving fast, and it often requires an understanding of the mindset and core topics of both the ML and the security communities. This takes time and a non-negligible amount of dedication and passion. I guess we’re living in exciting times, where collaborations and cross-pollination of ideas might be key to keeping up with such a fast-paced field. Academics are involved in several non-research activities, which may limit their blue-sky thinking, and I am sure those in industry and government face similar issues. In academia, at least in the UK where I work, it would be great to rely more on the philanthropic generosity of endowed chairs, unrestricted gifts, and co-funded research professorships to buy ourselves quality research time for reasoning about these challenging, yet exciting, problems.

What are some of the recent trends and developments around AI, ML, and privacy that have caught your eye?

Outside the security domain proper, the progress of AI in creating films, music, and games has greatly captured my interest. These tasks are generally associated with creativity and, although AI results here are still in their infancy, it looks like quite an exciting journey. When it comes to cybersecurity and privacy, I see the potential of AI, but we must not confuse an apparently high-performing result with the ability to solve the underlying task. There are still far too many open problems we need to understand before we can truly rely on AI in cybersecurity and privacy contexts.

Do you think the general public, governments, and businesses are far more educated around privacy and AI than before? How can more be done to increase awareness and understanding around these areas?

I used to think so, but I have had to revise that view, and perhaps the answer has to be subject-specific. The UK seems to be putting a lot of effort into privacy, security, and AI, which is brilliant. For instance, this effort is supported by the establishment of thematic bodies, e.g., the Alan Turing Institute, the Office for Artificial Intelligence, the Centre for Data Ethics and Innovation, and the AI Council. Similar efforts are pursued by the UK National Cyber Security Centre and through research grants from UK funding agencies. I am sure similar efforts can be seen in several other countries, with some probably lagging behind.

Industry is a different beast. I am sure the usual key players are well educated in this context, but an interesting recent work by Shankar Siva Kumar et al. (Adversarial Machine Learning – Industry Perspectives), which appeared at the workshop on Deep Learning for Security 2020, sheds interesting and quite unexpected light on the finding that “[based] on interviews with 28 organizations, […] industry practitioners are not equipped with tactical and strategic tools to protect, detect and respond to attacks on their Machine Learning (ML) systems”.

This was quite eye-opening for me, but the work ends on a positive note by engaging “[…] researchers to revise and amend the Security Development Lifecycle for industrial-grade software in the adversarial ML era”, which is very encouraging.

I believe the general public risks being manipulated by news reported without proper fact-checking (fake news) and by deepfakes, which further exacerbate the problem. As a result, I suspect most of the worry ends up with people pointing fingers at AI (e.g., as a job-stealing technology) or at technological progress in general.

Now, this is a very delicate conversation that deserves more than a paragraph, and, as a disclaimer, I don’t intend to be political at all. But my view is that governments (with the help of experts in academia and industry) have a duty to forecast how advances in technology will shape our society over the next 30-50 years, and to provide, from now on, the tools to train individuals to embrace the new opportunities these changes will bring.

We have lived through several revolutions, from the industrial (or first) revolution to the digital one (the third), and many acknowledge that we have already entered the fourth: “[our] response to it must be integrated and comprehensive, involving all stakeholders of the global polity, from the public and private sectors to academia and civil society.”

Failure to do so would irremediably, and rightly, lead people to point fingers, and these opportunities would divide rather than unite our society for the greater good.

CyberSec&AI is fully virtual, connecting attendees wherever they are in the world. What excites you about this format and the opportunities it brings?

The main advantage of this new format is that it reaches a far wider audience. Sure, physical events naturally create networking opportunities, but if done properly, with the support of technology, interesting conversations with a wider audience will happen at virtual events too.

To watch Lorenzo Cavallaro’s talk live, or view afterwards via our Virtual Library, secure your place at CyberSec&AI Connected. Visit our booking page to check out our 3 for 2 access offer and our special academic discount offer. 

This article features

Lorenzo Cavallaro

Professor of Computer Science, Chair in Cybersecurity (Systems Security)

CyberSecAI Connected 2020
King’s College London
