Avast Team, Sep 10, 2020

Bobby Filar is the Lead Data Scientist at Elastic where he employs machine learning and natural language processing to drive cutting-edge detection and contextual understanding capabilities in the Elastic Security platform.

Recently, Bobby has focused on applying machine learning against process event data to provide confidence and explainability metrics for malware alerts. Previously, he has worked on a variety of machine learning problems focused on natural language understanding, geospatial analysis, and adversarial tasks in the information security domain. 

At CyberSec&AI Connected this October 8th, Bobby will present ‘Getting Passive Aggressive About False Positives’ as part of the workshop sessions. We spoke with him to get some insight into his presentation as well as to discuss trends, developments, and news in the adversarial machine learning field. 

Could you give us some insight into your role at Elastic and some of the work you are doing that would be of interest to those attending CyberSec&AI Connected? 

I lead the Security Data Science team at Elastic, which splits my time between technical research and people management. Our team’s primary responsibility is to deliver a free and open anti-malware machine learning model to our customers. This has allowed me to research adversarial machine learning (to keep our model as secure as possible) and determine how we can improve our deployed models at the customer level.

What are some of the latest trends and developments in your field? 

I think the trend I am most excited about is research sharing. The intersection of security and machine learning is still a relatively nascent field, with a limited number of practitioners and academic researchers. The fact that events like CyberSec&AI Connected exist shows that there is a growing interest in reproducible research. Open-source malware classification models like Ember and MalConv have led to Anti-Malware Evasion competitions that encourage researchers to apply the latest reinforcement learning and adversarial ML methods.

Your workshop session is entitled ‘Getting Passive Aggressive About False Positives’. Could you give us a preview of what attendees can expect from your session? 

Every day when I log onto my machine, I review a dashboard depicting our malware model’s performance in production environments over the past 24 hours. Over the past year, I started to notice a larger trend of false-positive bursts occurring at the local (individual organization) level that was not pervasive across all environments. Further analysis showed that these false positives often came from internal custom software that was not representative of our training data distribution. Still, if we call this software malicious, it could significantly impact the affected organization’s business, so I started researching how global models could be tailored to local environments.

Passive Aggressive models represent one of several options for putting some power in the users’ hands: users are encouraged to re-train on problematic samples to adjust the classifier’s decision boundary locally. The general approach seeks to eliminate the industry trend of users having to overshare their data for the vendor to improve its models, thereby preserving the privacy of internal software.
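To make the idea of a local adjustment concrete, here is a minimal sketch of a single Passive Aggressive (PA-I) update for a linear classifier, written in plain Python. It is an illustration only, not Elastic's implementation: the weights, the three-element feature vectors, and the names `pa_update`, `predict`, `custom_tool`, and `known_malware` are all hypothetical. The update stays "passive" when a sample is already classified with enough margin and becomes "aggressive" (moves the weights just far enough, capped by `aggressiveness`) when it is not.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def predict(weights: Sequence[float], features: Sequence[float]) -> int:
    """Return +1 (malicious) or -1 (benign) for a linear model."""
    return 1 if dot(weights, features) >= 0 else -1


def pa_update(
    weights: Sequence[float],
    features: Sequence[float],
    label: int,
    aggressiveness: float = 1.0,
) -> List[float]:
    """One PA-I step. `label` is +1 (malicious) or -1 (benign)."""
    margin = label * dot(weights, features)
    loss = max(0.0, 1.0 - margin)  # hinge loss
    norm_sq = dot(features, features)
    if loss == 0.0 or norm_sq == 0.0:
        return list(weights)  # passive: sample already handled correctly
    # Aggressive: smallest step that fixes the sample, capped by the
    # aggressiveness parameter so one sample cannot dominate the model.
    tau = min(aggressiveness, loss / norm_sq)
    return [w + tau * label * x for w, x in zip(weights, features)]


# Hypothetical "global" weights shipped by a vendor.
global_weights = [0.8, 0.5, 1.0]

# Hypothetical feature vectors: an internal tool the global model
# flags by mistake, and a sample that should stay detected.
custom_tool = [1.0, 1.0, 0.0]
known_malware = [0.0, 0.2, 1.0]

# The user marks the internal tool as benign (-1) and updates locally;
# the sample itself never has to leave the organization.
local_weights = pa_update(global_weights, custom_tool, label=-1)

print(predict(global_weights, custom_tool))   # before the local update
print(predict(local_weights, custom_tool))    # after the local update
print(predict(local_weights, known_malware))  # unrelated detection
```

In this toy setup the single update flips the verdict on the internal tool while the verdict on the unrelated malicious sample is unchanged. A real deployment would need far more care, for example guarding against users being tricked into re-training on genuinely malicious samples.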

This year’s event will examine critical issues around AI for privacy and security. What aspects of this theme are you looking forward to discussing with other speakers and panelists at the conference? 

I think the panel on ‘Tackling Bias in AI’ will be fantastic as this is a topic we are hearing more about every day. Even in malware detection, we are constantly struggling to identify and eliminate bias in our models. It is a challenging task, so I am interested to hear what techniques the panelists discuss.

Are there any speakers or sessions in particular at CyberSec&AI Connected you are looking forward to watching? 

There are several talks that I’m excited about attending, in particular the talks on attacking machine learning (Lorenzo Cavallaro, Hyrum Anderson, Luca Demetrio) and on understanding data drift and class imbalance (Jan Brabec and Feargus Pendlebury). These topics are definitely ‘top of mind’ for me when looking to improve or augment our anti-malware classification models.

Do you have any predictions for what’s coming next in the field of machine learning, AI, privacy, and security? 

I think the concept of secure and private machine learning is only going to increase in importance. As machine learning becomes more ubiquitous in society, our core responsibility as practitioners will be to ensure the security of those models, to maintain the privacy of the data they consume, and to develop methods to identify and eliminate underlying bias in them.

This year’s CyberSec&AI Connected is fully virtual, connecting attendees wherever they are in the world. What excites you about this format and the opportunities it brings? 

While it is always nice to see your peers face-to-face, the virtual conference experience is a great way to make content more accessible at a lower cost. Hopefully, a virtual CyberSec&AI will lead to greater attendance and a more in-depth discussion of the impact of privacy and security in machine learning.

To watch Bobby Filar’s presentation live, or view afterwards via our Virtual Library, secure your place at CyberSec&AI Connected. Visit our booking page to check out our 3 for 2 access offer and our special academic discount offer.
