Reflections on my AR journey: Part I: A teaser into Visual Textual analysis (with original goals including cybersecurity, binary/malware analysis, image processing - yes, really!)

A bit of History:

I first began to look into AR when I was doing my graduate work, which was focused on engagement techniques, influence effects, and in particular, using them to help keep girls in the STEM pipeline. I'd already learned enough to know that interaction was key to keeping both interest and internal motivation high (in academic speak, the latter is called intrinsic motivation). That was how I defined the overused and overly vague term, engagement: engagement = high interest + high intrinsic motivation.

I'd been involved with a Virtual Worlds standards group, so naturally, some of my first thoughts went straight to interaction within a virtual world. A lot of researchers were thinking the same thing, with the idea of online classrooms, classrooms and field trips inside VR-created worlds, and so on. Remember Google Glass? Yes, that was being used by some educators to share field trips with their students - a professor walking through a museum or factory while broadcasting his/her experiences, or someone wearing Google Glass and letting popups appear to inform them of what they were looking at. It was one of the early attempts at AR - the idea of recognizing objects and popping up bubbles of information based on that recognition.

AR? Yes, it seemed a good fit. So I started experimenting with Vuforia and Metaio, two AR SDKs, the latter being the more powerful but, at the time, the more restrictive. I had the idea, like many others, of using a laptop, tablet, or even smartphone to look at images (image targets) and have 3D scenes pop up in response. Vuforia was also trying to find ways to let the user interact directly with the scene (and I don't mean touching the screen as you typically see, I mean touching the image target and having the 3D scene react). It was a form of invisible buttons - a limited form of 'hand tracking', though it was really just tracking whether an area of the target was obscured (if it was obscured, then it *might* mean that the user was *touching* that part of the target...and not that the user had accidentally left their coffee cup in that spot!).

Vuforia was too slow and buggy and didn't work (at that time) on Windows devices (even now it only works for Windows 10 and up), and I wanted to use Windows 8+ devices as well as Android (and should I ever be able to afford an iPhone just for testing, well...iOS, too). That slowness, the inability to get to the video stream (and modify it), and the instability of the product made me think about writing my own AR engine. I didn't want to, but then Metaio went and sold itself to Apple, and Apple promptly shut down any further public use of Metaio. That left me with only Vuforia and a handful of new contenders that weren't even close to Vuforia and had no free commercial option anyway.

So I began to learn about OpenCV, the open source computer vision library - and likely one of the most powerful libraries for computer vision, open or proprietary, out there. And I discovered ArUco, an open basic AR library (licensed free for commercial and personal use) developed by Rafael Muñoz-Salinas and Sergio Garrido-Jurado at the University of Cordoba (see the ArUco Introduction for more information).

I was also using Unity as my coding base, rather than trying to create my own game engine or write my code directly using OpenGL or DirectX and/or Windows and Android. Like many, I found it easier to let Unity do that work for me, and make it easier for me to create cross-platform applications. Vuforia had a plugin for Unity, so it seemed only natural to look for a way to adapt ArUco to Unity, as a plugin. The only problem was that ArUco was really a combination of console executables and static library functions that, in turn, required heavy OpenCV use.

It would be a lot of work. Making a plugin (basically a dynamic library) for Unity is no small task, and I would also have to create an entire framework (scripts, prefabs, and more scripts) within Unity to tie Unity and the plugin together...well, that should be enough to give anyone pause.

But I really wanted an AR system that could work with Windows 8+ as well as Android - and I didn't have any budget to buy some other kit that would likely also prevent me from getting directly into the video stream so that I could image process it.

But why was I so focused on that darn video stream?

Well, I started as a scientist and I'm still a scientist at my core. And part of the work I was doing with the Virtual World standards group involved cybersecurity, just as my early career had involved defense department research. I was fascinated with hackers and the intricacies of low level coding, hacking, and binary malware - especially polymorphic malware. And, being a data junkie, I'm naturally drawn to pattern recognition, data mining, and machine learning. In fact, my early days were focused around such subjects, though in a different context.

So a thought struck me: Could I use image processing to look at binary data? And if I did, would I see any patterns?

It turned out I wasn't the first to have this thought. Some very clever people, one named Conti, had already figured out that if you look at binary data as digrams (i.e. pairs of consecutive bytes plotted as x,y points), you quickly realize that binary data has patterns in it - distinctive patterns. (Another was named Cortesi and looked at Hilbert curves for binary visualization.)

In fact, those patterns are signatures. They indicate what type of file you are looking at - from a 32-bit EXE to a text file (TXT) to everything in-between.
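To make the digram idea a little more concrete, here is a minimal sketch in Python. It is my own illustration, not Conti's code: the function names `digram_counts` and `digram_grid` are made up for this post, and I'm assuming the simplest reading of a digram, where each byte and the byte right after it become one (x, y) point on a 256 x 256 grid.

```python
from collections import Counter


def digram_counts(data: bytes) -> Counter:
    """Count every (x, y) pair of consecutive byte values in data."""
    # Iterating over bytes yields ints in 0..255, so each pair is (int, int).
    return Counter(zip(data, data[1:]))


def digram_grid(data: bytes) -> list:
    """Build a 256x256 grid; grid[y][x] counts how often byte x is followed by byte y."""
    grid = [[0] * 256 for _ in range(256)]
    for (x, y), count in digram_counts(data).items():
        grid[y][x] = count
    return grid


# Hypothetical sample data: plain ASCII text only ever touches a few cells.
sample = b"hello hello hello"
grid = digram_grid(sample)
occupied = sum(1 for row in grid for cell in row if cell)
print(f"{occupied} of {256 * 256} cells are occupied")
```

Treat each cell count as a pixel brightness and you have an image. Plain text clusters in the small printable-ASCII corner of the grid, while other kinds of data spread across it in their own ways, and that difference is the kind of signature I mean.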

My AR engine was underway, and its one original goal had now become two:

1) be able to bring up 3D Unity scenes based on fiducial markers (simple black and white markers that encode numbers)
and
2) analyze a binary file as an image, by first streaming the binary data into a digram-based image and then using typical image processing and machine learning techniques to categorize and analyze it (using OpenCV and its image processing capabilities)
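For a rough feel of what goal 2 involves, here is a toy sketch in Python. It is not my engine: the names `stream_digram_grid`, `printable_fraction`, and `guess_kind`, and the 0.9 threshold, are all hypothetical, and a single hand-picked feature stands in for the real image processing and machine learning that OpenCV would do. It only shows the shape of the pipeline: stream the bytes in chunks, accumulate the digram image, then categorize it.

```python
import io

# Byte values treated as "text": printable ASCII plus tab, newline, carriage return.
PRINTABLE = set(range(32, 127)) | {9, 10, 13}


def stream_digram_grid(stream, chunk_size=4096):
    """Read a binary stream in chunks and accumulate a 256x256 digram grid."""
    grid = [[0] * 256 for _ in range(256)]
    prev = None  # last byte seen, carried across chunk boundaries
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        for byte in chunk:
            if prev is not None:
                grid[byte][prev] += 1  # x = first byte, y = byte that follows it
            prev = byte
    return grid


def printable_fraction(grid):
    """Fraction of all digrams whose two bytes are both printable."""
    total = sum(sum(row) for row in grid)
    if total == 0:
        return 0.0
    inside = sum(grid[y][x] for y in PRINTABLE for x in PRINTABLE)
    return inside / total


def guess_kind(grid, threshold=0.9):
    """Crude stand-in for a classifier: one feature, one threshold."""
    return "text-like" if printable_fraction(grid) >= threshold else "binary-like"


# Hypothetical in-memory "files" so the sketch runs without touching disk.
text_file = io.BytesIO(b"The quick brown fox\n" * 10)
blob_file = io.BytesIO(bytes(range(256)) * 4)
print(guess_kind(stream_digram_grid(text_file)))
print(guess_kind(stream_digram_grid(blob_file)))
```

Carrying the last byte from one chunk into the next matters: without it, every chunk boundary would silently drop one digram from the image.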

Boy, I had no idea what I was getting myself into!

In Part II, I will go more into digrams and what I learned, and how I am now approaching the problem of visual textual analysis in a different way. I'm currently focused on sentiment analysis rather than binary/malware analysis, but if you want some teasers on how digrams and Hilbert curves jumpstarted my thoughts on binary visualization and image processing of textual content, check out the links below.

A few years back, most people hadn't heard of using digrams for binary visualization, but now, as you'll see from some of the links, the idea has exploded.

https://www.dfrws.org/sites/default/files/session-files/paper-automated_mapping_of_large_binary_objects_using_primitive_fragment_type_classification.pdf

https://corte.si/posts/visualisation/binvis/index.html

https://corte.si/posts/visualisation/hilbert-snake/index.html

https://codisec.com/binary-visualization-explained/

http://cs.brown.edu/~er/papers/icse01.pdf

Cantor Dust: https://sites.google.com/site/xxcantorxdustxx/home

https://defensesystems.com/articles/2014/02/06/plan-x-raytheon-darpa.aspx

Augment Your Reality Safely - Reflections on AR and Other Things