NK Labs: Introducing New Knowledge’s Research Division

Providing companies with an agile and reliable defense against brand integrity attacks requires us to stay up to date with the latest developments in disinformation tactics. This is an arms race. The better we understand the technologies and behaviors that can create problems, the more effectively we can detect, monitor, and defend against brand reputation manipulation.

NK Labs is the division of New Knowledge AI dedicated to discovering and driving emerging research in artificial intelligence and machine learning. We use that research to give New Knowledge the tools it needs to help companies fight disinformation. Going forward, NK Labs will share monthly updates on the New Knowledge blog to showcase our technology, practices, and research to the community.

In this first post of the series, we provide background on NK Labs, describe what we’re working on, and explain what you can expect from our monthly posts.

What NK Labs has been working on recently

Recently, there has been a push among artificial intelligence researchers to develop tools that automate various steps of solving a data science problem. As the theory and practice around popular machine learning techniques mature, many data science processes have reached a stage where they can be commoditized and their pipelines automated. The US government, in particular, has been investing in the development of such tools through DARPA’s Data-Driven Discovery of Models (D3M) program, in which New Knowledge AI is a key participant.

Advancing automatic machine learning with government partners

Over the past couple of years, New Knowledge AI has been contributing to D3M by driving advances in Natural Language Processing (NLP), big data summarization, time-series forecasting, clustering and classification, neural networks, deep learning, transfer learning, and graph analytics, to name just a few technical areas. These advances are packaged as building blocks, or primitives, which a “meta learning” algorithm can combine with primitives submitted by other D3M performers to yield optimal machine learning pipelines. The end goal is for these pipelines to solve useful, challenging problems for users with no AI expertise, based solely on a natural language description of the user’s problem.
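To make the idea of combining primitives concrete, here is a minimal sketch in Python. It is not D3M code: the primitive names, the toy data, and the scoring are all assumptions made for illustration, and a plain exhaustive search stands in for the meta-learning algorithm, which would choose combinations far more intelligently.

```python
from itertools import product
from statistics import mean

# Hypothetical model primitives: each takes training inputs and targets
# and returns a prediction function.
def fit_mean(xs, ys):
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    return lambda x: my + slope * (x - mx)

# Hypothetical primitive catalogs: elementwise feature transforms and models.
PREPROCESSORS = {"identity": lambda x: x, "square": lambda x: x * x}
MODELS = {"mean": fit_mean, "line": fit_line}

def evaluate(pre, fit, train, valid):
    """Fit one (preprocessor, model) pipeline and return its validation MSE."""
    xs = [pre(x) for x, _ in train]
    ys = [y for _, y in train]
    predict = fit(xs, ys)
    return mean((predict(pre(x)) - y) ** 2 for x, y in valid)

def search(train, valid):
    """Score every pipeline (exhaustive search, standing in for meta learning)."""
    scored = {
        (p, m): evaluate(PREPROCESSORS[p], MODELS[m], train, valid)
        for p, m in product(PREPROCESSORS, MODELS)
    }
    best = min(scored, key=scored.get)
    return best, scored

# Toy problem: the target is the square of the input.
train = [(x, x * x) for x in (-2, -1, 0, 1, 2, 3)]
valid = [(x, x * x) for x in (-3, 4)]
best, scored = search(train, valid)
print(best)
```

On this toy data the search selects the pipeline that squares the input and then fits a line, since that combination reproduces the target almost exactly.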

Publishing findings as peer-reviewed research

In July 2018, a workshop on automatic machine learning was held at the International Conference on Machine Learning (ICML) in Stockholm, Sweden, the second time this workshop had ever been held. Aptly dubbed “AutoML 2018”, it brought together leading researchers from industry and academia to present and discuss peer-reviewed research broadly related to this problem [ref:1]. NK Labs was on the ground in Stockholm to present a pair of research papers showcasing recent advances.

The first paper – Abstractive Tabular Dataset Summarization via Knowledge Base Semantic Embeddings – describes the abstractive summarization method for tabular data that underpins our open source software tool DUKE [ref:2]. It employs a knowledge base semantic embedding to quickly generate summaries for datasets, allowing users to query large, previously untagged collections of datasets for specific content of interest. The method is able to exploit hierarchies in prespecified ontologies. Experimental results on open data taken from several sources, including OpenML and CKAN, demonstrated the effectiveness of the approach on real data representative of what is often encountered in the wild.
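As a rough illustration of the idea, and not DUKE’s actual implementation, the sketch below averages word vectors for a dataset’s column names, picks the closest label in a small ontology by cosine similarity, and then walks up the hierarchy to produce progressively more general tags. The three-dimensional vectors, the `PARENT` ontology, and the `summarize` function are all made up for this example; a real system would load pretrained embeddings and a real knowledge base.

```python
import math

# Toy word vectors, invented for illustration only.
EMBEDDINGS = {
    "pitcher": (0.9, 0.1, 0.0),
    "inning": (0.8, 0.2, 0.0),
    "strikeout": (0.9, 0.0, 0.1),
    "violin": (0.0, 0.1, 0.9),
    "tempo": (0.1, 0.0, 0.9),
    "baseball": (1.0, 0.0, 0.0),
    "sport": (0.7, 0.7, 0.0),
    "music": (0.0, 0.0, 1.0),
    "activity": (0.4, 0.4, 0.4),
}

# Hypothetical ontology, stored as child -> parent links.
PARENT = {"baseball": "sport", "sport": "activity", "music": "activity"}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def summarize(column_names):
    """Return the best-matching ontology label followed by its ancestors."""
    vectors = [EMBEDDINGS[w] for w in column_names if w in EMBEDDINGS]
    if not vectors:
        return []  # nothing recognizable to summarize
    # Represent the dataset by the mean of its known column-name vectors.
    centroid = tuple(sum(dim) / len(vectors) for dim in zip(*vectors))
    labels = set(PARENT) | set(PARENT.values())
    best = max(labels, key=lambda label: cosine(centroid, EMBEDDINGS[label]))
    # Walk up the hierarchy so the summary also carries broader categories.
    path = [best]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

print(summarize(["pitcher", "inning", "strikeout"]))
```

Because each summary includes the ancestors of the best label, a query for a broad category such as "sport" would also surface a dataset whose closest match is a narrower one such as "baseball".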

DUKE is just one of the advanced analytic tools that make up DISTIL, a joint effort with Canadian partner Uncharted Software and Virginia-based partners Qntfy/Jataware to produce an integrated, automated data science system. We presented the current state of DISTIL in the second research paper – DISTIL: A Mixed-Initiative Model Discovery System for Subject Matter Experts [ref:3]. This mixed-initiative system enables subject matter experts with no AI background to generate data-driven models through an interactive workflow that starts from an analytic question. Our approach incorporates data discovery, enrichment, analytic model recommendation, and automated visualization to help users understand data and models.

We were encouraged and inspired by the quality and breadth of the peer-reviewed research showcased at the conference, from breathtaking advances in Meta Learning via Deep Reinforcement Learning to a myriad of promising techniques for neural network architecture search. The talent and enthusiasm on display suggest that automatic machine learning is on the brink of becoming not only feasible but, critically, practical. We are honored to be playing a role in this revolution.

The role of NK Labs research in fighting disinformation

The research described above is already finding internal use at New Knowledge AI in fighting disinformation. Components of DUKE are being deployed on social media data at scale to tag potential bad actors who may be attempting to influence conversations around hot-button issues of the day. While practical automatic machine learning, as embodied by DISTIL, is still some distance from fully autonomous operation, it has already received positive feedback in government user testing. Users with little to no AI expertise have found it useful for exploring novel datasets and extracting insights that were previously inaccessible to them. The end goal is to place a fully autonomous system in the hands of an army of disinformation analysts to expose bad actors hiding in the deepest, darkest corners of the internet.

To learn more about the research mentioned in this article, please check out the links below. All papers presented at the AutoML 2018 workshop can be downloaded for free from [ref:1]. To give our open-source tool DUKE a try, please go to [ref:2]. We would love to hear about your experiences with this software and any suggestions you might have! To get a sense of DISTIL’s capabilities, check out [ref:3].
