
Data integration remains essential for AI and machine learning

As our artificial intelligence algorithms continue to struggle with metacognition and causal inference, it is essential that we feed them the right data

The nature of human consciousness has so far eluded everyone from philosophers to neuroscientists. It should therefore come as no surprise that the artificial intelligence (AI) we can currently build is limited to more basic pattern recognition within sets of data.

Although AI and machine learning (ML) algorithms are getting ever better at doing more with less, we still often need to bring together data from multiple sources for them to produce results that make sense. Humans require a lot of information to make sense of the world, so our current more primitive computer algorithms surely need far more.

I am of the opinion that there are two essential technologies that will play a huge role in the future of ML, but are currently relatively under-explored – metacognition and causal inference.

As humans, we are able not only to complete a cognitive task, but also to reflect on our own thinking while completing it. This is known as metacognition. Although behavioural scientists will point out that our egos sometimes prevent us from doing this rationally, it remains critical to how we learn.

We plan, monitor and assess our own understanding as we learn, so much so that when we watch a robotic vacuum cleaner go awry, we often look on with amusement while taking our own capacity for metacognition for granted.

A number of years ago, I accidentally worked on a solution to this very problem. I found myself needing to make a natural language processing (NLP) algorithm sensitive to the risk of making a mistake, and I later co-authored a paper describing the approach taken to address this problem. Essentially, the NLP algorithm would process the text and come up with a suitable reply for a human, and a different neural network would make a binary decision as to whether the reply was likely to lead to a positive interaction.
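To make the two-model arrangement concrete, here is a minimal Python sketch. It is not the method from the paper: both models are hypothetical stand-ins (a canned reply generator and a hand-weighted logistic score in place of trained networks), and the names generate_reply, positive_interaction_probability and gated_reply, along with the 0.5 threshold, are assumptions made purely for illustration.

```python
import math
from typing import Optional


def generate_reply(message: str) -> str:
    """Hypothetical stand-in for the NLP model: returns a canned reply."""
    if "refund" in message.lower():
        return "I can help with your refund request."
    return "Sorry, I did not understand that."


def positive_interaction_probability(message: str, reply: str) -> float:
    """Hypothetical stand-in for the second network.

    A real system would use a trained classifier; here a logistic
    function over two hand-picked features plays that role.
    """
    features = [
        1.0 if reply.startswith("Sorry") else 0.0,  # fallback reply was used
        1.0 if len(message.split()) > 30 else 0.0,  # long, possibly complex message
    ]
    weights = [-2.0, -1.0]  # assumed weights, not learned from data
    bias = 1.5
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def gated_reply(message: str, threshold: float = 0.5) -> Optional[str]:
    """Return the reply only if the second model judges it safe to send."""
    reply = generate_reply(message)
    if positive_interaction_probability(message, reply) >= threshold:
        return reply
    return None  # binary decision: hold the reply back, e.g. hand over to a human


if __name__ == "__main__":
    print(gated_reply("I want a refund"))
    print(gated_reply("What is the weather like?"))
```

The point of the design is that the model producing the reply and the model judging it are separate, so the second can veto the first without needing to know how the reply was produced.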

Other computer scientists have also explored this space, and research continues into novel ways of achieving more advanced metacognition in AI, with the aim of improving both the problem-solving and the comprehension that machines are capable of.


Another challenge for ML is the ability of algorithms to understand causality. So far, much of what is done by AI algorithms is finding correlations between data points, as opposed to understanding causal relationships.

One scientific way of establishing causation is the randomised controlled trial (RCT), in which test subjects are randomly assigned to different treatment groups. Such trials are used in everything from vaccine development to A/B testing of different user interface designs on websites. RCTs are one of the highest forms of scientific evidence and provide significantly greater certainty than observational data, but they can require a lot of data to reach statistical significance and need careful experimental design from the start.
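As a rough illustration of both points, random assignment and the amount of data needed, the following Python sketch splits subjects into two groups at random and applies a two-sided two-proportion z-test to an A/B result. The function names and the conversion figures are invented for the example, and the z-test is just one common choice of analysis, assumed here for simplicity.

```python
import math
import random
from statistics import NormalDist


def assign_groups(subject_ids, seed=0):
    """Randomly split subjects into control and treatment groups."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    Assumes the pooled rate is strictly between 0 and 1, otherwise the
    standard error below would be zero.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    control, treatment = assign_groups(range(10))
    print(control, treatment)

    # Same 10% vs 13% conversion rates, at two different sample sizes.
    print(two_proportion_z_test(10, 100, 13, 100))
    print(two_proportion_z_test(100, 1000, 130, 1000))
```

With these made-up numbers, the same three-point difference in conversion rate is not significant at the 5% level with 100 subjects per group, but is with 1,000 per group.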

Improving causal reasoning in AI offers us the opportunity to do more with less data. Microsoft Research is among the organisations with a group working on causality in ML, but there is still more work to be done.

Until we overcome these challenges in AI, data integration will remain an important part of ensuring we can give our constrained ML algorithms the data they need to provide meaningful outputs. It is not just about the volume of data, but also the dimensionality. ML algorithms need access to all the relevant data attributes to have a better chance of reaching the right conclusions. For this reason, before embarking on your AI revolution, you must make sure you have your ducks in a row when it comes to your data.
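The difference between volume and dimensionality can be sketched in a few lines of Python. In this hypothetical example, two sources describe the same customers with different attributes; integrating them on a shared key adds columns rather than rows. The customer_id key, the field names and the integrate and missing_attributes helpers are all assumptions made for illustration.

```python
def integrate(sources, key="customer_id"):
    """Merge records from several sources into one record per key value."""
    merged = {}
    for source in sources:
        for record in source:
            # Later sources add attributes to (or overwrite) earlier ones.
            merged.setdefault(record[key], {}).update(record)
    return list(merged.values())


def missing_attributes(records, required):
    """Report which required attributes each record still lacks."""
    return {
        record["customer_id"]: sorted(set(required) - set(record))
        for record in records
        if set(required) - set(record)
    }


if __name__ == "__main__":
    crm = [
        {"customer_id": 1, "plan": "pro"},
        {"customer_id": 2, "plan": "free"},
    ]
    support = [{"customer_id": 1, "tickets": 3}]

    combined = integrate([crm, support])
    print(combined)
    print(missing_attributes(combined, ["plan", "tickets"]))
```

A check like missing_attributes is one simple way to see, before any model is trained, which records are still incomplete after the sources have been joined.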

Junade Ali is an experienced technologist with an interest in software engineering management, computer security research and distributed systems.
