Internet Application Integration: Catalysts for the coming Artificial Intelligence Revolution


In current times, hardly a day passes without some news about Artificial Intelligence (AI) and how it is going to change the world. Math-based algorithms developed in the late 1950s (the first paper on Machine Learning appeared in 1957) are changing the world so much that some experts, Elon Musk among them, are already warning of a doomsday scenario in which artificial intelligence algorithms destroy the human race.


While the availability of cheaper and more powerful computation has been the major driver of the exponential growth in AI applications, another major contribution has come from the digitization of data, which started in the 1960s with the availability of commercial magnetic storage. The foundations of all modern AI algorithms originated in statistics, and just like their statistical base, the predictions of AI algorithms are only as good as the data. Thanks to Moore’s law (and its storage cousin, Kryder’s law), the cost of data storage has been decreasing exponentially, allowing large organizations to permanently archive pretty much all the data they generate.

With interconnected devices and wider access to the Internet, data is being generated at a phenomenal rate, and hardly any of it is discarded. Facebook alone generates 500+ Terabytes of data a day, with Google and Amazon likely in the same ballpark. Quite interestingly, 90% of the data in the world today has been generated over the past two years, and IBM estimates that all digital systems in the world currently generate a combined 2.5 quintillion bytes (approx. 2.5 million Terabytes) of data each day.

This data offers tremendous opportunities for building sophisticated and robust artificial intelligence systems that can deliver enormous benefits to humanity, doomsday scenarios about AI misuse notwithstanding. Since AI algorithms determine correlations (finding causation is best left to science), they can offer critical insights that would otherwise not have been obvious. As a simple example, if your health records are linked to Google Maps, you may be able to determine a correlation between visiting certain places and falling sick. If the data is further integrated with weather and traffic systems, you may be able to predict which weather patterns or traffic conditions increase your likelihood of falling sick. One can keep going with these correlations, since the possibilities are endless.
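As a purely illustrative sketch, the snippet below shows how such a cross-dataset correlation might be computed once the records are linked; the data, the column names, and the shared "date" key are all made up for this example.

```python
import pandas as pd

# Hypothetical, already-linked daily records: a health log and a location/weather log.
# All values, column names, and the shared "date" key are assumptions for illustration.
health = pd.DataFrame({
    "date": pd.date_range("2018-01-01", periods=5),
    "fell_sick": [0, 1, 0, 1, 1],
})
context = pd.DataFrame({
    "date": pd.date_range("2018-01-01", periods=5),
    "visited_market": [0, 1, 0, 1, 1],
    "rainy_day": [1, 1, 0, 0, 1],
    "heavy_traffic": [0, 0, 1, 1, 0],
})

# Join the two silos on the shared key, then look at simple pairwise correlations.
# Note: this surfaces correlation only, not causation.
linked = health.merge(context, on="date")
print(linked.drop(columns="date").corr()["fell_sick"].drop("fell_sick"))
```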

Unfortunately, most of today’s data exists within individual organizations and is pretty much siloed. While large organizations in social media, internet search, medical insurance, oil & gas, etc. each have a large repository of data, they build their AI systems based only on the data they themselves have. There has been some momentum toward data collaboration for mutual benefit, but these efforts have been ad hoc rather than part of a broader strategy.

In the early 1970s, organizations and governments faced a similar problem with their own internal data. Various departments within an organization, such as accounting, sales, and human resources, maintained data in their own silos. The link between these disparate data sources was provided by humans manually re-entering data from one database into another. A similar manual link, with humans serving as data bridges, is what happens today when two data sets have to be combined (in our example above, linking medical records with Google Maps would require human intervention). Needless to say, this is a very inefficient and slow process!

With the advent of robust networks in the 1970s came the idea of Enterprise Application Integration (EAI), which allowed the various enterprise data silos to talk to each other. Various topologies and communication architectures were developed to enable data exchange and support data transfer. However, before EAI could become the de facto standard for integrating siloed enterprise data, it was upended by Enterprise Resource Planning (ERP) systems. ERPs store data for all departments in a single data store and hence do not need a separate layer for data linking and synchronization. Being cheaper than ERPs, EAI systems still exist and are generally considered the poor man’s (or poor organization’s) ERP.

Though EAI and ERP solve the integration problem at a significantly smaller scale, that does not mean the enterprise-wide idea cannot be scaled up to the whole world, with connectivity provided by the Internet. However, enterprise ERPs are quite complex in themselves (for example, SAP R/2, released in 1982, has over 24,000 relational tables), so a world-wide ‘ERP’ that uses a single data store would require a level of complexity and storage/computation power that we likely do not have. Hence, an EAI-like system that links the disparate data sources of every organization in the world through the Internet, a so-called Internet Application Integration (IAI), looks more feasible. Such a system would run on an open-standards network protocol like TCP/IP and expose APIs through state-of-the-art mechanisms like REST. In big data parlance, such a system would provide a virtual data lake to power the AI of tomorrow.
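As a minimal sketch of that idea, and assuming two organizations already expose open REST endpoints, the snippet below pulls records from each silo over HTTP and joins them into one small slice of such a virtual data lake. The endpoint URLs, query parameters, and JSON field names are hypothetical.

```python
import requests
import pandas as pd

# Hypothetical REST endpoints exposed by two independent organizations.
# The URLs, parameters, and response schemas are assumptions for illustration only.
HEALTH_API = "https://api.health-provider.example/v1/visits"
WEATHER_API = "https://api.weather-service.example/v1/daily"

def fetch_records(url, params=None):
    """Fetch a list of JSON records from an open, REST-style endpoint."""
    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()
    return response.json()

# Each organization keeps its own data store; we only talk to it over TCP/IP + HTTP.
visits = pd.DataFrame(fetch_records(HEALTH_API, params={"patient": "p-123"}))
weather = pd.DataFrame(fetch_records(WEATHER_API, params={"city": "Bengaluru"}))

# Join on an assumed shared "date" field to build one view of the virtual data lake,
# which downstream AI models could consume without a single central database.
virtual_view = visits.merge(weather, on="date", how="inner")
print(virtual_view.head())
```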

Of course, building such a system will require crossing several legal and cultural hurdles. Governments will have to take on the role that organizational chiefs played in the 1970s and drive the standards and the process of adoption among all organizations within their jurisdiction. Private organizations will also have to be willing to share their data in the common interest of society. Since data is the new oil, governments will want it treated just like any other strategic natural resource.

Though IAI offers a technical solution, it will be a challenge for companies that profit from the data they own, and finding a common stake alongside their for-profit goals will be a challenge that shapes the political and business discourse in the years to come.

Love or hate this article? Have other ideas? Please leave a comment below!
