For several years, as Microsoft has pivoted to a cloud-first strategy, users have known that its traditional back-end technology -- SQL Server -- isn't really SQL Server in the cloud. It's something new. For those accustomed to working with SQL Server for analytics, though, the next wave of Microsoft Analytics is surprising.
This fall, Microsoft announced its latest iteration of the Azure cloud, whose architecture is designed for applications to be built on top of it. In particular, the Azure cloud, according to Microsoft CEO Satya Nadella, is poised to become the first artificial intelligence (AI) supercomputer. This means Azure will build cloud processing power not just on traditional CPU architectures, but also on GPUs. Banks of GPUs handle tasks in parallel, rather than relying on individual machines with ever-faster CPUs -- clock speeds that are becoming increasingly hard to improve.
Microsoft AI is bringing intelligence to a whole new set of tasks. That combination of efficiency and contextual knowledge can complete tasks not only by mimicking human processes, but also by removing some of the manual, time-consuming work that humans engage in to get those tasks done.
At the same time, AI is nascent and immature. There are clear signs of promise, but also concerning examples of intelligent systems gone awry.
Consider Tay, Microsoft's Millennial-inspired chatbot that was supposed to mimic and learn from the conversations of teens. The chatbot was shut down within 24 hours, after it began spouting racist and inappropriate comments -- which it had, indeed, learned from the teens it chatted with. In December, Microsoft rolled out its next chatbot, Zo.
The intelligent personal assistant is a super-database administrator
Microsoft's Cortana began life as a kind of super-Siri, an "intelligent personal assistant" intended to use the AI functionality in the Azure cloud to enhance personal productivity. Cortana (named after a character in the Halo video game series) is embedded in products like the Edge browser, offers augmented responses to requests like "Buy me movie tickets" and "Make my dinner reservations," discovers what stores sell certain products and locates them geographically, and so on. These tasks tap into Cortana's metadata handling capabilities.
Cortana also features natural language processing and semantic search capabilities, which combine with its metadata facility to embed meaningful context into user input, interpreting requests beyond their face value. The same innate AI capabilities power Microsoft's new data management products.
However, a word of caution here: The problem with this kind of voice-based productivity enhancement is that it can make the user lazy about precision and forethought. Voice-based querying should do the opposite, prompting the user to be both more specific and more careful in making ad hoc queries.
Azure Data Lake
Now that cloud platforms contain embedded analytics and the capacity to handle big data at low costs, the race is on to develop convenient resources for exploiting those analytics. Traditionally, this has been a challenge for even the most resourceful enterprises. Staging big data is tough enough, infrastructure-wise -- but organizing multiple sources and schemas while dealing with inconsistent data structures and completeness can be so complex, it can make the effort cost-prohibitive.
The Azure Data Lake solves this problem by providing a single repository for an organization's data from every source, of every type and schema. This centralization provides an essential platform for handling big data in general, and analytics in particular, lowering the cost of entry significantly and enabling the smart, automated configuration of sources for complex processing via Hadoop or other advanced analytics resources.
It can even serve as a first-stage process for creating data warehouses, trimming the cost of that undertaking, as well.
Data Lake analytics
Another plus for the Data Lake is that the analytics are offered as an on-demand service in Azure, billable by job, rather than by subscription. This trims the cost of crunching big data, and the expense is further controlled by the service's dynamic scaling feature: It will dial resources up or down as needed during execution, so you pay only for the processing power consumed.
U-SQL is available for use with the service, combining SQL-like declarative syntax with the expressive power of C# and enabling distributed runtime processing. The service is Visual Studio-integrated for debugging efficiency and, in addition to the Data Lake Store, will work with most Azure data sources, including Azure SQL Database and Blob storage.
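As a minimal sketch of that declarative-plus-C# style, consider a U-SQL job that extracts a log file, filters rows with a C# string expression and writes an aggregate back out. The file paths and column names here are hypothetical, chosen only for illustration:

```sql
// Hypothetical input: a tab-separated search log in the Data Lake Store.
@searchlog =
    EXTRACT UserId int,
            Region string,
            Duration int
    FROM "/input/searchlog.tsv"
    USING Extractors.Tsv();

// Declarative SELECT with a C# expression in the WHERE clause.
@result =
    SELECT Region,
           SUM(Duration) AS TotalDuration
    FROM @searchlog
    WHERE Region.ToLower().StartsWith("en")
    GROUP BY Region;

OUTPUT @result
    TO "/output/totals.csv"
    USING Outputters.Csv();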
The convenience of this service can scarcely be overstated, but neither can the warning that it must be used judiciously: Staging data for analytics in this economical way is a huge plus, but getting the most from it requires in-house expertise and an understanding of applied analytics at the code level. It should not be thought of as Microsoft's version of IBM's Watson or Salesforce's Einstein.
Azure Data Catalog
Originating in Power BI, the Azure Data Catalog is a service that enables users to discover the data sources they need and to understand the data sources they find. This solves the problem of having to register the same data source over and over for different applications. With Azure AI, the Data Catalog is enterprise-wide, and the cloud handles interfacing with sources dynamically. Functionally, it is similar to SQL Server Management Studio, but it's far more dynamic, convenient and consolidated: an enterprise-level data discovery tool that democratizes data sources.
A publish/discover/annotate model registers and updates sources in the catalog. The discover function allows one-stop shopping to learn whether certain data exists in the enterprise and to locate it in the catalog.
Metadata is added to data sources with the annotate function, which allows for the configuration of alerts for particular data use, as well as for group tagging, so that different groups can tweak columns within sources for their own use without tripping anyone else up. In essence, it is a data de-siloing mechanism for the enterprise.
Open APIs are available for discover and annotate, and Azure Active Directory integration is required to secure the catalog's data sources. One data catalog is available per Azure subscription.
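To make the annotate step concrete, the sketch below builds the kind of JSON body an annotation call might carry -- a description, group tags and expert contacts attached to a registered source. The endpoint URL, catalog name and payload field names are assumptions for illustration, not the documented contract; a real call would also need an Azure Active Directory bearer token.

```python
import json

# One data catalog per Azure subscription.
CATALOG = "DefaultCatalog"  # hypothetical catalog name
ANNOTATE_URL = (
    f"https://api.azuredatacatalog.com/catalogs/{CATALOG}"
    "/views/tables?api-version=2016-03-30"  # hypothetical endpoint shape
)

def build_annotation(description, tags, experts):
    """Build an illustrative JSON body for an annotate request."""
    return {
        "annotations": {
            "schemaDescription": description,
            # Group tagging lets different teams label columns/sources
            # for their own use without tripping anyone else up.
            "tags": [{"tag": t} for t in tags],
            "experts": [{"expert": e} for e in experts],
        }
    }

body = build_annotation(
    description="Nightly sales extract",
    tags=["finance", "curated"],
    experts=["data-team@contoso.com"],
)
print(json.dumps(body, indent=2))
# The actual request would be posted with an AAD token, e.g.:
#   requests.post(ANNOTATE_URL, json=body,
#                 headers={"Authorization": f"Bearer {token}"})
```

The payload-building step is separated from the (commented-out) network call so the shape of an annotation can be inspected and versioned independently of any live catalog.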
Azure Data Factory
Data movement is similarly upgraded in Azure AI, with an enhanced successor to the SQL Server Integration Services (SSIS) product. As with SSIS, the Data Factory automates the transport and transformation of data between sources, creating data pipelines between published sources and points of consumption.
Data Factory improves on traditional methods of transport and transformation of data by changing data movement into a top-down, business-oriented framework, breaking business tasks down into logical groupings of data activity, and building pipelines for those logical groupings. Within the pipelines, data transformation occurs, including anything from data validation and stored procedure execution to complex activities like Hadoop streaming, machine learning processes and Data Lake Analytics.
While this method inherently serves to refine the alignment of IT processes with business goals and objectives, it also serves to force an upfront rethinking of exactly what the logical groupings of data activity need to be from a business point of view. This is far easier said than done because it can require extensive de-siloing and cumbersome accommodation of legacy systems. But it's unavoidable if Data Factory is to be of true benefit.
Finally, there are linked services, which integrate Data Factory processes beyond cloud operations: they represent data stores to external processes and specify the available external compute resources where down-line activities will be executed.
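A Data Factory pipeline of this kind is defined as JSON. The fragment below is an illustrative sketch of one logical grouping -- a single copy activity moving data from Blob storage to a SQL sink; the pipeline, dataset and activity names are placeholders, and the referenced datasets would in turn point to linked services for the actual stores:

```json
{
  "name": "CopySalesPipeline",
  "properties": {
    "description": "Illustrative pipeline: copy sales data from Blob to SQL",
    "activities": [
      {
        "name": "CopyFromBlobToSql",
        "type": "Copy",
        "inputs": [ { "name": "SalesBlobDataset" } ],
        "outputs": [ { "name": "SalesSqlDataset" } ],
        "typeProperties": {
          "source": { "type": "BlobSource" },
          "sink": { "type": "SqlSink" }
        }
      }
    ],
    "start": "2017-01-01T00:00:00Z",
    "end": "2017-01-02T00:00:00Z"
  }
}
```

More complex transformations -- Hadoop streaming, machine learning processes, Data Lake Analytics jobs -- would appear as additional activities within the same pipeline.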