Tuesday, 30 October 2018

Importing data to Azure Cosmos DB


Importing data to Azure Cosmos DB using Data migration tool


In this section, we will import our data from SQL server to Cosmos DB on Azure. We are using SQL API in our demonstration.

I am going to use AdventureWorks2016 sample database, will import this database to cosmos DB using Data migration tool. You can download this database from the GitHub repository from the following URL:


On download, we get the .bak file, which has to be moved in our MSSQL Backup folder as below:


Step -1 Restore Database

Now I have restored this database to my on-prem SQL server database. I did this by using restore and move SQL query. You can do this by restore database wizard also.

So, I have AdventureWorks database available in my database list and I am querying some data to be imported on cosmos. You can choose the data as per your business need.

Here we see several tables and views of AdventureWorks database, I am going to use View “vStoreWithAddresses




You might have noticed that the names of the views and tables are in dot notation where first part denotes the primary entity name.

For example, "vSalesPerson" is prefixed by Sales Entity. And therefore, the full name is [Sales.vSalesPerson]

This view joins many tables and returns 712 rows on selecting.



Similarly, if we fetch those addresses where adresstype = ‘Main office’, we will get 701 rows. But we need to modify this query in dot notation so that we can use partition key in cosmos DB to display the denormalized data in JSON format.



USE [AdventureWorks2016]

GO

SELECT CAST([BusinessEntityID] as varchar) as [id]
      ,[Name] as [name]
      ,[AddressType] as [address.addressType]
      ,[AddressLine1] as [address.addressLine1]
      ,[AddressLine2] as [address.addressLine2]
      ,[City] as [address.location.city]
      ,[StateProvinceName] as [address.location.stateProvinceName]
      ,[PostalCode] as [address.postalCode]
      ,[CountryRegionName] as [address.countryRegionName]
  FROM [Sales].[vStoreWithAddresses]
  WHERE [AddressType] = 'Main Office'
GO

In above query Address Type, line1, Line2, postalCode, countryRegionName are part of the Address and therefore prefixed by address, and City and StateProvineName are part of the Location.
We will import this to cosmos Db and will see how cosmos DB leverages this notation and gives us the nested format of data.

So as the result of above query we get 701 filtered rows as below:


Step-2 Download the Document DB data Migration tool


Data migration tool can be downloaded from Microsoft download center. Below is the URL:



You will get a compressed folder in your download file location. Extract it and run the dtui.exe for GUI based installation.

Step -3 Run the Document DB Data Migration Tool

Once you extract the files, you will see two executables files dt.exe and dtui, select dtui.exe for windows using wizard as below-


Double click on this setup and start the configuration settings.
It will pop up a dialog box, click on next, and we select SQL from “import from” dropdown because our data source is SQL . We enter a dot(.) as the nested separator. because we want to use it to determine the nested data. Here .(dot) will work as a delimiter for us.

We click next, and on the next screen, we have to enter the connection string to connect with our SQL server.
I am trying to connect with AdventureWorks2016 database from my local SQL server instance. You can try with any database you want to query with cosmos DB.

My connection string is data source=.;initial catalog=AdventureWorks2016;integrated security=true
Now copy paste the select view query as below:


Click on Next and in the next screen, we will enter the details of our cosmos DB account
Go to your cosmos DB account on Azure portal. Search Key and copy the primary Key and paste in the data migration tool.



Copy paste this connection string in Data migration tool and remember to append this connection string with the database name. Because this string points to your cosmos DB account, not the database. So, you need to specify the database name explicitly in which you want to import the data.
Here I am adding my connection string with Database=mydb;
This DB does not exist, so Cosmos DB creates it for us. Similarly, I am entering mystore for the collection. This collection does not exist so it will be generated by cosmos db .

We specify a partition key as :  /address/postalCode


If you expand Advanced Options, you will see a Text area to enter index policy.
Right click on it and you will see two options – default and Range. We will use Default for now.


Click Next few screens and Click on Import, the tool will start importing the result of the query to the Cosmos DB in JSON document format.


Now go to your Cosmos DB account and refresh it. In the overview blade, we can see a new collection.


clicking on this collection we can see our data imported in the form of documents.


By clicking on every document, you can see the entire JSON formatted document on right side area, you can easily observe the structure with nested Address and Location properties and this is because we used dot limiter for our column alias in our sql server view.
For example, location is further nested into city and stateprovincename. This looks similar to a normalized separate table in sql, but here denormalized data managed in the form of nested elements, as we can see in below example.



So, all set up now, we get successfully imported the SQL data to cosmos DB.
Now we are ready to query with SQL API.
Happy Learning with CosmosDB !! 😊

Tuesday, 23 October 2018

An Introduction to Azure Cosmos DB and SQL API



Azure Cosmos DB

What is Cosmos DB

An Azure cosmos DB is a document-oriented and a non-relational database. It does not model data in tabular format instead the data resides in a document form which is a JSON format. So unlike SQL server, the data in cosmos DB does not reside in table-column structure.
It extends Azure DocumentDB which was released in 2014. it is schema-less and generally classified as a NoSQL database. Azure Cosmos DB is Microsoft’s proprietary globally-distributed, multi-model database service "for managing data at planet-scale" launched in May 2017. It uses query language similar to SQL and supports ACID property of transactions.

Benefits of Azure cosmos DB

There are several benefits of Azure cosmos DB, most of them are written below -
  • Turnkey global distribution-replication across any number of Azure regions
  • High Availability and responsiveness and mission-critical application
  • Multiple data model and multi API support including graph and table API
  • On-demand storage( In any region worldwide) as per the size requirement
  • Elastically and independently scale throughput (easily scale DB throughput at a per second granularity)
  • Index management
  • No worry about the database schema
  • Latency guarantees (90% of read in less than 10 milliseconds)
  • Tunable consistency with the ACID-compliant transaction
  • Backup every four hour
  • Azure active directory support
  • High scalability0 Horizontal partitioning
  • .Net SDK supported libraries
  • Local cosmos DB emulator 

STEP 1: Create a Cosmos DB account on Azure

Go to your Azure portal, and Under All services, search Cosmos DB:




As you type cosmos, you will see “Azure Cosmos DB” , click on it.
In the next screen click on +Add button:



Add button opens a new create azure account blade as below.

Enter the subscription, resource group, a unique account name, API, Location etc. details.
Cosmos DB has compatibility with several different APIs like SQL, Mongo DB, Cassandra etc. Here SQL API is the native cosmos DB language.

I am selecting SQL as an API for my cosmos DB account. I am not enabling Geo-Redundancy because this is something we consider for a real production environment.

And click on Create-



Note: Step 1 will not ask for a pricing plan, because setting an account is just like setting a server. We don’t pay for the server, but we pay for the database inside this server.

So, here in below screen, you can see, I have created a new Cosmos DB Account “anucosmosdb” in West Europe location, which is already highlighted in blue in selected region map view in the Overview Blade.



STEP 2: Create Collections inside this cosmos Account


Now we have to create a collection, and for this setup, we have to pay to Microsoft.
A cosmos DB may have several databases, and each database may have one or more collections, and collection holds by a number of documents.





Add Collection

Click on Add Collection button under the Overview blade of your cosmosDB.

This will open a new Blade in the right aide area:

I named my database id as cosmosDB1 and to the collection as cosmosCollection1. You can give it any name.


Storage Capacity

  •      Fixed capacity – Choose this option if you do not expect to grow your data in the future or you are planning to do partitioning your own. It has one partition and it can’t grow dynamically.
  •      Unlimited Capacity – Cosmos DB automatically handles the partitioning server side(on Azure),  and it is fully simulated on the local emulator.

Throughput

This is a request Unit (RU of throughput. This is a performance level which you want this collection to be delivered. The minimum throughput value is 400. Lower value delivers lower cost and lower performance and higher value deliver higher cost and high performance.

Partition Key

This is used to group multiple documents together within the physical partition.

Here I have created a collection cosmosCollection1.
Now click on that collection and then expand this collection. Let’s create a new document by clicking on “New Document” button as below screen :



New Document will show an area, where I can type in my first document data in JSON. Here each document needs a unique Id, let’s start with ID and other data. If we leave this part, cosmos DB generates a GUID key for this document.
I am using a sample data of library management system and clicked on save.



As we can see a  unique GUID which is generated by cosmosDB.
Cosmos DB will automatically assign an ID property to a GUID if you don't supply one yourself but in addition, Cosmos DB includes a few more auto-generated properties like _rid, _self, _etag, _ts etc.

  •         _rid : This is a unique ID that you cannot set and it never changes. So, every resource really has two IDs, the regular ID which you can set and even modify, and the resource ID which is always set automatically by Cosmos DB and is immutable. It won't ever change.
  •       _ts : It holds the date and time that the resource was last updated.
  •        _etag : This is used internally by Cosmos DB for optimistic concurrency checks. The etag is a binary value that always changes whenever the resource changes and so Cosmos DB can tell when one user is going to overwrite changes by another user by detecting a change in the etag. When this happens, the second user's update gets rejected which protects against overwriting the first user's changes.
  •         _self : The self-property represents the full path of the resource, expresses a hierarchy of nested resource IDs.
  •         _attachments : This property which when appended to the self-link, points to the attachments for the resource.
Now I have created one more document with the same data, but I added one property Author in the last. And We can see, cosmosDB created a new document with newly added guid.




So now we can see there are two documents added with two different unique guid IDs.(See on the left side area)

Now we can query using SQL, here c is an alias which belongs to our collection. And in Result, we get one matching document. 
below is our Select statement to see how it works with where clause:



In above example, we fetched the document using where clause by ISBN number. And in the result, we got a matching document.
This was the basic idea of how collection and SQL query works. But in general, we do not write our documents like this. This editor sometimes can be useful for updating a little piece of code.

API Used: Here in the above example we used SQL API, which treats entities as JSON documents and these are hierarchical text document formatted in JSON.
There are other APIs too, we will learn them in detail.

Performance measuring features in CosmosDB


Latency

Latency means wait time or the time of getting a response to our request.
Response for a given request is faster in cosmos DB because data is distributed globally. Cosmos DB supports a range of consistency level which provides control over latency. Consistency means the data you are reading any point of time is the latest or not.
These consistency levels are :


How to set consistency level in Azure cosmos DB

By default, “Session” consistency has been set on Azure cosmos DB, but as per your database need you can overwrite it by clicking on “Default Consistency” link. You can do this change from .net SDK also by choosing the overloaded document class constructor which accepts consistency level enum. From the Azure portal end, you will see the following screen where you can overwrite it -

Throughput

Throughput means how many requests can be served within a specific period of time. It ensures how cosmos DB handles concurrent database requests. Its unit is RU. RU depends upon how many numbers of request units per second that you asked for and pay for when you create a data container in cosmos DB.

Request Units

Request Unit (RU) can be said as a throughput currency because it’s collectively measures of computational cost for example CPU, memory, disk, I/O. network I/O etc.
Note: Request-Units are not request but rather all requests get charged in the form of request units. All requests are not equal, because all requests have different computational cost. 
For example : Write is always cost more than Read, running a query on indexed data is always cost less than query over non-indexed data.

But they are deterministic if you run the same request again and again because they require the same number of RUs every time.
Request charges are very transparent. For example in our previous query we can see RUs by clicking on Query stats tab :


So we can see here, the RU cost is 2.96RUs.

Multi-model, multi-API support

Azure cosmos DB natively supports multiple data models including Document(JSON/BSON), graph, table, column-family, but the core content-model of database engine based on is ARS which is atom record sequence.

Difference between a Relational database and Document DB


Relational database
Document database
Tabular format. Rows
Documents in JSON format
Column
properties
Strongly typed schema
Schema-free. No pre-defined schema
Highly normalized. No column duplication. Data retrieval from multiple tables by using joins.
Typically denormalized. No need for ORM nor multiple joins. JSON document uses an array of embedded objects.


Cosmos DB resource Model

Documents in the cosmos DB are managed by a well-defined hierarchical structure of resources. These resources are highly available while allowing us to address each resource by a unique URI. We can understand this resource hierarchy by the following diagram :



At the top level, there is Account. Account’s name is used to compose the URI to access all child resources. After that, there is a Database, which is nothing but a logical container for collections. One or more container can reside inside a single database. After that, there is a collection, which holds a number of related documents. It’s a level where you select a pricing tier. After this, there is Document in the hierarchy which is a single entity represented by JSON.

Resource Granularity

Databases can contain Users who access to resources in the database. Each user can also contain Permissions that can restrict access to specific resources or operations. For example, you can restrict a user for read-only access to a collection.
A collection stores various programmatic logic in form of stored procedure, functions, triggers more like a SQL which you can reuse throughout various queries. But they are scoped only within the partition key.

Different APIs

Azure Cosmos DB can be accessed using four different APIs.
  • Azure Cosmos DB (SQL) API
  • MongoDB API
  • Graph (Gremlin) API
  • Tables (Key/Value) API



SQL API

What is SQL API

The SQL API in Azure Cosmos DB is a JavaScript and JSON native API based on the DocumentDB database engine. The SQL API also provides a query capability which is rooted in the structured query language. 

But SQL is designed for relational databases, and Cosmos DB is a NoSQL database. So, what is the difference?
Well, they adapted the sql syntax for cosmos DB but rooting this SQL into JavaScript and JSON. The SQL API supports the execution of JavaScript logic within the database in the form of stored procedures, triggers, and user-defined functions. JavaScript logic can be executed in a transactional manner directly within the database engine.

The language still looks like a familiar SQL, but the semantics are all based on hierarchical JSON documents rather than relational tables, and you work with the JavaScript data types rather than SQL datatypes. Expressions are evaluated as JavaScript expression rather than T-SQL expression. If you're using the .NET SDK, then there's also a LINQ provider that can translate language-integrated queries that you write in C# into Cosmos DB SQL.
At the end, you will see several keywords from SQL, but JavaScript keywords are very limited.

How SQL works in denormalized database world of cosmos DB?

In cosmos DB , we don’t deal with row-column based structure of data, but we deal with semi-structured hierarchical documents with nested arrays and child objects. Here we use dotted notation in our queries to navigate a path to any property no matter how deeply nested it may be within the document.

REST API

Azure Cosmos DB exposes resources through the REST APIs that can be called by any language capable of making HTTP/HTTPS requests. The basic HTTP methods used in REST can also be used to access resources in Azure Cosmos DB.
The SQL API provides a rich SQL query interface for retrieving JSON documents and server-side support for transactional execution of JavaScript application logic. The query and script execution interfaces are available through all platform libraries as well as the REST APIs.
Microsoft offers programming libraries for several popular languages for the SQL API. Currently available libraries are :
  • .Net SDK
  • Java SDK
  • JavaScript SDK
  • Python SDK


Accessing Resources from REST-based URIs Concept

As described previously (See figure-2 ), each resource in the hierarchy can be accessed using a URI, its concept is based on REST API design. We can understand this URI concept by the following diagram :


Above examples shows how these URIs work:


Note: Here we took sample IDs, but the ID of a resource is not based on the name you give in the portal or using a script. It is uniquely generated by the Azure Cosmos DB system.

Here in the above example, you may have noticed that the IDs used by Azure Cosmos DB “build” upon each other. For example, the Collection id begins with the Database id as a prefix. 


Hey Viewers, I am ending this introductory section here, In my next blog I will come up with a more technical approach and with querying and programming using SQL API for Azure Cosmos DB.

Have a great learning with Azure Cosmos DB basics..!!


My Article: Introduction to Azure Cosmos DB






Import data to Azure Cosmos DB using Azure DocumentDB Data Migration Tool

Today we will learn how to import data from our SQL Server database to Cosmos DB on Azure. Click on  My Blog   link to read my article, wher...