Azure Cosmos DB
An Azure cosmos DB is a document-oriented and a non-relational database. It does not model data in tabular format instead the data resides in a document form which is a JSON format. So unlike SQL server, the data in cosmos DB does not reside in table-column structure.
It extends Azure DocumentDB which was released in 2014. it is schema-less and generally classified as a NoSQL database. Azure Cosmos DB is Microsoft’s proprietary globally-distributed, multi-model database service "for managing data at planet-scale" launched in May 2017. It uses query language similar to SQL and supports ACID property of transactions.
Benefits of Azure cosmos DB
There are several benefits of Azure cosmos DB, most of them are written below -
- Turnkey global distribution-replication across any number of Azure regions
- High Availability and responsiveness and mission-critical application
- Multiple data model and multi API support including graph and table API
- On-demand storage( In any region worldwide) as per the size requirement
- Elastically and independently scale throughput (easily scale DB throughput at a per second granularity)
- Index management
- No worry about the database schema
- Latency guarantees (90% of read in less than 10 milliseconds)
- Tunable consistency with the ACID-compliant transaction
- Backup every four hour
- Azure active directory support
- High scalability0 Horizontal partitioning
- .Net SDK supported libraries
- Local cosmos DB emulator
In the next screen click on +Add button:
Add button opens a new create azure account blade as below.
Enter the subscription, resource group, a unique account name, API, Location etc. details.
Cosmos DB has compatibility with several different APIs like SQL, Mongo DB, Cassandra etc. Here SQL API is the native cosmos DB language.
I am selecting SQL as an API for my cosmos DB account. I am not enabling Geo-Redundancy because this is something we consider for a real production environment.
And click on Create-
Note: Step 1 will not ask for a pricing plan, because setting an account is just like setting a server. We don’t pay for the server, but we pay for the database inside this server.
So, here in below screen, you can see, I have created a new Cosmos DB Account “anucosmosdb” in West Europe location, which is already highlighted in blue in selected region map view in the Overview Blade.
STEP 2: Create Collections inside this cosmos Account
Now we have to create a collection, and for this setup, we have to pay to Microsoft.
A cosmos DB may have several databases, and each database may have one or more collections, and collection holds by a number of documents.
Add Collection
Click on Add Collection button under the Overview blade of your cosmosDB.
This will open a new Blade in the right aide area:
I named my database id as cosmosDB1 and to the collection as cosmosCollection1. You can give it any name.
Storage Capacity
- Unlimited Capacity – Cosmos DB automatically handles the partitioning server side(on Azure), and it is fully simulated on the local emulator.
Throughput
This is a request Unit (RU of throughput. This is a performance level which you want this collection to be delivered. The minimum throughput value is 400. Lower value delivers lower cost and lower performance and higher value deliver higher cost and high performance.
Partition Key
This is used to group multiple documents together within the physical partition.
Here I have created a collection cosmosCollection1.
Now click on that collection and then expand this collection. Let’s create a new document by clicking on “New Document” button as below screen :
New Document will show an area, where I can type in my first document data in JSON. Here each document needs a unique Id, let’s start with ID and other data. If we leave this part, cosmos DB generates a GUID key for this document.
I am using a sample data of library management system and clicked on save.
Cosmos DB will automatically assign an ID property to a GUID if you don't supply one yourself but in addition, Cosmos DB includes a few more auto-generated properties like _rid, _self, _etag, _ts etc.
- _rid : This is a unique ID that you cannot set and it never changes. So, every resource really has two IDs, the regular ID which you can set and even modify, and the resource ID which is always set automatically by Cosmos DB and is immutable. It won't ever change.
- _ts : It holds the date and time that the resource was last updated.
- _etag : This is used internally by Cosmos DB for optimistic concurrency checks. The etag is a binary value that always changes whenever the resource changes and so Cosmos DB can tell when one user is going to overwrite changes by another user by detecting a change in the etag. When this happens, the second user's update gets rejected which protects against overwriting the first user's changes.
- _self : The self-property represents the full path of the resource, expresses a hierarchy of nested resource IDs.
- _attachments : This property which when appended to the self-link, points to the attachments for the resource.
Now I have created one more document with the same data, but I added one property Author in the last. And We can see, cosmosDB created a new document with newly added guid.
So now we can see there are two documents added with two different unique guid IDs.(See on the left side area)
Now we can query using SQL, here c is an alias which belongs to our collection. And in Result, we get one matching document.
below is our Select statement to see how it works with where clause:
In above example, we fetched the document using where clause by ISBN number. And in the result, we got a matching document.
This was the basic idea of how collection and SQL query works. But in general, we do not write our documents like this. This editor sometimes can be useful for updating a little piece of code.
API Used: Here in the above example we used SQL API, which treats entities as JSON documents and these are hierarchical text document formatted in JSON.
There are other APIs too, we will learn them in detail.
Performance measuring features in CosmosDB
Latency
Latency means wait time or the time of getting a response to our request.
Response for a given request is faster in cosmos DB because data is distributed globally. Cosmos DB supports a range of consistency level which provides control over latency. Consistency means the data you are reading any point of time is the latest or not.
How to set consistency level in Azure cosmos DB
By default, “Session” consistency has been set on Azure cosmos DB, but as per your database need you can overwrite it by clicking on “Default Consistency” link. You can do this change from .net SDK also by choosing the overloaded document class constructor which accepts consistency level enum. From the Azure portal end, you will see the following screen where you can overwrite it -
Throughput
Throughput means how many requests can be served within a specific period of time. It ensures how cosmos DB handles concurrent database requests. Its unit is RU. RU depends upon how many numbers of request units per second that you asked for and pay for when you create a data container in cosmos DB.
Request Units
Request Unit (RU) can be said as a throughput currency because it’s collectively measures of computational cost for example CPU, memory, disk, I/O. network I/O etc.
Note: Request-Units are not request but rather all requests get charged in the form of request units. All requests are not equal, because all requests have different computational cost.
For example : Write is always cost more than Read, running a query on indexed data is always cost less than query over non-indexed data.
But they are deterministic if you run the same request again and again because they require the same number of RUs every time.
Request charges are very transparent. For example in our previous query we can see RUs by clicking on Query stats tab :
So we can see here, the RU cost is 2.96RUs.
Multi-model, multi-API support
Azure cosmos DB natively supports multiple data models including Document(JSON/BSON), graph, table, column-family, but the core content-model of database engine based on is ARS which is atom record sequence.
Difference between a Relational database and Document DB
Relational database
|
Document database
|
Tabular format. Rows
|
Documents in JSON format
|
Column
|
properties
|
Strongly typed schema
|
Schema-free. No pre-defined schema
|
Highly normalized. No column duplication. Data retrieval from multiple tables by using joins.
|
Typically denormalized. No need for ORM nor multiple joins. JSON document uses an array of embedded objects.
|
Cosmos DB resource Model
Documents in the cosmos DB are managed by a well-defined hierarchical structure of resources. These resources are highly available while allowing us to address each resource by a unique URI. We can understand this resource hierarchy by the following diagram :
At the top level, there is Account. Account’s name is used to compose the URI to access all child resources. After that, there is a Database, which is nothing but a logical container for collections. One or more container can reside inside a single database. After that, there is a collection, which holds a number of related documents. It’s a level where you select a pricing tier. After this, there is Document in the hierarchy which is a single entity represented by JSON.
Resource Granularity
Databases can contain Users who access to resources in the database. Each user can also contain Permissions that can restrict access to specific resources or operations. For example, you can restrict a user for read-only access to a collection.
A collection stores various programmatic logic in form of stored procedure, functions, triggers more like a SQL which you can reuse throughout various queries. But they are scoped only within the partition key.
Different APIs
Azure Cosmos DB can be accessed using four different APIs.
- Azure Cosmos DB (SQL) API
- MongoDB API
- Graph (Gremlin) API
- Tables (Key/Value) API
SQL API
What is SQL API
The SQL API in Azure Cosmos DB is a JavaScript and JSON native API based on the DocumentDB database engine. The SQL API also provides a query capability which is rooted in the structured query language.
But SQL is designed for relational databases, and Cosmos DB is a NoSQL database. So, what is the difference?
Well, they adapted the sql syntax for cosmos DB but rooting this SQL into JavaScript and JSON. The SQL API supports the execution of JavaScript logic within the database in the form of stored procedures, triggers, and user-defined functions. JavaScript logic can be executed in a transactional manner directly within the database engine.
The language still looks like a familiar SQL, but the semantics are all based on hierarchical JSON documents rather than relational tables, and you work with the JavaScript data types rather than SQL datatypes. Expressions are evaluated as JavaScript expression rather than T-SQL expression. If you're using the .NET SDK, then there's also a LINQ provider that can translate language-integrated queries that you write in C# into Cosmos DB SQL.
At the end, you will see several keywords from SQL, but JavaScript keywords are very limited.
How SQL works in denormalized database world of cosmos DB?
In cosmos DB , we don’t deal with row-column based structure of data, but we deal with semi-structured hierarchical documents with nested arrays and child objects. Here we use dotted notation in our queries to navigate a path to any property no matter how deeply nested it may be within the document.
REST API
Azure Cosmos DB exposes resources through the REST APIs that can be called by any language capable of making HTTP/HTTPS requests. The basic HTTP methods used in REST can also be used to access resources in Azure Cosmos DB.
The SQL API provides a rich SQL query interface for retrieving JSON documents and server-side support for transactional execution of JavaScript application logic. The query and script execution interfaces are available through all platform libraries as well as the REST APIs.
Microsoft offers programming libraries for several popular languages for the SQL API. Currently available libraries are :
Accessing Resources from REST-based URIs Concept
- .Net SDK
- Java SDK
- JavaScript SDK
- Python SDK
Accessing Resources from REST-based URIs Concept
As described previously (See figure-2 ), each resource in the hierarchy can be accessed using a URI, its concept is based on REST API design. We can understand this URI concept by the following diagram :
Above examples shows how these URIs work:
Note: Here we took sample IDs, but the ID of a resource is not based on the name you give in the portal or using a script. It is uniquely generated by the Azure Cosmos DB system.
Here in the above example, you may have noticed that the IDs used by Azure Cosmos DB “build” upon each other. For example, the Collection id begins with the Database id as a prefix.
Hey Viewers, I am ending this introductory section here, In my next blog I will come up with a more technical approach and with querying and programming using SQL API for Azure Cosmos DB.
Have a great learning with Azure Cosmos DB basics..!!
My Article: Introduction to Azure Cosmos DB
Wonderful ..... Helpful blog
ReplyDelete