Tutorial: Import data to Azure Cosmos DB
Note
The Azure Cosmos DB Data Migration tool is an open-source tool designed for small
migrations. For larger migrations, see our guide for ingesting data.
● SQL API - You can use any of the source options provided in the Data
Migration tool to import data at a small scale. Learn about migration
options for importing data at a large scale.
● Table API - You can use the Data Migration tool or AzCopy to import data.
For more information, see Import data for use with the Azure Cosmos DB
Table API.
● Azure Cosmos DB's API for MongoDB - The Data Migration tool doesn't
support Azure Cosmos DB's API for MongoDB either as a source or as a
target. If you want to migrate the data in or out of collections in Azure
Cosmos DB, refer to How to migrate MongoDB data to a Cosmos database
with Azure Cosmos DB's API for MongoDB for instructions. You can still use
the Data Migration tool to export data from MongoDB to Azure Cosmos DB
SQL API collections for use with the SQL API.
● Cassandra API - The Data Migration tool isn't a supported import tool for
Cassandra API accounts. Learn about migration options for importing data
into the Cassandra API.
● Gremlin API - The Data Migration tool isn't a supported import tool for
Gremlin API accounts at this time. Learn about migration options for
importing data into the Gremlin API.
Prerequisites
Before following the instructions in this article, ensure that you complete the prerequisite steps.
Important
To make sure that the Data Migration tool uses Transport Layer Security (TLS) 1.2 when
connecting to your Azure Cosmos accounts, use .NET Framework version 4.7 or
follow the instructions found in this article.
Overview
The Data Migration tool is an open-source solution that imports data to Azure Cosmos
DB from a variety of sources, including:
● JSON files
● MongoDB
● SQL Server
● CSV files
● Azure Table storage
● Amazon DynamoDB
● HBase
● Azure Cosmos containers
While the import tool includes a graphical user interface (dtui.exe), it can also be driven
from the command line (dt.exe). In fact, there's an option to output the associated
command after setting up an import through the UI. You can transform tabular source
data, such as SQL Server or CSV files, to create hierarchical relationships
(subdocuments) during import. Keep reading to learn more about source options,
sample commands to import from each source, target options, and viewing import
results.
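All the dt.exe samples in this article share the same general shape: the /s: argument selects the source type (with source-specific /s.<option> settings), and the /t: argument selects the target type (with target-specific /t.<option> settings). Schematically, with placeholder names only:

dt.exe /s:<SourceType> /s.<SourceOption>:<value> /t:<TargetType> /t.<TargetOption>:<value>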
Note
You should only use the Azure Cosmos DB Data Migration tool for small migrations. For large
migrations, see our guide for ingesting data.
Installation
The migration tool source code is available on GitHub in this repository. You can
download and compile the solution locally, or download a pre-compiled binary, then
run either:
● Dtui.exe: Graphical interface version of the tool
● Dt.exe: Command-line version of the tool
The sections that follow describe each of the supported source options, along with the two Azure Cosmos DB target options (bulk import and sequential record import):
● JSON files
● MongoDB
● MongoDB Export files
● SQL Server
● CSV files
● Azure Table storage
● Amazon DynamoDB
● Blob
● Azure Cosmos containers
● HBase
● Azure Cosmos DB bulk import
● Azure Cosmos DB sequential record import
The Azure Cosmos DB connection string has the format AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<AccountKey>;Database=<CosmosDB Database>, where:
● The <CosmosDB Endpoint> is the endpoint URI. You can get this value from
the Azure portal. Navigate to your Azure Cosmos account. Open
the Overview pane and copy the URI value.
● The <AccountKey> is the "Password" or PRIMARY KEY. You can get this value
from the Azure portal. Navigate to your Azure Cosmos account. Open
the Connection Strings or Keys pane, and copy the "Password"
or PRIMARY KEY value.
● The <CosmosDB Database> is the name of the Azure Cosmos DB database.
Example: AccountEndpoint=https://myCosmosDBName.documents.azure.com:443/;AccountKey=wJmFRYna6ttQ79ATmrTMKql8vPri84QBiHTt6oinFkZRvoe7Vv81x9sn6zlVlBY10bEPMgGM982wfYXpWXWB9w==;Database=myDatabaseName
Note
Use the Verify command to ensure that the Cosmos DB account specified in the
connection string field can be accessed.
Here's a command-line sample to import JSON files:
#Import a single JSON file and partition the data across 4 collections
dt.exe /s:JsonFile /s.Files:D:\CompanyData\Companies.json /t:DocumentDBBulk
/t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB
Key>;Database=<CosmosDB Database>;" /t.Collection:comp[1-4] /t.PartitionKey:name
/t.CollectionThroughput:2500
If you're importing to a Cosmos account configured with Azure Cosmos DB's API for
MongoDB, follow these instructions.
With the MongoDB source importer option, you can import from a single MongoDB
collection, optionally filter documents using a query, and modify the document
structure by using a projection.
The connection string is in the standard MongoDB format:
mongodb://<dbuser>:<dbpassword>@<host>:<port>/<database>
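For example, a connection string for a hypothetical MongoDB instance might look like this (host, credentials, and database name are placeholders, not real values):

mongodb://myUser:[email protected]:27017/myDatabase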
Note
Use the Verify command to ensure that the MongoDB instance specified in the
connection string field can be accessed.
Enter the name of the collection from which data will be imported. You may optionally
specify or provide a file for a query, such as {pop: {$gt:5000}}, or a projection, such
as {loc:0}, to both filter and shape the data that you're importing.
Here are some command-line samples to import from MongoDB:
#Import all documents from a MongoDB collection
dt.exe /s:MongoDB
/s.ConnectionString:mongodb://<dbuser>:<dbpassword>@<host>:<port>/<database>
/s.Collection:zips /t:DocumentDBBulk
/t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB
Key>;Database=<CosmosDB Database>;" /t.Collection:BulkZips /t.IdField:_id
/t.CollectionThroughput:2500
#Import documents from a MongoDB collection which match the query and exclude the
loc field
dt.exe /s:MongoDB
/s.ConnectionString:mongodb://<dbuser>:<dbpassword>@<host>:<port>/<database>
/s.Collection:zips /s.Query:{pop:{$gt:50000}} /s.Projection:{loc:0}
/t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB
Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;"
/t.Collection:BulkZipsTransform /t.IdField:_id /t.CollectionThroughput:2500
If you're importing to an Azure Cosmos DB account with support for MongoDB, follow
these instructions.
The MongoDB export JSON file source importer option allows you to import one or
more JSON files produced from the mongoexport utility.
When adding folders that have MongoDB export JSON files for import, you have the
option of recursively searching for files in subfolders.
Here's a command-line sample to import from MongoDB export files:
dt.exe /s:MongoDBExport /s.Files:D:\mongoemployees.json /t:DocumentDBBulk
/t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB
Key>;Database=<CosmosDB Database>;" /t.Collection:employees /t.IdField:_id
/t.Dates:Epoch /t.CollectionThroughput:2500
The format of the connection string is the standard SQL connection string format.
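For example, a hypothetical connection string (substitute your own server, database, and credentials):

Data Source=mysqlserver.example.com;Initial Catalog=AdventureWorks;User Id=advworks;Password=<password>;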
Note
Use the Verify command to ensure that the SQL Server instance specified in the
connection string field can be accessed.
{ "id": "956", "Name": "Finer Sales and Service", "Address": { "AddressType": "Main
Office", "AddressLine1": "#500-75 O'Connor Street", "Location": { "City": "Ottawa",
"StateProvinceName": "Ontario" }, "PostalCode": "K4B 1S2", "CountryRegionName":
"Canada" } }
Here are some command-line samples to import from SQL Server:
#Import records from SQL which match a query
dt.exe /s:SQL /s.ConnectionString:"Data Source=<server>;Initial
Catalog=AdventureWorks;User Id=advworks;Password=<password>;" /s.Query:"select
CAST(BusinessEntityID AS varchar) as Id, * from Sales.vStoreWithAddresses WHERE
AddressType='Main Office'" /t:DocumentDBBulk /t.ConnectionString:"
AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB
Database>;" /t.Collection:Stores /t.IdField:Id /t.CollectionThroughput:2500
#Import records from sql which match a query and create hierarchical relationships
dt.exe /s:SQL /s.ConnectionString:"Data Source=<server>;Initial
Catalog=AdventureWorks;User Id=advworks;Password=<password>;" /s.Query:"select
CAST(BusinessEntityID AS varchar) as Id, Name, AddressType as
[Address.AddressType], AddressLine1 as [Address.AddressLine1], City as
[Address.Location.City], StateProvinceName as
[Address.Location.StateProvinceName], PostalCode as [Address.PostalCode],
CountryRegionName as [Address.CountryRegionName] from Sales.vStoreWithAddresses
WHERE AddressType='Main Office'" /s.NestingSeparator:. /t:DocumentDBBulk
/t.ConnectionString:" AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB
Key>;Database=<CosmosDB Database>;" /t.Collection:StoresSub /t.IdField:Id
/t.CollectionThroughput:2500
The import tool tries to infer type information for unquoted values in CSV files (quoted
values are always treated as strings). Types are identified in the following order:
number, datetime, boolean.
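For example, given a hypothetical Employees.csv like this:

EntityID,Name,Badge,IsActive
1,Jana Novak,"007",true

the row would be imported roughly as the following document: EntityID becomes a number and IsActive a boolean because they're unquoted, while the quoted "007" stays a string (a sketch; exact serialization may vary):

{ "EntityID": 1, "Name": "Jana Novak", "Badge": "007", "IsActive": true }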
Here's a command-line sample to import from a CSV file:
dt.exe /s:CsvFile /s.Files:.\Employees.csv /t:DocumentDBBulk
/t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB
Key>;Database=<CosmosDB Database>;" /t.Collection:Employees /t.IdField:EntityID
/t.CollectionThroughput:2500
You may output data that was imported from Azure Table storage to Azure Cosmos DB
tables and entities for use with the Table API. Imported data can also be output to
collections and documents for use with the SQL API. However, Table API is only
available as a target in the command-line utility. You can't export to Table API by using
the Data Migration tool user interface. For more information, see Import data for use
with the Azure Cosmos DB Table API.
The format of the Azure Table storage connection string is:
DefaultEndpointsProtocol=<protocol>;AccountName=<Account Name>;AccountKey=<Account
Key>;
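For example, for a hypothetical storage account (substitute your own account name and key):

DefaultEndpointsProtocol=https;AccountName=mystorageaccount;AccountKey=<Account Key>;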
Note
Use the Verify command to ensure that the Azure Table storage instance specified in
the connection string field can be accessed.
Enter the name of the Azure table to import from. You may optionally specify
a filter.
The Azure Table storage source importer option also lets you choose which internal fields to include (/s.InternalFields) and which entity properties to project (/s.Projection), as shown in the following sample:
dt.exe /s:AzureTable
/s.ConnectionString:"DefaultEndpointsProtocol=https;AccountName=<Account
Name>;AccountKey=<Account Key>" /s.Table:metrics /s.InternalFields:All
/s.Filter:"PartitionKey eq 'Partition1' and RowKey gt '00001'"
/s.Projection:ObjectCount;ObjectSize /t:DocumentDBBulk /t.ConnectionString:"
AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB
Database>;" /t.Collection:metrics /t.CollectionThroughput:2500
The format of the Amazon DynamoDB connection string is:
ServiceURL=<Service Address>;AccessKey=<Access Key>;SecretKey=<Secret Key>;
Note
Use the Verify command to ensure that the Amazon DynamoDB instance specified in
the connection string field can be accessed.
Here's a command-line sample to import from Amazon DynamoDB:
dt.exe /s:DynamoDB
/s.ConnectionString:ServiceURL=https://dynamodb.us-east-1.amazonaws.com;AccessKey=
<accessKey>;SecretKey=<secretKey> /s.Request:"{ """TableName""":
"""ProductCatalog""" }" /t:DocumentDBBulk
/t.ConnectionString:"AccountEndpoint=<Azure Cosmos DB Endpoint>;AccountKey=<Azure
Cosmos DB Key>;Database=<Azure Cosmos database>;" /t.Collection:catalogCollection
/t.CollectionThroughput:2500
You can also import JSON files stored in Azure Blob storage. The file name accepts a regular expression; the following sample imports every blob in importcontainer:
dt.exe /s:JsonFile /s.Files:"blobs://<account
key>@account.blob.core.windows.net:443/importcontainer/.*" /t:DocumentDBBulk
/t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB
Key>;Database=<CosmosDB Database>;" /t.Collection:doctest
You can retrieve the Azure Cosmos DB account connection string from the Keys page of
the Azure portal, as described in How to manage an Azure Cosmos DB account.
However, the name of the database needs to be appended to the connection string in
the following format:
Database=<CosmosDB Database>;
Note
Use the Verify command to ensure that the Azure Cosmos DB instance specified in the
connection string field can be accessed.
To import from a single Azure Cosmos container, enter the name of the collection to
import data from. To import from more than one Azure Cosmos container, provide a
regular expression to match one or more collection names (for example, collection01 |
collection02 | collection03). You may optionally specify, or provide a file for, a query to
both filter and shape the data that you're importing.
Note
Since the collection field accepts regular expressions, if you're importing from a single
collection whose name has regular expression characters, then those characters must
be escaped accordingly.
The Azure Cosmos DB source importer option also has advanced options, such as the number of retries on failure, the retry interval, and the connection mode.
Tip
The import tool defaults to connection mode DirectTcp. If you experience firewall
issues, switch to connection mode Gateway, as it only requires port 443.
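When driving the tool from the command line, the connection mode can be set alongside the other source options. A sketch follows; it assumes the /s.ConnectionMode option mirrors the UI setting, so confirm the exact option name with the tool's built-in help before relying on it:

#Copy a container using Gateway mode, which only requires port 443 (sketch; /s.ConnectionMode is an assumed option name)
dt.exe /s:DocumentDB /s.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /s.ConnectionMode:Gateway /s.Collection:TEColl /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:TESessions /t.CollectionThroughput:2500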
Here are some command-line samples to import from Azure Cosmos DB:
#Migrate data from one Azure Cosmos container to another Azure Cosmos container
dt.exe /s:DocumentDB /s.ConnectionString:"AccountEndpoint=<CosmosDB
Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;"
/s.Collection:TEColl /t:DocumentDBBulk /t.ConnectionString:"
AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB
Database>;" /t.Collection:TESessions /t.CollectionThroughput:2500
#Migrate data from more than one Azure Cosmos container to a single Azure Cosmos
container
dt.exe /s:DocumentDB /s.ConnectionString:"AccountEndpoint=<CosmosDB
Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;"
/s.Collection:comp1|comp2|comp3|comp4 /t:DocumentDBBulk
/t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB
Key>;Database=<CosmosDB Database>;" /t.Collection:singleCollection
/t.CollectionThroughput:2500
The Azure Cosmos DB Data Import Tool also supports import of data from the Azure
Cosmos DB Emulator. When importing data from a local emulator, set the endpoint
to https://localhost:<port>.
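For example, a connection string for a local emulator might look like the following. This sketch assumes the emulator's default port (8081) and uses its well-known fixed key, which is published in the emulator documentation:

AccountEndpoint=https://localhost:8081/;AccountKey=C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==;Database=myDatabaseName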
The format of the HBase connection string is:
ServiceURL=<server-address>;Username=<username>;Password=<password>
Note
Use the Verify command to ensure that the HBase instance specified in the connection
string field can be accessed.
Here's a command-line sample to import from HBase:
dt.exe /s:HBase
/s.ConnectionString:ServiceURL=<server-address>;Username=<username>;Password=<pass
word> /s.Table:Contacts /t:DocumentDBBulk
/t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB
Key>;Database=<CosmosDB Database>;" /t.Collection:hbaseimport
The Azure Cosmos DB account connection string can be retrieved from the Keys page
of the Azure portal, as described in How to manage an Azure Cosmos DB account.
However, the name of the database needs to be appended to the connection string in
the following format:
Database=<CosmosDB Database>;
Note
Use the Verify command to ensure that the Azure Cosmos DB instance specified in the
connection string field can be accessed.
To import to a single collection, enter the name of the collection to import data into,
and then click the Add button. To import to more than one collection, either enter each
collection name individually or use the following syntax to specify more than one
collection: collection_prefix[start index - end index]. When specifying more than one
collection using this syntax, keep the following guidelines in mind (a sample command follows this list):
1. Only integer range name patterns are supported. For example, specifying
collection[0-3] creates the following collections: collection0, collection1,
collection2, collection3.
2. You can use an abbreviated syntax: collection[3] creates the same set of
collections mentioned in step 1.
3. More than one substitution can be provided. For example, collection[0-1]
[0-9] generates 20 collection names with leading zeros (collection01, ..02,
..03).
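For example, the following sketch (placeholder values throughout, modeled on the JSON file sample earlier) uses the range syntax to partition imported data across collection0 through collection3:

dt.exe /s:JsonFile /s.Files:D:\CompanyData\Companies.json /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:collection[0-3] /t.PartitionKey:name /t.CollectionThroughput:2500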
Once the collection name(s) have been specified, choose the desired throughput of the
collection(s) (400 RUs to 10,000 RUs). For best import performance, choose a higher
throughput. For more information about performance levels, see Performance levels in
Azure Cosmos DB.
Note
The performance throughput setting only applies to collection creation. If the specified
collection already exists, its throughput won't be modified.
When you import to more than one collection, the import tool supports hash-based
sharding. In this scenario, specify the document property you wish to use as the
Partition Key. (If Partition Key is left blank, documents are sharded randomly across the
target collections.)
You may optionally specify which field in the import source should be used as the
Azure Cosmos DB document ID property during the import. If documents don't have
this property, then the import tool generates a GUID as the ID property value.
There are a number of advanced options available during import. First, while the tool
includes a default bulk import stored procedure (BulkInsert.js), you may choose to
specify your own import stored procedure.
Additionally, when importing date types (for example, from SQL Server or MongoDB),
you can choose between three import options: persist dates as strings, as epoch
numbers, or as both (compare the /t.Dates:Epoch option in the MongoDB export sample earlier).
The Azure Cosmos DB Bulk importer has the following additional advanced options:
1. Batch Size: The tool defaults to a batch size of 50. If the documents to be
imported are large, consider lowering the batch size. Conversely, if the
documents to be imported are small, consider raising the batch size.
2. Max Script Size (bytes): The tool defaults to a max script size of 512 KB.
3. Disable Automatic Id Generation: If every document to be imported has an
ID field, then selecting this option can increase performance. Documents
missing a unique ID field aren't imported.
4. Update Existing Documents: The tool defaults to not replacing existing
documents with ID conflicts. Selecting this option allows overwriting
existing documents with matching IDs. This feature is useful for scheduled
data migrations that update existing documents.
5. Number of Retries on Failure: Specifies how many times to retry the
connection to Azure Cosmos DB during transient failures (for example,
network connectivity interruption).
6. Retry Interval: Specifies how long to wait between retrying the connection
to Azure Cosmos DB in case of transient failures (for example, network
connectivity interruption).
7. Connection Mode: Specifies the connection mode to use with Azure
Cosmos DB. The available choices are DirectTcp, DirectHttps, and
Gateway. The direct connection modes are faster, while the gateway mode
is more firewall friendly as it only uses port 443.
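Most of these settings also have command-line equivalents. The following sketch assumes the /t.BatchSize and /t.UpdateExisting option names, which mirror the UI settings above; consult the tool's built-in help for the authoritative option list:

#A sketch: bulk import with a smaller batch size, overwriting documents that have matching IDs (assumed option names)
dt.exe /s:JsonFile /s.Files:D:\CompanyData\Companies.json /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:comp /t.BatchSize:10 /t.UpdateExisting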
Tip
The import tool defaults to connection mode DirectTcp. If you experience firewall
issues, switch to connection mode Gateway, as it only requires port 443.
You can retrieve the connection string for the Azure Cosmos DB account from the Keys
page of the Azure portal, as described in How to manage an Azure Cosmos DB account.
However, the name of the database needs to be appended to the connection string in
the following format:
Note
Use the Verify command to ensure that the Azure Cosmos DB instance specified in the
connection string field can be accessed.
To import to a single collection, enter the name of the collection to import data into,
and then click the Add button. To import to more than one collection, enter each
collection name individually. You may also use the following syntax to specify more
than one collection: collection_prefix[start index - end index]. When specifying more
than one collection via the aforementioned syntax, keep the following guidelines in
mind:
1. Only integer range name patterns are supported. For example, specifying
collection[0-3] creates the following collections: collection0, collection1,
collection2, collection3.
2. You can use an abbreviated syntax: collection[3] creates the same set of
collections mentioned in step 1.
3. More than one substitution can be provided. For example, collection[0-1]
[0-9] creates 20 collection names with leading zeros (collection01, ..02,
..03).
Once the collection name(s) have been specified, choose the desired throughput of the
collection(s) (400 RUs to 250,000 RUs). For best import performance, choose a higher
throughput. For more information about performance levels, see Performance levels in
Azure Cosmos DB. Any import to collections with throughput >10,000 RUs requires a
partition key. If you choose to have more than 250,000 RUs, you need to file a request
in the portal to have your account's throughput limit increased.
Note
The throughput setting only applies to collection or database creation. If the specified
collection already exists, its throughput won't be modified.
When importing to more than one collection, the import tool supports hash-based
sharding. In this scenario, specify the document property you wish to use as the
Partition Key. (If Partition Key is left blank, documents are sharded randomly across the
target collections.)
You may optionally specify which field in the import source should be used as the
Azure Cosmos DB document ID property during the import. (If documents don't have
this property, then the import tool generates a GUID as the ID property value.)
There are a number of advanced options available during import. First, when importing
date types (for example, from SQL Server or MongoDB), you can choose between
three import options: persist dates as strings, as epoch numbers, or as both.
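With the combined option, the tool stores a subdocument carrying both representations. A sketch of what such a field might look like (field name and values are illustrative):

{ "date_joined": { "Value": "2013-10-21T21:17:25.2410000Z", "Epoch": 1382390245 } }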
The import tool defaults to connection mode DirectTcp. If you experience firewall
issues, switch to connection mode Gateway, as it only requires port 443.
You can also specify the indexing policy of the collections to be created. The following indexing policy options are available:
● Default. This policy is best when you perform equality queries against
strings. It also works if you use ORDER BY, range, and equality queries for
numbers. This policy has a lower index storage overhead than Range.
● Range. This policy is best when you use ORDER BY, range, and equality
queries on both numbers and strings. This policy has a higher index
storage overhead than Default or Hash.
Note
If you don't specify an indexing policy, then the default policy is applied. For more
information about indexing policies, see Azure Cosmos DB indexing policies.
When you export to a JSON file, you can choose whether to prettify the resulting JSON. Prettifying increases the size of the document while making the contents more human readable. For example, here's a compact export:
[{"id":"Sample","Title":"About
Paris","Language":{"Name":"English"},"Author":{"Name":"Don","Location"
:{"City":"Paris","Country":"France"}},"Content":"Don's document in
Azure Cosmos DB is a valid JSON document as defined by the JSON
spec.","PageViews":10000,"Topics":[{"Title":"History of
Paris"},{"Title":"Places to see in Paris"}]}]
And here's the same document, prettified:
[
{
"id": "Sample",
"Title": "About Paris",
"Language": {
"Name": "English"
},
"Author": {
"Name": "Don",
"Location": {
"City": "Paris",
"Country": "France"
}
},
"Content": "Don's document in Azure Cosmos DB is a valid JSON
document as defined by the JSON spec.",
"PageViews": 10000,
"Topics": [
{
"Title": "History of Paris"
},
{
"Title": "Places to see in Paris"
}
]
}]
Here is a command-line sample to export the JSON file to Azure Blob storage:
dt.exe /ErrorDetails:All /s:DocumentDB
/s.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB
Key>;Database=<CosmosDB database_name>" /s.Collection:<CosmosDB collection_name>
/t:JsonFile /t.File:"blobs://<Storage account key>@<Storage account
name>.blob.core.windows.net:443/<Container_name>/<Blob_name>"
/t.Overwrite
Advanced configuration
In the Advanced configuration screen, specify the location of the log file to which you
would like any errors written. The following rules apply to this page:
1. If a file name isn't provided, then all errors are returned on the Results
page.
2. If a file name is provided without a directory, then the file is created (or
overwritten) in the current environment directory.
3. If you select an existing file, then the file is overwritten; there's no append
option.
Then, choose whether to log all, critical, or no error messages. Finally, decide how
frequently the on-screen transfer message is updated with its progress.
Confirm import settings and view command line
1. After you specify the source information, target information, and advanced
configuration, review the migration summary and view or copy the
resulting migration command if you want. (Copying the command is useful
to automate import operations.)
2. Once you’re satisfied with your source and target options, click Import.
The elapsed time, transferred count, and failure information (if you didn't
provide a file name in the Advanced configuration) update as the import is
in progress. Once complete, you can export the results (for example, to
deal with any import failures).
3. You may also start a new import by either resetting all values or keeping
the existing settings. (For example, you may choose to keep connection
string information, source and target choice, and more.)