The Cassandra Data Model
For developers new to Cassandra and coming from a relational database background, the data model can be a bit confusing. The following section provides a comparison of the two.
In Cassandra, the keyspace is the container for your application data, similar to a database or schema in a relational database. Inside the keyspace are one or more column family objects, which are analogous to tables. Column families contain columns, and a set of related columns is identified by an application-supplied row key. Rows in a column family are not required to have the same set of columns.

Cassandra does not enforce relationships between column families the way that relational databases do between tables: there are no formal foreign keys in Cassandra, and joining column families at query time is not supported. Each column family has a self-contained set of columns that are intended to be accessed together to satisfy specific queries from your application.

For example, in the blog application example, you might have a column family for user data and one for blog entries, similar to the relational model. You would then design additional column families (or add secondary indexes) to support the specific queries your application needs to perform, such as "Which users subscribe to my blog?", "Show me all of the blog entries about fashion," and "Show me the most recent entries for the blogs I subscribe to." Keep in mind that some denormalization of data is usually required.
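The structure described above can be pictured as nested maps. The following Python sketch is an illustration only, not Cassandra's storage engine or a client API: plain dicts stand in for the keyspace, its column families, and their rows, and every name in it (`users`, `entries`, `entries_by_tag`, the row keys, `add_entry`, `entries_about`) is hypothetical. It shows rows with differing columns, and a denormalized column family that is written at insert time so that one of the queries above can be answered from a single row without a join.

```python
# Minimal in-memory model: keyspace -> column family -> row key -> columns.
# All names ("users", "entries", "entries_by_tag", row keys) are hypothetical.

keyspace = {"users": {}, "entries": {}, "entries_by_tag": {}}

# Rows in the same column family need not share the same set of columns.
keyspace["users"]["jsmith"] = {"name": "John Smith", "email": "jsmith@example.com"}
keyspace["users"]["alee"] = {"name": "Ann Lee", "blog": "Ann's Style Notes"}


def add_entry(ks, entry_id, author, title, tag):
    """Write an entry to its own column family and, denormalized, to entries_by_tag.

    There are no joins at read time, so the data needed by the
    "entries about <tag>" query is duplicated at write time instead.
    """
    ks["entries"][entry_id] = {"author": author, "title": title, "tag": tag}
    # Row key = tag; column name = entry ID; column value = copied title.
    ks["entries_by_tag"].setdefault(tag, {})[entry_id] = title


def entries_about(ks, tag):
    """Answer the query from a single row, reading columns in name order."""
    row = ks["entries_by_tag"].get(tag, {})
    return [row[col] for col in sorted(row)]


add_entry(keyspace, "entry-001", "alee", "Spring trends", "fashion")
add_entry(keyspace, "entry-002", "jsmith", "Tuning JVM heaps", "tech")
add_entry(keyspace, "entry-003", "alee", "Vintage finds", "fashion")

print(entries_about(keyspace, "fashion"))  # ['Spring trends', 'Vintage finds']
```

The cost of this design is visible in `add_entry`: each title is stored twice, which is the kind of denormalization the text warns is usually required.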
About Keyspaces
In Cassandra, the keyspace is the container for your application data, similar to a schema in a relational database. Keyspaces are used to group column families together. Typically, a cluster has one keyspace per application. Replication is controlled on a per-keyspace basis, so data that has different replication requirements should reside in different keyspaces. Keyspaces are not designed to serve as a significant mapping layer within the data model; they are only a way to control data replication for a set of column families.
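Because replication is configured on the keyspace rather than on individual column families, every column family in a keyspace shares the same settings. The Python sketch below models only that relationship; the `Keyspace` class, its `replication_for` method, and the `blog` and `blog_cache` keyspaces are hypothetical names chosen for illustration, not part of any Cassandra API.

```python
from dataclasses import dataclass, field


@dataclass
class Keyspace:
    # Hypothetical model: replication settings belong to the keyspace.
    name: str
    strategy_class: str
    replication_factor: int
    column_families: list = field(default_factory=list)

    def replication_for(self, column_family):
        """Return the replication settings that apply to a column family."""
        if column_family not in self.column_families:
            raise KeyError(f"{column_family} is not in keyspace {self.name}")
        # No per-column-family override exists in this model: the keyspace decides.
        return (self.strategy_class, self.replication_factor)


# Data with different replication requirements goes in separate keyspaces.
blog = Keyspace("blog", "SimpleStrategy", 3, ["users", "entries"])
blog_cache = Keyspace("blog_cache", "SimpleStrategy", 1, ["page_views"])

print(blog.replication_for("entries"))           # ('SimpleStrategy', 3)
print(blog_cache.replication_for("page_views"))  # ('SimpleStrategy', 1)
```

In this model, placing `page_views` in its own keyspace is the only way to give it a different replication factor from `users` and `entries`.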
Defining Keyspaces
Data Definition Language (DDL) commands for defining and altering keyspaces are provided in the various client interfaces, such as the Cassandra CLI and CQL. For example, to define a keyspace in CQL:

    CREATE KEYSPACE keyspace_name WITH strategy_class = 'SimpleStrategy' AND strategy_options:replication_factor='2';

Or in the Cassandra CLI:

    CREATE KEYSPACE keyspace_name WITH placement_strategy = 'SimpleStrategy' AND strategy_options = {replication_factor:2};

See Getting Started Using the Cassandra CLI and Getting Started with CQL for more information on DDL commands for Cassandra.
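The CQL and CLI statements above carry the same three settings (keyspace name, strategy class, and replication factor), so a small helper can render either form from one definition. This is a hedged sketch: `cql_create_keyspace` and `cli_create_keyspace` are hypothetical helpers that only build strings matching the syntax shown above; they do not connect to a cluster, and the name check is a conservative assumption rather than Cassandra's documented naming rule.

```python
import re


def _check_name(name):
    # Conservative assumption: accept only letters, digits, and underscores,
    # starting with a letter, so the name can be interpolated safely.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", name):
        raise ValueError(f"unsupported keyspace name: {name!r}")


def cql_create_keyspace(name, strategy_class, replication_factor):
    """Build the CQL statement in the form shown above."""
    _check_name(name)
    return (
        f"CREATE KEYSPACE {name} WITH strategy_class = '{strategy_class}' "
        f"AND strategy_options:replication_factor='{replication_factor}';"
    )


def cli_create_keyspace(name, strategy_class, replication_factor):
    """Build the Cassandra CLI statement in the form shown above."""
    _check_name(name)
    # Doubled braces emit literal { } around the strategy options.
    return (
        f"CREATE KEYSPACE {name} WITH placement_strategy = '{strategy_class}' "
        f"AND strategy_options = {{replication_factor:{replication_factor}}};"
    )


print(cql_create_keyspace("keyspace_name", "SimpleStrategy", 2))
print(cli_create_keyspace("keyspace_name", "SimpleStrategy", 2))
```

Keeping one definition and generating both statements helps ensure the two interfaces describe the same replication settings.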