Tutorial ======== .. warning:: this document must be rewritten from scratch What does Doqu do? ------------------ Why document-oriented? ---------------------- Why not just use the library X for database Y? ---------------------------------------------- Native Python bindings exist for most databases. It is preferable to use a dedicated library if you are absolutely sure that your code will never be used with another database. But there are two common use cases when *Doqu* is much more preferable: a) prototyping: if you are unsure about which database fits your requirements best and wish to test various databases against your code, just write your code with *Doqu* and then try switching backends to see which performs best. Then optimize the code for it. b) reusing the code: if you expect the module to be plugged into an application with unpredictable settings, use *Doqu*. Of course we are talking about document databases. For relational databases you would use an ORM. What are "backends"? -------------------- .. warning:: this section is out of date Docu can be used with a multitude of databases providing a uniform API for retrieving, storing, removing and searching of records. To couple Docu with a database, a storage/query backend is needed. A "**backend**" is a module that provides two classes: `Storage` and `Query`. Both must conform to the basic specifications (see basic specs below). Backends may not be able to implement all default methods; they may also provide some extra methods. The **Storage** class is an interface for the database. It allows to add, read, create and update records by primary keys. You will not use this class directly in your code. The **Query** class is what you will talk to when filtering objects of a model. There are no constraints on how the search conditions should be represented. This is likely to cause some problems when you switch from one backend to another. Some guidlines will be probably defined to address the issue of portability. For now we try to ensure that all default backends share the conventions defined by the Tokyo Tyrant backend. Switching backends ------------------ .. warning:: this section is out of date Let's assume we have a Tokyo Cabinet database. You can choose the TC backend to use the DB file :doc:`directly ` or access the same file :doc:`through the manager `. The first option is great for development and some other cases where you would use SQLite; the second option is important for most production environments where multiple connections are expected. The good news is that there's no more import and export, dump/load sequences, create/alter/drop and friends. Having tested the application against the database `storage.tct` with Cabinet backend, just run `ttserver storage.tct` and switch the backend config. Let's create our application:: import docu import settings from models import Country, Person storage = docu.get_storage(settings.DATABASE) print Person.objects(storage) # prints all Person objects from DB Now define settings for both backends (settings.py):: # direct access to the database (simple, not scalable) TOKYO_CABINET_DATABASE = { 'backend': 'docu.ext.tokyo_cabinet', 'kind': 'TABLE', 'path': 'storage.tct', } # access through the Tyrant manager (needs daemon, scalable) TOKYO_TYRANT_DATABASE = { 'backend': 'docu.ext.tokyo_tyrant', 'host': 'localhost', 'port': 1978, } # this is the *only* line you need to change in order to change the backend DATABASE = TOKYO_CABINET_DATABASE A few words on what a model is ------------------------------ .. warning:: this section is out of date First off, what is a model? Well, it's something that represents an object. The object can be stored in a database. We can fetch it from there, modify and push back. How is a model different from a Python dictionary then? Easy. Dictionaries know nothing about where the data came from, what parts of it are important for us, how the values should be converted to and fro, and how should the data be validated before it is stored somewhere. A model of an apple *does* know what properties should an object have to be a Proper Apple; what can be done the apple so that it does not stop being a Proper Apple; and where does the apple belong so it won't be in the way when it isn't needed anymore. In other words, the model is an answer to questions *what*, *where* and *how* about a document. And a dictionary *is* a document (or, more precisely, a simple representation of the document in given environment). Working with documents ---------------------- .. warning:: this section is out of date A :term:`document` is basically a "dictionary on steroids". Let's create a document:: >>> from docu import * >>> document = Document(foo=123, bar='baz') >>> document['foo'] 123 >>> document['foo'] = 456 Well, any dictionary can do that. But wait:: >>> db = get_db(backend='docu.ext.shove') >>> document.save(db) 'new-primary-key' >>> Document.objects(db) [] >>> fetched = Document.objects(db)[0] >>> document == fetched True >>> fetched['bar'] 'baz' Aha, so :class:`~docu.document_base.Document` supports persistence! Nice. By the way, how about some syntactic sugar? Here:: class MyDoc(Document): use_dot_notation = True That's the same good old `Document` but with "dot notation" switched on. It allows access to keys with ``__getattr__`` as well as with ``__getitem__``:: >>> my_doc = MyDoc(foo=123) >>> my_doc.foo 123 Of course this will only work with alphanumeric keys. Now let's say we are going to make a little address book. We don't want any "foo" or "bar, just the relevant information. And the "foo" key should not be allowed in such documents. Can we restrict the :term:`structure ` to certain keys and data types? Let's see:: class Person(Document): structure = {'name': unicode, 'email': unicode} Great, now the names and values are controlled. The document will raise an exception when someone, say, attempts to put a number instead of the email. .. note:: Any built-in type will do; some classes are also accepted (like `datetime.date` et al). Even Document instances are accepted: they are interpreted as references. The exact set of supported types and classes is defined per storage backend because the data must be (de)serialized. It is possible to register custom converters in runtime. (Note that the values can be `None`.) But what if we need to mark some fields as required? Or what if the email is indeed a unicode string but its content has nothing to do with RFC 5322? We need to prevent malformed data from being saved into the database. That's the daily job for :term:`validators `:: from docu.validators import * class Person(Document): structure = { 'name': unicode, 'email': unicode, } validators = { 'name': [required()], 'email': [optional(), email()], } This will only allow correct data into the storage. .. note:: At this point you may ask why are the definitions so verbose. Why not Field classes à la Django? Well, they *can* be added on top of what's described here. Actually Docu ships with :doc:`ext_fields` so you can easily write:: class Person(Document): name = Field(unicode, required=True) email = EmailField() # this class is not implemented but can be Why isn't this approach used by default? Well, it turned out that such classes introduce more problems than they solve. Too much magic, you know. Also, they quickly become a name + clutter thing. Compact but unreadable. So we adopted the MongoKit approach, i.e. semantic grouping of attributes. And — guess what? — the document classes became **much** easier to understand. Despite the definitions are a bit longer. And remember, it is always possible to add syntax sugar, but it's usually extremely hard to *remove* it. And now, surprise: validators do an extra favour for us! Look:: XXX an example of query; previously defined documents are not shown because records are filtered by validators More questions? --------------- If you can't find the answer to your questions on Docu in the documentation, feel free to ask in the `discussion group`_. .. _discussion group: http://groups.google.com/group/docu-users ------------ XXXXXXXXXX The part below is outdated ---------------- The Document behaves Let's observe the object thoroughly and conclude that colour is an important distinctive feature of this... um, sort of thing:: class Thing(Document): structure = { 'colour': unicode } Great, now *that's* a model. It recognizes a property as significant. Now we can compare, search and distinguish *objects* by colour (and its presence or lack). Obviously, if colour is an applicable property for an object, then it *belongs* to this model. A more complete example which will look familiar to those who had ever used an ORM (e.g. the Django one):: import datetime from docu import * class Country(Document): structure = { 'name': unicode # any Python type; default is unicode } validators = { 'type': [AnyOf(['country'])] } def __unicode__(self): return self['name'] class Person(Document): structure = { 'first_name': unicode, 'last_name': unicode, 'gender': unicode, 'birth_date': datetime.date, 'birth_place': Country, # reference to another model } validators = { 'first_name': [required()], 'last_name': [required()], } use_dot_notation = True def __unicode__(self): return u'{first_name} {last_name}'.format(**self) @property def age(self): return (datetime.datetime.now().date() - self.birth_date).days / 365 The interesting part is the Meta subclass. It contains a must_have attribute which actually binds the model to a subset of data in the storage. ``{'first_name__exists': True}`` states that a data row/document/... must have the field `first_name` defined (not necessarily non-empty). You can easily define any other query conditions (currently with respect to the backend's syntax but we hope to unify things). When you create an empty model instance, it will have all the "must haves" pre-filled if they are not complex lookups (e.g. `Country` will have its `type` set to `True`, but we cannot do that with `Person`'s constraints). Inheritance ----------- .. warning:: this section is out of date Let's define another model:: class Woman(Person): class Meta: must_have = {'gender': 'female'} Or even that one:: today = datetime.datetime.now() day_16_years_back = now - datetime.timedelta(days=16*365) class Child(Person): parent = Reference(Person) class Meta: must_have = {'birth_date__gte': day_16_years_back} Note that our `Woman` or `Child` models are subclasses of `Person` model. They inherit all attributes of `Person`. Moreover, `Person`'s metaclass is inherited too. The `must_have` dictionaries of `Child` and `Woman` models are `merged` into the parent model's dictionary, so when we query the database for records described by the `Woman` model, we get all records that have `first_name` and `last_name` defined and `gender` set to "female". When we edit a `Person` instance, we do not care about the `parent` attribute; we actually don't even have access to it. Model is a query, not a container --------------------------------- .. warning:: this section is out of date We can even deal with data described above without model inheritance. Consider this valid model -- `LivingBeing`:: class LivingBeing(Model): species = Property() birth_date = Property() class Meta: must_have = {'birth_date__exists': True} The data described by `LivingBeing` overlaps the data described by `Person`. Some people have their birth dates not deifined and `Person` allows that. However, `LivingBeing` requires this attribute, so not all people will appear in a query by this model. At the same time `LivingBeing` does not require names, so anybody and anything, named or nameless, but ever born, is a "living being". Updating a record through any of these models will not touch data that the model does not know. For instance, saving an entity as a `LivingBeing` will not remove its name or parent, and working with it as a `Child` will neither expose nor destroy the information about species. These examples illustrate how models are more "views" than "schemata". Now let's try these models with a Tokyo Cabinet database:: >>> db = docu.get_db( ... backend = 'docu.ext.tokyo_cabinet', ... path = 'test.tct' ... ) >>> guido = Person(first_name='Guido', last_name='van Rossum') >>> guido >>> guido.first_name Guido >>> guido.birth_date = datetime.date(1960, 1, 31) >>> guido.save(db) # returns the autogenerated primary key 'person_0' >>> ppl_named_guido = Person.objects(db).where(first_name='Guido') >>> ppl_named_guido [] >>> guido = ppl_named_guido[0] >>> guido.age # calculated on the fly -- datetime conversion works 49 >>> guido.birth_place = Country(name='Netherlands') >>> guido.save() # model instance already knows the storage it belongs to 'person_0' >>> guido.birth_place >>> Country.objects(db) # yep, it was saved automatically with Guido [] >>> larry = Person(first_name='Larry', last_name='Wall') >>> larry.save(db) 'person_2' >>> Person.objects(db) [, ] ...and so on. Note that relations are supported out of the box.