"Databases are Categories"

Vasili I. Galchin

10 Aug 2010 10 Aug '10

2:12 a.m.

Hello, I stumbled across this tech talk: http://www.galois.com/blog/2010/05/27/tech-talk-categories-are-databases/ I was wondering what others in this mail list think about Spivak's thesis. I apologize if already posted. Regards, Vasili [For admin and other information see: http://www.mta.ca/~cat-dist/ ]

Attachments:

attachment.bin (multipart/alternative — 47 bytes)

Show replies by date

Mattias Wikström

15 Aug 15 Aug

7:20 a.m.

I think it makes sense to regard database schemas as theories and databases as models of such theories. Given a theory T that models some database schema S, a term in the language of T may be thought of as a "database query" (to obtain the result of the query, simplify the term), while a statement that two terms in the language of T are equal may be thought of as a "database constraint" that one may want to add to S. (In practise, though, one may want to formulate ones queries and contraints not in the language of T but in languages somehow obtained from that language.) What sort of theory should a database schema be? This surely depends on what exactly one is trying to model: A schema in Company A's DBMS (database management system) is rarely the same thing as a schema in Company B's DBMS, and in any case one probably wants to work with some idealised mathematical model. David Spivak seems to offer two different answers. On the one hand, a database schema may be the same thing as a category. On the other hand, a database schema may be a labeled simplicial set. Both answers may be found at http://www.uoregon.edu/~dspivak/cs/ . Mattias Wikstrom ----------------------------------------

...

Date: Mon, 9 Aug 2010 11:12:34 -0500 Subject: categories: "Databases are Categories" (again) From: vigalchin@gmail.com To: categories@mta.ca

Hello,

I stumbled across this tech talk: http://www.galois.com/blog/2010/05/27/tech-talk-categories-are-databases/ I was wondering what others in this mail list think about Spivak's thesis. I apologize if already posted.

Regards,

Vasili

[For admin and other information see: http://www.mta.ca/~cat-dist/ ]

Pym, Professor David J.

16 Aug 16 Aug

4:25 a.m.

John Cartmell (who introduced contextual categories and generalized algebraic theories as models of dependent types in his 1978 Oxford D.Phil thesis and a 1986 Ann. Pure App. Logic paper) worked on topics related to this thread some years ago. Another 1986 paper, 'Formalizing the Network and Hierarchical Data Models --- an Application of Categorical Logic', CATEGORY THEORY AND COMPUTER PROGRAMMING LNCS, 1986, Volume 240/1986, 466-492, DOI: 10.1007/3-540-17162-2_138, can be found via the following link: http://www.springerlink.com/content/y31234tkk63wp56k/ 'Abstract. We have noted that data modelling and conceptual modelling have content and performance as their concerns. For the different data models, the Network and the Hierarchic, we have given logics involving the operations which are physically supported according to the data model. The logics are sensitive to performance in a way that classical logic is not. We have now suggested how we might formalise this. Network and Hierarchical databases have the functional inverse or family as their primitive of organisation. To formalise the Network model we have given a general definition of network category which seems to generalise correctly the hierarchical logic of contextual categories.' There is also relevant work on categories and logic programming. David Pym -- Professor David J. Pym, MA, PhD, ScD, FBCS, CITP, FIMA, CMath, CSci 6th Century Chair in Logic University of Aberdeen Scotland +44 (0)1 224 27 4577 d.j.pym@abdn.ac.uk http://www.abdn.ac.uk/~csc335 The University of Aberdeen is a charity registered in Scotland, No SC013683. [For admin and other information see: http://www.mta.ca/~cat-dist/ ]

Dr. Cyrus F Nourani

17 Aug 17 Aug

2:33 a.m.

Hello, thought I state the "obvious"? since I had written an ACM review for newer techniques to model database and program transformation techniques a couple? years ago. There were papers by our UCLA Semantics and TCS group two decades ago with publications on database algebras. Further specificson databases and algebras were at TU Berlin. I enclose an excerpt: [There are publications at UCLA,JACM, and TU Berlin since 1979 that address abstract algorithms, programs, morphisms, and database models and schemas that are benefit the areas.There are mapping techniques that can be applied to the data processing realms: There is a concrete syntax route, e.g., untyped canonical mappings sparse trees using language independent format-universal types, generative mappings- metaprogram- GDK grammar deployment GUI, XML data biding, and the problem of providing an object model that is to represent an XML schema are stated. Graph Transformation for Model Refactoring deploys models and transformations as primary aritifcats. The techniques presented are to use graph model representations and apply graph transformations at the model refactoring arena. Refactoring (Opdyke) is changes to the internal program structure to improve without changing the external introduce model refraction as a new transformation. The Berlin school AGG is applied on graph grammars and to specify model refactoring.] CyrusFN -- Akdmkrd.tripod.com Projektakdmkrd.tripod.com On Sun, Aug 15, 2010 at 11:25 AM, Pym, Professor David J. < d.j.pym@abdn.ac.uk> wrote:

...

John Cartmell (who introduced contextual categories and generalized algebraic theories as models of dependent types in his 1978 Oxford D.Phil thesis and a 1986 Ann. Pure App. Logic paper) worked on topics related to this thread some years ago. Another 1986 paper, 'Formalizing the Network and Hierarchical Data Models --- an Application of Categorical Logic', CATEGORY THEORY AND COMPUTER PROGRAMMING LNCS, 1986, Volume 240/1986, 466-492, DOI: 10.1007/3-540-17162-2_138, can be found via the following link:

http://www.springerlink.com/content/y31234tkk63wp56k/

'Abstract. We have noted that data modelling and conceptual modelling have content and performance as their concerns. For the different data models, the Network and the Hierarchic, we have given logics involving the operations which are physically supported according to the data model. The logics are sensitive to performance in a way that classical logic is not. We have now suggested how we might formalise this. Network and Hierarchical databases have the functional inverse or family as their primitive of organisation. To formalise the Network model we have given a general definition of network category which seems to generalise correctly the hierarchical logic of contextual categories.'

There is also relevant work on categories and logic programming.

David Pym

[For admin and other information see: http://www.mta.ca/~cat-dist/ ]

Zinovy Diskin

2:07 a.m.

2010/8/14 Mattias Wikström <mattias.wikstrom@gmail.com>:

...

I think it makes sense to regard database schemas as theories and databases as models of such theories. Given a theory T that models

yes, this is the basic underlying assumption in a series of papers of Michael Johnson and the moderator of this list.

...

some database schema S, a term in the language of T may be thought of as a "database query" (to obtain the result of the query, simplify the term),

a query is a formula (defining a derived table) rather than a term while a statement that two terms in the language of T are equal

...

may be thought of as a "database constraint" that one may want to add to S.

there are many types of non-equational constraints, for example, inclusion P=>Q with P,Q formulas

...

What sort of theory should a database schema be? This surely depends on what exactly one is trying to model: A schema in Company A's DBMS (database management system) is rarely the same thing as a schema in Company B's DBMS, and in any case one probably wants to work with some idealised mathematical model.

a good question is what the place of SQL on the scale of logical doctrines is. Since models of SQL-theories include predefined infinite domains with operations (think of Integer and String), and finite domains for tables, model theory for SQL is not classical. I'm not sure but it seems model-theorists call it "metafinite model theory" (Gradel and Gurevich).

...

David Spivak seems to offer two different answers. On the one hand, a database schema may be the same thing as a category. On the other hand, a database schema may be a labeled simplicial set. Both answers may be found at http://www.uoregon.edu/~dspivak/cs/ .

I took a look at the slides of "databases are categories" talk. It seems to be an overly simplified model. Since an employee may work for several or none departments, data should be modeled by a functor into Rel rather than Set. 2-category structure may be important, and we often have a lax functor F(a;b)<F(a);F(b). For example, a Person _owns_ a Car, which is _parked_ at an Address, where the person _lives_, but a Person may not have a Car but still _live_ at an Address (arrow names are underlined, _lives=owns;parked_). Considering all columns as foreign keys, that is, disregarding the difference between value-valued and object-valued attributes, would look very strange for a practitioner. Z.

...

----------------------------------------

...
Date: Mon, 9 Aug 2010 11:12:34 -0500 Subject: categories: "Databases are Categories" (again) From: vigalchin@gmail.com To: categories@mta.ca

Hello,

I stumbled across this tech talk: http://www.galois.com/blog/2010/05/27/tech-talk-categories-are-databases/ I was wondering what others in this mail list think about Spivak's thesis. I apologize if already posted.

Regards,

Vasili

[For admin and other information see: http://www.mta.ca/~cat-dist/ ]

David Spivak

11:20 a.m.

Hi all, The basic idea of my talk was this: one can think of any category C as a "database schema": the objects of C are called "tables" and an arrow f:A-->B is called a "column of table A with values in table B". Now a functor C-->Sets is a "state" of that database: it fills every table with a set of rows. Leaf objects of C (objects with no outgoing arrows) correspond to "pure data." One can thus visualize a category as a system of tables; commutative diagrams correspond to "rules" such as "the secretary of a department must be in that department." Using sketches instead of categories allows a little more flexibility, but the basic idea is as above. The model is nice for a variety of reasons, most of all its simplicity. In polite contradiction to one of Diskin's claims, two database administrators (at two large multi-national corporations) that I know think it is a viable model. They do not balk at the idea of data columns being considered as foreign keys. It puts everything on the same playing field. The database=category idea furthermore allows us to "migrate data" from one schema to another. Given a morphism of schemas (functor F:C-->D) one pushes and pulls data between the two categories using F^*, F_*, and F_!. One can thus achieve unions, joins, etc. Use of F^*, F_*, and F_! may have a variety of applications including a solution to the "view update problem." The Grothendieck construction applied to a database state C-->Set yields its "RDF graph," which is an important component of the semantic web, and of which conversions to and from databases is of high interest. The point is that its all very natural in this model. Finally, in reference to Diskin's objection below (regarding people owning cars, and both having addresses) we can consider the category [1]x[1]: (excuse the typography -- V's are the tips of downward arrows) O----->P | | | | V V C----->A A functor from this category to sets is a set of ownerships, each of which has an associated person and an associated car, each of which has an associated address (and these addresses must agree). The same idea can be used to model situations where an employee could work for any number of departments (as in Diskin's other objection). If anyone wishes to engage me more personally either in support or denial of these ideas, I suggest we talk by phone; email me and I'll send you my number. Of course I also don't mind continuing the conversation by email, either privately or publicly. Also, let me extend my appreciation to Vasili for bringing this whole thing up! Best regards, David PS. I've informed the people at Galois of your comments on the camera-work; they said they'd take it into consideration in the future. PPS. A simplicial set (or any presheaf) can be regarded as a database state (on a schema with countably many tables). Can anyone imagine a database management system like Oracle being of use in answering mathematical questions about such things? In other words, could modern, incredibly fast database technology be useful to a mathematician studying the dynamics of a presheaf category such as simplicial sets? 2010/8/16 Zinovy Diskin <zdiskin@gsd.uwaterloo.ca>

...

2010/8/14 Mattias Wikström <mattias.wikstrom@gmail.com>:

...
I think it makes sense to regard database schemas as theories and databases as models of such theories. Given a theory T that models

yes, this is the basic underlying assumption in a series of papers of Michael Johnson and the moderator of this list.

...
some database schema S, a term in the language of T may be thought of as a "database query" (to obtain the result of the query, simplify the term),

a query is a formula (defining a derived table) rather than a term

while a statement that two terms in the language of T are equal

...
may be thought of as a "database constraint" that one may want to add to S.

there are many types of non-equational constraints, for example, inclusion P=>Q with P,Q formulas

...
What sort of theory should a database schema be? This surely depends on what exactly one is trying to model: A schema in Company A's DBMS (database management system) is rarely the same thing as a schema in Company B's DBMS, and in any case one probably wants to work with some idealised mathematical model.

a good question is what the place of SQL on the scale of logical doctrines is. Since models of SQL-theories include predefined infinite domains with operations (think of Integer and String), and finite domains for tables, model theory for SQL is not classical. I'm not sure but it seems model-theorists call it "metafinite model theory" (Gradel and Gurevich).

...
David Spivak seems to offer two different answers. On the one hand, a database schema may be the same thing as a category. On the other hand, a database schema may be a labeled simplicial set. Both answers may be found at http://www.uoregon.edu/~dspivak/cs/ .

I took a look at the slides of "databases are categories" talk. It seems to be an overly simplified model. Since an employee may work for several or none departments, data should be modeled by a functor into Rel rather than Set. 2-category structure may be important, and we often have a lax functor F(a;b)<F(a);F(b). For example, a Person _owns_ a Car, which is _parked_ at an Address, where the person _lives_, but a Person may not have a Car but still _live_ at an Address (arrow names are underlined, _lives=owns;parked_). Considering all columns as foreign keys, that is, disregarding the difference between value-valued and object-valued attributes, would look very strange for a practitioner.

Z.

[For admin and other information see: http://www.mta.ca/~cat-dist/ ]

Steve Vickers

7:08 p.m.

Dear Zinovy, Just a brief note about geometric logic, a positive first-order logic with infinitary disjunctions that is closely related to Grothendieck topos theory. The infinitary disjunctions give one the power to characterize certain infinite domains (including Integer, String, and also finite powerset) up to isomorphism using geometric structure and axioms. I've conjectured that a weaker "arithmetic" logic, with finite disjunctions but an underlying system of type constructors including free algebra constructions, is a good substitute in many situations - I wrote about this in my paper "Locales and toposes as spaces". (The arithmetic logic should be related to Joyal's arithmetic universes much as geometric logic is related to toposes.) I once wrote a paper "Geometric theories and databases", though to be honest I'm not sure a connection with SQL would flow immediately from that paper. Regards, Steve Vickers. Zinovy Diskin wrote:

...

a good question is what the place of SQL on the scale of logical doctrines is. Since models of SQL-theories include predefined infinite domains with operations (think of Integer and String), and finite domains for tables, model theory for SQL is not classical. I'm not sure but it seems model-theorists call it "metafinite model theory" (Gradel and Gurevich).

[For admin and other information see: http://www.mta.ca/~cat-dist/ ]

soloviev＠irit.fr

18 Aug 18 Aug

4:14 p.m.

Hi, then , in this setting, how coherence should be interpreted? According to Mac Lane, a coherence theorem asserts commutativity of a class of diagrams; some other authors mean rather decision procedure/criteria for commutativity of diagrams. By the way, did you consider some kind of "basic" arrows and generated "canonical maps"? Do there appear/be used some known types of categories with structure (cartesian, monoidal, cartesian closed, monoidal closed)? Another interesting question could be isomorphism of objects. Best wishes Sergei Soloviev Hi all,

...

The basic idea of my talk was this: one can think of any category C as a "database schema": the objects of C are called "tables" and an arrow f:A-->B is called a "column of table A with values in table B". Now a functor C-->Sets is a "state" of that database: it fills every table with a set of rows. Leaf objects of C (objects with no outgoing arrows) correspond to "pure data." One can thus visualize a category as a system of tables; commutative diagrams correspond to "rules" such as "the secretary of a department must be in that department."

Using sketches instead of categories allows a little more flexibility, but the basic idea is as above. The model is nice for a variety of reasons, most of all its simplicity. In polite contradiction to one of Diskin's claims, two database administrators (at two large multi-national corporations) that I know think it is a viable model. They do not balk at the idea of data columns being considered as foreign keys. It puts everything on the same playing field.

... [For admin and other information see: http://www.mta.ca/~cat-dist/ ]

5766

Age (days ago)

5775

Last active (days ago)

List overview

Download

7 comments

8 participants

participants (8)

David Spivak
Dr. Cyrus F Nourani
Mattias Wikström
Pym, Professor David J.
soloviev＠irit.fr
Steve Vickers
Vasili I. Galchin
Zinovy Diskin