3. Design
This part of the documentation presents general architecture, design and implementation guidelines.
3.1. Architecture
ClaimStore is an independent mini-application built upon our usual Flask ecosystem:
- Flask-RESTful for REST API
- Flask-Notifications for optional alerts
- OAuth for authorisation needs
- SQLAlchemy for DB abstraction
- JSON Schema for JSON object description
- PostgreSQL for DB persistence and JSON search
3.2. Database
Information about the network of services, data objects, persistent identifier types, and the claims about them is described via JSON snippets.
The JSON data is stored in several tables for claimants, object_types, etc. The individual claims are stored in a claims table that uses both regular RDBMS and JSONB columns, permitting fast inter-table JOINs as well as free-format additional claim parameters, for example:
claims
=======================
uuid           integer
created        date
claimant       ref ->
subject_type   ref ->
subject_value  text
claim          ref ->
certainty      number
claim_details  jsonb
status         ref ->   e.g. to mark revoked claims
object_type    ref ->
object_value   text
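To make the column layout above more concrete, here is a minimal in-memory sketch of one claim row. It is not the project's actual model: the `Claim` class name, the use of plain strings in place of the `ref ->` foreign keys, the `"active"` status value, the claimant name, the type labels and the date are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Dict


@dataclass
class Claim:
    """Illustrative mirror of one row of the claims table (not the real model)."""

    uuid: int
    created: date
    claimant: str        # stands in for a reference to a claimant entry
    subject_type: str    # stands in for a reference to an identifier type
    subject_value: str
    claim: str           # stands in for a reference to a claim type
    certainty: float
    object_type: str     # stands in for a reference to an identifier type
    object_value: str
    status: str = "active"  # hypothetical value; another could mark revoked claims
    # Free-format additional parameters, stored as JSONB in the real table.
    claim_details: Dict[str, Any] = field(default_factory=dict)


# Example row; every value here is made up for illustration.
example = Claim(
    uuid=1,
    created=date(2015, 6, 24),
    claimant="INSPIRE",
    subject_type="arXiv",
    subject_value="hep-th/0101001",
    claim="is_variant_of",
    certainty=1.0,
    object_type="DOI",
    object_value="10.1234/foo.bar",
)
```

The fixed columns are the ones that would take part in JOINs, while anything claimant-specific would go into `claim_details`.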
The JSON format of claims is also checked against a formal JSON schema to verify its validity upon claim submission. There are several JSON Schemata describing the system: one JSON schema describes a service, another JSON schema describes a persistent ID type, another JSON schema describes a claim, etc.
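The check on submission can be pictured with the following sketch. It is a deliberately reduced, standard-library stand-in for a real JSON Schema validator: it only understands `required` and a few `type` keywords. The `CLAIM_SCHEMA` contents and the `validate_claim` function are hypothetical, since the actual schemata are not reproduced in this section.

```python
from typing import Any, Dict, List

# Hypothetical, heavily reduced schema for a claim (assumption, not the real one).
CLAIM_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "required": ["claimant", "subject", "claim", "certainty", "object"],
    "properties": {
        "claimant": {"type": "string"},
        "subject": {"type": "object"},
        "claim": {"type": "string"},
        "certainty": {"type": "number"},
        "object": {"type": "object"},
    },
}

# Mapping from the JSON Schema type names used above to Python types.
_PY_TYPES = {"string": str, "number": (int, float), "object": dict}


def validate_claim(payload: Any, schema: Dict[str, Any] = CLAIM_SCHEMA) -> List[str]:
    """Return a list of error messages; an empty list means the payload passed."""
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]
    errors = []
    for name in schema["required"]:
        if name not in payload:
            errors.append(f"missing required property: {name}")
    for name, rule in schema["properties"].items():
        if name not in payload:
            continue
        value = payload[name]
        # bool is a subclass of int in Python but is not a JSON number.
        if isinstance(value, bool) or not isinstance(value, _PY_TYPES[rule["type"]]):
            errors.append(f"property {name} must be of type {rule['type']}")
    return errors


good = {
    "claimant": "INSPIRE",
    "subject": {"type": "arXiv", "value": "hep-th/0101001"},
    "claim": "is_variant_of",
    "certainty": 1.0,
    "object": {"type": "DOI", "value": "10.1234/foo.bar"},
}
bad = {"claimant": "INSPIRE", "certainty": "high"}
```

Here `validate_claim(good)` returns an empty list, while `validate_claim(bad)` reports the three missing properties and the wrongly typed `certainty`. A production service would more likely delegate this to a full JSON Schema implementation.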
For searching the claim database, the PostgreSQL JSONB column type can be used, which offers efficient querying out of the box. For more demanding usage, JSON claims can be propagated to an Elasticsearch cluster, which can further improve query speed and extend the query language.
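As a rough illustration of JSONB querying, the sketch below builds a parameterised statement that uses PostgreSQL's `@>` containment operator on the claim_details column. The `build_details_query` helper, the `%s` placeholder style (as used by common PostgreSQL drivers) and the `source` key are assumptions; the function only prepares the SQL and does not talk to a database.

```python
import json
from typing import Any, Dict, List, Tuple


def build_details_query(details: Dict[str, Any]) -> Tuple[str, List[str]]:
    """Build SQL selecting claims whose claim_details contain the given JSON."""
    # "@>" is PostgreSQL's JSONB containment operator: the row matches when
    # claim_details includes every key/value pair present in the parameter.
    sql = "SELECT uuid FROM claims WHERE claim_details @> %s::jsonb"
    return sql, [json.dumps(details, sort_keys=True)]


# Hypothetical filter: claims whose free-format details record a harvest source.
sql, params = build_details_query({"source": "harvest"})
```

Passing the JSON as a bound parameter rather than interpolating it into the string keeps the query safe against injection.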
3.3. Claim types
The primary motivation behind ClaimStore was the exchange of information about persistent identifiers, hence the typical claim types are:
- is_same_as: used when there is 100% equivalence, e.g. a local copy of an arXiv record with either the same or enriched metadata, or an ORCID that corresponds to an INSPIRE ID
- is_variant_of: a weaker claim, e.g. an arXiv preprint and the DOI of the published paper, as when INSPIRE merges two sources into one
However, the system is generic enough to accept any kind of claims, so the ClaimStore can also be used to store information about other types of relations, such as:
- is_author_of: this person is the author of this document
- is_contributor_to: this person is supervisor/translator/spokesperson of this document
- is_erratum_of: e.g. if INSPIRE record R1 is a variant of DOI1 and DOI2 is an erratum of DOI1, but INSPIRE merges all of these into the same record, then there would be three claims: R1 is variant of DOI1, DOI2 is erratum of DOI1, and R1 is variant of DOI2
Examples of other possible relations that could be included in the future are:
- is_cited_by
- is_superseded_by
- is_software_for_paper
- is_dataset_for_paper
- is_dataset_for_software
For example, imagine the following table of claims:
subject               predicate      object
--------------------  -------------  -------------------
arXiv:hep-th/0101001  is_variant_of  DOI:10.1234/foo.bar
arXiv:hep-th/0101001  is_same_as     arXiv:1506.07188
One could then ask a question such as "who knows about DOI 10.1234/foo.bar?" and the system could return only direct claims:
GET /claims/?type=DOI&value=10.1234/foo.bar
listing only the first relation, or else we could also ask to include all indirect claims:
GET /claims/?type=DOI&value=10.1234/foo.bar&include=indirect&certainty=0.5+
which would return both relations.
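The difference between the two queries can be sketched as a small graph traversal over the example table. This is only one plausible reading of "indirect": it assumes that an indirect claim is any claim reachable by following shared identifiers, and that the certainty filter is a lower bound. The `find_claims` function, its parameters and the certainty values of 1.0 are assumptions and not part of the service's documented behaviour.

```python
from collections import deque
from typing import List, Tuple

Identifier = Tuple[str, str]                          # (type, value)
ClaimRow = Tuple[Identifier, str, Identifier, float]  # subject, predicate, object, certainty

# The two claims from the table above; the certainty values are made up.
CLAIMS: List[ClaimRow] = [
    (("arXiv", "hep-th/0101001"), "is_variant_of", ("DOI", "10.1234/foo.bar"), 1.0),
    (("arXiv", "hep-th/0101001"), "is_same_as", ("arXiv", "1506.07188"), 1.0),
]


def find_claims(
    claims: List[ClaimRow],
    start: Identifier,
    include_indirect: bool = False,
    min_certainty: float = 0.0,
) -> List[ClaimRow]:
    """Return claims mentioning `start`, optionally following shared identifiers."""
    seen = {start}
    queue = deque([start])
    found: List[ClaimRow] = []
    while queue:
        current = queue.popleft()
        for row in claims:
            subject, _predicate, obj, certainty = row
            if certainty < min_certainty or current not in (subject, obj):
                continue
            if row not in found:
                found.append(row)
            if include_indirect:
                # Breadth-first step: also visit the identifier on the other side.
                for neighbour in (subject, obj):
                    if neighbour not in seen:
                        seen.add(neighbour)
                        queue.append(neighbour)
    return found


doi = ("DOI", "10.1234/foo.bar")
direct = find_claims(CLAIMS, doi)
indirect = find_claims(CLAIMS, doi, include_indirect=True, min_certainty=0.5)
```

With the example data, `direct` holds only the is_variant_of claim, while `indirect` also picks up the is_same_as claim through the shared arXiv identifier.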