DSX

    DDDDDDDDDDDDD               SSSSSSSSSSSSSSS      XXXXXXX       XXXXXXX  +
    D::::::::::::DDD          SS:::::::::::::::S     X:::::X       X:::::X
    D:::::::::::::::DD       S:::::SSSSSS::::::S     X:::::X       X:::::X
    DDD:::::DDDDD:::::D      S:::::S     SSSSSSS     X::::::X     X::::::X
      D:::::D    D:::::D     S:::::S                 XXX:::::X   X:::::XXX
      D:::::D     D:::::D    S:::::S                    X:::::X X:::::X   
      D:::::D     D:::::D     S::::SSSS                  X:::::X:::::X    
      D:::::D     D:::::D      SS::::::SSSSS              X:::::::::X     
      D:::::D     D:::::D        SSS::::::::SS            X:::::::::X     
      D:::::D     D:::::D           SSSSSS::::S          X:::::X:::::X    
      D:::::D     D:::::D                S:::::S        X:::::X X:::::X   
      D:::::D    D:::::D                 S:::::S     XXX:::::X   X:::::XXX
    DDD:::::DDDDD:::::D      SSSSSSS     S:::::S     X::::::X     X::::::X
    D:::::::::::::::DD       S::::::SSSSSS:::::S     X:::::X       X:::::X
    D::::::::::::DDD         S:::::::::::::::SS      X:::::X       X:::::X
    DDDDDDDDDDDDD             SSSSSSSSSSSSSSS        XXXXXXX       XXXXXXX

The string is the king of data types. It is both the lowest-level type, since it can hold any form of computer language as bytes, and the highest-level one, since it's where you would store a poem about God.

In speech, there's an implicit network of links between words and expressions, which organizes them by coordination and subordination. Humans can easily rebuild these invisible links, thanks to their common sense, and understand what is said. But for a computer, it's a tough task. Computers need formal languages, where everything is explicit, because they don't have the background knowledge to fill in the gaps. We can use a special syntax to describe these links explicitly. When parsed, a text written in this syntax won't be stored in a single string, but in a meaningful structure of strings linked to one another.

Understanding natural language is also hard because one word often has several meanings. Humans use logic and context to deduce what is meant. Again, this requires rich knowledge that computers don't have. In our syntax, we make meanings explicit by adding a number at the end of each word. This number indicates which meaning, in Wiktionary, we're referring to. For example, door4 is "a non-physical entry into the next world, a particular feeling, a company, etc."
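
As a rough illustration, a sense-tagged word could be split into its base word and its sense number as follows. This is only a sketch: the helper name split_sense is hypothetical, and returning None for an untagged word is our own assumption, not something DSX defines.

```python
import re

# Hypothetical helper, not part of DSX: split "door4" into ("door", 4).
# A lazy prefix followed by trailing digits; the prefix must be non-empty,
# so a bare number such as "1" is left alone.
_SENSE = re.compile(r"^(.*?)(\d+)$")

def split_sense(token):
    match = _SENSE.match(token)
    if match is None or not match.group(1):
        return token, None          # assumption: no sense number given
    return match.group(1), int(match.group(2))

print(split_sense("door4"))   # ('door', 4)
print(split_sense("door"))    # ('door', None)
```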

Another complex challenge for computers is coreference resolution, the ability to determine which expressions refer to the same entities. We make these coreferences explicit by adding a simple identity tag right after the expressions, when needed. If two expressions have the same identity tag, they refer to the same entity.

Finally, to help computers with named entity recognition, we also use a proper name tag, placed right before an expression to indicate that the expression is the name of an entity.


Here is a table showing special characters used in DSX syntax:

    # name = filename;   :    shortcut for long filenames
    # name:              :    section definition
                         :
    "name"               :    reference to section or file#section
                         :
    @                    :    identity tag
    &                    :    proper name tag
                         :
    >                    :    of
    <                    :    whose
                         :
    |                    :    parallel
    /                    :    forward
    \                    :    backward
                         :
    = =                  :    custom coordinator
    [ ]                  :    relation distribution
    ( )                  :    group as a whole
                         :
    A > B                :    A is sub-element of B
    A < B                :    B is sub-element of A
    A < [ B | C ]        :    B and C are sub-elements of A
    A < [ B / C ]        :    same + C comes after B
    A < [ B \ C ]        :    same + C comes before B
    A =B= C              :    A and C are co-elements, B is coordinator
    A == B               :    A and B are co-elements
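
To give an idea of how these characters could be recognized, here is a minimal tokenizer sketch. It is an assumption-laden illustration, not a reference implementation: the name tokenize and the regular expression are ours, coordinators are assumed to contain no spaces, and the # section lines are not handled.

```python
import re

# Hypothetical tokenizer sketch. Alternatives, in order:
#   =and= / =but= / ==   coordinator
#   "name"               reference
#   ( ) [ ] < > | / \ @ &  single-character symbols
#   anything else        a word
TOKEN = re.compile(r'=[^=\s]*=|"[^"]*"|[()\[\]<>|/\\@&]|[^\s()\[\]<>|/\\@&="]+')

def tokenize(text):
    # findall returns the matched tokens in order; characters that match
    # no alternative (whitespace, a stray "=") are silently skipped.
    return TOKEN.findall(text)

print(tokenize("I < have < ( (a > dog) =and= (a > cat) )"))
```

Words keep their apostrophes and digits, so "haven't" and "120px" each come out as a single token.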


There are basically two kinds of links between different parts of a sentence: coordination and subordination.

COORDINATION uses coordinating conjunctions, conjunctive adverbs (with appropriate punctuation), or punctuation to combine short independent clauses into a single sentence. Coordination implies the balance of elements that are of equal semantic value in the sentence.

SUBORDINATION uses subordinating conjunctions or relative pronouns to transform independent clauses (main clauses or ideas) into dependent clauses (subordinate clauses or ideas). Subordinate clauses are subordinate to (and thus hold less semantic value than) the independent clause(s) to which they are linked.

We extend these concepts to organize words, expressions, sentences, and even paragraphs, in a structure where dependencies and articulations are clearly defined.


Starting with a simple example, here is how we write "I need help" in DSX syntax:

    I < need < help

First, note that the syntax is NOT case-sensitive. Everything in uppercase would be the same.

Second, there's no punctuation. DSX syntax will take care of everything.

Third, we see two "less than" signs. They indicate that "need" depends on "I", and that "help" depends on "need". We say that "need" is a sub-element of "I", and that "help" is a sub-element of "need". We also say that "I" is a head-element of "need", and that "need" is a head-element of "help".

The same idea could be expressed like this:

    help > need > I

This is a correct DSX expression, and the meaning is exactly the same. It just looks weird to human readers.
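
One way to convince ourselves that both forms carry the same information is to reduce each one to a set of (head, sub) pairs. The sketch below does that for flat chains only; the function name links is hypothetical, and lowercasing the words is simply our way of reflecting that the syntax is not case-sensitive.

```python
def links(expression):
    """Return the set of (head, sub) pairs of a flat chain of words
    joined by < or >. Groups and lists are not handled in this sketch."""
    tokens = expression.split()
    pairs = set()
    # Tokens alternate: word, operator, word, operator, word...
    for i in range(1, len(tokens) - 1, 2):
        left, op, right = tokens[i - 1].lower(), tokens[i], tokens[i + 1].lower()
        if op == "<":
            pairs.add((left, right))    # right depends on left
        elif op == ">":
            pairs.add((right, left))    # left depends on right
        else:
            raise ValueError(f"unexpected operator: {op!r}")
    return pairs

print(links("I < need < help") == links("help > need > I"))   # True
```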


When we want to link a group of words as a whole, we put them between parentheses. Let's say "my cat is hungry":

    ( my > cat ) < is < hungry

Here the subject is the group "my cat", so we define this entire group as the head-element of "is". In this group, "cat" is more important than "my", which is why "my" is a sub-element of "cat".
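
To show how groups could be kept "as a whole" once parsed, here is a small sketch that turns parentheses into nested Python lists. The function name parse_groups and the choice of nested lists as the output structure are assumptions made for illustration; links and lists in [ ] are left as plain tokens.

```python
def parse_groups(text):
    """Turn ( ) groups into nested lists, leaving everything else as tokens."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    stack = [[]]                      # stack[-1] is the group being filled
    for tok in tokens:
        if tok == "(":
            stack.append([])          # open a new group
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced )")
            group = stack.pop()       # close the group...
            stack[-1].append(group)   # ...and store it in its parent as one item
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced (")
    return stack[0]

print(parse_groups("( my > cat ) < is < hungry"))
# [['my', '>', 'cat'], '<', 'is', '<', 'hungry']
```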

To say "when it rains, I'm sad", we could do:

    ( I < am < sad ) < when < ( it < rains )

Clearly, "I am sad" is the main group. When it happens, how long it lasts, etc., are secondary.

When it's getting big, we can use multiple lines and even indentation to make the structure easier to read:

    ( if < (you < have < (a > moment)) ) >
	
        ( I < would < love < ( your > thoughts ) < on < this )

If we want to combine several groups using a coordinator, we use the = = sign, like this:

    I < have < ( (a > dog) =and= (a > cat) )

Here is a bigger example:

    computers < are < 
    (
        (
            (very > good) < at <
            (
                ( following < (exact > orders) )

                =and= 

                ( handling < ((very > specific) > things) )
            )
        )

        =but=

        (
            (not > good) < at <
            (
                dealing < with <
                (                
                    (new > things) < they < haven't seen < before
                )
            )
        )
    )


Take the sentence "I have a nice cat". Here, "a" and "nice" should both be sub-elements of "cat", but we can't link "a" directly to "cat" because there's another word between them. We need to use relation distribution, with the [ | ] characters:

    I < have < ( [ a | nice ] > cat )

This declares a list of elements inside [ ], which are separated by |. Each of these elements has the same links with its surroundings: here, "a" and "nice" are both sub-elements of "cat".
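
The distribution can be pictured as expanding one link into several. The sketch below handles only the two simplest shapes, with single words and the parallel separator; the function name distribute and the (head, sub) pair representation are assumptions of ours.

```python
import re

# Hypothetical sketch: only '[ x | y ] > head' and 'head < [ x | y ]'
# with single-word items are recognized.
_LIST_SUB_OF = re.compile(r"^\s*\[(.*)\]\s*>\s*(\S+)\s*$")
_HEAD_OF_LIST = re.compile(r"^\s*(\S+)\s*<\s*\[(.*)\]\s*$")

def distribute(expression):
    """Expand a distributed relation into a list of (head, sub) pairs."""
    match = _LIST_SUB_OF.match(expression)
    if match:
        items, head = match.group(1), match.group(2)
    else:
        match = _HEAD_OF_LIST.match(expression)
        if match is None:
            raise ValueError("expected '[ ... ] > head' or 'head < [ ... ]'")
        head, items = match.group(1), match.group(2)
    # Every item of the list gets the same link to the head.
    return [(head, item.strip()) for item in items.split("|")]

print(distribute("[ a | nice ] > cat"))   # [('cat', 'a'), ('cat', 'nice')]
```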

As another example, the T-800 would say:

    I < need < ( [ your > clothes | your > boots ] =and= ( your > motorcycle ) )

And here is a big one:

    ( for example ) <
    (
        (
            ( [ a | common | computer ] > program ) < can <
            (
                turn <
                [
                    a > report < of < ( names =and= (hours < worked) )
                |
                    into < paychecks < for <
                    (
                        the > workers < at < (a > company)
                    )
                ]
            )
        )
        =but=
        (
            ( [ the | same ] > program ) < could not <
            (
                answer < questions < 
                [
                    from < (an > employee)
                |
                    about < why <
                    (
                        (the > company) < will not < pay < for < (nap time)
                    )
                ]
            )
        )
    )


The lists we create with [ ] are unordered if we use the parallel separator |. That's what we did in the previous examples. But sometimes we need to express lists where the order of items matters, and then we'll use different separators.

The forward separator / indicates that items are written in the correct order:

    [ kill < bunny / cook < bunny / eat < bunny ]

The backward separator \ indicates that items are written in reverse order:

    [ eat < bunny \ cook < bunny \ kill < bunny ]

These two gourmet examples have the exact same meaning.

Mixing several types of separators in the same list would be error-prone and would lead to ambiguity. Hence, it's not allowed, and doing so is a syntax error. A list has to be all parallel, all forward, or all backward to be well formed.
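
A checker for this rule could look like the sketch below. It works on the text found between [ and ] of a flat list (nested lists are not handled), rejects mixed separators, and puts backward lists back into forward order. The function name ordered_items and its return shape are assumptions made for this example.

```python
def ordered_items(list_body):
    """Split the inside of a flat [ ] list on its separator.

    Returns (items, ordered): items are in logical order, and ordered
    is False for a parallel list (or a single item), True otherwise.
    """
    used = {sep for sep in "|/\\" if sep in list_body}
    if len(used) > 1:
        raise ValueError("mixed separators in the same list")
    if not used:
        return [list_body.strip()], False
    sep = used.pop()
    items = [item.strip() for item in list_body.split(sep)]
    if sep == "\\":
        items.reverse()               # backward list: restore forward order
    return items, sep != "|"

forward = ordered_items("kill < bunny / cook < bunny / eat < bunny")
backward = ordered_items("eat < bunny \\ cook < bunny \\ kill < bunny")
print(forward == backward)   # True
```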


More examples...

Layout:

    ( clear < the page )
    ( draw < a table < [
        center it on the page |
        it has 4 columns < [
            head color < cyan |
            1 < [ title < name | width < 120px ] |
            2 < [ title < age | width < 40px ] |
            3 < [ title < city | width < 180px ] |
            4 < [ title < occupation | width < 240px ]
        ] |
        it has 10 rows < [
            height < 20 |
            head color < orange
        ]
    ] )

    
Logic:

    ( A < is grandfather of < C )
        < means that <
        ( (A < is father of < B) =and= (B < is father of < C) )

    ( A < is ancestor of < C )
        < means that <
        ( (A < is father of < C) =or=
          ( (A < is father of < B) =and= (B < is ancestor of < C) ) )
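
To make the two rules above concrete, here is how they could be evaluated over a small set of "is father of" facts. This is only a sketch of the logic, not of how DSX would be interpreted: the function names and the sample names are made up, B is read as "some person B", and the facts are assumed to contain no cycle.

```python
def people(father_of):
    """Everyone mentioned in the (father, child) facts."""
    return {person for pair in father_of for person in pair}

def is_grandfather(father_of, a, c):
    # (A is father of B) and (B is father of C), for some B
    return any((a, b) in father_of and (b, c) in father_of
               for b in people(father_of))

def is_ancestor(father_of, a, c):
    # (A is father of C) or ((A is father of B) and (B is ancestor of C))
    if (a, c) in father_of:
        return True
    return any((a, b) in father_of and is_ancestor(father_of, b, c)
               for b in people(father_of))

# Invented sample facts, as (father, child) pairs.
facts = {("al", "bob"), ("bob", "carl"), ("carl", "dan")}

print(is_grandfather(facts, "al", "carl"))   # True
print(is_ancestor(facts, "al", "dan"))       # True
print(is_ancestor(facts, "dan", "al"))       # False
```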