## Building trees starting from a list of symbols

Question!

I have a list of symbols which are organized by well formed parenthesis and I want to generate a tree like this one:

Leaf nodes are the symbols and the non-terminal nodes represent the parenthesis and I want to store in a strucuter both them.

Is there a way to build this tree?

Certainly. First, if you implement a language in which parentheses are syntactic collecting punctuation (e.g. Python lists), you can likely use a built-in evaluation function to parse your input into a desirable structure.

Failing that ... I believe that the below is merely a more detailed version of the previous answer. The steps are simple (recursion should be simple, neh?):

1. If the input is atomic, make this a leaf node and return.
2. Split the given list into elements at every internal 0 in the open parenthesis count.
3. For each element in this list:

3a. Remove the outermost parentheses;

3b. Reduce the parenthesis counts by 1 each;

3c. Recur on this element.

Now, let's walk through the example you give.I'm ignoring the original root node text, in favor of the structure you show in the tree:

``````[(A ((B C) (D E)))(F G (H I L))]
``````

At each level, the first thing to do is to strip off the outermost parentheses (actually brackets, in this case. I'm not sure why you have a different symbol on the outside).

``````(A ((B C) (D E)))(F G (H I L))
``````

Now, start at the front, keeping count of how many open parentheses you have.

``````(A ((B C) (D E)))(F G (H I L))
1  23   2 3   2101    2     10
``````

Note: If you need to throw syntax errors for an imbalance, you have a nice check: the final count must be 0, with no following characters.

Wherever you have a 0 in the middle, break the string (marked with ^):

``````(A ((B C) (D E)))  ^  (F G (H I L))
1  23   2 3   210     1    2     10
``````

Now, recur on each element you found. If the element is atomic, it's a leaf node. If you want to save counting time, carry the count as another argument to the routine. Reduce it by 1 on recursion.

``````                 ^
A ((B C) (D E))     F G (H I L)
12   1 2   10         1     0
``````

The left side has two elements: a leaf node A, and another expression on which we recur:

``````((B C) (D E))
12   1 2   10
``````

There is no internal 0, so we trivially recur on the entire list:

``````(B C) (D E)
1   0 1  0
``````

This break into two lists, (B C) and (D E)

Similarly, the right branch of the root node breaks into three elements: F, G, and (H I L). Handle these the same way.

By : Prune

It should be easy to split the whole formula into chunks that are direct descendants, while each chunk is well-formed. Use counting of nesting level for that: an opening parenthesis increases the level, a closing parenthesis decreases the level, and whenever the nesting level is 0, there is a boundary between chunks.

This way, you can convert

``````((a b) (c d)) e ((f g h) i)
``````

into its constituent parts:

``````((a b) (c d))
e
((f g h) i)
``````

For each part, if it contains more than one symbol, run the same algorithm recursively.

By : anatolyg

YES it is better to use the `sqlite3` functions to do batch insert. Since in core data it is not possible to do the same.

By : New16