Building trees starting from a list of symbols


I have a list of symbols which are organized by well formed parenthesis and I want to generate a tree like this one:

Tree with well formed parenthesis

Leaf nodes are the symbols and the non-terminal nodes represent the parenthesis and I want to store in a strucuter both them.

Is there a way to build this tree?


Certainly. First, if you implement a language in which parentheses are syntactic collecting punctuation (e.g. Python lists), you can likely use a built-in evaluation function to parse your input into a desirable structure.

Failing that ... I believe that the below is merely a more detailed version of the previous answer. The steps are simple (recursion should be simple, neh?):

  1. If the input is atomic, make this a leaf node and return.
  2. Split the given list into elements at every internal 0 in the open parenthesis count.
  3. For each element in this list:

    3a. Remove the outermost parentheses;

    3b. Reduce the parenthesis counts by 1 each;

    3c. Recur on this element.

Now, let's walk through the example you give.I'm ignoring the original root node text, in favor of the structure you show in the tree:

[(A ((B C) (D E)))(F G (H I L))]

At each level, the first thing to do is to strip off the outermost parentheses (actually brackets, in this case. I'm not sure why you have a different symbol on the outside).

(A ((B C) (D E)))(F G (H I L))

Now, start at the front, keeping count of how many open parentheses you have.

(A ((B C) (D E)))(F G (H I L))
1  23   2 3   2101    2     10

Note: If you need to throw syntax errors for an imbalance, you have a nice check: the final count must be 0, with no following characters.

Wherever you have a 0 in the middle, break the string (marked with ^):

(A ((B C) (D E)))  ^  (F G (H I L))
1  23   2 3   210     1    2     10

Now, recur on each element you found. If the element is atomic, it's a leaf node. If you want to save counting time, carry the count as another argument to the routine. Reduce it by 1 on recursion.

A ((B C) (D E))     F G (H I L)
  12   1 2   10         1     0

The left side has two elements: a leaf node A, and another expression on which we recur:

((B C) (D E))
12   1 2   10

There is no internal 0, so we trivially recur on the entire list:

(B C) (D E)
1   0 1  0

This break into two lists, (B C) and (D E)

Similarly, the right branch of the root node breaks into three elements: F, G, and (H I L). Handle these the same way.

By : Prune

It should be easy to split the whole formula into chunks that are direct descendants, while each chunk is well-formed. Use counting of nesting level for that: an opening parenthesis increases the level, a closing parenthesis decreases the level, and whenever the nesting level is 0, there is a boundary between chunks.

This way, you can convert

((a b) (c d)) e ((f g h) i)

into its constituent parts:

((a b) (c d))
((f g h) i)

For each part, if it contains more than one symbol, run the same algorithm recursively.

By : anatolyg

YES it is better to use the sqlite3 functions to do batch insert. Since in core data it is not possible to do the same.

By : New16

This video can help you solving your question :)
By: admin