Parsing JSON with awk/sed in bash to get key-value pairs

Question!

I have read many existing questions on SO, but none of them answers what I am looking for. I know it is difficult to parse JSON in bash using sed/awk, but I only need a few key-value pairs per record out of the whole list of key-value pairs in each record. I want to do it this way because it should be faster: the main JSON is pretty big, with millions of records.

The JSON format is as follows:

{
    "documents":
    [
        {
            "title":"a",   //needed
            "description":"b",  //needed
            "id":"c",  //needed
            ....(some more:not useful)....
            "conversation":
            [
                {
                    "message":"",
                    "id":"d",   //not needed
                    .....(some more)....
                    "createDate":"e",   //not needed
                },
                ...(some more messages)....
            ],
            "createDate":"f",  //needed
            ....(many more labels).....
        }
    ],
    ....(some more global attributes)....
}

I need the attributes marked as needed, but the same keys also appear in the nested objects, which makes them hard to pick out with simple sed/awk. Could anyone suggest whether this can be done with sed/awk? Any help to achieve it would be appreciated.

P.S.: I know about jsawk but I do not want to introduce any dependency, so if possible please suggest usage of sed/awk.

EDIT: Multiple entries of the format given below (since "documents" is a list):

"title":"a",
"description":"b"
"id":"c"
"createDate":"f"

EDIT: The actual JSON contains no whitespace. It has been formatted here for readability.



Answers

Well, if you're going to use a regex to parse JSON, the result will by nature be quick, dirty and heavily reliant on the exact syntax of the input file. With that in mind, you could write something that relies on the amount of white space occurring before the key-value pairs you're interested in. Depending on the kind of output you're looking for, you could use something along the lines of:

awk '/^ {12}"title/
/^ {12}"description/
/^ {12}"id/
/^ {12}"createDate/' input_file.json

Not great, but it does the trick on your example input...
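
Since the question's last edit says the real file contains no whitespace, counting leading spaces would not work on it. As a rough sketch of an alternative that is still plain awk, the script below counts bracket depth instead of indentation. The function name extract_fields, the variables want and target, and the assumption that the needed keys sit at depth 3 (root object, "documents" array, document object) are all illustrative, and only string values are handled. RS is set to "," so that records stay small even when the whole file is a single line.

```shell
# Hypothetical helper: print selected string-valued keys found at one nesting depth.
# Assumes depth 3 = objects directly inside the top-level "documents" array.
extract_fields() {
    awk -v want='title description id createDate' -v target=3 '
    BEGIN {
        RS = ","                      # small records even if the file is one long line
        n = split(want, w, " ")
        for (i = 1; i <= n; i++) wanted[w[i]] = 1
    }
    # Called when a closing quote is reached; buf holds the raw string contents.
    function end_string() {
        if (key != "") {              # this string is the value of the pending key
            if (depth == target && (key in wanted))
                printf "\"%s\":\"%s\"\n", key, buf
            key = ""
        } else {
            last = buf                # becomes the key if a colon follows
        }
    }
    {
        rec = $0 ","                  # put back the comma that RS consumed
        len = length(rec)
        for (i = 1; i <= len; i++) {
            c = substr(rec, i, 1)
            if (in_str) {             # inside a string: brackets and commas are data
                if (esc)            { esc = 0; buf = buf c }
                else if (c == "\\") { esc = 1; buf = buf c }
                else if (c == "\"") { in_str = 0; end_string() }
                else                { buf = buf c }
            }
            else if (c == "\"")            { in_str = 1; buf = "" }
            else if (c == ":")             { key = last }
            else if (c == "{" || c == "[") { depth++; key = "" }
            else if (c == "}" || c == "]") { depth--; key = "" }
            else if (c == ",")             { key = "" }
        }
    }' "$@"
}

# Example run on a small compact document (sample data, not the real file):
printf '%s\n' '{"documents":[{"title":"a","id":"c","conversation":[{"id":"d"}],"createDate":"f"}]}' | extract_fields
```

Going character by character is slower than a single regex match, but it keeps the nested "id" and "createDate" keys out of the output and does not depend on how the file is laid out.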



If the structural characters [, {, } and ] always sit on a line of their own, this would work:

#!/usr/bin/awk -f

function walk(level, end) {
    while (getline 
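
The script above breaks off after the first lines, so what follows is only a rough sketch of a walker in the same style, not the original answer's code. It keeps the walk(level, end) shape and assumes that every bracket sits alone on its line (a closing bracket may be followed by a comma), that the needed keys live at nesting level 3, and that their lines start with the quoted key. The wrapper name walk_fields is made up for illustration.

```shell
# Hypothetical wrapper; reads pretty-printed JSON from stdin or from file arguments.
walk_fields() {
    awk '
    # Consume lines until the bracket that closes the current container.
    function walk(level, end,    line, first) {
        while ((getline line) > 0) {
            gsub(/^[ \t]+|[ \t]+$/, "", line)        # trim indentation
            if (line == "") continue
            first = substr(line, 1, 1)
            if (line == "{" || line == "[")          # isolated opener: descend
                walk(level + 1, (line == "{") ? "}" : "]")
            else if (first == end)                   # closer, with or without comma
                return
            else if (level == 3 && line ~ /^"(title|description|id|createDate)":/) {
                sub(/,$/, "", line)                  # drop the trailing comma
                print line
            }
        }
    }
    BEGIN { walk(0, "") }
    ' "$@"
}
```

It could be called as walk_fields input_file.json. Because it depends on the one-bracket-per-line layout, it would not apply to the whitespace-free file mentioned in the question's edit.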

