Merging multiple pandas time series DFs on a DATE index, contained in a python dictionary

Question!

I have a python dictionary that contains CLOSE prices for several stocks, stock indices, fixed income instruments and currencies (AAPL, AORD, etc.), indexed by DATE. The different DFs in the dictionary have different lengths, i.e. some time series are longer than others. All the DFs have the same field, i.e. 'CLOSE'.

The length of the dictionary is variable. How can I merge all the DFs into a single one, by DATE index, also using lsuffix = the partial name taken from the file I am reading? (For example, the AAPL_CLOSE.csv file has a DATE and a CLOSE field, but to differentiate it from the other 'CLOSE' columns in the merged DF, its name should be AAPL_CLOSE.)

This is what I have:

import glob

asset_name = []
files_to_test = glob.glob('*_CLOSE*')
for name in files_to_test:
    asset_name.append(name.rsplit('_', 1)[0])

Which returns:

asset_name = ['AAPL', 'AORD', 'EURODOLLAR1MONTH', 'NGETF', 'USDBRL']
files_to_test = ['AAPL_CLOSE.csv',
 'AORD_CLOSE.csv',
 'EURODOLLAR1MONTH_CLOSE.csv',
 'NGETF_CLOSE.csv',
 'USDBRL_CLOSE.csv']

Then:

import pandas as pd

asset_dict = {}
for name, file in zip(asset_name, files_to_test):
    asset_dict[name] = pd.read_csv(file, index_col = 'DATE', parse_dates = True)

This is the little join I would like to generalize, to create a big merge of all the DFs in the dictionary by DATE, using lsuffix = the elements in asset_name.

merged = asset_dict['AAPL'].join(asset_dict['AORD'], how = 'right', lsuffix ='_AAPL')

The DFs will have a lot of N/A due to the mismatch of lengths, but I will deal with that later.
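One way this join could be generalized over the whole dictionary, as a rough sketch rather than a final solution, is to rename each 'CLOSE' column with its asset name first and then join frame by frame on the DATE index (the helper name merge_on_date below is only illustrative):

import pandas as pd

def merge_on_date(asset_dict):
    # Rename each frame's single 'CLOSE' column to '<ASSET>_CLOSE',
    # then join everything on the shared DATE index, keeping all dates.
    merged = None
    for name, df in asset_dict.items():
        renamed = df.rename(columns={'CLOSE': name + '_CLOSE'})
        merged = renamed if merged is None else merged.join(renamed, how='outer')
    return merged

merged = merge_on_date(asset_dict)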



Answers

After not getting any answers, I found a solution that works, although there might be better ones. This is what I did:

asset_dict = {}
for name, file in zip(asset_name, files_to_test):
    asset_dict[name] = pd.read_csv(file, index_col='DATE', parse_dates=True)
    asset_dict[name].sort_index(ascending = True, inplace = True)

Pandas can concatenate multiple DFs contained in dictionaries (all at once, not one by one) 'straight out of the box', without much tweaking, by specifying the axis and other parameters.

df = pd.concat(asset_dict, axis = 1)
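To illustrate what that call produces, here is a tiny self-contained example with made-up prices (not the original CSV files): the dictionary keys become the outer level of a column MultiIndex.

import pandas as pd

idx = pd.to_datetime(['2020-01-02', '2020-01-03'])
asset_dict = {
    'AAPL': pd.DataFrame({'CLOSE': [75.1, 74.4]}, index=idx),
    'AORD': pd.DataFrame({'CLOSE': [6800.0, 6780.0]}, index=idx),
}
df = pd.concat(asset_dict, axis=1)
print(df.columns.tolist())  # [('AAPL', 'CLOSE'), ('AORD', 'CLOSE')]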

The resulting df is a multi-index df, which is a problem for me. Also, the time series for the stock prices are all of different lengths, which creates a lot of NaNs. I solved both problems with this:

df.columns = df.columns.droplevel(1)
df.dropna(inplace = True)
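As a side note (an alternative I did not use above), pd.concat also accepts join='inner', which keeps only the dates present in every frame and would make the dropna step unnecessary:

df = pd.concat(asset_dict, axis = 1, join = 'inner')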

Now, the columns of my df are these:

['AAPL', 'AORD', 'EURODOLLAR1MONTH', 'NGETF', 'USDBRL']

But since I wanted them in the 'STOCK_CLOSE' format, I do this:

old_columns = df.columns.tolist()
new_columns = []
for name in old_columns:
    new_name = name + '_CLOSE'
    new_columns.append(new_name)
df.columns = new_columns
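For what it's worth, pandas can do the same renaming in one step with DataFrame.add_suffix, which appends a suffix to every column label:

df = df.add_suffix('_CLOSE')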



