In this document: http://infocenter.arm.com/help/topic/com.arm.doc.ddi0301g/DDI0301G_arm1176jzfs_r0p7_trm.pdf

on page 21-25 (PDF page 875), the throughput and latency timings are given for the assembly instructions of the VFP unit.

Are those numbers independent of the vector size?

1: Let's take FMULS, which has a throughput of 1 and a latency of 8. Does that mean I can start a new FMULS operation every cycle, as long as it doesn't use a register that is still being calculated by a previous instruction? For example:

```
FMULS s8, s16, s20
FMULS s12, s21, s25
```

Will those execute right after each other?

2: What happens if I have two FMULS instructions after each other, where one operand of the second depends on the result of the first?

```
FMULS s8, s16, s20
FMULS s12, s21, s8
```

Will the VFP wait for 8 cycles before starting to process the second instruction?

3: What if we are in vector mode with 4 elements, and for the second FMULS instruction all input registers but one are available? What will happen?
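
To make question 3 concrete, here is a sketch of the kind of sequence I mean. It is only an illustration under some assumptions: the register choices are arbitrary, LEN is set to 4 with STRIDE 1 through FPSCR, comments use GNU as `@` syntax, and it relies on my understanding that a short vector wraps around within its register bank.

```
FMRX  r0, FPSCR
BIC   r0, r0, #0x00370000   @ clear the LEN and STRIDE fields
ORR   r0, r0, #0x00030000   @ LEN = 4 (encoded as 3), STRIDE = 1
FMXR  FPSCR, r0

FMULS s8, s16, s24          @ s8..s11 = s16..s19 * s24..s27
FMULS s20, s28, s13         @ s20..s23 = s28..s31 * {s13, s14, s15, s8}
```

Here only the last element of the second FMULS (the one reading s8) depends on the first FMULS; the other three input elements are ready immediately.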

4: Sqrt and division: will a sqrt or division operation prevent any subsequent operation from being started for 19 cycles?
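
For example, assuming scalar mode (LEN = 1) and arbitrarily chosen registers, would the FMULS below have to wait for the FSQRTS to finish even though it does not use s8?

```
FSQRTS s8, s16
FMULS s12, s21, s25
```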

Thanks!