Earlier discussions of the multicellular architecture kept returning to one question: how well it suits a given task depends on how much natural parallelism that task contains. When running benchmarks such as CoreMark, we noted that these programs are a poor fit for the architecture. The algorithm is rigidly sequential, so the cells within a group cannot extract enough instructions to execute simultaneously. In this article, we evaluate the Multiclet under more representative conditions, using the Whetstone benchmark.

Compiler flags: `-ffast-math -fno-builtin -O3`

| System | MHz | MWIPS / MHz | MFLOPS1 / MHz | MFLOPS2 / MHz | MFLOPS3 / MHz | COS MOPS / MHz | EXP MOPS / MHz | FIXPT MOPS / MHz | IF MOPS / MHz | EQUAL MOPS / MHz |
|---|---|---|---|---|---|---|---|---|---|---|
| Multiclet R1 * | 100 | 0.721 | 0.256 | 0.212 | 0.162 | 0.018 | 0.008 | 3.569 | 0.417 | 1.57 |
| RPi 2 v7-A7 | 1000 | 0.585 | 0.28 | 0.291 | 0.248 | 0.011 | 0.006 | 1.314 | 1.209 | 0.981 |
| RPi 3 v8-A53 | 1200 | 0.604 | 0.276 | 0.29 | 0.248 | 0.01 | 0.007 | 1.267 | 1.561 | 1.014 |
| ARM v8-A53 | 1300 | 0.642 | 0.268 | 0.241 | 0.239 | 0.028 | 0.004 | 1.197 | 1.436 | 0.439 |
| Core i7 4820K | 3900 | 0.887 | 0.341 | 0.308 | 0.167 | 0.023 | 0.014 | 0.998 | 1.504 | 0.251 |
| Core i7 1 CP | 3066 | 0.873 | 0.325 | 0.295 | 0.174 | 0.025 | 0.013 | 0.892 | 0.958 | 0.167 |
| Phenom II | 3000 | 0.799 | 0.307 | 0.27 | 0.111 | 0.026 | 0.016 | 0.835 | 1.001 | 0.167 |
| Athlon 64 | 2211 | 0.785 | 0.308 | 0.272 | 0.104 | 0.026 | 0.016 | 0.832 | 0.999 | 0.187 |
| Turion 64 M | 1900 | 0.884 | 0.302 | 0.258 | 0.145 | 0.026 | 0.016 | 0.827 | 0.988 | 0.187 |
| Core i5 2467M | 2300 | 0.853 | 0.296 | 0.298 | 0.163 | 0.022 | 0.013 | 0.807 | 0.993 | 0.222 |
| Core 2 Duo 1 CP | 2400 | 0.885 | 0.337 | 0.307 | 0.198 | 0.024 | 0.012 | 0.804 | 0.81 | 0.176 |
| Celeron C2 M | 2000 | 0.868 | 0.297 | 0.296 | 0.194 | 0.023 | 0.012 | 0.778 | 0.781 | 0.172 |
| Core 2 duo m | 1830 | 0.878 | 0.337 | 0.305 | 0.197 | 0.024 | 0.012 | 0.751 | 0.785 | 0.174 |
| Multiclet R1 | 100 | 0.311 | 0.157 | 0.153 | 0.029 | 0.018 | 0.008 | 0.714 | 0.081 | 0.143 |
| Celeron m | 1295 | 0.832 | 0.324 | 0.297 | 0.178 | 0.022 | 0.012 | 0.631 | 0.923 | 0.173 |
| Raspberry pi | 1000 | 0.391 | 0.137 | 0.146 | 0.123 | 0.009 | 0.004 | 0.617 | 1.014 | 0.805 |
| Athlon XP | 2088 | 0.856 | 0.307 | 0.274 | 0.139 | 0.026 | 0.016 | 0.576 | 0.998 | 0.166 |
| Pentium pro | 200 | 0.79 | 0.332 | 0.278 | 0.146 | 0.023 | 0.013 | 0.575 | 0.755 | 0.149 |
| Athlon4 barton | 1800 | 0.846 | 0.305 | 0.272 | 0.137 | 0.026 | 0.016 | 0.571 | 0.988 | 0.165 |
| Celeron a | 450 | 0.76 | 0.291 | 0.276 | 0.14 | 0.022 | 0.012 | 0.569 | 0.751 | 0.147 |
| Pentium 4E | 3000 | 0.39 | 0.182 | 0.164 | 0.058 | 0.014 | 0.006 | 0.323 | 0.27 | 0.126 |
| Atom m | 1600 | 0.348 | 0.176 | 0.157 | 0.051 | 0.01 | 0.007 | 0.252 | 0.744 | 0.11 |
| Pentium 4 | 1900 | 0.383 | 0.214 | 0.188 | 0.056 | 0.012 | 0.006 | 0.241 | 0.427 | 0.118 |
| Pentium MMX | 200 | 0.615 | 0.328 | 0.267 | 0.079 | 0.025 | 0.013 | 0.198 | 0.73 | 0.186 |
| Pentium | 100 | 0.604 | 0.322 | 0.267 | 0.078 | 0.025 | 0.013 | 0.192 | 0.568 | 0.183 |
| 80486DX2 | 66 | 0.182 | 0.076 | 0.068 | 0.026 | 0.008 | 0.005 | 0.105 | 0.212 | 0.017 |

\* Variant built with the LLVM compiler's advanced optimizations.

```c
timea = dtime();
{
    for (ix=0; ix<xtra; ix++)
    {
        for(i=0; i<n1*n1mult; i+=5)
        {
            e1[0] = (e1[0] + e1[1] + e1[2] - e1[3]) * t;
            e1[1] = (e1[0] + e1[1] - e1[2] + e1[3]) * t;
            e1[2] = (e1[0] - e1[1] + e1[2] + e1[3]) * t;
            e1[3] = (-e1[0] + e1[1] + e1[2] + e1[3]) * t;
        }
        t = 1.0 - t;
    }
    t = t0;
}
timeb = dtime();
```

```
jmp LBB2_4
SR4 := rdq #IR7, 2160
SR5 := rdq #IR7, 2152
SR6 := rdq #IR7, 2144
SR7 := rdq #IR7, 2136
SR8 := rdq #IR7, 2128
SR9 := rdq #IR7, 2120
SR10:= subf @SR6, @SR5
SR11:= subf @SR5, @SR6
SR12:= addf @SR5, @SR6
SR5 := addf @SR10, @SR4
SR10:= addf @SR11, @SR4
SR4 := addf @SR10, @SR7
SR7 := mulf @SR4, @SR8
SR10:= addf @SR5, @SR7
SR5 := subf @SR4, @SR10
SR11:= subf @SR10, @SR4
SR4 := mulf @SR10, @SR8
SR10:= mulf @SR5, @SR8
SR5 := addf @SR12, @SR10
SR10:= addf @SR11, @SR5
SR11:= mulf @SR5, @SR8
SR5 := mulf @SR10, @SR8
SR10:= addf @SR5, @SR6
SR5 := mulf @SR10, @SR8
wrq @SR9, #IR7, 2760
wrq @SR5, #IR7, 2752
wrq @SR11, #IR7, 2744
wrq @SR4, #IR7, 2736
wrq @SR7, #IR7, 2728
```

```
jmp LBB2_4
SR4 := rdq #IR7, 272
SR5 := rdq #IR7, 264
SR6 := rdq #IR7, 256
SR7 := rdq #IR7, 248
SR8 := rdq #IR7, 320
SR9 := rdq #IR7, 240
SR10 := rdq #IR7, 232
SR11 := addf @SR4, @SR5
SR5 := addf @SR7, @SR6
SR12 := addf @SR11, @SR6
SR13 := subf @SR12, @SR7
SR12 := mulf @SR13, @SR8
SR14 := addf @SR12, @SR4
SR4 := addf @SR14, @SR11
SR11 := subf @SR14, @SR6
SR6 := addf @SR11, @SR7
SR11 := subf @SR13, @SR6
SR12 := subf @SR6, @SR13
SR13 := mulf @SR11, @SR8
SR11 := addf @SR5, @SR13
SR5 := addf @SR12, @SR11
SR12 := addf @SR11, @SR4
SR13 := mulf @SR5, @SR8
SR5 := mulf @SR12, @SR8
SR12 := addf @SR13, @SR7
SR7 := mulf @SR12, 0x3f000000
SR12 := subf @SR5, @SR7
SR5 := addf @SR12, @SR6
SR6 := addf @SR12, @SR11
SR13 := subf @SR5, @SR11
SR11 := addf @SR5, @SR4
SR4 := mulf @SR13, @SR8
SR5 := mulf @SR11, @SR9
SR11 := addf @SR4, @SR7
SR4 := subf @SR11, @SR12
SR12 := subf @SR6, @SR11
SR6 := mulf @SR12, @SR8
SR11 := subf @SR13, @SR12
SR12 := addf @SR6, @SR7
SR6 := mulf @SR8, @SR11
SR11 := addf @SR4, @SR12
SR4 := mulf @SR12, @SR8
SR13 := mulf @SR11, @SR8
SR11 := addf @SR4, @SR5
SR4 := addf @SR13, @SR7
SR5 := mulf @SR4, 0x3f000000
SR4 := subf @SR11, @SR5
SR7 := addf @SR4, @SR12
SR12 := addf @SR6, @SR4
SR6 := mulf @SR12, @SR8
SR12 := addf @SR6, @SR5
SR13 := addf @SR6, @SR11
SR6 := subf @SR12, @SR4
SR4 := subf @SR7, @SR12
SR7 := mulf @SR4, @SR8
SR4 := addf @SR7, @SR5
SR7 := addf @SR6, @SR4
SR6 := addf @SR4, @SR13
SR11 := mulf @SR7, @SR8
SR7 := mulf @SR6, @SR8
SR6 := addf @SR11, @SR5
SR5 := mulf @SR6, 0x3f000000
SR6 := subf @SR7, @SR5
SR7 := addf @SR6, @SR12
SR11 := addf @SR6, @SR4
SR12 := subf @SR7, @SR4
SR4 := addf @SR7, @SR13
SR8 := moveq @SR8
SR9 := moveq @SR9
SR10 := moveq @SR10
SR7 := mulf @SR12, @SR8
SR13 := mulf @SR4, @SR9
SR4 := addf @SR7, @SR5
SR7 := subf @SR4, @SR6
SR6 := subf @SR11, @SR4
SR4 := mulf @SR6, @SR8
SR9 := subf @SR12, @SR6
SR6 := addf @SR4, @SR5
SR4 := mulf @SR8, @SR9
SR9 := addf @SR7, @SR6
SR7 := mulf @SR6, @SR8
SR11 := mulf @SR9, @SR8
SR9 := addf @SR7, @SR13
SR7 := addf @SR11, @SR5
SR5 := mulf @SR7, 0x3f000000
SR7 := subf @SR9, @SR5
SR11 := addf @SR7, @SR6
SR6 := addf @SR4, @SR7
SR4 := mulf @SR6, @SR8
SR12 := addf @SR4, @SR5
SR13 := addf @SR4, @SR9
SR4 := subf @SR12, @SR7
SR7 := subf @SR11, @SR12
SR9 := mulf @SR7, @SR8
SR11 := subf @SR6, @SR7
SR6 := addf @SR9, @SR5
SR7 := mulf @SR8, @SR11
SR9 := addf @SR4, @SR6
SR4 := addf @SR13, @SR6
SR11 := mulf @SR9, @SR8
SR9 := mulf @SR4, @SR8
SR4 := addf @SR11, @SR5
SR5 := mulf @SR4, 0x3f000000
SR4 := subf @SR9, @SR5
SR9 := addf @SR7, @SR4
SR7 := addf @SR4, @SR6
SR6 := mulf @SR4, @SR8
SR11 := mulf @SR9, @SR8
SR9 := addf @SR11, @SR5
SR11 := subf @SR7, @SR9
SR7 := subf @SR9, @SR4
SR4 := mulf @SR9, @SR8
SR9 := mulf @SR11, @SR8
SR11 := addf @SR9, @SR5
SR9 := addf @SR7, @SR11
SR7 := mulf @SR11, @SR8
SR11 := mulf @SR9, @SR8
SR8 := addf @SR11, @SR5
SR5 := mulf @SR8, 0x3f000000
wrq @SR10, #IR7, 384
wrq @SR6, #IR7, 376
wrq @SR4, #IR7, 368
wrq @SR7, #IR7, 360
wrq @SR5, #IR7, 352
```

Source: https://habr.com/ru/post/307512/