Raspberry PiのBluetoothをCommon Lispから使えなかった

rp3bのbluetoothcommon lispから使いたい。 bluezのラッパーから使うだけだけどc2ffiを実行できるのか?

とりあえずhu.dwim.bluezをロード

ros use sbcl-bin
ros run
(ql:quickload :hu.dwim.bluez)
; caught ERROR:
;   READ error during COMPILE-FILE: unmatched close parenthesisLine: 82, Column: 19, File-Position: 2874Stream: #<SB-INT:FORM-TRACKING-STREAM for "file /home/pi/.roswell/lisp/quicklisp/dists/quicklisp/software/cffi_0.17.1/toolchain/c-toolchain.lisp" {520B3241}>
While evaluating the form starting at line 30, column 0
  of #P"/home/pi/.roswell/lisp/quicklisp/dists/quicklisp/software/cffi_0.17.1/cffi-libffi.asd":

debugger invoked on a LOAD-SYSTEM-DEFINITION-ERROR: Error while trying to load definition for system cffi-libffi from pathname /home/pi/.roswell/lisp/quicklisp/dists/quicklisp/software/cffi_0.17.1/cffi-libffi.asd: COMPILE-FILE-ERROR while compiling #<CL-SOURCE-FILE "cffi-toolchain" "toolchain" "c-toolchain">

Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.

restarts (invokable by number or by possibly-abbreviated name):
  0: [RETRY                        ] Retry compiling #<CL-SOURCE-FILE "cffi-toolchain" "toolchain" "c-toolchain">.
  1: [ACCEPT                       ] Continue, treating compiling #<CL-SOURCE-FILE "cffi-toolchain" "toolchain" "c-toolchain"> as having been successful.
  2: [RETRY                        ] Retry EVAL of current toplevel form.
  3: [CONTINUE                     ] Ignore error and continue loading file "/home/pi/.roswell/lisp/quicklisp/dists/quicklisp/software/cffi_0.17.1/cffi-libffi.asd".
  4: [ABORT                        ] Abort loading file "/home/pi/.roswell/lisp/quicklisp/dists/quicklisp/software/cffi_0.17.1/cffi-libffi.asd".
  5:                                 Retry ASDF operation.
  6: [CLEAR-CONFIGURATION-AND-RETRY] Retry ASDF operation after resetting the configuration.
  7:                                 Give up on "hu.dwim.bluez"
  8:                                 Exit debugger, returning to top level.

((FLET #:H0 :IN LOAD-ASD) #<COMPILE-FILE-ERROR {5219A129}>)
0] 

閉じ括弧が合わないコンパイルエラーになるぞ。どういうことだ。。。

c2ffiのインストール

GitHub - rpav/c2ffi: Clang-based FFI wrapper generatorをビルドしたいのでllvmとかをインストール。

sudo apt-get install clang llvm libtool-bin libclang-dev

masterのreadme見ると3.5はサポートされてないがどうするか。 と、おもったらllvm-3.5のブランチもあった。

git clone https://github.com/rpav/c2ffi
cd c2ffi
git checkout llvm-3.5
./autogen
mkdir build
cd build
../configure
make
./src/c2ffi -h
make install

hu.dwim.bluezの修正を試みる

とりあえずソースコードをもらってくる。

git clone https://github.com/attila-lendvai/hu.dwim.bluez
cd .roswell/local-projects/
ln -s ~/hu.dwim.bluez/
cd
ros run
(ql:quickload :hu.dwim.bluez)
To load "hu.dwim.bluez":
  Load 1 ASDF system:
    hu.dwim.bluez
; Loading "hu.dwim.bluez"
; 
; caught ERROR:
;   READ error during COMPILE-FILE: unmatched close parenthesisLine: 82, Column: 19, File-Position: 2874Stream: #<SB-INT:FORM-TRACKING-STREAM for "file /home/pi/.roswell/lisp/quicklisp/dists/quicklisp/software/cffi_0.17.1/toolchain/c-toolchain.lisp" {516C5241}>
While evaluating the form starting at line 30, column 0
  of #P"/home/pi/.roswell/lisp/quicklisp/dists/quicklisp/software/cffi_0.17.1/cffi-libffi.asd":

debugger invoked on a LOAD-SYSTEM-DEFINITION-ERROR: Error while trying to load definition for system cffi-libffi from pathname /home/pi/.roswell/lisp/quicklisp/dists/quicklisp/software/cffi_0.17.1/cffi-libffi.asd: COMPILE-FILE-ERROR while compiling #<CL-SOURCE-FILE "cffi-toolchain" "toolchain" "c-toolchain">

Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.

restarts (invokable by number or by possibly-abbreviated name):
  0: [RETRY                        ] Retry compiling #<CL-SOURCE-FILE "cffi-toolchain" "toolchain" "c-toolchain">.
  1: [ACCEPT                       ] Continue, treating compiling #<CL-SOURCE-FILE "cffi-toolchain" "toolchain" "c-toolchain"> as having been successful.
  2: [RETRY                        ] Retry EVAL of current toplevel form.
  3: [CONTINUE                     ] Ignore error and continue loading file "/home/pi/.roswell/lisp/quicklisp/dists/quicklisp/software/cffi_0.17.1/cffi-libffi.asd".
  4: [ABORT                        ] Abort loading file "/home/pi/.roswell/lisp/quicklisp/dists/quicklisp/software/cffi_0.17.1/cffi-libffi.asd".
  5:                                 Retry ASDF operation.
  6: [CLEAR-CONFIGURATION-AND-RETRY] Retry ASDF operation after resetting the configuration.
  7:                                 Give up on "hu.dwim.bluez"
  8:                                 Exit debugger, returning to top level.

((FLET #:H0 :IN LOAD-ASD) #<COMPILE-FILE-ERROR {517AB901}>)
0] 

よく見たらcffiか。。。

githubからcffiをもらってくる

git clone https://github.com/cffi/cffi
cd ~/.roswell/local-projects
ln -s ~/cffi
cd
ros run
(ql:quickload :hu.dwim.bluez)
To load "hu.dwim.bluez":
  Load 1 ASDF system:
    hu.dwim.bluez
; Loading "hu.dwim.bluez"
; pkg-config libffi --cflags

.; cc -marm -o /home/pi/.cache/common-lisp/sbcl-1.3.9-linux-arm/sbcl-bin-1.3.9/home/pi/cffi/libffi/libffi-types__grovel-tmp7LQ0A0VI -I/home/pi/cffi/ /home/pi/.cache/common-lisp/sbcl-1.3.9-linux-arm/sbcl-bin-1.3.9/home/pi/cffi/libffi/libffi-types__grovel.c
/home/pi/.cache/common-lisp/sbcl-1.3.9-linux-arm/sbcl-bin-1.3.9/home/pi/cffi/libffi/libffi-types__grovel.c: In function ‘main’:
/home/pi/.cache/common-lisp/sbcl-1.3.9-linux-arm/sbcl-bin-1.3.9/home/pi/cffi/libffi/libffi-types__grovel.c:82:41: error: ‘FFI_UNIX64’ undeclared (first use in this function)
   fprintf(output, "%"PRIiMAX, (intmax_t)FFI_UNIX64);
                                         ^
/home/pi/.cache/common-lisp/sbcl-1.3.9-linux-arm/sbcl-bin-1.3.9/home/pi/cffi/libffi/libffi-types__grovel.c:82:41: note: each undeclared identifier is reported only once for each function it appears in

debugger invoked on a CFFI-GROVEL:GROVEL-ERROR: Subprocess #<UIOP/RUN-PROGRAM::PROCESS-INFO {524CBF21}>
 with command ("cc" "-marm" "-o" "/home/pi/.cache/common-lisp/sbcl-1.3.9-linux-arm/sbcl-bin-1.3.9/home/pi/cffi/libffi/libffi-types__grovel-tmp7LQ0A0VI" "-I/home/pi/cffi/" "/home/pi/.cache/common-lisp/sbcl-1.3.9-linux-arm/sbcl-bin-1.3.9/home/pi/cffi/libffi/libffi-types__grovel.c")
 exited with error code 1

Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.

restarts (invokable by number or by possibly-abbreviated name):
  0: [RETRY                        ] Retry PROCESS-OP on #<GROVEL-FILE "cffi-libffi" "libffi" "libffi-types">.
  1: [ACCEPT                       ] Continue, treating PROCESS-OP on #<GROVEL-FILE "cffi-libffi" "libffi" "libffi-types"> as having been successful.
  2:                                 Retry ASDF operation.
  3: [CLEAR-CONFIGURATION-AND-RETRY] Retry ASDF operation after resetting the configuration.
  4: [ABORT                        ] Give up on "hu.dwim.bluez"
  5:                                 Exit debugger, returning to top level.

(CFFI-GROVEL:GROVEL-ERROR "~a" #<UIOP/RUN-PROGRAM:SUBPROCESS-ERROR {524CC061}>)
0] 

FFI_UNIX64 はなんか32ビットarmのlibffiにはないっぽい。

diff --git a/libffi/libffi-types.lisp b/libffi/libffi-types.lisp
index 939a87b..1b47c98 100644
--- a/libffi/libffi-types.lisp
+++ b/libffi/libffi-types.lisp
@@ -69,6 +69,7 @@
 (cenum abi
  ((:default-abi "FFI_DEFAULT_ABI"))
  ((:sysv "FFI_SYSV"))
+ #-arm
  ((:unix64 "FFI_UNIX64")))
 
 (ctype ffi-abi "ffi_abi")

修正してもう一度 hu.dwim.bleuz をロード。

(ql:quickload :hu.dwim.bluez)
; Loading "hu.dwim.bluez"
; CFFI/C2FFI is generating the file #P"/home/pi/hu.dwim.bluez/c2ffi-spec/bluez.arm-pc-linux-gnu.lisp"
........
debugger invoked on a COMMON-LISP:SIMPLE-ERROR: Key :STORAGE-CLASS not found in json entry ((:TAG . "function") (:NAME . "close") (:LOCATION . "/usr/include/unistd.h:353:12") (:VARIADIC) (:INLINE) (:STORAGE--CLASS . "extern") (:PARAMETERS ((:TAG . "parameter") (:NAME . "__fd") (:TYPE (:TAG . ":int")))) (:RETURN-TYPE (:TAG . ":int"))).

Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.

restarts (invokable by number or by possibly-abbreviated name):
  0: [RETRY                        ] Retry GENERATE-LISP-OP on #<C2FFI-FILE "hu.dwim.bluez" "c2ffi-spec" "bluez.h">.
  1: [ACCEPT                       ] Continue, treating GENERATE-LISP-OP on #<C2FFI-FILE "hu.dwim.bluez" "c2ffi-spec" "bluez.h"> as having been successful.
  2:                                 Retry ASDF operation.
  3: [CLEAR-CONFIGURATION-AND-RETRY] Retry ASDF operation after resetting the configuration.
  4: [ABORT                        ] Give up on "hu.dwim.bluez"
  5:                                 Exit debugger, returning to top level.

(CFFI/C2FFI::JSON-VALUE #<unavailable argument> #<unavailable argument> :OTHERWISE COMMON-LISP:NIL)
0] 

:STORAGE-CLASS not found?

なかなか上手く行かない。。。

Raspberry PiでCommon Lispの起動が遅い

Raspberry Pi 3 Model B買ってCommon Lisp入れた - gos-k’s blog でちょっと触っていて起動が遅い気がするのでバージョン確認出来たものについてとりあえず時間計測する。

ros use sbcl-bin
time ros run -e "(quit)"
real   0m26.860s
user    0m24.720s
sys 0m0.480s
ros use ccl-bin
time ros run -e "(quit)"
real 0m8.136s
user    0m7.890s
sys 0m0.170s
ros use ecl
time ros run -e "(quit)"
real   0m7.412s
user    0m7.050s
sys 0m0.100s

sbclが極端に遅い?

pc側で何となくstraceして、何となくwcしてみる。

strace -o sbcl-strace-quit.txt ros run -L sbcl-bin -e "(quit)"
strace -o ccl-strace-quit.txt ros run -L ccl-bin -e "(quit)"
strace -o ecl-strace-quit.txt ros run -L ecl -e "(quit)"
wc *-strace-quit.txt
   1407    9939  106260 ccl-strace-quit.txt
   6254   38242  632461 ecl-strace-quit.txt
   7705   46700  728444 sbcl-strace-quit.txt

cclが少なめだな。 raspi側で何となくstraceして、何となくwcしてみる。

   1216    7387   82418 ccl-strace-quit.txt
   6867   40908  699180 ecl-strace-quit.txt
  25343  151090 1769082 sbcl-strace-quit.txt

他は数割だけど、sbclの行数が3倍以上違うのなんだろ? プロセッサのアーキテクチャが違うとはいえ、同じosの同じ処理系でこんなに呼び出し回数違うもんなのか?

raspi上でpython, ruby, nodeあたりを起動してみて時間は計ってないけど、体感的にはenterキー押すのと処理系の起動にタイムラグがほぼないので1秒より遥に短いはず。

sbclの結果をざっと眺めると cacheflush を連発してるからこれが遅いのか? あとopenの前にパスの全部にlstat出しまくってるのもこれなんんだろ? まあいずれにしてもCommon Lispは現状処理系選んで起動に8秒か。。。

Raspberry Pi 3 Model B買ってCommon Lisp入れた

Raspberry Pi 3 Model B - Raspberry Pi を購入したのでCommon Lispを動かしてみたメモ。

NOOBSのSDカード作成

PC上で Download NOOBS for Raspberry Pi から NOOBS LITE v1.9をダウンロードし、 NOOBS For Raspberry Pi – Rants & Raves – The Blog! を参考にしてSDカードを作成する。

Raspbianインストール

よく分からないので公式のRaspbianをインストールする。

どこかにありそうな情報を取得

lscpu
Architecture:          armv7l
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1
Model name:            ARMv7 Processor rev 4 (v7l)
CPU max MHz:           1200.0000
CPU min MHz:           600.0000
uname -a
Linux raspberrypi 4.4.11-v7+ #888 SMP Mon May 23 20:10:33 BST 2016 armv7l GNU/Linux

公式には ARMv8 て書いてあるけど v7 なのか?

Linuxbrewインストール

sudo apt-get install build-essential curl git python-setuptools ruby
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Linuxbrew/install/master/install)"
echo 'Pexport ATH="$HOME/.linuxbrew/bin:$PATH"' >> .profile
. ~/.profile
brew install hello

動かない。。。

==> Downloading https://ftpmirror.gnu.org/hello/hello-2.10.tar.gz
==> Downloading from http://ftp.jaist.ac.jp/pub/GNU/hello/hello-2.10.tar.gz
######################################################################## 100.0%
==> ./configure --disable-silent-rules --prefix=/home/pi/.linuxbrew/Cellar/hello/2.10
*** Error in `/usr/bin/gcc-4.9': double free or corruption (top): 0x01e66718 ***
Last 15 lines from /home/pi/.cache/Homebrew/Logs/hello/01.configure:
--prefix=/home/pi/.linuxbrew/Cellar/hello/2.10

configure: WARNING: unrecognized options: --disable-debug
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... no
checking for mawk... mawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking for gcc... /usr/bin/gcc-4.9
checking whether the C compiler works... no
configure: error: in `/tmp/hello-20160910-2414-1lq40k3/hello-2.10':
configure: error: C compiler cannot create executables
See `config.log' for more details

READ THIS: https://github.com/Linuxbrew/brew/blob/master/share/doc/homebrew/Troubleshooting.md#troubleshooting
If reporting this issue please do so at (not Homebrew/brew):
  https://github.com/Linuxbrew/homebrew-core/issues

C compiler cannot create executables って何だ? double free or corruption の方か? alternativesを使ってgccを4.8に変えてみたが、linuxbrewの使うgccバージョンが4.9から変更されなかったのでとりあえず諦める。

Roswellインストール

linuxbrewを使わず直接roswellをビルドする。

sudo apt-get install autoconf automake libcurl4-openssl-dev
git clone https://github.com/roswell/roswell.git
cd roswell
./bootstrap
./configure
make
make install
ros init
ros --version
roswell 0.0.6.65(00f7451)

処理系色々インストール

SBCL

ros install sbcl-bin
ros use sbcl-bin
ros run -- --version
SBCL 1.3.9

CCL

ros install ccl-bin
ros use ccl-bin
ros run -- --version
Version 1.11-r16635  (LinuxARM32)

ECL

ros install ecl
ros use ecl
ros run -- --version
ECL 16.1.2

CMUL

ros install cmu-bin
Installing cmu-bin/21a...
Downloading archive:https://common-lisp.net/project/cmucl/downloads/release/21a/cmucl-21a-armhf-linux.tar.bz2
Downloading https://common-lisp.net/project/cmucl/downloads/release/21a/cmucl-21a-armhf-linux.tar.bz2
Unhandled UIOP/RUN-PROGRAM:SUBPROCESS-ERROR:
  Subprocess with command "/usr/local/bin/ros roswell-internal-use download https://common-lisp.net/project/cmucl/downloads/release/21a/cmucl-21a-armhf-linux.tar.bz2 /home/pi/.roswell/archives/cmucl-21a-armhf-linux.tar.bz2"
 exited with error code 2

Backtrace for: #<SB-THREAD:THREAD "main thread" RUNNING {52D82811}>
0: (SB-DEBUG::DEBUGGER-DISABLED-HOOK #<UIOP/RUN-PROGRAM:SUBPROCESS-ERROR {52A38E91}> #<unavailable argument>)
1: (SB-DEBUG::RUN-HOOK SB-EXT:*INVOKE-DEBUGGER-HOOK* #<UIOP/RUN-PROGRAM:SUBPROCESS-ERROR {52A38E91}>)
2: (INVOKE-DEBUGGER #<UIOP/RUN-PROGRAM:SUBPROCESS-ERROR {52A38E91}>)
3: (CERROR "IGNORE-ERROR-STATUS" UIOP/RUN-PROGRAM:SUBPROCESS-ERROR :COMMAND "/usr/local/bin/ros roswell-internal-use download https://common-lisp.net/project/cmucl/downloads/release/21a/cmucl-21a-armhf-linux.tar.bz2 /home/pi/.roswell/archives/cmucl-21a-armhf-linux.tar.bz2" :CODE 2 :PROCESS NIL)
4: (UIOP/RUN-PROGRAM::%CHECK-RESULT 2 :COMMAND "/usr/local/bin/ros roswell-internal-use download https://common-lisp.net/project/cmucl/downloads/release/21a/cmucl-21a-armhf-linux.tar.bz2 /home/pi/.roswell/archives/cmucl-21a-armhf-linux.tar.bz2" :PROCESS NIL :IGNORE-ERROR-STATUS NIL)
5: ((LAMBDA (UIOP/RUN-PROGRAM::REDUCED-INPUT #:G7867) :IN UIOP/RUN-PROGRAM::%USE-SYSTEM) :INTERACTIVE #<unavailable argument>)
6: (UIOP/RUN-PROGRAM::%USE-SYSTEM "/usr/local/bin/ros roswell-internal-use download https://common-lisp.net/project/cmucl/downloads/release/21a/cmucl-21a-armhf-linux.tar.bz2 /home/pi/.roswell/archives/cmucl-21a-armhf-linux.tar.bz2" :INPUT :INTERACTIVE :ERROR-OUTPUT :INTERACTIVE :IF-INPUT-DOES-NOT-EXIST :ERROR :IF-OUTPUT-EXISTS :OVERWRITE :IF-ERROR-OUTPUT-EXISTS :OVERWRITE :ELEMENT-TYPE :DEFAULT :EXTERNAL-FORMAT :UTF-8 :OUTPUT :INTERACTIVE :ERROR-OUTPUT :INTERACTIVE)
7: (ROS:ROSWELL ("roswell-internal-use" "download" "https://common-lisp.net/project/cmucl/downloads/release/21a/cmucl-21a-armhf-linux.tar.bz2" #P"/home/pi/.roswell/archives/cmucl-21a-armhf-linux.tar.bz2") :INTERACTIVE NIL)
8: (ROS.INSTALL::CMU-BIN-DOWNLOAD (:TARGET "cmu-bin" :VERSION "21a" :ARGV NIL))
9: (INSTALL-IMPL "cmu-bin" NIL NIL)
10: (INSTALL-IMPL-IF-PROBED "cmu-bin" NIL NIL)
11: (MAIN #<unused argument> "cmu-bin")
12: (SB-INT:SIMPLE-EVAL-IN-LEXENV (APPLY (QUOTE MAIN) ROS:*ARGV*) #<NULL-LEXENV>)
13: (SB-INT:SIMPLE-EVAL-IN-LEXENV (ROS:QUIT (APPLY (QUOTE MAIN) ROS:*ARGV*)) #<NULL-LEXENV>)
14: (SB-EXT:EVAL-TLF (ROS:QUIT (APPLY (QUOTE MAIN) ROS:*ARGV*)) NIL NIL)
15: ((LABELS SB-FASL::EVAL-FORM :IN SB-INT:LOAD-AS-SOURCE) (ROS:QUIT (APPLY (QUOTE MAIN) ROS:*ARGV*)) NIL)
16: (SB-INT:LOAD-AS-SOURCE #<CONCATENATED-STREAM :STREAMS NIL {52C0C2A1}> :VERBOSE NIL :PRINT NIL :CONTEXT "loading")
17: ((FLET SB-FASL::LOAD-STREAM :IN LOAD) #<CONCATENATED-STREAM :STREAMS NIL {52C0C2A1}> NIL)
18: (LOAD #<CONCATENATED-STREAM :STREAMS NIL {52C0C2A1}> :VERBOSE NIL :PRINT NIL :IF-DOES-NOT-EXIST T :EXTERNAL-FORMAT :DEFAULT)
19: ((FLET ROS::BODY :IN ROS:SCRIPT) #<SB-SYS:FD-STREAM for "file /usr/local/etc/roswell/install.ros" {52C0C179}>)
20: (ROS:SCRIPT :SCRIPT "/usr/local/etc/roswell/install.ros" "install" "cmu-bin")
21: (ROS:RUN ((:SCRIPT "/usr/local/etc/roswell/install.ros" "install" "cmu-bin") (:QUIT NIL)))
22: (SB-INT:SIMPLE-EVAL-IN-LEXENV (ROS:RUN (QUOTE ((:SCRIPT "/usr/local/etc/roswell/install.ros" "install" "cmu-bin") (:QUIT NIL)))) #<NULL-LEXENV>)
23: (EVAL (ROS:RUN (QUOTE ((:SCRIPT "/usr/local/etc/roswell/install.ros" "install" "cmu-bin") (:QUIT NIL)))))
24: (SB-IMPL::PROCESS-EVAL/LOAD-OPTIONS ((:EVAL . "(progn #-ros.init(cl:load \"/usr/local/etc/roswell/init.lisp\"))") (:EVAL . "(ros:quicklisp)") (:EVAL . "(ros:run '((:script \"/usr/local/etc/roswell/install.ros\"\"install\"\"cmu-bin\")(:quit ())))")))
25: (SB-IMPL::TOPLEVEL-INIT)
26: ((FLET #:WITHOUT-INTERRUPTS-BODY-72 :IN SB-EXT:SAVE-LISP-AND-DIE))
27: ((LABELS SB-IMPL::RESTART-LISP :IN SB-EXT:SAVE-LISP-AND-DIE))

unhandled condition in --disable-debugger mode, quitting

ABCL

ros install abcl-bin
ros use abcl-bin
Invalid maximum heap size: -Xmx4g
The specified size exceeds the maximum representable size.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
Error: unable to use 'abcl-bin'

4G固定なのかな?

CLISP

ros install clisp
Installing clisp/2.49...
prefix: #P"/home/pi/.roswell/impls/armhf/linux/clisp/2.49/"
chdir /home/pi/.roswell/src/clisp-2.49/src/
Unhandled UIOP/RUN-PROGRAM:SUBPROCESS-ERROR:
  Subprocess with command "make install"
 exited with error code 2

Backtrace for: #<SB-THREAD:THREAD "main thread" RUNNING {52D85581}>
0: (SB-DEBUG::DEBUGGER-DISABLED-HOOK #<UIOP/RUN-PROGRAM:SUBPROCESS-ERROR {514A8129}> #<unavailable argument>)
1: (SB-DEBUG::RUN-HOOK SB-EXT:*INVOKE-DEBUGGER-HOOK* #<UIOP/RUN-PROGRAM:SUBPROCESS-ERROR {514A8129}>)
2: (INVOKE-DEBUGGER #<UIOP/RUN-PROGRAM:SUBPROCESS-ERROR {514A8129}>)
3: (CERROR "IGNORE-ERROR-STATUS" UIOP/RUN-PROGRAM:SUBPROCESS-ERROR :COMMAND "make install" :CODE 2 :PROCESS NIL)
4: (UIOP/RUN-PROGRAM::%CHECK-RESULT 2 :COMMAND "make install" :PROCESS NIL :IGNORE-ERROR-STATUS NIL)
5: ((LAMBDA (UIOP/RUN-PROGRAM::REDUCED-INPUT #:G7867) :IN UIOP/RUN-PROGRAM::%USE-SYSTEM) NIL #<unavailable argument>)
6: ((FLET #:BEFORE7398 :IN UIOP/RUN-PROGRAM::%CALL-WITH-PROGRAM-IO) #P"/tmp/tmpZWLLOVAH.tmp")
7: (UIOP/STREAM:CALL-WITH-TEMPORARY-FILE #<CLOSURE (FLET #:BEFORE7398 :IN UIOP/RUN-PROGRAM::%CALL-WITH-PROGRAM-IO) {72593B95}> :WANT-STREAM-P NIL :WANT-PATHNAME-P T :DIRECTION :IO :KEEP NIL :AFTER NIL :DIRECTORY NIL :TYPE "tmp" :PREFIX NIL :SUFFIX NIL :ELEMENT-TYPE NIL :EXTERNAL-FORMAT NIL)
8: ((LABELS UIOP/RUN-PROGRAM::HARD-CASE :IN UIOP/RUN-PROGRAM::%CALL-WITH-PROGRAM-IO))
9: (UIOP/RUN-PROGRAM::%USE-SYSTEM "make install" :INPUT NIL :ERROR-OUTPUT NIL :IF-INPUT-DOES-NOT-EXIST :ERROR :IF-OUTPUT-EXISTS :OVERWRITE :IF-ERROR-OUTPUT-EXISTS :OVERWRITE :ELEMENT-TYPE :DEFAULT :EXTERNAL-FORMAT :UTF-8 :OUTPUT T)
10: (ROS.INSTALL::CLISP-INSTALL (:TARGET "clisp" :VERSION "2.49" :ARGV NIL))
11: (INSTALL-IMPL "clisp" NIL NIL)
12: (INSTALL-IMPL-IF-PROBED "clisp" NIL NIL)
13: (MAIN #<unused argument> "clisp")
14: (SB-INT:SIMPLE-EVAL-IN-LEXENV (APPLY (QUOTE MAIN) ROS:*ARGV*) #<NULL-LEXENV>)
15: (SB-INT:SIMPLE-EVAL-IN-LEXENV (ROS:QUIT (APPLY (QUOTE MAIN) ROS:*ARGV*)) #<NULL-LEXENV>)
16: (SB-EXT:EVAL-TLF (ROS:QUIT (APPLY (QUOTE MAIN) ROS:*ARGV*)) NIL NIL)
17: ((LABELS SB-FASL::EVAL-FORM :IN SB-INT:LOAD-AS-SOURCE) (ROS:QUIT (APPLY (QUOTE MAIN) ROS:*ARGV*)) NIL)
18: (SB-INT:LOAD-AS-SOURCE #<CONCATENATED-STREAM :STREAMS NIL {52C0B261}> :VERBOSE NIL :PRINT NIL :CONTEXT "loading")
19: ((FLET SB-FASL::LOAD-STREAM :IN LOAD) #<CONCATENATED-STREAM :STREAMS NIL {52C0B261}> NIL)
20: (LOAD #<CONCATENATED-STREAM :STREAMS NIL {52C0B261}> :VERBOSE NIL :PRINT NIL :IF-DOES-NOT-EXIST T :EXTERNAL-FORMAT :DEFAULT)
21: ((FLET ROS::BODY :IN ROS:SCRIPT) #<SB-SYS:FD-STREAM for "file /usr/local/etc/roswell/install.ros" {52C0B139}>)
22: (ROS:SCRIPT :SCRIPT "/usr/local/etc/roswell/install.ros" "install" "clisp")
23: (ROS:RUN ((:SCRIPT "/usr/local/etc/roswell/install.ros" "install" "clisp") (:QUIT NIL)))
24: (SB-INT:SIMPLE-EVAL-IN-LEXENV (ROS:RUN (QUOTE ((:SCRIPT "/usr/local/etc/roswell/install.ros" "install" "clisp") (:QUIT NIL)))) #<NULL-LEXENV>)
25: (EVAL (ROS:RUN (QUOTE ((:SCRIPT "/usr/local/etc/roswell/install.ros" "install" "clisp") (:QUIT NIL)))))
26: (SB-IMPL::PROCESS-EVAL/LOAD-OPTIONS ((:EVAL . "(progn #-ros.init(cl:load \"/usr/local/etc/roswell/init.lisp\"))") (:EVAL . "(ros:quicklisp)") (:EVAL . "(ros:run '((:script \"/usr/local/etc/roswell/install.ros\"\"install\"\"clisp\")(:quit ())))")))
27: (SB-IMPL::TOPLEVEL-INIT)
28: ((FLET #:WITHOUT-INTERRUPTS-BODY-72 :IN SB-EXT:SAVE-LISP-AND-DIE))
29: ((LABELS SB-IMPL::RESTART-LISP :IN SB-EXT:SAVE-LISP-AND-DIE))

unhandled condition in --disable-debugger mode, quitting

サムライト退職

8月でサムライト株式会社を退職しました。理由はお察し下さい。

1年チョットの在職中、前半はCommon LispJavaScriptを書き、後半はCommon Lispを書いてました。

Webサービスの開発は初めてだったのでSQLに悩まされる日々ではありつつ良い経験だったと思う。

次の会社で何やるのかよく分からんけど、取り敢えずプライベートでCommon Lispを書きつつ、仕事もまだまだプログラマーを続けます。

 

MGLを触るためにとりあえず動作確認

Common Lisp機械学習ライブラリが一部で流行りなので、ちょっと触ってみた。

cl-cudaのquicklispへの登録が止まっているので、gitで直接もらってくる。

cd ~/.roswell/local-projects
git clone https://github.com/takagi/cl-cuda.git
git clone https://github.com/melisgl/mgl-mat.git
git clone https://github.com/melisgl/mgl.git

blaslapackをインストール。

apt-get install libblas-dev liblapack-dev

slimeを立ち上げmglをロードしようとすると、

(ql:quickload :mgl)

エラーとなる。

Component :OSICAT not found, required by #<SYSTEM "cl-cuda">

よく分からんけど、一度cl-cudaをロードしてからだと成功する。

(ql:quickload :cl-cuda)
(ql:quickload :mgl)

とりあえずmglのテストを実行。

(in-package :mgl-test)
(time (test))

なんかエラーになった。

arithmetic error FLOATING-POINT-INVALID-OPERATION signalled
   [Condition of type FLOATING-POINT-INVALID-OPERATION]

Restarts:
 0: [RETRY] Retry SLIME REPL evaluation request.
 1: [*ABORT] Return to SLIME's top level.
 2: [ABORT] abort thread (#<THREAD "repl-thread" RUNNING {10050A0033}>)

Backtrace:
  0: (SB-KERNEL:TWO-ARG-< #<SINGLE-FLOAT quiet NaN> 84.919205)
  1: (SB-VM::GENERIC-<)
  2: ((FLET MGL-CG::UPDATE3 :IN CG))
...

NaNと比較した? もう一度やり直してみるも結果は変わらず。

mysqlのパスワードなしアカウント

ローカルの開発用mysqlでパスワードなしのアカウントを作る場合に、

update user set password='' where User='root';

というのを見かけたがやっても上手くいかず、describe user;でフィールド見て、

update user set authentication_string='' where User='root';

とやったら出来たのだけど、バージョン違いだろうか?

Common Lispでdisassembleを使ったりチューニングしたり

Common Lispはdisassembleも付いてきてチューニングできるらしいとゆー話をよく聞くので、逆アセンブルしてみた。 題材としてはこの辺を参考に

caching-gemmがバグってたので厳密には同じコードではなくなっている。

処理系のバージョンは次の通り

  • SBCL 1.3.2
  • CCL 1.11

simple-gemm

単純なGEMM。

(defun simple-gemm (ma mb)
  (declare (optimize (speed 3) (debug 0) (safety 0)))
  (declare (type (simple-array single-float (* *)) ma mb))
  (let ((rows (array-dimension ma 0))
        (cols (array-dimension mb 1)))
    (declare (type fixnum rows cols))
    (let ((result (make-matrix rows cols)))
      (declare (type (simple-array single-float (* *)) result))
      (dotimes (row rows)
        (dotimes (col cols)
          (dotimes (k cols)
            (incf (aref result row col)
                  (* (aref ma row k) (aref mb k col))))))
      result)))

SBCLの場合。 普通っぽい。

CL-USER> (disassemble 'simple-gemm)
; disassembly for SIMPLE-GEMM
; Size: 283 bytes. Origin: #x1004C5CA65
; A65:       4C894DE8         MOV [RBP-24], R9                ; no-arg-parsing entry point
; A69:       4C8945E0         MOV [RBP-32], R8
; A6D:       4D8B6831         MOV R13, [R8+49]
; A71:       4D8B5139         MOV R10, [R9+57]
; A75:       4C896DF8         MOV [RBP-8], R13
; A79:       4C8955F0         MOV [RBP-16], R10
; A7D:       488D5C24F0       LEA RBX, [RSP-16]
; A82:       4883EC18         SUB RSP, 24
; A86:       498BD5           MOV RDX, R13
; A89:       498BFA           MOV RDI, R10
; A8C:       488B057DFFFFFF   MOV RAX, [RIP-131]              ; #<FDEFINITION for MAKE-MATRIX>
; A93:       B904000000       MOV ECX, 4
; A98:       48892B           MOV [RBX], RBP
; A9B:       488BEB           MOV RBP, RBX
; A9E:       FF5009           CALL QWORD PTR [RAX+9]
; AA1:       480F42E3         CMOVB RSP, RBX
; AA5:       4C8B45E0         MOV R8, [RBP-32]
; AA9:       4C8B4DE8         MOV R9, [RBP-24]
; AAD:       4C8B55F0         MOV R10, [RBP-16]
; AB1:       4C8B6DF8         MOV R13, [RBP-8]
; AB5:       488BF2           MOV RSI, RDX
; AB8:       31DB             XOR EBX, EBX
; ABA:       E9AF000000       JMP L5
; ABF:       90               NOP
; AC0: L0:   31C0             XOR EAX, EAX
; AC2:       E99A000000       JMP L4
; AC7:       660F1F840000000000 NOP
; AD0: L1:   31C9             XOR ECX, ECX
; AD2:       E981000000       JMP L3
; AD7:       660F1F840000000000 NOP
; AE0: L2:   498B5039         MOV RDX, [R8+57]
; AE4:       488BFB           MOV RDI, RBX
; AE7:       48D1FF           SAR RDI, 1
; AEA:       480FAFFA         IMUL RDI, RDX
; AEE:       4801CF           ADD RDI, RCX
; AF1:       498B5011         MOV RDX, [R8+17]
; AF5:       F30F104C7A01     MOVSS XMM1, [RDX+RDI*2+1]
; AFB:       498B5139         MOV RDX, [R9+57]
; AFF:       488BF9           MOV RDI, RCX
; B02:       48D1FF           SAR RDI, 1
; B05:       480FAFFA         IMUL RDI, RDX
; B09:       4801C7           ADD RDI, RAX
; B0C:       498B5111         MOV RDX, [R9+17]
; B10:       F30F10547A01     MOVSS XMM2, [RDX+RDI*2+1]
; B16:       F30F59D1         MULSS XMM2, XMM1
; B1A:       488B5639         MOV RDX, [RSI+57]
; B1E:       488BFB           MOV RDI, RBX
; B21:       48D1FF           SAR RDI, 1
; B24:       480FAFFA         IMUL RDI, RDX
; B28:       4801C7           ADD RDI, RAX
; B2B:       488B5611         MOV RDX, [RSI+17]
; B2F:       F30F104C7A01     MOVSS XMM1, [RDX+RDI*2+1]
; B35:       F30F58CA         ADDSS XMM1, XMM2
; B39:       488B5639         MOV RDX, [RSI+57]
; B3D:       488BFB           MOV RDI, RBX
; B40:       48D1FF           SAR RDI, 1
; B43:       480FAFFA         IMUL RDI, RDX
; B47:       4801C7           ADD RDI, RAX
; B4A:       488B5611         MOV RDX, [RSI+17]
; B4E:       F30F114C7A01     MOVSS [RDX+RDI*2+1], XMM1
; B54:       4883C102         ADD RCX, 2
; B58: L3:   4C39D1           CMP RCX, R10
; B5B:       7C83             JL L2
; B5D:       4883C002         ADD RAX, 2
; B61: L4:   4C39D0           CMP RAX, R10
; B64:       0F8C66FFFFFF     JL L1
; B6A:       4883C302         ADD RBX, 2
; B6E: L5:   4C39EB           CMP RBX, R13
; B71:       0F8C49FFFFFF     JL L0
; B77:       488BD6           MOV RDX, RSI
; B7A:       488BE5           MOV RSP, RBP
; B7D:       F8               CLC
; B7E:       5D               POP RBP
; B7F:       C3               RET

CCLの場合。 対応するLispコードがコメントに付いてて分かりやすい。 S式風謎アセンブラよく見るとよく分からないの色々あるな。 $が即値で%が変数かな? CCLのドキュメントちゃんと読まないと無理そう

CL-USER> (disassemble 'caching-gemm)
;; "home:sandbox;common-lisp;simple-gemm.lisp.newest":1039-1729
    (recover-fn-from-rip)                   ;     [7]
    (pushq (% rbp))                         ;    [14]
    (movq (% rsp) (% rbp))                  ;    [15]
    (pushq (% arg_y))                       ;    [18]
    (pushq (% arg_z))                       ;    [19]
    (movq (@ (% gs) 80) (% stack-temp))     ;    [20]
    (subq ($ 64) (@ (% gs) 80))             ;    [29]
    (movq (@ (% gs) 80) (% imm0))           ;    [39]
    (movq (% stack-temp) (@ (% imm0)))      ;    [48]
    (movq (@ (% gs) #x178) (% stack-temp))  ;    [51]
    (movq (% stack-temp) (@ 8 (% imm0)))    ;    [60]
    (movq (% imm0) (@ (% gs) #x178))        ;    [64]
    (pushq (% save0))                       ;    [73]
    (pushq (% save1))                       ;    [75]
    (pushq (% save2))                       ;    [77]
    (pushq (% save3))                       ;    [79]

;;; (let ((rows (array-dimension ma 0)) (cols (array-dimension mb 1))) (declare (type fixnum rows cols))
    (xorl (% arg_z.l) (% arg_z.l))          ;    [81]
    (movl ($ 16) (% nargs))                 ;    [83]
    (movq (@ 'ARRAY-DIMENSION (% fn)) (% temp0)) ;    [88]
    (lisp-call (@ 10 (% temp0)))            ;    [97]
    (recover-fn-from-rip)                   ;   [100]
    (movq (% arg_z) (% arg_x))              ;   [107]
    (pushq (% arg_x))                       ;   [110]
    (movq (@ -16 (% rbp)) (% arg_y))        ;   [112]
    (movl ($ 8) (% arg_z.l))                ;   [116]
    (movl ($ 16) (% nargs))                 ;   [121]
    (movq (@ 'ARRAY-DIMENSION (% fn)) (% temp0)) ;   [126]
    (lisp-call (@ 10 (% temp0)))            ;   [137]
    (recover-fn-from-rip)                   ;   [140]
    (movq (% arg_z) (% save3))              ;   [147]

;;; (let ((result (make-matrix rows cols))) (declare (type (simple-array single-float (* *)) result)) (d
    (movq (@ -56 (% rbp)) (% arg_y))        ;   [150]
    (movq (% save3) (% arg_z))              ;   [154]
    (movl ($ 16) (% nargs))                 ;   [157]
    (movq (@ 'MAKE-MATRIX (% fn)) (% temp0)) ;   [162]
    (lisp-call (@ 10 (% temp0)))            ;   [169]
    (recover-fn-from-rip)                   ;   [172]
    (pushq (% arg_z))                       ;   [179]

;;; (dotimes (row rows) (dotimes (col cols) (let ((cell (aref result row col))) (declare (type (single-f
    (xorl (% save2.l) (% save2.l))          ;   [180]
    (jmpq L562)                             ;   [183]

;;; (dotimes (col cols) (let ((cell (aref result row col))) (declare (type (single-float) cell)) (dotime
L181
    (xorl (% save1.l) (% save1.l))          ;   [188]
    (jmpq L549)                             ;   [191]

;;; (let ((cell (aref result row col))) (declare (type (single-float) cell)) (dotimes (k cols) (incf cel
L189
    (movq (@ -64 (% rbp)) (% arg_x))        ;   [196]
    (movq (@ 43 (% arg_x)) (% imm0))        ;   [200]
    (sarq ($ 3) (% imm0))                   ;   [204]
    (imulq (% save2) (% imm0))              ;   [208]
    (leaq (@ (% save1) (% imm0)) (% temp0)) ;   [212]
    (movq (@ 11 (% arg_x)) (% arg_z))       ;   [216]
    (movq (% temp0) (% imm0))               ;   [220]
    (shrq (% imm0))                         ;   [223]
    (movss (@ -5 (% arg_z) (% imm0)) (% fp0)) ;   [226]
    (movd (% imm0) (% stack-temp))          ;   [232]
    (movq (@ (% gs) #x178) (% imm0))        ;   [236]
    (movss (% fp0) (@ 16 (% imm0)))         ;   [245]
    (movd (% stack-temp) (% imm0))          ;   [250]

;;; (dotimes (k cols) (incf cell (* (aref ma row k) (aref mb k col))))
    (xorl (% save0.l) (% save0.l))          ;   [254]
    (jmpq L471)                             ;   [257]

;;; (aref ma row k)
L255
    (movq (@ -8 (% rbp)) (% arg_x))         ;   [262]
    (movq (@ 43 (% arg_x)) (% imm0))        ;   [266]
    (sarq ($ 3) (% imm0))                   ;   [270]
    (imulq (% save2) (% imm0))              ;   [274]
    (leaq (@ (% save0) (% imm0)) (% temp0)) ;   [278]
    (movq (@ 11 (% arg_x)) (% arg_z))       ;   [282]
    (movq (% temp0) (% imm0))               ;   [286]
    (shrq (% imm0))                         ;   [289]
    (movss (@ -5 (% arg_z) (% imm0)) (% fp0)) ;   [292]

;;; (aref mb k col)
    (movq (@ -16 (% rbp)) (% arg_z))        ;   [298]
    (movq (@ 43 (% arg_z)) (% imm0))        ;   [302]
    (sarq ($ 3) (% imm0))                   ;   [306]
    (imulq (% save0) (% imm0))              ;   [310]
    (leaq (@ (% save1) (% imm0)) (% temp0)) ;   [314]
    (movq (@ 11 (% arg_z)) (% arg_z))       ;   [318]
    (movq (% temp0) (% imm0))               ;   [322]
    (shrq (% imm0))                         ;   [325]
    (movss (@ -5 (% arg_z) (% imm0)) (% fp1)) ;   [328]

;;; (* (aref ma row k) (aref mb k col))
    (mulss (% fp1) (% fp0))                 ;   [334]

;;; (incf cell (* (aref ma row k) (aref mb k col)))
    (movd (% imm0) (% stack-temp))          ;   [338]
    (movq (@ (% gs) #x178) (% imm0))        ;   [342]
    (movss (% fp0) (@ 32 (% imm0)))         ;   [351]
    (movd (% stack-temp) (% imm0))          ;   [356]
    (movd (% imm0) (% stack-temp))          ;   [360]
    (movq (@ (% gs) #x178) (% imm0))        ;   [364]
    (movss (@ 32 (% imm0)) (% fp1))         ;   [373]
    (movd (% stack-temp) (% imm0))          ;   [378]
    (movd (% imm0) (% stack-temp))          ;   [382]
    (movq (@ (% gs) #x178) (% imm0))        ;   [386]
    (movss (@ 16 (% imm0)) (% fp0))         ;   [395]
    (movd (% stack-temp) (% imm0))          ;   [400]
    (addss (% fp1) (% fp0))                 ;   [404]
    (movd (% imm0) (% stack-temp))          ;   [408]
    (movq (@ (% gs) #x178) (% imm0))        ;   [412]
    (movss (% fp0) (@ 48 (% imm0)))         ;   [421]
    (movd (% stack-temp) (% imm0))          ;   [426]
    (movd (% imm0) (% stack-temp))          ;   [430]
    (movq (@ (% gs) #x178) (% imm0))        ;   [434]
    (movss (@ 48 (% imm0)) (% fp0))         ;   [443]
    (movd (% stack-temp) (% imm0))          ;   [448]
    (movd (% imm0) (% stack-temp))          ;   [452]
    (movq (@ (% gs) #x178) (% imm0))        ;   [456]
    (movss (% fp0) (@ 16 (% imm0)))         ;   [465]
    (movd (% stack-temp) (% imm0))          ;   [470]

;;; (dotimes (k cols) (incf cell (* (aref ma row k) (aref mb k col))))
    (addq ($ 8) (% save0))                  ;   [474]
L471
    (cmpq (% save3) (% save0))              ;   [478]
    (jl L255)                               ;   [481]

;;; (setf (aref result row col) cell)
    (movd (% imm0) (% stack-temp))          ;   [487]
    (movq (@ (% gs) #x178) (% imm0))        ;   [491]
    (movss (@ 16 (% imm0)) (% fp0))         ;   [500]
    (movd (% stack-temp) (% imm0))          ;   [505]
    (movq (% save1) (% arg_y))              ;   [509]
    (movq (% save2) (% arg_x))              ;   [512]
    (movq (@ -64 (% rbp)) (% temp0))        ;   [515]
    (movq (@ 43 (% temp0)) (% imm0))        ;   [519]
    (sarq ($ 3) (% imm0))                   ;   [523]
    (imulq (% arg_x) (% imm0))              ;   [527]
    (leaq (@ (% arg_y) (% imm0)) (% arg_y)) ;   [531]
    (movq (@ 11 (% temp0)) (% arg_x))       ;   [535]
    (movq (% arg_y) (% imm2))               ;   [539]
    (shrq (% imm2))                         ;   [542]
    (movss (% fp0) (@ -5 (% arg_x) (% imm2))) ;   [545]

;;; (dotimes (col cols) (let ((cell (aref result row col))) (declare (type (single-float) cell)) (dotime
    (addq ($ 8) (% save1))                  ;   [552]
L549
    (cmpq (% save3) (% save1))              ;   [556]
    (jl L189)                               ;   [559]

;;; (dotimes (row rows) (dotimes (col cols) (let ((cell (aref result row col))) (declare (type (single-f
    (addq ($ 8) (% save2))                  ;   [565]
L562
    (movq (@ -56 (% rbp)) (% arg_z))        ;   [569]
    (cmpq (% arg_z) (% save2))              ;   [573]
    (jl L181)                               ;   [576]

;;; (let ((result (make-matrix rows cols))) (declare (type (simple-array single-float (* *)) result)) (d
    (movq (@ -64 (% rbp)) (% arg_z))        ;   [582]
    (addq ($ 16) (% rsp))                   ;   [586]
    (popq (% save3))                        ;   [590]
    (popq (% save2))                        ;   [592]
    (popq (% save1))                        ;   [594]
    (popq (% save0))                        ;   [596]
    (movq (@ (% gs) #x178) (% imm0))        ;   [598]
    (movq (@ 8 (% imm0)) (% stack-temp))    ;   [607]
    (movq (@ (% imm0)) (% imm0))            ;   [611]
    (movq (% imm0) (@ (% gs) 80))           ;   [614]
    (movq (% stack-temp) (@ (% gs) #x178))  ;   [623]
    (leaveq)                                ;   [632]
    (retq)                                  ;   [633]
CL-USER> (disassemble 'simple-gemm)
;; "home:sandbox;common-lisp;simple-gemm.lisp.newest":479-1037
    (recover-fn-from-rip)                   ;     [7]
    (pushq (% rbp))                         ;    [14]
    (movq (% rsp) (% rbp))                  ;    [15]
    (pushq (% arg_y))                       ;    [18]
    (pushq (% arg_z))                       ;    [19]
    (movq (@ (% gs) 80) (% stack-temp))     ;    [20]
    (subq ($ 48) (@ (% gs) 80))             ;    [29]
    (movq (@ (% gs) 80) (% imm0))           ;    [39]
    (movq (% stack-temp) (@ (% imm0)))      ;    [48]
    (movq (@ (% gs) #x178) (% stack-temp))  ;    [51]
    (movq (% stack-temp) (@ 8 (% imm0)))    ;    [60]
    (movq (% imm0) (@ (% gs) #x178))        ;    [64]
    (pushq (% save0))                       ;    [73]
    (pushq (% save1))                       ;    [75]
    (pushq (% save2))                       ;    [77]
    (pushq (% save3))                       ;    [79]

;;; (let ((rows (array-dimension ma 0)) (cols (array-dimension mb 1))) (declare (type fixnum rows cols))
    (xorl (% arg_z.l) (% arg_z.l))          ;    [81]
    (movl ($ 16) (% nargs))                 ;    [83]
    (movq (@ 'ARRAY-DIMENSION (% fn)) (% temp0)) ;    [88]
    (lisp-call (@ 10 (% temp0)))            ;    [97]
    (recover-fn-from-rip)                   ;   [100]
    (movq (% arg_z) (% arg_x))              ;   [107]
    (pushq (% arg_x))                       ;   [110]
    (movq (@ -16 (% rbp)) (% arg_y))        ;   [112]
    (movl ($ 8) (% arg_z.l))                ;   [116]
    (movl ($ 16) (% nargs))                 ;   [121]
    (movq (@ 'ARRAY-DIMENSION (% fn)) (% temp0)) ;   [126]
    (lisp-call (@ 10 (% temp0)))            ;   [137]
    (recover-fn-from-rip)                   ;   [140]
    (pushq (% arg_z))                       ;   [147]

;;; (let ((result (make-matrix rows cols))) (declare (type (simple-array single-float (* *)) result)) (d
    (movq (@ -56 (% rbp)) (% arg_y))        ;   [148]
    (movl ($ 16) (% nargs))                 ;   [152]
    (movq (@ 'MAKE-MATRIX (% fn)) (% temp0)) ;   [157]
    (lisp-call (@ 10 (% temp0)))            ;   [169]
    (recover-fn-from-rip)                   ;   [172]
    (movq (% arg_z) (% save3))              ;   [179]

;;; (dotimes (row rows) (dotimes (col cols) (dotimes (k cols) (incf (aref result row col) (* (aref ma ro
    (xorl (% save2.l) (% save2.l))          ;   [182]
    (jmpq L479)                             ;   [185]

;;; (dotimes (col cols) (dotimes (k cols) (incf (aref result row col) (* (aref ma row k) (aref mb k col)
L183
    (xorl (% save1.l) (% save1.l))          ;   [190]
    (jmpq L462)                             ;   [193]

;;; (dotimes (k cols) (incf (aref result row col) (* (aref ma row k) (aref mb k col))))
L191
    (xorl (% save0.l) (% save0.l))          ;   [198]
    (jmpq L445)                             ;   [201]

;;; (aref ma row k)
L199
    (movq (@ -8 (% rbp)) (% arg_x))         ;   [206]
    (movq (@ 43 (% arg_x)) (% imm0))        ;   [210]
    (sarq ($ 3) (% imm0))                   ;   [214]
    (imulq (% save2) (% imm0))              ;   [218]
    (leaq (@ (% save0) (% imm0)) (% temp0)) ;   [222]
    (movq (@ 11 (% arg_x)) (% arg_z))       ;   [226]
    (movq (% temp0) (% imm0))               ;   [230]
    (shrq (% imm0))                         ;   [233]
    (movss (@ -5 (% arg_z) (% imm0)) (% fp0)) ;   [236]

;;; (aref mb k col)
    (movq (@ -16 (% rbp)) (% arg_z))        ;   [242]
    (movq (@ 43 (% arg_z)) (% imm0))        ;   [246]
    (sarq ($ 3) (% imm0))                   ;   [250]
    (imulq (% save0) (% imm0))              ;   [254]
    (leaq (@ (% save1) (% imm0)) (% temp0)) ;   [258]
    (movq (@ 11 (% arg_z)) (% arg_z))       ;   [262]
    (movq (% temp0) (% imm0))               ;   [266]
    (shrq (% imm0))                         ;   [269]
    (movss (@ -5 (% arg_z) (% imm0)) (% fp1)) ;   [272]

;;; (* (aref ma row k) (aref mb k col))
    (mulss (% fp1) (% fp0))                 ;   [278]

;;; (incf (aref result row col) (* (aref ma row k) (aref mb k col)))
    (movd (% imm0) (% stack-temp))          ;   [282]
    (movq (@ (% gs) #x178) (% imm0))        ;   [286]
    (movss (% fp0) (@ 16 (% imm0)))         ;   [295]
    (movd (% stack-temp) (% imm0))          ;   [300]
    (movq (@ 43 (% save3)) (% imm0))        ;   [304]
    (sarq ($ 3) (% imm0))                   ;   [308]
    (imulq (% save2) (% imm0))              ;   [312]
    (leaq (@ (% save1) (% imm0)) (% temp0)) ;   [316]
    (movq (@ 11 (% save3)) (% arg_z))       ;   [320]
    (movq (% temp0) (% imm0))               ;   [324]
    (shrq (% imm0))                         ;   [327]
    (movss (@ -5 (% arg_z) (% imm0)) (% fp0)) ;   [330]
    (movd (% imm0) (% stack-temp))          ;   [336]
    (movq (@ (% gs) #x178) (% imm0))        ;   [340]
    (movss (@ 16 (% imm0)) (% fp1))         ;   [349]
    (movd (% stack-temp) (% imm0))          ;   [354]
    (addss (% fp1) (% fp0))                 ;   [358]
    (movd (% imm0) (% stack-temp))          ;   [362]
    (movq (@ (% gs) #x178) (% imm0))        ;   [366]
    (movss (% fp0) (@ 32 (% imm0)))         ;   [375]
    (movd (% stack-temp) (% imm0))          ;   [380]
    (movd (% imm0) (% stack-temp))          ;   [384]
    (movq (@ (% gs) #x178) (% imm0))        ;   [388]
    (movss (@ 32 (% imm0)) (% fp0))         ;   [397]
    (movd (% stack-temp) (% imm0))          ;   [402]
    (movq (% save1) (% arg_y))              ;   [406]
    (movq (% save2) (% arg_x))              ;   [409]
    (movq (% save3) (% temp0))              ;   [412]
    (movq (@ 43 (% temp0)) (% imm0))        ;   [415]
    (sarq ($ 3) (% imm0))                   ;   [419]
    (imulq (% arg_x) (% imm0))              ;   [423]
    (leaq (@ (% arg_y) (% imm0)) (% arg_y)) ;   [427]
    (movq (@ 11 (% temp0)) (% arg_x))       ;   [431]
    (movq (% arg_y) (% imm2))               ;   [435]
    (shrq (% imm2))                         ;   [438]
    (movss (% fp0) (@ -5 (% arg_x) (% imm2))) ;   [441]

;;; (dotimes (k cols) (incf (aref result row col) (* (aref ma row k) (aref mb k col))))
    (addq ($ 8) (% save0))                  ;   [448]
L445
    (movq (@ -64 (% rbp)) (% arg_z))        ;   [452]
    (cmpq (% arg_z) (% save0))              ;   [456]
    (jl L199)                               ;   [459]

;;; (dotimes (col cols) (dotimes (k cols) (incf (aref result row col) (* (aref ma row k) (aref mb k col)
    (addq ($ 8) (% save1))                  ;   [465]
L462
    (movq (@ -64 (% rbp)) (% arg_z))        ;   [469]
    (cmpq (% arg_z) (% save1))              ;   [473]
    (jl L191)                               ;   [476]

;;; (dotimes (row rows) (dotimes (col cols) (dotimes (k cols) (incf (aref result row col) (* (aref ma ro
    (addq ($ 8) (% save2))                  ;   [482]
L479
    (movq (@ -56 (% rbp)) (% arg_z))        ;   [486]
    (cmpq (% arg_z) (% save2))              ;   [490]
    (jl L183)                               ;   [493]

;;; (let ((result (make-matrix rows cols))) (declare (type (simple-array single-float (* *)) result)) (d
    (movq (% save3) (% arg_z))              ;   [499]
    (addq ($ 16) (% rsp))                   ;   [502]
    (popq (% save3))                        ;   [506]
    (popq (% save2))                        ;   [508]
    (popq (% save1))                        ;   [510]
    (popq (% save0))                        ;   [512]
    (movq (@ (% gs) #x178) (% imm0))        ;   [514]
    (movq (@ 8 (% imm0)) (% stack-temp))    ;   [523]
    (movq (@ (% imm0)) (% imm0))            ;   [527]
    (movq (% imm0) (@ (% gs) 80))           ;   [530]
    (movq (% stack-temp) (@ (% gs) #x178))  ;   [539]
    (leaveq)                                ;   [548]
    (retq)                                  ;   [549]

これアセンブラ部分が意図的にコメントになってないけど、このままREPLに入れると動くんだろうか?

CL-USER> (recover-fn-from-rip) 
; Evaluation aborted on #<CCL::UNDEFINED-FUNCTION-CALL #x3020018BF73D>.

ダメらしい。

caching-gemm

中間結果を変数にキャッシュするGEMM。

(defun caching-gemm (ma mb)
  (declare (optimize (speed 3) (debug 0) (safety 0)))
  (declare (type (simple-array single-float (* *)) ma mb))
  (let ((rows (array-dimension ma 0))
        (cols (array-dimension mb 1)))
    (declare (type fixnum rows cols))
    (let ((result (make-matrix rows cols)))
      (declare (type (simple-array single-float (* *)) result))
      (dotimes (row rows)
        (dotimes (col cols)
          (let ((cell (aref result row col)))
            (declare (type (single-float) cell))
            (dotimes (k cols)
              (incf cell
                    (* (aref ma row k) (aref mb k col))))
            (setf (aref result row col) cell))))
      result)))

SBCLの場合。 L2が多分最内ループで、SARしてるのなんだ? 後、幅とか先頭アドレスかを毎回ロードして計算しなおしてる?

CL-USER> (disassemble 'caching-gemm)
; disassembly for CACHING-GEMM
; Size: 272 bytes. Origin: #x1006D8D8F5
; 8F5:       4C894DE8         MOV [RBP-24], R9                ; no-arg-parsing entry point
; 8F9:       4C8945E0         MOV [RBP-32], R8
; 8FD:       4D8B6831         MOV R13, [R8+49]
; 901:       4D8B5139         MOV R10, [R9+57]
; 905:       4C896DF8         MOV [RBP-8], R13
; 909:       4C8955F0         MOV [RBP-16], R10
; 90D:       488D5C24F0       LEA RBX, [RSP-16]
; 912:       4883EC18         SUB RSP, 24
; 916:       498BD5           MOV RDX, R13
; 919:       498BFA           MOV RDI, R10
; 91C:       488B057DFFFFFF   MOV RAX, [RIP-131]              ; #<FDEFINITION for MAKE-MATRIX>
; 923:       B904000000       MOV ECX, 4
; 928:       48892B           MOV [RBX], RBP
; 92B:       488BEB           MOV RBP, RBX
; 92E:       FF5009           CALL QWORD PTR [RAX+9]
; 931:       480F42E3         CMOVB RSP, RBX
; 935:       4C8B45E0         MOV R8, [RBP-32]
; 939:       4C8B4DE8         MOV R9, [RBP-24]
; 93D:       4C8B55F0         MOV R10, [RBP-16]
; 941:       4C8B6DF8         MOV R13, [RBP-8]
; 945:       488BF2           MOV RSI, RDX
; 948:       31DB             XOR EBX, EBX
; 94A:       E9A4000000       JMP L5
; 94F:       90               NOP
; 950: L0:   31C0             XOR EAX, EAX
; 952:       E98F000000       JMP L4
; 957:       660F1F840000000000 NOP
; 960: L1:   488B4E39         MOV RCX, [RSI+57]
; 964:       488BFB           MOV RDI, RBX
; 967:       48D1FF           SAR RDI, 1
; 96A:       480FAFF9         IMUL RDI, RCX
; 96E:       4801C7           ADD RDI, RAX
; 971:       488B4E11         MOV RCX, [RSI+17]
; 975:       F30F104C7901     MOVSS XMM1, [RCX+RDI*2+1]
; 97B:       31C9             XOR ECX, ECX
; 97D:       EB43             JMP L3
; 97F:       90               NOP
; 980: L2:   498B5039         MOV RDX, [R8+57]
; 984:       488BFB           MOV RDI, RBX
; 987:       48D1FF           SAR RDI, 1
; 98A:       480FAFFA         IMUL RDI, RDX
; 98E:       4801CF           ADD RDI, RCX
; 991:       498B5011         MOV RDX, [R8+17]
; 995:       F30F10547A01     MOVSS XMM2, [RDX+RDI*2+1]
; 99B:       498B5139         MOV RDX, [R9+57]
; 99F:       488BF9           MOV RDI, RCX
; 9A2:       48D1FF           SAR RDI, 1
; 9A5:       480FAFFA         IMUL RDI, RDX
; 9A9:       4801C7           ADD RDI, RAX
; 9AC:       498B5111         MOV RDX, [R9+17]
; 9B0:       F30F105C7A01     MOVSS XMM3, [RDX+RDI*2+1]
; 9B6:       F30F59DA         MULSS XMM3, XMM2
; 9BA:       F30F58CB         ADDSS XMM1, XMM3
; 9BE:       4883C102         ADD RCX, 2
; 9C2: L3:   4C39D1           CMP RCX, R10
; 9C5:       7CB9             JL L2
; 9C7:       488B4E39         MOV RCX, [RSI+57]
; 9CB:       488BFB           MOV RDI, RBX
; 9CE:       48D1FF           SAR RDI, 1
; 9D1:       480FAFF9         IMUL RDI, RCX
; 9D5:       4801C7           ADD RDI, RAX
; 9D8:       488B4E11         MOV RCX, [RSI+17]
; 9DC:       F30F114C7901     MOVSS [RCX+RDI*2+1], XMM1
; 9E2:       4883C002         ADD RAX, 2
; 9E6: L4:   4C39D0           CMP RAX, R10
; 9E9:       0F8C71FFFFFF     JL L1
; 9EF:       4883C302         ADD RBX, 2
; 9F3: L5:   4C39EB           CMP RBX, R13
; 9F6:       0F8C54FFFFFF     JL L0
; 9FC:       488BD6           MOV RDX, RSI
; 9FF:       488BE5           MOV RSP, RBP
; A02:       F8               CLC
; A03:       5D               POP RBP
; A04:       C3               RET

CCLの場合。 L255あたりが最内ループ。 途中にlisp-callとか入ってたり、stack-tempが多分レジスタスピルだと思う。

CL-USER> (disassemble 'caching-gemm)
;; "home:sandbox;common-lisp;simple-gemm.lisp.newest":1039-1729
    (recover-fn-from-rip)                   ;     [7]
    (pushq (% rbp))                         ;    [14]
    (movq (% rsp) (% rbp))                  ;    [15]
    (pushq (% arg_y))                       ;    [18]
    (pushq (% arg_z))                       ;    [19]
    (movq (@ (% gs) 80) (% stack-temp))     ;    [20]
    (subq ($ 64) (@ (% gs) 80))             ;    [29]
    (movq (@ (% gs) 80) (% imm0))           ;    [39]
    (movq (% stack-temp) (@ (% imm0)))      ;    [48]
    (movq (@ (% gs) #x178) (% stack-temp))  ;    [51]
    (movq (% stack-temp) (@ 8 (% imm0)))    ;    [60]
    (movq (% imm0) (@ (% gs) #x178))        ;    [64]
    (pushq (% save0))                       ;    [73]
    (pushq (% save1))                       ;    [75]
    (pushq (% save2))                       ;    [77]
    (pushq (% save3))                       ;    [79]

;;; (let ((rows (array-dimension ma 0)) (cols (array-dimension mb 1))) (declare (type fixnum rows cols))
    (xorl (% arg_z.l) (% arg_z.l))          ;    [81]
    (movl ($ 16) (% nargs))                 ;    [83]
    (movq (@ 'ARRAY-DIMENSION (% fn)) (% temp0)) ;    [88]
    (lisp-call (@ 10 (% temp0)))            ;    [97]
    (recover-fn-from-rip)                   ;   [100]
    (movq (% arg_z) (% arg_x))              ;   [107]
    (pushq (% arg_x))                       ;   [110]
    (movq (@ -16 (% rbp)) (% arg_y))        ;   [112]
    (movl ($ 8) (% arg_z.l))                ;   [116]
    (movl ($ 16) (% nargs))                 ;   [121]
    (movq (@ 'ARRAY-DIMENSION (% fn)) (% temp0)) ;   [126]
    (lisp-call (@ 10 (% temp0)))            ;   [137]
    (recover-fn-from-rip)                   ;   [140]
    (movq (% arg_z) (% save3))              ;   [147]

;;; (let ((result (make-matrix rows cols))) (declare (type (simple-array single-float (* *)) result)) (d
    (movq (@ -56 (% rbp)) (% arg_y))        ;   [150]
    (movq (% save3) (% arg_z))              ;   [154]
    (movl ($ 16) (% nargs))                 ;   [157]
    (movq (@ 'MAKE-MATRIX (% fn)) (% temp0)) ;   [162]
    (lisp-call (@ 10 (% temp0)))            ;   [169]
    (recover-fn-from-rip)                   ;   [172]
    (pushq (% arg_z))                       ;   [179]

;;; (dotimes (row rows) (dotimes (col cols) (let ((cell (aref result row col))) (declare (type (single-f
    (xorl (% save2.l) (% save2.l))          ;   [180]
    (jmpq L562)                             ;   [183]

;;; (dotimes (col cols) (let ((cell (aref result row col))) (declare (type (single-float) cell)) (dotime
L181
    (xorl (% save1.l) (% save1.l))          ;   [188]
    (jmpq L549)                             ;   [191]

;;; (let ((cell (aref result row col))) (declare (type (single-float) cell)) (dotimes (k cols) (incf cel
L189
    (movq (@ -64 (% rbp)) (% arg_x))        ;   [196]
    (movq (@ 43 (% arg_x)) (% imm0))        ;   [200]
    (sarq ($ 3) (% imm0))                   ;   [204]
    (imulq (% save2) (% imm0))              ;   [208]
    (leaq (@ (% save1) (% imm0)) (% temp0)) ;   [212]
    (movq (@ 11 (% arg_x)) (% arg_z))       ;   [216]
    (movq (% temp0) (% imm0))               ;   [220]
    (shrq (% imm0))                         ;   [223]
    (movss (@ -5 (% arg_z) (% imm0)) (% fp0)) ;   [226]
    (movd (% imm0) (% stack-temp))          ;   [232]
    (movq (@ (% gs) #x178) (% imm0))        ;   [236]
    (movss (% fp0) (@ 16 (% imm0)))         ;   [245]
    (movd (% stack-temp) (% imm0))          ;   [250]

;;; (dotimes (k cols) (incf cell (* (aref ma row k) (aref mb k col))))
    (xorl (% save0.l) (% save0.l))          ;   [254]
    (jmpq L471)                             ;   [257]

;;; (aref ma row k)
L255
    (movq (@ -8 (% rbp)) (% arg_x))         ;   [262]
    (movq (@ 43 (% arg_x)) (% imm0))        ;   [266]
    (sarq ($ 3) (% imm0))                   ;   [270]
    (imulq (% save2) (% imm0))              ;   [274]
    (leaq (@ (% save0) (% imm0)) (% temp0)) ;   [278]
    (movq (@ 11 (% arg_x)) (% arg_z))       ;   [282]
    (movq (% temp0) (% imm0))               ;   [286]
    (shrq (% imm0))                         ;   [289]
    (movss (@ -5 (% arg_z) (% imm0)) (% fp0)) ;   [292]

;;; (aref mb k col)
    (movq (@ -16 (% rbp)) (% arg_z))        ;   [298]
    (movq (@ 43 (% arg_z)) (% imm0))        ;   [302]
    (sarq ($ 3) (% imm0))                   ;   [306]
    (imulq (% save0) (% imm0))              ;   [310]
    (leaq (@ (% save1) (% imm0)) (% temp0)) ;   [314]
    (movq (@ 11 (% arg_z)) (% arg_z))       ;   [318]
    (movq (% temp0) (% imm0))               ;   [322]
    (shrq (% imm0))                         ;   [325]
    (movss (@ -5 (% arg_z) (% imm0)) (% fp1)) ;   [328]

;;; (* (aref ma row k) (aref mb k col))
    (mulss (% fp1) (% fp0))                 ;   [334]

;;; (incf cell (* (aref ma row k) (aref mb k col)))
    (movd (% imm0) (% stack-temp))          ;   [338]
    (movq (@ (% gs) #x178) (% imm0))        ;   [342]
    (movss (% fp0) (@ 32 (% imm0)))         ;   [351]
    (movd (% stack-temp) (% imm0))          ;   [356]
    (movd (% imm0) (% stack-temp))          ;   [360]
    (movq (@ (% gs) #x178) (% imm0))        ;   [364]
    (movss (@ 32 (% imm0)) (% fp1))         ;   [373]
    (movd (% stack-temp) (% imm0))          ;   [378]
    (movd (% imm0) (% stack-temp))          ;   [382]
    (movq (@ (% gs) #x178) (% imm0))        ;   [386]
    (movss (@ 16 (% imm0)) (% fp0))         ;   [395]
    (movd (% stack-temp) (% imm0))          ;   [400]
    (addss (% fp1) (% fp0))                 ;   [404]
    (movd (% imm0) (% stack-temp))          ;   [408]
    (movq (@ (% gs) #x178) (% imm0))        ;   [412]
    (movss (% fp0) (@ 48 (% imm0)))         ;   [421]
    (movd (% stack-temp) (% imm0))          ;   [426]
    (movd (% imm0) (% stack-temp))          ;   [430]
    (movq (@ (% gs) #x178) (% imm0))        ;   [434]
    (movss (@ 48 (% imm0)) (% fp0))         ;   [443]
    (movd (% stack-temp) (% imm0))          ;   [448]
    (movd (% imm0) (% stack-temp))          ;   [452]
    (movq (@ (% gs) #x178) (% imm0))        ;   [456]
    (movss (% fp0) (@ 16 (% imm0)))         ;   [465]
    (movd (% stack-temp) (% imm0))          ;   [470]

;;; (dotimes (k cols) (incf cell (* (aref ma row k) (aref mb k col))))
    (addq ($ 8) (% save0))                  ;   [474]
L471
    (cmpq (% save3) (% save0))              ;   [478]
    (jl L255)                               ;   [481]

;;; (setf (aref result row col) cell)
    (movd (% imm0) (% stack-temp))          ;   [487]
    (movq (@ (% gs) #x178) (% imm0))        ;   [491]
    (movss (@ 16 (% imm0)) (% fp0))         ;   [500]
    (movd (% stack-temp) (% imm0))          ;   [505]
    (movq (% save1) (% arg_y))              ;   [509]
    (movq (% save2) (% arg_x))              ;   [512]
    (movq (@ -64 (% rbp)) (% temp0))        ;   [515]
    (movq (@ 43 (% temp0)) (% imm0))        ;   [519]
    (sarq ($ 3) (% imm0))                   ;   [523]
    (imulq (% arg_x) (% imm0))              ;   [527]
    (leaq (@ (% arg_y) (% imm0)) (% arg_y)) ;   [531]
    (movq (@ 11 (% temp0)) (% arg_x))       ;   [535]
    (movq (% arg_y) (% imm2))               ;   [539]
    (shrq (% imm2))                         ;   [542]
    (movss (% fp0) (@ -5 (% arg_x) (% imm2))) ;   [545]

;;; (dotimes (col cols) (let ((cell (aref result row col))) (declare (type (single-float) cell)) (dotime
    (addq ($ 8) (% save1))                  ;   [552]
L549
    (cmpq (% save3) (% save1))              ;   [556]
    (jl L189)                               ;   [559]

;;; (dotimes (row rows) (dotimes (col cols) (let ((cell (aref result row col))) (declare (type (single-f
    (addq ($ 8) (% save2))                  ;   [565]
L562
    (movq (@ -56 (% rbp)) (% arg_z))        ;   [569]
    (cmpq (% arg_z) (% save2))              ;   [573]
    (jl L181)                               ;   [576]

;;; (let ((result (make-matrix rows cols))) (declare (type (simple-array single-float (* *)) result)) (d
    (movq (@ -64 (% rbp)) (% arg_z))        ;   [582]
    (addq ($ 16) (% rsp))                   ;   [586]
    (popq (% save3))                        ;   [590]
    (popq (% save2))                        ;   [592]
    (popq (% save1))                        ;   [594]
    (popq (% save0))                        ;   [596]
    (movq (@ (% gs) #x178) (% imm0))        ;   [598]
    (movq (@ 8 (% imm0)) (% stack-temp))    ;   [607]
    (movq (@ (% imm0)) (% imm0))            ;   [611]
    (movq (% imm0) (@ (% gs) 80))           ;   [614]
    (movq (% stack-temp) (@ (% gs) #x178))  ;   [623]
    (leaveq)                                ;   [632]
    (retq)                                  ;   [633]

row-major-gemm

arefが何となく遅そうなので、row-major-arefで書き直した。

(defun row-major-gemm (ma mb)
  (declare (optimize (speed 3) (debug 0) (safety 0)))
  (declare (type (simple-array single-float (* *)) ma mb))
  (let ((rows (array-dimension ma 0))
        (cols (array-dimension mb 1)))
    (declare (type fixnum rows cols))
    (let ((result (make-matrix rows cols)))
      (declare (type (simple-array single-float (* *)) result))
      (dotimes (row rows)
        (dotimes (col cols)
          (let ((cell (aref result row col))
                (ma-index (array-row-major-index ma row 0))
                (mb-index (array-row-major-index mb 0 col)))
            (declare (type (single-float) cell))
            (declare (type fixnum ma-index mb-index))
            (dotimes (k cols)
              (incf cell (* (row-major-aref ma ma-index)
                            (row-major-aref mb mb-index)))
              (incf ma-index)
              (incf mb-index cols))
            (setf (aref result row col) cell))))
      result)))

SBCLの場合。 L2が最内ループで、indexの掛け算を消せた気がするけど、先頭アドレスのロードらしきやつが消せない。

CL-USER> (disassemble 'row-major-gemm)
; disassembly for ROW-MAJOR-GEMM
; Size: 266 bytes. Origin: #x1003296715
; 715:       4C8975E8         MOV [RBP-24], R14               ; no-arg-parsing entry point
; 719:       4C896DE0         MOV [RBP-32], R13
; 71D:       4D8B7D31         MOV R15, [R13+49]
; 721:       4D8B4E39         MOV R9, [R14+57]
; 725:       4C897DF8         MOV [RBP-8], R15
; 729:       4C894DF0         MOV [RBP-16], R9
; 72D:       488D5C24F0       LEA RBX, [RSP-16]
; 732:       4883EC18         SUB RSP, 24
; 736:       498BD7           MOV RDX, R15
; 739:       498BF9           MOV RDI, R9
; 73C:       488B057DFFFFFF   MOV RAX, [RIP-131]              ; #<FDEFINITION for MAKE-MATRIX>
; 743:       B904000000       MOV ECX, 4
; 748:       48892B           MOV [RBX], RBP
; 74B:       488BEB           MOV RBP, RBX
; 74E:       FF5009           CALL QWORD PTR [RAX+9]
; 751:       480F42E3         CMOVB RSP, RBX
; 755:       4C8B6DE0         MOV R13, [RBP-32]
; 759:       4C8B75E8         MOV R14, [RBP-24]
; 75D:       4C8B4DF0         MOV R9, [RBP-16]
; 761:       4C8B7DF8         MOV R15, [RBP-8]
; 765:       488BDA           MOV RBX, RDX
; 768:       31C9             XOR ECX, ECX
; 76A:       E99E000000       JMP L5
; 76F:       90               NOP
; 770: L0:   31C0             XOR EAX, EAX
; 772:       E989000000       JMP L4
; 777:       660F1F840000000000 NOP
; 780: L1:   488B5339         MOV RDX, [RBX+57]
; 784:       488BF1           MOV RSI, RCX
; 787:       48D1FE           SAR RSI, 1
; 78A:       480FAFF2         IMUL RSI, RDX
; 78E:       4801C6           ADD RSI, RAX
; 791:       488B5311         MOV RDX, [RBX+17]
; 795:       F30F104C7201     MOVSS XMM1, [RDX+RSI*2+1]
; 79B:       498B5539         MOV RDX, [R13+57]
; 79F:       488BF9           MOV RDI, RCX
; 7A2:       48D1FF           SAR RDI, 1
; 7A5:       480FAFFA         IMUL RDI, RDX
; 7A9:       4C8BC0           MOV R8, RAX
; 7AC:       31F6             XOR ESI, ESI
; 7AE:       EB2C             JMP L3
; 7B0: L2:   498B5511         MOV RDX, [R13+17]
; 7B4:       F30F10547A01     MOVSS XMM2, [RDX+RDI*2+1]
; 7BA:       498B5611         MOV RDX, [R14+17]
; 7BE:       F3420F105C4201   MOVSS XMM3, [RDX+R8*2+1]
; 7C5:       F30F59DA         MULSS XMM3, XMM2
; 7C9:       F30F58CB         ADDSS XMM1, XMM3
; 7CD:       4883C702         ADD RDI, 2
; 7D1:       4F8D1401         LEA R10, [R9+R8]
; 7D5:       4D8BC2           MOV R8, R10
; 7D8:       4883C602         ADD RSI, 2
; 7DC: L3:   4C39CE           CMP RSI, R9
; 7DF:       7CCF             JL L2
; 7E1:       488B5339         MOV RDX, [RBX+57]
; 7E5:       488BF1           MOV RSI, RCX
; 7E8:       48D1FE           SAR RSI, 1
; 7EB:       480FAFF2         IMUL RSI, RDX
; 7EF:       4801C6           ADD RSI, RAX
; 7F2:       488B5311         MOV RDX, [RBX+17]
; 7F6:       F30F114C7201     MOVSS [RDX+RSI*2+1], XMM1
; 7FC:       4883C002         ADD RAX, 2
; 800: L4:   4C39C8           CMP RAX, R9
; 803:       0F8C77FFFFFF     JL L1
; 809:       4883C102         ADD RCX, 2
; 80D: L5:   4C39F9           CMP RCX, R15
; 810:       0F8C5AFFFFFF     JL L0
; 816:       488BD3           MOV RDX, RBX
; 819:       488BE5           MOV RSP, RBP
; 81C:       F8               CLC
; 81D:       5D               POP RBP
; 81E:       C3               RET

一応実測する。

CL-USER> (setf *N* 256) (run #'simple-gemm 100) (run #'on-register-gemm 100) (run #'row-major-gemm 100)
Evaluation took:
  6.159 seconds of real time
  6.164000 seconds of total run time (6.164000 user, 0.000000 system)
  [ Run times consist of 0.004 seconds GC time, and 6.160 seconds non-GC time. ]
  100.08% CPU
  14,747,074,008 processor cycles
  26,281,520 bytes consed
  
Evaluation took:
  3.955 seconds of real time
  3.960000 seconds of total run time (3.960000 user, 0.000000 system)
  100.13% CPU
  9,469,087,053 processor cycles
  26,216,000 bytes consed
  
Evaluation took:
  3.003 seconds of real time
  3.008000 seconds of total run time (3.008000 user, 0.000000 system)
  [ Run times consist of 0.004 seconds GC time, and 3.004 seconds non-GC time. ]
  100.17% CPU
  7,192,309,782 processor cycles
  26,216,000 bytes consed

unroll無しでon-register-gemmよりは速くなってるけど、やっぱりCと同レベルなコードにはならなそう。

参考ページ