Raspberry Pi 3 Model B買ってCommon Lisp入れた
Raspberry Pi 3 Model B - Raspberry Pi を購入したのでCommon Lispを動かしてみたメモ。
NOOBSのSDカード作成
PC上で Download NOOBS for Raspberry Pi から NOOBS LITE v1.9をダウンロードし、 NOOBS For Raspberry Pi – Rants & Raves – The Blog! を参考にしてSDカードを作成する。
Raspbianインストール
よく分からないので公式のRaspbianをインストールする。
どこかにありそうな情報を取得
lscpu
Architecture: armv7l Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 Thread(s) per core: 1 Core(s) per socket: 4 Socket(s): 1 Model name: ARMv7 Processor rev 4 (v7l) CPU max MHz: 1200.0000 CPU min MHz: 600.0000
uname -a
Linux raspberrypi 4.4.11-v7+ #888 SMP Mon May 23 20:10:33 BST 2016 armv7l GNU/Linux
公式には ARMv8
て書いてあるけど v7 なのか?
Linuxbrewインストール
sudo apt-get install build-essential curl git python-setuptools ruby ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Linuxbrew/install/master/install)" echo 'Pexport ATH="$HOME/.linuxbrew/bin:$PATH"' >> .profile . ~/.profile brew install hello
動かない。。。
==> Downloading https://ftpmirror.gnu.org/hello/hello-2.10.tar.gz ==> Downloading from http://ftp.jaist.ac.jp/pub/GNU/hello/hello-2.10.tar.gz ######################################################################## 100.0% ==> ./configure --disable-silent-rules --prefix=/home/pi/.linuxbrew/Cellar/hello/2.10 *** Error in `/usr/bin/gcc-4.9': double free or corruption (top): 0x01e66718 *** Last 15 lines from /home/pi/.cache/Homebrew/Logs/hello/01.configure: --prefix=/home/pi/.linuxbrew/Cellar/hello/2.10 configure: WARNING: unrecognized options: --disable-debug checking for a BSD-compatible install... /usr/bin/install -c checking whether build environment is sane... yes checking for a thread-safe mkdir -p... /bin/mkdir -p checking for gawk... no checking for mawk... mawk checking whether make sets $(MAKE)... yes checking whether make supports nested variables... yes checking for gcc... /usr/bin/gcc-4.9 checking whether the C compiler works... no configure: error: in `/tmp/hello-20160910-2414-1lq40k3/hello-2.10': configure: error: C compiler cannot create executables See `config.log' for more details READ THIS: https://github.com/Linuxbrew/brew/blob/master/share/doc/homebrew/Troubleshooting.md#troubleshooting If reporting this issue please do so at (not Homebrew/brew): https://github.com/Linuxbrew/homebrew-core/issues
C compiler cannot create executables
って何だ?
double free or corruption
の方か?
alternativesを使ってgccを4.8に変えてみたが、linuxbrewの使うgccバージョンが4.9から変更されなかったのでとりあえず諦める。
Roswellインストール
linuxbrewを使わず直接roswellをビルドする。
sudo apt-get install autoconf automake libcurl4-openssl-dev git clone https://github.com/roswell/roswell.git cd roswell ./bootstrap ./configure make make install ros init ros --version
roswell 0.0.6.65(00f7451)
処理系色々インストール
SBCL
ros install sbcl-bin ros use sbcl-bin ros run -- --version
SBCL 1.3.9
CCL
ros install ccl-bin ros use ccl-bin ros run -- --version
Version 1.11-r16635 (LinuxARM32)
ECL
ros install ecl ros use ecl ros run -- --version
ECL 16.1.2
CMUL
ros install cmu-bin
Installing cmu-bin/21a... Downloading archive:https://common-lisp.net/project/cmucl/downloads/release/21a/cmucl-21a-armhf-linux.tar.bz2 Downloading https://common-lisp.net/project/cmucl/downloads/release/21a/cmucl-21a-armhf-linux.tar.bz2 Unhandled UIOP/RUN-PROGRAM:SUBPROCESS-ERROR: Subprocess with command "/usr/local/bin/ros roswell-internal-use download https://common-lisp.net/project/cmucl/downloads/release/21a/cmucl-21a-armhf-linux.tar.bz2 /home/pi/.roswell/archives/cmucl-21a-armhf-linux.tar.bz2" exited with error code 2 Backtrace for: #<SB-THREAD:THREAD "main thread" RUNNING {52D82811}> 0: (SB-DEBUG::DEBUGGER-DISABLED-HOOK #<UIOP/RUN-PROGRAM:SUBPROCESS-ERROR {52A38E91}> #<unavailable argument>) 1: (SB-DEBUG::RUN-HOOK SB-EXT:*INVOKE-DEBUGGER-HOOK* #<UIOP/RUN-PROGRAM:SUBPROCESS-ERROR {52A38E91}>) 2: (INVOKE-DEBUGGER #<UIOP/RUN-PROGRAM:SUBPROCESS-ERROR {52A38E91}>) 3: (CERROR "IGNORE-ERROR-STATUS" UIOP/RUN-PROGRAM:SUBPROCESS-ERROR :COMMAND "/usr/local/bin/ros roswell-internal-use download https://common-lisp.net/project/cmucl/downloads/release/21a/cmucl-21a-armhf-linux.tar.bz2 /home/pi/.roswell/archives/cmucl-21a-armhf-linux.tar.bz2" :CODE 2 :PROCESS NIL) 4: (UIOP/RUN-PROGRAM::%CHECK-RESULT 2 :COMMAND "/usr/local/bin/ros roswell-internal-use download https://common-lisp.net/project/cmucl/downloads/release/21a/cmucl-21a-armhf-linux.tar.bz2 /home/pi/.roswell/archives/cmucl-21a-armhf-linux.tar.bz2" :PROCESS NIL :IGNORE-ERROR-STATUS NIL) 5: ((LAMBDA (UIOP/RUN-PROGRAM::REDUCED-INPUT #:G7867) :IN UIOP/RUN-PROGRAM::%USE-SYSTEM) :INTERACTIVE #<unavailable argument>) 6: (UIOP/RUN-PROGRAM::%USE-SYSTEM "/usr/local/bin/ros roswell-internal-use download https://common-lisp.net/project/cmucl/downloads/release/21a/cmucl-21a-armhf-linux.tar.bz2 /home/pi/.roswell/archives/cmucl-21a-armhf-linux.tar.bz2" :INPUT :INTERACTIVE :ERROR-OUTPUT :INTERACTIVE :IF-INPUT-DOES-NOT-EXIST :ERROR :IF-OUTPUT-EXISTS :OVERWRITE :IF-ERROR-OUTPUT-EXISTS :OVERWRITE :ELEMENT-TYPE :DEFAULT :EXTERNAL-FORMAT :UTF-8 :OUTPUT :INTERACTIVE :ERROR-OUTPUT :INTERACTIVE) 7: (ROS:ROSWELL ("roswell-internal-use" "download" "https://common-lisp.net/project/cmucl/downloads/release/21a/cmucl-21a-armhf-linux.tar.bz2" #P"/home/pi/.roswell/archives/cmucl-21a-armhf-linux.tar.bz2") :INTERACTIVE NIL) 8: (ROS.INSTALL::CMU-BIN-DOWNLOAD (:TARGET "cmu-bin" :VERSION "21a" :ARGV NIL)) 9: (INSTALL-IMPL "cmu-bin" NIL NIL) 10: (INSTALL-IMPL-IF-PROBED "cmu-bin" NIL NIL) 11: (MAIN #<unused argument> "cmu-bin") 12: (SB-INT:SIMPLE-EVAL-IN-LEXENV (APPLY (QUOTE MAIN) ROS:*ARGV*) #<NULL-LEXENV>) 13: (SB-INT:SIMPLE-EVAL-IN-LEXENV (ROS:QUIT (APPLY (QUOTE MAIN) ROS:*ARGV*)) #<NULL-LEXENV>) 14: (SB-EXT:EVAL-TLF (ROS:QUIT (APPLY (QUOTE MAIN) ROS:*ARGV*)) NIL NIL) 15: ((LABELS SB-FASL::EVAL-FORM :IN SB-INT:LOAD-AS-SOURCE) (ROS:QUIT (APPLY (QUOTE MAIN) ROS:*ARGV*)) NIL) 16: (SB-INT:LOAD-AS-SOURCE #<CONCATENATED-STREAM :STREAMS NIL {52C0C2A1}> :VERBOSE NIL :PRINT NIL :CONTEXT "loading") 17: ((FLET SB-FASL::LOAD-STREAM :IN LOAD) #<CONCATENATED-STREAM :STREAMS NIL {52C0C2A1}> NIL) 18: (LOAD #<CONCATENATED-STREAM :STREAMS NIL {52C0C2A1}> :VERBOSE NIL :PRINT NIL :IF-DOES-NOT-EXIST T :EXTERNAL-FORMAT :DEFAULT) 19: ((FLET ROS::BODY :IN ROS:SCRIPT) #<SB-SYS:FD-STREAM for "file /usr/local/etc/roswell/install.ros" {52C0C179}>) 20: (ROS:SCRIPT :SCRIPT "/usr/local/etc/roswell/install.ros" "install" "cmu-bin") 21: (ROS:RUN ((:SCRIPT "/usr/local/etc/roswell/install.ros" "install" "cmu-bin") (:QUIT NIL))) 22: (SB-INT:SIMPLE-EVAL-IN-LEXENV (ROS:RUN (QUOTE ((:SCRIPT "/usr/local/etc/roswell/install.ros" "install" "cmu-bin") (:QUIT NIL)))) #<NULL-LEXENV>) 23: (EVAL (ROS:RUN (QUOTE ((:SCRIPT "/usr/local/etc/roswell/install.ros" "install" "cmu-bin") (:QUIT NIL))))) 24: (SB-IMPL::PROCESS-EVAL/LOAD-OPTIONS ((:EVAL . "(progn #-ros.init(cl:load \"/usr/local/etc/roswell/init.lisp\"))") (:EVAL . "(ros:quicklisp)") (:EVAL . "(ros:run '((:script \"/usr/local/etc/roswell/install.ros\"\"install\"\"cmu-bin\")(:quit ())))"))) 25: (SB-IMPL::TOPLEVEL-INIT) 26: ((FLET #:WITHOUT-INTERRUPTS-BODY-72 :IN SB-EXT:SAVE-LISP-AND-DIE)) 27: ((LABELS SB-IMPL::RESTART-LISP :IN SB-EXT:SAVE-LISP-AND-DIE)) unhandled condition in --disable-debugger mode, quitting
ABCL
ros install abcl-bin ros use abcl-bin
Invalid maximum heap size: -Xmx4g The specified size exceeds the maximum representable size. Error: Could not create the Java Virtual Machine. Error: A fatal exception has occurred. Program will exit. Error: unable to use 'abcl-bin'
4G固定なのかな?
CLISP
ros install clisp
Installing clisp/2.49... prefix: #P"/home/pi/.roswell/impls/armhf/linux/clisp/2.49/" chdir /home/pi/.roswell/src/clisp-2.49/src/ Unhandled UIOP/RUN-PROGRAM:SUBPROCESS-ERROR: Subprocess with command "make install" exited with error code 2 Backtrace for: #<SB-THREAD:THREAD "main thread" RUNNING {52D85581}> 0: (SB-DEBUG::DEBUGGER-DISABLED-HOOK #<UIOP/RUN-PROGRAM:SUBPROCESS-ERROR {514A8129}> #<unavailable argument>) 1: (SB-DEBUG::RUN-HOOK SB-EXT:*INVOKE-DEBUGGER-HOOK* #<UIOP/RUN-PROGRAM:SUBPROCESS-ERROR {514A8129}>) 2: (INVOKE-DEBUGGER #<UIOP/RUN-PROGRAM:SUBPROCESS-ERROR {514A8129}>) 3: (CERROR "IGNORE-ERROR-STATUS" UIOP/RUN-PROGRAM:SUBPROCESS-ERROR :COMMAND "make install" :CODE 2 :PROCESS NIL) 4: (UIOP/RUN-PROGRAM::%CHECK-RESULT 2 :COMMAND "make install" :PROCESS NIL :IGNORE-ERROR-STATUS NIL) 5: ((LAMBDA (UIOP/RUN-PROGRAM::REDUCED-INPUT #:G7867) :IN UIOP/RUN-PROGRAM::%USE-SYSTEM) NIL #<unavailable argument>) 6: ((FLET #:BEFORE7398 :IN UIOP/RUN-PROGRAM::%CALL-WITH-PROGRAM-IO) #P"/tmp/tmpZWLLOVAH.tmp") 7: (UIOP/STREAM:CALL-WITH-TEMPORARY-FILE #<CLOSURE (FLET #:BEFORE7398 :IN UIOP/RUN-PROGRAM::%CALL-WITH-PROGRAM-IO) {72593B95}> :WANT-STREAM-P NIL :WANT-PATHNAME-P T :DIRECTION :IO :KEEP NIL :AFTER NIL :DIRECTORY NIL :TYPE "tmp" :PREFIX NIL :SUFFIX NIL :ELEMENT-TYPE NIL :EXTERNAL-FORMAT NIL) 8: ((LABELS UIOP/RUN-PROGRAM::HARD-CASE :IN UIOP/RUN-PROGRAM::%CALL-WITH-PROGRAM-IO)) 9: (UIOP/RUN-PROGRAM::%USE-SYSTEM "make install" :INPUT NIL :ERROR-OUTPUT NIL :IF-INPUT-DOES-NOT-EXIST :ERROR :IF-OUTPUT-EXISTS :OVERWRITE :IF-ERROR-OUTPUT-EXISTS :OVERWRITE :ELEMENT-TYPE :DEFAULT :EXTERNAL-FORMAT :UTF-8 :OUTPUT T) 10: (ROS.INSTALL::CLISP-INSTALL (:TARGET "clisp" :VERSION "2.49" :ARGV NIL)) 11: (INSTALL-IMPL "clisp" NIL NIL) 12: (INSTALL-IMPL-IF-PROBED "clisp" NIL NIL) 13: (MAIN #<unused argument> "clisp") 14: (SB-INT:SIMPLE-EVAL-IN-LEXENV (APPLY (QUOTE MAIN) ROS:*ARGV*) #<NULL-LEXENV>) 15: (SB-INT:SIMPLE-EVAL-IN-LEXENV (ROS:QUIT (APPLY (QUOTE MAIN) ROS:*ARGV*)) #<NULL-LEXENV>) 16: (SB-EXT:EVAL-TLF (ROS:QUIT (APPLY (QUOTE MAIN) ROS:*ARGV*)) NIL NIL) 17: ((LABELS SB-FASL::EVAL-FORM :IN SB-INT:LOAD-AS-SOURCE) (ROS:QUIT (APPLY (QUOTE MAIN) ROS:*ARGV*)) NIL) 18: (SB-INT:LOAD-AS-SOURCE #<CONCATENATED-STREAM :STREAMS NIL {52C0B261}> :VERBOSE NIL :PRINT NIL :CONTEXT "loading") 19: ((FLET SB-FASL::LOAD-STREAM :IN LOAD) #<CONCATENATED-STREAM :STREAMS NIL {52C0B261}> NIL) 20: (LOAD #<CONCATENATED-STREAM :STREAMS NIL {52C0B261}> :VERBOSE NIL :PRINT NIL :IF-DOES-NOT-EXIST T :EXTERNAL-FORMAT :DEFAULT) 21: ((FLET ROS::BODY :IN ROS:SCRIPT) #<SB-SYS:FD-STREAM for "file /usr/local/etc/roswell/install.ros" {52C0B139}>) 22: (ROS:SCRIPT :SCRIPT "/usr/local/etc/roswell/install.ros" "install" "clisp") 23: (ROS:RUN ((:SCRIPT "/usr/local/etc/roswell/install.ros" "install" "clisp") (:QUIT NIL))) 24: (SB-INT:SIMPLE-EVAL-IN-LEXENV (ROS:RUN (QUOTE ((:SCRIPT "/usr/local/etc/roswell/install.ros" "install" "clisp") (:QUIT NIL)))) #<NULL-LEXENV>) 25: (EVAL (ROS:RUN (QUOTE ((:SCRIPT "/usr/local/etc/roswell/install.ros" "install" "clisp") (:QUIT NIL))))) 26: (SB-IMPL::PROCESS-EVAL/LOAD-OPTIONS ((:EVAL . "(progn #-ros.init(cl:load \"/usr/local/etc/roswell/init.lisp\"))") (:EVAL . "(ros:quicklisp)") (:EVAL . "(ros:run '((:script \"/usr/local/etc/roswell/install.ros\"\"install\"\"clisp\")(:quit ())))"))) 27: (SB-IMPL::TOPLEVEL-INIT) 28: ((FLET #:WITHOUT-INTERRUPTS-BODY-72 :IN SB-EXT:SAVE-LISP-AND-DIE)) 29: ((LABELS SB-IMPL::RESTART-LISP :IN SB-EXT:SAVE-LISP-AND-DIE)) unhandled condition in --disable-debugger mode, quitting
サムライト退職
8月でサムライト株式会社を退職しました。理由はお察し下さい。
1年チョットの在職中、前半はCommon LispとJavaScriptを書き、後半はCommon Lispを書いてました。
Webサービスの開発は初めてだったのでSQLに悩まされる日々ではありつつ良い経験だったと思う。
次の会社で何やるのかよく分からんけど、取り敢えずプライベートでCommon Lispを書きつつ、仕事もまだまだプログラマーを続けます。
MGLを触るためにとりあえず動作確認
Common Lispの機械学習ライブラリが一部で流行りなので、ちょっと触ってみた。
cl-cudaのquicklispへの登録が止まっているので、gitで直接もらってくる。
cd ~/.roswell/local-projects git clone https://github.com/takagi/cl-cuda.git git clone https://github.com/melisgl/mgl-mat.git git clone https://github.com/melisgl/mgl.git
apt-get install libblas-dev liblapack-dev
slimeを立ち上げmglをロードしようとすると、
(ql:quickload :mgl)
エラーとなる。
Component :OSICAT not found, required by #<SYSTEM "cl-cuda">
よく分からんけど、一度cl-cudaをロードしてからだと成功する。
(ql:quickload :cl-cuda) (ql:quickload :mgl)
とりあえずmglのテストを実行。
(in-package :mgl-test) (time (test))
なんかエラーになった。
arithmetic error FLOATING-POINT-INVALID-OPERATION signalled [Condition of type FLOATING-POINT-INVALID-OPERATION] Restarts: 0: [RETRY] Retry SLIME REPL evaluation request. 1: [*ABORT] Return to SLIME's top level. 2: [ABORT] abort thread (#<THREAD "repl-thread" RUNNING {10050A0033}>) Backtrace: 0: (SB-KERNEL:TWO-ARG-< #<SINGLE-FLOAT quiet NaN> 84.919205) 1: (SB-VM::GENERIC-<) 2: ((FLET MGL-CG::UPDATE3 :IN CG)) ...
NaNと比較した? もう一度やり直してみるも結果は変わらず。
mysqlのパスワードなしアカウント
ローカルの開発用mysqlでパスワードなしのアカウントを作る場合に、
update user set password='' where User='root';
というのを見かけたがやっても上手くいかず、describe user;
でフィールド見て、
update user set authentication_string='' where User='root';
とやったら出来たのだけど、バージョン違いだろうか?
Common Lispでdisassembleを使ったりチューニングしたり
Common Lispはdisassembleも付いてきてチューニングできるらしいとゆー話をよく聞くので、逆アセンブルしてみた。 題材としてはこの辺を参考に
- Common Lispで高速行列演算 | κeenのHappy Hacκing Blog
- Python (NumPy) と Common Lisp (LLA) で行列積の実行速度を比較する - 不確定特異点
caching-gemm
がバグってたので厳密には同じコードではなくなっている。
処理系のバージョンは次の通り
- SBCL 1.3.2
- CCL 1.11
simple-gemm
単純なGEMM。
(defun simple-gemm (ma mb) (declare (optimize (speed 3) (debug 0) (safety 0))) (declare (type (simple-array single-float (* *)) ma mb)) (let ((rows (array-dimension ma 0)) (cols (array-dimension mb 1))) (declare (type fixnum rows cols)) (let ((result (make-matrix rows cols))) (declare (type (simple-array single-float (* *)) result)) (dotimes (row rows) (dotimes (col cols) (dotimes (k cols) (incf (aref result row col) (* (aref ma row k) (aref mb k col)))))) result)))
SBCLの場合。 普通っぽい。
CL-USER> (disassemble 'simple-gemm) ; disassembly for SIMPLE-GEMM ; Size: 283 bytes. Origin: #x1004C5CA65 ; A65: 4C894DE8 MOV [RBP-24], R9 ; no-arg-parsing entry point ; A69: 4C8945E0 MOV [RBP-32], R8 ; A6D: 4D8B6831 MOV R13, [R8+49] ; A71: 4D8B5139 MOV R10, [R9+57] ; A75: 4C896DF8 MOV [RBP-8], R13 ; A79: 4C8955F0 MOV [RBP-16], R10 ; A7D: 488D5C24F0 LEA RBX, [RSP-16] ; A82: 4883EC18 SUB RSP, 24 ; A86: 498BD5 MOV RDX, R13 ; A89: 498BFA MOV RDI, R10 ; A8C: 488B057DFFFFFF MOV RAX, [RIP-131] ; #<FDEFINITION for MAKE-MATRIX> ; A93: B904000000 MOV ECX, 4 ; A98: 48892B MOV [RBX], RBP ; A9B: 488BEB MOV RBP, RBX ; A9E: FF5009 CALL QWORD PTR [RAX+9] ; AA1: 480F42E3 CMOVB RSP, RBX ; AA5: 4C8B45E0 MOV R8, [RBP-32] ; AA9: 4C8B4DE8 MOV R9, [RBP-24] ; AAD: 4C8B55F0 MOV R10, [RBP-16] ; AB1: 4C8B6DF8 MOV R13, [RBP-8] ; AB5: 488BF2 MOV RSI, RDX ; AB8: 31DB XOR EBX, EBX ; ABA: E9AF000000 JMP L5 ; ABF: 90 NOP ; AC0: L0: 31C0 XOR EAX, EAX ; AC2: E99A000000 JMP L4 ; AC7: 660F1F840000000000 NOP ; AD0: L1: 31C9 XOR ECX, ECX ; AD2: E981000000 JMP L3 ; AD7: 660F1F840000000000 NOP ; AE0: L2: 498B5039 MOV RDX, [R8+57] ; AE4: 488BFB MOV RDI, RBX ; AE7: 48D1FF SAR RDI, 1 ; AEA: 480FAFFA IMUL RDI, RDX ; AEE: 4801CF ADD RDI, RCX ; AF1: 498B5011 MOV RDX, [R8+17] ; AF5: F30F104C7A01 MOVSS XMM1, [RDX+RDI*2+1] ; AFB: 498B5139 MOV RDX, [R9+57] ; AFF: 488BF9 MOV RDI, RCX ; B02: 48D1FF SAR RDI, 1 ; B05: 480FAFFA IMUL RDI, RDX ; B09: 4801C7 ADD RDI, RAX ; B0C: 498B5111 MOV RDX, [R9+17] ; B10: F30F10547A01 MOVSS XMM2, [RDX+RDI*2+1] ; B16: F30F59D1 MULSS XMM2, XMM1 ; B1A: 488B5639 MOV RDX, [RSI+57] ; B1E: 488BFB MOV RDI, RBX ; B21: 48D1FF SAR RDI, 1 ; B24: 480FAFFA IMUL RDI, RDX ; B28: 4801C7 ADD RDI, RAX ; B2B: 488B5611 MOV RDX, [RSI+17] ; B2F: F30F104C7A01 MOVSS XMM1, [RDX+RDI*2+1] ; B35: F30F58CA ADDSS XMM1, XMM2 ; B39: 488B5639 MOV RDX, [RSI+57] ; B3D: 488BFB MOV RDI, RBX ; B40: 48D1FF SAR RDI, 1 ; B43: 480FAFFA IMUL RDI, RDX ; B47: 4801C7 ADD RDI, RAX ; B4A: 488B5611 MOV RDX, [RSI+17] ; B4E: F30F114C7A01 MOVSS [RDX+RDI*2+1], XMM1 ; B54: 4883C102 ADD RCX, 2 ; B58: L3: 4C39D1 CMP RCX, R10 ; B5B: 7C83 JL L2 ; B5D: 4883C002 ADD RAX, 2 ; B61: L4: 4C39D0 CMP RAX, R10 ; B64: 0F8C66FFFFFF JL L1 ; B6A: 4883C302 ADD RBX, 2 ; B6E: L5: 4C39EB CMP RBX, R13 ; B71: 0F8C49FFFFFF JL L0 ; B77: 488BD6 MOV RDX, RSI ; B7A: 488BE5 MOV RSP, RBP ; B7D: F8 CLC ; B7E: 5D POP RBP ; B7F: C3 RET
CCLの場合。
対応するLispコードがコメントに付いてて分かりやすい。
S式風謎アセンブラよく見るとよく分からないの色々あるな。
$
が即値で%
が変数かな?
CCLのドキュメントちゃんと読まないと無理そう
CL-USER> (disassemble 'caching-gemm) ;; "home:sandbox;common-lisp;simple-gemm.lisp.newest":1039-1729 (recover-fn-from-rip) ; [7] (pushq (% rbp)) ; [14] (movq (% rsp) (% rbp)) ; [15] (pushq (% arg_y)) ; [18] (pushq (% arg_z)) ; [19] (movq (@ (% gs) 80) (% stack-temp)) ; [20] (subq ($ 64) (@ (% gs) 80)) ; [29] (movq (@ (% gs) 80) (% imm0)) ; [39] (movq (% stack-temp) (@ (% imm0))) ; [48] (movq (@ (% gs) #x178) (% stack-temp)) ; [51] (movq (% stack-temp) (@ 8 (% imm0))) ; [60] (movq (% imm0) (@ (% gs) #x178)) ; [64] (pushq (% save0)) ; [73] (pushq (% save1)) ; [75] (pushq (% save2)) ; [77] (pushq (% save3)) ; [79] ;;; (let ((rows (array-dimension ma 0)) (cols (array-dimension mb 1))) (declare (type fixnum rows cols)) (xorl (% arg_z.l) (% arg_z.l)) ; [81] (movl ($ 16) (% nargs)) ; [83] (movq (@ 'ARRAY-DIMENSION (% fn)) (% temp0)) ; [88] (lisp-call (@ 10 (% temp0))) ; [97] (recover-fn-from-rip) ; [100] (movq (% arg_z) (% arg_x)) ; [107] (pushq (% arg_x)) ; [110] (movq (@ -16 (% rbp)) (% arg_y)) ; [112] (movl ($ 8) (% arg_z.l)) ; [116] (movl ($ 16) (% nargs)) ; [121] (movq (@ 'ARRAY-DIMENSION (% fn)) (% temp0)) ; [126] (lisp-call (@ 10 (% temp0))) ; [137] (recover-fn-from-rip) ; [140] (movq (% arg_z) (% save3)) ; [147] ;;; (let ((result (make-matrix rows cols))) (declare (type (simple-array single-float (* *)) result)) (d (movq (@ -56 (% rbp)) (% arg_y)) ; [150] (movq (% save3) (% arg_z)) ; [154] (movl ($ 16) (% nargs)) ; [157] (movq (@ 'MAKE-MATRIX (% fn)) (% temp0)) ; [162] (lisp-call (@ 10 (% temp0))) ; [169] (recover-fn-from-rip) ; [172] (pushq (% arg_z)) ; [179] ;;; (dotimes (row rows) (dotimes (col cols) (let ((cell (aref result row col))) (declare (type (single-f (xorl (% save2.l) (% save2.l)) ; [180] (jmpq L562) ; [183] ;;; (dotimes (col cols) (let ((cell (aref result row col))) (declare (type (single-float) cell)) (dotime L181 (xorl (% save1.l) (% save1.l)) ; [188] (jmpq L549) ; [191] ;;; (let ((cell (aref result row col))) (declare (type (single-float) cell)) (dotimes (k cols) (incf cel L189 (movq (@ -64 (% rbp)) (% arg_x)) ; [196] (movq (@ 43 (% arg_x)) (% imm0)) ; [200] (sarq ($ 3) (% imm0)) ; [204] (imulq (% save2) (% imm0)) ; [208] (leaq (@ (% save1) (% imm0)) (% temp0)) ; [212] (movq (@ 11 (% arg_x)) (% arg_z)) ; [216] (movq (% temp0) (% imm0)) ; [220] (shrq (% imm0)) ; [223] (movss (@ -5 (% arg_z) (% imm0)) (% fp0)) ; [226] (movd (% imm0) (% stack-temp)) ; [232] (movq (@ (% gs) #x178) (% imm0)) ; [236] (movss (% fp0) (@ 16 (% imm0))) ; [245] (movd (% stack-temp) (% imm0)) ; [250] ;;; (dotimes (k cols) (incf cell (* (aref ma row k) (aref mb k col)))) (xorl (% save0.l) (% save0.l)) ; [254] (jmpq L471) ; [257] ;;; (aref ma row k) L255 (movq (@ -8 (% rbp)) (% arg_x)) ; [262] (movq (@ 43 (% arg_x)) (% imm0)) ; [266] (sarq ($ 3) (% imm0)) ; [270] (imulq (% save2) (% imm0)) ; [274] (leaq (@ (% save0) (% imm0)) (% temp0)) ; [278] (movq (@ 11 (% arg_x)) (% arg_z)) ; [282] (movq (% temp0) (% imm0)) ; [286] (shrq (% imm0)) ; [289] (movss (@ -5 (% arg_z) (% imm0)) (% fp0)) ; [292] ;;; (aref mb k col) (movq (@ -16 (% rbp)) (% arg_z)) ; [298] (movq (@ 43 (% arg_z)) (% imm0)) ; [302] (sarq ($ 3) (% imm0)) ; [306] (imulq (% save0) (% imm0)) ; [310] (leaq (@ (% save1) (% imm0)) (% temp0)) ; [314] (movq (@ 11 (% arg_z)) (% arg_z)) ; [318] (movq (% temp0) (% imm0)) ; [322] (shrq (% imm0)) ; [325] (movss (@ -5 (% arg_z) (% imm0)) (% fp1)) ; [328] ;;; (* (aref ma row k) (aref mb k col)) (mulss (% fp1) (% fp0)) ; [334] ;;; (incf cell (* (aref ma row k) (aref mb k col))) (movd (% imm0) (% stack-temp)) ; [338] (movq (@ (% gs) #x178) (% imm0)) ; [342] (movss (% fp0) (@ 32 (% imm0))) ; [351] (movd (% stack-temp) (% imm0)) ; [356] (movd (% imm0) (% stack-temp)) ; [360] (movq (@ (% gs) #x178) (% imm0)) ; [364] (movss (@ 32 (% imm0)) (% fp1)) ; [373] (movd (% stack-temp) (% imm0)) ; [378] (movd (% imm0) (% stack-temp)) ; [382] (movq (@ (% gs) #x178) (% imm0)) ; [386] (movss (@ 16 (% imm0)) (% fp0)) ; [395] (movd (% stack-temp) (% imm0)) ; [400] (addss (% fp1) (% fp0)) ; [404] (movd (% imm0) (% stack-temp)) ; [408] (movq (@ (% gs) #x178) (% imm0)) ; [412] (movss (% fp0) (@ 48 (% imm0))) ; [421] (movd (% stack-temp) (% imm0)) ; [426] (movd (% imm0) (% stack-temp)) ; [430] (movq (@ (% gs) #x178) (% imm0)) ; [434] (movss (@ 48 (% imm0)) (% fp0)) ; [443] (movd (% stack-temp) (% imm0)) ; [448] (movd (% imm0) (% stack-temp)) ; [452] (movq (@ (% gs) #x178) (% imm0)) ; [456] (movss (% fp0) (@ 16 (% imm0))) ; [465] (movd (% stack-temp) (% imm0)) ; [470] ;;; (dotimes (k cols) (incf cell (* (aref ma row k) (aref mb k col)))) (addq ($ 8) (% save0)) ; [474] L471 (cmpq (% save3) (% save0)) ; [478] (jl L255) ; [481] ;;; (setf (aref result row col) cell) (movd (% imm0) (% stack-temp)) ; [487] (movq (@ (% gs) #x178) (% imm0)) ; [491] (movss (@ 16 (% imm0)) (% fp0)) ; [500] (movd (% stack-temp) (% imm0)) ; [505] (movq (% save1) (% arg_y)) ; [509] (movq (% save2) (% arg_x)) ; [512] (movq (@ -64 (% rbp)) (% temp0)) ; [515] (movq (@ 43 (% temp0)) (% imm0)) ; [519] (sarq ($ 3) (% imm0)) ; [523] (imulq (% arg_x) (% imm0)) ; [527] (leaq (@ (% arg_y) (% imm0)) (% arg_y)) ; [531] (movq (@ 11 (% temp0)) (% arg_x)) ; [535] (movq (% arg_y) (% imm2)) ; [539] (shrq (% imm2)) ; [542] (movss (% fp0) (@ -5 (% arg_x) (% imm2))) ; [545] ;;; (dotimes (col cols) (let ((cell (aref result row col))) (declare (type (single-float) cell)) (dotime (addq ($ 8) (% save1)) ; [552] L549 (cmpq (% save3) (% save1)) ; [556] (jl L189) ; [559] ;;; (dotimes (row rows) (dotimes (col cols) (let ((cell (aref result row col))) (declare (type (single-f (addq ($ 8) (% save2)) ; [565] L562 (movq (@ -56 (% rbp)) (% arg_z)) ; [569] (cmpq (% arg_z) (% save2)) ; [573] (jl L181) ; [576] ;;; (let ((result (make-matrix rows cols))) (declare (type (simple-array single-float (* *)) result)) (d (movq (@ -64 (% rbp)) (% arg_z)) ; [582] (addq ($ 16) (% rsp)) ; [586] (popq (% save3)) ; [590] (popq (% save2)) ; [592] (popq (% save1)) ; [594] (popq (% save0)) ; [596] (movq (@ (% gs) #x178) (% imm0)) ; [598] (movq (@ 8 (% imm0)) (% stack-temp)) ; [607] (movq (@ (% imm0)) (% imm0)) ; [611] (movq (% imm0) (@ (% gs) 80)) ; [614] (movq (% stack-temp) (@ (% gs) #x178)) ; [623] (leaveq) ; [632] (retq) ; [633] CL-USER> (disassemble 'simple-gemm) ;; "home:sandbox;common-lisp;simple-gemm.lisp.newest":479-1037 (recover-fn-from-rip) ; [7] (pushq (% rbp)) ; [14] (movq (% rsp) (% rbp)) ; [15] (pushq (% arg_y)) ; [18] (pushq (% arg_z)) ; [19] (movq (@ (% gs) 80) (% stack-temp)) ; [20] (subq ($ 48) (@ (% gs) 80)) ; [29] (movq (@ (% gs) 80) (% imm0)) ; [39] (movq (% stack-temp) (@ (% imm0))) ; [48] (movq (@ (% gs) #x178) (% stack-temp)) ; [51] (movq (% stack-temp) (@ 8 (% imm0))) ; [60] (movq (% imm0) (@ (% gs) #x178)) ; [64] (pushq (% save0)) ; [73] (pushq (% save1)) ; [75] (pushq (% save2)) ; [77] (pushq (% save3)) ; [79] ;;; (let ((rows (array-dimension ma 0)) (cols (array-dimension mb 1))) (declare (type fixnum rows cols)) (xorl (% arg_z.l) (% arg_z.l)) ; [81] (movl ($ 16) (% nargs)) ; [83] (movq (@ 'ARRAY-DIMENSION (% fn)) (% temp0)) ; [88] (lisp-call (@ 10 (% temp0))) ; [97] (recover-fn-from-rip) ; [100] (movq (% arg_z) (% arg_x)) ; [107] (pushq (% arg_x)) ; [110] (movq (@ -16 (% rbp)) (% arg_y)) ; [112] (movl ($ 8) (% arg_z.l)) ; [116] (movl ($ 16) (% nargs)) ; [121] (movq (@ 'ARRAY-DIMENSION (% fn)) (% temp0)) ; [126] (lisp-call (@ 10 (% temp0))) ; [137] (recover-fn-from-rip) ; [140] (pushq (% arg_z)) ; [147] ;;; (let ((result (make-matrix rows cols))) (declare (type (simple-array single-float (* *)) result)) (d (movq (@ -56 (% rbp)) (% arg_y)) ; [148] (movl ($ 16) (% nargs)) ; [152] (movq (@ 'MAKE-MATRIX (% fn)) (% temp0)) ; [157] (lisp-call (@ 10 (% temp0))) ; [169] (recover-fn-from-rip) ; [172] (movq (% arg_z) (% save3)) ; [179] ;;; (dotimes (row rows) (dotimes (col cols) (dotimes (k cols) (incf (aref result row col) (* (aref ma ro (xorl (% save2.l) (% save2.l)) ; [182] (jmpq L479) ; [185] ;;; (dotimes (col cols) (dotimes (k cols) (incf (aref result row col) (* (aref ma row k) (aref mb k col) L183 (xorl (% save1.l) (% save1.l)) ; [190] (jmpq L462) ; [193] ;;; (dotimes (k cols) (incf (aref result row col) (* (aref ma row k) (aref mb k col)))) L191 (xorl (% save0.l) (% save0.l)) ; [198] (jmpq L445) ; [201] ;;; (aref ma row k) L199 (movq (@ -8 (% rbp)) (% arg_x)) ; [206] (movq (@ 43 (% arg_x)) (% imm0)) ; [210] (sarq ($ 3) (% imm0)) ; [214] (imulq (% save2) (% imm0)) ; [218] (leaq (@ (% save0) (% imm0)) (% temp0)) ; [222] (movq (@ 11 (% arg_x)) (% arg_z)) ; [226] (movq (% temp0) (% imm0)) ; [230] (shrq (% imm0)) ; [233] (movss (@ -5 (% arg_z) (% imm0)) (% fp0)) ; [236] ;;; (aref mb k col) (movq (@ -16 (% rbp)) (% arg_z)) ; [242] (movq (@ 43 (% arg_z)) (% imm0)) ; [246] (sarq ($ 3) (% imm0)) ; [250] (imulq (% save0) (% imm0)) ; [254] (leaq (@ (% save1) (% imm0)) (% temp0)) ; [258] (movq (@ 11 (% arg_z)) (% arg_z)) ; [262] (movq (% temp0) (% imm0)) ; [266] (shrq (% imm0)) ; [269] (movss (@ -5 (% arg_z) (% imm0)) (% fp1)) ; [272] ;;; (* (aref ma row k) (aref mb k col)) (mulss (% fp1) (% fp0)) ; [278] ;;; (incf (aref result row col) (* (aref ma row k) (aref mb k col))) (movd (% imm0) (% stack-temp)) ; [282] (movq (@ (% gs) #x178) (% imm0)) ; [286] (movss (% fp0) (@ 16 (% imm0))) ; [295] (movd (% stack-temp) (% imm0)) ; [300] (movq (@ 43 (% save3)) (% imm0)) ; [304] (sarq ($ 3) (% imm0)) ; [308] (imulq (% save2) (% imm0)) ; [312] (leaq (@ (% save1) (% imm0)) (% temp0)) ; [316] (movq (@ 11 (% save3)) (% arg_z)) ; [320] (movq (% temp0) (% imm0)) ; [324] (shrq (% imm0)) ; [327] (movss (@ -5 (% arg_z) (% imm0)) (% fp0)) ; [330] (movd (% imm0) (% stack-temp)) ; [336] (movq (@ (% gs) #x178) (% imm0)) ; [340] (movss (@ 16 (% imm0)) (% fp1)) ; [349] (movd (% stack-temp) (% imm0)) ; [354] (addss (% fp1) (% fp0)) ; [358] (movd (% imm0) (% stack-temp)) ; [362] (movq (@ (% gs) #x178) (% imm0)) ; [366] (movss (% fp0) (@ 32 (% imm0))) ; [375] (movd (% stack-temp) (% imm0)) ; [380] (movd (% imm0) (% stack-temp)) ; [384] (movq (@ (% gs) #x178) (% imm0)) ; [388] (movss (@ 32 (% imm0)) (% fp0)) ; [397] (movd (% stack-temp) (% imm0)) ; [402] (movq (% save1) (% arg_y)) ; [406] (movq (% save2) (% arg_x)) ; [409] (movq (% save3) (% temp0)) ; [412] (movq (@ 43 (% temp0)) (% imm0)) ; [415] (sarq ($ 3) (% imm0)) ; [419] (imulq (% arg_x) (% imm0)) ; [423] (leaq (@ (% arg_y) (% imm0)) (% arg_y)) ; [427] (movq (@ 11 (% temp0)) (% arg_x)) ; [431] (movq (% arg_y) (% imm2)) ; [435] (shrq (% imm2)) ; [438] (movss (% fp0) (@ -5 (% arg_x) (% imm2))) ; [441] ;;; (dotimes (k cols) (incf (aref result row col) (* (aref ma row k) (aref mb k col)))) (addq ($ 8) (% save0)) ; [448] L445 (movq (@ -64 (% rbp)) (% arg_z)) ; [452] (cmpq (% arg_z) (% save0)) ; [456] (jl L199) ; [459] ;;; (dotimes (col cols) (dotimes (k cols) (incf (aref result row col) (* (aref ma row k) (aref mb k col) (addq ($ 8) (% save1)) ; [465] L462 (movq (@ -64 (% rbp)) (% arg_z)) ; [469] (cmpq (% arg_z) (% save1)) ; [473] (jl L191) ; [476] ;;; (dotimes (row rows) (dotimes (col cols) (dotimes (k cols) (incf (aref result row col) (* (aref ma ro (addq ($ 8) (% save2)) ; [482] L479 (movq (@ -56 (% rbp)) (% arg_z)) ; [486] (cmpq (% arg_z) (% save2)) ; [490] (jl L183) ; [493] ;;; (let ((result (make-matrix rows cols))) (declare (type (simple-array single-float (* *)) result)) (d (movq (% save3) (% arg_z)) ; [499] (addq ($ 16) (% rsp)) ; [502] (popq (% save3)) ; [506] (popq (% save2)) ; [508] (popq (% save1)) ; [510] (popq (% save0)) ; [512] (movq (@ (% gs) #x178) (% imm0)) ; [514] (movq (@ 8 (% imm0)) (% stack-temp)) ; [523] (movq (@ (% imm0)) (% imm0)) ; [527] (movq (% imm0) (@ (% gs) 80)) ; [530] (movq (% stack-temp) (@ (% gs) #x178)) ; [539] (leaveq) ; [548] (retq) ; [549]
これアセンブラ部分が意図的にコメントになってないけど、このままREPLに入れると動くんだろうか?
CL-USER> (recover-fn-from-rip) ; Evaluation aborted on #<CCL::UNDEFINED-FUNCTION-CALL #x3020018BF73D>.
ダメらしい。
caching-gemm
中間結果を変数にキャッシュするGEMM。
(defun caching-gemm (ma mb) (declare (optimize (speed 3) (debug 0) (safety 0))) (declare (type (simple-array single-float (* *)) ma mb)) (let ((rows (array-dimension ma 0)) (cols (array-dimension mb 1))) (declare (type fixnum rows cols)) (let ((result (make-matrix rows cols))) (declare (type (simple-array single-float (* *)) result)) (dotimes (row rows) (dotimes (col cols) (let ((cell (aref result row col))) (declare (type (single-float) cell)) (dotimes (k cols) (incf cell (* (aref ma row k) (aref mb k col)))) (setf (aref result row col) cell)))) result)))
SBCLの場合。
L2
が多分最内ループで、SAR
してるのなんだ?
後、幅とか先頭アドレスかを毎回ロードして計算しなおしてる?
CL-USER> (disassemble 'caching-gemm) ; disassembly for CACHING-GEMM ; Size: 272 bytes. Origin: #x1006D8D8F5 ; 8F5: 4C894DE8 MOV [RBP-24], R9 ; no-arg-parsing entry point ; 8F9: 4C8945E0 MOV [RBP-32], R8 ; 8FD: 4D8B6831 MOV R13, [R8+49] ; 901: 4D8B5139 MOV R10, [R9+57] ; 905: 4C896DF8 MOV [RBP-8], R13 ; 909: 4C8955F0 MOV [RBP-16], R10 ; 90D: 488D5C24F0 LEA RBX, [RSP-16] ; 912: 4883EC18 SUB RSP, 24 ; 916: 498BD5 MOV RDX, R13 ; 919: 498BFA MOV RDI, R10 ; 91C: 488B057DFFFFFF MOV RAX, [RIP-131] ; #<FDEFINITION for MAKE-MATRIX> ; 923: B904000000 MOV ECX, 4 ; 928: 48892B MOV [RBX], RBP ; 92B: 488BEB MOV RBP, RBX ; 92E: FF5009 CALL QWORD PTR [RAX+9] ; 931: 480F42E3 CMOVB RSP, RBX ; 935: 4C8B45E0 MOV R8, [RBP-32] ; 939: 4C8B4DE8 MOV R9, [RBP-24] ; 93D: 4C8B55F0 MOV R10, [RBP-16] ; 941: 4C8B6DF8 MOV R13, [RBP-8] ; 945: 488BF2 MOV RSI, RDX ; 948: 31DB XOR EBX, EBX ; 94A: E9A4000000 JMP L5 ; 94F: 90 NOP ; 950: L0: 31C0 XOR EAX, EAX ; 952: E98F000000 JMP L4 ; 957: 660F1F840000000000 NOP ; 960: L1: 488B4E39 MOV RCX, [RSI+57] ; 964: 488BFB MOV RDI, RBX ; 967: 48D1FF SAR RDI, 1 ; 96A: 480FAFF9 IMUL RDI, RCX ; 96E: 4801C7 ADD RDI, RAX ; 971: 488B4E11 MOV RCX, [RSI+17] ; 975: F30F104C7901 MOVSS XMM1, [RCX+RDI*2+1] ; 97B: 31C9 XOR ECX, ECX ; 97D: EB43 JMP L3 ; 97F: 90 NOP ; 980: L2: 498B5039 MOV RDX, [R8+57] ; 984: 488BFB MOV RDI, RBX ; 987: 48D1FF SAR RDI, 1 ; 98A: 480FAFFA IMUL RDI, RDX ; 98E: 4801CF ADD RDI, RCX ; 991: 498B5011 MOV RDX, [R8+17] ; 995: F30F10547A01 MOVSS XMM2, [RDX+RDI*2+1] ; 99B: 498B5139 MOV RDX, [R9+57] ; 99F: 488BF9 MOV RDI, RCX ; 9A2: 48D1FF SAR RDI, 1 ; 9A5: 480FAFFA IMUL RDI, RDX ; 9A9: 4801C7 ADD RDI, RAX ; 9AC: 498B5111 MOV RDX, [R9+17] ; 9B0: F30F105C7A01 MOVSS XMM3, [RDX+RDI*2+1] ; 9B6: F30F59DA MULSS XMM3, XMM2 ; 9BA: F30F58CB ADDSS XMM1, XMM3 ; 9BE: 4883C102 ADD RCX, 2 ; 9C2: L3: 4C39D1 CMP RCX, R10 ; 9C5: 7CB9 JL L2 ; 9C7: 488B4E39 MOV RCX, [RSI+57] ; 9CB: 488BFB MOV RDI, RBX ; 9CE: 48D1FF SAR RDI, 1 ; 9D1: 480FAFF9 IMUL RDI, RCX ; 9D5: 4801C7 ADD RDI, RAX ; 9D8: 488B4E11 MOV RCX, [RSI+17] ; 9DC: F30F114C7901 MOVSS [RCX+RDI*2+1], XMM1 ; 9E2: 4883C002 ADD RAX, 2 ; 9E6: L4: 4C39D0 CMP RAX, R10 ; 9E9: 0F8C71FFFFFF JL L1 ; 9EF: 4883C302 ADD RBX, 2 ; 9F3: L5: 4C39EB CMP RBX, R13 ; 9F6: 0F8C54FFFFFF JL L0 ; 9FC: 488BD6 MOV RDX, RSI ; 9FF: 488BE5 MOV RSP, RBP ; A02: F8 CLC ; A03: 5D POP RBP ; A04: C3 RET
CCLの場合。
L255
あたりが最内ループ。
途中にlisp-call
とか入ってたり、stack-temp
が多分レジスタスピルだと思う。
CL-USER> (disassemble 'caching-gemm) ;; "home:sandbox;common-lisp;simple-gemm.lisp.newest":1039-1729 (recover-fn-from-rip) ; [7] (pushq (% rbp)) ; [14] (movq (% rsp) (% rbp)) ; [15] (pushq (% arg_y)) ; [18] (pushq (% arg_z)) ; [19] (movq (@ (% gs) 80) (% stack-temp)) ; [20] (subq ($ 64) (@ (% gs) 80)) ; [29] (movq (@ (% gs) 80) (% imm0)) ; [39] (movq (% stack-temp) (@ (% imm0))) ; [48] (movq (@ (% gs) #x178) (% stack-temp)) ; [51] (movq (% stack-temp) (@ 8 (% imm0))) ; [60] (movq (% imm0) (@ (% gs) #x178)) ; [64] (pushq (% save0)) ; [73] (pushq (% save1)) ; [75] (pushq (% save2)) ; [77] (pushq (% save3)) ; [79] ;;; (let ((rows (array-dimension ma 0)) (cols (array-dimension mb 1))) (declare (type fixnum rows cols)) (xorl (% arg_z.l) (% arg_z.l)) ; [81] (movl ($ 16) (% nargs)) ; [83] (movq (@ 'ARRAY-DIMENSION (% fn)) (% temp0)) ; [88] (lisp-call (@ 10 (% temp0))) ; [97] (recover-fn-from-rip) ; [100] (movq (% arg_z) (% arg_x)) ; [107] (pushq (% arg_x)) ; [110] (movq (@ -16 (% rbp)) (% arg_y)) ; [112] (movl ($ 8) (% arg_z.l)) ; [116] (movl ($ 16) (% nargs)) ; [121] (movq (@ 'ARRAY-DIMENSION (% fn)) (% temp0)) ; [126] (lisp-call (@ 10 (% temp0))) ; [137] (recover-fn-from-rip) ; [140] (movq (% arg_z) (% save3)) ; [147] ;;; (let ((result (make-matrix rows cols))) (declare (type (simple-array single-float (* *)) result)) (d (movq (@ -56 (% rbp)) (% arg_y)) ; [150] (movq (% save3) (% arg_z)) ; [154] (movl ($ 16) (% nargs)) ; [157] (movq (@ 'MAKE-MATRIX (% fn)) (% temp0)) ; [162] (lisp-call (@ 10 (% temp0))) ; [169] (recover-fn-from-rip) ; [172] (pushq (% arg_z)) ; [179] ;;; (dotimes (row rows) (dotimes (col cols) (let ((cell (aref result row col))) (declare (type (single-f (xorl (% save2.l) (% save2.l)) ; [180] (jmpq L562) ; [183] ;;; (dotimes (col cols) (let ((cell (aref result row col))) (declare (type (single-float) cell)) (dotime L181 (xorl (% save1.l) (% save1.l)) ; [188] (jmpq L549) ; [191] ;;; (let ((cell (aref result row col))) (declare (type (single-float) cell)) (dotimes (k cols) (incf cel L189 (movq (@ -64 (% rbp)) (% arg_x)) ; [196] (movq (@ 43 (% arg_x)) (% imm0)) ; [200] (sarq ($ 3) (% imm0)) ; [204] (imulq (% save2) (% imm0)) ; [208] (leaq (@ (% save1) (% imm0)) (% temp0)) ; [212] (movq (@ 11 (% arg_x)) (% arg_z)) ; [216] (movq (% temp0) (% imm0)) ; [220] (shrq (% imm0)) ; [223] (movss (@ -5 (% arg_z) (% imm0)) (% fp0)) ; [226] (movd (% imm0) (% stack-temp)) ; [232] (movq (@ (% gs) #x178) (% imm0)) ; [236] (movss (% fp0) (@ 16 (% imm0))) ; [245] (movd (% stack-temp) (% imm0)) ; [250] ;;; (dotimes (k cols) (incf cell (* (aref ma row k) (aref mb k col)))) (xorl (% save0.l) (% save0.l)) ; [254] (jmpq L471) ; [257] ;;; (aref ma row k) L255 (movq (@ -8 (% rbp)) (% arg_x)) ; [262] (movq (@ 43 (% arg_x)) (% imm0)) ; [266] (sarq ($ 3) (% imm0)) ; [270] (imulq (% save2) (% imm0)) ; [274] (leaq (@ (% save0) (% imm0)) (% temp0)) ; [278] (movq (@ 11 (% arg_x)) (% arg_z)) ; [282] (movq (% temp0) (% imm0)) ; [286] (shrq (% imm0)) ; [289] (movss (@ -5 (% arg_z) (% imm0)) (% fp0)) ; [292] ;;; (aref mb k col) (movq (@ -16 (% rbp)) (% arg_z)) ; [298] (movq (@ 43 (% arg_z)) (% imm0)) ; [302] (sarq ($ 3) (% imm0)) ; [306] (imulq (% save0) (% imm0)) ; [310] (leaq (@ (% save1) (% imm0)) (% temp0)) ; [314] (movq (@ 11 (% arg_z)) (% arg_z)) ; [318] (movq (% temp0) (% imm0)) ; [322] (shrq (% imm0)) ; [325] (movss (@ -5 (% arg_z) (% imm0)) (% fp1)) ; [328] ;;; (* (aref ma row k) (aref mb k col)) (mulss (% fp1) (% fp0)) ; [334] ;;; (incf cell (* (aref ma row k) (aref mb k col))) (movd (% imm0) (% stack-temp)) ; [338] (movq (@ (% gs) #x178) (% imm0)) ; [342] (movss (% fp0) (@ 32 (% imm0))) ; [351] (movd (% stack-temp) (% imm0)) ; [356] (movd (% imm0) (% stack-temp)) ; [360] (movq (@ (% gs) #x178) (% imm0)) ; [364] (movss (@ 32 (% imm0)) (% fp1)) ; [373] (movd (% stack-temp) (% imm0)) ; [378] (movd (% imm0) (% stack-temp)) ; [382] (movq (@ (% gs) #x178) (% imm0)) ; [386] (movss (@ 16 (% imm0)) (% fp0)) ; [395] (movd (% stack-temp) (% imm0)) ; [400] (addss (% fp1) (% fp0)) ; [404] (movd (% imm0) (% stack-temp)) ; [408] (movq (@ (% gs) #x178) (% imm0)) ; [412] (movss (% fp0) (@ 48 (% imm0))) ; [421] (movd (% stack-temp) (% imm0)) ; [426] (movd (% imm0) (% stack-temp)) ; [430] (movq (@ (% gs) #x178) (% imm0)) ; [434] (movss (@ 48 (% imm0)) (% fp0)) ; [443] (movd (% stack-temp) (% imm0)) ; [448] (movd (% imm0) (% stack-temp)) ; [452] (movq (@ (% gs) #x178) (% imm0)) ; [456] (movss (% fp0) (@ 16 (% imm0))) ; [465] (movd (% stack-temp) (% imm0)) ; [470] ;;; (dotimes (k cols) (incf cell (* (aref ma row k) (aref mb k col)))) (addq ($ 8) (% save0)) ; [474] L471 (cmpq (% save3) (% save0)) ; [478] (jl L255) ; [481] ;;; (setf (aref result row col) cell) (movd (% imm0) (% stack-temp)) ; [487] (movq (@ (% gs) #x178) (% imm0)) ; [491] (movss (@ 16 (% imm0)) (% fp0)) ; [500] (movd (% stack-temp) (% imm0)) ; [505] (movq (% save1) (% arg_y)) ; [509] (movq (% save2) (% arg_x)) ; [512] (movq (@ -64 (% rbp)) (% temp0)) ; [515] (movq (@ 43 (% temp0)) (% imm0)) ; [519] (sarq ($ 3) (% imm0)) ; [523] (imulq (% arg_x) (% imm0)) ; [527] (leaq (@ (% arg_y) (% imm0)) (% arg_y)) ; [531] (movq (@ 11 (% temp0)) (% arg_x)) ; [535] (movq (% arg_y) (% imm2)) ; [539] (shrq (% imm2)) ; [542] (movss (% fp0) (@ -5 (% arg_x) (% imm2))) ; [545] ;;; (dotimes (col cols) (let ((cell (aref result row col))) (declare (type (single-float) cell)) (dotime (addq ($ 8) (% save1)) ; [552] L549 (cmpq (% save3) (% save1)) ; [556] (jl L189) ; [559] ;;; (dotimes (row rows) (dotimes (col cols) (let ((cell (aref result row col))) (declare (type (single-f (addq ($ 8) (% save2)) ; [565] L562 (movq (@ -56 (% rbp)) (% arg_z)) ; [569] (cmpq (% arg_z) (% save2)) ; [573] (jl L181) ; [576] ;;; (let ((result (make-matrix rows cols))) (declare (type (simple-array single-float (* *)) result)) (d (movq (@ -64 (% rbp)) (% arg_z)) ; [582] (addq ($ 16) (% rsp)) ; [586] (popq (% save3)) ; [590] (popq (% save2)) ; [592] (popq (% save1)) ; [594] (popq (% save0)) ; [596] (movq (@ (% gs) #x178) (% imm0)) ; [598] (movq (@ 8 (% imm0)) (% stack-temp)) ; [607] (movq (@ (% imm0)) (% imm0)) ; [611] (movq (% imm0) (@ (% gs) 80)) ; [614] (movq (% stack-temp) (@ (% gs) #x178)) ; [623] (leaveq) ; [632] (retq) ; [633]
row-major-gemm
aref
が何となく遅そうなので、row-major-aref
で書き直した。
(defun row-major-gemm (ma mb) (declare (optimize (speed 3) (debug 0) (safety 0))) (declare (type (simple-array single-float (* *)) ma mb)) (let ((rows (array-dimension ma 0)) (cols (array-dimension mb 1))) (declare (type fixnum rows cols)) (let ((result (make-matrix rows cols))) (declare (type (simple-array single-float (* *)) result)) (dotimes (row rows) (dotimes (col cols) (let ((cell (aref result row col)) (ma-index (array-row-major-index ma row 0)) (mb-index (array-row-major-index mb 0 col))) (declare (type (single-float) cell)) (declare (type fixnum ma-index mb-index)) (dotimes (k cols) (incf cell (* (row-major-aref ma ma-index) (row-major-aref mb mb-index))) (incf ma-index) (incf mb-index cols)) (setf (aref result row col) cell)))) result)))
SBCLの場合。
L2
が最内ループで、indexの掛け算を消せた気がするけど、先頭アドレスのロードらしきやつが消せない。
CL-USER> (disassemble 'row-major-gemm) ; disassembly for ROW-MAJOR-GEMM ; Size: 266 bytes. Origin: #x1003296715 ; 715: 4C8975E8 MOV [RBP-24], R14 ; no-arg-parsing entry point ; 719: 4C896DE0 MOV [RBP-32], R13 ; 71D: 4D8B7D31 MOV R15, [R13+49] ; 721: 4D8B4E39 MOV R9, [R14+57] ; 725: 4C897DF8 MOV [RBP-8], R15 ; 729: 4C894DF0 MOV [RBP-16], R9 ; 72D: 488D5C24F0 LEA RBX, [RSP-16] ; 732: 4883EC18 SUB RSP, 24 ; 736: 498BD7 MOV RDX, R15 ; 739: 498BF9 MOV RDI, R9 ; 73C: 488B057DFFFFFF MOV RAX, [RIP-131] ; #<FDEFINITION for MAKE-MATRIX> ; 743: B904000000 MOV ECX, 4 ; 748: 48892B MOV [RBX], RBP ; 74B: 488BEB MOV RBP, RBX ; 74E: FF5009 CALL QWORD PTR [RAX+9] ; 751: 480F42E3 CMOVB RSP, RBX ; 755: 4C8B6DE0 MOV R13, [RBP-32] ; 759: 4C8B75E8 MOV R14, [RBP-24] ; 75D: 4C8B4DF0 MOV R9, [RBP-16] ; 761: 4C8B7DF8 MOV R15, [RBP-8] ; 765: 488BDA MOV RBX, RDX ; 768: 31C9 XOR ECX, ECX ; 76A: E99E000000 JMP L5 ; 76F: 90 NOP ; 770: L0: 31C0 XOR EAX, EAX ; 772: E989000000 JMP L4 ; 777: 660F1F840000000000 NOP ; 780: L1: 488B5339 MOV RDX, [RBX+57] ; 784: 488BF1 MOV RSI, RCX ; 787: 48D1FE SAR RSI, 1 ; 78A: 480FAFF2 IMUL RSI, RDX ; 78E: 4801C6 ADD RSI, RAX ; 791: 488B5311 MOV RDX, [RBX+17] ; 795: F30F104C7201 MOVSS XMM1, [RDX+RSI*2+1] ; 79B: 498B5539 MOV RDX, [R13+57] ; 79F: 488BF9 MOV RDI, RCX ; 7A2: 48D1FF SAR RDI, 1 ; 7A5: 480FAFFA IMUL RDI, RDX ; 7A9: 4C8BC0 MOV R8, RAX ; 7AC: 31F6 XOR ESI, ESI ; 7AE: EB2C JMP L3 ; 7B0: L2: 498B5511 MOV RDX, [R13+17] ; 7B4: F30F10547A01 MOVSS XMM2, [RDX+RDI*2+1] ; 7BA: 498B5611 MOV RDX, [R14+17] ; 7BE: F3420F105C4201 MOVSS XMM3, [RDX+R8*2+1] ; 7C5: F30F59DA MULSS XMM3, XMM2 ; 7C9: F30F58CB ADDSS XMM1, XMM3 ; 7CD: 4883C702 ADD RDI, 2 ; 7D1: 4F8D1401 LEA R10, [R9+R8] ; 7D5: 4D8BC2 MOV R8, R10 ; 7D8: 4883C602 ADD RSI, 2 ; 7DC: L3: 4C39CE CMP RSI, R9 ; 7DF: 7CCF JL L2 ; 7E1: 488B5339 MOV RDX, [RBX+57] ; 7E5: 488BF1 MOV RSI, RCX ; 7E8: 48D1FE SAR RSI, 1 ; 7EB: 480FAFF2 IMUL RSI, RDX ; 7EF: 4801C6 ADD RSI, RAX ; 7F2: 488B5311 MOV RDX, [RBX+17] ; 7F6: F30F114C7201 MOVSS [RDX+RSI*2+1], XMM1 ; 7FC: 4883C002 ADD RAX, 2 ; 800: L4: 4C39C8 CMP RAX, R9 ; 803: 0F8C77FFFFFF JL L1 ; 809: 4883C102 ADD RCX, 2 ; 80D: L5: 4C39F9 CMP RCX, R15 ; 810: 0F8C5AFFFFFF JL L0 ; 816: 488BD3 MOV RDX, RBX ; 819: 488BE5 MOV RSP, RBP ; 81C: F8 CLC ; 81D: 5D POP RBP ; 81E: C3 RET
一応実測する。
CL-USER> (setf *N* 256) (run #'simple-gemm 100) (run #'on-register-gemm 100) (run #'row-major-gemm 100) Evaluation took: 6.159 seconds of real time 6.164000 seconds of total run time (6.164000 user, 0.000000 system) [ Run times consist of 0.004 seconds GC time, and 6.160 seconds non-GC time. ] 100.08% CPU 14,747,074,008 processor cycles 26,281,520 bytes consed Evaluation took: 3.955 seconds of real time 3.960000 seconds of total run time (3.960000 user, 0.000000 system) 100.13% CPU 9,469,087,053 processor cycles 26,216,000 bytes consed Evaluation took: 3.003 seconds of real time 3.008000 seconds of total run time (3.008000 user, 0.000000 system) [ Run times consist of 0.004 seconds GC time, and 3.004 seconds non-GC time. ] 100.17% CPU 7,192,309,782 processor cycles 26,216,000 bytes consed
unroll無しでon-register-gemm
よりは速くなってるけど、やっぱりCと同レベルなコードにはならなそう。
参考ページ
local-time:todayとlocal-time:now
SQLを扱っていると日付を指定する事が往々にある。 Common Lispからそういうコードのユニットテストを行う場合local-timeライブラリを使っている。
が、このライブラリにある関数todayと関数nowは両方ともtimestampオブジェクトを返すがタイムゾーンが違っていて、関数todayはUTCで返し、関数nowはJSTで返る。(手元の環境では) timestampオブジェクトのままなら問題ないんだろうけど、SQLに日付を埋め込むために年月日の文字列に変換して使っていると、日本時間の0時から9時までは文字列は一致しなくなる。
そして、昼間はローカルで動いてたしCIのログもSUCCESSだから何かここ数時間でローカル環境を壊したはず、という可能性を考えて深夜に色々調査するも見つからず。。。
ちなみにそのユニットテストを通すだけなら、
(setf local-time:*default-timezone* local-time:+utc-zone+)
とかどこかでやって両方共UTCにしてしまえば通る。 たまたま0時を跨ぐとやっぱり失敗する分けだがそこは気にしないとして、どうするのが正しいか。
C や C++ から Common Lisp に移って (2015年9月版)
なんか細々と思ったこととか
Emacs
EmacsでCommon Lispを扱う場合にSlimeを使っているが、Slime以外にもプラグインは存在するため、Web上で見つけたEmacsのCommon Lisp情報がそのまま利用できるのか、Slime向けに別の方法があるのか判断を要する。
デバッガ
Common Lispは規格としてデバッガを持っている。 処理系付属のデバッガ以外にもEmacsのプラグインであるSLIMEに、似て非なる独自のデバッガがある。 SLIMEのデバッガは単なる処理系付属のデバッガ出力をそのまま表示しているだけでなく、機能拡張が色々されているので便利。
パッケージのパラメータ変更
Common Lispではsetfが変数への代入に相当するが、各種パッケージのパラメータ変更にこれを使い、グローバル変数への代入する(ように表面上見える)。 コワイ
(setf prove:*default-reporter* :dot)
実際には何らかの関数呼び出しが裏に入ってたりするので、見た目からは単純な代入なのかはよく分からない。