--- Log opened Fri Aug 07 00:00:22 2009
00:03 -!- t_grabiec [n=tomekg@avu182.neoplus.adsl.tpnet.pl] has quit ["Leaving"]
03:11 -!- Netsplit lindbohm.freenode.net <-> irc.freenode.net quits: ahuillet
03:13 -!- Netsplit over, joins: ahuillet
03:13 -!- ahuillet [n=ahuillet@71.251.80-79.rev.gaoland.net] has left #jato []
03:13 -!- ahuillet [n=ahuillet@71.251.80-79.rev.gaoland.net] has joined #jato
08:26 < penberg_home> ahuillet: some new regalloc bugs it seems
08:26 < ahuillet> great
08:26 < penberg_home> ahuillet: http://folk.uio.no/vegardno/jato-irc-logs/2009-08-06.txt
08:26 < penberg_home> :)
08:26 < penberg_home> at the bottom
08:26 < ahuillet> my patch will help
08:26 < penberg_home> http://pastebin.com/d417024b8
08:26 < penberg_home> this test case bombs apparently
08:26 < penberg_home> ahuillet: ah, ok
08:32 < penberg_home> ahuillet: so we're at 95% now :)
08:59 -!- tgrabiec [n=tomekg@avu182.neoplus.adsl.tpnet.pl] has joined #jato
09:13 < tgrabiec> penberg_home: look what I've found: http://pastebin.com/d417024b8
09:13 < tgrabiec> this fails with "jato: arch/x86/emit-code.c:561: __encode_reg: Assertion `!"unassigned register in code emission"' failed."
09:17 < penberg_home> tgrabiec: yup, I saw that
09:18 < penberg_home> ahuillet says his patch will hel?p
09:18 < ahuillet> I think it will
09:18 < ahuillet> though since it breaks pretty much everything else...
09:21 < penberg_home> :-)
09:22 < ahuillet> tgrabiec : how comes the regalloc trace output does not appear with this test case? I'm lost
09:22 < ahuillet> #2 0x08055523 in emit_insn (buf=0x9d06f20, insn=0x8b) at arch/x86/emit-code.c:2535
09:22 < ahuillet> wtf
09:24 < ahuillet> #2 0x08055523 in emit_insn (buf=0x9d06f20, insn=0x8b) at arch/x86/emit-code.c:2535
09:24 < ahuillet> No locals.
09:24 < ahuillet> #3 0x0805ed38 in emit_body (bb=0x9d03c80, buf=0x9d06f20) at jit/emit.c:84
09:24 < ahuillet> insn = (struct insn *) 0x9d06ba8
09:24 < ahuillet> some stack corruption?
09:25 < ahuillet> this does not seem like a regalloc bug to me. :)
09:31 < penberg_home> funny.
09:31 < penberg_home> stack corruption could explain the assertion tgrabiec is hitting too
09:31 < penberg_home> ahuillet: are you sure you have latest and greatest master?
09:31 < ahuillet> I'm sure I did not up until now
09:32 < penberg_home> actually, stack corruption would not explain it
09:32 < penberg_home> vars are in the heap
09:32 < penberg_home> tgrabiec: I added some regression tests for double and float comparisons
09:32 < penberg_home> tgrabiec: and updated our bytecode matrix!
09:33 < penberg_home> laload/lastore, daload/dastore, l2f/l2d, f2l/d2l, and lookupswitch are missing
09:34 < penberg_home> and the upcoming invokedynamic ;)
09:35 -!- tgrabiec [n=tomekg@avu182.neoplus.adsl.tpnet.pl] has quit [Read error: 110 (Connection timed out)]
09:39 < ahuillet> #6 0x08056086 in emit_insn (buf=0x86f4b50, insn=0x6) at arch/x86/emit-code.c:2788
09:39 < ahuillet> No locals.
09:42 -!- tgrabiec [n=tomekg@afhg143.neoplus.adsl.tpnet.pl] has joined #jato
09:44 < penberg_home> tgrabiec: ahuillet seems to get a different crash
09:45 < tgrabiec> ahuillet: what crash do you get?
09:46 < ahuillet> tgrabiec : same "crash"
09:46 < ahuillet> but I wonder why I don't get anything printed on screen for the bogus method
09:46 < ahuillet> normally we should be getting the register allocator trace before trying code emission
09:46 < tgrabiec> ahuillet: tracing is buffered for the whole compilation
09:47 < ahuillet> how can I turn it off?
09:47 < tgrabiec> ahuillet: either put trace_flush() in approproate places or modify vm/trace.c trace_printf()
09:48 < ahuillet> thx
09:48 < tgrabiec> I invastigated a bit, and it seems that we try to spill an interval which itself needds reloading
09:48 < tgrabiec> and has no register allocated
09:48 < ahuillet> because it doesn't need one
09:49 < tgrabiec> yup.
but if we try to spill it, then we get assertion failed
09:51 < tgrabiec> ahuillet: what do you say for this: http://pastebin.com/d2345f27c
09:51 < ahuillet> it's a workaround
09:51 < ahuillet> probably not a fix
09:51 < tgrabiec> what's the propper fix?
09:52 < ahuillet> no idea, yet :)
09:52 < tgrabiec> why do you say it's a workaround?
09:54 < ahuillet> why do you think it fixes the problem? can you justify it is correct ,
09:54 < ahuillet> ?
09:55 < tgrabiec> I don't know, I'm just asking
09:56 < ahuillet> we have to figure out why an interval gets no register
09:58 < tgrabiec> I know why. Wy had interval 5-24. register is available for 5-11, we split. we have interval 11-24 which needs reload. no register is available. we spill at next use pos in allocate_blocked_reg(), so we have new interval 23-24 which needs reloading
09:58 < tgrabiec> and we try to reload 23-34 from interval which itself needs reloading
09:58 < ahuillet> we have interval 11-24 which needs reload. no register is available. we spill at next use pos in allocate_blocked_reg()
09:58 < ahuillet> that's the problem
09:59 < ahuillet> 11-24 has no use positions: we must not mark it as needing reloading
09:59 < tgrabiec> mhm
09:59 < ahuillet> __spill_interval_intersecting behaves correctly I think
10:01 < ahuillet> * All active and inactive intervals are used before current,
10:01 < ahuillet> * so it is best to spill current itself
10:01 < ahuillet> here we do a split that seems a bit fishy
10:02 < tgrabiec> yup. if we don't mark current as need_reload when it has no use pos, it won't try to reload, but the new interval will still try to reload from current, right?
10:03 < ahuillet> if (has_use_positions(new)) {
10:03 < ahuillet> mark_need_reload(new, current);
10:03 < ahuillet> new won't try to reload
10:03 < ahuillet> but we're gonna spill, which is theorically fine
10:04 < tgrabiec> but new has use positions, how come it's not gonna reload ?
10:04 < ahuillet> let's check what case we're in first
10:06 < tgrabiec> interval 11-24 is split in the section you just mentioned, after split current=11-23(unassigned), new = 23-34(reload from cuurrent)
10:07 < ahuillet> ok look at the regalloc trace
10:07 < ahuillet> do you see who's using EDX?
10:07 < ahuillet> do you see why we need to give it back at position 11
10:08 < tgrabiec> we need it for another interval 11-24
10:08 < ahuillet> yes, and which is it?
10:09 < penberg_home> interesting workaround
10:09 < ahuillet> [main] 2 (pos: 11-29): EDX fixed no spill no reload
10:09 < ahuillet> penberg_home : it won't work.
10:09 < ahuillet> tgrabiec : that's the interval we're conflicting with
10:09 < ahuillet> a fixed interval with no use position
10:09 < penberg_home> ahuillet: oh, just said it's interesting
10:09 < penberg_home> I'm not going to apply something like that
10:09 < penberg_home> obviously.
10:10 < tgrabiec> ahuillet: yup, that's another problem
10:10 < ahuillet> tgrabiec : that actually is the problem :)
10:10 < tgrabiec> but this interval could have use positions
10:10 < tgrabiec> and we still would have the problem
10:10 < tgrabiec> wouldn't we ?
10:10 < ahuillet> I am not sure.
10:11 < tgrabiec> I think one problem is that we give registers unnecessarily, and another problem is that we do not handle properly cascadeous splitting
10:11 < ahuillet> you're wrong about cascadeous splitting as far as I can tell
10:12 < ahuillet> there is no problem with that as long as we don't try to spill or reload intervals that will have no registers assigned
10:12 < ahuillet> so we have to check the following :
10:12 < ahuillet> 1-any interval we shove in unhandled gets out of it with a register assigned
10:12 < ahuillet> 2-we are not marking as need_spill or need_reload intervals that we do not put in unhandled
10:13 < ahuillet> basically those are the only two ways to get an unassigned register with need_spill/need_reload
10:14 < tgrabiec> ok, we do not mark interval which has no use positions as needreload/need_spill, but still, after we split that interval, the new interval must reload from _something_
10:14 < ahuillet> tgrabiec : can you elaborate on "give registers unnecessarily" btw?
10:14 < ahuillet> but still, after we split that interval,
10:14 < ahuillet> we do not split again the interval that does not have use positions.
10:14 < ahuillet> we throw it away
10:15 < ahuillet> if we don't, it's a mistake and this is our bug (case 2)
10:16 < tgrabiec> the point is that that interval which is split can result in two intervals, the first has no use positions, the second does
10:16 < ahuillet> yes, but the first one has to be thrown away
10:16 < ahuillet> ie. not inserted into any of the three lists
10:17 < ahuillet> and not considered later on in e.g. __spill_interval_intersecting
10:17 < tgrabiec> ok, but the second one needs reloading right?
10:17 < ahuillet> the second one yes
10:17 < tgrabiec> so, from what ?
10:17 < tgrabiec> we thrown away current
10:17 < ahuillet> from the original interval
10:17 < ahuillet> new = split_interval_at(it, current->range.start);
10:17 < ahuillet> split once
10:18 < ahuillet> new = split_interval_at(new, next_pos);
10:18 < ahuillet> split again
10:18 < ahuillet> mark_need_reload(new, it);
10:18 < ahuillet> tell "new" to reload from "it"
10:18 < ahuillet> this is correct tgrabiec
10:18 < tgrabiec> right
10:19 < ahuillet> and spill_all_intervals_intersecting looks good too :/
10:19 < tgrabiec> so so, we need to put that in every place we split ?
10:20 < ahuillet> not necessarily, if we split only once and reload immediately it's ok too
10:22 < ahuillet> anyway let's fix the EDX problem first
10:23 < tgrabiec> the edx problem?
10:23 < ahuillet> the fact that EDX is absolutely fre
10:23 < ahuillet> *free
10:23 < tgrabiec> yup
10:23 < ahuillet> but the regallocator thinks it's not
10:23 < tgrabiec> I think it also exposes the xmm regalloc bug
10:23 < ahuillet> well my WIP patch is supposed to fix it
10:24 < ahuillet> problem is it kind of breaks
10:24 < ahuillet> could you help a bit btw? maybe you'll be able to find which method triggered the crash
10:24 < ahuillet> http://paste.pocoo.org/show/R8S9X0KyhtgN8hF9FsB7/
10:24 < ahuillet> apply this
10:25 < ahuillet> if we make my patch work
10:25 < ahuillet> we probably kill many of those issues at once
10:27 < tgrabiec> what's you test case?
10:28 < tgrabiec> *your
10:28 < ahuillet> any regression test
10:28 < ahuillet> including the one you wrote to expose the regalloc problem
10:30 < ahuillet> ok I get it
10:31 < ahuillet> so for non fixed registers
10:31 < ahuillet> the live range has to start with a USE
10:31 < ahuillet> for fixed registers however this is not true
10:31 < tgrabiec> ahuillet: why don't we shrint the live range so that is starts at use positions?
10:32 < tgrabiec> *shrink
10:32 < ahuillet> because it's wrong
10:32 < ahuillet> think of CALL for example
10:32 < ahuillet> it defines EAX
10:32 < ahuillet> so the fixed interval for EAX has a live range that starts at the CALL site
10:32 < ahuillet> but CALL does not *use* EAX
10:32 < tgrabiec> right
10:32 < ahuillet> that said...
10:33 < ahuillet> making it actually USE EAX will work
10:33 < ahuillet> if we defined that DEF implies USE
10:33 < ahuillet> but I need to check that point
10:33 < ahuillet> penberg_home : any thoughts?
10:33 < tgrabiec> I think it is not correct
10:33 < vegard> tgrabiec: agree
10:34 < ahuillet> question is why :)
10:34 < tgrabiec> well, I think it breaks the concept of use and def
10:34 < ahuillet> but what is the concept of use and def?
10:34 < vegard> mov %ebx, %eax should NOT use %eax. now we might think that %eax is live-in
10:35 < penberg_home> ahuillet: I'm doing my tax declaration, sorry
10:35 < ahuillet> vegard : oh yes, that is correct
10:35 < penberg_home> need to get it to mail by today
10:35 < penberg_home> ;)
10:35 < ahuillet> vegard : for non fixed interals
10:36 < ahuillet> fixed intervals.. I'm not sure, though you probably are right again
10:36 < tgrabiec> ahuillet: aht's wrong with live reange not starting with use ?
10:37 < ahuillet> tgrabiec : nothing if you decide not to consider fixed interval's use positions when splitting
10:37 < ahuillet> (what I called "smoking the carpet" yesterday and which actually makes sense)
10:37 < tgrabiec> btw, this is interesting: http://pastebin.com/d41631e26
10:37 < ahuillet> ok, got a patch that works
10:38 < ahuillet> but doesn't fix the Test4 problem, as could have been expected. :)
10:38 < ahuillet> vegard : are you able to test a patch to see if the XMM thing is fixed?
10:38 < vegard> ahuillet: yes, sure!
10:39 < tgrabiec> ahuillet: got a fix for Test4 problem
10:39 < tgrabiec> as you suggested
10:39 < ahuillet> ah ?
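[Note on the use/def exchange above. vegard's objection is that `mov %ebx, %eax` writes %eax without reading it, so treating every DEF as a USE would make %eax look live on entry to that instruction. The following is only a minimal sketch of that effect, using the textbook backward liveness step live-in = use | (live-out & ~def). Every `toy_` name and the bit-mask encoding are invented for illustration and are not Jato's data structures.]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical one-bit-per-register masks, not Jato's encoding. */
#define TOY_EAX (1u << 0)
#define TOY_EBX (1u << 1)

struct toy_insn {
	uint32_t use;	/* registers the instruction reads */
	uint32_t def;	/* registers the instruction writes */
};

/* Textbook backward liveness step for a single instruction. */
uint32_t toy_live_in(const struct toy_insn *insn, uint32_t live_out)
{
	return insn->use | (live_out & ~insn->def);
}

/* The variant under discussion: pretend every DEF is also a USE. */
uint32_t toy_live_in_def_implies_use(const struct toy_insn *insn,
				     uint32_t live_out)
{
	struct toy_insn widened = {
		.use = insn->use | insn->def,
		.def = insn->def,
	};

	return toy_live_in(&widened, live_out);
}

/* mov %ebx, %eax: reads EBX, writes EAX; EAX is needed afterwards. */
void toy_liveness_demo(void)
{
	struct toy_insn mov = { .use = TOY_EBX, .def = TOY_EAX };

	/* Only EBX has to be live before the mov. */
	assert(toy_live_in(&mov, TOY_EAX) == TOY_EBX);
	/* With "DEF implies USE", EAX wrongly becomes live-in as well. */
	assert(toy_live_in_def_implies_use(&mov, TOY_EAX) ==
	       (TOY_EAX | TOY_EBX));
}
```

[Calling `toy_liveness_demo()` from a `main()` runs both checks. The second assertion is the situation vegard warns about: the register looks live before the instruction that first writes it.]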
10:40 < ahuillet> vegard : I'm sorry I dont have your mail address here, what is it?
10:40 < vegard> ahuillet: vegard.nossum@gmail.com
10:41 < ahuillet> sent
10:42 < tgrabiec> ahuillet: this fixes the problem, pls review http://pastebin.com/d1da93457
10:42 < ahuillet> god. I need to think. :)
10:42 < ahuillet> vegard : got the mail,
10:42 < ahuillet> ?
10:42 < vegard> yes
10:42 < vegard> test case and args?
10:43 < ahuillet> vegard : check if the XMM problem still happens
10:43 < ahuillet> I don't have a test case, I thought you did :)
10:44 < vegard> I reverted x86: explicitly save and restore XMM registers in prolog/epilog
10:44 < tgrabiec> ahuillet, vegard: http://pastebin.com/d1c38062b
10:44 < vegard> applied your patch
10:44 < ahuillet> tgrabiec : frankly, I think it only hides the problem, but it definitely is correct
10:44 < vegard> and System.out.printlnl() works
10:45 < ahuillet> tgrabiec : so I ACK that if only for performance reasons
10:45 < ahuillet> (because it will improve runtime performance)
10:45 < tgrabiec> ahuillet: he ?
10:45 < ahuillet> tgrabiec : what can you say of vegard's test case?
10:45 < penberg_home> correct?
10:45 < penberg_home> http://pastebin.com/d1da93457
10:45 < penberg_home> this one?
10:45 < ahuillet> yes
10:46 < penberg_home> hmm
10:46 < penberg_home> I don't understand it
10:46 < penberg_home> why do we split _again?
10:46 < tgrabiec> penberg_home: to don't generate intervals that have no use positions and need reloading
10:46 < vegard> ahuillet: I get RuntimeException at Test3.f(Test3.java:5)
10:46 < ahuillet> penberg_home : for a non fixed interval
10:47 < ahuillet> you don't need to assign a register to the LIR positions that don't *use* it
10:47 < penberg_home> but but
10:47 < ahuillet> ie.
if the interval is live and not used to keep it on its stack slot
10:47 < penberg_home> first you split
10:47 < ahuillet> and reload as late as possible
10:47 < penberg_home> then you split the new interval again
10:47 < penberg_home> why
10:47 < ahuillet> penberg_home : yes, we do that already in __spill_interval_intersecting
10:47 < ahuillet> penberg_home : this way you spill as early as possible and reload as late as possible
10:47 < ahuillet> thereby decreasing register pressure
10:48 < penberg_home> but the first interval is not added to unhandled list!
10:48 < ahuillet> that's what wimmer explains in the thesis, but he does not do this is try_to_allocate_free_reg
10:48 < ahuillet> penberg_home : that's the objective
10:48 < ahuillet> penberg_home : we don't allocate a register *at all* to this interval
10:48 < penberg_home> so
10:48 < ahuillet> we don't spill it, we don't reload it, we don't do anything
10:48 < penberg_home> the code sucks then?
10:48 < penberg_home> shouldn't we just split once
10:48 < ahuillet> wimmer splits twice too
10:49 < tgrabiec> penberg_home: we will need to split it anyway
10:49 < ahuillet> by splitting twice you ensure your intervals are as short as possible
10:49 < penberg_home> split_interval_at(current, next_use_pos(new, free_until_pos[reg]))
10:49 < penberg_home> hmm?
10:49 < penberg_home> s/new/current/g
10:49 < penberg_home> oh
10:50 < penberg_home> and you have to check
10:50 < penberg_home> that there _is_ a next_use_pos()
10:50 < penberg_home> there might not be
10:50 < ahuillet> no, "new" is right
10:50 < tgrabiec> penberg_home: that would generate a tail for current which has no use positions
10:50 < ahuillet> tgrabiec : penberg is right about the next_use_pos check :)
10:50 < tgrabiec> if (has_use_positions(new)) {
10:50 < tgrabiec> it's already there
10:51 < tgrabiec> hm ?
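[Note on the "split twice" discussion above. ahuillet describes splitting once where the register stops being available, then splitting the remainder again at its next use, so that the stretch without uses stays in its stack slot and only the tail that begins with a use is marked for reload. The following is a rough sketch of that idea on a toy interval type, using tgrabiec's numbers (interval 5-24, register free until 11, next use at 23). The `toy_` names, the array-based layout and the use positions 5 and 9 are assumptions made for the example; this is not the code from the pastebin patch.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_USES 8

/* Toy live interval: half-open range [start, end), sorted use positions. */
struct toy_interval {
	unsigned long start, end;
	unsigned long uses[TOY_MAX_USES];
	size_t nr_uses;
	bool need_reload;
};

static bool toy_has_use_positions(const struct toy_interval *it)
{
	return it->nr_uses > 0;
}

/* First use position at or after pos, or it->end if there is none. */
static unsigned long toy_next_use_pos(const struct toy_interval *it,
				      unsigned long pos)
{
	for (size_t i = 0; i < it->nr_uses; i++)
		if (it->uses[i] >= pos)
			return it->uses[i];
	return it->end;
}

/* Split at pos: *it keeps [start, pos), the result covers [pos, end). */
static struct toy_interval toy_split_at(struct toy_interval *it,
					unsigned long pos)
{
	struct toy_interval tail = { .start = pos, .end = it->end };
	size_t kept = 0;

	assert(pos > it->start && pos < it->end);
	for (size_t i = 0; i < it->nr_uses; i++) {
		if (it->uses[i] < pos)
			it->uses[kept++] = it->uses[i];
		else
			tail.uses[tail.nr_uses++] = it->uses[i];
	}
	it->nr_uses = kept;
	it->end = pos;
	return tail;
}

/*
 * Split twice: *it keeps the register up to reg_free_until, *gap is the
 * stretch without uses (no register, no spill, no reload), and *tail
 * starts at the next use and is the only piece marked for reload.
 * Returns false, leaving *tail untouched, when there is no later use.
 */
bool toy_split_twice(struct toy_interval *it, unsigned long reg_free_until,
		     struct toy_interval *gap, struct toy_interval *tail)
{
	unsigned long next;

	*gap = toy_split_at(it, reg_free_until);
	next = toy_next_use_pos(gap, gap->start);
	if (next >= gap->end)
		return false;
	if (next == gap->start) {
		/* A use sits right at the split point: the gap is empty. */
		*tail = *gap;
		gap->end = gap->start;
		gap->nr_uses = 0;
	} else {
		*tail = toy_split_at(gap, next);
	}
	assert(!toy_has_use_positions(gap));
	tail->need_reload = true;
	return true;
}
```

[With the interval 5-24, uses at 5, 9 and 23, and the register free until 11, `toy_split_twice()` leaves 5-11 in the register, produces a gap 11-23 that has no use positions and is never marked `need_reload`, and a tail 23-24 that is. That matches the point made in the chat that the new interval is created so that it begins with a use.]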
10:51 < ahuillet> tgrabiec : you've just split again
10:52 < ahuillet> and oh, yes
10:52 < ahuillet> you have a guarantee to have use positions
10:52 < ahuillet> it's ok, pretend penberg_home was doing his tax declaration.
10:53 < tgrabiec> so, the fix goes in/goes out ?
10:54 < ahuillet> same as what I said earlier
10:54 < ahuillet> I think it's good, add the Acked-By: me tag
10:54 < tgrabiec> ok
10:54 < ahuillet> vegard : so, RuntimeException.. because of register stuff
10:54 < vegard> I get just "H" output with Test3
10:54 < ahuillet> ?
10:54 < vegard> sorry
10:54 < vegard> *SOP
10:54 < ahuillet> ok, so my fix did not fix anything, right? :)
10:55 < ahuillet> screw it
10:55 < tgrabiec> right. the xmm bug causes that byte buffer size is miscalculated
10:55 < ahuillet> I have to work for my employer a bit
11:10 < tgrabiec> vegard: in vm/classloader.c line 518, why do we have NOT_IMPLEMENTED ?
11:11 < vegard> because we have already allocated space for the class
11:11 < vegard> in the table
11:11 < penberg_home> vegard, tgrabiec: I think we should look at cleaning those up, btw.
11:11 < vegard> btw, the current code is wrong
11:11 < penberg_home> :)
11:12 < vegard> hm, no it isn't. maybe.
11:12 < tgrabiec> vegard: why ?
11:12 < penberg_home> I just talked to a friend of mine and said that we have 95% coverage
11:12 < penberg_home> then he asked whether he would be able to run Eclipse when we hit 100%
11:12 < vegard> tgrabiec: because I thought we couldn't ++nr_classes without putting something there. but I see that we have .loaded = false
11:12 < penberg_home> he laughed his ass off when I explained that "no, it doesn't mean that. 100% coverage just means that you can _try_ to run any program you want."
11:12 < tgrabiec> vegard: yes, it's needed for handling concurrent access
11:13 < ahuillet> penberg_home : eclipse...
-_-
11:13 < penberg_home> heh heh
11:13 < tgrabiec> well, provided that you have unlimited memory storage
11:14 < penberg_home> :)
11:14 < penberg_home> oh, vegard said we're going to have a gc by monday or something :)
11:14 < tgrabiec> vegard: I hope you'll keep the word
11:14 < vegard> penberg_home: that relies somewhat on YOU implementing safepoints
11:15 < vegard> I can do gc maps.
11:16 < penberg_home> vegard: oh
11:16 < penberg_home> http://www.laughingpanda.org/~penberg/jato/safepoints
11:16 < vegard> that's not a patch :P
11:16 < penberg_home> I'm waiting for tomek to make that into a proper patch ;)
11:16 < penberg_home> vegard: so sure, "I" will do it!
11:16 < tgrabiec> penberg_home: oh really ?
11:16 < penberg_home> tgrabiec: yes!
11:16 < ahuillet> I actually kind of have unlimited memory storage
11:16 < ahuillet> access to computers with [], I mean;
11:16 < penberg_home> tgrabiec: it's mostly the same stuff as with the other traps.
11:17 < penberg_home> ahuillet: heh
11:17 < penberg_home> so we can release v0.01 and say "it runs on bull clusters just fine"?
11:17 < ahuillet> 100GB on the school server, and several petabytes at bull I think
11:17 < ahuillet> (counting swap)
11:17 < ahuillet> well the thing is each node in a cluster does not have *that* much RAM
11:17 < ahuillet> it's 3GB per core usually
11:18 < ahuillet> so we're talking of 16 socket, 4 core nodes
11:18 < ahuillet> ok, that's a lot of RAM. :)
11:18 < tgrabiec> I guess why cant address more than ~4GB per-process on x86-32 ?
11:19 < ahuillet> tgrabiec : ?
11:19 < ahuillet> tgrabiec : oh, we're kind of stuck by our architecture, right
11:19 < tgrabiec> pointers are 32-bit
11:19 < ahuillet> so what's quickest, finish x86-64 and run eclipse on bull clusters
11:19 < ahuillet> or implement a GC and run eclipse on our machines?
11:20 -!- penberg [n=penberg@cs146249.pp.htv.fi] has joined #jato
11:20 < penberg> tgrabiec: http://research.sun.com/techrep/1998/abstract-70.html
11:21 < tgrabiec> so we're implementing stop-the-world approach
11:21 < penberg> yes
11:21 < penberg> that's the first step
11:21 < penberg> AFAICT, all GCs do that.
11:21 < penberg> even the low-latency ones
11:21 < penberg> they just stop the world for a *very* brief period
11:21 < tgrabiec> and safepoints are after calls and _before_ back jumps ?
11:22 < penberg> tgrabiec: yeah. check the paper for details ;)
11:22 < tgrabiec> k
11:22 < penberg> tgrabiec: and hey, you have something like 450 commits now, I have 1020 or so.
11:22 < penberg> so pretty soon you will be maintaining Jato :)
11:22 < ahuillet> ohymgod: read = sscanf(s, "{%[^'{}']}{%[^'{}']}", &th, &rules);
11:22 < penberg> ahuillet: ouch
11:22 < penberg> I hope that's not in Jato!
11:22 < vegard> tgrabiec is for jato as ingo is for linux
11:23 < ahuillet> nope
11:23 < ahuillet> that's bull proprietary
11:23 < ahuillet> if you paste that somewhere else I'm gonna have to kill you before they send the ninjas against me
11:23 < penberg> vegard: :-)
11:23 < tgrabiec> ;)
11:23 < penberg> all I can say is
11:23 < penberg> poor ahuillet
11:23 < penberg> always debugging the hardest problems
11:23 < penberg> never getting any credit ;)
11:24 < penberg> tgrabiec: [PATCH] jit: fix register allocator bug
11:24 < tgrabiec> oops
11:24 < penberg> hmm
11:24 < penberg> Acked-by: Arthur HUILLET?
11:24 < tgrabiec> yes yes, sorry I forgot
11:24 < ahuillet> tgrabiec : yeah, the subject line is a bit.. :/
11:24 < tgrabiec> ..brief ?
11:24 < penberg> tgrabiec: no but
11:24 < ahuillet> yeah, this world would do
11:24 < penberg> it doesn't say which bug :-)
11:24 < ahuillet> *word
11:25 < tgrabiec> penberg: caus I don;t know how to call it
11:25 < ahuillet> but it does not convey a negative impression
11:25 < penberg> let me have a looksie at the changelog
11:25 < penberg> tgrabiec, ahuillet: and maybe one day you can explain why my suggestion was wrong.
11:25 < penberg> because I think you are wrong :)
11:25 < ahuillet> penberg : you want to test for use positions in the new interval
11:25 < ahuillet> but we have created this new interval so that it beings with a use position
11:25 < ahuillet> so the test is uselss
11:25 < ahuillet> not incorrect, just useless
11:26 < penberg> ahuillet: like I said
11:26 < penberg> you can do that
11:26 < penberg> by passing the spill position to next_use_pos()
11:26 < penberg> so there's no need to actually _split_ current
11:26 < ahuillet> actually yes, in order to shorten it
11:26 < penberg> tgrabiec, ahuillet: so, which bug does it fix?
11:26 < penberg> ahuillet: no, we can just fix the splitting code
11:26 < ahuillet> regalloc bug #165984732
11:27 < penberg> to completely leave out the uninteresting interval
11:27 < tgrabiec> penberg: this one - http://pastebin.com/d417024b8
11:27 < tgrabiec> hmm, why isn't it in the changelog ?
11:27 < penberg> tgrabiec: RegisterAllocatorTortureTest please.
11:27 < penberg> I'd prefer you add new regression tests
11:27 < penberg> rather than put (useful) test s
11:27 < penberg> cases in the changelog
11:28 < penberg> so I'm going to be difficult here
11:28 < tgrabiec> penberg: did you come with a bug name ?
11:28 < penberg> tgrabiec: no
11:28 < penberg> I didn't merge it
11:28 < penberg> I want a regression test in RegisterAllocatorTest!
11:28 < tgrabiec> ok, so resend ?
11:28 < penberg> a separate patch is fine.
11:28 < tgrabiec> ok, separate it is
11:28 < penberg> ahuillet: http://jato.lighthouseapp.com/projects/29055/tickets/1-callee-saved-registers-are-saved-unconditionally this?
11:29 < penberg> hmm, or was that number a joke?-)
11:29 < ahuillet> what this?
11:29 < ahuillet> penberg : joke :)
11:29 < penberg> heh heh ok.
11:30 < penberg> I wonder if we should write RegisterAllocatorTortureTest in Jasmin.
11:31 < tgrabiec> penberg: we should write it in LIR assembly
11:32 < tgrabiec> some abstarct sort of LIR
11:32 < tgrabiec> actually it should be easy to do
11:32 < penberg_home> LIR is arch specific
11:33 < penberg_home> so I'm not sure how that would work.
11:33 < tgrabiec> well, we would use the abstract LIR, same as the one that regalloc works on
11:33 < penberg_home> live intervals?
11:33 < penberg_home> sure
11:33 < penberg_home> that would be a unit test :)
11:34 < tgrabiec> penberg_home: yes, I thought of someting like this: insn(use_v1, def_v2); insn(use_v1, def_v3); ....
11:34 < penberg_home> tgrabiec: yes, makes sense.
11:34 < penberg_home> the "arch" you use for jit tests is the mmix arch
11:40 < tgrabiec> how to put align arguments in java method declaration when they are too long to fit in one line ?
11:40 < tgrabiec> *how to align
11:41 < tgrabiec> something like this: http://pastebin.com/d2a57dc9d
11:44 < penberg_home> tgrabiec: keep them on one line
11:46 < tgrabiec> penberg_home: regression test pacth sent
11:53 < penberg_home> cool
11:54 < penberg> tgrabiec: where did you find this bug btw?
11:54 < penberg> HelloSwing?
11:54 < tgrabiec> yes
11:54 < penberg> k
11:54 < penberg> what's the next bug?-)
11:54 < tgrabiec> http://pastebin.com/d5c6f2b54 :/
11:55 < penberg> ouch :)
11:55 < tgrabiec> there is one problem with having almost all instruction supported - you now can run so many programs which expose so many bugs
11:56 < penberg> :)
11:56 < vegard> ah
11:56 < vegard> you can run tetris now?
11:56 < vegard> no, frozen bubble
11:56 < penberg> probably not :)
11:56 < penberg> if HelloWorldSwing doesn't start up
11:56 < vegard> HelloWorldSwing.
11:56 < penberg> there's little hope the rest will
11:57 < vegard> penberg: how about putting all reference-type fields first in the object, so gc can easily check only those fields?
11:58 < penberg> vegard: makes sense.
11:58 < penberg> you're smart :)
11:58 < vegard> o.O
11:58 < tgrabiec> vegard: we could also make some translation table with offsetof(struct vm_object, ...)
11:59 < vegard> translation table?
11:59 < penberg> btw, if someone wants do a patch the frees all the unused parts of struct compilation_unit (liveness data, etc.)
11:59 < penberg> after compile() is done
11:59 < penberg> I'd be more than happy to take it ;)
12:00 < tgrabiec> vegard: yeah, so that we can put those fields anywhere in the object
12:00 < penberg> tgrabiec: that's not the problem.
12:00 < penberg> tgrabiec: putting them at the beginning
12:00 < penberg> means less cache line pressure
12:00 < penberg> when the register allocator starts to roam through memory.
12:00 < tgrabiec> well, right, never mind me
12:01 < penberg> ;)
12:01 < tgrabiec> though it seem hackish ;)
12:01 < penberg> tgrabiec: does it?
12:01 < penberg> it seems perfect sense to me
12:01 < penberg> we already reorder fields
12:01 < penberg> to avoid holes in memory
12:01 < vegard> uh, we don't
12:01 < penberg> we do.
12:01 < vegard> but we planned it ;) (and I'm writing it now)
12:01 < penberg> aha ok
12:02 < vegard> all our fields are 8 bytes anyway :P
12:02 < penberg> that might actually speed up method calls too.
12:02 < penberg> because references are packed to adjacent cache lines
12:02 < penberg> is adjacent the proper expression here?
12:03 < vegard> it means "next to each other", doesn't it?
12:03 < penberg> I think so :)
12:03 < tgrabiec> ok, never mind what I said, it's obviously ok to do that
12:03 < penberg> tgrabiec: so the mask field is a long.
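[Note on vegard's field-layout idea above: put all reference-type fields first so the GC only has to look at a prefix of the object. The following is a small sketch of what such a layout pass could look like, assuming uniformly sized 8-byte slots as vegard mentions. The `toy_` types and functions are hypothetical and do not reflect Jato's `struct vm_object` or its class loader.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_field {
	const char *name;
	bool is_reference;
	size_t slot;	/* assigned by toy_layout_fields() */
};

/*
 * Assign slot numbers so that every reference field precedes every
 * primitive field, keeping declaration order within each group.
 * Returns the number of reference fields, i.e. the prefix a GC scans.
 */
size_t toy_layout_fields(struct toy_field *fields, size_t nr)
{
	size_t next = 0, nr_refs;

	for (size_t i = 0; i < nr; i++)
		if (fields[i].is_reference)
			fields[i].slot = next++;
	nr_refs = next;
	for (size_t i = 0; i < nr; i++)
		if (!fields[i].is_reference)
			fields[i].slot = next++;
	return nr_refs;
}

/*
 * GC-side view: only slots [0, nr_refs) can hold references, so the
 * scan never has to look at primitive data. Zero stands for null here.
 */
size_t toy_count_live_refs(const uint64_t *slots, size_t nr_refs)
{
	size_t live = 0;

	for (size_t i = 0; i < nr_refs; i++)
		if (slots[i] != 0)
			live++;
	return live;
}

void toy_layout_demo(void)
{
	struct toy_field fields[] = {
		{ "count", false, 0 },
		{ "next",  true,  0 },
		{ "mask",  false, 0 },
		{ "name",  true,  0 },
	};
	/* Object contents in slot order: next, name, count, mask. */
	uint64_t object[] = { 0x1000, 0, 42, 7 };
	size_t nr_refs = toy_layout_fields(fields, 4);

	assert(nr_refs == 2);
	assert(fields[1].slot == 0 && fields[3].slot == 1);
	assert(fields[0].slot == 2 && fields[2].slot == 3);
	/* Only the first two slots are inspected; 42 and 7 are never read. */
	assert(toy_count_live_refs(object, nr_refs) == 1);
}
```

[tgrabiec's alternative of a translation table built with `offsetof()` would allow references anywhere in the object at the cost of an extra lookup per field. The prefix layout needs only a single count per class.]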
12:04 < penberg> tgrabiec: http://pastebin.com/mfca56a8
12:04 < penberg> so probably a bug in one of the 64-bit arithmetic ops.
12:04 < penberg> or
12:04 < penberg> alternatively, the _passed_ mask is wrong
12:04 < penberg> so 64-bit arg passing
12:04 < tgrabiec> the invoke-verbose trace shows some memory corruption too
12:04 < penberg> oh, two for loops.
12:05 < penberg> k.
12:05 < penberg> tetris dies with
12:05 < penberg> vm_jni_check_trap: warning: JNI handler for index 190 not implemented.
12:05 < penberg> frozen bubble has the same problem as HelloWorldSwing.
12:05 < tgrabiec> those are easy to fix
12:05 < penberg> just think how sad it's going to be when those programs run :(
12:06 < tgrabiec> yeah, very sad
12:06 < penberg> tgrabiec: ;)
12:06 < penberg> well, time to go buy some food!
12:17 < tgrabiec> hmm, I think OP_AND is broken for J_LONG
12:17 < tgrabiec> no, never mind
12:34 < penberg_home> I'd just inspect the 64-bit OPs
12:34 < penberg_home> aah, tax declaration done and it's in the mail.
12:35 < tgrabiec> yup, I'm inspecting them
12:53 < tgrabiec> 64-bit ops are fine
12:53 < tgrabiec> ESP is corrupted, so push scratches local variables !!
12:54 < tgrabiec> oh crap I know where
12:55 < tgrabiec> its in emulate_op_64
12:55 < tgrabiec> yup. fixed
13:01 < penberg_home> tgrabiec, ahuillet: http://jato.lighthouseapp.com/projects/29055-jato/tickets/6-floating-points-require-sse2-support-on-32-bit-x86
13:02 < ahuillet> The floating point support in Jato requires the SSE2. This means that we're unable to run Java programs on Athlon class machines, for example, as they only support SSE1.
13:02 < ahuillet> that's misleading
13:02 < ahuillet> floating point support in jato does not require SSE2, does it?
13:02 < penberg> ahuillet: it does.
13:02 < ahuillet> ?!
13:02 < penberg> well doubles do.
13:02 < ahuillet> only double
13:02 < penberg> sure.
13:02 < tgrabiec> double precision floating point
13:02 < ahuillet> that's not the same as The floating point support in Jato requires the SSE2.
13:03 < ahuillet> for example, GPR->FPU regs transfers could be implemented with SSE2
13:03 < penberg_home> http://jato.lighthouseapp.com/projects/29055/tickets/6-floating-points-require-sse2-support-on-32-bit-x86#ticket-6-1
13:03 < ahuillet> in that case floating point support would require SSE2 to work at all
13:03 < penberg_home> i fixed it
13:03 < ahuillet> now owing to S.o.println using doubles
13:04 < ahuillet> we can say Athlon machines cannot run Java programs at all :)
13:04 < penberg> ahuillet: I fixed the description!
13:04 < tgrabiec> ahuillet: S.o.println does not use doubles does it?
13:04 < ahuillet> penberg : I know, I know ;)
13:04 < tgrabiec> it worked before my double patches
13:04 < ahuillet> tgrabiec : well, without double support you're not going to get it to work
13:04 < ahuillet> so it kind of does
13:04 < penberg> tgrabiec: doesn't seem to fix HelloWorldSwing yet?
13:05 < tgrabiec> penberg: no, I'm investigating
13:05 < tgrabiec> ahuillet: it _worked_ before double support patches
13:05 < ahuillet> tgrabiec : ?!!
13:05 < penberg> doubles were for HelloWorldSwing!
13:06 < penberg> not for hello world.
13:06 < penberg> (the console version)
13:06 < penberg> swing is the UI toolkit for Java.
13:06 < penberg> GUI
13:20 < tgrabiec> yet another regalloc bug
13:21 < penberg> :)
13:21 < tgrabiec> I guess it is the most complicated part of the whole compiler
13:22 < vegard> it's the one taking the most time anyway ;)
13:27 < tgrabiec> java/awt/image/DirectColorModel.(Ljava/awt/color/ColorSpace;IIIIIZI)V is incorrect, arg nr 4 (gmask) is passed to super corrupted
13:27 < tgrabiec> because of bug in data flow resultion AFAICT
13:34 < penberg> :-)
13:40 < tgrabiec> hmm, spill after branch? http://pastebin.com/d34207cf
13:40 < penberg_home> tgrabiec: yes.
13:40 < tgrabiec> is it ok ?
13:40 < penberg_home> there's some fishy list_add() and list_add_tail() hackery in the spill code
13:40 < penberg_home> tgrabiec: no!
13:42 < penberg> tgrabiec, ahuillet: http://github.com/penberg/jato/commit/f4b757adccd42cff28c27d913a93f81a37d88f19
13:42 < penberg> so !cpu_has(X86_FEATURE_SSE2) -> use emulation instead.
13:43 < tgrabiec> penberg: we could also allow for -Xemulate-doble
13:44 < tgrabiec> so I can actually test it
13:45 < tgrabiec> btw, not all machines have SSE1 right?
13:45 < penberg> tgrabiec: that's true.
13:45 < penberg> not sure if we care about them, though.
13:45 < tgrabiec> well, I don't, currently ;)
13:45 < penberg> tgrabiec: sure, something like -Xcpu:-sse might be nicer?
13:46 < penberg> -Xcpu:-sse2
13:46 < penberg> or -Xcpu:no-sse2
13:46 < tgrabiec> yup, looks good
13:46 < tgrabiec> the last one
13:46 < penberg> yup.
13:46 < penberg> we need arch_parse_options() or such.
13:47 < penberg> and just clear X86_FEATURES_SSE2 after cpuid if we see -Xcpu:no-sse2
13:55 < penberg> tgrabiec: http://github.com/penberg/jato/commit/2bc4db828aba86e81315b4356cf89755f7677e86
14:00 < tgrabiec> hmm. I'm just thinking. our register allocator splits intervals at use positions
14:01 < tgrabiec> what about def-positions ?
14:01 < tgrabiec> are def's considered use positions in regalloc ?
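[Note on the `-Xcpu:no-sse2` idea above. penberg suggests parsing the option in something like arch_parse_options() and clearing the SSE2 feature bit after cpuid, so that the existing "no SSE2, use emulation" path can be tested on machines that do have SSE2. The following is a minimal sketch of that flow. The `toy_` functions, the feature-bit values and the pretend cpuid result are all assumptions for illustration and are not the code in the commits linked in the chat.]

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical feature bits; the real X86_FEATURE_* values differ. */
#define TOY_FEATURE_SSE		(1u << 0)
#define TOY_FEATURE_SSE2	(1u << 1)

/* Stand-in for the result of cpuid detection: pretend both are present. */
static unsigned int toy_cpu_features = TOY_FEATURE_SSE | TOY_FEATURE_SSE2;

bool toy_cpu_has(unsigned int feature)
{
	return (toy_cpu_features & feature) != 0;
}

/*
 * Handle one command-line argument after detection has run.
 * Returns true if the argument was recognized as an arch option.
 */
bool toy_arch_parse_option(const char *arg)
{
	if (strcmp(arg, "-Xcpu:no-sse2") == 0) {
		toy_cpu_features &= ~TOY_FEATURE_SSE2;
		return true;
	}
	return false;
}

/* Which path double arithmetic would take under the current flags. */
const char *toy_double_backend(void)
{
	return toy_cpu_has(TOY_FEATURE_SSE2) ? "sse2" : "emulation";
}

void toy_cpu_option_demo(void)
{
	assert(strcmp(toy_double_backend(), "sse2") == 0);
	assert(!toy_arch_parse_option("-Xsomething-else"));
	assert(toy_arch_parse_option("-Xcpu:no-sse2"));
	/* The bit is cleared, so doubles fall back to emulation. */
	assert(strcmp(toy_double_backend(), "emulation") == 0);
	/* SSE1 is untouched by the option. */
	assert(toy_cpu_has(TOY_FEATURE_SSE));
}
```

[Clearing the bit once, right after detection, means every later `cpu_has()`-style check sees the same answer, so no call site needs to know the option exists.]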
14:02 < tgrabiec> if DEFs are not recognized by regalloc, we could have such situation: use -- spill --- def ------- reload -- use 14:06 < penberg_home> hmm 14:06 < penberg_home> better talk to ahuillet 14:06 < penberg_home> ;) 14:06 < tgrabiec> I checked, apparently all registers are put to use position list 14:06 < tgrabiec> so it's ok 14:07 < ahuillet> know what guys 14:07 < ahuillet> there's a difference between "use" in use-def 14:07 < ahuillet> and "use" in linear-scan 14:07 < tgrabiec> yup 14:08 < ahuillet> the second covers all cases of use, plus many cases of def 14:08 < ahuillet> that's the source of my confusion this morning 14:09 < tgrabiec> hey, how about puttin gall DEF to use positions ? 14:09 < tgrabiec> even the fixed ones ? 14:09 < tgrabiec> scratch the question mark 14:10 < ahuillet> ? 14:10 < tgrabiec> if there is DEF_EAX declared for instruction, then when this instruction is encountered, we add fixed_intervals[reg] to use positions 14:10 < tgrabiec> does it make any sense ? 14:11 < ahuillet> well 14:11 < ahuillet> we need to fix "use pos" in linear-scan 14:12 < ahuillet> to mean what it really means 14:12 < tgrabiec> so what does it mean currently ? 14:12 < tgrabiec> and what it should 14:13 < ahuillet> currently I think we understand "use" in linear-scan the same way as we do in "use-def" 14:13 < ahuillet> ie. basically "register is read" 14:15 < tgrabiec> so what I proposed is a good thing? 
14:15 < ahuillet> yes, yes it sounds good 14:16 < ahuillet> but it's not enough 15:25 < tgrabiec> I investigated the bug and the problem is as follows 15:27 < tgrabiec> we're resolving a data flow from A -> B, the interval part at B starts before B, so we insert reload instruction at the end of A 15:27 < tgrabiec> but we reload to the register which is used in block C, A --> C 15:28 < tgrabiec> the same register that interval has in B is allocated to another interval in C 15:29 < tgrabiec> proposed solution: split interval at B start, so it reloads itself automatically 15:29 < tgrabiec> ack ? 15:30 < tgrabiec> generally, if to_it starts before mappings[i].to that interval should be split, we should not insert reload instruction 15:31 < tgrabiec> ahuillet: whatdaya think ? 15:50 -!- t_grabiec [n=tomekg@aemx207.neoplus.adsl.tpnet.pl] has joined #jato 15:52 -!- tgrabiec [n=tomekg@afhg143.neoplus.adsl.tpnet.pl] has quit [Read error: 110 (Connection timed out)] 15:58 < vegard> I'm off. see you later :) 15:58 < t_grabiec> see you 16:00 < t_grabiec> ahuillet: why not use different resolution blocks for each CFG edge ? 16:00 < ahuillet> ????????????????? 16:00 < t_grabiec> if we don't, I'm afraid we must split all intervals at bb boundaries 16:00 < ahuillet> ?????????????????????? 16:00 < ahuillet> :] 16:01 < t_grabiec> ahuillet: did you write something? cause I was disconnected and I didn't see anything 16:01 < ahuillet> I wrote a lot of question marks. :) 16:01 < t_grabiec> did you see my explanation of the bug? 
16:02 < ahuillet> this is getting out of proportion 16:02 < ahuillet> we're turning our register allocator into a giant pile of hacks 16:02 < t_grabiec> ahuillet: AFAIK hotspot of jikes use per cfg edge resolution blocks 16:03 < t_grabiec> *hotspot or jikes 16:03 < t_grabiec> so we don't have to care about other branch targets 16:03 < t_grabiec> cause currently, we insert reload instructions and do not care if this register is allocated to something on other execution paths 16:04 < ahuillet> are we talking about a bug or an optimization ? 16:04 < t_grabiec> a bug 16:04 < ahuillet> I'm afraid I don't understand your explanation really well 16:12 < t_grabiec> ahuillet: I commented the LIR: http://pastebin.com/d63f35f7e 16:14 < t_grabiec> I tried to move reloading to target bb, but the problem is the same - the value might be set in _revious_ bb. So that's why I think we need separate resolution bock per each CFG edge 16:15 < t_grabiec> or split all at bb boundaries 16:16 < t_grabiec> I've read that some JVMs do this, by emiting resolution blocks and patching/setting branch targets appropriately 16:16 < t_grabiec> or you have other idea? 16:21 < penberg_home> can we see the proposed patch too? 16:21 < t_grabiec> penberg_home: come on, it's not that simple :) 16:21 < penberg_home> aah 16:22 < penberg_home> ok, I thought you were testing a patch ;) 16:22 < t_grabiec> the idea I had at the beggining is also wrong 16:22 < penberg_home> oh okay, I misunderstood 16:22 < t_grabiec> I currently think we need different resolution blocks for each CFG edge 16:23 < t_grabiec> currently we have one common resolution block for all targets 16:23 < penberg_home> ok, I don't quite understand what you mean. 16:24 < t_grabiec> with the example I pasted (http://pastebin.com/d63f35f7e) 16:24 < t_grabiec> if we had different blocks for each CFG edge 16:24 < penberg_home> what does "different block" mean here? 
16:24 < t_grabiec> different block = different resolution block 16:25 < t_grabiec> then we would reload r33 to edi _only_ on execution path from bb 0x86f9fe8 to bb 0x86fb0e8 16:25 < penberg_home> hmmh 16:25 < penberg_home> what's a resolution block... 16:25 < t_grabiec> so that we don't scratch the value allocated to r29 onpatch to bb 0x86fa868 16:25 < t_grabiec> that's what's generated by resolve_data_flow() 16:26 < penberg> so basically spills? 16:26 < t_grabiec> yeah, spills and reloads 16:26 < penberg> ok, so what do we do now then? 16:26 < penberg> if we don't do per-edge thing 16:26 < penberg> because AFAICT, it obviously needs to be per-edge. 16:26 < penberg> ahuillet: hmmh? 16:26 < t_grabiec> penberg: split all intervals at bb boundary 16:27 < penberg> t_grabiec: hmmmmmmmmmmm 16:27 < penberg> sounds bad. 16:27 < penberg> very bad, if I may add. 16:27 < penberg> does wimmer do that? 16:27 < penberg> you almost certainly _don't_ want to split intervals at bb boundaries 16:27 < t_grabiec> yup 16:27 < penberg> because that means that you'd generate unnecessary spills and reloads 16:27 < penberg> even if there's no register pressure 16:27 < penberg> t_grabiec: wimmer does do it? 16:27 < t_grabiec> that's why we emit data flow resolution 16:28 < penberg> t_grabiec: maybe I misundestand all this, but isn't this a bug in the resolution code? 16:28 < penberg> and the proper fix is to fix that 16:28 < t_grabiec> penberg: yes, it's a bug in resolution code 16:28 < t_grabiec> the point is 16:28 < t_grabiec> currently we put resolution at bb end 16:28 < t_grabiec> so it is shared between a number of targets 16:29 < t_grabiec> I think it should not be shared 16:29 < penberg> hmm 16:29 < penberg> ok, so NAK for splitting at bb boundaries. 16:29 < t_grabiec> of coursse 16:29 < penberg> I don't really know how wimmer deals with this. 
16:29 < t_grabiec> neither do I 16:29 < penberg> so it's wimmer time (or lets bother ahuillet time) 16:29 < penberg> you decide :-) 16:30 < penberg> t_grabiec: I know, I know, I'm not being very helpful here 16:30 < penberg> ;-) 16:31 < t_grabiec> :) 16:39 < t_grabiec> hmm, wimmer says that they _do not_ split intervals at all. why do we do it? 16:41 < t_grabiec> doesn't it break wimmer's assumptions? 16:43 < penberg_home> intervals? 16:43 < penberg_home> do you mean fixed intervals? 16:45 < penberg> maybe we should send mail to wimmer :) 16:45 < penberg> please fix our register allocator. heh heh 16:45 < t_grabiec> penberg: "No variable renaming or live range splitting is performed by our linear scan algorithm" 16:46 < penberg_home> what are you reading? paper or thesis? 16:46 < penberg_home> maybe they're describing their _implementation_ at that point in time? 16:46 < penberg_home> (a limitation in it) 16:46 < t_grabiec> umm, dunno 16:46 < t_grabiec> which one should I read ? 16:46 < penberg_home> well, what are you reading? 16:47 < penberg_home> We're looking at the thesis and paper that are linked to on jato front page. 16:47 < t_grabiec> oh, oops 16:47 < penberg_home> and as the paper is called "Optimized Interval Splitting in Linear Scan Register Allocator" 16:47 < penberg_home> I guess they do splitting ;) 16:47 < t_grabiec> wrong paper 16:47 < t_grabiec> it's not even Wimmer's 16:47 < t_grabiec> ;) 16:47 < penberg_home> which one? 16:47 < penberg_home> :) 16:48 < t_grabiec> Linear Scan Register Allocation. MASSIMILIANO POLETTO ... 16:48 < penberg_home> ok 16:48 < penberg_home> that's the original linear scan paper 16:48 < penberg_home> which is almost useless from implementation point of view. 16:48 < penberg_home> IIRC, it doesn't support fixed regs at all. 16:49 < penberg_home> so it works on some RISC architectures but not on "odd" ones like x86 ;) 16:49 < t_grabiec> can you give me a link to wimmer's ? 
16:49 < penberg_home> well sure 16:49 < penberg_home> http://www.ssw.uni-linz.ac.at/Research/Papers/Wimmer04Master/Wimmer04Master.pdf 16:49 < penberg_home> and 16:49 < penberg_home> http://www.ssw.uni-linz.ac.at/Research/Papers/Wimmer05/ 16:50 < penberg_home> they're on the jato front page :) 16:50 < penberg_home> Wimmer, C. 2004. Linear Scan Register Allocation for the Java HotSpot(TM) Client Compiler. URL 16:50 < penberg_home> Wimmer, C. and Mössenböck, H. 2005. Optimized Interval Splitting in a Linear Scan Register Allocator. URL 16:50 < penberg_home> t_grabiec: btw, Wimmer has written some other interesting papers as well 16:50 < penberg_home> like 16:50 < penberg_home> Wimmer, C., and Mössenböck, H. 2008. Automatic Array Inlining in Java Virtual Machines. URL 16:55 < penberg> oh, wimmer has three optimizations we're missing AFAICT 16:56 < penberg> (1) move split positions out of loops, (2) remove register -> register moves and (3) remove unnecessary spill stores. 16:56 < t_grabiec> we're also missing correctness ;) 16:57 < penberg> oh, that too! 16:57 < penberg> so which one are you reading now 16:57 < t_grabiec> Wimmer04Master 16:58 < penberg> as you can see, I'm always happy to help new people get into register allocator code 16:58 < penberg> a.k.a. world of pain 16:58 < penberg> t_grabiec: k 16:58 < t_grabiec> heh :) 16:58 < penberg> oh btw 16:58 < penberg> one thing to think about is 16:59 < penberg> ah never mind! ;) 16:59 < penberg> apparently I can't read 17:00 < penberg> Before register 17:00 < penberg> allocation, all basic blocks are sorted into a linear order: The 17:00 < penberg> control flow graph is flattened to a list using the standard 17:00 < penberg> reverse postorder algorithm. To improve the locality, all 17:00 < penberg> blocks belonging to a loop are emitted consecutively. Rarely 17:00 < penberg> executed blocks such as exception handlers are placed at the 17:00 < penberg> end of the method.
17:01 < penberg> hmm, I wonder what wimmer means when he talks about 17:01 < penberg> if an *use position* _should_ or _must_ have an register 17:02 < penberg> AFAICT, we _always_ require a register at any use position 17:03 < penberg> ok 17:03 < penberg> Our resolution algorithm is similar to the 17:03 < penberg> one in [17]. 17:03 < penberg> where [17] is this: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.34.8435 17:04 < penberg> ahuillet, t_grabiec: check out Section 2.4 ("Resolution") of that paper? 17:04 < penberg> maybe it sheds more light on the subject. 17:12 < penberg> t_grabiec: to be honest 17:13 < penberg> I'm not sure I understand the mapping thing in resolve_data_flow() 17:13 < penberg> oh, actually I do. 17:13 < penberg> thanks! 17:13 < penberg> ;) 17:13 < t_grabiec> penberg: If I get the author of [17] correctly, we resolve conflicts at CFG _edges_ 17:14 < penberg> yes 17:14 < penberg> but don't we do that too? 17:15 < penberg> AFAICT we do 17:15 < t_grabiec> hmm 17:15 < t_grabiec> yes, we resolve conflicts between two basic blocks, but we put that code in common basic block 17:16 < penberg> so is there a conditional branch in the buggy case? 17:16 < penberg> the thing is 17:16 < penberg> _if_ we have multiple exits from a basic block 17:17 < penberg> we need to introduce multiple resolution insns 17:17 < penberg> don't we do that? 17:17 < t_grabiec> yup 17:17 < t_grabiec> we do that 17:17 < penberg> we don't? 17:17 < penberg> ok 17:17 < penberg> so what do you mean with this "common basic block" thing 17:17 < penberg> because obviously the _insns_ go into the _same_ bb 17:17 < penberg> even though they're for different edges 17:17 < penberg> hmm? 17:18 < penberg> wow 17:18 < t_grabiec> penberg: because the same register (say EDI) may be used in targets by different intervals. and we can not reload to edi 17:18 < penberg> perf is really cool! 
17:19 < t_grabiec> the point is, we can not reload to comon values that will satisfy all targets, because in each target register allocation differs 17:20 < t_grabiec> penberg: did you read my test case? http://pastebin.com/d63f35f7e 17:20 < penberg> ahuillet: http://www.laughingpanda.org/~penberg/jato/allocate-registers-annotate 17:20 < penberg> t_grabiec: I don't think that's true. 17:21 < penberg> AFAICT, the point of the mapping is to make sure the _source_ edge does the right thing 17:21 < penberg> so a basic block can _expect_ a temporary to be in certain place (register or spill slot) 17:24 < penberg> ahuillet: we can't do much eh? 17:24 < t_grabiec> penberg: yes, from the vreg point of view, yes. but mapping does not care about _other_ vregs 17:24 < penberg> it's spending 15% of the time in insert_to_list... 17:24 < penberg> t_grabiec: what do you mean? 17:24 < t_grabiec> penberg: did you read my test case ? 17:25 < penberg> no sorry! 17:25 < penberg> let me have a look :) 17:25 < penberg> (I forgot) 17:26 < penberg> ooh 17:26 < penberg> I actually did 17:26 < penberg> but I only understood it now 17:26 < penberg> hmmmmmmmmm 17:26 < penberg> so mem -> register is bad. 17:26 < t_grabiec> yup 17:26 < penberg> I wonder how we're supposed to handle that 17:27 < penberg> I don't think this requires any changes to the allocator core 17:27 < penberg> we just need to move the resolution insn to a different place 17:27 < penberg> but where? 17:27 < t_grabiec> we can't move it to target bb 17:28 < penberg> well 17:28 < penberg> # 17:28 < penberg> [main] [ 14 ] 17: cmp_imm_reg $0x0, r24 17:28 < penberg> # 17:28 < penberg> [main] [ 14 ] 18: jne_branch bb 0x86fb0e8 17:28 < penberg> .. 17:28 < penberg> so 17:28 < penberg> what we *should* do there is 17:28 < penberg> cmp 17:28 < penberg> je bb ... 17:28 < penberg> spill 17:29 < penberg> jmp bb 0x86fb0e8 17:29 < penberg> t_grabiec: makes sense? 
17:29 < penberg> but I don't see that spill in the assembly 17:29 < penberg> where is it? 17:29 < penberg> sorry mem -> reg move 17:30 < penberg> or is it this: 17:30 < penberg> # 17:30 < penberg> [main] [ 17 ] 16: mov_reg_reg r16, r33 ----- vreg33 must is marked need-spill, interval ends at pos 20 17:30 < t_grabiec> penberg: this is LIR, spill/realod insns are not visiable 17:30 < penberg> t_grabiec: right. 17:31 < penberg> we should probably dump the lir after register alloc + spills too :) 17:31 < penberg> t_grabiec: so what do you think of my idea? 17:31 < t_grabiec> I don't understand your idea 17:31 < t_grabiec> can you explain ? 17:32 < penberg> so what we now do is 17:32 < penberg> cmp + spill + jne 17:32 < penberg> cmp + spill + jne (target bb) + fall-through to rest of the bb 17:32 < penberg> so what we want to do there is 17:33 < penberg> cmp + _je_ (rest of bb) + spill + _jmp_ (target bb) 17:33 < penberg> so instead of jne, we change it the inverse je 17:33 < penberg> and jump to the rest of the basic block 17:33 < penberg> (which we would fall-through to for jne) 17:33 < penberg> then we spill and unconditionally _jmp_ to the target bb. 17:34 < t_grabiec> penberg: what if there is more than 2 successors ? 17:34 < t_grabiec> tableswitch ? 17:35 < penberg> one word: ouch 17:35 < t_grabiec> penberg: what you propose is actually per cfg-edge resolution, but for one edge it is optimized to nothing because we don't need to resolve 17:35 < penberg> so I guess the only option here is 17:35 < penberg> if we need a mem -> register resolution insn 17:35 < penberg> is to introduce a new basic block 17:35 < penberg> that has the store + jmp 17:36 < penberg> and jne to that basic block instea.d 17:36 < t_grabiec> penberg: what you talk about is per cfg-edge resolution 17:36 < penberg> yes. 17:36 < penberg> well it's obvious that we need it! 17:36 < penberg> but note: there's no _splitting_ involved here. 
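The per-edge resolution idea discussed above (put the mem -> reg reload in a new block that sits on one CFG edge and jumps on to the original target, so other successors never execute it) might look something like this in miniature. This is a sketch under assumptions, not Jato code: struct bb, split_edge() and add_reload_on_edge() are hypothetical names, and reloads are only counted here rather than emitted as instructions.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_SUCCS 4

/* Hypothetical, simplified basic block; not Jato's struct basic_block. */
struct bb {
	struct bb *succs[MAX_SUCCS];
	int nr_succs;
	int nr_reloads;    /* mem -> reg moves placed in this block */
	int is_resolution; /* set for blocks created by split_edge() */
};

static struct bb *bb_new(void)
{
	struct bb *bb = calloc(1, sizeof(*bb));

	assert(bb != NULL);
	return bb;
}

static void bb_add_succ(struct bb *from, struct bb *to)
{
	assert(from->nr_succs < MAX_SUCCS);
	from->succs[from->nr_succs++] = to;
}

/*
 * Place a fresh resolution block on the edge from->succs[idx]. The new
 * block has the old target as its only successor, and the branch in
 * 'from' is retargeted to the new block. Other edges are untouched.
 */
static struct bb *split_edge(struct bb *from, int idx)
{
	struct bb *res;

	assert(idx >= 0 && idx < from->nr_succs);

	res = bb_new();
	res->is_resolution = 1;
	bb_add_succ(res, from->succs[idx]);
	from->succs[idx] = res;

	return res;
}

/* Record one reload that must happen only on the edge from->succs[idx]. */
static void add_reload_on_edge(struct bb *from, int idx)
{
	struct bb *res;

	assert(idx >= 0 && idx < from->nr_succs);

	res = from->succs[idx];
	if (!res->is_resolution)
		res = split_edge(from, idx);

	res->nr_reloads++;
}
```

Because the resolution block is created lazily, edges that need no mem -> reg move keep their direct branch, which matches the point made above that only edges requiring resolution should pay for it. The same shape works for more than two successors (the tableswitch case), since each edge gets its own block.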
17:36 < penberg> the generated basic block would be special 17:36 < t_grabiec> I read that some JVM does it by emiting the resolution blocks, and simply connecting branch target to appropriate resolution blocks 17:37 < penberg> ok, so it's the same thing as I suggested? 17:37 < penberg> did ahuillet object to it? 17:37 < t_grabiec> ahuillet wrote hardly anything about it 17:37 < penberg> ok, that sounds like a straight-forward fix to me 17:37 < penberg> would be nice if you found the paper you saw that in 17:37 < penberg> but anyway 17:37 < penberg> I guess that's the only option here 17:38 < penberg> so that the performance penalty is only for those edges that require resolution 17:38 < penberg> reg -> mem resolution is fine without anything special 17:38 < penberg> but mem -> reg needs resolution blocks 17:38 < penberg> t_grabiec: so is this what you've been suggesting all along? 17:38 < t_grabiec> yup 17:39 < penberg> aha 17:39 < penberg> and you just wasted 1,5 hours of your time on explaining it to me :) 17:39 < penberg> thnx! 17:39 < penberg> I bet you would have been able to write a patch in that time instead :) 17:39 < t_grabiec> well, that's the team work :) 17:40 < t_grabiec> I would have to explain it to you _anyway_ 17:40 < t_grabiec> or you wouldn't merge it 17:40 < penberg> true :) 17:40 < penberg> you're getting too smart with our merge process! 17:40 < ahuillet> I'll read what I'm back home 17:41 < penberg> ahuillet: yeah, tomek figured it out 17:41 < penberg> the way we do mem -> reg resolution is wrong 17:41 < penberg> the generated _insns_ 18:21 < ahuillet> I had a 150 tons train 18:21 < ahuillet> all for myself 18:21 < ahuillet> I think there were two people on it 18:21 < ahuillet> and this one ran on diesel 18:21 < ahuillet> talk about not taking the car. 
18:23 < penberg_home> :) 18:23 < ahuillet> http://fr.wikipedia.org/wiki/Z_24500 18:23 < ahuillet> that's the train I was in 18:24 < ahuillet> 130tons apparently 18:24 < penberg_home> crazy 18:24 < ahuillet> 1600kW 18:24 < ahuillet> my car is 47kW for 1.3 ton 18:35 -!- t_grabiec [n=tomekg@aemx207.neoplus.adsl.tpnet.pl] has quit [Read error: 60 (Operation timed out)] 18:36 < penberg_home> ahuillet: I *think* tomek is working on a fix for the regalloc bug 18:36 < penberg_home> to fix the mem -> reg resolution not to clobber registers that are already in use. 18:37 < ahuillet> yeah, I'm sorry I screwed up 18:37 < penberg_home> sorry? no need to be! 18:37 < penberg_home> bugs happen. 18:37 < penberg_home> and I'm pretty happy we get to share some of the pain with other devs too :) 18:38 < ahuillet> hey, tgrabiec suffers in order for me not to, I'm happy! 18:38 < penberg_home> :) 18:41 < penberg_home> ahuillet: and oh, I hope you saw the email on register allocator performance 18:41 < penberg_home> we can probably optimize it quite a bit 18:42 < penberg_home> 15% of _whole_ execution time is spent in insert_to_list() 18:48 -!- t_grabiec [n=tomekg@abro203.neoplus.adsl.tpnet.pl] has joined #jato 18:52 -!- t_grabiec [n=tomekg@abro203.neoplus.adsl.tpnet.pl] has quit [Client Quit] 18:57 -!- tgrabiec [n=tomekg@abro203.neoplus.adsl.tpnet.pl] has joined #jato 19:03 < tgrabiec> btw, we save/restore only lower 32-bit of XMM registers (movss) in prolog/epilog. I think we should use movsd so that doubles are also preserved. 19:05 < ahuillet> yeah, that's a funny thing too 19:05 < ahuillet> what MOVs do we emit for spill/restore of FPU registers 19:06 < tgrabiec> hmm, I think we should assume the worst, so if no emulation => movsd 19:09 < ahuillet> movsd ? does it exist in SSE1 ? 
:) 19:12 < tgrabiec> no, that's why I wrote "if no emulation", if cpu doesn't have SSE2 we emulate doubles, and use movss to spill/reload 19:13 < ahuillet> sounds good 20:50 < penberg> Total Estimated Cost to Develop = $ 1,073,554 20:50 < penberg> haha 20:50 < penberg> I love software engineering cost models :-) 20:51 < ahuillet> :] 20:51 < penberg> ...or we'll all be filthy rich! 20:51 < penberg> Total Physical Source Lines of Code (SLOC) = 33,345 21:05 < vegard> hi. 21:06 < penberg> hi 21:09 < vegard> hm? 21:09 < vegard> now you can also figure out how many bugs we're supposed to have ;) 21:09 < penberg> How can I do that? 21:11 < penberg> hmm 21:11 < penberg> throw_from_native(sizeof(name) + sizeof(initialize) + sizeof(loader)); 21:11 < penberg> can't we figure out the frame size automatically? 21:11 < vegard> average number of bugs per line of code 21:11 < penberg> vegard: which average should we use? 21:11 < vegard> btw, mobile broadband is not too good for ssh typing speed. 21:13 < penberg> The average defect rate of the open source applications was 0.434 bugs per 1000 lines of code. This compares with an average defect rate of 20 to 30 bugs per 1000 lines of code for commercial software, according to Carnegie Mellon University's CyLab Sustainable Computing Consortium. 21:13 < vegard> so we should have 14.47 bugs, then? 21:13 < penberg> irb(main):003:0> 33345.0/100.0*0.434 21:13 < penberg> => 144.7173 21:14 < penberg> ouch 21:14 < penberg> yes! 21:14 < penberg> I typo'd 1000 ;) 21:14 < vegard> ;)' 21:14 < penberg> ok, that's not too bad, is it? 21:14 < vegard> not line we can improve it much in any way except remove code... ;) 21:14 < vegard> *like 21:15 < vegard> if we want to keep using the same formula. 21:15 < penberg> :-) 21:15 < penberg> so how much do we need to remove to get to below one? 21:16 < penberg> irb(main):011:0> 2300.0/1000.0*0.434 21:16 < penberg> => 0.9982 21:16 < penberg> seems reasonable. 
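The back-of-the-envelope defect estimate above is just SLOC / 1000 * rate. A small sketch of that arithmetic, using the figures quoted in the chat (33,345 SLOC and 0.434 bugs per 1000 lines); the function names are made up for illustration, and the result is as silly as the formula it comes from:

```c
#include <assert.h>

/* Expected defects for a code base, given a rate in bugs per 1000 lines. */
static double expected_bugs(double sloc, double bugs_per_kloc)
{
	assert(sloc >= 0.0 && bugs_per_kloc >= 0.0);
	return sloc / 1000.0 * bugs_per_kloc;
}

/*
 * Largest whole line count whose estimate does not exceed 'limit' bugs.
 * Truncation toward zero is intended; floating-point rounding could be
 * off by one line right at a boundary.
 */
static long max_sloc_at_most(double limit, double bugs_per_kloc)
{
	assert(limit >= 0.0 && bugs_per_kloc > 0.0);
	return (long)(limit * 1000.0 / bugs_per_kloc);
}
```

With the chat's numbers, expected_bugs(33345.0, 0.434) comes out at roughly 14.47, and max_sloc_at_most(1.0, 0.434) gives 2304 lines as the largest code base that stays at or under one expected bug.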
21:16 < vegard> that's now how much we'd have to remove 21:16 < vegard> not* 21:17 < vegard> we'd have to remove 31041 ;) 21:17 < penberg> yes! 21:17 < penberg> I was calculating the target. 21:17 < vegard> anyway, this shows how silly those numbers are if you misuse them correctly :-) 21:18 < penberg> so what the hell is this cleanup_args() doing anyway 21:18 < tgrabiec> penberg: it removes them from stack 21:19 < tgrabiec> so we don't have stack leaks on throw() 21:19 < penberg> yes but 21:19 < penberg> why don't we just "leave" and addl , %esp ? 21:20 < tgrabiec> good question 21:20 < penberg> :-) 21:20 < penberg> and btw 21:20 < penberg> the if (args_size) seem buggy 21:21 < penberg> we should leave anyway 21:21 < tgrabiec> why is it buggy ? 21:21 < penberg> we should "leave" shouldn't we? 21:22 < penberg> so we're leaking stack there, aren't we? 21:22 < tgrabiec> well, we do not return on throw_from_native(), so current code is not buggy in my opinion 21:22 < tgrabiec> if we would leave, then yes 21:22 < tgrabiec> the point is, that currently throw_from_native() returns 21:23 < tgrabiec> ...but I forgot why 21:23 < penberg> hmmh 21:23 < tgrabiec> we can try to make it noreturn 21:23 < penberg> so how is this supposed to work? 21:23 < tgrabiec> what, cleanup_args ? 21:24 < penberg> throw_from_native() 21:24 < vegard> what does JNI do? 21:24 < vegard> for throwing exceptions? 21:24 < penberg> tgrabiec: we do __cleanup_args() at the very end in throw_from_native() 21:24 < tgrabiec> penberg: _currently_, it removes the arguments from stack frame - this involves moving local variables and modifying esp 21:24 < penberg> so we must "leave" even if args_count is zero. 21:25 < penberg> aha 21:25 < penberg> why do we move locals? 21:25 < tgrabiec> penberg: no, throw_from_native() does not exit the function, so the normal code generated by compiler does leace and ret 21:25 < tgrabiec> oooooooh I know why! 
21:26 < tgrabiec> penberg: restoring caller saved registers - that's why I didn't make it exit the function 21:26 < penberg> hmmh 21:26 < tgrabiec> code generated by gcc restores them 21:27 < tgrabiec> vegard: "what does JNI do?" - JNI allows calling native functions (from external libraries) from java code and vice versa 21:27 < tgrabiec> penberg: so we must preserve local variables cause that's where are caller saved registers restored from 21:27 < tgrabiec> right? 21:28 < vegard> no, what does it do for throwing exceptions ;) 21:28 < penberg> tgrabiec: oh, sure 21:28 < penberg> I'm just thinking if there's a simpler way to do this. 21:28 < penberg> I'm wondering why we need throw_from_native() at all 21:28 < penberg> just let native code return to jit 21:28 < tgrabiec> penberg: if you're ok with exception test after VM native call, then we can get rid of it 21:28 < penberg> don't we have an "check for exceptions trap" there? 21:29 < penberg> tgrabiec: I am 21:29 < penberg> Did I object to it earlier? 21:29 < penberg> the thing is, the whole throw_from_native() function sticks out like a sore thumb ;) 21:29 < tgrabiec> well, we need throw_from_native anyway 21:29 < penberg> tgrabiec: we do? 21:29 < tgrabiec> penberg: for emulate_div 21:30 < penberg> we can add a poll after the call to emulate_div(), can't we? 21:30 < tgrabiec> yes, but it adds overhead, while throw_from_native() is the fastes possible option 21:30 < tgrabiec> well, it's your call 21:30 < penberg> tgrabiec: fast, how? 21:30 < penberg> throw_from_native() is not exactly small 21:31 < penberg> tgrabiec: well, I want to kill throw_from_native() 21:31 < penberg> :) 21:31 < tgrabiec> but it will be always faster than signal handler 21:31 < tgrabiec> because it does not go through kernel 21:31 < penberg> are you talking about the _exception_ case? 
21:31 < tgrabiec> yup 21:31 < tgrabiec> throw_from_native() is executed only when there is exception 21:32 < penberg> oh, who cares if ArithmeticException is slow?-) 21:32 < tgrabiec> and we always must do the check in emulate_ 21:32 < penberg> tgrabiec: yes, but it will increase instruction cache footprint! 21:32 < penberg> that's the point here. 21:32 < penberg> and btw 21:32 < tgrabiec> penberg: ok, if you're ok with that, go on :) 21:32 < penberg> why don't we implement arithmetic exceptions with traps? 21:32 < tgrabiec> you have my blessing 21:33 < penberg> surely we'll get a "divide by zero" in userspace too? 21:33 < penberg> tgrabiec: hahah :-) 21:33 < tgrabiec> but wy can't figure out we're in emulate_ 21:33 < tgrabiec> *but we 21:33 < penberg> ok 21:33 < penberg> I was thinking of proper 64-bit division 21:33 < tgrabiec> actually not in emulate_ but some gcc function 21:33 < penberg> so the emulate_div() thing can be dog slow 21:33 < ahuillet> penberg: btw we talked about LWIW in IA64 at Bull today 21:34 < penberg> ahuillet: yeah? did you learn anything interesting? 21:34 < ahuillet> my mentor told me it was actually quite interesting because icc supported it quite well 21:34 < tgrabiec> vegard: JNI functions can call ThrowNew, which works as signal_new_exception(), exceptions from JNI are handled at JIT <-> JNI boundary 21:34 < ahuillet> the problem was the great variability of C -> assembly mapping 21:34 < penberg> tgrabiec: what's the poll insn to insert after emulate call in insn selector? 21:34 < ahuillet> which made it difficult to anticipate the results of some code you write 21:34 < penberg> oh? 
21:35 < tgrabiec> penberg: select_exception_test(s, tree); 21:35 < penberg> k 21:35 < ahuillet> however the "bundle" thing as he calls it is very good for performance 21:35 < penberg> tgrabiec: lets see if I can write a patch :-) 21:35 < ahuillet> so for example you can do one multiply and one add (both floating point) in one instruction 21:35 < penberg> ahuillet: yeah? 21:35 < ahuillet> and then another 21:36 < penberg> oh, sure. 21:36 < ahuillet> and in two cycles you've done two multiplies and two adds 21:36 < penberg> but remember, x86 chips do that dynamically 21:36 < ahuillet> and this is significantly more useful than e.g. SSE 21:36 < ahuillet> because you don't do 4 additions in sequence that often 21:36 < ahuillet> penberg: apparently not so well 21:37 < penberg> ahuillet: what makes you say that? x86 beats ia64 pretty much everywhere :) 21:37 < ahuillet> penberg: well that's what my mentor told me 21:37 < ahuillet> penberg: that's because you're not using icc 21:38 < ahuillet> he also said IA64 was the most complicated CPU ever made 21:38 < ahuillet> and it sucked a lot for a lot of reasons :) 21:39 < ahuillet> so I guess bull is happier with x86-64 21:39 < penberg> :) 21:39 < penberg> the "use icc" argument sounds bit fishy, though 21:40 < penberg> as the problem is that the compiler needs to do the bundling _statically_ 21:40 < penberg> and you can always do a better job dynamically AFAICT. 21:40 < ahuillet> penberg: that's not so true actually 21:40 < penberg> but I don't have any experience with icc or ia-64 21:40 < penberg> ahuillet: yeah? 21:40 < ahuillet> the dependencies can be statically predicted not too badly 21:40 < ahuillet> problem is gcc doesn't even *try* to bundle instructions 21:40 < ahuillet> while intel worked a lot on it because they had to push IA64 to replace x86 21:42 < penberg> ahuillet: I don't think that's true. AFAIK, gcc *does* do bundling, it's just does a pretty horrible job with it. 
21:42 < ahuillet> yeah, I guess it depends on which version you pick 21:42 < ahuillet> anyway bundling isn't too bad of an idea if you have the compiler writers with you 21:43 < penberg> hmm, well, I guess we should ask bull for an ia-64 machine so you can try to port jato on one of them ;) 21:43 < ahuillet> I'll check if they have any running I have access to 21:43 < ahuillet> but I don't think so 21:43 < ahuillet> it's x86-64 all the way now. 21:44 < penberg> so maybe they have some old box laying around? 21:44 < penberg> :) 21:45 < ahuillet> oh, well, if the y 21:45 < ahuillet> *they have it's a blade 21:45 < penberg> tgrabiec: http://github.com/penberg/jato/commit/dba869379daad3d668113a82dadc4c7d3692826c 21:45 < ahuillet> which might be difficult to plug 21:45 < penberg> tgrabiec: it's a start anyway 21:45 < penberg> ahuillet: it almost sounds like you don't want one :-) 21:46 < penberg> they should give you a x86-64 machine too! 21:46 < ahuillet> I doubt I can get any hardware out :/ 21:46 < penberg> Why wouldn't they want Jato optimized for their hardware?-) 21:46 < ahuillet> you know many high performance computing applications for Java? :) 21:47 < penberg> nope 21:47 < penberg> because java sucks pretty badly for those class of apps 21:47 < penberg> But I'm talking about Jato now! 21:48 < penberg> JFortran 21:48 < penberg> heh heh heh 21:48 < ahuillet> :S 21:50 < ahuillet> http://whitepapers.techrepublic.com.com/abstract.aspx?docid=391266 21:50 < ahuillet> so is that a big joke or what? 
21:50 < penberg> No idea, requires registration :-) 21:51 < penberg> ok found it 21:51 < penberg> http://www.ukhec.ac.uk/publications/tw/hpcjava.pdf 21:51 < ahuillet> as far as I'm concerned it's just a joke 21:52 < penberg> well the "benefits" part smells like bs to me :-) 21:53 < penberg> ahuillet: there's references to some interesting benchmarks though 21:59 < penberg> tgrabiec: http://github.com/penberg/jato/commit/d94604fbb0192801ce209ef87d286cee178a0a04 22:00 < penberg> tgrabiec: I wonder if we should call __divdi3() directly for 64-bit div 22:01 < tgrabiec> but you also need to put some cmp there 22:01 < penberg> div by zero case? 22:01 < tgrabiec> yes 22:02 < penberg> oh sure 22:02 < penberg> we need trap_if_zero() :-) 22:02 < tgrabiec> heh 22:02 < penberg> tgrabiec: so how about the throw_from_native() calls in vm/jato.c? 22:02 < tgrabiec> I'm ok with removal 22:02 < penberg> sure 22:02 < penberg> but how :-) 22:03 < penberg> where do we put the exception test? 22:03 < tgrabiec> in invoke() and invokevirtual() and EXPR_INVOKEINTERFACE 22:03 < tgrabiec> if (vm_method_is_native(method)) select_exception_test() 22:03 < tgrabiec> you can merge this with the one for JNI case 22:04 < penberg> mhmh 22:05 < penberg> tgrabiec: too complicated for 23:02 :-) 22:05 < penberg> I'll get something to eat instead and have a glass of wine :-) 22:05 < tgrabiec> oh, sure :) 22:06 < penberg> so after we remove throw_from_native() 22:06 < penberg> I guess we could rename signal_exception() to throw_from_native() :-) 22:07 < tgrabiec> really ? 22:07 < penberg_home> yeah! 22:08 < penberg_home> I always hated the name signal_exception() :-) 22:08 < tgrabiec> you're kidding me 22:08 < penberg_home> nope 22:08 < tgrabiec> why ? 
22:08 < penberg_home> it's easily confused with signal handling 22:08 < tgrabiec> but "signal" more accurately describes what the function does 22:09 < tgrabiec> OTOH, one might confuse throw_from_native() with the C++ throw 22:10 < ahuillet> no no no 22:10 < ahuillet> C++ doesn't exist 22:10 < penberg_home> well 22:10 < penberg_home> I can live with the current name too 22:11 < tgrabiec> I'll think about it 22:11 < tgrabiec> but you're the boss ;) 22:14 < penberg_home> ;) 22:15 < tgrabiec> throw(), throw_new() 22:24 -!- penberg [n=penberg@cs146249.pp.htv.fi] has quit [Read error: 110 (Connection timed out)] 22:24 < penberg_home> maybe 22:24 < penberg_home> in any case 22:24 < penberg_home> have a nice night 22:24 < penberg_home> I'm off! later. 23:24 < tgrabiec> I have made the patch to put data flow resolution reloading operations (mem -> reg) to per-edge blocks and it seem to work 23:24 < tgrabiec> will send it for rewiev tomorrow 23:24 -!- tgrabiec [n=tomekg@abro203.neoplus.adsl.tpnet.pl] has quit ["Leaving"] --- Log closed Sat Aug 08 00:00:22 2009