// hphp/runtime/vm/jit/irgen-inlining.cpp

/*
   +----------------------------------------------------------------------+
   | HipHop for PHP                                                       |
   +----------------------------------------------------------------------+
   | Copyright (c) 2010-present Facebook, Inc. (http://www.facebook.com)  |
   +----------------------------------------------------------------------+
   | This source file is subject to version 3.01 of the PHP license,      |
   | that is bundled with this package in the file LICENSE, and is        |
   | available through the world-wide-web at the following url:           |
   | http://www.php.net/license/3_01.txt                                  |
   | If you did not receive a copy of the PHP license and are unable to   |
   | obtain it through the world-wide-web, please send a note to          |
   | license@php.net so we can mail you a copy immediately.               |
   +----------------------------------------------------------------------+
*/

/*
  IR function inliner

  Inlining functions at the IR level can be complex, particularly when
  dealing with async callees. All inlined regions are set up the same way,
  though later optimization passes may attempt to elide elements of this
  initialization, particularly the DefInlineFP. Below is an annotated
  version of this setup.

    StStk ...     # Stores initiated for parameters as if this was an
                  # ordinary call.
                  #
                  # This is where this module starts to emit instructions,
                  # the stores and spills are generated by passing arguments
                  # and emitting FCall* instructions, prior to having made a
                  # decision about inlining.

    LdStk ...     # The inliner first loads all of the arguments off of the
                  # stack, effectively popping them. These loads and the
                  # preceding stores are generally elided by load and store
                  # elimination.

    BeginInlining # Sets up memory effects for the inlined frame and returns
                  # a new fp that can be used for stores into the frame.

    StLoc ...     # At this point the arguments are stored back to the frame
                  # locations for their corresponding locals in the callee.
                  # N.B. that in memory these are the same addresses as the
                  # stack locations above.

  Once the inlined function returns, the callee must be torn down and the
  return value moved into the caller. The sequence for this is annotated
  below.

    tx := LdLoc ...   # Locals are decref'ed inline. A GenericRetDecRefs is
    DecRef tx         # never emitted for inlined regions, in the hopes that
    ...               # IncRefs from pushing arguments will cancel DecRefs
                      # from the return.

    ty := LdFrameThis # The context is DecRef'ed
    DecRef ty

    EndInlining       # This instruction marks the end of the inlined region;
                      # beyond this point the frame should no longer be
                      # considered live. The locals and context have been
                      # explicitly decrefed.

  Side exits from inlined code must publish the frames using the InlineCall
  instruction.

    InlineCall        # This instruction links the new frame to the old frame
                      # via the m_sfp field and loads the new frame into
                      # rvmfp(). It can be thought of as doing the work of a
                      # Call.

    <exit>            # An instruction with ExitEffects, such as ReqBindJmp
                      # or EndCatch.

  We also support inlining of async functions. Returning from an inlined
  region is handled differently depending on whether or not the caller
  entered the region using FCall with an async eager offset. If the offset
  was specified, the execution continues at that offset. Otherwise the
  result is wrapped in a StaticWaitHandle and the execution continues at the
  next opcode.
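  For example, the tail of an inlined async callee's return path, in the
  annotation style above (names are illustrative):

    tr := AssertType ...  # The returned value, refined to the callee's
                          # awaited return type.
    CreateSSWH tr         # Emitted only when no async eager offset was
                          # given: the value is wrapped in a StaticWaitHandle
                          # before being pushed. With an offset, tr is pushed
                          # as-is and execution jumps to that offset.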
  Async functions may also exit by suspending. The suspend block is annotated
  below:

    tCallee := DefLabel   # The callee region may suspend from multiple
                          # locations, but after constructing the waithandle
                          # they must all branch to a common suspend block
                          # which will continue execution of the caller at
                          # the opcode following FCall, usually an Await.

    EndInlining           # The return sequence looks the same as a regular
                          # call but rather than killing the frame it has
                          # been teleported to the heap.

  If a FCall with an async eager offset was followed by an Await, this HHIR
  sequence follows the InlineSuspend:

    tState := LdWhState   # Check the state of the waithandle representing
    JmpZero succeeded     # the suspended callee. It may have finished in the
    EqInt 1               # meanwhile due to the ability of surprise hooks to
    JmpNZero failed       # execute arbitrary Hack code, so side-exit if
                          # that's the case.

    tCaller := CreateAFWH # An AFWH is constructed for the caller, as Await
                          # must suspend if the waithandle is not finished.

    CheckSurpriseFlags    # The caller suspend hook is entered if requested.

    RetCtrl               # Control is returned, suspending the caller. In
                          # the case of nested inlined FCall instructions
                          # this return would simply be a branch to the
                          # suspend block for the next most nested callee.

  Calls to async functions without an async eager offset use an analogous
  suspend block which side-exits rather than continuing at the next opcode.
  This is done so that the fast path through the callee will return only
  StaticWaitHandles, which may be elided by the DCE and simplifier passes.

  Inlined regions maintain the following invariants, which later optimization
  passes may depend on (most notably DCE and partial-DCE):

  - Every callee region must contain a single BeginInlining.
  - BeginInlining must dominate every instruction within the callee.
  - Excluding side-exits and early returns, EndInlining must post-dominate
    every instruction in the callee.
  - The callee must contain a return or await.
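  For instance, a minimal callee region respecting these invariants (an
  illustrative sketch, not IR emitted verbatim):

    tFP := BeginInlining  # Unique and emitted first, so it dominates every
    StLoc ...             # instruction in the callee.
    ...                   # Callee body, ending in a return or await.
    tx := LdLoc ...       # Inline teardown as annotated above.
    DecRef tx
    EndInlining           # Post-dominates everything except side-exits and
                          # early returns.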
  When a callee contains awaits, these will be implemented as either an await
  of the nested callee (in the case of FCall) or a return from the callee and
  a side-exit from the caller, unless the callee does not contain a return
  (e.g. the caller was profiled as always suspending), in which case the
  callee will return the waithandle to the caller rather than side-exiting in
  the case of a FCall without an async eager offset.

  Below is an unmaintainable diagram of an async function inlined into a pair
  of callers, with and without (*) an async eager offset respectively:

  Outer:                                  | Inner:
   ...                                    |  ...
   FCall "Inner" aeo1 or FCall "Inner"    |  FCall "Other" aeo2
   Await                                  |  Await
  aeo1:                                   | aeo2:
   ...                                    |  ...

          ...
           |
           v
  +-------------------+
  | ...               |
  | BeginInlining     |
  | StLoc             |
  | ...               |
  +-------------------+
           |
           v
  +-------------------+
  | ...               |
  | tx = Call "Other" |
  | ty = LdTVAux tx   |
  | JmpNZero ty       | -> +---------------------+
  +-------------------+    | ta = CreateAFWH tx  |
           |               | SuspendHook         |
           v               | StStk ta            |
  +-------------------+    | Jmp suspendTarget   |--- or (*) ----+
  | StStk tx          |    +---------------------+               |
  | ... // aeo2       |               |                          v
  | Jmp returnTarget  |               v               +---------------------+
  +-------------------+    +---------------------+    | tb = LdStk          |
           |               | tb = LdStk          |    | EndInlining         |
           v               | InlineSuspend       |    | StStk tb            |
  +-------------------+    | StStk tb            |    +---------------------+
  | DecRef Locals     |    | tc = LdStk          |               |
  | DecRef This       |    | te = LdWhState tc   |               v
  | tr = LdStk        |    | JmpZero te          |--------->*Side Exit*
  | EndInlining       |    +---------------------+
  | StStk tr          |               |
  | CreateSSWH (*)    |               |
  +-------------------+               v
           |               +---------------------+
           |               | td = CreateAFWH tc  |
           v               | SuspendHook         |
          ...              | RetCtrl td          |
                           +---------------------+
*/

#include "hphp/runtime/vm/jit/irgen-inlining.h"

#include "hphp/runtime/vm/jit/analysis.h"
#include "hphp/runtime/vm/jit/mutation.h"
#include "hphp/runtime/vm/jit/irgen.h"
#include "hphp/runtime/vm/jit/irgen-call.h"
#include "hphp/runtime/vm/jit/irgen-control.h"
#include "hphp/runtime/vm/jit/irgen-create.h"
#include "hphp/runtime/vm/jit/irgen-exit.h"
#include "hphp/runtime/vm/jit/irgen-internal.h"
#include "hphp/runtime/vm/jit/irgen-func-prologue.h"
#include "hphp/runtime/vm/jit/irgen-sprop-global.h"

#include "hphp/runtime/vm/hhbc-codec.h"
#include "hphp/runtime/vm/resumable.h"
#include "hphp/runtime/vm/unwind.h"

namespace HPHP::jit::irgen {

bool isInlining(const IRGS& env) {
  return env.inlineState.depth > 0;
}

uint16_t inlineDepth(const IRGS& env) {
  return env.inlineState.depth;
}

void beginInlining(IRGS& env,
                   const Func* target,
                   const FCallArgs& fca,
                   SSATmp* ctx,
                   bool dynamicCall,
                   Op writeArOpc,
                   SrcKey startSk,
                   Offset callBcOffset,
                   InlineReturnTarget returnTarget,
                   int cost) {
  assertx(callBcOffset >= 0 && "callBcOffset before beginning of caller");
  // curFunc is null when called from conjureBeginInlining
  assertx((!curFunc(env) || callBcOffset < curFunc(env)->bclen()) &&
          "callBcOffset past end of caller");
  assertx(fca.numArgs >= target->numRequiredParams());
  assertx(fca.numArgs <= target->numNonVariadicParams());
  assertx(!fca.hasUnpack() || fca.numArgs == target->numNonVariadicParams());
  assertx(!fca.hasUnpack() || target->hasVariadicCaptureParam());

  FTRACE(1, "[[[ begin inlining: {}\n", target->fullName()->data());

  auto const numArgsInclUnpack = fca.numArgs + (fca.hasUnpack() ? 1U : 0U);

  // For cost calculation, use the most permissive coeffect
  auto const coeffects = curFunc(env)
    ? curCoeffects(env)
    : cns(env, RuntimeCoeffects::none().value());

  auto const prologueFlags = cns(env, PrologueFlags(
    fca.hasGenerics(),
    dynamicCall,
    false,  // inline return stub doesn't support async eager return
    0,  // call offset unused by the logic below
    0,
    RuntimeCoeffects::none()  // coeffects may not be known statically
  ).value());

  // Callee checks and input initialization.
  emitCalleeGenericsChecks(env, target, prologueFlags, fca.hasGenerics());
  emitCalleeDynamicCallChecks(env, target, prologueFlags);
  emitCalleeCoeffectChecks(env, target, prologueFlags, coeffects,
                           fca.skipCoeffectsCheck(),
                           numArgsInclUnpack, ctx);
  emitCalleeRecordFuncCoverage(env, target);
  emitInitFuncInputs(env, target, numArgsInclUnpack);

  ctx = [&] () -> SSATmp* {
    if (target->isClosureBody()) {
      return gen(env, AssertType, Type::ExactObj(target->implCls()), ctx);
    }

    if (!target->cls()) {
      assertx(ctx->isA(TNullptr));
      return ctx;
    }
    assertx(!ctx->type().maybe(TNullptr));
    if (ctx->isA(TBottom)) return ctx;

    if (target->isStatic()) {
      assertx(ctx->isA(TCls));
      if (ctx->hasConstVal(TCls)) {
        return ctx;
      }

      auto const ty = ctx->type() & Type::SubCls(target->cls());
      if (ctx->type() <= ty) return ctx;
      return gen(env, AssertType, ty, ctx);
    }

    assertx(ctx->type().maybe(TObj));
    auto const ty = ctx->type() & thisTypeFromFunc(target);
    if (ctx->type() <= ty) return ctx;
    return gen(env, AssertType, ty, ctx);
  }();
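  // At this point ctx is as precisely typed as the available information
  // allows: closure bodies carry their exact implementation class, static
  // methods a subclass of the target's class, and instance methods a this
  // type derived from the callee.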
  auto const numTotalInputs =
    target->numParams() +
    (target->hasReifiedGenerics() ? 1U : 0U) +
    (target->hasCoeffectsLocal() ? 1U : 0U);
  jit::vector<SSATmp*> inputs{numTotalInputs};
  for (auto i = 0; i < numTotalInputs; ++i) {
    inputs[numTotalInputs - i - 1] = popCU(env);
  }

  updateMarker(env);

  // NB: Now that we've popped the callee's arguments off the stack
  // and thus modified the caller's frame state, we're committed to
  // inlining. If we bail out from now on, the caller's frame state
  // will be as if the arguments don't exist on the stack (even though
  // they do).

  // The top of the stack now points to the space for ActRec.
  IRSPRelOffset calleeAROff = spOffBCFromIRSP(env);
  IRSPRelOffset calleeSBOff = calleeAROff - target->numSlotsInFrame();

  auto const calleeFP = gen(
    env,
    BeginInlining,
    BeginInliningData{
      calleeAROff,
      target,
      static_cast<uint32_t>(env.inlineState.depth + 1),
      nextSrcKey(env),
      calleeSBOff,
      spOffBCFromStackBase(env) - kNumActRecCells + 1,
      cost,
      int(numArgsInclUnpack)
    },
    sp(env)
  );

  assertx(
    startSk.funcEntry() &&
    startSk.func() == target &&
    startSk.entryOffset() == target->getEntryForNumArgs(numArgsInclUnpack)
  );

  env.inlineState.depth++;
  env.inlineState.costStack.emplace_back(env.inlineState.cost);
  env.inlineState.returnTarget.emplace_back(returnTarget);
  env.inlineState.bcStateStack.emplace_back(env.bcState);
  env.inlineState.cost += cost;
  env.inlineState.stackDepth += target->maxStackCells();
  env.bcState = startSk;

  // We have entered a new frame.
  updateMarker(env);
  env.irb->exceptionStackBoundary();

  // Inline return stub doesn't support async eager return.
  StFrameMetaData meta;
  meta.callBCOff = callBcOffset;
  meta.isInlined = true;
  meta.asyncEagerReturn = false;

  gen(env, StFrameMeta, meta, calleeFP);
  gen(env, StFrameFunc, FuncData { target }, calleeFP);
  if (!(ctx->type() <= TNullptr)) gen(env, StFrameCtx, fp(env), ctx);

  for (auto i = 0; i < numTotalInputs; ++i) {
    stLocRaw(env, i, calleeFP, inputs[i]);
  }

  assertx(startSk.hasThis() == startSk.func()->hasThisInBody());
  assertx(
    ctx->isA(TBottom) ||
    (ctx->isA(TNullptr) && !target->cls()) ||
    (ctx->isA(TCls) && target->cls() && target->isStatic()) ||
    (ctx->isA(TObj) && target->cls() && !target->isStatic()) ||
    (ctx->isA(TObj) && target->isClosureBody())
  );
}
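// Variant of beginInlining() used when a callee region is translated in
// isolation: the context and arguments are made up out of thin air via
// Conjure instructions (or constants, when the type admits a single value)
// instead of coming from a real caller, and the FCallArgs are synthetic.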
void conjureBeginInlining(IRGS& env,
                          const Func* func,
                          SrcKey startSk,
                          Type thisType,
                          const std::vector<Type>& args,
                          InlineReturnTarget returnTarget) {
  auto conjure = [&](Type t) {
    return t.admitsSingleVal() ? cns(env, t) : gen(env, Conjure, t);
  };

  always_assert(isFCall(env.context.initSrcKey.op()));
  always_assert(thisType != TBottom);
  auto const hasUnpack = args.size() > func->numNonVariadicParams();
  auto const numParams =
    safe_cast<uint32_t>(args.size()) - (hasUnpack ? 1 : 0);

  // Push space for out parameters
  for (auto i = 0; i < func->numInOutParams(); i++) {
    push(env, cns(env, TUninit));
  }

  allocActRec(env);
  for (auto const argType : args) {
    push(env, conjure(argType));
  }
  if (func->hasReifiedGenerics()) {
    push(env, conjure(TVec));
  }

  // beginInlining() assumes synced state.
  updateMarker(env);
  env.irb->exceptionStackBoundary();

  auto flags = FCallArgsFlags::FCANone;
  if (hasUnpack) flags = flags | FCallArgsFlags::HasUnpack;
  if (func->hasReifiedGenerics()) flags = flags | FCallArgsFlags::HasGenerics;
  // thisType is the context type inside the closure, but beginInlining()'s
  // ctx is a context given to the prologue.
  auto const ctx = func->isClosureBody()
    ? conjure(Type::ExactObj(func->implCls()))
    : conjure(thisType);
  beginInlining(
    env,
    func,
    FCallArgs(flags, numParams, 1, nullptr, nullptr, kInvalidOffset, nullptr),
    ctx,
    false,
    env.context.initSrcKey.op(),
    startSk,
    0 /* callBcOffset */,
    returnTarget,
    9 /* cost */
  );
}

namespace {

struct InlineFrame {
  SrcKey bcState;
  InlineReturnTarget target;
};

InlineFrame popInlineFrame(IRGS& env) {
  always_assert(env.inlineState.depth > 0);
  always_assert(env.inlineState.returnTarget.size() > 0);
  always_assert(env.inlineState.bcStateStack.size() > 0);

  InlineFrame inlineFrame { env.bcState, env.inlineState.returnTarget.back() };

  // Pop the inlined frame in our IRGS. Be careful between here and the
  // updateMarker() below, where the caller state isn't entirely set up.
  env.inlineState.depth--;
  env.inlineState.cost = env.inlineState.costStack.back();
  env.inlineState.costStack.pop_back();
  env.inlineState.stackDepth -= inlineFrame.bcState.func()->maxStackCells();
  env.inlineState.returnTarget.pop_back();
  env.bcState = env.inlineState.bcStateStack.back();
  env.inlineState.bcStateStack.pop_back();
  updateMarker(env);

  return inlineFrame;
}

void pushInlineFrame(IRGS& env, const InlineFrame& inlineFrame) {
  // No need to preserve and update cost, as we are not going to recursively
  // inline anything during a single Jmp opcode we are restoring the state
  // for.
  env.inlineState.depth++;
  env.inlineState.costStack.emplace_back(env.inlineState.cost);
  env.inlineState.returnTarget.emplace_back(inlineFrame.target);
  env.inlineState.bcStateStack.emplace_back(env.bcState);
  env.inlineState.stackDepth += inlineFrame.bcState.func()->maxStackCells();
  env.bcState = inlineFrame.bcState;
  updateMarker(env);
}

InlineFrame implInlineReturn(IRGS& env) {
  assertx(resumeMode(env) == ResumeMode::None);

  // Return to the caller function.
  gen(env, EndInlining, fp(env));

  return popInlineFrame(env);
}
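// DecRef the locals and $this of the inlined frame. The emitted instructions
// are attributed to the hottest returning block of the callee so that
// DecRefProfile lookups resolve against the callee's own markers.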
void freeInlinedFrameLocals(IRGS& env, const RegionDesc& calleeRegion) {
  // The IR instructions should be associated with one of the return
  // bytecodes; all predecessors of this block should be valid options.
  // Choose the hottest predecessor in order to get the most-likely
  // DecRefProfiles.
  auto const curBlock = env.irb->curBlock();
  always_assert(curBlock && !curBlock->preds().empty());
  auto best = curBlock->preds().front().inst();
  for (auto const& pred : curBlock->preds()) {
    if (pred.inst()->block()->profCount() > best->block()->profCount()) {
      best = pred.inst();
    }
  }
  auto const bcContext = best->bcctx();
  env.bcState = bcContext.marker.sk();
  env.irb->setCurMarker(bcContext.marker);

  // At this point, env.profTransIDs and env.region are already set with the
  // caller's information. We temporarily reset both of these with the
  // callee's information, so that the HHIR instructions emitted for the RetC
  // have their markers associated with the callee. This is necessary to
  // successfully look up any profile data associated with them.
  auto const callerProfTransIDs = env.profTransIDs;
  auto const callerRegion = env.region;
  SCOPE_EXIT{
    env.profTransIDs = callerProfTransIDs;
    env.region = callerRegion;
  };
  auto const& calleeTransIDs = bcContext.marker.profTransIDs();
  env.profTransIDs = calleeTransIDs;
  env.region = &calleeRegion;
  updateMarker(env);
  env.irb->resetCurIROff(bcContext.iroff + 1);

  decRefLocalsInline(env);
  decRefThis(env);
}

void implReturnBlock(IRGS& env, const RegionDesc& calleeRegion) {
  auto const rt = env.inlineState.returnTarget.back();

  auto const didStart = env.irb->startBlock(rt.callerTarget, false);
  always_assert(didStart);

  freeInlinedFrameLocals(env, calleeRegion);

  auto const callee = curFunc(env);
  auto const nret = callee->numInOutParams() + 1;

  if (nret > 1 && callee->isCPPBuiltin()) {
    auto const ret = pop(env, DataTypeGeneric);
    implInlineReturn(env);
    for (int32_t idx = 0; idx < nret - 1; ++idx) {
      auto const off = offsetFromIRSP(env, BCSPRelOffset{idx});
      auto const type = callOutType(callee, idx);
      gen(env, AssertStk, type, IRSPRelOffsetData{off}, sp(env));
    }
    push(env, gen(env, AssertType, callReturnType(callee), ret));
    return;
  }

  jit::vector<SSATmp*> retVals{nret, nullptr};
  for (auto& v : retVals) v = pop(env, DataTypeGeneric);

  implInlineReturn(env);

  // Pop the NullUninit values from the stack.
  for (uint32_t idx = 0; idx < nret - 1; ++idx) pop(env);

  if (!callee->isAsyncFunction()) {
    // Non-async function. Just push the result on the stack.
    if (nret > 1) {
      for (int32_t idx = nret - 2; idx >= 0; --idx) {
        auto const val = retVals[idx];
        push(env, gen(env, AssertType, callOutType(callee, idx), val));
      }
    }
    push(env, gen(env, AssertType, callReturnType(callee), retVals.back()));
    return;
  }

  assertx(nret == 1);
  auto const retVal = gen(env, AssertType, awaitedCallReturnType(callee),
                          retVals.back());
  if (rt.asyncEagerOffset == kInvalidOffset) {
    // Async eager return was not requested. Box the returned value and
    // continue execution at the next opcode.
    push(env, gen(env, CreateSSWH, retVal));
  } else {
    // Async eager return was requested. Continue execution at the async
    // eager offset with the returned value.
    push(env, retVal);
    jmpImpl(env, bcOff(env) + rt.asyncEagerOffset);
  }
}

bool implSuspendBlock(IRGS& env, bool exitOnAwait) {
  auto const rt = env.inlineState.returnTarget.back();
  // Start a new IR block to hold the remainder of this block.
  auto const didStart = env.irb->startBlock(rt.suspendTarget, false);
  if (!didStart) return false;

  // We strive to inline regions which will mostly eagerly terminate.
  if (exitOnAwait) hint(env, Block::Hint::Unlikely);

  assertx(curFunc(env)->isAsyncFunction());

  auto block = rt.suspendTarget;
  auto const label = env.unit.defLabel(1, block, env.irb->nextBCContext());
  auto const wh = label->dst(0);
  retypeDests(label, &env.unit);

  auto const inlineFrame = implInlineReturn(env);
  SCOPE_EXIT { if (exitOnAwait) pushInlineFrame(env, inlineFrame); };

  push(env, wh);
  if (exitOnAwait) {
    if (rt.asyncEagerOffset == kInvalidOffset) {
      gen(env, Jmp, makeExit(env, nextSrcKey(env)));
    } else {
      jmpImpl(env, nextSrcKey(env));
    }
  }

  return true;
}
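// Build the shared end-catch block of the inlined region: tear down the
// callee frame and transfer the in-flight exception to the unwinder on
// behalf of the caller.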
void implEndCatchBlock(IRGS& env, const RegionDesc& calleeRegion) {
  auto const rt = env.inlineState.returnTarget.back();
  auto const didStart = env.irb->startBlock(rt.endCatchTarget, false);
  if (!didStart) return;

  hint(env, Block::Hint::Unused);
  freeInlinedFrameLocals(env, calleeRegion);

  auto const inlineFrame = implInlineReturn(env);
  SCOPE_EXIT { pushInlineFrame(env, inlineFrame); };

  // If the caller is inlined as well, try to use the shared EndCatch of its
  // caller.
  if (isInlining(env)) {
    if (endCatchFromInlined(env)) return;

    if (spillInlinedFrames(env)) {
      gen(env, StVMFP, fp(env));
      gen(env, StVMPC, cns(env, uintptr_t(curSrcKey(env).pc())));
      gen(env, StVMReturnAddr, cns(env, 0));
    }
  }

  // Tell the unwinder that we have popped stuff from the stack.
  auto const spOff = spOffBCFromIRSP(env);
  auto const bcSP = gen(env, LoadBCSP, IRSPRelOffsetData { spOff }, sp(env));
  gen(env, StVMSP, bcSP);

  auto const data = EndCatchData {
    spOffBCFromIRSP(env),
    EndCatchData::CatchMode::UnwindOnly,
    EndCatchData::FrameMode::Phplogue,
    EndCatchData::Teardown::Full
  };
  gen(env, EndCatch, data, fp(env), sp(env));
}

////////////////////////////////////////////////////////////////////////////////

}

bool endInlining(IRGS& env, const RegionDesc& calleeRegion) {
  auto const rt = env.inlineState.returnTarget.back();

  implEndCatchBlock(env, calleeRegion);

  if (env.irb->canStartBlock(rt.callerTarget)) {
    implSuspendBlock(env, true);
  } else {
    return implSuspendBlock(env, false);
  }

  implReturnBlock(env, calleeRegion);

  FTRACE(1, "]]] end inlining: {}\n", curFunc(env)->fullName()->data());
  return true;
}

bool conjureEndInlining(IRGS& env, const RegionDesc& calleeRegion,
                        bool builtin) {
  if (!endInlining(env, calleeRegion)) return false;
  gen(env, ConjureUse, pop(env));
  gen(env, EndBlock, ASSERT_REASON);
  return true;
}

void retFromInlined(IRGS& env) {
  gen(env, Jmp, env.inlineState.returnTarget.back().callerTarget);
}

void suspendFromInlined(IRGS& env, SSATmp* waitHandle) {
  gen(env, Jmp, env.inlineState.returnTarget.back().suspendTarget, waitHandle);
}

bool endCatchFromInlined(IRGS& env) {
  assertx(isInlining(env));

  if (findCatchHandler(curFunc(env), bcOff(env)) != kInvalidOffset) {
    // We are not exiting the frame, as the current opcode has a catch
    // handler. Use the standard EndCatch logic that will have to spill the
    // frame.
    return false;
  }

  if (fp(env) == env.irb->fs().fixupFP()) {
    // The current frame is already spilled.
    return false;
  }

  if (curFunc(env)->isAsync() &&
      env.inlineState.returnTarget.back().asyncEagerOffset == kInvalidOffset) {
    // This async function was called without an associated await, so the
    // exception needs to be wrapped into an Awaitable.
    return false;
  }

  // Clear the evaluation stack and jump to the shared EndCatch handler.
  int locId = 0;
  while (spOffBCFromStackBase(env) > spOffEmpty(env)) {
    popDecRef(env, static_cast<DecRefProfileId>(locId++));
  }
  gen(env, Jmp, env.inlineState.returnTarget.back().endCatchTarget);
  return true;
}

bool spillInlinedFrames(IRGS& env) {
  assertx(inlineDepth(env) == env.irb->fs().inlineDepth());

  // Nothing to spill.
  if (!isInlining(env)) return false;

  auto const fixupFP = env.irb->fs().fixupFP();
  bool spilled = false;
  for (size_t depth = 0; depth < env.irb->fs().inlineDepth(); depth++) {
    auto const parentFP = env.irb->fs()[depth].fp();
    auto const fp = env.irb->fs()[depth + 1].fp();
    if (parentFP == fixupFP) spilled = true;
    if (spilled) {
      gen(env, InlineCall, fp, parentFP);
      updateMarker(env);
    }
  }
  return spilled;
}

//////////////////////////////////////////////////////////////////////

}