3.5.1.7.2. Inferring from InstRW definitions
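An InstRW rebinds the SchedReadWrites of a group of instructions. Earlier, createInstRWClass() of CodeGenSchedModels already prepared CodeGenSchedClass objects for the InstRW definitions and recorded the Record of each InstRW definition in the InstRWs container of the corresponding CodeGenSchedClass object. The processing of InstRW definitions is not yet finished, however: their OperandReadWrites have not been handled, and, just as with ItinRW, these may give rise to new scheduling classes.
Therefore, at line 846 of inferSchedClasses(), if the InstRWs container of the current CodeGenSchedClass object is not empty (meaning the object corresponds to at least one InstRW definition), inferFromInstRWs() is called. Line 886 expands the Instrs member of each InstRW definition to obtain the group of instructions it applies to. The loop at line 888 then checks that at least one of these instructions is still mapped to the current CodeGenSchedClass object (InstrClassMap[*II] == SCIdx), i.e. that the InstRW definition is still relevant to this class; if none is, lines 894-895 skip the definition.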
882 void CodeGenSchedModels::inferFromInstRWs(unsigned SCIdx) {
883 for (unsigned I = 0, E = SchedClasses[SCIdx].InstRWs.size(); I != E; ++I) {
884 assert(SchedClasses[SCIdx].InstRWs.size() == E && "InstrRWs was mutated!");
885 Record *Rec = SchedClasses[SCIdx].InstRWs[I];
886 const RecVec *InstDefs = Sets.expand(Rec);
887 RecIter II = InstDefs->begin(), IE = InstDefs->end();
888 for (; II != IE; ++II) {
889 if (InstrClassMap[*II] == SCIdx)
890 break;
891 }
892 // If this class no longer has any instructions mapped to it, it has become
893 // irrelevant.
894 if (II == IE)
895 continue;
896 IdxVec Writes, Reads;
897 findRWs(Rec->getValueAsListOfDefs("OperandReadWrites"), Writes, Reads);
898 unsigned PIdx = getProcModel(Rec->getValueAsDef("SchedModel")).Index;
899 IdxVec ProcIndices(1, PIdx);
900 inferFromRW(Writes, Reads, SCIdx, ProcIndices); // May mutate SchedClasses.
901 }
902 }
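Next, as with ItinRW, inferFromRW() is called on the contents of the InstRW definition's OperandReadWrites to infer new scheduling classes. Note that ProcIndices here contains only the processor model the InstRW definition belongs to (line 898).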
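The relevance test at lines 887-895 can be illustrated with a small, self-contained C++ sketch. It is not LLVM code: the InstRW record is reduced to a hypothetical list of instruction names, `InstrClassMapTy` stands in for `InstrClassMap`, and `instRWStillRelevant()` is an illustrative helper. Unlike the original, which uses `operator[]` (and therefore inserts a default entry for an unknown instruction), the sketch uses `find()` so that the map is not modified.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for InstrClassMap: instruction name -> index of the
// scheduling class the instruction currently belongs to.
using InstrClassMapTy = std::map<std::string, unsigned>;

// Sketch of the check at lines 887-895: an InstRW is still relevant to class
// SCIdx only if at least one of the instructions it matches (InstDefs) is
// still mapped to SCIdx. Otherwise inferFromInstRWs() skips it.
bool instRWStillRelevant(const std::vector<std::string> &InstDefs,
                         const InstrClassMapTy &InstrClassMap,
                         unsigned SCIdx) {
  for (const std::string &Inst : InstDefs) {
    auto It = InstrClassMap.find(Inst);
    if (It != InstrClassMap.end() && It->second == SCIdx)
      return true;
  }
  // Every instruction has been moved to some other scheduling class.
  return false;
}
```

For example, with the mapping ADD -> 3, SUB -> 5, an InstRW matching ADD and SUB is still relevant to class 5, while one matching only ADD is not.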
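3.5.1.7.3. Inferring from the Writes container of CodeGenSchedClass
Recall that a CodeGenSchedClass object is determined by its ItinClassDef member and the contents of its Writes and Reads containers.
For CodeGenSchedClass objects inferred from ItinRW and InstRW definitions, ItinClassDef is NULL, and every SchedVariant definition that may occur in their Writes and Reads containers has already been expanded.
For CodeGenSchedClass objects created from instruction definitions, however, the contents of Writes and Reads come from the SchedRW member (of type list<SchedReadWrite>) of the corresponding instruction definition, which may still contain SchedVariant definitions.
That is why inferSchedClasses() calls inferFromRW() on them at line 848. Note that the ProcIndices argument here comes from SchedClasses[Idx].ProcIndices, the set of processors to which the scheduling class applies. This is why lines 1095-1109 of getIntersectingVariants() have to be designed the way they are.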
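3.5.1.8. Recording resource objects
After all CodeGenSchedClass instances have been built, the CodeGenSchedModels constructor finally calls CodeGenSchedModels::collectProcResources() to record the Record objects of the resources the scheduling classes need. (v7.0 adds the following two lines at the beginning of the method. ProcResourceDefs and ProcResGroups are containers in CodeGenSchedModels that temporarily hold the ProcResourceUnits and ProcResGroup definitions during processing. The purpose is presumably efficiency: findProcResUnits() no longer has to query all records on every call.)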
1433 void CodeGenSchedModels::collectProcResources() {
ProcResourceDefs = Records.getAllDerivedDefinitions("ProcResourceUnits"); <-- added in v7.0
ProcResGroups = Records.getAllDerivedDefinitions("ProcResGroup");
1434 // Add any subtarget-specific SchedReadWrites that are directly associated
1435 // with processor resources. Refer to the parent SchedClass's ProcIndices to
1436 // determine which processors they apply to.
1437 for (SchedClassIter SCI = schedClassBegin(), SCE = schedClassEnd();
1438 SCI != SCE; ++SCI) {
1439 if (SCI->ItinClassDef)
1440 collectItinProcResources(SCI->ItinClassDef);
1441 else {
1442 // This class may have a default ReadWrite list which can be overriden by
1443 // InstRW definitions.
1444 if (!SCI->InstRWs.empty()) {
1445 for (RecIter RWI = SCI->InstRWs.begin(), RWE = SCI->InstRWs.end();
1446 RWI != RWE; ++RWI) {
1447 Record *RWModelDef = (*RWI)->getValueAsDef("SchedModel");
1448 IdxVec ProcIndices(1, getProcModel(RWModelDef).Index);
1449 IdxVec Writes, Reads;
1450 findRWs((*RWI)->getValueAsListOfDefs("OperandReadWrites"),
1451 Writes, Reads);
1452 collectRWResources(Writes, Reads, ProcIndices);
1453 }
1454 }
1455 collectRWResources(SCI->Writes, SCI->Reads, SCI->ProcIndices);
1456 }
1457 }
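As seen earlier, if the ItinClassDef of a CodeGenSchedClass object is not NULL, it is the itinerary (of type InstrItinClass) of some class of instructions. Since an ItinRW definition maps a group of InstrItinClass definitions to a group of SchedReadWrite definitions, we need the following method to find and process these ItinRW definitions.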
1531 void CodeGenSchedModels::collectItinProcResources(Record *ItinClassDef) {
1532 for (unsigned PIdx = 0, PEnd = ProcModels.size(); PIdx != PEnd; ++PIdx) {
1533 const CodeGenProcModel &PM = ProcModels[PIdx];
1534 // For all ItinRW entries.
1535 bool HasMatch = false;
1536 for (RecIter II = PM.ItinRWDefs.begin(), IE = PM.ItinRWDefs.end();
1537 II != IE; ++II) {
1538 RecVec Matched = (*II)->getValueAsListOfDefs("MatchedItinClasses");
1539 if (!std::count(Matched.begin(), Matched.end(), ItinClassDef))
1540 continue;
1541 if (HasMatch)
1542 PrintFatalError((*II)->getLoc(), "Duplicate itinerary class "
1543 + ItinClassDef->getName()
1544 + " in ItinResources for " + PM.ModelName);
1545 HasMatch = true;
1546 IdxVec Writes, Reads;
1547 findRWs((*II)->getValueAsListOfDefs("OperandReadWrites"), Writes, Reads);
1548 IdxVec ProcIndices(1, PIdx);
1549 collectRWResources(Writes, Reads, ProcIndices);
1550 }
1551 }
1552 }
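The loop at line 1532 goes over all processor models, and the inner loop at line 1536 searches each model's ItinRW definitions for the one whose MatchedItinClasses contains the ItinClassDef argument (a second match for the same processor is a fatal error); the index of that CodeGenProcModel instance becomes ProcIndices. Because of the ItinRW, the resources of that processor are now described by the SchedReadWrite definitions in the ItinRW's OperandReadWrites. The following method collects the resources of these SchedReadWrite instances.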
1593 void CodeGenSchedModels::collectRWResources(const IdxVec &Writes,
1594 const IdxVec &Reads,
1595 const IdxVec &ProcIndices) {
1596
1597 for (IdxIter WI = Writes.begin(), WE = Writes.end(); WI != WE; ++WI)
1598 collectRWResources(*WI, false, ProcIndices);
1599
1600 for (IdxIter RI = Reads.begin(), RE = Reads.end(); RI != RE; ++RI)
1601 collectRWResources(*RI, true, ProcIndices);
1602 }
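To associate resources with SchedReadWrites, TableGen provides several special classes derived from SchedReadWrite, with which a target machine can model how its resources are used. We need to interpret the special semantics of these derived definitions.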
1554 void CodeGenSchedModels::collectRWResources(unsigned RWIdx, bool IsRead,
1555 const IdxVec &ProcIndices) {
1556 const CodeGenSchedRW &SchedRW = getSchedRW(RWIdx, IsRead);
1557 if (SchedRW.TheDef) {
1558 if (!IsRead && SchedRW.TheDef->isSubClassOf("SchedWriteRes")) {
1559 for (IdxIter PI = ProcIndices.begin(), PE = ProcIndices.end();
1560 PI != PE; ++PI) {
1561 addWriteRes(SchedRW.TheDef, *PI);
1562 }
1563 }
1564 else if (IsRead && SchedRW.TheDef->isSubClassOf("SchedReadAdvance")) {
1565 for (IdxIter PI = ProcIndices.begin(), PE = ProcIndices.end();
1566 PI != PE; ++PI) {
1567 addReadAdvance(SchedRW.TheDef, *PI);
1568 }
1569 }
1570 }
1571 for (RecIter AI = SchedRW.Aliases.begin(), AE = SchedRW.Aliases.end();
1572 AI != AE; ++AI) {
1573 IdxVec AliasProcIndices;
1574 if ((*AI)->getValueInit("SchedModel")->isComplete()) {
1575 AliasProcIndices.push_back(
1576 getProcModel((*AI)->getValueAsDef("SchedModel")).Index);
1577 }
1578 else
1579 AliasProcIndices = ProcIndices;
1580 const CodeGenSchedRW &AliasRW = getSchedRW((*AI)->getValueAsDef("AliasRW"));
1581 assert(AliasRW.IsRead == IsRead && "cannot alias reads to writes");
1582
1583 IdxVec ExpandedRWs;
1584 expandRWSequence(AliasRW.Index, ExpandedRWs, IsRead);
1585 for (IdxIter SI = ExpandedRWs.begin(), SE = ExpandedRWs.end();
1586 SI != SE; ++SI) {
1587 collectRWResources(*SI, IsRead, AliasProcIndices);
1588 }
1589 }
1590 }
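First, SchedWrite has a special derived class, SchedWriteRes, which is itself associated with a set of resources and plays a role similar to WriteRes. CodeGenProcModel has a dedicated container, WriteResDefs, for the Record objects of WriteRes and SchedWriteRes definitions.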
1674 void CodeGenSchedModels::addWriteRes(Record *ProcWriteResDef, unsigned PIdx) {
1675 assert(PIdx && "don't add resources to an invalid Processor model");
1676
1677 RecVec &WRDefs = ProcModels[PIdx].WriteResDefs;
1678 RecIter WRI = std::find(WRDefs.begin(), WRDefs.end(), ProcWriteResDef);
1679 if (WRI != WRDefs.end())
1680 return;
1681 WRDefs.push_back(ProcWriteResDef);
1682
1683 // Visit ProcResourceKinds referenced by the newly discovered WriteRes.
1684 RecVec ProcResDefs = ProcWriteResDef->getValueAsListOfDefs("ProcResources");
1685 for (RecIter WritePRI = ProcResDefs.begin(), WritePRE = ProcResDefs.end();
1686 WritePRI != WritePRE; ++WritePRI) {
1687 addProcResource(*WritePRI, ProcModels[PIdx]);
1688 }
1689 }
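WriteRes and SchedWriteRes also derive from the base class ProcWriteResources, which has a member ProcResources of type list<ProcResourceKind>. These are the resources associated with the WriteRes or SchedWriteRes, and their definitions must be recorded as well.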
1651 void CodeGenSchedModels::addProcResource(Record *ProcResKind,
1652 CodeGenProcModel &PM) {
1653 for (;;) {
1654 Record *ProcResUnits = findProcResUnits(ProcResKind, PM);
1655
1656 // See if this ProcResource is already associated with this processor.
1657 RecIter I = std::find(PM.ProcResourceDefs.begin(),
1658 PM.ProcResourceDefs.end(), ProcResUnits);
1659 if (I != PM.ProcResourceDefs.end())
1660 return;
1661
1662 PM.ProcResourceDefs.push_back(ProcResUnits);
1663 if (ProcResUnits->isSubClassOf("ProcResGroup"))
1664 return;
1665
1666 if (!ProcResUnits->getValueInit("Super")->isComplete())
1667 return;
1668
1669 ProcResKind = ProcResUnits->getValueAsDef("Super");
1670 }
1671 }
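The ProcResKind argument above can be either a ProcResource definition or a ProcResGroup definition. ProcResource derives from both ProcResourceKind and ProcResourceUnits, whereas ProcResGroup derives only from ProcResourceKind. So, given a ProcResKind, if it is derived from ProcResourceUnits it is a ProcResource and is returned directly (line 1608); otherwise it is a ProcResGroup, which the code from line 1611 onwards handles. (In v7.0, ProcResourceDefs and ProcResGroups are containers of CodeGenSchedModels that have already been filled with the relevant definitions, so the getAllDerivedDefinitions() calls here can be optimized away.)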
1606 Record *CodeGenSchedModels::findProcResUnits(Record *ProcResKind,
1607 const CodeGenProcModel &PM) const {
1608 if (ProcResKind->isSubClassOf("ProcResourceUnits"))
1609 return ProcResKind;
1610
1611 Record *ProcUnitDef = nullptr;
assert(!ProcResourceDefs.empty()); <-- added in v7.0
assert(!ProcResGroups.empty());
1612 RecVec ProcResourceDefs =
1613 Records.getAllDerivedDefinitions("ProcResourceUnits"); <-- removed in v7.0
1614
1615 for (RecIter RI = ProcResourceDefs.begin(), RE = ProcResourceDefs.end();
1616 RI != RE; ++RI) {
1617
1618 if ((*RI)->getValueAsDef("Kind") == ProcResKind
1619 && (*RI)->getValueAsDef("SchedModel") == PM.ModelDef) {
1620 if (ProcUnitDef) {
1621 PrintFatalError((*RI)->getLoc(),
1622 "Multiple ProcessorResourceUnits associated with "
1623 + ProcResKind->getName());
1624 }
1625 ProcUnitDef = *RI;
1626 }
1627 }
1628 RecVec ProcResGroups = Records.getAllDerivedDefinitions("ProcResGroup"); <-- removed in v7.0
1629 for (RecIter RI = ProcResGroups.begin(), RE = ProcResGroups.end();
1630 RI != RE; ++RI) {
1631
1632 if (*RI == ProcResKind
1633 && (*RI)->getValueAsDef("SchedModel") == PM.ModelDef) {
1634 if (ProcUnitDef) {
1635 PrintFatalError((*RI)->getLoc(),
1636 "Multiple ProcessorResourceUnits associated with "
1637 + ProcResKind->getName());
1638 }
1639 ProcUnitDef = *RI;
1640 }
1641 }
1642 if (!ProcUnitDef) {
1643 PrintFatalError(ProcResKind->getLoc(),
1644 "No ProcessorResources associated with "
1645 + ProcResKind->getName());
1646 }
1647 return ProcUnitDef;
1648 }
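Currently the only class derived from ProcResourceUnits is ProcResource, which sets the base-class member Kind to a special ProcResourceKind definition, EponymousProcResourceKind. A ProcResGroup therefore never satisfies the condition at line 1618, and the loop at line 1615 currently has no effect. Within a processor family, similar resource units usually have the same kind, and their definitions are often anonymous; ProcResourceUnits therefore provides a field SchedModel of type SchedMachineModel to tell different processors apart. Lines 1632-1633 use the resource kind and the processor it applies to in order to find the corresponding ProcResGroup definition.
A ProcResourceUnits definition can use its Super member to name a more general resource unit of which it is a subset. If Super is set, addProcResource() follows it and records each Super in turn, until it reaches a unit whose Super is not set, a ProcResGroup, or a unit that has already been recorded.
SchedRead also has a special derived class, SchedReadAdvance, which means that the latency of the associated SchedWrite can be reduced by a given number of cycles. These SchedReadAdvance definitions must be recorded. There is also ReadAdvance, which is essentially another wrapper around SchedReadAdvance, so SchedReadAdvance and ReadAdvance definitions are recorded together.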
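The lookup performed by findProcResUnits() can be sketched without TableGen. In this hypothetical model, `ResRecord` flattens a record into the few fields the method reads, record identity is approximated by name, and the fatal errors become exceptions; the names Port0, Port05, ModelA and ModelB are illustrative only.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical flattened record holding only what findProcResUnits() reads.
struct ResRecord {
  std::string Name;
  bool IsProcResourceUnits; // derived from ProcResourceUnits (i.e. ProcResource)
  std::string Kind;         // ProcResourceUnits::Kind; unused for groups
  std::string SchedModel;   // owning processor model
};

// Sketch of the lookup: a ProcResourceUnits record is its own unit; otherwise
// look for units whose Kind is this record and for groups that are this
// record, both restricted to processor model PM. Exactly one match is allowed.
const ResRecord *findProcResUnits(const ResRecord &ProcResKind,
                                  const std::string &PM,
                                  const std::vector<ResRecord> &Units,
                                  const std::vector<ResRecord> &Groups) {
  if (ProcResKind.IsProcResourceUnits)
    return &ProcResKind;
  const ResRecord *Found = nullptr;
  auto Consider = [&](const ResRecord &R) {
    if (Found)
      throw std::runtime_error(
          "Multiple ProcessorResourceUnits associated with " + ProcResKind.Name);
    Found = &R;
  };
  for (const ResRecord &U : Units) // corresponds to the loop at line 1615
    if (U.Kind == ProcResKind.Name && U.SchedModel == PM)
      Consider(U);
  for (const ResRecord &G : Groups) // corresponds to the loop at line 1629
    if (G.Name == ProcResKind.Name && G.SchedModel == PM)
      Consider(G);
  if (!Found)
    throw std::runtime_error("No ProcessorResources associated with " +
                             ProcResKind.Name);
  return Found;
}
```

Because every unit in such a model carries Kind = EponymousProcResourceKind, a group name can only ever match in the second loop, which is the observation made above about line 1615.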
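The Super-chain walk of addProcResource() (lines 1653-1670) can be sketched as follows. This is a simplified model under stated assumptions: the hypothetical `ResUnit` map replaces the TableGen records, the findProcResUnits() step is omitted (every name is already a unit), and ALU0, ALU and FPGroup are made-up names.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical resource description: whether the unit is a ProcResGroup and
// the name of its Super unit ("" when Super is unset).
struct ResUnit {
  bool IsGroup;
  std::string Super;
};

// Sketch of addProcResource(): record the unit, then climb the Super chain
// until a unit is already recorded, a group is reached, or Super is unset.
void addProcResource(std::string Name,
                     const std::map<std::string, ResUnit> &Units,
                     std::vector<std::string> &ProcResourceDefs) {
  for (;;) {
    if (std::find(ProcResourceDefs.begin(), ProcResourceDefs.end(), Name) !=
        ProcResourceDefs.end())
      return; // already associated with this processor
    ProcResourceDefs.push_back(Name);
    const ResUnit &U = Units.at(Name);
    if (U.IsGroup || U.Super.empty())
      return; // groups and top-level units end the walk
    Name = U.Super;
  }
}
```

Recording ALU0 whose Super is ALU records both; a later request for ALU stops at the duplicate check, and a group is recorded without following anything.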
1692 void CodeGenSchedModels::addReadAdvance(Record *ProcReadAdvanceDef,
1693 unsigned PIdx) {
1694 RecVec &RADefs = ProcModels[PIdx].ReadAdvanceDefs;
1695 RecIter I = std::find(RADefs.begin(), RADefs.end(), ProcReadAdvanceDef);
1696 if (I != RADefs.end())
1697 return;
1698 RADefs.push_back(ProcReadAdvanceDef);
1699 }
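The ReadAdvanceDefs container of CodeGenProcModel records the Record objects of these ReadAdvance and SchedReadAdvance definitions. The handling of SchedReadWrite aliases in collectRWResources() should also look familiar: expandRWSequence() first expands any WriteSequence, and collectRWResources() is then called on each of the (expanded) SchedReadWrites. If the alias specifies its own SchedModel, only that processor is considered; otherwise the caller's ProcIndices are inherited.
Back in CodeGenSchedModels::collectProcResources(): if the ItinClassDef of a CodeGenSchedClass instance is NULL, the object was inferred from ItinRW or InstRW definitions. The InstRWs containers of these CodeGenSchedClass objects are all empty, so lines 1444-1454 are never actually executed; for scheduling classes derived from ItinRW and InstRW definitions, the real work is done by the collectRWResources() call at line 1455.
CodeGenSchedModels::collectProcResources() (continued)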
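The recursion in collectRWResources(), including alias handling, can be modeled on a hypothetical table of SchedReadWrites. Everything here is an assumption for illustration: `RW`, `Alias`, `expandSeq()` (a simplified expandRWSequence() that ignores repeat counts) and the names WriteX, M2WriteX, SeqX, WA, WB. `HasResources` stands for "is a SchedWriteRes (write) or SchedReadAdvance (read)", and a `std::set` replaces the find-then-push_back de-duplication of addWriteRes()/addReadAdvance().

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified SchedReadWrite table (not LLVM's data structures).
struct Alias {
  int Model;           // processor index, or -1 if SchedModel is unset
  std::string AliasRW; // name of the aliased SchedReadWrite
};
struct RW {
  bool IsRead;
  bool HasResources;            // SchedWriteRes (write) / SchedReadAdvance (read)
  std::vector<std::string> Seq; // non-empty for a WriteSequence: its members
  std::vector<Alias> Aliases;
};
using RWTable = std::map<std::string, RW>;
// Processor index -> names recorded in WriteResDefs / ReadAdvanceDefs.
using Collected = std::map<int, std::set<std::string>>;

// Simplified expandRWSequence(): flatten nested sequences.
void expandSeq(const RWTable &T, const std::string &Name,
               std::vector<std::string> &Out) {
  const RW &R = T.at(Name);
  if (R.Seq.empty()) {
    Out.push_back(Name);
    return;
  }
  for (const std::string &S : R.Seq)
    expandSeq(T, S, Out);
}

void collectRWResources(const RWTable &T, const std::string &Name,
                        const std::vector<int> &ProcIndices, Collected &Out) {
  const RW &R = T.at(Name);
  if (R.HasResources)
    for (int P : ProcIndices)
      Out[P].insert(Name); // the set de-duplicates like std::find does
  for (const Alias &A : R.Aliases) {
    // A processor-specific alias narrows the set; otherwise inherit ProcIndices.
    std::vector<int> AliasProcs = ProcIndices;
    if (A.Model >= 0)
      AliasProcs = {A.Model};
    assert(T.at(A.AliasRW).IsRead == R.IsRead && "cannot alias reads to writes");
    std::vector<std::string> Expanded;
    expandSeq(T, A.AliasRW, Expanded);
    for (const std::string &E : Expanded)
      collectRWResources(T, E, AliasProcs, Out);
  }
}
```

In this model, an alias bound to processor 2 contributes resources only to processor 2, while an unbound alias to a sequence contributes every member of the sequence to all inherited processors.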
1458 // Add resources separately defined by each subtarget.
1459 RecVec WRDefs = Records.getAllDerivedDefinitions("WriteRes");
1460 for (RecIter WRI = WRDefs.begin(), WRE = WRDefs.end(); WRI != WRE; ++WRI) {
1461 Record *ModelDef = (*WRI)->getValueAsDef("SchedModel");
1462 addWriteRes(*WRI, getProcModel(ModelDef).Index);
1463 }
1464 RecVec SWRDefs = Records.getAllDerivedDefinitions("SchedWriteRes");
1465 for (RecIter WRI = SWRDefs.begin(), WRE = SWRDefs.end(); WRI != WRE; ++WRI) {
1466 Record *ModelDef = (*WRI)->getValueAsDef("SchedModel");
1467 addWriteRes(*WRI, getProcModel(ModelDef).Index);
1468 }
1469 RecVec RADefs = Records.getAllDerivedDefinitions("ReadAdvance");
1470 for (RecIter RAI = RADefs.begin(), RAE = RADefs.end(); RAI != RAE; ++RAI) {
1471 Record *ModelDef = (*RAI)->getValueAsDef("SchedModel");
1472 addReadAdvance(*RAI, getProcModel(ModelDef).Index);
1473 }
1474 RecVec SRADefs = Records.getAllDerivedDefinitions("SchedReadAdvance");
1475 for (RecIter RAI = SRADefs.begin(), RAE = SRADefs.end(); RAI != RAE; ++RAI) {
1476 if ((*RAI)->getValueInit("SchedModel")->isComplete()) {
1477 Record *ModelDef = (*RAI)->getValueAsDef("SchedModel");
1478 addReadAdvance(*RAI, getProcModel(ModelDef).Index);
1479 }
1480 }
1481 // Add ProcResGroups that are defined within this processor model, which may
1482 // not be directly referenced but may directly specify a buffer size.
1483 RecVec ProcResGroups = Records.getAllDerivedDefinitions("ProcResGroup");
1484 for (RecIter RI = ProcResGroups.begin(), RE = ProcResGroups.end();
1485 RI != RE; ++RI) {
1486 if (!(*RI)->getValueInit("SchedModel")->isComplete())
1487 continue;
1488 CodeGenProcModel &PM = getProcModel((*RI)->getValueAsDef("SchedModel"));
1489 RecIter I = std::find(PM.ProcResourceDefs.begin(),
1490 PM.ProcResourceDefs.end(), *RI);
1491 if (I == PM.ProcResourceDefs.end())
1492 PM.ProcResourceDefs.push_back(*RI);
1493 }
1494 // Finalize each ProcModel by sorting the record arrays.
1495 for (CodeGenProcModel &PM : ProcModels) {
1496 std::sort(PM.WriteResDefs.begin(), PM.WriteResDefs.end(),
1497 LessRecord());
1498 std::sort(PM.ReadAdvanceDefs.begin(), PM.ReadAdvanceDefs.end(),
1499 LessRecord());
1500 std::sort(PM.ProcResourceDefs.begin(), PM.ProcResourceDefs.end(),
1501 LessRecord());
1502 DEBUG(
1503 PM.dump();
1504 dbgs() << "WriteResDefs: ";
1505 for (RecIter RI = PM.WriteResDefs.begin(),
1506 RE = PM.WriteResDefs.end(); RI != RE; ++RI) {
1507 if ((*RI)->isSubClassOf("WriteRes"))
1508 dbgs() << (*RI)->getValueAsDef("WriteType")->getName() << " ";
1509 else
1510 dbgs() << (*RI)->getName() << " ";
1511 }
1512 dbgs() << "\nReadAdvanceDefs: ";
1513 for (RecIter RI = PM.ReadAdvanceDefs.begin(),
1514 RE = PM.ReadAdvanceDefs.end(); RI != RE; ++RI) {
1515 if ((*RI)->isSubClassOf("ReadAdvance"))
1516 dbgs() << (*RI)->getValueAsDef("ReadType")->getName() << " ";
1517 else
1518 dbgs() << (*RI)->getName() << " ";
1519 }
1520 dbgs() << "\nProcResourceDefs: ";
1521 for (RecIter RI = PM.ProcResourceDefs.begin(),
1522 RE = PM.ProcResourceDefs.end(); RI != RE; ++RI) {
1523 dbgs() << (*RI)->getName() << " ";
1524 }
1525 dbgs() << '\n');
1526 verifyProcResourceGroups(PM);
1527 }
ProcResourceDefs.clear(); <-- added in v7.0
ProcResGroups.clear();
1528 }
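The rest of collectProcResources() fills in the gaps: WriteRes, SchedWriteRes, ReadAdvance, SchedReadAdvance and ProcResGroup definitions not covered above are saved into the corresponding containers of their CodeGenProcModel instances (SchedReadAdvance and ProcResGroup definitions without a SchedModel are skipped). Finally, these containers (WriteResDefs, ReadAdvanceDefs, ProcResourceDefs) are sorted by name, and verifyProcResourceGroups() checks that overlapping groups have a common superset.
3.5.1.9. Collecting optional processor information (v7.0)
The next step is to collect the so-called optional processor information.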
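The gap-filling pass and the final sort can be sketched on plain strings. The types and names below (`Def`, `PerModelDefs`, `fillAndSort()`, ModelA, WriteA, WriteB, Unbound) are hypothetical. The sketch skips every definition with an unset SchedModel, which the original does only for SchedReadAdvance and ProcResGroup, and it sorts with ordinary string comparison, which may not match LLVM's LessRecord comparator exactly.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical record: a name plus the SchedModel it names ("" = unset).
struct Def {
  std::string Name;
  std::string SchedModel;
};
using PerModelDefs = std::map<std::string, std::vector<std::string>>;

// Sketch of the gap-filling pass plus finalization: definitions with an unset
// SchedModel are skipped; the rest are appended to their model's list unless
// already present. Each list is then sorted by name.
PerModelDefs fillAndSort(PerModelDefs PerModel, const std::vector<Def> &AllDefs) {
  for (const Def &D : AllDefs) {
    if (D.SchedModel.empty())
      continue;
    std::vector<std::string> &V = PerModel[D.SchedModel];
    if (std::find(V.begin(), V.end(), D.Name) == V.end())
      V.push_back(D.Name);
  }
  for (auto &Entry : PerModel)
    std::sort(Entry.second.begin(), Entry.second.end());
  return PerModel;
}
```

Sorting after de-duplication keeps each list both unique and deterministic, which is what later emitters rely on when they walk these containers.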
244 void CodeGenSchedModels::collectOptionalProcessorInfo() {
245 // Find register file definitions for each processor.
246 collectRegisterFiles();
247
248 // Collect processor RetireControlUnit descriptors if available.
249 collectRetireControlUnits();
250
251 // Find pfm counter definitions for each processor.
252 collectPfmCounters();
253
254 checkCompleteness();
255 }
3.5.1.9.1. RegisterFile definitions
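v7.0 introduced RegisterFile definitions in the TD files, which allow processor register files used for register renaming to be defined.
Each processor register file declares:
- The set of registers that can be renamed.
- The number of physical registers available for register renaming.
- The cost of renaming a register.
The cost of a rename is the number of physical registers allocated by the register alias table to map the new definition. By default, a register can be renamed at the cost of a single physical register. Note that register costs are defined at register-class granularity (see the Costs parameter, stored in the RegCosts field).
The set of registers subject to register renaming is declared using a list of register classes (see the RegClasses field). An empty list of register classes means that all logical registers defined by the target can be fully renamed.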
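A register R can be renamed if its register class appears in the RegClasses set. When R is written, a new alias is allocated at the cost of one or more physical registers; as a result, false dependencies on R are eliminated.
A sub-register V of register R implicitly belongs to the same register file. However, V is renamed only if its register class is part of RegClasses. Otherwise, the processor keeps V (along with every other part of R) together with R, and a write to V always causes a compulsory read of R.
This is what happens, for example, on AMD processors (at least Bulldozer), where AL and AH are not treated as independent of AX, and AX is not treated as independent of EAX. A write to AL has an implicit false dependency on the last write to EAX (or a part of EAX). As a consequence, a write to AL cannot proceed in parallel with a write to AH.
There is no false dependency if the partial register write belongs to a register class that is in RegClasses.
There is also no penalty for writes that clear the content of a super-register (see MC/MCInstrAnalysis.h, method MCInstrAnalysis::clearsSuperRegisters()).
On x86-64, a 32-bit GPR write implicitly zeroes the upper half of the underlying physical register, effectively removing any false dependency on the previous register definition.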
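The current implementation has the following limitations:
- It assumes there is no limit on the number of renames per cycle, which may not hold for all hardware or register classes. Likewise, there is no limit on how many times the same logical register can be renamed in the same cycle.
- Merge penalties are currently not modeled for the case where a write to part of a register is followed by a read of a larger part of the same register. On some Intel chips, different parts of a GPR can be kept in different physical registers, but there is a cost when the partial write is combined with the previous super-register definition. Support for these cases should be added, and merge problems with partial register accesses should be modeled correctly.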
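The renaming rules described above reduce to a small predicate. This is a hypothetical consumer-side model, not LLVM code: `RegisterFileModel` and `hasFalseDependency()` are invented names, `RenamedClasses` plays the role of RegClasses, and the class names GR8, GR32, GR64 and CCR are borrowed from X86 purely as labels.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical model of the renaming rules. RenamedClasses plays the role of
// RegisterFile::RegClasses; an empty set means every register is renamed.
struct RegisterFileModel {
  std::set<std::string> RenamedClasses;

  bool isRenamed(const std::string &RegClass) const {
    return RenamedClasses.empty() || RenamedClasses.count(RegClass) != 0;
  }

  // A write to a (sub-)register of class WrittenClass carries a false
  // dependency on the previous write of the enclosing register unless its
  // class is renamed or the write clears the super-register (e.g. a 32-bit
  // GPR write on x86-64).
  bool hasFalseDependency(const std::string &WrittenClass,
                          bool ClearsSuperReg) const {
    return !isRenamed(WrittenClass) && !ClearsSuperReg;
  }
};
```

With RenamedClasses = {GR64, CCR}, an 8-bit write has a false dependency, a 32-bit write that clears the upper half does not, and an empty set disables false dependencies altogether.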
510 class RegisterFile<int numPhysRegs, list<RegisterClass> Classes = [],
511 list<int> Costs = []> {
512 list<RegisterClass> RegClasses = Classes;
513 list<int> RegCosts = Costs;
514 int NumPhysRegs = numPhysRegs;
515 SchedMachineModel SchedModel = ?;
516 }
The X86 target defines the following RegisterFile definitions:
51 def JIntegerPRF : RegisterFile<64, [GR64, CCR]>;
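Here CCR is the status flag register class.
The integer PRF for Jaguar has 64 entries and holds both the architectural and the speculative versions of the 64-bit integer registers. Reference: www.realworldtech.com/jaguar/4/.
The processor always keeps the different parts of an integer register together, so an instruction that writes part of a register has a false dependency on any previous write to that register or any part of it. Reference: Agner Fog's microarchitecture.pdf, section "AMD Bobcat and Jaguar pipeline: Partial register access".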
56 def JFpuPRF: RegisterFile<72, [VR64, VR128, VR256], [1, 1, 2]>;
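The Jaguar FP Retire Queue renames SIMD and FP micro-ops onto a pool of 72 SSE registers. Operations on 256-bit data types are split into two COPs (complex micro-operations), hence the cost of 2 for VR256. Reference: www.realworldtech.com/jaguar/4/.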
96 def ZnIntegerPRF : RegisterFile<168, [GR64, CCR]>;
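The integer PRF for Zen has 168 entries and holds both the architectural and the speculative versions of the 64-bit integer registers. Reference: Software Optimization Guide for AMD Family 17h Processors.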
106 def ZnFpuPRF: RegisterFile<160, [VR64, VR128, VR256], [1, 1, 2]>;
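The Zen FP Retire Queue renames SIMD and FP micro-ops onto a pool of 160 128-bit registers. Operations on 256-bit data types are split into two COPs. Reference: Software Optimization Guide for AMD Family 17h Processors.
CodeGenProcModel defines a container RegisterFiles (of type std::vector<CodeGenRegisterFile>). Its element type, CodeGenRegisterFile, is a simple class.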
180 struct CodeGenRegisterFile {
181 std::string Name;
182 Record *RegisterFileDef;
183
184 unsigned NumPhysRegs;
185 std::vector<CodeGenRegisterCost> Costs;
186
187 CodeGenRegisterFile(StringRef name, Record *def)
188 : Name(name), RegisterFileDef(def), NumPhysRegs(0) {}
189
190 bool hasDefaultCosts() const { return Costs.empty(); }
191 };
The class CodeGenRegisterCost is trivial as well.
166 struct CodeGenRegisterCost {
167 Record *RCDef;
168 unsigned Cost;
169 CodeGenRegisterCost(Record *RC, unsigned RegisterCost)
170 : RCDef(RC), Cost(RegisterCost) {}
171 CodeGenRegisterCost(const CodeGenRegisterCost &) = default;
172 CodeGenRegisterCost &operator=(const CodeGenRegisterCost &) = delete;
173 };
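collectRegisterFiles() fills the RegisterFiles container. Its processing shows that RegCosts is matched with RegClasses by position: each register class without a corresponding entry (including the case where RegCosts is omitted altogether) gets the default cost 1, and surplus entries in RegCosts are ignored. In practice, RegCosts is therefore either omitted or given one entry per register class.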
1513 void CodeGenSchedModels::collectRegisterFiles() {
1514 RecVec RegisterFileDefs = Records.getAllDerivedDefinitions("RegisterFile");
1515
1516 // RegisterFiles is the vector of CodeGenRegisterFile.
1517 for (Record *RF : RegisterFileDefs) {
1518 // For each register file definition, construct a CodeGenRegisterFile object
1519 // and add it to the appropriate scheduling model.
1520 CodeGenProcModel &PM = getProcModel(RF->getValueAsDef("SchedModel"));
1521 PM.RegisterFiles.emplace_back(CodeGenRegisterFile(RF->getName(),RF));
1522 CodeGenRegisterFile &CGRF = PM.RegisterFiles.back();
1523
1524 // Now set the number of physical registers as well as the cost of registers
1525 // in each register class.
1526 CGRF.NumPhysRegs = RF->getValueAsInt("NumPhysRegs");
1527 RecVec RegisterClasses = RF->getValueAsListOfDefs("RegClasses");
1528 std::vector<int64_t> RegisterCosts = RF->getValueAsListOfInts("RegCosts");
1529 for (unsigned I = 0, E = RegisterClasses.size(); I < E; ++I) {
1530 int Cost = RegisterCosts.size() > I ? RegisterCosts[I] : 1;
1531 CGRF.Costs.emplace_back(RegisterClasses[I], Cost);
1532 }
1533 }
1534 }
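The positional cost assignment of lines 1529-1532 can be checked in isolation. The helper `buildRegisterCosts()` below is hypothetical; it uses string names instead of Record pointers and pairs instead of CodeGenRegisterCost, but applies the same rule: cost I comes from RegCosts[I] when present, otherwise 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mirror of lines 1529-1532: costs are matched to RegClasses by
// position, a class without a corresponding cost gets the default cost 1,
// and surplus cost entries are ignored.
std::vector<std::pair<std::string, unsigned>>
buildRegisterCosts(const std::vector<std::string> &RegClasses,
                   const std::vector<int64_t> &RegCosts) {
  std::vector<std::pair<std::string, unsigned>> Costs;
  for (std::size_t I = 0; I < RegClasses.size(); ++I) {
    int64_t Cost = I < RegCosts.size() ? RegCosts[I] : 1;
    Costs.emplace_back(RegClasses[I], static_cast<unsigned>(Cost));
  }
  return Costs;
}
```

For JFpuPRF this yields costs 1, 1 and 2 for VR64, VR128 and VR256; for JIntegerPRF, which gives no costs, both classes get cost 1.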
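3.5.1.9.2. Processor retire control unit definitions
RetireControlUnit describes the retire control unit.
A retire control unit specifies the size of the reorder buffer and the maximum number of opcodes that can be retired per cycle. A ReorderBufferSize value less than or equal to zero means that the size is unknown; the idea is that external tools can then fall back to the MicroOpBufferSize field of the SchedModel.
A zero or negative MaxRetirePerCycle value means there is no limit on the number of instructions retired per cycle.
Each scheduling model can optionally specify at most one RetireControlUnit instance.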
528 class RetireControlUnit<int bufferSize, int retirePerCycle> {
529 int ReorderBufferSize = bufferSize;
530 int MaxRetirePerCycle = retirePerCycle;
531 SchedMachineModel SchedModel = ?;
532 }
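How a consumer might interpret these two fields can be sketched as follows. This is a hypothetical reading of the conventions just described, not an API of LLVM or of any external tool: `RetireInfo`, `effectiveBufferSize()` and `retireLimit()` are invented names, and "no limit" is represented by an empty std::optional (C++17).

```cpp
#include <cassert>
#include <optional>

// Hypothetical consumer-side view of a RetireControlUnit definition.
struct RetireInfo {
  int ReorderBufferSize; // <= 0 means "unknown"
  int MaxRetirePerCycle; // <= 0 means "no limit"
};

// Fall back to the SchedModel's MicroOpBufferSize when the size is unknown.
int effectiveBufferSize(const RetireInfo &RCU, int MicroOpBufferSize) {
  return RCU.ReorderBufferSize > 0 ? RCU.ReorderBufferSize : MicroOpBufferSize;
}

// std::nullopt stands for "no restriction on instructions retired per cycle".
std::optional<int> retireLimit(const RetireInfo &RCU) {
  if (RCU.MaxRetirePerCycle <= 0)
    return std::nullopt;
  return RCU.MaxRetirePerCycle;
}
```

A unit declared as RetireControlUnit<64, 2> would keep its own buffer size of 64 and a retire limit of 2, while RetireControlUnit<0, 0> would defer to MicroOpBufferSize and impose no retire limit.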
The X86 target defines the following RetireControlUnit definitions:
61 def JRCU : RetireControlUnit<64, 2>;
The retire control unit can track 64 micro-ops and retire at most two micro-ops per cycle. Reference: Software Optimization Guide for AMD Family 16h Processors.
115 def ZnRCU : RetireControlUnit<192, 8>;
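This unit can track 192 micro-ops. Per cycle, the retire unit can process up to 8 micro-ops committed in order. Reference: Software Optimization Guide for AMD Family 17h Processors.
Note that the retire unit is shared by integer and floating-point operations.
In SMT mode there are 96 entries per thread. The conservative value is nevertheless not used here, because there is currently no way to fully model SMT, so there is no point in trying.
In CodeGenProcModel, the member RetireControlUnit (of type Record*) records the RetireControlUnit definition associated with the processor model. A second definition for the same model is reported with PrintError; the loop then continues, and the later definition replaces the earlier one.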
228 void CodeGenSchedModels::collectRetireControlUnits() {
229 RecVec Units = Records.getAllDerivedDefinitions("RetireControlUnit");
230
231 for (Record *RCU : Units) {
232 CodeGenProcModel &PM = getProcModel(RCU->getValueAsDef("SchedModel"));
233 if (PM.RetireControlUnit) {
234 PrintError(RCU->getLoc(),
235 "Expected a single RetireControlUnit definition");
236 PrintNote(PM.RetireControlUnit->getLoc(),
237 "Previous definition of RetireControlUnit was here");
238 }
239 PM.RetireControlUnit = RCU;
240 }
241 }
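3.5.1.9.3. Processor performance counter definitions
Processors provide so-called performance counters to help locate performance bottlenecks. The TD files therefore provide the following definitions: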
535 class PfmCounter {
536 SchedMachineModel SchedModel = ?;
537 }
PfmCycleCounter is derived from this base class; a processor can define it to describe how cycles are measured.
541 class PfmCycleCounter<string counter> : PfmCounter {
542 string Counter = counter;
543 }
A processor can also define PfmIssueCounter to describe how issued micro-ops are measured.
547 class PfmIssueCounter<ProcResourceUnits resource, list<string> counters>
548 : PfmCounter{
549 // The resource units on which uops are issued.
550 ProcResourceUnits Resource = resource;
551 // The list of counters that measure issue events.
552 list<string> Counters = counters;
553 }
In the file X86PfmCounters.td, X86 defines sets of performance counters for the processors SandyBridgeModel, HaswellModel, BroadwellModel, SkylakeClientModel, SkylakeServerModel and BtVer2Model. Taking SandyBridgeModel as an example:
14 let SchedModel = SandyBridgeModel in {
15 def SBCycleCounter : PfmCycleCounter<"unhalted_core_cycles">;
16 def SBPort0Counter : PfmIssueCounter<SBPort0, ["uops_dispatched_port:port_0"]>;
17 def SBPort1Counter : PfmIssueCounter<SBPort1, ["uops_dispatched_port:port_1"]>;
18 def SBPort23Counter : PfmIssueCounter<SBPort23,
19 ["uops_dispatched_port:port_2",
20 "uops_dispatched_port:port_3"]>;
21 def SBPort4Counter : PfmIssueCounter<SBPort4, ["uops_dispatched_port:port_4"]>;
22 def SBPort5Counter : PfmIssueCounter<SBPort5, ["uops_dispatched_port:port_5"]>;
23 }
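The container PfmIssueCounterDefs (of type std::vector<Record*>) in CodeGenProcModel holds these issue counter definitions, while the member PfmCycleCounterDef is a single Record pointer; a second cycle counter for the same model is a fatal error.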
1573 void CodeGenSchedModels::collectPfmCounters() {
1574 for (Record *Def : Records.getAllDerivedDefinitions("PfmIssueCounter")) {
1575 CodeGenProcModel &PM = getProcModel(Def->getValueAsDef("SchedModel"));
1576 PM.PfmIssueCounterDefs.emplace_back(Def);
1577 }
1578 for (Record *Def : Records.getAllDerivedDefinitions("PfmCycleCounter")) {
1579 CodeGenProcModel &PM = getProcModel(Def->getValueAsDef("SchedModel"));
1580 if (PM.PfmCycleCounterDef) {
1581 PrintFatalError(Def->getLoc(),
1582 "multiple cycle counters for " +
1583 Def->getValueAsDef("SchedModel")->getName());
1584 }
1585 PM.PfmCycleCounterDef = Def;
1586 }
1587 }
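The different treatment of the two counter kinds can be sketched with a hypothetical flattened record. `PfmDef`, `ProcCounters` and `collectPfm()` are invented for illustration, the fatal error becomes an exception, and both kinds are handled in one loop, whereas the original uses two loops (which does not change the collected result).

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical flattened PfmCounter record.
struct PfmDef {
  std::string Name;
  std::string SchedModel;
  bool IsCycleCounter; // PfmCycleCounter vs. PfmIssueCounter
};

struct ProcCounters {
  std::vector<std::string> IssueCounters; // role of PfmIssueCounterDefs
  std::string CycleCounter;               // role of PfmCycleCounterDef ("" = none)
};

// Sketch of collectPfmCounters(): issue counters accumulate per model, while a
// second cycle counter for the same model is an error.
std::map<std::string, ProcCounters> collectPfm(const std::vector<PfmDef> &Defs) {
  std::map<std::string, ProcCounters> PM;
  for (const PfmDef &D : Defs) {
    ProcCounters &C = PM[D.SchedModel];
    if (!D.IsCycleCounter) {
      C.IssueCounters.push_back(D.Name);
      continue;
    }
    if (!C.CycleCounter.empty())
      throw std::runtime_error("multiple cycle counters for " + D.SchedModel);
    C.CycleCounter = D.Name;
  }
  return PM;
}
```

Fed with part of the SandyBridgeModel definitions shown above, the model ends up with one cycle counter and one entry per issue counter.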
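3.5.1.9.4. Checking completeness
The loop at lines 1662-1699 goes over all CodeGenProcModel instances and skips models whose CompleteModel bit is 0 (line 1664). For each remaining model, the nested loop at lines 1666-1697 goes over all instruction definitions: line 1667 filters out instructions without scheduling information, and line 1669 filters out instructions the current processor does not support. The remaining code checks whether the processor model provides scheduling information for each of the other instructions, through a non-empty Writes list, a real itinerary, or an InstRW for this model. Missing information is reported, and an incomplete model ends in a fatal error at line 1709.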
1659 void CodeGenSchedModels::checkCompleteness() {
1660 bool Complete = true;
1661 bool HadCompleteModel = false;
1662 for (const CodeGenProcModel &ProcModel : procModels()) {
1663 const bool HasItineraries = ProcModel.hasItineraries();
1664 if (!ProcModel.ModelDef->getValueAsBit("CompleteModel"))
1665 continue;
1666 for (const CodeGenInstruction *Inst : Target.getInstructionsByEnumValue()) {
1667 if (Inst->hasNoSchedulingInfo)
1668 continue;
1669 if (ProcModel.isUnsupported(*Inst))
1670 continue;
1671 unsigned SCIdx = getSchedClassIdx(*Inst);
1672 if (!SCIdx) {
1673 if (Inst->TheDef->isValueUnset("SchedRW") && !HadCompleteModel) {
1674 PrintError("No schedule information for instruction '"
1675 + Inst->TheDef->getName() + "'");
1676 Complete = false;
1677 }
1678 continue;
1679 }
1680
1681 const CodeGenSchedClass &SC = getSchedClass(SCIdx);
1682 if (!SC.Writes.empty())
1683 continue;
1684 if (HasItineraries && SC.ItinClassDef != nullptr &&
1685 SC.ItinClassDef->getName() != "NoItinerary")
1686 continue;
1687
1688 const RecVec &InstRWs = SC.InstRWs;
1689 auto I = find_if(InstRWs, [&ProcModel](const Record *R) {
1690 return R->getValueAsDef("SchedModel") == ProcModel.ModelDef;
1691 });
1692 if (I == InstRWs.end()) {
1693 PrintError("'" + ProcModel.ModelName + "' lacks information for '" +
1694 Inst->TheDef->getName() + "'");
1695 Complete = false;
1696 }
1697 }
1698 HadCompleteModel = true;
1699 }
1700 if (!Complete) {
1701 errs() << "\n\nIncomplete schedule models found.\n"
1702 << "- Consider setting 'CompleteModel = 0' while developing new models.\n"
1703 << "- Pseudo instructions can be marked with 'hasNoSchedulingInfo = 1'.\n"
1704 << "- Instructions should usually have Sched<[...]> as a superclass, "
1705 "you may temporarily use an empty list.\n"
1706 << "- Instructions related to unsupported features can be excluded with "
1707 "list<Predicate> UnsupportedFeatures = [HasA,..,HasY]; in the "
1708 "processor model.\n\n";
1709 PrintFatalError("Incomplete schedule model");
1710 }
1711 }
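The per-instruction decision of checkCompleteness() for one model with CompleteModel = 1 can be restated as a pure function. `InstSummary` and `missingInfo()` are hypothetical: each boolean precomputes one of the conditions tested in lines 1667-1692, and `ReportUnsetSchedRW` plays the role of `!HadCompleteModel`, so the "No schedule information" error is only raised while the first complete model is checked.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-instruction summary of what checkCompleteness() inspects
// for one processor model.
struct InstSummary {
  std::string Name;
  bool HasNoSchedulingInfo; // hasNoSchedulingInfo = 1
  bool Unsupported;         // excluded via UnsupportedFeatures
  unsigned SCIdx;           // 0 = no scheduling class
  bool SchedRWUnset;        // SchedRW value is unset
  bool HasWrites;           // SC.Writes is non-empty
  bool HasRealItinerary;    // model has itineraries, SC's is not NoItinerary
  bool HasInstRWForModel;   // some InstRW of SC targets this model
};

// Returns the names that would be reported as missing for this model.
std::vector<std::string> missingInfo(const std::vector<InstSummary> &Insts,
                                     bool ReportUnsetSchedRW) {
  std::vector<std::string> Missing;
  for (const InstSummary &I : Insts) {
    if (I.HasNoSchedulingInfo || I.Unsupported)
      continue;
    if (I.SCIdx == 0) {
      if (I.SchedRWUnset && ReportUnsetSchedRW)
        Missing.push_back(I.Name);
      continue;
    }
    if (I.HasWrites || I.HasRealItinerary)
      continue;
    if (!I.HasInstRWForModel)
      Missing.push_back(I.Name);
  }
  return Missing;
}
```

Pseudo instructions and unsupported instructions never appear in the result; an instruction whose class has neither writes, an itinerary, nor a matching InstRW is reported for every complete model.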
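After checkCompleteness() returns, collectOptionalProcessorInfo() immediately returns to the CodeGenSchedModels constructor, which completes the construction of CodeGenSchedModels. The construction of InstrInfoEmitter then completes as well.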