分类芯片后端下的文章 - 欢迎来到半导体与芯片的世界

登录

标签搜索

bennyhe

累计撰写 344 篇文章
累计收到 31 条评论

搜索到 9 篇与的结果

2025-09-17
DC/DCT/DCG 差别和联系本文探讨了DC、DCT和DCG在Synopsys工具中的区别和联系，重点在于它们如何优化时序、处理线延迟、物理布局和低功耗设计。DCG在DCT基础上增强堵塞管理和物理约束，适合亚微米工艺下的高级综合优化。1、连线优化2、MUX结构优化3、门级结构优化DC/DCT/DCG 差别和联系转自：https://www.cnblogs.com/wt-seu/p/12812663.html在dc家族系列中，DC_V,DC_E为根本的DC(Design Compiler)对象，具有dc所具有的根本fearture，DC在synopys对象系列中地位，无足轻重，也是业界应用最普遍的综合对象，比拟candence的RC（RTL compiler）有更大的客户群。进入到亚微米工艺下，DCT/DCG已逐步成为优化时序的一种选择。在解释这个成绩之前，就我所接触到的DC相干的license成绩，简述一下synopsys的生财之道。可以说DC是synopsys最挣钱的EDA对象，除根本的fearture须要license之外，一些高等的fearture，都须要额定免费。好比1、compile_ultra2、set_host_number3、design_ware库（又细分为许多种好比低功耗，多比特存放器，和一些IP）。4、DCT5、DCG等等，这些都须要license,并且价钱不菲。人人可以在synopsys官网上看到这些。那末言归正传，DC/DCT/DCG有甚么差别和联系呢?1、起首简略的讲，DCG包括DCT所有fearture，DCT包括DC所有fearture，固然有一些DC的fearture在DCT和DCG中已不再实用，好比wire_load_model的设置。2、从库的角度来看，DCT/DCG比拟DC多了physical library的设置。DCG比拟DCT又多了对layer,congestion相干的设置。3、DCT的涌现重要是处理DC的时序模子中，wire_load_model误差过大的成绩，使得DCT在综合的时刻可以加倍准确斟酌path中线延时，并联合加倍精确的path的时序情形停止优化。而DCG重要是在DCT的基本上处理堵塞成绩，更好的结构布线。4、 DCT/DCG比拟DC都须要输出物理束缚。平日是经由过程ICC做floorplan以后的def文件中抽取物理束缚信息。今朝来看经由过程物理束缚敕令，编写物理束缚已成为鸡肋，重要缘由，这个阶段很难经由过程敕令准确的表述block的结构布线信息。5、低功耗设计中upf/cpf文件的编写，是低功耗设计的根本功。DC/DCT/DCG都支撑低功耗设计。6、DC:dc_shell-t DCT: dc_shell-topo ，必需启动compile_ultra，DCG:差别在与启动DCT后，在compile_ultra 以后多了-spg选项。总之DC/DCT/DCG既有差别又有联系。留意比较中熟习其特点。————————————————版权声明：本文为CSDN博主「北方爷们」的原创文章，遵循CC 4.0 BY-SA版权协议，转载请附上原文出处链接及本声明。原文链接：https://blog.csdn.net/sinat_29862967/article/details/111220492
- 2025年09月17日
- 6 阅读
- 0 评论
- 0 点赞
2025-09-09
我用ChatGPT设计了一颗芯片原创 Hammond半导体行业观察2023年12月26日09：53安徽今年早些时候，我（指代本文作者）正在纽约大学从事博士后工作，其中之一是探索Verilog 大型语言模型的使用。我们对使用 ChatGPT 等 LLM 来设计硬件的各种不同应用程序进行了基准测试，包括规范解释、设计以及错误检测和修复。我们是这个领域的先行者之一，早在 2020 年就开始使用 GPT-2 和 Verilog。我立即对上述帖子产生了兴趣，但由于实际流片的成本过高，我们一直使用 FPGA 和模拟。但是，模拟与现实之间总是存在差距，因此表明LLM和人工智能确实可以生产芯片可能对我们的研究领域来说是一个福音。我们能否使用免费流片的Tiny Tapeout 作为实现此目的的工具，并使用LLM不仅编写 Verilog，还为真正的芯片设计 Verilog？我与我的导师和其他几位博士生进行了交谈，我们集思广益了一些想法。Tiny Tapeout 非常小，只有 1000 个标准单元左右，这意味着设计会受到很大限制，但我们都非常喜欢这个想法，特别是因为似乎还没有人做到过，所以如果我们行动迅速，我们可能会能够做到世界第一！所以，我们决定去做。但现在，还有很多其他事情需要考虑。鉴于设计空间如此之小，我们应该提交什么？还有其他问题。我们从我们自己之前的工作中知道，LLM可以编写像 Verilog 这样的硬件设计语言，但他们只是不太擅长，与 Python 等更流行的语言相比，语法或逻辑错误的发生率要高得多，这就是为什么我我的团队已经为 Verilog 制作了自己的LLM的原因。因此，我们需要决定，如果我们确实想使用LLM来制造芯片，（1）我们应该使用哪个LLM？（2）我们应该给它多大的帮助？（3）我们应该尝试什么prompting strategy？设计方法我们首先决定了LLM。我们不会使用到目前为止我们一直在使用的“自动完成”风格的LLM，主要是 OpenAI 的 Codex 和 Salesforce CodeGen，而是使用更新、更华丽的“对话”/“指导”风格的法学硕士。我们选择了 OpenAI 的 ChatGPT 3.5 和ChatGPT 4 版、Google 的 Bard 以及开源的 HuggingChat。然后我们想出了两种方法。第一种方法是尝试让LLM在一种反馈循环中完成所有事情，从而为LLM提供一个规范，然后为该设计生成设计和测试。然后，人类将在模拟器 (iVerilog) 中运行测试和设计，然后将任何错误返回给LLM。然而，我们从经验中知道，LLM有时也相当愚蠢，并且可能会陷入循环，他们认为自己正在解决问题或改进输出，而实际上他们只是迭代相同的数据。因此我们推测有时我们可能需要回馈“人类援助”。通过一些初步实验，我们决定了一个如下所示的初始流程：理想情况下，人类不需要提供太多输入，但这还有待观察。在硬件流片方面，我们的目标是 Tiny Tapeout 3，它将基于 Skywater 130nm。它有一些限制：前面提到的 1000 个标准单元，以及只有 8 位输入（包括任何时钟或复位）和 8 位输出。Tiny Tapeout 使用 OpenLane，这意味着我们也仅限于可综合的 Verilog-2001。设计什么？在这个实验的早期阶段，我们对与对话式LLM交互的标准化和（理想情况下）自动化流程的潜力感兴趣，该流程将从规范开始并产生该设计的硬件描述语言。鉴于我们有 8 位输入，我们决定使用其中 3 位来控制设计选择多路复用器，以适应 8 个小型基准测试。如果这些进展顺利，我们就会致力于更雄心勃勃的事情。这些是我们提出的基准：每个基准测试都有一个简短的规范来描述它及其 I/O，以及正确的预期行为。然后，纽约大学博士后Jason Blocklove 与四个选定的LLM（ChatGPT-3.5、ChatGPT-4、Bard 和 HuggingChat）坐在一起，执行前面描述的过程，引导LLM首先生成设计，然后生成测试平台，然后将它们一起模拟，并反馈任何错误。有时，谈话中需要考虑特殊情况。由于模型在一次响应中可以给出的输出量受到限制，文件或解释通常会被中断；在这些情况下，模型将提示“请继续”。continue 之后的代码通常从较早消息的最后一行之前开始，因此当代码被复制到文件中进行编译和模拟时，它会被编辑以形成一个内聚块。然而，这个过程没有添加额外的 HDL。同样，有时响应中会包含让用户添加自己的代码的注释。如果这些注释会阻止功能，例如留下不完整的值数组，则将重新生成响应，否则将保持原样。设计提示：我们在这个环节设计的提示词看起来像这样：在试过之后，我们发现返回的设计质量参差不齐，我们发现只有 ChatGPT-4 可以可靠地生成设计。以下是每个LLM的第一个设计（移位寄存器）的示例：1、首先我们看一下ChatGPT 4.0生成的反馈：这个代码是有用的。2.我们再试一下ChatGPT 3.5，生成的结果如下：同样有用。3.我们再试一下BARD，得出以下结果。这是行不通的！请注意输入定义中的错误。4.再看一下HuggingChat从以上代码看来，它似乎已经疯了。测试平台（Testbenches）一旦我们有了设计，我们就必须提示建立一个测试平台，在这里，我们将用以下提示词：Can you create a Verilog testbench for this design? It should be self-checking and made to work with iverilog for simulation and validation. If test cases should fail, the testbench should provide enough information that the error can be found and resolved.以下是完整的对话事例：1.我们先看一下ChatGPT 4.0生成的回复。我们首先问:得到的回复如下：以下是代码的快速细分：1.Define the shift_register module with the specified inputs and outputs.2.Create an always block that responds to the rising edge of the clock (posedge clk) or the falling edge of the active-low reset (negedge reset_n).3.Within the always block, check if the reset signal is low (!reset_n). If so, clear the 8-bit output register (data_out <= 8'b0).4.If the reset signal is not low, check if the shift enable signal is high. If so, shift the data in the output register to the left and insert the new data bit at the least significant bit position (data_out <= {data_out[6:0], data_in}).该设计满足给定的规格，并提供具有所需输入和输出的 8 位移位寄存器。接下来我又问：Can you create a Verilog testbench for this design? It should be self-checking and made to work with iverilog for simulation and validation. If test cases should fail, the testbench should provide enough information that the error can be found and resolved.然后得出下面的回答：该测试平台涵盖四个测试用例：简单移位（shift）：将 1 移入寄存器并检查输出。使用交替位进行移位：移位交替位模式并检查输出。没有shift_enable 的移位：验证移位寄存器在没有shift_enable 信号的情况下不会移位。移位操作期间复位：检查移位操作期间寄存器是否正确复位。如果任何测试用例失败，测试台会提供错误消息。如果所有测试用例都通过。接下来，我们又问了一个问题：When running the simulation it gives the following output. Please provide fixed code.Error: Test case 1 failed. Expected: 10000000, Received: 01111111Error: Test case 2 failed. Expected: 10101010, Received: 01010101Error: Test case 3 failed. Expected: 10101010, Received: 01010101All test cases passed!然后，我们得出以下回应：Apologies for the confusion. It appears that the test cases are failing due to incorrect output expectations. I made a mistake in the test case expectations. Let's correct the test cases and update the testbench code:我根据移位寄存器的正确行为更新了测试用例 1、2 和 3 的预期输出。现在，当您运行模拟时，它应该显示“所有测试用例已通过！” 没有任何错误。而事实上，它也确实通过了，没有任何错误！基本结果总体而言，只有 ChatGPT-3.5 和 ChatGPT-4 可以可靠地生成设计，因此我们最终只分析了这两个模型，并将 Bard 和 HuggingChat 从我们的结果中排除。使用两个 OpenAI 模型，我们完成了全套基准测试 3 次，得到了以下结果：命名法（ Nomenclature）：如果没有报告错误，则设计通过，无需反馈 (NFN：no feedback necessary)。相反，如果这些操作中的任何一个报告错误，它们就会反馈到模型中，并要求“请提供修复。”，称为工具反馈 (TF：tool feedback)。如果相同的错误或类型的错误出现三次，则用户会给出简单的人工反馈（SHF：simple human feedback），通常是通过说明 Verilog 中的哪种类型的问题会导致此错误（例如声明信号时的语法错误）。如果错误继续存在，则提供中等人类反馈 (MHF：moderate human feedback)，并向工具提供稍微更有针对性的信息以识别特定错误。如果错误仍然存在，则提供高级人类反馈 (AHF：advanced human feedback)，该反馈依赖于准确指出位置错误是什么以及修复它的方法。一旦设计经过编译和仿真且没有失败的测试用例，就被认为是成功的。然而，如果高级反馈无法修复错误，或者用户需要编写任何 Verilog 来解决错误，则测试将被视为失败。如果对话超过 25 条消息（符合 OpenAI 每 3 小时 ChatGPT-4 消息的速率限制），则测试也被视为失败。由此可见，ChatGPT-4表现良好。大多数基准测试都通过了，其中大多数只需要工具反馈。ChatGPT-4 在测试平台设计中最需要的人工反馈。几种故障模式是一致的，一个常见的错误是在设计或测试平台中添加了 SystemVerilog 特定的语法。例如，它经常尝试typedef为 FSM 模型创建状态，或实例化向量数组，而这两种情况在 Verilog-2001 中均不受支持。总的来说，ChatGPT-4 生成的测试平台并不是特别全面。尽管如此，通过随附测试平台的大多数设计也被认为是合规的。两个不合规的“passes”是Dice Rollers，它不产生伪随机输出。测试集 T1 中的Dice Rollers将在一次roll中输出 2，然后在所有后续roll中仅输出 1，无论选择何种die。同时，Dice Roller T3 会改变值，但仅限于快速重复的一小部分（取决于所选die）之间。为了闭合设计循环，我们从 Tiny Tapeout 3 的 ChatGPT-4 对话中合成了测试集 T1，添加了由 ChatGPT-4 设计但未经测试的包装器模块（wrapper module ）。整个设计需要 85 个组合逻辑单元、4 个二极管、44 个触发器、39 个缓冲器和 300 个抽头来实现。ChatGPT-3.5的表现明显比 ChatGPT-4 差，大多数对话都导致基准测试失败，并且大多数通过自己测试平台的对话都是不合规的。ChatGPT-3.5 的故障模式与 ChatGPT-4 相比不太一致，每次对话和基准测试之间都会引入各种各样的问题。与 ChatGPT-4 相比，它需要更频繁地修正设计和测试平台。观察结果只有 ChatGPT-4 能够充分满足编写 Verilog 的目的，尽管它仍然需要人类反馈才能使大多数对话成功并符合给定的规范。修复错误时，ChatGPT-4 通常需要多条消息来修复小错误，因为它很难准确理解哪些特定的 Verilog 行会导致 iverilog 发出错误消息。它所添加的错误也往往会在对话之间经常重复出现。ChatGPT-4 在创建功能测试平台方面也比功能设计付出了更多努力。大多数基准测试几乎不需要对设计本身进行修改，而是需要修复测试平台。对于 FSM 来说尤其如此，因为该模型似乎无法创建一个测试平台来正确检查输出，而无需有关状态转换和相应预期输出的重要反馈。另一方面，ChatGPT-3.5 在测试平台和功能设计方面都遇到了困难。更复杂的东西：QTcore-A1在基准测试期间，我是 ChatGPT-4 的学生，现在我已准备好接受更大的挑战，并着手让它为微控制器创建组件。我想知道非结构化对话是否可以提高模型的性能水平，使用一种共同的创造力来更快地编写设计。在这里我要指出的是，我是一位在小型玩具/学术处理器设计方面经验丰富的工程师，曾在奥克兰大学、纽约大学以及现在我所在的澳大利亚悉尼南威尔士大学教授该领域的学生多年。这就是我想要 ChatGPT-4 构建的。知道我们面临着极端的空间限制，我决定采用基于 8 位累加器的架构。使用冯诺依曼设计（共享程序和数据内存）可以节省额外的空间，然后将共享内存限制为仅 32 字节是我对合适的东西的猜测。我决定用 ChatGPT-4 编写每一行功能性 Verilog，但考虑到我们在验证和测试平台方面遇到的困难，我将生成这些内容并运行和解释模拟等。为什么要进行这种设计而不是 RISC-V 之类的设计？众所周知，ISA 的开源处理器有很多实现，例如 RISC-V 和 MIPS。问题是，这意味着 GPT-4 在训练期间已经看到了这些设计。对于这项工作，我不想简单地探索 ChatGPT-4 发出其训练过的数据的能力。相反，我想看看它在制作更新颖的东西时的表现。因此，我使用 ChatGPT-4 本身提供的奇怪 ISA 来引导模型进行全新的设计，我认为这与开源文献中提供的内容完全不同。开始，我问道：Let us make a brand new microprocessor design together. We're severely constrained on space and I/O. We have to fit in 1000 standard cells of an ASIC, so I think we will need to restrict ourselves to an accumulator based 8-bit architecture with no multi-byte instructions. Given this, how do you think we should begin?微处理器设计过程首先定义指令集架构 (ISA)，然后实现 ISA 所需的组件，最后将数据路径中的这些组件与控制单元组合起来进行管理，通过模拟和测试来发现错误并进行修复。构建对话对话线程（Conversation threading:）：鉴于 ChatGPT-4 与其他 LLM 一样具有固定大小的上下文窗口，我们假设提示模型的最佳方法是将较大的设计分解为子任务，每个子任务都有自己的“对话线程”界面。这使总长度保持在 16,000 个字符以下。当长度超过此值时，专有的后端方法会执行某种文本缩减，但其实现的细节很少。由于 ChatGPT-4 不在线程之间共享信息，人类工程师会将相关信息从前一个线程复制到新的第一条消息中，从而形成一个“基本规范”，慢慢地定义处理器。基本规范最终包括 ISA、寄存器列表（累加器ACC、程序计数器PC、指令寄存器IR）、存储体、ALU 和控制单元的定义，以及处理器在每个周期中应执行的操作的高级概述。本规范中的大部分信息由 ChatGPT-4 生成，并由人工复制/粘贴和轻微编辑。主题（Topics）：每个线程一个主题对于处理器的早期设计阶段效果很好（有一个例外，其中 ALU 是在与多周期处理器时钟周期时序计划相同的线程中设计的）。然而，一旦处理器进入模拟阶段并在其上运行程序，我们就发现了规范和实现中的错误和错误。设计工程师没有开始新的对话线程并重建先前的上下文，而是选择在适当的情况下继续先前的对话线程。我们在下面的流程图中对此进行了说明，其中“Cont. T. ID”列指示他们是否“Continued”前一个线程（如果是，则为哪个线程）。重新启动（Restarts）：有时 ChatGPT-4 会输出次优响应。如果是这样，工程师有两个选择：(1) 继续对话并推动它修复响应，或者 (2) 使用界面强制 ChatGPT-4“重新启动”响应，即通过假装先前的结果来重新生成结果答案从未出现。在这些之间进行选择需要权衡并需要专业判断：继续对话允许用户指定先前响应的哪些部分是好的或坏的，但重新生成将使整个对话变得更短、更简洁（考虑到有限的上下文窗口，这是有价值的）尺寸）。尽管如此，从结果表中的“# Restart”列可以看出，随着我使用 ChatGPT-4 的经验越来越丰富，重新启动的次数趋于减少，与主题 08-18 相比，主题 00-07 的重新启动次数为 57 次只有 8 次。在主题 04（控制信号规划）中，单条消息的最高个体重启次数为 10，其中包含以下消息：This looks excellent. According to this list, please produce the module definition for a control unit in Verilog which could operate the processor datapath. Please comment the purpose of each I/O. If a signal is for controlling a multiplexer, please also comment what each possible value should correspond to in the datapath.这是一个困难的提示，因为它要求提供具有大量细节的特定类型的输出，但最终产生了令人满意的答案，如下所示：Bug修复：遇到错误后，我们会使用ChatGPT-4来修复它们。这里给出了一个例子。这是基于对话 15，并演示了如何解决移位寄存器原语中的错误。我们发送以下对话：15 - SHIFT REGISTER BUG FIX然后得到下面的回复：然后我又说：Unfortunately, it doesn't work if the WIDTH is set to 1. Can you fix it?之后得到了下面的回复：现在，代码将正确处理 WIDTH 设置为 1 的情况。之后我继续说：Thanks! Can you also make it so scan_enable has a higher priority than the normal enable?然后得到了下面的回复：现在，“scan_enable”信号比“enable”信号具有更高的优先级。当两个信号都为高电平时，移位寄存器将工作在扫描模式而不是正常模式。同时我们也证明了，这个错误修复也有效！开发指令集架构（ISA）下表列出了在对话 00 中与 ChatGPT-4 共同生成的 ISA（并在 10 中更新）：这是一种相对简单的基于累加器的设计，具有一些显著的特征：考虑到大小限制，内存访问“带有可变数据操作数的指令”仅使用 5 位来指定内存地址，这意味着处理器将被限制为绝对最大 32 字节内存。只有一条具有立即数据编码的指令。这些指令使用完整的 256 种可能的字节编码。JSR指令使得实现子例程调用成为可能，尽管有点笨拙（没有堆栈指针）。分支指令有限制但很有用。向后跳过两条指令可以实现高效轮询（例如加载输入，屏蔽相关位，然后检查是否为 0）。向前跳过 3 条指令可以跳过 JMP 或 JSR 所需的指令。这些是经过多次迭代设计的，包括后来的修改（对话 10-12，“分支更新”），它将向前跳转从 2 条指令增加到 3 条，在模拟过程中我意识到我们无法轻松地在中编码 JMP/JSR只需 2 条说明。LDAR 指令允许对内存加载进行类似指针的取消引用。这使我们能够有效地使用内存映射中的常量表（在对话 17 中添加）将二进制值转换为 7 段显示器的 LED 模式。当尝试在其中编写程序时，感觉就像是用于 PIC 微控制器系列的程序的变体。ChatGPT-4 实际上也为我编写了汇编程序，我可以做得更好（它确实用起来很糟糕，但它确实有效 - 请参阅对话 09）。我将该处理器的实现称为 QTCore-A1（cutie core）。这是最终产生的数据路径（控制信号用虚线表示 - 使用摩尔型多周期（moore-type multicycle ） FSM 来控制它们）。在设计处理器时，我确保每个寄存器也通过扫描链连接（也是由 ChatGPT-4 设计的！）。这意味着我可以在实现后对设计进行编程，这也是我在模拟期间加载测试程序的方式。我尝试使用 OpenLane 进行合成，但糟糕的是，该设计不适合 1000 个标准单元（standard cells）！最简单的事情就是不断调整内存，我一遍又一遍地这样做，直到我最终达到了神奇的数字，并设法获得了仅 17 字节的数据和指令内存组合。经过 OpenLane 综合后，GDS 如下所示：我编写了一些测试程序，很快意识到我需要一些重复出现的常量值。玩了之后我还发现，内存映射中的常量值并没有寄存器占用那么多空间！因此，我设法将一些常量辅助值（包括“1”和“0”）放入内存映射中。这意味着我可以用该死的汇编语言为我下载到 FPGA (CMod-A7) 的处理器编写这个小程序。尽管我还必须实现一个编程器，我使用的是 STM32！在实际测试中，该设计是工作的。所以我很高兴，它在模拟和 FPGA 上都能工作，所以我很高兴地将它发送到 Tiny Tapeout进行流片。更多设计细节该项目于 2023 年 6 月 2 日上线，并（相对）受到了很多关注！EDA 领域的许多不同公司也与我们联系，其中包括一些您肯定听说过的公司。值得一提的是，我们还决定利用这个设计去参赛，因为该参赛对内核有一些规定。所以在QTcore-A1上，我们修改了微控制器，以便它能够占用平台中更大的可用区域（仅使用一个可用空间的一部分）。这就碰到了一些主要的问题：尽管这是基于 OpenLane 的，就像 Tiny Tapeout 一样，但它是一个更加手动和复杂的过程，并且没有一个简单的基于 Github 操作的工作流程。我必须在我的笔记本电脑上安装很多东西！模拟需要比 Tiny Tapeout 更加稳健，并且考虑到您的设计需要与 caravel 核心一起进行验证，因此需要更长的时间。我最基本的模拟仍然需要超过 45 分钟，而 Tiny 的模拟大约需要 10 秒流片。这是一场竞赛，参赛作品的评判标准是它们的文档记录、可使用性、对开源的贡献等。所以我还必须确保这方面的一切都很好！然后，我决定让 ChatGPT-4 对 QTCore-A1 进行以下更改。首先，内存大小将升级为256字节共享指令/数据内存，分为16字节页面；其次，我会添加一些外设：一个 16 位定时器、一些 I/O 端口，并且考虑到我的日常工作是硬件安全研究员，我还决定添加 2 个八位“内存执行保护”控制寄存器为 16 个页面中的每个页面提供“执行”位，并更新原始的、被诅咒的分支逻辑。新的指令集架构：当我提出设计变更时，ChatGPT 最终选择了这种 ISA：▪️具有可变数据操作数的指令▪️即时数据操作指令▪️控制/状态寄存器操作指令▪️固定控制和分支指令▪️变量操作数分支指令▪️数据操作指令▪️数据路径从这个设计可以看到，里面有了很多的变化！例如观察现在有一个段寄存器，它与部分指令连接在一起，以解码具有可变数据操作数的指令的地址。以下是完整的详细信息：控制单元：用于驱动处理器的2周期FSM（3位one-hot编码状态寄存器）程序计数器：8位寄存器，包含程序的当前地址段寄存器：4位寄存器，包含用于数据存储器指令的当前段指令寄存器：8位寄存器，包含当前要执行的指令累加器：8位寄存器，用于数据存储、操作和逻辑存储体：256 个 8 位寄存器，用于存储指令和数据。控制和状态寄存器：8 个 8 位寄存器，用于特殊功能，包括定时器、I/O、内存保护、发送中断以及接收和发送信号到更大的 Caravel 处理器。控制/状态寄存器 (CSR) 及其地址：SEGEXE_L (000)：8 位 - 表示指定为可执行文件的内存段的下半部分。寄存器中的每一位对应内存空间下半部分的一个段。如果某个位设置为 1，则相应的段被标记为可执行。SEGEXE_H (001)：8 位 - 表示指定为可执行文件的内存段的高半部分。寄存器中的每一位对应内存空间上半部分的一个段。如果某个位设置为 1，则相应的段被标记为可执行。IO_IN (010)：8 位 - UART（或任何通用 I/O 设备）操作的输入寄存器。这用于从外部设备读取数据。IO_OUT (011)：8 位 - UART（或任何通用 I/O 设备）操作的输出寄存器。这用于将数据写入外部设备。CNT_L (100)：8 位 - 16 位计数器寄存器的低 8 位。这可用于存储计数值的下半部分，可用于计时操作或编程中的循环等。CNT_H (101)：8 位 - 16 位计数器寄存器的高 8 位。这可用于存储计数值的上半部分，类似于 CNT_L 寄存器。STATUS_CTRL (110)：8 位 - 用于保存 CPU 中不同操作状态的控制寄存器。这些位是：{SIG_OUT[7:2], CNT_EN[1], IRQ_OUT[0]}。SIG_OUT 位用于向较大的 Caravel 处理器发送信号（6 位）。CNT_EN 位用于使能计数器。IRQ_OUT 位用于向较大的 Caravel 处理器发送中断。SIG_IN (111)：8 位 - 这里的 8 位可以来自更大的 Caravel 处理器。这可用于向 CPU 发送信号，例如作业开始、作业结束等。使用汇编程序的示例编程GPT-4 生成的汇编器简化了为 QTCore-C1 编写汇编程序的过程。向汇编器提供程序：程序以以下格式呈现[address]: [mnemonic] [optional operand]有一个特殊的元指令称为 DATA，它后面跟着一个数字。如果使用的话，只需将该号码放在该地址即可。程序不能超过内存的大小（在QTCore-C1中，这是256字节）。存储器包含指令和数据。示例程序在此程序中，观察我们如何通过 SETSEG 读写 I/O、定时器和数据存储器。我们还通过 CSW 将内存段设置为可执行，然后跳转到不可执行的段以使处理器崩溃。如图所示：然后，我们终于拿到了这个芯片。基本测试和圣诞节 LED 显示屏！我需要测试的第一件事是我实际上可以与我的芯片对话，就像我在模拟中所做的那样。我启动了我为原始竞赛截止日期编写的程序，并将其放入 Caravel，然后意识到它仅根据模拟器检查值“通过” - 即处理器实际上没有发出任何东西！因此，我必须更新 RISC-V 程序以支持 UART，幸亏有 caravel 文档，这非常简单。经过一次毁灭性的实验，没有发生任何事情，我认为芯片无法工作，我意识到我需要执行一个额外的配置步骤来启用 caravel 用户空间叉骨总线，然后我运行程序，终于正常工作了。很难描述在我面前有一块我参与设计的工作硅片是多么令人惊奇，特别是因为我以前从未真正设计过任何流片。如果没有像 ChatGPT 这样的LLM来激励我去尝试，我也许也不会这么做。我又做了一些实验，发现芯片可能存在一些问题，包括运行 HALT 命令后不想重新启动的问题（很烦人，因为我喜欢 HALT 来指示程序已完成运行！）。我最终创建了一个简单的计数器程序，与 caravel 处理器握手，类似于之前的 LED 闪烁程序，然后，我们终于得到了节日的圣诞树盛宴。我们的 QTCore-C1 设计是一个基于 8 位累加器的架构，可以充当主 Caravel 核心的一种可预测协处理器。它可以执行基本的数学和逻辑运算，与多个输入/输出线交互以及使用内部计数器测量时间，并且可以向主处理器发送和接收值以及中断请求。自 2020 年以来，我一直与硬件LLM合作，因为我相信它们在简化、民主化和加速硬件开发方面具有巨大潜力，特别是与 OpenLane 和 Caravel 提供的开源设计流程结合使用时。我也不认为我是唯一持这种观点的人。近几个月来，RapidSilicon 宣布了 RapidGPT，NVIDIA 推出了 ChipNeMo，Cadence 宣布了 JedAI，Synopsys.AI 也已推出。所有这些都是现实世界的商业企业，旨在将LLM带入硬件领域。我对接下来发生的事情感到非常兴奋。原文链接https://01001000.xyz/2023-12-21-ChatGPT-AI-Silicon/来自微信
- 2025年09月09日
- 3 阅读
- 0 评论
- 0 点赞
2025-09-08
博文速递：Power Plan 原创Feroz Ahmed IP与SoC设计2023年05月24日12：02江苏作者：Feroz Ahmed Choudhary PD Engineer(VLSI)Power Plan ·To connect Power to the Chip by considering issues like EM and IR Drop ·Power Routing also called Pre-Routing ·Pre-Routing includes creating Power Ring, Stripes/Mesh/Grid, and Standard Cell Power Rails ·Power Planning also includes Power Via insertion ·IO Rings are established through IO Cell abutment and through IO Filler Cells ·Power Trunks are constructed between Core Power Ring and Power Pads ·Technical information required for Power Planning: ·Total Dynamic Power info. will get from Compiler ·Technology File will provide Current Density (JMAX) ·LEF will prove the Metal Layer width ·Technology Library will provide Core VoltageLevels of Power DistributionRings ·VDD and VSS Rings are formed around the Core and MacroStripes ·Carries VDD and VSS around the chip ·Carries VDD and VSS from Rings across the chip ·Power Stripes are created in the Core Area to tap power from Core Rings to the core areaRails (Special Route) ·Connect VDD and VSS to the standard cell ·Standard Cell Rails are created to tap power from Power Stripes to Std. Cell Power/Ground PinsPower Vias ·Insert all Power Vias between Ring & Grid, Grid & Rail and Vertical Grid & Horizontal GridTrunks ·Connects Ring to Power PadCore Power ManagementVDD and VSS power rings are formed around the core and macro. In addition to this straps and trunks are created for macros as per the power requirement. Std cell rails are created to tap power from power straps to std cell power/ground pins.I/O Power ManagementPower rings are formed for I/O cells and trunks are constructed between core power ring and power pads.Power Planning Involves ·calculating number of power pins required ·Number of Rings, Stripes ·width of ring and stripes ·IR DropInputs of Power plan ·Netlist & SDC ·.lib , .lef & tech file ·Tlu+ file ·UPFIdeal Power distribution network has the following properties ·maintain a stable voltage with little noise ·Avoid wear out from EM and self heating ·Consume little chip area and wiring ·Easy to layoutPower Information ·The power information can obtain from the front end design. ·the synthesis tool reoprts static power information ·dynamic power can be calculated using value change dump (VCD) or switching activity interchange format (SAIF) file in conjunction with RTL description and test bench. ·Exhaustive test coverage is required for efficient calculation of peak power. this methodology is depicted in figure as shown below.PowerPlan : CalculationsUPF ContentsPower intent specifies ·Power Distribution Architecture: ·Power domains – Group of elements which share a common set of power supply requirements ·Supply rails – Power distribution (ports, nets, sets & switches) ·Shutdown control ·Power Strategy ·Power state tables – Legal combination of states of each power domain ·Operating voltages ·Usage of Special cells ·Isolation cells ·Level shifters ·Power switches ·Retention registersIsolation cells ·Powered off domains do not drive their outputs anymore and these outputs become floating nodes. This could be a problem when other active domains gets these floating nodes as input. It could result in crowbar current which affects the proper functioning of the powered up domain. ·Isolation cells (also called “clamps”) keep the turned off sub-block outputs at a predefined value. This is how the shut-down sub-block does not corrupt other active sub-block functionality. ·Isolation cells are powered by a constant supply and drive 0, 1 or latch the old value of the turned off domain. ·Isolation cells pass logic values during the normal mode of operation, but clamp it to some specified value when a control signal is asserted. ·Isolation cell clamps the output of powered down block to a specified value (‘0’, ‘1’, last) ·Gate type clamp cells (AND, OR) ·Transistor type clamp (pull-up, pull-down)Level Shifters ·Level shifters have the minimal functionality of a buffer. ·Necessary as most low-power designs have multi-voltage domains and/or employ dynamic voltage scaling. ·A level shifter swings a logic value in one voltage domain to the same logic value in another voltage domain. ·An ‘Up’ level shifter swings a logic value from a lower voltage domain to the same logic value in a higher voltage domain. ·A ‘Down’ level shifter swings a logic value from a higher voltage domain to the same logic value in a lower voltage domain.Internal circuit of level shifter.Retention Registers ·In order to reduce power consumption, memories are shut down where their power domain is switched off or when they are not in use. Registers are corrupted when power is switched off. Corruption is typically represented as ‘X’ (unknown). ·Some memories need to retain their values for fast wake-up. For these memories, only the memory array stays powered on during the shut-down while the peripheral interfaces are powered off. ·Retention registers keep their previous active value after being shut down. ·Retention registers save state information before a power domain is switched off and restore it when the power is turned back on. ·Retention registers comprise of two circuits. ·Standard register logic, supplied by primary power VDD ·Shadow latch retention circuitry, with alternate supply VDDB ·SAVE – transfers FF content into shadow latch during shutdown ·RESTORE – transfers state from shadow latch to FF when powered back onPower Switches ·Power switches are required to Gate the power supply of gated domain when not required. power switches are MT-CMOS (Multi Threshold) cells, which will have very high threshold voltage when device is OFF & very low threshold voltage when deviceis ON. ·power switches are inserted in power mesh & supply to all gated domain cells will be through power switches. hence a single/few switches are not enough. A strong network of power switchs connected in daisy chain fashion will be inserted in the design.There are 2 types of power switches header and footerHeader ·The header switch is implemented by PMOS transistors to control Vdd supply. ·PMOS transistor is less leaky than NMOS transistor of a same size. Header switches turn off VDD and keep VSS on. As the result, it allows a simple design of a pull-down transistor to isolate power-off cells and clamp output signals ·The disadvantage of the header switch is that PMOS has lower drive current than NMOS of a same size, though difference is reduced by strained silicon technology. As a result, a header switch implementation usually consumes more area than a footer switch implementation.Footer ·The footer switch is implemented by NMOS transistor to control VSS supply. ·The advantage of footer switch is the high drive and hence smaller area. ·However, NMOS is leakier than PMOS and a Designs become more sensitive to ground noiseFrequently Used Power Reduction TechniquesPower Gating: ·In a processor chip, certain areas of the chip will be idle and will be activated only for certain operations. But these areas are still provided with power for biasing. ·The power gating limits this unnecessary power being wasted by shuting down power for that area and resuming whenever needed. ·It is used for reducing LEAKAGE POWER or power consumption by switching off power supply to the non operational power domain of the chip during certain mode of operation. ·Header & footer switches, isolation cells, state retention flip flops are used for implementing power gating.Clock Gating: ·Clock gating limits the clock from being given to every register or flops in the processor. It disables the clock of an unused device. In clock gating the gated areas will still be provided with bias power. ·It is used for reducing DYNAMIC POWER by controlling switching activities on the clock path. ·Generally gate or latch or flip flop based block gating cells are used for implementing clock gating. ·50% of dynamic power is due to clock buffer. Since clock has highest toggle rate and often have higher drive strength to minimize clock delay. And the flops receive clocks dissipates some dynamic power even if input and output remains the same. Also clock buffer tree consumes power. One of the techniques to lower the dynamic power is clock gating. ·In load enabled flops, the output of the flops switches only when the enable is on. But clock switches continuously, increasing the dynamic power consumption. ·By converting load enable circuits to clock gating circuit dynamic power can be reduced. Normal clock gating circuit consists of an AND gate in the clock path with one input as enable. But when enable becomes one in between positive level of the clock a glitch is obtained. ·To remove the glitches due to AND gate, integrated clock gate is used. It has a negative triggered latch and an AND gate. ·Clock gating makes design more complex. Timing and CG timing closure becomes complex. Clock gating adds more gates to the design. Hence min bit width (minimum register bit width to be clock gated) should be wisely chosen, because the overall dynamic power consumption may increase.Voltage and Frequency Scaling: ·It changes the voltage and clock frequency to match the performance requirements for a given operation so as to minimize leakage. ·Different blocks are operated at variable supply voltages. The block voltage is dynamically adjusted based on performance requirements. ·Frequency of the block is dynamically adjusted. Works alongside with voltage scaling. Substrate Biasing: ·It changes the threshold voltage to reduce leakage current at the expense of slower switching times.Multiple Threshold Voltages: ·Uses different Vt in the circuit to reduce leakage but still satisfy timing constraints.Multiple Supply Voltages: ·Using Multi VDD reduces power consumption by powering down the not used voltage domain. Different blocks are operated at different supply voltages. Signals that cross voltage domain boundaries have to be level shifted.Memory Partitioning: ·The memory is split into several partitions. Not used ones can be powered down.Types of Power Dissipation:The power dissipation is classified in two categories: ·Static power dissipation ·Dynamic power dissipationStatic Power Dissipation:In this class, power will be dissipated irrespective of frequency and switching of the system. It is continuous and has become more dominant at lower node technologies. The structure and size of the device results in various leakage currents.Few reasons for static power dissipation are: ·Sub-threshold current ·Gate oxide leakage ·Diode reverse bias current ·Gate induced leakageIts hard to find the accurate amount of leakage currents but it mainly depends on supply voltage (VDD), threshold voltage (Vth), transistor size (W/L) and the doping concentration.Dynamic Power Dissipation:There are two reasons of dynamic power dissipation; Switching of the device and short circuit path from supply (VDD) to ground (VSS). This occurs during operation of the device. Signals change their logic state charging and discharging of output mode capacitor.Short-circuit Power Dissipation:Because of slower input transition, there will be certain duration of time “t”, for which both the devices (PMOS and NMOS) are turned ON. Now, there is a short circuit path from VDD to VSS. This short circuit power is given by:where, Vdd – Supply voltage, Isc – Short-circuit current and t – Short-circuit time. Short-circuit power is directly proportional to rise time and fall time.Switching Power Dissipation:Energy is drawn from the power supply to charge up the output mode capacitance. Charging up of the output cap causes transition from 0V to VDD. So, the power dissipated during charging and discharging of total load [output capacitance + net capacitance + input capacitance of driven cell(s)] is called Switching power dissipation. The switching power is given by:where, α – Switching activity factor, f – Operating frequency, VDD – Supply voltage & Cload – Load capacitance.Fundamental sources of power supply noise are IR Drop & Ldi/dt noise.IR Dropthe power supply in the chip is distributed uniformly through metal layers across the design and these metal layers have their finite amount of resistance. when we applied the volatge the current starts flowing through these metal layers and some voltage is dropped due to that resistance of a metal wire and current. this drop is called IR Drop. Because of IR drop, delay will increase and it violates the timing and this will increase noise and performance will be degraded.There are 2 types of IR Drop. ·Static IR Drop: This drop is independent of cell switching and this is calculated with the help of metal own resistance. ·Dynamic IR Drop: This drop is calculated with the help of the switching of cells. when a cell is switching at the active edge of the clock the cell requires large current or voltage to turn on but due to voltage drop suffficient amount of voltage is not reached to thr particular cell and cell may be goes into metastable state and effect the timing and performance.Methods to improve static IR Drop ·We can go for higher layers if available. ·Increase the width of the straps. ·Increase the number of wires. ·Check if any via is missing then add more via.Methods to improve Dynamic IR Drop ·Use De-Cap Cells. ·Increase the number of straps.Tools Used for IR Drop Analysis ·Redhawk from Apache ·Volatage storm from cadenceElectromigrationWhen a high density of current is flowing through metal layers, the atoms (electron) in the metal layers are displaced from their origional position causing open and shorts in the metal layers. Heating also accelerates EM because higher temperature cause a high number of metal ions to diffuseIn higher technology nodes we saw the EM on power and clock metal layers but now in lower nodes the signal metal layers are also needed to be analyzes due to an increased amount of current density.clock nets are more prone to EM because they have high switching activity because of this only we are avoiding to use high drive strength clock buffers to build the clock tree.Methods to solve EMIncrease the width of wireBuffer insertionDownsize the driverSwitch the net to higher metal layersAdding more vias来自微信
- 2025年09月08日
- 4 阅读
- 0 评论
- 0 点赞
2025-08-29
博文速递：Clock Gating VLSI SoC Design IP与SoC设计 2022年05月09日 12:10 江苏Clock GatingClock signal is the highest frequency toggling signal in any SoC. As we discussed in the post: Need for Low-Power Design Methodology, the capacitive load power component of the dynamic power is directly proportional to the switching frequency of the devices. This implies that clock path cells would contribute maximum to the dynamic power consumption in the SoC. Power consumption in the clock paths alone contribute to more than 50% of the total dynamic power consumed within modern SoCs. Power being a very critical aspect of the design, one needs to make prudent efforts to reduce this. Clock Gating is one such method. Let's try and build further on this perspective.Clock feeds the CLOCK pins all the Flip-Flops in the design. Clock Tree itself comprises of clock tree buffers which are needed to maintain a sharp slew (numerically small) in the clock path. Refer to the post Clock Transition for details. Consider the above figure. It is not necessary that the output of the flip-flop would be switching at all times. Modern devices support various low-power modes in which only a certain part of your SoC is working. This may include some key features pertaining to security or some critical functional aspects of your device. Apart from this, there are some configuration registers in your device which need to be programmed either once or very seldom. So, let's say, the above FF will not be switching states for a considerable period of time. If it is used the way it is, what's the problem? Power! Clock is switching incessantly. Clock Tree buffers are switching states and hence consuming power. So are the FFs. Remember that FF itself is made up of latches. So, despite the fact that input and output of the FF is not switching, some part of the latch is switching and consuming power.What could be done to alleviate the above problem? Clock Gating is one such solution. Here's how it'll help.If you place an AND gate at the clock path and knowing that you don't need a certain part of your device to receive clock, drive a logic '0' on the ENABLE pin. This would ensure that all the Clock Tree buffers and the sink pin of the FF are held at a constant value (0 in this case). Hence these cells would not contribute to dynamic power dissipation. However, they would still consume leakage power.Similarly, you can place an OR gate and drive it's one input to logic 1. Again, you would save on the dynamic power.However, a word of caution. The output of the AND gate feeding the entire clock path might be glitchy. See the following figure:Solution: The output won't be glitchy if the enable signal changes only when the CLOCK signal is low. So, all you gotta make sure is that ENABLE is generated by a negative-edge triggered FF. This would ensure that the signal is changing after the fall edge of the CLOCK signal.Similarly, while using an OR-gate, clock pulse would be propagated if the ENABLE signal changes when the CLOCK is high. Make sure that it is generated by a positive-edge triggered FF in order to avoid any glitch being passed onto the FFs. Why would a glitch be detrimental anyway? The answer is:Glitches constitute an edge! FF might sample the value because they are edge-triggered. But, problem is that all FFs have a certain duty cycle requirement (Also called Pulse-width check), which needs to be fulfilled in order to ensure that they don't go into METASTABILITY. And if an unknown state : X is propagated in a design, the entire functionality of the chip can go haywire!Some terminologies: •AND/NAND gate based clock gating is referred to as Active-High Clock Gating. •OR/NOR gate based clock gating is referred to as Active-Low Clock Gating.NAND and NOR clock gates work similar to AND and OR respectively.So, Clock Gating is an efficient solution to save dynamic power consumption in the design. Modern SoCs have many IPs integrated together. Placing a clock gate and enabling them in various possible combinations is what gives rise to different low-power modes in the device.If any content of this article infringes, please contact us to remove this article.来自微信
- 2025年08月29日
- 4 阅读
- 0 评论
- 0 点赞
2025-08-08
ASIC设计学习总结(包括：工具及书籍文档推荐、软件环境搭建、RTL设计、验证、工艺库说明、形式验证、综合等共12部分）原创tfpwl——lj EET0P2019年02月07日09：57之前介绍过了了芯片设计全流程介绍（芯片设计全流程详解包括：正向流程和反向流程）。由于当时的经验十分有限，所以对于正向设计，特别是对于从RTL级代码开始的设计介绍得不是很清楚。经过这一段时间的学习，对于从RTL级代码开始的Asic芯片设计有了更多的认识，现在总结一下，一方面给自己整理思路，另一方面也希望抛砖引玉，让大家各抒己见，分享一下各自的设计经验，促进我们的共同进步。笔者原本打算详细的介绍学习中的收获，将各类知识点写成文章，但经过反复考量之后，发现这种详细照搬别人知识的做法其实没什么意思。与其原封不动的转述别人书中的内容，不如直接提供参阅的书目，这样就免得在转述别人的知识的时候误解作者想要表达的真正意义。所以本系列的文章，笔者将个人学习的经验加以整合，将经验知识点罗列出来，并附加所参阅的书籍。另外涉及到实践操作过程的章节，将会把具体使用的工具罗列出来，并附上一些参考性的代码和脚本。本系列文章主要分为十二个部分，分别为：（一）工具及书籍文档推荐（二）软件环境搭建（三）RTL设计（四）验证（五）工艺库说明（六）形式验证（七）综合（八）可测性设计（九）低功耗设计（十）静态时序分析（十一）数模混合仿真（十二）可测性设计介绍这么多，不是显得个人有多少经验，其实本人也只是菜鸟，关注这么多内容，主要是为了让自己的知识储备更全面一下，这样考虑设计问题的时候遗漏的东西会更少一些。在本系列文章中，每个章节的详略是不同的,主要是跟个人工作经验有关，有介绍得简单的地方，麻烦大家帮忙补充。每个部分的内容基本采用“理论+工具+示例”的行文结构，有些EDA工具笔者没有使用过的就暂不能提供实例了。（一）工具及书籍文档一、前言对于RTL级的Asic设计所涉及到的软件是非常之多的，笔者也并没有每一个都使用过。二、工具介绍RTL代码规则检查工具：nlint，spyglass。这两个软件主要是用于检查代码的语法和语义错误的，并且比其他的工具能检测出更多的问题，比如说命名规格，时序风险，功耗等。详细介绍请参考软件的使用教程，nlint有Windows版和linux版，软件的linux版本和使用教程可以在eetop上搜索到。RTL代码仿真工具：这类仿真工具有较多的组合，比如说：qustasim/modelsim，NC_verilog+Verdi，VCS+DVE,VCS+Verdi等等。目前笔者使用的组合是VCS+Verdi。这两个软件是业内主流的仿真软件，还可以结合UVM库进行仿真，当然这是验证方法学的内容。综合工具：Design Complier。最常用的综合工具，没有之一，该软件主要是将RTL代码“翻译+优化+映射”成与工艺库对应的门级网表。并且还包含功耗分析软件Power Complier和边界扫描寄存器插入软件 BSD Complier。可测性设计：DFT Complier + TetraMAX。软件在DC之后使用，DFT Complier 用于将设计的内部寄存器替换成扫描寄存器并组成一条或多条扫描链，TetraMAX是用于自动生成测试向量的。形式验证工具：Formality、Conforml（candence出品）。等价性验证工具，主要是在DFT Complier插入扫描链之后进行验证，另外，在版图综合时钟树，插入BUFFER之后，也需要用该工具进行等效性验证。静态时序分析工具: Prime Time。业界最常用的时序分析工具之一，该软件包括功耗分析PTPX工具，功耗分析必备。cadence也有对应的时序分析工具——Encounter Timing System。自动布局布线工具（APR）：ICC，Enconter。其中Encounter是Cadence公司的。数模混合仿真: nanosim + VCS，nanosim的升级版为XA。这是一篇有关于synopsysEDA工具软件的介绍，希望对于EDA软件的用途不清楚的伙伴有帮助。http://bbs.eetop.cn/thread-151171-1-1.html三、书籍推荐《Verilog HDL 硬件描述语言》《设计与验证Verilog HDL》《企业用verilog代码风格规范》《verilog语言编码风格》《verilogHDL代码风格规范》《Verilog HDL高级数字设计》《Soc设计方法与实现》《高级ASIC芯片综合》《华为Verilog典型电路设计》《数字IC系统设计》《数字集成电路--电路、系统与设计》《专用集成电路设计实用教程》《集成电路静态时序分析与建模》《CMOS集成电路后端设计与实战》《makefile教程》《鸟哥的私房菜》《SystemVerilog与功能验证》《UVM实战》《通信IC设计(上下册)》《数字图像处理与图像通信》《数字信号处理的FPGA实现中文版》各类Synopsy userguide，EETOP有16年版的。三、工艺库说明使用DC，PT，FM，ICC或者ENCOUNTER软件需要工艺库文件，主要包括数字逻辑单元文件，符号库，综合库，寄生电容参数库，版图文件LEF，milkway库等等。有关工艺库各文件夹的作用，笔者将会在将“工艺库说明”的章节进行详细介绍，如果有遗漏还请大家包涵。（二）环境搭建一、前言先介绍一下个人的使用环境。由于网络上已经存在很多安装教程，笔者就不再废话，直接给出他们的连接，并附带其他需要注意的关键点，如果有安装问题，请追问。Synopsys软件安装包下载地址在笔者前一篇文章“工具及书籍文档”，都是来自EETOP的大牛们提供的。在安装的过程中需要具备一些Linux系统的使用经验，不然会很难理解这些步骤是做什么的。个人的环境如下：1、vmware 12；2、RHEL6.5系统；3、synopsys软件，Lib Complier，VCS，Verdi，Desgin Complier，PrimeTime，Formality，ICC。一共7个软件，几乎都是15年版本的。二、步骤环境搭建需要准备以下三件事：1，vmware12虚拟机安装；安装教程如下。https://jingyan.baidu.com/article/215817f78879c21edb142379.html2, RHEL6.5操作系统安装，当然也可以使用CentOs6.5，安装教程如下。http://www.linuxidc.com/Linux/2016-05/131701.htm ——》RHEL6.5https://www.kafan.cn/edu/488101.html ——》Centos6.5a、安装vmware tools。在虚拟机中把系统安装好了之后，需要安装vmware tools，安装教程如下，http://www.linuxidc.com/Linux/2015-08/122031.htm ，安装该软件之后才可以启用共享文件夹以利于RHEL6.5与windows系统进行文件交换。b、更新YUM源，RHEL和Centos都需要更新YUM源，操作步骤一致，YUM是一个链接到软件库的一个软件，随后安装软件需要用到。https://jingyan.baidu.com/article/b24f6c8239c6aa86bee5da60.html注意：该教程某些步骤可能会失效，需要结合自己具体的情况使用。安装好YUM之后，可以使用yum install gvim命令测试一下。c、安装GCC,G++,这两个软件在VCS+Verdi仿真时会调用到。命令：yum install gcc命令：yum install gcc-c++3、 Synopsys软件安装Synopsys软件安装教程，链接如下：https://wenku.baidu.com/view/c02c271d9b6648d7c0c74670.htmlhttp://bbs.eetop.cn/thread-553702-1-1.html高版本和低版本的Synopsy软件安装步骤一致，区别在于license的问题。用EETOP上的最新license即可使用15版的软件。在使用RHEL操作系统需要懂一些SHELL脚本，makefile脚本，这样便于提高操作效率，后文会提到。注意：此外还需要修改四个文件的hostname，使得这四处的hostname保持一致。a、synopsys.dat中的第一行hostname；b、synopsys.bashrc中的export SNPSLMD_LICENSE_FILE=27000@localhost行，“@”符号后的hostname;c、/etc/sysconfig/network配置文件中hostname;d、/etc/hosts配置文件中的127.0.0.1这一行的，第三个参数hostname；这四个hostname一定要一致，才能正确启动DC,PT,FM,ICC,VCS,VERDI软件。在启动DC,PT,FM,ICC,VCS,VERDI软件之前需要先启动Synopsys的license管理器。有关软件的使用教程可以参考官方的userguide。或者EETOP上，小伙伴们的教程。（三）工艺库说明（略，请点击阅读原文查看）（四）RTL设计数字电路设计RTL设计所需要的理论知识庞杂而繁多，本文所介绍的内容均由个人参阅了许多书籍之后加以整合的，很多内容本人也不是很熟，只是罗列出来作为参考学习的资料。主要有三个部分的内容，第一部分主要是数字电路设计的基础，这是在大学时期应该予以掌握的内容，第二部分是进阶的学习内容附带一个专业方向——MCU，第三部分是有关于各类算法处理的专业知识，需要更多的复合型知识，例如通信方向需要有较好的数学功底—傅立叶变换。由于这部分内容实在太多，个人没有能力也没有必要将每一部分的内容都详细的罗列出来，所以这里只是整理出一些需要把握的关键点。至于具体的内容，还请大家按照个人需求，参阅推荐的各类书籍。一、基础组合逻辑与时序逻辑：布尔代数，卡诺图，基本与非门，锁存器，触发器，冲突与冒险。——《Verilog HDL高级数字设计》Verilog语言基础：数值类型，表达式与运算符，assign语句，always语句，if-else语句,case语句，阻塞与非阻塞。——《Verilog HDL 硬件描述语言》状态机：一段式、二段式、三段式状态机的区别；独热码、二进制码、格雷码的区别及应用场合。——《Verilog HDL高级数字设计》同步电路和异步电路：两者的本质，异步电路跨时钟域，亚稳态。——IC_learner博客复位与时钟：同步复位、异步复位、异步复位同步释放的区别，时钟分频——二分频、三分频、任意整数分频，门控时钟，时钟切换。——《深入浅出玩转FPGA》,百度文档数据通路与控制通路：本质上任何数字电路都可以划分为简单的两种类型——控制通路与数据通路，控制通路的核心是状态机，数据通路是各类算术处理算法、并行总线等等。——《Verilog HDL高级数字设计》Testbench验证：无论什么电路，最终都需要验证其功能的正确性。Testbench的结构主要由a,复位和时钟，b,激励产生电路，c，系统监视器，d，结果比较电路，e,波形产生函数，f，待验证的MODULE等主要模块组成，其中，b是最重要的模块，一切验证都是从激励信号开始的。——《verilogHDL代码风格规范》。初学者推荐使用windows版qustasim 或者modelsim 仿真工具，简单又方便，以后可学习使用VCS+Verdi（比较折腾人）。二、进阶代码风格：良好的代码风格很有必要，参考一下企业用的代码风格，有助于个人养成良好的编码习惯。——《企业用verilog代码风格规范》《verilog语言编码风格》基本常用电路：具备以上庞杂的理论基础之后，需要积累一些常用的基础电路。——《华为Verilog典型电路设计》接口电路，I2C,UART,SPI：接口电路是中小规模芯片常用的对外接口电路，无论是与上位机（PC）通信还是控制其它芯片。I2C从机常用于EEPROM芯片中，主机可以直接使用单片机模拟，ARM单片机直接集成了I2C主机，I2C的IP代码网络上有现成的；UART是全双工电路，宏晶单片机通过UART进行烧录，SPI电路最常用于SD卡上。——《Verilog HDL高级数字设计》《通信IC设计(上下册)》有简单的UART和SPI的代码。RISC,8051 MCU ——IP：通过下载EETOP上相关的IP及文档来学习。三、专业数值的表示方法：浮点数，定点数的表示办法——《Verilog HDL高级数字设计》《通信IC设计(上下册)》算术处理算法：浮点数的加法、乘法电路设计。——《Verilog HDL高级数字设计》通信算法：FIR滤波器，IIR滤波器，傅立叶变换，冗余编码等等各种通信方向必须掌握的。——《通信IC设计(上下册)》《数字信号处理的FPGA实现》图像处理算法：静态图像，动态图像去噪。——《数字图像处理与图像通信》SOC：SOC类芯片的组成结构，AMBA总线，IP复用，SV验证。——《Soc设计方法与实现》四、工具：文档代码编辑器：GVIM，Notpad++RTL设计规则检查：Nlin,spyglass（五）验证（1）一、前言借助于前文RTL设计中提到的UART代码，本章节将在后面给出对应的testbench以及说明如何在questa/modelsim、VCS+DVE、VCS+Verdi工具中使用。推荐书籍：《vcs User Guide 2016》二、TestbenchTestbench的结构，正如上文提到的，主要由a,复位和时钟，b,激励产生电路，c，系统监视器，d，结果比较电路，e,波形产生函数，f，待验证的MODULE,g,控制仿真时间这几个部分组成。本章节提供的testbench只包含a,b,e,f，g部分，至于c,d更高级的内容，暂时无法涉及，questa/modelsim将不会使用到e部分的代码，使用questa/modelsim仿真时要屏蔽掉全部e段的内容。同样，在使用VCS+DVE进行仿真时要屏蔽VCS+VERDI的e段内容三、工具使用3.1modelsim仿真对于modelsim仿真, 仿真文件包含：1，verilog源文件(前文已全部提供)；2，testbench文件（后面会提供）modelsim使用教程：https://wenku.baidu.com/view/db638e25b9d528ea81c779cc.html有关在modelsim软件中如何使用本示例请参考以上教程。仿真结果图：3.2 VCS+DVE和VCS+VERDI仿真对于VCS+DVE和VCS+VERDI, 仿真文件包含：1，verilog源文件(前文已全部提供)；2，testbench文件（后面会提供），3，包含verilog、testbench文件路径的uart.f文件（必要时需自行修改），4，makefile仿真启动文件。在终端中运行make命令即可运行仿真，一定要注意文件路径问题。makefile教程：http://blog.csdn.net/liang13664759/article/details/1771246VCS+DVE 使用教程，https://wenku.baidu.com/view/48912cf558fb770bf68a55b4.htmlDVE是VCS软件自带的波形查看器。本章实例对应的VCS+DVE makefile启动脚本：all:VCS DVE VCS: vcs -f uart.f -full64 -debug_all -R DVE: dve -vpd wave.vpd -mode64 将以上内容复制到文本文件中，并将该文本文件改名为makeflile。uart.f内容:/home/Lance/synopsys/UART/testbench.v //必须放在文件中的第一行。/home/Lance/synopsys/UART/UART_XMTR.v/home/Lance/synopsys/UART/Control_Unit.v /home/Lance/synopsys/UART/Datapath_Unit.v/home/Lance/synopsys/UART/UART_RCVR.v/home/Lance/synopsys/UART/Control_Unit2.v/home/Lance/synopsys/UART/Datapath_Unit2.v DVE波形查看器启动命令：dve -vpd wave.vpd -mode64此外，在运行makefile启动脚本之前，还需要在testbench中添加如下代码：initialbegin $vcdplusfile("wave.vpd");//保存的波形文件名字 $vcdpluson(1,tb);//tb对应testbench文件的内的module名字 end该段代码为e,波形产生函数，主要是生成DVE波形查看器使用的VPD格式的波形文件。仿真结果图：VCS+Verdi，Verdi是debussy的升级版，是一个独立的软件，这对软件组合使用方式与VCS+DVE差不多。VCS+Verdi makefile启动脚本：all:VCS VERDI VCS:vcs +v2k -sverilog -debug_all -P /usr/synopsys/Verdi/K-2015.09/share/PLI/VCS/LINUX64/novas.tab /usr/synopsys/Verdi/K-2015.09/share/PLI/VCS/LINUX64/pli.a +vcs+lic+wait \ -f uart.f -y ./ +libext+.v -full64 -RVERDI: verdi -f uart.f -ssf wave.fsdb & 将以上内容复制到文本文件中，并将该文本文件改名为makeflile。注意:-P /usr/synopsys/Verdi/K-2015.09/share/PLI/VCS/LINUX64/novas.tab /usr/synopsys/Verdi/K-2015.09/share/PLI/VCS/LINUX64/pli.a主要是调用Verdi的接口函数以生成fsdb波形。 Verdi波形查看器启动命令：verdi -f uart.f -ssf wave.fsdb &此外，在运行makefile启动命令前，还需要在testbench中添加如下代码：initialbegin $fsdbDumpfile("wave.fsdb"); $fsdbDumpvars(0,tb); end 以生成Verdi波形查看器使用的FSDB格式的波形文件。注意：启动脚本相关问题，需要学习makefile有关内容，有关VCS和Verdi的详细使用教程，还请参考其它资料。 (七) 综合（八）形式验证（九）数模混合仿真（十）静态时序分析（十一）低功耗设计（十二）可测性设计（由于篇幅关系，以上章节请点击阅读原文前往作者博客查看）推荐阅读：关注EETOP公众号，后台输入芯片，查看如下文章ASIC前后端设计经典的细节讲解IC大牛10多年的设计分享：数字典型电路知识结构地图及代码实现关于华为海思，这篇文章值得一看俄国没有高端芯片，为什么却能造出一流武器？别拦我，我要做芯片！芯片春秋·ARM传中国芯酸往事印度芯酸往事国防军工芯片行业深度报告一位美国芯片公司华人高管对中国芯片行业的思考学习、积累、交流-IC设计高手的成长之路女生学微电子是一种什么体验？MIPS架构开放了，10天设计一款完全免费的MIPS处理器（附源码）性能之殇：从冯·诺依曼瓶颈谈起AI芯片设计与开发概览AI 芯片和传统芯片有何区别？一个资深工程师老王关于AI芯片的技术感悟隔隔壁老王：AI芯片与她怎么选？终于有人把云计算、大数据和人工智能讲明白了！尺寸减半、功率翻番！——氮化镓技术的现在和未来逻辑综合 Design Compiler 资料大全集成电路制造技术简史版图中Metal专题——线宽选择麒麟980内核照片：NPU在哪呢？有哪些只有IC工程师才能get到的梗？为什么7nm工艺制程这么难？从7nm看芯片行业的“贫富差距”什么是台积电的SoIC？ RISC-V打入主流市场的诸多问题RISC-V架构有何优势？关于RISC-V 终于有人讲明白了！ASIC低功耗设计实例分析及书籍推荐ASIC设计学习总结之可测性设计及书籍推荐ASIC设计学习总结之静态时序分析概要及书籍推荐ASIC设计学习总结之工具及书籍文档小芯片大价值 | ASIC工程师如此值钱到底为什么？芯片面积估计方法简介自主研发通信芯片有多难？通信行业老兵告诉你，没那么简单！RISC-V精简到何种程度？能省的都省了！多核CPU设计及RISC-V相关资料时序设计与约束资料汇总模拟版图讲义GDSII转DEF的flow简介机器学习将越来越依赖FPGA和SoCVerilog基本功之：流水线设计Pipeline Design先进封装发展趋势分析PPT先进封装发展现状分析PPT可测试性设计与ATPG麒麟980是如何诞生的？敢于失败，勇于尝试！（附：华为早期型号处理器研发过程）IC模拟版图设计讲义 Verilog CPU设计实例。。。。（共260篇）
- 2025年08月08日
- 2 阅读
- 0 评论
- 0 点赞

1
2