Skip to content
  • Gavin Shan's avatar
    powerpc/eeh: No hotplug on permanently removed dev · d2b0f6f7
    Gavin Shan authored
    
    
    The issue was detected in a bit complicated test case where
    we have multiple hierarchical PEs shown as following figure:
    
                    +-----------------+
                    | PE#3     p2p#0  |
                    |          p2p#1  |
                    +-----------------+
                            |
                    +-----------------+
                    | PE#4     pdev#0 |
                    |          pdev#1 |
                    +-----------------+
    
    PE#4 (have 2 PCI devices) is the child of PE#3, which has 2 p2p
    bridges. We accidentally had less-known scenario: PE#4 was removed
    permanently from the system because of permanent failure (e.g.
    exceeding the max allowd failure times in last hour), then we detects
    EEH errors on PE#3 and tried to recover it. However, eeh_dev instances
    for pdev#0/1 were not detached from PE#4, which was still connected to
    PE#3. All of that was because of the fact that we rely on count-based
    pcibios_release_device(), which isn't reliable enough. When doing
    recovery for PE#3, we still apply hotplug on PE#4 and pdev#0/1, which
    are not valid any more. Eventually, we run into kernel crash.
    
    The patch fixes above issue from two aspects. For unplug, we simply
    skip those permanently removed PE, whose state is (EEH_PE_STATE_ISOLATED
    && !EEH_PE_STATE_RECOVERING) and its frozen count should be greater
    than EEH_MAX_ALLOWED_FREEZES. For plug, we marked all permanently
    removed EEH devices with EEH_DEV_REMOVED and return 0xFF's on read
    its PCI config so that PCI core will omit them.
    
    Signed-off-by: default avatarGavin Shan <gwshan@linux.vnet.ibm.com>
    Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
    d2b0f6f7