Skip to content
  • Jason Yan's avatar
    scsi: libsas: direct call probe and destruct · 0558f33c
    Jason Yan authored
    In commit 87c8331f
    
     ("[SCSI] libsas: prevent domain rediscovery
    competing with ata error handling") introduced disco mutex to prevent
    rediscovery competing with ata error handling and put the whole
    revalidation in the mutex. But the rphy add/remove needs to wait for the
    error handling which also grabs the disco mutex. This may leads to dead
    lock.So the probe and destruct event were introduce to do the rphy
    add/remove asynchronously and out of the lock.
    
    The asynchronously processed workers makes the whole discovery process
    not atomic, the other events may interrupt the process. For example,
    if a loss of signal event inserted before the probe event, the
    sas_deform_port() is called and the port will be deleted.
    
    And sas_port_delete() may run before the destruct event, but the
    port-x:x is the top parent of end device or expander. This leads to
    a kernel WARNING such as:
    
    [   82.042979] sysfs group 'power' not found for kobject 'phy-1:0:22'
    [   82.042983] ------------[ cut here ]------------
    [   82.042986] WARNING: CPU: 54 PID: 1714 at fs/sysfs/group.c:237
    sysfs_remove_group+0x94/0xa0
    [   82.043059] Call trace:
    [   82.043082] [<ffff0000082e7624>] sysfs_remove_group+0x94/0xa0
    [   82.043085] [<ffff00000864e320>] dpm_sysfs_remove+0x60/0x70
    [   82.043086] [<ffff00000863ee10>] device_del+0x138/0x308
    [   82.043089] [<ffff00000869a2d0>] sas_phy_delete+0x38/0x60
    [   82.043091] [<ffff00000869a86c>] do_sas_phy_delete+0x6c/0x80
    [   82.043093] [<ffff00000863dc20>] device_for_each_child+0x58/0xa0
    [   82.043095] [<ffff000008696f80>] sas_remove_children+0x40/0x50
    [   82.043100] [<ffff00000869d1bc>] sas_destruct_devices+0x64/0xa0
    [   82.043102] [<ffff0000080e93bc>] process_one_work+0x1fc/0x4b0
    [   82.043104] [<ffff0000080e96c0>] worker_thread+0x50/0x490
    [   82.043105] [<ffff0000080f0364>] kthread+0xfc/0x128
    [   82.043107] [<ffff0000080836c0>] ret_from_fork+0x10/0x50
    
    Make probe and destruct a direct call in the disco and revalidate function,
    but put them outside the lock. The whole discovery or revalidate won't
    be interrupted by other events. And the DISCE_PROBE and DISCE_DESTRUCT
    event are deleted as a result of the direct call.
    
    Introduce a new list to destruct the sas_port and put the port delete after
    the destruct. This makes sure the right order of destroying the sysfs
    kobject and fix the warning above.
    
    In sas_ex_revalidate_domain() have a loop to find all broadcasted
    device, and sometimes we have a chance to find the same expander twice.
    Because the sas_port will be deleted at the end of the whole revalidate
    process, sas_port with the same name cannot be added before this.
    Otherwise the sysfs will complain of creating duplicate filename. Since
    the LLDD will send broadcast for every device change, we can only
    process one expander's revalidation.
    
    [mkp: kbuild test robot warning]
    
    Signed-off-by: default avatarJason Yan <yanaijie@huawei.com>
    CC: John Garry <john.garry@huawei.com>
    CC: Johannes Thumshirn <jthumshirn@suse.de>
    CC: Ewan Milne <emilne@redhat.com>
    CC: Christoph Hellwig <hch@lst.de>
    CC: Tomas Henzl <thenzl@redhat.com>
    CC: Dan Williams <dan.j.williams@intel.com>
    Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
    Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
    0558f33c