[MTK] AP suspend异常debug及log分析
文摘 Android Kernel MediaTek 2020-08-24 阅读:20855[DESCRIPTION]
一般待机会遇到如下几个问题:
- 系统无法待机
- 唤醒较多,导致待机平均电流偏大
- AP 睡下去后漏电等导致底电偏高
针对这几个问题,提供几种debug分析方法
[SOLUTION]
1、定位系统无法待机原因
1.1 设备连接powermonitor可以直接从电流图判断休眠情况的定位方法
如果从电流图看到系统无法suspend,插入USB通过如下cmd dump当前阻止休眠的wakelocks:
#adb shell
#cat /sys/kernel/debug/wakeup_sources |sed -e 's/"^ "/"unnamed"/g' | awk '{print $6 "\t" $1}' | grep -v "^0" |sort -n
例如:
active_since name
24 WLAN
79 mt6360_pmu_chg.2.auto
24271 PowerManagerService.WakeLocks (PMS持锁)
35558 musb-hdrc (USB持锁,因插usb,属于正常现象)
如果有PMS持锁,则再执行如下cmd dump持锁的应用:
#dumpsys power | grep "PARTIAL_WAKE_LOCK"
例如:如下GMS应用持锁
PARTIAL_WAKE_LOCK 'wake:com.google.android.gms/.auth.account.be.accountstate.LoginAccountsChangedIntentService' ACQ=-15s910ms (uid=10177 (displayid =0 pid=4937 pkg=com.google.android.gms uid=10177)
1.2 kernel log定位方法
在对应时间段log中搜索"active wakeup source",可以定位到阻止系统休眠的wakelock:
active wakeup source: WLAN timeout
active wakeup source: PowerManagerService.WakeLocks
active wakeup source: ccci_poll
2、快速定位suspend/resume耗时较长问题
执行如下adb cmd:
#adb shell "echo 1 > /sys/module/kernel/parameters/initcall_debug"
#adb shell "echo 1 > /sys/power/pm_print_times"
拔掉USB灭屏待机,确认系统suspend后,插入usb执行如下cmd:
#adb shell
#dmesg | grep 'call ' | awk '{print $4 "\t" $8}' | sort -k 2 -rn > /data/result.txt
#adb pull /data/result.txt
打开result文件,可以看的每个device resume耗时(如下),找到耗时长的函数请相关owner确认是否可以优化:
例如:
dpm_name usecs
bootdevice+ 25318
mt_charger+ 6406
13000000.mfg_lorne+ 5208
mt-pmic+ 3334
3、唤醒kernel log分析
note:如下介绍以MT6853 kernel log为例,旧平台log或有差异,详见具体item介绍;
搜索"suspend wake up by"关键字,可以找到AP的唤醒原因、上次suspend时长、AP睡下去后subsystem休眠行为等信息
<6>[244518.391440] -(0)[5430:Binder:535_2][name:spm&][SPM] suspend wake up by R12_EINT_EVENT_B, timer_out = 4591536, r13 = 0x84006a80, debug_flag = 0xfc1f9bfc 0x4000006e, r12 = 0x40, r12_ext = 0x0, raw_sta = 0x0 0x0 0x0, idle_sta = 0x341b3, req_sta = 0x1c01000 0x0 0x50000017 0x1f0000 0x0, cg_check_sta =0x0, isr = 0x0, rt_req_sta0 = 0x0 rt_req_sta1 = 0x0 rt_req_sta2 = 0xffffffff rt_req_sta3 = 0xffffffff dram_sw_con_3 = 0x0, raw_ext_sta = 0x184255, wake_misc = 0x310000, pcm_flag = 0x488400 0x0 0x488000 0x488000, req = 0x0, clk_settle = 0x60fe, wlk_cntcv_l = 0x94cb344f, wlk_cntcv_h = 0x559, 26M_off_pct = 0
<6>[244518.391597] -(0)[5430:Binder:535_2][name:spm&][SPM] suspend warning:(OneShot) System LPM is blocked by conn
<6>[244518.391605] -(0)[5430:Binder:535_2][name:spm&][SPM] Suspended for 140.122 seconds
<6>[244518.391615] -(0)[5430:Binder:535_2][name:spm&][SPM] md_slp_duration = 4430012
<6>[244518.391629] -(0)[5430:Binder:535_2][WMT-CONSYS-HW][I]mtk_wmt_resume: mtk_wmt_resume !!
<6>[244518.391719] -(0)[5430:Binder:535_2]alarmtimer_fired.: type=1, count=19, wakeup count=12, func=devalarm_alarmhandler
<6>[244518.392756] -(0)[833:mtk_wmtd][WMT-CONSYS-HW][I]mtk_wcn_consys_sleep_info_read_all_ctrl:TOP:0,0;MCU:305,4384893;BT:0,0;WIFI:507,4472662;GPS:0,4591656
<6>[244518.393447] (0)[833:mtk_wmtd][WMT-CONSYS-HW][I]consys_dump_osc_state:0x180c1340: 0xe010b (BT)
<6>[244518.428292] (4)[108:irq/143-mt6358-]mt6358-pmic 10026000.pwrap:mt6359-pmic: Reg[0x534]=0x1,name=rtc,hwirq=64,type=4
<6>[244518.544532] (1)[5430:Binder:535_2][name:scp&][mt_scp_dump_sleep_count:988] - scp_sleep_cnt_0 = 26862754
(1)唤醒原因:R12_EINT_EVENT_B表示本次由外部中断唤醒,具体中断为PMIC mt6359 rtc,即本次因alarmtimer到期唤醒系统,需从android log或者bugreport中对应时间点log定位alarmtimer所属应用;
如,AlarmManager: Native set alarm :Alarm{42786ea8 type 0 com.sina.weibo}
ps:旧平台搜索关键字或有差别,可能是"[SPM] wake up by"。
注意"mcusys wake up by"关键字为系统退出idle状态时打印的log,非AP suspend唤醒;
(2)suspend时长:timer_out = 4591536 表示上次AP suspend的时长(32k计数,除以32768可转换为秒数)。Suspended for 140.122 seconds打印转换为单位秒的suspend时长;
ps:旧平台或许只打印32k计数suspend时长 ,没有转换为单位秒的suspend时长;
(3)唤醒瞬间子系统工作状态:如若log中debug_flag = 0xfc1f9bfc最后两个字符不是ff,则表示唤醒前一瞬间子系统有起来工作;System LPM is blocked by conn打印具体子系统名称(conn subsys为wifi/bt/gps/fm四合一子系统);
ps:旧平台或许没有打印block子系统名称,则需要通过R13(如 r13 = 0x84006a80)确认到底哪个子系统:
R13每个bit代表的意义可以在${platform}_pcm_def.h中找到
(4)AP休眠期间子系统行为:26M clk为所有系统模块提供clock,26M_off_pc表示AP suspend期间26M clock buffer关闭的比例,比例越高理论耗电越小;如上log26M_off_pct = 0则表示AP休眠期间子系统clk一直有开启,可以通过如下log定位哪个子系统模块的问题:
- md_slp_duration = 4430012; AP休眠期间modem休眠时长,32k计数,除以AP suspend时长即为modem休眠比例;
- mtk_wcn_consys_sleep_info_read_all_ctrl:TOP:0,0;MCU:305,4384893;BT:0,0;WIFI:507,4472662;GPS:0,4591656; AP休眠期间connsys各个模块休眠时长,32k计数;log可见BT期间一直没有休眠,需要BT owner帮忙进一步分析;
- scp_sleep_cnt_0 = 26862754;scp(sensorhub)core0休眠总次数,可以对比休眠前scp休眠次数确认是否scp导致问题;
PS:旧平台或无此log打印;
4、常见唤醒源
其余AP suspend常见唤醒源:
- R12_CONN2AP_SPM_WAKEUP_B: connsys唤醒,如应用需要使用wifi联网,需从netlog定位联网应用;
- R12_SCP2SPM_WAKEUP_B: scp sensor相关唤醒,如VOW唤醒系统;
- R12_CCIF0_EVENT_B: modem ccci唤醒,需从log中继续查找ccci
channel,详细分析参考[FAQ21806]
- R12_AP2AP_PEER_EVENT_B:modem数据业务唤醒,旧平台可能叫R12_CLDMA_EVENT_B,需从netlog定位联网应用;
5、AP suspend后漏电辅助排查
(1)系统进入suspend之前默认会打印没有关闭的MTCMOS/PLL/CLKBUF,可以从中排查是否有异常点
note:大部分时候AP睡下去时MD/CONN MTCMOS还没有关闭属于正常现象
(0)5430:Binder:535_2[mt6853_suspend_prompt:127] - suspend enter
(0)5430:Binder:535_2 suspend warning: XO_WCN is on!!
(0)5430:Binder:535_2 suspend warning: XO_EXT is on!!
(0)[5430:Binder:535_2]suspend warning[0m: PG_MD1
(0)[5430:Binder:535_2]suspend warning[0m: PG_CONN
(0)5430:Binder:535_2[mt6853_suspend_reflect:160] - prepare suspend resume
(0)5430:Binder:535_2[mt6853_suspend_reflect:179] - resume
(2)通过如下cmd在suspend前check更多信息,如gpio、pmic setting等
- gpio dump disable/enable:
echo gpio_dump 0/1 > /sys/kernel/debug/suspend/suspend_state - golden dump disable/enable:
echo golden_dump 0/1 > /sys/kernel/debug/suspend/suspend_state - golden type setting (default)PMIC[0], CG[1], DCM[2]:
echo golden_type 1/3/7 > /sys/kernel/debug/suspend/suspend_state
6、快速釐清系統無法進入Suspend原因
當系統成功進入Suspend狀態時,我們可以觀察到幾種現象 :
- Uart debug console 凍結無法使用
Log 資訊顯示 :
- Disabling non-boot CPUs ...
- CPUX killed (polled 0 ms)
如果系統的Log持續輸出並且您仍然可以從Uart console操作,這意味著系統尚未進入Suspend狀態。
您可以透過以下方式釐清問題:
1. 檢查系統當前是否有模組持有wakelock
cat /sys/kernel/debug/wakeup_sources | awk -F$'\t' '{ if(NR==1) print $1"\t" $7; else if($10) print $1"\t"$10;}'
2. 確定哪個模組未能進入suspend狀態
cat /sys/kernel/debug/suspend_stats