如何利用appium做一个自动抓取APP数据的爬虫
12年开始做过三年的APP数据爬取,主要是利用fiddler、wireshark等网络抓包工具分析APP的网络请求然后模拟请求获得对方返回的数据,这种方法的一个最大的障碍就是碰到对方服务器对请求做加密的时候很难破解。 最近公司提出智能化运营,分析数据以获得更好的运营策略,所以打算研究利用自动化测试工具appium来做一个爬虫。这种方式的优点是完全使用对方的APP,并不是模拟http请求,也就是说和真实的用户毫无差别,也就避开了http的反爬取。
后面有空的话打算做一个appium+kafka+storm的实时抓取数据分析入库的应用
目标
搭建自动化测试环境appium,利用java编写爬取代码。 效果图如下:
低帧版效果图:
流程:
1. Appium环境搭建
1.1 搭建所使用环境
操作系统:Windows 10
编程IDE:IDEA
安卓环境:Android sdk
运行所需工具:node.js,夜神模拟器
ps: Appium支持MAC os,windows, 也支持java,python,ruby等多种编程语言,能够在IOS、Android和WEB平台上运行,本次使用Win10,java和安卓模拟器 #### 1.2 官网下载安装Appium-desktop-setup
node.js和模拟器的安装百度上都有,这里着重讲解Appium的安装和简单的抓取代码实现 目前Appium提供的 下载链接
这个应用方便我们定位APP中的元素
node 安装 appium以及appium-doctor
npm install –g appium
npm install –g appium-doctor
2. 爬虫代码编写及运行
2.1 爬虫代码编写及运行pom增加导入
<dependency>
<groupId>io.appium</groupId>
<artifactId>java-client</artifactId>
<version>6.1.0</version>
</dependency>
<dependency>
<groupId>org.testng</groupId>
<artifactId>testng</artifactId>
<version>6.9.10</version>
</dependency>
2.2 获取相应参数
Appium自动化测试运行需要知道app的包名appPackage,appActivity,platformName,platformVersion, 为了获得包名和activity名称我们需要在模拟器打开相应APP,然后使用adb命令来获取当前的应用包名和activity名。
adb connect 127.0.0.1:62001
adb devices
连接上模拟器之后使用下面命令获取包名,activity名称。
adb shell
root@shamu:/ # dumpsys activity | grep mFocusedActivity
其中com.sankuai.meituan
为包名,.deal.DealListActivity
为activity名 对相应的元素操作使用appium-desktop桌面应用,启动appium-desktop之后新建一个检查器inspector,然后设置相应参数之后启动
点击start session开启一个新的会话 开启之后使用元素探查功能获取相应元素的id,xpath等参数
2.3 爬虫代码编写及运行抓取代码编写
项目代码appium-spider
import com.eat.util.MongoDbUtil;
import io.appium.java_client.MobileElement;
import io.appium.java_client.TouchAction;
import io.appium.java_client.android.Activity;
import io.appium.java_client.android.AndroidDriver;
import io.appium.java_client.android.AndroidElement;
import org.openqa.selenium.By;
import org.openqa.selenium.Dimension;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.remote.DesiredCapabilities;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;
import org.testng.Assert;
import org.testng.annotations.BeforeClass;
import org.testng.annotations.Test;
import java.io.IOException;
import java.time.Duration;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import static io.appium.java_client.touch.WaitOptions.waitOptions;
import static io.appium.java_client.touch.offset.PointOption.point;
public class AndroidBasicInteractionsTest extends BaseTest {
private final double longitude = 113.94756;
private final double latitude = 22.549040;
private AndroidDriver<WebElement> driver;
private final String SEARCH_ACTIVITY = ".city.CityActivity";
private final String ALERT_DIALOG_ACTIVITY = ".app.AlertDialogSamples";
private final String PACKAGE = "com.sankuai.meituan";
@BeforeClass
public void setMtEmuUp() throws IOException {
DesiredCapabilities capabilities = new DesiredCapabilities();
capabilities.setCapability("deviceName", "Android Emulator");
capabilities.setCapability("platformVersion","5.1.1");
capabilities.setCapability("platformName","Android");
capabilities.setCapability("appActivity",".activity.MainActivity");
capabilities.setCapability("appPackage","com.sankuai.meituan");
capabilities.setCapability("noReset","True");
driver = new AndroidDriver<WebElement>(getServiceUrl(), capabilities);
}
// @BeforeClass
public void setEleUp() throws IOException {
DesiredCapabilities capabilities = new DesiredCapabilities();
capabilities.setCapability("deviceName", "Android Emulator");
capabilities.setCapability("platformVersion","6.0");
capabilities.setCapability("platformName","Android");
capabilities.setCapability("appActivity","application.ui.home.HomeActivity");
capabilities.setCapability("appPackage","me.ele");
capabilities.setCapability("noReset","True");
driver = new AndroidDriver<WebElement>(getServiceUrl(), capabilities);
}
/* @AfterClass
public void tearDown() {
// driver.quit();
}*/
@Test()
public void testClickEmulator() {
driver.startActivity(new Activity(PACKAGE, ".activity.MainActivity"));
AndroidElement e = (AndroidElement)driver.findElementById("category_layout");
List<MobileElement> eElements = e.findElements(By.id("text"));
for (MobileElement ele :
eElements) {
if (ele.getText().equals("美食")) {
ele.click();
break;
}
}
for (int i = 0; i < 10; i++) {
List<WebElement> poi_layout = driver.findElements(By.id("poi_layout"));
for (WebElement webElement:poi_layout) {
ResInfo resInfo = new ResInfo();
try {
WebElement poi_name = webElement.findElement(By.id("poi_name"));
String resName = poi_name.getText();
int hashCode = resName.hashCode();
resInfo.setResName(resName);
resInfo.setResHashCode(hashCode);
resInfo.setLatitude(latitude);
resInfo.setLongitude(longitude);
resInfo.setCreateTime(new Date());
}catch (Exception ex){
continue;
}
try {
WebElement avg_price = webElement.findElement(By.id("avg_price"));
resInfo.setAvgPrice(avg_price.getText());
}catch (Exception ex){
}
try {
WebElement ratingText = webElement.findElement(By.id("rating_text"));
resInfo.setRatingText(ratingText.getText());
}catch (Exception ex){
}
try {
WebElement cate = webElement.findElement(By.id("cate"));
resInfo.setCate(cate.getText());
}catch (Exception ex){
}
try {
WebElement distance = webElement.findElement(By.id("distance"));
resInfo.setDistance(distance.getText());
}catch (Exception ex){
}
try {
WebElement area = webElement.findElement(By.id("area"));
resInfo.setArea(area.getText());
}catch (Exception ex){
}
try {
WebElement voucherInfo = webElement.findElement(By.id("voucher_info"));
resInfo.setVoucherInfo(voucherInfo.getText());
}catch (Exception ex){
}
System.out.println(resInfo);
MongoDbUtil.save(resInfo);
}
try {
gestSwipeVerticalPercentage(0.9,0.2,0.5,2000);
} catch (Exception ex) {
ex.printStackTrace();
}
}
}
@SuppressWarnings("Since15")
public void gestSwipeVerticalPercentage(double startPercentage, double finalPercentage, double anchorPercentage, int duration) throws Exception {
Dimension size = driver.manage().window().getSize();
int anchor, startPoint, endPoint;
anchor = (int) (size.width * anchorPercentage);
startPoint = (int) (size.height * startPercentage);
endPoint = (int) (size.height * finalPercentage);
System.out.println("--------------------------------------------------");
System.out.println(anchor+"/"+startPoint+"/"+endPoint);
// Thread.sleep(200);
new TouchAction(driver).press(point(anchor, startPoint)).waitAction(waitOptions(Duration.ofMillis(duration)))
.moveTo(point(anchor, endPoint)).release().perform();
}
}
2.4 存入mongoDb
核心代码运行之后能够抓取想抓取的数据,去重或者修改代码之后入库或者做其他发消息、实时计算就随你选择了
PS:上面的代码是借鉴了github上appium的示例改编而成
3. 运行的时候可能碰到的障碍
3.1 各种环境变量配置不对,可以使用appium-doctor检查(使用nodejs安装)
appium-doctor
3.2 InvalidServerInstanceException
修改BaseTest中AppiumDriverLocalService的初始化方法,手动设置AppiumJS为安装appium的安装目录下的js文件
@BeforeSuite
public void globalSetup () throws IOException {
// service = AppiumDriverLocalService.buildDefaultService(); 替换前
AppiumServiceBuilder builder = new AppiumServiceBuilder()
.withAppiumJS(new File("C:\\Program Files (x86)\\Appium\\resources\\app\\node_modules\\appium\\build" +
"\\lib\\main.js"))
.withLogFile(new File("/appium/target/logs/sample.txt"))
.usingAnyFreePort();
service = builder.build();
service.start();
}
修改后启动正常
3.3 Android home 找不到
用IDEA运行的时候找不到ANDROID_HOME的话需要设置运行参数
ANDROID_HOME=D:\Program Files\android-sdk_r24.4.1-windows\android-sdk-windows
3.4 node.js/node.exe 找不到
用IDEA运行的时候找不到node.js/node.exe的话需要设置运行参数PATH
PATH=node安装目录