b2科目四模拟试题多少题驾考考爆了怎么补救
b2科目四模拟试题多少题 驾考考爆了怎么补救

VBA Web搜寻器和多线程指南(2)

电脑杂谈  发布时间:2020-04-23 09:04:30  来源:网络整理

在searchWorker过程中创建了一个Excel对象,并且已爬网的数据通过工作簿名称workbookName写回到原始工作簿. SearchWorker代码示例:

Const CThread = 20    '同时发送请求数
Public Sub searchWorker(nWorker As Integer, maxWorkers As Integer, workbookName As String, nRowStart As Long, userName As String, passWord As String)
    
    Application.ScreenUpdating = False
    Application.EnableEvents = False
    
    If userName = "" Or passWord = "" Then
        MsgBox "Login information required."
        Exit Sub
    End If
    
    '利用Webbrowser登录
    fmUI.oIE.Navigate "http://foo.com/login.do"
    Do While fmUI.oIE.Busy Or (fmUI.oIE.ReadyState <> 4)
        DoEvents
    Loop
     
    fmUI.oIE.Document.getElementById("userName").Value = userName
    fmUI.oIE.Document.getElementById("userPassword").Value = passWord
    fmUI.oIE.Document.getElementById("submitBtn").Click
     
    Do While fmUI.oIE.Busy Or (fmUI.oIE.ReadyState <> 4) Or fmUI.oIE.LocationURL <> "http://foo.com/mainindex.do?method=login&status=1"
        DoEvents
    Loop
    Dim oXL As Object
    Set oXL = GetObject(, "Excel.Application")
    
    Dim target(1 To CThread) As String  '查询目标
    Dim URL(1 To CThread) As String     'url
    Dim errflag(1 To CThread) As Boolean '错误标识
    Dim oHTTP(1 To CThread) As Object    'xmlhttp
    Dim oHTML(1 To CThread) As Object    'html文档对象
    
    nThread = CThread
    n = Application.WorksheetFunction.CountA(ThisWorkbook.Sheets("Sheet1").Range("A:A")) - 1
    
    errmsg = "错误信息"
    
    For i = 1 To nThread
        Set oHTTP(i) = CreateObject("msxml2.xmlhttp.6.0")
        Set oHTML(i) = CreateObject("htmlfile")
    Next i
    For m = nRowStart To n Step nThread * maxWorkers
        
        For i = 1 To nThread
            target(i) = ThisWorkbook.Sheets("sheet1").Cells(m + (i - 1) * maxWorkers + (nWorker - 1), 1).Value
            URL(i) = "http://foo.com/search.do&keyword=" & target(i)
            oHTTP(i).Open "get", URL(i), True
            oHTTP(i).Send
            errflag(i) = False
        Next i
        
        For i = 1 To nThread
            Do While oHTTP(i).ReadyState <> 4
                DoEvents
            Loop
        Next i
        
        For i = 1 To nThread
            oHTML(i).body.innerhtml = oHTTP(i).responsetext
            If InStr(1, oHTML(i).body.outerhtml, errmsg) <> 0 Then
                errflag(i) = True
                msg = "错误"
                If target(i) <> "" Then
                    oXL.Workbooks(workbookName).Sheets("sheet1").Cells(m + (i - 1) * maxWorkers + (nWorker - 1), j).Value = msg
                End If
            Else
                errflag(i) = False
                oXL.Workbooks(workbookName).Sheets("sheet1").Cells(m + (i - 1) * maxWorkers + (nWorker - 1), 2).Value = Trim(oHTML(i).getElementsByTagName("td")(5).innertext)
            End If
        Next i
        
    Next m
    
    Set oXL = Nothing
    For i = 1 To nThread
        Set oHTTP(i) = Nothing
        Set oHTML(i) = Nothing
    Next i
End Sub

使用上一节中的示例,很容易构造一个合适的VBScript文件,并将数据直接获取到该文件中,因此我不会放置代码. 与VBScript加应用程序的方法相比,仅使用VBScript拼写字符串会更麻烦,但是该程序执行起来非常轻巧,因此,如果要爬网的网站没有复杂的登录过程,并且您不必担心代码麻烦,那么您可以考虑使用VBScript. 可以在此处找到示例,代码相当混乱且冗长: 多线程VBA

这是前辈写的. 优点是资源消耗适中. 缺点是它需要Visual Basic环境,实现起来更加复杂. 请参阅: VBA异步多线程网络抓取教程-excelhome

我个人建议使用VBScript和Application的多线程解决方案,该解决方案更加通用,并且当前的计算机不太在意占用更多内存. 与本文前面使用XMLhttp批处理异步发送的方法相比,VBS + Application方案通过创建8个线程可以将速度提高大约5倍,并且效率非常高. 测试计算机是具有8G内存的4核8线程i7台式机. 爬网时,每个WPS ET线程占用的内存不足100M,机器可以负担得起.

执行爬网程序可能会遇到很多问题,例如页面翻动,动态网页,json解析vba 异步数据,保存附件等. 有时,为了避免被网站阻止,添加了一些延迟. 特定问题只能在爬网过程中单独解决. 祝大家好运.

上方.

能做到的老狼


本文来自电脑杂谈,转载请注明本文网址:
http://www.pc-fly.com/a/ruanjian/article-185764-2.html

相关阅读
    发表评论  请自觉遵守互联网相关的政策法规,严禁发布、暴力、反动的言论

    • 诗经
      诗经

      反掉的是自己的未来”

    热点图片
    拼命载入中...