[PowerShell] Extracting link URLs from a web page

11?30???? PowerShell ?????????????????????????????Web ???????? URL ?????????????

?? PDC (Professional Developers Conference) 2009 ?????????????????????????????????????????? https://microsoftpdc.com/Videos ??????????????????????????????????????????????????????????????????????????????????????????????????PowerShell ???????????????

??????????????????????????


Looking at the HTML returned for the HTTP request, the entry for session CL05 looks like this:

  <tr class="">
      <td>CL05</td>
      <td>
          <a href="/Sessions/CL05" alt="">Embodiment: The Third Great Wave of Computing Applications</a>
          <br />
          <span class="speakers"><em>Butler Lampson</em></span>
      </td>
      <td><a href="https://ecn.channel9.msdn.com/o9/pdc09/wmv/CL05.wmv" alt="WMV">WMV</a></td>
      <td><a href="https://ecn.channel9.msdn.com/o9/pdc09/wmvhigh/CL05.wmv" alt="WMVHigh">WMVHigh</a></td>
      <td><a href="https://ecn.channel9.msdn.com/o9/pdc09/mp4/CL05.mp4" alt="MP4">MP4</a></td>
      <td><a href="https://ecn.channel9.msdn.com/o9/pdc09/ppt/CL05.pptx" alt="Slides">Slides</a></td>
  </tr>

The URLs we want are in the href attributes of the a tags. With PowerShell, they can be pulled out of the HTML with the following short script (7 lines of code):

  1. $w = new-object system.net.webclient  
  2. $enc = [Text.Encoding]::GetEncoding("utf-8")  
  3. $url = "https://microsoftpdc.com/Videos" 
  4. $h = $enc.GetString($w.DownloadData($url))  
  5.  
  6. $regex = "<\s*a\s*[^>]*?href\s*=\s*[`"']*([^`"'>]+)[^>]*?>"   
  7. $m = $h | select-string -pattern $regex -AllMatches  
  8. $m.matches | %{$_.groups[1].value} | select-string "wmvhigh" 

????????

https://ecn.channel9.msdn.com/o9/pdc09/wmvhigh/CL05.wmv
https://ecn.channel9.msdn.com/o9/pdc09/wmvhigh/CL06.wmv
https://ecn.channel9.msdn.com/o9/pdc09/wmvhigh/CL07.wmv

???????????????????????????????????????????????????????????????????????????????????????????(???????????????)?????????????????????????

Line 1 creates an object of the .NET Framework WebClient class, which is used to send the HTTP request.

Line 2 obtains the encoding used to decode the HTTP response, here UTF-8. (GetEncoding is a static method, so it is called directly on the type rather than on an object created with new-object.)
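As a small offline sketch of the static call described above, the following decodes a byte array without touching the network. The sample string and the `$bytes` and `$text` variable names are assumptions made for illustration; they stand in for whatever DownloadData would return.

```powershell
# Static methods are called on the type itself with ::, so New-Object is not needed.
$enc = [Text.Encoding]::GetEncoding("utf-8")

# Hypothetical sample bytes standing in for the result of DownloadData.
$bytes = $enc.GetBytes('<a href="/Sessions/CL05">CL05</a>')

# GetString decodes the byte array back into a .NET string.
$text = $enc.GetString($bytes)
$text
```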

Line 3 sets the target URL. Line 4 downloads the page and decodes the bytes, so the entire HTML ends up in $h as a single string. If $h were instead an array holding one line per element, $h -join "" would combine it into a single string.
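To illustrate the -join case, here is a minimal sketch that assumes the HTML arrived as an array of lines (for example, from reading a saved file line by line). The `$lines` array is made up for this example.

```powershell
# Hypothetical input: the same markup split across two array elements.
$lines = @('<td><a href="/Sessions/CL05" alt="">', 'Embodiment</a></td>')

# -join with an empty separator concatenates the elements into one string.
$h = $lines -join ""

$h.GetType().Name   # String
$h
```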

Line 6 is a regular expression that matches <a href="…"> tags and captures the … part. I based it on the following article:

Precision Computing: Unit Testing in PowerShell – a Link Parser
https://www.leeholmes.com/blog/UnitTestingInPowerShellALinkParser.aspx
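The pattern can be tried offline against a single tag before running it on a whole page. This sketch reuses the regular expression from the script; the `$sample` string is one row copied from the HTML excerpt, and the use of the -match operator with the automatic $Matches variable is a choice made for this example rather than part of the original script.

```powershell
# Same pattern as in the script; `" is an escaped double quote inside a double-quoted string.
$regex = "<\s*a\s*[^>]*?href\s*=\s*[`"']*([^`"'>]+)[^>]*?>"

# One cell taken from the HTML excerpt of the page.
$sample = '<td><a href="https://ecn.channel9.msdn.com/o9/pdc09/wmv/CL05.wmv" alt="WMV">WMV</a></td>'

# -match fills the automatic $Matches table; index 1 is the first capture group.
if ($sample -match $regex) {
    $url = $Matches[1]
    $url   # https://ecn.channel9.msdn.com/o9/pdc09/wmv/CL05.wmv
}
```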

Line 7 pipes the HTML in $h to select-string, with $regex as the pattern. The -AllMatches option makes select-string return every match in the input; without it, only the first match in each input string is returned.
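The effect of -AllMatches is easy to see on a short string. In this sketch the `$html` sample and the simplified `$pattern` are assumptions for illustration, not the pattern used in the script.

```powershell
# Hypothetical one-line input containing two links.
$html = '<a href="/a">A</a><a href="/b">B</a>'
$pattern = 'href="([^"]+)"'

# Without -AllMatches, only the first match in the string is reported.
$first = $html | Select-String -Pattern $pattern

# With -AllMatches, every match in the string is reported.
$all = $html | Select-String -Pattern $pattern -AllMatches

$first.Matches.Count   # 1
$all.Matches.Count     # 2
```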

$m holds the result. Piping $m to Get-Member shows that it is a Microsoft.PowerShell.Commands.MatchInfo object, and the individual matches are stored in its Matches property. For example, the element at index 15 looks like this:

PSH> ($m.Matches)[15]

Groups : {<a href="https://ecn.channel9.msdn.com/o9/pdc09/
           wmvhigh/CL05.wmv" alt="WMVHigh">, https://ecn.cha
           nnel9.msdn.com/o9/pdc09/wmvhigh/CL05.wmv}
Success : True
Captures : {<a href="https://ecn.channel9.msdn.com/o9/pdc09/
           wmvhigh/CL05.wmv" alt="WMVHigh">}
Index : 6057
Length : 79
Value : <a href="https://ecn.channel9.msdn.com/o9/pdc09/w
           mvhigh/CL05.wmv" alt="WMVHigh">

The value we want is the second element of Groups (index 1), which can be retrieved with
($m.Matches)[15].Groups[1].Value
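The difference between the two Groups elements can be checked on a single tag. This is a sketch that calls the .NET [regex]::Match method directly with a simplified pattern; both the `$tag` sample and the pattern are assumptions for illustration.

```powershell
# Hypothetical input: one anchor tag from the page.
$tag = '<a href="/Sessions/CL05" alt="">'

# [regex]::Match returns a single System.Text.RegularExpressions.Match object.
$match = [regex]::Match($tag, 'href="([^"]+)"')

$match.Groups[0].Value   # href="/Sessions/CL05"   (the whole match)
$match.Groups[1].Value   # /Sessions/CL05          (the first capture group)
```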

Line 8 extracts Groups[1].Value from each match, then pipes the results to select-string with "wmvhigh" as the pattern so that only the URLs containing that string remain.
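One detail worth knowing when filtering this way: select-string emits MatchInfo objects rather than plain strings. If plain strings are needed for further processing, one alternative is Where-Object with the -match operator, as in this sketch. The `$urls` array here is a hand-written sample, not output captured from the site.

```powershell
# Hypothetical list of extracted URLs.
$urls = @(
    "https://ecn.channel9.msdn.com/o9/pdc09/wmv/CL05.wmv",
    "https://ecn.channel9.msdn.com/o9/pdc09/wmvhigh/CL05.wmv",
    "https://ecn.channel9.msdn.com/o9/pdc09/mp4/CL05.mp4"
)

# Where-Object passes through the original strings that match the pattern.
$filtered = @($urls | Where-Object { $_ -match "wmvhigh" })
$filtered
```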

???????????????? PowerShell ????.NET Framework ?????????????????????????????????????????????