I’ve started to mess around with PowerShell at home and at work, and I found a nice little test case. A forum I visit had a thread full of Wikipedia links I wanted, but there were an awful lot of them.
So, after a few Google searches and some experimenting, I had a script that downloads every Wikipedia link from a set of URLs listed in a text file.
cd C:\Users\Jeff\Documents\Development\Powershell\DownloadLinks
$pages = Get-Content ".\pages.txt"
$file = ".\links.txt"

foreach ($page in $pages)
{
    # Drive a real Internet Explorer instance via COM so the page is fully rendered
    $ie = New-Object -ComObject "InternetExplorer.Application"
    $ie.Navigate($page)
    While ($ie.Busy) { Start-Sleep -Milliseconds 400 }

    # Pull every anchor out of the rendered DOM and keep the Wikipedia ones
    $doc = $ie.Document
    $doc.getElementsByTagName('a') |
        Where-Object { $_.href -ne $null } |
        Where-Object { $_.href.Contains("wikipedia") } |
        Select-Object -ExpandProperty href |
        Out-File -FilePath $file -Append

    # Close the browser instance so each page doesn't leak an IE process
    $ie.Quit()
}
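The same link-filtering idea translates outside of PowerShell too. As a rough sketch, here is the equivalent "keep only anchors whose href mentions wikipedia" step in Python using the standard-library HTML parser (an assumption on my part: this parses raw HTML directly instead of driving a real browser the way the IE COM object does, so it would only work for pages that don't need scripting or a login to render):

```python
from html.parser import HTMLParser

class WikipediaLinkCollector(HTMLParser):
    """Collects href values of <a> tags that mention 'wikipedia'."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Same filter as the script: non-null href containing "wikipedia"
        if href and "wikipedia" in href:
            self.links.append(href)

def extract_wikipedia_links(html):
    parser = WikipediaLinkCollector()
    parser.feed(html)
    return parser.links

# Tiny made-up snippet standing in for a downloaded forum page
sample = ('<a href="http://en.wikipedia.org/wiki/Dice">Dice</a>'
          '<a href="http://example.com/">something else</a>')
print(extract_wikipedia_links(sample))  # only the Wikipedia href survives
```

Feeding it each downloaded page and appending the result to a file would mirror the Out-File -Append step above.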
And here is the list of pages to process:
http://forum.rpg.net/showthread.php?t=279379
http://forum.rpg.net/showthread.php?t=279379&page=2
http://forum.rpg.net/showthread.php?t=279379&page=3
http://forum.rpg.net/showthread.php?t=279379&page=4
http://forum.rpg.net/showthread.php?t=279379&page=5
http://forum.rpg.net/showthread.php?t=279379&page=6
http://forum.rpg.net/showthread.php?t=279379&page=7
http://forum.rpg.net/showthread.php?t=279379&page=8
http://forum.rpg.net/showthread.php?t=279379&page=9
http://forum.rpg.net/showthread.php?t=279379&page=10
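Since the list is just the same thread ID with an incrementing page number, the pages.txt file could be generated rather than typed out by hand. A quick sketch (the thread ID and page count are taken from the list above):

```python
# Same thread ID as the list above; pages 2 through 10 get a &page= suffix
base = "http://forum.rpg.net/showthread.php?t=279379"
pages = [base] + [f"{base}&page={n}" for n in range(2, 11)]

# Write one URL per line, matching the pages.txt format the script reads
with open("pages.txt", "w") as f:
    f.write("\n".join(pages))
```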
Pages I used to build this:
http://powershell.com/cs/blogs/tips/archive/2010/05/03/use-null-to-identify-empty-data.aspx
http://powershell.com/cs/blogs/tobias/archive/2010/03/17/downloading-images-from-webpages.aspx
http://www.computerperformance.co.uk/powershell/powershell_file_outfile.htm
http://technet.microsoft.com/en-us/library/ff730958.aspx
http://www.orcsweb.com/blog/jeremy/powershell-pearl-filter-by-contained-text/