28 May 2009, 01:14 AM   #10
debtboy
Senior Member
Join Date: May 2009
Location: ~/
Posts: 128
wget grep pipe

OK Everyone,

Before we get into the pipe...
Here are a few more standard commands you might already be using:
cp (copy)
rm (remove)
mkdir (make directory)
rmdir (remove directory)
mv (move)
find (search for files)
sort (sort lines)
Look them up in the man pages.
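To give you a taste, here's roughly how a few of them fit together
(notes.txt is just a stand-in for any file you have lying around):

mkdir demo                          # make a directory
cp notes.txt demo/                  # copy a file into it
mv demo/notes.txt demo/notes.bak    # move (rename) the copy
find demo -name "*.bak"             # find files whose names match a pattern
sort demo/notes.bak                 # sort the file's lines and print them
rm demo/notes.bak                   # remove the file
rmdir demo                          # remove the now-empty directory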

Now let's look at wget (web get).
wget lets you fetch files and web pages from the internet
and save them locally. It can recursively retrieve the hierarchy of
files and directories under a domain name and will also follow links.
It does respect the robots.txt file.
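For example, something like this should mirror the site two levels deep
(-r means recursive, -l limits the depth, and -np stops it from climbing
above the starting directory):

wget -r -l 2 -np "http://techsupportalert.com"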

Now I know what you're thinking, and yes, you could put this command into a
subnet loop and you'd be three-quarters of the way to creating a robot or spider,
but that is not what it was designed for, so it's not very fast.
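Just to illustrate the loop idea (with harmless placeholder hosts, not a
real subnet scan):

for host in example.com example.org; do
    wget -q -O /dev/null "http://$host" && echo "$host responded"
done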

I'm going to fetch the default page of http://techsupportalert.com
from my Linux command line, and for purposes of demonstration I'm going
to force the output into this terminal instead of saving it to a file.
Now here is the command:
wget -qO - "http://techsupportalert.com"
The options I'm using are quiet mode (-q) and output to "-" (-O -), which means the terminal instead of a file.
This is the very same command with the options/switches separated:
wget -q -O - "http://techsupportalert.com"
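By the way, if you hand -O a real filename instead of "-", wget saves
the page to disk (homepage.html is just a name I made up):

wget -q -O homepage.html "http://techsupportalert.com"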

[screenshot: the wget command running in my terminal]
Now if you've been following this thread, yes I changed my background.
It's one of those free ones from Digital Blasphemy.

Now here is some of the output from that command; in full it runs about 13 pages.
[screenshot: the first page of the HTML output]
On to grep. I use this command quite often to extract matching patterns from files, and I also use it to count occurrences within the input.
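A couple of quick examples of what I mean, run against the homepage.html
file we could have saved earlier (-c counts matching lines instead of
printing them):

grep -e "<title>" homepage.html    # print every line containing <title>
grep -c "href" homepage.html       # count the lines containing href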

Here I'm going to pipe the output of wget directly into the input of
grep using the pipe character | (on many keyboards the keycap shows it as a broken vertical bar).
grep will extract all lines containing the pattern <title>

wget -qO - "http://techsupportalert.com" | grep -e "<title>"



Here is the result of that piped command:
[screenshot: the matching <title> line]

This was only a single pipe, but you can have multiple pipes connecting
many outputs to inputs, which is called, what else... a pipeline.
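For instance, something like this should pull the page, snip out every
href attribute, and count how often each one appears (-o is a GNU grep
option that prints only the matched text):

wget -qO - "http://techsupportalert.com" | grep -o 'href="[^"]*"' | sort | uniq -c | sort -rn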

I know what you're thinking: you could create a bot looping through subnets,
extracting anything with an "@" to harvest email addresses. Yes, it's possible,
but we've decided to use what we've learned for good and not evil!!

Hope this post wasn't too confusing...
Are you beginning to see the power of the command line yet??
We haven't even created a script or even mentioned Perl.