Thursday, April 22, 2010

Teh Interwebs & Searches

Tim Berners-Lee talks about the internet, the web and hypertext links on TED. He also talks about linked data and how amazing it would be if the government's information were all posted as linked data and raw data, which would allow so many more valuable uses than are currently available. "RAW DATA NOW!"

I need to start culling the deep web for information for my story. I'm not talking about a Google search. I'm talking about going rogue all over the internet...hardcore data mining the invisible web.

Google can't find everything. For all the information that is indexed, there is so much more that isn't. Dr. Baltrip gave the class a statistic: roughly 20% of the data on the web is searchable. This means you may not be able to find the information you need...but you can find it if you know how.

There is some information, of course, which is purposely unsearchable, and rightly so: financial information, Social Security numbers, personal medical information, etc. This information is protected by security measures, such as firewalls.

For the information that isn't unsearchable on purpose, here are the tips Dr. Baltrip gave to our class.

Dr. Baltrip gave us a five-step process. (Which, in my opinion, is significantly better than a twelve-step process.)

We can control the type of information we're looking for. An example of this is databases: you can search for only databases.

One example of a search engine that searches the invisible web is Lycos. You can also use specialized sites that are more specifically geared to what you're looking for, such as Google Scholar and ipl2.

You can use robots. The details on how to use robots can be found here.

You can add the word "database" to the search criteria.

And you can use wildcards, such as "*" and "?", which search various forms of a word or phrase.
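To make the wildcard idea concrete, here is a small Python sketch. It is only an illustration of the common convention ("*" stands for any run of characters, "?" for exactly one character), using Python's own pattern matcher. Each search engine has its own wildcard rules, so this is an assumption about the general idea, not a description of how any particular engine behaves. The word list and the `matching_words` helper are made up for the example.

```python
from fnmatch import fnmatchcase


def matching_words(pattern, words):
    """Return the words that match a wildcard pattern.

    "*" matches any run of characters (including none);
    "?" matches exactly one character.
    """
    return [word for word in words if fnmatchcase(word, pattern)]


# Hypothetical word list, just for illustration.
words = ["journal", "journalism", "journalist", "woman", "women", "wombat"]

print(matching_words("journal*", words))  # various forms of "journal"
print(matching_words("wom?n", words))     # one letter varies
```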

To search better, brainstorm terms that are similar to the information you're searching for. In Google, you can type "site:" followed by .org or .gov to search only certain types of sites. You can also type "filetype:" and Google will search only one kind of file, for example ppt for PowerPoint, xls for Excel, etc. You can search entire sites from Google by typing "site:k-state.edu" (no space after the colon), for example, to search only within K-State's website.
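If you end up running a lot of these searches, the operators can be glued together with a few lines of Python. This is just a rough sketch: the `build_query` and `search_url` helpers and the search terms are my own made-up examples, and the URL assumes Google's usual `https://www.google.com/search?q=...` address format.

```python
from urllib.parse import urlencode


def build_query(terms, site=None, filetype=None):
    """Combine plain search terms with optional site: and filetype: operators."""
    parts = [terms]
    if site:
        parts.append(f"site:{site}")          # no space after the colon
    if filetype:
        parts.append(f"filetype:{filetype}")  # e.g. ppt, xls
    return " ".join(parts)


def search_url(query):
    """Turn a query string into a search URL (assumed Google URL format)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# Hypothetical search: Excel files about enrollment on K-State's website.
query = build_query("enrollment statistics", site="k-state.edu", filetype="xls")
print(query)
print(search_url(query))
```

The same helper works for the broader searches too, for example `build_query("water quality", site=".gov")` to stay on government sites.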

There are several specialized versions of Google, such as Google Trends, Google Books and the Google News Archive Search. Google also has a section, Google Chart Tools, which allows you to use data to create a variety of charts.
