Marty Hermsen

Talks around the ICT and Financial Coffee Corners

Detecting Browsers, Crawlers, and Web Bots in C# ASP .NET

August 02
by Marty Hermsen 2. August 2009 17:52

The .NET Framework, used to create C# ASP .NET web applications, comes with a built-in web browser detector called the BrowserCaps feature. .NET 2.0 adds a second detector, called the .browser feature. Regardless of the .NET version, telling a user's web browser apart from an automated web crawler can make a big difference in a web application, and it's easy to do.

In this article, we'll discuss three methods for determining the web browser type. We'll also describe how to tell the difference between a user's web browser and an automated crawler.

What's Inside the User-Agent String

It really all starts with the web browser user-agent string. The user-agent is a string of text, sent in the HTTP header by the web browser, for each request made when accessing a page in the C# ASP .NET web application. The user-agent typically describes the web browser client type, name, version, and other information.

Some example User-Agent strings:

Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727)
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Mozilla/5.0 (compatible; Yahoo! Slurp; +http://help.yahoo.com/help/us/ysearch/slurp)

As you can tell from the above examples, quite a bit of information can be parsed out of the user-agent string. The first user-agent is a Microsoft Internet Explorer web browser, and thus most likely a regular user. The other two user-agents are web bots (Googlebot and Yahoo! Slurp). Looking at these strings, the most direct method of detecting the user's web browser is simply to look for sub-strings.

Looking for Keywords in a User-Agent

The most direct and simple method for detecting web browsers accessing your C# ASP .NET web application is to simply search for a sub-string within the user-agent and classify the web browser accordingly.

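// Request.UserAgent is null when the client sends no User-Agent header.
string userAgent = Request.UserAgent ?? string.Empty;

// Case-insensitive search; StringComparison lives in the System namespace.
if (userAgent.IndexOf("Googlebot", StringComparison.OrdinalIgnoreCase) >= 0)
{
   // We have a GoogleBot web crawler.
}
else
{
   // We do not have a GoogleBot web crawler.
}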

By searching for a simple sub-string in the UserAgent property of the HttpRequest, we can determine the type of web client accessing the site. While this method is simple and direct, it cannot classify the many different user-agent strings out there. You could certainly obtain a list of user-agent strings and add keywords to search for each, but this could take a long time. It would also be difficult to maintain the list and keep it updated as new web bots and browsers emerge. There is an easier way, and this is where the .NET Framework helps.
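To make the maintenance problem concrete, here is a minimal sketch of a keyword-list check. It is not a complete detector: the class name UserAgentKeywords, the method LooksLikeBot, and the keyword list are assumptions made for illustration, and a missing user-agent is simply treated as "not a known bot".

```csharp
using System;

public static class UserAgentKeywords
{
    // Hypothetical keyword list for illustration only; a real list needs regular upkeep.
    private static readonly string[] BotKeywords = { "Googlebot", "Slurp", "crawler", "spider" };

    // Returns true when the user-agent contains any of the keywords (case-insensitive).
    public static bool LooksLikeBot(string userAgent)
    {
        // A request without a User-Agent header yields null; treat it as not classified.
        if (string.IsNullOrEmpty(userAgent))
        {
            return false;
        }

        foreach (string keyword in BotKeywords)
        {
            if (userAgent.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                return true;
            }
        }

        return false;
    }
}
```

In a page this could be called as UserAgentKeywords.LooksLikeBot(Request.UserAgent). Every new bot means another entry in the array, which is exactly the upkeep burden described above.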

Digging Deeper Into Request.Browser

In the above code sample, we pulled the user-agent string from the HttpRequest object. Rather than searching the Request.UserAgent property for a sub-string, we can use an additional object that the Request provides for accessing information about the web browser client: Request.Browser. One property of interest for telling the difference between a user and a web bot is Request.Browser.Crawler. This property is a boolean and is true if the client is recognized as a web bot.

if (Request.Browser.Crawler)
{
   // We have a web crawler.
}
else
{
   // We do not have a web crawler.
}

Request.Browser.Crawler Always Returns False

If you try the above code sample and test it with various user-agent strings to simulate web bots (e.g., with the Firefox User-Agent Switcher plug-in), you'll notice that Request.Browser.Crawler always returns false. This is due to missing information in one of .NET's configuration sections, called BrowserCaps. We'll need to populate the list of BrowserCaps (the list of user-agents that we have information about) in order to use this feature.

Using the BrowserCaps To Detect Web Browsers From Web Bots

BrowserCaps is a section in the web.config file, within the system.web section. BrowserCaps allows you to specify a list of web browser user-agent strings, via regular expressions, to match against. Each item in the list indicates the capabilities of the web browser, version, whether it's a crawler, and much more.

Inside the web.config (or machine.config) file:

<configuration>
   <system.web>
      <browserCaps>
         <result type="class"/>
         <use var="HTTP_USER_AGENT"/>
         browser=Unknown
         version=0.0
         majorver=0
         minorver=0
         frames=false
         tables=false
         <filter>
            <case match="Windows 98|Win98">
               platform=Win98
            </case>
            <case match="Windows NT|WinNT">
               platform=WinNT
            </case>
         </filter>
         <filter match="Unknown" with="%(browser)">
            <filter match="Win95" with="%(platform)">
            </filter>
         </filter>
      </browserCaps>
   </system.web>
</configuration>

The above is a sample entry for detecting Windows 98 and Windows NT operating systems in the user-agent string from the web browser. While you can proceed to add entries by hand to match each web browser and crawler of interest, you can actually download a complete and updated list of user-agent BrowserCaps to add to your C# ASP .NET web application.

To add the list of BrowserCaps to your development machine or server, follow these steps:

1. Open the following file for editing:
C:\windows\Microsoft.NET\Framework\v2.0.50727\CONFIG\machine.config

2. Download the BrowserCaps list from http://owenbrady.net/browsercaps (direct download list).

3. Paste the entire contents of the XML file into the machine.config, just before the line </system.web>.

If you only want the BrowserCaps list available to a single web application, paste the BrowserCaps section into your local web.config. If you want all web applications to have access to the information, use the machine.config as noted above.

After saving the changes and refreshing the C# ASP .NET web application, Request.Browser.Crawler will return proper values. The regularly updated list helps you detect the majority of web crawlers, bots, scripts, and web browsers.

Using the Newer .BROWSER

BrowserCaps was introduced in the .NET 1.0 Framework. While it is still supported by Microsoft, it was deprecated in .NET 2.0. The current standard is to use the .BROWSER feature to specify the list of user-agent strings. It's important to note that entries specified in the .BROWSER feature are merged with the contents of the BrowserCaps, so both methods may be used.

.BROWSER provides a way of specifying the web browser user-agents via XML in separate files in C:\windows\Microsoft.NET\Framework\v2.0.50727\CONFIG\Browsers. After creating a .browser file, you can execute aspnet_regbrowsers.exe (not aspnet_regsql.exe, which is the SQL Server registration tool) to compile the browser files into an assembly in the global assembly cache, giving all web applications access to the list. This allows you to add new entries to the list without restarting the web application process. The actual command line to use is: C:\WINDOWS\Microsoft.NET\Framework\<versionNumber>\aspnet_regbrowsers.exe -i
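As a rough illustration of what such a file can look like, here is a minimal sketch of a .browser definition that marks two crawlers. The id "ExampleCrawler" and the match pattern are assumptions chosen for this example, not entries from any published list.

```xml
<browsers>
  <!-- "ExampleCrawler" is a hypothetical id; parentID="Default" inherits from the built-in default definition. -->
  <browser id="ExampleCrawler" parentID="Default">
    <identification>
      <!-- Regular expression matched against the User-Agent header. -->
      <userAgent match="Googlebot|Slurp" />
    </identification>
    <capabilities>
      <capability name="browser" value="ExampleCrawler" />
      <!-- This capability is what Request.Browser.Crawler reports. -->
      <capability name="crawler" value="true" />
    </capabilities>
  </browser>
</browsers>
```

A file like this would be saved with a .browser extension in the Browsers folder mentioned above before running the registration tool.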

The .browser feature provides a more seamless way of incorporating web browser detection into an ASP .NET application. However, at this time, more entries are available for the BrowserCaps method, which makes it more accurate at detecting web bots in the wild. Since both methods can be used together, there is no harm in combining them.

Perfecting Traffic Statistics with Web Bot Detection

One of the primary reasons to determine a web bot from a regular user's web browser is to allow for accurate recording of statistics. For example, when counting the hits to a particular page in an ASP .NET web application, the numbers would become skewed if you included hits from GoogleBot, Yahoo Slurp, and the many other web bots. By using the Request.Browser.Crawler value, we can easily detect a web bot from a user and provide a more accurate figure.
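The idea can be sketched with a small in-memory counter that ignores crawler hits. This is only an illustration under stated assumptions: the class PageHitCounter and its methods are hypothetical names, and a real application would probably persist the counts somewhere rather than keep them in memory.

```csharp
using System.Collections.Generic;

public class PageHitCounter
{
    private readonly Dictionary<string, int> hits = new Dictionary<string, int>();
    private readonly object sync = new object();

    // Counts one hit for the path unless the request came from a crawler.
    public void Record(string path, bool isCrawler)
    {
        if (isCrawler)
        {
            return; // Skip web bots so they do not skew the statistics.
        }

        lock (sync)
        {
            int current;
            hits.TryGetValue(path, out current); // current stays 0 for a new path
            hits[path] = current + 1;
        }
    }

    // Returns the number of non-crawler hits recorded for the path.
    public int GetHits(string path)
    {
        lock (sync)
        {
            int current;
            return hits.TryGetValue(path, out current) ? current : 0;
        }
    }
}
```

Inside a page, a call might look like counter.Record(Request.Path, Request.Browser.Crawler), where counter is a shared PageHitCounter instance.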

Cloaking Isn't Just in Star Trek

The discussion about web bot detection in C# ASP .NET web applications wouldn't be complete without briefly cautioning against displaying different content to web bots and regular user web browsers, also called cloaking. More specifically, cloaking is when your web application detects a web bot and shows a different page or content, with the goal of affecting search engine ranking. It's generally a rule of thumb to display the same content to web bots as you would to normal users and only use the web bot detection methods shown above for traffic statistical means or other behind-the-scenes activities.

Conclusion

The .NET Framework provides two powerful features for detecting the web browser client and distinguishing web spiders from users' web browsers. .NET 1.0 provides the BrowserCaps feature, which can be updated regularly with new user-agent strings as they become available. .NET 2.0 provides the .BROWSER feature, in addition to the BrowserCaps feature, for incorporating new user-agent matches more seamlessly in web applications. By using web browser and web bot detection responsibly, you can help enhance web application traffic statistics and features, creating a more powerful and resilient C# ASP .NET web application.


BlogEngine.NET | DotNetNuke | Security | Web IIS 6 - IIS 7

osCommerce - Open Source E-Commerce Solutions

June 14
by Marty Hermsen 14. June 2009 23:25

osCommerce has attracted a large, growing e-commerce community of over 212,700 store owners and developers who support each other and extend osCommerce Online Merchant with add-ons contributed on a daily basis. To date, over 5,500 add-ons are available for free to customize osCommerce Online Merchant online stores and to help increase sales.

osCommerce Online Merchant is an Open Source online shop e-commerce solution that is available for free under the GNU General Public License. It features a rich set of out-of-the-box online shopping cart functionality that allows store owners to set up, run, and maintain online stores with minimum effort and with no costs, fees, or limitations involved.

With over 8 years of operation, osCommerce has built a showcase of over 14,100 online shops that have been voluntarily added to the live shops section, and it powers many thousands more online shops worldwide.

osCommerce Philosophy

Open Source software provides an opportunity for people to work on software with others that share the same interest, exchanging ideas, knowledge, and work with one another, to expand and improve the solution.

The motivation for working on Open Source software comes from different sources: working on the software for fun as a hobby, making the software meet one's own requirements, and bringing commercial interest into the software.

It is this combination of motivations that has brought together a team of developers to successfully make what osCommerce is today - and what it will be in the future - and an active and growing community, with each person having their own unique requirements but ultimately sharing the same goal: to use the software and to make it a better solution.

Open Source software always remains open providing the opportunity for anyone that is interested to work on it, at any time.

Because Open Source software is open, it provides a choice. The choice to use the software, the choice to learn the software, and the choice to join, share, and participate in a community - a community full of enthusiastic supporters that want to see the software grow and succeed.

It is for this very reason that Open Source software is successful and, most importantly, why it works.

osCommerce also works very well on Windows 2008 and IIS7.


ASP.NET | BlogEngine.NET | DotNetNuke | Web IIS 6 - IIS 7

The JavaScript InfoVis Toolkit

June 08
by Marty Hermsen 8. June 2009 16:02

The JavaScript InfoVis Toolkit provides tools for creating Interactive Data Visualizations for the Web.

  • A new project page where you can access all things related to this library: documentation, demos, tutorials, this blog, etc.
  • Complete API documentation generated with Natural Docs, with some narrative documentation and syntax-highlighted code examples.
  • A Demos page where you can find some interactive library examples and browse through the example code.

The JavaScript InfoVis Toolkit is now hosted at GitHub, so you can fork it and do whatever you like with it. You can also report bugs with the new issue tracker.


Code-Related

  • The library has been split into modules for code reuse.
  • All visualizations are packaged in the same file. You can create multiple instances of any visualization. Moreover, you can combine and compose visualizations. If you want to know more take a look at the Advanced Demos.
  • This Toolkit is library agnostic. This means that you can combine this toolkit with your favorite DOM/Events/Ajax framework such as Prototype, MooTools, ExtJS, YUI, jQuery, etc.
  • You can extend this library in many ways by adding or overriding class methods. The JavaScript InfoVis Toolkit has a robust (and private) class system, heavily inspired by MooTools’, that allows you to implement new methods in the same class without having to define any new Class extension. By creating mutable classes you can add new custom Node and Edge rendering functions pretty easily.
  • Custom visualizations are created by adding or changing Node/Edge colors, shapes, rendering functions, etc. You can also implement many controller methods that are triggered at different stages of the animation, like onBefore/AfterPlotLine, onBefore/AfterCompute, onBefore/AfterPlotNode, request, etc. You can also add new Animation transitions like Elastic or Back with easeIn/Out transitions. If you want to know more about these features please take a look at the Demos code.

As you can see, this new version has been built with four concepts/goals in mind: Modularity, Customization, Composition and Extensibility. I already explained some of these things in the previous post.

Hope you enjoy it. The project site is www.thejit.org

Thanks to Nicolas Garcia Belmonte

 


ASP.NET | BlogEngine.NET | DotNetNuke | Web IIS 6 - IIS 7

New Webcrawl / SearchEngine ARACHNODE.NET application

March 29
by Marty Hermsen 29. March 2009 20:44

Arachnode.net is billed as the world's first .NET application written in C# for web crawling and search, usable within your intranet or on the Internet. Yes, the Internet too!

I want to give it a try.


ASP.NET | DotNetNuke | Web IIS 6 - IIS 7

Dotnetnuke Skin error....help

January 12
by Marty Hermsen 12. January 2009 13:56

 

Can someone help with this error? Please reply.




In the error in the event viewer

 

DotNetNuke

New Blog Site

November 30
by Marty Hermsen 30. November 2008 21:26
Once again I couldn't resist, and I have installed a new .NET application called BlogEngine.NET.

DotNetNuke

About Me

My name is Marty Hermsen, 45 years young, living in the Netherlands. I have been married to Denise for almost 16 years; we have no children, but we do have our 'child' dogs. We live in the small village of Kamerik near Woerden, between cows and sheep, in the middle of nature: a paradise in one of the most densely populated areas in the world.

I work at Fortis Bank Netherland and ABN Amro as an IT Architect. My current activities are the separation of Fortis Netherlands and Fortis Belgium and the integration of Fortis Bank Netherland with ABN Amro. This includes creating a new enterprise Microsoft Windows platform based on Windows 2008 and integrating web applications, SharePoint, and so on.

Creating a new bank...
