|
Link spam (also called blog spam or comment spam) is a form of spamming or spamdexing that has been publicized most often in connection with weblogs (or blogs), but it also affects wikis (where it is often called wikispam), guestbooks, and online discussion boards. Any web application that displays hyperlinks submitted by visitors, or the referring URLs of web visitors, may be a target.
Adding links that point to the spammer's web site increases the site's page ranking in the Google search engine. A higher page rank means the spammer's commercial site would be listed ahead of other sites for certain Google searches, increasing the number of potential visitors and paying customers.
History
Link spamming originally appeared in internet guestbooks, where spammers repeatedly filled a guestbook with links to their own site, and no relevant comment, to increase search engine rankings. If an actual comment is given, it is often just "cool page", "nice website", or keywords of the spammed link.
In 2003, spammers began to take advantage of the open nature of comments in blogging software such as Movable Type by repeatedly posting comments on various blog posts that provided nothing more than a link to the spammer's commercial web site. Jay Allen created a free plugin, called MT-BlackList, for the Movable Type weblog tool that attempts to alleviate this problem. Many current blog software packages now have methods of preventing or reducing the effect of blog spam.
Migration to wikis
Because of prevention improvements in blog software, link spam is now increasingly concentrated on wikis around the World Wide Web, including Wikipedia, the largest wiki on the Internet (see [1]). Wiki spam sometimes appears only on a wiki's sandbox page, but is often found defacing multiple pages. The website chongqed.org lists URLs of offending link spammers.
Possible solutions
Instead of displaying a direct hyperlink submitted by a visitor, a web application could display a link to a script on its own website that redirects to the correct URL. This will not prevent all spam, since spammers do not always check for link redirection, but it has proven very effective. Redirected links prevent Google from factoring the link into its PageRank algorithm for the target site, making the spam ineffective. An added benefit is that the redirection script can count how many people visit external URLs, although it will increase the load on the site.
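The redirect approach described above could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the path /out, the query parameter url, and the function names buildRedirectHref and resolveRedirect are all assumptions made for the example. It only shows how a page might rewrite a submitted link and how the script might answer the request while counting clicks.

```javascript
// Hypothetical path of the redirect script on the site's own domain.
const REDIRECT_PATH = '/out';

// Build the href that the page displays instead of the visitor's direct link.
function buildRedirectHref(targetUrl) {
  return REDIRECT_PATH + '?url=' + encodeURIComponent(targetUrl);
}

// Decide how the redirect script answers a request for `requestPath`.
// Returns the HTTP status and, for a redirect, the Location header value.
function resolveRedirect(requestPath, counts) {
  // The base is only needed so that the URL parser accepts a relative path.
  const parsed = new URL(requestPath, 'http://localhost');
  const target = parsed.searchParams.get('url') || '';

  // Only forward to plain web addresses; reject everything else.
  if (parsed.pathname !== REDIRECT_PATH || !/^https?:\/\//i.test(target)) {
    return { status: 400, location: null };
  }

  // Count the outbound click, the side benefit mentioned in the text.
  counts.set(target, (counts.get(target) || 0) + 1);
  return { status: 302, location: target };
}

const counts = new Map();
const href = buildRedirectHref('http://www.wiki.org/');
console.log(href); // /out?url=http%3A%2F%2Fwww.wiki.org%2F
console.log(resolveRedirect(href, counts).location); // http://www.wiki.org/
```

A real script would sit behind a web server and send the status and Location header in its HTTP response. Restricting targets to http and https addresses is a design choice of this sketch, so that the script cannot be used to forward visitors to other URL schemes.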
Another option is for the script to be client-side JavaScript. For example,
<a href="javascript:window.location.href='http://www.wiki.org'">Link</a> would work as a link but not be picked up by Google. Moreover, the JavaScript could be made more complicated, with the URL encoded, to ensure that the link would never be picked up. For example,
<a href="javascript:redirectFunction('hfksksgjlsll')">Link</a> where 'hfksksgjlsll' is an encoded URL that is decoded by the JavaScript function redirectFunction, which presumably is defined in the HEAD element of the page. A downside of this is that visitors who have disabled JavaScript in their browser would be unable to follow the links.
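The text does not say how redirectFunction encodes or decodes the URL. As a sketch under that caveat, the version below assumes Base64, which only obscures the address and is not encryption. The helper names encodeTarget and decodeTarget are hypothetical, and the built-in btoa and atob functions used here handle only Latin-1 strings, which covers ordinary ASCII URLs.

```javascript
// Encode a URL when the page is generated. Base64 is an assumption of this
// sketch; it merely hides the address from simple crawlers.
function encodeTarget(url) {
  return btoa(url);
}

// Reverse of encodeTarget.
function decodeTarget(encoded) {
  return atob(encoded);
}

// Intended to be called from href="javascript:redirectFunction('...')".
function redirectFunction(encoded) {
  const url = decodeTarget(encoded);
  // `window` exists only in a browser; elsewhere the URL is just returned.
  if (typeof window !== 'undefined') {
    window.location.href = url;
  }
  return url;
}

const encoded = encodeTarget('http://www.wiki.org');
console.log(encoded);                   // the obscured form placed in the link
console.log(redirectFunction(encoded)); // http://www.wiki.org
```

Returning the decoded URL is a convenience of this sketch so that the function can be exercised outside a browser.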
This kind of redirection can also be done via the .htaccess file in Apache, thus saving the load of a script.
Another way of preventing PageRank leakage without using client-side JavaScript or an .htaccess file is a public redirection service such as TinyURL or My-Own.Net. For example,
<a href="http://my-own.net/alias_of_target" rel="nofollow">Link</a> where 'alias_of_target' is the alias of the target address.
No follow
In early 2005 Google introduced an HTML attribute value, rel="nofollow", that disables the assignment of ranking credit for a particular link. This is a much easier solution that makes the improvised techniques above largely unnecessary. Most weblog software now comes with this enabled by default (and no option to disable it without code modification), adding the nofollow attribute to reader-submitted links: <a href="http://www.wiki.org/" rel="nofollow">Link</a>
However, some weblog authors object to using the attribute, due to concerns over the motives for its introduction (the large amount of inter-linking between blogs makes search engine algorithms less accurate) and over its effectiveness, since a spambot does not know whether its target is using 'nofollow' or not.
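How weblog software might attach the attribute to reader-submitted links can be sketched as below. The function names escapeHtml and renderUserLink are hypothetical and do not come from any particular blog package; the sketch only shows the link being built with rel="nofollow" and with the submitted values escaped.

```javascript
// Escape characters that are special inside HTML text and attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Render a reader-submitted link so that it carries rel="nofollow".
function renderUserLink(url, label) {
  return '<a href="' + escapeHtml(url) + '" rel="nofollow">' +
    escapeHtml(label) + '</a>';
}

console.log(renderUserLink('http://www.wiki.org/', 'Link'));
// <a href="http://www.wiki.org/" rel="nofollow">Link</a>
```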
Turing tests
Various methods that require spam to be submitted by a human, by hand, have been attempted. A variety of captcha gateways have been implemented in an effort to prevent bots from submitting entries. Drawbacks are the annoyance they pose for regular users, the lack of any alternative for visually impaired users, and the ability of some advanced bots to fool simple captchas most of the time.
Specific anti-spam methods
Particularly popular software products such as Movable Type and MediaWiki have developed their own custom anti-spam measures, as spammers focus more attention on targeting those platforms. Whitelists and blacklists that prevent certain IP addresses from posting, or that prevent people from posting content that matches certain filters, are common defenses. More advanced access control lists require various forms of validation before users can contribute anything, link spam included.
The goal in every case is to allow good users to continue to add links to their comments, as that is considered by some to be a valuable aspect of any comments section.
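The whitelist and blacklist filtering described in this section might look something like the following sketch. The IP addresses, patterns, and the function name acceptComment are all made up for illustration; real products such as Movable Type or MediaWiki implement their filters differently and in far more detail.

```javascript
// Hypothetical filter data; a real site would load these from configuration.
const trustedIps = new Set(['192.0.2.10']);   // whitelist: always accepted
const blockedIps = new Set(['203.0.113.7']);  // blacklist: always rejected
const blockedPatterns = [/cheap-pills\.example/i, /casino/i];

// Return true if a comment from `ip` with body `text` may be posted.
function acceptComment(ip, text) {
  if (trustedIps.has(ip)) return true;
  if (blockedIps.has(ip)) return false;
  // Reject content that matches any blacklisted pattern.
  return !blockedPatterns.some((pattern) => pattern.test(text));
}

console.log(acceptComment('198.51.100.1', 'See http://www.wiki.org/'));         // true
console.log(acceptComment('198.51.100.1', 'Visit http://cheap-pills.example')); // false
console.log(acceptComment('203.0.113.7', 'Hello'));                             // false
```

Because ordinary links are not rejected outright, good users can still add links to their comments, which is the goal stated above.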
RSS feed searching
Some wikis provide an RSS feed of recent changes. If you add that feed to your news reader and set up a smart search for common spam terms (usually viagra and other drug names), you can quickly identify and remove the offending spam.
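The same smart search could be scripted rather than set up in a news reader. The sketch below assumes the feed has already been parsed into objects with title and description fields. That shape, the function name findSuspectChanges, and every term in the list other than viagra are assumptions made for the example.

```javascript
// Example spam terms; "viagra" comes from the text, the rest are placeholders.
const spamTerms = ['viagra', 'cialis', 'casino'];

// Return the feed entries whose title or description mentions a spam term.
function findSuspectChanges(items, terms) {
  const lowered = terms.map((term) => term.toLowerCase());
  return items.filter((item) => {
    const haystack = (item.title + ' ' + item.description).toLowerCase();
    return lowered.some((term) => haystack.includes(term));
  });
}

// Hypothetical, already-parsed entries from a recent-changes feed.
const recentChanges = [
  { title: 'Sandbox', description: 'Buy VIAGRA online http://spam.example/' },
  { title: 'Main Page', description: 'Fixed a typo in the introduction' },
];

for (const item of findSuspectChanges(recentChanges, spamTerms)) {
  console.log('Possible spam in: ' + item.title); // Possible spam in: Sandbox
}
```

Plain substring matching keeps the sketch short but will also flag legitimate edits that mention a listed term, so flagged entries still need a human review before removal.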
External links
- Anti-spam Features of MediaWiki
- Six Apart Comment Spam Guide, fairly broad overview from Movable Type's authors.
- The (Evil) Genius of Comment Spammers, an article on link spam from Wired magazine.
- Gilad Mishne, David Carmel and Ronny Lempel: Blocking Blog Spam with Language Model Disagreement, PDF. From the First International Workshop on Adversarial Information Retrieval (AIRWeb'05) Chiba, Japan, 2005.
- Spam Huntress The Norwegian Spam Huntress - Ann Elisabeth
|