Robots.txt

Found an interesting thing: robots.txt. After some research, it turns out to be a tool for telling robots (web crawlers) not to index certain pages or data on a website, so those pages are not exposed to search-engine users. For example, suppose your website root contains a directory called "securitydir" that you do not want internet users to find through search, while every other directory may be indexed by robots. To achieve this, place a robots.txt file in your website root and disallow "securitydir".
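For instance, a minimal robots.txt covering that case might look like this (each directive is explained below):

----------------
# Robots.txt for your website root
User-agent: *
Disallow: /securitydir
----------------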

The "robots.txt" file has two main directives: "User-agent" and "Disallow".

User-agent: the name of the robot that the record describes the access policy for.

For example:

User-agent: *

The record applies to all robots.

User-agent: abcd

The record applies only to the robot named "abcd".

Disallow: a full path, or a path prefix, that must not be visited; it can also name a single file.

For example:

Disallow: /securitydir

Disallows both /securitydir and /securitydir/index.html (any URL whose path starts with /securitydir).

Disallow: /securitydir/

Allows /securitydir.html but disallows /securitydir/index.html.

Disallow: /123abc.html

Disallows only the file /123abc.html from being retrieved.
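The prefix-matching behaviour above can be checked with Python's standard urllib.robotparser module. A minimal sketch, parsing the rules from literal lines so no web server is needed:

----------------
from urllib.robotparser import RobotFileParser

# Rules taken from the /securitydir/ example above.
rules = """\
User-agent: *
Disallow: /securitydir/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

base = "http://www.myexample.com"
# /securitydir.html does not start with the /securitydir/ prefix -> allowed
print(rp.can_fetch("*", base + "/securitydir.html"))        # True
# /securitydir/index.html starts with the prefix -> disallowed
print(rp.can_fetch("*", base + "/securitydir/index.html"))  # False
----------------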

Two complete examples:

1. No robot may retrieve any URL starting with "/user/admin" or "/tmp", nor the file /user.html:

----------------
# Robots.txt for http://www.myexample.com
User-agent: *
Disallow: /user/admin
Disallow: /tmp
Disallow: /user.html
----------------
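To double-check the record, the same urllib.robotparser approach works; the name "SomeBot" below is just a hypothetical stand-in for any crawler that honours the default record:

----------------
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /user/admin
Disallow: /tmp
Disallow: /user.html
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

base = "http://www.myexample.com"
# "SomeBot" is a hypothetical crawler name; it matches the * record.
for path in ("/user/admin/db", "/tmp/x", "/user.html", "/index.html"):
    print(path, rp.can_fetch("SomeBot", base + path))
# Prints False for the first three paths and True for /index.html.
----------------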

2. No robot may retrieve any URL starting with "/user/admin" or "/tmp", except the robot named "Ican", which may retrieve everything:

----------------
# Robots.txt for http://www.myexample.com
User-agent: *
Disallow: /user/admin
Disallow: /tmp

User-agent: Ican
Disallow:
----------------
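Again assuming urllib.robotparser, a quick sketch shows that the empty Disallow line gives "Ican" free rein while other robots stay blocked ("SomeBot" is again hypothetical):

----------------
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /user/admin
Disallow: /tmp

User-agent: Ican
Disallow:
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

base = "http://www.myexample.com"
# "SomeBot" (hypothetical) falls under the * record -> blocked.
print(rp.can_fetch("SomeBot", base + "/tmp/x"))  # False
# "Ican" matches its own record; an empty Disallow allows everything.
print(rp.can_fetch("Ican", base + "/tmp/x"))     # True
----------------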

Try it. ^^

