Web Page Similarity Searching Based on Web Content

Satiabudhi, Gregorius and Andjarwirawan, Justinus and SETIADI, RUBIA SARI (2012) Web Page Similarity Searching Based on Web Content. In: 3rd International Conference on Soft Computing, Intelligent System and Information Technology, 24-05-2012 - 25-05-2012, Kuta Bali - Indonesia.


    The application discussed in this paper finds web pages whose content is similar to that of the web page at a given URL. An automated process for crawling web pages was also developed; once activated, the crawler runs continuously. A search begins with the user entering a URL; the page at that URL is extracted to obtain the keywords that represent it. The keywords are reduced to their base forms using the Porter Stemmer algorithm. The TF-IDF method is used to determine the importance of each keyword, and the Jaccard coefficient is then used to measure the similarity between web pages. The application is limited to web pages in English. Based on the test results, it is concluded that the application works well and is usable.
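
    The pipeline described in the abstract (keyword extraction, stemming, TF-IDF weighting, Jaccard comparison) can be illustrated with a minimal Python sketch. It is not the authors' implementation. All function names are hypothetical, the TF-IDF variant (relative term frequency times log(N / df)) is one common choice rather than the one confirmed by the paper, and `crude_stem` is a simplified suffix stripper standing in for the Porter Stemmer, which has considerably more rules.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(word):
    """Assumption: a crude stand-in for the Porter Stemmer.

    It only strips a few common English suffixes and keeps at least
    three characters of the word.
    """
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of term lists. Weight = (count / doc length) * log(N / df),
    where df is the number of documents containing the term.
    """
    n = len(docs)
    df = Counter()
    for terms in docs:
        df.update(set(terms))
    weights = []
    for terms in docs:
        counts = Counter(terms)
        total = len(terms)
        weights.append(
            {t: (c / total) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return weights


def top_keywords(weights, k):
    """Pick the k highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return {term for term, _ in ranked[:k]}


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two keyword sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Hypothetical page texts; a real system would fetch them with a crawler.
    pages = [
        "Crawling web pages and indexing crawled pages",
        "A crawler indexes web pages for searching",
        "Recipes for baking bread and cakes",
    ]
    docs = [[crude_stem(t) for t in tokenize(p)] for p in pages]
    keywords = [top_keywords(w, 3) for w in tf_idf(docs)]
    for i in range(1, len(pages)):
        print(f"page 0 vs page {i}:", jaccard(keywords[0], keywords[i]))
```

    Comparing only the top-k keywords per page keeps the sets small. The value of k (3 here) is an arbitrary choice for the example.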

    Item Type: Conference or Workshop Item (Paper)
    Uncontrolled Keywords: Web Page Similarity, Crawler, TF-IDF, Porter Stemmer, Jaccard Coefficient, Keyword Extraction
    Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
    Divisions: Faculty of Industrial Technology > Informatics Engineering Department
    Depositing User: Admin
    Date Deposited: 05 Nov 2012 17:22
    Last Modified: 05 Nov 2012 17:22
    URI: http://repository.petra.ac.id/id/eprint/15790
