This is totally from memory, and untested, but it should get you
started. Requires:
Python http://www.python.org/
WebLog classes http://www.pobox.com/~mnot/script/python/WebLog/
To use: cat access_log | ./script_filename
<----- cut here ----->
#!/usr/bin/env python
from weblog import squid, url
import sys

def make_domain(host):
    ''' Take a fully-qualified hostname and return the domain. Many
    ways to do this; this one keeps two labels for the common
    top-level domains and three otherwise. '''
    import string
    parts = string.split(host, '.')
    if parts[-1] in ['com', 'org', 'net', 'edu', 'gov', 'mil']:
        return string.join(parts[-2:], '.')
    else:
        return string.join(parts[-3:], '.')

o_log = squid.AccessParser(sys.stdin)
log = url.Parser(o_log)
domains = {}
while log.getlogent():
    # Compare strings with !=, not 'is not' (identity test)
    if log.log_tag != 'TCP_DENIED':
        domain = make_domain(log.url_host)
        # .get() avoids a KeyError the first time a domain is seen
        domains[domain] = domains.get(domain, 0) + 1

for domain in domains.keys():
    print domain
<----- cut here ----->
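If you don't have the WebLog classes handy, the same idea can be sketched in plain Python with no dependencies, parsing the native access.log format shown in the quoted message directly. This is an untested-against-real-logs sketch under that assumption; `domains_from_log` and `make_domain` are illustrative names, and the TLD heuristic is the same rough one as above, not a real public-suffix lookup.

```python
from urllib.parse import urlparse

COMMON_TLDS = ('com', 'org', 'net', 'edu', 'gov', 'mil')

def make_domain(host):
    # Keep two labels for the common TLDs (www.excite.com -> excite.com),
    # three otherwise -- a rough heuristic, not a public-suffix lookup.
    parts = host.split('.')
    n = 2 if parts[-1] in COMMON_TLDS else 3
    return '.'.join(parts[-n:])

def domains_from_log(lines):
    '''Yield each unique domain from squid native access.log lines,
    skipping TCP_DENIED entries.  Native-format fields are:
    time elapsed client tag/status size method URL ident hierarchy type.'''
    seen = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue                       # skip malformed lines
        tag = fields[3].split('/')[0]      # 'TCP_HIT' from 'TCP_HIT/200'
        if tag == 'TCP_DENIED':
            continue
        host = urlparse(fields[6]).hostname
        if host is None:
            continue                       # URL field with no hostname
        domain = make_domain(host)
        if domain not in seen:             # emit each domain once
            seen[domain] = 1
            yield domain
```

A driver is just `for d in domains_from_log(sys.stdin): print(d)`, run as `./script < access.log`.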
> -----Original Message-----
> From: Francis A. Vidal [mailto:francis@usls.edu]
> Sent: Monday, October 05, 1998 3:45 PM
> To: Squid Users List
> Subject: OFF-TOPIC: Help on script
>
>
> hello everyone,
>
> i'm trying to build a list of sites that i want to ban. i'm getting the
> list from the logfile of all the sites that have been visited by all
> users.
>
> this is the format of the logfile:
>
> 907389399.705 61 192.168.2.57 TCP_HIT/200 2172 GET
> http://www.excite.com/pfp/excite/images/big_logo.gif - NONE/-
> image/gif
>
> can someone help me on creating a script that will extract all domains
> that have no TCP_DENIED tag to a file, with no duplicates? i'm not
> familiar with sed, gawk or perl, so i need your help on this.
>
> i would like the format to be (from the above example) one domain per
> line:
>
> excite.com
>
>
>
> thanks!
>
> ---
> u s l s N E T university of st. la salle, bacolod city, philippines
> . . . . . . . PGP key at ftp://ftp.usls.edu/pub/pgpkeys/francis.pgp
> francis vidal tel. nos. (6334).435.2324 / 433.3526
>
Received on Sun Oct 04 1998 - 23:06:31 MDT
This archive was generated by hypermail pre-2.1.9 : Tue Dec 09 2003 - 16:42:20 MST