Re: POESIA - an opensource Internet content filtering project

From: Alex Rousskov <rousskov@dont-contact.us>
Date: Fri, 15 Feb 2002 09:38:57 -0700 (MST)

Dear Basile,

        Given your interest in response bodies in addition to URLs, I
would suggest looking at the ICAP protocol. If possible, designing
around an existing protocol such as ICAP is probably better than
inventing your own. If nothing else, it will buy you compatibility
with other intermediaries that [will] support ICAP.
        
 http://www.i-cap.org/home.html
 http://www.google.com/search?q=%22Internet+Content+Adaptation+Protocol%22
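
To give a rough idea of what talking ICAP involves, here is a minimal
sketch of how a proxy might wrap one HTTP response into an ICAP/1.0
RESPMOD request. The service URL (icap://filter.example.net/poesia), the
function name build_respmod, and the sample headers are assumptions made
up for illustration; nothing here is Squid or POESIA code.

```python
from urllib.parse import urlsplit

CRLF = b"\r\n"


def build_respmod(service_url: str, req_hdr: bytes, res_hdr: bytes,
                  body: bytes) -> bytes:
    """Wrap one HTTP request/response pair into an ICAP RESPMOD request.

    req_hdr and res_hdr are raw HTTP header blocks, each ending with an
    empty line. The Encapsulated header lists the byte offset of every
    part, counted from the start of the ICAP message body.
    """
    for block in (req_hdr, res_hdr):
        if not block.endswith(CRLF + CRLF):
            raise ValueError("encapsulated headers must end with an empty line")

    offsets = [("req-hdr", 0), ("res-hdr", len(req_hdr))]
    body_offset = len(req_hdr) + len(res_hdr)
    if body:
        # The encapsulated body is sent with chunked transfer coding:
        # hex size, CRLF, data, CRLF, then a terminating zero-size chunk.
        offsets.append(("res-body", body_offset))
        size_line = f"{len(body):x}".encode("ascii")
        chunked = size_line + CRLF + body + CRLF + b"0" + CRLF + CRLF
    else:
        offsets.append(("null-body", body_offset))
        chunked = b""

    encapsulated = ", ".join(f"{name}={off}" for name, off in offsets)
    icap_head = (
        f"RESPMOD {service_url} ICAP/1.0\r\n"
        f"Host: {urlsplit(service_url).netloc}\r\n"
        f"Encapsulated: {encapsulated}\r\n"
        "\r\n"
    ).encode("ascii")
    return icap_head + req_hdr + res_hdr + chunked


# Hypothetical example: one small HTML page sent to a made-up filter service.
example = build_respmod(
    "icap://filter.example.net/poesia",
    b"GET /page.html HTTP/1.1\r\nHost: www.example.org\r\n\r\n",
    b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n",
    b"<html>hello</html>",
)
```

The ICAP server would answer with either the (possibly modified)
response or a short "no modifications needed" reply, so a filter built
this way can stay outside the proxy process.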

If you decide to follow the ICAP route, your team will need to add
ICAP support to Squid. I am sure developers on this list will at least
point you to key places to start looking. You may also want to look at
previous postings on the squid-dev and squid-users mailing lists. The
latter is searchable:
        http://list.cineca.it/cgi-bin/wa?S2=squid&q=ICAP
        http://list.cineca.it/cgi-bin/wa?A2=ind0103&L=squid&P=R105601

Spending about 1 million Euro per year to "protect European youth from
harmful Internet content" is an impressive burn rate. I can only hope
that the future users of POESIA will define "harm" reasonably.

Alex.

On Fri, 15 Feb 2002, STARYNKEVITCH Basile wrote:

>
> [[an email to the Squid cache developers' mailing list, with copy to the
> POESIA mailing list]]
>
> Dear All Squid Developers,
>
> It is my pleasure to announce (as the technical coordinator) to the
> Squid developers team the start of the POESIA project
>
> Public Opensource Environment for a Safer Internet Access
> (IAP2117/27572)
>
> POESIA is an opensource (using the GNU General Public Licence)
> Internet content filtering project, with partial funding from the
> European Commission, under the Safer Internet Action Plan (INFOSOC
> DG) = IAP. The total POESIA project budget is more than 1.9 million
> Euro, with E.C. funding of 1.02 million Euro. The motivations of the
> European Safer Internet Action Plan include protection of European
> youth from harmful Internet content.
>
> The POESIA project started on February 4th, 2002 and should last 24
> months.
>
> The abstract of the project is available on the following European
> Commission page:
>
> http://www.europa.eu.int/information_society/programmes/iap/projects/filtering/poesia/index_en.htm
>
> The following 2 paragraphs are copied from the above-mentioned page:
>
> Development covers the creation of a library of filtering components,
> and the extension of existing Internet related open-source software to
> use this library. Library components will provide a set of two-layered
> (crude/elaborate) filtering functions covering multiple filtering
> modes (e.g. images, natural language text, URLs, etc). Adaptative
> decision taking mechanisms will combine the output of these components
> to deliver a final filtering decision. POESIA uses caching (extending
> the open-source Squid cache) both for Internet content and for
> filtering scores, enabling mutualization of filtering costs and hence
> the use of more expensive filtering techniques. Communication
> mechanisms will be developed so that several POESIA systems in the
> same area can communicate to share their cached contents and scores.
>
> Filtering will cover a range of modes, including image filtering,
> natural language text filtering, URL, PICS and JavaScript
> filtering. [...]
>
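
One possible reading of the two-layered (crude/elaborate) scheme quoted
above is a cascade: run the cheap filter first and pay for the expensive
one only when the cheap score is inconclusive. The sketch below assumes
scores in [0, 1], made-up thresholds, and a plain average as the way to
combine outputs; the abstract specifies none of these, and all names are
hypothetical.

```python
from typing import Callable, Tuple

# A filter maps text to a score in [0, 1]; higher means "more likely harmful".
Filter = Callable[[str], float]


def cascade(text: str, crude: Filter, elaborate: Filter,
            low: float = 0.2, high: float = 0.8) -> Tuple[float, bool]:
    """Return (final score, whether the elaborate filter was used)."""
    crude_score = crude(text)
    if crude_score <= low or crude_score >= high:
        # The crude layer is confident enough; skip the expensive layer.
        return crude_score, False
    # Inconclusive: combine both layers (here, an unweighted average).
    return (crude_score + elaborate(text)) / 2, True


elaborate_calls = []


def toy_crude(text: str) -> float:
    # Stand-in for a keyword or URL list lookup.
    if "forbidden" in text:
        return 0.9
    return 0.5 if "maybe" in text else 0.0


def toy_elaborate(text: str) -> float:
    # Stand-in for natural language or image analysis.
    elaborate_calls.append(text)
    return 0.1


clear_case = cascade("forbidden stuff", toy_crude, toy_elaborate)
unclear_case = cascade("maybe fine", toy_crude, toy_elaborate)
```

In this toy run only the second text reaches the elaborate layer, which
is where caching its score would save the most work.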
> It should be noted that POESIA will incorporate highly innovative
> technologies (including natural language processing, image processing,
> static analysis), well ahead of the usual positive|negative URL or
> keywords lists techniques used in most other filters.
>
> The POESIA project will soon have its web site at
> www.poesia-filter.org - this web site will probably be available in
> March 2002.
>
> POESIA's typical use should be in educational settings, for instance as
> a proxy&firewall&filter between an Internet connection and a
> classroom. POESIA aims to run on a Linux PC.
>
> POESIA will (very probably) extend the Squid cache. Since the project
> has just begun, we do not yet have a definitive architectural design
> of POESIA.
>
> My first tentative impressions (looking into Squid-cache version
> squid-2.5.PRE3-20020210) are that we might consider a shallow
> extension of Squid cache which:
>
> communicates with the POESIA master process (e.g. through Unix named
> pipes), which does the bulk of the content filtering
>
> stores and manages the POESIA filtering scores in addition to the cached
> content
>
> sends (when so requested) to the POESIA master process any needed
> content
>
> sends (when available) to the POESIA master process the filtering
> scores, or otherwise requests the POESIA master process to compute
> them
>
> receives from the POESIA master process a filtering decision
> (accept/reject a content) - so that the filtering decision is viewed from
> Squid as an extension of Squid's access control lists
>
> communicates with other fellow Squid+Poesia systems
>
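
The flow listed above can be sketched roughly as follows: scores are
kept next to the cached content, the master process is asked only on a
miss, and the result is reduced to an accept/reject answer that an ACL
check could consume. The master process is replaced here by a plain
Python callable and the named-pipe transport is left out entirely; the
class ScoreCache, the 0.5 threshold, and toy_master are assumptions for
illustration, not Squid or POESIA interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ScoreCache:
    """Keeps one filtering score per URL and turns it into a decision."""
    ask_master: Callable[[str, bytes], float]  # stand-in for the master process
    reject_at: float = 0.5                     # hypothetical threshold
    scores: Dict[str, float] = field(default_factory=dict)

    def decide(self, url: str, content: bytes) -> str:
        if url not in self.scores:
            # Score not cached yet: send the content to the master once.
            self.scores[url] = self.ask_master(url, content)
        return "reject" if self.scores[url] >= self.reject_at else "accept"


master_calls: List[str] = []


def toy_master(url: str, content: bytes) -> float:
    # Records each request so the effect of caching is visible.
    master_calls.append(url)
    return 1.0 if b"blocked-word" in content else 0.0


cache = ScoreCache(ask_master=toy_master)
first = cache.decide("http://www.example.org/a", b"plain page")
second = cache.decide("http://www.example.org/a", b"plain page")
third = cache.decide("http://www.example.org/b", b"has blocked-word inside")
```

The second lookup for the same URL is answered from the cached score, so
the toy master is consulted only once per URL.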
> Is the Squid cache developer team interested in having such extensions
> in Squid (or does POESIA have to fork its own branch of Squid)?
>
> Do the few ideas above appear compatible with the current Squid
> design?
>
> Given the above tentative ideas, which parts of Squid should be patched
> (we have already begun working on this, but would obviously appreciate
> any help or hints)?
>
> Regards
>
> N.B. Any opinions expressed here are only mine, and not those of my organization.
> N.B. Les opinions exprimees ici me sont personnelles et n engagent pas le CEA.
>
> ---------------------------------------------------------------------
> Basile STARYNKEVITCH ---- Commissariat à l Energie Atomique * France
> DRT/LIST/DTSI/SLA * CEA/Saclay b.528 (p111f) * 91191 GIF/YVETTE CEDEX
> work email: Basile point Starynkevitch at cea point fr
> home email: Basile at Starynkevitch point net
>
>
Received on Fri Feb 15 2002 - 09:39:03 MST
