[ic] Froogle.google.com anyone using this yet?

interchange-users@icdevgroup.org
Fri Dec 20 08:46:00 2002


On Fri, Dec 20, 2002 at 12:31:06AM -0500, Philip S. Hempel wrote:
> Has anyone gotten involved in using froogle?
> http://froogle.google.com
> 
> I have an account and am about to create the format
> that is in the pdf sent to me from google.

We're going there with all our catalogs.  I've still not
gotten the upload format from them.

> One of the criteria is that the descriptions
> must not contain any HTML code in the data upload.
> 
...

> Basic File Format
>  The basic file format has the following required parameters:
>  Tab-delimited text file
>  First line of the file is the header -- must contain field names, all
>  lower-case
>  Use the field names from the table below, and in the same column order
>  One line per item (use a newline or carriage return to terminate the line)
>  File encoding is LATIN1 (ASCII is fine, as it is a subset of LATIN1)
>  The following field elements are forbidden as part of the basic format. If 
>  you want to include them, you must use the extended format. If you 
> accidentally include them as part of the basic format, products that 
> contain errors will be dropped from the feed.
>  Tabs, carriage returns, or newline characters may not be included inside
>  any field, including the description.
>  Exactly one tab must separate each field. If there are extra tabs inserted
> between fields in a line, or at the end of a line, that product will be 
> dropped.
>  HTML tags, comments, and escape sequences may not be included --
>  description must be plain text.

That sounds easy.  Better than Amazon's "tab delimited" format.
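For reference, the quoted rules boil down to something like this -- a
minimal sketch in Python (the field names and sample values here are
placeholders, not Google's actual column list):

```python
# Write a basic-format feed: tab-delimited, LATIN1 encoding, header line first.
# FIELDS below is hypothetical -- use the names and column order from
# the table Google sends you.
import re

FIELDS = ["product_url", "name", "description", "price"]

def feed_line(values):
    """Join one record's fields, scrubbing characters the format forbids."""
    cleaned = []
    for v in values:
        v = re.sub(r'<[^>]*>', '', v)    # no HTML tags
        v = re.sub(r'[\t\r\n]', ' ', v)  # no tabs, CRs, or newlines in a field
        cleaned.append(v.strip())
    return "\t".join(cleaned)            # exactly one tab between fields

with open("feed.txt", "w", encoding="latin-1") as f:
    f.write(feed_line(FIELDS) + "\n")    # header: lower-case field names
    f.write(feed_line(["http://example.com/1", "Widget",
                       "<b>Big</b>\tred widget", "9.99"]) + "\n")
```

Writing with a latin-1 codec also enforces the encoding rule: any
character outside LATIN1 will raise an error instead of slipping into
the feed.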

> I am considering a couple of ways to do this and need some suggestions.
> 
> 1. Do a sql dump of the fields I need and run a script over the data
> to clean out html and other characters not allowed.
> 
> 2. Work this out through IC and produce the clean descriptions with IC.
> 
> I would almost think that it would be easier with IC with its many ways
> of filtering content.

You could just do the mysqldump and filter.  It would also be
very straightforward with Perl DBI, since you already have the data in SQL.
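The dump-and-filter route can be sketched as a small filter over the
dump lines -- shown here in Python for illustration (the thread's
suggestion is Perl DBI), with a hypothetical one-record-per-line,
tab-separated layout:

```python
# Strip HTML and forbidden characters from a tab-separated dump.
# Assumes one record per line; the column layout is hypothetical.
import re

def clean(field):
    field = re.sub(r'<[^>]*>', '', field)      # drop HTML tags
    field = re.sub(r'&[#\w]+;', ' ', field)    # drop HTML escape sequences
    return re.sub(r'\s+', ' ', field).strip()  # collapse stray whitespace

def filter_dump(lines):
    return ["\t".join(clean(f) for f in line.rstrip("\n").split("\t"))
            for line in lines]
```

Feed it the lines of the dump (a file handle works) and write the
result back out with a LATIN1 encoding.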

[The problem you have is the HTML in the database.  That makes
it really hard to reuse.  You might want to consider ways of
getting HTML out of your raw data.]

use DBI;

# DSN, credentials, and field names here are hypothetical:
my $dbh = DBI->connect('dbi:mysql:catalog', 'user', 'pass');
my $sth = $dbh->prepare('SELECT sku, description, price FROM products');
$sth->execute;
while (my @row = $sth->fetchrow_array) {
    print join("\t", @row), "\n";
}

# ...then send the file.

The hard part is the send.  We do this to half a dozen aggregators now
and none of them are the same.  Most of them are browser based.  One or
two ftp.  Rsync would be nice, especially where there might be lots of
images involved.  :-)


-- 

Christopher F. Miller, Publisher                               cfm@maine.com
MaineStreet Communications, Inc           208 Portland Road, Gray, ME  04039
1.207.657.5078                                         http://www.maine.com/
Content/site management, online commerce, internet integration, Debian linux