NSLU2-Linux
view · edit · print · history

I've a problem like most *FOSS* folk, I don't like "not having the option" of what I can't have (on my platform of choice).

Well, even in the different Linux distros, not all things are [compiled] equally. The NSLU2 Package search is nice, but I need a SQL prompt for this endeavor. This project can even be done on the NSLU2 itself - but doesn't have to. I wrote & tested it under Cygwin, while I use it on a Ubuntu machine.

Armed with a itch to scratch, Perl with SQLite, wget/curl and a shell prompt, I've created some instruction how you can datamine the Packages files for apps. Maybe you need a more powerful search prompt. Or maybe you're like me and are considering what's worth compiling from the other NSLU2-Linux "distros". For me, I saw several packages on the OpenWRT? not in Optware I wanted try out without converting to OpenWRT? (yet).

Next I thought about how to share any resulting packages I might compile. I figured, anything I build for should be Optware as almost all the NSLU2 community can optional tap that feed. You might want to consider it too.

http://www.nslu2-linux.org/wiki/Optware/AddAPackageToOptware

This might also be a way for you find that perfect, simple project to start your porting experience with. Let's continue to finding what could be ported to Optware.

First, we need to make directory for the files & scripts being leveraged.

   mkdir datamine-ipkgdb
   cd datamine-ipkgdb

Second, save the attached scripts there (located at the bottom of this page)

Both of these scripts require the Perl modules: DBI, DBD::SQLite, Data::Dumper. You may need to install DBI & DBD::SQLite using ipkg, your desktop's repository of choice or CPAN. Try your repository for a pre-compiled version first. If using CPAN, you'll need a compiler to build these modules.

   build-ipkgdb.pl 
   quickquery.pl

Also, you'll need some sort of SQL prompt to get your data out. I've included the script quickquery.pl for command line usage without installing SQLite. Its stdout is TAB delimited unless you turn on DEBUG for Data Dumper output.

If you rather put it into a MySQL or PostreSQL database, change your DBD connector and CREATE TABLE command. Watch out for the refresh via file deletion later though.

Third, get some Packages files for processing.

I'm using wget but the equivalent can be done in curl. Be sure to preserve the mirrored path. This is important since the Path contains information about being unstable, cross compiled/native, etc. Add or remove feeds you are interested in.

wget -m -np -c \
ipkg.nslu2-linux.org/feeds/openwrt/kamikaze-7.09/Packages \
ipkg.nslu2-linux.org/feeds/openwrt/unstable/Packages \
ipkg.nslu2-linux.org/feeds/optware/nslu2/cross/stable/Packages \
ipkg.nslu2-linux.org/feeds/optware/nslu2/cross/unstable/Packages \
ipkg.nslu2-linux.org/feeds/optware/nslu2/native/stable/Packages \
ipkg.nslu2-linux.org/feeds/optware/nslu2/native/unstable/Packages \
ipkg.nslu2-linux.org/feeds/optware/openwrt-ixp4xx/cross/unstable/Packages \
ipkg.nslu2-linux.org/feeds/optware/slugosbe/cross/unstable/Packages \
ipkg.nslu2-linux.org/feeds/slugosbe/cross/4.8-beta/Packages \
ipkg.nslu2-linux.org/feeds/slugosbe/cross/stable/Packages \
ipkg.nslu2-linux.org/feeds/slugosbe/cross/unstable/Packages \
ipkg.nslu2-linux.org/feeds/slugosbe/oe/Packages \
ipkg.nslu2-linux.org/feeds/slugosbe/unstable/Packages 

Fourth, run the script:

   find ipkg.nslu2-linux.org -name Packages | xargs perl ./build-ipkgdb.pl

On my 1.6 Ghz dual core, it took around 15 seconds to build a 21 meg database from 26,243 package details from the above feeds.

Please note: There will be near duplicate entries occasionally if a package appears in multiple Package files ... and that's okay. Just tuck that info away for now.

BTW, WARNING! I've written the script to delete the original database and recreate it every time so it's fresh and compact. If you commit any changes to your copy, you may want to back it up, rename it, whatever. It's faster than dropping the table and making another one without having to worry if I need repack the database.

Fifth, almost there, getting to the data:

GUI interface to SQLite3 files can be found here (some will even display graphics/content stored in a BLOB - cool):

   http://www.sqlite.org/cvstrac/wiki?p=ManagementTools

Personally, I've been using the Firefox extension "SQLite Manager" quite well. The author, Mrinal Kant, updates it regularly and is actively engaged in his forum for it.

   https://addons.mozilla.org/en-US/firefox/addon/5817

For those wanting a traditional SQL prompt, you can install SQLite3 itself. But, as I mentioned in Step 2, I've included quickquery.pl. It's a simple Perl script to execute a single SQL command against the created database.

perl ./quickquery.pl "select package, version, architecture, description from ipkgs where package like 'ipkg%';"

The output is a Perl Data Dump. Redirect as necessary to | more or > results.txt.

Finally, let's go digging for gold.

The CREATE TABLE lists out the schema. For those new to SQL, these are the fields/keys/columns you can search against.

CREATE TABLE ipkgs (id INTEGER PRIMARY KEY, Package, Version, Depends, Provides, Suggests, Recommends, Replaces, Conflicts, Section, Architecture, Maintainer, MD5Sum, Size, Filename, Source, Description, OE, HomePage, Priority, xdistro, xplatform, xcompiled, xstate)

The "x-----" columns are generated from the URL paths in step 3 - hence the mirroring.

Here's some practical example SQL statements. Let's start with Music.

You are wanting to find packages that handle MP3 files in all the distros:

select package, version, xdistro, xstate from ipkgs where description like '%mp3%' order by package asc

What if you wanted to find all the Streaming packages in OpenWRT?, but not in Optware or the SlugOS/BE feeds (the word Stream could be capitalized so I'll leave off "S" for simplicity):

SELECT package, version, description FROM ipkgs where 
description like '%tream%'
and xdistro = 'openwrt' 
and xstate = 'unstable' 
and package not in (
    select package from ipkgs 
    where xdistro = 'optware' or xdistro = 'slugosbe'
) 

Some of these queries could have lots of OR logic when wild carding (you can't easily wild card using an "in" list). It's not a problem unless you need to group your OR with parens to control precedence. I'll let you toy around with search for all things audio/music based if you're interested. Descriptions could contain any of the following terms and then some:

   aac, audio, mp3, media, music, ogg, sound, stream, wave

You can find a fair about of SQL help here:

   http://www.sqlite.org/lang_expr.html

As for all the packages in OpenWRT? not found in Optware or SlugOS(/BE), here's a number to chew on. 779 unique "unstable" packages. Not all of them should be ported. Some of them will have the package name slightly different from how the other feeds represent them. But, there's quite a few listed that would make a nice addition to the Optware feeds.

SELECT count(package) FROM ipkgs where 
xdistro = 'openwrt' 
and xstate = 'unstable' 
and package not in (select package from ipkgs where xdistro = 'optware' or xdistro = 'slugosbe') 
order by package asc

Script Enhancements

Here's some future enhancements that would be nice. If you get to them first, please update the above script for others to use.

  • Cleaning up this hack. (Does one really need to optimize a hack like this? ;-)
  • Process the gzip'd Packages.gz instead
  • Simplify and automate the script so it leverages your installed /etc/ipkg.conf, /opt/etc/ipkg.conf, cached Package files and it's own config file for those feeds not in your ipkg.conf files.
  • Should it automatically get & mirror the Package/Package.gz files?

The scripts

build-ipkgdb.pl

#!/usr/bin/perl

use strict;
use Data::Dumper;
use DBI; # I'll be using DBD::SQLite for simplicity & portability/sharability

# Feel free to edit these two. If you change the $dbname you'll
# also want to change the quickquery.pl also.
my $dbname = qw(./ipkgdb.sqlite3);
my $DEBUG = undef;

#----------------------------------------
# Tinkerers can muck up beyond this point
#----------------------------------------
$Data::Dumper::Indent = "1"; # Keep output readable, but use less space

my @fields = qw(Package Version Depends Provides Suggests 
    Recommends Replaces Conflicts Section Architecture 
    Maintainer MD5Sum Size Filename Source Description 
    OE HomePage Priority);
my @pathinfo   = qw(xdistro xplatform xcompiled xstate);
my $ipkgkeys   = join( "|", @fields );
my $reipkgkeys = qr/$ipkgkeys/;
my $redistro   = qr/openwrt|optware|slugosbe/;
my $replatform = qr/nslu2|openwrt-ixp4xx|kamikaze-7.09|oe/;
my $recompiled = qr/cross|native/;
my $restate    = qr/4.8-beta|stable|unstable/;

my $dbh;

# Remove the previous database.
unlink $dbname;
if ( !-f $dbname ) { `touch $dbname`; }

$dbh =  DBI->connect( "dbi:SQLite:dbname=$dbname", "", "",
    { RaiseError => 1, AutoCommit => 0 } )
    or die
    "ERROR: Non-existant $dbname or other error.\nERROR: $!\n";

push( @fields, @pathinfo );
my $columns = join( ",", @fields );
$dbh->do("CREATE TABLE ipkgs (id INTEGER PRIMARY KEY, $columns)");
$dbh->commit;

# split string for web appearance
my $sql ="INSERT INTO ipkgs VALUES ";
$sql .= "(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)";

my $sth = $dbh->prepare($sql);

my $aofh = undef; # for Debugging data with Dumper
my $h = {};
my $i = 0;

foreach my $file (@ARGV) {
    print "Processing: $file\n";
    open( FH, "$file" );
    my $k;

    while (<FH>) {
        chomp;
        if ( $_ =~ /^($reipkgkeys):\s+?(.*)/x ) {
            $k = $1;
            my $v = $2;
            $h->{$k} = $2;

            # Due to scope issues affecting the $h structure, these have
            # been placed in this block. The regex switch 'x' is added to
            # to make up for performance loss - being non-changing.
            if ( $file =~ /$redistro/x )   { $h->{xdistro}   = $&; }
            if ( $file =~ /$replatform/x ) { $h->{xplatform} = $&; }
            if ( $file =~ /$recompiled/x ) { $h->{xcompiled} = $&; }
            if ( $file =~ /$restate/x )    { $h->{xstate}    = $&; }
        }
        elsif ( $_ =~ /^(\s+?.+)/ ) {
            $h->{$k} .= $2;    # cat the subsequent line on to the previous key.
        }
        elsif (( $_ =~ /^$/ ) && (defined $h)){
                push(@$aofh, $h) if (defined $DEBUG);
                # Setting up our column values to insert
                my $col = 1;
                $sth->bind_param( $col, undef );
                foreach (@fields) {
                    $col += 1;
                    $sth->bind_param( $col, $h->{$_} );
                }
                $sth->execute();

                if ($@) {
                    warn "Database error: $DBI::errstr\n";
                    $dbh->rollback(); # just die if rollback is failing
                }
            $h = undef;
        }
    }
    $dbh->commit(); # Commit this Package file's info to the table
    close(FH);
}
$sth->finish();
$dbh->disconnect();
print Dumper $aofh if (defined $DEBUG);

quickquery.pl

#!/usr/bin/perl

use strict;
use Data::Dumper;
use DBI; # I'll be using DBD::SQLite for simplicity & portability/sharability

my $dbname = qw(./ipkgdb.sqlite3);
my $DEBUG = undef;

$Data::Dumper::Indent = "1"; # Keep output readable, but use less space

my $dbh =  DBI->connect( "dbi:SQLite:dbname=$dbname", "", "",
    { RaiseError => 1, AutoCommit => 0 } )
    or die
    "ERROR: Non-existant $dbname or other error.\nERROR: $!\n";

foreach (@ARGV) {
    chomp;
    my $rows = $dbh->selectall_arrayref($_);
    if ( ! defined $DEBUG) {
        # output TAB delimited
        foreach my $row (@$rows) { print join("\t",@$row) . "\n"; }
    }
    else { print Dumper $rows ; }
}

$dbh->disconnect();
view · edit · print · history · Last edited by MarkStinson.
Originally by MarkStinson.
Page last modified on January 09, 2008, at 09:43 PM