Proposal for LAIR: Lisp All-Inclusive Repository

====
LAIR
====

I propose that the Lisp community build the Lisp All-Inclusive Repository (LAIR). In a nutshell, it's like CPAN: the place to go to get libraries and such. It's "all-inclusive" in the sense of providing this service for all members of the Lisp family, including Scheme and Dylan and Clojure.

I wrote this after being inspired by the "Clornucopia" idea from Clozure Associates LLC. However, this is my own proposal. I am grateful for Clozure's inspiration, but I'm speaking only for myself.

Quotation
=========

"Put the source code out there and let all persons play with it. Have a person in charge who is a quick judge of good work and who will take it in and shove it back out fast. You don't have to use what he ships, and you don't have to give back your work, but he gives all persons a fast way to spread new code to those who want it."

-- Guy Steele, "Growing a Language", Oct 1998

The High-level Goals
====================

Make Lisp a more attractive choice of language(s), by helping to solve one of its most important perceived drawbacks: the lack of a robust set of libraries and a way to find them and select among them. (Different Lisp family members suffer from this problem to a greater or lesser degree.) Increase the size of the Lisp community.

As the size of the Lisp community increases, we'll get more libraries, more tools, better testing, and more portability. Lisp will be perceived less as "dead" and more as "constantly improving". All of this will, in turn, grow the community even more, which will get us more and better libraries.

Provide the kind of benefits that CPAN provides for the Perl community; be just as good as CPAN at the things it does. This repository can be thought of as "CPAN For Lisp".

The Problem We're Addressing
============================

The most prominent complaint about Common Lisp in particular, and a main barrier to choosing Common Lisp, is the perceived lack of libraries for commonly-needed abilities. Here are some of the ones mentioned most often, although this is not an exhaustive list:

* Extensible (user-defined) streams
* Threads and concurrency control mechanisms
* Modern networking: e.g. sockets, TCP and HTTP client and server, URLs, email, etc.
* Web Services (WSDL, WS-*, etc.)
* Relational database access
* Persistence (Lisp-friendly)
* Meta-object protocol (for the whole language, not just CLOS)
* System definition facility
* Better module facility
* More general-purpose access to the operating system's facilities
* XML
* Math
* Graphics
* GUI frameworks, platform-independent
* Ajax
* Text manipulation
* High-performance (asynchronous) I/O
* Access to printers
* Internationalization
* Unicode strings
* Generating HTML
* X Window System
* Foreign function interface
* Regular expressions
* Delivery of packaged-up applications

Is the perception reality? There are a lot of libraries available at sites like cl-user.net and cliki.net. But there are many problems for users:

* Users have to know where to find sites like cl-user.net and cliki.net
* Then they have to pick which library to use, wondering:
  * Which ones are actively maintained?
    * Are bugs being fixed?
    * Is it kept up-to-date with the changing ecosystem?
  * Which ones are well-debugged?
  * Which ones are high-performance?
  * Is the documentation good?
* Not all libraries work for all Common Lisp implementations (portability)
* Some libraries depend on specific versions of other libraries
  * It's painful to follow these dependencies
  * Sometimes you find a dependency on another library, but:
    * That version no longer exists (it has happened to me)
    * The host that the library lives on is down (it has happened to me)

It's also unfortunate that when there are many libraries to do the same thing, the effort of the open source community gets diluted.

If you want evidence that this is a real problem, I can show you many, many blog comments, discussion threads, mail from Lisp users' groups, and so on.

What It Provides
================

Provide a repository of libraries, addressing at least the areas listed above, that are, to the greatest extent possible:

* Actively maintained
* Well-debugged
* High-performance
* Portable
* Well-documented

The repository stores the following:

* The source code itself, for open-source libraries
* Binaries already compiled for various platforms
* Documentation
* Unit test suite
* Home page URL
* Names of mailing lists
* Author
* Maintainer, if that's not the author
* License
* Tags, categories, etc.
* What implementations it runs on (if not all) (whatever is known about this).
* Distributions for specific operating systems and architectures (needed rarely but sometimes).
* Release notes
* Pricing info, if any, or where to find the pricing info
* How to get help or file bug reports
* Dependencies: What libraries (at what versions) it needs
* Multiple versions of the above. Some metadata is per-library, some per-version.
* Comments by users, essentially a little discussion as found in a blog. Users can share their experiences about whether this library is good or bad at doing certain things, or discuss, endorse, and rate libraries.
* Creation and modification date/times.
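
As a rough illustration of the split between per-library and per-version metadata, here is a minimal sketch in Python. Nothing in it is part of an actual LAIR design: the class names (`Library`, `Version`, `Dependency`), the field names, and the sample entry are all assumptions made up for this example, and only a few of the fields listed above are shown.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Dependency:
    """A requirement on another library at a specific version."""
    library: str
    version: str


@dataclass
class Version:
    """Metadata that can change from release to release."""
    number: str
    release_notes: str = ""
    implementations: List[str] = field(default_factory=list)  # empty means "not known"
    dependencies: List[Dependency] = field(default_factory=list)


@dataclass
class Library:
    """Metadata shared by every version of a library."""
    name: str
    author: str
    license: str
    maintainer: Optional[str] = None  # only set when it differs from the author
    tags: List[str] = field(default_factory=list)
    versions: Dict[str, Version] = field(default_factory=dict)

    def add_version(self, version: Version) -> None:
        self.versions[version.number] = version

    def effective_maintainer(self) -> str:
        return self.maintainer or self.author


# Hypothetical entry; every name here is invented for illustration.
http = Library(name="example-http-server", author="A. Hacker",
               license="MIT", tags=["networking", "http"])
http.add_version(Version(number="1.0",
                         dependencies=[Dependency("example-sockets", "0.9")]))
```

Keeping versions in a mapping under the library makes it easy to attach release notes and dependencies to one release without duplicating the author or license.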

Allow for different libraries that do roughly the same thing, such as multiple HTTP servers.

* Useful for more advanced users, who can figure out how to decide which to use.
* Lets you load just what you need instead of the whole thing, to avoid bloated final products.
* Allows and encourages competition and experimentation.

In addition to libraries, store named collections of libraries (and of collections). These have some of the same metadata as shown above. Collections are intended to be a set of libraries that work well together. You don't have to download everything in a library all at once. Also store other resources such as documents and applications.

Provide one special default collection, which has one of every kind of library, e.g. one HTTP server, etc.

* Useful for beginners, who don't want to have to choose between, say, multiple alternative HTTP servers.
* Provides a very quick way to answer certain criticisms of Lisp.
* A textbook could assume the presence of this collection, so that it can show sample programs that use features that live in libraries rather than the language itself.
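
Because collections can contain other collections, a client has to expand a collection name into the flat set of libraries it stands for. The following Python sketch shows one way that could work. The `COLLECTIONS` table, the `expand` function, and all library names are hypothetical, and the rule that any name not listed as a collection is a library is an assumption of this example.

```python
from typing import Dict, List, Optional, Set

# Hypothetical data: each collection lists its members, which may be
# libraries or other collections.
COLLECTIONS: Dict[str, List[str]] = {
    "default": ["web", "example-regex"],
    "web": ["example-http-server", "example-html"],
}


def expand(name: str,
           collections: Dict[str, List[str]],
           seen: Optional[Set[str]] = None) -> Set[str]:
    """Return the set of library names that a library or collection stands for."""
    seen = set() if seen is None else seen
    if name in seen:               # guard against collections that include each other
        return set()
    seen.add(name)
    if name not in collections:    # not a collection, so treat it as a library
        return {name}
    libraries: Set[str] = set()
    for member in collections[name]:
        libraries |= expand(member, collections, seen)
    return libraries


default_libraries = expand("default", COLLECTIONS)
```

With the sample data, `default_libraries` contains the three library names and neither collection name.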

There is a common user interface for accessing and searching the repository. Most of it is on a web site. Of course, it should be as easy to learn and use as possible. Some of it is for users looking up existing resources, allowing them to search by various criteria, including full-text search. The rest is for developers adding and modifying resources, and LAIR administrators.

Some software is needed inside Lisp, to actually connect to the server and bring over libraries, etc. Each Lisp implementation should have a function that starts up this software, named the same thing in all implementations. This could just be a small function that loads in the real client side, in case an implementor wants to keep the implementation as small as possible.

Keep a local cache (in the file system) of everything you've downloaded, so that you don't have to do it again. There's a little local database to keep track of what's in the cache. The client-server protocol has a fast way to specify a set of libraries and versions, and find out which are out of date, and there's an easy way to update them or flush them from the cache.
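
To make the cache idea concrete, here is a small Python sketch of the "which of these are out of date?" exchange. It is only an illustration under stated assumptions: the function names `find_out_of_date` and `flush` are invented, the server side is simulated by a plain dictionary, and versions are compared for inequality rather than ordered.

```python
from typing import Dict, List


def find_out_of_date(cached: Dict[str, str], latest: Dict[str, str]) -> List[str]:
    """Return names of cached libraries whose version differs from the latest known one.

    Libraries the server does not know about are left alone.
    """
    return sorted(name for name, version in cached.items()
                  if name in latest and latest[name] != version)


def flush(cached: Dict[str, str], names: List[str]) -> None:
    """Drop the named libraries from the local cache index."""
    for name in names:
        cached.pop(name, None)


# Hypothetical local cache index and server reply.
cache = {"example-regex": "1.2", "example-xml": "0.4"}
server = {"example-regex": "1.3", "example-xml": "0.4"}
stale = find_out_of_date(cache, server)
```

The point of sending the whole name-to-version set in one request is that the client learns everything it needs in a single round trip.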

LAIR is highly available, e.g. through mirrors or a highly available cloud.

What We Need
============

LAIR needs:

* Detailed design
* Coding and testing
* Storage for the repository itself
* Administrators

Initially, we must find the proper funding sources and appropriate volunteers.

Desire, LibCL

In case anyone still reads this forum and is interested in packaging, our friend Samium Gromoff announced Desire:
http://www.feelingofgreen.ru/shared/git/desire/doc/overview.txt

And earlier our friend Daniel Herring started LibCL:
http://libcl.com/

Two practical initiatives towards getting something done.

On the other hand, Sean Ross burned out and stopped mudballs, which required too much centralized maintenance on his part.

Has LAIR gotten anywhere?

Goals

I think you've set the goals a little too high. For many of the categories you've mentioned there are few or no libraries at all, and they would need to be developed from scratch (I18N, printing, web services); others (Meta-object protocol, Better module facility) would basically require rewriting the entire language and making a new implementation (possibly not from scratch).
I think that we should start with more down-to-earth goals: networking and a few basic protocols (HTTP, SMTP, etc.), HTML, DB access, regexes, FFI, and build from there.
Also, the package manager would have to be written from scratch, which is a non-trivial task.

Organization

I'd like to make a suggestion for the organization of LAIR. Draw on others' experiences of managing large library collections such as for Java and C/C++ to develop a generalized classification scheme for all possible libraries. (I'm thinking now in terms of systems like the ACM and AMS classification schemes for publications.) Then build the skeleton of the repository based on that scheme.

Each category in the scheme could hold multiple libraries or none. If there is more than one, as suggested, make one the default. For the non-default packages, I would add a description of what special features they may have versus the default. I would also implement features to aid user feedback and encourage people to contribute to the development of the default package rather than starting anew on their own.

For the empty categories, I'd place a short description of what a library should do along with pointers to similar packages in Java or Python which could serve as a starting point for someone to create a CL package.

Keywords/Tags

I had just been thinking that having a set of tags (keywords) on each entry would be enough. That lets you do much of the same thing as what you're talking about, but it's more flexible than a hierarchy. And sometimes a hierarchy isn't right; you could imagine a library that belongs in more than one bucket, e.g. one library might be both a "numeric library" and a "graphics library".

There would have to be a central repository of tags, each with a description.

In both your proposal and mine, there's an issue of who can create a new tag. Only letting an administrator do it would be very tedious. Letting anyone do it could result in bad decisions. My inclination would be to let anyone do it, and provide a way for admins to fix things if someone does something inappropriate, not following guidelines, or whatever.
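
Here is a minimal Python sketch of what such a tag registry might look like: a central table of tags with descriptions, open tag creation, multi-tag search, and an administrator operation for cleaning up a mistaken tag. The `TagIndex` class, its method names, and the sample tags are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


class TagIndex:
    def __init__(self) -> None:
        self.descriptions: Dict[str, str] = {}                     # tag -> description
        self.libraries: Dict[str, Set[str]] = defaultdict(set)     # tag -> library names

    def create_tag(self, tag: str, description: str) -> None:
        """Anyone may create a tag; an existing description is kept."""
        self.descriptions.setdefault(tag, description)

    def tag_library(self, library: str, tag: str) -> None:
        if tag not in self.descriptions:
            raise KeyError("unknown tag: " + tag)
        self.libraries[tag].add(library)

    def search(self, *tags: str) -> Set[str]:
        """Return the libraries that carry all of the given tags."""
        sets = [self.libraries.get(tag, set()) for tag in tags]
        return set.intersection(*sets) if sets else set()

    def merge_tags(self, bad: str, good: str) -> None:
        """Admin fix: fold a mistaken tag into an existing one."""
        self.libraries[good] |= self.libraries.pop(bad, set())
        self.descriptions.pop(bad, None)


index = TagIndex()
index.create_tag("numeric", "Numerical computation")
index.create_tag("graphics", "Drawing and rendering")
index.create_tag("grafics", "Misspelled duplicate created by a user")
index.tag_library("example-plot", "numeric")
index.tag_library("example-plot", "grafics")
index.merge_tags("grafics", "graphics")   # an administrator cleans up afterwards
```

After the merge, searching for both "numeric" and "graphics" finds `example-plot`, which is the "library in more than one bucket" case that a strict hierarchy handles poorly.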

Bridges

It should probably include bridges to make the packages available as RPMs via yum, and as debs via apt-get. Then, instead of mucking around installing yet another package manager, the user could just configure their preferred download tool to pull from LAIR.

Bridges

Since Dan Weinreb mentions that he wants to support versioning, I think that indeed he should look not just at what other languages do, but also at what OS distributions do.

For instance, debian's aptitude does the hard NP-complete job of coming up with solutions to the constrained problem of finding a set of mutually-compatible versions of the various libraries.
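
To show the shape of that constraint problem, here is a toy backtracking resolver in Python. It is not how aptitude works internally; it is a brute-force sketch under stated assumptions. The index layout, the `resolve` and `consistent` functions, and the sample libraries are invented, and plain string sorting stands in for real version ordering.

```python
from typing import Dict, List, Optional, Set

# index[library][version] = {dependency name: set of acceptable versions}
Index = Dict[str, Dict[str, Dict[str, Set[str]]]]

INDEX: Index = {
    "app-lib": {"1.0": {"json": {"1.0"}, "http": {"1.0", "2.0"}}},
    "http": {"2.0": {"json": {"2.0"}}, "1.0": {"json": {"1.0", "2.0"}}},
    "json": {"1.0": {}, "2.0": {}},
}


def consistent(chosen: Dict[str, str], index: Index) -> bool:
    """Check every constraint whose two ends have both been chosen."""
    for name, version in chosen.items():
        for dep, allowed in index[name][version].items():
            if dep in chosen and chosen[dep] not in allowed:
                return False
    return True


def resolve(wanted: List[str], index: Index,
            chosen: Optional[Dict[str, str]] = None) -> Optional[Dict[str, str]]:
    """Pick one version per library so that all constraints hold, or return None."""
    chosen = {} if chosen is None else chosen
    pending = [name for name in wanted if name not in chosen]
    if not pending:
        return chosen
    name = pending[0]
    # Try the highest version first; string order is a stand-in for real ordering.
    for version in sorted(index.get(name, {}), reverse=True):
        trial = {**chosen, name: version}
        if not consistent(trial, index):
            continue
        result = resolve(pending[1:] + list(index[name][version]), index, trial)
        if result is not None:
            return result
    return None  # no version of this library fits, so the caller backtracks


solution = resolve(["app-lib"], INDEX)
```

In the sample data the newest `http` cannot be used, because it needs `json` 2.0 while `app-lib` insists on `json` 1.0, so the search backs off to `http` 1.0. The worst case is exponential in the number of libraries, which is the NP-completeness mentioned above showing up in practice.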

In a completely different approach, which we may want to emulate instead, NixOS chooses the purely-functional attitude of packages as immutable values, and of compatible working-sets as other such immutable values, using symlinks and a specially hacked ldconfig. You can trivially upgrade packages without removing the old version, downgrade to a previous installation, and garbage-collect those installations that you don't use anymore (after testing).

Please consider debian apt as a valid example for LAIR design

When I started using Linux around 1995 there was a similar problem to solve. People used distributions to solve it, but they were static solutions: you didn't update your system until the next distro CD was out. Then came the Internet, and Debian took advantage of it with apt; now it is a breeze to keep your machines nicely up to date, daily. The problem is not only to download and install a library but to keep the system "young". Since we manage a good number of machines this is critical for us (and I think a lot of sysadmins will tell you the same). I think this could solve a lot of problems with CL systems.

Now I see the same problem in the Lisp community and I think that the very same solution could apply.

Debian has solid know-how on this problem. They have the hardware/software infrastructure and the social network needed to manage everything, and they have proved it over the last few years.

Of course there are other distros: many are Debian-based (e.g. Ubuntu), and there are also Redhat/Fedora (which we used previously), Suse, and Gentoo (which uses a very interesting build-yourself scheme), with other package management systems (e.g. yum). But I think Debian is a nice example since they were first movers.

And Common Lisp is nicely supported: there is a dedicated team of very talented people which supports it (http://alioth.debian.org/projects/pkg-common-lisp). Maybe their input could greatly benefit the LAIR project.

Are you learning from history?

What will you do differently from previous attempts like CCLAN, CLOCC, clbuild, mudballs, etc.?

The problem is not strictly technical -- or it would have been solved long ago.

There are social issues:
Who has the right to do what in the repository / namespace ?
How do you handle versioning?

Have you looked at how other languages do it?

Haskell Cabal / Hackage

(O)Caml Hump

Ruby Gems

Python Eggs

PLT modules

Chicken Eggs

etc.

What can we learn?

Is there anything from your list that you think we ought to be paying attention to, other than what's discussed in the proposal?

How to get it right this time

Fare, I don't think this is too similar to those previous efforts. However, I certainly do intend to seek out everyone who has tried any repository before, and learn from their experience. (If any such people are reading this, please contribute what you've learned!)

Versioning is extremely important. We need to make sure that version A of a library comes with all the appropriate versions of the libraries it depends on. All this must be in the metadata. Ideally, the metadata should be able to express, between version A of library X and version B of library Y, any of "they do work together", "they don't work together", and "we don't know yet whether they work together". If necessary, we could allow multiple people to opine on the subject and have more than one such assertion.
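
As a sketch of how such three-valued, multi-opinion compatibility metadata could be represented, consider the following Python fragment. The names `Compat` and `CompatibilityTable` are hypothetical, and the summary policy (any failure report outweighs success reports) is just one possible assumption, not something the proposal settles.

```python
from enum import Enum
from typing import Dict, FrozenSet, List, Tuple

Release = Tuple[str, str]  # (library name, version)


class Compat(Enum):
    WORKS = "they do work together"
    BROKEN = "they don't work together"
    UNKNOWN = "we don't know yet whether they work together"


class CompatibilityTable:
    def __init__(self) -> None:
        # An unordered pair of releases maps to a list of (reporter, verdict).
        self._reports: Dict[FrozenSet[Release], List[Tuple[str, Compat]]] = {}

    def report(self, a: Release, b: Release, reporter: str, verdict: Compat) -> None:
        """Record one person's opinion; several people may report on the same pair."""
        self._reports.setdefault(frozenset((a, b)), []).append((reporter, verdict))

    def reports(self, a: Release, b: Release) -> List[Tuple[str, Compat]]:
        return list(self._reports.get(frozenset((a, b)), []))

    def summary(self, a: Release, b: Release) -> Compat:
        """Assumed policy: UNKNOWN without reports, BROKEN if anyone saw a failure."""
        verdicts = {verdict for _, verdict in self.reports(a, b)}
        if Compat.BROKEN in verdicts:
            return Compat.BROKEN
        if Compat.WORKS in verdicts:
            return Compat.WORKS
        return Compat.UNKNOWN


table = CompatibilityTable()
x_a, y_b, y_c = ("X", "A"), ("Y", "B"), ("Y", "C")
table.report(x_a, y_b, "alice", Compat.WORKS)
table.report(y_b, x_a, "bob", Compat.BROKEN)   # order of the pair does not matter
```

Keeping every individual report, rather than only the summary, leaves room to change the policy later without losing information.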

I definitely want to look at how other languages do it. I was starting to do so, until the work for the conference got heavy enough that I had to put it all off.

If you know me well, you know that my first approach to any problem is to see if anyone has already done it before me. It's so much easier to borrow someone else's good ideas than to try to come up with them yourself, especially in the Lisp community where there are so many smart people. So I totally agree with your general point as well as the specifics.

Re: How to get it right this time

Quick follow-up, I have to write a bit more on this...

If you look at what Perl did with CPAN circa 1996, you see that rather than try to do it right, they just did it. Here "did it" means providing a single repository replicated to local sites, a loose namespace organizing the modules, a (semi-)standard metadata requirement for each module including dependencies (but not version), and a simple mechanism for interactively installing packages. Much later others (iirc Michael Schwern) stepped forward to run build and quality checks. Iterative refinements. The continuous build worked; the "kwality kontrol" more or less didn't. This is where CL is with cliki, except that there are several sites, none entirely authoritative, and I won't go into the defsystem/asdf mess because Fare has it covered and it's too deep a hole to cover in this comment. Not imposing much structure had the positive effect of lowering the bar for contribution to the repository, and the obvious problem of uneven quality and compatibility. It's up to the individual end developer to verify that things work, which they will have to do anyway, so you want it to be easy to do a few common things: manage multiple versions (install, upgrade, use, remove); send a report to a quality rating service; perform a superficial dependency check; perform an exhaustive dependency check. Ideally, the last two should be available *before* installation (at least the superficial check), maybe as part of the rating, so a developer can make a decision early.
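
One possible reading of the superficial versus exhaustive dependency check, sketched in Python: the superficial check looks only at direct dependencies, while the exhaustive one walks the whole dependency graph. The `REPO` table, both function names, and the library names are hypothetical; a real check would also look at versions.

```python
from typing import Dict, List, Set

# Hypothetical repository contents: library name -> direct dependencies.
REPO: Dict[str, List[str]] = {
    "example-web": ["example-http", "example-html"],
    "example-http": ["example-sockets"],
    "example-html": [],
    # "example-sockets" is deliberately absent from the repository.
}


def superficial_check(name: str, repo: Dict[str, List[str]]) -> Set[str]:
    """Return the direct dependencies of `name` that the repository lacks."""
    return {dep for dep in repo.get(name, []) if dep not in repo}


def exhaustive_check(name: str, repo: Dict[str, List[str]]) -> Set[str]:
    """Return every library missing anywhere in the transitive dependencies."""
    missing: Set[str] = set()
    seen: Set[str] = set()
    stack = [name]
    while stack:
        current = stack.pop()
        if current in seen:        # also protects against dependency cycles
            continue
        seen.add(current)
        if current not in repo:
            missing.add(current)
            continue
        stack.extend(repo[current])
    return missing


quick = superficial_check("example-web", REPO)
full = exhaustive_check("example-web", REPO)
```

Here the quick check finds nothing wrong, while the full walk discovers that `example-sockets` is missing two levels down, which is exactly the kind of surprise a developer would like to learn about before installing.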

Let's just do it

I've some spare space and spare bandwidth where I keep my tiny site. I'd be happy to initially host LAIR. Let's just get the stuff out there, and use one of the variations of the open source development model.

Ease of submission

You're quite right that it ought to be designed so that it's as easy as possible to submit a new library. Metadata should be optional; you should be able to go back and put it in later. Whatever else we can do to make it very easy to add a library to LAIR should be given great attention.

re: Ease of submission

One of the great aspects of CPAN is that (almost) every module's kit is built the same way and unpacks the same way. This trivial series of commands:
h2xs -A -n Foo::Quux
cd Foo-Quux
perl Makefile.PL
make dist

is enough to generate a "CPAN compatible" module and is sufficiently feature rich and easy that developers *use it*. When trying to release-manage a large (i.e., hundreds of modules) Perl environment, the importance of this consistency cannot be overstated: if nothing else, a release engineer knows that this series of (potentially scripted) commands:
tar zxvf Foo-Quux-0.01.tar.gz
cd Foo-Quux-0.01
perl Makefile.PL
make
make test
make install

will build and install the module.

The (again, trivial) command

perl -MCPAN -e 'install Foo::Quux'

will do the same thing, but there are times when CPAN's not readily available (from, say, a production environment) so the ability to work with the kit by hand is invaluable.

Providing a skeleton for the library turns out to be something of a win as well. Consider that the version of most modules (including the example above) can be determined by:

perl -MFoo::Quux -le 'print $Foo::Quux::VERSION'

This is both banal and (gosh darn it!) extremely useful.

Although not directly related to packaging, the bundling of command-line-accessible documentation with the module is a real win. The quality of documentation provided with existing CL libraries is often very good (I'd hold Elephant up as an example of top-notch docs) but it'd be even more useful if one could type "cldoc elephant" from the command line (or emacs, &c.).

You state that LAIR is

You state that LAIR is "all-inclusive" in the sense of providing this service for all members of the Lisp family, including Scheme and Dylan and Clojure.

This could really simplify the work to write well-documented components as the API is in most cases very similar. And in cases where it is not there is now a trigger to discuss how to overcome this situation.

In the Dylan Wiki, under the wiki item Category Engineering, I gathered links to academic papers which might give inspiration to LAIR component designers. Or perhaps these links can inspire a cooperation between a specific academic group and the Lisp community.

After ILC2009, I would welcome a description of how the work process with LAIR will operate.

Wish you a very successful ILC2009.

Peter Robisch

All-inclusive

Thank you! I'd love to hear more about this.

Believe it or not, at ILC2009 there was a presentation about a tool that translates from Java to Common Lisp. It doesn't just put out Common Lisp that looks like assembly language. Rather, it uses sophisticated techniques to produce Common Lisp code such as a real programmer would write it!

It's not perfect by any means, but if we want to take existing Java libraries and translate them to Common Lisp, this tool would make it at least ten times easier. There is a huge number of Java libraries, many of them specific to particular areas, and this could be the best way to bring them into the world of Common Lisp, when they're needed.

The talk, by Tiago Maduro Dias, was extremely impressive (both in its contents and its clarity). The system is called JNIL. It still has some limitations, which are being worked on.

Python libs

Yeah, the talk about Jnil was inspiring!

For CLPython I've focused on making Python modules run in a Python runtime with complete Python semantics. Jnil suggests a different approach: converting Python modules into "pragmatically correct" Lisp code without worrying about hairy semantics. I'll look into that.