Support This Project

Collected Notes on Debian Packages

For the latest jump down to "A Logical Approach".

Tags

DebTags are facetted classification system for debian packages. 1

Debian Package Browser

The Debian Package Browser seems to be an effort to collect tagged classifications of packages using a web browser interface.

The EDOS project has generated some graphs showing package dependencies in debian.

Lab Notes

Since these are lab notes they record a number of false starts. Well suppositions that once explored were found to be incorrect. Thus to short circuit a lot of confusion is makes sence to jump down to the last paragraph in each section.

Files and Their Contents

The following files contain information about packages.

It seesm that dselect makes a summary of installed files in available and status. Experiements on my system have led me to the following conclusions:

To substantiate my last point consider the following:

Missing dependency for libmagick6: libltdl3 (>= 1.5.2-2)
  available version is: libltdl3 1.5.6-6
    version is not in target installed install
    version is in status all available

Checking that my program is working correctly here are the entries from status. Incidencly there is full entry for libmagick6 in both status and also in available.

Package: libmagick6
...
Depends: libbz2-1.0, libc6 (>= 2.3.2.ds1-21), libdps1 (>> 4.1.0),
 libfreetype6 (>= 2.1.5-1), libice6 | xlibs (>> 4.1.0),
 libjasper-1.701-1 (>= 1.701.0), libjpeg62, liblcms1 (>= 1.08-1),
 libltdl3 (>= 1.5.2-2), libpng12-0 (>= 1.2.8rel), libsm6 | xlibs (>> 4.1.0),
 libtiff4, libx11-6 | xlibs (>> 4.1.0), libxext6 | xlibs (>> 4.1.0),
 libxml2 (>= 2.6.19), libxt6 | xlibs (>> 4.1.0), zlib1g (>= 1:1.2.1)
...
Status: install ok installed

Package: libltdl3
Status: purge ok not-installed
Priority: optional
Section: libs

So there is library that libmagick6 is dependent on which is not installed. So how can this occur?

This came about because multiple versions of some packages exist within the union of the descriptions in available and status. My program was updated to allow multiple package versions and when this was done it stopped complaining about unsupported packages.

Provides Field

I should also mention that the program first accounts for the provides field before accounting for dependencies.

  def pr_closure(target)
    pr_target = Set.new
    for x in target
      for p in x.provides do
        pr_target.add(p)
      end
      pr_target.add(x)
    end
  end

Currently there are a large number of virtual packages which are being extracted. The present program architecture treats these as seperate packages. Since the empty version string comes before any other version string any version of package x will satisfy any dependency the virtual package x.

The status file currently contains descriptions of packages without version information. It seems that, within the status file, every installed package contains a full description while uninstalled packages have minimal descriptions that don't contain a version.

The formalisation of the provides field has changed. It was realised that provides in fact only makes mention of virtual packages and that virtual packages do not provide other packages. As a result the provides field can be accounted for the definition of packages matching a dependency and packages causes in conflict. This has been done.

Next Questions

    def step(depends, current_solution, solutions)
      while ! depends.empty do
        x = depends.pop
        if ! current_solution.includes(x) then
          for y in solutions_to(x)
            current_solution.push(y)
            for z in y.depends do depends.push(z) end
            step(depends.copy, current_solution, solutions)
            current_solution.pop(y)
          end
          break
        end
      end
      if depends.empty then
        solutions.push(solution)
      end
    end

This algorithm seems a bit wrong. The algorithm needs to handle the case that dependencies later on in the list may be solved by adding dependencies from earlier in the list. The algorithm might actually be clearer when formulated using sets and the matches function defined in the program I've been writing.

  def solve_ce(ce: CExpr, ps: Package Set, solutions: Package Set Array)
    if not ce.is_empty then
      de = ce.pop
      for i in 1..de.size do
        if ps.satisfies(de[i]) then
          ps.checkpoint(ps, ce) { |ps, ce|
            solve_ce(ce, ps, solutions)
          }
        else
          for p in de[i].matching_packages do
            if not conflicts(p, ps) then
              checkpoint(ps, ce) { |ps, ce|
                ps.insert(p)
                ce.append(p.depends)
                solve_ce(ce, ps, solutions)
              }
            end
          end
        end
      end
    else
      solutions.push(ps.dup)
    end
  end

This is a rewrite of the algorithm. It uses a function checkpoint that takes a block and restores, on completion of the block, ps and ce to the values they had before the block was executed. You would get that for free in a functional language, but the aggregation of solutions would potentially be ugly (unless you passed around a reference).

The algorithm is interesting in the following ways: (i) it is basically doing the backtracking algorithm that can be used to write a prolog interpreter (but without having to do subsitution and that stuff). (ii) it is a recursive algorithm that has been written into the OO paradygm.

It might seem that the checkpoints can be removed by strategic removes. However one must be aware that remove is not exactly the reverse of insert. It misses the case of when p is already a member of ps.

    s = []

    if not ps.is_member(p) then
      ps.insert(p)
      s.push(p)
    end

    ... # some stuff

    s.each { |p| ps.remove(p) }
    s = []

It is interesting that this functionality can be provided by a simple wrapper over set.

  class SetCheckPointProvidesInsert < 'a InsertEnumerable
    include SetWrapperForInsertOnly

    def initialize(ps: 'a Set)
      @ps = ps
      @insert_stack = []
    end
    def insert(x: 'a)
      if not @ps.is_member(x) then
        insert_stack.push(x)
        @ps.insert(x)
      end
    end
    def rollback
      @insert_stack.each { |x| ps.remove(x) }
    end
  end

Doing a wrapper that also supports remove is perhaps a little more tricky because two sets need to be maintained, a set of elements to remove and a set of element to insert. For completeness however such a wrapper should probably be extended to use this functionality.

It is an interesting question in passing whether the wide range of set functions should be reduced to functions calling: insert, remove, and each. Doing such a thing makes writing wrappers easier but undoubtedly looses a lot in of functionality. Another possibility is to have a wrapper which can produce a full set interface from insert, remove and each but to implement basic sets as efficiently as possible. This means that deligation after override some methods doesn't work and instead the wrapper should include in its interface the transformer.

In practice since we don't have a working version of RJL yet (just code fragements at the moment of the type seen above) the algorithms are going to have to be implemented in C++. Checkpoints will be removed from the solve_ce by doing the required house keeping within the function itself.

Matching Conflicts

Packages commonly conflict with themseleves. This means that testing for a conflict is somewhat involved.

  Conlicts(p) \sqsubseteq Pr(P \setminus p)

This is somewhat difficult to achieve and it requires some argument that

  Pr(P \setminus p) = Pr(P) \setminus Pr(\{p\})

In fact this is not the case because a virtual package may be supported by multiple elements in P.

The difficulty with the first expression is that it is complex to compute. It could be incrementally computed in the following way:

  Pr(P \setminus p) = Pr(P) \setminus
    \{ p in Provides(p) |
         \neg \exists q \in P: q \not= p \and p \in provides(q)
    \}

Since the inverse of the provides relationship is stored this is not very expensive to compute. Just a bit laborious.

It should be noted that the program currently allows disjunctive expressions in the provides field for packages. But of course this isn't done in practice and its meaning would be very suspicious anyway.

Examples of Interaction

The following example demonstrates that a particular package cannot be installed. apt-get has given up and reported an error.

  rcole@g318-8948:fca_debian$ sudo apt-get install kdevelop3
  Password:
  Reading package lists... Done
  Building dependency tree... Done
  Some packages could not be installed. This may mean that you have
  requested an impossible situation or if you are using the unstable
  distribution that some required packages have not yet been created
  or been moved out of Incoming.
 
  Since you only requested a single operation it is extremely likely that
  the package is simply not installable and a bug report against
  that package should be filed.
  The following information may help to resolve the situation:
 
  The following packages have unmet dependencies:
    kdevelop3: Depends: kdelibs4 (>= 4:3.3.2-6) but it is not going to be
                        installed
               Depends: kdebase-bin but it is not going to be installed
               Depends: kdevelop3-plugins but it is not going to be installed
  E: Broken packages

Dselect on the other hand marks a number of packages for retrieval but still has problems.

Some Generate Results (More Questions)

The algorithm described above was implemented to search for solutions. The exact details can be seen in the file package_analyser.cpp:1.5 in the fca_debian directory.

The algorithm doesn't work because it is not recording adequately solved and unsolved dependencies. The problem stems from there being two tasks: (i) solving the current disjunctive expression, and (ii) solving all the disjunctive expressions currently on the queue.

Currently the variable result is recording only whether or not a full solution was found, not whether or not the current dependency could be solved.

The program at once needs to iterate through packages matching a disjunctive expression and at the same time aggregate the solutions for conjunctive expressions.

  def solve_ce(cs: DExpr Seq, ps: Package Set)
    if ! cs.empty then
      ds = cs.pop_front
      for es in ds do
        if ps.satifies(es) then
          for p in matches(es) do
            if ! ps.is_member(p) && ! ps.conflicts(p) then
              ps.checkpoint {
                ps.insert(p)
                result = result || (solve_cs(p.requires, ps) && solve_cs(cs, ps))
              }
            end
          end
        end
      end
    end
  end

Random Ideas

It may be of use to write a program that constructs a graph showing the debian package conflicts.

TODO

A Logical Approach

I realised that the problem can be formulated in propositional logic and then becomes the problem of finding a truth assignment function that makes a logical statement true. This is a well studied problem in computer science and there is an open source solver called miniSat that I'm going to look into using.

The following code will output logical formula from a given target. First we calculate the packages required and then we dump the logical formulae.

  def search(cs: CExpr, ps: PackageSet)
    while ! cs.empty do
      x = cs.pop
      for y in matches(x)
        if ! ps.is_member(y) then
          cs.append(y.depends)
          ps.insert(y)
        end
      end
    end
  end

  def dump(ps: PackageSet)
    for p in ps do
      ds = matches(p.depends)
      if ! ds.empty then
        puts "#{p} -> #{ds.to_a.map{|x|.to_s}.join(" + ")}"
      end

      for x in conflicts(p) do
        puts "~#{p} + ~#{x}"
      end
    end
  end

The output here is not in clausal form. Some simple rearrangement of implication is required to generate clausal form.

  a -> b + c === ~a + (b + c) === ~a + b + c

In fact as one can see it is very simple indeed as implications are already in a disjunctive form.

The following is an example of CNF (Clause Normal Form) format.

  c This is a comment
  c so is this
  p cnf 4 3
  1 2
  -2 -3
  2 4

Lines may be comments (lines starting with a c), a header (lines starting with a p) or a disjunction of literals. A literal is either a variable or the negation of a variable. Literals are denoted by integers: the absolute value indicates the variable and the sign indicates whether or not it is negated. The header line indicates the format (cnf), the number of variables (4), and the number of clauses (3) in that order.

SAT solvers attempt to find a truth assignment function (a mapping from the variables to true or false) that makes the logical formula true. In general the problem has been shown to be NP-hard, but in a wide variety of practical cases the solvers are tractable.

However using a solver on the clauses produced for kdelibs4 prodcued the following: "Trival Problem. Unsatisfiable". Our purpose is to provide an explaination to the user for why the problem is unstaisfiable.

Berhard Ganter has already explored the links between clauses and FCA. So this is where we turn now for inspiration.

I asked myself "what sort of answer do you want from the system" and in turn "what is the question", the question is "why can't i install this package" and the answer is because "these requirements are in conflict".

Take for example the wff for kdelibs this is a conjunction of wffs. Some sets of these wffs are in conflict. Let the wff for kdelibs be $ABC$ (using boolean algebra here) then it may be the case that $A and B$ conflict, in that case any set of requirements including $A$ and $B$ will conflict, i.e order filters of the powerset conflict. That is potentially important because it is linked to the theory of distributive lattices. Alternatively consider that only $ABC$ conflicts, not subsets of it (e.g. $AB$). This is possible because each of $A$, $B$, and $C$ are disjunctive expressions that may interfer with each other so to speak.

So the answer for "install kdelibs" could be "kdelibs depends on xlibs-base which is uninstallable.", or "kdelibs has conflicting dependencies: 'xlibs-base | xlibs-tools' conflicts with 'gilium'".

A term such as 'xlibs-base | xlibs-tools' can be unfolded to insert 'xlibs-base' and 'xlibs-tools'. Each of these would conflict with 'gilium' in the above example.

The following algorithm computes minimal sets of conflicting packages.

  def search(A: Int Set, M: Int Set, context: Context, next_object: Int Ref)
    for m in M - A do  # - means set minus [1]
      if conflicts(A + {m}) then # + is union, -
        context.insert_object_intent(next_object.value, A + {m})
        next_object.value += 1 # note RJL is pass by ref [2]
      else
        search(A + {m}, M, context)
      end
    end
  end

class 'a Ref

def value: 'a = nil deligate to value deligate to ObjectInstance

end

The call to conflicts is a call to a SAT solver with "A + {m}" as a target.

Finding the bound between SAT and UNSAT

  s_plus: PowerSet  = [ {} ]
  s_minus: PowerSet = [ D ]
  e_plus: PowerSet  = []
  e_minus: PowerSet = []

  s_plus_finished = false
  s_minus_finished = false

  while ! s_plus_finished || ! s_minus_finished do
    if (x = choose(s_plus, e_plus)) != nil then
      for d in D.minus(x) do
        xd = x.dup.insert(d)
        if sat(xd) then
          s_plus.insert(xd)
        else
          s_minus.insert(xd)
        end
      end
      e_plus.insert(x)
    else
      s_plus_finished = true
    end
 
    if (x = choose(s_minus, e_minus)) != nil then
      for d in x do
        xd = x.dup.remove(d)
        if sat(xd) then
          s_plus.insert(xd)
        else
          s_minus.insert(xd)
        end
      end
      e_minus.insert(x)
    else
      s_minus_finished = true
    end
  end
 

The test for SAT can make use of s_plus and s_minus to make its decisions.

  def sat(x)
    if s_plus.contains_superset_of(x) then
      return true
    elsif s_minus.contains_subset_of(x) then
      return false
    else
      return solver.is_sat(x)
    end
  end

It may be that S_plus and S_minus become very large.

SAT Solvers

The values in a clause are tri-state, +1, 0, or -1.