Database recovered from backup - data intact

Hi all,

We had a critical incident with the main system disk of our SeaFile server, which has rendered its contents unrecoverable. We have managed to recover the contents from this disk from a backup, but this backup is four months old.

Our seafile-data directory is on a different physical volume, and is intact (of this volume, we also have daily backups). Due to an oversight of our old systems engineer, the mysql database, however, was located on the system disk, not on the daily backed up volume.

So now I have the intact data as it was yesterday when the server was operating normally, but a database version from December 2017. Is there a way to fix/reconstruct the database based on the actual files in the seafile-data directory? Or is there an alternate proposal on how to proceed to get the system back operational? I read on the documentation about the seaf-fsck tool, but am unsure, whether this tool is fit for such a major operation.

Kind regards,
Sam

Dear community,

Iā€™m still pretty much brickwalled by this issue. I have tried on a clone server the seaf-fsck.sh utility, but it does not seem to do much anything, since the filesystem is, in fact, in a good state.

I have tested a procedure to export a library with the seaf-fsck-utility and I can find any files missing from the index there. However, re-introducing them to the system does not seem to be so easy. I can use the seaf-import.sh utility to re-create the library, but that creates a completely new library, losing any history, sharing options, etc. which are still mostly correct in the library version in the database. I could write a script to perform this operation massively, but this seems drastic as in the case of most libraries, itā€™s only a minority of the content that is new or has been modified in the last four months.

Any ideas would be appreciated!

Kind regards,
Sam

The database can be mostly reconstructed but itā€™ll be quite some work.

You need to find out the most recent ā€˜commit idā€™ for every library and write that one in the database. In case some users were added since you also need to add them again and link the libraries via database.

The most recent commit quite likely is the most recent file in seafile-data/storage/commits/library-id

Thanks for the reply shooper,

I have found a way to locate the most recent commit in seafile-data/storage/commits/[id]/

The only logical place in the database seems to me ā€˜head_idā€™ field on the ā€˜RepoSizeā€™ table in the ā€˜seafiledbā€™ database. By comparing the existing values to those in the seafile-data -directory I also figured that the correct commit ID as referenced in the database is the directory name (always two hexadeximals) and the filename concatenated to each other.

However, I tried to set this to the correct value on several libraries of my own in the database, but when I browse the libaries in the web interface, I still donā€™t see any of the missing files committed later than the database backup Iā€™m working on. I also tried to restart the seafile server, but to no avail.

Is there another table, where I should also change these commit idā€™s?

Did you also restart memcached / delete your cache?

Tried that, but it had seemingly no effect. Iā€™m still only seeing files listed, which existed in the repository on the date of the database backup.

So having a closer look I think this is the wrong table. The library size is calculated with a background task and in that table the commit id likely refers to the commit up to which the library size has been calculated.

I think the head commit needs to be in the seafile-db.branch table.

Edit: Iā€™ve looked through all the tables and am very certain of this being the correct table. RepoSize should not be edited, so that the backround task can calculate the new library size and number of files.

seafile-db.RepoOwner contains the library owner (in case you have libraries that were created after the last SQL backup. Such libraries also need to be added to at least seafile-db.Branch, seafile-db.Repo, seafile-db.RepoHead. Most likely there are even more locations Iā€™m not sure whether some of the other tables automatically get an entry e.g. when the file size background job runs or not.

Share links are located at seahub-db.share_fileshare in case you lost important share links Iā€™d suggest to add a new share link for the file / folder you previously shared, go to the table and add replace the token for the new share link with the old one. For upload links the table is seahub-db.share_uploadlinkshare.

Hope this helps.

Hi shoeper, Many thanks for the assistance. Using the Branches table, I had success in recovering the correct repo state. I subsequently wrote a script which compares each repository in the Branches table, checks the most recent commit id from the filesystem and updates it if they donā€™t match, and I seem to have recovered all the existing libraries.

Iā€™m now busy scripting identification of new libraries which are not in the database and deleted libraries, which still exist in the database, but no longer in the filesystem. The deleted part is pretty straightforward to script: if a repo has a record in the Repo table, but there is no matching folder in seafile-data/storage/commits/, the repo has been permanently deleted and any references to it can be safely deleted from the database.

Where the new libraries are concerned, I noticed that the commit files are in JSON format and contain the user id of the last person who made a commit, so I guess the best I can do is to assume that the person who made the last commit is the owner, and then fix manually based on complaints, where this assumption is not correct. Or is there a better way to figure out, who the owner of the library was?

You spoke of a background process which should verify and update the RepoSize and RepoFileCount tables. My server has now been running closer to 18 hours after I fixed the Branch commit idā€™s, but I donā€™t see these tables updating. Is there some way to manually launch this check?

Depending on how often libraries change the owner within our organization that might vary. Iā€™d think the owner most likely creates the first commit. So maybe you can just use the oldest commit to determine the owner.

Regarding ā€œthe realā€ current owner: I think it is only being stored in the database, so you can only make some assumptions and implement an owner mapping based on those (e.g. first commit is the owner, user with most commits is the owner (possibly time consuming), last commit is the owner (likely fastest, for shared repositories this could be often wrong), set yourself as the owner - look at the files and transfer the library manually (only suitable for a few libraries, advantage would be that it is less likely that the wrong user gets access to data it shouldnā€™t have), ā€¦).

It most likely is being triggered when new data is being added to the library. I donā€™t know if there is a manual way. I can check whether I find something, later.

ā€“

I could be helpful for others in the future if you share your scripts.

I have got the server back running, thank you very much for the assistance. Here are the scripts I used to reconstruct the database.

The script to find the latest commits and to delete references to non-existent (deleted) libraries;

resync-seafile-database.php:

 #!/usr/bin/php
<?php

/*
 *  @File resync-seafile-database
 *  @Author skauranne@naturalsciences.be
 *  @Created 2018-04-12
 *  Released to public domain without any kind of warranty or liability.
 *  Always make sure to dump a backup of your database before attempting a re-sync.
 *  This script is intended for re-synchronizing a backed up version of the
 *  Seafile v6.x database with a more recent state of the filesystem.
 *  Use of this script in other circumstances or with incorrect configuration
 *  may damage your Seafile installation.
 */

echo("\nAttempting to re-synchronize the seafile database with filesystem...\n");
echo("Connection to Seafile database...\n");

$basePath = "/opt/seafile/seafile-data/storage/commits/";
$dbUser = "USERNAME";
$dbPass = "PASSWORD";
$dbTables = ["Branch","Repo","RepoFileCount","RepoGroup","RepoHead","RepoHistoryLimit","RepoOwner","RepoSize","RepoValidSince"];

$db = new PDO("mysql:host=127.0.0.1;dbname=seafiledb", $dbUser, $dbPass);

if(!$db)
   die("Failed! Terminating...\n");

// Get the list of repositories
$repos = $db->query("SELECT * FROM Branch");
echo($repos->rowCount()." repositories found in the database.\n");


$updated=$skipped=$notExist=$errors=0;

// Loop through the repositories in the database
while($repo = $repos->fetch(PDO::FETCH_OBJ)) {
   echo('Repository '.$repo->repo_id.':');

   // Look for a corresponding directory in the filesystem
   $repoPath = $basePath .  $repo->repo_id;
   if(!is_dir($repoPath)) {
       $notExist++;
       echo("Not found on the filesystem, deleting from database.\n");
       foreach($dbTables as $table) {
         $db->exec("DELETE FROM $table WHERE repo_id='{$repo->repo_id}'");
       }
       continue;
   }

   // find the most recent commit file in the filesystem
   $commitMeta=explode(' ', exec("find $repoPath -type f -print0 | xargs -0 stat --format '%Y :%y %n' | sort -nr | cut -d: -f2- 2>/dev/null | head -n 1"));
   $commitId = str_replace('/', '', str_replace($repoPath, '', $commitMeta[3]));

   // Database shows the same commit id
   if($commitId == $repo->commit_id) {
       echo("Database up to date!\n");
       $skipped++;
   }
   // Database shows a different commit id, assume filesystem is correct
   else {
       echo("Last commit on filesystem on $commitMeta[0] at ".substr($commitMeta[1],0,8).", updating database...");
       if(0 < $db->exec("UPDATE Branch SET commit_id='$commitId' WHERE repo_id='{$repo->repo_id}'")) {
           echo("Success!\n");
    $updated++;
    }
    else {
        echo("Failed!\n");
        $errors++;
    }
}

 }

echo("Finished!\n" .
    "$notExist references to non-existent repositories deleted from the database.\n" .
    "$updated existing repositories updated, $skipped skipped, $errors errors encountered.\n\n");

And the script to find orphaned libraries and to add the new ones to the database:

find-seafile-fs-orphans.php

#!/usr/bin/php
<?php

/*
 * @File find-seafile-fs-orphans
 * @Author skauranne@naturalsciences.be
 * @Created 2018-04-12
 * This script has been released to public domain without any kind of warranty.
 * This script is intended to search for libraries in the physical filesystem
 * which are newer than a certain date, and to create the necessary database
 * entries to make these available to seafile in a scenario where the database
 * has been recovered from an older backup. Using this script in a different
 * scenario might damage your database.
 * Always make a backup dump of your Seafile database before launching this script.
 */

echo("\nSearching for orphaned libraries in the filesystem...\n");
echo("Connection to Seafile database...\n");

$basePath = "/opt/seafile/seafile-data/storage/commits/";
$dbUser = "USERNAME";
$dbPass = "PASSWORD";
// Time of the backup point to decide, if a library was created before or after
$backupTime = strtotime('YYYY-MM-DD HH:MM:SS +HHMM'); 
$db = new PDO("mysql:host=127.0.0.1;dbname=seafiledb", $dbUser, $dbPass);

if(!$db)
   die("Failed! Terminating...\n");

// Get the list of repositories in the filesystem
$repos = scandir($basePath);
echo( count($repos)-2 . " repositories found in the filesystem.\n");


$found=$notFoundOld=$notFoundNew=$created=$errors=0;

// Loop through the repositories
foreach($repos as $repoId) {
   $repoPath = $basePath .  $repoId;
   if($repoId=='.' || $repoId=='..') {
       continue;
   }

   // Check if the database has an entry for this Repo Id.
   $repoDb = $db->query("SELECT * FROM Repo WHERE repo_id='$repoId'");

   // No match found
   if($repoDb->rowCount()==0) { // No match
       echo("Repository $repoId not found in the database!\n");

       // Get the newest commit on the filesystem
       $newestCommitMeta=explode(' ', exec("find $repoPath -type f -print0 | xargs -0 stat --format '%Y :%y %n' | sort -nr | cut -d: -f2- 2>/dev/null | head -n 1"));
       $newestCommitTime = $newestCommitMeta[0] .' '. substr($newestCommitMeta[1],0,8);
       $newestCommitId = str_replace('/', '', str_replace($repoPath, '', $newestCommitMeta[3]));

       // Get the oldest commit from the filesystem
       $oldestCommitMeta=explode(' ', exec("find $repoPath -type f -print0 | xargs -0 stat --format '%Y :%y %n' | sort -n | cut -d: -f2- 2>/dev/null | head -n 1"));
       $oldestCommitTime = $newestCommitMeta[0] .' '. substr($newestCommitMeta[1],0,8);
       $oldestCommit = json_decode(file_get_contents($oldestCommitMeta[3]));
       $owner = $oldestCommit->creator_name;

       // Check if the oldest commit took place after the time the database originates from (== new library)
       if(strtotime($oldestCommitTime) > $backupTime) {
          echo("New library: Created on $oldestCommitTime by $owner\n");
          $notFoundNew++;
          echo("Writing database entries...");
          if(1 > $db->exec("INSERT INTO Repo (repo_id) VALUES ('$repoId')")) {
               $errors++;
               echo("Error creating Repo record $repoId, aborting!\n");
          }
          elseif(1 > $db->exec("INSERT INTO RepoOwner (repo_id, owner_id) VALUES ('$repoId', '$owner')")) {
               $errors++;
               echo("Error setting owner $owner to $repoId, aborting!\n");
          }
          elseif(1 > $db->exec("INSERT INTO RepoHead (repo_id, branch_name) VALUES ('$repoId', 'master')")) {
               $errors++;
               echo("Error setting Branch 'master' to $repoId, aborting!\n");
          }
          elseif(1 > $db->exec("INSERT INTO Branch (repo_id, commit_id, name) VALUES ('$repoId', '$newestCommitId', 'master')")) {
               $errors++;
               echo("Error setting latest commit $newestCommitId to $repoId branch master, aborting!\n");
          }
          else {
               $created++;
               echo("Success!\n\n");
          }
       }
       else {
          echo("Ignoring deleted library created on $oldestCommitTime\n\n");
          $notFoundOld++;
       }
   }
   // Match in the db, everything ok.
   else {
       $found++;
   }

}

echo("Finished!\n" .
     "$notFoundNew New repositories with no record in the database found.\n" .
     "$created Records successfully created in the database, $errors database errors encountered.\n" .
     "$notFoundOld Deleted repositories with no record in the database ignored.\n" .
     "$found repositories OK. \n\n");
5 Likes

Great, that you got it working. Thank you for sharing your work!

Regarding the repo size calculation. An update is being queued at different points in the code. It should be updated the next time a library is changed (for that specific library). I couldnā€™t find an easy way how to trigger it manually, but think it is only cosmetic, so there shouldnā€™t be any negative consequences. The easiest would likely be to automatically add and remove a file to all libraries via API.

Here are the places where the repo size calculation is queued and processed.

1 Like