Drupal: Enterprise Edition

07.11.2006

The title is an oxymoron, and half a joke. However, it seems that at times, websites require a very strict process of development. Typically the flow is development (either the developer's localhost, or a communal crap site), to staging (where there's testing), to production. I'm going to go out on a limb here, and declare that drupal handles this kind of environment HORRENDOUSLY.

Today, I attempted to briefly map out how the hell one was supposed to manage updating a site's configuration without changing user generated nodes, comments, files, terms, accounts, or anything else that isn't a configuration setting. This was my recommendation:

Tables to leave alone on transfer from staging to livesite:

  1. Accesslog -- likely to grow to as large as 100+ megabytes depending on how long we need to store data.
  2. Cache -- We should empty this table on both sites every time we transfer to and fro. Otherwise we'll give this table an opportunity to be a troublemaker (especially once production goes fully live -- we'll want to depend on cache for anonymous page views).
  3. Comments -- no touch. 
  4. file_revisions, files -- no touch 
  5. forums -- leave alone for now. Fundamentally, forums are nothing more than nodes with a taxonomy term -- the module simply processes the data in a special way. So, in short, the actual backend process for creating a new forum area on the live site is the same as when user Joan Doe shares her story. Risk factor = if it is a risk, then the entire site is f#cked.
  6. Karma_objects, Karma_users, Karma Ratings -- since nodes are going to be left untouched, we should leave these untouched.
  7. All node fields should be left untouched except:
     a. node_type - the listings of types of custom content
     b. node_field_instance - individual field controls per node type
     c. node_field - global settings for individual fields
     New node types will carry over without a hitch -- minus the development/testing nodes (in theory).
  8. Profile values -- we'll want to transfer profile fields, but profile values is another story -- if we add a new required field, it's worth noting that drupal will grandfather in users who joined before the field existed -- until they choose to edit their profile.
  9. Search_index, search_total, search_dataset -- since we aren't carrying over changes from the node values fields, we'll want to leave these guys alone. If we create a test node, the search index will index it -- which could potentially result in an error.
  10. Sequences -- unfortunately, we'll need to update certain rows on this table, while leaving the rest alone. At the moment, we should update changes to only two rows: a. menu_mid -- for new menu items we might add, and b. view_view_vid -- views is the module that builds dynamic pages -- we may need to add new views in the near future.
  11. Sessions -- this table needs to be emptied. It stores the session ID and IP address of logged in users, basically telling drupal: if you see this session, consider this user logged in. It also remembers what items the user should see in cached form. So, since we're clearing out cache at updates, we'll need to clear this table as well. Failure to do so could result in a nasty error.
  12. All taxonomy tables: term_data, term_hierarchy, term_node, term_relation, term_synonym, vocabulary, vocabulary_node_types -- shall be left untouched.
  13. All troll tables should be left untouched.
  14. url_alias -- transferring changes from this table will result in errors since we're leaving the node data tables untouched.
  15. user, user_roles -- leave these untouched.
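
As a rough illustration, the list above can be written down as a per-table policy map that a transfer script could consult. This is only a sketch: the table names are spelled as in the list, while the policy labels, the `policy_for` and `plan_transfer` helpers, and the rule that unlisted tables count as configuration are assumptions made for the example, not part of drupal.

```python
# Hypothetical policy labels; not drupal terminology.
LEAVE, EMPTY, TRANSFER, PARTIAL = "leave", "empty", "transfer", "partial"

# Table names as spelled in the list above (abridged).
POLICY = {
    "accesslog": LEAVE,
    "cache": EMPTY,
    "sessions": EMPTY,
    "comments": LEAVE,
    "files": LEAVE,
    "file_revisions": LEAVE,
    "node_type": TRANSFER,
    "node_field": TRANSFER,
    "node_field_instance": TRANSFER,
    "profile_fields": TRANSFER,
    "profile_values": LEAVE,
    "search_index": LEAVE,
    "search_total": LEAVE,
    "search_dataset": LEAVE,
    "sequences": PARTIAL,
    "vocabulary": LEAVE,
    "vocabulary_node_types": LEAVE,
    "url_alias": LEAVE,
    "user": LEAVE,
    "user_roles": LEAVE,
}

# Prefix rules for the table families named in the list.
LEAVE_PREFIXES = ("node", "term_", "karma_", "troll")

# The only rows of the sequences table the list says to carry over.
SEQUENCE_ROWS_TO_UPDATE = {"menu_mid", "view_view_vid"}


def policy_for(table):
    """Return the policy for one table; unlisted tables are assumed to be configuration."""
    name = table.lower()
    if name in POLICY:
        return POLICY[name]
    if name.startswith(LEAVE_PREFIXES):
        return LEAVE
    return TRANSFER


def plan_transfer(tables):
    """Group table names by policy so each group can be handled in one pass."""
    plan = {LEAVE: [], EMPTY: [], TRANSFER: [], PARTIAL: []}
    for table in sorted(tables):
        plan[policy_for(table)].append(table)
    return plan
```

Calling `plan_transfer` with the table names from a database dump gives four buckets: tables to skip, tables to truncate on both sides, tables to copy, and the sequences table, which needs row-level handling.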

 

The first tests of this system worked on the database end. For now, it appears to have carried over configuration settings from dev, to staging without causing problems, and leaving existing content items (CCK, Views, and ALL) intact. But, I'm hardly celebrating. 

There are still issues with themes, modules, etc. And while SVN is the obvious choice -- I'll say this much: an effective implementation is going to be one hell of a pain in the ass -- and everyone using it will likely crack a couple of jokes about "TPS reports" in the process.

Yet, I feel this is a circumstance, and a weakness, that drupal can grow out of, and indeed MUST grow out of. But right now, I must say that when asked what the best way is to implement a development flow for a large scale drupal site with multiple teams of developers having to coexist with teams of staff adding content, my answer is one of silence, and frankly -- embarrassment.

I'm a solid drupal developer -- and nightmarishly complex database transfers, and organized systematic updates to a site's code, are not my realm; I've usually felt it wasn't my problem -- and yet, increasingly, I've found that it is actually my biggest problem. And unfortunately, solving the problem is going to be BORING work that I could never imagine someone doing for "free". Ew, building a prepackaged multisite version-tracking database updating system that is required for most large scale websites. Personally, I'd rather watch mud dry and crack.

And yet a fear that sometimes keeps me up at night is this: will open source fail because the incentive for solving "boring" problems is somewhere between low and null? Will open source lose because proprietary platforms have the one resource (starts with an M, and is green in America) that seems to motivate people to spend hours solving "gouge out my eye god I'm bored" kinds of problems?

Those questions ought to be rhetorical -- but I'm afraid that in this case, they are anything but. Frankly, this is why there needs to be a drupal foundation.

Comments

This reminds me...

...of zope zclasses.

In zope, you could create zclasses, something like Drupal CCK. You could also write filesystem based "products" (like Drupal modules). Soon zclasses became a maintenance nightmare: the definition was stored in the zope database, the data in another part of the zope db, and custom code could be added in the filesystem or zope. Then they invented an import/export system (zexp files), and then a syncer (zsyncer) to allow synchronisation between a dev and a staging server. This seems like an awful reinvention of the wheel.

If development generates data inside a database, you are stuck.

I think that if development is filesystem based, it will allow versioning and staging / development techniques.

I also hope that CCK will allow definitions stored in plain php files.

can sequence ranges help with dev-staging-production?

One idea: after setting up production (or during a scheduled downtime), reset some of your production sequence numbers to something very high, such as 0x10000000. So, records created on production by end users will effectively have some high-ish bit set.

On your staging box, you would keep your normal low-numbered sequences, so that any data created on staging during editing or testing won't conflict with data created on production.

For the data records you need to transfer from staging to production, only transfer records with id less than 0x10000000. Plus, you would only bother to do this trick for those tables that need transferring, as outlined in your post above.

If you have a team of developers, you might assign each developer their own low sequence range or "page". Possibly you can add some code hacks somewhere in drupal to make sure each developer doesn't overflow their assigned sequence range -- in which case the developer should go reserve for herself a new range/page.

I'm new to drupal, so not sure if that would work? That is, does drupal use the sequence table technique uniformly for all the important tables?
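
To make the idea concrete, here is a minimal sketch of the two pieces: filtering staging records by the threshold, and handing each developer a bounded range. It says nothing about whether drupal uses its sequences table uniformly; the `SequencePage` class, the function names, and the shape of the rows (dicts with an `id` key) are all made up for illustration.

```python
PRODUCTION_BASE = 0x10000000  # threshold suggested above; production ids start here


def is_staging_record(record_id):
    """True when the id falls in the low range reserved for staging/dev."""
    return 0 < record_id < PRODUCTION_BASE


def records_to_transfer(rows):
    """Keep only rows created on staging; rows are assumed to be dicts with an 'id' key."""
    return [row for row in rows if is_staging_record(row["id"])]


class SequencePage:
    """Hypothetical per-developer id range ("page") below the production threshold."""

    def __init__(self, start, size):
        if start <= 0 or size <= 0 or start + size > PRODUCTION_BASE:
            raise ValueError("page must lie inside the staging range")
        self.start = start
        self.end = start + size  # exclusive upper bound
        self.next_id = start

    def allocate(self):
        """Hand out the next id, refusing to spill into a neighbouring page."""
        if self.next_id >= self.end:
            raise OverflowError("page exhausted; reserve a new range")
        value = self.next_id
        self.next_id += 1
        return value
```

A developer whose page runs out gets an `OverflowError` instead of silently colliding with someone else's ids, which is the "go reserve a new range" step described above.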

Sounds dangerous. Might work

Sounds dangerous. Might work for you -- but I'd recommend against that if you need to keep track of updating content items in a consistent manner.

I'm building a remote administration/content publishing set of modules right now. Here are my hints:
1. The import/export api module is key.
2. Create another module and set of db tables to track a node's id on the remote machine, and associate that id with the different nid on the local machine. Unset the remote key value after you record it, and replace it with db_next_id({$table}_id) for the local site.

This works for nodes -- menus are another story (outside the scope of this comment).
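
A minimal sketch of the id-mapping hint, assuming nodes arrive as dicts with an `nid` key. The `NodeIdMap` class is hypothetical, and the in-memory counter is only a stand-in for the db_next_id() call mentioned above; a real module would keep the mapping in its own db table.

```python
import itertools


class NodeIdMap:
    """Hypothetical remote-nid to local-nid map, kept in memory for illustration."""

    def __init__(self, first_local_id=1):
        # Stand-in for the local site's sequence; not the real db_next_id().
        self._next_local_id = itertools.count(first_local_id)
        self._remote_to_local = {}

    def local_id_for(self, remote_nid):
        """Return the local nid for a remote nid, assigning one on first sight."""
        if remote_nid not in self._remote_to_local:
            self._remote_to_local[remote_nid] = next(self._next_local_id)
        return self._remote_to_local[remote_nid]

    def import_node(self, node):
        """Return a copy of the node with its remote nid swapped for the local one."""
        local_node = dict(node)
        local_node["nid"] = self.local_id_for(node["nid"])
        return local_node
```

Importing the same remote node twice yields the same local nid, so a later update can find the node it created earlier instead of inserting a duplicate.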

Does this help?

I agree with you. This process is a pain.

What if you tried this:
1) Take a snapshot of your production database and drupal config
2) Install the snapshot into a testing/staging environment
3) Perform an update on the test instance to see what issues you are likely to encounter.
4) Once you have the kinks worked out you could put your production version in stasis (no modifications allowed during upgrade)
5) Take a new snapshot
6) Install this snapshot beside your production system
7) Perform the upgrade
8) If all is well perform the switch-a-roo
9) Take the site out of stasis

Publish/Subscribe

Has anyone had success using publish/subscribe modules to have distinct staging/production environments?

I’d actually never

I'd actually never considered it. I assume those modules are working again?

I would love input.

<vapor warning>

Sympal Scripts has two tools that are "in development":

stage.php
  --site www.example.com        The site that will be staged.
  --stage [dev|test|live]       What stage to move the site to.

extract.php -- create fixtures from a site.
  --site www.example.com        The site to extract.
  --fixtures directory-name     Write the fixtures to the directory directory-name.

Stage essentially uses extract.php to stage your site to the next level. But as you can read, it's all under development. Basically, I can only work on it when budget allows. Which means: when a client needs a new site, I spend a percentage of his money on getting the scripts better.

http://drupal.org/project/sympal_scripts

The reason for posting here is that people who run up against this same problem might have good ideas for the import/export fixtures.

Bèr

Import / Export API?

The Import / Export API (my Summer of Code project) will hopefully make things like this a lot easier. Some people have already mentioned that the API would be a great candidate for helping with the task of migrating data and/or settings from test site to production site, or even from one production site to another.

The API divides all data within Drupal into 'entities' (e.g. 'user', 'comment', 'file', 'variable', 'node-type-x'). I imagine that once the API is more stable, it would be possible to write a UI on top of it, for automatically exporting all the entities that you'd typically want moved from a test site to a production site, and for importing them at the other end. This will certainly make Drupal more "enterprise-like" ;-) (although we don't want to go too far down that road!).

Jaza.

Ahh yes, staging...

You've run into something I've run into as well. It's possible. There are some things I'd recommend.

1) Ditch the 'page' content type. For content that is generally static, go to templates. Using phptemplate, it's trivial, I say *trivial* to set up a menu item that leads to _phptemplate_callback() and loads a template. In fact, the theming guy I work with *loves* this. It's extremely easy for him to work with static pages that are arranged this way, and we can have them in version control.

2) While developing, before we had a live site, we did a dump of the database and referred to it as a baseline. This isn't terribly useful any more, but we now keep database changes in a system that's modelled after the Drupal 4.7 update system. In fact, you can use exactly that system if using Drupal 4.7 with a custom module.

3) settings.php -- transfer settings from your test environment live by having your settings.php contain the bulk of your actual variables table.

4) the cache table should be truncated when you do updates or work.

5) Export all of your views and put them in a module. Views can run *without the database*. Put it in code.

6) Someday I hope CCK can do the same thing.

7) Someday I want nodequeue to be able to do that too. It'll require named queues but after trying to use node queue from code, I realized it needed the same treatment Views got. Someday it may get it, but it's not high on my priority list.

For the rest of it, don't transfer databases. Make the changes you need, dump your database and do a diff. See what changed, and apply those to your _update_X() function. Then, you can dup your live database onto a staging server, run the updates, introduce new code, and see what happens.
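
The dump-and-diff step can be sketched roughly as follows, assuming both dumps are plain SQL text with one statement per line. The `changed_statements` function name is made up for the example, and turning the result into an _update_X() function is still manual work.

```python
import difflib


def changed_statements(before_dump, after_dump):
    """Return (added, removed) lines between two SQL dumps given as strings."""
    before = before_dump.splitlines()
    after = after_dump.splitlines()
    added, removed = [], []
    # autojunk=False so frequently repeated lines in large dumps are not ignored.
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(before[i1:i2])
        if tag in ("replace", "insert"):
            added.extend(after[j1:j2])
    return added, removed
```

The added lines are the candidates for the next update function; the removed lines show which rows the configuration change replaced or deleted.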

merlin i’d be curious

merlin i'd be curious exactly how you do this
set up a menu item that leads to _phptemplate_callback() and loads a template
