Real Life Data Migrations


Mike and Matt gather the Lullabot team around the campfire to discuss real world data migrations into Drupal, and everything that goes into it.

Episode Guests

Karen Stevenson


Karen is one of Drupal's great pioneers, co-creating the Content Construction Kit (CCK), which has become part of Drupal core.

More about Karen

Juampy NR


Loves optimizing development workflows. Publishes articles, books, and code.

More about Juampy

April Sides


April Sides is a seasoned Drupal Developer who is passionate about community building.

More about April
Transcript


Matt Kleve:
For March 7th, 2019, it's the Lullabot Podcast. Hey everybody, it's the Lullabot Podcast, episode 232. I'm Matt Kleve, a senior developer at Lullabot. With me as always is co-host of the show, senior front-end dev, Mike Herchel. Hey Mike.
Mike Herchel:
Hey, how's it going?
Matt Kleve:
It's great, we're at the Smoke Tree Ranch.
Mike Herchel:
It's beautiful outside. Although, it rained earlier.
Matt Kleve:
It rained in California?
Mike Herchel:
Yeah, no joke.
Matt Kleve:
I was indoors, I didn't notice.
Mike Herchel:
Yeah.
Matt Kleve:
So, we're in Palm Springs, California, the annual Lullabot retreat. We all get together once a year, talk about the business, talk about how we improve ourselves, our workflows, and we get the chance to actually see each other. Working remotely, that isn't always a possibility, is it?
Mike Herchel:
Yeah, and we get to hang out in the hot tub.
Matt Kleve:
I just had my feet in the hot tub. You got all the way in.
Mike Herchel:
Yes.
Matt Kleve:
So we're here today talking about migrations. It's one of those things that I think a lot of times can cause a lot of heartburn.
Mike Herchel:
You can migrate through the hot tub.
Matt Kleve:
Later. But first we're gonna talk about Drupal migrations, and I think it's something that's happening fairly often these days, kind of in the current Drupal lifecycle. Because we're moving from Drupal 7 to Drupal 8. We're still moving from one CMS to Drupal, or something like that, and they're always a little bit different. So with us today, we're talking with some Lullabots that have a lot of migration experience. Wouldn't you say?
Mike Herchel:
Absolutely. First up we have a new Lullabot, a developer here at Lullabot. She is an organizer of Drupal Camp Asheville, which is officially the second-best Drupal camp in the world!
Matt Kleve:
Right behind Drupal Camp Colorado, right?
Mike Herchel:
No, Florida, Florida.
Matt Kleve:
Oh right. Yeah?
Mike Herchel:
And she is working on the migration at the state of Georgia, correct?
April Sides:
Georgia.gov, yes.
Matt Kleve:
April Sides, welcome.
Mike Herchel:
Yeah, welcome.
April Sides:
Thank you.
Matt Kleve:
Also with us today, we have the Lullabot CTO, Karen Stevenson. Hey Karen.
Karen Stevenson:
Hey there, good to talk to you.
Matt Kleve:
You've done migrations for a while now on Drupal.
Karen Stevenson:
I have, I've done quite a few of them actually.
Matt Kleve:
I mean, back to we were working with Martha Stewart several years ago.
Karen Stevenson:
Yup, yup.
Matt Kleve:
And that was a big migration.
Karen Stevenson:
Yup, D6 migrations, D7 migrations, and now D8 migrations.
Matt Kleve:
And we've talked to you recently about the Lullabot.com migration.
Karen Stevenson:
That's right. That's a big one-
Matt Kleve:
And you're also-
Karen Stevenson:
-and we're really happy to get that behind us.
Matt Kleve:
You're also involved with the state of Georgia, and other projects along the way.
Karen Stevenson:
I am, I'm also helping on that one.
Matt Kleve:
Awesome.
Mike Herchel:
And next up, we have the CEO of Juampy Corp.
Matt Kleve:
Whatever that means, I'm not sure.
Mike Herchel:
Whatever that means, he was also a senior developer here at Lullabot. Welcome, Juampy NR.
Juampy NR:
Hello, everybody.
Matt Kleve:
Hey Juampy. Glad you're here.
Juampy NR:
Likewise.
Mike Herchel:
Juampy is working on the Bravo TV migration.
Juampy NR:
Yes I am.
Matt Kleve:
So, migrations, why are they so hard? Like, what's the problem?
Karen Stevenson:
That's a good question. There's a lot of things that can make a migration hard. The easiest thing is a Drupal to Drupal migration, because at least we have some idea of what we're coming from, and what we're going to. There actually is a lot of pretty good code in core now, for handling a Drupal to Drupal migration. So if you are creating a new site that is basically a mirror of the earlier site, to some extent you can kinda push a button and do it.
Mike Herchel:
Does that actually work though?
Karen Stevenson:
It works if your new site is pretty simple, and if it matches the existing site.
Mike Herchel:
Yeah.
Matt Kleve:
So, content types might have to have fairly standard fields.
Karen Stevenson:
Right, normal kinds of fields.
Matt Kleve:
If you have texts and images and-
Karen Stevenson:
Normal kinds of [crosstalk 00:03:48] formatters, a lot of the things, the fields, the formatters, the content types. If your content types are all nodes, that kind of thing. You're likely gonna find that you can do pretty much a straight-up migration out of the box.
Mike Herchel:
Well that's not very normal for the type of projects that we do, right?
Karen Stevenson:
Right, not at all.
Matt Kleve:
Juampy, you were the first one to shake your head 'no', that those migrations don't actually work that way.
Juampy NR:
My particular issue was that there was a lot of data to migrate, a lot of fields, and a lot of contributed modules that didn't have a matching Drupal 8 version. When I did what Karen said, and I sort of migrated stuff, I got tons of errors, like things that weren't found, plugins that didn't exist, fields that didn't exist, so I had to go one by one. Some of them were core things that weren't migrated yet, some others were contributed modules that didn't have a migration path from 7 to 8, and that was a long process. But I'd like to step a little bit further than that, because my first issue with it was that, while there's a really good foundation for migrate, getting to know the right way to get your migration process in place is very tricky. Because I opened Drupal.org and it says one way, but then there are thousands of articles on the web, with people suggesting you do this one way or the other. There are still open issues in core and in contrib about this.
Juampy NR:
So it took me ... I remember it took me like two weeks, of just reading. I had to read the migrate core module to really understand what it was doing. I read a lot of articles, I read a lot of the Drupal documentation and then, I was able to make a decision on the start ... setting up how I did things to work. Has this ever happened to you when you started, April, Karen?
April Sides:
Right, this Drupal 8 migration, from Drupal 7 to Drupal 8 for georgia.gov, is the first time I've ever done a migration, so I had to learn from scratch. Karen gave me a lot of guidance, and recommended just starting with one content type. And in the case of georgia.gov, the content structure was very different. Content types weren't one-to-one from Drupal 7 to Drupal 8, they're split into multiple different content types. So, we had some migrations that are migrating just a piece of a content type into one, and then taking other pieces and migrating them into other different content types, and making all those connections happen as well in the process. I think that there's a great foundation for the migration system, and I think being able to use those YAML files to kind of outline a lot of the basics, and then build your process plugins and your source plugins to just do whatever little tweaks you need for customization.
Mike Herchel:
Can you back up a little bit? Can someone explain the ten-thousand foot overview of the migration system to maybe a Drupal developer that has never done a migration before?
Juampy NR:
Okay, Karen you wanna go?
Karen Stevenson:
Sure. So, if you've done a Drupal 7 migration, you'll recognize a lot of things about the Drupal 8 migration, because at a high level, it's the same thing. The idea is that for everything you want to migrate, you have a source. Where's it coming from? It's coming from the D7 MySQL database, or it's coming from some other source. There's multiple things that you can use as the original source, and it's got a destination, and the destination is almost always a Drupal 8 content type or field that you're trying to migrate into. And then, in between there is a process, and the process is that part where you take all the things that have to be changed between the source and the destination. In Drupal 7, all of this stuff was done in code, you'd write custom classes and you'd map your fields, and do all that kind of thing.
Karen Stevenson:
In Drupal 8, it's different, in Drupal 8 we use a YAML file. A YAML file, if you've not seen a YAML file, is basically just a representation of an array. So it's a way ... it's a text file, it's human-readable, you can actually just write it in text without any problem. So you have, for instance, every field that you're gonna create in the D8 database, you have the name of the field, and then you have what's the process to get the ... what's the source of that field's data. So, the name of the field, the D7 field that maybe you're mapping it to, what's the process for getting the data transformed in any way that it needs to be transformed. There's a simple get process which means, it doesn't need any transformation, it's just a one-to-one, you just copy the same data over in the same way. Or you have a whole variety of other kinds of processes that are available.
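As a rough illustration of the source, process, and destination idea described above, here is a minimal Python sketch. It is not Drupal's Migrate API: the `get` helper, the `migrate` function, and every field name are hypothetical stand-ins for what a migration YAML file would express as a mapping from destination field to source field.

```python
# Simplified model of a migration: source rows -> process mapping -> destination.
# Every name here is hypothetical and only mirrors the idea, not Drupal's API.

def get(source_field):
    """Build a process step that copies a source value over unchanged."""
    def step(row):
        return row.get(source_field)
    return step

# Roughly what the YAML file expresses: destination field -> how to fill it.
process = {
    "title": get("title"),
    "field_summary": get("field_teaser"),
}

def migrate(source_rows, process):
    """Run each source row through the process mapping and collect the results."""
    destination = []
    for row in source_rows:
        destination.append({dest: step(row) for dest, step in process.items()})
    return destination

# Stand-in for rows read from the Drupal 7 database.
source_rows = [
    {"nid": 1, "title": "Hello", "field_teaser": "A short teaser"},
    {"nid": 2, "title": "World", "field_teaser": "Another teaser"},
]

migrated = migrate(source_rows, process)
```

Each destination record ends up with only the fields named in the mapping, which is why a field that is missing from the mapping simply never arrives on the new site.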
Karen Stevenson:
Each one of these processes is a plugin, and you just give the name of the plugin, and perhaps some parameters that go with it. For instance, a really common plugin would be a migration lookup. So it could be something like ... I've got an entity reference field in Drupal 8, and I had an entity reference field in Drupal 7, but the Drupal 7 ID that I was referencing might be different in Drupal 8. So the migration lookup will say, let's get the corresponding ID after the migration, so that it still maps to the right thing. There's all kinds of other things in there, there's crazy things like a strip tags plugin, and then there's ... you guys-
April Sides:
SkipOnEmpty ...
Karen Stevenson:
SkipOnEmpty is a really interesting one. SkipOnEmpty, you can use two ways. You can say, if I encounter SkipOnEmpty, skip this record completely. If there's a certain value in a field, I don't wanna migrate this at all. Or it could be a SkipOnEmpty could mean, just don't do this field, stop migrating this field, but go ahead and continue and do the rest of the record.
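The lookup and skip-on-empty behaviours described above can be sketched in plain Python. This is an assumption-laden model rather than Drupal code: `SkipRow`, `SkipProcess`, `migration_lookup`, `skip_on_empty`, and the ID map are hypothetical names chosen to mirror the two modes Karen describes (skip the whole record, or skip only the one field).

```python
# Hypothetical model of two process steps: an ID lookup and skip-on-empty.

class SkipRow(Exception):
    """Signal that the whole record should be left out of the migration."""

class SkipProcess(Exception):
    """Signal that only the current field should be left out."""

def migration_lookup(id_map):
    """Translate an old (source) ID into the ID it received after migration."""
    def step(value):
        return id_map.get(value)
    return step

def skip_on_empty(method):
    """Skip the row or just this field when the incoming value is empty."""
    def step(value):
        if not value:
            raise SkipRow() if method == "row" else SkipProcess()
        return value
    return step

def migrate_row(row, process):
    result = {}
    for dest, (source_field, steps) in process.items():
        value = row.get(source_field)
        try:
            for step in steps:
                value = step(value)
        except SkipProcess:
            continue  # leave this one field unset, keep the record
        result[dest] = value
    return result

def migrate(rows, process):
    migrated = []
    for row in rows:
        try:
            migrated.append(migrate_row(row, process))
        except SkipRow:
            continue  # drop the whole record
    return migrated

author_id_map = {10: 501, 11: 502}  # assumed old ID -> new ID pairs
process = {
    "title": ("title", [skip_on_empty("row")]),
    "field_author": ("field_author",
                     [skip_on_empty("process"), migration_lookup(author_id_map)]),
}
rows = [
    {"title": "Keep me", "field_author": 10},
    {"title": "", "field_author": 11},           # empty title: record skipped
    {"title": "No author", "field_author": None},  # empty author: field skipped
]
result = migrate(rows, process)
```

The second row disappears entirely, while the third row survives without its author reference.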
Juampy NR:
You can also chain plugins one after another. So for example, if you have a lot of embedded content in the body, you can pass it through several processors, each in its own class. Each one will pass the result to the next plugin, the next plugin, the next plugin, until you get the finally processed body.
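A minimal sketch of that kind of chain, assuming three made-up cleanup steps for body text. The `[[image:ID]]` token, the `<embedded-media>` tag, and all function names are hypothetical; the point is only that each step receives the previous step's output.

```python
import re

# Each function is one hypothetical "plugin" in the chain.

def replace_legacy_embeds(text):
    """Turn an assumed [[image:ID]] token into a placeholder tag."""
    return re.sub(r"\[\[image:(\d+)\]\]",
                  r'<embedded-media data-id="\1"></embedded-media>', text)

def strip_font_tags(text):
    """Remove opening and closing <font> tags but keep their contents."""
    return re.sub(r"</?font[^>]*>", "", text)

def collapse_whitespace(text):
    """Squash runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip()

def run_chain(value, steps):
    """Feed the value through every step in order."""
    for step in steps:
        value = step(value)
    return value

body = '<font color="red">Hello</font>   [[image:42]]\n world'
processed = run_chain(body, [replace_legacy_embeds, strip_font_tags, collapse_whitespace])
```

Because the steps are independent functions, one can be reordered, removed, or reused in another field's chain without touching the others.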
Matt Kleve:
One thing though is that, because they're plugins, you can write your own, right?
April Sides:
Correct.
Matt Kleve:
We're often dealing with clients with snowflake data, that it needs to be touched in 17 different ways and writing your own plugin might be the right answer, and Drupal's gonna let you do that too.
Juampy NR:
Totally. It's actually a fairly straightforward process because you can just use drush to generate a template for the plugin and then start writing it, so in a minute you are already writing the code that you need, and placing it in the migration file where you need it. I'd like to highlight something else that shocked me when I started working with Migrate in 8, which is the [inaudible 00:10:54] discovery of fields and mappings. For example, you said it before, Karen: in 7, with Migrate, you had to do the mapping. I don't know how much time I would have spent doing the mapping of the fields from the 7 site in Bravo to 8, because there are so many fields. Georgia must be crazy, right? Because we're talking about hundreds of sites here, so that's something that you cannot do by hand. Because you're not only going to take a long time, but you'll also make mistakes.
Juampy NR:
This is something that you can extend to other things. For example, Bravo uses media, and migrating media to Drupal 8 is still an open issue, like there's still not a solution in core. So, the way we are approaching it ... and that's something you can find on Drupal.org, is a patch that discovers what media bundles ... sorry, what file entity types you have in Drupal 7. I'm talking about the 7 migration. It discovers what fields each of these file types has, and it creates for you each of the corresponding media bundles, along with their fields, along with their content migration, from 7 to 8. Which is such a time saver, and we didn't have this before.
Mike Herchel:
So let's talk some real-world migrations. Let's go around, and I wanna hear a specific problem that you encountered, why it was a pain in the ass, and what you had to do to solve it.
Matt Kleve:
It sounds like a job interview question. Tell me about a problem you had once, and the steps you took to solve it.
Mike Herchel:
Let's start with you, April.
April Sides:
So I would say, paragraphs, and paragraphs nested in paragraphs, along with field collections. You can go down a rabbit hole to try to process each of those items.
Matt Kleve:
So, a paragraph inside of a paragraph that's in a field collection?
April Sides:
Just adding field collections is another thing. A lot of rabbit hole kinda stuff. What I ended up doing was just sort of looping through each item, and I had functions that would determine how each type of paragraph, each type of field collection would be rendered in Drupal 8. It might be an embed snippet, it might even, to begin with, just be stubbing out some output text to say "Hey, there's an image here," and then we can go back and actually decide what needs to be put into that place. Just kinda generating all of that, and we have one column and two column and three column layouts that would also hold all of these other paragraphs. It just kinda looped through and rendered each one of them and then compressed all of that into the body text, and then returned that to the body field. That was pretty interesting.
Matt Kleve:
Interesting.
Karen Stevenson:
I was impressed that she got that working, because that is probably one of the most complex things that you have to deal with, is a lot of things that are nested inside of other things, and somehow or other. Basically what we were doing on Georgia, is we were trying to sort of unravel that mess. So we were not trying to create similar nested entities on the other side, we were trying to sort of figure out what was in there, unravel it, and basically just drop it all in the body. So you didn't have that again in Drupal 8.
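The unravelling approach described above (loop over nested items, render each by type, and drop the result into the body) might look something like this in outline. It is a hedged Python sketch, not the georgia.gov code: the item structure, the type names, and the renderer functions are all assumptions.

```python
from html import escape

# Hypothetical renderers, one per item type found in the old site.

def render_text(item):
    return f"<p>{escape(item['text'])}</p>"

def render_image(item):
    # Stub output so the spot can be found and replaced later.
    return f"<!-- TODO: image {item['id']} -->"

def render_columns(item):
    # Layout containers hold other items, so render the children recursively.
    return "".join(render_item(child) for child in item["children"])

RENDERERS = {
    "text": render_text,
    "image": render_image,
    "columns": render_columns,
}

def render_item(item):
    """Dispatch to the renderer for this item's type."""
    renderer = RENDERERS.get(item["type"])
    if renderer is None:
        return f"<!-- unhandled type: {item['type']} -->"
    return renderer(item)

def flatten_to_body(items):
    """Render every top-level item and join the pieces into one body string."""
    return "\n".join(render_item(item) for item in items)

items = [
    {"type": "text", "text": "Intro & welcome"},
    {"type": "columns", "children": [
        {"type": "image", "id": 7},
        {"type": "text", "text": "Side note"},
    ]},
]
body = flatten_to_body(items)
```

Unknown types produce a visible marker instead of failing, which matches the idea of stubbing output first and deciding later what belongs there.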
Mike Herchel:
Does Jeff Eaton know about this?
Karen Stevenson:
Jeff Eaton helped create the content model that we're migrating into, which is different than the original content model.
Matt Kleve:
Whoa. Put it all in the body field, move on.
Karen Stevenson:
Yes, yes. There's always that.
Matt Kleve:
Structured content's a good thing, we wanna-
Karen Stevenson:
Structured content is definitely a good thing. Structured content is obviously easier to migrate, because it's structured. Which is another reason why structured content, going forward, will make a lot of sense. But the content that we inherited, as is true in a lot of clients, is not structured, or not completely structured. Especially when you get the everything dropped into the body field kind of a thing.
Matt Kleve:
Karen do you have any other problems you want to talk about [crosstalk 00:14:52]?
Karen Stevenson:
Oh there are so many problems, so many problems. It's hard to even know where to start.
Matt Kleve:
You can go back through several years' worth of migrations.
Karen Stevenson:
Yeah. So, one of the things that I did in the Lullabot migration is ... because the Lullabot D7 site is a decoupled site, we didn't have some things that we needed in Drupal 8.
Matt Kleve:
But the Drupal site ... I mean it's still a Drupal site-
Karen Stevenson:
It's a Drupal site.
Matt Kleve:
So the migration is Drupal-to-Drupal, but ...
Karen Stevenson:
Right. Because the front end on the D7 site is not a Drupal site. Drupal didn't know about some things that were going on on the front end. So for instance, our Drupal 7 database doesn't know what URLs are, because the front end determines that. Our Drupal 7 database doesn't know what the ... there were a number of other things, but the URLs were one of the bigger ... menu items, there's no menu structure in Drupal. The menu structures were all-
Matt Kleve:
The URLs were constructed with data that was a part of the node, but not completely.
Karen Stevenson:
Right. So the Drupal 7 database is structured. There's fields that represent all these things, but the front end did the assembly, and so Drupal didn't know how things got assembled. So what I had to do in the migration was kind of reassemble everything so that I could pull all that information back into Drupal. So what I ran into was things like I needed to create menu items, but the menu items depended on the content. So I couldn't create the menu items until the content was created, but I couldn't finish the content until the menu items existed. One of the things that I found that was magical was that there are event handlers in there, and so there's a way to basically stop in the middle of a migration. You can say, "At the end of migration x," whatever migration I wanna do, which is basically the last migration that I needed in order to be able to build the menu. I could stop at that point and say, "Run another process, build the menu system and then go back and finish the migration." That was, for instance, how I handled that problem.
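The "stop after migration x, build the menu, then continue" pattern can be modelled with a small event hook. This Python sketch does not use Drupal's event classes; `MigrationRunner`, `on_post_import`, and the example migrations are hypothetical and only show the ordering.

```python
from collections import defaultdict

class MigrationRunner:
    """Runs migrations in order and fires listeners after a named migration."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def on_post_import(self, migration_id, callback):
        """Register a callback to run right after the given migration finishes."""
        self._listeners[migration_id].append(callback)

    def run(self, migrations):
        for migration_id, task in migrations:
            task()
            for callback in self._listeners[migration_id]:
                callback()

content = {}  # stand-in for migrated nodes
menu = []     # stand-in for the menu structure
order = []    # records what ran, and when

def import_articles():
    content[1] = {"title": "About"}
    order.append("articles")

def build_menu():
    # Needs the articles to exist, so it runs as a post-import listener.
    menu.extend(node["title"] for node in content.values())
    order.append("menu")

def import_landing_pages():
    # Needs the menu to exist, so it runs after the listener has fired.
    content[2] = {"title": "Home", "menu_links": list(menu)}
    order.append("landing_pages")

runner = MigrationRunner()
runner.on_post_import("articles", build_menu)
runner.run([("articles", import_articles), ("landing_pages", import_landing_pages)])
```

The listener breaks the circular dependency: content that the menu needs is imported first, the menu is built in between, and content that needs the menu comes last.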
Mike Herchel:
Juampy?
Juampy NR:
Mine is similar to April's, because in Bravo TV, there were-
Mike Herchel:
Have you done any migrations besides Bravo, recently?
Juampy NR:
Previous one was like three four years ago, when it was MSNBC.
Mike Herchel:
So you've had some migration stuff in the past so ...
Juampy NR:
I did, I did.
Mike Herchel:
Picking up the Drupal 8 was alright?
Juampy NR:
It was different. Things have changed so much, but for the better. It's exciting. It's tricky, but it's really really exciting and fun. I had a really good time setting up the migration and testing it and battling with some of the challenges I faced. But what I'd like to highlight is that I'm coming from the same angle as April, because Bravo TV uses multifield, which is an alternative to field collection. So, we had to migrate those multifield fields into paragraphs, and there was no migrate path, so what I had to do is go to drupal.org. I found that there was a patch that is still open for migrating field collections to paragraphs. So I had to take that patch, and then understand how multifield works in 7 in detail, and understand how paragraphs work in 8 in detail, and then find a way to adapt that patch so it would let me migrate all those multifields into paragraphs in 8, and that took me a while.
Karen Stevenson:
I was really glad you did that because, Lullabot also had multifields.
Juampy NR:
Oh, you're right.
Karen Stevenson:
Except, in the case of Lullabot, I was not trying to migrate into paragraphs, I was trying to migrate back into content types. So what I did is after you figured out how to solve the problem, I grabbed your code, and then I figured out how to reorganize it and solve my problem.
Juampy NR:
That's awesome.
Matt Kleve:
That sounds like an open source story.
Juampy NR:
Totally, totally.
Mike Herchel:
I think multifield is Dave Reid's module, right?
Juampy NR:
It is.
Karen Stevenson:
It is. It's a great module, but there's no upgrade path for it.
Juampy NR:
Yeah, or nobody has posted these, because-
Karen Stevenson:
Yeah, at this point-
Juampy NR:
The thing is that, I've seen that there are many articles that say, "Here is how you can migrate this," but the migrations are written by hand. You can see the fields are written there one-to-one. There is no code that generates ... that introspects your data, the source database, and generates this migration file. So, most of the articles I have found on the net apply when you have a small database, or a database with not much complexity. But when you have complexity, these are not valid, you have to go deeper than that, so you can generate these migration files automatically. Otherwise, you will spend too much time writing them.
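To make the idea of generating migration mappings by introspection concrete, here is a small Python sketch. It assumes the source field definitions have already been read into a dictionary (a real tool would query the source database); the type-to-plugin table and the field names are hypothetical, and the output is only a dictionary shaped like a process section, not a complete migration file.

```python
# Assumed table of which process plugin to use for each source field type.
PLUGIN_BY_TYPE = {
    "text": "get",
    "text_long": "get",
    "entityreference": "migration_lookup",
}

def generate_process(bundle_fields):
    """Build a process mapping for one bundle and list fields with no known handler.

    bundle_fields maps source field name -> source field type.
    """
    process, unsupported = {}, []
    for name, field_type in sorted(bundle_fields.items()):
        plugin = PLUGIN_BY_TYPE.get(field_type)
        if plugin is None:
            # No migrate path known for this type: flag it for manual work.
            unsupported.append(name)
            continue
        process[name] = {"plugin": plugin, "source": name}
    return process, unsupported

# Stand-in for what introspecting one Drupal 7 content type might return.
article_fields = {
    "field_subtitle": "text",
    "field_author": "entityreference",
    "field_gallery": "multifield",
}

process, unsupported = generate_process(article_fields)
```

Generating the mapping removes the hand-typing for the easy fields and leaves a short, explicit list of the fields that still need custom work.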
Karen Stevenson:
We actually in Georgia, did end up writing them by hand because the content model was completely different. The automatic-
Juampy NR:
Because you were changing, right? The content models-
Karen Stevenson:
The automatic migration did us no good. It woulda created something that we would've thrown out most of what it created.
Juampy NR:
Got you, then it's impossible.
Karen Stevenson:
So in that case, we had to go the other direction.
April Sides:
We actually created the content types on all the fields first, and then just migrated into them. We didn't migrate that structure at all.
Juampy NR:
Right.
Karen Stevenson:
Right. And then we went through a process, because ... another thing that's interesting about migration is just thinking about how you make the process ... for instance, one of the interesting challenges, April was writing a lot of the migration code and I was reviewing a lot of it. And it's really complicated, right? This stuff gets really murky and deep, and that kind of thing. Just figuring out how to sort of pull this thing into little pieces, where you could do a PR that was actually reviewable, and say, "Okay, we're only gonna get this much done in this, so it might just be a couple fields, and a couple content types." It wasn't everything, because everything would've been a PR that nobody could've ever reviewed.
Juampy NR:
Right. So, I'm interested in that, how was that process? Like, let's say you wanted to add a new migration, or you wanted to adjust a migration, while at the same time the development team is working, right? So you need to work in parallel with that. How was the development process of you making the changes and then sharing with everybody else to peer-review, and then merge it?
April Sides:
Right, so we actually broke up the tickets into different content types. We started out with just one content type and realized we could make tickets for sort of simple fields. These are just things that are just gonna be, we're gonna get the content, we're gonna put it in the field. Those are the only things we're gonna focus on in this round. And then we would go into more complex problems where we had built some ... oh what do we call those fields?
Karen Stevenson:
Rich fields?
April Sides:
Rich fields, right. We had a different kind of structure for rich fields, so like an email would have multiple pieces of content, and then it would be related. We had a round where we were doing things that would become rich field content. And then we had another round where we would work on the paragraphs and all of that. We would focus on one content type for that. So what we've kind of done is gone through ... we've picked a content type for each sort of problem that we need to solve. So now we're at kind of the end with the WYSIWYG clean up, like what needs to happen with an embed from Drupal 7, and what does it look like in Drupal 8? What does this document link look like in Drupal 8? Those sort of things. And once we can complete that, then we're gonna need to come back around and make sure we do it for each content type that might be affected by that sort of process.
Matt Kleve:
And then, the development team changes content types and fields and ...
Karen Stevenson:
That's never happened, has it?
April Sides:
Oh yeah. I mean, for the most part those have not been that hard. It's just a matter of knowing that it's happened, right? I did notice some errors and I just pulled out some fields, like this is a problem right now, we don't have the ticket ready to work on, so let's just pull these fields out, and we'll circle back around and redo that. It's mostly about the structure in Drupal 8, we just needed to make sure the content mapped to the right location.
Juampy NR:
Were there many times where you did something and broke the resulting migration, so the development team was blocked or something? Or do you have a database dump or a site to download a database from to work on?
April Sides:
I think working within Tugboat a lot, we've got a base install with a base ... let's run the process, let's say let's run the migration, and it's successful. So we have this database that can be used to build local development sites. So, they always kinda have a starting point. So if the migration is killing something, it's not gonna affect that base instance of the database. They always have something that they can reference.
Juampy NR:
So did Tugboat run the migration, then?
Karen Stevenson:
Yes, Tugboat can run the migration. Or what we've done in a lot of cases, is, we've got the base install that has the simple migration. So for instance, we stop at the end of the simple migration, we've got databases that have the simple migration in it, but they don't have later migrations. And then April will come up with a PR that has more migration work in it. We'll run that through Tugboat, we'll literally go to Tugboat and do a rollback and a remigrate, that'll pick up all the new stuff that she added.
Juampy NR:
I see, I see.
Karen Stevenson:
We can test that on Tugboat and make sure that it's working right, and then decide whether or not it's ready to merge in.
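Why a rollback followed by a re-import picks up new migration work can be shown with a toy model. This is a simplified Python sketch, not Drupal's implementation: the `Migration` class, its ID map, and the field names are assumptions. The key idea it illustrates is that already-imported records are skipped on a plain re-run, so changes to the mapping only show up after the old results are rolled back.

```python
class Migration:
    """Toy migration that remembers which source rows it has already imported."""

    def __init__(self, source_rows, process):
        self.source_rows = source_rows
        self.process = process  # destination field -> source field
        self.id_map = {}        # source id -> destination id

    def run_import(self, destination):
        for row in self.source_rows:
            if row["id"] in self.id_map:
                continue  # already imported, so mapping changes are not applied
            dest_id = max(destination, default=0) + 1
            destination[dest_id] = {d: row.get(s) for d, s in self.process.items()}
            self.id_map[row["id"]] = dest_id

    def rollback(self, destination):
        """Delete everything this migration created and forget the ID map."""
        for dest_id in self.id_map.values():
            destination.pop(dest_id, None)
        self.id_map.clear()

rows = [{"id": 100, "title": "Hello", "teaser": "Short"}]
destination = {}
migration = Migration(rows, {"title": "title"})

migration.run_import(destination)

# A later change adds a new field mapping (for example, from a new PR).
migration.process["field_summary"] = "teaser"
migration.run_import(destination)
summary_present_without_rollback = "field_summary" in destination[1]

migration.rollback(destination)
empty_after_rollback = (destination == {})
migration.run_import(destination)
```

After the rollback and second import, the record carries the newly mapped field, which is the behaviour relied on when testing a migration PR against an existing base database.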
Matt Kleve:
Who wants to explain Tugboat, just to make sure that everybody understands?
Juampy NR:
Tugboat is ...
Matt Kleve:
It's a Lullabot product, first of all, it's something that Lullabot has built, as a part of our building Drupal websites over the years. It was one of these, "Hey wouldn't it be cool if it worked this way?" And we made it work this way.
Juampy NR:
I don't know, my first experience with Tugboat was when we were doing the MSNBC project, and suddenly when I updated a pull request, I got, a minute after, a message from a bot that says, "There is a URL for this pull request and here it is." And I thought, "Wow, this is amazing." So it's a URL per pull request.
Matt Kleve:
Yeah, but every time a pull request is made, the software runs in the background and spools up your website, with the new code base.
Juampy NR:
Exactly, exactly.
Matt Kleve:
So it's an easy way to just jump in there and test.
Juampy NR:
Exactly, so I discovered it was a great way to show new stuff to stakeholders, to the peer-reviewing. For example, whenever you were working with a third-party, that gave you a JavaScript file to integrate with, you could tell them, "Hey is this fine? Is this correct? Can you verify?" Yes, and then you would merge it. So it has been a great time saver for us, since we started using it years ago.
Matt Kleve:
Yeah, Tugboat also works with GitLab, Bitbucket, and pretty much everything else. The website, for those that are interested in this, is tugboat.qa.
Matt Kleve:
It doesn't even have to be Drupal.
Mike Herchel:
Yeah, it works for ... it's changed our whole development process here.
Karen Stevenson:
It's especially useful in a migration. What we found is, because the only other way to, for instance, help review April's PRs would be that I'd have to create a local environment, do that migration on my local, and especially if we're doing several PRs and several different kinds of migrations, just the process of building all that locally takes so long that I literally could hardly get through the PRs. And this way, I can go review everything on Tugboat, I don't have to build it locally. It's just been a huge time saver.
April Sides:
Really great, because georgia.gov is multisite, and some of the sites don't have the content in Drupal 7, so you're testing content migrations on various different sites to make sure that you're covering all the bases. I've also found recently that it's running faster on Tugboat than locally, even if you're like ... because you have to rebuild locally, and you have to do all that. So when I was doing test cases, it's like I should have been using Tugboat the whole time because it's already built, I just need to roll back, reimport, and then I can test it and find those test cases for QA.
Matt Kleve:
We're talking migrations on the Lullabot Podcast, here at the Lullabot retreat. We're all around the table, having a good time, right Mike?
Mike Herchel:
I am.
Matt Kleve:
Yup. Right after this, we're gonna get together and talk a little bit about keeping the migration straight. Maybe, some more stuff, all about migrations, on Lullabot Podcast, right after this.
Advertisement:
Whether you're learning how to build sites with Drupal or diving into the code, there are community-powered camps, summits, sprints, and trainings happening all over the world. Find all of these and more at drupical.com. And of course, if you want to boost your Drupal chops from the comfort of your own home, point your browser to drupalize.me, and stuff your brain full of carefully crafted videos and tutorials.
Liz Trudeau:
Hi, this is Liz Trudeau from the Drupal Association. Drupalcon Seattle is the conference where cutting-edge content, networking, and contributing come together. Meet thousands of users, developers, and designers using Drupal. Level up your skills, April 8 through the 12th, at the Washington State Convention Center. Registration rates increase March 1st, so don't delay. Events.drupal.org.
Mike Herchel:
Welcome back to the Lullabot Podcast, we're talking about migrations.
Matt Kleve:
Only because we're not in the swimming pool, right?
Mike Herchel:
Yes.
Matt Kleve:
We're at the Lullabot retreat, here in Palm Springs. We get together once a year, and we have some migrations experts at Lullabot all around the table talking migrations. Hi guys.
Karen Stevenson:
Hey.
Juampy NR:
Hello.
April Sides:
Hey.
Matt Kleve:
So one thing that I wanted to kinda just mention real quick is that, one thing that's really great about migrations in Drupal, is that it's a mature thing, right? Migrations ... the whole migrate system has been around for about 10 years or so.
Karen Stevenson:
Yeah, since Drupal 6, so I think that was kind of the beginning of the migrate module was Drupal 6.
Matt Kleve:
The old way was like an inside-of-Drupal field mapping, a bunch of select boxes. This field to that field, this field to that field, and then run your migration.
Karen Stevenson:
All in code, basically, everything's in code.
Matt Kleve:
And then I remember it was a drush-based, essentially plugins, where that you're writing all your migrations in. All of the commands had to happen by running a drush command, right? To make new migrations, and old ... right? To run it?
Karen Stevenson:
Yeah, I think all the way through there's been both the ability to run it in the UI and the ability to run it in drush, which is still true.
Matt Kleve:
I remember being at Drupalcon San Francisco, 10 years ago, or whatever that's been, and Mike Ryan was giving his talk about migrations.
Karen Stevenson:
Oh yeah.
Matt Kleve:
I was actually fairly upset when I left, because all he talked about was the spreadsheets he used to keep fields straight. And recently, I've come to the conclusion that that's what a migration is. The code part is relatively simple. Things are from here and data's going from here to there, moving from one database field to the other. But ...
Karen Stevenson:
I would disagree with "the code is fairly simple."
Matt Kleve:
Yeah it depends on what you're doing. It depends on what you're doing.
Karen Stevenson:
Right. Yeah, it can be fairly simple.
Matt Kleve:
But keeping that data straight, making sure you have the data that you want in the end is kind of the-
Karen Stevenson:
That is true.
Matt Kleve:
That's the point.
Karen Stevenson:
That is true.
Juampy NR:
Yeah. I'll speak ... in Bravo we were lucky because we didn't have to change content types, they wanted what Greg Dunlap mentioned in an article, forklift migrations. So whatever is in 7, they want it in 8, which is great because we did a migration and then we are at the stage where editorial is verifying that it is working as it was, with no major changes. So that was a relief for us. But the challenge we found was that we needed the development team working on migrating the code, and obviously they needed some data to work with, and we also had to make changes to the migration every day. So, the question was how we could make these changes in the migration without affecting the development team. On top of that, we didn't have a development server until way later in the process. We worked with Tugboat, so there wasn't an easy way to do "drush, some site alias, get me a database," because there wasn't a development environment, it was Tugboat.
Juampy NR:
We thought about doing dynamic site aliases, but we didn't find an easy way of setting them up, so we were working with a database dump that we used for migrations that only has configuration in it. We would also have the result of the latest full migration, which is what the development team would use, and also what Tugboat would use, so they could verify their work. Meanwhile, all the running and testing of migrations would happen in CircleCI, a continuous integration tool. Did you do anything like that, in some sort of Jenkins or CircleCI? Like, what would run the migration in your case? For your job?
Karen Stevenson:
We do have a CircleCI workflow that James set up, and I actually can't give you a lot of details about how he set it up, but he got that part all working. He also figured out how to do things like the multisite.
Juampy NR:
Gotcha.
Karen Stevenson:
So that we can run individual migrations for every different multisite and keep them separate.
Mike Herchel:
Is Circle running the migration every time it builds? I really don't know either.
Karen Stevenson:
Yeah, I don't know either. I think one of the issues was we did not want the migration running every time we built Tugboat, because that was just gonna be too time intensive, and performance problems and all that kind of thing. So the idea ... that's where we came up with the idea of, at some point in time, we've got a database that is kind of the code at that point in time. And then we manage the migration by, like I said, basically going in, doing a migrate rollback, and then remigrating, which will go back in and basically run the migration again, picking up any new stuff that we've added. That's actually worked pretty well, in terms of a process.
April Sides:
As far as organizing the files, all the YAML files that we were talking about before are a part of the configuration of the site, with the configuration management system in Drupal 8. But then we also have a custom module in georgia.gov where any custom source plugins, or process plugins, all kind of live, so that we can keep track of where our modifications are, so we're not ... We only have one module, to kind of manage that process.
Karen Stevenson:
We also took an approach of, there's a lot of different ways that you can manage things. You can write custom plugins for everything if you want, you can still use hook prepare row [crosstalk 00:33:22]. There's a hook migration plugins alter [00:33:27] that you can use. So there's a couple of different hooks that you can use. And then there's the YAML files themselves. So there's several different places where you can do this, and it's not as though one is right and one is wrong. They're alternative ways that you can accomplish things. We took an approach ... or we tried to take an approach of, wherever we could, putting the code into the YAML file. If it could be done in the YAML file we tried to do it in the YAML file, just so that there was sort of a consistency of approach. But even with that, we've still got lots of places where we're using hook prepare row, or one of those.
Karen Stevenson:
So hook prepare row is interesting, because that is a place where you can do individual record changes. Like, I'm gonna have a particular record that needs to be handled a little bit differently than every other record. So I can't put that into a YAML file, I need to do a dynamic process when that particular row is being processed. So I can do that kind of thing in hook prepare row. The hook migration plugins alter is really interesting because I can actually change what's in the YAML file. So I can create a YAML file that defines the general state of things, and I can use this hook migration plugins alter to say, in some dynamic way, I wanna change that YAML file for a particular situation. So, the combination gives you ... there's a tremendous amount of flexibility. It is a huge learning curve to figure out where all this is, and how you can use it. But once you've figured it out, it is enormously powerful.
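A rough illustration of the two hooks Karen describes. In Drupal core they are the PHP hooks `hook_migrate_prepare_row()` and `hook_migration_plugins_alter()`; the sketch below is only a Python analogue of the idea, not Drupal code. The definition dictionary stands in for a migration YAML file, and the site name, field names, and node ID 42 are all made up for illustration.

```python
from copy import deepcopy

# Stand-in for a migration definition that would normally live in a YAML file.
# The keys and values here are illustrative, not taken from a real project.
BASE_DEFINITIONS = {
    "articles": {
        "source": {"node_type": "article"},
        # destination field -> source field
        "process": {"title": "title", "body": "body"},
    },
}


def alter_definitions(definitions, site):
    """Analogue of an alter hook: adjust the static definitions dynamically."""
    altered = deepcopy(definitions)  # leave the "YAML" untouched
    if site == "news":  # hypothetical site that needs one extra field
        altered["articles"]["process"]["byline"] = "field_author"
    return altered


def prepare_row(row):
    """Analogue of a prepare-row hook: special-case individual records.

    In this sketch, returning False means "skip this row".
    """
    if row["nid"] == 42:  # hypothetical record that needs special handling
        row["title"] = row["title"].strip().title()
    return bool(row.get("title"))  # skip rows that have no title


def run(definition, rows):
    """Apply the per-row hook, then the field mapping, to every source row."""
    migrated = []
    for row in rows:
        row = dict(row)  # do not mutate the caller's data
        if not prepare_row(row):
            continue
        mapping = definition["process"]
        migrated.append({dest: row.get(src) for dest, src in mapping.items()})
    return migrated
```

The split mirrors what is described above: the mapping that applies to every record stays declarative, one hook handles the odd record at processing time, and the other rewrites the declarative part for a particular situation.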
Juampy NR:
Yeah, I had to read code. To me the key was, not documentation, not articles, but reading. Like the migrate module in core is great, it has a lot of documentation in it that you can read by just reading the source code. The set of contributed modules that are around it, Migrate Tools, Migrate Upgrade, correct me if I'm wrong, there are a couple more that I've been using as well. They work together one way or another, and they are great as well. They have examples in there. So for me, it was reading what each of these plugins does that led me to set up the migration the way I wanted to.
Matt Kleve:
April, what was the easiest way for you to learn? Like what did you find?
April Sides:
The documentation is very thorough, and then there are lists of-
Matt Kleve:
The in-code documentation, is that what you're saying, or the drupal.org documentation?
April Sides:
Drupal.org. So there's documentation on ... there's lists of what process plugins are available, and there's just a lot of information there and once you click on one of the processes, it takes you to the API page, that then shows you of course, the inline documentation for that plugin. And then, just various sites to just kind of see how other people are doing it. And then using a lot of Karen's examples that she was using on Lullabot.com was very helpful.
Karen Stevenson:
By the way, the Lullabot.com migration code, I'm gonna make public someplace.
Juampy NR:
Nice.
Karen Stevenson:
Because it might be useful to other people.
Matt Kleve:
Right on.
Karen Stevenson:
So, FYI.
Matt Kleve:
That's cool.
Karen Stevenson:
Yeah, for me a lot of it was just trying things out. You have to do a certain amount of experimentation to figure out how to get the results that you want. If the out-of-the-box isn't it, you have to start to see what kind of data do I actually have to work with, and where is it? And, how do I get it into the destination in the way that I want?
Juampy NR:
I'd like to share a funny migration scenario from these past few days, which is that suddenly the resulting migrations were giving us broken thumbnails, and we didn't know what was happening there. The thing we discovered is that ... we were running the migration in CircleCI, because as I said, we didn't have a development environment. CircleCI has a timeout of five hours per job, and we were over that. Besides, it takes like 20 hours to migrate everything in Bravo, and sometimes we still get a timeout.
Mike Herchel:
That's a lot of content.
Juampy NR:
It is. I think it's not just the content, it's the complexity. It takes a lot of CPU to figure out how to migrate each of the nodes, because there are so many things, there are so many plugins to take into account, that it takes a lot of time. So, we discovered that silently, some of the thumbnails ... we were using [Media Innate 00:38:00], so with media you've got the option to just migrate, let's say, one video, but leave the thumbnail downloading for later. And that gives you a chance to split the migration in several steps so you don't hit a timeout like in our case. There was so much content that we weren't allowing enough time for that queue to be completed. So, bit by bit, day by day, that queue started growing until people started seeing it, and they started pointing out broken thumbnails. So we started thinking, "What's going on with this?" And there was one day where I downloaded the database, I looked at the queue somehow, and found out there were still items there.
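The failure mode here, a deferred queue that gets more work added per run than the time budget lets it finish, can be sketched in a few lines. This is a toy Python simulation, not Drupal's queue system; the function names and every number in it are invented for illustration.

```python
from collections import deque


def drain(queue, budget_seconds, cost_per_item):
    """Process queued items until the time budget is spent.

    Returns the number of items processed; the rest stay in the queue.
    """
    spent = 0
    done = 0
    while queue and spent + cost_per_item <= budget_seconds:
        queue.popleft()  # pretend the thumbnail was downloaded
        spent += cost_per_item
        done += 1
    return done


def simulate(runs, new_items_per_run, budget_seconds, cost_per_item):
    """Return the backlog left in the queue after each migration run."""
    queue = deque()
    backlog = []
    next_id = 0
    for _ in range(runs):
        # Each run defers some work (e.g. thumbnails) to the queue...
        for _ in range(new_items_per_run):
            queue.append(next_id)
            next_id += 1
        # ...and then gets a fixed amount of time to work through it.
        drain(queue, budget_seconds, cost_per_item)
        backlog.append(len(queue))
    return backlog
```

With 100 new items per run, one second per item, and a 60-second budget, `simulate(4, 100, 60, 1)` leaves 40 more items behind on every run. Nothing fails outright, which is why the backlog is easy to miss until someone inspects the queue or notices the missing thumbnails.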
Matt Kleve:
So the migration actually creates a queue, and you could see that the queue wasn't clear, like there were still migration tasks pending?
Juampy NR:
Correct. Yeah, and it was just by luck that I saw the queue, but I kept looking at the code and everything went fine. And the thing is that it's time consuming. Like whenever you're debugging a migration, unless you are able to reduce the problem to just a few seconds, sometimes I just click a button on Friday, and I tell my girlfriend, "I'll check tomorrow." And even if it doesn't work, I need just 10 minutes to make another change, click again, and Sunday I'll check again.
Matt Kleve:
That's the XKCD where the computer programmers are sword fighting or something?
Juampy NR:
Totally.
Matt Kleve:
My migration's running, I'm actually working.
Juampy NR:
Exactly, when you have no clue ... I've been through that, I have to do it that way, because I had no idea what was happening.
April Sides:
And test early and often. Test one field at a time so that you're not trying to run the whole migration. There are ways to run it and to also say, "Only do this for the next five records," or, "Do this for this particular node ID," so that it only runs for that record, and not a list of content. That can cut the time sometimes.
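In the Drupal tooling, this kind of narrowing is done with options on the Drush import command that Migrate Tools provides (`--limit` and `--idlist`). The Python sketch below is not that command; it only illustrates the idea of restricting a test run to a handful of source rows. The function name and the `nid` key are assumptions made for the example.

```python
from itertools import islice


def select_rows(rows, id_list=None, limit=None):
    """Return only the source rows a test run should touch.

    id_list: keep only rows whose "nid" is in this collection.
    limit:   keep at most this many rows, in source order.
    """
    if id_list is not None:
        wanted = set(id_list)
        rows = (row for row in rows if row["nid"] in wanted)
    if limit is not None:
        rows = islice(rows, limit)
    return list(rows)
```

For example, `select_rows(rows, limit=5)` keeps the first five records and `select_rows(rows, id_list=[42])` keeps a single node, so a field mapping can be checked in seconds rather than by rerunning the whole migration.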
Matt Kleve:
Juampy, do you have any final words?
Juampy NR:
Yes, I've been dealing also with incremental migrations. For anybody who has never done it ... so there's a full migration, and then once you're ready to go live, you run small migrations to get the latest content so you can go live. This stuff is still unstable. If you want to do this, check drupal.org, because there are patches that work.
Matt Kleve:
Unstable like it's not gonna work, or unstable like it's gonna take some effort?
Juampy NR:
Unstable means that if anyone has made changes on the Drupal 8 side, you will get funny results.
Matt Kleve:
Karen, any final thoughts?
Karen Stevenson:
I don't think I have any more that ... I can't think of any more right now.
Matt Kleve:
It's because migrations just kinda take everything out of you.
Karen Stevenson:
They do, there's so many things, I don't even know where to start.
Matt Kleve:
April?
April Sides:
Yeah, I don't really have anything either.
Matt Kleve:
Mike, any final thoughts?
Mike Herchel:
Thanks everybody.
Juampy NR:
Thank you.
April Sides:
Thanks.