Unfortunately, we discovered that my blog archive cannot be extracted either by the simple device of using the "export" function in the Blogger software or through the ingenuity of my new tech people. We've gone back to those Google employees who helped me after my blog was deleted, and they say they are trying to help, sounding quite sincere about giving me personal service, which I appreciate. Two weeks ago, they told me that they had an "engineer" working "actively" on extracting my archive. We've followed up, and we've been assured this active effort continues, but still, no archive extraction.
The problem is the size of the archive (with over 20,000 posts and nearly a million comments). If anyone is blogging in Blogger, they need to know that there is an upper limit to what Blogger can handle without losing functionality. Had I known what that limit was, I would have gotten out before I hit it. I feel like I can't get out at the door — I do wish I hadn't blogged quite so much!....
Alas! it was too late to wish that! She went on growing, and growing, and very soon had to kneel down on the floor: in another minute there was not even room for this, and she tried the effect of lying down with one elbow against the door, and the other arm curled round her head. Still she went on growing, and, as a last resource, she put one arm out of the window, and one foot up the chimney, and said to herself `Now I can do no more, whatever happens. What will become of me?'
I can do no more, whatever happens. What will become of me?
UPDATE: Email from Google says: "Good news. We've improved our systems and have now exported your blog. It's 1.8G of XML representation.... Thanks for your patience." All right then.
91 comments:
There's always the manual option - you'd lose links from commenters names to their sites, but it could probably be done, a few a day til it was over...
Good luck. A lot of us out here are taking notes.
Solution:
1. To the extent you can't move certain data, keep your blogger blog and make pretend posts once a week or so, to keep everything you can't move, forever or until Blogger improves, as it might.
2. Move.
I'd suspect you can get it with wget scripts, but you say blogger is limiting session time to prevent it.
If the new place has the script set up and the only problem is blogger session time limits, ask your blogger tech to see how to lift the session limit; or ask if he'll run the script for you.
Just when Ann thinks she is out, they pull her back in.
This happened to Belmont Club twice. The first time Wretchard just started a new blog when the old one stopped working properly. Then he went to PJM. Both times he just left the old blog in place. As a regular over there, I never missed the old threads.
I'm not sure why you have to have 1M old comments on the new blog if the old blog is still available to read. You can always link to it. Very few people read ancient comments, and if they are interested it's not that hard to type in a URL.
Ann, start your new hosted blog somewhere, and put a prominent link to the old blog in the sidebar.
Then keep this site running here at Blogger but close it off to new comments so it won't get spammed.
I like this one even better.
Dan Savage would say it is unreasonable to make a lifelong commitment to one blog platform.
Then again, you did not vow to stay with Blogger for the rest of your life.
Next on "Hoarders"
Blogmistress refuses to throw away any of her comments.
This Friday on TLC.
The problem is the size of the archive (with over 20,000 posts and nearly a million comments).
Oh My God! Are they trying to move these archives with Linotype machines? They are feeding you a line of bullshit.
Why not just leave the archives there and link to them when needed?
Aren't you the only one who searches your old archives, and you will still have access, so that you can go back and copy anything you want to bring back up?
Wv: turph - this is my turph and I want it!
York- HAH. Yeah, exactly.
I'm under no illusions that my comments will be valuable to future generations.
I volunteer to have my comments deleted, for them to be like smoke signals, or sand castles - flashes of searing insight in the fog.
Besides, I'm ashamed of every comment I've ever made, including this one.
I'm guessing she doesn't want to just leave the archive here because it might be disappeared again, like before.
I don't know much about the mechanics of Blogger's business. Does it make money off of the imprisonment of Althouse's traffic here? Is that an obvious explanation?
Start the new blog right away! Don't compound the problems with even more comments and posts.
(If it weren't too impolite and smarmy I would encourage commenters to indulge in a little "I told you so". If it was up to many of us you would have left blogger well before the last Presidential election.)
I propose a Kausian, incremental approach.
Go to the new blog now, fix it later.
Sequencing is important!
What Carol & John Lynch said.
Let's light this candle.
It's not really that much data or pointers. I don't see what's so hard about it.
I bet the engineers just keep coming across Alpha's comments, followed by ROFLTAO. That slows the process down quite a bit.
Keep in mind that this blog is her masterpiece. This is her key to immorality. It must outlive her. Every stinkin' lousy comment I ever made must be preserved in perpetuity. Would that I had the perseverance of Sippican, if only to lighten another's burden.
"Relax," said the night man,
"We are programmed to receive.
You can check out any time you like,
But you can never leave."
Althouse's "key to immorality"? I've been skipping the good posts.
It would have been safer stored on cellulose acetate than at google?
I understand not linking so to be forever not connected to Blogger. It may be necessary to begin the new blog with hope the former can be moved over. This is taking too much time for me to have a good feeling about its eventual success.
For my part I'm willing to let my few thousand comments vanish. Hell, I never rated a Tag.
Oh My God! Are they trying to move these archives with Linotype machines? They are feeding you a line of bullshit.
Not necessarily. Bulk data exports can be tricky if the system wasn't engineered for it (and given its history, Blogger almost certainly wasn't).
It is a safe bet that Blogger has not devoted a lot of resources to developing a good way for their largest customers to exit the system quickly and easily.
Seriously, though, if they're making money by slow-rolling you, you should get the F out now.
Quite the Gordian knot. But you know how to solve that problem, don't you?
Every day's delay makes it worse. Whether you eventually (with help) find an adequate work-around, you'll still be better off with your new posts, and their comments, in a new professional setting.
w/v: "suendol," a good-lookin' gal who files lots of lawsuits, I suppose, but who's not obsessed with politically correct feminist nomenclature.
You need to go to the juicing room, and get pressed.
Maybe get the new one up and running on a parallel track while the process goes forward? Or would that probably confuse and/or lose a crapload of visitors to both?
We gotta get out of this place
If it's the last thing we ever do
We gotta get out of this place
Girl, there's a better life
For me and you
My little girl you're so young and pretty
And one thing I know is true...
You'll be dead before your time is due
Yes You Will
You're not being over emotional. This is something you worked hard on. I think you've been remarkably cool about this. At least in public... I can imagine you've had less kind words in private.
Prof Althouse, I suggest you just go ahead and start with the new blog, with what archives you have, and continue working to get this resolved. It may never be resolved, so why wait for it to be? Don't let Google's schedule control your blog any more. There is no reason.
I can't say I always agree with you, but you're an excellent blogger and it's a shame Blogger didn't treat you accordingly (differently than any schmo out there). I think you probably made Google a fair chunk of money, after all, and in exchange they were supposed to host your material properly.
Rev, I worked at a publishing house for almost 35 years. I did a brief stint with computers in the early 80's. You set up a program to achieve the results that you want. It's not like manual lifting.
Remember the character who showed up not too long ago, and tried to wipe her off of the blog face-map?
I'll say this, that's her problem right now.
Off topic, but Emily Mills has posted her response
Millionth!!!!!
To be clear, I want to keep the blog in one piece. I realize I could just start with the first post in a new place! Obviously, I am not satisfied with that option.
I will keep the existing archive on Blogger in any event. I'm not going to shut down this site, which has many links on it and will get traffic long into the future.
No need to talk about that option.
There must be some kind of way out of here.
New post is up on the Emily Mills thing. Discuss that there.
"Blogmistress refuses to throw away any of her comments."
That's a good point. That is within the range of something I'm considering.
Blog on the Run.
Not necessarily. Bulk data exports can be tricky if the system wasn't engineered for it (and given its history, Blogger almost certainly wasn't).
Bulk? We're talking a couple of gigabytes of data, at most. It would fit on a thumb drive. Certainly an engineer with access to the API should be able to do this in a day or two.
Hell, if I had set about to do it, I probably could have written a screen-scraping utility that would have it done by now.
"Aren't you the only one who searches your old archives, and you will still have access, so that you can go back and copy anything you want to bring back up?"
1. The search function is one aspect of Blogger functionality that no longer works properly for me.
2. On the new blog, I want the tags to work, and I want people to be able to search from the front page.
3. I want old posts to start getting linked (esp. by me) within the new site (in part to make money from the new ads, which pay more with more traffic).
"Bulk? We're talking a couple of gigabytes of data, at most. It would fit on a thumb drive. Certainly an engineer with access to the API should be able to do this in a day or two. Hell, if I had set about to do it, I probably could have written a screen-scraping utility that would have it done by now."
1. It's about 1.5 gigabytes.
2. My tech people wrote software to do it but couldn't get it all.
OK, so move the posts w/o comments.
This blog is called "Althouse," not "Althouse & Friends."
I appreciate the effort, but really no one reads old comments.
"I'm guessing she doesn't want to just leave the archive here because it might be disappeared again, like before."
Yes, that is also a point. I want my stuff. I don't want to leave it here unsecured. And remember, Blogger has shown to me that it's not designed to handle a blog this big, so it's like a bridge with too much weight on it. I don't feel safe.
"I don't know much about the mechanics of Blogger's business. Does it make money off of the imprisonment of Althouse's traffic here? Is that an obvious explanation?"
I don't carry Google ads, so probably not. I think their $ interest is in helping me, so they don't look bad... their reputation.
Why not do some New York Times/Sarah Palin email/distributed work thing, where everyone is forced to export data from one thread before being allowed to comment...
Yeah, it didn't really work out for the New York Times, did it?
"This blog is called "Althouse," not "Althouse & Friends.""
The friends are implied. It's like "Seinfeld". Sure, it was named after only Jerry Seinfeld, but no one would have watched it without Costanza, Elaine and Kramer.
Palladian- Fair enough.
I just like solving problems, and I can't help with moving huge amounts of data.
So, that's enough of that.
One thing you always tell clients in IT - being in the "cloud" means it's on someone else's hardware. If you don't have your data backed up locally, in your hands - it doesn't belong to you. Maintaining your own infrastructure and data connections is a pain, but it gives you the luxury of controlling your own domain, so to speak.
Google's got you by the short hairs - learn to love it or let it all go.
Alice shouldn't have eaten the mushrooms.
I can remember a very big hit ... "Put another nickel in, in the Nickelodeon." And, the poor kid who wrote that song, never saw a dime.
It happens to people who are creative all of the time.
But blogger? Doubt the future will hold another Althouse coming in their portal.
While (except for last night and this morning) ... with "Bad Report." And, "400 ERROR" messages ... The part where we come in ... had gotten better.
Man, how did Google go and destroy pixels?
If I had to guess? It won't help Apple sell its iCloud either. It's a lesson to learn that a company shouldn't be trusted with anything that has to do with records you'd like to save. And, to keep.
Next, they'll be shredding up wedding pictures.
(Of course, one solution would be for Google to try to keep you attached as their customer?) Hasn't anyone thought of that over there?
I imagine you are going to just transfer the posts and leave the comments behind.
That is a workable solution.
You have a right to your stuff and, as a couple of people have noted, it isn't that big a chunk of data as these things go.
You also have a right to the service for which you paid.
You are really giving people a lot of reasons why they shouldn't trust the cloud
I thought that the only remaining problem was exporting the comments. Weren't all the posts transferred a couple weeks ago?
I'll add to what Seven Machos and Carol said. Start your new blog and link to this one for archives, then if and when they get the archives exportable, move them. No sense adding more to this blog in the meantime.
1. It's about 1.5 gigabytes.
Wow. You know, they make key chains now that hold ten times that.
2. My tech people wrote software to do it but couldn't get it all.
Hmm...maybe you should find some tech "people" who don't eat bananas all day and scratch their heads with their feet.
All our comments will be lost, like tears in rain.
Pat pat pat
You'll be fine, Professor.
Pat pat pat
"I thought that the only remaining problem was exporting the comments. Weren't all the posts transferred a couple weeks ago?"
I'm not 100% sure. I am talking with my tech people. I was noticing that some things in the posts didn't look right.
But hey -- I actually do go back and look for an old post from time to time ... rarely I might look at the comments for something, but not especially good at finding it (probably because the search feature malfunctioned)?
But it does resonate with me that this whole thing of Althouse includes the comments and the commenters for the Professor.
When do we get our cut from the Amazon Affiliate account, mmm?
"I'm trapped in a Blogger blog and I can't get out."
Well, that's a lot better than "I've fallen and I can't get up" any day.
Congrats, then, professor. In your honor I'm going to Amazon and buy some earrings.
"John Lynch said...
OK, so move the posts w/o comments.
This blog is called "Althouse," not "Althouse & Friends."
I appreciate the effort, but really no one reads old comments."
Can I chime in here as an opposing view? I, for one, go back and look at old posts, and occasionally link them elsewhere on other forums where a similar/related topic is being discussed. It's not frequent, but it's not rare either. And yes, it's specifically the comments I'm looking through when I do this.
Subtracting the interaction from the readers turns a blog into a series of billboards. Half the substance of nearly any blog, especially Althouse here, is in the discussions occurring in the comments section. I would've hated to have seen that go by the wayside. Please register my vote as one who's glad the Professor here is making such an effort to maintain the comments.
Are the old comments being exported at a faster rate than new ones are being posted? If Palin declares her candidacy, this process may never be completed!
wv scaryloq: That pretty much settles the question of whether wv is monitoring comments.
Did they get the Data Liberation Front involved?
Or were they busy making cute videos?
The comments have been very important to me, and I care very much about bringing them along. When I put up posts, I'm nearly always thinking of setting up a discussion in the comments, so even before they are written the comments are part of the post.
I know the commenters will (I hope!) come over and participate on the new blog, but the past participation matters too.
Also there are some exceedingly important old comments threads having to do with me and Meade.
ALL YOUR COMMENT ARE BELONG TO US.
Blogger
Just for comparison, the 1.7gb of XML our company has developed is the engine for hospitals to report on Medicare quality measures regulations within an EMR.
It's a lot of XML.
Couple of things (for a change, this relates to what I do for a living):
1) 1.8 GB of XML is absolutely freakin' huge. XML is just a markup language, so most files are easily a meg or less. It's like having a 1.8 GB Word document: even if you were able to create it, it'd be hell trying to open it because Word isn't built for that.
2) Thus, I doubt strongly it's the file size that's been the slowdown (so all the thumb drive comments are way off base.) It's the processing, and I doubt that Blogger's systems were ever designed to export something that large.
My guess is that the delay has been timeout-related. Most of the time, the systems are set up to think that if something isn't done within a certain amount of time, then something's broken, and the system needs to stop so that the user can figure out what to do. With this volume, I bet they had to revamp their systems to bypass those warnings (and/or move to better hardware for this one move.)
I bet the guys over at Google are looking at each other and wondering whether they should incorporate the fix into their next release or just treat this as a one-off and hope nobody else asks them to do it again.
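To make the "it's the processing, not the file size" point concrete, here is a minimal sketch of reading a very large XML file as a stream instead of opening it all at once. It assumes the export is an Atom feed whose posts and comments are `entry` elements; that layout, the `count_entries` name, and the tiny sample feed are illustrative assumptions, not details confirmed anywhere in this thread.

```python
import io
import xml.etree.ElementTree as ET

# Assumption: the export is an Atom feed, so each post or comment is an <entry>.
ATOM_ENTRY = "{http://www.w3.org/2005/Atom}entry"


def count_entries(source):
    """Count <entry> elements while streaming through the file.

    `source` may be a filename or a binary file object.
    """
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == ATOM_ENTRY:
            count += 1
            # Discard the entry's text and children once it has been counted,
            # which keeps memory use low on a multi-gigabyte file.
            elem.clear()
    return count


# Tiny stand-in for a real export file.
sample = io.BytesIO(
    b'<feed xmlns="http://www.w3.org/2005/Atom">'
    b"<entry><title>a post</title></entry>"
    b"<entry><title>a comment</title></entry>"
    b"</feed>"
)
print(count_entries(sample))
```

A tool that tries to load the whole document first is the one that runs into trouble at this size; a streaming pass like this only ever looks at one entry at a time.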
Well, there is one other thing - how much do you pay Google for them to allow you to blog on their websites?
"Ann, it was a system outage following a maintenance event. You immediately assume it's about you. Really, Ann, it has nothing to do with you. These are complex systems and things can go wrong."
What in this post does that purport to refer to? I'm discussing the problem of oversized blogs, which is unrelated to that outage.
Talk about immediately assuming crap! Get up to speed.
It is true that:
1. "Blogger suddenly deleted my blog, without explanation and without any information about what I could do about it." Yes, there was an outage, but the blog deletion wasn't something about which any information was given.
2. I was disgustingly bullied by nitecruizr on the forum, and I remain angry about that.
3. Google employees reached out to help me and I needed that help.
4. I decided to get out of Blogger because I felt insecure.
5. My blog was too large to extract from Blogger, a flaw in Blogger that I had not been warned about.
6. I needed to contact the Google employees again for help, it was hard even for them to help, and in the end they did help.
Now, is there any sensible point you have to make that's related to that? Otherwise you're having flashbacks to old discussions and you need to wake up to reality.
"Please register my vote as one who's glad the Professor here is making such an effort to maintain the comments."
Mine too. I've written some absolutely brilliant shit here over the years, all for free and not even under my real name, so I'd hate to see all that brilliant shit get flushed.
Stuck inside of Mobile with the Memphis blues again...
How funny to invoke Alice-- the other night when sleepless I brought up the Althouse blog & read one of my favorite parts. It felt like asking a parent to read about when Ann screamed at the false doormen, "How dare they!"- with the Cheshire cat of David Foster Wallace looking on impishly.
Ann, you have a beautiful site!
If it's any consolation, I've been to Professor Jacobson's site. And, I HATE the WORD PRESS system he uses.
I've been to Glenn Reynolds's site. And, I think his "pajama" stuff is grand. But he doesn't accept comments.
Maybe, for Google, last night, was a problem?
You know, a long time, ago, when I worked. We had land line phones. And, I'd hardly know there were problems. But I'd see the guy from AT&T in the closet. Which was full of wiring. And, he said it was what kept everything connected.
For us? It's like skin.
We don't particularly think of all the systems our skin covers ...
But I'd give Google a break.
This page is so gorgeous! Brightly lit! Easy to comment. And, a joy to read.
Maybe, not letting go of Blogger is a very good thing?
In haste we make mistakes.
Okay, 1.8 GB, so 1800 MB, so 1,800,000 KB.
And 1 KB is (very roughly) one page of typewritten text.
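Carrying that arithmetic one step further, as a rough sketch only: the 1 KB per page figure is the commenter's own "very rough" estimate, the units are decimal (1 GB = 1,000,000 KB), and `pages_of_text` is just a hypothetical helper name.

```python
EXPORT_GB = 1.8   # size quoted in the email from Google
KB_PER_PAGE = 1   # the "very rough" figure above, not a measurement


def pages_of_text(size_gb, kb_per_page=KB_PER_PAGE):
    """Convert a size in GB to typewritten pages, using decimal units."""
    size_kb = size_gb * 1000 * 1000  # GB -> MB -> KB
    return round(size_kb / kb_per_page)


print(pages_of_text(EXPORT_GB))
```

On those assumptions the export comes to about 1.8 million pages, although a good share of it is XML markup rather than anything a person typed.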
What does blogger use as back up?
Isn't it like banks?
Somewhere there are not only digital records. But backups, too.
I can understand that Blogger has a stake in keeping you.
What I find unbelievable, though, is that there's no way to purchase old records.
What replaced microfiche?
Sure. Probably a mistake not to have some sort of backup system of your own? Wasn't that what we used to do with paper records?
You know, if Blogger comes through, and shows you a way to retrieve your old records ...
I hope you still stick with them. I haven't seen anything better than what this "page" provides.
Rev, I worked at a publishing house for almost 35 years. I did a brief stint with computers in the early 80's. You set up a program to achieve the results that you want. It's not like manual lifting.
Sure, and that's a fine approach when you're just trying to get a data dump from a non-enterprise system or from one that doesn't see a lot of use.
When you're talking about a system that handles hundreds of terabytes of data and requires 24/7 uptime, solutions like "well shit, just start a query running against the database and come back when it's done" don't cut the mustard. For starters no sane DBA is going to let you try it.
It takes time for whatever custom job is being run to get written, tested, approved and run. You can cut back on the time needed by taking the time and money to engineer a good export system, but my guess is that the folks at Blogger don't export enough massive blogs for that to be a good return on investment.
GOOD LORD that is a huge text file.
WOW, Althouse. I wonder if Guinness should be consulted. How many text-based files of non-repetitious commentary get into the billions of characters?
Anyway, Althouse, your patience paid off. I wouldn't have waited like you did. Well done. See you on the other side.
Google is better than most companies about putting in effort to let you extract your data and take it elsewhere. This blog is just beyond what Blogger was engineered to deal with.
Nice to see they did get it extracted, and hopefully there aren't any hiccups with it.
I just appreciate the oh-so-pleasurable surprise of the charming illustration. Remember the old days, when we read books? What a treasure the two or three included woodcut-style illustrations were? We chafed and groused that there were so few. We glanced at them often, lovingly examining every detail and finding the hidden meaning and the doubly-hidden gestalt that both indexed and unlocked our emotions regarding the story.
Today our words and images are zephyrs in the exhaust of the intertubes, an ungrounded gaggle of chattering primates forever talking while always forgetting. Thanks for reminding us Miss Ann to hold on tight to some things that are past.
I was reminded of the old BBC show Yes Minister by the statement: "Two weeks ago, they told me that they had an "engineer" working "actively" on extracting my archive."
The Private Secretary explained to the Minister of Administrative Affairs that there are two types of replies to correspondence to the Minister.
The Minister, Jim Hacker: What's the difference?
Private Secretary, Bernard Woolley: Well, "under consideration" means "we've lost the file"; "under active consideration" means "we're trying to find it".
I appreciate the effort, but really no one reads old comments.
I also occasionally go back and read old comments. Some threads are historic! Let's Take a Closer Look at Those Breasts, for example.
I wouldn't think either the size or the comments would be a problem. Couple of gig should be nothing, especially in google-land. And half of that XML is just a bunch of close-tags, anyway. Get yourself a JSON.
Consider, though, whether you want to redirect hand-coded links to your own posts to your new site using whatever URL structure WP offers. Powerline just migrated MT to WP, and I notice a few broken links there: http://bit.ly/mdINbN
Agree Blogger's search function barely ever worked, as far as I can tell. I always had much better luck using plain google narrowed by "site:foo.com"
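On the redirect question raised above, one way to keep hand-coded links working is to translate each old post path into its new form and answer requests for the old one with a redirect. This is only a sketch: it assumes old post paths look like /2011/06/some-title.html and that the new site is configured for /2011/06/some-title/ style paths. Both patterns and the `new_path` helper are assumptions for illustration; the real structure depends on how the new blog is set up.

```python
import re

# Assumption: old post paths look like /YYYY/MM/slug.html
OLD_POST = re.compile(r"^/(\d{4})/(\d{2})/([^/]+)\.html$")


def new_path(old_path):
    """Map an old post path to a hypothetical /YYYY/MM/slug/ path.

    Returns None for anything that is not a post (label pages, searches, etc.).
    """
    match = OLD_POST.match(old_path)
    if match is None:
        return None
    year, month, slug = match.groups()
    return f"/{year}/{month}/{slug}/"


print(new_path("/2011/06/some-title.html"))
```

Paths that return None are the ones worth checking by hand, since those are the likeliest to end up as broken links after a move.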
@Charlie Martin
Okay, 1.8 GB, so 1800 MB, so 1,800,000 KB.
Ha! Your comment reminded me of this.
Grab what you can and move, Professor.
I do not think you are all that welcome at blogspot any more. Strange things keep happening when I am on this blog now. I don't think it is very stable.
Except it's XML, which is piggy. I recently looked at a tweet, and it's hyoooooge. Comments can be longer, so in some cases the data will be bigger than the meta-data but they'll be bloated.
Hello. I am on the Blogger team and am one of the guys who has been helping Ann with her blog (both during the outage in May and recently with the export). I couldn't help but chime in when I noticed this post :-)
First I wanted to say that on behalf of the entire Blogger team, we're very proud to have Ann's blog on our platform and many of us are regular readers. Of course we wish she would stick around with us, but if Ann feels like it's time to find a new home elsewhere, we are committed to making sure that users have control over their data as well as tools for making the move off Blogger possible. It is the reason we have spent a non-trivial amount of time helping her with the export file, even though it may end up on another service. To be clear, the entire 1.8G file is now in her hands.
While our export tools may have been somewhat unreliable when handling blogs this large (Althouse is one of the largest Blogger blogs!), along the way helping Ann we discovered ways to improve them and moving forward Blogger will be much better equipped to handle cases like this.
So Ann while I'm personally sad to see you go (if that is indeed the decision), I wanted to let you know that you will always have a home on Blogger and a team who cares about your experience with Blogger. That also (of course) goes for everyone. We love hearing from users, and anyone can bug me directly on Twitter (@electrobutter) if something is on their mind, or hit up the team via @blogger.
Cheers,
-brett
@electrobutter
Stellar graphic; a picture truly does tell a thousand stories.
Cheers!
Thanks, Brett. I put up a new post acknowledging your kind comment.