Max Mednik
  • Home
  • About
  • Interests
    • Angel investing
    • Magic
    • Scuba Diving
  • Blog
  • Contact

Readings and musings

Notes on Brett Durrett at LeanLA Talk on Continuous Deployment

1/9/2012

0 Comments

 
Continuous Deployment at Lean LA
View more presentations from Brett Durrett
Another awesome talk by the guys at LeanLA and IMVU!

Here's the blurb about the talk and the really knowledgeable speaker:

"Continuous Deployment takes continuous integration one step further, where every commit goes live to production servers. When this process is described it is frequently met with skepticism around site reliability and the ability to scale a business this way, but it works, it scales (with challenges) and it is embraced by the entire organization. IMVU is a leader in Continuous Deployment, with over 5 years of experience scaling this process to support a technical staff of 50 and a business of more than $40 million in annual revenue. Brett G. Durrett, Vice President of Engineering & Operations for IMVU, explains the basic mechanics of Continuous Deployment and discusses the value it creates for the entire company. Specific topics that will be covered: Attendees will understand that releasing to customers 20+ times per day is possible and that it does scale from individual developers to large companies. In addition, they will understand how they can make Continuous Deployment successful at their company, from both a technology and cultural standpoint.

Brett G. Durrett has over 20 years experience leading development of software and systems ranging from large-scale Internet services to video games. He serves as VP of Engineering at IMVU where he leads the engineering and technical operations teams and was responsible for the operations infrastructure that successfully scaled from two machines to over 700 servers. Prior to IMVU, Brett served as the Director of Engineering, VP of Operations and General Manager for the virtual world at There.com. Brett was also co-founder and CEO of Asylum Entertainment, a game development company."

You can watch the talk (in two parts) and see the slides above. I'm pretty much sold on what Brett preaches and am thinking of how to implement continuous deployment in my current projects. He says that having little code and process in place puts you at an advantage, though I'm still wondering how to put in the right infrastructure to have all the tests and deployment run as smoothly and automatically as they do (and how much to prioritize this process infrastructure work around other initial start-up goals).

My notes on the talk are below. Overall, I learned a lot and very much enjoyed hearing Brett speak.

Their process:
  • develop a feature increment
  • verify on buildbot
  • commit code to live in production immediately for some set of customers
  • whole process takes 15 min, release about 50 times per day
  • no staging cluster
  • no QA review
Why would you do something like that?
  • most companies develop, release, then pray for customers
  • now, smart companies develop, release, learn, iterate
  • minimize total time through build, measure, learn cycle
Why continuous deployment is good:
  1. release overhead reduces opportunity to iterate
  2. way easier to find regressions/bugs in small batches of commits
  3. fast response times for business opportunities
  4. more turns at bat
  5. book: Principles of Product Development Flow (reducing batch size, lean product development); reducing batch size reduces cycle time, reduces variability in flow, accelerates  feedback, reduces risk, reduces overhead; large batches reduce efficiency, inherently lower motivation and energy, cause  exponential cost and schedule growth, lead to even larger batches; the entire batch is limited by its worst component

Work process:
  1. local tests pass, engineer commits code
  2. lots and lots of tests run
  3. all tests pass [if no, revert commits]
  4. code deployed to % of servers
  5. metrics good [if no, rollback]
  6. code deployed to all servers
  7. metrics still good [if no, rollback]
  8. win
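
The work process above can be sketched as a small gated pipeline. This is only an illustration of the flow in my notes, not IMVU's actual code: the function name, the callbacks, and the 10% first-stage figure are all assumptions (the talk only said "% of servers").

```python
from typing import Callable


def run_pipeline(
    tests_pass: Callable[[], bool],
    deploy: Callable[[int], None],
    metrics_good: Callable[[], bool],
    rollback: Callable[[], None],
    revert_commit: Callable[[], None],
    first_stage_percent: int = 10,  # assumed value, not from the talk
) -> str:
    """Walk one commit through the staged rollout and report the outcome."""
    if not tests_pass():
        revert_commit()
        return "reverted"
    # Deploy to a slice of servers first, then to everyone,
    # checking metrics after each stage.
    for percent in (first_stage_percent, 100):
        deploy(percent)
        if not metrics_good():
            rollback()
            return "rolled back"
    return "win"


# Usage with stub callbacks that just record what happened.
log = []
result = run_pipeline(
    tests_pass=lambda: True,
    deploy=lambda pct: log.append(f"deploy {pct}%"),
    metrics_good=lambda: True,
    rollback=lambda: log.append("rollback"),
    revert_commit=lambda: log.append("revert"),
)
print(result, log)
```

Passing the steps in as callbacks keeps the control flow testable without a real cluster.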

amount of time you need to run test depends on volume of people going through funnel

all work done on trunk (no work on branches)
  • avoids merge conflicts
  • all code gets validated in production immediately
  • at bottom sees actual PHP test files and their status (time to complete, running status, etc.)
  • a tag includes multiple PHP test files
  • tests run before checkin on local sandbox
  • push for being test-driven but let people work how they want to work
  • each person responsible for writing tests for their own code
  • local sandbox test suite running through a web browser
  • checkboxes: stop after last test, pause after failure, run tests in random order, only run selected tests
  • want to make testing as unburdensome as possible
great slide in presentation with sample output of "RunTests" test view which allows filtering tags, turning test on/off, seeing tests that pass, fail, run, skip, wait, etc.


use selenium

continuous integration: they use buildbot; others use hudson, jenkins, atlassian bamboo


build servers
  • good screenshot in slides of buildbot view
  • each box represents a server
  • splitting all the tests across multiple servers turns an 8-hour build into an 8-minute build
  • each server running many tests; they have 40K tests running through test suite
  • having good tests allows new people to start working and new experiments to happen quickly
  • unit tests of code
  • user workflow tests of site UI
  • if code fails in build server, email goes out and immediately the engineer's supposed to revert the code so others can continue to use build server
  • saves and emails output of the test failure
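
One way to split a test suite across build servers is to hand out the slowest tests first, always to the least-loaded server. The talk did not say how IMVU assigns tests, so this greedy approach is my own assumption, and the test names and durations are made up.

```python
import heapq


def shard_tests(durations: dict[str, float], num_servers: int) -> list[list[str]]:
    """Assign tests to servers, slowest first, always to the least-loaded server."""
    heap = [(0.0, i) for i in range(num_servers)]  # (total seconds, server index)
    shards: list[list[str]] = [[] for _ in range(num_servers)]
    by_slowest = sorted(durations.items(), key=lambda kv: kv[1], reverse=True)
    for name, seconds in by_slowest:
        load, i = heapq.heappop(heap)  # server with the least work so far
        shards[i].append(name)
        heapq.heappush(heap, (load + seconds, i))
    return shards


# Hypothetical test files and durations in seconds.
print(shard_tests({"a.php": 8, "b.php": 5, "c.php": 4, "d.php": 3}, 2))
```

The build then takes roughly as long as the most heavily loaded server, which is why recording per-test timing pays off.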

Deployment:
  • code rolled out to cluster
  • a bunch of perl and rsync code
  • symlinks on site
  • keep multiple copies of code
  • process of rolling forward and backward is just changing symlink
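
A minimal sketch of the symlink trick, assuming a POSIX filesystem. IMVU's version was Perl and rsync; the function name and the directory layout (`release-1`, `release-2`, `current`) are my own stand-ins.

```python
import os
import tempfile


def point_current_at(release_dir: str, current_link: str) -> None:
    """Atomically repoint `current_link` at `release_dir`."""
    tmp_link = current_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(release_dir, tmp_link)
    # Renaming over the old link swaps it in one step, so the site
    # never sees a missing or half-updated link.
    os.replace(tmp_link, current_link)


with tempfile.TemporaryDirectory() as root:
    for name in ("release-1", "release-2"):
        os.mkdir(os.path.join(root, name))
    current = os.path.join(root, "current")
    point_current_at(os.path.join(root, "release-2"), current)  # roll forward
    point_current_at(os.path.join(root, "release-1"), current)  # roll back
    print(os.path.basename(os.readlink(current)))
```

Because every release stays on disk, rolling back costs the same as rolling forward.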
hard part: cluster immune system
  • monitors metrics
  • system performance (web services, disk space, DNS, cron, API availability)
  • business performance (various critical actions/functions, graphs, revenue, registrations)
  • use nagios for system and business metrics
  • if metrics bad, do rollback on cluster (changes symlinks back to previous release, blocks further commits, sends email)
  • server push status web page to diagnose rollback and which metrics killed the push
  • one unfortunate thing in the system is false positives due to real variability in business
  • once metrics good, goes out to entire cluster
  • most wait periods: a couple minutes
  • something it's not very good at: catching very small changes that hurt
deployments of deployment system:
  • was manual for a while, hacked together
  • only recently got good test coverage of deployment system (some not even in repository)
  • don't change deployment system that often
aesthetic tests? they don't

everyone emails changes to the change list (basically everyone in company) with before and after state and people can catch problems

they have one monolithic code base

don't have anything that ensures they have test code coverage automatically


Getting Started (story):
  • there were no customers
  • he came in for operational role
  • engineers wrote code and SSH'd in to cluster
  • no auditing, no monitoring
  • would see PHP syntax errors on homepage
  • only 30 customers at that time so didn't matter
  • set the culture of getting stuff out there
  • wrote a nagios check for "are we rendering HTML out to the customer?"
  • if you're writing new code, it should have some coverage (functional easiest at first)
  • commit to making forward progress
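
The "are we rendering HTML?" check could look something like the sketch below. Nagios plugins report status through their exit code (0 is OK, 2 is CRITICAL); everything else here, including the function name and the exact strings it looks for, is my guess at what such a check might test, not what Brett wrote.

```python
OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes


def check_html(body: str) -> tuple[int, str]:
    """Return (exit_code, message) for an already-fetched page body."""
    text = body.lower()
    # Assumption: a healthy page has opening and closing html tags and
    # no PHP "Parse error" text leaking into the output.
    if "<html" in text and "</html>" in text and "parse error" not in text:
        return OK, "OK - page renders HTML"
    return CRITICAL, "CRITICAL - page did not render complete HTML"


code, message = check_html("<html><body>Welcome</body></html>")
print(code, message)
```

A real plugin would fetch the homepage, print the message, and exit with the returned code.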
new product advice:
  • start w/ sandbox
  • just push
  • ideal time for failures
established product:
  • start w/ production
  • automate deploys. first automate the push. then automate QA.
  • build confidence
new code must have test coverage.

if new code breaks something old, must write test to catch that

expect some hurdles:
  • you will have cluster outages
  • you will spend engineering time on deployment system
  • have culture where failures are looked at as opportunities
  • how do we get excited about never letting this happen again
  • if have blame-searching culture, will have more challenges
scaling:
  • buildbot would go red, and everyone would be blocked
  • when build time 20-30 minutes, bad news
  • problem with intermittent tests
solutions:
  • build isolation [not the solution they ended up needing; they got away with faster test runs, buying hardware and virtualization, sorting tests by speed, and dependency injection (instead of calling the real DB, just supplying the data it would return); they also built a hypothesis builder, which is like build isolation: you tag code to run on the hypothesis builder, which is separate from the main buildbot and doesn't block anyone if it fails]
  • added a test metrics system that keeps track of success rate and speed (a lot of builds were blocked on slow tests)
  • got build times down to 8 minutes
  • when builds were over 25 minutes, it was a huge cultural issue
flaky tests / intermittent tests have huge costs:
  • disable or ignore the test
  • third-party providers
  • running tests around time and time spans is much more challenging than normal tests (DST, leap years, etc.)
  • state dependency across tests (overnight, keep running tests in random orders until they become red, and then in morning you see which tests are intermittent and can investigate)
  • they run about 40K tests now
  • even with 5 9's of reliability, you get many failures
  • move them from having to fix them when they happen to fixing them on a schedule
  • if buildbot gets a test that runs green once and then red another time, it will mark it as intermittent, start an issue in bug tracker, and allow the build to go through
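
To see why five 9's still hurts at 40K tests, here is the arithmetic as a quick sketch. It assumes each test fails spuriously and independently with probability 0.00001, which is my simplification and not something stated in the talk.

```python
def p_any_false_failure(num_tests: int, reliability: float) -> float:
    """Chance that at least one test in a run fails spuriously."""
    return 1.0 - reliability ** num_tests


p = p_any_false_failure(40_000, 0.99999)
print(f"{p:.0%} of full runs hit at least one spurious failure")
```

Under that assumption roughly one full run in three goes red for no real reason, which at dozens of builds a day means many blocked builds.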
trickier bits:
  • catching issues that fail slow (SQL selects from growing tables)
  • critical areas cause hard lock-ups (MySQL, memcached)
  • lack of test coverage of older code: not an issue if you start with test coverage
  • outsourcing (different hours, culture, branching, slower integration)
changing schema requires sign off from tech lead (checking indexes, scalability of changes)

added query killer (issues kill statements on long queries; better to have code die than DB to be overloaded and take down everybody)
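
A query killer can be as simple as scanning the process list and issuing kill statements for anything that has run too long. The sketch below assumes rows shaped like the output of MySQL's `SHOW PROCESSLIST` (columns `Id`, `Command`, `Time`); the function name and the 30-second threshold are my assumptions, since the talk gave no numbers.

```python
def kill_statements(processlist: list[dict], max_seconds: int = 30) -> list[str]:
    """Build KILL statements for queries running longer than max_seconds."""
    return [
        f"KILL QUERY {row['Id']}"  # ends the statement, keeps the connection
        for row in processlist
        if row["Command"] == "Query" and row["Time"] > max_seconds
    ]


# Made-up sample rows.
rows = [
    {"Id": 11, "Command": "Query", "Time": 95},
    {"Id": 12, "Command": "Sleep", "Time": 400},
    {"Id": 13, "Command": "Query", "Time": 2},
]
print(kill_statements(rows))
```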

schema changes on large tables (they use mysql):
  • create a new table
  • do copy on read
  • have background process later migrate the rest of the data
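
The copy-on-read migration can be sketched with two dictionaries standing in for the old and new MySQL tables. The class and method names are hypothetical, and a real version would also have to handle writes and concurrent access.

```python
class MigratingStore:
    """Reads go to the new table, pulling rows over from the old one on demand."""

    def __init__(self, old: dict, new: dict):
        self.old, self.new = old, new

    def read(self, key):
        if key not in self.new and key in self.old:
            self.new[key] = self.old[key]  # copy on read
        return self.new.get(key)

    def migrate_batch(self, limit: int) -> int:
        """Background step: copy up to `limit` rows that nobody has read yet."""
        pending = [k for k in self.old if k not in self.new][:limit]
        for k in pending:
            self.new[k] = self.old[k]
        return len(pending)


store = MigratingStore(old={"u1": "alice", "u2": "bob", "u3": "carol"}, new={})
store.read("u2")                 # hot row moves immediately
while store.migrate_batch(limit=1):
    pass                         # background process drains the rest
print(sorted(store.new))
```

Active rows migrate as soon as they are touched, so the background job only has to sweep up the cold data.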
memcache changes require second set of eyes (hard to test on local sandbox)

hard to work with outsourcers who build over several days (impossible to integrate)

build system itself is critical business function; keep metrics on build system (web dashboard of build process)

integration with A/B testing inside the code (nice slide with pseudocode)
  • name the experiment
  • specify initial rollout % or amount of users
  • specify customer branches with percentage weightings of what % should see enhanced versus non-enhanced (e.g., 50% A/B split)
  • helper function that returns which branch a given customer should see (enhanced or not); if the customer has not been assigned yet, it assigns them permanently [so customer always gets same experience]
  • simple if statement that splits between if user should see test feature or not
  • web page listing all experiments (available to everyone in company)
  • to user % (QA and admin only, 0%, 10%, etc.)
  • closed on status (they have a page that lists experiments that were closed but the code still exists; this allows easy housekeeping to remove unused code after a while)
per-experiment dashboard to see user groups (male, female, etc.), #s, results (highlighted by desired/undesired colors) and p-values
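
Here is my own rough reconstruction of the A/B helper described above; it is not the pseudocode from the slide. The class, the `not_in_experiment` label, and the `new_checkout` experiment name are all made up, and a dictionary stands in for whatever persistent store holds the assignments.

```python
import random


class Experiment:
    def __init__(self, name, rollout_percent, branches, rng=None):
        self.name = name
        self.rollout_percent = rollout_percent  # share of users in the experiment
        self.branches = branches                # e.g. {"control": 50, "enhanced": 50}
        self.assignments = {}                   # stand-in for a persistent table
        self.rng = rng or random.Random()

    def branch_for(self, customer_id):
        """Return the customer's branch, assigning it once and then reusing it."""
        if customer_id not in self.assignments:
            if self.rng.random() * 100 >= self.rollout_percent:
                choice = "not_in_experiment"
            else:
                names = list(self.branches)
                weights = list(self.branches.values())
                choice = self.rng.choices(names, weights=weights)[0]
            self.assignments[customer_id] = choice
        return self.assignments[customer_id]


exp = Experiment("new_checkout", 100, {"control": 50, "enhanced": 50})
if exp.branch_for(42) == "enhanced":
    page = "enhanced checkout"
else:
    page = "existing checkout"
print(page)
```

Storing the assignment is what guarantees a customer keeps the same experience across visits. In this simplified version a customer left out of the rollout also stays out if the percentage is raised later.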

sprints:
  • planned sprint schedule usually not met (outstanding issues, incomplete features, tech review, refactoring)
  • when releases happen every 15 minutes, "planned sprint ends" can be arbitrary
  • changed to just say that the sprint ends when the work is done (but still understand overage reasons)
IMVU culture:
  • first day on job, engineer pushes out to live customers immediately
  • makes people feel empowered
  • hack-week: you can build anything and company provides food and drink
  • if you're convinced something's important for customers, just build it and allowed to release to 1% of customers without approval

Notes on The Finkler Question

1/7/2012

0 Comments

 
A friend of mine recommended to me The Finkler Question by Howard Jacobson, which recently won the Man Booker Prize. The book explores the question of religious (Jewish) identity in modern times through one man's daily life experiences. The book features both Jews and non-Jews, and the different levels of religious observance (or anti-observance, including self-hatred) tell the story of how different people viewed religious identity differently. I found it remarkable how various non-Jews in the book, like the main character, did more to be Jewish (and wanted to become Jewish), while the Jews behaved in the opposite manner. The book raised many questions, like the meaning of religion and its differences from culture and family (style and tradition).

Overall, the book started very slowly and was quite a long read. It takes place in England, and the audio version featured a reader with an English accent. While this was "authentic," it was painfully difficult to understand at the double or triple speed at which I like to listen to audio books; it took me about half the book to get up to triple speed with good comprehension. (Audio books should be offered with multiple speakers to choose from!)

I found the book mostly depressing and sad (this was also the main character's recurring personality), with many themes related to mourning and death and little in the way of humor or comedy. I guess it's not my preferred genre, but after making it through to the end, I do realize why the book won its prize, and the central questions of religious identity and cultural tolerance the book raises are important for everyone to consider. I did enjoy the actual language and literary style as there were many plays on words and cool language tricks that I appreciated.

My notes on the book are below; I'm sure I must have messed up some chapter numbering (and name spelling) at some point, but I hopefully captured the main elements of the plot and my most important takeaways.

Part 1

Ch. 1
  • Treslove: Journalist at BBC, non-Jew
  • Student, writer from Oxford
  • Sam Finkler: Jew
  • Stereotype
  • Role of Israel
  • Philosophy
  • Death of wives
  • Grief
  • Bereavement
  • Loneliness
  • Trouble finding and keeping love
  • Robbed and called a Jew by a woman who mugged him 
  • Libor: teacher, wife Malkie beautiful and passed away
  • Finkler: Jewish, dad pharmacist with stomach pill, wife passed away
  • Treslove: BBC journalist, works as a party lookalike, trouble with women
Ch. 2
  • Role of guilt
  • Widower bonding
  • Mistaken identity
  • Real Jew not faithful
  • Non-Jew confused with Jew
  • Had 2 sons with different women who left him
  • Rodolpho and Alfredo
Ch. 3
  • Woman who liked him mistook him for Jew
  • 2 mis-identifications as Jew
  • Made love to Finkler's wife (Finkler cheated on his wife too)
  • Finkler's wife Tyler converted to Judaism
  • Finkler preferred shiksas
  • Lots of global antisemitic attacks
  • Thinks others think he's Jewish
Ch. 4
  • Two girlfriends that had his sons but couldn't be with him
  • Could he be with a living woman?
  • Jews aren't the only broken-hearted people.
Ch. 5
  • Ashamed Jews
  • Antizionist Jews
  • "Jewess" word
  • Cut or uncut: preference by women
  • Seder
  • Met woman from fortune teller: Juno
Part 2

Ch. 6
  • Finkler didn't like Gaza reaction, platform against it
  • Boycott from universities
  • Treslove took vacation with sons
  • Sons asked if he's Jewish
  • Just because parents Jewish doesn't mean children necessarily
  • Can you be part-Jewish?
  • Antisemitic attacks in London
Ch. 7
  • Fell in love with Jewish woman Hephzibah
  • Studied Yiddish dictionary to woo woman
  • Gave up working as a double at her request so he could be himself
  • Became assistant curator for Anglo-Jewish museum
Ch. 8
  • Can you be defined by what you're not?
Ch. 9
  • Antisemitic attack at university against Finkler's son Emanuel
  • Sister Blaise
  • Emanuel accused Jews of stealing a country (followed what Finkler said)
  • Emanuel (who is Jew) did an antisemitic attack
  • Treslove learned Hebrew and Jewish history
  • Circumcision to limit lust
  • Book: Moses Maimonides
  • Commentary on commentary
  • Hep is true Jewish mother, large body
  • Bacon smeared on museum doors
Ch. 10
  • Want to think ill of Jews in their own way
  • Blogger who tries to restore circumcised foreskin
  • Meetings of ashamed Jews
  • Treslove wants to be Jewish to feel more gloom
Ch. 11
  • Trouble with women, relating to people
  • Face-painting incident
  • Talking feverishly about hating being Jewish is being Jewish
  • Finkler: online poker
Ch. 12
  • Libor committed suicide by jumping off ledge
  • Burdened Libor with info on Treslove's affair
  • Libor funeral at Jewish cemetery
Ch. 13
  • Treslove hits Arab demonstrator at museum opening and falls on ground
  • Stands up for something as a Jew 
Epilogue
  • Hep said Kaddish for Libor
  • Cried for Julian
  • Finkler doesn't give up saying Kaddish for wife after 30 days
  • Last line: "There are no limits to Finkler's mourning." (True Jew in end?)

Notes on Noah Kagan at LeanLA

1/5/2012

3 Comments

 
Watch live streaming video from leanla at livestream.com
Noah Kagan is the founder of AppSumo and has worked on marketing for 4-Hour Workweek, Facebook, and Mint. Noah’s blog, OkDork.com, focuses on the topics of marketing, entrepreneurship, and engagement with online communities.

You can watch the video above, and the main things I learned are below. He's a funny, straightforward, and brutally honest speaker, and it was cool to hear about many of the specific tactics he used to get AppSumo off the ground in a lean fashion. He even included a couple deep life lessons in here as a bonus.
  • Started AppSumo ("Groupon for software") with just a landing page
  • Only had registration system at first
  • Paid an outsourced developer in Middle East $50 for PayPal payment system
  • Needed a deal, so sent an email to head of imgur (main site for images for Reddit)
  • "The most valuable resource is your time."
  • Just use email to solve your problems.
  • Learn how to do things with just email lists.
  • Took out guy from Reddit for breakfast and got free exposure on their site
  • Do something nice and unique for someone.
  • Sent running shoes, running magazine to someone who runs
  • When someone sees something you give them every day, they remember you and will listen to you.
  • Send cookies to people
  • Initially had ugly designs, just trying to validate as quickly as possible
  • He manually emailed every single person their discount code.
  • After the business was validated, they started building the back end and then getting deals.
  • "There's no way to optimize shit; it's still shit."
  • Before you get 1,000 uniques per day, you can't A/B test.
  • Focused on emails initially to get users
  • "There's only 1 metric and 1 goal of your business."
  • At Facebook, the only metric that mattered was growth.
  • Only 1 metric at AppSumo is "# of emails"
  • They have a daily goal and a monthly goal.
  • This month's goal: 550,000
  • Each day have a target of # of emails they need to hit
  • Used Google Website Optimizer
  • Hired an engineer whose sole job was AB testing
  • Their view: profit and revenue today is short sighted
  • Just focused on growing emails for later
  • People who were asked for their email up front, before seeing the deal, were more likely to buy than people who weren't asked.
  • 5% difference in conversion at top of funnel makes huge change.
  • Spent $6K for 4 iterations just on landing page
  • Were bringing 3000 to site
  • Biggest spammer in America: Facebook (recently changed policy)
  • No one talks about them as big spammer
  • People complained about Facebook but it increased retention and engagement.
  • Now that Facebook's big, they turned off emails.
  • "You'll get some backlash from 1% but will grow the 99%."
  • When you travel, you remember just the abnormal stuff; no one remembers the normal stuff that happens every day.
  • Created AppSumo Golden Ticket ($100 credit for no reason whatsoever)
  • Golden Ticket just emailed by a customer service girl daily
  • If you're average, customers will never remember.
  • Think Zappos customer service.
  • Unsubscribe email sends sad photo that's just something different, memorable.
  • At Facebook and AppSumo, they put Easter eggs everywhere, fun stuff people will remember.
  • They have 3 developers.
  • If they don't need to build something, they don't.
  • Instead of building a 404 page, they used a Google Doc.
  • Do minimal work and if result worth it, do it nicely later.
  • Did first educational video to actually teach how to use the tools they were selling.
  • Did it ghetto with minimal editing
  • "Your business should look like shit in the beginning."
  • Now have full time content and video people