For the last 2 months, my mind has been blown on a daily basis by the book Antifragile: Things That Gain From Disorder by Nassim Nicholas Taleb.

This is the longest and deepest I've gotten into a book since high school, and I found pretty much every chapter thought-provoking and lifestyle-questioning. I already felt like my mind had been blown, and that was just after finishing the prologue.

"Antifragile" is the word that Taleb coins for the concept of gaining from disorder (the real opposite of fragility, which is not the same thing as "robust"). The book covers the topics of philosophy, finance, math, statistics, lifestyle, food, fitness, education, and history, and it applies various strategies and concepts to finding ways to live more naturally and with more antifragility.

I can see how many people will be angered and offended by the direct manner in which Taleb denounces the professions of consultant, banker, economist, academic, business school professor, soccer mom, and tourist. I think books that question a lot of fundamentals are the only ones that bring actual progress to our lives as human thinkers, and this book does exactly that.

Overall, I took 47 pages of notes on the book (see below), and that sheer quantity is enough to show how much I liked it. It's not easy to distill these into a few bullet points, and I will be trying over the next couple months to come up with some concrete suggestions and techniques to put the book's ideas into practice in my own life. Here are just a handful of lessons and broad concepts that come immediately to mind:

  • There are important nonlinearities in life that many professionals and advice-givers totally ignore but which make a much bigger difference over time than the first-order obvious effects.
  • Many human interventions in health and government come with really bad iatrogenic effects.
  • Things that are in nature are right until proven wrong; things that are human-made are wrong until proven right (which only time can show).
  • Don't be a turkey, and avoid sucker problems. That's 95% of being successful.
  • Via negativa: Focus on what to avoid and remove instead of what to do and add.
  • Find ways to make your life antifragile in the sense of having limited, small downside and high potential upside.
  • Be an adventurous flaneur. Live life to take advantage of new, unforeseen opportunities and volatility.
  • Entrepreneurs are the unsung heroes of antifragility and deserve way more respect than politicians and other non-practitioners and non-risk takers.
  • Innovation is antifragile.
  • Use barbell methods to manage investments and black swan risks. Focus on your exposure (f(x)) instead of trying to predict some variable (x). Predicting or following averages is for suckers.
  • Study the classics, eat and drink the classics, and avoid the media hype or technology for its own sake.
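
The exposure point in the barbell bullet can be made concrete with a little arithmetic. This is a minimal Python sketch of my own, not something from the book: the payoff function and the numbers are made up for illustration. It shows that with a convex exposure f (capped downside, open-ended upside), the average payoff over volatile outcomes can be much better than the payoff at the average outcome, so knowing the average of x says little about f(x).

```python
def payoff(x: float) -> float:
    """Hypothetical convex exposure: lose at most 1, keep all of the upside."""
    return max(x, -1.0)


# Made-up, highly volatile outcomes for x whose average is exactly zero.
outcomes = [-10.0, -10.0, -10.0, 30.0]

mean_x = sum(outcomes) / len(outcomes)
payoff_at_mean = payoff(mean_x)  # what you'd expect if you only looked at the average x
mean_payoff = sum(payoff(x) for x in outcomes) / len(outcomes)  # what you actually average

print(mean_x, payoff_at_mean, mean_payoff)
```

The three losses are each capped at 1 while the single gain is kept in full, which is the "limited, small downside and high potential upside" shape from the list above.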

Below are the rest of my notes. I really want to discuss some of this stuff with other readers, so let me know what you think.

 
 
Last week, I attended a really cool event at the intersection of design and augmented reality (AR): the VOX Summit. We had some great speakers (including one of my favorite authors, Daniel Suarez), and we broke up into smaller groups to prototype how AR can be used effectively in fields like robotics, education, and storytelling. 

I learned a lot about the state of the art in the field and got to play with some AR apps (and learned how to drop my own AR objects into the "world"). I really enjoyed Daniel Suarez's and Blaise Aguera y Arcas's talks, and their messages about the pros and cons of the world we're in the process of creating resonated with me.

Below are some of my notes and takeaways.

a. design thinking

ii. AR as new medium
2. “We should build good ships here; at a profit if we can, at a loss if we must, but always good ships.”
3. always good experiences

i. augmentedreality.org

IV. Daniel Suarez
a. Shamanism/Daemon
b. Present-adjacent (not too far)
c. He’s optimistic about future
d. Remote-controlled drones almost obsolete because their communications can be jammed
e. Pushes decision-making to drone itself
f. False consensus in social media; scripts creating fake trends
g. Perception is reality
h. Network needs to be more reliable for DarkNet
i. Things need to default to being encrypted to freely exchange ideas
j. People voting several times a day for local issues
k. Afraid of weapons systems that have a credit card and can direct themselves (Razorback)
l. New social system as a lifeboat in case old system dies
m. Holon: self-contained system that is informed by other parts and can evolve
n. IP becomes something more fluid; the plan is original idea you own and others can build on top and you get some percent over time as it evolves
o. AR beginning of something very big

b. AR and entertainment
c. Augmented Hollywood
d. Hollywood as how ppl should tell stories
e. AR needs to fulfill a fantasy/wish
f. Tie in to wish fulfillment of the film
h. Remove barriers
i. Augmented movies
i. Terminator
ii. Minority Report
iii. Iron Man
j. Augmented theater
i. Conspiracy for Good
ii. Storylines
iii. Augmented reality cinema app
k. The Witness: the first movie in the outernet
l. Augmented marketing
m. Citadel movie
i. View clips via posters
n. Iron Man face AR experience/game
i. Wish fulfillment of becoming hero
ii. Social element of sharing your pic w/ friends
o. Augmented books
i. Ice Age book trigger cards
ii. Wonderbook book of spells
1. Magic mirror effect on TV
iii. Art of Journey book to visualize art in 3D

a. Blaise Aguera y Arcas
ii. Online services, Microsoft
iii. Training in physics and applied math
1. Never studied CS or design
iv. Seadragon
1. Founded company
2. Multiresolution images
v. Photosynth
2. Acquired by MSFT in 2007
3. Started as grad student project at UW
vi. Bing mobile
vii. Next
viii. Interaction design
1. Art deco
2. Helvetica
ix. Design of systems based on math

x. Portfolio

1. Gasbar
a. Progress bar
b. Not 1 degree of freedom (% done) but 3 (% done, uncertainty, how much activity going on now)
c. Using particles representing activity
d. Modeling fluid gas transition

2. Streetside
a. Navigation in street view
b. Rotation is easy (photos on cube/skybox)
c. But translation is hard because you have to model the environment
d. Sense of parallax weak; just need actual good imagery, not complex space
e. Simplified the geometry the image is projected on
f. Potemkin village geometry
i. Handful of polygons for entire scene

3. On{x}
a. Israeli project
b. IFTTT for mobile
c. Recipes
d. JS APIs
e. People hated that it required Facebook login
f. Very closely connected to Node.js
i. 1 language for front and back
ii. everything is reactive
iii. callbacks on events
iv. nothing ever blocks, all async
g. Android
i. ease of creating app and using internal signals makes it easy for anyone to violate privacy in app
ii. phone is not a phone; it’s a computer

4. Incoming call
a. Daemon.js
b. Looking up unknown numbers on the web on incoming calls
5. Thinking about design mathematically
6. Interactions define behavior of app and shape of our relationship w/ real world
7. Lineage of phones inheriting from PCs not phones
a. Still desktop w/ icons
b. How far can AR go if it stays in silos, with each experience requiring its own app to be downloaded
i. Pull vs. push
ii. How to push AR info to ppl w/o app
c. Single AR key broker vs. open vs. secure
i. Need cheaper signals to do more expensive operations like vision recognition
d. CV is advancing fast, but app model is broken
e. Wii -> Kinect transition to naturalness
f. Problem of the vanishing physical
i. Kindle books getting removed from your shelf on copyright violations
ii. Book metaphor vs. online service metaphor
iii. What of what we produce is real vs not
iv. 15th century books still look great
v. is our current product ephemeral?
vi. Embodied objects vs. not

VII. Helen Papagiannis talk @arstories
a. Augmented reality as a new medium for storytelling
b. Always build a good ship
c. Moonshot
d. Reveal
e. Delight
f. Engage
g. Make
h. Reimagine
i. Use tools to change the rules
j. New planet
k. Dreamer
l. Emotional journey
m. Iterative
n. Retention
o. Engrossing
p. Curiosity
q. Wonderment
s. Make mistakes faster
t. @MarsCuriosity: Roads? Where I’m going, I don’t need roads.
u. The Future Belongs to the Curious video
v. Ask questions

 
 
I was so excited the day that Kill Decision by Daniel Suarez came out that I immediately dropped what I was reading and picked up a copy. I absolutely loved his first and second books. His writing is realistic (for techies), interesting, and explores very timely topics.

This latest book focused on autonomous military drones -- self-piloting and self-organizing planes used for reconnaissance and warfare. The title refers to the ability of certain drones (present or future) to make the decision to kill (as opposed to just gather data/imagery).

The text explored lots of cool topics like computer vision (at Stanford!), laser-guided weapons, airplanes, helicopters, and swarm intelligence. I enjoyed the discussion of weaver ant research as well as the historical role of ravens living in symbiosis with humans. It also discussed modern social media manipulation for political (i.e., corporate) purposes.

Drones are already a reality: the book has scenes of military workers sitting in the US playing a "video game" (à la Ender's Game) while controlling real robots and planes halfway around the world. What happens when drones become self-organizing and have kill-decision authority? The book takes the view that this is a question of "when" and not "if," especially as parts and programming become off-the-shelf and cheaper. Will our laws and societal structures keep up with this, or fall behind? The book's heroes fight to stop a "new age of warfare" with fully autonomous, kill-decision-enabled drones built by "evil" people, which could eventually destroy the world itself.

Really fun read!
 
 
I recently attended a fun talk at UCLA by Vint Cerf, Google's Chief Internet Evangelist, who is considered by many to be one of the fathers of the Internet. Through humor, interesting stories, and wise perspective, he shared his thoughts on the history and the future of the Internet. You can watch the full video at the CNSI website.

Cerf studied at both Stanford and UCLA and co-designed TCP/IP, the foundational protocol suite of the Internet. Among his many honors, he received the Turing Award for this influential work.

Below are my notes on the talk.

  • Al Gore wrote the bill that funded the backbone and created an organization to connect all government funds for networking and IT research.
  • Key was convincing private sector of business use case.

  • First 4 nodes of ARPANET: UCLA, UCSB, SRI, Utah
  • Military needed satellite and radio nets to interconnect with wired

  • Internet completely voluntary
  • No one required to connect or use TCP/IP
  • Completely distributed
  • Not designed for anything specific
  • Huge flexibility
  • Not over-designed

  • 888 million machines
  • 5.5B mobiles
  • 1.2B PCs

  • Asia and Europe have more Internet users than North America
  • Asia 1B
  • Eur 500M
  • North America 273M
  • Latin America 235M

  • IPv6 critical

  • New generic and internationalized domain names (non-ASCII, non-Latin scripts like Arabic and Cyrillic)

  • Origins of security weaknesses
  • Weak OS
  • Naive browsers with too much privilege
  • Poor access control practices
  • Improper configuration of hosts and clients
  • Compromised clients and servers
  • Hackers, organized crime, state-sponsored cyber warfare

  • Security problems of the Internet not all cryptographic

  • Security responses
  • Improved OS and browsers
  • Software defenses reinforced with hardware
  • BIOS signature
  • Internal and external firewalls
  • Stronger auth inside net
  • DNSSEC and RPKI
  • User training
  • StopBadware

  • Trends
  • All media digital
  • Increased collaboration in all contexts
  • Increased info sharing
  • Online digital publication
  • Bit rot hazard (keeping around binary files without the apps/versions to read them)
  • Need for revision of intellectual property concepts

  • Google Translate
  • Google Goggles
  • Google self-driving cars: no accidents in 200K miles; not autonomous, since they use the Internet to feed back to Google and other cars what they have seen, letting other cars use the data/experience gained from each car
  • Medical diagnosing: pulling context back in from other experiences
  • Refrigerator on Internet: uses RFID to know what's inside, fetches you recipes, tells you in grocery store not to forget milk
  • Internet bathroom (connected to Internet fridge to caution/lock you out on diet)
  • Internet-enabled light bulb, LED: monitors usage
  • Internet-enabled surfboard: surf while surfing
  • Internet sensor network in house: sampling room temperature, data on how well A/C works; Arch Rock PhyNet Server to monitor wine cellar example

  • Medical apps
  • Continuous monitoring now enabled by tech
  • Temp, blood pressure, pulse can be feasibly measured all the time
  • Remote diagnosis using handheld devices; can project medical diagnosis through the net
  • Less skilled techs/nurses can do test remotely while analysis done centrally
  • Google pandemic analysis through query analysis
  • Google can detect flu outbreak 30 days before CDC gets doctor reports
  • Virtual drug trials enabled
  • If we had adequate med records, could select from population automatically and simply analyze the data
  • Da Vinci robot helping surgeon do surgery
  • E-911: dialing a phone when you need help is archaic; sensors around you should detect you have a problem (or you just push panic button on your phone); call is automatic; having much more info for emergency call
  • Individualized treatment; genetics
  • Craig Venter, Harvard films of cell function; have changed cell DNA from one species to another successfully

 
 
As part of a class I took on biotech, we were assigned to read Science Business by Gary Pisano. I learned a lot from the class and this book, and it really answered a question I've had for a long time: Why does science/medicine move so much more slowly than technology in general?

I learned that the biotech industry as a whole has been barely profitable since its inception, and that there is a severe productivity crisis: R&D productivity, measured by cost per successful drug, has been getting worse over time (the cost keeps rising), which is the opposite of something like computer processors, which keep getting cheaper. There is a big "valley of death" between discovery of a compound or process and commercialization. It takes 10 years and $1 billion to get a drug to market, and 1 in 5,000 drugs makes it. WHOA.

People are always optimistic about biotech revolutionizing health, and it hasn't lived up to this potential yet. The book explains many reasons for this and suggests some different approaches and solutions, none of which seems easy or straightforward.

My full notes are below. I'm curious to see how the industry evolves in the future, as many lives could be saved and improved if things change drastically.

I. Preface: The rise of a new industry and a big question
a. Big hopes but disappointing financial returns over time
b. Biotech firms not more productive in R&D than big pharma
c. Fundamental business problems created by science
d. Functional requirements of sector; performance comes from how well it’s managing these (poorly)
i. Risk management
ii. Integration
iii. Learning
e. Monetizing IP leads to bad info flow, fragmentation, proliferation of new firms
f. Biotech can’t just adopt same methods as high-tech
g. Can sci be a biz?
h. Some businesses doing basic sci; some universities treating sci like biz (selling IP, starting co’s)
i. 30 year history of biotech sector data analyzed

II. Ch. 1: the science-based business: a novel experiment
a. Biotech is convergence of 2 separate realms
b. Science biz one that tries to advance sci, not just use it
c. Sci biz needs unique mgmt
d. Sector profits near zero historically
e. Different norms, values, metrics between sci and biz
f. 3 main factors
i. Profound and persistent uncertainty => needs risk rewarding and mgmt
1. Long time horizons for risk to be resolved
2. Appropriability: ability of biz to capture value from an asset
3. Openness vs. secrecy
ii. Complex and heterogeneous nature of scientific knowledge => needs integration
1. Cross disciplinary
iii. Rapid progress => cumulative learning

Part 1: The Science of the business

I. Ch. 2: mapping the scientific landscape
a. Locks and keys
b. Random screening
c. rDNA
d. mAb
e. combinatorial chem
f. SNPs
g. Proteomics
h. RNA interference
i. RDD
j. HTS
k. Growing size, complexity, heterogeneity

II. Ch. 3: the complex anatomy of drug R&D
a. Can save or kill you
b. So much still unknown
c. So many places where drug can work wrong
d. Target identification and validation: find enzyme
e. Lead identification and optimization: find molecule to inhibit it
f. Preclinical development: check safety and effectiveness before humans
g. Human clinical trials phases 1-3
h. Reg approval

III. Ch. 4: drug R&D and the organizational challenges
a. Not like processor design; very little knowledge about entire system and overall spec
b. Process very complex and can’t be broken into pieces: uncertainty and integrality
c. Most R&D on losers
d. Active ingredient and formulation both matter
e. New scientific advances increase uncertainty; show more what we don’t know
f. More choice means more uncertainty
g. More advances mean harder integration

Part 2: The business of the science

I. Ch. 5: the anatomy of a science-based business
a. Many separate technologies
b. Cyclical entry
c. Genentech started industry
i. Close links to universities
ii. Biz model innovation: contract w/ big pharma for funding development of drug and royalties in exchange for manufacturing and marketing rights
1. First time pharma did R&D through external for-profit co
iii. Pursuit of broad range of opportunities/diseases
d. Second generation used more chemistry and focused on research, allowing pharma to commercialize
e. Third gen: human genomics, industrialized R&D, platform strategy
f. Market for know-how
i. More collab w/ biotechs than w/ univ

II. Ch. 6: the performance of the biotech industry: promise vs. reality
a. Long lag times
b. Zero industry profits
c. Huge skews for Amgen and Genentech
d. R&D productivity, revenue-adjusted

III. Ch. 7: monetizing IP
a. Transfer of IP from univ -> private new firms
b. Capital markets (VC) and public equity
c. Market for know-how (small firms trade IP for funding from big firms)
d. Go public much earlier for funding
i. Only 20% of public co’s today have ANY product on market, so they are basically R&D entities (GAAP not as useful)
e. Univ research -> startup w/ VC -> IPO for more funding -> license to big co to bring to patient
f. 3 requirements for risk mgmt.
i. Many options for diversification
ii. Adequate info
iii. Ability to reap reward
g. Market for know-how -> integration
i. But biotech less modular and codified than software
ii. IP protection murky

IV. Ch. 8: organizational strategies and business models
a. Few examples of success, high uncertainty, luck plays big role
b. Financing critical for industry and its main measure, but wrong measure because it’s input, not performance
c. Alliances/IP monetization are important but not endgame
d. Movie studio model for big pharma: produce ideas of independent writers

V. Ch. 9: The path ahead
a. Venture philanthropy
b. Rethinking the publicly held biotech firm (doesn’t match 10 year investment cycle)

 
 
Another awesome talk by the guys at LeanLA and IMVU!

Here's the blurb about the talk and the really knowledgeable speaker:

"Continuous Deployment takes continuous integration one-step further, where every commit goes live to production servers. When this process is described it is frequently met with skepticism around site reliability and the ability to scale a business this way, but it works, it scales (with challenges) and it is embraced by the entire organization. IMVU is a leader in Continuous Deployment, with over 5 years of experience scaling this process to support a technical staff of 50 and a business of more that $40 million in annual revenue. Brett G. Durrett, Vice President of Engineering & Operations for IMVU explains the basic mechanics of Continuous Deployment and discusses the value it creates for the entire company. Specific topics that will be covered: Attendees will understand that releasing to customers 20+ times per day is possible and that it does scale from individual developers to large companies. In addition, they will understand how they can make Continuous Deployment successful at their company, from both a technology and cultural standpoint.

Brett G. Durrett has over 20 years experience leading development of software and systems ranging from large-scale Internet services to video games. He serves as VP of Engineering at IMVU where he leads the engineering and technical operations teams and was responsible for the operations infrastructure that successfully scaled from two machines to over 700 servers. Prior to IMVU, Brett served as the Director of Engineering, VP of Operations and General Manager for the virtual world at There.com. Brett was also co-founder and CEO of Asylum Entertainment, a game development company."

You can watch the talk (in two parts) and see the slides above. I'm pretty much sold on what Brett preaches and am thinking about how to implement continuous deployment in my current projects. He says that having little code and process in place puts you at an advantage, though I'm still wondering how to put the right infrastructure in place so that all the tests and deployments run as smoothly and automatically as theirs do (and how much to prioritize this process infrastructure work relative to other initial start-up goals).

My notes on the talk are below. Overall, I learned a lot and very much enjoyed hearing Brett speak.

Their process:
  • develop a feature increment
  • verify on buildbot
  • commit code to live in production immediately for some set of customers
  • whole process takes 15 min, release about 50 times per day
  • no staging cluster
  • no QA review
Why would you do something like that?
  • most companies develop, release, then pray for customers
  • now, smart companies develop, release, learn, iterate
  • minimize total time through build, measure, learn cycle
Why continuous deployment is good:
  1. release overhead reduces opportunity to iterate
  2. way easier to find regressions/bugs in small batches of commits
  3. fast response times for business opportunities
  4. more turns at bat
  5. book: Principles of Product Development Flow (reducing batch size, lean product development); reducing batch size reduces cycle time, reduces variability in flow, accelerates  feedback, reduces risk, reduces overhead; large batches reduce efficiency, inherently lower motivation and energy, cause  exponential cost and schedule growth, lead to even larger batches; the entire batch is limited by its worst component

Work process:
  1. local tests pass, engineer commits code
  2. lots and lots of tests run
  3. all tests pass [if no, revert commits]
  4. code deployed to % of servers
  5. metrics good [if no, rollback]
  6. code deployed to all servers
  7. metrics still good [if no, rollback]
  8. win
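
The eight steps above can be sketched as a small control flow. This is a minimal Python sketch under my own assumptions, not IMVU's actual code: `run_tests`, `push`, `metrics_ok`, and `rollback` are hypothetical callables standing in for the test suite, the deploy scripts, and the metrics checks, and the 10% first stage is a made-up number (the talk only said "% of servers").

```python
from typing import Callable


def deploy(run_tests: Callable[[], bool],
           push: Callable[[int], None],
           metrics_ok: Callable[[], bool],
           rollback: Callable[[], None],
           first_stage_percent: int = 10) -> str:
    """Walk one commit through the staged rollout and report the outcome."""
    if not run_tests():
        return "reverted"                # step 3: a red test run reverts the commit
    push(first_stage_percent)            # step 4: deploy to a fraction of servers
    if not metrics_ok():
        rollback()                       # step 5: metrics bad after partial rollout
        return "rolled back (partial)"
    push(100)                            # step 6: deploy to all servers
    if not metrics_ok():
        rollback()                       # step 7: metrics bad after full rollout
        return "rolled back (full)"
    return "win"                         # step 8


if __name__ == "__main__":
    # Toy run in which everything passes.
    print(deploy(lambda: True, lambda pct: print(f"pushed to {pct}%"),
                 lambda: True, lambda: print("rollback")))
```

Passing the steps in as functions keeps the sketch independent of any particular test runner or monitoring tool.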

amount of time you need to run test depends on volume of people going through funnel

all work done on trunk (no work on branches)
  • avoids merge conflicts
  • all code gets validated in production immediately, so it is tested right away
  • at bottom sees actual PHP test files and their status (time to complete, running status, etc.)
  • a tag includes multiple PHP test files
  • tests run before checkin on local sandbox
  • push for being test-driven but let people work how they want to work
  • each person responsible for writing tests for their own code
  • local sandbox test suite running through a web browser
  • checkboxes: stop after last test, pause after failure, run tests in random order, only run selected tests
  • want to make testing as unburdensome as possible
great slide in presentation with sample output of "RunTests" test view which allows filtering tags, turning test on/off, seeing tests that pass, fail, run, skip, wait, etc.


use selenium

continuous integration: they use Buildbot; others use Hudson, Jenkins, or Atlassian Bamboo


build servers
  • good screenshot in slides of buildbot view
  • each box represents a server
  • splitting all the tests up across multiple servers turns an 8-hour build into an 8-minute build
  • each server running many tests; they have 40K tests running through test suite
  • having good tests allows new people to start working and new experiments to happen quickly
  • unit tests of code
  • user workflow tests of site UI
  • if code fails in build server, email goes out and immediately the engineer's supposed to revert the code so others can continue to use build server
  • saves and emails output of the test failure
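
Going from an 8-hour build to an 8-minute build is a 60x speedup, so it takes at least 60 servers even if the tests split perfectly evenly. Here is a minimal Python sketch of one way to do the split, assuming you have a recorded duration per test: sort the tests longest-first and always hand the next one to the least-loaded server. This is a standard greedy heuristic, not necessarily what IMVU does, and the function names are my own.

```python
import heapq


def shard_tests(durations: dict[str, float], n_servers: int) -> list[list[str]]:
    """Assign each test to the currently least-loaded server, longest tests first."""
    heap = [(0.0, i) for i in range(n_servers)]  # (total seconds so far, server index)
    shards: list[list[str]] = [[] for _ in range(n_servers)]
    by_length = sorted(durations.items(), key=lambda kv: kv[1], reverse=True)
    for name, seconds in by_length:
        load, i = heapq.heappop(heap)            # least-loaded server right now
        shards[i].append(name)
        heapq.heappush(heap, (load + seconds, i))
    return shards


def wall_clock(durations: dict[str, float], shards: list[list[str]]) -> float:
    """Build time is set by the slowest server."""
    return max((sum(durations[t] for t in shard) for shard in shards), default=0.0)
```

The build is only as fast as the slowest shard, which is why keeping track of per-test speed matters so much once the tests are spread out.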

Deployment:
  • code rolled out to cluster
  • a bunch of perl and rsync code
  • symlinks on site
  • keep multiple copies of code
  • process of rolling forward and backward is just changing symlink
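
The symlink trick can be sketched in a few lines. IMVU's version is Perl and rsync; this is a minimal Python sketch of the same idea on a POSIX system, and the directory layout and the `activate` helper are my own assumptions. Each release lives in its own directory, and rolling forward or backward just repoints a `current` link.

```python
import os
import tempfile


def activate(release_dir: str, current_link: str) -> None:
    """Repoint `current_link` at `release_dir` by renaming a fresh symlink over it."""
    tmp_link = current_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)              # clean up a leftover from an interrupted switch
    os.symlink(release_dir, tmp_link)
    os.replace(tmp_link, current_link)   # the rename swaps the link in a single step


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    old = os.path.join(root, "release-1")
    new = os.path.join(root, "release-2")
    os.mkdir(old)
    os.mkdir(new)
    current = os.path.join(root, "current")
    activate(old, current)
    activate(new, current)   # roll forward
    activate(old, current)   # roll back: same operation, different target
    print(os.readlink(current))
```

Because both releases stay on disk, a rollback copies no code at all, which is what makes it fast enough to trigger automatically.
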
hard part: cluster immune system
  • monitors metrics
  • system performance (web services, disk space, DNS, cron, API availability)
  • business performance (various critical actions/functions, graphs, revenue, registrations)
  • use nagios for system and business metrics
  • if metrics bad, do rollback on cluster (changes symlinks back to previous release, blocks further commits, sends email)
  • server push status web page to diagnose rollback and which metrics killed the push
  • one unfortunate thing in the system is false positives due to real variability in business
  • once metrics good, goes out to entire cluster
  • most wait periods: a couple minutes
  • something it's not very good at: catching very small changes that hurt
deployments of deployment system:
  • was manual for a while, hacked together
  • only recently got good test coverage of deployment system (some not even in repository)
  • don't change deployment system that often
aesthetic tests? they don't

everyone emails changes to the change list (basically everyone in company) with before and after state and people can catch problems

they have one monolithic code base

don't have anything that ensures they have test code coverage automatically


Getting Started (story):
  • there were no customers
  • he came in for operational role
  • engineers wrote code and SSH'd in to cluster
  • no auditing, no monitoring
  • would see PHP syntax errors on homepage
  • only 30 customers at that time so didn't matter
  • set the culture of getting stuff out there
  • wrote a nagios check for "are we rendering HTML out to the customer?"
  • if you're writing new code, it should have some coverage (functional easiest at first)
  • commit to making forward progress
new product advice:
  • start w/ sandbox
  • just push
  • ideal time for failures
established product:
  • start w/ production
  • automate deploys. first automate the push. then automate QA.
  • build confidence
new code must have test coverage.

if new code breaks something old, must write test to catch that

expect some hurdles:
  • you will have cluster outages
  • you will spend engineering time on deployment system
  • have culture where failures are looked at as opportunities
  • how do we get excited about never letting this happen again
  • if have blame-searching culture, will have more challenges
scaling:
  • buildbot would go red, and everyone would be blocked
  • when build time 20-30 minutes, bad news
  • problem with intermittent tests
solutions:
  • build isolation [but this was not the solution they used; they didn't need to build it because they could get away with faster test runs: buying hardware and virtualization, sorting tests by speed, and dependency injection (returning the data a real DB call would return instead of calling the DB); they also built a hypothesis builder, which is like build isolation: you tag code to run on the hypothesis builder, which is separate from the main buildbot and doesn't block anyone if it fails]
  • added a test metrics system that keeps track of success rate and speed (a lot of builds were blocked on slow tests)
  • got build times down to 8 minutes
  • when builds were over 25 minutes, it was a huge cultural issue
flaky tests / intermittent tests have huge costs:
  • disable or ignore the test
  • third-party providers
  • running tests around time and time spans is much more challenging than normal tests (DST, leap years, etc.)
  • state dependency across tests (overnight, keep running tests in random orders until they become red, and then in morning you see which tests are intermittent and can investigate)
  • they run about 40K tests now
  • even with 5 9's of reliability, you get many failures
  • move them from having to fix them when they happen to fixing them on a schedule
  • if buildbot gets a test that runs green once and then red another time, it will mark it as intermittent, start an issue in bug tracker, and allow the build to go through
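
The "five nines" point is worth working through. If I read it as each of the 40K tests passing 99.999% of the time on correct code (my interpretation of the note, and assuming the tests fail independently), then about one build in three still goes red for no real reason. The Python sketch below does that arithmetic and adds a simplified version of the green-then-red rule from the last bullet; the function names are my own.

```python
def p_spurious_red(n_tests: int, per_test_reliability: float) -> float:
    """Chance that at least one test fails spuriously in a single full run."""
    return 1.0 - per_test_reliability ** n_tests


def classify(results: list[bool]) -> str:
    """Label one test from repeated pass/fail runs against the same code."""
    if all(results):
        return "green"
    if not any(results):
        return "red"
    return "intermittent"   # passed at least once and failed at least once


if __name__ == "__main__":
    print(round(p_spurious_red(40_000, 0.99999), 2))   # roughly 0.33
    print(classify([True, False, True]))
```

At around 50 releases a day, that rate means many red builds daily that have nothing to do with the commit being tested, which explains why they moved to fixing intermittent tests on a schedule.
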
trickier bits:
  • catching issues that fail slow (SQL selects from growing tables)
  • critical areas cause hard lock-ups (MySQL, memcached)
  • lack of test coverage of older code: not an issue if you start with test coverage
  • outsourcing (different hours, culture, branching, slower integration)
changing schema requires sign off from tech lead (checking indexes, scalability of changes)

added query killer (issues kill statements on long queries; better to have code die than DB to be overloaded and take down everybody)
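
A query killer can be sketched as a function over the rows MySQL returns from `SHOW FULL PROCESSLIST`. This is a minimal Python sketch under my own assumptions: the 30-second threshold is made up, the rows are plain dicts keyed by the processlist column names, and actually running the generated statements against the database is left out.

```python
def kill_statements(processlist: list[dict], max_seconds: int = 30) -> list[str]:
    """Build KILL QUERY statements for queries running longer than max_seconds."""
    return [
        f"KILL QUERY {int(row['Id'])}"
        for row in processlist
        # Only target running queries, not sleeping or idle connections.
        if row["Command"] == "Query" and row["Time"] > max_seconds
    ]


if __name__ == "__main__":
    rows = [
        {"Id": 11, "Command": "Query", "Time": 95},   # long-running select
        {"Id": 12, "Command": "Sleep", "Time": 400},  # idle connection, left alone
        {"Id": 13, "Command": "Query", "Time": 2},    # short query, left alone
    ]
    print(kill_statements(rows))
```

`KILL QUERY` stops the statement but leaves the connection open, so the calling code sees an error instead of the whole database slowing down for everyone.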

schema changes on large tables (they use mysql):
  • create a new table
  • do copy on read
  • have background process later migrate the rest of the data
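
The three steps above can be sketched with dictionaries standing in for the old and new tables. This is a minimal Python sketch of the copy-on-read idea, not IMVU's MySQL implementation: the `LazyMigration` class, the `transform` function, and the batch size are all my own assumptions, and writes are left out.

```python
from typing import Callable


class LazyMigration:
    """Serve reads from the new table, copying rows over from the old one on demand."""

    def __init__(self, old: dict, new: dict, transform: Callable):
        self.old = old              # stand-in for the existing large table
        self.new = new              # stand-in for the table with the new schema
        self.transform = transform  # converts an old row to the new schema

    def read(self, key):
        if key not in self.new:
            # Copy on read: migrate this row the first time someone asks for it.
            self.new[key] = self.transform(self.old[key])
        return self.new[key]

    def migrate_batch(self, limit: int) -> int:
        """Background step: copy up to `limit` rows nobody has read yet."""
        pending = [k for k in self.old if k not in self.new][:limit]
        for key in pending:
            self.new[key] = self.transform(self.old[key])
        return len(pending)
```

Hot rows move as soon as they are touched, and the background job only has to sweep up the cold ones in small batches, so the large table never needs one long blocking change.
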
memcache changes require second set of eyes (hard to test on local sandbox)

hard to work with outsourcers who build over several days (impossible to integrate)

build system itself is critical business function; keep metrics on build system (web dashboard of build process)

integration with A/B testing inside the code (nice slide with pseudocode)
  • name the experiment
  • specify initial rollout % or amount of users
  • specify customer branches with percentage weightings of what % should see enhanced versus non-enhanced (e.g., 50% A/B split)
  • helper function that returns which branch a certain customer should see (enhanced or not) and if not yet assigned then to permanently assign [so customer always gets same experience]
  • simple if statement that splits between if user should see test feature or not
  • web page listing all experiments (available to everyone in company)
  • to user % (QA and admin only, 0%, 10%, etc.)
  • closed on status (they have a page that lists experiments that were closed but the code still exists; this allows easy housekeeping to remove unused code after a while)
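
I didn't copy down the pseudocode from the slide, so here is my own minimal Python sketch of the helper described above. The `Experiment` class, the hash-based bucketing, and the in-memory `assignments` dict (a stand-in for wherever the permanent assignment is really stored) are all assumptions, and the branch weights are assumed to add up to 100.

```python
import hashlib


class Experiment:
    def __init__(self, name: str, rollout_percent: int, branches: dict[str, int]):
        self.name = name
        self.rollout_percent = rollout_percent   # % of customers in the experiment
        self.branches = branches                 # e.g. {"enhanced": 50, "control": 50}
        self.assignments: dict[str, str] = {}    # stand-in for a persistent table

    def _bucket(self, customer_id: str, salt: str) -> int:
        """Map a customer to a stable number from 0 to 99."""
        key = f"{self.name}:{salt}:{customer_id}".encode()
        return int(hashlib.sha256(key).hexdigest(), 16) % 100

    def branch_for(self, customer_id: str):
        if customer_id in self.assignments:
            return self.assignments[customer_id]   # same experience every time
        if self._bucket(customer_id, "rollout") >= self.rollout_percent:
            return None                            # not in the experiment (yet)
        point = self._bucket(customer_id, "branch")
        cumulative = 0
        for branch, weight in self.branches.items():
            cumulative += weight
            if point < cumulative:
                self.assignments[customer_id] = branch   # permanently assign
                return branch
        return None   # only reached if the weights add up to less than 100


if __name__ == "__main__":
    exp = Experiment("new_checkout", 100, {"enhanced": 50, "control": 50})
    if exp.branch_for("customer-42") == "enhanced":
        print("show test feature")
    else:
        print("show existing feature")
```

The `if` at the bottom is the "simple if statement" from the list: feature code only asks which branch the customer is in and never deals with percentages itself.
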
per-experiment dashboard to see user groups (male, female, etc.), #s, results (highlighted by desired/undesired colors) and p-values

sprints:
  • planned sprint schedule usually not met (outstanding issues, incomplete features, tech review, refactoring)
  • when releases happen every 15 minutes, "planned sprint ends" can be arbitrary
  • changed to just say that the sprint ends when the work is done (but still understand overage reasons)
IMVU culture:
  • first day on job, engineer pushes out to live customers immediately
  • makes people feel empowered
  • hack-week: you can build anything and company provides food and drink
  • if you're convinced something's important for customers, just build it; you're allowed to release it to 1% of customers without approval
 
 
I recently finished listening to the book I'm Feeling Lucky: The Confessions of Google Employee Number 59 by Douglas Edwards. It was quite a long, detailed story, but I particularly enjoyed that level of detail, as hearing the "inside story" was what I was actually interested in.

I learned about Google around 2001, when a friend showed me a search engine he claimed worked better (and faster) than Yahoo (that's when they were just showing the milliseconds to complete a query, which they still do to this day). When I got to college, I had friends who worked at the headquarters and even invited me there for meals (it was like going to Disneyland). It was really neat hearing the detailed account in this book from an insider and correlating that with my own personal experiences of the company and people I knew there.

The story was written by a journalist who was tired of working at large corporations and wanted to experience the start-up life. He became Google's Brand Manager and continuously struggled with his own identity in the company and what his role was. It was humbling to hear about the internal politics and constant debates that took place between the initial members of the company on issues all over the board, like product features, EULA language, April Fool's jokes, logos, and UI design and copy. I felt like I could relate to the author because I too have experienced these types of debates and have felt similar frustration to his in the past.

Overall, I learned a lot more about Google and the bumpy, winding road it took to where it is today. It's so easy to think they had everything figured out from the beginning; this book explains that nothing could be further from the truth.

Below are my main notes and takeaways from each part.

Introduction
  • Started out complete opposite of a large corporation
  • Operating principles: old rules didn't apply
  • Was there 1999-2005
  • Worked in marketing
  • Brand actually built by engineers
  • Yahoo was juggernaut but portal
  • Google: obsession with efficiency
  • Efficiency, frugality, integrity
  • Joined when there were 50 engineers
  • He was #59
  • TGIF Weekly Meetings talking about accomplishments for the week
  • Status blind culture (but engineers #1)
Part 1: You are one of us

Ch. 1: From whence you came
  • Came from marketing at San Jose Mercury News
  • Tired of big co. bureaucracy
  • Huge dot-com bubble
  • Google: huge focus on equations and algorithms
  • Sergey interview question: after he comes back in 5 minutes, explain to him something complicated that he didn't know
  • Started as online brand manager
  • No org chart in company but they had a chef and massage therapists
Ch. 2: In the beginning
  • Not clear what business vision was
  • Focus on collecting smart people; figure out plan later
  • Everyone volunteered in data center
  • Global shortage of RAM
  • Build machines to fail; use commodity parts
  • Fix and replace later
  • Crazy stories of mayhem inside data center
  • Decided PR main method of marketing
  • Little plan in marketing
  • No structure of management
  • Yahoo switched from Altavista to Inktomi, a Google competitor
  • Inktomi did all search around web
  • Go2 used free market to do bidding-based ranking
  • First project of his: codifying UI
  • Data-based divinity at Google, which he wasn't sure always made sense
  • Not sure why he was there
  • TGIF tradition: celebrate employee birthdays that week with cake
Ch. 3: A world without form
  • Crawling
  • Indexing
  • Ranking
  • Query analysis
  • Anchor text analysis
  • Disambiguation
  • Speed vs. scale
  • Focus on hiring only those as good as you; only way to double your productivity is to hire another person as good as you
  • Mission to collect all smart people in world
Ch. 4: Marketing without marketing
  • Google always believed product more important than marketing
  • Sergey wanted to give donation against cholera to gain awareness
  • Never want to take the standard approach
Ch. 5: Give a process its due
  • Sort of sarcastic tone throughout book
  • Kept wanting to do marketing but kept getting pushed aside
  • Wanted to create product management group but faced resistance
  • Google mantra: Just build cool stuff for its own sake
  • Created OKR (Objectives & Key Results) system
  • Products died by data alone, not sentiment
  • Supposed to complete only 70% of OKRs
  • Performance reviews separate from OKRs
  • UI team growth and many debates
Ch. 6: Real integrity
  • Decided to experiment with ad platform that is heavily targeted
  • Started with Amazon affiliate ads, delivered first revenue of company
  • Tried out selling CPM ads
  • Built inventory estimation system
  • Wanted to only do text ads to not be intrusive 
  • Added color backgrounds to distinguish from search results
  • Decided to move to CPC ads determined by realtime auctions
  • Key difference was that they targeted keywords, as opposed to only the auction results like Go2
  • Real ethical dilemma: pay for placement made founders angry
  • Engineers cared about quality and ethics
  • Search engines need to watch out for public by not letting people manipulate results
  • Came up with term, "Don't be evil"
  • Google Open Directory project versus search
  • Volunteers added sites to directory
  • Barter arrangements to swap ads to Google with other networks
  • Wrote own banner ads when didn't hire freelance artist (hated hiring outside contractors, especially designers or ad agencies)
  • Analyzed ad performance using data alone
Ch. 7: Healthy appetite for insecurity
  • Author had insecurity about his own value
  • Age 41, wife, 2 kids, unsure about what he's doing and why working long hours
  • Self-doubt is always linked to ambition
  • Google had leader boards internally for all sports and work
  • Developed coding style guide
  • Compiler warnings should never be ignored
  • Formal code review practice
  • Elevated standards
  • Cross-pollinated ideas over meals
  • All perks and services provided
  • To make decisions when people fought, had to duke it out in a video game
  • Charlie was chef from Grateful Dead
  • Emailed different menu every day
  • TLA: three letter acronym
  • Free food policy increased collaboration and saved employee time
  • Made sure nothing went to waste
  • Webcam monitoring cafeteria line length and servery conditions
  • Cafe became circus with celebrities who visited over time
Ch. 8: Cheap bastards who can't take a joke
  • Philosophy: launch first then iterate
  • Created product review meetings with Larry
  • April Fool's joke designed for site
  • Mentalplex: Google searches when you think
  • Joke of putting nav text in foreign language went bad, users complained, engineers didn't follow spec
  • Learned that engineers are the true gatekeepers
  • Hyperbolic ideas of founders
  • Called competitors and press "bastards"
  • Feedback always ambiguous: "think about it more," "not too sucky," "not Googley"
  • Philosophy: Never pay retail
  • Overpacked office with staff
  • But then got SGI office
  • Negotiators obsessed with cheap and getting discounts from vendors
  • Put vendors on phone conference together to compete
  • Expected 80-90% off list price
  • No one talked about salary, just options
  • Had to borrow money from parents to exercise options when granted for tax reasons
  • Always put boss's idea as priority
  • Larry killed marketing budget and focused on affiliate program
  • Killed program when data didn't support
Ch. 9: Good enough is enough
  • First Google Doodle of aliens
  • Didn't think logo changing was good for brand
  • Google makes you challenge every assumption
  • Don't delegate, do all you can yourself to move forward
  • Never stop others from doing something interesting
  • Don't get in way of others doing something interesting
  • Logo doodles became controversial
  • Pressure from kids and family
  • Pressure to perform
  • Good enough is good enough
  • Get 80% of a task done and then it becomes lower priority, so should switch to another task
  • Hire based on ability over experience
  • Google generalist hiring
  • Everyone very sparse with praise
  • Never derail launch of product for marketing reasons
  • Patch and move on rather than fix underlying issue
Ch. 10: Rugged individualists
  • Larry hated ad agencies, thought they were evil, stupid people
  • Product should stand for itself
  • Believed in simplicity as a benefit
  • Must focus just on tech
  • Larry read business books to prepare to run his company
  • Founders set the terms of their VC valuations
  • Never self-aggrandized
  • Ideas valued based on merits, not source
  • "Misc" email list with long funny threads
  • Engineers don't stop asking why
  • Don't let something go by when see something wrong
  • Fight over everything
  • Lots of unsolicited advice (sounds like Israeli culture based on Start-Up Nation)
  • Founders always had laissez-faire style
  • Listened at meetings, but let others decide
  • Fought with data when cared
  • Porn filter was difficult work
  • Porn is a cutthroat business heavily using tech
  • Learned about spammers that deceive search bots
  • Google employee spoke on forums to correct rumors, could speak freely without PR approval
  • Shut off unauthorized automatic queries, even to entire ISPs and countries (France) when had no other choice
  • Spend time doing not deciding
  • Individual engineers just do what they want
Part 2: Google grows

Ch. 11: Liftoff
  • Not start-up and not search behemoth
  • Awkward time
  • Extremely long hours
  • Became Netscape's default search engine
  • Huge increases to server load from user growth way more than expected from Netscape
  • Created pager service to monitor site after went down a couple times
  • Couldn't add capacity fast enough
  • Yahoo switched to Google
  • Had contractual obligations on latency and index freshness (promised way more than could deliver at the time)
  • Made them push hard to improve
  • Problems with bad hardware and memory
  • Had to write resilient code
  • 1 billion URL indexed goal
  • Built Google 2 infrastructure
  • Made index format a lot more compressed
  • Started ignoring the word "the" on pages
  • Daily re-ranking and indexing much harder
  • Incremental indexing huge problem
  • Incestuous interconnected Silicon Valley companies
  • Job hopping expected
  • Office relationships and politics
  • Larry and Marissa became a couple
  • "Mixed marriages": spouses at competitor companies
Ch. 12: Fun and games
  • Company camping trips
  • Idea: ad self-serve
  • No rules at all on filters, worried about it
  • Have to love uncertainty to be in a start-up
  • AdWords creation
  • Was his name ("Edwards")
  • Initial customers: lobster company and porn
  • Injected humor into error messages and FAQ
  • Developed Google's voice
Ch. 13: Not the usual yada yada
  • Google Toolbar privacy issues
  • Talked about privacy up front
  • Engage privacy explicitly
  • Message said, "This is not the usual yada yada"
  • The more you inform people, the more they trust you.
  • Promoted to Marketing Director
Ch. 14: Google problems and mail fail
  • Focus on search quality
  • 10k machines dedicated to search quality
  • Anchor text analysis
  • Google bombs: weird results
  • Huge customer service and email backlog
  • Google acquisition of usenet company DejaNews
  • Users rebelled but Google ignored because were actually helping them
  • Vision of founders burned brightly and always ignored initial public reaction
Ch. 15: Managers in hot tubs and hot water
  • Annual ski trip
  • Shared all financials weekly with team
  • Needed to find CEO to go public
  • Eric became CEO
  • Focus on cost cutting
Ch. 16: Is New York alive?
  • 9/11/01
  • Downloaded news articles and put on homepage without asking anyone permission
  • People kept requesting them to add news links
  • Not a soulless corporation
  • Did whatever they could to help
  • Sergey injected own personality into company
  • Searched logs to help find terrorist names
  • Had to go on instinct as added news sources
  • Struggled with tone with users
  • Worked on algorithmic news
  • Controversy on showing US flag, mourning message
  • Went back to focusing on functionality, not content portal
Part 3: Where we stand

Ch. 17: Two speakers and one voice
  • Disagreements with Marissa Mayer
  • Engineering just did what they wanted
  • Larry really wanted to scan books
  • Started with mail order catalog scanning
  • Was tough
  • Launch calendar meetings
  • Approvers of various parts had to flip approval bit on project page
Ch. 18: Male enhancement
  • CRM software update couldn't be installed by themselves
  • Instead of getting reliable CRM vendor, invested in Larry's friend's CRM startup and then bought it
  • Translation Console to translate services to other languages
  • Crowdsourced translation of site pages
  • Google way: break things into tiny solutions
Ch. 19: The cell of a new machine
  • New system for AdWords using CPC not CPM
  • Distribution of ads on other sites
  • Epiphany: rank ads by CPC * CTR (effective CPM), with the quality ranking based on historical data and other inputs
  • CTR needed to be predictive and used black box/secret algorithms for ad scoring
  • Second price auction model better than Go2's first price auction
  • Care about best results for users, not advertisers
  • Larry and Sergey were never much into the ad project
  • Go2 changed name to Overture, which believed all search results will eventually be paid
  • Endless struggle for search perfection; list of 10 principles; not portal
  • Focus on user
  • Do one thing well
  • Fast is better than slow
  • Open is better than closed
  • Democracy works on the net
  • Don't have to be at your desk to work
  • Can make money without being evil
  • Always more info out there to organize
  • Need for information crosses borders
  • Don't need a suit to be serious
  • Great is not good enough
  • Underpromise and overdeliver
  • Tech company that solves hard problems
  • Don't be evil
  • Overture got Yahoo ad deal (juggernaut deal)
  • Google got Earthlink ad contract
  • Running own site was lab for advertising
  • Lowered margins on distributed ads so partners on other sites could keep a lot more revenue
  • Overture took 49% of revenue, so Google could undercut them by subsidizing through revenue off ads from own site
  • 35 engineers reporting to 1 PM
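
To make the ad-ranking idea from this chapter concrete for myself, here is a simplified sketch of ranking by bid times predicted CTR combined with a second-price style charge. It is my own illustration, not Google's actual algorithm: the function name `run_auction`, the one-cent `increment`, the sample ads, and the pricing rule (pay just enough to keep your position, capped at your bid) are all assumptions.

```python
def run_auction(ads, slots=2, increment=0.01):
    """Rank ads by bid * predicted CTR and charge a second-price style CPC.

    ads: list of dicts with "name", "bid" (max CPC in dollars) and
         "ctr" (predicted click-through rate, between 0 and 1).
    Returns a list of (name, price_per_click) for the winning slots.
    """
    ranked = sorted(ads, key=lambda ad: ad["bid"] * ad["ctr"], reverse=True)
    results = []
    for position, ad in enumerate(ranked[:slots]):
        if position + 1 < len(ranked):
            runner_up = ranked[position + 1]
            # Smallest CPC that still outranks the next ad, capped at own bid.
            needed = runner_up["bid"] * runner_up["ctr"] / ad["ctr"] + increment
            price = min(ad["bid"], round(needed, 2))
        else:
            price = increment  # no competitor below: pay the assumed minimum
        results.append((ad["name"], price))
    return results


# Hypothetical ads: the highest bidder has the worst predicted CTR.
ads = [
    {"name": "high_bid_low_ctr", "bid": 2.00, "ctr": 0.01},
    {"name": "low_bid_high_ctr", "bid": 1.00, "ctr": 0.05},
    {"name": "middle", "bid": 1.50, "ctr": 0.02},
]
print(run_auction(ads))
```

In this example the ad with the lowest bid wins the top slot because users are predicted to click it far more often, which matches the "best results for users, not advertisers" note above. In a first-price auction the winner would simply pay its own bid.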
Ch. 20: Where we stand
  • Company vision: products that work, are useful, and are never evil
Ch. 21: Aloha AOL
  • AOL was giant of Internet, partnered with Overture
  • Approached AOL for business
  • AOL very hard negotiator
  • SWAG: Scientific Wild-Ass Guess
  • Had to calculate guarantee payment to AOL if didn't get any ads clicked
  • Thought had lost Yahoo and AOL deals to Overture
  • But deal not yet final
  • AOL canceled deal and did deal with Google
  • Hired lots of AdWords reps, temps
  • Lots of women hired (engineers got happy)
  • Built up ad network bigger than Overture
  • Focus on objective search results
Ch. 22: We need another billion dollar idea
  • Introduced Associate PM and Product Marketing Manager positions
  • Started becoming big company
  • Google Logic: his brand identity idea
  • Not well accepted
  • No one cared about messaging strategy
  • Next billion dollar idea: Gmail
  • Thought would never do ads on email content
  • Founders more open to it (just like spam filters/analysis)
  • Used content targeting for ad results
  • AdSense
  • Shared revenue with publishers
  • Project of moving all offline content online (started with books online)
  • 20% time concept came out of this prototyping that wasn't assigned
Ch. 23: Froogle and friction
  • 7-8 moves of his desk in 3 years
  • Became Director of Consumer and Brand Marketing
  • Marissa always prevailed in debates
  • Froogle launch
  • Wanted to use Google Product Search but overruled
  • But won tagline battle
  • Eventually name changed to what he had suggested
Ch. 24: Don't let marketing drive
  • Pop-up blocker in toolbar
  • Privacy versus usability
  • User search data on cookie/IP address combo
  • Larry wanted to minimize public discussion of privacy and queries
  • Google Zeitgeist
  • Display in lobby of live searches (I remember that!)
  • Strain in relationship with Marissa
  • Marketing cast out of product review meeting
  • Passed 1,000 employees
  • Coordinated TGIF meetings introducing new employees, milestones reached, revenue
  • Cash bonuses to employees
  • Yahoo bought Overture
  • Microsoft awakened to Google search
  • New recruiting idea: billboards with coding challenges inside ads 
  • Google Labs, Google Aptitude Test
  • Used Crispin Porter + Bogusky ad agency for recruiting ads
  • Googlers came up with the puzzles
  • Press loved it, got Simpsons cameo
  • Got 4,000 job applications
  • Only hired 1 engineer through it directly because didn't have data to track
Ch. 25: Mistakes were made
  • Bought SGI headquarters
  • Fully vested options now
  • Huge whiteboard of joke Google world takeover plans
  • Intranet kept growing and had ton of data
  • Slowly got locked down
  • Preparation for IPO
  • Yahoo dropped Google from search results but wasn't a problem
  • Slips of information to press
  • Orkut joined Google, created social network
  • Invite-only
  • Used Microsoft-based tech at first and then moved to Google tech
  • Wanted to launch more quickly, skipped security review
  • Users started spamming it
  • Still huge in Brazil
  • Google just let it die instead of working harder on it
  • People internally found lots of problems in Gmail
  • Bought Gmail domain
  • Launched Gmail on April Fool's but was a mistake
  • People freaked out over privacy (Google reading emails)
  • Writing S1 document to go public
Part 4: Can this really be the end?

Ch. 26: S1 for the money
  • RR Donnelley editing firm in Palo Alto reviewing S1 line by line with 20 lawyers
  • Company meeting announcement of IPO plan
  • Google wanted to do Dutch auction, which banks hated
  • Everyone had laid back attitude; no one cared too much
  • IPO not big deal
  • Everyone kept working
  • Worked to keep culture the same
  • A few people started buying toys
  • Had first earnings call
  • His role in brand management not needed anymore
  • Would wind down work after 2 months
  • Thought founders sometimes too impatient, too proud
  • Founders thought they never were wrong (he disagreed)
  • Overall, had crazy ride and changed his life
 
 
itb.biologie.hu-berlin.de
A few days ago, I was trying to remember a line in a song I had heard about 9 months ago. I heard the song at a karaoke event, and the singer had a funny accent and demeanor. I vaguely remembered the line contained an alliteration, and with that clue, combined with the funny accent, I was able to remember the moment after about 30 seconds of thought.

It was those 30 seconds that then caused me to wonder how my brain did that. First of all, I was quite surprised I even could conjure up the memory, which was quite unimportant. Thus, the fact that I could do it in 30 seconds was surprising; however, why did it have to take 30 seconds? What was going on inside my skull? Was some huge table being scanned? Some map-reduce operation being done? Were old neural connections being dusted off and re-energized with electrical current for my old memory to be resuscitated?

What's neat is that our brain consolidates memories and continues to work on solving problems and answering search queries while we sleep. What's crazy to wonder about is what part of "us" controls it while we sleep....

As far as I know, the brain doesn't operate at a typical "clock speed" like computers do (where the clock speed dictates how often a CPU goes from instruction to instruction). But what does control how quickly our brain works? Clearly it changes in speed and function over time as we age, and its speed can deteriorate with various diseases. So there must be something biological/physical that somewhat resembles clock speed. IQ? From a quick search, this article tries to tackle this question, but at a very high level (and the article's somewhat old).

That got me thinking about another clock in our body, something a lot more like the clock on our wall and in a computer: our body's internal clock (circadian rhythm). I bet there are lessons both biologists and computer scientists can learn from each other in examining the parallels between our body's clock, our brain's "clock," and our computer's clock.

And finally, how does parallel processing work? In a computer, it's like having separate little brains that can do very basic tasks like read and store numbers and arithmetic; but in our brain, is it that multiple neural connections are being formed continuously and it's just a matter of which ones happen to grasp our attention at any one time? As far as I know, people aren't really able to take a large problem, split it up into many parallel parts, and assign those different sub-problems to separate mini-brains. Or are we? Is that what intuition does? Or does intuition just leap ahead magically to some final answer and not worry about sub-problems at all?

All of these questions fascinate me and make me constantly wonder how our brains function deep inside.
 
 
I recently was lucky enough to attend a sold-out, uber-geek event featuring the creator of PHP, the programming language powering a couple small websites out there, including some you may have heard of, like Yahoo and Facebook.

The talk was put on by LAPHP, and the speaker was Rasmus Lerdorf. The topic was "PHP in 2011," and it discussed how PHP fits into the current technology stack, followed by an overview of what you should and shouldn't be doing, along with a summary of new and upcoming features in PHP 5.3 and PHP 5.4.

Rasmus Lerdorf is known for having gotten the PHP project off the ground in 1995 and has contributed to a number of other open source projects over the years. He spent 7 years at Yahoo and has since worked for and consulted with various startups. He was born in Greenland, grew up in Denmark and Canada, and has a Systems Design engineering degree from the University of Waterloo. 

The full "slides" for Rasmus's talk are here.

From the moment I found out that PHP is a recursive acronym (standing for "PHP: Hypertext Preprocessor"), I found the language cute. While I personally feel the syntax leaves much to be desired in terms of prettiness, the language clearly gets the job done.


Below are my main observations and notes on the talk.

Rasmus Background
  • Rasmus seemed like a very approachable and down-to-earth guy. Definitely doesn't flaunt nerd celebrity status.
  • Not traditional hacker background
  • He doesn't even have CS degree.
  • Likes solving problems, not writing code
  • Hated repeating code
  • Wrote PHP to program less and enjoy his weekends
  • Wrote language to be functional and easy to grasp
  • Web programming was never sexy for good programmers
  • Thus, non-technical people had to build the web.
  • Made it very easy to just add PHP tags to existing web pages and make them dynamic.
  • Wrote documentation before wrote code
  • PHP written in C
  • Was always speed freak; hated slow calls to perl interpreter
  • Wanted to embrace architecture of web: store nothing/stateless approach

PHP Background
  • Mosaic first graphical web browser came out; start of real web
  • 1993: Started with basic templating system using HTML comment tags
  • Used yacc and lex to write PHP v2
  • 1995: Introduced <? tag (processing instruction from SGML)
  • Others said should label his tag (forgot to read that part of SGML) so now <?php. He likes to read only as much to feel like he understands and then moves on (sometimes not the best practice).
  • 2003: added classes/OO
  • PHP has 1/3 of web market share
  • Only .NET close to PHP in market share
  • Yahoo switched all to PHP in 2002 across thousands of servers; stateless design allowed parallelization
  • FB using PHP
  • PHP did not allow you to modify web server like mod_perl so could be shared on ISPs (important for adoption)
  • Limiting CPU time, file system access were other strategic features for ISP growth
  • Added 5 second Atom parser to print XML content in 1 line
  • Geocoded Twitter search example
  • Worked at Yahoo for 7 years
  • Created YQL: Yahoo Query Language: like SQL selects from various web APIs
  • Geocoded Flickr photos example
  • New CSS: shadows, rotation of divs, rounded corners example

On his mind for 2011
  • PHP 5.4
  • Continuous integration
  • Node.JS and PHP
  • Libevent
  • ZeroMQ: message queuing
  • Gearman worker management in PHP-FPM: offloading jobs to other processes to run in background
  • Getting a "real" job :)
  • Rewriting 10 year old presentation system
  • He also helps startups get going well on PHP
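
The "offloading jobs to run in background" idea can be sketched like this. To be clear, this does not use Gearman or PHP-FPM at all: it is a standard-library Python stand-in where a thread and an in-process queue play the role of the separate worker processes, and `handle_request`, `worker`, and the job name are hypothetical.

```python
import queue
import threading

jobs = queue.Queue()
results = []


def worker():
    """Background worker: pull jobs off the queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        name, payload = job
        results.append(f"{name} done for {payload}")  # stand-in for slow work
        jobs.task_done()


def handle_request(user_email):
    """Request handler: enqueue the slow part and respond immediately."""
    jobs.put(("send_welcome_email", user_email))
    return "202 Accepted"


thread = threading.Thread(target=worker, daemon=True)
thread.start()
response = handle_request("user@example.com")
jobs.join()      # only for this demo: wait until the background job has run
jobs.put(None)   # tell the worker to exit
thread.join()
```

The point is that the request handler returns without waiting for the slow work, which is what keeps page latency low.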
Technologies serving tech (what he sees in the field)
  • MongoDB, MySQL, CouchDB
  • Redis, Gearman, Memcache, ZeroMQ
  • Mod_PHP APC, PHP-FPM APC 
  • Apache, nginx
  • Linux
  • Recommends nginx and PHP-FPM
  • Sees a lot out there: nginx for load balancing frontend, Apache for web server
Performance
  • Doesn't matter which web server you choose
  • Latency and throughput both roughly stable
  • Other stuff you do in generating dynamic data much more important
  • Set error reporting to -1
  • Use strace, look for ENOENT (should never happen)
  • Look at stat cache to see if fills up
  • Use a profiler, callgrind, xdebug
  • Use "Inclued" tool to see a picture of include files
  • Use HipHop for PHP to do static analysis (Facebook's compiler that translates PHP to C++ and builds it with g++). Can tell HipHop to only analyze, through the --analyze flag, to find undeclared variables and dead code.

Recommendations
  • Put static assets on different domain/subdomain for CDN
  • Keep cookies short and remember MTU size (maximum transmission unit, roughly 1420-1500 bytes per packet) for mobile. That includes all headers, so after headers maybe 600 bytes are left.
  • Don't overload web servers; don't set the max number of server processes to 200 (he never goes above 50-60 running per machine, no matter how beefy the machine; better to just add another machine)
  • Use out of band processing: Gearman or custom via ZeroMQ (he prefers Gearman)
  • Don't move relational data out of relational database
  • Do move huge non-relational tables out of relational DB
  • Be very careful about foreign keys. Avoid if possible. Can create deadlocks easily.
  • Think hard about ORM, caching strategy
  • Get on PHP 5.3
  • Tie cookies to www domain
  • Use other domain for all static content (helps w/ CDN)
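
The cookie/MTU point is easy to check with some back-of-the-envelope arithmetic. Here is a small sketch; the function `cookie_budget` and the sample headers are hypothetical, and the 40 bytes of overhead assume minimal IPv4 and TCP headers (20 bytes each, no options).

```python
def cookie_budget(request_headers, mtu=1500, ip_tcp_overhead=40):
    """Bytes left for a Cookie header if the whole request should fit in one packet."""
    # The request line and each header end with CRLF (2 bytes);
    # one extra blank line (2 bytes) ends the header block.
    header_bytes = sum(len(line.encode("utf-8")) + 2 for line in request_headers) + 2
    return mtu - ip_tcp_overhead - header_bytes


# Hypothetical request headers for illustration only.
sample_headers = [
    "GET /profile?id=12345 HTTP/1.1",
    "Host: www.example.com",
    "User-Agent: Mozilla/5.0 (Linux; Android) AppleWebKit/534.30 Mobile Safari/534.30",
    "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language: en-US,en;q=0.5",
    "Accept-Encoding: gzip, deflate",
    "Referer: http://www.example.com/home",
]
print(cookie_budget(sample_headers))            # full-size Ethernet MTU
print(cookie_budget(sample_headers, mtu=1420))  # smaller MTU, as on some mobile networks
```

Real browsers send more and longer headers than this sample, which is how the remaining budget shrinks toward the few hundred bytes mentioned in the talk.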
PHP 5.3 features
  • Better stack
  • Constants in read-only memory
  • Better exception handling
  • Performance improvements
  • MD5 faster
  • 5-15% overall faster
  • Added closures, namespaces, late static binding, garbage collector, nowdoc, restricted goto, ternary shortcut, __DIR__, __callStatic, dynamic static calls, improved date extension, date_create_from_format(), better date error reporting, SPL, FPM, OpenSSL improvements
  • His secret: he likes to turn off the garbage collector because he's 1337 and still programs in vi
PHP 5.4 performance improvements
  • FastCGI request handling, better memory handling, startup/shutdown, repeated run-time function binding, string constants, access to global constants, access to static properties, empty hashes, @ operator, unserialize(), removed features, traits, short array syntax, function array dereferencing, binary notation, improved errors, json improvements
 
 
A couple weeks ago, I went to a talk on Securing Databases in the Cloud. The speaker was Mike Frank from Gazzang, a company that sells software to help with the exact problem he was speaking about: the risks with open source software tools and cloud hosting. The talk felt like a slightly awkward mix between promotion and education, but there was enough education that I got some good stuff out of it.

I know the importance of security and am quite fanatic about having proper security practices and aiming towards zero trust policies anywhere possible. I still managed to pick up a few new things, including considering anew the security implications of cloud-based hosting.

The most striking question that Mike brought up, and the one that gave me pause, was about virtual images. Hosting on AWS is extremely popular, and users have the perception of having a dedicated server. Most of my attention when thinking about security before went to security within the server (data architecture and encryption of data in the database) rather than security of the overall server image. Mike brought up the scenario of your AWS image, a virtual machine sitting somewhere in memory and on disk, and an engineer somewhere having access to that virtual image and being able to do anything with it. How do you protect yourself in that scenario? From that perspective, it became obvious that end-to-end security and zero trust, even of the (virtualized) hardware layer, are important.

During the talk, Mike spoke about encrypting data within MySQL, PostgreSQL, Drizzle, and NoSQL databases Cassandra and MongoDB. Mike is Director of Products at Gazzang and prior to that, he was one of the senior product managers for MySQL both under Sun Microsystems and Oracle. He clearly knew his stuff.

Below are my notes from the talk.

1. Huge security risks out there. A new AWS instance spun up will get attacked (attempted) within minutes.

2. Non-obvious stuff that's important to protect:
  • DB config files, log files, data directory
  • Application source code
3. Ways to protect:
  • Linux firewall
  • AES-256, SHA-256, RSA
  • OpenSSL
  • mcrypt
  • ecryptfs
  • dm-crypt
  • Cloud provider's firewall and security
  • Encrypted cloud storage
  • Encrypted file system
  • Access control restrictions
4. Key management options:
  • In database (less ideal)
  • OS kernel key ring
  • Outside database
5. Always use SSL for transport security

6. Database encryption functions for data at rest. Keys on outside key store.

7. Gazzang's product is ezNcrypt. How they solve it:
  • On disk seamless encryption
  • Keys stored outside DB
  • Provide secure environment to run MySQL, Apache, PHP
  • Handle ACLs
  • Towards zero trust
8. Good article out there on issues with PCI compliance in the cloud

9. Gazzang built on top of ecryptfs
  • They added keys and access controls
  • All files are AES-encrypted so files stolen (like if AWS hacked) are worthless
  • Performance hit of encryption: 1% hit on transactions per second and latency.
  • Single passphrase and salt or RSA key for system
  • Each file encrypted with separate key which master key can access. This allows changing the master key without re-encrypting all data (that's smart).
  • Can also use their product to do PHP and perl encryption.
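
The per-file key idea is worth sketching, since it explains why the master key can be changed cheaply. This is my own toy illustration, not Gazzang's or ecryptfs's implementation: `toy_cipher` is a deliberately simplistic XOR keystream that merely stands in for AES (no nonce, no authentication, not safe for real use), and all function names are hypothetical.

```python
import hashlib
import os


def toy_cipher(key, data):
    """XOR data with a SHA-256-derived keystream. TOY ONLY, stands in for AES."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_file(master_key, plaintext):
    """Encrypt with a fresh per-file key, then wrap that key with the master key."""
    file_key = os.urandom(32)
    return {
        "wrapped_key": toy_cipher(master_key, file_key),
        "ciphertext": toy_cipher(file_key, plaintext),
    }


def decrypt_file(master_key, record):
    """Unwrap the per-file key with the master key, then decrypt the contents."""
    file_key = toy_cipher(master_key, record["wrapped_key"])
    return toy_cipher(file_key, record["ciphertext"])


def rotate_master_key(old_master, new_master, record):
    """Re-wrap only the small file key; the ciphertext is left untouched."""
    file_key = toy_cipher(old_master, record["wrapped_key"])
    record["wrapped_key"] = toy_cipher(new_master, file_key)
```

Rotating the master key only touches 32 bytes per file, no matter how large the encrypted data is, which is the property I called smart in the notes above.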