Ask me anything technical!

    1. huncheng

      I think the Django / Rails decision is really a matter of personal preference. You can hire great programmers for both, lots of great sites have been built with both, and there are tons of libraries and tools for both.

      I haven't used Drupal so I can't really comment on it. My impression is that it's much better suited to portal-type applications where you can leverage the large number of existing plugins, but again, I shouldn't really comment as I'm speaking in ignorance.

      We've ended up using less and less of Django over time after replacing chunks with our own custom bits. We still heavily use the templating system though.

    2. huncheng

      Yep, you can get recommendations for Facebook or Twitter users. Twitter user data is public since it's based on the public Twitter follow graph. Predictions for Facebook users require the Facebook user to authenticate with us via OAuth, which authorizes you to access their Hunch predictions.

    3. huncheng

      We think of everything as a "user", an "item", or a "preference" between a user and an item. So in this case you might represent Dropbox usage by treating every type of file as an item, with a preference of "liking" or "4 stars" between the user and the file type. That way you can model that some people "like" PowerPoint files, others like AutoCAD files, and others like tar.gz files.
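
      A minimal sketch of the user / item / preference model described above. The class name, field names, and sample data are hypothetical and chosen only for illustration; this is not Hunch's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preference:
    """One edge between a user and an item (all names here are hypothetical)."""
    user: str
    item: str
    rating: int  # e.g. 4 for "4 stars"


# Made-up sample data: each file type is treated as an item.
preferences = [
    Preference("alice", "filetype:pptx", 4),
    Preference("alice", "filetype:tar.gz", 2),
    Preference("bob", "filetype:dwg", 4),
    Preference("carol", "filetype:tar.gz", 5),
]


def liked_items(prefs, user, min_rating=4):
    """Return the items a user rated at or above min_rating, sorted by name."""
    return sorted(p.item for p in prefs if p.user == user and p.rating >= min_rating)
```

      Here `liked_items(preferences, "alice")` returns only the PowerPoint file type, since her tar.gz rating falls below the threshold.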

    4. huncheng

      We currently use 50. It's kind of an arbitrary number. More factors do increase our predictive ability, but they also make the system slower, and with too many factors you risk over-fitting to your training data. 50 is a nice trade-off between accuracy and speed for now.
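
      To make the role of the factor count concrete, here is a toy sketch that assumes "factors" means latent factors in a matrix-factorization-style model. The answer does not say which model is used, so the SGD training loop, the hyperparameters, and the data below are all illustrative assumptions.

```python
import math
import random


def train_factors(ratings, n_factors, epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit one vector of length n_factors per user and per item with plain SGD."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for u, i, r in ratings:
            pu, qi = P[u], Q[i]
            err = r - sum(a * b for a, b in zip(pu, qi))
            # Work per rating grows linearly with n_factors.
            for f in range(n_factors):
                puf, qif = pu[f], qi[f]
                pu[f] += lr * (err * qif - reg * puf)
                qi[f] += lr * (err * puf - reg * qif)
    return P, Q


def predict(P, Q, user, item):
    """Predicted rating is the dot product of the user and item vectors."""
    return sum(a * b for a, b in zip(P[user], Q[item]))


def rmse(P, Q, ratings):
    """Root-mean-square error of the predictions on the given ratings."""
    se = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(se / len(ratings))


# Made-up (user, item, rating) triples.
ratings = [
    ("u1", "a", 5), ("u1", "b", 1),
    ("u2", "a", 4), ("u2", "c", 2),
    ("u3", "b", 1), ("u3", "c", 5),
]
```

      Note that `rmse` on the training ratings is exactly the number that keeps improving as factors are added; over-fitting would only show up when scoring ratings held out from training.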

    5. huncheng

      We don't, though not for any real reason other than lack of time to try it out. We use MySQL for most of the site and a bunch of custom stuff for batch processing the taste graph edge and node lists.

    6. huncheng
    7. huncheng

      Yeah, for internal graphs. For user-facing charts we used to use ChartDirector from http://www.advsofteng.com/, though we don't have any user-facing graphs any more.

    8. huncheng

      There's not really any one thing I recommend. Fundamentally, you have to understand how EVERYTHING on your site works (hardware, dbs, web servers etc) in order to understand how bottlenecks form.

      Second, I recommend reading websites' tech blogs to see how they really work, and reading resources like http://www.mysqlperformanceblog.com/ and http://developer.yahoo.com/performance/

    9. huncheng
    10. huncheng

      Right now it's just returning the most recent X "preferences" (ratings), so it could cover any amount of time depending on how active your friends are. You can also restrict it to only returning activity on a specific item if you want more depth on one thing.
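
      The behavior described above (newest X preferences, optionally limited to one item) can be sketched as follows. The function name, record fields, and sample feed are hypothetical and do not reflect the actual Hunch API.

```python
def recent_preferences(prefs, limit, item=None):
    """Return the `limit` newest preferences, optionally for one item only."""
    if item is not None:
        prefs = [p for p in prefs if p["item"] == item]
    # Sort by recency, not by a time window, so quiet friends reach further back.
    return sorted(prefs, key=lambda p: p["created"], reverse=True)[:limit]


# Made-up activity from friends; "created" is an arbitrary increasing timestamp.
feed = [
    {"user": "ann", "item": "inception", "rating": 5, "created": 100},
    {"user": "ben", "item": "avatar", "rating": 2, "created": 250},
    {"user": "cat", "item": "inception", "rating": 4, "created": 300},
]
```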

    11. huncheng

      We all had some amount of academic background, but there's a big gap between what makes a good ML paper and what makes a good recommendation product :)

    12. huncheng

      We have a pretty standard relational model for most of our data.

      The major non-relational data for us is the taste graph, which, as its name implies, is a graph structure.

    13. huncheng

      It really depends on the type of requests and the read/write breakdown of them. Requests that write data are generally more expensive than requests that just read data.

      Writing data frequently involves synchronizing and persisting data in a db, which is usually the least scalable part of your app. Reads can be parallelized by serving them from caches or db replicas, so reads are much cheaper for us.
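
      A toy illustration of why reads scale more easily than writes: every write goes through a single primary, while reads are spread over a cache and several replicas. The `Store` class is hypothetical, uses in-memory dicts as stand-ins for real databases, and copies to replicas synchronously, which real replication typically does not.

```python
import itertools


class Store:
    """In-memory stand-in for a primary db, read replicas, and a cache."""

    def __init__(self, n_replicas=2):
        self.primary = {}
        self.replicas = [{} for _ in range(n_replicas)]
        self.cache = {}
        self._next_replica = itertools.cycle(range(n_replicas))
        self.primary_writes = 0
        self.cache_hits = 0

    def write(self, key, value):
        # Every write funnels through the one primary: the bottleneck.
        self.primary[key] = value
        self.primary_writes += 1
        for replica in self.replicas:  # simplification: synchronous replication
            replica[key] = value
        self.cache.pop(key, None)  # invalidate any stale cached value

    def read(self, key):
        # Cheapest path first: the cache, then a replica chosen round-robin.
        if key in self.cache:
            self.cache_hits += 1
            return self.cache[key]
        value = self.replicas[next(self._next_replica)].get(key)
        if value is not None:
            self.cache[key] = value
        return value
```

      Adding replicas or cache nodes raises read capacity in this model, but `write` still touches the single primary every time.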

    14. huncheng

      We all code on Macs and/or Linux machines we ssh into. I can't imagine doing dev on Windows. It's great being able to run pretty much the same software stack on my laptop as well as on a Linux box.

    15. huncheng
    16. huncheng
    17. huncheng

      We use svn, which isn't really a statement about our philosophy on revision control. We like git as well, but it's just not what we ended up picking.

      We try to avoid branching if at all possible. We encourage everyone to commit often and to develop large projects as a series of small self-contained changes that can quickly go out to the production site. We typically update the site a few times a day.

    18. huncheng
    19. huncheng

      Nope. I've heard that its join performance is pretty poor. I think it's designed to handle workloads that focus more on simple queries, inserts and updates.

    20. huncheng

      We try to break things down into small projects. A typical project might last from part of a day to at most a week. Generally you can't tell if you're behind on an N day schedule until N/2 days have elapsed, so we like short project durations.

      We like to hire programmers who can also serve as mini-project managers for what they're working on. So we rely on their good sense over building incredibly detailed specifications of what to do.

      We also like to create projects that require little coordination with anyone else. Like any good parallel algorithm, coordination and communication costs ultimately limit how closely N people come to doing N times more work than a single person.

      Finally, we always ask people to come up with their own schedules. No one believes in or will commit to a schedule that is forced on them. We don't worry about people taking too long. If anything, most good developers are prone to overestimating how much they can get done, not underestimating.
