Data?, Data!, Data…

  • Don’t Become a Victim of One Key Metric
    • Uses several case studies (Pinterest, Grubhub) to advise against swinging between joy and despair over a single number.
    • The Grubhub case in particular is, I think, worth a close look.
    • Pick a metric that correlates the most to success, and make sure it is an activity metric, not a vanity metric. In principle, this solves a lot of problems.
    • Figure out the portfolio of metrics that matter for a business and track them all religiously. You will always have to make tradeoffs between metrics in business, but they should be done explicitly and not hide opportunities.
  • Debate Night Twitter: Analyzing Twitter’s Reaction to the Presidential Debate.
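    • Walks through how the author analyzed the US presidential candidates on Twitter.
    • Personally, I thought the ‘data collection process’ was well done, and in the analysis process…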
    • I turned on my streamer at 8:30PM est, 30 minutes before the start of the debate and turned it off at 11:15PM est, 45 minutes after the end of the debate.
    • So, I decided to drop these retweets from my dataset, which reduced my count to 150k tweets. Obviously there were millions of tweets during the debate, but I’m confident my data was a representative sample.
    • As you can see, Twitter users took to their preferred language of communication, emojis, to express their opinions on the debate. Sometimes words aren’t enough to truly grasp one’s mood and feelings.
  • Reading: “Mining Large Streams of User Data for Personalized Recommendations”
    • Explains how online experiments can be used to measure algorithms and the improvements made to them.
    • Worth reading if you are interested in how the results of large-scale tests are verified.
    • Netflix had started to optimise for user experience metrics like engagement, freshness, trust, and retention. They came up with a feedback loop of forming hypotheses, training offline models, and experimenting with them online (via A/B tests). They were able to iterate fast, reject/accept their hypotheses, and reason about the results for hundreds of features. A bunch of their recent research papers demonstrate this.
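
The accept/reject step of an online A/B test mentioned in the last item can be illustrated with a textbook two-proportion z-test. This is only a minimal sketch, not Netflix's actual method: the function name, the conversion counts, and the 0.05 threshold are all assumptions made for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / n_a: conversions and users in the control group (hypothetical).
    conv_b / n_b: conversions and users in the treatment group (hypothetical).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups behave the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value of a standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Made-up counts, purely for illustration.
    z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
    decision = "reject" if p < 0.05 else "keep"
    print(f"z={z:.2f}, p={p:.4f}, {decision} the null hypothesis")
```

A real experimentation platform would also have to deal with sample-size planning and with running many tests at once, which this sketch ignores.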

Python with PY Family

  • Deep Reinforcement Learning: Playing a Racing Game
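    • A tutorial article on playing ‘Out Run’ with reinforcement learning (DQN).
    • All of the tutorial's materials are public, so a careful read will help anyone who is interested. Actually following along takes more resources than you might expect, though, so I recommend putting a team together before attempting it.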
  • Build a Slack Bot that Mimics Your Colleagues
    • A tutorial article on building a Slack bot that mimics your colleagues using Markov chains.
    • The full code is provided, so you can build a simple bot by modifying a few parts.
  • Solving Performance Problems in the Django ORM
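    • An article on performance problems in the Django ORM.
    • As someone still studying Django, I could not follow everything, but I learned about several Django ORM features that are otherwise hard to discover.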
    • All frameworks require upfront knowledge about how the internals work in order to write high performance code. Django is fast, but sometimes it allows you to unwittingly write slow code.
    • You should first seek to make your code clear and then work on optimizing it. As your app grows, it is important to practice good hygiene when working with the ORM. Developing good habits now regarding consumption of resources will lead to big benefits later.
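
As a rough illustration of the Markov chain idea behind the Slack bot tutorial listed above, the sketch below builds a first-order word chain from a few messages and walks it to produce new text. It is not the tutorial's code: `build_chain`, `generate`, and the sample messages are hypothetical, and there is no Slack integration here.

```python
import random
from collections import defaultdict


def build_chain(messages):
    """Map each word to the list of words that followed it in the messages."""
    chain = defaultdict(list)
    for message in messages:
        words = message.split()
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def generate(chain, start, max_words=20, rng=None):
    """Walk the chain from `start`, picking a random follower at each step."""
    rng = rng or random.Random()
    word = start
    output = [word]
    for _ in range(max_words - 1):
        followers = chain.get(word)
        if not followers:
            break  # Dead end: this word was never followed by anything.
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


if __name__ == "__main__":
    # Made-up messages standing in for a colleague's chat history.
    history = [
        "the build is broken again",
        "the deploy is done",
        "the build is green now",
    ]
    print(generate(build_chain(history), start="the"))
```

Repeated followers stay in the list on purpose, so more frequent word pairs are proportionally more likely to be chosen.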