Programming Language, Week1

Programming Language by Dan Grossman, Coursera

Coursera PL 클래스인데 과제 마감기한도 까다롭고, 동료평가도 있고, 여러모로 조금 빡세다. 유일한 낙은 언어의 다양한 특징들을 탐구하기 위해 ML 을 사용하고 ~~오오 갓 ML~~ , emacs 를 사용한다는 건데.. 잘 버틸수 있을까 의심스럽다. 무려 첫 강의부터 대략 300분이 넘는 동영상을 올려주시는 교수님 -_-;

미국 CS 전공자들은 모두 이렇게 빡세게 배우는가요 ㅠㅠ?

SML/nj 와 Emacs 에 sml-mode 설치 후 시작한다. c-c, c-s 는 REPL 을 켠다.

ML Variable Bindings and Expressions

(* This is a comment. *)

val x = 34;  
(*: static env: x : int *)
(*: dynamic env: x --> 34 *)
val y = 35 : int;  
val z = (x + y) + (y + 2);

각 라인마다, dynamic environment 에 바인딩을 하나씩 추가한다. 따라서 세번 째 라인에서 x 와 y 를 z 를 바인딩하기 위해 사용할 수 있다.

그리고, dynamic env 에 바인딩을 추가하기 전에 static enviornment 에 Type 을 추가한다. 맨 처음엔 x: int 가 추가되고 y 와 값이 바인딩 되기 전에는 x: int, y: int 가 static env 에 추가된다.

따라서 매 라인마다 type checking 이 먼저 일어나고 그 후에야 프로그램이 evaluated (excuted) 된다.

ML 에서 if 는 다음과 같이 작성할 수 있다. 다르 언어와 비슷하다.

val abs_of_z = if z < 0 then 0 - z else z;

여기서 잠깐 Syntax 와 Semantics 를 정리하자면

Syntax is just how you write something

Semantics is what that something means
– Type-checking (before program runs)
– Evaluation (as program runs)

variable binding 에서는, type checking 은 static environment 를 확장하고, evaluation 은 dynamic environment 를 확장한다.

Rules for Expressions

expression 은 sub-expression 을 가질 수 있기 때문에, expression 을 해석 하는데 있어서 3가지가 꼭 필요하다.

(1) Syntax
(2) Type-checking rules: produces a type of fails
(2) Evaluation rules: produces a value

다시 말하면 Syntax 와 Semantics 가 필요하단 이야기다.

Variables

Syntax: sequence of letters, digits, _, not starting with digit

Type-checking: look up type in current statix env if
non there, fail

Evaluation: look up value in current dynamic env

Addition

Addition 의 경우에는 sub-exp 가 있을 수 있다.

Syntax: e1 + e2 where e1 and e2 are expressions

Type-checking: if e1 and e2 have type int then int

Evaluation: if e1 evaluates to v1 and e2 evaluates to v2, then sum of v1 and v2

Values

모든 value 는 expression 이다. 그러나 모든 expression 이 value 인 것은 아니다. 그리고,

Every value evaluates to itself in zero steps

참고로, () 는 unit 타입을 가진다.

Conditional

Syntax: if e1 then e2 else e3 where if, then, and else are keywords and e1 e2 and e3 are sub-expressions

Type-checking: e1 must have type bool. e2 and e3 can have any type, but they must have the same type

Evaluation: first evaluate e1 to v1, if it’s true evaluate e2 and that resut is the whole expressions’s result else evaluate e3

REPL and Errors

use

use 는 특정 파일을 읽어 binding 하고 it: unit 을 돌려주는데, 이건 무시해도 된다. 그리고 같은 파일을 use 할때는 항상 REPL 을 다시 시작하자. C-d, C-c, C-s

Error

대부분의 에러는 syntax, type-checking, evaluation 의 문제다.

Shadowing

같은 변수에 대한 multiple binding 은 poor style 이다. 그러나 이를 통해 environment 와 binding 이 어떻게 동작하는지 알 수 있다.

val a = 10;  
val b = a * 2  
val a = 5; (* this is not an assignment statement *)  
(* a -> 5, b -> 20 *)
val c = 2;  
(* a -> 5, b -> 20, c -> 20 *)

val a = 5 문장은, 할당하는게 아니라 a 를 shadowing 한다. ML 에서는 mutate 할 수 없다. 매번 새롭게 dynamic env 를 만든다.

val d = a  
(* ..., d -> 5 *)
val a =  a + 1  
(* ..., a -> 6 *)
val f = a * 2

use 를 이용하면, 기존의 a 의 값이 <hidden-value> 로 나오는 것을 확인 할수 있다.

val a = 1;  
val b = a;  
val a = 2;

다음과 같은 예제가 있을 때, b 는 1 이다. eagerly evaluated 되어 바인딩 후에는 value 를 만든 expression 과는 관련이 없어진다. 다시 말해 바인딩 후에는 a 와 b는 상관이 없다.

그리고 위에서 언급 했듯이 ML 에서는 assign to 가 없고, 앞의 a 는 뒤의 a 에가 있는 dynamic env 에 의해서 가려질 뿐이다.

그렇기 때문에 REPL 을 재시작 하지 않고서 같은 파일을 여러번 use 하면 문제가 생길 수 있다고 교수님이 누차 말한 것

Functions Informally

fun pow (x: int, y: int) =  
  if y = 0
  then 1
  else x * pow(x, y-1)

fun cube(x: int)  
  pow(x, 3)

이 경우 두 함수 모두 타입은 fn: int -> int 다. 타입을 이름 뒤에 사용하는것도 그렇고, 함수 타입도 그렇고 스칼라와 문법이 비슷한 것 같다.

* 가 타입에 있을때는 곱셈이 아니라 , 같은 역할을 한다. 따라서 다음과 같이 pow 를 호출할 수 있다.

val x : int * int = (2, 3)  
val y = pow x

참고로, 함수를 사용한 후 정의하는 것은 불가능하다. 따라서 사용하는 expression 위에 함수를 정의해야 한다.

Recursion

재귀에 대해서도 간단히 언급을 하는데, 문제를 간단한 방법으로 나누어 푸는 좋은 기술이라고..

Functions Formally

우리가 Function 이 무엇인지 PL 에서 정의하려면 위에서 언급했듯이 syntax 와 semantics 가 필요하다.

Syntax: fun x0 (x1: t1, ... , xn: tn) = e

Evaluation: A function is a value. x0 is added to dynamic env

Type-checking: (t1 *, ..., * tn) -> t

타입체킹이 조금 복잡한데, e 가 t 타입을 가지는지 검사하고, 파라미터도 마찬가지로 올바른 타입을 가지는지 검사한다.

다른 언어와 마찬가지로 t1 등의 파라미터는 e 를 위한 environment 에만 추가된다.

x0 이 dynamic, static env 에 추가되므로 이후의 코드에서 recursion 을 사용할 수 있다.

Function calls

Function calls 의 syntax 는 e0 (e1, ..., en) 이다. 만약에 인자가 하나라면 괄호(parentheses) 는 없어도 된다.

참고로 ML 에서는 variable numbers of arguments 를 함수에서 받을 수 없다. 인자의 개수가 정해져야 한다.

type-checking 의 경우에는 , e0 이 (t1 * ... * tn) -> t 인지 검사하고 en 이 tn 타입을 가지면, e0 은 t 타입이다.

Evaluation 스텝은 다음과 같다.

(1) evaluate e0 to fun x0(x1: t1, ... , x: tn) = e
(2) evaluate arguments e1, … , en to v1, …, vn
(3) extend dynamic env mapping x1 to v1 , … , xn to vn

두 번째 스텝에서는 eager evaluation 이 사용되는데 pow(2, 2+2) 같은 경우 인자가 2, 4 가 된다.

세 번째 스텝에서는 dynamic environment 를 확장하는데, 현재 함수인 x0 과 인자들인 xn 을 포함하도록 한다. 따라서 recursion 이 가능하다.

사실은 스칼라의 그것과 같은데 교재에 나온 설명이 너무 함축적이어서 이해하기가 어렵다.

Pairs and Other Tuples

위에서 잠깐 보았던 t1 * t2 같은 것들이 Pair 다. 다른말로 2-tuples 라 부른다.

Syntax 는 (e1, e2) 로 Type-checking 은 e1 과 e2 가 올바른 타입을 가졌는지 검사한다.

Pair 에 접근할때는 #1 e 또는 #2 e 와 같은 Syntax 를 사용하고, e가 t1 * t2 타입인지, 그 후에 #1 e 가 t1 또는 #2 e 가 t2 타입을 가졌는지 검사한다.

fun su_two_pairs (pr1: int * int, pr2: int* int) =  
  (#1 pr1) + (#2 pr1) + (#1 pr2) + (#2 pr2)

의 경우에는 타입이 (int * int) * (int * int) -> int 된다. 그리고 다른 언어와 마찬가지로 tuple 도 겹칠 수 있다.

val x1 = (7, (true, 9)) // int * (bool * int )  
val x2 = #1 (#2 x1) // true

Introducing Lists

Tuple 은 여러 타입을 가질 수 있지만, 정해진 갯수만큼의 element 만 저장할 수 있다. 반면 List 는 하나의 타입만 가져야 하지만, 원소의 갯수가 변할 수 이다.

빈 리스트는 [] 와 같이 만든다. [3, 4, 5] 는 int list 다. [(1+2), 3, 7] 과 같이 초기화하면 [3, 3, 7] 이 나온다. 리스트는 그 자체로 value 다.

:: 는 cons 라 발음하고, 다음과 같이 쓸 수 있다.

val x = [3, 4, 5]  
val y = 2 :: x

cons 뒤에 오는것은 List 여야 한다. 리스트가 비었는지 검사하기 위해 null 을, head 를 얻기 위해 hd 를, tail 을 얻기 위해 tl 을 이용한다. 따라서 tl [3, 4, 5] 는 [4, 5] 를 돌려준다.

그리고 다른 함수형 언어와 마찬가지로 tl [9] 는 [](nil) 이다.

리스트는 다양한 타입을 가질 수 있기 때문에 (int * int) list 같은 것도 타입이 될 수 있다. [(3, 4), (5, 6)] 처럼.

null 은 fn: a list -> bool 타입이고,
hd 는 fn: a list -> a,
tl 은 fn: a list -> a list

[]: a list 는 좀 특이한데, 다양한 타입이 될 수 있다. 3 :: [], false :: [] 이라던지.

List Functions

리스트를 조작하는 간단한 함수를 ML 로 몇 개 짜보자.

fun pow(x: int, y: int) =  
    if y = 0 then 1 else x * pow(x, y - 1);

fun cube(x: int) =  
    pow(x, 3);

fun sum_list(xs: int list) =  
    if null xs
    then 0
    else hd xs + sum_list(tl xs)

fun product_list(xs: int list) =  
    if null xs
    then 1
    else hd xs * product_list(tl xs)

fun countdown(x: int) =  
    if x = 0
    then []
    else x :: countdown(x - 1)

fun append(xs: int list, ys: int list) =  
    if null xs
    then ys
    else (hd xs) :: append(tl xs, ys)

fun sum_pair_list(xs: (int * int) list) =  
    if null xs
    then 0
    else (#1 (hd xs)) + (#2 (hd xs)) + sum_pair_list(tl xs)

fun firsts(xs: (int * int) list) =  
    if null xs
    then []
    else #1 (hd xs) :: firsts(tl xs)

fun seconds(xs: (int * int) list) =  
    if null xs
    then []
    else #2 (hd xs) :: seconds(tl xs)

fun sum_pair_list(xs: (int * int) list) =  
    if null xs
    then 0
    else sum_list(firsts xs) + sum_list(seconds xs)

fun factorial(n : int) =  
    product_list(countdown(n))

참고로 # 이 + 이나 :: 보다 우선순위가 높다.

List Recursion

재귀에 대해 생각할땐, 항상 명심해야 하는게 있는데 탈출 조건 이다. 따라서 empty-list 에 대해선 어떤걸 돌려줄지, non-empty-list 에 대해서는 무엇을 처리해야 할지 항상 생각해야 한다.

Let Expressions

let 은 local variable 을 바인딩하는 법이다.

Syntax: let b1 b2 ... bn in e end. Each b1 is any binding and e is any expression

Type-checking: Type of whole let-expression is the type of e. Type-check each b1 and e in a staic env that includes the previous bindings

*Evaluation: * evaluate each b1 and e in a dynamic env that includes the previous bindings. Result of whole expression is result of evaluating e

fun silly () =  
    let
      val x = 3
    in
      (let val x = 2 in x + 1 end) + (let val y = x + 1 in y + 1 end)
    end

여기서 Scope 의 개념이 나온다.

Scope: Where a binding is in the environment

Nested Functions

Function 은 binding 이다. 따라서 let 내부에서 local binding 할 수 있다.

fun count_from_1 (x: int) =  
    let
    fun count (from: int, to: int) =
        if from == to
        then []
        else from :: count(from + 1, to)
    in
    count(1, x)
    end

이렇게 하면 top-level 에서 count 는 사라진다. 그리고 엄밀히 말해서 count 가 가진 environment 에는 to 가 있기 때문에, to 를 인자로 가질 필요가 없다.

fun count_from_1 (x: int) =  
    let
    fun count (from: int) =
        if from = x
        then []
        else from :: count(from + 1)
    in
    count(1)
    end

Let and Efficiency

가장 큰 숫자를 찾는 다음의 함수를 고려 해 보자

fun bad_max (xs : int list) =  
  if null xs
  then 0
  else if null (tl xs)
  then hd xs
  else if hd sx > bad_max(tl xs)
  then hd xs
  else bad_max(tl xs)

이 함수는 [1, 2, ... , 30] 과 같은 리스트에 굉장히 나쁜 성능 exponentially (2^30) 을 보여준다. bad_max(tl xs) 를 두번 호출하기 때문이다. 따라서 max(tl xs) 를 변수로 놓아 캐싱하면

fun good_max (xs: int list) =  
    if null xs then 0
    else if null (tl xs) then hd xs
    else
    let
        val res = good_max(tl xs)
    in
        if hd xs > res then hd xs
        else res

    end

bad_max 의 if-then-else 가 10^-7 정도의 시간이 든다고 하면, [1, 2, ..., 55] 는 100년이 넘게 걸린다. 따라서 재귀를 구현하는데 있어서 local binding 은 필수다.

Options

참고로, good_max 는 리스트가 모두 음수일때 0 을 돌려준다. 이건 리스트가 비었을때 0 을 돌려주기 때문에 생기는 문제인데, 0 말고 다른 무언갈 돌려줄 수 없을까?

SML 도 Scala 처럼 Option 을 지원한다.(물론 SML 이 먼저..) NONE 은 a option 타입이고, SOME e 는 t option 이다. t 는 e 의 타입.

Option 을 이용해 max 를 리팩토링 해 보면,

fun max1 (xs: int list) =  
  if null xs NONE
  else 
    let val res = max(tl xs)    
    in if isSome res andalso isVal res > hd xs then res
       else Some(hd xs) 
    end

그런데, 이 max1 또한 문제가 있다. 사실 [] 는 리스트가 오름차순으로 구성되어있을때 ([1, 2, 3, 4]) 맨 마지막 호출에서만 오는데, 매번 isSome 으로 검사하니까 비효율적이다. 다시 리팩토링하면

fun max2 (xs: int list) =  
    if null xs then NONE
    else
    let
        fun max_non_empty(ys: int list) =
        if null (tl ys) then hd ys
        else
            let val res = max_non_empty(tl ys)
            in if hd xs > res then hd ys
               else res
            end
    in
        SOME (max_non_empty xs)
    end

~~교수님 let in end 는 제발그만 가..가독성이~~

Booleans and Comparison Operations

다른 언어의 && (and) 와 || (or) ! (not) 은 SML 에서는 ansalso, orelse, not 이다.

그리고 andalso, orelse 연산자는 다른 언어처럼 Short circuiting 을 제공한다. 함수가 아니다.

andalso 와 orelse, not 은 다음처럼 쓸 수도 있다.

(* andalso *)
if e1  
then e2  
else false

(* orelse *)
if e1  
then true  
else e2

(* not *)
if e1  
then false  
else true

if-then-else 만 가지고도 할 수 있으나, 그냥 andalso 와 orelse, not 을 쓰는게 코드를 읽는 사람의 정신 건강에 좋다.

Comparisons

비교의 경우엔 == 가 아니라 = 를 쓴다. != 가 아니라 <> 를 쓴다.

그리고 3.0 > 2 와 같은 비교는 안된다.3.0 은 real 이고, 2 는 int 다 Real.fromInt 를 이용하자.

No Mutation

교수님이 A valuable non-feature 라고 이야기 하시는데, 기능이 없는게 장점이라고.. SML 에서는 아래의 두 코드가 같다. (inditinguishable)

fun sort_pair(pr: int * int) =  
  if #1 pr < #2 pr then pr
  else (#2 pr, #1 pr)

fun sort_pair(pr: int * int) =  
  if #1 pr < #2 pr then (#1 pr, #2pr)
  else (#2 pr, #1 pr)

위가 더 나은 style 이라고 주장할 순 있지만, 다르다고 말할 순 없다. 그러나 mutable compound data 를 다루는 다른 언어에서는 위 두 함수는 다르다

예를 들어서 mutable data 를 다루는 언어라 가정하고 다음의 코드를 고려 해 보자.

val x = (3, 4)  
val y = sort_pair x

(* somehow mutate #1 x to hold 5 *)

val z = #1 y

이제 z 의 값은 무엇일까? sort_pair 구현에 따라 다르다. then 에서 pr 을 돌려줬다면, 5 일거고, (#1 pr, #2 pr) 을 돌려줬다면 3 일거다. 그러나 ML 에선 문제가 안된다. immutable 하니까. ~~오오 immutable 오오~~

But without mutation, we can implement either way

– No code can ever distinguishe aliasing vs identical copies
– No need to think about aliasing: focus on other things
– Can use aliasing, which saves space, without danger

ML 에서는 tl 이 상수 시간내에 이뤄진다. 첫번째 원소를 제외한 나머지를 가리키기만 하면 되니까. 어차피 그 데이터는 immutable 이니까 그냥 쓰기만 하면 된다. 변경이 필요하면, 그 때 새로 만들면 된다.

Java Mutation

public String[] getAllowedUsers() {  
  // return a references of allowedUsers
}

allowedUsers 자체가 private 여도, 다음과 같은 코드에 취약하다.

p.getAllowedUsers()[0] = p.currentUser();  
p.useTheResource();

따라서 위의 getAllowedUsers 는 copy 를 돌려줘야한다.

ML 같은 immutable 한 언어에서는 Reference (Alias) vs Copy 는 문제가 안된다.

Pieces of a Language

언어를 배울때는 다음의 다섯가지를 고려해야한다.

(1) Syntax: How do you write language constructs?
(2) Semantics: What do programs mean? (Evaluation rules)
(3) Idioms: What are typical patterns for using language features to express your computation
(4) Libraries: What facilities does the language provide standard?
(5) Tools: What do language implementations provide to make your job easier? (REPL debugger..)

다시 말하면 다섯가지를 모두 배워야 한다. 언어의 코어부터 그 확장인 라이브러리와 툴까지. 그러나 이 코스에서는 Semantics 와 Idioms 에 집중한다. ML 을 고른것도 그 이유고, 이걸 잘 이해하게 되면 Libraries 가 어떻게 구성되었는지 더 잘 이해할 수 있다.

Summary

처음엔 재귀가 어려웠는데, 시간이 지날수록 재귀가 얼마나 재밌고 강력한지 알게 된다. 함수를 building block 처럼 조립하는것도 너무 재밌고

Lisp 을 이용했으면 개인적 취향에 맞아 더 재밌게 배울 수 있었을텐데. 나중에 수업이 모두 끝났을때 ML 을 고른 이유를 느낄 수 있었으면 좋겠다.

그리고 Emacs sml-mode 가 좀.. 음.. 빈약한데 더 좋은걸 찾아야겠다. MELPA 엔 없던데.. 테스팅 프레임워크도 좀 찾아서 해보고.

Coursera Emacs ml programming language

Array

Programming Language, Week1