NoSQL 2011. 11. 28. 04:48
1. [URL:http://www.igvita.com/2011/08/01/protocol-buffers-avro-thrift-messagepack/]

Protocol Buffers, Avro, Thrift & MessagePack

Perhaps one of the first inescapable observations that a new Google developer (Noogler) makes once they dive into the code is that Protocol Buffers (PB) is the "language of data" at Google. Put simply, Protocol Buffers are used for serialization, RPC, and about everything in between.

Initially developed in the early 2000s as an optimized server request/response protocol (hence the name), Protocol Buffers have become the de-facto data persistence format and RPC protocol at Google. Following a major (v2) rewrite, Protocol Buffers was open sourced by Google in 2008 and now, through a number of third-party extensions, can be used across dozens of languages - including Ruby, of course.

But, Protocol Buffers for everything? Well, it appears to work for Google, but more importantly I think this is a great example of where understanding the historical context in which each was developed is just as instrumental as comparing features and benchmarking speed.

Protocol Buffers vs. Thrift

Let's take a step back and compare Protocol Buffers to the "competitors", of which there are plenty. Between PB, Thrift, Avro, and MessagePack, which is the best? The truth of the matter is that they are all very good, and each has its own strong points. Hence, the answer is as much a matter of personal choice as of understanding the historical context of each and correctly identifying your own, individual requirements.

When Protocol Buffers was first being developed (early 2000's), the preferred language at Google was C++ (nowadays, Java is on par). Hence it should not be surprising that PB is strongly typed, has a separate schema file, and also requires a compilation step to output the language-specific boilerplate to read and serialize messages. To achieve this, Google defined their own language (IDL) for specifying the proto files, and limited PB's design scope to efficient serialization of common types and attributes found in Java, C++ and Python. Hence, PB was designed to be layered over an (existing) RPC mechanism.
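For illustration, a minimal proto2 schema of the kind fed to the protoc compiler (the message and field names here are invented for the example):

message Person {
  required string name = 1;
  optional int32 id = 2;
  repeated string email = 3;
}

Running protoc over such a file emits the language-specific classes with typed accessors and the serialization boilerplate.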

By comparison, Thrift, which was open sourced by Facebook in late 2007, looks and feels very similar to Protocol Buffers - in all likelihood, there was some design influence from PB there. However, unlike PB, Thrift makes RPC a first-class citizen: the Thrift compiler provides a variety of transport options (network, file, memory), and also tries to target many more languages.

Which is the "better" of the two? Both have been production tested at scale, so it really depends on your own situation. If you are primarily interested in the binary serialization, or if you already have an RPC mechanism then Protocol Buffers is a great place to start. Conversely, if you don't yet have an RPC mechanism and are looking for one, then Thrift may be a good choice. (Word of warning: historically, Thrift has not been consistent in their feature support and performance across all the languages, so do some research).

Protocol Buffers vs. Avro, MessagePack

While Thrift and PB differ primarily in their scope, Avro and MessagePack should really be compared in light of more recent trends: the rising popularity of dynamic languages, and of JSON over XML. As almost every web developer knows, JSON is now ubiquitous, and easy to parse, generate, and read, which explains its popularity. JSON also requires no schema, provides no type checking, and is a UTF-8 based protocol - in other words, easy to work with, but not very efficient when put on the wire.

MessagePack is effectively JSON, but with an efficient binary encoding. Like JSON, there is no type checking and there are no schemas, which, depending on your application, can be either a pro or a con. But if you are already streaming JSON via an API or using it for storage, then MessagePack can be a drop-in replacement.
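To make the encoding gain concrete, here is one tiny example, with byte values taken from the MessagePack format spec:

JSON:        {"a":1}                  7 bytes
MessagePack: 0x81 0xA1 0x61 0x01     4 bytes
             (fixmap of 1 entry, fixstr "a", positive fixint 1)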

Avro, on the other hand, is somewhat of a hybrid. In its scope and functionality it is close to PB and Thrift, but it was designed with dynamic languages in mind. Unlike PB and Thrift, the Avro schema is embedded directly in the header of the messages, which eliminates the need for the extra compile stage. Additionally, the schema itself is just a JSON blob - no custom parser required! By enforcing a schema Avro allows us to do data projections (read individual fields out of each record), perform type checking, and enforce the overall message structure.
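For example, a minimal Avro schema is just JSON like the following (record and field names invented for illustration):

{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "id",   "type": "int"}
  ]
}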

"The Best" Serialization Format

Reflecting on the use of Protocol Buffers at Google and all of the above competitors it is clear that there is no one definitive, "best" option. Rather, each solution makes perfect sense in the context it was developed and hence the same logic should be applied to your own situation.

If you are looking for a battle-tested, strongly typed serialization format, then Protocol Buffers is a great choice. If you also need a variety of built-in RPC mechanisms, then Thrift is worth investigating. If you are already exchanging or working with JSON, then MessagePack is almost a drop-in optimization. And finally, if you like the strongly typed aspects, but want the flexibility of easy interoperability with dynamic languages, then Avro may be your best bet at this point in time.

2. [URL:http://qconsf.com/]

posted by choiwonwoo
NoSQL 2011. 11. 28. 03:58
This should be a good article for anyone studying the various NoSQL databases - Cassandra, MongoDB, CouchDB, and so on.

http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

The site helped me clarify the differences and use cases among the NoSQL databases.

CouchDB (V1.1.0)

  • Written in: Erlang
  • Main point: DB consistency, ease of use
  • License: Apache
  • Protocol: HTTP/REST
  • Bi-directional (!) replication,
  • continuous or ad-hoc,
  • with conflict detection,
  • thus, master-master replication. (!)
  • MVCC - write operations do not block reads
  • Previous versions of documents are available
  • Crash-only (reliable) design
  • Needs compacting from time to time
  • Views: embedded map/reduce
  • Formatting views: lists & shows
  • Server-side document validation possible
  • Authentication possible
  • Real-time updates via _changes (!)
  • Attachment handling
  • thus, CouchApps (standalone js apps)
  • jQuery library included

Best used: For accumulating, occasionally changing data, on which pre-defined queries are to be run. Places where versioning is important.

For example: CRM, CMS systems. Master-master replication is an especially interesting feature, allowing easy multi-site deployments.
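To give a flavor of the embedded map/reduce views mentioned above: a view's map function is plain JavaScript. A minimal sketch (the document fields are invented for the example):

function (doc) {
  // CouchDB runs this over every document and indexes the emitted keys
  if (doc.type === "customer") {
    emit(doc.name, { phone: doc.phone });
  }
}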

Redis (V2.4)

  • Written in: C
  • Main point: Blazing fast
  • License: BSD
  • Protocol: Telnet-like
  • Disk-backed in-memory database,
  • Currently without disk-swap (VM and Diskstore were abandoned)
  • Master-slave replication
  • Simple values or hash tables by keys,
  • but complex operations like ZREVRANGEBYSCORE.
  • INCR & co (good for rate limiting or statistics)
  • Has sets (also union/diff/inter)
  • Has lists (also a queue; blocking pop)
  • Has hashes (objects of multiple fields)
  • Sorted sets (high score table, good for range queries)
  • Redis has transactions (!)
  • Values can be set to expire (as in a cache)
  • Pub/Sub lets one implement messaging (!)

Best used: For rapidly changing data with a foreseeable database size (should fit mostly in memory).

For example: Stock prices. Analytics. Real-time data collection. Real-time communication.
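A few of the primitives above, as redis-cli commands (key names invented for the example):

INCR pageviews                   # atomic counter - rate limiting, statistics
EXPIRE pageviews 60              # value expires in 60 seconds, cache-style
ZADD scores 100 alice            # sorted set as a high-score table
ZREVRANGEBYSCORE scores 100 0    # range query, highest score first
LPUSH jobs "resize:42"           # list used as a queue...
BRPOP jobs 0                     # ...with a blocking pop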

MongoDB

  • Written in: C++
  • Main point: Retains some friendly properties of SQL. (Query, index)
  • License: AGPL (Drivers: Apache)
  • Protocol: Custom, binary (BSON)
  • Master/slave replication (auto failover with replica sets)
  • Sharding built-in
  • Queries are javascript expressions
  • Run arbitrary javascript functions server-side
  • Better update-in-place than CouchDB
  • Uses memory mapped files for data storage
  • Performance over features
  • Journaling (with --journal) is best turned on
  • On 32bit systems, limited to ~2.5Gb
  • An empty database takes up 192Mb
  • GridFS to store big data + metadata (not actually an FS)

Best used: If you need dynamic queries. If you prefer to define indexes, not map/reduce functions. If you need good performance on a big DB. If you wanted CouchDB, but your data changes too much, filling up disks.

For example: For most things that you would do with MySQL or PostgreSQL, but having predefined columns really holds you back.
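A taste of the dynamic queries and index definitions, in the mongo shell (collection and field names invented for the example):

// find users over 25, newest first - no predefined columns required
db.users.find({ age: { $gt: 25 } }).sort({ created: -1 }).limit(10)

// secondary index on age, much like CREATE INDEX in SQL
db.users.ensureIndex({ age: 1 })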

Riak (V1.0)

  • Written in: Erlang & C, some Javascript
  • Main point: Fault tolerance
  • License: Apache
  • Protocol: HTTP/REST or custom binary
  • Tunable trade-offs for distribution and replication (N, R, W)
  • Pre- and post-commit hooks in JavaScript or Erlang, for validation and security.
  • Map/reduce in JavaScript or Erlang
  • Links & link walking: use it as a graph database
  • Secondary indices: search in metadata
  • Large object support (Luwak)
  • Comes in "open source" and "enterprise" editions
  • Full-text search, indexing, querying with Riak Search server (beta)
  • In the process of migrating the storage backend from "Bitcask" to Google's "LevelDB"
  • Masterless multi-site replication and SNMP monitoring are commercially licensed

Best used: If you want something Cassandra-like (Dynamo-like), but no way you're gonna deal with the bloat and complexity. If you need very good single-site scalability, availability and fault-tolerance, but you're ready to pay for multi-site replication.

For example: Point-of-sales data collection. Factory control systems. Places where even seconds of downtime hurt. Could be used as a well-update-able web server.
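Because the protocol is plain HTTP/REST, basic usage is a curl away. A sketch assuming Riak's default HTTP port (8098) and its /riak/<bucket>/<key> URL scheme, with bucket and key invented for the example:

# store a JSON value
curl -X PUT http://localhost:8098/riak/users/alice \
     -H "Content-Type: application/json" -d '{"name":"alice"}'

# fetch it back
curl http://localhost:8098/riak/users/alice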

Membase

  • Written in: Erlang & C
  • Main point: Memcache compatible, but with persistence and clustering
  • License: Apache 2.0
  • Protocol: memcached plus extensions
  • Very fast (200k+/sec) access of data by key
  • Persistence to disk
  • All nodes are identical (master-master replication)
  • Provides memcached-style in-memory caching buckets, too
  • Write de-duplication to reduce IO
  • Very nice cluster-management web GUI
  • Software upgrades without taking the DB offline
  • Connection proxy for connection pooling and multiplexing (Moxi)

Best used: Any application where low-latency data access, high concurrency support and high availability is a requirement.

For example: Low-latency use-cases like ad targeting or highly-concurrent web apps like online gaming (e.g. Zynga).
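Since Membase speaks the memcached protocol, a plain telnet session works against it. A minimal sketch of the text protocol (key and value invented for the example):

set greeting 0 0 5        <- key, flags, exptime, byte count
hello
STORED
get greeting
VALUE greeting 0 5
hello
END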

Neo4j (V1.5M02)

  • Written in: Java
  • Main point: Graph database - connected data
  • License: GPL, some features AGPL/commercial
  • Protocol: HTTP/REST (or embedding in Java)
  • Standalone, or embeddable into Java applications
  • Full ACID conformity (including durable data)
  • Both nodes and relationships can have metadata
  • Integrated pattern-matching-based query language ("Cypher")
  • Also the "Gremlin" graph traversal language can be used
  • Indexing of nodes and relationships
  • Nice self-contained web admin
  • Advanced path-finding with multiple algorithms
  • Indexing of keys and relationships
  • Optimized for reads
  • Has transactions (in the Java API)
  • Scriptable in Groovy
  • Online backup, advanced monitoring and High Availability is AGPL/commercial licensed

Best used: For graph-style, rich or complex, interconnected data. Neo4j is quite different from the others in this sense.

For example: Social relations, public transport links, road maps, network topologies.
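As a flavor of the embedded Java API and its transactions, a minimal sketch against the 1.x API (the store path and property names are invented for the example):

import org.neo4j.graphdb.*;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public class Friends {
    public static void main(String[] args) {
        GraphDatabaseService db = new EmbeddedGraphDatabase("var/graphdb");
        Transaction tx = db.beginTx();
        try {
            Node alice = db.createNode();          // nodes carry metadata...
            alice.setProperty("name", "Alice");
            Node bob = db.createNode();
            bob.setProperty("name", "Bob");
            alice.createRelationshipTo(bob,        // ...and so do relationships
                    DynamicRelationshipType.withName("KNOWS"));
            tx.success();                          // mark the transaction as OK
        } finally {
            tx.finish();                           // commit (or roll back)
        }
        db.shutdown();
    }
}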

Cassandra

  • Written in: Java
  • Main point: Best of BigTable and Dynamo
  • License: Apache
  • Protocol: Custom, binary (Thrift)
  • Tunable trade-offs for distribution and replication (N, R, W)
  • Querying by column, range of keys
  • BigTable-like features: columns, column families
  • Writes are much faster than reads (!)
  • Map/reduce possible with Apache Hadoop
  • I admit being a bit biased against it, because of the bloat and complexity it has, partly because of Java (configuration, seeing exceptions, etc.)

Best used: When you write more than you read (logging). If every component of the system must be in Java. ("No one gets fired for choosing Apache's stuff.")

For example: Banking, financial industry (though not necessarily for financial transactions, but these industries are much bigger than that.) Writes are faster than reads, so one natural niche is real time data analysis.

HBase

(With the help of ghshephard)

  • Written in: Java
  • Main point: Billions of rows X millions of columns
  • License: Apache
  • Protocol: HTTP/REST (also Thrift)
  • Modeled after BigTable
  • Map/reduce with Hadoop
  • Query predicate push down via server side scan and get filters
  • Optimizations for real time queries
  • A high performance Thrift gateway
  • HTTP supports XML, Protobuf, and binary
  • Cascading, hive, and pig source and sink modules
  • Jruby-based (JIRB) shell
  • No single point of failure
  • Rolling restart for configuration changes and minor upgrades
  • Random access performance is like MySQL

Best used: If you're in love with BigTable. :) And when you need random, realtime read/write access to your Big Data.

For example: Facebook Messaging Database (more general example coming soon)
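A quick sketch in the JRuby-based shell mentioned above (table and column family names invented for the example):

create 'webtable', 'contents'                       # table with one column family
put 'webtable', 'row1', 'contents:html', '<html/>'  # cell = row x family:qualifier
get 'webtable', 'row1'
scan 'webtable'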
posted by choiwonwoo
MINERVA/C_CPP 2011. 5. 11. 01:16

Macro constant: _MSC_VER

1000 : Visual C++ 4.X
1100 : Visual C++ 5.0
1200 : Visual C++ 6.0
1300 : Visual C++ .NET (2002)
1310 : Visual C++ .NET 2003
1400 : Visual C++ 2005
1500 : Visual C++ 2008
1600 : Visual C++ 2010
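A typical use is conditional compilation on the compiler version; a minimal sketch:

#include <cstdio>

int main() {
#if defined(_MSC_VER) && _MSC_VER >= 1400
    // Visual C++ 2005 or newer
    std::printf("_MSC_VER = %d\n", _MSC_VER);
#else
    std::printf("older Visual C++, or a different compiler\n");
#endif
    return 0;
}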

posted by choiwonwoo
Focusing on .../Java 2011. 3. 12. 04:02
1. What is Ivy?
- Apache Ivy is a popular dependency manager focusing on flexibility and simplicity.

2. Installation
- download url : http://ant.apache.org/ivy/download.cgi



- As always, there is a file that asks to be read. Naturally, we should look at it ^^
  "Please read doc/install.html for installation instructions."
  Having found this line in the docs, the next stage is obvious... go.
- The instructions that follow are simple: copy ivy.jar as directed, then try the hello example as directed.
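In short, the install amounts to dropping the Ivy jar where Ant can see it, then running the hello example bundled with the distribution (the version number below is from my download, so adjust to yours):

copy ivy-2.2.0.jar %ANT_HOME%\lib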


The message confirming a successful installation appeared.
posted by choiwonwoo
Focusing on .../Java 2011. 3. 10. 10:30

1. What is a package?
A package is a bundle of classes or interfaces. Why? To make them convenient for users (developers) to work with: related things are grouped together.

How, exactly?
Just as a class physically corresponds to a single class file (.class), a package physically corresponds to a single directory.



In the screenshot above, the classes are organized into directories by role - io, lang, and so on.

Every class must belong to exactly one package; in other words, each class has a fixed place in the directory structure. So what is a package? Physically, it is a directory that contains classes.


Declaring a package (that is, declaring its directory location) is very simple. In the source file (.java) of the class or interface, add a single line:
package package_name;

This package declaration must be the first statement in the source file apart from comments and whitespace, and it may appear only once per file. Every class and interface in that source file then belongs to the declared package. Package names may contain upper- and lower-case letters, but the convention is all lower case, to distinguish them easily from class names.

Looking at this source file, there is no package name. In that case, the class file is generated at the location specified by the compile option.

When a package name is specified in the source, the directory and the class file are created accordingly.

To run a class compiled this way, the location of the package (class) must be registered with the system: register it in the classpath environment variable.

When running it, always write the class's full package name.

cf) Ways to run without setting the classpath separately:
1) Add the class under jre\classes in the JDK, or, if it is packed into a jar file, under jre\lib\ext.
2) Specify the package location with java -cp. A minimal end-to-end sketch follows below.
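Putting the above together (package and class names invented for the example):

// src\com\example\Greet.java
package com.example;

public class Greet {
    public static void main(String[] args) {
        System.out.println("hello from a package");
    }
}

javac -d classes src\com\example\Greet.java     <- creates classes\com\example\Greet.class
java -cp classes com.example.Greet              <- always the fully qualified name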


As stated above, every class must belong to a package. The reason you can nevertheless write a source file without a package declaration and run into no problems is the "unnamed package" that Java provides by default.


2. What is import?
When your source code uses a class from another package, you must refer to it by a name that includes the package. Writing the package name every single time, however, is quite inconvenient. If you declare the packages of the classes you intend to use with import statements before the class body, the package name can be omitted from the class names used in the source.

The role of the import statement is to give the compiler information about the packages of the classes used in the source file. At compile time, the compiler works out each class's package from the import statements and prefixes every class name with its package name.

An import statement is declared as follows:

import package_name.class_name;
or
import package_name.*;

After the import keyword, write the name of the class whose package you want to omit, together with its package name. When several classes from the same package are used, you can write "package_name.*" instead of multiple import statements, making every class in that package usable without the package prefix.
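For instance, reusing the hypothetical com.example.Greet class from section 1:

import com.example.Greet;     // a single class
// or: import com.example.*;  // every class in the package

public class Main {
    public static void main(String[] args) {
        Greet.main(args);     // no package prefix needed after the import
    }
}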

posted by choiwonwoo
Focusing on .../Java 2011. 3. 10. 04:36

1. What is Ant?
Anyone who has done C/C++ development on Unix (or Linux) will recognize that Ant is similar to make: Ant is a build tool that manages the build configuration. The difference is that a makefile is managed as plain text, while Ant builds are managed as XML. Once you have been developing for a while you feel how necessary such a tool is, and from a maintenance point of view as well, it is something worth mastering.

2. Installing Ant
Download URL : http://ant.apache.org/bindownload.cgi
2-1) Unpack the archive

2-2) Set the environment variables


3. Verify the Ant installation


4. Writing build.xml
All the environment preparation is now complete. So, just as you would write a makefile, let's write a build.xml and get familiar with its rules.

[source : http://ant.apache.org/manual/tutorial-HelloWorldWithAnt.html]

Tutorial: Hello World with Apache Ant

This document provides a step-by-step tutorial for starting Java programming with Apache Ant. It does not contain deeper knowledge about Java or Ant; its goal is to show you how to do the easiest steps in Ant.


Preparing the project

We want to separate the source from the generated files, so our Java source files will be in the src folder. All generated files should go under build, split into several subdirectories for the individual steps: classes for our compiled files and jar for our own JAR file.

We have to create only the src directory. (Because I am working on Windows, here is the win-syntax - translate to your shell):

md src

The following simple Java class just prints a fixed message out to STDOUT, so just write this code into src\oata\HelloWorld.java.

package oata;

public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello World");
    }
}

Now just try to compile and run that:

md build\classes
javac -sourcepath src -d build\classes src\oata\HelloWorld.java
java -cp build\classes oata.HelloWorld
which will result in
Hello World

Creating a jar file is not very difficult. But creating a startable jar file needs more steps: create a manifest file containing the start class, create the target directory, and archive the files.

echo Main-Class: oata.HelloWorld>myManifest
md build\jar
jar cfm build\jar\HelloWorld.jar myManifest -C build\classes .
java -jar build\jar\HelloWorld.jar

Note: Do not have blanks around the >-sign in the echo Main-Class instruction because it would falsify it!

Four steps to a running application

After finishing the java-only steps we have to think about our build process. We have to compile our code, otherwise we couldn't start the program. Oh - "start" - yes, we could provide a target for that. We should package our application: right now it's only one class, but if you want to provide a download, no one would download several hundred files (think about a complex Swing GUI) - so let us create a jar file. A startable jar file would be nice. And it's good practice to have a "clean" target which deletes all the generated stuff; many failures can be solved just by a "clean build".

By default Ant uses build.xml as the name for a buildfile, so our .\build.xml would be:

<project>

    <target name="clean">
        <delete dir="build"/>
    </target>

    <target name="compile">
        <mkdir dir="build/classes"/>
        <javac srcdir="src" destdir="build/classes"/>
    </target>

    <target name="jar">
        <mkdir dir="build/jar"/>
        <jar destfile="build/jar/HelloWorld.jar" basedir="build/classes">
            <manifest>
                <attribute name="Main-Class" value="oata.HelloWorld"/>
            </manifest>
        </jar>
    </target>

    <target name="run">
        <java jar="build/jar/HelloWorld.jar" fork="true"/>
    </target>

</project>

Now you can compile, package and run the application via

ant compile
ant jar
ant run

Or shorter with

ant compile jar run

While having a look at the buildfile, we can see the correspondence between the java-only commands and the Ant targets:

  • md build\classes + javac -sourcepath src -d build\classes ...  →  the compile target
  • echo Main-Class ... + md build\jar + jar cfm ...  →  the jar target
  • java -jar build\jar\HelloWorld.jar  →  the run target

Enhance the build file

Now that we have a working buildfile, we can make some enhancements: many times you are referencing the same directories, the main class and jar name are hard-coded, and when invoking the build you have to remember the right order of build steps.

The first and second points can be addressed with properties, the third with a special property - an attribute of the <project> tag - and the fourth problem can be solved using dependencies. The enhanced buildfile is shown below.
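Since the original screenshot of the enhanced buildfile is gone, here is a reconstruction based on the property names (${src.dir}, ${classes.dir}, ${jar.dir}, ${main-class}, ${ant.project.name}) that the later snippets rely on:

<project name="HelloWorld" basedir="." default="main">

    <property name="src.dir"     value="src"/>
    <property name="build.dir"   value="build"/>
    <property name="classes.dir" value="${build.dir}/classes"/>
    <property name="jar.dir"     value="${build.dir}/jar"/>
    <property name="main-class"  value="oata.HelloWorld"/>

    <target name="clean">
        <delete dir="${build.dir}"/>
    </target>

    <target name="compile">
        <mkdir dir="${classes.dir}"/>
        <javac srcdir="${src.dir}" destdir="${classes.dir}"/>
    </target>

    <target name="jar" depends="compile">
        <mkdir dir="${jar.dir}"/>
        <jar destfile="${jar.dir}/${ant.project.name}.jar" basedir="${classes.dir}">
            <manifest>
                <attribute name="Main-Class" value="${main-class}"/>
            </manifest>
        </jar>
    </target>

    <target name="run" depends="jar">
        <java jar="${jar.dir}/${ant.project.name}.jar" fork="true"/>
    </target>

    <target name="main" depends="clean,run"/>

</project>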




Now it's easier: just run ant and you will get

Buildfile: build.xml

clean:

compile:
    [mkdir] Created dir: C:\...\build\classes
    [javac] Compiling 1 source file to C:\...\build\classes

jar:
    [mkdir] Created dir: C:\...\build\jar
      [jar] Building jar: C:\...\build\jar\HelloWorld.jar

run:
     [java] Hello World

main:

BUILD SUCCESSFUL

Using external libraries

We have been told not to use plain System.out statements for logging. For log statements we should use a logging API - customizable to a high degree (including switching it off during normal, i.e. non-development, execution). We use Log4J for that, because

  • it is not part of the JDK (1.4+) and we want to show how to use external libs
  • it can run under JDK 1.2 (as Ant)
  • it's highly configurable
  • it's from Apache ;-)

We store our external libraries in a new directory lib. Log4J can be downloaded [1] from Logging's homepage. Create the lib directory and extract log4j-1.2.9.jar into it. After that we have to modify our Java source to use that library, and our buildfile so that the library can be accessed during compilation and run.

Working with Log4J is documented inside its manual. Here we use the MyApp-example from the Short Manual [2]. First we have to modify the java source to use the logging framework:

package oata;

import org.apache.log4j.Logger;
import org.apache.log4j.BasicConfigurator;

public class HelloWorld {
    static Logger logger = Logger.getLogger(HelloWorld.class);

    public static void main(String[] args) {
        BasicConfigurator.configure();
        logger.info("Hello World");          // the old SysO-statement
    }
}

Most of the modifications are "framework overhead" which has to be done once. The logger.info(...) line is our "old System.out" statement.

Don't try to run ant - you will only get a lot of compiler errors. Log4J is not on the classpath, so we have to do a little work here. But do not change the CLASSPATH environment variable! That is only for this project, and maybe you would break other environments (this is one of the most famous mistakes when working with Ant). We introduce Log4J (or, to be more precise, all libraries (jar files) which are somewhere under .\lib) into our buildfile:

<project name="HelloWorld" basedir="." default="main">
    ...
    <property name="lib.dir"     value="lib"/>

    <path id="classpath">
        <fileset dir="${lib.dir}" includes="**/*.jar"/>
    </path>

    ...

    <target name="compile">
        <mkdir dir="${classes.dir}"/>
        <javac srcdir="${src.dir}" destdir="${classes.dir}" classpathref="classpath"/>
    </target>

    <target name="run" depends="jar">
        <java fork="true" classname="${main-class}">
            <classpath>
                <path refid="classpath"/>
                <path location="${jar.dir}/${ant.project.name}.jar"/>
            </classpath>
        </java>
    </target>

    ...

</project>

In this example we start our application not via its Main-Class manifest attribute, because we could not provide a jar name and a classpath at the same time. So we add our own jar - the <path location=.../> line - to the already defined path and start as usual. Running ant would give (after the usual compile stuff):

[java] 0 [main] INFO oata.HelloWorld  - Hello World

What's that?

  • [java] Ant task running at the moment
  • 0 the number of milliseconds elapsed since the application started (part of Log4J's default layout)
  • [main] the running thread from our application
  • INFO log level of that statement
  • oata.HelloWorld source of that statement
  • - separator
  • Hello World the message
For another layout ... have a look inside Log4J's documentation about using other PatternLayouts.

Configuration files

Why did we use Log4J? Because "it's highly configurable"? No - so far everything is hard-coded! But that is not the fault of Log4J - it's ours. We coded BasicConfigurator.configure();, which implies a simple but hard-coded configuration. More comfortable would be a property file. Delete the BasicConfigurator line from the main() method in the Java source (and the related import statement); Log4J will then search for a configuration as described in its manual. Then create a new file src/log4j.properties. That's the default name for Log4J's configuration, and using that name makes life easier - not only does the framework know what is inside, you do too!

log4j.rootLogger=DEBUG, stdout

log4j.appender.stdout=org.apache.log4j.ConsoleAppender

log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%m%n

This configuration creates an output channel ("appender") to the console, named stdout, which prints the message (%m) followed by a line feed (%n) - the same as the earlier System.out.println() :-) Okay - but we haven't finished yet. We should deliver the configuration file, too. So we change the buildfile:

    ...
    <target name="compile">
        <mkdir dir="${classes.dir}"/>
        <javac srcdir="${src.dir}" destdir="${classes.dir}" classpathref="classpath"/>
        <copy todir="${classes.dir}">
            <fileset dir="${src.dir}" excludes="**/*.java"/>
        </copy>
    </target>
    ...

This copies all resources (as long as they don't have the suffix ".java") to the build directory, so we can start the application from that directory and these files will be included in the jar.

Testing the class

In this step we will introduce the usage of the JUnit [3] test framework in combination with Ant. Because Ant has JUnit 3.8.2 built in, you can start using it directly. Write a test class in src\HelloWorldTest.java:

public class HelloWorldTest extends junit.framework.TestCase {

    public void testNothing() {
    }
    
    public void testWillAlwaysFail() {
        fail("An error message");
    }
    
}

Because we don't have real business logic to test, this test class is very small: it just shows how to get started. For further information see the JUnit documentation [3] and the manual of the junit task. Now we add a junit instruction to our buildfile:

    ...

    <target name="run" depends="jar">
        <java fork="true" classname="${main-class}">
            <classpath>
                <path refid="classpath"/>
                <path id="application" location="${jar.dir}/${ant.project.name}.jar"/>
            </classpath>
        </java>
    </target>
    
    <target name="junit" depends="jar">
        <junit printsummary="yes">
            <classpath>
                <path refid="classpath"/>
                <path refid="application"/>
            </classpath>
            
            <batchtest fork="yes">
                <fileset dir="${src.dir}" includes="*Test.java"/>
            </batchtest>
        </junit>
    </target>

    ...

We reuse the path to our own jar file as defined in the run target by giving it an ID. printsummary=yes lets us see more detailed information than just a "FAILED" or "PASSED" message: how many tests failed, and were there errors? The classpath is set up to find our classes. To run the tests, batchtest is used here, so you can easily add more test classes in the future just by naming them *Test.java - a common naming scheme.

After a ant junit you'll get:

...
junit:
    [junit] Running HelloWorldTest
    [junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 0,01 sec
    [junit] Test HelloWorldTest FAILED

BUILD SUCCESSFUL
...

We can also produce a report - something that you (and others) can read after closing the shell. There are two steps: 1. let <junit> log the information, and 2. convert it to something readable (browsable).

    ...
    <property name="report.dir"  value="${build.dir}/junitreport"/>
    ...
    <target name="junit" depends="jar">
        <mkdir dir="${report.dir}"/>
        <junit printsummary="yes">
            <classpath>
                <path refid="classpath"/>
                <path refid="application"/>
            </classpath>
            
            <formatter type="xml"/>
            
            <batchtest fork="yes" todir="${report.dir}">
                <fileset dir="${src.dir}" includes="*Test.java"/>
            </batchtest>
        </junit>
    </target>
    
    <target name="junitreport">
        <junitreport todir="${report.dir}">
            <fileset dir="${report.dir}" includes="TEST-*.xml"/>
            <report todir="${report.dir}"/>
        </junitreport>
    </target>

Because we would produce a lot of files, and these files would be written to the current directory by default, we define a report directory, create it before running junit, and redirect the logging to it. The log format is XML, so junitreport can parse it. In a second target, junitreport creates a browsable HTML report from all the generated XML log files in the report directory. Now you can open ${report.dir}\index.html and see the result (it looks something like JavaDoc).
Personally, I use two different targets for junit and junitreport: generating the HTML report takes some time, and you don't need the HTML report just for testing, e.g. when you are fixing an error or an integration server is doing its job.






posted by choiwonwoo
MINERVA/C_CPP 2011. 2. 25. 07:18

Binding:
Binding means associating values with the symbols in program code. There are two types: compile-time (static!) and run-time (dynamic).

1. Compile-time bindings are done at compile time, so all symbols are resolved then. This increases compile time, though you may see some run-time performance gain (generally at load time). Everything is static: the code the compiler generates is ready to be loaded at a relative memory area and start executing.

2. Run-time (dynamic) binding delays the bindings to load time or run time. Symbols are resolved at load or run time, which is what allows extern declarations and other external linking. This may add a small amount of extra time to program execution (or loading).

Compile-time binding means data types are checked when the code is compiled.

Run-time binding means data types are not checked until the compiled code actually runs, which requires the programmer to be more formal and strict in variable assignment.
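In C++ the distinction shows up directly in member-function dispatch; a minimal sketch:

#include <iostream>

struct Base {
    void hello()         { std::cout << "Base::hello\n"; }  // bound at compile time
    virtual void greet() { std::cout << "Base::greet\n"; }  // bound at run time
};

struct Derived : Base {
    void hello()         { std::cout << "Derived::hello\n"; }
    virtual void greet() { std::cout << "Derived::greet\n"; }
};

int main() {
    Derived d;
    Base* p = &d;
    p->hello();  // static binding: resolved from p's declared type -> Base::hello
    p->greet();  // dynamic binding: resolved through the vtable    -> Derived::greet
    return 0;
}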


source : http://answers.yahoo.com/question/index?qid=20060722082832AAC4KKx
posted by choiwonwoo
MINERVA/C_CPP 2011. 2. 24. 08:02
These days I keep forgetting things - I surprise myself. So I am writing this down for my own benefit.

What is the HEX value of -1? Naturally, 0xFFFFFFFF.
What is the HEX value of -2? Naturally, 0xFFFFFFFE.

If you cannot guess why, think about the representable range of integer types on a computer (two's complement) and you will be able to work it out.
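A quick check in C: in two's complement, -n is the bitwise complement of n plus one, so with 32-bit ints:

#include <stdio.h>

int main(void) {
    /* -1 = ~0x00000001 + 1 = 0xFFFFFFFE + 1 = 0xFFFFFFFF */
    printf("%08X\n", (unsigned int)-1);  /* FFFFFFFF */
    /* -2 = ~0x00000002 + 1 = 0xFFFFFFFD + 1 = 0xFFFFFFFE */
    printf("%08X\n", (unsigned int)-2);  /* FFFFFFFE */
    return 0;
}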

posted by choiwonwoo
MINERVA/C_CPP 2011. 2. 24. 07:53
Today an acquaintance asked me the following question.

Question: "I am porting a program I wrote on Unix (Solaris, I presume) to Linux. When I monitor it with ps -ef | grep "xxx", on Unix the process showed up, but on Linux each thread shows up. Isn't that strange?"

Answer: I had run into this situation before, so I summarized it as follows.
As you know, when doing multi-threaded programming you need to understand what kind of threads each OS supports. As I recall, kernel threads are supported by Windows, Solaris, and some Linux versions (it differs by kernel version). Let's look at this in more detail.

1. Process vs Thread
1.1 Process = code + data + stack + heap segments
1.2 Thread = stack + registers; the remaining code + data + heap are shared with other threads. Because the heap in particular is shared between threads, synchronization issues arise.

2. What is a user thread?
To state the conclusion first: it is faster than a kernel thread. Context switching (thread switching) never drops down to the kernel level; it is handled entirely in user mode (that is, no system call occurs), which is why it is faster than a kernel thread.

Also, at school we learned many kinds of thread scheduling algorithms (FIFO, FILO, RR, etc.). These algorithms are OS-dependent, however, and as far as I know most OSes use RR. Sometimes you want to choose the thread scheduling algorithm yourself, and this is possible with user threads: user threads are implemented with a library such as a POSIX threads library, and through that library the scheduling algorithm can be selected or changed. Thread creation and management are fast as well.

But there are problems.
First: from the kernel's point of view, there is just one process (or one kernel thread) in which several threads are running, and the process's internal threads are invisible to it. So if one thread makes a blocking system call, the kernel does not know which thread made the call, and the whole process can end up blocked. How well such thread scheduling is handled depends on the stability and performance of the library being used.

Second: on a multi-CPU machine, structural parallelism suffers. More precisely, the process is the unit to which a CPU is assigned, and the OS knows nothing about the threads each process manages internally, so one process's user threads cannot run on several CPUs in parallel.

3. What is a kernel thread?
All thread creation and management (scheduling) is done by the OS. Since this work goes through system calls, performance is worse than user threads(?).

Advantage: development and other management are easy, because the OS (Windows, Solaris) manages everything.
Disadvantage: relatively slower.
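Since the heap is shared between threads (1.2 above), here is a minimal POSIX-threads sketch of the synchronization issue (the counter and thread count are invented for the example):

#include <pthread.h>
#include <stdio.h>

static long counter = 0;   /* shared data - every thread sees the same variable */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void* worker(void* arg) {
    int i;
    for (i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);    /* without this, updates get lost */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t[4];
    int i;
    for (i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (i = 0; i < 4; i++) pthread_join(t[i], NULL);
    printf("counter = %ld\n", counter);  /* 400000, thanks to the mutex */
    return 0;
}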


posted by choiwonwoo