RISS Search - Domestic Journal Articles

1
Excerption of Collocations by Observing the Sequence of Grammatical Properties of Words

Piotr Wierzchoń (피오트르 비에쉬혼) 아시아·중동부유럽학회 2006 동유럽발칸학 Vol.8 No.1
Abstract (translated from the Korean): This paper studies a method for extracting collocations based on the grammatical description of their components. The study uses collocations containing the word afera (scandal) as an example and focuses on Polish, an inflectional language. The collocations were found to occur in a variety of inflected forms, such as afera podatkowa, aferze podatkowej, afery podatkowe. Automatic extraction relies on the recurring agreement of gender, number and case.

Abstract (English): The article proposes a method for the excerption of collocations based on a grammatical description of their components. The experiment is explained using collocations of the word afera (scandal) as an example. The main focus was on the inflectional properties of the Polish language; therefore, only those word combinations were observed that tended to appear in various inflectional forms, such as afera podatkowa, aferze podatkowej, afery podatkowe. A step toward automatic excerption has been to find the most frequently repeated sequences of properties, such as the Gender, Number and Case categories.
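The idea of looking for the most frequently repeated sequences of grammatical properties can be illustrated with a minimal Python sketch. It is not the author's procedure: the tags below are simplified and assigned by hand (an assumption, not the output of a morphological tagger), and the names `PAIRS`, `property_sequence` and `agrees` are hypothetical.

```python
from collections import Counter

# Hypothetical, hand-tagged word pairs: (form, gender, number, case).
# The tags are simplified and assigned by hand for illustration only.
PAIRS = [
    (("afera", "f", "sg", "nom"), ("podatkowa", "f", "sg", "nom")),
    (("aferze", "f", "sg", "loc"), ("podatkowej", "f", "sg", "loc")),
    (("afery", "f", "pl", "nom"), ("podatkowe", "f", "pl", "nom")),
    (("afera", "f", "sg", "nom"), ("podatkowa", "f", "sg", "nom")),
]


def property_sequence(pair):
    """Return the (gender, number, case) tags of a pair, dropping the word forms."""
    return tuple(word[1:] for word in pair)


def agrees(pair):
    """True when both words share gender, number and case."""
    first, second = pair
    return first[1:] == second[1:]


# Count how often each sequence of grammatical properties occurs.
sequence_counts = Counter(property_sequence(p) for p in PAIRS)
most_common_sequence, count = sequence_counts.most_common(1)[0]

print(most_common_sequence, count)
print(all(agrees(p) for p in PAIRS))
```

On this toy input the word forms differ, but every pair shows the same agreement pattern, which is the kind of regularity the abstract describes.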
2
“Pan Tadeusz” (“Mr. Thaddeus”) by A. Mickiewicz

Piotr Wierzchoń 아시아·중동부유럽학회 2004 동유럽발칸학 Vol.6 No.1
Abstract (translated from the Korean): This paper studies a way of handling the written form of texts economically and effectively with the help of computer systems. The author calls this approach "praxeology". Under the proposed approach, once a certain part of a word has been entered into the computer, the rest need not be typed, because the computer recognises the word and completes it automatically. To demonstrate the effectiveness of praxeology, the author takes Book 1 of Pan Tadeusz, the epic poem by the Polish national poet Adam Mickiewicz, as the object of study. The reason is that Pan Tadeusz, described as "the summit of Polish Romantic literature" and "the essence of Polish literature", is the work most widely known and most familiar to Poles.

Abstract (English): By praxeology of orthographic information we understand an artificial organization of an orthographic text characterized by two factors: i) minimisation of the graphic elements of a text, ii) complete content equivalence of the input text and the output (prepared) text. In accordance with the mass conservation principle, there is no absolute compression of orthographic information: by reducing the quantum of information in one place, we gather the same amount (perhaps in a different shape) in another place. A problem worth discussing, however, is how to organize information so that it may be used for various purposes. An orthographic structure of a text that suits some purposes may be entirely unergonomic for others. The above-mentioned places may, in specific conditions, "offer" non-identical benefits. For example, when creating on computer hard drives an index base that refers a user to a specific place in a document, we reduce the disk space by creating artificial indexes (i.e. indexes recognizable by the indexing device used). The benefit is that we reduce the access time to these documents; the loss may be understood as a decrease in free disk space. Thus, we gain time at the cost of space. The following work presents the method of so-called "treeing" (after the resemblance to a Christmas tree) of graphic information (diacritisation). In this particular case it is lexical systems that undergo treeing, which for us means texts expressed in a natural language.

One possible application of the method is the compression of orthographic information such that the result allows a 100% perception of the naturally registered input entries. In other words, when the presented method is used, an erroneous perception of the analyzed text is not possible. Evidently, such a result (with a compression of the information space) is obtained with the use of specific auxiliary devices. The main one is the so-called tree, a set of all diacritic sequences (compare below). The analysis is performed on specific material. To make it as easy as possible to compare our results with individual calculations, we refer to a popular text, the first book of Pan Tadeusz (Mr. Thaddeus) by A. Mickiewicz. All the illustrations of operations presented in this work are taken from this text. Finally, it is worth adding that the aim of the analyses is to reach maximum communication effectiveness through compression of a graphic message. This can be achieved with the use of the rules of the so-called diacritological grammar (Wierzchoń 2004). In presenting the principles of creating a tree, we mainly refer to this work; we do not analyse a number of implications of a general scientific or linguistic nature which arise from the analyses.
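The two requirements named in the abstract, fewer graphic elements and full recoverability of the input, can be illustrated with a small Python sketch. It does not implement the diacritological grammar of Wierzchoń 2004. It substitutes a plain shortest-unique-prefix scheme, in which the vocabulary plays the role of the auxiliary device, and the function names `build_codes`, `compress` and `expand` are hypothetical. The sample tokens come from the opening line of Pan Tadeusz.

```python
from collections import Counter


def build_codes(vocabulary):
    """Map each word to its shortest prefix that no other word shares."""
    words = set(vocabulary)
    # Count how many distinct words start with each prefix (a flattened prefix tree).
    prefix_counts = Counter(w[:n] for w in words for n in range(1, len(w) + 1))
    codes = {}
    for w in words:
        unique = (w[:n] for n in range(1, len(w) + 1) if prefix_counts[w[:n]] == 1)
        # Fall back to the whole word when it is itself a prefix of another word.
        codes[w] = next(unique, w)
    return codes


def compress(tokens, codes):
    """Replace every token with its code."""
    return [codes[t] for t in tokens]


def expand(coded, codes):
    """Recover the original tokens; the code table is needed to decode."""
    inverse = {code: word for word, code in codes.items()}
    return [inverse[c] for c in coded]


# Lower-cased tokens of the opening line of Pan Tadeusz, Book 1.
tokens = ["litwo", "ojczyzno", "moja", "ty", "jesteś", "jak", "zdrowie"]
codes = build_codes(tokens)
coded = compress(tokens, codes)

print(coded)
print(expand(coded, codes) == tokens)
```

The coded text is shorter, but the code table must be stored alongside it, which mirrors the abstract's point that information removed in one place is gathered in another.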
3
In search of non-quantitative semiautomatic methods of collocations retrieval

Piotr Wierzchoń (피오트르 비에쉬호인) 아시아·중동부유럽학회 2005 동유럽발칸학 Vol.7 No.1
Abstract (translated from the Korean): This paper studies a semiautomatic method for extracting collocations, which cause difficulties in translation. Because of Polish inflection and the lack of a shared written form among inflected forms such as biała niedziela, białej niedzieli, białe niedziele, quantitative methods such as T-score, Gravity Counts, Dice and Mutual Information lose their value, since they depend on a morphological analyzer that groups the variant and inflected forms of a word under its dictionary headword. In the analysis of contemporary vocabulary, such morphological pre-analysis is not detailed enough and in practice contributes little. For a language with such heavy inflection, it may therefore seem impossible to propose an optimal quantitative method, such as bigrams (which examine the frequency of given lexical combinations), for the automatic extraction of lexical connections. The proposal uses a basic filter applied to texts and has mainly lexicographical value. It observes two-, three- and four-word collocations that are introduced by an abbreviation such as tzw. and bounded on the right by a full stop, comma or semicolon. The size of the text does not matter in this study.

Abstract (English): The article presents a semiautomatic method for the excerption of collocations, which cause most difficulties in translation practice. Due to the flectional features of the Polish language and the lack of homography of particular flection forms (biała niedziela, białej niedzieli, białe niedziele etc.), quantitative methods such as T-score, Gravity Counts, Dice, Mutual Information etc. (cf. Daudaravičius, Marcinkevičiené 2004) lose their value, since their use depends on a competent morphological analyzer that lemmatizes all flection forms to dictionary entry forms (lemmas). In the case of an analysis of contemporary vocabulary, which records phenomena of real life, such a morphological pre-analysis will be significantly slowed down. Thus, it seems impossible to propose an optimal quantitative method, e.g. bigram (i.e. one in which the frequency of given lexical connections is observed; cf. Stubbs 2002, Yamamoto, Church 2001), for the automatic excerption of lexical connections in highly flectional languages. The proposition presented in this text has mainly lexicographical value and involves the use of a simplified filter applied to a corpus. This is an observation of two-, three- and four-word collocations preceded by the abbreviation tzw. (so-called) and limited on the right side by a punctuation mark (e.g. full stop, comma, semicolon etc.). The size of the corpus does not matter in our research.
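The filter described in the abstract can be approximated with a regular expression. The following Python sketch is one possible reading of that description, not the author's implementation: the pattern, the function name `extract_tzw_candidates` and the sample sentences are all assumptions made for illustration.

```python
import re

# "tzw." followed by two to four words, closed on the right by
# a full stop, comma or semicolon. \w is Unicode-aware for str patterns,
# so Polish letters such as ą, ę, ł are matched.
TZW_PATTERN = re.compile(r"\btzw\.\s+((?:\w+\s+){1,3}\w+)\s*[.,;]")


def extract_tzw_candidates(text):
    """Return candidate collocations found after 'tzw.' in the text."""
    # Normalise internal whitespace so line breaks do not split a candidate.
    return [" ".join(m.group(1).split()) for m in TZW_PATTERN.finditer(text)]


# Invented sample sentences, not taken from any corpus.
sample = (
    "Mówiono o tzw. aferze podatkowej. "
    "Wprowadzono tzw. białą niedzielę, a potem tzw. bardzo długi weekend majowy; "
    "była też tzw. afera."
)

for candidate in extract_tzw_candidates(sample):
    print(candidate)
```

The filter needs no lemmatizer, because it returns each candidate in whatever inflected form follows tzw.; the single word after the last tzw. in the sample is skipped, since the pattern requires at least two words.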
