Text corpus (पाठसंग्रह)

From Wikipedia, the free encyclopedia


In linguistics, a large, structured collection of texts is called a text corpus (Hindi: पाठसंग्रह, also कॉर्पस). A corpus has many uses, for example: computing how frequently each word is used in a language, finding the 1,000 most frequently used words in a language, and examining the different ways in which a given word is used.
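The three uses named above (word frequencies, most frequent words, and how a word is used in context) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: `TOY_CORPUS` and the function names `tokenize`, `word_frequencies`, `top_words`, and `concordance` are hypothetical, and tokens are obtained by lowercasing and splitting on whitespace, which ignores punctuation and is far cruder than what a real corpus tool would do.

```python
from collections import Counter

# Hypothetical toy corpus; a real corpus would be read from files.
TOY_CORPUS = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "a cat and a dog",
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberate simplification)."""
    return text.lower().split()


def word_frequencies(texts):
    """Count how often each token occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def top_words(counts, n):
    """Return the n most frequent (word, count) pairs."""
    return counts.most_common(n)


def concordance(texts, word, width=2):
    """List each occurrence of `word` with up to `width` tokens of context per side."""
    hits = []
    for text in texts:
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok == word:
                left = tokens[max(0, i - width):i]
                right = tokens[i + 1:i + 1 + width]
                hits.append((" ".join(left), tok, " ".join(right)))
    return hits


if __name__ == "__main__":
    freq = word_frequencies(TOY_CORPUS)
    print(top_words(freq, 1))
    for left, keyword, right in concordance(TOY_CORPUS, "sat"):
        print(f"{left:>10} [{keyword}] {right}")
```

For the "1,000 most frequent words" use case, the same call would be `top_words(freq, 1000)` on a corpus large enough to contain that many distinct words. Whitespace splitting is used here rather than a regular expression because it treats Devanagari and Latin text alike; a production tokenizer would also need to handle punctuation.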

External links

Search the Hindi text corpus (CFILT IITB)
Hindi text corpus (in Unicode)
Corpus: a concept (Prayas)
Corpus research alone will chart the map of Hindi's future - Dr. Girish Nath Jha
Freely-available, web-based corpora (100 million - 400 million words each): American (COCA), British (BNC), TIME, Spanish, Portuguese
ACL SIGLEX Resource Links: Text Corpora
The Leipzig Glossing Rules: Conventions for interlinear morpheme-by-morpheme glosses
Developing Linguistic Corpora: a Guide to Good Practice
An interface for querying automatically-constructed virtual corpora [dead link].
An interface [dead link] for querying text corpora constructed through guided crawling of online news sites, the corpora (both local and virtual) constructed using the SPARTAN technique, and publicly-available collections (e.g. Reuters-21578, texts from the Gutenberg project, GENIA).
