5 – Unstructured Text

The languages we use to communicate with each other also have defined grammatical rules. And indeed, in some situations we use simple, structured sentences, but for the most part human discourse is complex and unstructured. Despite that, we are remarkably good at understanding each other, and even ambiguity is welcome to a certain extent.

So what can computers do to make sense of unstructured text? Here are some preliminary ideas. Computers can do some level of processing on words and phrases, trying to identify keywords, parts of speech, named entities, dates, quantities, and so on. Using this information, they can also try to parse sentences, at least the ones that are relatively well structured. This helps extract the relevant parts of statements, questions, or instructions. At a higher level, computers can analyze documents to find frequent and rare words, assess the overall tone or sentiment being expressed, and even cluster or group similar documents together. Building on these ideas, computers can do a great deal with unstructured text even if they cannot understand it the way we do.
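To make the document-level ideas a little more concrete, here is a minimal Python sketch using only the standard library. It counts words to find frequent and rare ones, and it scores how similar two documents are by the share of distinct words they have in common (Jaccard similarity), which is one simple starting point for grouping documents. The sample sentences, the function names, and the choice of Jaccard similarity are all assumptions made for illustration; real systems typically use more careful tokenization and richer similarity measures.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(text):
    """Count how often each word token occurs in the text."""
    return Counter(tokenize(text))


def jaccard_similarity(doc_a, doc_b):
    """Fraction of distinct words shared by two documents (0.0 to 1.0)."""
    words_a, words_b = set(tokenize(doc_a)), set(tokenize(doc_b))
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


# Hypothetical sample documents, made up for this example.
docs = [
    "The cat sat on the mat.",
    "The cat chased the mouse.",
    "Stock prices fell sharply on Monday.",
]

# Frequent and rare words across the whole collection.
counts = word_counts(" ".join(docs))
frequent = counts.most_common(3)
rare = sorted(word for word, n in counts.items() if n == 1)
print("Frequent:", frequent)
print("Rare:", rare)

# Pairwise similarity: the two cat sentences should score higher
# with each other than either does with the stock market sentence.
for i in range(len(docs)):
    for j in range(i + 1, len(docs)):
        score = jaccard_similarity(docs[i], docs[j])
        print(f"doc {i} vs doc {j}: {score:.2f}")
```

Even this crude word-overlap measure puts the two sentences about the cat closer together than either is to the sentence about stock prices, without any understanding of what the words mean.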
