Part A – Question Bank

1. Define information retrieval.
Information Retrieval is finding material of an unstructured nature that satisfies an information need from within large collections.
3. List and explain the components of the IR block diagram.

  • Input – Stores only a representation of each document.
  • Document representative – Could be a list of extracted words considered to be significant.
  • Processor – Performs the actual retrieval function.
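  • Feedback – Improves subsequent retrieval, for example by letting the user refine the request after seeing sample results.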
  • Output – A set of document numbers.

4. What are objective and nonobjective terms?
Objective terms – Terms that are extrinsic to the semantic content of a document; there is generally no disagreement about how to assign them.
Nonobjective terms – Terms intended to reflect the information manifested in the document; there is no general agreement about the choice or degree of applicability of these terms.
5. Explain the types of natural language technology used in information retrieval.
There are two types:
I. Natural language interfaces make the task of communicating with the information source easier, allowing a system to respond to a range of inputs.
II. Natural language text processing allows a system to scan the source texts, either to retrieve particular information or to derive knowledge structures that may be used in accessing information from the texts.
6. What is a search engine?
A search engine is a document retrieval system designed to help find information stored in a computer system, such as on the WWW. The search engine allows one to ask for content meeting specific criteria and retrieves a list of items that match those criteria.
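The "criteria in, matching items out" behaviour described above can be illustrated with a toy inverted index. This is only a minimal sketch: the function names build_index and search, the three sample documents, and the choice of AND semantics (every query word must appear) are assumptions made for illustration, not part of any particular search engine.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return sorted ids of documents containing every query word (AND semantics)."""
    words = query.lower().split()
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)

# Hypothetical sample collection, used only for illustration.
docs = {
    1: "information retrieval finds relevant documents",
    2: "a search engine retrieves documents matching a query",
    3: "open source software can be modified by anyone",
}
index = build_index(docs)
print(search(index, "documents"))         # [1, 2]
print(search(index, "search documents"))  # [2]
```

The output is a list of document numbers, which matches the "Output" component of the IR block diagram in question 3.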

7. What is conflation?
Stemming is the process of reducing inflected words to their stem, base, or root form, generally a written word form. The process of stemming is often called conflation.
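To make the idea concrete, the sketch below strips a few common English suffixes and groups words that share the resulting stem. It is a deliberately naive illustration, not the Porter stemmer or any other published algorithm; the suffix list, the minimum stem length of 3, and the names naive_stem and conflate are all assumptions.

```python
# Illustrative suffix list; a real stemmer uses far more careful rules.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def conflate(words):
    """Group words by their naive stem, i.e. conflate the variants."""
    groups = {}
    for w in words:
        groups.setdefault(naive_stem(w), []).append(w)
    return groups

print(conflate(["connect", "connected", "connecting", "connects"]))
# {'connect': ['connect', 'connected', 'connecting', 'connects']}
```

All four variants are conflated to the single stem "connect", so a query on any one of them could match documents containing the others.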
8. What is an invisible web?
Many dynamically generated sites are not indexable by search engines; this phenomenon is known as the invisible web.
9. Define Zipf’s law.
Zipf's law is an empirical rule that describes the frequency of words in a text. It states that the i-th most frequent word appears as many times as the most frequent one divided by i^θ, for some θ > 1.
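The rule can be written as a one-line function: the predicted count of the word at rank i is the count of the top-ranked word divided by i raised to the exponent. The sketch below only evaluates that formula; the function name zipf_frequency, the top count of 1000, and the exponent 2.0 are made-up values chosen for illustration, not measurements from a real text.

```python
def zipf_frequency(rank, top_frequency, theta):
    """Predicted count of the word at the given rank (rank 1 = most frequent)."""
    if rank < 1:
        raise ValueError("rank must be at least 1")
    return top_frequency / rank ** theta

# Hypothetical numbers: the most frequent word occurs 1000 times, exponent 2.0.
for rank in (1, 2, 4):
    print(rank, zipf_frequency(rank, 1000, 2.0))
# 1 1000.0
# 2 250.0
# 4 62.5
```

The predicted counts fall off quickly with rank, which is why a small number of very frequent words account for a large share of any text.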
10. What is open source software?
Open source software is software whose source code is available for modification or enhancement by anyone.
“Source code” is the part of software that most computer users don’t ever see; it’s the code computer programmers can manipulate to change how a piece of software—a “program” or “application”—works. Programmers who have access to a computer program’s source code can improve that program by adding features to it or fixing parts that don’t always work correctly.
