|
Indexable phrases are either proper nouns or noun phrases of more than one word. The Resnikoff-Dolby 30:1 Rule [2] tells us how large the index should be. The most-cited terms in an index summarize the text's topic, and the distribution of citation counts among those terms tells us what sort of text it is: if only a few terms are highly cited, the text is an overview or anthology; if many are highly cited, it is an in-depth treatment of a single topic. |
Catholic(20) Germany(18) Ferdinand(17) Habsburg(13) Protestant(13) Wallenstein(11) Gustavus(9) Bohemia(9) Empire(9) Spain(9) Gustavus Adolphus(8) Maximilian(8) French(7) Spanish(7) Netherlands(7) Protestantism(6) northern Germany(6) France(6) Baltic(6) Europe(5) Sweden(5) Lutheran(5) Protestant princes(5) Swedes(5) Bohemian(5) Westphalia(4) southern Germany(4) Catholicism(4) imperial authority(3) Catholic League(2) Peace of Augsburg(2) central Europe(2) southern shores(2) Edict of Restitution(2) northern provinces(2) Treaty of Pyrenees(2) Treaty of Westphalia(2) northern kingdom(2) Catholic revival(2) Catholic Reformation(2) German Protestant princes(2)
search engine(86) Google(53) Web(33) PageRank(27) docID(19) search result(14) wordID(13) data structures(12) anchor text(12) Page(12) bill clinton(12) Bill(12) Clinton(12) quality search(9) Conference(9) information retrieval(9) high quality(8) commercial search(7) million pages(7) forward index(6) commercial search engines(6) Stanford(6) World Wide Web(6) link structure(6) document index(5) Repository(5) McBryan(5) link text(5) Figure(4) President(4) external meta information(4) ranking function(4) search engine technology(4) operating systems(4) anchor hits(4) Marchiori(4) quality search results(4) computer science(4) Lawrence Page(4) large-scale search(4) business model(4) cellular phone(4) World Wide Web Worm(3) word occurrence(3) high quality search(3) research tool(3) high precision(3) higher quality search(3) information retrieval systems(3) damping factor(3) search quality(3) Pinkerton(3) Sergey Brin(3) compact encoding(3) hash table(3) Inverted Index(3) storage space(3) large-scale search engine(3) link points(3) reasonable cost(3) Stanford University(3) full text(3) Hector Garcia-Molina(3) random surfer(3) high PageRank(3) great deal(2) single word(2) robots exclusion protocol(2) cooperative agreement(2) Driver Attention(2) random page(2) manipulating search engines(2) citation importance(2) Order to the Web(2) html version(2) plain hits(2) intuitive justification(2) Future Work(2) Text Retrieval Conference(2) search terms(2) additional information present(2) multiple file systems(2) hits occurring(2) full version(2) storage requirements(2) academic search engines(2) data mining(2) current version(2) large amount(2) extra words(2) file descriptors(2) hypertextual information(2) indexing phase(2) challenging task(2) capitalization information(2) name servers(2) HTML tags(2) structure present(2) forward barrels(2) reasonable number(2) Rajeev Motwani(2) major search engine(2) higher quality search results(2) large part(2) main goal(2) Bill Clinton Joke of the Day(2) existing systems(2) Terry Winograd(2) worth looking(2) anchors file(2) links database(2) query caching(2) indexer performs(2) broken link(2) huge amount(2) research interests(2) complex system(2) Santa Clara(2) multiple word(2) ranking system(2) designing Google(2) Google search engine(2) short barrel(2) National Science(2) indexing system(2) million words(2) relevant documents(2) fair amount(2) fancy hits(2) administration cost(2)
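The two listings above use a `term(count)` notation, so the "few versus many highly cited terms" comparison can be made mechanical. The following is a minimal sketch, not the author's tooling: the function names and the `min_count` cutoff are assumptions introduced here for illustration, and the text itself gives no numeric threshold for "highly cited".

```python
import re
from typing import List, Tuple

# One index entry looks like "Gustavus Adolphus(8)": a term, then its citation count.
ENTRY = re.compile(r"\s*([^()]+?)\((\d+)\)")


def parse_index(text: str) -> List[Tuple[str, int]]:
    """Turn 'term(count) term(count) ...' into a list of (term, count) pairs."""
    return [(m.group(1).strip(), int(m.group(2))) for m in ENTRY.finditer(text)]


def highly_cited(entries: List[Tuple[str, int]], min_count: int) -> List[str]:
    """Terms cited at least min_count times (min_count is an assumed cutoff)."""
    return [term for term, count in entries if count >= min_count]


def highly_cited_share(entries: List[Tuple[str, int]], min_count: int) -> float:
    """Fraction of index terms that clear the cutoff; 0.0 for an empty index."""
    if not entries:
        return 0.0
    return len(highly_cited(entries, min_count)) / len(entries)


if __name__ == "__main__":
    # A few entries copied from the first listing above.
    sample = "Catholic(20) Germany(18) Ferdinand(17) Gustavus Adolphus(8) Peace of Augsburg(2)"
    entries = parse_index(sample)
    print(highly_cited(entries, min_count=10))
    print(highly_cited_share(entries, min_count=10))
```

Running both full listings through `highly_cited_share` with the same cutoff gives one rough way to compare how concentrated their citations are; which share counts as "few" or "many" remains a judgment call.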
|
#1: search term = 'pyramid'. Note that 'pyramid' can refer to several things: tombs, a dietary guide ('the food pyramid'), or swindles ('pyramid schemes'). #2: search term = 'history'. The search term is quite broad. The heterogeneity of the collection shows in the rapid drop-off in the number of citations in the combined index: only 5 terms are mentioned in more than 4 documents. #3: search term = 'middle ages'. The search term is more specific. 9 terms are mentioned in more than 4 documents, showing the increasing homogeneity of the collection. #4: search term = 'thirty years war'. The search term is quite specific. 18 terms are mentioned in more than 4 documents. Interestingly, quite a number of the documents share nothing with the 17th-century conflict but the name. |
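The homogeneity measure used in these four cases (how many terms are mentioned in more than 4 documents of the result set) can be sketched as a document-frequency count over the per-document indexes. This is an illustration under assumptions: the function names are hypothetical, each document's index is taken to be a plain set of terms, and the toy documents in the usage example are made up rather than taken from the searches above.

```python
from collections import Counter
from typing import Iterable, List, Set


def document_frequency(doc_indexes: Iterable[Set[str]]) -> Counter:
    """Count, for every term, how many documents' indexes mention it."""
    df: Counter = Counter()
    for terms in doc_indexes:
        df.update(set(terms))  # a term counts once per document
    return df


def terms_in_more_than(df: Counter, n_docs: int) -> List[str]:
    """Terms mentioned in more than n_docs documents, most widespread first."""
    hits = [term for term, n in df.items() if n > n_docs]
    return sorted(hits, key=lambda term: (-df[term], term))


if __name__ == "__main__":
    # Made-up indexes for five documents returned by one search.
    docs = [
        {"Habsburg", "Bohemia"},
        {"Habsburg", "Sweden"},
        {"Habsburg", "Bohemia"},
        {"Habsburg"},
        {"Habsburg", "Bohemia"},
    ]
    df = document_frequency(docs)
    print(terms_in_more_than(df, 4))
```

A larger count at the same cutoff suggests a more homogeneous result set, which is the trend reported from 'history' to 'thirty years war'.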
university(11883) program(4778) science(4506) center(4381) student(3616) service(3325) research(3141) graduate student(2940) college(2896) education(2813) school(2541) library(2437) department(2375) united state(2268) new york(2237) state(2214) information(2039) office(1956) english(1856) california(1792) home page(1788) course(1749) faculty member(1738) web(1651)
UNIVERSITY: University of California(1330) University of Minnesota(1038) University of Michigan(781) State University(738) University of Chicago(684) University of Maryland(680) University of Iowa(672) Case Western Reserve University(656) Colorado State University(598) University of Arizona(566) Northwestern University(552) Boston University(549)
GRADUATE: graduate students(2940) Graduate School(1295) graduate program(1033) Graduate Studies(373) graduate courses(277) graduate study(249) graduate degree(175) Graduate College(167) graduate education(144) Graduate Assistant(135) graduate credit(131) Graduate Admissions(112)
STATE: United States(2268) State University(738) Colorado State University(598) Iowa State University(511) Penn State(496) Florida State University(443) Wayne State University(408) Kansas State University(404) Arizona State University(275) Ohio State University(246) Diego State University(232) Iowa State(207)
STUDENT: graduate students(2940) international students(634) undergraduate students(489) Student Affairs(470) Student Services(435) student organizations(423) transfer students(241) Student Life(238) Dean of Students(216) Prospective Students(190) medical students(188) doctoral students(176)
SCIENCE: Computer Science(1610) social sciences(984) Political Science(749) National Science Foundation(429) Biological Sciences(412) Health Sciences(323) Life Sciences(283) Information Science(246) Environmental Science(243) Materials Science(232) physical sciences(215) Natural Sciences(209)
SCHOOL: Graduate School(1295) high school(1017) School of Medicine(586) Law School(579) medical school(526) School of Law(381) School of Music(287) School of Education(215) School of Engineering(210) Business School(174) high school students(168) Eastman School of Music(159)
PROGRAM: graduate program(1033) degree program(735) Academic Programs(335) research program(269) Honors Program(266) undergraduate program(264) certificate program(245) training program(204) doctoral program(197) education programs(172) Program Director(146) International Programs(145)
INFORMATION: information technology(816) Information Systems(479) General Information(389) additional information(364) Contact Information(257) Information Science(246) information resources(149) Information Services(147) Geographic Information Systems(113) Information Center(113) information technologies(103) information session(91)
EDUCATION: higher education(775) College of Education(313) distance education(233) School of Education(215) special education(194) general education(193) Physical Education(191) medical education(183) education programs(172) general education requirements(169) teacher education(151) Health Education(147)
COURSE: course work(713) Course Description(538) graduate courses(277) core courses(234) course materials(188) course requirements(169) Course Schedule(161) course offerings(145) undergraduate courses(137) level courses(135) elective courses(127) online course(97)
COLLEGE: College of Engineering(434) College of Arts(420) community college(314) College of Education(313) Wellesley College(256) College Park(235) Graduate College(167) College of Medicine(162) Fellows of Harvard College(156) College of Business(151) College of Liberal Arts(137) Mills College(136)
RESEARCH: research project(616) research interests(417) Research Center(318) research program(269) Research Associate(190) research paper(177) Research Assistant(151) research group(148) research methods(138) undergraduate research(122) Operations Research(121) current research(120)
SERVICE: Student Services(435) Career Services(417) Health Services(311) public service(296) community service(248) Human Services(177) support services(167) Computing Services(149) Information Services(147) Student Health Service(141) food service(141) Dining Services(126)
CENTER: Medical Center(550) Research Center(318) Career Center(253) Resource Center(222) Health Sciences Center(151) Student Center(145) Counseling Center(145) Cancer Center(131) Health Center(117) Information Center(113) Learning Center(113) Computing Center(105)
SYSTEM: Information Systems(479) operating system(448) Davis Health System(163) computer system(161) solar system(148) Health System(134) nervous system(113) Geographic Information Systems(113) system administrator(89) Systems Engineering(85) control systems(82) immune system(76)
DEPARTMENT: U. S. Department(411) department head(249) department chair(220) academic department(142) Computer Science Department(124) Department of Mathematics(117) Department of Education(116) Department of Chemistry(114) Department of Physics(108) English Department(107) Police Department(105) Department of Computer Science(96)
OFFICE: office hours(501) Office of Admissions(180) Office of the Registrar(166) Registrar's Office(137) Dean's Office(125) Admissions Office(123) Office of Research(118) Office of the University Registrar(111) Financial Aid Office(94) News Office(86) Dean of Students Office(80) Study Abroad Office(78)
LIBRARY: University Library(178) Law Library(159) Library of Congress(132) Library Catalog(122) Digital Library(98) Science Library(96) Music Library(90) Main Library(89) Engineering Library(81) Library Services(77) library staff(77) Library Resources(76)
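The grouped listing above files each multi-word phrase under every single word it contains, so 'Graduate School' appears under both GRADUATE and SCHOOL, and keeps 12 phrases per word. A minimal sketch of that grouping follows, under loudly stated assumptions: `normalize` is a crude stand-in for whatever stemming the original used (the source does not say), the function names are hypothetical, and function words such as 'of' are not filtered out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Phrase = Tuple[str, int]  # (phrase, citation count)


def normalize(word: str) -> str:
    """Assumed normalization: lowercase and drop a single plural 's'."""
    w = word.lower()
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w


def group_by_word(phrases: Iterable[Phrase], top_n: int = 12) -> Dict[str, List[Phrase]]:
    """File each phrase under every distinct word it contains, highest count first."""
    groups: Dict[str, List[Phrase]] = defaultdict(list)
    for phrase, count in phrases:
        for word in {normalize(w) for w in phrase.split()}:
            groups[word].append((phrase, count))
    # Keep only the top_n most-cited phrases per word, as the listing does.
    return {
        word: sorted(items, key=lambda pc: -pc[1])[:top_n]
        for word, items in groups.items()
    }


if __name__ == "__main__":
    # Entries copied from the listing above.
    sample = [("graduate students", 2940), ("Graduate School", 1295), ("high school", 1017)]
    groups = group_by_word(sample)
    print(groups["school"])
    print(groups["student"])
```

A real stemmer and a stop-word list would be needed before the groups resemble the listing closely; the sketch only shows the shape of the data structure.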