This is the directions document for Project 4 Autocomplete in CompSci 201 at Duke University, Spring 2024.
See the details document for information on using Git, starting the project, and more details about the project including information about the classes and concepts that are outlined briefly below. You'll absolutely need to read the information in the details document to understand how the classes in this project work independently and together. The details document also contains project-specific details, this document provides a high-level overview of the assignment.
For details, see the details document -- you'll find a high-level overview here. To complete the assignment you'll do the following, roughly in the order shown.
- Run
AutocompleteMainusingBruteAutoComplete(complete in the starter code) to see how the autocomplete application works. - Implement the
comparemethod in thePrefixComparatorclass that is used in theBinarySearchAutocompleteclass. Test withTestTerm. - Implement two methods in
BinarySearchLibrary:firstIndexandlastIndex, both of which will use thePrefixComparatoryou completed in the previous step. Test withTestBinarySearchLibrary. We recommend you try to finish these first three steps early. - Finish implementing
BinarySearchAutocompletethat extendsAutocompletorby completing thetopMatchesmethod. This will use thefirstIndexandlastIndexmethods you wrote in the previous step and the code inBruteAutocomplete.topMatchesas a model. Test withTestBinarySearchAutocompleteand runningAutocompleteMainusingBinarySearchAutocomplete. - Create and implement a new class
HashListAutocompletethat implements interfaceAutocompletor. Test by runningAutocompleteMainusingHashListAutocomplete. - Run benchmarks and answer analysis questions. Submit code, analysis.
When you fork and clone the project you'll be able to run the main method of AutocompleteMain. Doing so will launch a "GUI" (Graphical User Interface) that allows you to select a data file. The data file will determine the set of possible words to be recommended by the autocompleter application, and also includes weights for how common the words are. Several such files are included along with this project.
Once you select a file, the GUI will prompt you to enter a term. As you type, you should see the most common words that complete what you have typed so far appearing. For example, if you run AutocompleteMain and select the file words-333333.txt from the data folder you should see the output below shown in the GUI window for the search query auto. You'll use this same search term, auto to test the other implementations you develop. Refer back to this visual to see if your new classes and code work.
By default, AutocompleteMain is using BruteAutocomplete to find the correct words/weights (terms) to display. You will write two additional implementations of the Autocompleter interface: BinarySearchAutocomplete and HashListAutocomplete. When you finish one, you can again run AutocompleteMain using your new implementation by changing the AUTOCOMPLETOR_CLASS_NAME just before the main method of AutocompleteMain.
A PrefixComparator object is obtained by calling PrefixComparator.getComparator with an integer argument r, the size of the prefix for comparison purposes. The value is stored in the instance variable myPrefixSize as you'll see in the code. This class is used in BinarySearchAutocomplete, but not in BruteAutocomplete. See the details document for details on the class and method.
You can test your code with the TestTerm JUnit class which has several tests for the PrefixComparator class.
The class BinarySearchLibrary stores static utility methods used in the implementation of the BinarySearchAutocomplete class. You will need to implement two methods in particular: firstIndex and lastIndex. Both are variants on the Java API Collections.binarySearch(list, key, c) method that, in addition to returning an index dex such that c.compare(list.get(dex), key)==0, also guarantee to find the first or last such index respectively. Details on the methods you write are in the details document.
BinarySearchAutocomplete will use these methods along with the PrefixComparator you already completed to efficiently determine the range of possible completions of a given prefix of a word typed so far.
Once you've implemented the methods in class BinarySearchLibrary, you'll still need to implement code for topMatches in the BinarySearchAutocomplete class -- a method required as specified in the Autocompletor interface. The other methods in BinarySearchAutocomplete are written, though two rely on the code you implemented in BinarySearchLibrary.
There is a comment in the topMatches method indicating where you need to add more to complete the implementation. You can expand below for more details on the code already written in topMatches that you do not need to change.
Code in static methods firstIndexOf and lastIndexOf is written to use the API exported by BinarySearchLibrary. You'll see that the Term[] parameter to these methods is transformed to a List<Term> since that's the type of parameter that must be passed to BinarySearchLibrary methods.
You'll also see a Term object called dummy created from the String passed to topMatches. The weight for the Term doesn't matter since only the String field of the Term is used in firstIndex and lastIndex calls.
See the details document for more information on this class and method.
In this part, you will provide one more implementation of the Autocompletor interface, this time from scratch. Unlike BruteAutocomplete and BinarySearchAutocomplete, this third implementation will be based on the use of a HashMap instead of the binary search algorithm. This class will provide an O(1) implementation of topMatches --- with a tradeoff of requiring more memory.
A skeleton of HashListAutocomplete can be found in the HashListAutocomplete.java file that implements the Autocompletor interface. For details about the class and code you write see the details document.
You'll submit the analysis as a PDF separate from the code in Gradescope.
Question 1. Inside of BenchmarkForAutocomplete, uncomment the two other implementation names so that myCompletorNames has all three Strings: "BruteAutocomplete", "BinarySearchAutocomplete", and "HashListAutocomplete" (if you want to benchmark only a subset of these, perhaps because one isn't working, just leave it commented out).
Run BenchmarkForAutocomplete three times, once for each of the files in the Benchmark program: threeletterwords.txt, fourletterwords.txt, and alexa.txt. You can change which file is being used inside of the doMark method. Copy and paste all three results into your analysis. An example and detailed information about the output is described in the expandable section below.
On Professor Steiger's laptop, the first few lines are what's shown below for data/threeletterwords.txt (in addition, the sizeInBytes for the implementations are shown at the bottom). These numbers are for a file of every three letter word "aaa, "aab", … "zzy", "zzz", not actual words, but 3-character strings. All times are listed in seconds.
- The
init timedata shows how long it took to initialize the different implementations. - The
searchcolumn shows the prefix being used to search for autocompletions; unlabeled "search" is for an empty string""which matches on every term. - The
sizecolumn shows how many terms havesearchas a prefix. This is described asMearlier in part 4. #matchshows the number of highest weight results being returned bytopMatches. This is described askearlier in part 4.- The next three columns give the running time in seconds for
topMatcheswith the given parameters for the different implementations.
init time: 0.004612 for BruteAutocomplete
init time: 0.003348 for BinarySearchAutocomplete
init time: 0.03887 for HashListAutocomplete
search size #match BruteAutoc BinarySear HashListAu
17576 50 0.00191738 0.00306458 0.00001950
17576 50 0.00039575 0.00198267 0.00000546
a 676 50 0.00034438 0.00014479 0.00000942
a 676 50 0.00035567 0.00015113 0.00000350
b 676 50 0.00016033 0.00011954 0.00000292
Question 2. Let N be the total number of terms, let M be the number of terms that prefix-match a given search term (the size column above), and let k be the number of highest weight terms returned by topMatches (the #match column above). The runtime complexity of BruteAutocomplete is O(N + M log(k)). The runtime complexity of BinarySearchAutocomplete is O(log(N) + M log(k)). Yet you should notice (as seen in the example timing above) that BruteAutocomplete is similarly efficient or even slightly more efficient than BinarySearchAutocomplete on the empty search String "". Answer the following:
- Explain why this observation (that
BruteAutocompleteis similarly efficient or even slightly more efficient thanBinarySearchAutocompleteon the emptysearchString"") makes sense given the values ofNandM-- in other words, is this an anomaly based on small clocktimes or is there a reason for it given the values of N, M, and what will happen with BinarySearch when searching for the empty string?
Question 3. Run the BenchmarkForAutocomplete again using alexa.txt but doubling matchSize to 100 (matchSize is specified in the runAM method). Again copy and paste your results. Recall that matchSize determines k, the number of highest weight terms returned by topMatches (the #match column above). Do your data support the hypothesis that the dependence of the runtime on k is logarithmic for BruteAutocomplete and BinarySearchAutocomplete?
Question 4. Briefly explain why HashListAutocomplete is much more efficient in terms of the empirical runtime of topMatches, but uses more memory than the other Autocomplete implementations.
Push your code to Git. Do this often. Once you have run and tested your completed program locally:
- Submit your code on gradescope to the autograder.
- Submit a PDF to Gradescope in the separate Analysis assignment. Be sure to mark pages for the questions as explained in the Gradescope documentation here.
| Points | Grading Criteria |
|---|---|
| 4 | Code for Comparator |
| 8 | Code for BinarySearchLibrary firstIndex and lastIndex |
| 6 | Code for BinarySearchAutocomplete.topMatches |
| 9 | Code for HashListAutocomplete |
| 1 | API |
| 8 | Analysis |
As for all projects, we will scale the coding and analysis to be 80% and 20% of the overall project grade, respectively.
