1. Goals of this assignment
1. Tools for Completing This Assignment
2. Specifications
1. Accepting a File for Spell Checking
2. Deciding How to Fix Misspelled Words
3. Recommending Replacements
4. Class Design
5. Testing
3. Submission
5. Appendix

# Goals of this assignment

You’re going to build a simple spell checker program in this assignment. The spell checker will:

1. Identify misspelled words in a file by checking words against a provided dictionary.
2. Prompt the user with a number of ways to fix the misspelled word.
3. Write a new file with the corrected text.

## Tools for Completing This Assignment

• File I/O
• Loops & Iteration
• Getting User Input from a Scanner
• String Formatting
• Finding the top N elements from an ArrayList for small N
• Git
• Unit Testing

### Collaboration Policy

You must collaborate with one other student on this assignment. You may write code with this one other student, and you may view each other’s code. You cannot write or read code with other students.

Please include a README.txt file in your submission that lists the partner you worked with, as well as a link to the private GitHub repo that you’re using to host your code. We will be able to determine who was responsible for what code by reading the commits from your Git repo. To that end, you must make some small number of commits as you work on the project. Proper collaboration practices will count for 5% of your grade.

### Style & Design Policy

Good style includes:

• Consistent indentation in code, with lines of code indented to the correct level representing the block they belong to.
• Identifiers for methods and variables that follow the correct camel case convention and are appropriately descriptive.
• Correct spacing between operators, parentheses, and brackets.

Fortunately, most good style practices can be automatically enforced using Eclipse. From Preferences, choose Java -> Editor -> Save Actions and check the Format source code box under the Perform the selected actions on save option.

Additionally, you’ll need to make good design choices for this assignment. We won’t be too strict, but some things to look out for are:

• Declaring variables only in the scopes where they’re needed and avoiding the creation of unnecessary fields
• Fields should be set to private with access moderated by getter and setter methods
• Classes should have clear responsibilities and no one class should represent multiple entities of a system.
• Closing File Streams after using them

Style and design scoring will consist of 10% of the assignment grade.

# Specifications

Access the starter files here.

## Accepting a File for Spell Checking

### Inputting Dictionary

When the program starts, prompt the user to enter a filename that refers to the dictionary that will be used for spellchecking.

If the provided filename references a file that does indeed exist, then print a message that you’ll use that dictionary. If the file doesn’t exist, or there is some error in opening it, then you should repeatedly prompt the user to provide a new filename until they provide one that can be successfully opened.

Consider the following example output to see how your program should behave. Lines starting with >> indicate that the text was entered by the user, not by the program.

Please enter the name of a file to use as a dictionary.
>> missingDict.txt
There was an error in opening that file.
Please enter the name of a file to use as a dictionary.
>> missingDict2.txt
There was an error in opening that file.
Please enter the name of a file to use as a dictionary.
>> validDictionary.txt
Using the dictionary at 'validDictionary.txt'.


You can inspect the provided file called engDictionary.txt to see what a dictionary will look like. This file represents the English dictionary containing correctly spelled words, one per line. The file that the user provides for spell checking will have each of its words checked against a dictionary that matches this format. Be cautious–we will test your program using files that are not engDictionary.txt. You can assume that if the dictionary file exists, it is correctly formatted.

### Inputting File to Spellcheck

Once you have a valid dictionary file, prompt the user to enter a filename that will be spellchecked.

If the provided filename references a file that does indeed exist, then you should print that you’re going to output the spell checked text in a file called INPUT-FILE-NAME_chk.txt. That is, fileName.txt should have its checked output go in fileName_chk.txt.

If the provided filename does not reference a file that exists, or if there is any other error related to opening the file, repeatedly prompt the user to provide a new filename until they provide one that can be successfully opened.

Consider the following example output to see how your program should behave.

Please enter the name of a file to be spell checked.
>> missingFile.txt
There was an error in opening that file.
Please enter the name of a file to be spell checked.
>> missingFile2.txt
There was an error in opening that file.
Please enter the name of a file to be spell checked.
>> validFile.txt
Spell checking for 'validFile.txt' will be output in 'validFile_chk.txt'.


When the user provides a file, you can assume that words are delimited by spacebar characters. Also, you can assume that all of the words in the file are lowercase and that the file contains no punctuation. The file may or may not contain line breaks; they should be treated like spaces and interpreted as word boundaries. For example, the user’s file might look like the following:

the old lady pulled her spectacles down and looked over them about the room then she put them up and looked out under them she seldom or never looked through them for so small a thing as a boy they were her state pair the pride of her heart and were built for style not service she could have seen through a pair of stove lids just as well she looked perplexed for a moment and then said not fiercely but still loud enough for the furniture to hear

## Deciding How to Fix Misspelled Words

If a word from the user’s file is not present exactly in the provided dictionary file, then you can assume that word is misspelled. For each misspelled word, the user will be prompted to fix or ignore the typo. The options presented to the user will be chosen among the following:

• The user can type the letter r to indicate that they want to replace the word with one of the words provided as a suggested spelling. Then they will separately enter a number to select one of the candidate suggestions. The misspelled word in the input file will be replaced in the output file with the chosen corrected spelling.
• The user can type the letter a to leave the misspelled word unchanged.
• The user can type the letter t to choose to provide the new spelling themselves. After entering t, the user will type the word that they want to use as replacement.

Typical interaction is modeled by the following examples. Lines starting with >> indicate that the text was entered by the user, not by the program.

The word 'morbit' is misspelled.
The following suggestions are available:
1. 'morbid'
2. 'orbit'
Press 'r' to replace, 'a' to accept, and 't' to enter a replacement manually.
>> r
Your word will be replaced with the suggestion you choose.
Enter the number corresponding to the word that you want to use for replacement.
>> 1


After this example, the output file will have the word "morbit" replaced with "morbid", which had been identified as the #1 suggestion.

The word 'automagically' is misspelled.
The following suggestions are available:
1. 'automatically'
Press 'r' to replace, 'a' to accept, and 't' to enter a replacement manually.
>> a


After this example, the output file will keep the word "automagically" spelled exactly this way.

The word 'ewook' is misspelled.
The following suggestions are available:
1. 'wok'
2. 'woo'
3. 'awoke'
Press 'r' to replace, 'a' to accept, and 't' to enter a replacement manually.
>> t
Please type the word that will be used as the replacement in the output file.
>> ewok


After this example, the output file will replace this instance of "ewook" with the word "ewok".

It may be the case that the spell checker cannot come up with a meaningful suggestions at all. This is a special case that should be handled carefully.

The word 'sleepyyyyyyyyy' is misspelled.
There are no suggestions in our dictionary for this word.
Press 'a' to accept, or press 't' to enter a replacement manually.
>> t
Please type the word that will be used as the replacement in the output file.
>> sleepy


If the user chooses an option other than a, t, or r (when it’s available), you should display an error message like so:

The word 'sleepyyyyyyyyy' is misspelled.
There are no suggestions in our dictionary for this word.
Press 'a' to accept, or press 't' to enter a replacement manually.
>> dfsjk
Please choose one of the valid options.
>> r
Please choose one of the valid options.
>> x
Please choose one of the valid options.
// etc.


For the sake of consistency, you’re required to match each of these previously identified text options exactly. To that end, we provide you the file Util.java, which contains format Strings that you can use directly in your code. For example, Util.MISSPELL_NOTIFICATION is "The word '%s' is misspelled.". To display the correct message, you can simply call System.out.printf(Util.MISSPELL_NOTIFICATION, misspelledWord) and the output will appear exactly matching the specification. This assignment is partially autograded, so failure to match the outputs exactly can result in serious deductions.

## Recommending Replacements

We provide the skeleton of a WordRecommender class for you to help define how we would like you to identify and rank possible replacements for misspelled words.

### Constructor

The constructor for the WordRecommender should take in a single String representing the filename of the dictionary file that the WordRecommender will use to make recommendations. You should not change the signature of the provided constructor, but you can write the body of the constructor however makes sense to you. Just be aware that a WordRecommender should be able to accept any dictionary file name, not just the default we provide to you.

### Identifying Replacements

public ArrayList<String> getWordSuggestions(String word,
int tolerance,
double commonPercent,
int topN)


Given an incorrect word, return a ArrayList<String> of legal word suggestions. You can assume that the method will only be called with words that are not already present in the dictionary; that is, word will not be found inside of the dictionary file used to instantiate this WordRecommender. commonPercent will be a double between 0.0 and 1.0, corresponding to 0% and 100%, respectively.

When using the WordRecommender object in your overall program, you should call getWordSuggestions with a tolerance of 2, common percentage minimum of 50%, and a “top N” of 4.

A String candidate is a valid candidate for replacing word if both:

1. the difference in length between candidate and word is at most tolerance characters.
2. candidate and word have at least commonPercent % of their characters in common.

To calculate the percentage of characters in common between two Strings a and b, you should create a set of the letters in a called aSet and a set of the letters in b called bSet. Remember that a set does not contain duplicates! The percentage of characters in common between a and b is:

# of characters in aSet AND in bSet / # of characters in aSet OR in bSet


Equivalently, that’s |aSet ∩ bSet| / |aSet ∪ bSet|. You can find examples of this calculation at the bottom of this document in the appendix.

The ArrayList<String> that you return should be contain the top topN replacement candidates sorted in increasing order of “left-right similarity” to the misspelled word. The following section has details on how to compute this similarity metric.

You are not permitted to use the Collections.sort method in order to accomplish this ordering. This is inefficient when you’re trying to get the top N words from a list if N is much smaller than the size of the list. Instead, you’ll have to think about how you get the top N words using your own algorithm. Using Collections.sort will be immediate grounds for failure on the assignment! Don’t use it!

To get you started, think about getting the candidate that is the most similar to the misspelled word. Then, get the candidate that is the second-most similar. Then the third-most. Repeat until the N-most similar, and you’re done.

### Ranking Replacements

public double getSimilarity(String word1, String word2)


Once we have identified all of the replacements for a particular spelling error, then we need to rank them in a particular way so that the most “similar” word in our dictionary to the misspelled word is the one that we recommend first, followed by the second-most “similar” word, and so on.

We’ll use a very basic notion of similarity in order to rank replacements called “left-right similarity”. Left-right similarity itself is computed as the average of the “left similarity” and the “right similarity” between two words.

Left similarity refers to the number of common characters between two Strings when we left-align both Strings and compare characters in all positions. For example:

WORD A:    o b l i g e
WORD B:    o b l i v i o n
left sim.: 1+1+1+1+0+0+0+0 = 4


Right similarity refers to the number of common characters between two Strings when we right-align both Strings and compare characters in all positions. For example:

WORD A:         o b l i g e
WORD B:     o b l i v i o n
right sim.: 0+0+0+0+0+1+0+0 = 1


Therefore, for the two words oblige and oblivion, the left-right similarity is (4.0 + 1.0) / 2 = 2.5. You can find an additional example of this calculation at the bottom of this document in the appendix.

## Class Design

We will test and run many features of your submission, but it’s important that your program runs using the following SpellCheckerRunner.java class:

SpellCheckerRunner relies on certain features of a SpellChecker class. A skeleton of this class has also been provided to you, and while you can add any method or field that you like, you should preserve the provided signatures exactly. You can modify the bodies of the provided method and constuctor as you like.

## Testing

You should test your program as much as you possibly can. At a bare minimum, write a comprehensive test suite for WordRecommender.java, testing both of the methods that you’re required to fill out as well as any additional methods you write. Write your tests for WordRecommender in WordRecommenderTest.java.

You should also write other unit tests for the other classes you’ll write. Submit these as well. You don’t need to write unit tests for SpellCheckerRunner or SpellChecker.start(), although you can if you would like to. Thorough testing of WordRecommender as well as your custom classes contribute 25% of your grade for this assignment.

# Submission

Submit at least the following files: README.txt, WordRecommender.java, WordRecommenderTest.java, SpellCheckerRunner.java, and SpellChecker.java. You are likely to have other files as well–if you design other classes and write other test suites, be sure to submit them as well. Also, submit a screenshot of your GitHub repo proving that you managed to collaborate with a partner and make some commits.

Ensure that your files do not have a package header! Specifically, the first lines of your file should not read package xyz or similar.

Please find these files locally on your file system. By default, they will be found in a folder called eclipse_workspace, often in your user’s directory. You should select each of the files and upload them as files (not as a directory, and preferably not as a .zip either). If you need help, please reach out on Piazza or in Office Hours.

This assignment is weighted equally with the other ones, but it’s listed out of 100 points for easier accounting.

Rubric Section Weight
Spell Checker Correctness 60 pts
Unit Testing 25 pts
GitHub Collaboration 5 pts
Style 5 pts
Design Choices 5 pts

# Appendix

## Hints for Getting Started

You don’t have too many design decisions to make in this assignment, but there will be a few.

Consider starting by writing tests for WordRecommender and implementing its methods. Once you understand what’s required there, you should have a good idea about how to use those methods.

Make sure you know where you’d like to interact with the user, how you’ll read from the input file, and how you’ll write to the output file.

## Calculating Characters in Common

If a = committee and b = comet, then aSet = {c, o, m, i, t, e} and bSet = {c, o, m, e, t}. The number of letters in both sets is 5 (c, o, m, e, and t). The number of letters in either set is 6 (c, o, m, e, t, i). Therefore, the percentage of characters in common is 5.0/6.0.

If a = gardener and b = nerdier, then aSet = {g, a, r, d, e, n} and bSet = {n, e, r, d, i}. The number of letters in both sets is 4 (n, e, r, d). The number of letters in either set is 7 (g, a, r, d, e, n, i). Therefore, the percentage of characters in common is 4.0/7.0.

## Calculating Left-Right Similarity

Consider the two words aghast and gross. We first need to compute the left similarity:

WORD A:    a g h a s t
WORD B:    g r o s s
left sim.: 0+0+0+0+1+0 = 1


Then we compute the right similarity:

WORD A:     a g h a s t
WORD B:       g r o s s
right sim.: 0+1+0+0+1+0 = 2


Therefore, the left-right similarity is (1.0 + 2.0) / 2 = 1.5.

## Example of a Full Execution

Please enter the name of a file to use as a dictionary.
>> missingDict2.txt
There was an error in opening that file.
Please enter the name of a file to use as a dictionary.
>> validDictionary.txt
Using the dictionary at 'validDictionary.txt'.
Please enter the name of a file to be spell checked.
>> missingFile2.txt
There was an error in opening that file.
Please enter the name of a file to be spell checked.
>> validFile.txt
Spell checking for 'validFile.txt' will be output in 'validFile_chk.txt'.
The word 'morbit' is misspelled.
The following suggestions are available:
1. 'morbid'
2. 'orbit'
Press 'r' to replace, 'a' to accept, and 't' to enter a replacement manually.
>> r
Your word will be replaced with the suggestion you choose.
Enter the number corresponding to the word that you want to use for replacement.
>> 1
The word 'ewook' is misspelled.
The following suggestions are available:
1. 'wok'
2. 'woo'
3. 'awoke'
Press 'r' to replace, 'a' to accept, and 't' to enter a replacement manually.
>> t
Please type the word that will be used as the replacement in the output file.
>> ewok