COGS 500 Computing for the Research Lab, Summer 2023
- This is a fully asynchronous class, but you still have weekly requirements and due dates!
- The class focuses on the concepts and skills you'll need to work in a university research lab, especially one that focuses on neuroimaging.
- If you encounter problems with the material, or think something could be clearer, there is a discussion forum on Discord. I'll use the discussion to provide feedback on your questions and issues.
- I am responsive to email (dkp@arizona.edu) if you get stuck and want a one-on-one consultation. We'll get you through it!
- If you encounter broken links or inaccurate material, please email me right away!
- This class makes use of the following technologies:
- D2L,
- OpenClass (this is an OpenClass lesson),
- Google Cloud Shell (for learning Unix, Docker, Conda, Git, and DataLad),
- Apporto Virtual Machines (for MS Access, Excel, and Matlab), and
- Discord (for discussions).
- There is an evolving D2L glossary, and a selection of cheatsheets to help you keep track of new terminology and concepts.
- In general, I want you to learn useful concepts. Don't worry about memorizing years, people's names, exactly how fast some cable is, etc. You can look those things up if you need them. Rather, focus on understanding concepts like bottlenecks, caches, permissions, bind-mounts, etc.
- Learning the concepts is important because the details are changing rapidly. Every year there will be new faster hardware and clever innovations in software. If you know the kinds of things that might change, then you'll be able to read up on the latest and greatest advancements to make well-informed decisions.
- I also want to ensure you get lots of practice with the Unix command line. You'll need to learn a handful of Unix commands and concepts like directory trees, permissions, absolute vs relative paths, and configuration files. You'll understand these better as you practice.
- If you encounter problems with the material, or think something could be more transparent:
- There is a discussion forum on Discord. Use it to identify problems and explore interesting issues.
- I'm responsive to email (dkp@arizona.edu) if you get stuck and want a one-on-one Zoom consultation.
- If you prefer texting (or need to call), you can reach me at 520-270-0491 (just not in the early morning, please).
- We'll get you through it!
Topics
The class covers 8 broad topics:
1) Hardware. If you'll be buying computers, you need to understand how they are different and what they offer. I'll explain the internal components of your computer (CPUs, RAM, Hard Drives, and motherboards), how to measure storage and performance, and how to identify cables. This should help you buy a computer for yourself or your lab, or purchase the correct cables. I'll also discuss the current state of computing and emerging trends, including GPGPUs and RISC chips. GPGPUs are an important tool for speeding up some processes. RISC chips will play an increasingly important role in the future.
2) Networking. Networking involves both hardware and software, and all of us increasingly rely on the network to access resources. I'll discuss networking concepts ranging from IP addresses to virtual private networks. You'll practice troubleshooting network problems on your own computer, something we all have to do from time to time.
3) System Software. You probably use Windows or Mac, or maybe Linux. You need to understand that the operating system you use has specific consequences for the tools you can install and how they behave. For example, to prepare a drive (e.g., hard drive or USB key) for files and directories, you must format it with the correct file system. If the drive is formatted as "bootable", then it can hold an operating system, and be used to start the computer.
The operating system provides the organizational structure and basic tools to access your files. Operating systems may be installed on physical or virtual drives. Later in the course, you'll use operating systems installed on virtual drives for the practices. These are called virtual machines: Google Cloud Shell instances are virtual machines, as are Apporto vCAT machines. Later, you'll also use Docker containers, which are similar to virtual machines but have an advantage in size, speed, and portability. All of these tools will be useful to you, and I want you to understand them in context.
4) Unix. A significant portion of scientific software is implemented on Unix operating systems and depends on the command line. The Unix operating system is usually a variety of Linux. You need to use Unix to work on university or national supercomputers. You'll get started using the Unix command line in Google Cloud Shell. Emphasis will be on basic commands, directory structure, permissions, and configuration.
5) Programming Concepts. Even if you don't intend to program, you'll have to look at code and understand it. I'll introduce basic programming concepts. I'll also explain open-source, package management systems like Conda, and programming notebooks like Jupyter. You'll have access to Conda on Google Cloud Shell and Jupyter on Google Colab. Conda and Jupyter are also available through the University's High-Performance Computing cluster, or you can download and install them for free.
We'll be almost halfway through the class at this point, and we'll turn our attention to data.
6) Informatics. Research funding increasingly requires you to abide by FAIR principles and to document your processing in detail, so that it can be repeated. This module emphasizes automation, provenance tracking, and reproducibility. Initially, I'll focus on the FAIR principles of data management, then the implementation of those principles with version control tools (Git and DataLad), and containers (Docker and Singularity). There will be practice with Git, DataLad, and Docker. These are the tools you'll need to improve your data management.
7) Spreadsheet Data. No matter what area of research you are in, you have spreadsheets of data, and you should understand how to clean and query that data. In this module, I'll compare flat and relational databases, and show you some great tools for data cleaning (OpenRefine, Tableau Prep) and visualization (Tableau Desktop). Using Apporto, you'll have practice with Excel (a flat database), and Microsoft Access (a relational database).
8) Digital Signal Processing (DSP). Much of your data will be some kind of digital signal, and you need to understand what that means and how to manipulate it. We'll finish the semester by discussing digitized data, especially images like those produced by the MRI scanner. You'll practice with Matlab, a programming environment that is especially good for handling such digital data. Matlab-based tools are used extensively in neuroimaging. Matlab is available through Apporto. In addition, the University has a permissive Matlab license that allows you to install Matlab on any machine you own, and Matlab is available on the High-Performance Computing cluster.
Week | Dates | Topics |
---|---|---|
1 | July 3 - July 9 | Overview, Hardware |
2 | July 10 - July 16 | OS, Networking, Unix |
3 | July 17 - July 23 | Unix |
4 | July 24 - July 30 | Programming, Conda |
5 | July 31 - August 6 | Jupyter, Informatics, Version Control, Docker |
6 | August 7 - August 13 | Docker, Spreadsheets and Databases |
7 | August 14 - August 18 | DSP |
Final project due August 18
D2L Modules
You can access all OpenClass assignments from D2L, or directly from OpenClass.
Grading Overview
Grading is based on completing the materials in a timely way.
- Lessons, practices, and reviews are ordered. Weekly materials are due before 9 pm on Sunday. Start early to avoid unexpected complications!
- OpenClass will award partial credit for late work:
- 90% for submissions within 24 hours
- 80% for submissions within 48 hours
- 50% for submissions later than 48 hours.
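The late-work tiers above can be sketched as a small Python function. This is only an illustration of the arithmetic, not OpenClass's actual implementation: the function names are hypothetical, and treating a submission at exactly 24 or 48 hours as still "within" the window is an assumption.

```python
def late_credit_fraction(hours_late: float) -> float:
    """Return the fraction of credit kept for a submission.

    Tiers follow the policy listed above. Counting exactly 24 or 48
    hours as "within" the window is an assumption of this sketch.
    """
    if hours_late <= 0:
        return 1.0  # on time: full credit
    if hours_late <= 24:
        return 0.9  # within 24 hours
    if hours_late <= 48:
        return 0.8  # within 48 hours
    return 0.5      # later than 48 hours


def awarded_points(points_earned: float, hours_late: float) -> float:
    """Scale the points earned by the late-credit fraction."""
    return points_earned * late_credit_fraction(hours_late)


if __name__ == "__main__":
    # Show how a 10-point assignment is scaled at different delays.
    for hours in (0, 5, 30, 72):
        print(f"{hours:>3} hours late: {awarded_points(10, hours):.1f} of 10 points")
```

For example, under these assumptions a 10-point assignment submitted 30 hours after the deadline would keep 80% of its credit.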
Discussions will continue to be available throughout the semester. The proposal and final project are required. The proposal must be submitted and approved before the final project (see the schedule above for due dates).
Discussions
~8% of your grade is for participating in discussions on Discord. See the instructions at the end of this lesson.
Final Project
- You must also complete a final project, which consists of two parts: a proposal (5 points) and a project (45 points). More details about the project can be found below, but feel free to email me and ask questions or propose possible topics.
OpenClass Assignments
514/611 points (~84% of your grade)
- The bulk of your grade is based on lessons, practices, and reviews. Expectations are roughly matched across weeks and should hover around 6 hours of work each week, though the time you need may vary. Let me know in the discussion if the time estimates are radically off!
Lesson and Practice Questions
- To get credit for completing the lessons, you must at least answer all questions.
- In almost all cases, to get full credit, you must achieve mastery (at least 80% correct). If you achieve less than 80% correct, your credit will be reduced accordingly. (Answers are counted as "correct" only if you get them right the first time you try.)
- Open-ended questions must be completed, but are not counted toward mastery. Lessons that only have open-ended questions are graded based on completion, rather than mastery.
- Each OpenClass assignment will include at least one open-ended exit question, the most common of which are these:
- Propose a topic for discussion based on what was difficult.
OR
- What did you learn in this practice?
- You must answer and share your response on Discord!
- Write in complete sentences and make sure your response is clear, relevant, focused, and useful.
- To ask a question or make a point that you don't want to share with the class, email me or indicate on OpenClass that part of your response is private. I will not expect the private portion of your response to be posted on Discord.
- At the end of each lesson is a TO DO section that reminds you of what tasks you need to complete to finish the lesson.
Lessons
- Lessons like this one are available in OpenClass. They consist of text, images, videos, and questions.
- In general, the text will introduce the video and include links and explicit spelling of new terms.
- The video will complement the text by explaining in a different way.
- Questions will spot-check your understanding as you move through the lesson, and will be very similar to questions in the reviews.
- Repetition of the material in different formats (e.g., text, video, questions, and later reviews) should help you to absorb it.
- The exit question, and sometimes additional questions, need to be contributed to the discussion. See the TO DO section at the end of each lesson to ensure you've done everything.
- Lessons in a group (e.g., a main lesson and separate practice) are released together so you can work on them in any order you prefer. But for the most part, assignments are sequenced.
- Feel free to take breaks: You can work through part of the material and pick up later where you left off (OpenClass will remember where you are).
🟢 Practice Lessons
- Many modules include practice lessons. The practices may be implemented as Google Docs (Download as a Word document to complete) or Google Cloud Shell tutorials (you'll learn about these) depending on the topic.
- Fill in the OpenClass practice questions to complete the assignment. Again, in most cases, you will be graded on mastery, and not just completion.
- Sometimes you'll also be asked to upload materials to D2L to fulfill the assignment. Check the TO DO section.
- Always answer the exit question and contribute it to the discussion.
🔴 Reviews
Reviews are also implemented in OpenClass.
- At reasonable intervals (~1-2 per week) you can expect to do a review.
- A review is a set of questions on OpenClass that assesses your mastery of the topics. It is more like an adaptive study session than a quiz. The OpenClass AI tracks your progress toward mastery.
- Reviews are cumulative by default, drawing questions from earlier reviews to promote spaced repetition. The prior questions drawn do not count toward the mastery progress of the current assignment.
- Reviews adapt based on the performance of each student. Some students may only have to work through a few questions in a review to demonstrate mastery, while other students may have to work through many. The mastery algorithm prioritizes consistency and considers the total number of questions for a learning objective. A learner who correctly answers two consecutive questions will see their mastery progress increase more rapidly than a learner who answers one question correctly, then one incorrectly, then one correctly. Additionally, a learning objective with seven questions will require learners to answer more questions correctly than a learning objective with two questions.
- The AI will decide how many questions you need to answer, so it is not possible to predict how long the review will take you. Nevertheless, I provide estimates based on past students' time to finish.
- You may share your review responses with your classmates, and see the responses of others as well:
Watch this ~2.25-minute YouTube video explaining the motivation for and benefits of sharing responses.
Discussion
50/611 points (~8% of your grade)
- Post your response to the exit questions for each lesson on Discord, and respond to messages from your classmates. Make sure you double-check the channel you are posting to!
- Including your responses to exit questions, I expect ~100 thoughtful written messages on Discord.
- About ¾ of your messages will be responses to the required questions (including the exit questions).
- I will do a final evaluation of your participation at the end of the semester.
Summary
In summary, you can expect the materials for each week to include an assortment of lessons, practices, discussions, and reviews. There will be redundancy! The bulk of your grade will depend on completing and mastering these materials.
Proposal and Final Project
- The project is an opportunity for you to take a deeper dive into a software topic discussed in class.
- It will be worth 50 points: 5 points for the proposal, and 45 points for the final project.
- Both the proposal and the final project are required for the class.
Proposal:
- The proposal should be ~1 page: Explain what you want to do and why it interests you. The proposal should also identify resources you will use or explore (websites, software, tutorials, readings). I will provide feedback about your proposal and I may redirect you a little.
- Don't hesitate to email me beforehand and tell me what you are considering. I'm happy to provide early feedback.
- You MUST have your proposal approved in order to have the final project accepted.
Final Project:
Focus on using software (installing it, running commands, etc.). This should be ~10 pages or a 40-60 minute slide deck.
- Report your personal journey in learning the software:
- The introduction should explain why you are interested in this topic.
- The body should identify useful steps and difficulties encountered along the way, their resolution (if any), your successes, and tips.
- The discussion should explain where you will go from here, what was useful, and what was disappointing about the software.
Example Project Topics and Related Resources
- Unix (e.g., find online tutorials and dig in)
- Git (e.g., GitKraken)
- DataLad (e.g., DataLad Handbook)
- Docker (e.g., Play with Docker)
- Organize a Project (e.g., apply naming principles, create a readme and changelog, etc.; see Data Management Resources: U of A)
- Learn Obsidian and use it to organize your research/studies (Obsidian)
- Tableau Desktop or Tableau Prep (Tableau Desktop Practicum and Data)
- OpenRefine (OpenRefine.org)
- REDCap (REDCap University of Arizona Training and Video Tutorials)
- Microsoft Excel
- Microsoft Access
- Matlab
- JupyterLab (How to Use Jupyter Notebook in 2020: A Beginner's Tutorial)
…and so much more!
Getting Help with Lessons
Email me if you need help with an assignment or have identified an issue. Provide enough details so I can actually help!
1) Which assignment/lesson is at issue?
2) Exactly which section of the assignment/lesson is of concern? (You might provide a section title if it is an OpenClass lesson, or a page number if it is a document. Consider copying and pasting relevant text so you can point out the issue exactly).
3) Provide detail about what was difficult. For example, was there a problem with the wording, and if so, what? Was there a broken link? Which one was broken?
Getting Help with Practice
When you have difficulty running commands or software, report the context in which you were working: the operating system, the directory you were in, and anything else that might be relevant.
- What operating system are you using?
- Describe the problem in a way that can be reproduced. For example, explain what directory you were in, exactly what command you ran, what the results were, and what you expected.
- Instead of sending screenshots (which are hard to read and hard to copy from), copy and paste the relevant text.
- If there are log files or a script, send those too and explain what they demonstrate about the problem.
In the future, when you report problems on GitHub or forums, remember these principles! Make it as clear as possible how you got the error. If you want an answer, provide the details necessary to reproduce the error.
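One way to gather these details as copy-and-paste text is sketched below in Python. It is a minimal illustration, not a required tool for the class: the function name `build_problem_report` and the example command are hypothetical, and it only uses Python's standard library to record the operating system, the working directory, the exact command, and what the command printed.

```python
import os
import platform
import shlex
import subprocess


def build_problem_report(command: str, expected: str) -> str:
    """Run `command` and return a plain-text report to paste into an email."""
    try:
        result = subprocess.run(
            shlex.split(command),  # split the command line into arguments
            capture_output=True,   # keep stdout and stderr instead of printing
            text=True,             # decode the output as text
        )
        exit_code, stdout, stderr = result.returncode, result.stdout, result.stderr
    except FileNotFoundError as error:
        # The program itself could not be found, so nothing was run.
        exit_code, stdout, stderr = "not run", "", str(error)

    lines = [
        f"Operating system: {platform.platform()}",
        f"Working directory: {os.getcwd()}",
        f"Command: {command}",
        f"Exit code: {exit_code}",
        "Standard output:",
        stdout.rstrip() or "(empty)",
        "Standard error:",
        stderr.rstrip() or "(empty)",
        f"Expected: {expected}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical failing command, used only to show the report layout.
    print(build_problem_report("ls no_such_directory", "a listing of the directory"))
```

Pasting text like this into an email lets the reader see exactly what you ran and where, which screenshots make difficult.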
Poor problem reports are so common that software sites may require error reports to follow a particular format! Below is an example of such a template from a GitHub site:
Describe the bug
A clear and concise description of what the bug is.
Samples to Reproduce
Steps & Examples to reproduce the behavior:
Expected behavior
A clear and concise description of what you expected to happen.
Example of an Excellent Software-related Error Report
Running qmviolin produces an HTML report and pngs but also the following error. It does not matter whether there is a specified report directory or not.
Environment:
University of Arizona HPC
Singularity image (pulled from Docker on 2022-03-17 via singularity pull docker://hickst/qmtools)
Call:
singularity exec /groups/adamraikes/singularity_images/qmtools_latest.sif qmviolin T1w fetched/ge_sample.tsv ../mriqc-0.15.1/T1w/group_T1w.tsv -r v0.15.1_violin -v
Error:
I think it's only affecting some side bar static images, but I'm not sure:
Example of a Poor Software-related Error Report
I am a beginner. I have been making mistakes according to the steps on the official website, and I don’t know where the problem occurred. I hope you can answer it for me. Thank you very much.
Resources
- Course Syllabus (also posted on D2L under Content => Overview)
- Contact Information: Dianne Patterson, Ph.D. dkp@arizona.edu
- Neuroimaging Certificate Program: If you are interested in pursuing neuroimaging research, consider this certificate program.
- Discord COGS 500
Digest
Stop and think about these topics
The final project
OpenClass
OpenClass lessons
OpenClass reviews
Discord Discussions
Google Cloud Shell
Topics
Hardware
Networking
File systems
Operating systems
Unix
Informatics
Version Control (Git and DataLad)
Containers: Docker and Singularity
Relational Databases
DSP: Digital Signal Processing
Matlab
Exit Question
TO DO Checklist
✅ Use this Invite Link to sign up for the Discord server: Discord COGS 500 server
- This is our discussion forum. Discord is available on the web and/or as an app.
- To sign up for Discord on the web page, scroll down to the signup button and set up an account.
- Discord will suggest you add paid features like Nitro. You can dismiss these suggestions.
- Likewise, the student hub is NOT necessary. You can dismiss it too.
On the Discord COGS 500 server:
✅ Share your response to the exit question on Discord in the GENERAL category under the class-overview channel.
✅ Introduce yourself to your classmates in the GENERAL category under the introduce-yourself channel:
Tell us who you are, what lab you work for, what kind of work you do in that lab, and something interesting about yourself.
✅ Read and respond to messages on Discord.
Please share your ideas about making this class better! You each have unique insights and perspectives. You each notice ambiguities in the text or questions. Sometimes you come up with great analogies that will help other students. This class improves every time I teach it because of feedback from all of you.