Services


If you have a question that is not covered by one of the other blog postings, please post it here. Please be sure to leave an email address where we can reach you, and we’ll respond to you directly as soon as possible.

Google Apps at USF

After an extensive evaluation headed by Student Government, the University of South Florida and Google have entered a partnership to bring Google Apps to all USF students. Google Apps is an exciting new system that will provide us with email, as well as a suite of other Google products that will enable USF students to better communicate, share, and collaborate.

To sign up, visit http://mail.usf.edu

Don’t worry, your email address won’t change, and you won’t lose your old messages! The nice thing about Google Apps is that everyone will be able to keep the same email address and we’ll migrate the mail from your existing email account to Google Apps. This means that you can search and access all mail in one central location.

Here are some of the other key benefits:

  • 6 Gigabytes of email storage — No more worries about having to delete mail.
  • Instant messaging & voice calls — Connect instantly with others when you have a question.
  • Google Calendar — Schedule meetings, book rooms, create events, and share calendars with others.
  • Docs and Spreadsheets — Create and collaborate in real-time with others at your location, and across the globe.
  • Start Page — Start off the day on the right foot by adding your favorite gadgets to your personalized start page.
  • Access your information anywhere, anytime — All content is available online. You can also access your email with a mobile device.
  • Reliability — Google Apps products are built with speed and reliability in mind.

You can use these URLs to access the service once you’ve registered:

As part of USF’s ongoing efforts against copyright infringement, the University has signed an agreement with Ruckus, a digital entertainment network designed to provide students, faculty, and staff with a legal, safe, and community-based way to explore and share music and movies. For more information on how to use this service, please visit http://www.ruckus.com.

Signing up is simple: all you need is a USF email address.

Mail services will be unavailable from 3 A.M. to 6 A.M. on Tuesday, August 7th.

As of 5:40 A.M., mail services are restored.

Mail services have been down since 3:50 A.M. The services are being switched to the failover server and should be back up shortly. No mail should have been lost during this time.

-Chance Gray 5:45 A.M. 2007/07/27

UPDATE: Mail services were restored at 6:10 A.M.

Both services have been affected. We are working to restore them. More details within the hour.

Update: my.usf is online again. The filesystem for the mail services is still being checked. It will be 10+ hours before that filesystem is available again. We are going to attempt a restore elsewhere as that may be quicker.

ETA: 3 A.M. The filesystem check completed, and when the system was brought online it immediately crashed. Due to the nature of the crash (write errors), we deem it prudent to relocate the data once again. We have since run multiple filesystem checks (filesystem checks on a relatively clean filesystem are much quicker) until no errors were present. We have mounted the filesystem in read-only mode and are copying the data to its new home. We estimate the email system should be available again by 3 A.M. We appreciate your patience during these trying times. The situation will improve.

The email system is back online as of approximately 2 A.M. Since we are running from a copy of a damaged filesystem, it is likely that some files (primarily those active during the crash) will be affected. Most likely those will be files in the queue waiting for local or remote delivery. So far it looks like those are bulk email and spam. We will be monitoring the logs as always to pick up on those files that are having problems. It is possible that your account could be affected without any errors showing up in the logs, so please post here any problems that you are seeing with your accounts and we will remedy them ASAP.

I suspect that the precautions we have taken to copy the mail data to another filesystem will result in a more reliable solution than trying to run off of the recovered filesystem. Though this resulted in much greater delays, there should be a lower incidence of residual effects compared to previous recoveries.

What else can I say besides I am sorry, and I understand your frustration. I believe one of the bloggers already said this, but we do what we can with the hand we’re dealt. There are other issues to consider, besides our intellectual capacity (or lack thereof, as some of you claim). There are budgetary issues. There are visibility issues. There’s always the “oh, this will never happen again” factor that seems to plague upper management at times.

Choices to be made

The current situation boils down to lack of funds. Some limited funds are available, but where do you apply them? I say limited because currently my budget is basically used up paying for yearly hardware and software contracts. When the end of FY 05/06 came along in June, we scraped up what was left and purchased some much-needed machines to handle the huge amount of mail we receive, between 400 and 3,000 messages per minute. Keep in mind that these are ONLY the incoming messages, not the outgoing. Incoming messages are also scanned for viruses and spam. You can check out the stats here. Without those new machines we would probably be in a slightly modified version of where we are right now. The old boxes simply could not handle the load, and would crash.

I am sure someone is thinking “well, they should plan ahead and budget for end-of-life after 3 years.” You don’t understand: there is NO EOL budget, because there is no surplus available. All the “special projects” money we receive is one-time stuff. Is it right? No, it is not, but that’s the way it is and that’s what we have to work with.

There are also other places to spend the end of year money, plenty of places. UPS’s need to be replaced. Generators to handle extended power failures. Additional power to handle new machines. Air conditioning units need to be revamped. And on average, at the end of the year, after all the bills are paid, we’re left with about $5k to $10k.

Bad timing

The unfortunate thing was the timing. About 2-3 weeks ago we ordered a new storage solution that would replace the current one. I had to cancel some of the software licenses, because I did not have enough funds to renew them, and the difference freed up some funds for the purchase of a new server/backend storage solution. Not redundancy yet, mind you. That will have to wait until the end of the fiscal year. Since last year we have not entirely trusted the mail storage we’ve had (for obvious reasons), and we were all eager to replace it. Unfortunately, the new unit did not arrive in time.

Open thread

Ultimately, I am responsible for what is happening. The final decision on where to spend the money, what to do, and when to do it is mine. I will keep this thread open and will be checking on it through the next couple of days, answering any questions or comments on how this was handled. Also, feel free to email the Director of Academic Computing, Dr. Llewellyn, tony@usf.edu, with any issues as to the termination of my employment, but please know that both Chance and Eric ARE doing all they can and working very hard to get things back up as soon as possible. I have been working on campus for 12 years and I can honestly say there are few people I know who can match their expertise and dedication. Everything that can be done will be done.

And now, fire away.

Updates

  • What does the storage look like? We should have more redundancy.

Well, here is a cut and paste of the description of the StorEdge 3500: “Sun StorEdge 3500 RAID systems rank among the fastest and highest-reliability RAID systems in existence. Sun StorEdge 3500 are compact, rack-optimized RAID storage solutions with end-to-end 2Gbit Fibre Channel technology supporting both storage area network (SAN) and direct-attached storage (DAS) architectures. Features: single/dual RAID controllers; 12 disk drive bays; 5 or 12 disk drives; dual power supplies; 6 2Gbit host ports; 2U (3.5 inches high) enclosure. Sun StorEdge 3500 delivers up to 160,000 transactions per second from cache. It provides industry-leading 99.9998+ percent uptime and redundant, hot-swappable components including drives, RAID controllers, power supplies, fans event monitoring units, and battery-backed cache memory to prevent data loss.” Sounds great on paper, doesn’t it? And, in all fairness, it did work fine for 2-3 years, until last winter.
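To put that uptime figure in perspective, here is a quick back-of-the-envelope calculation. It is only a sketch: it takes the brochure's 99.9998 percent at face value, ignores leap years, and the function name is made up for illustration.

```python
# Rough arithmetic: how much downtime per year does a quoted availability allow?
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes per year a system may be down at the given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


downtime = allowed_downtime_minutes(99.9998)
print(f"{downtime:.2f} minutes of downtime per year")
```

In other words, the advertised figure allows for roughly one minute of downtime a year, which is a long way from what we have been through.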

  • I don’t understand why all the user accounts go out at once, are they all stored on a single very large hard drive?

They look like one drive (one partition) to the two servers attached to it, but the StorEdge is a hardware RAID device. That much has worked flawlessly: drives on the array have died before and you guys never noticed. The array warned us, we called Sun and got the dead drive replaced without impact to the users, besides a slight performance degradation.

  • And is this the same drive that died only 1 year ago over winter break?

The StorEdge has a few hard drives, redundant power supplies, and redundant controllers. What has failed (both times) were the controllers, not the drives. When one of the controllers fails, the appliance is “supposed” to notify us and start operating from the other controller. Both back in December and now, controllers failed and we received no warning beyond the machine totally crashing on us.

The controllers “control” the data into and out of the hard drives themselves. Not only did the controllers die, but a good chunk of the inbound and outbound data corrupted the info on the disks. As a result, when we brought the servers back up, all the data was poisoned and thus unusable. The details will come out in the investigation we will do ‘after’ we are done retrieving everything from tape.

  • Also seems strange that the outages always occur on holidays, why is that?

A well known principle called “Murphy’s Law.” I wish I knew why.

  • Even if there is an outage, it should not take this long to restore backups, they must be using some primitive slow equipment. 40,000 accounts at 50 MB each should be about 2 TB of data. I ran a backup system before that could back up or restore that much data in 24 hrs but we were using external hard drives, not tapes. They really need to upgrade their equipment.

Correct. Think of it as listening to a tape — I know, many of you may not have done it ever. We basically have to listen to all songs. No way to speed things up. We cancelled all backup jobs since Tuesday and are using 4 drives to read tapes.

First, the drive has to read what is known as the “Full Backup.” After that is done, it starts reading the “Incremental,” the files that have changed since the “Full” was run. All that takes time.
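For the curious, the order of operations looks roughly like this. This is a simplified sketch, not our actual backup software: the `BackupSet` class, the `restore` function, and the sample dates and files are all hypothetical, and it does not handle files that were deleted between backups.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class BackupSet:
    kind: str        # "full" or "incremental"
    taken_on: date   # when the backup ran
    files: dict      # path -> contents captured in this set


def restore(backup_sets):
    """Replay the newest full backup, then every later incremental in date order."""
    fulls = [b for b in backup_sets if b.kind == "full"]
    if not fulls:
        raise ValueError("cannot restore without a full backup")
    base = max(fulls, key=lambda b: b.taken_on)
    later = sorted(
        (b for b in backup_sets
         if b.kind == "incremental" and b.taken_on > base.taken_on),
        key=lambda b: b.taken_on,
    )
    restored = dict(base.files)
    for inc in later:
        restored.update(inc.files)  # newer copies overwrite older ones
    return restored


# Hypothetical example: one full backup followed by two incrementals.
sets = [
    BackupSet("full", date(2007, 1, 1), {"inbox/1": "v1", "inbox/2": "v1"}),
    BackupSet("incremental", date(2007, 1, 2), {"inbox/2": "v2"}),
    BackupSet("incremental", date(2007, 1, 3), {"inbox/3": "v1"}),
]
result = restore(sets)
print(result)
```

The point is that nothing from an incremental can be applied until the full backup it builds on has been read, so the steps cannot be reordered to save time.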

Yes, in the past 5 years drives have become cheaper and they are definitely faster to use. A solution to replace our existing system has been proposed for the past, hmmm, 2 years, but funding is yet to be allocated. We are all painfully aware of the consequences of not being able to update our backup system.
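The numbers in the question can be checked with some quick arithmetic. This sketch uses the commenter's figures (40,000 accounts at 50 MB each, a 24-hour window) and the 4 tape drives mentioned above; it assumes every account is full and that the work splits evenly across drives, neither of which is guaranteed.

```python
# Back-of-the-envelope restore throughput, using the figures from the thread.
ACCOUNTS = 40_000
MB_PER_ACCOUNT = 50        # commenter's estimate, not measured usage
TAPE_DRIVES = 4
WINDOW_HOURS = 24

total_mb = ACCOUNTS * MB_PER_ACCOUNT                  # 2,000,000 MB
total_tb = total_mb / 1_000_000                       # decimal terabytes
required_mb_per_s = total_mb / (WINDOW_HOURS * 3600)  # sustained rate needed
per_drive_mb_per_s = required_mb_per_s / TAPE_DRIVES

print(f"Total data: {total_tb:.1f} TB")
print(f"Needed overall: {required_mb_per_s:.1f} MB/s")
print(f"Needed per drive: {per_drive_mb_per_s:.1f} MB/s")
```

That works out to a bit over 23 MB/s sustained overall, or close to 6 MB/s per drive, around the clock for a full day, and that is before counting the time spent reading incrementals on top of the full backup.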

The spam/virus-scanning system for mail.usf.edu will soon see a major upgrade. In preparation for this upgrade, and to increase the amount of available storage for email accounts, some policy changes must be made. Beginning on Wednesday, August 10th 2006, messages marked as Spam and moved to your SPAM folder will be kept for two weeks and then deleted automatically. This change affects your SPAM folder only, and messages in any of your other folders will not be affected. If you have any questions, please email us at usg@mailman.acomp.usf.edu or post a comment to this entry.
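To illustrate how the new rule behaves, here is a small sketch. It is not the code running on the mail servers: the message layout, the `messages_to_delete` function, and the sample dates are made up for illustration, and it assumes the two weeks are counted from the date a message was received.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(weeks=2)  # spam is kept for two weeks


def messages_to_delete(messages, now):
    """Return messages in the SPAM folder that are older than the retention period."""
    return [
        m for m in messages
        if m["folder"] == "SPAM" and now - m["received"] > RETENTION
    ]


# Hypothetical mailbox, checked on September 1st.
now = datetime(2006, 9, 1)
mailbox = [
    {"id": 1, "folder": "SPAM", "received": datetime(2006, 8, 11)},   # 21 days old
    {"id": 2, "folder": "SPAM", "received": datetime(2006, 8, 25)},   # 7 days old
    {"id": 3, "folder": "INBOX", "received": datetime(2006, 8, 11)},  # not in SPAM
]
doomed = [m["id"] for m in messages_to_delete(mailbox, now)]
print(doomed)
```

Only the old message in the SPAM folder is selected; the recent spam message and the old message in the inbox are both left alone.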

blog.usf.edu and myweb.usf.edu will be unavailable during the Blackboard maintenance window (12AM-2AM) on Aug 4, 2006. This outage is necessary to accommodate changes needed for the upcoming Blackboard upgrade. No other services (WebMail, mail.usf.edu accounts, etc.) will be affected. We apologize for any inconvenience this may cause.

The latest version of USF WebMail is now out of beta! If you have any questions or comments, please post a comment to this blog entry or send an email to USG@mailman.acomp.usf.edu

We hope you enjoy this update to WebMail — it is the first improvement of several that are planned for the 2006-07 school year.
