
AI Increases Working Hours: Evidence and Implications for Startups



Summary

Studies show that the adoption of LLMs and generative tools has not reduced working hours and often has increased them, turning higher productivity into greater workload. This text explains the causes, reliability limits, induced-demand effects, and operational implications for startups and tech teams.


Key takeaways

  • Adopting LLMs speeds up execution but often increases working hours because the freed-up capacity is repurposed by organizations.

  • Model reliability issues (errors, hallucinations) require manual verification, which negates much of the theoretically saved time.

  • Induced demand can turn productivity savings into higher volumes of work: organizations tend to raise expectations.

  • For startups, choosing tools must balance automation and control: specialized tools often prove more reliable than generic LLMs.



Introduction

AI increases working hours and does not always free up time as optimistic narratives promise: this is the first practical observation that emerges from several recent studies and journalistic investigations.

In recent decades calculators, spreadsheets, emails and chatbots have been sold as tools to reduce working hours, but data show a different result. The promised breakthrough — a 15-hour workweek according to Keynes — now appears less likely in the short term, despite productivity gains.


Why AI Increases Working Hours: Empirical Evidence

A CEPR analysis shows that professions most exposed to AI have increased weekly hours worked compared with those less exposed. In particular, moving from the 25th to the 75th percentile of AI exposure is associated with about 2.2 more hours per week, and after the launch of ChatGPT, the most affected professions worked roughly 3.15 more hours per week.

The paradox arises because higher productivity enables more tasks to be done in the same time, and organizations often reuse that capacity to increase workload. Studies indicate that gains tend to improve individual productivity more than the overall economic results of companies.


When productivity increases on specific tasks, the freed-up capacity becomes reusable capacity: it does not always translate into leisure time for workers.



Reliability, Oversight Costs, and Hidden Work

Language models predict the next word and do not possess an intrinsic concept of truth, so they require human oversight and continuous correction. This generates verification, testing, and debugging activities that often negate the time savings imagined by automation.

Many real-world cases show significant errors: chatbots have given incorrect tax guidance or generated buggy code that demanded additional debugging time. This creates a new category of work: monitoring, data labeling, and correcting AI-generated outputs.


If every result needs to be double-checked, automation loses much of its operational value: trust in the output is crucial for real time savings.



The Engineers' Meme: More Time for Debugging

Common practice shows that, after introducing generative tools, the overall project time can increase because more time is spent on testing and correction. This phenomenon is confirmed by multiple reports and company accounts that highlight increased software demand and, consequently, more technical work.


Wrong Implementation: Using the LLM for Everything

People often ask LLMs to perform tasks that could be better handled by specialized tools or custom-built ML pipelines. Confusing LLMs with a universal solution leads to operational mistakes and unanticipated costs in time and resources.

A more effective approach is to assess, on a case-by-case basis, whether to use specialized tools, simpler models, or already-established automated processes, rather than relying on generic chatbots. This reduces the risk of needing to re-check inaccurate or unsuitable results.


Economic Impact and Induced Demand

Economists call this phenomenon induced demand: time savings generate new activities or higher expectations, offsetting the reduction in working hours. The OECD has observed that automation can reduce work on some activities while increasing demand elsewhere.

In competitive, slow-growth markets, faster pace rarely translates into shorter hours: rather, it increases workload and pressure on teams. This explains why proposals like a four-day workweek struggle to gain traction despite efficiency gains.


Consequences for Startups and Tech Teams

For those working in startups, adopting AI requires a clear strategy: measurable goals, quality metrics, and human oversight criteria to avoid hidden increases in workload. Implementing LLMs without verification processes can create inefficiencies and unforeseen costs.

Startups must decide whether to favor general-purpose tools or more reliable vertical solutions, and plan roles dedicated to quality control of AI outputs. This includes investing in internal skills to assess model risks, biases, and limitations.


Concrete Operational Choices

Defining quality-oriented KPIs (error rate, verification time, number of revisions) is essential to measure the real impact of AI on productivity and working hours. Without clear metrics, the perception of efficiency may remain theoretical.
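The KPIs above can be tracked with very little tooling. As a minimal sketch (the record fields and function names here are illustrative assumptions, not prescribed by any specific framework), a team could log one record per AI-assisted task and aggregate:

```python
from dataclasses import dataclass

# Hypothetical per-task record; the fields mirror the KPIs named above.
@dataclass
class TaskRecord:
    ai_generated: bool          # was the output produced with AI assistance?
    had_error: bool             # did verification find an error?
    verification_minutes: float # time a human spent checking the output
    revisions: int              # number of revision rounds required

def kpi_summary(records: list[TaskRecord]) -> dict:
    """Aggregate error rate, verification time, and revisions for AI tasks."""
    ai_tasks = [r for r in records if r.ai_generated]
    if not ai_tasks:
        return {"error_rate": 0.0, "avg_verification_minutes": 0.0, "avg_revisions": 0.0}
    n = len(ai_tasks)
    return {
        "error_rate": sum(r.had_error for r in ai_tasks) / n,
        "avg_verification_minutes": sum(r.verification_minutes for r in ai_tasks) / n,
        "avg_revisions": sum(r.revisions for r in ai_tasks) / n,
    }

records = [
    TaskRecord(True, True, 25.0, 2),
    TaskRecord(True, False, 10.0, 0),
    TaskRecord(False, False, 0.0, 0),
]
summary = kpi_summary(records)
print(summary)  # error_rate 0.5, avg verification 17.5 min, avg revisions 1.0
```

Even a spreadsheet with these three columns serves the same purpose; the point is that without per-task records, "AI saved us time" cannot be distinguished from "AI shifted our time into verification."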

Adopting hybrid workflows, where AI handles repetitive tasks but decision-making remains human, reduces the risk of redoing work and keeps accountability for final outputs. This approach limits supervision costs in the medium term.
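A hybrid workflow of this kind can be reduced to a simple routing rule: auto-accept only low-risk AI output, and queue everything else for a human decision. This sketch assumes a confidence score is available; the threshold and all names are illustrative:

```python
# Minimal human-in-the-loop gate. AI output is accepted automatically only
# when its confidence score clears a threshold; otherwise it is queued for
# human review, keeping accountability for final outputs with a person.
REVIEW_THRESHOLD = 0.9  # assumed cutoff; tune against your own error data

def route_output(output: str, confidence: float, review_queue: list) -> str:
    if confidence >= REVIEW_THRESHOLD:
        return output            # repetitive, low-risk case: auto-accept
    review_queue.append(output)  # decision stays with a human reviewer
    return "PENDING_REVIEW"

queue: list = []
print(route_output("draft invoice #1", 0.95, queue))  # accepted automatically
print(route_output("contract clause", 0.60, queue))   # routed to a human
```

The design choice that matters is the default: anything ambiguous falls through to a human, so the cost of a wrong confidence estimate is extra review time, not an unchecked error reaching a customer.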


Debate: Divergent Perspectives and Critical Reflections

There are two main strands: optimists who believe AI can free up time and improve work-life quality, and critics who highlight the increased workload and precarity driven by induced demand.

Proponents emphasize that once reliability problems are addressed and more robust automation processes are integrated, AI could truly reduce wasted hours and create new entrepreneurial opportunities. In this scenario, investments in R&D and regulatory adjustments would promote a redistribution of benefits.

Critics argue that without wealth redistribution policies and labor rules, productivity gains will be converted into higher work intensity and rising expectations for existing wages. Additionally, reliance on unreliable tools creates hidden costs: debugging, human oversight, and legal responsibility.

A central point in the debate is AI governance: model transparency, shared metrics, and supplier accountability are necessary conditions to turn technological gains into real benefits for workers. Without these guarantees, the risk is that automation amplifies existing inequalities.

For startups this means evaluating not only immediate productivity gains but also hidden costs and medium-term business-model implications. Technology decisions must align with internal work and training policies to maximize the positive impact of AI.


Practical Guidance for Founders and Managers

Prioritize vertical tools for critical tasks, define validation processes, and assign clear responsibilities for supervising outputs to help contain rising working hours. This operational checklist reduces risks and costs associated with misuse of LLMs.

Measuring time spent on verifying, debugging, and reviewing AI-generated outputs is essential to understand whether technology adoption is truly improving efficiency. Tracking these metrics allows quick adjustments in automation choices.


Towards Sustainable AI Automation

In short: AI offers real opportunities but requires disciplined implementation; without governance, metrics, and reliability focus, the most likely result is more work, not less. For now, the promise of fewer hours has not materialized at scale.

If companies want to prevent productivity gains from turning into additional pressure, they must invest in training, quality control, and processes that integrate AI as a lever rather than a shortcut. Only then can efficiency be translated into operational well-being.


A Final Note for Product Builders

Designing for reliability, not just speed, should be a guiding principle: AI adoption should reduce wasted work, not create more of it. This entails robust testing, UX that supports verification flows, and partnerships with transparent vendors.

Finally, measure the real impact on working hours and on control costs: without concrete data, any claim about AI effects remains speculative.

